What follows is a proposal for the next iteration of the design of Doctools. I want to emphasise that I don't want a perfect implementation: I want a framework which can scale along with each new implementation.
Motivations for the modular system described by this proposal include:
Making efficient use of development time. Developers have limited interest and motivation, so it is important to focus effort on areas where the work will not go to waste.
Well-defined modules help here, by giving people a clear sense of direction, a goal to aim towards, and the feeling of completion when the goal is met.
Maintainability of code. The current Doctools code base, while conceptually clear, is complex to understand despite its relatively small size. It is dense and fragile; as with abstraction in general, modules help hide away the complexities which are irrelevant to a broader view.
Each feature must be well-defined, and therefore easily testable. This relates directly to giving a sense of completion for features.
A set of modules would be designed. Each maps one XML namespace of input to one or more formats for output. A namespace concerns itself with one concept only—for example, there might be a namespace to provide a mechanism for expressing footnotes.
These input namespaces are translated directly by the module to each output format. The module can then make good use of the underlying technologies, for example the native footnote mechanisms provided by TeX macro packages. Alternatively, for output formats which do not provide that feature, the module may provide its own implementation, for example for footnotes in XHTML.
The output formats available for a particular document are the intersection of what each namespace's module can provide, for all the namespaces employed by the document.
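As a minimal sketch of this rule (in Python, purely for illustration): the registry, the namespace URIs and the format names below are all made up, and nothing here is an existing Doctools interface.

```python
# Hypothetical registry: namespace URI -> formats that namespace's module can emit.
# The URIs and format names are placeholders, not part of Doctools.
MODULE_FORMATS = {
    "http://example.org/ns/structure": {"xhtml", "pdf", "latex"},
    "http://example.org/ns/footnote": {"xhtml", "pdf"},
    "http://example.org/ns/page": {"xhtml"},
}


def available_formats(namespaces, registry=MODULE_FORMATS):
    """Return the output formats that every employed module can produce."""
    namespaces = list(namespaces)
    if not namespaces:
        return set()
    # A KeyError here means some namespace has no module at all.
    format_sets = [registry[ns] for ns in namespaces]
    return set.intersection(*format_sets)
```

A document using only the structure and footnote namespaces could be rendered to XHTML or PDF; adding the page namespace would leave XHTML as the only choice.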
Figure 1, “Module I/O” illustrates input of a specific namespace rendered out to three supported formats. Note that several versions of this namespace may be supported by the module. Rendering options are provided to select properties such as paper size; these apply to all modules. Options are discussed in the section called “Anatomy of a module”.
Since features are implemented per-module, if the underlying technology used for the output format does not provide that feature in an adequate manner (for example, imagine if ConTeXt did not support footnotes), the module could simply not provide output for that format. This would not affect documents which do not use both that output format and feature. In this way, the effects of “missing” features are confined to only the formats they affect.
Figure 2, “Namespaces and modules” illustrates how a source document is split into its component namespaces (in this example there are only two), and each namespace is handled by its relevant module. Those modules are capable of producing various output formats; the formats available to the user are those supported by all modules employed. In this case, that happens to be only one format, XHTML.
Given this well-defined interface to modules, a support matrix documenting what formats and features are available would be straightforward to maintain. Additionally, given the relatively small size of these modules, support for a specific feature is unlikely to be a partial implementation.
Unlike a portable compiler, no intermediate representation is shared between the modules, although code re-use is expected to be common. The rationale for this is that an intermediate representation would necessarily be a superset of concepts, which would either reduce quality by inappropriately describing content, or become unwieldy like Docbook.
The intention is that a module's namespace closely mirrors the concepts provided by other formats. For example, a namespace providing chapters and sections may use much the same tag names as Docbook does. This would ease transition, particularly if automated.
Namespaces may be used in any combination within a document (i.e. XSLT stylesheets should process these by calling <apply-templates/> as-is, rather than limiting processing to a specific set of tags). One possible exception is the contents of structural elements of a strict hierarchy.
It is an error to write elements of a namespace which is not handled by any module. In other words, our implementation is likely to contain a “catch-all” template which produces an error message and halts processing.
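The real catch-all would most likely be an XSLT template; the following is only a Python sketch of the same check, under the assumption that elements with no namespace at all are also treated as unhandled. The namespace URIs, the exception class and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of namespaces for which some module exists.
HANDLED_NAMESPACES = {
    "http://example.org/ns/structure",
    "http://example.org/ns/footnote",
}


class UnhandledNamespaceError(Exception):
    """Raised when an element belongs to a namespace no module handles."""


def namespace_of(tag):
    # ElementTree spells qualified names as "{uri}local".
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""  # element is in no namespace


def check_namespaces(xml_text, handled=HANDLED_NAMESPACES):
    """Return the namespaces used, or halt on the first unhandled one."""
    seen = set()
    for element in ET.fromstring(xml_text).iter():
        ns = namespace_of(element.tag)
        if ns not in handled:
            raise UnhandledNamespaceError(
                f"no module handles <{element.tag}> (namespace {ns!r})"
            )
        seen.add(ns)
    return seen
```

Running the check before any rendering starts means the author sees the error immediately, rather than a partially rendered document.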
A module provides one “feature”, and implements all dependencies required to use that mechanism. This is a mapping of an input format onto one or more output formats. Each input format resides in its own XML namespace. The mapping is strictly functional: modules always take input and produce output. For example, the input to a module providing a table of contents would be the elements describing the structural organisation of the document.
Modules should be implementable in various languages. This allows re-use of existing tools, for example to convert TeX's mathematics syntax to MathML. Hence, APIs made available to various languages would be helpful. The most obvious language for XML transformation is XSLT, which is expected to make up the majority of the code.
Modules have well-defined tasks: each module's input format is versioned (as illustrated by Figure 1, “Module I/O”). Versions may share the bulk of their implementation code, but remain physically separate so as to avoid “disturbing” the output generated by stable releases.
Each module publishes a list of supported (human) languages per output format. Consider a module providing quotation marks: these may be implemented by using (for example) ConTeXt's rather capable quotation-mark system, which is aware of typographic conventions for various languages. These languages are published by the module's interface specification, and as with output formats, an intersection of those available in all modules employed is offered to the user.
I don't see any compelling reasons why a module need necessarily be incompatible with another module. One case that came up during design was the possibility of a structural module (providing chapters and sections) being used in conjunction with (for example) a module providing the elements required for website pages. So which tag is outermost? Is a file a chapter containing website pages, or a website containing chapters? Well, that's not up to us; it's up to the author. They choose which tag is outermost, and so the module implementing that tag is the one “in control”.
Modules may heed various “options”. For example, the proposed page module has an option for what kind of HTML to produce. Options share a single global namespace. For example, an option declaring that output should be “chunked” will be picked up by several modules. Options are simple name-value pairs. Most often, the values would be an enumeration of choices. The options available for a document are the union of those provided by the modules employed.
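A rough Python sketch of how such options might be collected. The module names, option names and choices are invented for illustration, and narrowing a shared option to the choices every declaring module accepts is my assumption rather than something this proposal specifies.

```python
# Hypothetical declarations: module -> {option name -> allowed choices}.
MODULE_OPTIONS = {
    "page": {"html-kind": {"html", "xhtml"}, "chunked": {"yes", "no"}},
    "structure": {"chunked": {"yes", "no"}, "paper-size": {"a4", "letter"}},
}


def available_options(modules, registry=MODULE_OPTIONS):
    """Union of the option names declared by the modules employed."""
    merged = {}
    for module in modules:
        for name, choices in registry[module].items():
            if name in merged:
                # Same name means same option, since options are global.
                # Assumption: keep only the choices all declarers accept.
                merged[name] = merged[name] & choices
            else:
                merged[name] = set(choices)
    return merged


def options_for(module, chosen, registry=MODULE_OPTIONS):
    """Pick out the global name-value pairs that one module heeds."""
    return {n: v for n, v in chosen.items() if n in registry[module]}
```

Here a single “chunked” setting chosen by the user would reach both hypothetical modules, while “html-kind” would reach only the page module.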
The manner in which these properties are stated will be defined by a document detailing the interface to modules.
Modules may be obsoleted by more recent alternatives. Therefore we should take care to name them sensibly. To avoid naming clashes, each module should also have an “owner”. Possibly this would be a namespace; all our modules will belong under the Doctools project.
The user interface should provide a user-definable module search path. This provides a way for users to define their own modules (most often for their own namespaces) outside of Doctools.
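One way such a lookup could work, sketched in Python under the assumption that a module is simply a directory named after the module and that earlier entries in the search path take precedence. The function name and the directory layout are hypothetical.

```python
from pathlib import Path


def find_module(name, search_path):
    """Return the first directory called `name` found along the search path.

    `search_path` is an ordered list of directories; earlier entries win,
    so a user's own directory can be listed ahead of the stock modules.
    Returns None when no entry contains the module.
    """
    for directory in search_path:
        candidate = Path(directory) / name
        if candidate.is_dir():
            return candidate
    return None
```

Whether a user's module should be allowed to shadow one of ours, as this ordering permits, is a design decision that the owner scheme would need to settle.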
Details of dependencies are provided. These are requirements for a specific set of releases (per module version) of each of the backend dependencies involved. For example, rendering PDF might depend on both texml and ConTeXt, of a particular version each. As with most module features, the requirements of all modules employed are combined into a single listing (intersecting the acceptable releases of each backend), so that a user needn't guess at what will be used for rendering that particular document.
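A small Python sketch of one possible reading of this: every backend needed by any module employed is listed, and for each backend only the releases acceptable to all of those modules are kept. The module names and the release labels “r1” and “r2” are placeholders rather than real version numbers, and the sketch ignores per-output-format and per-module-version detail.

```python
# Hypothetical declarations: module -> {backend -> acceptable releases}.
# "r1" and "r2" are placeholder labels, not real releases of these tools.
MODULE_DEPENDENCIES = {
    "structure": {"texml": {"r1", "r2"}, "context": {"r1", "r2"}},
    "footnote": {"context": {"r2"}},
}


def combined_dependencies(modules, registry=MODULE_DEPENDENCIES):
    """Merge the backend requirements of all modules employed."""
    combined = {}
    for module in modules:
        for backend, releases in registry[module].items():
            if backend in combined:
                # Keep only releases that every module so far accepts.
                combined[backend] = combined[backend] & releases
            else:
                combined[backend] = set(releases)
            if not combined[backend]:
                raise ValueError(
                    f"no release of {backend} satisfies every module"
                )
    return combined
```

An empty set of acceptable releases is reported as an error, since the document could not then be rendered with any single installation.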
One disadvantage of moving to this scheme, instead of focusing all our efforts on re-implementing Docbook, is that it becomes more difficult for us to render arbitrary documents from other people's projects (which is a good chance to show off). However, we should be able to achieve something similar either by maintaining a partial implementation in a “subset of Docbook” module, or by (automatically) converting those documents into other formats. Converting to Docbook-simple would also make this more feasible.
Modules are nothing to do with themes. Themes provide a customisable visual style for documents, which may be applied across multiple documents to give a consistent appearance. They are here to stay; themes provide form, whereas modules provide content.
Despite my personal preference for ConTeXt, providing output to LaTeX would be useful for people wishing to use their existing LaTeX styles. This would help give us appeal to a wider audience. It is arrogant to assume that people will port their styles just to try out our system.