
Experiences

What follows is based both on my own implementations over the time described above and on interactions with people using those and other software.

For the sake of brevity, use-cases are not discussed in this document; it is expected that the reader is familiar with the features involved, and their repercussions on usage.

Fear of the unknown

People aren't interested in “made-up” formats. I think they have various reasons, including:

  • Their current system suits their needs. Probably more so than a newcomer's proposal. For example, academics will always use LaTeX; mathematicians will always use amsTeX, and so on.

    Possibly we can see these as “output” formats. However...

  • ...their current systems have input formats highly suited to their daily use. TeX's mathematics input is a clear example. It would be undesirable to drag existing users away from a format they know well.

  • Yet another format. What's so compelling about this one?

  • Publishers (especially academic journals) maintain their own customisations. These enforce a consistent style for papers. From an author's point of view, a paper must either be written in this dialect, or be converted to it from some “original” format. Converting TeX-to-TeX is not usually done, so authors seem to opt to write in these journal-specific dialects, and their personal bibliographies end up in several differing formats.

Making use of existing tools is desirable. We should concentrate on one thing at a time: we are trying to process documents, not provide an end-to-end toolchain. For example, there must be a MathML-to-TeX engine out there somewhere. (Perhaps MathML is a bad example, given that human authors prefer writing TeX directly!) So this is a question of finding good tools, and filling in the gaps where those tools do not currently exist.

This also applies to the process as a whole; if we stick to “open formats”, authors needn't change their working tools. This helps keep people comfortable.

One's own format always seems easiest to maintain. This is an illusion of two parts: the convenience of development, and the ready understanding of its design.

The last two percent

Several times, I have implemented a system only to find it fundamentally incapable of meeting my requirements for a subtle reason I hadn't previously noticed. In terms of project management, one might drop the requirements, were they not the main motivation for the project: to produce high-quality output.

This is troublesome, and affects many projects. The deception lies in the bulk of the design being “good enough”; the trouble is that moving from “good enough” to “superb” is fundamentally impossible with the architecture chosen. The lesson here is that it is surprisingly difficult to come up with an adequate architecture before implementing it. I suspect Knuth was more patient in his designing stage than most of us are in ours.

People only care about what they can see. They don't care about high-quality architectures. The differences are subtle, and it takes trained experts to spot these. The LilyPond essay puts this well: “Unless you are an expert, typographical errors will irk you without being obvious.” I cannot emphasise this enough. Even if they don't irk you, they will irk your readers.

High-quality architectures matter, else the implementation runs into insurmountable dead ends. What would the LilyPond authors have done if they'd found their design was incapable of finding that overlap for that F♭ without re-architecting their system?

Would you bother? Writing for high-quality architectures takes longer. Results happen less quickly, and look superficially similar to those of other, easier systems. The difference is that the quality keeps improving as implementation continues, instead of hitting an obstacle.

Practicality and the edge case

One set of tags cannot express all possible documents. Docbook tries its hardest, though.

Obscure features get in the way. Docbook is notoriously full of weird and esoteric tags. This is not news.

Docbook is too large to be comprehensively implemented. The only implementations which can be relied upon are the official ones...

...unfortunately, the official implementations produce horrible output, both typographically and in terms of code clarity. In the case of XHTML output, it looks reasonable when rendered but underneath it's a mess. This makes writing CSS more convoluted than it need be. Better use could be made of the underlying markup language.

Incomplete implementations are unsatisfying: the edges are inevitably found by users, who dismiss the implementation as incomplete and therefore untrustworthy.

Clear status feedback of what is implemented and what is not is essential, as opposed to a “try it and see” gamble (which would need checking by eye, and therefore would not scale).
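As a minimal sketch of what such status feedback could look like, assuming a partial renderer that knows which element names it handles: the snippet below scans a document and lists the elements it would not support, rather than leaving the user to inspect the output by eye. The `IMPLEMENTED` set, the function names and the sample document are all made up for illustration; they do not describe any existing Docbook tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the element names an imaginary partial renderer supports.
IMPLEMENTED = {"article", "title", "para", "emphasis"}


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def unsupported_elements(xml_text, implemented=IMPLEMENTED):
    """Return the sorted names of elements used in the document but not implemented."""
    root = ET.fromstring(xml_text)
    used = {local_name(el.tag) for el in root.iter()}
    return sorted(used - set(implemented))


# A made-up sample document, for illustration only.
SAMPLE = """<article xmlns="http://docbook.org/ns/docbook">
  <title>Example</title>
  <para>Some <emphasis>text</emphasis> and a <footnote><para>note</para></footnote>.</para>
  <sidebar><para>Aside</para></sidebar>
</article>"""

if __name__ == "__main__":
    for name in unsupported_elements(SAMPLE):
        print("not implemented: <%s>" % name)
```

A report like this can be produced before rendering begins, so the user learns up front which parts of a document fall outside the implementation.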

Which brings us back to the initial point here: a system comprehensively describing all possible types of document isn't necessarily something desirable. There's an oft-quoted passage from an aviator on this subject.

Multiple format output

Output to multiple formats (“multi-mode rendering”) is highly desirable. Providing this without duplicating a lot of work is tricky.

Different formats have different uses. I found this non-obvious. It has a significant side-effect: not all uses of a document need have all the features implemented. For example, cross-document linking need not be present for copies printed on paper.
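To illustrate the point about features differing per format, here is a small sketch, assuming a hypothetical per-format feature table. A cross-document link is rendered as a hyperlink for HTML and falls back to plain text for print. The `FEATURES` table and `render_xref` are invented names for this example, not part of any real renderer.

```python
import html

# Hypothetical feature table: which features each output format implements.
FEATURES = {
    "html": {"emphasis", "xref"},
    "print": {"emphasis"},  # no cross-document links on paper
}


def render_xref(target, label, fmt):
    """Render a cross-document link, degrading to plain text where links are unsupported."""
    if "xref" in FEATURES[fmt]:
        return '<a href="%s">%s</a>' % (html.escape(target, quote=True), html.escape(label))
    return label


if __name__ == "__main__":
    print(render_xref("other.html", "the other document", "html"))
    print(render_xref("other.html", "the other document", "print"))
```

Keeping the decision in one table means the shared rendering code needs no per-format special cases; each format simply declares what it supports.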

Quality and reinvention

TeX is the only system to produce acceptable output for PDF. Everything else is a non-starter. Recent attempts such as XSL-FO ignore much of the design care which went into TeX. I don't just mean flaws with the current implementations of XSL-FO; I mean fundamental flaws with XSL-FO itself.

Why reinvent that? Knuth did a better job than most of us could. The format is a little unreadable for complex constructs, but makes a fine back-end. Configuring it can be tricky, but the macro packages help. Generating it from XML can be tricky, too.

Keeping up

Docbook is a moving target. It gets both bigger (with new features) and smaller (as the schema is refactored). Therefore the only sensible implementation which can be used is the official one, else a lot of effort is spent keeping an implementation in sync. Furthermore, test-cases are difficult to automate for rendering documents; they are more akin to acceptance tests than unit tests.

Not having to keep in sync means our development can scale over time. In other words, the amount of work relative to features remains constant over time. Since we do not have many resources, avoiding this overhead is significant for us.

Authoring

Authoring semantically marked-up documents is complex for staff, but not prohibitively so. It requires some discipline, which can be taught with training.

Semantic markup is inconvenient for producing small, quick documents, such as this one. Do all documents need semantic markup? Unfortunately, if we want to avoid restricting the document to one output format, the answer is yes.