Developing SGML DTDs

I cowrote a book called Developing SGML DTDs: From Text to Model to Markup with my dear friend and colleague Jeanne El Andaloussi. It describes a methodology for SGML DTD design and development that you may find useful for XML too, especially if you’re developing a DTD (rather than, say, an XML Schema) that is intended for the editing and publishing of largely narrative-form documents. The writing process stretched from early 1994 to mid-1995 and involved a lot of trips between Boston and Paris. The book hit the stores in December 1995 and finally went out of print in the summer of 2005.

If you’re interested in reading Developing SGML DTDs, you have two choices: find a printed copy somewhere or read it online. We made the online version available, with the help of the very talented Norm Walsh, on February 10th, 2008: the tenth anniversary of the publication of the XML V1.0 Recommendation. (XML V1.0 has seen several subsequent editions.)

Production Notes

The Preface has a colophon describing how the book was originally written and produced. It went to the publisher in DocBook V2.2.1 form and even included a filled-out DocBook Questionnaire. To prepare it for online publication in HTML form, Norm converted it to XML DocBook V4 as follows (in his own words):

I didn’t like the output from sgml2xml, so I did it with perl. Fix empty tags, remap character entities, convert entityref into fileref, turn the remaining entities into xincludes. Run through db4-upgrade. Fix a handful of things that didn’t work quite right because db2 wasn’t quite the same as db4. :-)

The only by-hand part was index terms. Most files had one or two places where the indexer wrote:

</section>
<indexterm .../>
</section>

That was valid in SGML DocBook because indexterms were an inclusion. In XML DocBook, content is forbidden after the close of a section, so it wasn’t. Whether DocBook’s content models should be relaxed so that it is legal is an interesting markup question.

I just moved the indexterms before the first </section>.
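
Norm made this fix by hand. Purely as an illustration, the same move could be sketched in Python along the following lines. The function name move_stranded_indexterms and the regular-expression approach are assumptions made for this example, not part of Norm’s actual process, and a regex only approximates what a real XML parser would do (it ignores comments and CDATA sections, for instance).

```python
import re

# One indexterm, in either the empty-element form (<indexterm .../>)
# or the paired form (<indexterm ...> ... </indexterm>).
INDEXTERM = (
    r"(?:<indexterm\b[^>]*/>"
    r"|<indexterm\b[^>]*(?<!/)>"
    r"(?:(?!</?indexterm\b).)*</indexterm>)"
)

# A </section> close tag, then one or more indexterms, then (as a
# lookahead) the close tag of the enclosing section.
STRANDED = re.compile(
    r"(</section>\s*)((?:" + INDEXTERM + r"\s*)+)(?=</section>)",
    re.DOTALL,
)


def move_stranded_indexterms(text):
    """Move indexterms stranded between two </section> tags to just
    before the first of them. Hypothetical helper for illustration."""
    while True:
        # Swap the close tag and the indexterms that follow it.
        updated = STRANDED.sub(lambda m: m.group(2) + m.group(1), text)
        if updated == text:
            return text
        # Repeat in case an indexterm is still stranded one level up.
        text = updated


if __name__ == "__main__":
    sample = (
        "<section>\n<section>\n<para>text</para>\n"
        "</section>\n<indexterm id=\"i1\"/>\n</section>\n"
    )
    print(move_stranded_indexterms(sample))
```

The loop reruns the substitution until nothing changes, so an indexterm stranded after several nested close tags ends up before the innermost one. Indexterms that sit anywhere else in the document are left untouched.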

Norm did some beautiful custom work for the online version, based on DocBook stylesheets he’d already developed. I’m deeply in his debt.

Print/Online Differences

Following are the differences between the print and online versions of the book that I’m aware of, beyond obvious differences in print vs. browser formatting. Note that the online version was prepared from the SGML source files that we provided to Prentice Hall PTR, whereas many text improvements were made afterward, during final copy editing and typesetting, and so are missing from those files. I’ve found and “synced” a few, helped immensely by the fact that we authors were clever enough to use DocBook’s <comment> (now <remark>) element to mark typesetting issues and incomplete text! But there are many, many others lurking within. I’m going to try to sync them by hand over time.

Errata

Jeanne and I want to take this opportunity to thank all those who commented on the book during review and after publication. In addition to those we mentioned in the Acknowledgments section, we’d like to make special mention of Dave Peterson of SGMLWorks! and Diederik Gerth van Wijk of Kluwer Rechtswetenschappen, who provided many corrections and thoughtful comments after the book was out. We’re glad for this chance to advertise the corrections.

Following are errata for the printed (and online, unless otherwise noted) versions of the book. Errata exclusive to the online version will be corrected in place to the best of my ability; please do let me know of any issues you notice.