XML · 16 May 2006

Open document architectures

Through ConsortiumInfo.org (no surprise), I just came across this Gartner report on the impact of the approval of the OpenDocument Format as ISO/IEC 26300. Their take is that, since ISO is unlikely to also approve Open XML given its similar scope, the approval is a blow to Microsoft’s attempts to standardize its format. The following recommendations appear in the report, and they seem eminently sensible to me given Gartner’s assessment of the situation:

  • Users: Recognize that you eventually will be saving your office product data in an XML-based format. Users that need ODF support today or need to comply with ISO standards should explore applications that support ODF. These applications may be cheaper to acquire, and enable different functionality, but the migration will not be inexpensive and will involve compatibility issues when exchanging documents with Microsoft Office users. If you need compatibility with Microsoft Office formats or cannot cost justify a migration, lobby Microsoft to support ODF and look for plug-ins that allow you to open and save ODF files from within Microsoft applications.

  • Vendors supporting any application using document formats that deliver content to people: Seek opportunities to leverage ODF, particularly “mash-up” approaches to content creation and sharing.

I hadn’t been following the various de jure standards tracks of these formats very closely, and didn’t realize ISO/IEC approval of ODF was imminent. But this news somehow transported me back to the days of SGML vs. ODA.

ODA was a comprehensive “compound document format” intended to be the be-all and end-all for office documents. It originally stood for Office Document Architecture, then was renamed to Open Document Architecture to seem more inclusive. Wikipedia has a short article on it, which matches my recollection of the situation. It was a big, complicated spec that squished together structure and presentation, and although it did eventually achieve official sanction after many years, with ITU-T and ISO standardization coming in 1999, it ultimately gained no traction.

SGML was approved as ISO 8879 in 1986, and though we weren’t using it yet at Digital in the late ’80s, most of us were using “Standard Digital Markup Language”, a sort of proto-SGML used within the company (a bit similar to GML and its role within IBM) that had start-tag/end-tag pairs and the like; it was eventually productized as VAX Document (whoa! it lives on!). In my part of the tech doc business, we were totally sold on “generic markup”, particularly since we were now in the position of generating CDs along with paper documentation and “single source/multiple outputs” was an expensive reality. Word-processing programs gave us all kinds of headaches, and the editors and doc tools people, at least, were deeply suspicious of giving writers the ability to add lots of formatting on their own. Thus, we were suspicious of ODA too; around my office we used to refer to it as the Odious Document Architecture.

Okay, so having content creators control presentation was a juggernaut. Yes, yes, I do see the appeal (and indulge in this activity several times a day…). What’s interesting is that the problem just…dissolved away.

The first big lesson on this was the Rainbow DTD, invented by EBT specifically to aid in the regularization of word-processing files as part of their “up-conversion” into SGML. Another sort of sideways influencer, I think, was the Text Encoding Initiative, which helped break some SGMLers’ dependency on the non-presentational markup argument — when you’re marking up an old newspaper for analysis, you might very well want to capture whether a particular story appeared above the fold on page 1. The next really huge lesson was the popularity of HTML and reasonable ways of doing stylesheets for it, which mocked many old-school attempts to get away from the corrosive effects of presentation. The final one was the appearance on the scene of the StarOffice XML/OpenOffice.org work, which began to fulfill a lot of fantasies about making real office documents manipulable with standard tools.

(I just found an interesting paper from 1996 on something called the JEDI Project, for Joint Electronic Document Interchange. Their analysis of TEI, TEI Lite, Rainbow, HTML3, and ODA, along with the various stylesheet technologies then available, was probably repeated by others many hundreds of times.)

The wild part is that SGML and ODA were pitted against each other and both pretty much fell over of their own weight (the latter took 14 years to standardize, for crying out loud), but it’s quite easy today for anyone to benefit from combining their approaches, and we’re down to arguing about which flavor to use for best success in interchange, future-proofing, and user control. Gartner’s first recommendation above was once seen as an intractable and even distasteful issue; now it’s trivial advice: “Recognize that you eventually will be saving your office product data in an XML-based format.”