Open document architectures

Through ConsortiumInfo.org (no surprise), I just came across this Gartner report on the impact of the approval of the OpenDocument Format as ISO/IEC 26300. Their take is that since ISO is unlikely to also approve Open XML given its similar scope, this is a blow to Microsoft’s attempts at standardization of its format. The following recommendations appear in the report, and they seem eminently sensible to me given Gartner’s assessment of the situation:

  • Users: Recognize that you eventually will be saving your office product data in an XML-based format. Users that need ODF support today or need to comply with ISO standards should explore applications that support ODF. These applications may be cheaper to acquire, and enable different functionality, but the migration will not be inexpensive and will involve compatibility issues when exchanging documents with Microsoft Office users. If you need compatibility with Microsoft Office formats or cannot cost justify a migration, lobby Microsoft to support ODF and look for plug-ins that allow you to open and save ODF files from within Microsoft applications.

  • Vendors supporting any application using document formats that deliver content to people: Seek opportunities to leverage ODF, particularly “mash-up” approaches to content creation and sharing.

I hadn’t been following the various de jure standards tracks of these formats very closely, and didn’t realize ISO/IEC approval of ODF was imminent. But this news somehow transported me back to the days of SGML vs. ODA.

ODA was a comprehensive “compound document format” intended to be the be-all and end-all for office documents. It originally stood for Office Document Architecture, then was renamed to Open Document Architecture to seem more inclusive. Wikipedia has a short article on it, which matches my recollection of the situation. It was a big complicated spec that squished together structure and presentation, and while it had — after many years — achieved sanction, with ITU-T and ISO standardization coming in 1999, it ultimately had no traction.

SGML was approved as ISO 8879 in 1986, and though we weren’t using it yet at Digital in the late 80’s, most of us were using “Standard Digital Markup Language”, a sort of proto-SGML used within the company (a bit similar to GML and its role within IBM) that had start-tag/end-tag pairs and the like; it was eventually productized as VAX Document (whoa! it lives on!). In my part of the tech doc business, we were totally sold on “generic markup”, particularly since we were now in the position of generating CDs along with paper documentation and “single source/multiple outputs” was an expensive reality. Word-processing programs gave us all kinds of headaches, and the editors and doc tools people, at least, were deeply suspicious of giving writers the ability to add lots of formatting on their own. Thus, we were suspicious of ODA too; around my office we used to refer to it as the Odious Document Architecture.

Okay, so having content creators control presentation was a juggernaut. Yes, yes, I do see the appeal (and indulge in this activity several times a day…). What’s interesting is that the problem just…dissolved away.

The first big lesson on this was the Rainbow DTD, invented by EBT specifically to aid in the regularization of word-processing files as part of their “up-conversion” into SGML. Another sort of sideways influencer, I think, was the Text Encoding Initiative, which helped break some SGMLers’ dependency on the non-presentational markup argument — when you’re marking up an old newspaper for analysis, you might very well want to capture whether a particular story appeared above the fold on page 1. The next really huge lesson was the popularity of HTML and reasonable ways of doing stylesheets for it, which mocked many old-school attempts to get away from the corrosive effects of presentation. The final one was the appearance on the scene of the StarOffice XML/OpenOffice.org work, which began to fulfill a lot of fantasies about making real office documents manipulable with standard tools.

(I just found an interesting paper from 1996 on something called the JEDI Project, for Joint Electronic Document Interchange. Their analysis of TEI, TEI Lite, Rainbow, HTML3, and ODA, along with the various stylesheet technologies then available, was probably repeated by others many hundreds of times.)

The wild part is that SGML and ODA were pitted against each other and both pretty much fell over of their own weight (the latter took 14 years to standardize, for crying out loud), but it’s quite easy today for anyone to benefit from combining their approaches, and we’re down to arguing about which flavor to use for best success in interchange, future-proofing, and user control. Gartner’s first recommendation above was once seen as an intractable and even distasteful issue; now it’s trivial advice: “Recognize that you eventually will be saving your office product data in an XML-based format.”


10 Comments to “Open document architectures”

  1. orcmid 18 May 2006 at 9:16 am #

    As I recall, ODA provided a nice separation between content and presentation (although one could use a formatted-only form but that wasn’t really for turn-around editable documents). One of the features was that both parts could be provided in the same package, although it was not necessary to do so as part of interchange. The observations about ODA’s fate are apt and instructive, however.

    I would be cautious about the prediction in the Gartner analysis and elsewhere that acceptance of ODF as an ISO Draft International Standard precludes acceptance of the eventual ECMA Office Open XML Document Interchange specification. This is a complete misreading of actual rapid-adoption practice for the ISO ratification of standards produced by member standards bodies (OASIS and ECMA included) and of what is meant by “standard” at this level. (For example, do you think that the existence of ECMA and ISO ratification of C# specifications and the CLI would prevent Java from being processed through either ECMA or ISO? This might well, in fact, be a useful move at this point in the determination of an approach for open-sourcing Java.)

    Finally, it is strongly advisable for individuals involved in standards activities, especially those whose participation is sponsored by competing firms, to avoid suggesting or appearing to advocate that any standardization activity in some way precludes someone’s technology from becoming the subject of standardization.

  2. Eve M. 18 May 2006 at 10:01 am #

    Hi Dennis– Thanks for the additional info about ODA. I suspect our distrust back in the day was partly emotional and defensive, and we never had any software to play with anyway to discover what it would really offer us. It may have had a perfect separation of content and layout (but it probably suffered quite a bit on the semantic axis — after all, we were already accustomed to marking up a myriad of command-line components for what they were).

    I have no way or desire to predict whether Gartner’s assessment is right; I was observing that if you accept their analysis, then their recommendations are perfectly reasonable. In fact, it was the wording of their recommendations in particular that brought to mind the old discussions of how to achieve successful interchange and presentation independence, two requirements that content developers across time have often shared.

    I do not mean to advocate in any way what ISO should do, and in fact have nothing to do with ISO activities (though I did attend a meeting of the group that worked on SGML and DSSSL once more than a decade ago). To my very minimal knowledge, they have no rule against approving multiple specifications that treat similar areas of technology. And I am active in OASIS, which has at times hosted multiple committees working on similar technologies.

  3. orcmid 18 May 2006 at 4:43 pm #

    I just re-read your first paragraphs and I see what you were getting at. I was concerned you were being sucked into someone else’s let’s-frame-the-conversation meme. With your experience in format standards wars, I should have known better!

    I struggled with ODA — and ASN.1 — at one time, and I actually had a mutually unrewarding call with Charlie Goldfarb about why Xerox wanted to abstain on HyTime (because of the SGML hub document being serious overkill). Last I heard, about 10 years ago, someone wanted to switch from the ASN.1/ODA-like format we used to using a unique XML format. Odd, considering that the payload consisted of 600 dpi page-sized raster images.

    I think there are going to be big interchange challenges between ODF-compliant applications, and it would be great if serious work began on that, so that adopters could specify profile requirements for procurement qualification. I think the same situation that made WS-I important is with us here too. I put off blogging about that, but I think it is almost soup now.

  4. Eve M. 18 May 2006 at 5:09 pm #

    I suppose I have to be careful to split hairs pretty precisely on such topics! :-)

    Your observation about the likely need for profiling ODF features is a good one. Except for the formula syntax issue, I’m not familiar with which features are implemented across the board consistently wrt interpretation, and which are underspecified (or even dying on the vine, like parts of W3C XML Schema?). Development of in-house styles and plugins for enforcement will also be key, of course, for improving interop/interchange.

    Versioning issues might eventually be another worry, but of course that’s SOP for procurement of office suites now (with MS Office kind of the poster child for this).

  5. orcmid 18 May 2006 at 7:22 pm #

    I did a once-over for places where ODF is under-specified. I made a quick list in the table entries on compatibility and conformance starting at (http://nfocentrale.net/orcmid/writings/2005/06/w050601b.htm#3.5.1.7). Another way to check is to inspect ODF documents for namespace declarations that are not ones specified for use with ODF. (Not all of the declared namespaces are necessarily used in a document, but they are big hints as to what is potentially present in a given implementation.)
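    The namespace-inspection idea above is easy to automate, since an ODF package is just a zip archive whose content.xml declares the namespaces its markup uses. Here's a minimal Python sketch of that check; the known-namespace set and the toy document are illustrative only (the full namespace list is in the ODF specification), and real documents may declare namespaces in other package parts (styles.xml, meta.xml) as well:

    ```python
    import io
    import zipfile
    import xml.etree.ElementTree as ET

    # Illustrative subset of namespaces defined for use with ODF;
    # the complete list is given in the ODF specification.
    ODF_NAMESPACES = {
        "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
        "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    }

    def declared_namespaces(odf_bytes):
        """Return the set of namespace URIs declared in content.xml."""
        with zipfile.ZipFile(io.BytesIO(odf_bytes)) as z:
            with z.open("content.xml") as f:
                # "start-ns" events yield (prefix, uri) pairs as the
                # parser encounters xmlns declarations.
                return {uri for event, (prefix, uri)
                        in ET.iterparse(f, events=("start-ns",))}

    # Build a toy "document" in memory: two ODF namespaces plus one
    # hypothetical vendor-extension namespace.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr(
            "content.xml",
            '<office:document-content '
            'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
            'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
            'xmlns:acme="http://example.com/acme-extension"/>')

    found = declared_namespaces(buf.getvalue())
    extensions = found - ODF_NAMESPACES
    print(sorted(extensions))  # the non-ODF declarations, if any
    ```

    As the comment notes, a declared namespace isn’t proof the document actually uses it, but a non-ODF declaration is a strong hint that an implementation-specific extension is potentially present.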

    There is no change in the ISO Draft International Standard 26300 (ODF) document that was balloted. It included the May 2005 ODF specification without changes. I don’t know whether there were ballot comments that might need to be resolved before the 2-month formal-standard ballot that should be next according to the ISO road-map (http://nfocentrale.net/orcmid/blog/2006/05/congratulations-odf-osi-draft.asp).

    I agree. I think compatibility test suites and profiles will be very important, especially in government agencies and institutions that are required to employ competitive procurement of standards-conforming products.

  6. orcmid 18 May 2006 at 7:57 pm #

    Short update. I was verifying my links to ISO materials and just saw that DIS 26300 is now at stage 40.99. This means that the stage 50.00 Formal Approval process should commence shortly. I suspect that means there were no comments in the 5-month ballot, but I’m just guessing based on how the ISO-published racecourse is being covered.

  7. Eve M. 19 May 2006 at 5:54 am #

    Man, I hope the 40.99/50.00 thing doesn’t mean there are 5000 separate and distinct standardization stages.

    Meanwhile, ConsortiumInfo is commenting now on the first Ecma draft of Open XML, and specifically on whether it’s over-specified!

  8. orcmid 19 May 2006 at 10:57 am #

    Stage 50.99 is the desired end for now. They go up by 10’s, and stages beyond 50.xx have to do with periodic reviews, end-of-life and such. The 50.xx range goes up discontinuously too. There are usually only 4 or 5 sub-stages, and the best-ending stage is always .99 although there are other exits. There’s also another ballot round at .

    Andy is funny. It is actually draft 1.3 (1.0 was the baseline submission). The table-of-contents is 97 pages and you can get the overall information in the first 550 pages of the document (though there are still incomplete portions). Then there are reference sections that include element-by-element, attribute-by-attribute definitions.

    The parts I like are the conformance section and the Annex A on Interoperability Issues, where they will catalog every feature that has implementation-defined characteristics that must be defined by a conforming implementation. Strict conformance and (non-strict) conformance are clearly defined, including what a (non-strict) conforming implementation must make available, how it must provide warnings on detecting or producing documents using extensions, etc.

    I think Andy’s post reflects the common tension between assured interchange and interoperability and the use of a format (e.g., XML itself) as a foundation for innovation and variation on a common substrate. It’s probably a good depiction of why one size doesn’t fit all.

    My sense is that, in taking the legacy preservation route, this is what OOX has to look like. If ODF is to end up being the laboratory for exploratory development of office-format variations, it will be valuable that there are also OOX and PDF for strict preservation of editable and non-editable document forms over extended time. ODF may also be appealing to new entrants’ developers, since the conformance bar seems to be lower. On the other hand, repurposing and adding custom material to OOX documents will be popular. Thank heavens for XSLT, huh?

  9. orcmid 19 May 2006 at 11:00 am #

    Oops. There’s a two-month ballot that starts at 50.20. (I forgot to fill in that sentence.)

  10. Eve M. 19 May 2006 at 12:04 pm #

    “Thank heavens for XSLT, huh?” Yeah, and for transparent XML formats with no binary or foreign-format portions within them… :-)