Chapter 3. DTD Project Management

Table of Contents

3.1. The Global Picture
3.1.1. Types of Interaction with Documents
3.1.2. Components of an SGML-Based Production System
3.1.3. The Reference DTD and Its Variants
3.2. Preparing to Launch the Project
3.2.1. Defining the Project Goals and Strategic Directions
3.2.2. Controlling the Project Risks
3.2.3. Staffing the Project
3.2.4. Listing the Project Deliverables
3.2.5. Planning the Schedule and Budget
3.2.6. Writing the Project Plan
3.3. Launching the Project
3.3.1. Setting Up the Project Group
3.3.2. Identifying Future Users
3.3.3. Defining the Scope of Documents
3.3.4. Listing the Project Constraints
3.3.5. Planning the Project Workflow
3.4. Handling Project Politics

In this chapter, we explain where DTD development takes place in the context of a global SGML project, listing all the issues you need to consider before a DTD development project can be started and describing how to run the project with a minimum of risk. We conclude by giving hints on how to handle the politics of such a project.

This chapter is primarily for the project leaders and their managers who are responsible for planning and executing a DTD development project. Other key people on the project, such as the facilitator of the DTD design team or the DTD implementor (roles we'll explain in Section 3.3.1, “Setting Up the Project Group”), may find some useful information here about how the project is run and what is expected from them.

We won't cover in detail how a DTD development project is run on a daily basis, nor will we mention the traditional project management methods and tools that are handy to use if your project is large and complex.

3.1. The Global Picture

This section puts DTD development in perspective within the realms of document management and the setup of a complete SGML-based document production system. It should help you determine what must happen before the actual DTD project starts and what the DTD, once it is completed, is likely to be used for.

To design a useful and efficient DTD, you need to figure out how the documents will be created, how they will travel through their lifespan, and the uses to which they will be put. Because documents are created, controlled, and used by humans, we call their various states “interactions people have with documents.”

Once you can articulate all the interactions people will have with your documents and their sequence, you will be able to define the features your information system will need to offer and to derive its probable architecture.

This section will therefore discuss:

  • The three basic types of interaction people have with documents

  • The typical components of an SGML-based document production system and their interdependencies

  • The “reference DTD” and the variants on this DTD that are likely to be needed

3.1.1. Types of Interaction with Documents

Writing documents is usually not a self-justified activity, unless you are writing poetry or fiction. Most documents stored in SGML form are created for the purpose of conveying information or keeping track of information. These considerations delimit the three types of interactions people have with documents:

  • Creation and modification

  • Management, storage, and archiving

  • Utilization

These interactions are, of course, strongly intertwined, but for the sake of clarity it is convenient to describe them separately. This classification system can be represented graphically, as shown in Figure 3.1, “Document Interaction Classification”. The question marks at the boundaries between classes represent document validation.

Figure 3.1. Document Interaction Classification

People involved in SGML projects tend to focus on document creation and modification needs because they are the first obvious activities where the DTD is used, and also because it is their usual domain of activity. Limiting your horizon to these activities is usually a mistake; when designing your DTD, you must also take into account needs arising from management and storage activities and those arising from the various uses that will be made of the documents. This broader point of view not only has consequences for the DTD design and implementation, but also for the way you will build your SGML-based production system.

The following sections describe the goals to be achieved, the documents to be produced, and the hardware and software considerations to be resolved for each type of interaction.

3.1.1.1. Document Creation and Modification

Creation and modification consist of all the activities involving the input of up-to-date documents into the system. These include the obvious on-site creation and modification activities, as well as review, validation, input of untagged data, and all the other ways to import documents from other sources.

There are several cases of import. The documents to be imported may already follow the chosen standards of your information system, in which case we can call their insertion in the information system a plain “import.” Then you simply need to ensure you have a reading device for the media on which the files are delivered.

In most cases, however, the documents to be imported into the system are heterogeneous, with different file formats, markup systems, and levels of consistency within types. They may even include SGML documents marked up according to a different DTD from the ones your system works with. These documents will need to be processed before they can be imported. Following current industry practice, we'll use the term conversion for processing of non-SGML source documents and transformation for processing of SGML source documents, no matter what the target form is.

Conversion and transformation processes are illustrated in Figure 3.2, “Conversion and Transformation Processes”.

Figure 3.2. Conversion and Transformation Processes

The answers to the following questions can help you define your import, creation, and modification activities, and as such they will influence the specifications of your DTD and document production system. If you are building an entirely new production system, asking these questions and analyzing the answers will help you specify and prioritize your requirements. But this specification is an iterative process, because some answers are likely to change as the system is developed and in-house culture evolves.

  • Document Contents

    What are the documents' contents: text, tables, illustrations, equations, video, other? In what proportions?

    This answer helps you determine how many different types of objects you will have to handle, the human competence you will need to have in-house, and the number of different tools these people will have to master.

  • Document Languages

    In how many languages are the documents written? What character sets do they require? Do any documents need to contain text from multiple natural languages? Will any documents be translated by humans or software?

    This answer helps determine certain technical aspects of the DTD and related setup information, and suggests special objects and markup that you may need to handle.

  • Markup Consistency

    Within the files, is the markup consistent (use of stylesheets, templates, DTDs)?

    Whatever the original markup, if it has been used consistently and the rules have been checked regularly, then the conversion process can be largely successful (though it generally won't invent markup that wasn't there initially). If the markup hasn't been used consistently, then the results of conversion are usually insufficient and need to be manually controlled and enhanced.

  • Sources of Documents

    Do the documents come from one or several sources? On what media (paper, diskette, tape, online files, CD-ROMs)? With what means (online connection, up- and downloading, “sneakernet,” other)?

    As already mentioned, for a plain import of conforming material, you need to make sure your system knows how to make use of the delivery media. This is not as trivial as it may sound; if the markup of the documents is system-specific, it may require a particular hardware platform or operating system. If the import is from an online source, you need to make sure the network protocols are compatible and that your system knows how to read the downloaded result.

  • Document Outputs

    What is the expected output of the documents produced: books, magazines, catalogues, CD-ROMs, online databases, other? Under what circumstances will the contents and the writing style need to change?

    There are three main ways for the content to differ according to the output:

    • The document delivery media are so different that they require different ways of expressing the same content and different writing styles.

    • There are various outputs based on extraction parameters from a single set of content.

    • The content of the documents undergoes revision over time, and multiple chronological versions may need to be produced simultaneously.

    If the contents and the writing style of your documents differ according to their output, you will probably have to support several variants of the same information simultaneously, each containing the editorial changes required by each form of output. For instance, in computer companies, there may be substantial content differences between the summarized documentation they offer online with their software and the full-blown paper user documentation. The difficulty is to teach the authors to write coherent variants, then to implement a coherence control device in the system so as to ensure the information contained in each output version does not diverge.

    If you plan to support only one set of source files for each document and produce different delivered forms of the documents through processing that same source differently, you have to define the relevant processing parameters up front and consider them in your modeling process. Once the source documents are properly parameterized or marked up, you must be able, through an automated process, to produce all the expected outputs. For instance, from the same source document you could automatically produce different documents for different target populations (elementary, intermediate, advanced), for different models of the same equipment, and with different levels of confidentiality.

    In addition, your documents will evolve over time and you may need to keep track of the progression of versions. Information systems typically call this “version control.”

    You should plan to keep careful track of all the variants and versions that are important to your documents.

  • Document Creators

    Who creates the documents: subject matter experts, professional writers, secretaries, other? Will authors type and mark up the documents themselves or have it done?

    Most often, authors design the content and type their text themselves. In doing so, they also do some markup of their documents, but with traditional word processors the markup process is transparent. With an SGML-aware editor, they have to select and add the markup on top of their text. In this case, the authors are responsible for the design of the content, the choice of the markup, and the input of the data.

    However, there are other possible divisions of the labor. We know of a legal publisher where consulting lawyers could not be convinced to learn a DTD and write in SGML. The publishing department had to accept jurisprudence discussions even in handwritten form, and therefore had secretaries type or retype the texts and then had legal experts mark up the electronic documents. In this case, the functions of content development, markup, and data entry were performed by three different categories of staff.

    In another case, in a nuclear plant, where experts had to go in the field to do their work and produce their reports, it was decided that the troubleshooting teams would look over the equipment, analyze and fix the problems, and write the incident report on the spot. To avoid any ambiguity, they would hand-tag their handwritten report and would then give it to a secretary who would input the report content and markup electronically. The SGML-encoded report would then be reviewed by the author and signed by the troubleshooting manager before it could be entered in the database. In this latter case, the development of the content and the choice of the markup is made by the subject matter expert and the input by a tool specialist who has no knowledge of the content.

    It is important to determine what the organization is going to be like in your company, for two reasons. First, you need to ensure that you have representatives of both the people who design the content and the people who choose the markup in your design team. Their view of structure and content may be quite different, and you need to ensure both points of view are accommodated by your final DTD. Second, your choice of editor or editing environment will depend on the amount of markup assistance the person at the keyboard will need and the speed at which that person will need to work. SGML-aware editors with a graphic interface provide the most markup assistance, but many data entry specialists feel that nongraphical editors are more efficient for straight input.

  • Creation Locations

    Where are the documents created: on the company site, off-site, at the authors' homes, at a subcontractor's site? Can the site be connected to the information system?

    The answers are crucial to the architecture of your document production system. If you need authors to work on standalone systems on or off-site, then you must provide standalone editing environments. This is especially difficult when you want your information system to provide file names, IDs, or database information such as copyrights or parts numbers. In this case, you may need to build an “antechamber” for the standalone documents so that every aspect of their validity can be checked before they can be added to the system.

  • Creation Timeline

    How are the documents created over time? Are they created all at once, or over a certain length of time? Are regular additions made? Do they require several passes and reviews?

    This will help define how long each document will need to remain in unfinished form in the system and decide whether a software-based revision tracking system or a full-fledged workflow system should be implemented. If the same document is reviewed and corrected several times before it is finally published, it will probably be necessary to keep track of its various revisions. If the process is well defined and can be modeled, then a full-fledged workflow system may be useful.

    When documents are created over a period of time or when they are assembled from information modules of the same size (chapters, recipes, addresses) it is likely that people will edit only fragments of a document at one time. It is therefore important to determine what size of document fragment people feel is most practical to work on at a time, and use this fragment size as a basic level of information granularity. The answer will probably suggest an appropriate architecture for an authoring sub-DTD (discussed in Section 3.1.3.2, “Authoring DTDs”).

    If the documents are created by several authors and then assembled, or if they include pieces of information already created and stored in the information system, you need to define the extraction and assembly processes, each of which may use variant DTDs (see Section 3.1.3, “The Reference DTD and Its Variants”). If authors work on information modules, then the information system must check that the IDs are not duplicated and that the cross-references are satisfactorily resolved.

    Above all, ensuring the quality of the final assembled documents (for example, sequence and transitions between chunks) will require careful thought. Contrary to common belief, information reuse is far from easy and risk free. For example, reusing a graphic already designed in another document by a different author requires a very sophisticated control system to avoid blunders when the original author decides single-handedly to change the original graphic. The same goes for chunks of text, with the added difficulty of reusing text at the right structural level. Cutting and pasting is not always neutral in terms of meaning, and good writers do not write similarly at different hierarchical levels of a document. Reuse may imply some rewriting and adding transitional paragraphs to avoid producing awkward documents. (Section 6.5, “Divisions” discusses some of these issues further as they relate to document type modeling.)

  • Deadlines

    Are there specific deadlines in document creation and modification? What is their frequency and their criticality?

    Listing the constraints on document creation independently of document production constraints allows you to adapt and size your creation environment and your information system, and to choose feasible solutions that automate and facilitate as many processes as possible to meet the deadlines.

  • Revision and Reuse

    Do the documents need to be revised and/or reused over time? If yes, how often, in what proportion to original material, and to what extent?

    Your revision needs will affect your requirements for keeping on hand the editing and conversion environments that produced your documents in the first place. If your information system contains only nonrevisable (for example, scanned) documents, you don't need to maintain an editing environment over time. The more formats you intend to use for revisable text, vector graphics, raster graphics, and so on, the more expensive and complex your creation and modification tools must be.

    To avoid the multiplication of tools and tool proficiency you may want to standardize all the data to one format for each type, and then convert incoming documents to those formats. This approach becomes very efficient when the time comes to update the documents.

    If your information system is only a repository of dead documents, the cheapest and simplest system might be in order. If you need to update most documents, if they come from various sources, and the minimum annual update rate is about 35 percent, then it is worthwhile to build a powerful DTD, have an SGML-based information system, and work with conversions and transformations.

  • Editing Culture

    What is the editing environment like? Do authors use structure-aware tools, WYSIWYG tools, or plain text editors? What should they use in the future?

    The answers help you understand the current authors' culture and figure out to what extent it should or should not be changed. The idea here is not to concentrate on tools, but on the frame of mind authors are in. Are they happy with what they have? Do they see any limitations? Would they like to have more structure-aware tools? Are they already using tools with structural constraints, and would real-time markup assistance and validation be helpful? Are they aware of what more intelligent markup would help produce in editorial terms?

    Once you have assessed the authors' will to change or improve their editing environment and depending on where you want to take them, you know what basic features your editing environment and tool should offer and how much training and “convincing” are ahead of you.

    This survey should pay close attention to the editing and inclusion of nontextual objects and tricky textual objects such as tables, because they are just as important a part of documents as text is, and they easily become an issue, especially with authors used to WYSIWYG tools.

  • Auxiliary Data

    How do authors collect and include additional relevant information?

    This kind of information covers technical specifications and marketing requirements for technical documentation, raw facts for an encyclopedia editor's remarks, or translation tips. Often, it is overlooked in the creation and modification activities. Ideally, you want to devise an efficient electronic system for this exchange of information, rather than making merely cosmetic changes to any inefficient processes in place, such as oral tradition, sticky notes, and “sneakernet.”

  • Delivery Content

    What is an author's delivery composed of?

    The problem here is to define what to expect in terms of an author's handoff of material (content, form, media, metainformation about the delivery) and be able to qualify a completed task. For instance, if authors are fully responsible for developing chapter content and marking it up in SGML but aren't responsible for producing illustrations, and if their authoring DTD doesn't accommodate the required higher levels of document division, then you need to define the “interface” between the authors' delivery and the next step in document production. Authors might be required to declare entities for illustrations, insert entity references into the files, store their chapters in the information system, provide metainformation about each chapter, and validate the chapters. If you want them to do all this, you need to provide them with the tools and training to safely achieve all these tasks and have a mechanism for checking that they've performed the steps correctly.
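Several of the answers above call for an automated check before authored material enters the system, in particular that IDs are not duplicated across information modules and that cross-references resolve. The following Python sketch shows one minimal way such a check could look. It rests on assumptions that are ours, not part of any particular DTD or product: the attribute names `id` and `linkend` are hypothetical, the modules are assumed to use unminimized start tags with explicit attributes, and Python's lenient `html.parser` module stands in for a real validating SGML parser, which a production system would need.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect ID declarations and cross-reference targets from start tags.

    The attribute names are assumptions for illustration; substitute the
    ones your DTD actually declares. HTMLParser lowercases attribute names.
    """

    def __init__(self, id_attr="id", ref_attr="linkend"):
        super().__init__()
        self.id_attr = id_attr
        self.ref_attr = ref_attr
        self.ids = []
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue  # attribute given without a value
            if name == self.id_attr:
                self.ids.append(value)
            elif name == self.ref_attr:
                self.refs.append(value)


def check_modules(modules):
    """Check a set of modules delivered by authors.

    modules: dict mapping a module name to its markup text.
    Returns (duplicate_ids, unresolved_refs) as two sorted lists.
    """
    id_counts = Counter()
    refs = set()
    for text in modules.values():
        collector = IdCollector()
        collector.feed(text)
        collector.close()
        id_counts.update(collector.ids)
        refs.update(collector.refs)
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    unresolved = sorted(refs - set(id_counts))
    return duplicates, unresolved


# Two hypothetical modules written by different authors.
sample_modules = {
    "chapter1": '<chapter id="intro"><para>See <xref linkend="setup"></para></chapter>',
    "chapter2": '<chapter id="intro"><para id="p1">See <xref linkend="p1"></para></chapter>',
}
print(check_modules(sample_modules))
```

For the two sample modules, the check reports `intro` as a duplicated ID and `setup` as an unresolved cross-reference. A check of this kind would belong in the “antechamber” for standalone documents or in the author's delivery procedure, alongside full validation against the DTD.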

3.1.1.2. Document Management and Storage

The document management and storage classification covers many activities. Managing documents implies checking, naming, classifying, and indexing them in order to retrieve them. It includes the assembling of document pieces, the updated insertion of data extracted from various databases, and the resolution of links. Your management system may put in place safety devices such as providing unique file names and IDs, locking up files while they are edited, automatic saving of current work, version control, and enforcement of document workflow. The storage activities range from the basic save and backup to more elaborate archiving devices, such as automated destruction of obsolete documents and regeneration of previous editions of documents.

To evaluate your current system and define the system you need, the answers to the following questions might be useful. They will complement the answers to the questions in Section 3.1.1.1, “Document Creation and Modification”, since the creation and modification activities are closely linked to those of management and storage.

  • Document Types

    What types of documents are handled? Are they of the same kind (all recipes, all articles, all standards) or of various kinds (research and development specifications, memos, marketing literature, and press reviews)?

    If various types of documents are handled, you need an information system that can handle several DTDs (or models of other kinds), plus a way to retrieve a thematic grouping of different types of documents based on specific parameters. For instance, there may need to be a way to group and search through all the documents related to a specific piece of equipment, from design specifications to after-sale maintenance documentation. Such a requirement can put a heavy constraint on the DTDs you will build, such as providing some compatible content models and similar markup names for similar components in the various DTDs.

  • Electronic Storage

    Are all documents currently stored in electronic form?

    If not, you will have to evaluate the real need for those documents to be stored in electronic form and then evaluate the cost of finding the sources, or scanning and using OCR versus keying in the documents from scratch. Such a process is usually long and costly. In most projects where documents are not available in electronic form, people generally start to build the system with current documents that are available electronically and documents that have yet to be written, and just leave the legacy documents to be handled in the old way.

  • Electronic Formats

    What are the electronic formats being used? Are there multiple formats for text and for each kind of nontextual object?

    The fewer the formats, the easier it is to handle them in your new system. Since this is not always the case, you need to choose the formats you want your system to support, and provide converters for all other formats. As already mentioned, any conversion process is a tricky and often disappointing one.

  • Document Reuse

    How are documents used or reused? By whom? How often? Where from (the same network, the same site, or some unconnected location)? Is there any version control? How is it implemented, or how does it need to be? Are documents reused “as is” or after modification?

    These questions are the same as some of those in Section 3.1.1.1, “Document Creation and Modification”, but need to be considered from the system administrator's point of view, rather than that of the authors.

  • Document Processing

    What processing happens to the documents? At which stages? Can documents be built from other documents? Which ones and how? Is workflow control necessary?

    These questions will help define the applications you need to build within the document database management system to meet the needs of the users. You may be able to take advantage of commercial products for workflow and groupware but may need to develop your own applications to meet specific company or department needs.

  • Access Control

    Is there access control to documents? For editing or consulting? How is it enforced?

    This question is just an example of all the questions you need to ask that are related to the security features of the system. The answers to these questions may be irrelevant to your business, or they may be absolutely crucial.

  • Storage and Archival

    Which documents must be stored in the work process, and for how long (taking into account business and legal considerations)? Is there a need for a “hub” storage or archival format and, if so, what is it (for example, SGML or formatted files)? What are the security requirements for archived documents? Are documents destroyed after their “expiration date” and, if so, by whom?

    These questions aim at automating the storing and archival process. One common mistake is to overlook the need to regularly destroy obsolete documents because they do not physically disturb the office environment. But documents do take up a lot of room on hard disks, and providing a device to regularly destroy old documents or make people clean up their files is rewarding in the medium term. Also, the ISO 9000 standard requires expiration dates on documents, which is helpful in automating the cleanup process.

    If you need to archive source-form documents but want to retain some facts about their presentation, you might want to employ a special kind of presentation DTD that records portions of documents that had been generated for delivery, such as tables of contents and indexes.

    In some organizations, the document management and storage activity is completely set aside and left to the discretion of authors. In this case, documents are written and the output is generated right after they are created. But the uses of such documents are rather limited because all uses requiring large collections of data or documents (such as electronic publishing) imply some level of document management.
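The automated cleanup suggested under “Storage and Archival” can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: the record fields `doc_id`, `expires`, and `legal_hold` are hypothetical metadata, and the sketch merely lists candidates for destruction rather than deleting anything, since the decision about who destroys documents is an organizational one.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DocRecord:
    """Hypothetical management metadata kept for each stored document."""
    doc_id: str
    expires: Optional[date]   # None means no expiration date was assigned
    legal_hold: bool = False  # assumed flag for business or legal retention


def expired_documents(records, today):
    """Return the IDs of documents whose expiration date has passed.

    Documents without an expiration date, or under legal hold, are never
    proposed for destruction.
    """
    return [
        r.doc_id
        for r in records
        if r.expires is not None and r.expires < today and not r.legal_hold
    ]


records = [
    DocRecord("memo-1", date(1995, 1, 31)),
    DocRecord("spec-2", date(1996, 12, 31)),
    DocRecord("contract-3", date(1994, 6, 30), legal_hold=True),
    DocRecord("standard-4", None),
]
print(expired_documents(records, date(1995, 6, 1)))
```

With the sample records and a run date of 1 June 1995, only `memo-1` is proposed for destruction: `spec-2` has not yet expired, `contract-3` is on hold, and `standard-4` has no expiration date.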

3.1.1.3. Document Utilization

If you are not limited by the architecture of your document production system, a whole range of uses is open to your documents: printing, searching, viewing, exporting, interchanging, extracting information and building alternate documents or subdocuments, and processing the contents to do all kinds of analysis.

It is essential to be able to describe precisely which uses are expected in the short and medium terms for your documents so as to specify DTD requirements and the necessary processing applications. To describe the uses, you need to answer the following questions.

  • Reading and Viewing

    How can the documents be read or viewed? On paper, on screen, in Braille? Online or standalone? On-site or off-site, or both?

  • Searching and Navigation

    How can the information be searched in each delivered form? By page, by table of contents, by table of illustrations, by index, by full-text search, by keywords, by browsing hyperlinks, other?

  • Delivery and Media

    How do you need to prepare for delivery? Which media will you use (paper, diskette, smart card, CD, Internet, other)?

  • Required Processing

    Do you need to process the information first before actually using it?

    The answer to this question leads to other issues:

    • If the source files can be delivered as they are, as when you exchange source files with business partners and subcontractors, you do not need to develop processing applications, but just ensure that you can package the entire contents of an SGML document properly and that the recipients are equipped with a compatible environment.

    • If you must process the information, what kind of processing is necessary (assembling, extracting, formatting, indexing, building links, transforming to a different DTD, content analysis, other)?

    • Does the processing operate on whole documents or just fragments? In the latter case, what information from larger documents must travel with the fragments so as to render usage feasible?
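Extraction is one of the kinds of processing listed above, and it is also how a single parameterized source can yield different documents for different target populations. The Python sketch below shows the idea in its simplest form. It assumes a hypothetical `audience` attribute holding a space-separated list of target populations, and it assumes the sample is well-formed enough for Python's XML parser to read; real SGML documents would need an SGML-aware parser or a prior normalization step.

```python
import xml.etree.ElementTree as ET


def extract_variant(source, audience, attr="audience"):
    """Return the source with elements for other audiences removed.

    Elements without the attribute are kept for every audience.
    """
    root = ET.fromstring(source)
    _prune(root, audience, attr)
    return ET.tostring(root, encoding="unicode")


def _prune(elem, audience, attr):
    # Iterate over a copy because children are removed during the loop.
    for child in list(elem):
        targets = child.get(attr)
        if targets is not None and audience not in targets.split():
            # Note: any text directly following the removed element
            # (its "tail") is dropped with it; the sample has none.
            elem.remove(child)
        else:
            _prune(child, audience, attr)


source = (
    "<manual>"
    "<para>Plug in the unit.</para>"
    '<para audience="advanced">Edit the configuration file.</para>'
    '<para audience="elementary intermediate">Ask your administrator.</para>'
    "</manual>"
)
print(extract_variant(source, "advanced"))
```

The design choice worth noting is that the selection parameters live in the markup, so they have to be defined up front and accounted for in the DTD; the extraction code itself stays trivial.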

3.1.2. Components of an SGML-Based Production System

With the three types of document interaction in mind, it is easier to define the potential components of an SGML-based document production system and their interactions.

Just like any other document production system, an SGML-based production system needs to offer tools to create or import documents, tools to store and archive them, and tools to publish or view them. Each of these activities corresponds to one or more components of the production system. Just as in the construction of a building, which has various parts (foundation, garage, ground floor, other floors) and logistical connections (stairs, elevators, electricity, plumbing), the pieces of a production system can't be assembled at random, though some pieces might be optional or might be possible to add to the system later.

The components we have identified are as follows:

  • Editing tools and environments

  • Conversion engines and applications

  • The document management system

  • Formatting engines and applications for rendering on paper and screen

  • Other transformation engines and applications

  • Search and retrieval engines

  • Additional processing applications

  • The “document engineering toolbox”

In the following sections we describe each of the potential components of an SGML-based production system, their purpose, and some constraints that might apply to them. Section 3.1.2.9, “Dependencies Between Components” describes the dependencies among these components. Not all systems need to include all of the potential components; you can pick the ones that are adapted to your company's needs.

3.1.2.1. Editing Tools and Environments

To create SGML documents, you need to provide an editing environment or other ways to acquire information.

It's common to have to build your document repository using a combination of tools:

  • SGML-aware editors

  • Traditional word processors and other existing editing tools

  • Tools for one-time and/or routine conversions

  • SGML-to-SGML transformation tools

The conversion and transformation tools are described in later sections.

For actually creating and modifying documents, you have two options. You can work directly on native SGML documents, using one or more of the available SGML-aware editors or simple non-SGML-aware text editors. Alternatively, you can continue to use your existing word processor or desktop publishing system and provide ongoing batch conversion to SGML. If you plan to use an SGML-aware editor for your existing documents coded with traditional markup, you need to perform a one-time conversion to SGML first. If you need to import documents from external sources, you may need to convert them or perform various SGML transformations, depending on the markup they use.

The decision between SGML-aware editors and traditional editors is an important one. Following are some of the factors to consider.

An SGML-aware editor integrates several functions. Apart from being a text editor with its traditional functions, it includes parser-based software that first compiles or otherwise prepares a DTD to become resident in the system, and then controls the validity of the document instances being built with the selected resident DTD. It also offers markup assistance, as it lets the user see and use only valid markup in context, and can offer a variety of automated authoring functions, such as automatic insertion of required markup, the building of tables of contents, and management of boilerplate text stored in entities. SGML-aware editors generally have interfaces that take advantage of the hierarchical nature of SGML, which can have a positive effect on the processes of document planning, outlining, and revision. There are a variety of SGML-aware editing products available on the market, with more or less sophisticated markup facilities, WYSIWYG features[4], text editing functions, accessibility from an external database, and availability on computer platforms.

As a result, when using such a tool, you can rely on the documents produced to conform to your DTD. Thus, you can concentrate your efforts on training authors to apply the markup correctly and thoroughly and on using the features of the tool itself.

If you keep using a traditional word processor, the problems of hardware, software, training, and documentation are already solved. Nevertheless, this solution does not ensure consistent markup of the documents and usually does not allow the content to be sufficiently and unambiguously marked up. In order to facilitate and improve the quality of the conversion to come, you will have to build extremely detailed stylesheets or templates and build tools to enforce their constraints, and even so, the results may be unsatisfactory. (Section 3.1.2.2, “Conversion Engines and Applications” discusses the reasons in more detail.) Thus, though it may seem like an inexpensive option at first, its ongoing costs can be quite burdensome.

3.1.2.2. Conversion Engines and Applications

A mechanism for conversion is a component that can often be postponed in the setup of the production system, but can't usually be ignored, because most projects have legacy documents to convert. Some projects start with conversion because of the need to have a huge document base to work from; others start with creating new documents compliant with the new standard, and when they are sure the whole production line is operational, they turn to conversion to import all their legacy documents into the system.

But conversion is absolutely unavoidable when you are planning to keep using a traditional tool for editing documents. In this case, the conversion process, whether in real time or in batch, is crucial to the editing of documents and therefore to the life of the system.

Commercial conversion engines that specialize in converting non-SGML documents to SGML usually work in two phases: the decoding of the source document, followed by its interpretation, resulting in an SGML instance conforming to the DTD for which you have parameterized it. Other character string manipulation tools can achieve the same result, but they are not sold as SGML-specific conversion engines, and they must be programmed specifically to achieve that task.

The first step is to read the source document and decode whatever pattern of markup there is. Some engines translate this markup into an internal language, and some translate it into a specifically designed or an existing presentation DTD (for example, the Rainbow DTD, described in Section 3.1.3.3, “Conversion DTDs”). For documents produced with some desktop publishing tools, multiple choices of markup can be used to produce the same presentation. In this case, the conversion engine must provide a capability that recognizes the various presentations just like a human eye would, regardless of the markup used to achieve the result. This feature helps to identify the relevant objects based on their presentation, rather than recognizing patterns of electronic data and markup.

The second step is to interpret the results of the first step and generate an instance conforming to the selected DTD. This step consists of identifying the relevant objects, matching them with SGML markup, adding structure that wasn't originally present (for example, chapter containers and element end-tags), and taking care of reordering, removing what will later be added back as generated text, and so on.

To parameterize a conversion engine, you build rules based on markup, layout, and patterns that tell the engine what to do when it encounters a pattern or a relevant object. Some rules are quite simple, like the matching of paragraphs. Others, like lists, require some typographical information to define where the items start, where they stop, and whether lists are nested within lists. Some rules generate several actions from one event, such as inserting a chapter start-tag before a chapter title start-tag when such a title is encountered. Some rules need to be sensitive to context in order to be efficient, for example checking whether an apparent “chapter” title in the source document begins with a capital letter followed by a period before marking it up as a chapter or an appendix.

Some engines don't allow the building of sophisticated contextual rules, especially when the required context occurs after the event where the rule must be applied. Testing for context that comes later in the document is a technique called “lookahead.”
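To make the idea of conversion rules more concrete, here is a minimal sketch in Python. It is not the rule language of any particular conversion engine: the style names (`Heading1`, `Body`), the target element names, and the `convert` function are all assumptions made for illustration. The sketch shows a simple rule (paragraphs), a rule that triggers several actions from one event (a heading closes the previous container, opens a new one, and emits the title), and a context-sensitive rule (a capital letter followed by a period signals an appendix).

```python
import re

# Hypothetical converter input: (style name, text) pairs, as a word
# processor export filter might deliver them. All names are assumptions.
SOURCE = [
    ("Heading1", "Getting Started"),
    ("Body", "Unpack the unit."),
    ("Heading1", "A. Error Codes"),
    ("Body", "Code 12 means overheating."),
]

# Contextual test: a capital letter followed by a period marks an appendix title.
APPENDIX_LABEL = re.compile(r"^[A-Z]\.\s+")


def convert(paragraphs):
    """Apply the conversion rules and return the generated markup, line by line."""
    out = []
    container = None  # chapter-level element that is currently open, if any
    for style, text in paragraphs:
        if style == "Heading1":
            # One event, several actions: close the previous container,
            # open a new one, then emit the title.
            if container is not None:
                out.append(f"</{container}>")
            if APPENDIX_LABEL.match(text):
                container = "appendix"
                # Drop the label; it can be added back later as generated text.
                text = APPENDIX_LABEL.sub("", text)
            else:
                container = "chapter"
            out.append(f"<{container}>")
            out.append(f"<title>{text}</title>")
        elif style == "Body":
            # Simple rule: a body paragraph maps directly to a para element.
            out.append(f"<para>{text}</para>")
        else:
            raise ValueError(f"no conversion rule for style {style!r}")
    if container is not None:
        out.append(f"</{container}>")
    return out


for line in convert(SOURCE):
    print(line)
```

Note that every rule here looks only at the current paragraph and at what came before it. A rule that needed to inspect a later paragraph before deciding would require buffering the input, which is exactly the capability that some engines lack.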

The quality of your conversion results is mainly based on the quality of the non-SGML source documents. If your source documents are all created with the same word processor and the same stylesheet with a consistent layout, the results are probably going to be very good. Conversely, if your documents were created with various tools, if no stylesheet or template or writing rules were imposed, or if authors were free to control the presentation and structure of their documents, then you are in trouble because it will be impossible to create rules which are applicable to all documents.

You must also be aware of the limits of conversion to SGML. Conversion of data formats has always been a necessary but difficult process. Converters are typically based on a set of filters that try to match the source format with the target format while losing as little information as possible on the way. How much is lost, however, is difficult to evaluate. With SGML, the problem is slightly different. In most cases, the level of information that is marked up in the source documents is significantly lower than the level of information that is expected to be marked up with the new DTD.

When you need to convert documents edited with a traditional word processor to SGML, you're likely to run into some common problems. Usually, converting the overall hierarchical structure of a document is a simple matter. The three main problems are the identification of boundaries of information blocks such as paragraphs, exercises, and notes; the identification of the end of each element when elements are nested; and the identification of inline pieces of information. A more subtle problem is that authors may have used appearance cues to give each document a unique “feel,” which is nearly impossible to capture in an automated conversion process.

With word processor styles, you usually mark up just the beginning of information blocks. Because the start of a new block is assumed to end the previous one, the end is never explicitly marked up. SGML can nest blocks of information inside other blocks, but automatic converters can have difficulty figuring where some blocks end unless this information is included in your original markup.
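The following Python sketch illustrates how a converter can infer missing end-tags when the source marks only the start of each block. It assumes, purely for illustration, that each list item arrives with a nesting level taken from its style (for example, a hypothetical `List1` or `List2` style); the `nest_items` function and the `list` and `item` element names are likewise assumptions. Without the level information, the converter would have no reliable way to decide where a nested list ends.

```python
def nest_items(items):
    """Build nested list markup from (level, text) pairs.

    Only the start of each item is known; every end-tag is inferred from
    the level of the item that follows. Levels start at 1 and may deepen
    by at most one step at a time.
    """
    out = []
    depth = 0  # number of lists currently open
    for level, text in items:
        if level < 1 or level > depth + 1:
            raise ValueError(f"cannot place an item at level {level}")
        if level > depth:
            # A deeper level starts a new list inside the item still open.
            out.append("<list>")
            depth += 1
        else:
            # Same or shallower level: close the nested lists, then the
            # previous item at this level.
            while depth > level:
                out.append("</item>")
                out.append("</list>")
                depth -= 1
            out.append("</item>")
        out.append(f"<item>{text}")
    # End of input closes whatever is still open.
    while depth > 0:
        out.append("</item>")
        out.append("</list>")
        depth -= 1
    return "".join(out)


print(nest_items([(1, "Tools"), (2, "Wrench"), (2, "Pliers"), (1, "Parts")]))
```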

Inline information has a slightly different problem. Often, few of the significant words and phrases in a document are marked up at all, since they may look identical to other text. Even if the odd boldface, italicized, or underlined word appears, the emphasis often is inconsistently applied or has multiple meanings. Converters are often unable to mark up inline information appropriately, since several possibilities are open.

To help solve these problems, you may need to create styles and codes to mark up material that had never been marked up before, teach authors to use them properly and consistently, and develop validating software to catch improper usage. If the authors don't have real-time markup assistance, the validator may find many errors.

It is therefore very important before you launch the conversion project that you thoroughly study your source documents, their homogeneity or heterogeneity, and the type of problems you will be likely to encounter. If you can rely only on the presentation, you need to choose a type of engine based on visual recognition. If the patterns are very obvious and recurrent, then you can choose a pattern-matching type of engine.

If you have made up your mind to use conversion on a regular basis as a way to let the authors keep writing with their favorite tool, then it is worthwhile to standardize the way they write and the stylesheet they use, and to make the markup as precise as possible (even to the extent of using hidden text codes representing SGML-like markup), so as to make the conversion process as easy and reliable as possible.

3.1.2.3. Document Management System

To manage and store SGML documents, it is convenient to have a database run by a database management system, complete with a version control device or a workflow system. Some database management systems are now specifically adapted to the handling of SGML documents. Setting up this system will be a whole task in itself, as you will have to define the user interactions, the document types and formats, the document flow, the status and lifespan of documents, and so on. This component of the production line is crucial because it is the link between creation and utilization.

As of this writing, there is no one solution for SGML document management that can be unequivocally recommended. Commercial producers and consumers of document management systems are still wrangling over whether relational databases are sufficient or whether object-oriented databases are required to do a proper job.[5] Some software publishers are building SGML-aware layers atop hybrid object/relational database management systems, which may provide efficient solutions for managing SGML documents at the granularity of the individual elements.

What is more, experience shows that no document repository is as pure as we might wish it to be, so chances are that your system will have to handle both SGML and non-SGML documents.

Following are some basic requirements that may be useful in helping you to specify and choose such a system. The ideal document management system should at least offer the following features, even if you do not plan on implementing them all. Keep in mind, though, that the ideal system may not exist yet!

  • Handles All Documents

    Inputs, stores, and outputs all the documents you need to manage.

  • Controls Access and Retrieval

    Allows authors and system managers to log in, select work, and check out relevant documents. Allows authorized people to check documents back in after having done the necessary validation control (parsing or other).

  • Controls Writing of Data

    Locks up documents while they are being worked on.

  • Handles SGML and Non-SGML Documents

    Imports validated SGML documents, as well as non-SGML legacy documents, and provides at least a minimal management functionality for non-SGML documents (naming, storing, retrieving).

  • Manages Nontextual Objects

    Along with text, manages nontextual objects such as graphics, images, sound, video, animation, or other objects you have.

  • Controls Workflow

    Controls the workflow from creation through reviews, comments, modifications, approval, mastering, and archiving states and provides a device to show the status of each document and route it accordingly.

  • Manages Document Characteristics

    Manages the formats, names, IDs, and locations of documents so as to store them safely and retrieve them reliably.

  • Handles Version Control

    Offers ways to master and keep track of the versions and variants of documents according to your needs.

  • Handles Selective Extraction and Assembly

    Allows extraction of documents according to selected parameters, and facilitates their reuse and assembly into new documents while keeping track of the sources.

  • Handles Structured Queries

    Allows queries on every element within modules of information and retrieves the relevant modules.

  • Allows for Variable Granularity

    Allows you to choose the level of granularity of documents that the system should handle as independent information modules.

    For example, if you are a documentation publisher and your system only manages complete manuals, then each document is going to be too large and unwieldy to manipulate. You may decide that, for convenience, you want the granularity of management at the chapter or procedure level, because these correspond to a meaningful module of information worthy of separate management. This means that the basic modules authors and users will be able to manipulate will be a chapter, a recipe, or a procedure. Apart from being easier in terms of access time, this granularity often makes more sense when you offer online search and retrieval access to information.

  • Manages Modules and Module Collections Individually

    Manages all the information modules separately and allows you to assemble completed documents from various modules. Several collections based on the same modules should be allowed to be created and stored separately, and it should be possible to control and check the edition of a module that gets used.

    After authors are done producing the necessary modules, your system must allow them to define the sequence in which they wish the modules to be read or published and must keep track of how the final document is to be built.
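As a rough illustration of how several of these requirements fit together (write locking, version control, and assembly from individually managed modules), consider the following Python sketch. It is a toy in-memory model, not a description of any real document management product; the `Repository` class, its method names, and the module identifiers are all hypothetical.

```python
class Repository:
    """Toy module store with check-out locks and numbered versions."""

    def __init__(self):
        self.versions = {}  # module id -> list of contents (version 1 first)
        self.locks = {}     # module id -> user holding the lock

    def check_out(self, module_id, user):
        """Lock a module for one user and return its latest version."""
        if module_id in self.locks:
            raise PermissionError(f"{module_id} is locked by {self.locks[module_id]}")
        self.locks[module_id] = user
        return self.versions[module_id][-1]

    def check_in(self, module_id, user, content):
        """Store a new version and release the lock; return the version number.

        A real system would validate the content (by parsing, for example)
        before accepting it. A brand-new module needs no prior check-out.
        """
        if module_id in self.versions and self.locks.get(module_id) != user:
            raise PermissionError(f"{user} has not checked out {module_id}")
        self.versions.setdefault(module_id, []).append(content)
        self.locks.pop(module_id, None)
        return len(self.versions[module_id])

    def assemble(self, plan):
        """Build a document from (module id, version number) pairs, in order."""
        return [self.versions[module_id][version - 1] for module_id, version in plan]


repo = Repository()
repo.check_in("intro", "ann", "<chapter>first draft</chapter>")
repo.check_out("intro", "ann")
repo.check_in("intro", "ann", "<chapter>second draft</chapter>")
# The assembly plan records exactly which edition of each module is used.
print(repo.assemble([("intro", 1)]))
```

Because the assembly plan names a version for each module, two collections can share the same module while using different editions of it.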

3.1.2.4. Formatting Engines and Applications

Formatting engines and applications are special cases of transformation engines and applications, discussed in Section 3.1.2.5, “Transformation Engines and Applications”. A formatting (or “composition” or “rendering”) engine provides specialized technology for transforming SGML documents into files that contain presentational markup (for example, using a page-display language) that can be interpreted by a printer driver or display software. A formatting application is a customized or parameterized use of such a technology.

Note

You should start building or reviewing your layout specifications and corporate identity charter as soon as you contemplate building a formatting engine. Experience shows that, unless your company specializes in publishing, any corporate layout specifications you have are probably not detailed enough to serve as the basis for the stylesheets you'll need. When trying to define the presentation of all the potential combinations of markup in documents, you're likely to discover that the initial presentation requirements for documents are very thin and require further work.

Mechanisms for producing formatted output are especially important in an SGML-based document production system, not just because they help produce documents that are suitable for delivery, but because they serve a “marketing” function: They are the tools that help prove that the whole system works. Until authors, managers, and users see good-quality typeset output produced from an SGML instance, they may be suspicious of the ability of a system based on structured markup to output “real” documents. This is one of the reasons why, in some projects, the formatting applications are developed before any other component although the initial improvement expected from the new system is to release electronic or online documents. The new system is expected to output at least what the old one used to—that is, paper.

To build a formatting application, you need to make a transformation from SGML markup to the markup of the layout software you have chosen. You need to tell the software what the presentation must be for each element according to its attributes and its context. This mapping is usually referred to as a stylesheet.
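The Python sketch below shows one way to picture such a mapping. It does not reproduce the stylesheet format of any formatting product; the rule table, the element and attribute names, and the `presentation_for` function are assumptions chosen for illustration. Each rule names a context (the tail end of the element's ancestry), optional attribute values, and the presentation to apply; the first matching rule wins, so more specific rules are listed first.

```python
# Each rule: (context ending with the element itself, required attributes, presentation).
# All element names, attribute names, and presentation values are hypothetical.
STYLESHEET = [
    (("chapter", "title"), {}, {"font": "Helvetica", "size": 18, "bold": True}),
    (("section", "title"), {}, {"font": "Helvetica", "size": 14, "bold": True}),
    (("para",), {"role": "warning"}, {"font": "Times", "size": 10, "bold": True}),
    (("para",), {}, {"font": "Times", "size": 10, "bold": False}),
]


def presentation_for(path, attributes):
    """Return the presentation for an element.

    `path` lists the element names from the document root down to the
    element being formatted; `attributes` holds that element's attributes.
    """
    for context, required, presentation in STYLESHEET:
        in_context = tuple(path[-len(context):]) == context
        has_attributes = all(attributes.get(k) == v for k, v in required.items())
        if in_context and has_attributes:
            return presentation
    raise KeyError(f"no stylesheet rule for {'/'.join(path)}")


# The same element type is rendered differently depending on its context.
print(presentation_for(["book", "chapter", "title"], {}))
print(presentation_for(["book", "chapter", "section", "title"], {}))
print(presentation_for(["book", "chapter", "para"], {"role": "warning"}))
```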

It is relatively easy to build several stylesheets to apply to the same instances. Unfortunately, the stylesheets of most formatting products are in a proprietary form that cannot be read by other formatting engines. This lack of compatibility across software means that the presentation information for SGML documents is not nearly as portable as the source SGML documents themselves.

You have two main choices in formatting engines. If you buy an integrated software package and develop stylesheets for it, the package will be responsible for transforming SGML documents all the way into printable or otherwise deliverable form. Alternatively, if you acquire a programmable transformation language, you can program applications that turn your documents into markup systems of your own choosing (for example, inputting the results of the transformation into desktop publishing or word processing software for eventual output). In general, parameterizing an integrated package is easier and requires less programming skill, but for sophisticated output schemes, both approaches may require advanced programming skill.

People usually judge a formatting engine by its ability to render all the fine details of the traditional presentation of their company's documents. However, it is just as important to consider whether the engine works on the source SGML instance or on a transformed source file. The farther away from SGML the files must be for formatting, the more likely it is that any kind of late alteration during the layout process will not make it back to the original SGML instance and will be lost for the next editions. If the alterations are only presentational, there is no harm done, but often late alterations also affect the content, which is dangerous. If you choose a formatting engine that requires a proprietary data format and can't operate directly on the SGML instance, you have to either enforce strict procedures to forbid content alterations at the layout stage, or set up a mechanism whereby the content alterations can be added back to the original files.

For print, it may be useful and cost effective for you to implement a formatting engine with both a limited capability for proofing or preview, and a full-blown capability for mastering or producing the final copy. The proofing engine is limited to automatically laying out and printing the document according to a set stylesheet when you click on a button. It does not offer all the fine enhancement capabilities that you may require for a printing quality, and does not let anyone have access to formatting functions. The fact that it prevents authors from spending time tweaking the presentation of documents at this stage can result in savings of time and money. This part of the software can be available to authors, reviewers, and all the other actors in the document production line until the content of the documents has been completed and validated.

Only then are the documents sent to compositors, who are equipped with the version of the formatting engine that gives access to the full capabilities of a professional layout tool. It allows them to do manual copyfitting until the documents reach the required printing quality.

This two-pronged approach saves time and money on the content development phases, allows a more accurate estimation of the layout resources needed, and improves the quality of the printed result because it is conceived and carried out by professionals.

3.1.2.5. Transformation Engines and Applications

Transformation is a much more satisfying process than conversion because the starting point, SGML document data, is unambiguous and precisely marked up. There are three main types of transformation:

  • The transformation of an SGML document to any non-SGML format

  • The transformation of an SGML document from one DTD to another DTD

  • The transformation of an SGML document to a different instance conforming to the same DTD

As we've already discussed the transformation of SGML documents to various output formats in Section 3.1.2.4, “Formatting Engines and Applications”, here we'll concentrate on the last two types.

The transformation from one DTD to another DTD is necessary when one must interchange documents with business partners or subcontractors using an agreed-on or an industry-specific DTD (which we'll call “DTD A” here). This DTD may not be used as is in-house because such DTDs are often considered too large, too complex, and ill-adapted to the specific needs of a specific task within a company. If a different DTD (which we'll call “DTD B”) is used in-house, it means that interchanging using DTD A will require two transformations: from DTD B to DTD A when documents are exported, and from DTD A to DTD B when they are imported. The two transformations are not similar because when exporting you must leave out everything that is company specific, and when importing you can choose to leave out the information you do not need.

Transforming from one DTD to another is also widely used when an in-house DTD has been upgraded and all the documents produced with the old DTD must be transformed to be compliant with the new one.

In both cases, the difficulty lies in the convergence or divergence of the markup models. Some divergent structures are extremely difficult to map. The rule of thumb is to build an in-house DTD that is structurally compatible with the interchange DTD you will transform to and from. And when you upgrade your DTD, remember that you will have to upgrade all the documents already marked up. So make sure that the new DTD does not prevent the upgrade of the old instances.
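When the two markup models are structurally compatible, the export transformation can be little more than a renaming table plus a list of company-specific elements to leave out. The Python sketch below illustrates this under several assumptions: the element names of both DTDs are invented, the instance is assumed to be fully tagged so that Python's standard XML library can stand in for an SGML parser, and attributes are simply not carried over. Divergent structures would need far more than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from in-house element names (DTD B) to
# interchange element names (DTD A).
EXPORT_MAP = {"manual": "doc", "chap": "chapter", "heading": "title", "p": "para"}

# Hypothetical company-specific elements that must not be exported.
COMPANY_ONLY = {"reviewnote"}


def export(element):
    """Return a DTD A copy of a DTD B element, or None if it is dropped."""
    if element.tag in COMPANY_ONLY:
        return None
    if element.tag not in EXPORT_MAP:
        raise ValueError(f"no export mapping for element {element.tag!r}")
    result = ET.Element(EXPORT_MAP[element.tag])  # attributes are not copied
    result.text = element.text
    for child in element:
        converted = export(child)
        if converted is not None:
            converted.tail = child.tail
            result.append(converted)
        elif child.tail:
            # Keep the text that followed a dropped element.
            if len(result):
                result[-1].tail = (result[-1].tail or "") + child.tail
            else:
                result.text = (result.text or "") + child.tail
    return result


source = ET.fromstring(
    "<manual><chap><heading>Setup</heading>"
    "<reviewnote>check with legal</reviewnote>"
    "<p>Plug it in.</p></chap></manual>"
)
print(ET.tostring(export(source), encoding="unicode"))
```

The import direction would need its own table, since, as noted above, the two transformations are not mirror images of each other.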

The transformation process is based on the parsing of the SGML instance and the use of application languages. The parser reads the instance and transforms it into a sequence of events consisting minimally of the “Element Structure Information Set”, or ESIS[6]. This sequence of events is then read by an application language that is programmed to recognize each markup event and to trigger the appropriate action.

Just as in conversion, the necessary actions depend on a number of contextual factors. If these factors occur before the spot where they are needed, it is not difficult to test for their existence. It is much harder to look ahead for them in the instance and come back to the point where the context is required. This is one of the reasons why some tools load the whole ESIS representation of the document in memory so as to be able to analyze it and manipulate it at will. Although such a solution is very powerful, it is also very costly in memory and may dramatically slow down the transformation process of large documents. Depending on your ultimate transformation needs, you can choose the tool best adapted to your situation.
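The event-driven style of processing can be sketched in a few lines of Python. The notation used for the parser output here is a simplified, line-oriented one (one event per line, with `(` for a start-tag, `)` for an end-tag, `-` for character data, and `A` for an attribute of the element that follows); it resembles what some SGML parsers emit, but you should treat the exact notation, the sample data, and the `events` and `render` functions as assumptions made for illustration.

```python
# Hypothetical parser output, one event per line:
#   "(" start-tag, ")" end-tag, "-" character data,
#   "A" attribute of the element whose start-tag follows.
ESIS = """\
AID TOKEN CH1
(CHAPTER
(TITLE
-Installation
)TITLE
(PARA
-Plug in the unit.
)PARA
)CHAPTER
"""


def events(text):
    """Turn the line-oriented notation into (kind, value, attributes) events."""
    pending = {}  # attributes waiting for their start-tag
    for line in text.splitlines():
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":
            parts = rest.split(" ", 2)  # name, declared type, optional value
            pending[parts[0]] = parts[2] if len(parts) > 2 else None
        elif code == "(":
            yield ("start", rest, pending)
            pending = {}
        elif code == ")":
            yield ("end", rest, {})
        elif code == "-":
            yield ("data", rest, {})


def render(text):
    """Application: react to each event as it arrives."""
    out = []
    open_elements = []  # the only context kept: elements opened so far
    for kind, value, attributes in events(text):
        if kind == "start":
            open_elements.append(value)
            if value == "CHAPTER" and attributes.get("ID"):
                out.append(f"[{attributes['ID']}]")
        elif kind == "end":
            open_elements.pop()
        elif open_elements and open_elements[-1] == "TITLE":
            out.append(value.upper())  # data inside a title
        else:
            out.append(value)  # any other data
    return out


print(render(ESIS))
```

This application keeps only a stack of the elements opened so far, so it uses very little memory, but it can test only for context that has already gone by. An action that depended on something later in the instance would require buffering events or building the whole document in memory, with the costs described above.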

3.1.2.6. Search and Retrieval Engines

Electronic delivery is often the major reason for moving to an SGML document production system. The underlying rationale is usually that if your data is marked up in SGML and the search and retrieval engine you choose uses SGML markup as its own markup system, then most traditional document preparation steps are obsolete. Many of the usual processing tasks—conversion of data to specific markup, consistency control, error correction, inclusion of offset addresses for nontextual objects, manual assembling of document structures, and programming of relevant fields for multicriteria search—become unnecessary, since they are solved in the following ways through the use of SGML as the native markup:

  • SGML instances do not require transformation.

  • SGML instances that have been validated with a parser do not need any additional validating control.

  • Links to nontextual objects, if encoded with entity references, are well defined and can be used to manipulate the objects once their presence at the right location has been ensured.

  • The “database fields” (the elements themselves) are already built in and do not need further processing to become accessible for searching.

As a consequence, switching to SGML for electronic delivery appears to be powerful in exploiting the potential of the data, as well as time and cost effective.

If you use one of the commercial SGML-aware search and retrieval engines on the market, you only need to launch an indexing process on the available documents. Most engines build the table of contents to offer a hierarchical search, build a word index to offer full-text search, and index the content by element so as to offer multicriteria search.
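To show why elements can serve directly as “database fields,” here is a small Python sketch of indexing by element. It is not modeled on any commercial engine; the sample modules, the element names, and the `build_index` and `search` functions are hypothetical. Each word is indexed twice: once under the element in which it occurs, for multicriteria search, and once under no element at all, for full-text search.

```python
import re
from collections import defaultdict

# Hypothetical document base: module id -> (element name, text) pairs.
MODULES = {
    "ch1": [("title", "Installing the pump"), ("para", "Check the valve first.")],
    "ch2": [("title", "Valve maintenance"), ("para", "Replace the pump seal yearly.")],
}


def build_index(modules):
    """Map (element, word) pairs to the set of modules that contain them."""
    index = defaultdict(set)
    for module_id, fields in modules.items():
        for element, text in fields:
            for word in re.findall(r"\w+", text.lower()):
                index[(element, word)].add(module_id)  # search within an element
                index[(None, word)].add(module_id)     # full-text search
    return index


def search(index, *criteria):
    """Return the modules that satisfy every (element, word) criterion.

    Use None as the element to search the full text.
    """
    hits = [index.get((element, word.lower()), set()) for element, word in criteria]
    return set.intersection(*hits) if hits else set()


index = build_index(MODULES)
print(search(index, ("title", "pump")))                    # word in a title only
print(search(index, (None, "pump")))                       # word anywhere
print(search(index, ("title", "valve"), ("para", "pump"))) # two criteria combined
```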

Once you have run the program that indexes the documents in your information system or your document repository, you can distribute the document base on line (locally or on a public network), on CD-ROM, or on any media adapted to the volume of your data and your distribution channels. It is usually necessary to attach a run-time version of the retrieval engine and a viewer to enable the users to search the base and view the documents. The viewer usually needs to apply a stylesheet that you have defined to render the documents with the adequate presentation.

This indexing process is not required when you want to put your data on the World Wide Web. You only need to transform your data to HTML format and make it available on a Web server to make it instantly accessible worldwide to people who have an Internet connection and who have acquired any one of several free Web document browsers. While HTML is not very scalable to large document bases, recent advancements in browser technology and availability will soon make it possible to serve documents on the Internet in their original SGML form, conforming to any DTD.

The search issue is still problematic with large documents. The Web community has not defined the granularity of documents which will travel on the net and the information which must travel with each chunk of information. So today it is the server's responsibility to divide the information into browsable chunks.

3.1.2.7. Other Processing Applications

Any additional applications that you require should be taken into account in your planning; typically, these applications must be programmed ad hoc. Such applications might include:

  • Enriching documents from a database of part numbers, glossary entries, and so on

  • Filling a database from document content

  • Computerized information analysis

  • SGML-aware validation of sentence construction

For example, one French publisher has combined all its dictionaries into one huge SGML database out of which it now extracts new dictionaries according to various parameters.

3.1.2.8. Document Engineering Toolbox

In your planning for a document production system, it's important not to overlook the maintenance of the tools and data used in the system. We use the name “document engineering toolbox” to refer to this collection of enabling technology and data. The toolbox is a place where you organize information about any of the following items that apply to you:

When you consider the toolbox as a component of the document production system in its own right, you give it the same importance, allow the appropriate resources, and apply the same procedures as with any important application. Each item in the toolbox should be fully documented, successive versions archived, and all necessary engineering information centralized and available.

The result will be to make maintenance and upgrades to any tool faster and easier and to build the proficiency of the people responsible for engineering the production line. For instance, if you have subcontracted the development and installation of your document production system, there comes a time when you need to reinstate the daily maintenance in-house, and it can be the job of the person in charge of the daily maintenance to keep the engineering toolbox in the best possible state. If this activity is seriously pursued, staff turnover becomes much less critical.

3.1.2.9. Dependencies Between Components

When you are faced with the problem of launching an SGML-based document system project, it is difficult to know where to start and in what order to proceed. Developing the DTD should be the starting point; how to build the DTD component is explained in this book. Then you have to choose what to do next, with the goal of concluding as quickly and efficiently as possible. One way is to run several subprojects, corresponding to each component of the production line, in parallel. In what order should you launch the production of the components of the system, and which components of the system can be produced in parallel?

The answer varies according to the priorities of each project, but there are some dependencies that cannot be avoided, whatever the project. When planning your project, you have to anticipate four main steps in the subproject for each component:

  • Analysis and specification

  • Development

  • Test and correction

  • General deployment

The first rule is that you can start specifying any subproject once the DTD has been at least specified or, better, developed. Never start any development of a subproject before the DTD has been tested, corrected, and stabilized, even if it is not in its final form.

The second rule is to have, after the second subproject, enough valid SGML instances to help specify, test, and validate all the other components. After building the DTD, the second subproject to tackle depends on your choice of architecture. If you have decided to work with traditional tools and regularly convert the documents to the DTD, the second task should be the setup of the conversion engine and applications to help test the DTD and evaluate the performance of the converter and the potential constraints on source documents that must be formalized more strictly. If you plan to use an SGML-aware editor, the second task should be to program the editing environment based on the editor of your choice so that authors will test the DTD and start producing new valid documents that can be used to test and finalize other components.

The third subproject to deal with again depends on your priorities. You may need to build one or more proof-of-concept formatting applications immediately, or you may need to build the document management system if you have many documents to handle.

In each case, you will need the reference DTD and valid instances to work from, as well as the precise specifications for the project's results, before starting the development. Because you will also need to test the performance of the components developed in the second subproject, you will need a large number of documents and not just a few samples.

Subsequent subprojects can be in any order that meets your criteria.

A project advancement chart could look as shown in Table 3.1, “Interdependencies in the Components of a Document Production System”.

Table 3.1. Interdependencies in the Components of a Document Production System

  Component                                 Analyze and specify   Develop      Test and correct   Deploy generally
  Tool box, reference DTD                   Done                  Done         Done               Done
  Editing tools, custom work                Done                  Done         Done               In process
  Conversion from two WP formats            Done                  Done         In process
  Format apps, style sheets                 Done                  In process
  DBMS, custom work                         In process
  Search engine, custom work                Done                  In process
  Two-way interchange, DTD transformation   In process

3.1.3. The Reference DTD and Its Variants

When we talk about “the DTD,” we are really referring to the reference DTD common to a group of users within a company, an industry, or an interest group. Often such a DTD isn't used as is, except as a hub format used in processing. Each activity in the production of documents may require a derivative version of the generic model. We call these derivative versions variant DTDs.

The reference DTD should be developed first. It encodes the “ideal” markup language for a complete document type. It is the DTD to which a whole document should conform when its content is complete, before it is processed for any particular purpose. Figure 3.3, “Derivation Pattern for Variant DTDs” shows the typical variant DTDs and their derivation. Variant DTDs for additional processes might also be created.

Figure 3.3. Derivation Pattern for Variant DTDs

3.1.3.1. Interchange DTD

The interchange DTD, unlike the authoring, conversion, and presentation DTDs, is usually imposed on a company or department, and thus is seldom derived from an in-house DTD (unless you are involved in a business deal where one of the partners has imposed its own in-house reference DTD as the interchange DTD). In most cases, the interchange DTD is an industry-wide DTD or a DTD that a large interest group has agreed on. (Section 7.2, “Designing Document Types as an Industry-Wide Effort” describes how this process can be undertaken.) The interchange DTD is a kind of external reference DTD to which you must transform but that you cannot alter.

If you build your in-house DTD after having identified which interchange DTD you will have to conform to, then you need to ensure that the structure of your own DTD is compatible and that transformation is possible. If the interchange DTD is imposed after you have built your own reference DTD and produced documents with it, then it is safer to compare both DTDs, find out if transformation is possible, and if it is not, alter the necessary element types and structures so as to make transformation possible. Then transform your old instances to make them compatible with the new DTD.

The transformation from a reference DTD to an interchange DTD usually consists of removing proprietary and secret information and adding control information and revision status. It often includes generated or augmented information such as the table of contents, which normally would be built as part of the formatting process but is contractually required to be delivered with the source files.

The problem becomes tricky if you find out that your company has to conform to two unrelated interchange DTDs. This usually happens in large companies that support activities for widely differing customers. For instance, a company building equipment for aeronautic and aerospace activities may have to conform both to the CALS DTDs for their military clients and to the Air Transport Association DTDs for their civil clients.

There is no easy solution to this problem. Our recommendation is to analyze whether the same documents are really interchanged with both types of clients or partners. If they are, your needs analysis should favor the DTD to which you will transform documents more often, in larger volumes, and for delivery to partners who are most important to your business. Refine your transformation choices for the process whose result must be of the highest quality, and don't compromise your own company's requirements for leveraging your information investment. You may nevertheless find that the choice between the two interchange DTDs is a toss-up. In this case, you may need to make your in-house DTD loose enough to accommodate as much as possible of the transformation requirements from and to both interchange DTDs.

In conclusion, the in-house reference DTD of a company in a specific trade should probably be a variant of the interchange DTD used in that industry. But the fact that transformation back and forth must be possible does not mean that no information is discarded in the process. By documenting the differences between the two DTDs, it is possible to list all the information that is lost in the transformation process in each direction. As long as this information is consciously evaluated as unnecessary to the use of the target documents, the loss of information is unimportant.
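The bookkeeping of lost information can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element type names are invented, element types are matched by name alone, and a real comparison would also have to cover attributes and content models.

```python
# Hypothetical element type inventories; the names are illustrative only.
reference_elements = {"chapter", "title", "para", "internal-note", "price-code"}
interchange_elements = {"chapter", "title", "para", "revision-status", "toc"}


def lost_in_transformation(source: set, target: set) -> list:
    """Return the element types of `source` that have no counterpart in `target`."""
    return sorted(source - target)


# Information dropped when going from the in-house DTD to the interchange DTD.
lost_outbound = lost_in_transformation(reference_elements, interchange_elements)
# Information dropped when bringing interchange documents back in house.
lost_inbound = lost_in_transformation(interchange_elements, reference_elements)

print("Lost on the way out:", lost_outbound)
print("Lost on the way in: ", lost_inbound)
```

Each of the two lists is what would then be reviewed, item by item, to confirm that the loss is acceptable for the target documents.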

3.1.3.2. Authoring DTDs

Authoring DTDs are built for editing purposes. Reference DTDs are usually huge, and not every part of them is useful to the authors. So the first action is to downsize the DTD so as to make it smaller and simpler where possible. In environments where authors work in small chunks (such as chapters) that are later assembled into whole documents, the authoring DTD is often a sub-DTD inside the document hierarchy of the reference DTD.

The authoring DTD may need to be looser than the reference DTD so that SGML validation does not cause a string of irrelevant errors when the work is still incomplete. For instance, if two subsections with some minimal content are required in a chapter in the reference DTD, the requirement for content can be removed from the authoring DTD so as to allow the author to write only one subsection at a time and still validate the draft.
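The effect of loosening can be sketched as follows. The sketch encodes content models as regular expressions over child element names, which is a stand-in chosen for brevity and not SGML syntax; the element names and both models are assumptions made for illustration.

```python
import re

# Content models written as regular expressions over a space-separated list
# of child element names (an illustrative encoding, not SGML syntax).
REFERENCE_CHAPTER = re.compile(r"title( subsection){2,}")  # two or more subsections
AUTHORING_CHAPTER = re.compile(r"title( subsection)*")     # any number, even none


def is_valid(children, model):
    """Check whether the sequence of child elements matches the content model."""
    return model.fullmatch(" ".join(children)) is not None


draft = ["title", "subsection"]                   # work in progress
finished = ["title", "subsection", "subsection"]  # complete chapter

print(is_valid(draft, REFERENCE_CHAPTER))     # False
print(is_valid(draft, AUTHORING_CHAPTER))     # True
print(is_valid(finished, REFERENCE_CHAPTER))  # True
print(is_valid(finished, AUTHORING_CHAPTER))  # True
```

In this sketch the looser model accepts everything the stricter one accepts, so a finished chapter remains valid under both, while a draft is rejected only by the reference model.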

Authoring DTDs are often optimized for use with a specific editing environment. To ease the markup process in an SGML-aware editor, it may be useful to add layers of container elements so that the list of allowed elements in a certain context does not overwhelm the author. Alternatively, the use of an unstructured editor may call for the implementation of aggressive minimization schemes and possibly the flattening of some structures.

Finally, authoring DTDs can be adapted to specific organizations. In large companies, there may be a need for several authoring DTDs adapted to each department's philosophy and needs. For instance, at Groupe Bull, one product line documentation unit wanted access only to the general text entities associated with that unit's products, another product line required a set order for some elements that the reference DTD allowed in random order, and a third product line needed a DTD that would easily transform to the interchange DTD it was required to use. All these needs can be met by devising adapted authoring DTDs.

Authoring DTDs must adhere to two rules. The first is that, whatever the needs met by the authoring DTDs, they must transform to the reference DTD without loss of information. The second is that all the differences from the reference DTD must be fully documented. Attached to each authoring DTD, one must find the date of the changes, their contents, and the reasons why they were made. This information is crucial if one is to maintain the authoring DTDs effectively each time a new request is made or each time the reference DTD is upgraded.

3.1.3.3. Conversion DTDs

A conversion DTD is an intermediate DTD between the source markup and the target DTD. (The target is usually the reference DTD, but it may also be the authoring DTD.) It is usually similar to but looser than the reference DTD, so as to accommodate all the normal and abnormal structures found in source documents. It might also include presentation-related elements and elements that hold conversion data, which don't appear in the reference DTD.

In the abstract, there could be as many conversion DTDs as there are source markup systems because each source markup–target DTD pair is unique. But some conversion engines have their own basic intermediate conversion DTDs that they adapt and use in two-step conversions. And to save time and money, other conversion engines are beginning to rely on a single DTD, the “Rainbow DTD,” for holding the intermediate results of conversion from word processor formats. Rainbow is a highly presentation-oriented DTD that records the formatting information that was present in the source markup. It allows clarification of the original markup by translating it into a simple SGML form, which is much easier to process than system-specific markup in further conversions. The Rainbow DTD is not linked to any target DTD, so once the conversion has been made to Rainbow, it is the transformation process from Rainbow to the target DTD which is unique.

The rule about conversion DTDs is that they must allow a one-way transformation to the reference DTD without losing any of the source information. And unless you plan on doing the conversion of your legacy documents once and for all, it is important to document the conversion DTD and how you use its results in the conversion process for further reuse or to understand markup errors that may appear later on.

3.1.3.4. Presentation DTDs

Presentation DTDs are made to hold the results of augmenting SGML documents with processing or formatting information. They are useful for storing the results of processing, if there are several complex stages of document conditioning to perform before the output stage. For example, presentation DTDs may have attributes that explicitly control literal layout features such as typefaces, or they might add markup for hyperlinks that have been generated from content-based markup, such as tables of contents or indexes. Presentation DTDs may also organize previously random elements in a strict linear or alphabetical order so that they can be output in that order.

Some presentation DTDs, such as the Hypertext Markup Language (HTML), are becoming widely used to hold output from original-form SGML documents. These languages are typically used with software that is customized to process or display them. Transforming SGML documents to languages such as these could obviate the need for a presentation DTD that is based on the design of the reference DTD.

The rule for presentation DTDs is that they must allow a one-way transformation from the reference DTD. Documenting all the augmentations made to the reference DTD can be useful if you decide to change the formatting engine. In this case, you can reuse the underlying analysis and change only the programming.

3.1.3.5. Data Flow Among the DTDs

Although all the variant DTDs are built from the reference DTD, the actual flow of documents follows a different pattern altogether. Documents typically go from creation (by editing or conversion) to storage and then output, as shown in Figure 3.4, “Conversion and Transformation Data Flow”. Each stage of the document is reached after a conversion or a transformation process based on one of the variant DTDs. The added value of such a workflow is that each process is secured by the control of a parser, which ensures the validity of documents at all times.

Figure 3.4. Conversion and Transformation Data Flow

3.2. Preparing to Launch the Project

Just as with any project, before you launch it you must assess the risks, define the goals and other project parameters, and write the project plan. This section is therefore aimed at the project manager or project leader who will be assigned these tasks.

3.2.1. Defining the Project Goals and Strategic Directions

When you start a project, there must be a solid rationale for doing so. The most common goals are to improve the cost-effectiveness of document production, to rationalize and control document production, and to offer a better quality of documents on a wider variety of media. These are generic goals one hears a lot about, but for your project, you should make sure the specific goals of your company are clearly stated with short-, middle-, and long-term objectives and results. When there are several goals, you should make sure everybody agrees on the priorities of those goals.

For instance, a company may decide that what is most important is the protection and the longevity of its data. The short-term goal is to develop a DTD that accommodates all the existing documents and the middle-term goal is to convert all the legacy data into SGML instances which will suddenly become hardware, software, and operating system independent, thus justifying the investment. The long-term goal may be to deploy a robust document management system that will help find any version or edition of a given document within seconds for viewing or updating purposes.

For another company, the priority may be to offer all its documents on paper, on CD-ROM, and on the Internet and to create new types of documents by cleverly processing the existing ones. In that case, the priority will be to quickly put new products on the market (old documents on a new medium, or new documents) for a quick return on investment.

A third type of goal may be for a literature department in a university to adapt the Text Encoding Initiative (TEI) DTD to the type of documents they are studying and then mark up all the necessary documents to launch the analysis applications necessary to their study.

Beware of bizarre goals or hidden agendas. Your awareness of these may turn out to be crucial to the viability of a project where politics plays a big role. Among those one finds:

  • We should do SGML. We don't know why, but it is fashionable.

  • Our competitors are moving to SGML; we should do the same.

  • Morale is low in the documentation department; why not launch a new motivational project?

  • Our production department is overstaffed; SGML is difficult enough to help us figure out who we should keep.

  • We need to upgrade the perception of our organization's technological edge; why not do SGML?

  • I need a high-visibility project to boost my career; why not launch a new document system?

Clearly, some of these goals are so hollow and so far from sound business grounds that they are likely to endanger the DTD development project. The only way to know if a project is viable is to be able to prove that the benefits for the company largely outweigh the drawbacks. So let's assume that in your company, the strategic directions that make DTD development necessary are legitimate and that you just need to formalize them. Here is a basic checklist to help you do so.

  • What documents are concerned?

  • Will the project cover existing documents, new documents, or both?

  • To what uses will the documents be put?

  • Will they be created and utilized internally, externally, or both, and how?

  • What is the expected outcome of the project in terms of organization, quality of documents, image, cost-effectiveness, standardization, return on investment, other?

  • What is the expected date for the first results and for the completion of the project?

Once you have all the answers with their relative priority, you can formalize the goals of the project, check that they match the strategic directions of your company, and have them officially approved.

Even though DTD development has been treated as a subproject up to now in this chapter, we will now concentrate on DTD development alone, and refer to it as “the project.”

3.2.2. Controlling the Project Risks

Developing a DTD is a part of the document engineering toolbox component of the whole document production system. Because this component is seldom identified and consequently not well provided for, the main risks in launching a DTD development project are as follows:

  • Not to handle it as a project at all

  • Not to consider it as important as the other tools

  • Not to put it in the right perspective of its future uses

  • Not to include it in the bigger project for a new document production system

  • To entrust incompetent people with its development

  • Not to give its developers goals

Any of these mistakes will end up burying the project, ensuring that it is never completed or never applicable. Unless a DTD project is officially defined, included in a company strategy, properly staffed, and formally launched, it is so fragile and inconsistent that it can be subject to any resource reallocation, budget cut, or change of direction if a manager leaves. We have heard of many such failures, where the people who had invested much time and effort were terribly frustrated to see the DTD project dropped halfway, to see that the DTD was never used, or to learn that a completely different DTD had finally been selected.

In other words, whether you are a manager, a project leader, an author, or a developer, do not invest in a DTD development project unless you have proof that it is necessary to your company or organization, that it has been officially acknowledged as necessary, that the goals of the project are clearly defined, and that the necessary means (attention, human resources, budget, and time) have been allocated through the end of the project.

3.2.3. Staffing the Project

One of the pitfalls to avoid is to gather all the people who are interested in SGML and start working on a voluntary basis. When the project starts having problems (people do not show up at meetings, they are busy elsewhere, no one seems to care anymore, interested people are told priorities are somewhere else), the appropriate questions are asked, but it is usually too late:

  • Who makes decisions?

  • Who funds the project?

  • Who decides who is involved?

  • Who is actually concerned?

  • Who informs the managers of these people?

  • Who is in charge of the project and responsible for the results?

We suggest you ask these questions before you start and that you start only when they have been answered and the appropriate organizations have been set up. As is typical for the management of most projects, we suggest you formally organize three bodies, as shown in Figure 3.5, “Project Staff”.

Figure 3.5. Project Staff

  • The steering committee is composed of decision makers and funders of the project. It includes the project manager.

  • The project group is composed of the people who will do the work. It is led by the project leader and includes the members of the design team, some reviewers, eventually the DTD implementor, and occasionally some guest experts. The design team includes a facilitator and a recordist.

  • The user group is consulted to give their opinions and to test and validate the work of the project group. Some user-group reviewers have signoff responsibility.

If these people are clearly identified and have accepted their job, then the risk of the project failing halfway through is very small. Not all the individuals may have been selected yet, especially in the design team, but the project leader must be identified and must start helping to launch the project. Section 3.3.1, “Setting Up the Project Group” goes into more detail about selecting the project group members.

3.2.4. Listing the Project Deliverables

Before launching a DTD development project, especially if you plan on subcontracting parts of it, you have to define the document set it must cover (see Section 3.3.3, “Defining the Scope of Documents”), the test documents, and the validating parser. Unless you make these decisions upstream, you will not be able to validate and accept the deliverables of the project.

The minimum deliverables of a DTD development project are as follows:

  • The document analysis report recording all the expressed needs, the decisions that were made, and the rationale for them

  • The DTD “code” and some demonstration or documentation of the fact that the DTD is syntactically valid

  • The DTD maintenance documentation and user documentation (see Chapter 12, Documentation)

  • The files for the test documents marked up with the DTD

3.2.5. Planning the Schedule and Budget

The amount of time and the budget necessary to successfully complete a DTD development project vary according to several parameters:

  • The scope of documents (the wider it is, the longer and the more complex the project will be)

  • The complexity and variety of the structure and content of the documents

  • The number of constraints there are on the project

  • The competence of the project leader

  • The availability of the members of the design team

  • The existence of a competent DTD implementor

  • The discipline of the whole project group in following a DTD design methodology, documenting all their ideas, decisions, actions, and samples and seriously reviewing all the documents and “code” delivered

We know of a computer company DTD that was developed in three staff-weeks, over a period of three months, by an expert in DTD writing helped by a few technical writers of that company. At Digital and Bull, it took several staff-months over a period of one year. The difference lies in the size of the companies (the larger the company, the longer it takes to reach consensus), the variety of documents covered by our DTDs (we had to provide for very different technical documentation backed by very different technical cultures), and the amount of documentation the DTDs were delivered with. In a related effort to build an industry-wide software documentation DTD, although it was specified by people who were all experts in DTDs, it took over three staff-months of meetings with 15 people over a period of two years to deliver the analysis report. (Reaching consensus was a killer in that case.)

This is why it is impossible to give recommendations about the time your specific project will take. Nevertheless, some rough generalizations can be made:

  • Preparation for the project launch

    This step can take from a week to a month.

  • Launching the project

    Depending on the good will of all the people involved, it can take from a month to three months to have the people available, trained, and operational.

  • Document analysis, modeling, and specification done by the project group

    The minimum is 15 days of hard work (including the writing of the document analysis report), to be spread according to the frequency of meetings. The calendar time could be three weeks, but if the team plans on meeting on a weekly basis, this phase can take up to four months.

  • Review and validation of the document analysis report

    This can take from 15 days to a month.

  • Final design and implementation of the DTD

    This can take two weeks, with additional questions going to the design team.

  • DTD test and validation

    This step can take a week to two months, depending on the number of test documents to mark up, the availability of an SGML-aware editor, and the proficiency of the people who will validate the DTD.

  • Documentation

    Developing documentation can take about two months, but starts as soon as the design work is over, so it does not add much delay.

Thus, it seems unreasonable to plan on delivering a good DTD with all the necessary documents in less than three and a half months, but there seems to be no maximum if you consider all the delays bad organization can add to a non-priority project.
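The arithmetic behind such an estimate can be sketched by adding up the ranges quoted above. The conversion factors are assumptions made for this sketch (one month counted as four weeks, 15 days counted as three working weeks), and documentation is left out because it overlaps the later phases; the text does not say how its own figure was derived.

```python
# Lower and upper ends of the ranges quoted above, in weeks.
# Assumptions for this sketch: 1 month = 4 weeks, 15 days = 3 working weeks.
WEEKS_PER_MONTH = 4

phases = {
    "Preparation for the project launch": (1, 1 * WEEKS_PER_MONTH),
    "Launching the project": (1 * WEEKS_PER_MONTH, 3 * WEEKS_PER_MONTH),
    "Document analysis, modeling, and specification": (3, 4 * WEEKS_PER_MONTH),
    "Review and validation of the analysis report": (3, 1 * WEEKS_PER_MONTH),
    "Final design and implementation of the DTD": (2, 2),
    "DTD test and validation": (1, 2 * WEEKS_PER_MONTH),
}
# Documentation is omitted: it runs in parallel once the design work is over.

shortest = sum(low for low, _ in phases.values())
longest = sum(high for _, high in phases.values())

print(f"Shortest plan: {shortest} weeks ({shortest / WEEKS_PER_MONTH} months)")
print(f"Upper ends of the ranges: {longest} weeks ({longest / WEEKS_PER_MONTH} months)")
```

Under these assumptions the lower ends add up to 14 weeks, or three and a half months. The sum of the upper ends is not a ceiling; it only shows how far a well-run project can stretch before any organizational delay is added.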

The same is true in terms of budget. If you subcontract the job and only count what you will pay in cash to a subcontractor, then the bill can be relatively low. But if you take into account the effort of all the people involved, for the design, the coding, the review, the validation and the documentation, the price is higher (apart from the time spent, do not forget the hardware and software for the tests). It is worthwhile, though, because it guarantees the quality and the instant usability of the delivered DTD.

3.2.6. Writing the Project Plan

At this stage, the DTD development project should be ready to start because every aspect has been thought of, discussed, and organized. But this information is scattered in the notes and heads of all the people involved in the project. So to make sure that there is a consensus on what the DTD development project is going to be, it is necessary to write all the planning information down in a project plan.

The project plan includes all the information about the goals and the constraints of the project and the way it will be carried out. It lists all the tasks to be achieved and subdivides them into more precise steps. For each task or step the schedule, the budget, the resources, and the expected results must be fully described, as must the dependencies between tasks. Simple software packages are available to help you build the project plan methodically and efficiently.
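Recording the dependencies between tasks explicitly makes it possible to derive a workable order mechanically. The following minimal sketch uses Python's standard graphlib module; the task names follow the phases discussed in this chapter, but the exact dependencies are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks that must be finished before it starts.
# The dependencies below are assumed for this sketch.
dependencies = {
    "launch": {"preparation"},
    "analysis": {"launch"},
    "review": {"analysis"},
    "implementation": {"review"},
    "documentation": {"review"},
    "test": {"implementation"},
}

# static_order() yields every task after all of its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

A circular dependency in the plan would make the sorter raise an error, which is a useful check in itself before the plan goes to the steering committee.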

The final document must be validated by the steering committee that funds the project, and must be accepted by all the people involved before the DTD work can actually begin.

We purposefully do not mention a business plan here, although we know project leaders are often hard pressed to write some kind of justification for writing a DTD. Since a business plan consists of analyzing and listing all the expected expenses, then figuring out and recording what the expected revenue is and defining when and how the company will reach a return on investment, there is no sense in performing this exercise just for the task of developing DTDs. Obviously, DTD development, isolated from the overall project of migrating documents to SGML, is only a cost center and can never have a return on investment as such. A real and viable business plan including the quantitative and qualitative benefits can be built only for a global project and not solely for DTD development.

3.3. Launching the Project

Launching the project is the responsibility of the project leader. As soon as you have identified the members of the project group, you can start working to make this phase as short and as efficient as possible.

3.3.1. Setting Up the Project Group

The project group is composed of permanent members and part-time members. Some member roles are required, while others are optional. All members have a precisely defined role that they must understand and accept.

The project group is composed of a project leader, the document type design team, the DTD implementor, and guest experts.

3.3.1.1. The Project Leader

The project leader is a person from within the company or the organization who is selected for his or her skills of leadership, organization, knowledge of the document world, and ability to complete a project.

The role of the project leader is to interface with and report to the steering committee, to reach the specific goals of the DTD project while keeping in mind the goals of the global project, and to keep all the team members at work until the project is completed. The project leader is in charge of keeping the project on schedule and within the budget, and is responsible for obtaining all the deliverables and ensuring their quality.

3.3.1.2. The Design Team, Facilitator, and Recordist

The document type design team is composed of representatives of all the actors who interact with the documents to be modeled. They must have intimate knowledge of the documents, either as producers or as users. To set up the design team, you need to select the persons who can best represent each actor community.

These people can be found among the authors; the marketing people; the editors; the librarians (or others in charge of managing and archiving the documents); the users (in-house, partners, or clients); the company quality, standards, and methods people; the publishers; and people who have a vision of what future documents will or should be like.

The role of the design team members is to analyze the existing documents and other relevant data, express the needs of their community, specify the markup model, select interesting samples and test documents, review and validate the final analysis report and all interim output, and test and accept the DTD. Part II, “Document Type Design” of this book can guide them in achieving these tasks.

The project leader must make sure that the selected people are officially assigned to this job and that the necessary amount of time is cleared from their schedule. If people are assigned the design work on top of their usual commitments, they will not be able to do the job seriously. And if they are only made available for the meetings, they will not be able to do the necessary homework between meetings. This homework involves reading the meeting reports, doing research on controversial topics, and filling out forms and proposals. The usual estimate for the necessary time is to double the meeting time estimated by the facilitator.

For reasons of group dynamics, the design team work is more efficient if it does not have more than eight participants. The project leader should try to resist the natural impulse of each department and subdepartment to have their own representative in the design team by stressing the amount of time and work involved in design and by offering interested people an opportunity to be reviewers.

All the members of the design team will be subject matter experts, but they will need the help of a facilitator who is competent in document type modeling. The role of the facilitator is to explain the design methodology being used, to organize meetings, to lead and direct discussions, to listen to all expressed needs, to make sure that everything that is said is recorded, to point out inconsistencies or oversights, and to complete all the necessary documents to hand over to the implementor. The facilitator is the interface with the project leader and the implementor.

The person in the role of facilitator must:

  • Be credible as an unbiased leader

  • Be able to lead groups of heterogeneous origin to produce effective work

  • Know enough about the document set at hand and the traps and the tricks of document modeling

  • Know the DTDs available for various industries and their best ideas

  • Be thorough enough not to skip a step or leave anything out

  • Be persistent enough to keep asking questions to the subject matter experts until all issues have been cleared

  • Be sensitive to the potential variations of the meanings behind the words used by professionals

  • Know enough SGML to interface with the DTD implementor and convey the messages from the design team

Ideally, this person should be picked within the company staff because he or she can coordinate all the logistics and hierarchical aspects of the work in an easier and more timely fashion. But if no one in-house has this profile, it will probably be necessary to hire a consultant to help. In this case, all the administrative aspects of the work will fall to the project leader.

The person in the role of recordist must:

  • Write easily and well enough to report all the decisions in a rigorous and clear fashion

  • Be able to distribute interim reports on demand

  • Be prepared to write the final document analysis report

If possible, find someone for this job who can follow and capture relatively complex and esoteric discussions, without injecting a bias into the notes through being too experienced or familiar with the arguments.

3.3.1.3. The DTD Implementor

The implementor is in charge of designing the markup model and the architecture of the DTD, writing the DTD “code,” documenting it, successfully validating the DTD with a parser, and eventually participating in the DTD tests. The implementor may also occasionally participate in the design team work to offer advice and direction when choices are to be made.

The best person to serve as the implementor is an SGML specialist who knows the language thoroughly, is familiar with the industry DTDs, and has written at least one operational DTD. It is also useful if this person has some experience with programming or customizing an environment based on SGML, such as an editing or formatting application. It is rare to have such a profile in-house, so the project manager can either subcontract the development job or have a person from the company trained in DTD implementation and maintenance techniques. In the latter case, the selected person must be interested in document modeling and be comfortable with computer languages.

3.3.1.4. Guest Experts

Experts are called in when the design team feels that decisions must be made for which it lacks adequate information or competence. These experts fall into two main categories: They are either subject matter experts in a very specific area, or they specialize in building certain kinds of applications.

Some subject matter experts, although their expertise is much in demand, can be only occasional members of the design team because they do not have the time to participate in the design activities full time. Consequently, the facilitator must choose the moments when their presence is absolutely necessary and invite them then.

Application developers may also be asked to join the design team in the modeling phase to describe and explain the constraints of their art and suggest ways to solve problems. For instance, if the SGML-based editors that will be used to mark up the instances do not support SGML “marked sections,” there are ways to achieve the same result by other means.
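As an illustration of the marked-section case, here is a minimal sketch. The element name para, the parameter entity internal, and the attribute audience are hypothetical names chosen for this example, and the attribute-based substitute is only one possible workaround, not necessarily the one an application developer would propose for your tools.

```sgml
<!-- In the document type declaration subset: a switch for internal-only text -->
<!ENTITY % internal "IGNORE">

<!-- In the instance: the section is parsed only if %internal; is INCLUDE -->
<![ %internal; [
<para>Shown only in the internal edition.</para>
]]>

<!-- A possible substitute for editors without marked-section support
     (hypothetical attribute): the DTD records the condition, and the
     processing application must do the filtering itself -->
<!ELEMENT para - O (#PCDATA)>
<!ATTLIST para audience (internal | all) "all">

<para audience="internal">Shown only in the internal edition.</para>
```

The tradeoff in this sketch is that the attribute version moves the selection logic out of the SGML parser and into the application, which is exactly the kind of constraint an application developer can help the design team evaluate.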

3.3.1.5. The One-Man Band Situation

In the previous sections we explained the different roles and the different tasks each player in the project group must perform. This does not mean that each role must be held by a different player. Depending on the size of your project and the distribution of skills, several different roles can be held by the same person: project leader, facilitator, design team member, and implementor. In this case, we suggest that it's a full-time job and recommend that the person filling these roles work hard to keep the roles distinct.

To summarize, we described a project group that could be twelve people strong, but it could as easily be five people strong if the right skills are concentrated in one person. The one thing that cannot be reduced below a certain threshold is the variety of viewpoints in the design team, which is the only guarantee that the DTD will encompass all needs and that it will be used.

3.3.2. Identifying Future Users

As shown in Figure 3.5, “Project Staff”, the project cannot start unless the user group has been set up. Since its members will be asked to give their opinion, critique the design team work, and test and validate the results, the real future users of the DTD must be selected as members of the user group. The user group should cover the profiles of all future users, especially those not represented in the design team.

Selecting the members of the user group, involving them, and training them from the very beginning of the project usually pays off in terms of motivation and availability for future work.

3.3.3. Defining the Scope of Documents

When the design work starts, it is necessary to set boundaries on the work at hand. One of them is defining precisely the categories of documents that the DTD(s) must model.

Note

At this stage, defining whether one or several DTDs are necessary to account for all the documents selected is not part of the scope work.

To define the scope of documents to be taken into account in the project, the best way is to list all the types of documents available in the company or the organization, then to decide which ones must be selected according to the goals of the project. The list might include technical documentation; desktop publishing documents like letters, reports, memos, and marketing literature; catalogs and directories; dictionaries; novels and short stories; tutorials; electronic documents like database chunks and online help; articles; presentation slides; standards; procedures; and other documents. You might be taken aback at the variety of documents published in an organization, but it doesn't necessarily mean that there are as many document types.

The second task is to select from the list which documents will be covered by the DTD project. Beware of expanding the scope with all the documents listed. It is usually a bad idea because of the wide variety of existing documents and their lack of commonality. Each choice must be explained and the reasons recorded in the rationale. It is important that these basic choices should be documented so as never to be questioned again and to be very clear to the design team when they turn the choices of the scope of documents into design principles.

For instance, a manufacturing company is very unlikely to use the same DTD to write commercial letters, user documentation, and the technical specifications for the company products. Similarly, a publisher is most unlikely to use the same DTD for its dictionaries, its novels, and its interpersonal mail. In both cases, if there is a need to model all the documents, the document types are so far apart that it is more efficient to organize the design work in several phases, each one covering one general class of documents.

The third task is to define more precisely what each class of document includes. For instance, “dictionaries” could include basic language dictionaries, translation dictionaries, proper noun dictionaries, synonym dictionaries, pronunciation dictionaries, and so on. Although the content is different for each of them, the global structure is similar, so they are all part of the same document class. For a manufacturer, all the technical product documentation, whether for internal staff or end-users, whether it's generated by the research department or the documentation department, can be considered similar enough to be dealt with in the same overall class of document.

Defining the scope of documents also implies listing the nontextual objects to be accommodated. Although they are not directly processed like text by SGML tools, hooks to nontextual objects must be planned for in the DTD. Therefore, the design team must know whether users will want to point to graphics, still images, sound, animation, video, or other objects, so that it can provide the appropriate linking devices in the DTD.
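To make the idea of a “hook” concrete, here is a minimal sketch of one common SGML linking device: a notation declaration, an empty element, and an entity attribute. All the names used (tiff, graphic, fileref, fig-pump, pump.tif) are hypothetical and serve only as illustration; your design team may well settle on a different mechanism.

```sgml
<!-- Declare the data format so the parser knows the content is not SGML text -->
<!NOTATION tiff SYSTEM "TIFF">

<!-- An empty element that points to an external object through an entity name -->
<!ELEMENT graphic - O EMPTY>
<!ATTLIST graphic fileref ENTITY #REQUIRED>

<!-- In a document's declaration subset: declare the external object -->
<!ENTITY fig-pump SYSTEM "pump.tif" NDATA tiff>

<!-- In the instance: reference the object by its entity name -->
<graphic fileref="fig-pump">
```

Each kind of nontextual object listed in the scope (sound, video, and so on) would need its own notation declaration under this approach, which is why the list must be known before modeling begins.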

Make your choices explicit and write the rationale down because the design team will need as much information as possible when casting the scope information into a “design principle” for analysis and modeling work.

3.3.4. Listing the Project Constraints

This phase aims at identifying and understanding the impact of all the factors that will constrain the design and implementation of the DTD. Some are major, others are trivial, and some matter only in later development stages, for instance, when building an authoring DTD. But to find out how important each constraint is, you need to list them all and then evaluate their impact on the current project.

Along with typical project constraints on schedule, budget, and staffing, the design and implementation of a DTD might be constrained by any of the tools and methods for document creation, management, and processing that have been decided on. Specific constraints might include:

  • A requirement to accommodate existing documents in the new system

    This requirement usually constrains the DTD to having a structure that is compatible with existing documents to a certain degree, suggesting that the design team should, for a start, take into account the existing markup systems during its needs analysis.

  • An obligation to use or transform to a particular interchange DTD

    If an interchange DTD has been defined, ignoring it when building your own DTD would be a serious mistake. Including the interchange DTD as an unavoidable constraint from the start will prevent the design team from doing new work and then having to throw it all away and start over. This situation has occurred several times when an industry interchange DTD was defined and imposed after several companies in the industry had built their own and had started producing documents with it.

  • Display devices that have limited capabilities

    For instance, if the documents are likely to be displayed on character-cell terminals, there is no chance to display graphics, images, or video. But if the same documents are likely to be displayed on a multimedia PC, then the graphics, images, and video are probably the most interesting parts of the documents. If both must be accommodated, the model will need to provide for alternative content for each output version.

    You might be in a similar situation if you must prepare documents for print-disabled people. In this case, the constraint would suggest that you must incorporate into your DTD the techniques published by the International Committee for Accessible Document Design (ICADD) so that you can use publicly available tools to generate Braille, large-print, and voice-synthesized texts.
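The ICADD techniques work by attaching fixed attributes to the elements of your own DTD; these attributes tell a transformation tool which element of the ICADD document type each of your elements corresponds to. The following is a minimal sketch only. The elements para and sectitle are hypothetical, and the attribute name sdaform and the values shown are given from memory of the ICADD conventions adopted in ISO 12083, so they should be verified against the published ICADD specification before use.

```sgml
<!-- Hypothetical elements from your own DTD -->
<!ELEMENT para     - O (#PCDATA)>
<!ELEMENT sectitle - O (#PCDATA)>

<!-- Fixed attributes mapping each element to an assumed ICADD equivalent;
     authors never type these, because the value is fixed in the DTD -->
<!ATTLIST para     sdaform CDATA #FIXED "para">
<!ATTLIST sectitle sdaform CDATA #FIXED "h1">
```

Because the mapping lives entirely in the DTD, a constraint of this kind affects the implementor's work more than the authors' daily markup.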

3.3.5. Planning the Project Workflow

Just as for any project, you need to determine and document the tasks to perform, their order and dependencies. Figure 3.6, “Typical DTD Project Workflow” shows a typical progression of tasks, using the notation of the Mallet project management methodology. (Appendix E, Bibliography and Sources describes where to get more information on this methodology.) It can serve as a roadmap for people to position themselves in the project in terms of role, time to act, type of action to perform, and deliverables they are expected to release.

Figure 3.6, “Typical DTD Project Workflow” uses the following notation:

The documents and objects identified in Figure 3.6, “Typical DTD Project Workflow” are as follows:

CF Component form
CL Component list
DAR Document analysis report
DS Design specification
DTD Document type definition
LD Launch document
NA Needs analysis report
P Parsing
RR Review report
SAE SGML-aware editor
TM Training material
TR Technical report
UD User documentation

Figure 3.6. Typical DTD Project Workflow

Task Description
00 The design team meets to launch the project. The launch document records the goals, the design principles, and the constraints of the project, and describes the document samples for analysis.
10 The design team analyzes sample documents and the existing markup systems and generates a list of all the potential semantic components and a filled-in component form for each of them. All these documents are gathered in an analysis report that records the output from the needs analysis work.
20 The analysis report is sent to the reviewers who must validate the analysis work. They write a review report and send it back to the design team.
30 The design team does all the necessary corrections and starts the modeling phase. When this phase is over they hand out a design specification that records the output from the modeling work.
40 The design specification is sent for review to the user group and the reviewers, who inspect the design specifications and send a review report to the design team.
50 The design team makes the necessary alterations and produces the document analysis report, which is then sent to the implementor.
60 The implementor studies the analysis report and the design specifications and then produces an architecture report that includes all the additional technical information necessary for the implementation phase.
65 The implementor then writes the DTD, validates it, and delivers it with the architecture report to the design team, the user group, and the technical writers in charge of producing the user documentation for the DTD.
70 The design team tests and reviews the DTD, probably using an SGML-aware editor to edit documents, but also reading the DTD itself. They write a review report and send it back to the implementor.
75 The user group makes the same tests and writes a double report that describes the technical problems in the DTD itself (for the implementor) and the design errors (for the design team).
80 The technical writer in charge of the DTD user documentation starts building the various manuals. This action takes a while and can only be completed when the finalized documents are received from the design team and the implementor (after steps 90 and 92).
90 The design team makes all the necessary alterations to the design, informs the implementor, and updates all the relevant pieces of the document analysis report so that its final version is coherent with the corrected DTD. They publish the updated version of the document analysis report.
92 The implementor makes all the required alterations to the DTD and updates the architecture report, and then delivers the final architecture report and the final DTD to be distributed.
97 The people in charge of building the training material use all the final documents produced by every group, including the user documentation, to generate the training material and training program. Once the necessary action has been taken on them, all the documents that are not part of the final delivery must be archived in the organization where that action was taken. When all the relevant documents have been delivered, tested, and accepted, the DTD development project is over, and the maintenance phase can start.

3.4. Handling Project Politics

There are a number of political issues the project leader and project manager must be aware of throughout the DTD development project. In this section we will list the typical issues, to help you identify them from the very start and deal with them elegantly. If they don't all apply to the specific situation in your company, take it as a blessing! Make your own list of the problems you foresee in your project and try to deal with them as early as possible.

Section 7.2, “Designing Document Types as an Industry-Wide Effort” discusses the political realities and other considerations of conducting industry-wide DTD development projects.

  • Share Ownership of the Project

    It's usually desirable to attain standardization throughout a company by making sure a DTD is used across the board for every appropriate purpose. Unfortunately, it is difficult for some departments to accept being told how to craft their documents by another department. If you anticipate that people will refuse to use a DTD unless they designed it in their organization, include eminent representatives of that organization in the design team.

    If you think this participation will not be enough, you may need to let a neutral body in the company (such as the corporate standards department or the quality assurance department) take the official leadership on this operation.

  • Communicate Project Details Early and Often

    Writing under the control of a DTD is, in most cases, a heavy burden on authors in the early stages. Not everybody is eager to change their way of thinking, to learn new tools, and to face the possibility of failure or difficulty. This is why the introduction of any novelty in a work organization is usually met with suspicion.

    We've found that the earlier managers communicate with future users about the DTD and what the new information system will offer, the more time the users will have to get used to the idea. We have also found that broadly advertising the level of user participation in the design team helps the users be more accepting of the whole process.

    To further broaden the base of users involved in the analysis, design, and test activities, a wide user group should be built, trained, gathered often, regularly kept informed, and asked their opinion on all the project documents released.

    All these communication actions happen while the DTD development project is under way. They must not prevent the people in charge from preparing for the pilot use of the DTD and the early days of general deployment.

  • Motivate the Team to Stay Efficient

    The project leader has a special role in ensuring that the work within the project group is harmonious and efficient. The problem does not always come from the design team members themselves. Since that group is often a federation of individuals coming from various organizations, with various individual objectives and goals, the reasons for trouble can often be found in their parent organizations. At the beginning of the development work, all the members are usually motivated and interested, as are their managers. After a while, people tend to be less interested and less available, and are often diverted by their management to “more important tasks.”

    It is the job of the project leader to keep people motivated, to ensure that tangible deliverables are released often enough for the managers to be aware of the amount and quality of work being produced, and to regularly remind the members of the group and their bosses of their assignment and engagement.

    To minimize the risk of the project group drifting apart, the project leader should keep a tight agenda, leaving only the minimum time between meetings for writing the reports and working on individual assignments.

  • Choose a Decision Model Carefully

    Although we recommend different ways to avoid conflict in Chapter 4, Document Type Needs Analysis and Chapter 5, Document Type Modeling and Specification (for example, by requiring proof of need and by deferring decisions until the issue is solvable), disagreements do occur. The natural tendency in any heterogeneous group like this, where everybody's voice is as important as a neighbor's, is to try to reach 100 percent consensus. Our experience shows that it never works. People come from backgrounds that are too different, have needs that are too different, or have orders from their bosses that are too prescriptive to allow consensus to be reached at all times.

    Our recommendation in the matter is to allow a set time for discussion of controversial subjects so that everyone can express their point of view, to ask the members with the strongest opinions to put them in writing before the next meeting, and to allow a set time at the following meeting to discuss the subject again. If agreement cannot be reached then, have the members vote and settle for a majority or super-majority.

    This recommendation is especially valid when the project is to build an industry-wide DTD (a proposition discussed in Section 7.2, “Designing Document Types as an Industry-Wide Effort”). In this case, the conflicts of interest can be so severe that consensus has no meaning. Each decision is based either on the majority of votes or, sometimes, on negotiations based on tradeoffs.

  • Plan for DTD Maintenance

    Usually a new technological project attracts attention, budget, and motivated dynamic people; in such conditions, good work can be achieved—up to the point when the DTD is released with all its accompanying documents. But because a DTD is used daily and needs to evolve regularly, it is the responsibility of the managers who launched the project in the first place to assign the appropriate resources in time, budget, and staff for the DTD to be properly maintained. This will not happen if a person is not officially in charge of that task and is not assessed regularly on that objective.

    Our experience is that the motivated people who know the DTD inside and out have a propensity to move on to other interesting projects after the DTD development project is over, and that usually no one takes over. DTD maintenance requires that someone be in charge of collecting all the bug or enhancement requests, synthesizing them, gathering a knowledgeable change control board, and applying the decisions of that board.

    DTD maintenance is discussed in more detail in Section 5.4, “Updating the Model”.



[4] Of course, a single SGML document usually produces several outputs. A WYSIWYG appearance may be misleading as to the final different forms the instance will have.

[5] Some products that are touted as object-oriented turn out to be basic relational systems on further examination.

[6] The ESIS is described in ISO 13673, Conformance Testing for SGML Systems. At the time of this writing, this standard is in final editing stage and should be available in the near future.