Chapter 10. Techniques for DTD Reuse and Customization

Table of Contents

10.1. Categories of Customization
10.1.1. Subsetted Markup Models
10.1.2. Extended Markup Models
10.1.3. Renamed Markup Models
10.2. Facilitating Customization
10.2.1. Making DTDs Modular
10.2.2. Making Content Models Customizable
10.2.3. Including Markup Declarations Conditionally
10.2.4. Making Markup Names Customizable
10.3. Customizing Existing DTDs

You could construct your DTD to be contained in a single file, with all relevant declarations provided “in the flesh” (that is, with no portions pulled in by reference to external parameter entities) and with no expectation that you or anyone else will customize it. However, if your project goals include creating a family of similar DTDs or making your DTD available to a broad secondary audience that will need to customize its markup model, you will want to build mechanisms into your DTD that support reuse and customization.

In this chapter we concentrate on two related ways to generate multiple markup models using the same DTD “code”: modularity and customization placeholders.

If you've had to answer the question of how many DTDs to build (discussed in Section 8.1, “Determining the Number of DTDs”) because you're dealing with an interrelated set of document types, you can probably already understand the motivations for using modularity: It allows you to maintain a single copy of certain declarations rather than typing, validating, and tracking several copies of the same thing, and it lets you ensure that the definitions for similar document types are synchronized. Modularity makes sense for most large DTDs as preparation for future expansion, even if the immediate needs don't seem to call for it.

Building customization placeholders into a DTD needs a stronger rationale. The project documents should help you determine whether this is appropriate for your DTD.

If you need your DTD to be used precisely as it was originally written, you may not want to encourage its customization. However, it's possible you can increase the value of your own data by encouraging others to use the DTD, and it's likely that other organizations will need to customize the DTD before they can use it. Your goals in making the DTD widely used may be compromised if other users customize it in inappropriate ways or ways that are incompatible with each other, or if they feel compelled to write new DTDs that do essentially the same job as yours. Therefore, you have an incentive to encourage the kinds of customization and reuse that you prefer by building in features that facilitate them.

Section 10.1, “Categories of Customization” summarizes the three fundamental ways a DTD markup model can be modified (subsetting, extension, and renaming) and their effects on document processing and interchange. Section 10.2, “Facilitating Customization” describes how to use techniques for facilitating customization of your DTD and reuse of portions of it. Section 10.3, “Customizing Existing DTDs” contains advice on customizing and reusing portions of existing DTDs.

(Chapter 12, Documentation discusses documenting the methods you've supplied for DTD customization and the constraints you want to place on variant-DTD implementors, and Appendix C, DTD Reuse and Customization Sample contains a sample of many of the techniques described in this chapter.)

10.1. Categories of Customization

In terms of conformance to the markup model of an original DTD, there are three basic kinds of changes that can be made: subsets, extensions, and renamings. An entire variant DTD can fall into one or more of these categories, and individual changes to the markup model can be categorized along these lines as well.

Think of a DTD as describing a set of document instances. All document instances that conform to the rules of this DTD are in this set, but instances that would cause a validating parser to return one or more error messages fall outside it, as do instances that conform to radically different DTDs. The sets of instances produced by variant DTDs have different relations to the original instance set:

  • Subsetted DTDs produce a smaller set of instances that fits wholly within the original set.

  • Extended DTDs produce a set containing at least some instances that are outside the original set (though it may not contain all the instances in the original set).

  • Renamed DTDs produce a “shadow” set that's identical to the original except for the actual element names.

Figure 10.1, “Relationship of Subsetted, Extended, and Renamed Markup Models to the Original” shows these relationships.

Figure 10.1. Relationship of Subsetted, Extended, and Renamed Markup Models to the Original

Any variation can be described in these terms, whether or not any special features for customization were built into the original DTD. Each type of variation from an original markup model has advantages and drawbacks; you can combine them to achieve the effect you want. This section discusses each type of variant and the consequences of using it or encouraging it to be used by others.

Note

It's possible to change a DTD without changing the markup model. In fact, the same markup model could be expressed in two entirely different DTDs with different amounts of modularity, parameter entities, comments, and so on, implemented by different people—as long as the element and attribute markup amounts to the same set of markup rules. What we're discussing in this chapter is the customization of markup model characteristics, though they're facilitated by non-markup model mechanisms such as parameter entities.

10.1.1. Subsetted Markup Models

If all valid instances of the variant DTD are guaranteed to fit inside the set of original valid instances, the variant is a subset of the original.

The markup model changes in this category might not intuitively seem to be acts of “subsetting.” The following pairs of DTD fragments each demonstrate a subsetting. In each case, the result is a subset because the variant fragment places tighter restrictions on the markup than the original does and still produces instances that conform to the original fragment.

original:
<!ELEMENT div  - - (title, subtitle?, para*)>
variant (subset):
<!ELEMENT div  - - (title, subtitle, para+)>
original:
<!ATTLIST document
        status          CDATA           #IMPLIED
>
variant (subset):
<!ATTLIST document
        status          (draft|final)   #IMPLIED
>

Subsetting is usually desirable when the markup model being considered has many elements (typically in its information pool), of which only a fraction are needed in any one document-processing environment. Removing unwanted elements becomes especially important for authoring tools that show authors all the defined elements before allowing them to choose one for insertion. Subsetting is also common when the DTD being considered has content models that are less prescriptive in enforcing stylistic standards than the project's needs dictate.
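
Removing an unwanted element is the most intuitive form of subsetting. The following pair is a minimal sketch of that case; the element names note, para, list, and epigraph are hypothetical and are not taken from the fragments above.

original:
<!ELEMENT note  - - (para|list|epigraph)+>
variant (subset):
<!ELEMENT note  - - (para|list)+>

Every note that is valid under the variant is also valid under the original, because the variant simply never uses the epigraph alternative.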

Often, large industry- or government-standard DTDs used by a wide audience are subsetted for actual use. Of course, the larger the DTD, the more likely it is that incompatible subsets will be developed, which may defeat the purpose of developing a standard DTD for interchange. Identifying “packages” of subsets (for example, modularizing the DTD so that it's easy to use each relevant portion in its entirety) in effect creates several easily categorized types of conformance to the standard, so that document interchange negotiations can pinpoint the modules used.

If you plan to facilitate subsetting of your own DTD or to subset a standard DTD, consider the following factors. If an organization uses a subsetted DTD and anticipates importing documents that use the original DTD (or a subset that is incompatible with its own), it may want to try to persuade its interchange partners to use its own subsetted version. If the organization has developed processing applications for only its own subset of features, it may need to develop processing for additional features, or filters that convert imported documents into the subsetted form, or some mix of the two. At the least, the maintenance documentation for the subsetted DTD should identify the conversions that would provide the best semantic fit of original-form documents into or out of the subsetted form.

10.1.2. Extended Markup Models

If some or all instances of the variant DTD are guaranteed not to be valid according to the original, the variant is an extension of the original.

While it takes some care to ensure that a change to a markup model is a subset, almost anything else you do to change the original is likely to be an extension. Thus, facilitating any customizations to your original DTD can be risky if you're trying to avoid extensions.

The following example shows a typical situation where extension is performed and has a profound effect on the markup model. If the parameter entity containing the basic collection of paragraph-level elements is extended through redefinition of %local.para.mix; (a technique discussed in Section 10.2.2, “Making Content Models Customizable”), every element containing that collection becomes extended as well.

<!ENTITY % local.para.mix "">
<!ENTITY % para.mix     "p|list|note|figure %local.para.mix;">
<!ELEMENT div       - - (title, (%para.mix;)*)>
<!ELEMENT listitem  - - (%para.mix;)*>
<!ELEMENT note      - - (title, (%para.mix;)*)>

The following is another typical DTD variation that results in an extension. By removing layers of containment, this model makes variant instances nonconforming to the original.[15]

original:
<!ELEMENT deflist    - - (defentry+)>
<!ELEMENT defentry   - - (terms, defs)>
<!ELEMENT terms      - - (term+)>
<!ELEMENT defs       - - (def+)>
variant (extension):
<!ELEMENT deflist - - ((term+, def+)+)>
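
To see why the flattened model is an extension rather than a subset, compare two instances. This is an illustrative sketch that assumes term and def contain only character data; their declarations are not shown in the fragments above.

instance valid under the original:
<deflist>
<defentry>
<terms><term>DTD</term></terms>
<defs><def>Document type definition</def></defs>
</defentry>
</deflist>
instance valid under the variant:
<deflist>
<term>DTD</term>
<def>Document type definition</def>
</deflist>

The second instance fails validation against the original because deflist may directly contain only defentry elements there, and the first fails against the variant because defentry, terms, and defs are no longer allowed.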

Extending a DTD might be appropriate if the original DTD being considered is suitable for general needs, but doesn't take into account the specialized needs of one company or department. Also, as demonstrated by this example, extension might be useful if an authoring DTD needs to be “flatter” than the reference DTD. Any area of a DTD might be in need of extension, but it's common to customize the metainformation while standardizing on the rest of the document hierarchy, and to extend collections of elements in the information pool, as shown in the first example above.

It's also common to extend a DTD to include the results of an automatic conversion into SGML, to include editorial information in order to assemble documents from fragments that have been imported from many sources (such as articles being assembled into a journal), and to augment a document with format or layout information as part of a series of processing passes.

If you plan to facilitate extension of your own DTD or to extend a standard DTD, consider the following factors. If an organization uses an extended DTD and anticipates exporting documents to recipients that use the original DTD (or a subset of it), it may want to try to persuade its interchange partners to accept deviations from the standard. The organization may need to develop filters to transform its documents to the original form. At the least, the maintenance documentation for the extended DTD should identify the conversions that would provide the best semantic fit back to the original form.

10.1.3. Renamed Markup Models

If all the instances of the variant DTD are structurally valid according to the original and apply the markup in the intended manner, but some or all of the element names used are different from those in the original, the variant is a renamed version of the original.

There are two common situations in which renaming is needed: specific renaming and general renaming.

In the first case, an existing DTD otherwise suits an organization's needs but the technical or cultural jargon used in naming some of the markup differs. (You might also need to change a few names in a standard DTD fragment in order to avoid naming clashes with your DTD.) Each name is considered separately for possible renaming.

In the second case, the organization needs two or more versions of an entire DTD in different languages, so that authors can use markup in the native language in which the text is being written. The renaming is thus a result of a top-down decision affecting all markup. (Often, translated documents simply retain the markup of the original language and so don't need this treatment.)

The following is an example of renaming from one language to another.

original:
<!ELEMENT numlist  - - (item+)>
<!ATTLIST numlist
        type            (arabic|alpha)          #IMPLIED
>
<!ELEMENT item    - - (para+)>
variant (French renaming):
<!ELEMENT listord - - (item+)>
<!ATTLIST listord
        type           (numérique|alphabétique)      #IMPLIED
>
<!ELEMENT item      - - (p+)>

Keep in mind that it may be possible to change your processing applications, rather than your DTD, to accommodate the desires of authors. For example, some editors and word processors have the ability to “alias” the name of an element within the editing environment, but to write out SGML document files that use the real generic identifier.

If you plan to facilitate renaming of markup in your own DTD or to rename the markup in a standard DTD, consider the following factors. An organization that uses a renamed DTD might need to export documents to or import documents from interchange partners that use the original DTD, or to use processing applications that work only with the original DTD. If the organization has developed processing applications that work only with the variant names, it needs to develop filters that map markup names to and from original and variant forms, or add to its applications the ability to handle multiple name alternatives (such as by using “architectural forms” to identify functionally similar elements), or a mixture of the two. At the least, the maintenance documentation for the renamed DTD should identify the mappings from one set of names to the other.

Section 10.2.4, “Making Markup Names Customizable” discusses the DTD techniques that facilitate renaming.

10.2. Facilitating Customization

The following sections discuss various techniques for allowing controlled customization of a DTD's markup model.

10.2.1. Making DTDs Modular

Just as for programs written in procedural languages such as C, it's often valuable—particularly for very large markup models—to build DTDs made up of standalone modules. Modular DTD fragments allow for mixing and matching along well-defined seams in the structure of the document type, greatly increasing the similarities of sets of documents that use the markup between those seams and facilitating the creation of modular processing applications.

A DTD module is a file containing a collection of related markup declarations, which is included in a larger DTD by means of an external parameter entity. You might have modules containing element and attribute declarations, notation declarations, entity declarations, and so on.

Creating modules and reading modular DTDs might seem complicated, but the benefits nearly always outweigh the costs. Following are typical scenarios where modularity makes sense. They are illustrated with stacked boxes representing the modular structure, where the horizontal boundaries between boxes show the amount of dependency of upper modules on lower ones.

  • The Same Markup Is Used in Several Related Document Types

    For example, many of the elements in the information pool for pharmaceutical information may be appropriate for use in new-drug applications, medical product literature, internal plans, and other widely differing high-level document hierarchies. By storing the declarations for these elements and attributes in a module, you can call them into several DTDs while maintaining them centrally. Figure 10.2, “Modular DTD Structure for Sharing One Information Pool Among Several Document Hierarchies” represents the resulting modular structure.

    Figure 10.2. Modular DTD Structure for Sharing One Information Pool Among Several Document Hierarchies

  • The Document Types Are Nested

    If, for example, documents will be created and validated in fragmentary form and then assembled for processing, your environment might call for a nested DTD within a full-blown document DTD (as discussed in Section 8.1, “Determining the Number of DTDs”). Rather than duplicating the identical portions of the two DTDs, you'd construct appropriate modules that can stand on their own or be pulled into larger DTDs. Figure 10.3, “Modular DTD Structure for Nested Document Types” represents the resulting modular structure.

    Figure 10.3. Modular DTD Structure for Nested Document Types

  • Existing DTD Fragments Are Used

    You may be planning to use a standard DTD fragment to leverage already-developed processing capabilities or meet interchange requirements. If you are going to use a standard fragment in your DTD, or for that matter any fragment whose design you are not in control of, you should include it by reference using an external parameter entity (rather than retyping it for your own use) and customize the fragment only in documented ways, in order to maximize the validity of your documents according to the standard.

    The common examples of this case are standard table and equation DTD modules, which involve sophisticated formatting that you're unlikely to want to develop from scratch. However, other specialized fragments might also exist in your information domain. Figure 10.4, “Modular DTD Structure for Incorporating Standard Fragments” shows the resulting modular structure.

    Figure 10.4. Modular DTD Structure for Incorporating Standard Fragments

The following example shows a DTD made up of two modules, one containing the document hierarchy and one containing the information pool. (This example and all the examples in this chapter use formal public identifiers for referencing DTDs. For more information on formal public identifiers, see Section A.10, “Formal Public Identifiers and Catalogs”.)

<!DOCTYPE modulardoc [
<!ENTITY % infopool PUBLIC 
        "-//Ept Associates//DTD Information Pool//EN">
%infopool;
<!ENTITY % dochier  PUBLIC 
        "-//Ept Associates//DTD Document Hierarchy//EN">
%dochier;
]>

For this setup to work, certain relationships need to hold between the two modules. For example, if one module defines any parameter entities that the other uses, the module containing the definitions must be referenced first because SGML doesn't allow “forward references” to entities that haven't yet been defined in the linear flow of declarations. In the example above, and in most modular DTDs, low-level modules precede high-level modules because the latter make use of parameter entities defined in the former. Beyond this inflexible rule, here are other good practices to follow when constructing modules.

  • Use Native SGML Mechanisms

    Use SGML mechanisms such as external parameter entities and marked sections, rather than non-SGML ones such as makefiles or scripts, to piece together the modules. This way, anyone with any SGML parser can get the benefit of the DTD's modularity.

  • Use Public Identifiers

    Rather than using system identifiers (for example, file names) to reference modules from within DTDs, use public identifiers so that the locations of the modules can be stored outside the actual DTD. This indirection helps the DTD to be more portable across computer systems and more useful when interchanged.

    With formal public identifiers, the actual locations are stored in a separate file that maps the public identifier to information on how to locate or retrieve the desired module. The SGML Open organization has issued Technical Resolution 9401, which specifies a syntax for this mapping file, called a catalog, that most SGML software vendors support. Using a catalog in this form allows entities to be more portable not only across computer platforms but also across SGML software applications.

    For information on the syntax of formal public identifiers and on catalogs, see Section A.10, “Formal Public Identifiers and Catalogs”.

  • Control Dependencies Between Modules

    Where possible, make the dependencies between modules go in only one direction. As mentioned above, higher levels generally have dependencies on the lower levels, so, for example, make sure your low-level element set doesn't rely on any characteristics of the document hierarchy elements. If it does, what you'll have instead is DTD fragments that can't be reused except with their related fragments.

    A special case of controlling module dependencies is to use element name indirection between them. For example, say you have a document hierarchy module whose content models mention information pool elements. If you want to be able to reuse the upper module with different lower-level modules, or you anticipate some fluctuation in the element naming scheme, use parameter entities to represent the element names wherever you mention them.
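
Element name indirection between modules might look roughly like the following sketch. The entity names %para.name; and %list.name; and the section element are hypothetical, chosen only for illustration, and the title element is assumed to be declared elsewhere.

in the lower-level module (information pool):
<!ENTITY % para.name   "para">
<!ENTITY % list.name   "list">
in the upper-level module (document hierarchy):
<!ELEMENT section  - - (title, (%para.name;|%list.name;)+)>

Because the upper module mentions only the entities, a different lower-level module can supply different element names without any edits to the document hierarchy. The module that declares the name entities must still be referenced first.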

In addition to helping you reuse collections of declarations at a broad level, modules are also important for building customization features into your DTD, and parameter entities are essential for accomplishing all of these tasks. Two characteristics of parameter entities have a strong effect on how DTDs must be structured for reusability and customizability. First, as already mentioned, you cannot use a parameter entity reference unless a declaration for that entity has been provided previously in the flow of the DTD code. For example, the following is invalid.

<!ELEMENT list  - -     (%list.content;)>
<!ENTITY % list.content "item+">

Second, entities can have multiple declarations in a DTD, with the first declaration taking precedence. In the following, the first parameter entity declaration is the one used.

<!ENTITY % list.content "title?, item+"> <!-- active -->
<!ENTITY % list.content "item+">         <!-- inactive -->
<!ELEMENT list  - -     (%list.content;)>

Note that you can use the internal subset of a document type declaration (the part between the square brackets of the DOCTYPE declaration) to redeclare parameter entities, since the internal subset is read before any parts of the DTD pulled in by reference to a system or public identifier. For example:

<!-- the public identifier references the remote DTD, which contains
     the original article-content parameter entity declaration -->
<!DOCTYPE journal PUBLIC "-//Ept Associates//DTD Journal//EN" [
<!-- redefined parameter entity -->
<!ENTITY % article-content "para">
]>

Even though the internal subset appears to occur after the remote portion, it is actually read first, and any entity declarations here take precedence over those in the remote portion.

If you plan to use a software product that compiles DTDs, you may not be able to use internal subsets to change the markup model. In this case, you can use a DTD-nesting technique. Assume the following is a customization of the journal DTD.

<!-- redefine article-content -->
<!ENTITY % article-content "para">
<!ENTITY % main-journal PUBLIC "-//Ept Associates//DTD Journal//EN">
<!-- pull in the original DTD -->
%main-journal;

A document could then reference the entire changed DTD, which itself contains the original DTD:

<!DOCTYPE mod-journal PUBLIC 
        "-//Ept Associates//DTD Modified Journal for Authoring//EN">

Some DTD implementors choose to store declarations for individual element types (particularly those in the information pool) in separate modules, building up a so-called “tag library” that can be recombined in different ways for different DTDs. However, in our experience, the complex interdependencies between information pool elements are easier to understand and maintain if the entire information pool is stored in a single module, with marked sections (discussed in Section 10.2.3, “Including Markup Declarations Conditionally”) used to “modularize” individual element types.

Appendix C, DTD Reuse and Customization Sample contains a sample modularization of a monolithic DTD structure.

10.2.2. Making Content Models Customizable

You can use parameter entities to facilitate both subsetting and extension of content models and attribute lists. If you create such entities, your DTD documentation needs to explain how to use them properly. It's a good idea to use a consistent prefix, such as local., for entities that are intended for direct customization.

Note

Remember that SGML allows variant-DTD implementors to redefine all entities in a DTD, whether or not this was your intent. Your documentation will need to make clear which entities are not for use in customization.

To make a content model directly customizable, identify logical parts of the model and make some parts replaceable where you know flexibility is needed—for example, where different departments in the company have expressed strong but opposite views on the content of a certain element.

Typical examples of customizable constructs are blocks of titles and other labeling information on divisions. The following example allows variant DTDs to add or subtract labeling information associated with divisions by redefining the %local.title; parameter entity. Note that this setup implies that all levels of division should have the same set of labeling information, even if that set is customizable.

<!ENTITY % local.title "title, subtitle?, shorttitle?">
⋮
<!ELEMENT div       - - (%local.title;, para*, subdiv+)>
<!ELEMENT subdiv    - - (%local.title;, para*, subsubdiv*)>
<!ELEMENT subsubdiv - - (%local.title;, para*)>

Alternatively, you could use different parameter entities to indicate where the set of labeling information is allowed to differ.

<!ENTITY % local.hightitle "title, subtitle?, shorttitle?">
<!ENTITY % local.lowtitle  "title, subtitle?, shorttitle?">
⋮
<!ELEMENT div       - - (%local.hightitle;, para*, subdiv+)>
<!ELEMENT subdiv    - - (%local.lowtitle;, para*, subsubdiv*)>
<!ELEMENT subsubdiv - - (%local.lowtitle;, para*)>

To redefine such a placeholder, a variant-DTD implementor would place a parameter entity declaration before the original one and supply the element and attribute declarations for any newly introduced elements, in this case creating subsetted content models.

<!DOCTYPE document PUBLIC "-//Ept Associates//DTD Document//EN" [
<!ENTITY % local.hightitle "title, subtitle, shorttitle?">
<!ENTITY % local.lowtitle  "title, subtitle">
]>
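
With these redefinitions in place, the three division declarations behave as if they had been written as follows. This expansion is a worked illustration of the parameter entity substitution, not additional DTD code that anyone needs to supply.

<!ELEMENT div       - - (title, subtitle, shorttitle?, para*, subdiv+)>
<!ELEMENT subdiv    - - (title, subtitle, para*, subsubdiv*)>
<!ELEMENT subsubdiv - - (title, subtitle, para*)>

Each model is a subset of its original: subtitle has become required at every level, and shorttitle is no longer allowed below the top level.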

If the two separate entities were not available, that is, if the first solution had been implemented, this set of customizations would have required a much more “invasive” procedure performed on the original DTD code, requiring either editing of the original declaration for each element involved or substitution of a whole new declaration.

Once you start adding placeholder parameter entities, you may be tempted to put entities between every element in a content model, “just in case.” Remember that these entities are meant to encourage appropriate kinds of customization. If you don't want to compromise the integrity of your markup model, use content model placeholders only where different environments need flexibility or where variations won't affect interchange or processing.

If you use parameter entities to manage element classes and collections as discussed in Section 9.3, “Managing Parameter Entities for Element Collections”, you can offer a great deal of flexibility for customization, since you can make the classes and collections independently customizable.

To allow for extension of both element classes and collections, add a placeholder parameter entity to each entity's definition and define it as containing an empty string, as follows.

<!ENTITY % local.blocks    "">
<!ENTITY % local.para.mix  "">

<!ENTITY % blocks          "para|quotation %local.blocks;">
<!ENTITY % para.mix        "%blocks; %local.para.mix;">

Variant-DTD implementors can then extend the list of elements considered text blocks and, by doing so, implicitly extend any construct containing %blocks;.

<!DOCTYPE document PUBLIC "-//Ept Associates//DTD Document//EN" [
<!ENTITY % local.blocks   "|mytextblock">
<!ELEMENT mytextblock  - - (#PCDATA)>
]>

Note the vertical bar (the OR connector) at the beginning of the entity, which is required to integrate the local portion with the already defined %blocks; entity.

Implementors could also simply extend the list of elements allowed directly in the paragraph-level collection, %para.mix;, without affecting any other collections.

<!DOCTYPE document PUBLIC "-//Ept Associates//DTD Document//EN" [
<!ENTITY % local.para.mix "|mytextblock">
<!ELEMENT mytextblock  - - (#PCDATA)>
]>
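
The difference between the two customizations is easiest to see in the expanded entity values. The following is a worked sketch with whitespace normalized; the declarations are equivalents shown for comparison only, not code to add to the DTD.

effect of redefining %local.blocks;:
<!ENTITY % blocks          "para|quotation|mytextblock">
<!ENTITY % para.mix        "para|quotation|mytextblock">
effect of redefining %local.para.mix;:
<!ENTITY % blocks          "para|quotation">
<!ENTITY % para.mix        "para|quotation|mytextblock">

In the first case the new element appears everywhere %blocks; is used, including any other collections built from it. In the second it appears only where %para.mix; is used.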

Because the entity defined first is the one used in a DTD, sometimes variant-DTD developers can end up in a catch-22 situation: If you want to refer to other entities inside an entity you're redefining, you must define it after those other entities, but before the original definition of the entity. You might need to solve this problem, for example, if you want to redefine a collection entity to remove some of its contents, while still using most of the element class entities that were in it originally, for easier maintenance. For example, how could you redefine %mix2; in the following DTD to remove %class2;, while retaining the reference to %class1;?

<!ENTITY % class1   "elem-a|elem-b|elem-c %local.class1;">
<!ENTITY % class2   "elem-d|elem-e|elem-f %local.class2;">

<!--...................................................-->

<!ENTITY % mix1     "%class1;          %local.mix1;">
<!ENTITY % mix2     "%class1;|%class2; %local.mix2;">

The only place you could put the necessary redefinition would be in the middle:

<!ENTITY % class1   "elem-a|elem-b|elem-c %local.class1;">
<!ENTITY % class2   "elem-d|elem-e|elem-f %local.class2;">

<!--...................................................-->
<!ENTITY % mix2     "%class1;          %local.mix2;">
<!--...................................................-->

<!ENTITY % mix1     "%class1;          %local.mix1;">
<!ENTITY % mix2     "%class1;|%class2; %local.mix2;">

Because of this state of affairs, to facilitate customization in your DTDs you might want to put a “placeholder” entity between all your element class entities and collection entities, so that variant-DTD developers can make the necessary redefinitions without having to edit your original file. By default, the placeholder would contain (for example) just an SGML comment explaining how to customize the value.

<!ENTITY % class1   "elem-a|elem-b|elem-c %local.class1;">
<!ENTITY % class2   "elem-d|elem-e|elem-f %local.class2;">

<!--...................................................-->
<!ENTITY % redefine PUBLIC 
        "-//Ept Associates//DTD Redefinition Block//EN">
%redefine;
<!--...................................................-->

<!ENTITY % mix1     "%class1;          %local.mix1;">
<!ENTITY % mix2     "%class1;|%class2; %local.mix2;">
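
A variant-DTD developer could then supply a replacement for the redefinition block, along the lines of the following sketch. The file name myredefs.mod, the document element, and the DTD's public identifier are assumptions made for illustration, and %local.mix2; is assumed to be declared earlier in the DTD, as in the original fragment.

contents of myredefs.mod:
<!ENTITY % mix2     "%class1;          %local.mix2;">
document type declaration that substitutes it:
<!DOCTYPE document PUBLIC "-//Ept Associates//DTD Document//EN" [
<!ENTITY % redefine SYSTEM "myredefs.mod">
]>

Because the internal subset is read first, this declaration of %redefine; takes precedence over the one in the DTD. The replacement block is therefore read at the placeholder position, after %class1; is defined but before the original %mix2; declaration.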

To make attribute lists easily customizable, you can use the same basic placeholder techniques as for content models. To make them extensible, use empty placeholder entities. The following example allows variant-DTD implementors to add common attributes by redefining the %local.common.atts; entity.

<!ENTITY % local.common.atts  "">
<!ENTITY % common.atts "
        id              ID              #IMPLIED
        security        (none|high)     none
        status          (draft|final)   draft
        %local.common.atts;
">
⋮
<!ATTLIST document
        %common.atts;
        partnumber       NMTOKEN        #REQUIRED
>
⋮
<!ATTLIST paragraph
        %common.atts;
>
⋮

To add attributes, variant-DTD developers would redefine %local.common.atts; as follows.

<!DOCTYPE document PUBLIC "-//Ept Associates//DTD Document//EN" [
<!ENTITY % local.common.atts "
        tracenum        NUMBER          #REQUIRED
">
]>
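
To make the effect concrete, here is a sketch of what the paragraph attribute list would amount to once the redefinition above is in force. It is simply the example's own declarations with the entity references expanded by hand, not output from any tool.

```sgml
<!-- Effective declaration after %common.atts; and the redefined -->
<!-- %local.common.atts; have been expanded                      -->
<!ATTLIST paragraph
        id              ID              #IMPLIED
        security        (none|high)     none
        status          (draft|final)   draft
        tracenum        NUMBER          #REQUIRED
>
```

Every element that references %common.atts; picks up the new attribute in the same way, so a required attribute such as this one would have to be supplied on each of them.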

10.2.3. Including Markup Declarations Conditionally

Marked sections around declarations, in combination with parameter entities that store the marked section's IGNORE/INCLUDE status keyword, can make it easy to customize a DTD to get just the declarations you want and eliminate the others. You can almost think of marked sections as miniature modules, since you can easily “pull them into” your DTD by using the INCLUDE keyword; the difference is that both included and ignored blocks of material are still physically present in the same file, meaning you can easily maintain them together.

Following is an example of setting up a marked section for an element that isn't wanted in all variants of a DTD.

<!ENTITY % big.DTD   "IGNORE">
<!ENTITY % small.DTD "INCLUDE">
⋮
<![ %big.DTD; [
<!ENTITY % blocks "para|excerpt|epigraph">
]]>
<![ %small.DTD; [
<!ENTITY % blocks "para|excerpt">
]]>
⋮
<![ %big.DTD; [
<!ELEMENT epigraph  - - (#PCDATA)>
]]>

Two kinds of marked section are set up by the declarations of the %big.DTD; and %small.DTD; entities; these keywords control whether or not epigraphs are part of the markup model. If the “small” DTD is desired (the default because %small.DTD; is defined as INCLUDE), epigraphs will be left out. If the “big” DTD is desired, %big.DTD; can be redefined as INCLUDE instead of IGNORE, which will activate the larger %blocks; entity and the epigraph element declaration.
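
A minimal sketch of selecting the “big” variant from a document instance follows. The public identifier and the document element name are assumptions borrowed from the earlier examples; the DTD containing these marked sections may well be identified differently.

```sgml
<!DOCTYPE document PUBLIC "-//Ept Associates//DTD Document//EN" [
<!-- Read before the DTD, so it overrides the IGNORE default there -->
<!ENTITY % big.DTD "INCLUDE">
]>
```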

In this example, there's no need to redefine %small.DTD; as IGNORE when you redefine %big.DTD; as INCLUDE, since the first definition of %blocks; will take precedence anyway. But there might be some circumstances in which you'd have to switch the values of both keyword entities. Following is an easy way to allow the switching of both values by redefining only a single entity.

<!ENTITY % big.DTD   "IGNORE">
   <!--..........................-->
   <![ %big.DTD; [
   <!ENTITY % small.DTD   "IGNORE">
   ]]>
   <!--..........................-->
<!ENTITY % small.DTD "INCLUDE">
⋮

These entity declarations set up the ignored-region keyword first, and the included-region keyword last. By redefining the first keyword as INCLUDE, you can simultaneously redefine the last keyword as IGNORE. This system works because the declaration in the middle (where %small.DTD; is set to IGNORE) is itself ignored, as long as %big.DTD; is ignored. If %big.DTD; is changed to INCLUDE, the middle declaration is “activated.”[16]
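
The following hand-written trace sketches what happens when a variant supplies its own declaration of %big.DTD; ahead of the DTD. It relies only on the rule that the first declaration of an entity wins; the numbered comments are annotations, not part of any real DTD.

```sgml
<!-- Variant's declaration subset, read before the DTD -->
<!ENTITY % big.DTD   "INCLUDE">

<!-- What the parser then effectively sees in the DTD:              -->
<!-- 1. The DTD's big.DTD "IGNORE" is a duplicate declaration and   -->
<!--    has no effect.                                              -->
<!-- 2. The marked section is now included, so the declaration      -->
<!--    inside it is the first one for small.DTD and it wins:       -->
<!ENTITY % small.DTD "IGNORE">
<!-- 3. The DTD's later small.DTD "INCLUDE" is a duplicate and has  -->
<!--    no effect.                                                  -->
```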

A more targeted way to use marked sections is to surround a set of declarations for an individual element type with its own marked section. For example, if you have an information pool module in which some of the element types are known to be undesirable in some variants, you might want to make them removable.

<!ENTITY % weird-elem.module "INCLUDE">
⋮
<!ELEMENT normal-elem   - - (...)>
<!ATTLIST normal-elem
        %common.attribs;
>

<![ %weird-elem.module; [
<!ELEMENT weird-elem    - - (...)>
<!ATTLIST weird-elem
        %common.attribs;
        special         NAME            #IMPLIED
>
]]>

Redefining the %weird-elem.module; entity as IGNORE, along with redefining other elements and entities containing weird-elem, would allow it to disappear from the variant DTD.

This technique is very powerful because it allows both the removal of declarations and the replacement of declarations. Normally, SGML does not allow the wholesale redefinition of elements and attribute lists, but using marked sections in this way makes such redefinition possible. For this reason, if you don't want to allow radical redefinition of content models or attribute lists, you may not want to use this technique.
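
As a sketch of replacement rather than simple removal, a variant could switch off the original declarations and supply its own in the document type declaration subset. The document element name, the replacement content model, and the changed attribute default are all hypothetical. The attributes are written out explicitly on the assumption that %common.attribs; is declared in the DTD itself and so is not yet available when the subset is read.

```sgml
<!DOCTYPE document PUBLIC "-//Ept Associates//DTD Document//EN" [
<!-- Switch off the original declarations for weird-elem -->
<!ENTITY % weird-elem.module "IGNORE">

<!-- Hypothetical replacement declarations; the content model and -->
<!-- the attribute default are illustrative only                  -->
<!ELEMENT weird-elem    - - (#PCDATA)>
<!ATTLIST weird-elem
        special         NAME            #REQUIRED
>
]>
```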

10.2.4. Making Markup Names Customizable

To facilitate changing any markup name in a DTD, you would define a parameter entity that stands for the desired name, and use that parameter entity in every location where the name is mentioned, including in other modules. Changing the name then becomes a simple matter of redefining the entity. For example:

<!ENTITY % title "title">
<!ELEMENT %title; - - (#PCDATA)>
<!ATTLIST %title;
        id              ID              #IMPLIED
>
⋮
<!ELEMENT div     - - ((%title;), para+, subdiv*)>

Usually, attribute names and token values are not considered for renaming. Even if you are only renaming the elements, the work involved can be immense (and the readability of the DTD will probably suffer). However, it may be worth the effort if you need to provide several different natural-language versions of the DTD for authoring purposes. Note that if you need to store and manage SGML documents from sources that use different markup-naming versions of the DTD, it may be more appropriate to customize your editing environment to present “aliased” markup names to authors, rather than actually changing the markup names in the DTD. Otherwise, your processing and retrieval applications may need to be much more generic than they would normally have to be.
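
For instance, a French-language variant might redefine the entity from the example above. The name titre is a hypothetical choice, and the declarations after the first line are the original ones with the entity references expanded by hand.

```sgml
<!-- Variant's redefinition, placed ahead of the original declaration -->
<!ENTITY % title "titre">

<!-- Effective declarations once %title; has been expanded -->
<!ELEMENT titre   - - (#PCDATA)>
<!ATTLIST titre
        id              ID              #IMPLIED
>
<!ELEMENT div     - - ((titre), para+, subdiv*)>
```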

If you need to dictate the use of a particular markup model to users of the DTD, but want to use broad strokes rather than specifying the element names that must be used, you may want to consider defining architectural forms to guide the creation of conforming elements in other DTDs. An element-type architectural form is essentially a named set of rules for and constraints on an element's declaration; any element declaration claiming to conform to the architectural form must reference the name as the value of a special attribute. The attribute value functions almost like an additional “generic identifier.” By treating your own element declarations as the set of rules for variant-DTD creation and by documenting how the special attribute values are to be added to variant DTDs, you facilitate the use of your DTD as a “meta-DTD.”

For example, let's say your original DTD is an industry standard for mathematics textbooks. It has a model for lists of student exercises, but you want to allow variant-DTD developers to call an “exercise list” and an “exercise” whatever they want. You can describe to those developers that the presence of a MathText attribute with a value of exercise-list on their version of the exercise-list element will indicate to users and applications that their element should be treated like a “mathematics textbook exercise list” in every respect. Likewise, you can do the same for the other elements in your DTD.

An easy way to specify this mapping is to put the attribute in your original DTD. This way, anyone who customizes the original DTD directly will get the architectural forms for free.

original DTD:
<!ELEMENT exercise-list   - - (exercise+)>
<!ATTLIST exercise-list
        MathText        (exercise-list) exercise-list
>
<!ELEMENT exercise        - - (#PCDATA)>
<!ATTLIST exercise
        MathText        (exercise)      exercise
>

In the following variant DTD, the correspondence to the original model is made through an attribute that has only one choice of attribute token value, which is also the default. Thus, the attribute is effectively fixed in the DTD, with no possibility to change it in instances.

variant DTD 1:
<!ELEMENT exlist  - - (ex+)>
<!ATTLIST exlist
        MathText        (exercise-list) exercise-list
>
<!ELEMENT ex      - - (#PCDATA)>
<!ATTLIST ex
        MathText        (exercise)      exercise
>

Another way to achieve the same thing is the following, which uses the #FIXED keyword.

variant DTD 2:
<!ELEMENT exerlist  - - (exer+)>
<!ATTLIST exerlist
        MathText        CDATA           #FIXED exercise-list
>
<!ELEMENT exer  - - (#PCDATA)>
<!ATTLIST exer
        MathText        CDATA           #FIXED exercise
>

Finally, the variant-DTD developer may decide that it's valuable for individual instances to indicate that they do or do not correspond to a standard model, and allow the attribute value to change:

variant DTD 3:
<!ELEMENT ex-series - - (ex+)>
<!ATTLIST ex-series
        MathText        CDATA           exercise-list
>
<!ELEMENT ex        - - (#PCDATA)>
<!ATTLIST ex
        MathText        CDATA           exercise
>

Notice that if any of the variant DTDs changed the content model of the exercise-list equivalent to, say, allow zero exercises, the element would be an extension of the original—it would allow instances that do not conform to the original rules. For a variant element to conform correctly to an architectural form, it must allow either the identical content model or a subset of it.
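
A sketch of such a nonconforming variant follows. The label “variant DTD 4” and the element names are hypothetical; the only change from variant DTD 1 is the occurrence indicator on the content model.

```sgml
<!-- Hypothetical variant DTD 4: claims the form but extends it -->
<!ELEMENT exlist  - - (ex*)>
<!ATTLIST exlist
        MathText        (exercise-list) exercise-list
>
<!ELEMENT ex      - - (#PCDATA)>
<!ATTLIST ex
        MathText        (exercise)      exercise
>

<!-- An instance fragment that is valid under this variant but has -->
<!-- no valid counterpart under the original (exercise+) model:    -->
<exlist></exlist>
```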

If elements have an extra “generic identifier,” applications can search for attribute values and apply processing regardless of the actual element names used. However, some systems don't allow this fine level of access to the SGML structure, a restriction that can give architectural forms some of the same problems as other mechanisms of renaming DTDs (for example, needing to filter the document instances into a form that uses generic identifiers that the applications can handle). Further, there is no way to validate that a variant DTD that invokes your architectural forms actually conforms to the rules you have made. Thus, even if applications can act on the presence of the forms instead of the element names, incompatibilities in the markup models may result in incorrect processing.

10.3. Customizing Existing DTDs

For a project where the goal is to customize an existing DTD, as discussed in Chapter 7, Design Under Special Constraints, you should make the changes carefully. Armed with the document analysis report, the project documents, and knowledge of the risks of changing the original DTD, you're ready to make the necessary changes. If the original has one or more built-in customization methods, you may find that you've already been given most of the tools you need to accomplish the changes. Even in that case, and especially in the case where the original DTD implementor gave you no help in customizing, you should change the original in ways that are as backward compatible as possible and that help you maintain your variant over time, as the original DTD changes. Here are some ways you can accomplish this goal.

  • Plan and Document

    Most importantly, plan and document how the two markup models differ, and carefully specify the behavior of any transformation filters and additional applications that will need to be developed. In order to interchange and process your documents successfully, it's essential to have a clear idea of any differences and how you'll handle them.

  • Emphasize Subsetting

    Change only what you must, particularly if you plan to use already-developed applications, and try to subset rather than extend the original DTD.

  • Avoid Editing the Original Files

    Avoid simply copying over the DTD files and changing them indiscriminately; try to make all the necessary changes by supplying alternate declarations for elements, attribute definition lists, and parameter entities—preferably the latter, if they're available—and by using only the modules you need.

  • Use Marked Sections for Wholesale Changes

    If it's necessary to edit the original files, use marked sections around your changes and around the portions you have replaced so that you can switch back and forth between the original version and the variant, or at least see clearly what was done.
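
The last suggestion can be sketched as follows. The keyword entity names and the para content models are hypothetical; the point is only that the changed declaration and the declaration it replaces sit side by side, each in its own marked section.

```sgml
<!ENTITY % variant.changes  "INCLUDE">
<!ENTITY % original.decls   "IGNORE">

<![ %variant.changes; [
<!-- Variant: footnotes are no longer allowed in paragraphs -->
<!ELEMENT para  - - (#PCDATA|emph)*>
]]>
<![ %original.decls; [
<!-- Original declaration, kept for reference and for switching back -->
<!ELEMENT para  - - (#PCDATA|emph|footnote)*>
]]>
```

Swapping the two keyword values restores the original model without any further editing.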



[15] Note that in cases like this, if OMITTAG minimization is used, it's possible to partially simulate the lack of nestedness by making the start-tags and end-tags of the container elements omissible, as follows.

original:
<!ELEMENT deflist      - - (defentry+)>
<!ELEMENT defentry     O O (terms, defs)>
<!ELEMENT terms        O O (term+)>
<!ELEMENT defs         O O (def+)>
variant (extension):
<!ELEMENT deflist - - ((term+, def+)+)>
valid instance of both:
<deflist>                       defentry and terms start-tags omitted
<term>apple</term>              defs start-tag omitted
<def>A bright red fruit.</def>  all container end-tags omitted
</deflist>

Of course, if you must specify attributes on the missing start-tags or if instances of the original must be normalized, this trick wouldn't help.

[16] We first saw this trick in Dan Connolly's work on the HTML DTD.