Appendix B. Tree Diagram Reference

Table of Contents

B.1. Elements
B.2. Sequential and Either-Or Relationships
B.3. Occurrence Specifications
B.4. Collections and Any-Order Groups
B.5. Groups
B.6. Attributes
B.7. Additional Notations
B.8. Tree Diagram Building Process

This appendix provides a quick reference to the elm tree diagram notation.

The notation does not quite equate to “pictorial SGML.” While the SGML language caters mostly to the needs of computers, tree diagrams cater mostly to humans in their modeling and markup efforts. Thus, the tree diagram that corresponds precisely to a content model in an SGML element declaration may not always be the most effective expression of the model for many purposes.

Figure B.1, “Tree Diagram Notation Summary” summarizes the notation.

Figure B.1. Tree Diagram Notation Summary

The following sections explain the notation by building up from the simpler parts to the more complex ones. Section B.8, “Tree Diagram Building Process” describes how tree diagrams tend to grow and change during a DTD project.

B.1. Elements

An element type is represented by a box containing a name. In the modeling stages, the name should be an English description. For DTD documentation purposes, element boxes might contain either descriptive names or actual generic identifiers for elements.

Information on the content of a parent element appears below the box, either connected by a vertical bar to other symbols representing a particular configuration of contents (a content model group), or indicated by descriptive text. The descriptive text uses an equal sign ( = ) prefix to indicate either the name of another element that this element should emulate or the kind of data-character mixture it contains. In the modeling stages, SGML keywords for declared content, such as RCDATA, may or may not be used depending on the technical knowledge of the document type design team members.

The description of the content can be adjusted through the use of inclusion and exclusion indicators, which use a plus sign ( + ) and minus sign ( - ) prefix, respectively. In the modeling stages, these symbols may or may not correspond to true SGML inclusions and exclusions.
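
As a rough illustration of how these annotations might later surface in a DTD, the following fragment sketches a declared-content keyword, an inclusion, and an exclusion. The element names are hypothetical and chosen only for the example. In SGML declarations, an inclusion group is introduced by a plus sign and an exclusion group by a minus sign.

```sgml
<!-- Hypothetical element names, for illustration only. -->

<!-- Declared content (RCDATA) in place of a content model group -->
<!ELEMENT literal  - - RCDATA>

<!-- Inclusion: footnote may appear anywhere inside chapter -->
<!ELEMENT chapter  - - (title, para+) +(footnote)>

<!-- Exclusion: no footnote inside a footnote -->
<!ELEMENT footnote - - (para+) -(footnote)>
```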

B.2. Sequential and Either-Or Relationships

If a parent element contains two or more child elements that must appear in sequential order, the vertical bar that leads to the parent element's content branches out into a series of square brackets, each bracket point terminating in a child element. The children must appear in left-to-right order. (If right-to-left is more intuitive for the design team members because of their locale or other factors, it can be used instead, as long as the document analysis report makes this clear.) This configuration corresponds to a group that uses the SGML SEQ connector.

If a parent element contains a mutually exclusive choice among child elements, the vertical bar that leads to the parent element's content branches out into a series of angled bars, each terminating in a mutually exclusive choice. This configuration corresponds to a group that uses the SGML OR connector.
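
In DTD terms, the two configurations described above might come out as follows. This is only a sketch, and the element names are hypothetical.

```sgml
<!-- Hypothetical element names, for illustration only. -->

<!-- Sequence (SEQ connector, written as a comma):
     title, then author, then body, in that order -->
<!ELEMENT report  - - (title, author, body)>

<!-- Either-or (OR connector, written as a vertical bar):
     exactly one of the three children -->
<!ELEMENT figbody - - (graphic | table | listing)>
```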

Two other relationships are possible; these are discussed in Section B.4, “Collections and Any-Order Groups”.

B.3. Occurrence Specifications

The presence of a child element is required if its box is unadorned. A symbol at the upper right indicates other occurrence requirements. A question mark ( ? ) means the element can optionally occur. An asterisk ( * ) means the element can optionally occur and can be repeated. A plus sign ( + ) means the element must occur at least once and can be repeated. A compact way of specifying that the minimum number of occurrences is greater than one is to precede the plus sign with that number.

These symbols correspond to the SGML occurrence indicators OPT, REP, and PLUS, respectively.
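
The following sketch shows how the occurrence symbols might carry over into an element declaration. The element names are hypothetical. The second declaration is one possible way, not the only one, to express a box marked "2+", since SGML has no numeric occurrence indicator of its own.

```sgml
<!-- title:  unadorned, so required exactly once
     intro?: OPT, optional
     note*:  REP, optional and repeatable
     step+:  PLUS, required and repeatable -->
<!ELEMENT recipe  - - (title, intro?, note*, step+)>

<!-- One possible rendering of a "2+" box (at least two items):
     a required item followed by one or more further items -->
<!ELEMENT choices - - (item, item+)>
```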

B.4. Collections and Any-Order Groups

When any or all of the child elements can appear repeatedly, in an arbitrary order, they appear in an oval[17] that has either an asterisk or a plus sign occurrence indicator (showing that the contents of the collection can be chosen from repeatedly). We call this configuration a collection. It corresponds to a group that uses the SGML OR connector and that has a REP or PLUS occurrence indicator on it. It is meaningless for the individual elements in the oval to have occurrence indicators on them.

For collections containing only elements, it is important in the modeling stages to choose an asterisk or plus sign occurrence indicator, to indicate whether or not at least one element from the collection must appear. For collections of #PCDATA and elements (that is, mixed content models of the type recommended for use in ISO 8879), an occurrence indicator on the oval other than an asterisk is meaningless because an empty character string can satisfy the #PCDATA part of the content model.
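
A minimal sketch of how collections might be declared, using hypothetical element names. The first two declarations differ only in the occurrence indicator on the group. The third shows a collection of #PCDATA and elements, where only the asterisk makes sense.

```sgml
<!-- Hypothetical element names, for illustration only. -->

<!-- Collection of elements; the plus sign requires at least one member -->
<!ELEMENT body    - - (para | list | figure)+>

<!-- Collection of elements; the asterisk allows the collection to be empty -->
<!ELEMENT sidebar - - (para | list | figure)*>

<!-- Collection of #PCDATA and elements (mixed content): asterisk only -->
<!ELEMENT para    - - (#PCDATA | emph | xref)*>
```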

Some collections that appear frequently in many contexts can be represented by a single descriptive name in an oval with an occurrence indicator on it. These common collections are often implemented in the DTD with parameter entities.
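
For instance, a named collection might be implemented along the following lines. The entity name `inline` and the element names are assumptions made for the example.

```sgml
<!-- Hypothetical names, for illustration only. -->

<!-- A named collection, defined once -->
<!ENTITY % inline "emph | xref | term">

<!-- Two contexts that draw on the same collection -->
<!ELEMENT para  - - (#PCDATA | %inline;)*>
<!ELEMENT title - - (#PCDATA | %inline;)*>
```

Changing the entity's replacement text then updates every context that refers to it.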

Likewise, a model consisting only of #PCDATA can be represented with a simple oval or, in shorthand, a keyword with an equal sign. #PCDATA can represent any number of data characters, including zero. Thus, an occurrence indicator is not required (and a PLUS indicator may be misleading, since the element can be entirely empty and satisfy a (#PCDATA)+ model). However, it is probably more consistent and intuitive for design teams to supply occurrence indicators on #PCDATA ovals, and they can use them or not as they wish.

Even though the SGML OR connector is used to implement both collections (except for simple #PCDATA collections) and either-or groups, the two configurations are fundamentally different in their effects on document authoring and processing. This is why their respective notations look different.

The design team will begin to find contexts where collections appear long before the actual contents of those locations are known. A cloud symbol represents a placeholder for a collection. (We show the cloud with the label “text” (in quotation marks) to distinguish it from the official ISO 8879 term.) In the specifications of the final document analysis report, no cloud symbols will remain, having been replaced by either named collections or actual lists of the contents allowed.

When all the child elements must appear but can appear in any order, they appear in an oval that has no occurrence indicator (showing that the elements in the collection are “required”). This configuration corresponds to a group that uses the SGML AND connector. The individual elements in the oval can have occurrence indicators on them.
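
An any-order group might be declared as in the following sketch, where the ampersand is the AND connector. The element names are hypothetical.

```sgml
<!-- Hypothetical element names, for illustration only. -->

<!-- All three children are required, in any order -->
<!ELEMENT front   - - (title & author & date)>

<!-- Individual members may carry their own occurrence indicators -->
<!ELEMENT pubinfo - - (title & author+ & date?)>
```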

For both collections and any-order groups, it's usually impractical to define the contents of the elements inside the oval by attaching them directly. Instead, you can put an ellipsis below the boxes inside the oval, and elsewhere supply individual tree diagrams for each inner element.

B.5. Groups

Connectors can emanate from points where an element box could have appeared, but does not. These points represent groups containing the entire model below them. Groups can have occurrence indicators, just as elements can.
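
In a declaration, such groups appear as nested parentheses. The following sketch uses hypothetical element names.

```sgml
<!-- The inner (term, def) group repeats as a unit; the inner
     (note | warning) group is optional as a unit -->
<!ELEMENT glossary - - (title, (term, def)+, (note | warning)?)>
```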

B.6. Attributes

An element box can have lines of descriptive text on its right side, with each line indicating an attribute that the element should have. Each line of text always includes a descriptive name for the attribute, and can also include its “data type” (declared value) and any default value. If a value is optional to supply, the line can end with a question mark. If a value is required, the line can end with a period to distinguish it from optional attributes and those for which optionality hasn't been decided. Default values can be shown with underlining or with boldface text.

In the modeling stages, the specifications for attributes can often be imprecise; a single descriptive word might suffice to indicate the intent. For example, “id” might be taken to mean an attribute named id with a declared value of ID. Depending on the technical SGML knowledge of the document type design team members, some or all of these conventions can be used to give the level of precision desired in specifying attributes.

Following are some examples of attribute specifications in tree diagrams, along with their possible attribute definitions in the DTD.

Attribute Specification     Possible Declaration in DTD
                            id        ID              #IMPLIED
link to entry.              entrylink IDREF           #REQUIRED
                            type      --??--          #REQUIRED
delim (" ")                 delimiter CDATA           " "
audience=(novice|expert)    audience  (novice|expert) novice
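
Gathered into a single attribute definition list, the fully specified declarations above might look like the following sketch. The element name `entry` is hypothetical. Attribute names longer than eight characters, such as entrylink, assume an SGML declaration that sets NAMELEN above the reference concrete syntax value of 8.

```sgml
<!-- "entry" is a hypothetical element name. The type attribute is left out
     because its declared value is still undecided in the table. -->
<!ATTLIST entry
          id         ID                #IMPLIED
          entrylink  IDREF             #REQUIRED
          delimiter  CDATA             " "
          audience   (novice|expert)   novice>
```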

B.7. Additional Notations

It is usually impractical to fit an entire markup model into a single diagram. It's better to focus each diagram on a single relevant portion of the model, and to elide unnecessary detail. This section describes parts of the notation that, like parameter entities, are unrelated to the markup model per se but help keep the model organized. You may find that you want to add to or change these convenience notations.

An ellipsis ( ... ) stops the descent into lower levels of child elements and implies that a specification for the parent element can be found elsewhere. If you're preparing a package of tree diagrams as part of DTD user documentation, it can be helpful to indicate the page or section in which the desired diagram appears.

To represent parent elements with many sequentially ordered child elements or many mutually exclusive choices of child elements, you can partially or wholly orient the children top-to-bottom rather than left-to-right. Unfortunately, this arrangement leaves little room for attribute information and declared content, and it appears to be less effective than the usual orientation in communicating the necessary modeling information.

Alternatively, you can split the diagram into two parts and indicate a continuation with an ellipsis or a page reference.

If one or more sets of common attributes are used on multiple elements, you can use a special symbol or keyword next to the relevant element boxes to stand for each set of attributes.
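
In the DTD, a set of common attributes is often handled with a parameter entity. The following is a sketch under that assumption; the entity name `common` and the element and attribute names are hypothetical.

```sgml
<!-- Hypothetical names, for illustration only. -->

<!-- One set of common attributes, defined once -->
<!ENTITY % common "id        ID               #IMPLIED
                   audience  (novice|expert)  novice">

<!-- Each element marked with the keyword picks up the whole set -->
<!ATTLIST para  %common;>
<!ATTLIST list  %common;
                compact  (yes|no)  no>
```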

If the elements in a group or class have identical content model and attribute list characteristics and are intended to stay in synchronization, the correspondence can be shown in various ways. For example, multiple parent element boxes can be stacked on top of one another. Alternatively, if the elements have already been identified as being part of a named class, the class can be shown in an oval as if it were the parent element. The DTD implementor can make use of this correspondence in constructing the declarations.
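
One way an implementor might exploit the correspondence is to declare the elements together by means of a name group, as in this sketch. The element names are hypothetical.

```sgml
<!-- Hypothetical element names, for illustration only. -->

<!-- One declaration covers all three element types, so their
     content models cannot drift apart -->
<!ELEMENT (note | caution | warning) - - (title?, para+)>

<!-- Likewise for their attribute list -->
<!ATTLIST (note | caution | warning)
          id  ID  #IMPLIED>
```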

B.8. Tree Diagram Building Process

The tree diagram notation can have several forms, representing modeling-in-progress, various stages of the specification work, DTD user documentation, and so on. Each design team will probably come up with its own shorthand as it develops a unique working style. Following are some examples to give you an idea of some of the shorthand forms we've used and the progression of work.

During the modeling process in a DTD design session, the notation is likely to be applied informally on a whiteboard or flipchart, with various details elided in the interest of quick communication. For example, the following diagram might be produced partway through a discussion of the basic hierarchical structure of a document type. Notice that the number of levels of division is indicated, but most information about contents of those divisions and about repeatability of elements hasn't been supplied.

When the team is ready to commit its decision to paper (or to a graphical editor or computer-aided DTD development software) for a particular phase of specification work, these details must be filled in. For example, the following diagram might appear in the draft document analysis report as the result of the final discussions on the document hierarchical structure, before the collections have been handled. It shows a much more sophisticated content model for each of the divisions, includes attribute information, and specifies some occurrence details even before the contents of the “text” clouds have been filled in.

When the design team is done populating the “text” clouds and has determined the occurrence requirements of those contexts, these details can be filled in. No more clouds remain in our example.

When both the specification work and the DTD implementation and testing are done, the diagram might need changes to bring it up to date. For example, the following diagram shows the result of some simplifications made to the occurrence requirements and structure and uses more precise SGML terminology.

In the DTD user documentation, it can be useful to excerpt various fragments of the diagrams in explaining each element, showing only the most significant or immediate levels of ancestor and descendant elements. For example, the following partial diagram might appear in the documentation for a topic, while the whole document hierarchy might be shown elsewhere.

[17] In an early project we worked on, these ovals acquired the name potatoes because when hand-drawn they tend to look—how can we describe it?—much more “organic” than do the perfectly symmetrical ovals shown here. The name has stuck in several companies, and we've even seen parameter entity names that use potato as a suffix.