Glossary

This glossary provides definitions for SGML technical terms, as well as terms and concepts that we introduce as part of our methodology and techniques. For terms that have an ISO 8879 definition, we supply that definition with its clause reference (though without any notes that accompany the definition in the standard), along with additional explanation as appropriate.

abstract syntax

The functional roles of pieces of SGML markup, for example, a “start-tag open” (STAGO). A concrete syntax maps actual character strings to the functional roles; for example, the reference concrete syntax maps an STAGO to the left angle bracket ( < ).

The ISO 8879 definition is as follows:

Rules that define how markup is added to the data of a document, without regard to the specific characters used to represent the markup. (4.1)

See Also concrete syntax.

ancestor

An element that contains another element, directly or indirectly; the first is said to be an ancestor of the second.

architectural form

A named set of rules for and constraints on the declaration and processing of an element or an attribute definition list, usually expressed as a markup declaration and accompanying documentation. A declaration conforming to an architectural form references it by supplying the form's name as the value of a certain attribute.

attribute

Markup that allows further description of an element. If you think of an element as a noun, you can think of an attribute as an adjective modifying that noun. Attribute information for an element is stored in its start-tag.

The ISO 8879 definition is as follows:

A characteristic quality, other than type or content. (4.9)

attribute name

A label for an attribute value. Attribute information in an element's start-tag is not positionally sensitive; the attribute name helps to distinguish between values for different attributes.

attribute value

A string that provides additional description for an element. An attribute's declared value determines the rules an attribute value must follow to be valid, for example, indicating that the value must be a NUMBER (a string made only of the characters 0–9).

See Also declared value.

attribute specification

The ISO 8879 definition is as follows:

A member of an attribute specification list; it specifies the value of a single attribute. (4.15)

authoring DTD

A variant of a reference DTD whose markup model has been optimized for use in authoring, editing, and modifying documents. Authoring DTDs are sometimes created to solve problems in specific software environments or to simplify the markup process.

See Also conversion DTD, interchange DTD, presentational DTD, reference DTD.

catalog

A file that maps public identifiers (primarily used in entity declarations) to objects (such as files) on a computer system, so that the contents of each object can be substituted. The format of the most commonly used catalog file was standardized by SGML Open in its Technical Resolution 9401.
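As an illustration, a catalog in the SGML Open format consists of entries such as the following. The public identifiers and file names are hypothetical placeholders. With this catalog in use, a parser that supports it would resolve the public identifier in a document type declaration such as <!DOCTYPE recipe PUBLIC "-//Example Corp//DTD Recipe V1.0//EN"> to the file recipe.dtd.

```sgml
-- Hypothetical catalog entries; identifiers and file names are placeholders --
PUBLIC "-//Example Corp//DTD Recipe V1.0//EN"         "recipe.dtd"
PUBLIC "-//Example Corp//ENTITIES Kitchen Terms//EN"  "kitchen.ent"
```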

child

An element that is directly contained by another element; the first is said to be a child of the second.

collection

A “palette” of elements from which authors can choose freely (possibly along with data characters) in a particular context, without restriction on number or order, other than potentially requiring a single element to be supplied.

An element declaration achieves this effect by using a repeatable OR group in its content model. If the optional-repeatable indicator is used or if the collection allows #PCDATA, the content model can be satisfied by an absence of any content. If the required-repeatable indicator is used and the collection specifies only elements, at least one of the elements must be present to satisfy the content model.
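For example, in the following sketch (all element names are hypothetical), the para content model is a collection that allows #PCDATA and can therefore be satisfied by no content at all. The section model, after its title, uses a required-repeatable collection of elements only, so at least one para, list, or note must be present.

```sgml
<!-- Mixed collection: data and elements in any order and number; may be empty -->
<!ELEMENT para    - - (#PCDATA|emphasis|command)*>
<!-- Element-only collection after a title: at least one item is required -->
<!ELEMENT section - - (title, (para|list|note)+)>
```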

comment

Special markup and content that is solely for the eyes of readers of the “source” files. In a document instance, the comment is usually in its own comment declaration, surrounded with <!-- --> characters. In a DTD, comments are sometimes interspersed throughout other markup declarations.

The ISO 8879 definition is as follows:

A portion of a markup declaration that contains explanations or remarks intended to aid persons working with the document. (4.46)
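In the reference concrete syntax, a comment can stand alone in its own comment declaration or be placed inside another markup declaration between pairs of double hyphens. A minimal sketch, with a hypothetical element name:

```sgml
<!-- A comment declaration, usable in a DTD or a document instance -->
<!ELEMENT step - O (para+) -- a comment inside an element declaration -->
```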

component

See semantic component.

content-based component

A semantic component that is primarily descriptive of information content, rather than structure or presentation. For example, a “mailing address” component is content-based.

See Also presentational component, structural component.

concrete syntax

The expression of functional roles of pieces of SGML markup in terms of character strings. For example, a “start-tag open” (STAGO) in the abstract syntax is mapped to a left angle bracket ( < ) in the reference concrete syntax.

The ISO 8879 definition is as follows:

A binding of the abstract syntax to particular delimiter characters, quantities, markup declaration names, etc. (4.48)

See Also abstract syntax.

content model

The rules for the configuration of element and/or data content allowable in instances of an element type.

The ISO 8879 definition is as follows:

Parameter of an element declaration that specifies the model group and exceptions that define the allowed content of the element. (4.55)
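The model group is the parenthesized part of the content model; any exceptions follow it. In this hypothetical sketch, footnote must contain one or more paragraphs and may not contain another footnote anywhere inside it (an exclusion exception), while chapter allows indexterm elements to appear anywhere within it (an inclusion exception).

```sgml
<!ELEMENT footnote - - (para+)        -(footnote)>
<!ELEMENT chapter  - - (title, para+) +(indexterm)>
```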

context

The specific arrangement of document text in which a particular kind of markup or content is found (or can be found, if you are examining a markup model rather than a document instance).

In a document instance, context is usually understood to mean the list of element ancestors of a certain element. For example, the context of a “recipe instruction step” element might be represented as “recipe→instruction-list→step.” However, other factors, such as the values of particular attributes, can also be examined. Most kinds of document utilization, such as searching and formatting, involve locating material in a certain bounded context.
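For instance, in the following hypothetical document instance fragment, the context of each step element is recipe→instruction-list→step. The sketch assumes an SGML declaration that permits names longer than the eight-character limit of the reference concrete syntax.

```sgml
<recipe>
<title>Toast</title>
<instruction-list>
<step>Slice the bread.</step>
<step>Toast until golden.</step>
</instruction-list>
</recipe>
```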

contextual markup

A markup system for which not all individual pieces of markup are allowable in all locations in a document. SGML is contextual, whereas most word-processing systems are not.

See Also noncontextual markup.

conversion

The process of changing a document's system-specific markup, usually permanently, to conform to an SGML DTD.

See Also transformation.

conversion DTD

A variant of a reference DTD that is optimized for receiving the results of converting non-SGML document sources to SGML form. Typically, conversion DTDs relax the content models and attribute rules of the reference DTD.

See Also authoring DTD, interchange DTD, presentational DTD, reference DTD.

data

The ISO 8879 definition is as follows:

The characters of a document that represent the inherent information content; characters that are not recognized as markup. (4.72)

SGML distinguishes between data and markup, calling the combination of the two text.

See Also text, data-level component, data-level element.

data-level component, data-level element

A component or element that represents a small piece of information that needs to be processed or handled specially. A data-level element usually has a simple internal structure and would be meaningless without its surrounding context, which almost always consists of character data, usually in prose form.

See Also information unit (IU).

declarative markup

A markup system that describes the document content rather than describing how a computer system should process that content. Markup that effectively says, “This is a paragraph” is declarative, while markup that says, “Wrap this region of text to fit a line length of 26 picas using 10-point Times font on 11-point leading” is procedural.

See Also procedural markup.

declared content

Instructions for the content of an element type that are represented with a single keyword. The three choices of element declared content are CDATA, RCDATA, and EMPTY.
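The following hypothetical declarations show each of the three declared-content keywords, with a comment summarizing its effect.

```sgml
<!ELEMENT graphic - O EMPTY  -- no content; no end-tag is used -->
<!ELEMENT listing - - CDATA  -- text is not parsed for markup, apart from the end-tag -->
<!ELEMENT heading - - RCDATA -- like CDATA, but entity and character references are recognized -->
```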

declared value

The constraints imposed by the attribute definition list declaration, which any value for that attribute must follow in a document. The declared value of an attribute serves as a kind of “data type” for the value. Table A.1, “Attribute Declared Values,” describes the available declared values.
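A hypothetical attribute definition list declaration illustrates several declared values. In each definition, the keyword or name token group after the attribute name is the declared value, and the last parameter is the default value. The final line shows a start-tag whose attribute specifications satisfy these rules.

```sgml
<!ATTLIST step
          id      ID               #IMPLIED  -- unique identifier --
          num     NUMBER           #IMPLIED  -- digits 0-9 only --
          level   (novice|expert)  novice    -- name token group --
          remark  CDATA            #IMPLIED  -- any character string -->

<!-- In a document instance: -->
<step num="3" level="expert">
```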

descendant

An element that is contained by another element, directly or indirectly; the first is said to be a descendant of the second.

descriptive markup

See declarative markup.

design principle

A goal arising from the overall SGML project goals, stated specifically and unambiguously, that should be used by the document type design team and the DTD implementor in their work.

document

The ISO 8879 definition is as follows:

A collection of information that is processed as a unit. A document is classified as being of a particular document type. (4.96)

document analysis report

The formal, written results of the needs analysis and document type design work performed by the document type design team. This report, along with the project documents, is the main source of information from which the DTD implementor works.

document element

The ISO 8879 definition is as follows:

The element that is the outermost element of an instance of a document type; that is, the element whose generic identifier is the document type name. (4.99)

document hierarchy

The overall structure of a document type; the highest levels of markup that dictate the characteristic “shape” of the documents.

See Also information pool.

document instance

The ISO 8879 definition is as follows:

Instance of a document type. (4.100)

The ISO 8879 definition of an “instance of a document type” is as follows:

The data and markup for a hierarchy of elements that conforms to a document type definition. (4.160)

See Also presentation instance.

document type

The ISO 8879 definition is as follows:

A class of documents having similar characteristics; for example, journal, article, technical manual, or memo. (4.102)

document type declaration

The declaration at the top of an SGML document (after the SGML declaration, if one is present) that indicates the DTD rules to which the document instance is intended to conform.

The ISO 8879 definition is as follows:

A markup declaration that formally specifies a portion of a document type definition. (4.103)

document type definition

See DTD (document type definition).

DTD (document type definition)

A formal expression of the SGML-based rules that a document's markup must follow.

The ISO 8879 definition is as follows:

Rules, determined by an application, that apply SGML to the markup of documents of a particular type. (4.105)

See Also markup model.

element

A named collection of document content. Most such collections can contain and/or be contained in other collections.

The ISO 8879 definition is as follows:

A component of the hierarchical structure defined by a document type definition; it is identified in a document instance by descriptive markup, usually a start-tag and end-tag. (4.110)

See Also element type.

element declaration

The markup declaration that specifies the rules for an element type.

The ISO 8879 definition is as follows:

A markup declaration that contains the formal specification of the part of an element type definition that deals with the content and markup minimization. (4.111)

element set

A portion of a DTD, usually containing element declarations, that “travels together” and can be used easily in multiple DTDs. An element set is stored in its own parameter entity.

The ISO 8879 definition is as follows:

A set of element, attribute definition list, and notation declarations that are used together. (4.112)
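One common arrangement, sketched here with hypothetical names, keeps the element set's declarations in a separate file, declares an external parameter entity for that file, and references the entity in each DTD that needs the set.

```sgml
<!-- In each DTD that uses the element set -->
<!ENTITY % recipes SYSTEM "recipes.ent">
%recipes;
```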

element type

The definition of an element; an element in the abstract sense, as opposed to any instances of that element type in an actual document. Any one element declaration, even if it specifies multiple generic identifiers, defines a single element type.

The ISO 8879 definition is as follows:

A class of elements having similar characteristics; for example, paragraph, chapter, abstract, footnote, or bibliography. (4.114)

See Also element.

elm tree diagram

A graphically based description of the desired markup model for part or all of a document type being designed, or a similar description of the model for an existing DTD, using the notation explained in Appendix B, Tree Diagram Reference. “Elm” is an acronym for “enables lucid models.”

end-tag

The ISO 8879 definition is as follows:

Descriptive markup that identifies the end of an element. (4.119)

entity

A named fragment of document content that is stored separately from other fragments and that can be included in a document one or more times by reference to its name.

The ISO 8879 definition is as follows:

A collection of characters that can be referenced as a unit. (4.120)

entity reference

Markup that indicates a location in a document where the content of an entity should be included.

The ISO 8879 definition is as follows:

A reference that is replaced by an entity. (4.124)
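A minimal sketch, with hypothetical entity names and file name: the first declaration defines an internal text entity, the second an external one, and the references in the document instance mark where each entity's content is included.

```sgml
<!-- Declarations -->
<!ENTITY product "SuperWidget">
<!ENTITY chap1   SYSTEM "chap1.sgm">

<!-- References in a document instance -->
<book>
<title>Using &product;</title>
&chap1;
</book>
```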

extended DTD

A DTD whose markup model has been modified from that of an original (usually standard) DTD, such that some or all instances conforming to the modified one can potentially be invalid according to the original one.

See Also renamed DTD, subsetted DTD.

external identifier

The ISO 8879 definition is as follows:

A parameter that identifies an external entity or data content notation. (4.135)

generic identifier

The ISO 8879 definition is as follows:

A name that identifies the element type of an element. (4.145)

generic markup

A markup system that is not specific to a single vendor, document producer, or computer hardware or software configuration.

See Also system-specific markup.

hierarchical

Arranged by means of successive levels of containment, where “lower” (or “inner”) units are nested entirely within “higher” (or “outer”) ones. Elements in an SGML document are arranged hierarchically.

HyTime

The Hypermedia/Time-based Structuring Language; ISO standard 10744. HyTime is a language, defined largely by means of architectural forms, for representing hypertext links and the scheduling and synchronization of events. To use HyTime-based processing applications, you map the relevant markup in your DTD to the architectural forms specified in the HyTime standard, following the constraints set forth by the forms.
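As an illustration only, assuming the HyTime clink (contextual link) architectural form and a hypothetical xref element, the mapping is made with a fixed attribute named HyTime whose value is the form's name.

```sgml
<!ATTLIST xref
          HyTime   NAME   #FIXED clink  -- architectural form name --
          linkend  IDREF  #REQUIRED     -- the link's target -->
```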

See Also architectural form.

information pool

The body of markup available to authors in the contexts where they supply the “main content” of a document. These contexts typically offer great discretion in choosing and applying markup. The information pool is a kind of “supercollection” encompassing all the information units and data-level elements.

See Also document hierarchy.

information unit (IU)

A high-level component or element that can, to some degree, “stand alone” in order to be understood by a reader, such that it must “travel together” during information processing and assembly. An information unit typically has a complex internal structure. However, the most common information unit, the paragraph, often has a very simple content model.

See Also data-level component, data-level element.

instance

See document instance.

interchange DTD

A DTD that has been agreed on as the standard form for document interchange by the senders and recipients of SGML documents. For example, DocBook Version 2.2.1 was the interchange DTD agreed on by the authors and publisher of this book. Reference DTDs often must use an interchange DTD as their design base.

See Also authoring DTD, conversion DTD, presentational DTD, reference DTD.

internal declaration subset

The portion of a DTD's markup declarations that are provided directly inside the document type declaration, between square brackets.
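A minimal sketch, with a hypothetical public identifier and entity: the declarations between the square brackets form the internal declaration subset and apply only to this document. Because the internal subset is processed before the external one, an entity declared here takes precedence over a declaration of the same entity in the external subset.

```sgml
<!DOCTYPE recipe PUBLIC "-//Example Corp//DTD Recipe V1.0//EN" [
<!ENTITY chef "Pat Smith">
]>
```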

International Organization for Standardization

See ISO (International Organization for Standardization).

ISO (International Organization for Standardization)

ISO describes itself and explains its work as follows:

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.

SGML was created under the auspices of ISO/IEC JTC1/SC18/WG8—Working Group 8 of Subcommittee 18 of Joint Technical Committee 1 of the combined effort of ISO and the International Electrotechnical Commission.

See Also SGML (Standard Generalized Markup Language).

IU

See information unit (IU).

key data

A data-level component or element that is highly content-based and specifically related to the information domain of the document type under discussion. For example, in software documentation, a “command name” would be key data.

link component

A component that records the relationship of two or more pieces of information. Two common kinds of links are those that join document content to locations where that content should be reproduced, and those that constitute a suggestion to the reader to seek out additional information.

markup, mark up

The ISO 8879 definition of markup is as follows:

Text that is added to the data of a document in order to convey information about it. (4.183)

To mark up data is to add markup to it.

markup declaration

A “statement” in the SGML language that defines a portion of a markup model or other markup characteristics of a document. Most markup declarations appear in DTDs, but a few (such as comment declarations) can appear in document instances.

The ISO 8879 definition is as follows:

Markup that controls how other markup of a document is to be interpreted. (4.186)

markup model

The markup “vocabulary” and “grammar” defined by a DTD (or some part of a DTD), which serve as the rules of the language “spoken” by documents conforming to that DTD. Many people simply use the term “DTD” for this concept, but we use a unique term for it because of the need to distinguish between the actual markup characteristics defined in a DTD and the various implementation techniques used to make the design readable, maintainable, and so on.

metainformation

Information about information; facts about a document (or smaller piece of information) as a body of information. For example, a document's publication date is metainformation.

metalanguage

A language that is used to create or define other languages. SGML is a metalanguage used to define DTDs that specify markup models; these models function as unique document markup “languages.”

modeling

The act of designing markup requirements in a way that makes the results suitable for expression in SGML markup declarations.

noncontextual markup

A markup system that places no formal restrictions on the appearance or order of the individual pieces of markup. Most word-processing systems are noncontextual, whereas SGML is contextual.

See Also contextual markup.

parent

An element that directly contains another element; the first is said to be the parent of the second.

parser

The ISO 8879 definition of “SGML parser” is as follows:

A program (or portion of a program or a combination of programs) that recognizes markup in SGML documents. (4.285)

potato

An oval containing an element collection or any-order group. Also, an herb of the nightshade family that is widely cultivated as a vegetable crop.

presentation instance

One form of an SGML document as presented to a user, possibly with some content changed, added, or removed compared to other presentations.

See Also document instance.

presentational component

A semantic component that is primarily descriptive of information appearance, rather than structure or meaning. For example, a “bold font” component is presentational.

See Also content-based component, structural component.

presentational DTD

A variant of a reference DTD that is optimized to assist the process of transforming SGML documents into presented or otherwise processed form. Typically, presentational DTDs allow for the “augmenting” of the original document to contain generated material such as tables of contents and to contain formatting-related information.

See Also authoring DTD, conversion DTD, interchange DTD, reference DTD.

principle

See design principle.

procedural markup

A markup system that describes how a computer system should process the document content rather than describing what the content means. Markup that effectively says, “Wrap this region of text to fit a line length of 26 picas using 10-point Times font on 11-point leading” is procedural, while markup that says, “This is a paragraph” is declarative.

See Also declarative markup.

processing expectations

The assumptions about markup that constrain and inform its use in document authoring, management, and processing. For example, the processing expectations about a cross-reference element might include the requirement that it be replaced with generated text when it is formatted for printing. Some people use the term “semantics” or “processing semantics” for this meaning—which accounts for our use of the terms semantic component and semantic extension—but as a noun, semantics is too ambiguous for our taste.

RE

See record end (RE).

record end (RE)

An invisible character that occurs at the end of units of stored data that are known as records or, sometimes, “lines.”

The ISO 8879 definition is as follows:

A function character, assigned by the concrete syntax, that represents the end of a record. (4.254)

reference concrete syntax

The default concrete syntax for SGML documents, and the one used in SGML declarations.

The ISO 8879 definition is as follows:

A concrete syntax, defined in this International Standard, that is used in all SGML declarations. (4.258)

reference DTD

A DTD that encodes the “ideal” markup model for complete documents of a specified type. A reference DTD may be based on (that is, a variant of) an interchange DTD, but otherwise it typically provides the design base for the other variant DTDs, such as an authoring DTD.

See Also authoring DTD, conversion DTD, interchange DTD, presentational DTD.

renamed DTD

A DTD that is identical to another DTD, except that some or all of the element names and other markup names have been changed to be more suitable for authors who use a different jargon or write in a different language.

See Also extended DTD, subsetted DTD.

semantic component

A unit of specification representing a requirement for the design of a document type model, which corresponds to a kind of information that must be distinguished from all others. A semantic component often results in the DTD having a new element type, but can also result in other kinds of markup distinctions.

See Also processing expectations.

semantic extension

A technique for markup model design that allows a DTD's markup to be used for making novel distinctions among kinds of information, even if the markup didn't previously recognize the distinction. The technique is useful for DTDs that cannot be updated frequently enough to satisfy new requirements at the rate at which they are created.

SGML (Standard Generalized Markup Language)

The ISO 8879 definition of Standard Generalized Markup Language is as follows:

A language for document representation that formalizes markup and frees it of system and processing dependencies. (4.305)

SGML was published in 1986 as ISO standard 8879. Amendment 1 to the standard was published in 1988.

SGML document

See document.

sibling

An element that has the same parent as another element; the two are said to be siblings of each other.

specific markup

See system-specific markup.

Standard Generalized Markup Language

See SGML (Standard Generalized Markup Language).

start-tag

The ISO 8879 definition is as follows:

Descriptive markup that identifies the start of an element and specifies its generic identifier and attributes. (4.306)

structural component

A semantic component that is primarily descriptive of information structure, rather than meaning or appearance. For example, a “list” component is structural.

See Also content-based component, presentational component.

subsetted DTD

A DTD whose markup model has been modified from that of an original (usually standard) DTD, such that all instances conforming to the modified one are still valid according to the original one. Note that a subsetted DTD is unrelated to a DTD internal subset, which is the portion of a DTD that is “local” to a document by virtue of being supplied inside the DOCTYPE declaration's square brackets ( [ ] ).

See Also extended DTD, renamed DTD.

system-specific markup

A markup system that is specific to a single vendor, document producer, or computer hardware or software configuration.

See Also generic markup.

tag

The ISO 8879 definition is as follows:

Descriptive markup. (4.314)

Tag Abuse Syndrome

A condition that afflicts authors who choose inappropriate markup to get a certain formatting effect or choose markup that isn't as precise or accurate as possible. A poor DTD design often exacerbates the problem.

text

Data and markup making up a document.

The ISO 8879 definition is as follows:

Characters. (4.316)

Where we use the term for its colloquial meaning—the main content in the flow of a document, exclusive of the document hierarchy—we use quotation marks around it.

See Also data.

transformation

The process of changing the data and markup within SGML documents to make them conform to a different DTD or to another kind of markup, typically one that can be directly interpreted by printers, display devices, or further transformation software.

See Also conversion.

tree diagram

See elm tree diagram.

validating SGML parser

The ISO 8879 definition is as follows:

A conforming SGML parser that can find and report a reportable markup error if (and only if) one exists. (4.329)

value

See attribute value.

variant DTD

A DTD whose design is based closely on the markup model of another DTD.

See Also reference DTD.

WYSIWYG

“What you see is what you get.” A description of word processing systems and desktop publishing systems that let authors see a representation of the formatted appearance of a document on the computer screen as they work. Most such systems also allow authors to customize the formatted appearance by manipulating the screen display dynamically.