Building XHTML^™ Modules

W3C Working Draft 10 September 1999

This version:: http://www.w3.org/TR/1999/WD-xhtml-building-19990910
(Single HTML file, Postscript version, PDF version, ZIP archive, or Gzip'd TAR archive)
Latest version:: http://www.w3.org/TR/xhtml-building
Previous version:: http://www.w3.org/TR/1999/xhtml-modularization-19990406/
Editors:: Murray Altheim, Sun Microsystems
Shane McCarron, Applied Testing and Technology

Abstract

This working draft defines the mechanism for defining markup language modules that are compatible with the modularization framework used by XHTML. This includes a definition of the way in which an abstract module is specified, the way in which this abstraction is mapped into an XML DTD, and the way in which the resulting DTD module can be combined with other XHTML DTD modules to create new markup languages. In the future, it is expected that instructions will also be provided for mapping the abstract specifications into an XML Schema. Note that the materials in this document were formerly part of the Modularization of XHTML document, but have been separated out for editorial purposes.

Status of this document

This document is a working draft of the W3C's HTML Working Group. It is being released for public review, discussion, and comment. This working draft may be updated, replaced or rendered obsolete by other W3C documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". This document is work in progress and does not imply endorsement by the W3C membership.

This document has been produced as part of the W3C HTML Activity. The goals of the HTML Working Group (members only) are discussed in the HTML Working Group charter (members only).

Please send detailed comments on this document to [email protected]. We cannot guarantee a personal response, but we will try when it is appropriate. Public discussion on HTML features takes place on the mailing list [email protected].

Note that, as this is a work in progress, text is frequently added to the document that has not yet been reviewed by the working group. Such text is marked using the class "new", with an appearance like this. Text delimited with class "new" is submitted for consideration by the working group, and necessarily has a lesser status than the text in the remainder of the document. [This technique was agreed at the 5 May 1999 teleconference of the working group.]

Quick Table of Contents

1. Introduction
2. Terms and Definitions
3. Conformance Definition
4. Defining Abstract Modules
5. XML DTD Modules
5. Developing DTDs with defined and extended modules
A. References

Full Table of Contents

1. Introduction
- 1.1. Why Build XHTML Modules?
- 1.2. Abstract Modules
- 1.3. XML DTD Modules
2. Terms and Definitions
3. Conformance Definition
- 3.1. Module Conformance
- 3.2. Naming Rules
  - 3.2.1. Rationale for Naming Rules
4. Defining Abstract Modules
- 4.1. Syntactic Conventions
- 4.2. Content Types
- 4.3. Attribute Types
- 4.4. An Example Abstract Module Definition
  - 4.4.1. XHTML Skiing Module
5. XML DTD Modules
- 5.1. Implementing Document Model Modules in the DTD
  - 5.1.1. Parameterization
  - 5.1.2. Modularization
5. Developing DTDs with defined and extended modules
- 5.1. Defining additional attributes
- 5.2. Defining additional elements
- 5.3. Defining the content model for a collection of modules
  - 5.3.1. Integrating a stand-alone module into XHTML
  - 5.3.2. Mixing a new module throughout the modules in XHTML
- 5.4. Creating a new DTD
  - 5.4.1. Creating a simple DTD
  - 5.4.2. Creating a DTD by extending XHTML
  - 5.4.3. Creating a DTD by removing and replacing XHTML modules
- 5.5. Using the new DTD
A. References
- A.1. Normative References
- A.2. Informative References

1. Introduction

This section is normative.

1.1. Why Build XHTML Modules?

XHTML is more than just a recasting of HTML into XML. It is also an extensible architecture that permits the ready definition of new document types. The W3C envisions that client manufacturers, document authors, and content providers may all use this architecture to define document types that are specific to their needs. The XHTML Modularization specification defines a collection of modules and a framework that make the definition of these new document types relatively easy.

That architecture by itself may not be sufficient for the needs of all document type creators. In particular, people who are defining new functionality or combining new functionality with existing elements need a way to define that functionality. The XHTML method for doing this is through the definition of an XHTML module.

XHTML modules define elements and their attributes, add attributes to elements defined in other modules, add values to the set of values available to an attribute defined in other modules, define content models, or some combination of these things. The expression of a module is done through the creation of a prose functional description of the module, an abstract definition of the module's contents, and then one or more implementations of the module. The remainder of this document defines the way in which these steps should be conducted.

1.2. Abstract Modules

An XHTML document type is defined as a set of modules. Each XHTML module has an abstract definition that generally indicates the facilities made available through the module and way those facilities are minimally integrated with each other and with an (eventual) document type.

1.3. XML DTD Modules

An XML DTD module consists of a set of element types, a set of attribute list declarations, and a set of content model declarations, where any of these three sets may be empty. An attribute list declaration in an XML DTD module may modify an element type outside the element types in the module, and a content model declaration may modify an element type outside the element type set.

2. Terms and Definitions

This section is informative.

While some terms are defined in place, the following definitions are used throughout this document. Familiarity with the W3C XML 1.0 Recommendation [XML] is highly recommended.

document type: a class of documents sharing a common abstract structure. The ISO 8879 [SGML] definition is as follows: "a class of documents having similar characteristics; for example, journal, article, technical manual, or memo. (4.102)"
document model: the effective structure and constraints of a given document type. The document model constitutes the abstract representation of the physical or semantic structures of a class of documents.
markup model: the markup vocabulary (ie., the gamut of element and attribute names, notations, etc.) and grammar (ie., the prescribed use of that vocabulary) as defined by a document type definition (ie., a schema) The markup model is the concrete representation in markup syntax of the document model, and may be defined with varying levels of strict conformity. The same document model may be expressed by a variety of markup models.
document type definition (DTD): a formal, machine-readable expression of the XML structure and syntax rules to which a document instance of a specific document type must conform; the schema type used in XML 1.0 to validate conformance of a document instance to its declared document type. The same markup model may be expressed by a variety of DTDs.
reference DTD: a DTD whose markup model represents the foundation of a complete document type. A reference DTD provides the basis for the design of a "family" of related DTDs, such as subsets, extensions and variants.
subset DTD: a DTD whose document model is the proper subset of a reference document type, whose conforming document instances are still valid according to the reference DTD. A subset may place tighter restrictions on the markup than the reference, remove elements or attributes, or both.
extension DTD: a DTD whose document model extends a reference document type (usually by the addition of element types or attributes), but generally makes no profound changes to the reference document model other than required to add the extension's semantic components. An extension can also be considered a proper superset if the reference document type is a proper subset of the extension.
variant DTD: a DTD whose document model alters (through subsetting, extension, and/or substitution) the basic data model of a reference document type. It is often difficult to transform without loss between instances conforming to a variant DTD and the reference DTD.
fragment DTD: a portion of a DTD used as a component either for the creation of a compound or variant document type, or for validation of a document fragment. SGML nor XML current have standardized methods for such partial validation.
content model: the declared markup structure allowed within instances of an element type. XML 1.0 differentiates two types: elements containing only element content (no character data) and mixed content (elements that may contain character data optionally interspersed with child elements). The latter are characterized by a content specification beginning with the "#PCDATA" string (denoting character data).
abstract module: a unit of document type specification corresponding to a distinct type of content, corresponding to a markup construct reflecting this distinct type.
element type: the definition of an element, that is, a container for a distinct semantic class of document content.
element: an instance of an element type.
generic identifier: the name identifying the element type of an element. Also, element type name.
tag: descriptive markup delimiting the start and end (including its generic identifier and any attributes) of an element.
markup declaration: a syntactical construct within a DTD declaring an entity or defining a markup structure. Within XML DTDs, there are four specific types: entity declaration defines the binding between a mnemonic symbol and its replacement content. element declaration constrains which element types may occur as descendants within an element. See also content model. attribute definition list declaration defines the set of attributes for a given element type, and may also establish type constraints and default values. notation declaration defines the binding between a notation name and an external identifier referencing the format of an unparsed entity
entity: an entity is a logical or physical storage unit containing document content. Entities may be composed of parse-able XML markup or character data, or unparsed (ie., non-XML, possibly non-textual) content. Entity content may be either defined entirely within the document entity ("internal entities") or external to the document entity ("external entities"). In parsed entities, the replacement text may include references to other entities.
entity reference: a mnemonic or numeric string used as a reference to the content of a declared entity (eg., "&" for "&", "<" for "<", "©" for "©".)
instantiate: to replace an entity reference with an instance of its declared content.
parameter: entity an entity whose scope of use is within the document prolog (ie., the external subset/DTD or internal subset). Parameter entities are disallowed within the document instance.
module: an abstract unit within a document model expressed as a DTD fragment, used to consolidate markup declarations to increase the flexibility, modifiability, reuse and understanding of specific logical or semantic structures.
modularization: an implementation of a modularization model; the process of composing or de-composing a DTD by dividing its markup declarations into units or groups to support specific goals. Modules may or may not exist as separate file entities (ie., the physical and logical structures of a DTD may mirror each other, but there is no such requirement).
modularization model: the abstract design of the document type definition (DTD) in support of the modularization goals, such as reuse, extensibility, expressiveness, ease of documentation, code size, consistency and intuitiveness of use. It is important to note that a modularization model is only orthogonally related to the document model it describes, so that two very different modularization models may describe the same document type.
driver: a generally short file used to declare and instantiate the modules of a DTD. A good rule of thumb is that a DTD driver contains no markup declarations that comprise any part of the document model itself.
parent document type: A parent document type of a compound document is the document type of the root element.
compound document: A compound document is a document that uses more than one XML Namespace. Compound documents may be defined as documents that contain elements or attributes from multiple document types.
module: A module is a collection of elements or attributes.; A profile is meta-data about an XML document type and possible related technologies, such as scripting languages and style-sheets. A browser can support one or many profiles. The purpose of profiles is to provide content developers with machine-readable information about features that can be expected from a particular browser. For example, a Television profile would include a DTD for Television sets and describe technologies that can be used (e.g. scripts and style-sheets).

3. Conformance Definition

This section is normative.

In order to ensure that XHTML modules are maximally portable, this specification rigidly defines conformance requirements. While the conformance definitions can be found in this section, they necessarily reference normative text within this document, within the base XHTML specification [XHTML1], and within other related specifications. It is only possible to fully comprehend the conformance requirements of XHTML through a complete reading of all normative references.

3.1. Module Conformance

This specification defines a method for defining XHTML-conforming modules. A module conforms to this specification when it meets all of the following criteria:

The module must be defined using one of the implementation methods identified in this specification (currently on XML DTDs are defined).
The module must have a unique identifier as defined in Naming Rules.
When the module is implemented using an XML DTD, the module must insulate its parameter entity names through the use of unique prefixes or other, similar methods.
The module must have a prose definition that describes the syntactic and semantic requirements of the elements, attributes, and/or content models that it declares.
The module must not reuse any element names that are defined in other W3C-defined modules, except when the content model and semantics of those elements are either identical to the original or an extension of the original.

Note: There really should be a defined restriction on the reuse of elements that are specified in other W3C-defined modules. However, this restriction should be moderated such that, for example, the li element can be reused under a new list-style element (e.g. mylist).

3.2. Naming Rules

Names for XHTML-conforming document types must adhere to strict naming conventions so that it is possible for software and users to readily determine the relationship of document types to XHTML. The names for modules are defined through XML Formal Public Identifiers (FPIs). Within FPIs, fields are separated by double slash character sequences (//). The various fields MUST be composed as follows:

The leading field identifies the resources relationship to a formal standard. For privately defined resources, this field MUST be "-". For formal standards, this field MUST be the formal reference to the standard (e.g. ISO/IEC 15445:1999).
The second field MUST contain the name of the organization responsible for maintaining the named item. There is no formal registry for these organization names. Each organization SHOULD define a name that is unique. The name used by the W3C is, for example, W3C.
The third field MUST take the form ELEMENTS XHTML- followed by an organization-defined unique identifier (e.g. MyML 1.0). This identifier is SHOULD be composed of a unique name and a version identifier that can be updated as the document type evolves.
The fourth field defines the language in which the item is developed (e.g. EN).

Using these rules, the name for an XHTML conforming module might be -//MyCompany//ELEMENTS XHTML-MyModule 1.0//EN.

3.2.1. Rationale for Naming Rules

Naming Rules are critical for portability of user agents and XHTML-conforming tools. These rules need to be simple enough that they can be readily adhered to, and need to convey upon document type and module designers the power to readily associate their creations with XHTML (for marketing purposes, if nothing else). The above rules address these concerns. There were some other possibilities for naming conventions, and they were not used for the following reasons:

Use the XHTML version in the identifier.
In the case of new modules, there is no need to associate the module with a specific version of XHTML - the name does not need to identify version dependencies.

4. Defining Abstract Modules

This section is normative.

An Abstract Module is a definition of an XHTML module using prose text and some informal markup conventions. While such a definition is not generally useful in the machine processing of document types, it is critical in helping people understand what is contained in a module. This section defines the way in which XHTML abstract modules are defined. An XHTML conforming module is not required to provide an abstract module. However, anyone developing an XHTML module is encouraged to provide an abstraction to ease in the use of that module.

4.1. Syntactic Conventions

The abstract modules are not defined in a formal grammar. However, the definitions do adhere to the following syntactic conventions. These conventions are similar to those of XML DTDs, and should be familiar to XML DTD authors. Each discrete syntactic element can be combined with others to make more complex expressions that conform to the algebra defined here.

element name: When an element is included in a content model, its explicit name will be listed.
Content set: Some modules define lists of explicit element names called content sets. When a content set is included in a content model, its name will be listed.
expr ?: Zero or one instances of expr are permitted.
expr +: One or more instances or expr are required.
expr *: Zero or more instances of expr are permitted.
a , b: Expression a is required, followed by expression b.
a | b: Either expression a or expression b is required.
a - b: Expression a is permitted, omitting elements in expression b.
parentheses: When an expression is contained within parentheses, evaluation of any subexpressions within the parentheses take place before evaluation of expressions outside of the parentheses (starting at the deepest level of nesting first).
extending pre-defined elements: In some instances, a module adds attributes to an element. In these instances, the element name is followed by an ampersand (&).
Defining the type of attribute values: When a module defines the type of an attribute value, it does so by listing the type in parentheses after the attribute name.
Defining the legal values of attributes: When a module defines the legal values for an attribute, it does so by listing the explicit legal values (enclosed in quotation marks), separated by verical bars (|), inside of parentheses following the attribute name.

4.2. Content Types

Abstract module definitions define minimal, atomic content models for each module. These minimal content models reference the elements in the module itself. They may also reference elements in other modules upon which the abstract module depends. Finally, the content model in many cases requires that text be permitted as content to one or more elements. In these cases, the symbol used for text is PCDATA. This is a term, defined in the XML 1.0 Recommendation, that refers to processed character data. A content type can also be defined as EMPTY, meaning the element has no content in its minimal content model.

4.3. Attribute Types

In some instances, it is necessary to define the types of attribute values or the explicit set of permitted values for attributes. The following attribute types (defined in the XML 1.0 Recommendation) are used in the definitions of the Abstract Modules:

Attribute Type	Definition
CDATA	Character data
ID	A document-unique identifier
IDREF	A reference to a document-unique identifier
NAME	A name with the same character constraints as ID above
NMTOKEN	A name composed of CDATA characters but no whitespace
NMTOKENS	Multiple names composed of CDATA characters separated by whitespace
PCDATA	Processed character data

4.4. An Example Abstract Module Definition

This section defines a sample abstract module as an example of how to take advantage of the syntax rules defined above. Since this exampple is trying to use all of the various syntactic elements defined, it is pretty complicated. Typical module defintions would be much simpler than this. Finally, note that this module references the attribute collection Common. This is a collection defined in the XHTML Modularization specification that includes all of the basic attributes that most elements need.

4.4.1. XHTML Skiing Module

The XHTML Skiing Module defines markup used when describing aspects of a ski lodge. The elements and attributes defined in this module are:

Elements	Attributes	Minimal Content Model
resort	Common, href (CDATA)	description , Aspen+
lodge	Common	description, (Aspen - lift)+
lift	Common, href	description?
chalet	Common, href	description?
room	Common, href	description?
lobby	Common, href	description?
fireplace	Common, href	description?
description	Common	PCDATA*

This module also defines the content set Aspen with the minimal content model lodge | lift | chalet | room | lobby.

5. XML DTD Modules

This section is normative.

5.1. Implementing Document Model Modules in the DTD

Partitioning of the document model occurs at the abstract module level. This partitioning is implemented in the markup model by two primary methods: parameterization, the use of parameter entities as reusable strings, and modularization, the creation of DTD fragments called modules.

5.1.1. Parameterization

This specification classifies parameter entities into six categories and names them consistently using the following suffixes:

.mod: parameter entities use the suffix .mod when they are used to represent a DTD module (a collection of element classes). In this specification, each module is an atomic unit and may be represented as a separate file entity.
.module: parameter entities use the suffix .module when they are used to control the inclusion of a DTD module by containing either of the conditional section keywords INCLUDE or IGNORE.
.content: parameter entities use the suffix .content when they are used to represent the content model of an element type.
.class: parameter entities use the suffix .class when they are used to represent elements of the same class.
.mix: parameter entities use the suffix .mix when they are used to represent a collection of element types from different classes.
.attrib: parameter entities use the suffix .attrib when they are used to represent a group of tokens representing one or more complete attribute specifications within an ATTLIST declaration.

For example, in HTML 4.0, the %block; parameter entity is defined to represent the heterogenous collection of element types that are block-level elements. In this specification, the corollary parameter entity is %Block.mix;.

5.1.2. Modularization

DTD modules are often used to encompass the markup declarations of a specific semantic component or "feature", from higher-level document features like tables and forms, to lower-level components such as specific elements or element groups. Modules can even contain modules, creating a hierarchical structure mirroring the document model. Note that modules are not always implemented as separate file entities, and modular DTDs can be easily normalized into single file versions for more efficient distribution over the Web.

The relationship between document model components and how they are implemented in markup as modules, entities and files (i.e., the granularity of the parameterization or modularization, how the markup model is structured and stored as separate entities, etc.) is not necessarily direct, as design style and implementation issues properly play a part. Higher-level modules are sometimes delivered as individual file entities to facilitate portability and reusability. To promote interoperability, the XHTML DTD design considers each module as atomic, with the notion that implementations should support the semantics of an entire module without further subdivision.

While the notion of "plug and play" with DTD modules is very attractive, in practice this is not quite so simple. Complex document models often resort to extensive parameterization of abstract modules to facilitate understanding, markup reuse, extensibility, and maintenance. The resultant modules may have have many interdependencies, and may require a fair amount of "rewiring" when adding or removing a DTD module. In light of this, a compromise must be made between markup flexibility, complexity of the DTD, and ease of maintainability.

The XHTML DTD attempts to ameliorate this by localizing many of the more "global" parameter entities to several modules that are declared early in the DTD. These are labelled common modules, and include declarations for common names, attributes, parameter and character entities.

XHTML elements are classified into the following categories:

structural element types: element types that create the overall structure of an XHTML document.
block element types: element types that should cause a line break.
inline element types: element types that are displayed inline to an existing block.
phrasal element types: element types that specify a domain-relevant connotation
presentational element types: element types that indicate a desire on the part of the author for a specific rendering effect.
special case (or "feature") element types: element types that provide XHTML with special features, such as linking, forms, etc.

5. Developing DTDs with defined and extended modules

This section is informative.

The primary purpose of defining XHTML modules and a general modularization methodology is to ease the development of document types that are based upon XHTML. These document types may extend XHTML by integrating additional capabilities (e.g. [SMIL] or [MathML]), or they may define a subset of XHTML for use in a specialized device. Regardless of the application, XHTML modules are up to the task. This section describes the techniques that document type designers must use in order to take advantage of this modularization architecture. It does this by applying the techniques defined in the previous sections in progressively more complex ways, culminating in the creation of a complete document type from disparate modules.

Note that in no case do these examples require the modification of the XHTML-provided module files themselves. The XHTML module files are completely parameterized, so that it is possible through separate module definitions and driver files to customize the definition and the content model of each element and each element's hierarchy.

Finally, remember that most users of XHTML are not expected to be DTD authors. DTD authors are generally people who are defining specialized markup that will improve the readability, simplify the rendering of a document, or ease machine-processing of documents, or they are client designers that need to define the specialized DTD for their specific client. Consider these cases:

An organization is providing subscriber's information via a web interface. The organization stores its subscriber information in an XML-based database. One way to report that information out from the database to the web is to embed the XML records from the database directly in the XHTML document. While it is possible to merely embed the records, the organization could define a DTD module that describes the records, attach that module to an XHTML DTD, and thereby create a complete DTD for the pages. The organization can then access the data within the new elements via the Document Object Model [DOM], validate the documents, provide style definitions for the elements that cascade using Cascading Style Sheets [CSS2], etc. By taking the time to define the structure of their data and create a DTD using the processes defined in this section, the organization can realize the full benefits of XML.
An Internet client developer is designing a specialized device. That device will only support a subset of XHTML, and the devices will always access the Internet via a proxy server that is validating content before passing it on to the client (to minimize error handling on the client). In order to ensure that the content is valid, the developer creates a DTD that is a subset of XHTML using the processes defined in this section. They then use the new DTD in their proxy server and in their devices, and also make the DTD available to content developers so that developers can validate their content before making it available. By performing a few simple steps, the client developer can use the architecture defined in this document to greatly ease their DTD development cost and ensure that they are fully supporting the subset of XHTML that they choose to include.

5.1. Defining additional attributes

In some cases, an extension to XHTML can be as simple as additional attributes. Attributes can be added to an element just by specifying an additional ATTLIST for the element, for example:

<!ATTLIST a
      myattr   CDATA        #IMPLIED
>

would add the "myattr" attribute, with a value type of CDATA, to the "a" element. This works because XML permits the extension of the attribute list for an element at any point in a DTD.

Naturally, adding an attribute to a DTD does not mean that any new behavior is defined for arbitrary clients. However, a content developer could use an extra attribute to store information that is accessed by associated scripts via the Document Object Model (for example).

5.2. Defining additional elements

Defining additional elements is only slightly more complicated than defining additional attributes. Basically, DTD authors should write the element declaration for each element:

<!ELEMENT myelement ( #CDATA | myotherelement )* >
<!ATTLIST myelement
          myattribute    CDATA    #IMPLIED
>

<!ELEMENT myotherelement EMPTY >

After the elements are defined, they need to be integrated into the content model. Strategies for integrating new elements or sets of elements into the content model are addressed in Defining the content model for a collection of modules below.

5.3. Defining the content model for a collection of modules

Since the content model of XHTML modules is fully parameterized, DTD authors may modify the content model for every element in every module. The details of the DTD module interface are defined in XML DTD Modules. However, basically there are two ways to approach this modification:

Re-define the "<element>.content" entity for each element.
Re-define one or more of the global content model entities (*.class or *.mix).

The strategy taken will depend upon the nature of the modules being combined and the nature of the elements being integrated. The remainder of this section describes techniques for integrating two different classes of modules.

5.3.1. Integrating a stand-alone module into XHTML

When a module (and remember, a module can be a collection of other modules) contains elements that only reference each other in their content model, it is said to be "internally complete". As such, the module can be used on its own (for example, you could define a DTD that was just that module, and use one of its elements as the root element). Integrating such a module into XHTML is a three step process:

Decide what element(s) can be thought of as the root(s) of the new module.
Decide where these elements need to attach in the XHTML content tree.
Then, for each attachment point in the content tree, add the root element(s) to the content definition for the XHTML elements.

Consider attaching the elements defined above. In that example, the element myelement is the root. To attach this element under the object element, and only the object element, of XHTML, the following would work:

<!ENTITY % Object.content "( % Flow.mix | param | myelement )*">

A DTD defined with this content model would allow a document like the following fragment:

<object data="...">
<p>The object didn't load!</p>
<myelement>This is content of a locally defined element</myelement>
</object>

5.3.2. Mixing a new module throughout the modules in XHTML

Extending the example above, to attach this module everywhere that the %Flow.mix content model group is permitted, would require something like the following:

<!ENTITY % Misc.class
     "ins | del | script | noscript | myelement" >

Since the %Misc.class content model class is used throughout the XHTML Modules, the new module would become available throughout an extended XHTML document type.

5.4. Creating a new DTD

So far the examples in this section have described the methods of extending XHTML and XHTML's content model. Once this is done, the next step is to collect the modules that comprise the DTD into a single DTD driver, incorporating the new definitions so that they override and augment the basic XHTML definitions as appropriate.

When defining a new DTD, it is essential that each DTD have a unique identifier to use in the xmlns attribute of the root element (usually the html element). This identifier is often a URI, but in any event is something that can be used by user agents to differentiate the DTD from others. This identifier is defined using the XHTML1.ns parameter entity when creating a DTD that uses the XHTML1 structure module.

5.4.1. Creating a simple DTD

Using the trivial example above, it is possible to define a new DTD that extends the XHTML Transitional DTD pretty easily. The following is a complete, working extended DTD:

<!ENTITY % XHTML1.ns "http://my.company.com/DTDs/example.dtd" >

<!ELEMENT myelement ( #PCDATA | myotherelement )* >
<!ATTLIST myelement
     myattribute    CDATA   #IMPLIED
>

<!ELEMENT myotherelement EMPTY >

<!ENTITY % Misc.class
     "ins | del | script | noscript | myelement" >

<!ENTITY % XHTML1-t.dtd PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
          "http://www.w3.org/DTDs/XHTML1/XHTML1-t.dtd">
%XHTML1-t.dtd;

5.4.2. Creating a DTD by extending XHTML

Next, there is the situation where a complete, additional, and complex module is added to XHTML (or to a subset of XHTML). In essence, this is the same as in the trivial example above, the only difference being that the module being added is incorporated in the DTD by reference rather than explicitly including the new definitions in the DTD.

One such complex module is the DTD for [MathML]. In order to combine MathML and XHTML into a single DTD, an author would just decide where MathML content should be legal in the document, and add the MathML root element to the content model at that point:

<!ENTITY % XHTML1.ns "http://www.w3.org/DTDs/XHTML1_plus_MathML.dtd" >
<!ENTITY % XHTML1-math
     PUBLIC "-//W3C//MathML 1.0//EN"
            "http://www.w3.org/DTDs/MathML/MathML1.dtd" >
%XHTML1-math;

<!ENTITY % Inlspecial.class "a | img | object | map | math" >

<!ENTITY % XHTML1-strict
     PUBLIC "-//W3C//XHTML 1.0 Strict//EN"
            "http://www.w3.org/DTDs/XHTML/XHTML1-s.dtd" >
%XHTML1-strict;

Note that, while this is a valid example, it does not create a working DTD at this time. The reason for this is that the MathML DTD defines two elements (var and select) that conflict directly with XHTML. This conflict needs to be resolved in order for the new DTD to work correctly.

5.4.3. Creating a DTD by removing and replacing XHTML modules

Finally, another way in which DTD authors may use XHTML modules is to define a DTD that is a subset of XHTML (because, for example, they are building devices or software that only supports a subset of XHTML). Doing this is only slightly more complex than the previous example. The basic steps to follow are:

Take the XHTML 1.1 DTD as the basis of the new document type.
Select the modules to remove from that DTD.
Define a new DTD that "IGNORES" the modules.

For example, consider a device that supports the Strict XHTML 1.0, but without forms or tables. The DTD for such a device would look like this:

<!ENTITY % XHTML1.ns "http://www.w3.org/DTDs/XHTML1_simple.dtd" >

<!ENTITY % XHTML1-form.module "IGNORE" >
<!ENTITY % XHTML1-table.module "IGNORE" >

<!ENTITY % XHTML1-strict
     PUBLIC "-//W3C//XHTML 1.0 Strict//EN"
            "http://www.w3.org/DTDs/XHTML/XHTML1-s.dtd" >
%XHTML1-strict;

Note that this does not actually modify the content model for the Strict XHTML 1.0 DTD. However, since XML ignores elements in content models that are not defined, the form and table elements are dropped from the model automatically.

5.5. Using the new DTD

Once a new DTD has been developed, it can be used in any document. Using the DTD is as simple as just referencing it in the DOCTYPE declaration of a document:

<!DOCTYPE html PUBLIC "-//MyOrg//DTD My XHTML Extensions//EN"
          "http://www.myorg.com/DTDs/myorg.dtd">
<html xmlns="http://www.myorg.com/DTDs/myorg.dtd">
<head>
<title>MyOrg Document</title>
</head>
<body>
<p>This is an example document using the new elements:
<myelement>A test element <myotherelement /> </myelement>
</p>
</body>
</html>

A. References

This appendix is normative.

A.1. Normative References

[XHTML1]: Extensible HTML (XHTML) 1.0: W3C Working Draft, Steven Pemberton, et. al., 4 March 1999.
See: http://www.w3.org/TR/WD-html-in-xml
[XML]: Extensible Markup Language (XML) 1.0: W3C Recommendation, Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, 10 February 1998.
See: http://www.w3.org/TR/REC-xml
[SGML]: Information Processing -- Text and Office Systems -- Standard Generalized Markup Language (SGML), ISO 8879:1986.
Please consult http://www.iso.ch/cate/d16387.html for information about the standard, or http://www.oasis-open.org/cover/general.html#overview about SGML.
[MATHML]: Mathematical Markup Language (MathML) 1.01 Specification: W3C Recommendation, Patrick Ion, Robert Miner, et al, 7 July 1999.
See: http://www.w3.org/TR/REC-MathML

A.2. Informative References

[CATALOG]: Entity Management: OASIS Technical Resolution 9401:1997 (Amendment 2 to TR 9401) Paul Grosso, Chair, Entity Management Subcommittee, SGML Open, 10 September 1997.
See: http://www.oasis-open.org/html/a401.htm
[DEVDTD]: Developing SGML DTDs: From Text to Model to Markup, Eve Maler and Jeanne El Andaloussi.
Prentice Hall PTR, 1996, ISBN 0-13-309881-8.
[DOCBOOK]: DocBook DTD, Eve Maler and Terry Allen.
Originally created under the auspices of the Davenport Group, DocBook is now maintained by OASIS. The Customizer's Guide for the DocBook DTD V2.4.1 is available from this site.
See: http://www.oasis-open.org/docbook/index.html
[DUBLIN]: The Dublin Core: A Simple Content Description Model for Electronic Resources, The Dublin Core Metadata Initiative.
See: http://purl.oclc.org/dc/
[HTML32]: HTML 3.2 Reference Specification: W3C Recommendation, Dave Raggett, 14 January 1997.
See: http://www.w3.org/TR/REC-html32
[ISO-HTML]: ISO/IEC 15445:1998 HyperText Markup Language (HTML), David M. Abrahamson and Roger Price.
See: http://dmsl.cs.uml.edu/15445/FinalCD.html
[RDF]: Resource Description Framework (RDF): Model and Syntax Specification, Ora Lassila and Ralph R. Swick, 19 August 1998.
See: http://www.w3.org/TR/PR-rdf-syntax
[SMIL]: Synchronized Multimedia Integration Language (SMIL) 1.0 Specification, Philipp Hoschka, 15 June 1998.
See: http://www.w3.org/TR/REC-smil
[STRUCTXML]: Structuring XML Documents, David Megginson. Part of the Charles Goldfarb Series on Information Management.
Prentice Hall PTR, 1998, ISBN 0-13-642299-3.
[SGML-XML]: Comparison of SGML and XML: W3C Note, James Clark, 15 December 1997.
See: http://www.w3.org/TR/NOTE-sgml-xml-971215
[XLINK]: XML Linking Language (XLink): W3C Working Draft, Eve Maler and Steve DeRose, 3 March 1998.
A new XLink requirements document is expected soon, followed by a working draft update.
See: http://www.w3.org/TR/WD-xlink
[TEI]: The Text Encoding Initiative (TEI)
See: http://www.uic.edu/orgs/tei/
[URI]: Uniform Resource Identifiers (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter, August 1998.
See: http://www.ietf.org/rfc/rfc2396.txt. This RFC updates RFC >1738 [URL] and [RFC1808].
[URL]: IETF RFC 1738, Uniform Resource Locators (URL), T. Berners-Lee, L. Masinter, M. McCahill.
See: http://www.ietf.org/rfc/rfc1738.txt
[RFC-1808]: Relative Uniform Resource Locators, R. Fielding.
See: http://www.ietf.org/rfc/rfc1808.txt
[CC/PP]: "Composite Capability/Preference Profiles (CC/PP): A user side framework for content negotiation", F. Reynolds, J. Hjelm, S. Dawkins, S. Singhal, 30 November 1998.
This document describes a method for using the Resource Description Format (RDF) to create a general, yet extensible framework for describing user preferences and device capabilities. Servers can exploit this to customize the service or content provided.
Available at: http://www.w3.org/TR/NOTE-CCPP
[CSS2]: "Cascading Style Sheets, level 2 (CSS2) Specification", B. Bos, H. W. Lie, C. Lilley, I. Jacobs, 12 May 1998.
Available at: http://www.w3.org/TR/REC-CSS2
[DOM]: "Document Object Model (DOM) Level 1 Specification", Lauren Wood et al., 1 October 1998.
Available at: http://www.w3.org/TR/REC-DOM-Level-1
[ERRATA]: "HTML 4.0 Specification Errata".
This document lists the errata for the HTML 4.0 specification.
Available at: http://www.w3.org/MarkUp/html40-updates/REC-html40-19980424-errata.html
[HTML]: "HTML 4.0 Specification", D. Raggett, A. Le Hors, I. Jacobs, 18 December 1997, revised 24 April 1998.
Available at: http://www.w3.org/TR/REC-html40
[POSIX.1]: "ISO/IEC 9945-1:1990 Information Technology - Portable Operating System Interface (POSIX) - Part 1: System Application Program Interface (API) [C Language]", Institute of Electrical and Electronics Engineers, Inc, 1990.
[RFC2119]: "RFC2119: Key words for use in RFCs to Indicate Requirement Levels", S. Bradner, March 1997.
Available at: http://www.ietf.org/rfc/rfc2119.txt
[RFC2376]: "RFC2376: XML Media Types", E. Whitehead, M. Murata, July 1998.
Available at: http://www.ietf.org/rfc/rfc2376.txt
[RFC2396]: "RFC2396: Uniform Resource Identifiers (URI): Generic Syntax", T. Berners-Lee, L. Masinter, August 1998.
This document updates RFC1738 and RFC1808.
Available at: http://www.ietf.org/rfc/rfc2396.txt
[TIDY]: "HTML Tidy" is a tool for detecting and correcting a wide range of markup errors prevalent in HTML. It can also be used as a tool for converting existing HTML content to be well formed XML. Tidy is being made available on the same terms as other W3C sample code, i.e. free for any purpose, and entirely at your own risk.
It is available from: http://www.w3.org/Status.html#TIDY
[XMLNAMES]: "Namespaces in XML", T. Bray, D. Hollander, A. Layman, 14 January 1999.
XML namespaces provide a simple method for qualifying names used in XML documents by associating them with namespaces identified by URI.
Available at: http://www.w3.org/TR/REC-xml-names
[XMLSTYLE]: "Associating stylesheets with XML documents Version 1.0", J. Clark, 14 January 1999.
This document describes a means for a stylesheet to be associated with an XML document by including one or more processing instructions with a target of xml-stylesheet in the document's prolog.
Available at: http://www.w3.org/TR/PR-xml-stylesheet
[FRMWRK]: "Protocol-independent content negotiation framework", Klyne G., 16 February 1999.
See: http://www.ietf.org/internet-drafts/draft-ietf-conneg-requirements-02.txt
[SYNTAX]: "A syntax for describing media feature sets", Klyne G., 14 December 1998.
See http://www.ietf.org/internet-drafts/draft-ietf-conneg-feature-syntax-04.txt
[XMLMOD]: "XML Modularization of HTML 4.0", M. Altheim, Sun Microsystems, 2 February 1999
See http://www.altheim.com/specs/xhtml/NOTE-xhtml-modular.html
[REC FRAG]: XML Fragment Interchange - Working Draft Paul Grosso, et. al., 3 March, 1999
See: http://www.w3.org/TR/WD-xml-fragment
[TC9601]: SGML standard Technical Corrigendum 9601 Paul Grosso??
See:
[MSIE5]: Microsoft Internet Explorer Version 5.0
See: http://www.microsoft.com/windows/ie/ie5/default.asp
[XSCHEMA]: XML Schema Requirements - W3C Note Ashok Malhotra, et. al., 15 February 1999
See: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

Building XHTML™ Modules

W3C Working Draft 10 September 1999

Abstract

Status of this document

Quick Table of Contents

Full Table of Contents

1. Introduction

1.1. Why Build XHTML Modules?

1.2. Abstract Modules

1.3. XML DTD Modules

2. Terms and Definitions

3. Conformance Definition

3.1. Module Conformance

3.2. Naming Rules

3.2.1. Rationale for Naming Rules

4. Defining Abstract Modules

4.1. Syntactic Conventions

4.2. Content Types

4.3. Attribute Types

4.4. An Example Abstract Module Definition

4.4.1. XHTML Skiing Module

5. XML DTD Modules

5.1. Implementing Document Model Modules in the DTD

5.1.1. Parameterization

5.1.2. Modularization

5. Developing DTDs with defined and extended modules

5.1. Defining additional attributes

5.2. Defining additional elements

5.3. Defining the content model for a collection of modules

5.3.1. Integrating a stand-alone module into XHTML

5.3.2. Mixing a new module throughout the modules in XHTML

5.4. Creating a new DTD

5.4.1. Creating a simple DTD

5.4.2. Creating a DTD by extending XHTML

5.4.3. Creating a DTD by removing and replacing XHTML modules

5.5. Using the new DTD

A. References

A.1. Normative References

A.2. Informative References

Building XHTML^™ Modules