Copyright © 1999 W3C (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This working draft defines the mechanism for defining markup language modules that are compatible with the modularization framework used by XHTML. This includes a definition of the way in which an abstract module is specified, the way in which this abstraction is mapped into an XML DTD, and the way in which the resulting DTD module can be combined with other XHTML DTD modules to create new markup languages. In the future, it is expected that instructions will also be provided for mapping the abstract specifications into an XML Schema. Note that the materials in this document were formerly part of the Modularization of XHTML document, but have been separated out for editorial purposes.
This document is a working draft of the W3C's HTML Working Group. It is being released for public review, discussion, and comment. This working draft may be updated, replaced or rendered obsolete by other W3C documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". This document is work in progress and does not imply endorsement by the W3C membership.
This document has been produced as part of the W3C HTML Activity. The goals of the HTML Working Group (members only) are discussed in the HTML Working Group charter (members only).
Please send detailed comments on this document to [email protected]. We cannot guarantee a personal response, but we will try when it is appropriate. Public discussion on HTML features takes place on the mailing list [email protected].
Note that, as this is a work in progress, text is frequently added to the document that has not yet been reviewed by the working group. Such text is marked using the class "new", with an appearance like this. Text delimited with class "new" is submitted for consideration by the working group, and necessarily has a lesser status than the text in the remainder of the document. [This technique was agreed at the 5 May 1999 teleconference of the working group.]
This section is normative.
XHTML is more than just a recasting of HTML into XML. It is also an extensible architecture that permits the ready definition of new document types. The W3C envisions that client manufacturers, document authors, and content providers may all use this architecture to define document types that are specific to their needs. The XHTML Modularization specification defines a collection of modules and a framework that make the definition of these new document types relatively easy.
That architecture by itself may not be sufficient for the needs of all document type creators. In particular, people who are defining new functionality or combining new functionality with existing elements need a way to define that functionality. The XHTML method for doing this is through the definition of an XHTML module.
XHTML modules define elements and their attributes, add attributes to elements defined in other modules, add values to the set of values available to an attribute defined in other modules, define content models, or some combination of these things. The expression of a module is done through the creation of a prose functional description of the module, an abstract definition of the module's contents, and then one or more implementations of the module. The remainder of this document defines the way in which these steps should be conducted.
An XHTML document type is defined as a set of modules. Each XHTML module has an abstract definition that generally indicates the facilities made available through the module and the way those facilities are minimally integrated with each other and with an (eventual) document type.
An XML DTD module consists of a set of element types, a set of attribute list declarations, and a set of content model declarations, where any of these three sets may be empty. An attribute list declaration in an XML DTD module may modify an element type outside the element types in the module, and a content model declaration may modify an element type outside the element type set.
This section is informative.
While some terms are defined in place, the following definitions are used throughout this document. Familiarity with the W3C XML 1.0 Recommendation [XML] is highly recommended.
This section is normative.
In order to ensure that XHTML modules are maximally portable, this specification rigidly defines conformance requirements. While the conformance definitions can be found in this section, they necessarily reference normative text within this document, within the base XHTML specification [XHTML1], and within other related specifications. It is only possible to fully comprehend the conformance requirements of XHTML through a complete reading of all normative references.
This specification defines a method for defining XHTML-conforming modules. A module conforms to this specification when it meets all of the following criteria:
Note: There really should be a defined restriction on the reuse of elements that are specified in other W3C-defined modules. However, this restriction should be moderated such that, for example, the li element can be reused under a new list-style element (e.g. mylist).
Names for XHTML-conforming document types must adhere to strict naming conventions so that it is possible for software and users to readily determine the relationship of document types to XHTML. The names for modules are defined through XML Formal Public Identifiers (FPIs). Within FPIs, fields are separated by double slash character sequences (//). The various fields MUST be composed as follows:
"-". For formal standards, this field MUST be the formal reference to the standard (e.g. ISO/IEC 15445:1999).
W3C.
ELEMENTS
XHTML- followed by an organization-defined unique identifier (e.g. MyML 1.0). This identifier SHOULD be composed of a unique name and a version identifier that can be updated as the document type evolves.
EN).
Using these rules, the name for an XHTML conforming module might be -//MyCompany//ELEMENTS XHTML-MyModule 1.0//EN.
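As an illustration of how such a name might be used, the following sketch declares a module under an FPI that follows the rules above and then includes it in a DTD. The entity name MyModule.mod and the system identifier URL are hypothetical; neither is defined by this specification.

```dtd
<!-- Hypothetical module declaration: the public identifier follows the
     naming rules; the entity name and the URL are placeholders. -->
<!ENTITY % MyModule.mod
    PUBLIC "-//MyCompany//ELEMENTS XHTML-MyModule 1.0//EN"
           "http://www.example.com/DTDs/mymodule.mod" >
<!-- Referencing the parameter entity pulls the module into the DTD. -->
%MyModule.mod;
```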
Naming rules are critical for portability of user agents and XHTML-conforming tools. These rules need to be simple enough that they can be readily adhered to, and need to confer upon document type and module designers the power to readily associate their creations with XHTML (for marketing purposes, if nothing else). The above rules address these concerns. Some other naming conventions were considered but not used, for the following reasons:
In the case of new modules, there is no need to associate the module with a specific version of XHTML - the name does not need to identify version dependencies.
This section is normative.
An Abstract Module is a definition of an XHTML module using prose text and some informal markup conventions. While such a definition is not generally useful in the machine processing of document types, it is critical in helping people understand what is contained in a module. This section defines the way in which XHTML abstract modules are defined. An XHTML conforming module is not required to provide an abstract module. However, anyone developing an XHTML module is encouraged to provide an abstraction to ease the use of that module.
The abstract modules are not defined in a formal grammar. However, the definitions do adhere to the following syntactic conventions. These conventions are similar to those of XML DTDs, and should be familiar to XML DTD authors. Each discrete syntactic element can be combined with others to make more complex expressions that conform to the algebra defined here.
expr ?
expr +
expr *
a , b
a is required, followed by expression b.
a | b
a - b
&
).
|), inside of parentheses following the attribute name.
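Since these conventions are described as similar to those of XML DTDs, the sketch below shows how several of the operators could look when written as DTD declarations. It assumes the operators carry their usual XML DTD meanings. All element and attribute names are hypothetical and serve only as illustration. The subtraction form (a - b) has no direct DTD operator and is therefore not shown.

```dtd
<!-- Hypothetical element names, used only to illustrate the operators. -->

<!-- expr ? : title may appear zero or one times -->
<!ELEMENT note   ( title? ) >
<!-- expr + : one or more item elements are required -->
<!ELEMENT list   ( item+ ) >
<!-- expr * : zero or more item elements are permitted -->
<!ELEMENT group  ( item* ) >
<!-- a , b : term is required, followed by def -->
<!ELEMENT entry  ( term, def ) >
<!-- a | b : either term or def -->
<!ELEMENT choice ( term | def ) >

<!ELEMENT title  ( #PCDATA ) >
<!ELEMENT item   ( #PCDATA ) >
<!ELEMENT term   ( #PCDATA ) >
<!ELEMENT def    ( #PCDATA ) >

<!-- Explicit attribute values: names separated by vertical bars,
     inside parentheses following the attribute name -->
<!ATTLIST list
    kind ( ordered | unordered ) #IMPLIED >
```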
Abstract module definitions define minimal, atomic content models for each module. These minimal content models reference the elements in the module itself. They may also reference elements in other modules upon which the abstract module depends. Finally, the content model in many cases requires that text be permitted as content to one or more elements. In these cases, the symbol used for text is PCDATA. This is a term, defined in the XML 1.0 Recommendation, that refers to processed character data. A content type can also be defined as EMPTY, meaning the element has no content in its minimal content model.
In some instances, it is necessary to define the types of attribute values or the explicit set of permitted values for attributes. The following attribute types (defined in the XML 1.0 Recommendation) are used in the definitions of the Abstract Modules:
Attribute Type | Definition |
---|---|
CDATA | Character data |
ID | A document-unique identifier |
IDREF | A reference to a document-unique identifier |
NAME | A name with the same character constraints as ID above |
NMTOKEN | A name composed of CDATA characters but no whitespace |
NMTOKENS | Multiple names composed of CDATA characters separated by whitespace |
PCDATA | Processed character data |
This section defines a sample abstract module as an example of how to take advantage of the syntax rules defined above. Since this example is trying to use all of the various syntactic elements defined, it is pretty complicated. Typical module definitions would be much simpler than this. Finally, note that this module references the attribute collection Common. This is a collection defined in the XHTML Modularization specification that includes all of the basic attributes that most elements need.
The XHTML Skiing Module defines markup used when describing aspects of a ski lodge. The elements and attributes defined in this module are:
Elements | Attributes | Minimal Content Model |
---|---|---|
resort | Common, href (CDATA) | description , Aspen+ |
lodge | Common | description, (Aspen - lift)+ |
lift | Common, href | description? |
chalet | Common, href | description? |
room | Common, href | description? |
lobby | Common, href | description? |
fireplace | Common, href | description? |
description | Common | PCDATA* |
This module also defines the content set Aspen with the minimal content model lodge | lift | chalet | room | lobby.
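To make the abstract definition more concrete, here is one way the Skiing Module might be written as a DTD fragment. This is a sketch under stated assumptions, not a normative mapping. The Common.attrib entity is a small stand-in for the real Common collection. The entity name Aspen.class, the choice of #IMPLIED for every attribute, and the CDATA type for href on elements where the table gives no type are all assumptions.

```dtd
<!-- Stand-in for the Common attribute collection (assumption: the real
     collection is defined by the XHTML Modularization specification). -->
<!ENTITY % Common.attrib
    "id     ID     #IMPLIED
     class  CDATA  #IMPLIED
     title  CDATA  #IMPLIED" >

<!-- The content set Aspen (hypothetical entity name). -->
<!ENTITY % Aspen.class "lodge | lift | chalet | room | lobby" >

<!-- resort: description , Aspen+ -->
<!ELEMENT resort ( description, ( %Aspen.class; )+ ) >
<!ATTLIST resort
    %Common.attrib;
    href CDATA #IMPLIED >

<!-- lodge: description, (Aspen - lift)+
     DTDs have no subtraction operator, so the set is written out
     by hand with lift omitted. -->
<!ELEMENT lodge ( description, ( lodge | chalet | room | lobby )+ ) >
<!ATTLIST lodge
    %Common.attrib; >

<!-- lift, chalet, room, lobby, fireplace: description? -->
<!ELEMENT lift ( description? ) >
<!ATTLIST lift
    %Common.attrib;
    href CDATA #IMPLIED >
<!ELEMENT chalet ( description? ) >
<!ATTLIST chalet
    %Common.attrib;
    href CDATA #IMPLIED >
<!ELEMENT room ( description? ) >
<!ATTLIST room
    %Common.attrib;
    href CDATA #IMPLIED >
<!ELEMENT lobby ( description? ) >
<!ATTLIST lobby
    %Common.attrib;
    href CDATA #IMPLIED >
<!ELEMENT fireplace ( description? ) >
<!ATTLIST fireplace
    %Common.attrib;
    href CDATA #IMPLIED >

<!-- description: PCDATA* -->
<!ELEMENT description ( #PCDATA ) >
<!ATTLIST description
    %Common.attrib; >
```

Because the declarations use parameter entity references inside markup declarations, the fragment is meant to live in an external DTD file rather than in a document's internal subset.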
This section is normative.
Partitioning of the document model occurs at the abstract module level. This partitioning is implemented in the markup model by two primary methods: parameterization, the use of parameter entities as reusable strings, and modularization, the creation of DTD fragments called modules.
This specification classifies parameter entities into six categories and names them consistently using the following suffixes:
.mod when they are used to represent a DTD module (a collection of element classes). In this specification, each module is an atomic unit and may be represented as a separate file entity.
.module when they are used to control the inclusion of a DTD module by containing either of the conditional section keywords INCLUDE or IGNORE.
.content when they are used to represent the content model of an element type.
.class when they are used to represent elements of the same class.
.mix when they are used to represent a collection of element types from different classes.
.attrib when they are used to represent a group of tokens representing one or more complete attribute specifications within an ATTLIST declaration.
For example, in HTML 4.0, the %block; parameter entity is defined to represent the heterogeneous collection of element types that are block-level elements. In this specification, the corresponding parameter entity is %Block.mix;.
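The .module and .mod suffixes typically work as a pair. The following sketch shows one way a driver might use them, with a .module entity guarding a conditional section that declares and references the corresponding .mod entity. The names Skiing.module and Skiing.mod, the public identifier, and the file name skiing.mod are hypothetical.

```dtd
<!-- Switch controlling inclusion. Because the first declaration of an
     entity is the binding one, a DTD that declares this entity as
     "IGNORE" before this point drops the module. -->
<!ENTITY % Skiing.module "INCLUDE" >
<![%Skiing.module;[
<!-- The module itself, held in a separate file entity. -->
<!ENTITY % Skiing.mod
    PUBLIC "-//MyCompany//ELEMENTS XHTML-Skiing 1.0//EN"
           "skiing.mod" >
%Skiing.mod;
]]>
```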
DTD modules are often used to encompass the markup declarations of a specific semantic component or "feature", from higher-level document features like tables and forms, to lower-level components such as specific elements or element groups. Modules can even contain modules, creating a hierarchical structure mirroring the document model. Note that modules are not always implemented as separate file entities, and modular DTDs can be easily normalized into single file versions for more efficient distribution over the Web.
The relationship between document model components and how they are implemented in markup as modules, entities and files (i.e., the granularity of the parameterization or modularization, how the markup model is structured and stored as separate entities, etc.) is not necessarily direct, as design style and implementation issues properly play a part. Higher-level modules are sometimes delivered as individual file entities to facilitate portability and reusability. To promote interoperability, the XHTML DTD design considers each module as atomic, with the notion that implementations should support the semantics of an entire module without further subdivision.
While the notion of "plug and play" with DTD modules is very attractive, in practice this is not quite so simple. Complex document models often resort to extensive parameterization of abstract modules to facilitate understanding, markup reuse, extensibility, and maintenance. The resultant modules may have many interdependencies, and may require a fair amount of "rewiring" when adding or removing a DTD module. In light of this, a compromise must be made between markup flexibility, complexity of the DTD, and ease of maintainability.
The XHTML DTD attempts to ameliorate this by localizing many of the more "global" parameter entities to several modules that are declared early in the DTD. These are labelled common modules, and include declarations for common names, attributes, parameter and character entities.
XHTML elements are classified into the following categories:
This section is informative.
The primary purpose of defining XHTML modules and a general modularization methodology is to ease the development of document types that are based upon XHTML. These document types may extend XHTML by integrating additional capabilities (e.g. [SMIL] or [MathML]), or they may define a subset of XHTML for use in a specialized device. Regardless of the application, XHTML modules are up to the task. This section describes the techniques that document type designers must use in order to take advantage of this modularization architecture. It does this by applying the techniques defined in the previous sections in progressively more complex ways, culminating in the creation of a complete document type from disparate modules.
Note that in no case do these examples require the modification of the XHTML-provided module files themselves. The XHTML module files are completely parameterized, so that it is possible through separate module definitions and driver files to customize the definition and the content model of each element and each element's hierarchy.
Finally, remember that most users of XHTML are not expected to be DTD authors. DTD authors are generally people who are defining specialized markup that will improve the readability, simplify the rendering of a document, or ease machine-processing of documents, or they are client designers that need to define the specialized DTD for their specific client. Consider these cases:
In some cases, an extension to XHTML can be as simple as additional attributes. Attributes can be added to an element just by specifying an additional ATTLIST for the element, for example:
<!ATTLIST a myattr CDATA #IMPLIED >
would add the "myattr" attribute, with a value type of CDATA, to the "a" element. This works because XML permits the extension of the attribute list for an element at any point in a DTD.
Naturally, adding an attribute to a DTD does not mean that any new behavior is defined for arbitrary clients. However, a content developer could use an extra attribute to store information that is accessed by associated scripts via the Document Object Model (for example).
Defining additional elements is only slightly more complicated than defining additional attributes. Basically, DTD authors should write the element declaration for each element:
<!ELEMENT myelement ( #PCDATA | myotherelement )* >
<!ATTLIST myelement
    myattribute CDATA #IMPLIED >
<!ELEMENT myotherelement EMPTY >
After the elements are defined, they need to be integrated into the content model. Strategies for integrating new elements or sets of elements into the content model are addressed in Defining the content model for a collection of modules below.
Since the content model of XHTML modules is fully parameterized, DTD authors may modify the content model for every element in every module. The details of the DTD module interface are defined in XML DTD Modules. However, basically there are two ways to approach this modification:
The strategy taken will depend upon the nature of the modules being combined and the nature of the elements being integrated. The remainder of this section describes techniques for integrating two different classes of modules.
When a module (and remember, a module can be a collection of other modules) contains elements that only reference each other in their content model, it is said to be "internally complete". As such, the module can be used on its own (for example, you could define a DTD that was just that module, and use one of its elements as the root element). Integrating such a module into XHTML is a three step process:
Consider attaching the elements defined above. In that example, the element myelement is the root. To attach this element under the object element, and only the object element, of XHTML, the following would work:
<!ENTITY % Object.content "( %Flow.mix; | param | myelement )*">
A DTD defined with this content model would allow a document like the following fragment:
<object data="...">
  <p>The object didn't load!</p>
  <myelement>This is content of a locally defined element</myelement>
</object>
Extending the example above, to attach this module everywhere that the %Flow.mix content model group is permitted, would require something like the following:
<!ENTITY % Misc.class "ins | del | script | noscript | myelement" >
Since the %Misc.class content model class is used throughout the XHTML Modules, the new module would become available throughout an extended XHTML document type.
So far the examples in this section have described the methods of extending XHTML and XHTML's content model. Once this is done, the next step is to collect the modules that comprise the DTD into a single DTD driver, incorporating the new definitions so that they override and augment the basic XHTML definitions as appropriate.
When defining a new DTD, it is essential that each DTD have a unique identifier to use in the xmlns attribute of the root element (usually the html element). This identifier is often a URI, but in any event is something that can be used by user agents to differentiate the DTD from others. This identifier is defined using the XHTML1.ns parameter entity when creating a DTD that uses the XHTML1 structure module.
Using the trivial example above, it is possible to define a new DTD that extends the XHTML Transitional DTD pretty easily. The following is a complete, working extended DTD:
<!ENTITY % XHTML1.ns "http://my.company.com/DTDs/example.dtd" >
<!ELEMENT myelement ( #PCDATA | myotherelement )* >
<!ATTLIST myelement
    myattribute CDATA #IMPLIED >
<!ELEMENT myotherelement EMPTY >
<!ENTITY % Misc.class "ins | del | script | noscript | myelement" >
<!ENTITY % XHTML1-t.dtd PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/DTDs/XHTML1/XHTML1-t.dtd">
%XHTML1-t.dtd;
Next, there is the situation where a complete, additional, and complex module is added to XHTML (or to a subset of XHTML). In essence, this is the same as in the trivial example above, the only difference being that the module being added is incorporated in the DTD by reference rather than explicitly including the new definitions in the DTD.
One such complex module is the DTD for [MathML]. In order to combine MathML and XHTML into a single DTD, an author would just decide where MathML content should be legal in the document, and add the MathML root element to the content model at that point:
<!ENTITY % XHTML1.ns "http://www.w3.org/DTDs/XHTML1_plus_MathML.dtd" >
<!ENTITY % XHTML1-math PUBLIC "-//W3C//MathML 1.0//EN"
    "http://www.w3.org/DTDs/MathML/MathML1.dtd" >
%XHTML1-math;
<!ENTITY % Inlspecial.class "a | img | object | map | math" >
<!ENTITY % XHTML1-strict PUBLIC "-//W3C//XHTML 1.0 Strict//EN"
    "http://www.w3.org/DTDs/XHTML/XHTML1-s.dtd" >
%XHTML1-strict;
Note that, while this is a valid example, it does not create a working DTD at this time. The reason for this is that the MathML DTD defines two elements (var and select) that conflict directly with XHTML. This conflict needs to be resolved in order for the new DTD to work correctly.
Finally, another way in which DTD authors may use XHTML modules is to define a DTD that is a subset of XHTML (because, for example, they are building devices or software that only supports a subset of XHTML). Doing this is only slightly more complex than the previous example. The basic steps to follow are:
For example, consider a device that supports the Strict XHTML 1.0, but without forms or tables. The DTD for such a device would look like this:
<!ENTITY % XHTML1.ns "http://www.w3.org/DTDs/XHTML1_simple.dtd" >
<!ENTITY % XHTML1-form.module "IGNORE" >
<!ENTITY % XHTML1-table.module "IGNORE" >
<!ENTITY % XHTML1-strict PUBLIC "-//W3C//XHTML 1.0 Strict//EN"
    "http://www.w3.org/DTDs/XHTML/XHTML1-s.dtd" >
%XHTML1-strict;
Note that this does not actually modify the content model for the Strict XHTML 1.0 DTD. However, since XML ignores elements in content models that are not defined, the form and table elements are dropped from the model automatically.
Once a new DTD has been developed, it can be used in any document. Using the DTD is as simple as just referencing it in the DOCTYPE declaration of a document:
<!DOCTYPE html PUBLIC "-//MyOrg//DTD My XHTML Extensions//EN"
    "http://www.myorg.com/DTDs/myorg.dtd">
<html xmlns="http://www.myorg.com/DTDs/myorg.dtd">
<head>
<title>MyOrg Document</title>
</head>
<body>
<p>This is an example document using the new elements:
<myelement>A test element <myotherelement /> </myelement>
</p>
</body>
</html>
This appendix is normative.