W3C

XHTML Document Profile Requirements

Document profiles - a basis for interoperability guarantees

W3C Working Draft 6th September 1999

This version:
http://www.w3.org/TR/1999/WD-xhtml-prof-req-19990906
Latest version:
http://www.w3.org/TR/xhtml-prof-req
Previous versions:
http://www.w3.org/MarkUp/Group/1999/xhtml-prof-reqs-19990730/ (W3C Members only)
Editors:
Dave Raggett <[email protected]>
Peter Stark <[email protected]>
Ted Wugofski <[email protected]>

Abstract

The increasing disparities between the capabilities of different kinds of Web user agents present challenges to Web content developers wishing to reach a wide audience. A promising approach is to formally describe profiles for documents intended for broad groups of user agents, for instance, separate document profiles for user agents running on desktops, television, handhelds, cellphones and voice user agents. Document profiles provide a basis for interoperability guarantees. If an author develops content for a given profile and a user agent supports the profile then the author may be confident that the document will be rendered as expected. The requirements for document profiles are analyzed.

Status of this document

This is a public W3C Working Draft for review by W3C members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current public W3C working drafts can be found at http://www.w3.org/TR.

This document has been produced as part of the W3C HTML Activity, but should not be taken as evidence of consensus in the HTML Working Group. The goals of the HTML Working Group (members only) are discussed in the HTML Working Group charter (members only).

Please send detailed comments on this document to [email protected]. We cannot guarantee a personal response, but we will try when it is appropriate. Public discussion on HTML features takes place on the mailing list [email protected].

1. Introduction

As vendors introduce new versions of user agents, content developers have had to get to grips with a variety of differences between user agents from the same vendor as well as the larger differences between vendors. The World Wide Web Consortium plays a stabilizing role through the development of standards, for instance, HTML 3.2, HTML 4.0 and more recently XHTML 1.0. Standards act as beacons for vendors, lighting the path towards greater interoperability. Investment in existing code together with the desire to innovate and differentiate act as pressures to veer away from the light. The end result is that content developers will be forced to continue to cope with variations.

The range of user agent platforms is rapidly expanding to include television sets, handheld organizers, cell phones, in-car systems and regular phones. Each of these platforms presents different capabilities. Viewing distances for television sets are much greater than for desktop or notebook computers, reducing the legibility of text. In addition saturated colors tend to bleed and need to be avoided. Handheld devices have reduced resolution and limited color capabilities. Cell phones are even more limited with display resolutions of as little as 4 lines of 12 characters. Voice user agents substitute speech recognition for keyboards, using synthetic or pre-recorded speech for output.

Users are likely to expect to be able to access Web services from wherever they are and at any time - from home, on the move or in the office. This will encourage content developers to reach out to as wide an audience as possible, on as many kinds of user agents as practical. One approach is to develop content separately for each of the dominant platforms. Another is to develop content in a way that lends itself to automatic transformation to specific platforms. This approach emphasizes the separation of content, structure and style. The transformation can be done at authoring-time, by the originating server, proxy server or in the user agent itself as appropriate.

Document profiles offer a means to characterize the features appropriate to given categories of user agents. For instance, one profile might include support for style sheets, vector graphics and scripting, while another might be restricted to the tags in HTML 3.2. Document profiles can be used by servers to select between document variants developed for different user agent categories. They can be used to determine what transformations to apply when such variants are not available. Content developers can use document profiles to ensure that their web sites will be rendered as intended.

2. Framework for Content Negotiation

Web content developers can manage differences in user agent capabilities by developing several variants of their content, where each variant is tuned to the capabilities of a particular class of user agents. If the server is able to obtain information about the capabilities of a given user agent, then the server can select the variant of the Web content most appropriate to that user agent. If a suitable variant isn't available it may be practical to apply a transformation to generate one.

[Figure: how client capabilities are used by the server to select or transform content based upon document profiles]

In the figure above, the client is either a user agent or a proxy server. The client's capabilities describe hardware and software constraints, such as display size, multimedia support, support for scripting and style sheets etc. The client's preferences reflect user settings. The client transmits these to the server. The W3C Note on CC/PP describes ways in which this can be realized. Additional information is available from the IETF content negotiation working group [CONNEG].

The server has access to the content. The request from the client includes a name that the server can use to identify one or more documents. Each document is associated with a document profile. The server compares the client capabilities to the document profiles to find a good match or to identify which document to use as a basis for transformation to meet the client's request.
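The selection step described above can be sketched as follows. This is a minimal illustration only: it assumes that both client capabilities and document profiles can be reduced to flat sets of feature names, and the Variant class, the select_variant function, the URLs and the feature names are all hypothetical rather than part of any proposed solution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One document variant and the features its profile requires (hypothetical)."""
    url: str
    required: frozenset


def select_variant(variants, client_features):
    """Return the richest variant whose required features the client supports.

    Returns None when nothing matches; a server might then transform
    one of the available variants instead.
    """
    usable = [v for v in variants if v.required <= client_features]
    if not usable:
        return None
    # Prefer the variant that exploits the most client features.
    return max(usable, key=lambda v: len(v.required))


variants = [
    Variant("/news.desktop.html", frozenset({"css", "script", "image/png"})),
    Variant("/news.basic.html", frozenset({"html32"})),
]
handheld = frozenset({"html32", "image/gif"})
choice = select_variant(variants, handheld)
print(choice.url if choice else "no match: transform a variant")
```

Preferring the variant with the most required features is only one possible tie-breaking rule; a real server would probably also weigh the client's preferences.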

[Figure: diagram showing transformations]

This framework is consistent with a broadcast scenario in which there is a uni-directional link between the transmitter and receiver.

[Figure: diagram of flow in one-way devices]

In a one-way scenario, a proxy server at the transmitter may model the capabilities of the receivers and negotiate on the receiver's behalf. In this scenario, the transmitter uses the appropriate descriptions of the receiver capabilities in its requests to the Web sites. As in the previous case, the transmitter may transform the content to meet the capabilities of the receiver or this may occur at the content server.

This capability is particularly useful in television and mobile environments in which a return channel from the client is either not available or not desirable.

Further work is needed in four areas:

Note: Preferences for modules, attribute values etc. need to be treated as part of the formalization of client capabilities and personal preferences, rather than of the document profiles, which provide a declarative description of a group of documents.

3. Requirements for Document Profiles

This section identifies a preliminary list of requirements for further work on document profiles as distinct from client capabilities and personal preferences. Please note that this is work in progress and should not be taken as evidence of consensus in the HTML working group. No significance should be attached to the order in which the requirements are listed.

The requirements are partitioned into two categories: (1) functional requirements, and (2) design requirements. Functional requirements represent the needs of the content authoring community that will use the profiling solution. These needs are fundamentally independent of how that solution is implemented. Design requirements represent the needs of the tools community that will implement the profiling solution (or parts thereof). These needs may drive requirements that go beyond the solution at hand, including requirements related to how the profiling solution integrates with other problems in this problem space.

Note: This document follows the decision by the HTML working group to deliberately omit consideration of how these requirements can be met. Proposed solutions will be discussed in a separate draft.

3.1. Functional Requirements

Functional requirements correspond to the capabilities of the document profile system, and not to how the solution is realized. These requirements typically relate to what features can be found in the document profiles.

3.1.1. Content developers shall have a simple means of associating a document profile with a document

There shall be a means for content developers to associate a document with a document profile without having to specify the features of that document profile.

A possible solution to this requirement is to allow content developers to associate a name or location of the document profile.

3.1.2. Content developers shall have a simple means of describing a document profile in terms of features found in that profile

Content developers may wish to provide more detail than a simple document profile name. This is important if the name of their document profile is not readily recognized (if it is new, for example) or if their document profile is a variant of a recognized document profile.

Document profile names must be unique.

Issue (what's a feature): We should provide a definition of a feature

The most obvious idea is to describe profiles as a list of feature names, for instance, the names of modules, image formats, scripting, and style sheet languages. Specific requirements for some of these are spelled out below.

The semantics of features may be defined by binding them to existing standards, or via additional information supplied as part of an extended profile definition, or accessible via registries.

In practice, in a flat name space, the number of names can get out of hand. As a result, it is desirable to introduce a hierarchical structure into feature names where each level in the hierarchy scopes the name space for the next level of detail. One possible approach for this is described in [RFC2506].
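As an illustration of how a hierarchy scopes the name space, the sketch below treats feature names as dot-separated paths, so that a declared name covers itself and everything beneath it. The separator, the covers function and the example names are assumptions made for this sketch; they are not the syntax defined in [RFC2506].

```python
def covers(declared: str, requested: str, sep: str = ".") -> bool:
    """True if `declared` names `requested` itself or one of its ancestors."""
    d = declared.split(sep)
    r = requested.split(sep)
    # Compare whole path segments so "xhtml.forms" does not cover "xhtml.formsets".
    return r[:len(d)] == d


print(covers("xhtml.forms", "xhtml.forms.select"))  # declared ancestor covers a descendant
print(covers("xhtml.forms", "xhtml.formsets"))      # different branch, not covered
```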

3.1.3. Content developers shall have a means of describing exceptions to the general case (non-critical)

If profile features are described in purely atomic terms it may be necessary to decompose the features into a lot of subsidiary features. The ability to specify exceptions makes it practical to use a higher level description together with the exceptions, leading to much shorter descriptions. These exceptions may be additive or subtractive.

The draft modularization proposed for XHTML includes over 20 modules. If you wanted to define a profile omitting just one of these modules, an additive description would involve some 20 modules, compared with 2 for the case where you can state exceptions.

Exceptions can take several forms, for instance, the ability to add a new element, or to add a new attribute to a given element, or a new attribute value to a given attribute. When exceptions act to override some property, that is to take away an element, attribute or value, things become more complex.

One possible approach is to provide an algebra for adding and subtracting modules as a basis for describing document profiles, where the document syntax is defined by reference to a DTD or XML schema specifying the combined effect of the modules. This approach delegates the effort of combining the DTDs or schemas for each module to the author of the profile.
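A minimal sketch of such an algebra, treating a profile as a set of module names, is given below. The apply_exceptions function and the module names are placeholders invented for this example; they are not the names used in the draft XHTML modularization, and combining the corresponding DTDs or schemas is not shown.

```python
def apply_exceptions(base, add=(), remove=()):
    """Derive a profile from a base module set plus additive and subtractive exceptions."""
    unknown = set(remove) - set(base)
    if unknown:
        raise ValueError(f"cannot remove modules not in the base: {sorted(unknown)}")
    return (frozenset(base) - set(remove)) | set(add)


# Twenty placeholder modules standing in for a full module set.
full = frozenset(f"module-{i:02d}" for i in range(1, 21))

# Described by exception: one base name plus one removal, two items in all.
derived = apply_exceptions(full, remove={"module-07"})
print(len(derived))  # the purely additive description would list this many modules
```

Rejecting the removal of a module that is not in the base is a design choice of this sketch; silently ignoring it would also be defensible.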

Note: The ability to deal with inheritance and set subtraction could provide the basis for formalizing the effects of module algebra on document syntax. Dave Raggett has studied ways to achieve this using assertion grammars.

3.1.4. Content developers shall have a means of describing alternative features in a document profile

A document profile might specify that image/gif or image/png support is required, without requiring the client to support both. Several features of HTML, SMIL, and CSS give content developers a means to specify alternative content: nested object elements, the switch element, and @media types. A document profile therefore needs to be able to indicate that it requires one feature or another, rather than both.
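One way to picture alternatives is as a list of groups, where the client must support at least one feature from every group. The sketch below assumes that representation; the satisfies function and the data layout are hypothetical.

```python
def satisfies(requirements, client_features):
    """True if the client supports at least one alternative in every group."""
    return all(
        any(feature in client_features for feature in group)
        for group in requirements
    )


# Either image format will do, but style sheet support is mandatory.
requirements = [{"image/gif", "image/png"}, {"text/css"}]

print(satisfies(requirements, {"image/png", "text/css"}))
print(satisfies(requirements, {"text/css"}))
```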

3.1.5. Content developers shall have a means of expressing constraints on linked data formats

Content developers need to be able to say which image formats authors can rely on, e.g. image/gif, image/jpeg, and image/png. In addition, the profile may need to express constraints on whether animated gifs are allowed, what optional features of image/png are supported, and the maximum file size permitted (e.g. < 20K).
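The sketch below shows how a tool might check a linked resource against such constraints. It assumes that the constraints reduce to a table of permitted media types with an optional size limit, and that "20K" means 20 * 1024 bytes; the allowed function and the table layout are hypothetical.

```python
from typing import Mapping, Optional

# Permitted media types mapped to a size limit in bytes (None means no limit).
# Reading "< 20K" as 20 * 1024 bytes is an assumption of this sketch.
LIMITS: Mapping[str, Optional[int]] = {
    "image/gif": 20 * 1024,
    "image/jpeg": 20 * 1024,
    "image/png": None,
}


def allowed(limits: Mapping[str, Optional[int]], media_type: str, size: int) -> bool:
    """True if the media type is permitted and the resource is under its size limit."""
    if media_type not in limits:
        return False
    max_size = limits[media_type]
    return max_size is None or size < max_size


print(allowed(LIMITS, "image/gif", 12_000))
print(allowed(LIMITS, "image/gif", 50_000))
print(allowed(LIMITS, "image/tiff", 1_000))
```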

3.1.6. Content developers shall have a means of expressing detailed interoperability constraints for scripts and style sheets

The variations in support for scripting and style sheets across user agents and platforms cause real problems for content developers. Document profiles need to be able to specify constraints on the scripting language and interfaces, and on which style properties and values are supported on what elements. The capability to express these constraints would make it easier to develop transformation tools. For instance, allowing content developers to create content using a clean set of style properties with automatic transformation into markup tuned to the vagaries of particular user agents.

Note: Work on the DOM and on modularizing CSS will go some of the way to alleviate the problem this requirement addresses, but the weak conformance requirements for CSS and scripting (as a whole) mean that this problem won't go away altogether.

3.1.7. Content developers shall have a means of expressing expectations of user agent capabilities

A document profile may make certain assumptions about user agent capabilities. By expressing these in the profile, servers can use information about user agent capabilities as a basis for selecting content with matching assumptions.

Examples include assumptions about display resolution, and multimedia support; support for Java and 3D graphics; support for particular character sets, e.g. for Kanji, and support for cookies. These requirements need to be expressed in the same vocabulary used to describe user agent capabilities, e.g. see W3C's work on Client Capabilities and Personal Preferences [CC/PP].

3.1.8. Content authors shall have a means of defining profiles for documents with controlled extensibility as will be permitted by the XML Schema language

Sometimes it is impractical to provide a closed definition for a group of documents. For instance, where the set of elements representing business procedures is evolving over time, it may be impractical to specify a frozen set of elements. The document schema needs to be able to indicate where the content model or attributes are open-ended and in what way. This requirement is needed to cater for situations where you have partial knowledge, but can also be exploited as a mechanism for dealing with forward compatibility.

Issue (Examples from Dave H.): Dave Hollander (co-chair of the XML Schema working group) has promised some real-world examples for this.

3.1.9. Content authors shall have a means of expressing distribution rights based on user agent support for features of the profile

A related point is the requirement to give authors control over how user agents treat unknown elements and attributes. For instance, should the user agent attempt to render the content of an unknown element or not. This situation arises when a server gets a request for a document from a user agent for which the server can't provide a version of the document with a matching profile.

The server can choose either to fail the request or to deliver a document which the user agent may not be able to fully render. The content author wishes to have some assurance in such cases over how the user agent treats elements it doesn't understand. Consistent treatment of this is essential to controlled evolution of the Web. Without such a mechanism, content developers may feel unable to deploy new features.

Note: The HTML working group spent considerable time on this issue at the Amsterdam meeting in May. W3C's SMIL specification includes support for this feature.

3.1.10. Content developers shall have a means of expressing how long the document profile may be cached

If software agents have to download profiles each time they get a request for a document, the response time will suffer. As a result, agents will want to cache profiles. The requirement is for an ability to specify an expiry date/time after which the cached copy must be refreshed. In HTML documents, authors can specify this via the meta element. If profiles are specified in other than HTML, then another mechanism is needed to meet this requirement.
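The caching behaviour this requirement implies can be sketched as follows. The ProfileCache class and the fetch callable (which is assumed to return a profile together with its expiry time) are hypothetical; how the expiry time is actually conveyed is left open, as in the text above.

```python
from datetime import datetime, timezone


class ProfileCache:
    """Cache profiles until the expiry time supplied alongside each one."""

    def __init__(self, fetch):
        # fetch: callable taking a URL and returning (profile, expires),
        # where expires is a timezone-aware datetime. Assumed interface.
        self._fetch = fetch
        self._entries = {}

    def get(self, url, now=None):
        now = now or datetime.now(timezone.utc)
        entry = self._entries.get(url)
        if entry is None or now >= entry[1]:
            # Missing or expired: refresh the cached copy.
            entry = self._fetch(url)
            self._entries[url] = entry
        return entry[0]


calls = []


def fake_fetch(url):
    """Stand-in for a network fetch; records each call."""
    calls.append(url)
    return {"name": url}, datetime(2000, 1, 1, tzinfo=timezone.utc)


cache = ProfileCache(fake_fetch)
t0 = datetime(1999, 9, 6, tzinfo=timezone.utc)
cache.get("http://example.org/p", now=t0)  # first request: fetched
cache.get("http://example.org/p", now=t0)  # still fresh: served from the cache
cache.get("http://example.org/p", now=datetime(2000, 1, 2, tzinfo=timezone.utc))  # expired: fetched again
print(len(calls))
```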

3.1.11. Content developers shall have a means of expressing attribution and copyright information

This covers additional information about who has defined the profile and when.

3.1.12. Content developers shall have a means of expressing required protocols

Content developers need to be able to specify if using a document (or executing a script or resource associated with the document) requires the use of a particular protocol (such as an SSL connection).

3.2. Design Requirements

Design requirements correspond to how the solution is implemented and how it will be used, independent of the features found within the document profile solution.

3.2.1. The design shall support lightweight testing of two profiles for equality

This is needed to support efficient run-time selection of documents belonging to a given profile.
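One possible way to make equality testing lightweight is to reduce each profile to a fingerprint computed once from a canonical form, so that run-time comparison is a single string comparison. The sketch below assumes that a profile is a collection of feature names containing no newline characters; the fingerprint function is hypothetical.

```python
import hashlib


def fingerprint(features):
    """Digest of a canonical (sorted, de-duplicated) listing of feature names."""
    canonical = "\n".join(sorted(set(features)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = fingerprint(["css", "script", "image/png"])
b = fingerprint(["image/png", "css", "script", "css"])  # same features, different order
c = fingerprint(["css", "script"])

print(a == b)
print(a == c)
```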

3.2.2. The design shall support lightweight testing of a client's capabilities and preferences against a document's profile.

This is needed to support matching a client's capabilities and preferences with the profiles of variant documents.

3.2.3. The design shall support machine readable profiles

This is needed so that servers can autonomously perform content selection in response to a request. In addition, this is needed to support transformation agents.

3.2.4. The design shall specify document syntax by reference to external definitions

This requirement is needed to decouple work on profiles from that on document syntax. This will allow W3C to develop a specification for XHTML document profiles independently of work on XML Schemas. It will enable profiles to use XML 1.0 Document Type Definitions or XML Schemas. The expectation is that over time XML Schemas will supplant DTDs.

3.2.5. The design shall support formal verification that a given document conforms to a profile

This is needed when content developers are unsure of themselves or their tools. It necessitates a formal definition of the profile that can be used to automate the testing of whether or not the document conforms to the profile. This can be reduced to verifying that the document conforms to syntax and semantic constraints defined by the profile.

3.2.6. The design shall support multiple XML name spaces

Document profiles need to be able to describe documents including elements from more than one name space, and to provide the means to verify that such documents conform to the profile. It is strongly desirable that any such solution does not prescribe the prefixes used for each name space.
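The sketch below illustrates a check that depends on name space URIs rather than on prefixes, using Python's standard xml.etree.ElementTree module. It only tests which name spaces the elements belong to, not whether the document is valid; the function names and the two sample documents are invented for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def namespaces_used(xml_text):
    """Return the set of name space URIs of all elements ("" for none)."""
    used = set()
    for el in ET.fromstring(xml_text).iter():
        tag = el.tag
        # ElementTree spells qualified names as "{namespace-uri}local".
        used.add(tag[1:].split("}", 1)[0] if tag.startswith("{") else "")
    return used


def conforms(xml_text, allowed):
    """True if every element belongs to one of the allowed name spaces."""
    return namespaces_used(xml_text) <= set(allowed)


# The same structure written with two different prefix choices.
doc_a = ('<html xmlns="http://www.w3.org/1999/xhtml" '
         'xmlns:m="http://www.w3.org/1998/Math/MathML"><body><m:math/></body></html>')
doc_b = ('<h:html xmlns:h="http://www.w3.org/1999/xhtml" '
         'xmlns:mm="http://www.w3.org/1998/Math/MathML"><h:body><mm:math/></h:body></h:html>')

print(conforms(doc_a, {XHTML_NS, MATHML_NS}))
print(conforms(doc_b, {XHTML_NS, MATHML_NS}))
print(conforms(doc_a, {XHTML_NS}))
```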

3.2.7. The design shall support a human readable description of the profile

This description can be shown to content developers to aid their understanding of the purpose of the profile and appropriate ways to create content for it.

3.2.8. The design shall support reference to specifications and documentation defining a document type for the profiled documents

A common means to express the name and/or location of available specifications or documentation about the document type being profiled. This is similar to 'Human readable description of the profile' but is called out as a separate requirement to ensure that links to specs and documentation are treated in a very consistent and predictable manner.

3.2.9. The design shall use XML or RDF

The idea here is to avoid creating radically new formats by constraining profiles to be expressed in XML or RDF.

The work on client capabilities and personal preferences is represented in RDF, making it attractive to consider using RDF for document profiles [CC/PP].

The way profile information is structured in the profile document should minimize the effort needed to map between a profile document and a content negotiation session using the same data. This applies to both the features in the profile and the algebra for combining and negotiating them [CONNEG].

3.2.10. The design shall support a uniform way in which to extend profiles

It should be easy to extend profiles with new kinds of constraints as the need for these emerges.

3.2.11. The design shall support a means of specifying document profile information inside the document.

Content developers should have a simple means of specifying a document profile and keeping that profile information tightly coupled to a specific document.

3.2.12. The design shall support a means of specifying document profiles external to the document

Content developers should be able to specify a document profile external to a document so that it may be readily reused by other documents. This is needed to simplify the management and maintenance of profiles shared by a large number of documents. Furthermore, the size of profiles may make it impractical to incorporate them directly in documents.

3.2.13. The design shall support including document profile information in a request to a server

To enable servers to identify content appropriate to each client, the request must be able to include information that can be used to find matching document profiles. This requirement acts as a constraint on the representations of client capabilities/preferences and document profiles.

The Web is dependent on the TCP/IP network protocols. When opening an HTTP connection (using TCP/IP) the size of the initial request has a considerable effect on the number of round trip times needed for the response to be received by the requestor. For optimum performance the request should fit into a single packet. This places a premium on reducing the size of the profile description sent as part of an HTTP request.
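To give a feel for the size constraint, the sketch below compares a request that carries a long inline feature list with one that carries only a reference to a profile. Everything specific here is an assumption made for illustration: the 1460 byte budget (a common TCP payload size on Ethernet), the X-Profile header name, and the feature names are not defined by this document or by HTTP.

```python
ASSUMED_BUDGET = 1460  # bytes; a common single-packet payload on Ethernet (assumption)


def build_request(path, host, profile_value):
    """Build a bare HTTP/1.1 GET request carrying a hypothetical X-Profile header."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"X-Profile: {profile_value}",  # hypothetical header name
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fits_single_packet(request, budget=ASSUMED_BUDGET):
    """True if the request is no larger than the assumed packet budget."""
    return len(request) <= budget


inline = ",".join(f"feature-{i:03d}" for i in range(200))  # long inline description
by_reference = "http://example.org/profiles/handheld"      # short profile reference

print(fits_single_packet(build_request("/", "example.org", inline)))
print(fits_single_packet(build_request("/", "example.org", by_reference)))
```

A real request carries other headers as well, so the room left for profile information would be smaller than this sketch suggests.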

Note: This is an important consideration when defining client-server protocols. If you want to know more, Jim Gettys can explain this in great detail!

3.2.14. The design shall support in-place or linked assertions

In much the same way as a file browser allows folders to be collapsed or expanded, the document profile should allow the profile author to explicitly include a particular assertion, or to include a link to it. In this way an otherwise identical profile could be expressed explicitly, or as a list of titled links, or some mix of both (the choice would depend upon environmental or application considerations).
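The equivalence between the explicit and linked forms can be sketched as an expansion step that replaces each link with the assertions it points to. The item layout (dictionaries with an "assert" or "href" key), the expand function and the resolve callable are all hypothetical; a dictionary lookup stands in for fetching a linked profile.

```python
def expand(profile, resolve, _seen=frozenset()):
    """Flatten a profile by replacing each link with the assertions it refers to."""
    assertions = []
    for item in profile:
        if "href" in item:
            href = item["href"]
            if href in _seen:
                raise ValueError(f"circular profile link: {href}")
            assertions.extend(expand(resolve(href), resolve, _seen | {href}))
        else:
            assertions.append(item["assert"])
    return assertions


# A dictionary stands in for retrieving linked profile fragments.
library = {
    "http://example.org/frag/images": [{"assert": "image/gif"}, {"assert": "image/png"}],
}

explicit = [{"assert": "text/css"}, {"assert": "image/gif"}, {"assert": "image/png"}]
linked = [{"assert": "text/css"}, {"href": "http://example.org/frag/images", "title": "Images"}]

print(expand(explicit, library.__getitem__) == expand(linked, library.__getitem__))
```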

3.2.15. Profiles that are embedded in the document shall be accessible through the Document Object Model

If the document profile information is within the document, it should be accessible through DOM interfaces.

3.2.16. The design shall support referencing resources indirectly

The ability to express the name and/or location of a resource resolution authority, such as a catalog file or name to resource resolution server.

4. Acknowledgements

The editors wish to thank Murray Altheim and Håkon Wium Lie for their feedback on earlier versions.

5. References

[CC/PP]
"Composite Capability/Preference Profiles (CC/PP): A user side framework for content negotiation", F. Reynolds, J. Hjelm, S. Dawkins, S. Singhal, 30 November 1998.

This document describes a method for using the Resource Description Framework (RDF) to create a general, yet extensible framework for describing user preferences and device capabilities. Servers can exploit this to customize the service or content provided.
Available at: http://www.w3.org/TR/NOTE-CCPP
[CONNEG]
The IETF Content Negotiation (conneg) Working Group, which has defined a number of RFCs relevant to document profiles.
[RFC2396]
"RFC2396: Uniform Resource Identifiers (URI): Generic Syntax", T. Berners-Lee, R. Fielding, L. Masinter, August 1998.
This document updates RFC1738 and RFC1808.
Available at: http://www.ietf.org/rfc/rfc2396.txt
[RFC2506]
"RFC2506: Media Feature Tag Registration Procedure", K. Holtman, A. Mutz, March 1999. This is in the IETF category of "Best Current Practice".
Available at: http://www.ietf.org/rfc/rfc2506.txt
[XML]
"Extensible Markup Language (XML) 1.0 Specification", T. Bray, J. Paoli, C. M. Sperberg-McQueen, 10 February 1998.
Available at: http://www.w3.org/TR/REC-xml
[XMLNAMES]
"Namespaces in XML", T. Bray, D. Hollander, A. Layman, 14 January 1999.
XML namespaces provide a simple method for qualifying names used in XML documents by associating them with namespaces identified by URI.
Available at: http://www.w3.org/TR/REC-xml-names