W3C

XML Schema Composition Task Force Report: Partial draft

Taskforce Working Draft $Id: composition-tf.html,v 1.2 1999/09/24 19:55:48 connolly Exp $ 21 September 1999

Editors:
Henry S. Thompson (University of Edinburgh)
Noah Mendelsohn (Lotus)

Copyright  ©  1999 W3C (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.


Abstract

This partial draft discusses the architecture of the relation between instances and schemas, and outlines mechanisms for instance-to-schema association.

Status of this document

This is a partial draft of the expected Task Force report.

Table of contents

1 Associating schemas with instances
    1.1 Preferred Means for Locating XML Schema Documents
        1.1.1 Namespace URIs Point to XML Schemas
        1.1.2 Approximating the XML SYSTEM identifier
    1.2 Conclusions

Appendices

A Tabulation of changes

1 Associating schemas with instances

This note outlines key points of agreement regarding the use of XML schemas to validate namespace-qualified elements, including document elements; the corresponding rules for validation; and additional instance document constructs which can be used to approximate the capabilities of system and public identifiers when referring to XML schemas.

Key features of the proposed design are:

  1. At a minimum, a conforming XML schema-aware processor must be able to take an XML schema (or schemas), acquired by any means, and attempt to schema-validate a document (strictly speaking, an XML 1.0 element information item) against it.
  2. In general, the means by which an XML schema-aware processor locates a file or files containing XML schemas with which to validate elements is processor dependent.
  3. To facilitate standardization and interoperation, the XML Schema recommendation will describe one or more mechanisms for locating XML schemas on the Web. A number of candidates are proposed in a separate section below.
  4. Each XML schema must explicitly indicate the URI of the namespace(s) for which it is supplying declarations and definitions. This eliminates any requirement for the XML schema processor to be aware of the filename or URI for the XML schema being processed; the namespaces being declared can thus be identified by inspection of the XML schema document. Specifically, we propose:
    <schema xmlns='http://www.w3.org/1999/09/23-xmlschema/' 
            targetNS='myNamespaceURI'>
     ...
    </schema>
    
    Note that this XML schema itself may be stored in a document of arbitrary name and accessed via an arbitrary URI on the Web. During validation of an instance element qualified with myNamespaceURI, the processor locates (using processor-dependent means) the XML schema document illustrated above. By inspection of the targetNS attribute, the processor determines the namespace for elements and attributes being declared by the XML schema.

    Also note that this allows for an XML schema to be defined post-hoc for XML documents with no namespace declaration at all: by convention these could be schema-validated by an XML schema with targetNS=''.
  5. It will also be possible to indicate that schema processing is mandatory. In that case, if whatever mechanisms are used do not yield a schema, or if the schema they yield does not identify itself as supplying declarations and definitions for the namespace(s) of the element information item(s) in question (see (4)), this is a non-recoverable error for schema-aware processors: no further processing of any kind should take place. Such an indication is the XML schema equivalent of standalone='no' in the XML 1.0 declaration.

    This does not, of course, preclude handling such instances without any schema processing: for example, editors remain free to edit them without reference to schemas.
  6. Whether XML schemas should remain co-extensive with namespaces, in the sense that each XML schema document provides declarations for exactly one namespace, remains a point of controversy within our group. However, we note that requiring each XML schema to state its target namespace URI explicitly (the targetNS attribute above) greatly facilitates decoupling the two. In other words, it would be straightforward to consider an extension to the XML schema language along the following lines:
       <schemadoc xmlns='http://www.w3.org/1999/09/23-xmlschema/'>
          <schema targetNS='myNamespaceURI1'>
                  ...
          </schema>
          <schema targetNS='myNamespaceURI2'>
                  ...
          </schema>
       </schemadoc>
    
    The sample above illustrates a single XML schema document which contributes definitions to two XML namespaces. Nothing in the rest of this design requires such an extension to the XML schema language, but it is a fundamental philosophical change, and a point which must be resolved before we can fully settle the means by which instances refer to XML schema documents (see below).

    Note that this would settle the long-standing dc:creator problem, i.e. how an XML schema can allow elements from a namespace whose namespace URI does not point to an XML schema. We could simply write:
       <schemadoc xmlns='http://www.w3.org/1999/09/23-xmlschema/'>
          <schema id='dc' targetNS='http://purl.org/metadata/dublin_core'>
            <element name='creator'>...</element>
          </schema>
          <schema targetNS='myNamespaceURI2'>
            <element name='mybook'>
             <archetype order='all'>
              <element ref='creator' schemaName='#dc'/>
              ...
             </archetype>
            </element>
          </schema>
       </schemadoc>
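Points 1, 4, 5 and 6 together suggest a simple shape for the core of a processor. The Python sketch below is illustrative only, not a specified algorithm: it assumes the schema documents have already been acquired (by whatever means) as strings, indexes every <schema> element by its targetNS, whether that element is a document root or a child of the proposed <schemadoc> wrapper, and then selects a schema by the namespace of an instance element. The names index_schemas, schema_for and SchemaNotFound, the mandatory flag, and the treatment of a missing targetNS as '' are all assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# XML Schema namespace URI used throughout this draft.
XSD_NS = "http://www.w3.org/1999/09/23-xmlschema/"
SCHEMA = f"{{{XSD_NS}}}schema"
SCHEMADOC = f"{{{XSD_NS}}}schemadoc"


class SchemaNotFound(Exception):
    """Hypothetical fatal error: schema processing is mandatory but no schema matched."""


def index_schemas(schema_texts):
    """Map each targetNS to its <schema> element.

    The documents may have been acquired by any means (point 1); only their
    content matters. Both a bare <schema> root and the proposed <schemadoc>
    wrapper (point 6) are accepted.
    """
    by_ns = {}
    for text in schema_texts:
        root = ET.fromstring(text)
        if root.tag == SCHEMADOC:
            schemas = root.findall(SCHEMA)
        elif root.tag == SCHEMA:
            schemas = [root]
        else:
            continue  # not an XML schema document; skip it
        for schema in schemas:
            # Assumption: a missing targetNS means "no namespace", like targetNS=''.
            by_ns[schema.get("targetNS", "")] = schema
    return by_ns


def namespace_of(element):
    """Namespace URI of an ElementTree element, or '' if it is unqualified."""
    if element.tag.startswith("{"):
        return element.tag[1:].split("}", 1)[0]
    return ""


def schema_for(element, by_ns, mandatory=False):
    """Pick the schema that declares the element's namespace (point 4)."""
    ns = namespace_of(element)
    schema = by_ns.get(ns)
    if schema is None and mandatory:
        # Point 5: non-recoverable; the caller must stop all further processing.
        raise SchemaNotFound(f"no schema declares namespace {ns!r}")
    return schema
```

Because the namespace is read from inside each schema, nothing here depends on the file name or URL the schema came from, which is the property point 4 is after.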
    

1.1 Preferred Means for Locating XML Schema Documents

As noted above, the core XML schema language provides no single means for locating an XML schema document during processing. As described in point (3), however, we wish to propose one or two standard mechanisms to promote interoperability in the common cases. Here, we consider two such mechanisms.

1.1.1 Namespace URIs Point to XML Schemas

The first mechanism we discussed in our call is the seemingly obvious one, in which the namespace URI is dereferenced in an attempt to discover either an XML schema document itself, or some sort of external package or directory which might be used to find an XML schema. The design above is consistent with such a convention, and we are considering recommending its use. The detailed means by which the retrieved resource would be inspected, its type determined (by MIME type?), etc., have not yet been resolved.

NOTE: We have discussed a problem with this approach: requiring the dereferencing of arbitrary namespace URIs may be beyond the purview of the XML Schema WG. Why? The current Namespaces recommendation places no burdens at all on the inventor of a namespace beyond URI syntactic correctness. In particular, there is NO requirement that a resource associated with a namespace URI exist, or, if one does, that retrieving it produce anything meaningful or up to date. The proposed design presumes that anything retrieved which appears to be a valid XML schema or package is indeed valid and trustworthy. We should consider whether coordination with other XML Working Groups is required to make this the case, and whether in fact we can do this retroactively, given that the Namespaces recommendation has already been issued.
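One way a processor might limit the damage from the problem just described is to accept a retrieved resource only if it parses as XML, is a <schema> element, and claims the very namespace URI that was dereferenced. The sketch below assumes a caller-supplied fetch function (hypothetical; it stands in for an HTTP client and returns None when nothing is found) and does not handle the unresolved package or directory case. It is a sanity check, not a solution to the trust problem: a matching targetNS says nothing about who published the schema or whether it is current.

```python
import xml.etree.ElementTree as ET

# XML Schema namespace URI used throughout this draft.
XSD_NS = "http://www.w3.org/1999/09/23-xmlschema/"


def schema_from_namespace_uri(ns_uri, fetch):
    """Try to treat a namespace URI as the location of an XML schema.

    `fetch` is a hypothetical caller-supplied function returning the text of
    the resource at a URI, or None. The Namespaces recommendation does not
    promise that any resource exists there, so every step may fail.
    """
    text = fetch(ns_uri)
    if text is None:
        return None  # nothing there: fall back to other, processor-dependent means
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None  # something was retrieved, but it is not XML
    if root.tag != f"{{{XSD_NS}}}schema":
        return None  # XML, but not a schema (package formats are unresolved)
    if root.get("targetNS", "") != ns_uri:
        return None  # the schema does not claim this namespace; reject it
    return root
```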

A variation on this theme is one that Microsoft is currently using: Microsoft's processor dereferences only those namespace URIs which begin with a reserved URI scheme prefix, x-schema:. Consider an instance document containing an element E whose prefix a is bound to such a URI:

<a:E/>

The x-schema: URI scheme grants the processor permission to dereference the URL, and asserts that the processor will find an XML schema at the other end.
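The Microsoft variation reduces to a prefix test on the namespace URI before any network access is attempted. The following Python sketch assumes that everything after x-schema: is the schema's location; x_schema_location is an illustrative name, not part of any specification or Microsoft API, and the example URL in any usage is hypothetical.

```python
import xml.etree.ElementTree as ET

# Reserved scheme prefix that licenses dereferencing (Microsoft's convention).
X_SCHEMA = "x-schema:"


def namespace_of(element):
    """Namespace URI of an ElementTree element, or '' if it is unqualified."""
    if element.tag.startswith("{"):
        return element.tag[1:].split("}", 1)[0]
    return ""


def x_schema_location(element):
    """Return the schema URL for an element bound to an x-schema: namespace.

    Assumption: the text after the 'x-schema:' prefix is the schema's URL.
    Any other namespace URI is never dereferenced, so None is returned.
    """
    ns = namespace_of(element)
    if ns.startswith(X_SCHEMA):
        return ns[len(X_SCHEMA):]
    return None
```

The design choice is that ordinary namespace URIs are never fetched, which sidesteps the concern raised in the previous NOTE at the cost of a special scheme.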

NOTE: We don't know whether Microsoft has registered this URI scheme, or whether we would have to register one to make it part of the XML Schema REC. An introduction to URI schemes and their registration is available from the W3C, including pointers to the official registration authorities.

1.1.2 Approximating the XML SYSTEM identifier

To handle the common case in which the author of an instance document knows the exact absolute or relative URL of a DTD, XML provides for system identifiers in the instance document. In the case of documents using multiple namespaces, to be validated against multiple possible XML schema documents, something more elaborate and robust is required.

Two related proposals have been offered to provide similar flexibility for locating an XML schema.

Consider first the simple case where either no namespace declaration is involved at all, or a single namespace declaration appears on the top element which is qualified with that namespace. In these cases an attribute from the XML Schema namespace can be used to provide a URL of an XML schema, e.g. as follows

   <myDoc xmlns='myNamespaceURI'
       xmlns:xsd='http://www.w3.org/1999/09/23-xmlschema/'
       xsd:schemaLoc='myschema.xsd'>
    ...
   </myDoc>

This approach depends on the proposal above that an XML schema contains the URI of the namespace for which it provides definitions.

Extending this approach to cases where more than one namespace is declared and used on a single element requires allowing xsd:schemaLoc to contain a space-separated list of URLs.
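Reading the attribute, in both its single-URL and list forms, is straightforward. In the sketch below, schema_locations is a hypothetical helper, and resolving relative URLs against the instance's base URL with urljoin is an assumption: the draft does not say how relative URLs in xsd:schemaLoc are resolved.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# XML Schema namespace URI used throughout this draft.
XSD_NS = "http://www.w3.org/1999/09/23-xmlschema/"
SCHEMA_LOC = f"{{{XSD_NS}}}schemaLoc"


def schema_locations(root, base_url):
    """Return absolute schema URLs named by xsd:schemaLoc on the root element.

    The attribute may hold one URL or a space-separated list of URLs;
    relative URLs are resolved against base_url (an assumption).
    """
    value = root.get(SCHEMA_LOC)
    if value is None:
        return []
    return [urljoin(base_url, ref) for ref in value.split()]
```

Because each retrieved schema names its own target namespace, the list need not be ordered to match the namespaces in the document; a processor can sort the results out by targetNS after retrieval.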

Alternatively, to give greater flexibility than a purely attribute-based approach allows, we could define an element, again in the XML Schema namespace, whose content would specify either one or more XML schema URLs or, if we decide to pull back from the proposal above that an XML schema identifies its target namespace, one or more mappings from namespaces to XML schema URLs. This element would in turn be pointed to by an attribute from the XML Schema namespace:

 <a:rootel b:someattr="1"
       xmlns:a='URIForA'
       xmlns:b='URIForB'
       xmlns:xsd='http://www.w3.org/1999/09/23-xmlschema/'
       xsd:schemaLoc='#sds'>
  <xsd:schemaBindings id='sds'>
    <xsd:namespaceBinding ns="URIForA"
                          systemFile="http://myorg.com/somefileA.xsd"/>
    <xsd:namespaceBinding ns="URIForB"
                          systemFile="http://myorg.com/somefileB.xsd"/>
  </xsd:schemaBindings>
  ...real content...
 </a:rootel>

Each namespaceBinding provides a URL for an XML schema, in the spirit of the XML system identifier, for one of the namespaces used in the document. Note that these associations must also apply to the root element that contains them.
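A processor following the schemaBindings variant would resolve the '#sds' fragment to the element carrying that id and read one binding per namespace. The sketch below assumes the element names schemaBindings and namespaceBinding used in the prose, matches the id attribute by plain string comparison rather than by a declared ID type, and ignores xsd:schemaLoc values that are not same-document fragment references; namespace_bindings is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# XML Schema namespace URI used throughout this draft.
XSD_NS = "http://www.w3.org/1999/09/23-xmlschema/"


def q(local):
    """ElementTree-style qualified name in the XML Schema namespace."""
    return f"{{{XSD_NS}}}{local}"


def namespace_bindings(root):
    """Map namespace URIs to schema URLs via xsd:schemaLoc='#id'.

    Follows a same-document fragment reference to the xsd:schemaBindings
    element with that id and reads each xsd:namespaceBinding child.
    """
    ref = root.get(q("schemaLoc"), "")
    if not ref.startswith("#"):
        return {}  # not a fragment reference; some other mechanism applies
    target_id = ref[1:]
    for bindings in root.iter(q("schemaBindings")):
        if bindings.get("id") == target_id:
            return {
                b.get("ns"): b.get("systemFile")
                for b in bindings.findall(q("namespaceBinding"))
            }
    return {}
```

Because the result is keyed by namespace, the root element's own namespace (URIForA in the example) is covered like any other, which is what the requirement that the associations also apply to the containing root element calls for.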

1.2 Conclusions

We have a fundamental choice to make regarding how firm a stand we take on the connection between elements/documents and XML schemas. We've opted for a layered story above, in which what you do with an XML schema once you've got one is fixed, but how you get one is left flexible.

If we agree with that story, there are further questions to be resolved. Most importantly, we must consider the various options as to whether an individual XML schema remains co-extensive with a namespace. For example, we should consider the <schemadoc> packaging of multiple XML schema declarations into a single document. We must also decide how many of the 4 options canvassed above (any namespace URI may point to an XML schema, directly or indirectly; special URI scheme prefix; xsd:schemaLoc attribute; xsd:schemaBindings element) we explicitly describe, and whether we indicate a prioritisation of the list.


A Tabulation of changes


$Log: composition-tf.html,v $
Revision 1.2  1999/09/24 19:55:48  connolly
fixing HTML validation and links

Revision 1.1  1999/09/24 18:18:56  hugo
Initial version

Revision 1.1.2.3  1999/09/23 13:08:02  ht
typo in $id :-(

Revision 1.1.2.2  1999/09/23 12:56:14  ht
fixed URI for schema per dan connolly, added ID and LOG