W3C

XML Schema Part 0: Primer

W3C Working Draft, 25 February 2000

This version:
http://www.w3.org/TR/2000/WD-xmlschema-0-20000225
Latest version:
http://www.w3.org/TR/xmlschema-0
Editor:
David C. Fallside (IBM)

Abstract

XML Schema Part 0: Primer is a non-normative document intended to provide an easily readable description of the XML Schema facilities and is oriented towards quickly understanding how to create schemas using the XML Schema language. XML Schema Part 1: Structures and XML Schema Part 2: Datatypes provide the complete normative description of the XML Schema definition language, and the primer describes the language features through numerous examples which are complemented by extensive references to the normative texts.

Status of this Document

The XML Schema Part 0: Primer is a part of the W3C XML Activity.

This is a public working draft of XML Schema 1.0 for review by the public and by members of the World Wide Web Consortium. The XML Schema Working Group has agreed to its publication. Note that some sections of this draft may not be up-to-date with the XML Schema language described in Parts 1 and 2 of the XML Schema specification. Known discrepancies are noted in the text.

The Working Group does not anticipate further substantial changes to the syntax described here, although this is still a working draft, and is subject to change based on experience and on comment by the public, and other W3C working groups.

A list of current W3C working drafts can be found at http://www.w3.org/TR/. They may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress".

Table of contents

1 Introduction

2 Basic Concepts: The Purchase Order
2.1 The Purchase Order Schema
2.2 Complex Type Definitions, Element & Attribute Declarations
2.3 Simple Types
2.4 Anonymous Type Definitions
2.5 Types of Element Content
2.6 Annotations, Versions & Comments
2.7 Groups
2.8 Attribute Groups
2.9 Null Values

3 Advanced Concepts I: The International Purchase Order
3.1 A Schema in Multiple Documents
3.2 Deriving Types by Extension
3.3 Using Derived Types in Instance Documents
3.4 Deriving Types by Restriction
3.5 Equivalence Classes
3.6 Abstract Elements
3.7 Preventing Type Derivations

4 Advanced Concepts II: The Quarterly Report
4.1 Specifying Uniqueness
4.2 Defining Keys and their References
4.3 XML Schema Constraints vs. XML 1.0 ID Attributes
4.4 Importing Types
4.5 Any Element, Any Attribute
4.6 schemaLocation
4.7 Conformance

Appendices

A. Acknowledgements
B. Simple Types & Their Facets
C. Regular Expressions
D. Index
E. Document History


1 Introduction

This document, XML Schema Part 0: Primer, provides an easily approachable description of the XML Schema definition language, and should be used alongside the formal descriptions of the language contained in Parts 1 and 2 of the XML Schema specification. The intended audience of this document includes application developers whose programs read and write schema documents, and schema authors who need to know about the features of the language, especially features that provide functionality above and beyond what is provided by DTDs. The text assumes that you have a basic understanding of XML 1.0 and XML-Namespaces. Each major section of the primer introduces new features of the language, and describes the features in the context of concrete examples.

Section 2 covers the basic mechanisms of XML Schema. It describes how to declare the elements and attributes that appear in XML documents, how simple and complex types differ, how to define complex types, how to use simple types for element and attribute values, how to annotate schemas, a simple mechanism for re-using element and attribute definitions, and null values.

Section 3 covers some of XML Schema's advanced features, and in particular, it describes mechanisms for deriving types from existing types, and for controlling these derivations. The section also describes mechanisms for merging together fragments of a schema from multiple sources, and for element substitution.

Section 4 covers more advanced features, including a powerful mechanism for specifying uniqueness among attributes and elements, a mechanism for using types across namespaces, a mechanism for extending types based on namespaces, and a description of how documents are checked for conformance.

In addition to the sections just described, the primer has a number of appendices that contain detailed reference information on simple types and an associated regular expression language.

The primer is a non-normative document, which means that it does not provide a definitive (from the W3C's point of view) specification of the XML Schema language. The examples and other explanatory material in this document are provided to help you understand XML Schema, but they may not always provide definitive answers. In such cases, you will need to refer to the XML Schema specification, and to help you do this, we provide many links pointing to the relevant parts of the specification.

Ed. Note: At this time, there are only links from section 2.3 and appendix B to XML Schema Part 2: Datatypes. Links to XML Schema Part 1: Structures and links from more sections to Part 2 will be provided in future working drafts.

2 Basic Concepts: The Purchase Order

The purpose of a schema is to define a class of XML documents, and so the term "instance document" is often used to describe an XML document that conforms to a particular schema. In fact, neither instances nor schemas need to exist as documents per se -- they may exist as streams of bytes sent between applications, as fields in a database record, or as collections of XML Infoset "Information Items" -- but to simplify the primer, we have chosen to always refer to instances and schemas as if they are files.

Let us start by considering an instance document in a file called po.xml. It describes a purchase order generated by an application for ordering and billing home products:

The Purchase Order, po.xml
<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
    <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
        <zip>90952</zip>
    </shipTo>
    <billTo country="US">
        <name>Robert Smith</name>
        <street>8 Oak Avenue</street>
        <city>Old Town</city>
        <state>PA</state>
        <zip>95819</zip>
    </billTo>
    <comment>Hurry, my lawn is going wild!</comment>
    <items>
        <item partNum="872-AA">
            <productName>Lawnmower</productName>
            <quantity>1</quantity>
            <price>148.95</price>
            <comment>Confirm this is electric</comment>
        </item>
        <item partNum="926-AA">
            <productName>Baby Monitor</productName>
            <quantity>1</quantity>
            <price>39.98</price>
            <shipDate>1999-05-21</shipDate>
        </item>
    </items>
</purchaseOrder>

The purchase order consists of a main element, purchaseOrder, and the subelements shipTo, billTo, and items. These subelements in turn contain other subelements, and so on, until a subelement such as price contains a number rather than any subelements. Elements that contain subelements or attributes are said to have complex types, whereas elements that contain numbers (and strings, and dates, etc) but do not contain any subelements are said to have simple types. Some elements have attributes; attributes always have simple types.

The complex types in the instance document and some of the simple types are defined in the schema for purchase orders. The other simple types are defined as part of XML Schema's repertoire of built-in simple types.

Before going on to examine the purchase order schema, we digress briefly to mention the association between the instance document and the purchase order schema. As you can see by inspecting the instance document, the purchase order schema is not mentioned. To keep this first section simple, we assume that any processor of the instance document can obtain the purchase order schema without requiring any information from the instance document. In later sections, we will examine mechanisms that provide explicit information about the schema.

2.1 The Purchase Order Schema

The purchase order schema is contained in the file po.xsd:

The Purchase Order Schema, po.xsd
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">

 <xsd:annotation>
  <xsd:documentation>
   Purchase order schema for Example.com.
   Copyright 2000 Example.com. All rights reserved.
  </xsd:documentation>
 </xsd:annotation>

 <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

 <xsd:element name="comment" type="xsd:string"/>

 <xsd:complexType name="PurchaseOrderType">
  <xsd:element name="shipTo" type="Address"/>
  <xsd:element name="billTo" type="Address"/>
  <xsd:element ref="comment" minOccurs="0"/>
  <xsd:element name="items"  type="Items"/>
  <xsd:attribute name="orderDate" type="xsd:date"/>
 </xsd:complexType>

 <xsd:complexType name="Address">
  <xsd:element name="name"   type="xsd:string"/>
  <xsd:element name="street" type="xsd:string"/>
  <xsd:element name="city"   type="xsd:string"/>
  <xsd:element name="state"  type="xsd:string"/>
  <xsd:element name="zip"    type="xsd:decimal"/>
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
 </xsd:complexType>

 <xsd:complexType name="Items">
  <xsd:element name="item" minOccurs="0" maxOccurs="*">
   <xsd:complexType>
    <xsd:element name="productName" type="xsd:string"/>
    <xsd:element name="quantity">
     <xsd:simpleType base="xsd:positive-integer">
      <xsd:maxExclusive value="100"/>
     </xsd:simpleType>
    </xsd:element>
    <xsd:element name="price"    type="xsd:decimal"/>
    <xsd:element ref="comment"   minOccurs="0"/>
    <xsd:element name="shipDate" type="xsd:date" minOccurs='0'/>
    <xsd:attribute name="partNum" type="Sku"/>
   </xsd:complexType>
  </xsd:element>
 </xsd:complexType>

 <xsd:simpleType name="Sku" base="xsd:string">
  <xsd:pattern value="\d{3}-[A-Z]{2}"/>
 </xsd:simpleType>

</xsd:schema>

The purchase order schema consists of a schema element and a variety of subelements, most notably element, complexType, and simpleType which determine the appearance of elements and their content in instance documents.

Each of the elements in the schema has a prefix xsd: which is associated with the XML Schema namespace through a declaration (xmlns:xsd="http://www.w3.org/1999/XMLSchema") that appears in the schema element. The same prefix, and hence the same association, also appears on the names of built-in simple types. The purpose of the association is to identify the elements and simple types as belonging to XML Schema rather than the schema author. For the sake of clarity in the text, we just mention the names of elements and simple types (e.g. simpleType), and omit the prefix.

2.2 Complex Type Definitions, Element & Attribute Declarations

In XML Schema, there is a basic difference between complex types, which allow elements in their content and may carry attributes, and simple types, which can neither have element content nor carry attributes. There is also a major distinction between definitions, which create new types (both simple and complex), and declarations, which enable the appearance in document instances of elements or attributes with specific names and types (both simple and complex). In this section, we focus on defining complex types and declaring the elements and attributes that appear within them.

New complex types are defined using the complexType element and typically contain a set of element and attribute declarations. These declarations are not themselves types, but rather an association between a name and constraints which govern the appearance of that name in documents governed by the associated schema. For example, Address is defined as a complex type, and within the definition of Address we see five element declarations and one attribute declaration:

Defining the Address Type
<xsd:complexType name="Address" >
    <xsd:element name="name"   type="xsd:string" />
    <xsd:element name="street" type="xsd:string" />
    <xsd:element name="city"   type="xsd:string" />
    <xsd:element name="state"  type="xsd:string" />
    <xsd:element name="zip"    type="xsd:decimal" />
    <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
</xsd:complexType>

The consequence of this definition is that any element appearing in an instance (e.g. po.xml) whose type is declared to be Address must consist of five elements and one attribute. These elements must be called name, street, city, state and zip. The first four of these elements will each contain a string, and the fifth will contain a decimal number. The element whose type is declared to be Address may appear with an attribute called country containing the string "US".

The Address definition contains only declarations involving simple types: string, decimal and NMTOKEN. In contrast, the PurchaseOrderType definition contains element declarations involving complex types, e.g. Address. Note, however, that both kinds of declaration use the same "type=" attribute to identify the type, regardless of whether the type is simple or complex.

Defining the PurchaseOrderType Type
<xsd:complexType name="PurchaseOrderType">
    <xsd:element   name="shipTo"    type="Address" />
    <xsd:element   name="billTo"    type="Address" />
    <xsd:element   ref="comment"    minOccurs="0" />
    <xsd:element   name="items"     type="Items" />
    <xsd:attribute name="orderDate" type="xsd:date" />
</xsd:complexType>

In defining PurchaseOrderType, two of the element declarations, shipTo and billTo, associate different element names with the same complex type, namely Address. The consequence of this definition is that any element appearing in an instance (e.g. po.xml) whose type is declared to be PurchaseOrderType must consist of elements called shipTo and billTo, each containing the five subelements (name, street, city, state and zip) that were declared as part of Address. The shipTo and billTo elements may also carry the country attribute that was declared as part of Address.

The PurchaseOrderType definition contains an orderDate attribute declaration which, like the country attribute declaration, involves a simple type (date). In fact, all attribute declarations must reference simple types because, unlike elements, attributes cannot contain other elements or attributes.

The element declarations we have described so far have each associated a name with an existing type definition. Sometimes it is preferable to use an existing element rather than to declare a new element, for example:

<xsd:element   ref="comment" minOccurs="0" />

This declaration references an existing element, comment, that was declared elsewhere in the purchase order schema. In general, the value of the ref attribute must reference a global element, i.e. one that exists at the top-level of the schema as an immediate subelement of the schema element. The consequence of this declaration is that an element called comment may appear in an instance document, and its content must be consistent with that element's type, in this case, string.

The comment element is optional within PurchaseOrderType because its declaration specifies minOccurs="0". An element may also be declared to appear exactly once, or one or more times, by setting its maxOccurs attribute to 1 or * respectively while leaving minOccurs at its default. The default value for minOccurs is 1 but there is no default value for maxOccurs, so an element whose declaration omits both minOccurs and maxOccurs must appear exactly once. minOccurs and maxOccurs may also be applied to attribute declarations, although their default values there are 0 and 1 respectively, which makes attributes declared without them optional by default. A particular attribute may appear only once in an element, and so the maximum value of maxOccurs for an attribute is 1.
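
As an illustrative sketch of these occurrence rules, the fragment below combines the attribute values just discussed in one complex type. The type Memo and all of its element and attribute names are hypothetical and are not part of the purchase order schema; the syntax simply follows the working-draft forms already used in po.xsd.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">

 <!-- Hypothetical type, for illustration only -->
 <xsd:complexType name="Memo">
  <!-- neither attribute given: must appear exactly once -->
  <xsd:element name="subject" type="xsd:string"/>
  <!-- minOccurs="0": optional, as with comment in PurchaseOrderType -->
  <xsd:element name="cc" type="xsd:string" minOccurs="0"/>
  <!-- maxOccurs="*" with the default minOccurs of 1: one or more -->
  <xsd:element name="paragraph" type="xsd:string" maxOccurs="*"/>
  <!-- minOccurs="0" and maxOccurs="*": any number, including none -->
  <xsd:element name="attachment" type="xsd:string" minOccurs="0" maxOccurs="*"/>
  <!-- attribute declared without minOccurs: optional by default -->
  <xsd:attribute name="sent" type="xsd:date"/>
 </xsd:complexType>

</xsd:schema>
```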

In this section we have described how to define new complex types (e.g. PurchaseOrderType), and declare elements (e.g. purchaseOrder) and attributes (e.g. orderDate). These activities generally involve naming, and the question naturally arises: What happens if two things have the same name? The answer depends upon the two things in question, although in general, the more similar the two things are, the more likely there is to be a conflict.

Here are some examples to illustrate when identical names cause problems. If the two things are both types, say you define a complex type called US-States and a simple type called US-States, there is a conflict. If the two things are a type and an element or attribute, say you define a complex type called purchaseOrder and declare an element called purchaseOrder, there is no conflict. If the two things are elements within different types (i.e. not global elements), say you declare one element called name as part of the Address type and a second element called name as part of the Item type, there is no conflict. Finally, if the two things are both types and you define one and XML Schema has defined the other, say you define a simple type called decimal, there is no conflict. The reason for the apparent contradiction in the last example is that the two types belong to different namespaces. We'll explore the use of schemas and namespaces in a later section.
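
The cases above can be sketched as a schema fragment. This is only an illustration under the working-draft syntax used in po.xsd; the type bodies are reduced to a single hypothetical element each so that the naming is easy to see.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">

 <!-- No conflict: a type and an element may share a name -->
 <xsd:complexType name="purchaseOrder">
  <xsd:element name="items" type="xsd:string"/>
 </xsd:complexType>
 <xsd:element name="purchaseOrder" type="purchaseOrder"/>

 <!-- No conflict: local elements called name inside two different types -->
 <xsd:complexType name="Address">
  <xsd:element name="name" type="xsd:string"/>
 </xsd:complexType>
 <xsd:complexType name="Item">
  <xsd:element name="name" type="xsd:string"/>
 </xsd:complexType>

 <!-- Conflict: defining a second type called Address here, whether
      simple or complex, would clash with the definition above -->

</xsd:schema>
```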

2.3 Simple Types

The purchase order schema declares several elements and attributes that have simple types. Some of these simple types, such as string and decimal, are built in to XML Schema, while others are derived from the built-in types. For example, the partNum attribute has a type called Sku that is derived from string. Both built-in simple types and their derivations can be used in all element and attribute declarations. Table 1 lists all the simple types built in to XML Schema, along with an example of each type.

Table 1. Simple Types Built-In to XML Schema

Simple Type           Example
string                "Confirm this is electric"
boolean               true, false, 1, 0
float                 -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN ("not a number"), equivalent to single-precision 32-bit floating point
double                -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN, equivalent to double-precision 64-bit floating point
decimal               -1.23, 0, 123.4, 1000.00
timeInstant           1999-05-31T13:20:00.000-05:00 (May 31st 1999 at 1.20pm Eastern Standard Time which is 5 hours behind Co-Ordinated Universal Time)
timeDuration          P1Y2M3DT10H30M12.3S (1 year, 2 months, 3 days, 10 hours, 30 minutes, 12.3 seconds)
recurringInstant      --05-31T13:20:00 (May 31st every year at 1.20pm Co-Ordinated Universal Time, format similar to timeInstant)
binary                100010
uri-reference         http://www.example.com/, http://www.example.com/doc.html#ID5
ID                    is an XML 1.0 ID attribute type
IDREF                 is an XML 1.0 IDREF attribute type
ENTITY                is an XML 1.0 ENTITY attribute type
NOTATION              is an XML 1.0 NOTATION attribute type
language              en-GB, en-US, fr, and other valid values for xml:lang as defined in XML 1.0
IDREFS                is an XML 1.0 IDREFS attribute type
ENTITIES              is an XML 1.0 ENTITIES attribute type
NMTOKEN               US, is an XML 1.0 NMTOKEN attribute type
NMTOKENS              "US UK", is an XML 1.0 NMTOKENS attribute type
Name                  shipTo (is an XML 1.0 Name type)
QName                 Address (is an XML Namespace QName)
NCName                Address (is an XML Namespace NCName, i.e. is a QName without the prefix and colon)
integer               -126789, -1, 0, 1, 126789
non-positive-integer  -126789, -1, 0
negative-integer      -126789, -1
long                  -1, 12678967543233
int                   -1, 126789675
short                 -1, 12678
byte                  -1, 126
non-negative-integer  0, 1, 126789
unsigned-long         0, 12678967543233
unsigned-int          0, 1267896754
unsigned-short        0, 12678
unsigned-byte         0, 126
positive-integer      1, 126789
date                  1999-05-31, ---05 (5th day of every month)
time                  13:20:00.000, 13:20:00.000-05:00

Note that to retain compatibility between XML Schema and XML 1.0 DTDs, the simple types ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN, NMTOKENS should only be used in attributes.

New simple types are defined by derivation from existing simple types (both built-in and derived). A new type must have a name different from the existing type, and the new type may constrain the legal range of values obtained from the existing type. We use the simpleType element to define a new simple type, and there are a wide variety of so-called facets that may be used to constrain the values of the new type (a complete listing of facets is provided in Appendix B). For example, in the schema po.xsd, we defined a new simple type called Sku that is derived from the simple type string. Furthermore, we constrain the values of Sku using a facet called pattern in conjunction with the regular expression "\d{3}-[A-Z]{2}" that is read "three digits followed by a hyphen followed by two upper-case letters":

Defining the Simple Type "Sku"
<xsd:simpleType name="Sku" base="xsd:string">
  <xsd:pattern value="\d{3}-[A-Z]{2}"/>
</xsd:simpleType>

This regular expression language is described more fully in Appendix C.
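
To make the effect of the pattern facet concrete, the instance fragment below sketches partNum values that do and do not fit the reading "three digits followed by a hyphen followed by two upper-case letters". The second and third part numbers are invented for illustration and do not appear in po.xml.

```xml
<items>
    <!-- partNum conforms to Sku: three digits, a hyphen, two upper-case letters -->
    <item partNum="872-AA">
        <productName>Lawnmower</productName>
        <quantity>1</quantity>
        <price>148.95</price>
    </item>
    <!-- partNum does not conform: only two digits before the hyphen -->
    <item partNum="87-AA">
        <productName>Lawnmower</productName>
        <quantity>1</quantity>
        <price>148.95</price>
    </item>
    <!-- partNum does not conform: lower-case letters after the hyphen -->
    <item partNum="872-aa">
        <productName>Lawnmower</productName>
        <quantity>1</quantity>
        <price>148.95</price>
    </item>
</items>
```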

XML Schema defines thirteen facets (see Appendix B). Among these, the enumeration facet is one of the most useful, and it can be used to constrain the values of almost every simple type, except the boolean type. The enumeration facet limits a simple type to a set of distinct values. For example, we can use the enumeration facet to define a new simple type called US-State, derived from string, whose value must be one of the standard US state abbreviations:

Using the Enumeration Facet
<xsd:simpleType name="US-State" base="xsd:string">
  <xsd:enumeration value="AK"/>
  <xsd:enumeration value="AL"/>
  <xsd:enumeration value="AR"/>
  <!-- and so on ... -->
</xsd:simpleType>

US-State would be a good replacement for the string type currently used in the state element declaration. By making this replacement, the legal values of a state element, i.e. the state subelements of billTo and shipTo, would be limited to one of AK, AL, AR, etc.
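
A minimal sketch of that replacement follows, assuming the rest of the Address definition stays exactly as in po.xsd. The enumeration is abbreviated here as it is above; a complete list would need to include CA and PA for po.xml to remain valid.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">

 <xsd:simpleType name="US-State" base="xsd:string">
  <xsd:enumeration value="AK"/>
  <xsd:enumeration value="AL"/>
  <xsd:enumeration value="AR"/>
  <!-- and so on ... -->
 </xsd:simpleType>

 <xsd:complexType name="Address">
  <xsd:element name="name"   type="xsd:string"/>
  <xsd:element name="street" type="xsd:string"/>
  <xsd:element name="city"   type="xsd:string"/>
  <!-- state now uses the derived type instead of xsd:string -->
  <xsd:element name="state"  type="US-State"/>
  <xsd:element name="zip"    type="xsd:decimal"/>
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
 </xsd:complexType>

</xsd:schema>
```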

Ed. Note: Need to describe List simple types

2.4 Anonymous Type Definitions

Schemas can be constructed by defining sets of named types such as PurchaseOrderType and then declaring elements such as purchaseOrder that reference the types using the "type=" construction. This style of schema construction is straightforward but it can be unwieldy, especially if you create a lot of types that are only referenced once and consist of very few constraints. In these cases, a type can be more succinctly defined as an anonymous type, which saves the overhead of being named and referenced through "type=".

The definition of the type Items contains two element declarations that have anonymous types. In general, you can identify anonymous types by the lack of a "type=" in the element (or attribute) declaration, and an immediately following un-named type definition:

Two Anonymous Type Definitions
<xsd:complexType name="Items">
  <xsd:element name="item" minOccurs="0" maxOccurs="*">
    <xsd:complexType>
      <xsd:element name="productName" type="xsd:string"/>
      <xsd:element name="quantity">
        <xsd:simpleType base="xsd:positive-integer">
          <xsd:maxExclusive value="100"/>
        </xsd:simpleType>
      </xsd:element>
      <xsd:element name="price" type="xsd:decimal"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="shipDate" type="xsd:date" minOccurs='0'/>
      <xsd:attribute name="partNum" type="Sku"/>
    </xsd:complexType>
  </xsd:element>
</xsd:complexType>

The item element has an un-named complex type consisting of the elements productName, quantity, price, comment, and shipDate, and an attribute called partNum. The quantity element has an un-named simple type derived from positive-integer whose value ranges between 1 and 99.
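
For comparison, here is a sketch of how the same constraints might be written with named types instead of anonymous ones. The type names Quantity and Item are hypothetical; the comment element and the Sku type are assumed to be declared as in po.xsd and are not repeated.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">

 <!-- Hypothetical named type replacing the anonymous simple type -->
 <xsd:simpleType name="Quantity" base="xsd:positive-integer">
  <xsd:maxExclusive value="100"/>
 </xsd:simpleType>

 <!-- Hypothetical named type replacing the anonymous complex type;
      comment and Sku are assumed to be declared as in po.xsd -->
 <xsd:complexType name="Item">
  <xsd:element name="productName" type="xsd:string"/>
  <xsd:element name="quantity" type="Quantity"/>
  <xsd:element name="price" type="xsd:decimal"/>
  <xsd:element ref="comment" minOccurs="0"/>
  <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
  <xsd:attribute name="partNum" type="Sku"/>
 </xsd:complexType>

 <xsd:complexType name="Items">
  <xsd:element name="item" type="Item" minOccurs="0" maxOccurs="*"/>
 </xsd:complexType>

</xsd:schema>
```

The named form costs two extra definitions that are each referenced only once, which is the overhead that the anonymous style avoids.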

2.5 Types of Element Content

The purchase order schema has many examples of elements containing other elements (e.g. items), elements having attributes and containing other elements (e.g. shipTo), and elements containing only a simple type of value (e.g. price). However, we have not seen an element having any attributes and containing only a simple type of value, nor have we seen an element that contains other elements and simple values, nor have we seen an element that has no content at all. In this section we'll examine these variations in the content of element types.

Let us first consider how to declare an element that has an attribute and contains a simple value. In an instance document, such an element might appear as:

<price currency='EU'>423.46</price>

The purchase order schema declares a price element that gives us a starting point:

<xsd:element name="price" type="xsd:decimal"/>

Now, how do we add an attribute to this element? As we have said before, simple types cannot have attributes, and decimal is a simple type. Therefore, we must create a complex type to carry the attribute declaration. We also want the content to be simple type decimal. So our original question becomes: How do we create a complex type that is based on the simple type decimal? The answer is to derive a new complex type from the simple type decimal:

Deriving a Complex Type from a Simple Type
<xsd:element name='price'>
    <xsd:complexType base='xsd:decimal' derivedBy='extension'>
        <xsd:attribute name='currency' type='xsd:string' />
    </xsd:complexType>
</xsd:element>

We use the complexType element to define a new (anonymous) type, and we refer to decimal in the base attribute to indicate it is the simple type from which we are deriving the new type. We add a currency attribute using a standard attribute declaration, and because we want to add this attribute to the simple type, we must signal our intent by stating derivedBy='extension'. (We cover type derivation in detail in Section 3). The price element declared in this way will appear in an instance as shown in the example above.

For the sake of brevity, we have derived an anonymous complex type from decimal, but the price element declared here is still correct relative to the price instance example above.
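
If the type were needed by more than one element, a named version could presumably be written along the following lines. This is a sketch only: the type name Price and the discount element are hypothetical, and the use of base and derivedBy on a named complexType is assumed to work the same way as on the anonymous one shown above.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">

 <!-- Hypothetical named complex type derived from the simple type decimal -->
 <xsd:complexType name="Price" base="xsd:decimal" derivedBy="extension">
  <xsd:attribute name="currency" type="xsd:string"/>
 </xsd:complexType>

 <!-- Two elements sharing the same derived type -->
 <xsd:element name="price"    type="Price"/>
 <xsd:element name="discount" type="Price"/>

</xsd:schema>
```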

Now suppose that we want the price element to convey both the unit of currency and the price as attribute values rather than as separate attribute and content values. For example:

<price currency='EU' value='423.46' />

Such an element has no content at all; we say that its content model is empty:

An Empty Complex Type
<xsd:element name='price'>
    <xsd:complexType content='empty'>
        <xsd:attribute name='currency' type='xsd:string' />
        <xsd:attribute name='value'    type='xsd:decimal' />
    </xsd:complexType>
</xsd:element>

The purchase order schema is constructed in a style which can be broadly described as elements containing subelements and the deepest subelements containing character data. XML Schema also provides for constructing schemas using a style in which character data can appear alongside subelements at multiple levels of embedding, and such data is not confined to the deepest level subelements. The latter style of constructing schemas is enabled through the mixed value of the content attribute. To illustrate, consider the following snippet from a customer letter that uses some of the same elements as the purchase order:

Snippet of Customer Letter
<letterBody>
<salutation>Dear Mr.<name>Robert Smith</name>.</salutation>
Your order of <quantity>1</quantity> <productName>Baby
Monitor</productName> shipped from our warehouse on
<shipDate>1999-05-21</shipDate>. ....
</letterBody>

Notice the text appearing between elements at different levels. Specifically, text appears between the elements salutation, quantity, productName and shipDate which are all children of letterBody, and text appears around the element name which is the child of a child of letterBody. The following snippet of a schema declares letterBody:

Snippet of Schema for Customer Letter
<xsd:element name='letterBody'>
    <xsd:complexType content='mixed'>
        <xsd:element name='salutation'>
            <xsd:complexType content='mixed'>
                <xsd:element name='name' type='xsd:string'/>
            </xsd:complexType>
        </xsd:element>
        <xsd:element name='quantity' type='xsd:positive-integer'/>
        <xsd:element name='productName' type='xsd:string'/>
        <xsd:element name='shipDate' type='xsd:date' minOccurs='0'/>
        <!-- etc -->
    </xsd:complexType>
</xsd:element>

Ed. Note: Clarify the default and possible values of min/maxOccurs on elements in mixed content model.

By now, we hope you understand that the different values for the content attribute represent different content models. In previous sections, we have defined new complex types without reference to the content attribute, and so it is reasonable to ask what content model was used in those definitions. The default content model for a complex type is called elementOnly, i.e. the complex type may contain elements and attributes. elementOnly is the content model that applies when we derive complex types from other complex types, but when we derive a complex type from a simple type (as we did at the beginning of this section), the content model is called textOnly. In fact, we can define a complexType in terms of textOnly:

A textOnly Complex Type
<xsd:element name='price'>
    <xsd:complexType content='textOnly'>
        <xsd:attribute name='currency' type='xsd:string' />
    </xsd:complexType>
</xsd:element>

The content of the anonymous type defined in this way is unconstrained, so the element value may be 423.46, but it legitimately may be any other sequence of characters as well. In general it is probably better to avoid such unconstrained type definitions in favour of the any construction described in a later section, and constrained type definitions such as decimal and string.
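
The following instance fragment sketches what "unconstrained" means in practice. Under the textOnly declaration above, both price elements would be acceptable, whereas only the first would be acceptable if the content were constrained to decimal. The prices wrapper element and the text of the second price are invented for illustration.

```xml
<prices>
    <!-- acceptable under textOnly, and also under a decimal-based type -->
    <price currency='EU'>423.46</price>
    <!-- acceptable under textOnly only: the content is not a decimal -->
    <price currency='EU'>to be confirmed</price>
</prices>
```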

2.6 Annotations, Versions & Comments

XML Schema provides a set of elements for annotating schemas for the benefit of both human readers and applications. In the purchase order schema, we put a basic schema description and copyright information inside the documentation element, which is the recommended location for human readable material. Another element, appinfo, which we did not use in the purchase order schema, can be used to provide information for tools, stylesheets and other applications. Both documentation and appinfo appear as a subelement of annotation, which may itself appear anywhere in a schema.
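
As a sketch of how the two annotation elements might sit side by side, the fragment below extends the annotation from po.xsd with an appinfo element. The content of appinfo is entirely hypothetical; what an application expects to find there is up to that application.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">

 <xsd:annotation>
  <!-- for human readers -->
  <xsd:documentation>
   Purchase order schema for Example.com.
   Copyright 2000 Example.com. All rights reserved.
  </xsd:documentation>
  <!-- for tools; this content is a made-up hint, not a defined format -->
  <xsd:appinfo>
   form-layout: shipTo billTo items
  </xsd:appinfo>
 </xsd:annotation>

 <xsd:element name="comment" type="xsd:string"/>

</xsd:schema>
```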

2.7 Groups

The definitions of complex types in the purchase order schema all declare a sequence of elements that must appear in the instance document. The occurrence of individual elements declared in the so-called content models of these types may be optional, as in the case of the comment element where the value of minOccurs is 0, or otherwise constrained depending upon the values of minOccurs and maxOccurs. XML Schema also provides constraints that apply to groups of elements appearing in a content model. Note that the constraints do not apply to attributes. The constraints provided by the group element mirror those available in XML 1.0, and provide some additional constraints.

Ed. Note: The syntax for groups has changed. There are now <choice>, etc elements. This section will be rewritten.

The default for a group is that all the group's elements must appear in the order given. Alternatively, a group may be defined such that only one of the elements within the group may appear in the instance. Groups can also be nested, and may take minOccurs and maxOccurs attributes. To illustrate, we can use two groups in the purchase order schema to allow documents containing separate shipping and billing addresses, or single addresses in cases where the shipper and biller are co-located:

Nested Groups
<xsd:complexType name="PurchaseOrderType">
  <xsd:group order="choice">
    <xsd:group>
      <xsd:element name="shipTo" type="Address" />
      <xsd:element name="billTo" type="Address" />
    </xsd:group>
    <xsd:element name="singleAddress" type="Address"/>
  </xsd:group>
  <xsd:element   ref="comment"    minOccurs="0" />
  <xsd:element   name="items"     type="Items" />
  <xsd:attribute name="orderDate" type="xsd:date" />
</xsd:complexType>
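
As an illustration of the choice above, the two instance skeletons below would each satisfy the nested groups: the first uses the shipTo and billTo pair, the second uses singleAddress alone. This is only a sketch. It assumes a purchaseOrder element declared with type PurchaseOrderType, the orderDate value and comment text are invented, and the contents of the address and items elements are elided.

```xml
<!-- Variant 1: separate shipping and billing addresses -->
<purchaseOrder orderDate="1999-10-20">
  <shipTo><!-- Address content --></shipTo>
  <billTo><!-- Address content --></billTo>
  <items><!-- item elements --></items>
</purchaseOrder>

<!-- Variant 2: shipper and biller are co-located -->
<purchaseOrder orderDate="1999-10-20">
  <singleAddress><!-- Address content --></singleAddress>
  <comment>Illustrative comment</comment>
  <items><!-- item elements --></items>
</purchaseOrder>
```

An instance that contained both singleAddress and shipTo, or shipTo without billTo, would not match this content model.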

The third option for ordering elements in a group specifies that all the elements in the group must appear once, and in any order. Usage of the "all" option (which provides a simplified version of the SGML &-Connector) is limited to the top level of any content model; the items listed must all be individual elements (no groups), and each element in the content model can appear only once, i.e. every element in the content model must have minOccurs="1" and maxOccurs="1". For example, if it were important to allow the child elements of purchaseOrder to appear in any order, we could redefine PurchaseOrderType as:

An 'All' Group
<xsd:complexType name="PurchaseOrderType">
  <xsd:group order="all">
    <xsd:element name="shipTo" type="Address"/>
    <xsd:element name="billTo" type="Address"/>
    <xsd:element ref="comment"/>
    <xsd:element name="items"  type="Items" />
  </xsd:group>
  <xsd:attribute name="orderDate" type="xsd:date" />
</xsd:complexType>

Note that the comment element in this example is no longer optional (as it was previously, indicated by minOccurs="0"); this is needed to meet the stipulation that every element in an "all" group must appear exactly once. Furthermore, the comment element cannot be placed outside the "all" group as a means of making it optional, because the "all" group must appear at the top of the content model. In other words, the following is illegal:

Illegal Example with an 'All' Group
<xsd:complexType name="PurchaseOrderType">
  <xsd:group order="all">
    <xsd:element name="shipTo" type="Address"/>
    <xsd:element name="billTo" type="Address"/>
    <xsd:element name="items"  type="Items" />
  </xsd:group>
  <xsd:element   ref="comment"    minOccurs="0"/>
  <xsd:attribute name="orderDate" type="xsd:date" />
</xsd:complexType>
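
By contrast, the legal 'all' group definition of PurchaseOrderType given earlier accepts its four children in any order, so long as each appears exactly once. The following skeleton is a sketch of one such ordering; it assumes a purchaseOrder element of that type, element contents are elided, and the comment text and orderDate value are invented for illustration.

```xml
<purchaseOrder orderDate="1999-10-20">
  <items><!-- item elements --></items>
  <comment>Illustrative comment</comment>
  <billTo><!-- Address content --></billTo>
  <shipTo><!-- Address content --></shipTo>
</purchaseOrder>
```

Omitting comment, or repeating any of the four elements, would make the instance invalid under that definition.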

The preceding examples describe groups defined inline, in other words, the groups are not named and they do not exist outside the context of their surrounding type definitions. However, groups can be named and used in multiple locations, in much the same way as the complexType and attributeGroup elements. In this way, they reconstruct common usage of Parameter Entities in XML 1.0.

2.8 Attribute Groups

Suppose we want to provide more information about the items in a purchase order by adding attributes to the item element that give the item's weight and preferred shipping method. One way to add these attributes is to add more attribute declarations to the inline complexType definition of item:

Adding Attributes to the Inline Type Definition
<xsd:element name="item" minOccurs="0" maxOccurs="*">
   <xsd:complexType>
    <xsd:element   name="productName" type="xsd:string"/>
    <xsd:element   name="quantity">
     <xsd:simpleType base="xsd:positive-integer">
      <xsd:maxExclusive value="100"/>
     </xsd:simpleType>
    </xsd:element>
    <xsd:element   name="price"    type="xsd:decimal"/>
    <xsd:element   ref="comment"   minOccurs="0"/>
    <xsd:element   name="shipDate" type="xsd:date" minOccurs='0'/>
    <xsd:attribute name="partNum"  type="Sku"/>
    <xsd:attribute name="weight"   type="xsd:decimal"/>
    <xsd:attribute name="shipBy">
     <xsd:simpleType base="xsd:string">
       <xsd:enumeration value="air"/>
       <xsd:enumeration value="land"/>
       <xsd:enumeration value="any"/>
     </xsd:simpleType>
    </xsd:attribute>
   </xsd:complexType>
</xsd:element>

Alternatively, we can create a named Attribute Group containing these attributes and reference this group by name in the item element declaration:

Adding Attributes Using an Attribute Group
<xsd:element name="item" minOccurs="0" maxOccurs="*">
   <xsd:complexType>
    <xsd:element name="productName" type="xsd:string"/>
    <xsd:element name="quantity">
     <xsd:simpleType base="xsd:positive-integer">
      <xsd:maxExclusive value="100"/>
     </xsd:simpleType>
    </xsd:element>
    <xsd:element name="price"    type="xsd:decimal"/>
    <xsd:element ref="comment"   minOccurs="0"/>
    <xsd:element name="shipDate" type="xsd:date" minOccurs='0'/>
    <xsd:attributeGroup ref="ItemDelivery"/>
   </xsd:complexType>
</xsd:element>
<xsd:attributeGroup name="ItemDelivery">
  <xsd:attribute name="partNum" type="Sku"/>
  <xsd:attribute name="weight"  type="xsd:decimal"/>
  <xsd:attribute name="shipBy">
    <xsd:simpleType base="xsd:string">
      <xsd:enumeration value="air"/>
      <xsd:enumeration value="land"/>
      <xsd:enumeration value="any"/>
    </xsd:simpleType>
  </xsd:attribute>
</xsd:attributeGroup>

Using an Attribute Group in this way can improve the readability of schemas, and it makes them easier to update because an Attribute Group can be defined and edited in one place and referenced in multiple definitions and declarations. These characteristics make Attribute Groups similar to Parameter Entities in XML 1.0. Note that both attribute declarations and Attribute Group references must appear at the end of complex type definitions.
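
For illustration, an item element carrying the three attributes collected in ItemDelivery might look like the following in an instance document. All values are invented for this sketch; partNum is chosen to have the three-digits, hyphen, two-letters shape used for part numbers elsewhere in this primer, and shipBy is one of the enumerated values.

```xml
<item partNum="872-AA" weight="48.5" shipBy="land">
  <productName>Lawnmower</productName>
  <quantity>1</quantity>
  <price>148.95</price>
</item>
```

The instance looks the same whether the attributes are declared inline or through the attribute group; the group only changes how the schema is organized.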

2.9 Null Values

One of the purchase order items, the Lawnmower, does not have a shipDate element. Within the context of our scenario, the schema author may have intended such an absence to indicate that the Lawnmower has not shipped. But in general, the absence of an element does not have any particular meaning; it may indicate that the information is unknown, or not applicable, or the element may be absent for some other reason. Sometimes the absence of an element may have the same meaning as a "null" value in a relational database, although there will be other times when it is desirable to explicitly represent such a null value.

XML Schema provides a mechanism for explicitly representing nulls in an XML format. This mechanism involves an "out of band" null signal. In other words, no actual null value appears as element content; instead, an attribute indicates that the element content is null. To illustrate, we can modify the shipDate element declaration so that nulls can be signalled:

<xsd:element name="shipDate" type="xsd:date" nullable="true"/>

And to explicitly represent that shipDate has a null value in the instance document, we set the null attribute (from the XML Schema namespace for instances) to true:

<shipDate xsi:null="true"></shipDate>

The null attribute is defined as part of the XML Schema namespace for instances (http://www.w3.org/1999/XMLSchema/instance), and so it must appear in the instance document with a prefix (xsi) associated with that namespace. Note that the null mechanism applies only to element values, and not to attribute values. An element with xsi:null="true" may not have any element content but it may still carry attributes.
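
To put the pieces together, the sketch below shows the Lawnmower item with an explicitly null shipDate. It assumes the nullable declaration of shipDate shown above. The xsi prefix is declared on the item element purely for compactness (it could equally be declared on an ancestor such as the root element), and the partNum, quantity and price values are illustrative.

```xml
<item partNum="872-AA"
      xmlns:xsi="http://www.w3.org/1999/XMLSchema/instance">
  <productName>Lawnmower</productName>
  <quantity>1</quantity>
  <price>148.95</price>
  <!-- Present but explicitly null: no content, only the out-of-band signal -->
  <shipDate xsi:null="true"></shipDate>
</item>
```

This differs from simply omitting shipDate, whose absence carries no particular meaning.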

3 Advanced Concepts I: The International Purchase Order

In this section, we consider some of the advanced features available in XML Schema.

As schemas become larger, it is often desirable to divide their content among several schema documents for purposes such as ease of maintenance, access control, and readability. For these reasons, we have taken the schema constructs concerning addresses out of po.xsd, and put them in a new file called address.xsd. The modified purchase order schema file is now called ipo.xsd:

The International Purchase Order Schema, ipo.xsd
<schema targetNamespace="http://www.example.com/IPO"
        xmlns="http://www.w3.org/1999/XMLSchema"
        xmlns:ipo="http://www.example.com/IPO">

 <annotation>
  <documentation>
   International Purchase order schema for Example.com
   Copyright 2000 Example.com. All rights reserved.
  </documentation> 
 </annotation>

 <!-- include address constructs -->
 <include
  schemaLocation="http://www.example.com/schemas/address.xsd"/>

 <element name="purchaseOrder" type="ipo:PurchaseOrderType"/>

 <element name="comment" type="string"/>

 <complexType name="PurchaseOrderType">
  <element name="shipTo"     type="ipo:Address"/>
  <element name="billTo"     type="ipo:Address"/>
  <element ref="ipo:comment" minOccurs="0"/>
  <element name="items"      type="ipo:Items"/>
  <attribute name="orderDate" type="date"/>
 </complexType>

 <complexType name="Items">
  <element name="item" minOccurs="0" maxOccurs="*">
   <complexType>
    <element name="productName" type="string"/>
    <element name="quantity">
     <simpleType base="positive-integer">
      <maxExclusive value="100"/>
     </simpleType>
    </element>
    <element name="price"      type="decimal"/>
    <element ref="ipo:comment" minOccurs="0"/>
    <element name="shipDate"   type="date" minOccurs='0'/>
    <attribute name="partNum"  type="ipo:Sku"/>
   </complexType>
  </element>
 </complexType>

 <simpleType name="Sku" base="string">
  <pattern value="\d{3}-[A-Z]{2}"/>
 </simpleType>

</schema>

The file containing the address constructs is:

Addresses for International Purchase Order schema, address.xsd
<schema targetNamespace="http://www.example.com/IPO"
        xmlns="http://www.w3.org/1999/XMLSchema"
        xmlns:ipo="http://www.example.com/IPO">

 <annotation>
  <documentation>
   Addresses for International Purchase order schema
   Copyright 2000 Example.com. All rights reserved.
  </documentation> 
 </annotation>

 <complexType name="Address">
  <element name="name"   type="string"/>
  <element name="street" type="string"/>
  <element name="city"   type="string"/>
 </complexType>

 <complexType name="US-Address" base="ipo:Address"
              derivedBy="extension">
  <element name="state" type="ipo:US-State"/>
  <element name="zip"   type="positive-integer"/>
 </complexType>

 <complexType name="UK-Address" base="ipo:Address"
              derivedBy="extension">
  <element   name="postcode" type="ipo:UK-Postcode"/>
  <attribute name="export-code" type="positive-integer"
             fixed="1"/>
 </complexType>

 <!-- other Address derivations for more countries --> 

 <simpleType name="US-State" base="string">
  <enumeration value="AK"/>
  <enumeration value="AL"/>
  <enumeration value="AR"/>
  <!-- and so on ... -->
 </simpleType>

 <!-- simple type definition for UK-Postcode -->

</schema>

The reader will have noticed that we have changed namespace conventions between the original purchase order schema and the international purchase order schema. In particular, the XML Schema namespace is now the default, according to the default namespace declaration on the schema element (xmlns="http://www.w3.org/1999/XMLSchema"), and so the elements and built-in simple types belonging to XML Schema no longer require a prefix. In contrast, when references are made in the schema to types that have been defined in the schema, for example:

<element name="purchaseOrder" type="ipo:PurchaseOrderType"/>

then the type must appear with a prefix (ipo:) that is associated with the purchase order schema's namespace. Actually, we make this association in two steps: One statement (xmlns:ipo="http://www.example.com/IPO") associates the prefix with a particular URI, and a second statement (targetNamespace="http://www.example.com/IPO") asserts that this URI is the purchase order schema's namespace.

3.1 A Schema in Multiple Documents

Instead of the various address constructions being in the ipo.xsd file, they are located in address.xsd. To include these constructions as part of the international purchase order schema, in other words to include them in the international purchase order's namespace, ipo.xsd contains the include element:

<include schemaLocation="http://www.example.com/schemas/address.xsd"/>

The net effect of this include is equivalent to replacing the include element with all the definitions and declarations from address.xsd. Note that for the address constructions to be accessible as part of the international purchase order's schema, the namespace of the included constructions must be the same as the namespace of the international purchase order's schema. This is accomplished by making the (target) namespace of the included schema file the same as the (target) namespace of the including schema file, i.e. http://www.example.com/IPO. In this example, we have shown only one including document and one included document. In practice it is possible to include multiple documents using multiple include elements, and documents can include documents that themselves include other documents; such nesting is legal only if all the included parts of the schema are declared to have the same target namespace.

Instance documents that conform to a schema whose definitions span multiple schema documents need only reference the 'topmost' document and the common namespace; it is the responsibility of the XML processor to gather together all the definitions specified in the various included documents. So in our example, the instance document ipo.xml (see section 3.3) references only the common namespace, http://www.example.com/IPO, and the one schema file http://www.example.com/schemas/ipo.xsd.

In a later section we'll examine the situation when there is more than one schema namespace.

3.2 Deriving Types by Extension

To create our address constructs, we start by creating a complex type called Address in the usual way (see address.xsd). The Address type contains the basic elements of an address: a name, a street and a city. From this starting point we derive two new complex types that contain all the elements of the original type plus additional elements that are specific to addresses in the U.S. and the U.K.

We create the two new complex types, US-Address and UK-Address, using the complexType element along with values for base and derivedBy attributes. When a complex type is derived by extension, its effective content model is the content model of the base type plus the content model specified in the type derivation. The additional content is always appended at the end of the base type's content. In the case of UK-Address, the content model of UK-Address is the content model of Address plus the declarations for a postcode element and an export-code attribute. This is equivalent to defining the UK-Address from scratch as follows:

Example
 <complexType name="UK-Address">
   <!-- content model of Address -->
   <element name="name" type="string"/>
   <element name="street" type="string"/>
   <element name="city" type="string"/>

   <!-- appended declarations --> 
   <element name="postcode" type="ipo:UK-Postcode"/>
   <attribute name="export-code" type="positive-integer"
              fixed="1"/>
 </complexType>
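
The same reasoning applies to US-Address. As a sketch, expanding its derivation in address.xsd by hand would give the following equivalent definition; it is written out here only for illustration, and the schema itself uses the derived form.

```xml
<complexType name="US-Address">
  <!-- content model of Address -->
  <element name="name" type="string"/>
  <element name="street" type="string"/>
  <element name="city" type="string"/>

  <!-- appended declarations -->
  <element name="state" type="ipo:US-State"/>
  <element name="zip" type="positive-integer"/>
</complexType>
```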

3.3 Using Derived Types in Instance Documents

In our example scenario, purchase orders are generated in response to customer orders which may involve shipping and billing addresses in different countries. The international purchase order, ipo.xml, illustrates one such case where goods are shipped to England and the bill is sent to a US address. Clearly it is very useful if the schema for international purchase orders does not have to spell out every possible combination of international addresses for billing and shipping, and more so if we can add new complex types of international address simply by creating new derivations of Address.

XML Schema allows us to define the billTo and shipTo elements as Address types (see ipo.xsd) and to use instances of international addresses in place of instances of Address. In other words, an instance document that contains markup conforming to the UK-Address type will be valid if that markup appears within the document at a location where an Address is expected (assuming the UK-Address markup itself is valid). To make this feature of XML Schema work, and to disambiguate exactly which derived type is intended, the derived type should be identified in the instance document. The type is identified using the type attribute which is part of the XML Schema instance namespace. In the example, ipo.xml, use of the UK-Address and US-Address derived types is identified through the values assigned to the xsi:type attributes.

An International Purchase order, ipo.xml
<?xml version="1.0"?>
<ipo:purchaseOrder
  xmlns:xsi='http://www.w3.org/1999/XMLSchema/instance'
  xmlns:ipo="http://www.example.com/IPO"
  orderDate="1999-12-01">

    <shipTo export-code="1" xsi:type="ipo:UK-Address">
        <name>Helen Zoe</name>
        <street>47 Eden Street</street>
        <city>Cambridge</city>
        <postcode>CB1 1JR</postcode>
    </shipTo>

    <billTo xsi:type="ipo:US-Address">
        <name>Robert Smith</name>
        <street>8 Oak Avenue</street>
        <city>Old Town</city>
        <state>PA</state>
        <zip>95819</zip>
    </billTo>

    <items>
        <item partNum="833-AA">
            <productName>Lapis necklace</productName>
            <quantity>1</quantity>
            <price>99.95</price>
            <ipo:comment>Want this for the holidays!</ipo:comment>
            <shipDate>1999-12-05</shipDate>
        </item>
    </items>
</ipo:purchaseOrder>

Ed. Note: Describe here the use of namespaces in the instance document

3.4 Deriving Types by Restriction

Ed. Note: This section is not final, awaiting editorial decisions regarding exact syntax. Will provide a table detailing restrictions at that time.

As you have probably guessed, in addition to deriving new complex types by extending content models, it is also possible to derive new types by restricting the content models of existing types. A type derived by restriction looks just like an ordinary type definition, but it is constrained to only have declarations that are the same as or more limited than the corresponding declarations in the base type.

For example, suppose we want to update our definition of the list of items in an international purchase order so that it must contain at least one item on order (the schema in ipo.xsd currently allows an items element to appear without any child item elements). To create our new ConfirmedItems type, we define the new type in the usual way, indicate that it is derived from the base type Items, indicate that we are deriving the new type by restriction, and indicate a new value for the minimum number of item element occurrences:

Deriving ConfirmedItems by Restriction from Items
 <complexType name="ConfirmedItems"
              base="ipo:Items" derivedBy="restriction">
     <element name="item" minOccurs="1"/>
 </complexType>

This change, requiring at least one child element rather than allowing zero or more child elements, narrows the range of allowable child elements. 
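
To see the effect, consider the two instance fragments below, a sketch whose item content is borrowed from ipo.xml. The first is acceptable where the type is Items but not where it is ConfirmedItems; the second satisfies both.

```xml
<!-- Valid against Items (zero item children are allowed),
     but not against ConfirmedItems -->
<items></items>

<!-- Valid against both Items and ConfirmedItems -->
<items>
  <item partNum="833-AA">
    <productName>Lapis necklace</productName>
    <quantity>1</quantity>
    <price>99.95</price>
  </item>
</items>
```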

As another example, instead of extending Address to create US-Address as we did earlier, we could use a generic World-Address type and derive US-Address by restriction:

Example
 <complexType name="World-Address">
   <element name="name"    minOccurs="0" maxOccurs="*"/>
   <element name="street"  minOccurs="0" maxOccurs="*"/>
   <element name="city"    minOccurs="0"/>
   <element name="region"  minOccurs="0"/>
   <element name="country" type="string" minOccurs="0"/>
   <element name="postal"  minOccurs="0"/>
 </complexType>

 <complexType name="US-Address" base="ipo:World-Address"
              derivedBy="restriction">
   <element name="name"    type="string" minOccurs="0"/>
   <element name="street"  type="string" minOccurs="0"
                                         maxOccurs="*"/>
   <element name="city"    type="string" minOccurs="1"/>
   <element name="state"   type="string" minOccurs="1"/>
   <element name="country" type="string" minOccurs="0"
            default="USA"/>
   <element name="zip" minOccurs="0"/>
 </complexType>

Note that the US-Address declares types for untyped elements in the base World-Address, tightens minOccurs and maxOccurs constraints, sets or changes default values and even renames region to state and postal to zip.

3.5 Equivalence Classes

XML Schema provides a mechanism, called equivalence classes, that allows elements to be substituted for other elements. More specifically, elements can be made members of a special class of elements that are said to be equivalent to a particular named element, which is called the exemplar. Note that the exemplar must be a global element. For example, we can declare two elements called customerComment and shipComment that are equivalent to the comment element, and so customerComment and shipComment can be used anywhere that we are able to use comment. Elements in an equivalence class must have the same type as the exemplar (or they can have a type that has been derived from the exemplar's type). To declare these two new elements, and to make them equivalent to the comment element, we use the following syntax:

Declaring Elements Equivalent to comment
<element name='shipComment' type='string'
         equivClass='ipo:comment'/>
<element name='customerComment' type='string'
         equivClass='ipo:comment'/>

When these declarations are added to the ipo.xsd schema file, shipComment and customerComment can be substituted for comment in the instance document, for example:

Snippet of ipo.xml Containing Substituted Elements
....
 <items>
   <item partNum="833-AA">
     <productName>Lapis necklace</productName>
     <quantity>1</quantity>
     <price>99.95</price>
     <ipo:shipComment>Use blue wrap if possible</ipo:shipComment>
     <ipo:customerComment>
       Want this for the holidays!
     </ipo:customerComment>
     <shipDate>1999-12-05</shipDate>
   </item>
 </items>
....

The existence of an equivalence class does not require any of the elements in that class to be used, nor does it preclude use of the exemplar. It simply provides a mechanism for allowing elements to be used interchangeably.

3.6 Abstract Elements

XML Schema provides a mechanism that can preclude particular elements from being used. By declaring an element to be "abstract", it cannot be used in an instance document.

In the equivalence class scenario we have just described, it would be useful to specifically disallow use of the comment element so that instances must make use of the customerComment and shipComment elements. To declare the comment element abstract, we modify its original declaration in the international purchase order schema, ipo.xsd, as follows:

<element name="comment" type="string" abstract='true'/>

With comment declared as abstract, the comment element itself may no longer appear in instances of international purchase orders; wherever a comment is wanted, instances are now valid only if they use customerComment and/or shipComment elements instead.
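
As a sketch of the consequence, the two item fragments below are based on ipo.xml and assume that the customerComment declaration from section 3.5 has been added to ipo.xsd alongside the abstract declaration of comment.

```xml
<!-- No longer valid: comment is abstract and cannot appear in an instance -->
<item partNum="833-AA">
  <productName>Lapis necklace</productName>
  <quantity>1</quantity>
  <price>99.95</price>
  <ipo:comment>Want this for the holidays!</ipo:comment>
</item>

<!-- Valid: a member of the equivalence class stands in for comment -->
<item partNum="833-AA">
  <productName>Lapis necklace</productName>
  <quantity>1</quantity>
  <price>99.95</price>
  <ipo:customerComment>Want this for the holidays!</ipo:customerComment>
</item>
```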

Ed. Note: Need to describe abstract types as well.

3.7 Preventing Type Derivations

So far, we have been able to derive new types without any restrictions. Schema authors will sometimes want to prevent derivations based on particular types, to avoid bad practice and for other reasons. Probably the simplest form of prevention is to specify that, for a particular type (simple or complex), new types may not be derived from it by restriction, by extension, or at all. To illustrate, suppose we want to prevent any derivation of the Address type by restriction because we decree that an Address must consist of at least a name, a street and a city (which is how it is defined in address.xsd). To prevent such derivations, we would slightly modify the original definition of Address as follows:

Preventing Derivations by Restriction of Address
<complexType name="Address" final="restriction">
  <element name="name" type="string"/>
  <element name="street" type="string"/>
  <element name="city" type="string"/>
</complexType>

The restriction value of the final attribute prevents derivations by restriction. Preventing derivations altogether, or by extension, is indicated by the values #all and extension respectively.

Another prevention mechanism controls which derivations and equivalence classes may and may not be used in instance documents. In section 3.3, we described how the derived types, US-Address and UK-Address, could be used by the shipTo and billTo elements in instance documents. These derived types can replace the content model provided by the Address type with which the shipTo and billTo elements were originally declared, because they are derived from the Address type. However, replacement by derived types can be controlled using the block attribute in a type definition. For example, if we want to block any derivation-by-restriction from being used in place of Address (perhaps for the same reason we defined Address with final='restriction'), we can modify the original definition of Address as follows:

Preventing Derivations by Restriction of Address in the Instance
<complexType name="Address" block="restriction">
  <element name="name" type="string"/>
  <element name="street" type="string"/>
  <element name="city" type="string"/>
</complexType>

The restriction value on the block attribute prevents derivations-by-restriction from replacing Address in an instance. However, it would not prevent UK-Address and US-Address from replacing Address because they were derived by extension. Preventing replacement by any derivation, or by derivations-by-extension, is indicated by the values #all and extension respectively.

4 Advanced Concepts II: The Quarterly Report

The home-products ordering and billing application can generate ad-hoc reports that summarise how many of which types of products have been billed on a per region basis. An example of such a report, one that covers the fourth quarter of 1999, is shown in 4Q99.xml.

Quarterly Report, 4Q99.xml
<r:purchaseReport
  xmlns:r='http://www.example.com/Report'
  period="P3M" periodEnding="1999-12-31">

 <regions>
  <zip code="95819">
   <part number="872-AA" quantity="1"/>
   <part number="926-AA" quantity="1"/>
   <part number="833-AA" quantity="1"/>
   <part number="455-BX" quantity="1"/>
  </zip>
  <zip code="63143">
   <part number="455-BX" quantity="4"/>
  </zip>
 </regions>

 <parts>
  <part number="872-AA">Lawnmower</part>
  <part number="926-AA">Baby Monitor</part>
  <part number="833-AA">Lapis Necklace</part>
  <part number="455-BX">Sturdy Shelves</part>
 </parts>

</r:purchaseReport>

The report lists, by number and quantity, the parts billed to various zip codes, and it provides a description of each part mentioned. In summarising the billing data, the intention of the report is clear and the data is unambiguous because a number of constraints are in effect. For example, each zip code appears only once (uniqueness constraint). Similarly, the description of every billed part appears only once, although parts may be billed to several zip codes (referential constraint); see, for example, part number 455-BX. In the following sections, we'll see how to specify these constraints using XML Schema.

The Report Schema, report.xsd
<schema targetNamespace='http://www.example.com/Report'
        xmlns='http://www.w3.org/1999/XMLSchema'
        xmlns:r='http://www.example.com/Report'
        xmlns:xipo='http://www.example.com/IPO'>

 <!-- for Sku -->
 <import namespace='http://www.example.com/IPO'/>

 <annotation>
  <documentation>
   Report schema for Example.com
   Copyright 2000 Example.com. All rights reserved.
  </documentation> 
 </annotation>

 <element name="purchaseReport">
  <complexType>
   <element name="regions" type="r:RegionsType"/>
   <element name="parts" type="r:PartsType"/>
   <attribute name="period" type="timeDuration"/>
   <attribute name="periodEnding" type="date"/>
  </complexType>
	
  <unique>
   <selector>regions/zip</selector>
   <field>@code</field>
  </unique>

  <key name="pNumKey">
   <selector>parts/part</selector>
   <field>@number</field>
  </key>

  <keyref refer="pNumKey">
   <selector>regions/zip/part</selector>
   <field>@number</field>
  </keyref>
 </element>

 <complexType name="RegionsType">
  <element name="zip" minOccurs="1" maxOccurs="*">
   <complexType>
    <element name="part" maxOccurs="*">
     <complexType content="empty">
      <attribute name="number" type="xipo:Sku"/>
      <attribute name="quantity" type="positive-integer"/>
     </complexType>
    </element>
    <attribute name="code" type="positive-integer"/>
   </complexType>
  </element>
 </complexType>

 <complexType name="PartsType">
  <element name="part" minOccurs="1" maxOccurs="*">
   <complexType content="textOnly">
     <attribute name="number" type="xipo:Sku"/>
   </complexType>
  </element>
 </complexType>

</schema>

4.1 Specifying Uniqueness

XML Schema enables us to indicate that any attribute or element value must be unique; in fact, it enables us to indicate that combinations of attribute and element values must be unique. To indicate that one particular attribute or element value is unique, we use the unique element to identify the set of elements containing the attribute or element value, and within this set we identify the particular attribute or element. In the case of our report schema, report.xsd, the selector element contains an XPath expression, regions/zip, that returns a list of all the zip elements in a report instance, and the field element contains a second XPath expression, @code, that indicates the code attribute of those elements must be unique. Note that the XPath expression limits the scope of what must be unique. The report might contain another code attribute, but its value does not have to be unique because it lies outside the scope defined by the XPath expression.

As we mentioned previously, we can indicate that combinations of values must be unique. To illustrate, suppose we relax the constraint that zip codes may only be listed once, although we still want to enforce the constraint that any product is listed only once within a given zip code. We could achieve such a constraint by specifying that the combination of zip code and product number must be unique. From the report document, 4Q99.xml, the combined values of zip and number would be: {95819 872-AA}, {95819 926-AA}, {95819 833-AA}, {95819 455-BX}, and {63143 455-BX}. Clearly, these combinations do not distinguish between zip and number combinations derived from single or multiple listings of any particular zip, but the combinations would unambiguously represent a product listed more than once within a single zip. In other words, a schema processor could detect violations of the uniqueness constraint.

To define combinations of values, we simply add field elements to identify all the values involved. So, to add the part number value to our existing definition, we add a new field element whose XPath expression, part/@number, identifies the number attribute of part elements that are children of the zip elements identified by regions/zip:

A Unique Composed Value
 <unique>
  <selector>regions/zip</selector>
  <field>@code</field>
  <field>part/@number</field>
 </unique>
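
As a sketch of what the composed constraint is meant to allow and reject, consider the following regions fragment, modeled on 4Q99.xml but with invented listings. Zip code 95819 is listed twice with different parts, which the relaxed constraint is intended to permit; the last part entry repeats the pair {63143 455-BX} within a single zip element, which is the situation described above as a violation.

```xml
<regions>
  <zip code="95819">
    <part number="872-AA" quantity="1"/>
    <part number="926-AA" quantity="1"/>
  </zip>
  <!-- Same zip code listed again with a different part:
       the zip and number pairs remain distinct -->
  <zip code="95819">
    <part number="833-AA" quantity="1"/>
  </zip>
  <zip code="63143">
    <part number="455-BX" quantity="4"/>
    <!-- Repeats {63143 455-BX} within one zip: intended to be a violation -->
    <part number="455-BX" quantity="2"/>
  </zip>
</regions>
```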

The XPath language used in specifying uniqueness, keys and key references is a subset of the XML Path Language 1.0.

Ed. Note: Describe here the subset of XPath

4.2 Defining Keys and their References

In the 1999 quarterly report, the description of every billed part appears only once. We could enforce this constraint using unique; however, we also want to ensure that every part-quantity element listed under a zip code has a corresponding part description, and so we use the key and keyref elements instead. The report schema, report.xsd, shows that the key and keyref constructions are applied using almost the same syntax as unique. The key element applies to the number attribute value of part elements that are children of the parts element. This declaration of number as a key means that its value must be unique, and the name that is associated with the key, pNumKey, makes the key referenceable from elsewhere.

To ensure that the part-quantity elements have corresponding part descriptions, we say that the number attribute (<field>@number</field>) of those elements (<selector>regions/zip/part</selector>) must reference the pNumKey key. This declaration of number as a keyref does not mean that its value must be unique, but it does mean there must exist a pNumKey with the same value.
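
To make the referential constraint concrete, the report below is a sketch of an instance that would fail it. It is a cut-down variant of 4Q99.xml in which the part number 999-ZZ is invented for illustration: that part is billed under a zip code but has no description under parts, so no pNumKey value matches the keyref.

```xml
<r:purchaseReport
  xmlns:r='http://www.example.com/Report'
  period="P3M" periodEnding="1999-12-31">

 <regions>
  <zip code="95819">
   <part number="872-AA" quantity="1"/>
   <!-- No part with number 999-ZZ is described under parts,
        so the keyref to pNumKey is not satisfied -->
   <part number="999-ZZ" quantity="1"/>
  </zip>
 </regions>

 <parts>
  <part number="872-AA">Lawnmower</part>
 </parts>

</r:purchaseReport>
```

Adding a part element with number="999-ZZ" under parts, or removing the offending entry under zip, would restore validity with respect to the keyref.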

As you may have figured out by analogy with unique, it is possible to define combinations of key and keyref values. Using this mechanism, we could go beyond simply requiring the product numbers to be equal, and define a combination of values that must be equal. Such values may involve combinations of multiple value types (string, integer, date, etc.), provided that the order of the field element references is the same in both the key and keyref definitions.

4.3 XML Schema Constraints vs. XML 1.0 ID Attributes

XML 1.0 provides a mechanism for ensuring uniqueness using the ID and associated IDREF and IDREFS attributes. So, how do XML Schema's mechanisms compare? In short, they are vastly more powerful. More specifically, XML Schema's mechanisms can be applied to any element and attribute content, regardless of their type. In contrast, ID is a type of attribute and so cannot be arbitrarily applied to attributes, elements and their content. Furthermore, Schema enables you to specify the scope within which uniqueness applies, whereas the scope within which an ID must be unique (the whole document) cannot be modified. Finally, Schema enables you to create a key or a keyref from combinations of element and attribute content, whereas ID has no such facility.

4.4 Importing Types

The report schema, report.xsd, makes use of the simple type xipo:Sku that is defined in another schema, and more specifically, in another namespace. Recall that we used include so that the schema in ipo.xsd could make use of definitions and declarations from address.xsd. We cannot use include here because it can only "import" definitions and declarations from a schema whose target namespace is the same as the including schema's target namespace. Hence, the include element does not identify a namespace (although it does require a schemaLocation).

To import the type Sku and use it in the report schema, we must identify the namespace in which Sku is defined, and associate that namespace with a prefix for use in the report schema. Specifically, we use the import element to identify Sku's namespace (http://www.example.com/IPO), and we associate the namespace with the prefix xipo using a standard namespace declaration. We use xipo rather than ipo to illustrate that the prefix is only used locally. The simple type Sku, defined in the namespace http://www.example.com/IPO, may then be referenced as xipo:Sku in any definitions and declarations.

In our example, we imported one simple type from one external namespace, and referred to it in an attribute declaration. XML Schema in fact permits multiple schema components to be imported, from multiple namespaces, and they can be referred to in both definitions and declarations. We can also reference an imported element in a declaration. For example, in report.xsd we can reuse the comment element declared in po.xsd:

<element ref='xpo:comment' minOccurs='1'/>

Note, however, that we cannot reuse the shipTo element from po.xsd, and the following is not legal:

<element ref='xpo:shipTo'/>

The reason is that only global schema components can be imported. In po.xsd, comment is declared as a global element, in other words it appears as a subelement of schema. In contrast, shipTo is declared locally, in other words it is declared as part of something else, namely the PurchaseOrderType definition.
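
The distinction between global and local declarations can be sketched by looking only at the direct children of the schema element. In this Python sketch the SCHEMA string is a hypothetical, heavily simplified stand-in for po.xsd with namespaces omitted, and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for po.xsd (namespaces omitted).
SCHEMA = """
<schema>
  <element name="comment" type="string"/>
  <complexType name="PurchaseOrderType">
    <element name="shipTo" type="Address"/>
  </complexType>
</schema>
"""

def global_element_names(schema_root):
    """Names of element declarations that are direct children of schema."""
    return {el.get("name") for el in schema_root.findall("element")}

def can_reference(schema_root, name):
    """Only globally declared elements can be referenced from another schema."""
    return name in global_element_names(schema_root)

schema_root = ET.fromstring(SCHEMA)
print(can_reference(schema_root, "comment"))
print(can_reference(schema_root, "shipTo"))
```

Because shipTo is nested inside the PurchaseOrderType definition, it is not found among the direct children of schema and so cannot be the target of a ref from report.xsd.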

Complex types may also be imported, and they can be used as the base type for deriving new types. Suppose we want to include in our reports the name of an analyst, along with contact information. We can reuse the (globally defined) complex type US-Address from address.xsd, and extend it to include phone and email to define a new type called Analyst:

Defining Analyst by Extending US-Address
 <complexType name='Analyst' base='xipo:US-Address'
              derivedBy='extension'>
   <element name="phone" type="string"/>
   <element name="email" type="string"/>
 </complexType>

Using the new type we declare an element called analyst (declaration not shown). A snippet of an instance document conforming to analyst is:

Snippet of Instance Document Conforming to Analyst
 <analyst>
        <name>Wendy Uhro</name>
        <street>10 Corporate Towers</street>
        <city>San Jose</city>
        <state>CA</state>
        <zip>95113</zip>
        <phone>408-271-3366</phone>
        <email>[email protected]</email>
 </analyst>
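
Derivation by extension can be pictured as appending the new elements to the content model of the base type. The Python sketch below illustrates that reading under several assumptions: the child element names of US-Address are read off the instance snippet above rather than taken from address.xsd, each element is assumed to occur exactly once and in order, and the email address is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

# Child element names assumed for US-Address, read off the instance snippet.
US_ADDRESS = ["name", "street", "city", "state", "zip"]
# Extension appends the new elements after the base type's content.
ANALYST = US_ADDRESS + ["phone", "email"]

# The email value is a hypothetical placeholder.
INSTANCE = """
<analyst>
  <name>Wendy Uhro</name>
  <street>10 Corporate Towers</street>
  <city>San Jose</city>
  <state>CA</state>
  <zip>95113</zip>
  <phone>408-271-3366</phone>
  <email>analyst@example.com</email>
</analyst>
"""

def matches_sequence(element, expected_names):
    """True if the element's children appear exactly in the expected order."""
    return [child.tag for child in element] == expected_names

analyst = ET.fromstring(INSTANCE)
print(matches_sequence(analyst, ANALYST))
print(matches_sequence(analyst, US_ADDRESS))
```

The instance matches the extended sequence but not the base sequence alone, since phone and email are present only in the derived type.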

When schema components are imported from multiple namespaces, each namespace must be identified with a separate import element. The import elements themselves must appear as the first children of the schema element. Furthermore, each namespace must be associated with a prefix, using a standard namespace declaration, and that prefix used to qualify references to any schema components belonging to that namespace. Finally, import elements optionally contain a schemaLocation attribute to help locate resources associated with the namespaces. We discuss the schemaLocation attribute in more detail in a later section.

4.5 Any Element, Any Attribute

In previous sections we have seen several mechanisms for extending the content models of complex types. For example, a mixed content model can contain arbitrary character data in addition to elements, and a content model can contain particular elements whose types are imported from external namespaces. However, these mechanisms respectively provide very broad and very narrow controls, and the purpose of this section is to describe a flexible mechanism that enables content models to be extended by any elements and attributes belonging to specified namespaces.

To illustrate, consider a version of the quarterly report, 4Q99html.xml, in which we have embedded an HTML-formatted representation of the XML parts data. The HTML content appears as the content of the element htmlExample, and the default namespace is changed on the outermost HTML element (table) so that all the HTML elements belong to the HTML namespace, http://www.w3.org/1999/XHTML:

Quarterly Report with HTML, 4Q99html.xml
<r:purchaseReport
  xmlns:r='http://www.example.com/Report'
  period="P3M" periodEnding="1999-12-31">

 <regions>
   <!-- part sales listed by zipcode, data from 4Q99.xml -->
 </regions>

 <parts>
   <!-- part descriptions from 4Q99.xml -->
 </parts>

 <htmlExample>
  <table xmlns='http://www.w3.org/1999/XHTML'
         border="0" width="100%">
   <tr>
     <th align="left">Zip Code</th>
     <th align="left">Part Number</th>
     <th align="left">Quantity</th>
   </tr>
   <tr><td>95819</td><td> </td><td> </td></tr>
   <tr><td> </td><td>872-AA</td><td>1</td></tr>
   <tr><td> </td><td>926-AA</td><td>1</td></tr>
   <tr><td> </td><td>833-AA</td><td>1</td></tr>
   <tr><td> </td><td>455-BX</td><td>1</td></tr>
   <tr><td>63143</td><td> </td><td> </td></tr>
   <tr><td> </td><td>455-BX</td><td>4</td></tr>
  </table>
 </htmlExample>

</r:purchaseReport>

To permit the appearance of HTML in the instance document we modify the report schema by declaring a new element htmlExample whose content is defined by the any element. In general, an any element specifies that any well-formed XML is permissible in a type's content model. In the example, we require the XML to belong to the namespace http://www.w3.org/1999/XHTML, in other words, it should be HTML. The example also requires there to be at least one element present from this namespace, as indicated by the values of minOccurs and maxOccurs:

Modification to purchaseReport Declaration to Allow HTML in Instance
 <element name="purchaseReport">
  <complexType>
   <element name="regions" type="r:RegionsType"/>
   <element name="parts" type="r:PartsType"/>
   <element name='htmlExample'>
     <complexType>
       <any namespace='http://www.w3.org/1999/XHTML'
            minOccurs='1' maxOccurs='*'
            processContents='skip'/>
     </complexType>
   </element>
   <attribute name="period" type="timeDuration"/>
   <attribute name="periodEnding" type="date"/>
  </complexType>
 </element>

The modification permits some well-formed XML belonging to the namespace http://www.w3.org/1999/XHTML to appear inside the htmlExample element. Therefore 4Q99html.xml is permissible because there is one element which (with its children) is well formed, the element appears inside the appropriate element (htmlExample), and the instance document asserts that the element and its content belong to the required namespace. However, the HTML may not actually be valid because nothing in 4Q99html.xml by itself can provide that guarantee. If such a guarantee is required, the value of the processContents attribute should be set to strict (which is in fact the default value). In this case, an XML processor is obliged to obtain the schema associated with the required namespace, and validate the HTML appearing within the htmlExample element. Alternatively, the value of the processContents attribute can be set to lax, in which case the processor will validate the HTML on a can-do basis: it will validate elements and attributes for which it can obtain schema information, but it will not signal errors for those for which it cannot.
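
The three processContents values can be summarized as a small decision function. This Python sketch is an interpretation of the paragraph above, not normative behaviour: the function name and the returned labels are invented, and treating an unobtainable schema under strict as an error is an assumption.

```python
def wildcard_action(process_contents, schema_available):
    """Decide how content matched by an any element is treated.

    Returns 'skip' (well-formedness only), 'validate', or 'error'.
    """
    if process_contents == "skip":
        return "skip"
    if process_contents == "lax":
        # Validate on a can-do basis; missing schema information is not an error.
        return "validate" if schema_available else "skip"
    if process_contents == "strict":
        # Assumption: failing to obtain the schema is reported as an error.
        return "validate" if schema_available else "error"
    raise ValueError(f"unknown processContents value: {process_contents!r}")

for mode in ("skip", "lax", "strict"):
    print(mode, wildcard_action(mode, True), wildcard_action(mode, False))
```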

Namespaces may be used to permit and forbid element content in various ways depending upon the value of the namespace attribute:

Values of Namespace Attribute
namespace='##any' any well-formed XML from any namespace (default)
namespace='##local' any well-formed XML that is not qualified, i.e. not declared to be in a namespace
namespace='##other' any well-formed XML in a namespace different from the namespace of the type being defined
namespace='http://www.w3.org/1999/XHTML ##targetNamespace' any well-formed XML belonging to any namespace in the (whitespace separated) list; ##targetNamespace is shorthand for the target namespace of the enclosing schema
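
The table can be read as a matching rule between the namespace attribute and the namespace of a candidate element. The Python sketch below is one possible reading, with an invented function name. It represents an unqualified element by None, and it assumes that ##other does not admit unqualified elements, which the table does not state explicitly.

```python
def wildcard_allows(namespace_attr, element_ns, target_ns):
    """Check an element's namespace against a wildcard's namespace attribute.

    element_ns is None for an unqualified element; target_ns is the target
    namespace of the enclosing schema.
    """
    if namespace_attr == "##any":
        return True
    if namespace_attr == "##local":
        return element_ns is None
    if namespace_attr == "##other":
        # Assumption: unqualified elements are not admitted by ##other.
        return element_ns is not None and element_ns != target_ns
    # Otherwise the value is a whitespace-separated list of namespaces.
    allowed = {
        target_ns if token == "##targetNamespace" else token
        for token in namespace_attr.split()
    }
    return element_ns in allowed

XHTML = "http://www.w3.org/1999/XHTML"
REPORT_NS = "http://www.example.com/Report"
print(wildcard_allows(XHTML, XHTML, REPORT_NS))
print(wildcard_allows("##other", REPORT_NS, REPORT_NS))
```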

In addition to the any element which enables element content according to namespaces, there is a corresponding anyAttribute element which enables attributes to appear in elements. For example, we can permit any HTML attribute to appear as part of the htmlExample element by adding anyAttribute to its declaration:

Modification to htmlExample Declaration to Allow HTML Attributes
   <element name='htmlExample'>
     <complexType>
       <any namespace='http://www.w3.org/1999/XHTML'
            minOccurs='1' maxOccurs='*'
            processContents='skip'/>
       <anyAttribute namespace='http://www.w3.org/1999/XHTML'/>
     </complexType>
   </element>

This declaration permits an HTML attribute, such as href, to appear in the htmlExample element. For example:

An HTML attribute in the htmlExample Element
....
  <htmlExample xmlns:h="http://www.w3.org/1999/XHTML"
               h:href="http://www.example.com/reports/4Q99.html">
     <!-- HTML markup here -->
  </htmlExample>
....

The namespace attribute in an anyAttribute element can be set to any of the values listed for the any element. But, in contrast to an any element, anyAttribute cannot restrict the number of attributes that may appear in an element.

Ed. Note: Decision pending whether or not anyAttribute has a processContents attribute.

4.6 schemaLocation

XML Schema uses attributes named schemaLocation in three circumstances.

  1. In an instance document, the attribute xsi:schemaLocation provides hints from the author to any reader regarding the location of schema documents. The author warrants that these schema documents are relevant to checking the validity of the material in the document, on a namespace by namespace basis. The presence of these hints does not require the reader to obtain or use the cited schema documents, and the reader is free to use other schemas obtained by any suitable means, or no schema at all.
  2. In a schema, the include element has a required schemaLocation attribute, and it contains a URI reference which must identify a schema document. The effect is to compose a final effective schema by merging in the contents of the schema contained by the referenced schema document.
  3. Also in a schema, the import element has a required namespace attribute and an optional schemaLocation attribute. If present, the schemaLocation attribute is understood in a way which parallels the interpretation of xsi:schemaLocation in (1). Specifically, it provides a hint from the author to any reader regarding the location of a schema document that the author warrants supplies the required components for that namespace. The hint does not require the reader to obtain or use the cited schema document, but some schema components from that namespace are likely to be necessary for successful validation.
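
A reader that chooses to use the xsi:schemaLocation hints first has to split the attribute value into its parts. The Python sketch below assumes that the value is a whitespace-separated list of namespace and location pairs, as in later versions of the specification; this section does not spell out the format, so treat that as an assumption. The function name and the location URL are hypothetical.

```python
def parse_schema_location(value):
    """Split a schemaLocation value into a {namespace: location hint} mapping.

    Assumes whitespace-separated namespace/location pairs.
    """
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens (namespace/location pairs)")
    return dict(zip(tokens[0::2], tokens[1::2]))

# Hypothetical hint: the location URL is invented for illustration.
hints = parse_schema_location(
    "http://www.example.com/Report http://www.example.com/schemas/report.xsd"
)
print(hints)
```

Since the hints are not binding, an application might consult such a mapping only when it has no schema of its own for a given namespace.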

4.7 Conformance

An XML instance document may be processed against a schema to verify whether the rules specified in the schema are honored in the instance. Typically, such processing actually does two things: it checks for conformance to the rules, called "validation," and it also adds supplementary information that is not immediately present in the instance, such as types and default values, called "InfoSet contributions."

The author of an XML instance, such as a particular purchase order, may claim, in the instance itself, that it conforms to the rules in a particular schema. The author does this using the schemaLocation attribute discussed elsewhere. But regardless of whether a schemaLocation attribute is present, an application is free to process the document against any schema. For example, a purchasing application may have the policy of always using a certain purchase order schema, regardless of any schemaLocation values.

Conformance checking can be thought of as proceeding in steps, first checking that the root element of the document instance has the right contents, then checking that each contained element conforms to its description in a schema, and so forth until the entire document is verified. Of course, it is possible to check only a portion of a document either by not starting at the root, or by stopping before the full depth has been reached. Whether a given processor supports such partial checking is optional, but processors are required to report what checking has been done.

To check an element for conformance, the processor first locates the declaration for the element in a schema, and then checks that the targetNamespace attribute in the schema matches the actual namespace URI of the element (or, alternatively, that the schema does not have a targetNamespace attribute and the instance element is not namespace-qualified).

Supposing the namespaces match, the processor then examines the type of the element, either as given by the declaration in the schema, or by an xsi:type attribute in the instance. If the latter, the instance type must be an allowed substitution for the type given in the schema; what is allowed is controlled by the block attribute in the schema. At this same time, default values and other InfoSet contributions are applied.

Next the processor checks the immediate attributes and contents of the element, comparing these against the attributes and contents permitted by the element's type. For example, considering a shipTo element such as found in section 2.1, the processor checks against what is permitted for an Address, since that is the shipTo element's type.

If the element has a simple type, the processor verifies that the element has no attributes or contained elements, and that its character content matches the rules for the simple type. This sometimes involves checking the character sequence against regular expressions or enumerated literals, and sometimes it involves checking that the character sequence represents a value in a permitted range.
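
The kinds of checks applied to character content can be sketched as follows. This Python sketch is an illustration only: check_simple_value and its parameters are invented names, the pattern shown for part numbers is an assumption based on values such as 872-AA, and the quantity range of 1 to 99 is hypothetical.

```python
import re

def check_simple_value(text, pattern=None, enumeration=None,
                       min_inclusive=None, max_inclusive=None, parse=str):
    """Check character content against a few simple-type constraints."""
    # Regular expressions are matched against the entire content.
    if pattern is not None and re.fullmatch(pattern, text) is None:
        return False
    # Enumerations list the only permitted literals.
    if enumeration is not None and text not in enumeration:
        return False
    # Range checks apply to the parsed value, not to the characters.
    try:
        value = parse(text)
    except ValueError:
        return False
    if min_inclusive is not None and value < min_inclusive:
        return False
    if max_inclusive is not None and value > max_inclusive:
        return False
    return True

print(check_simple_value("872-AA", pattern=r"\d{3}-[A-Z]{2}"))
print(check_simple_value("CA", enumeration={"CA", "PA"}))
print(check_simple_value("4", parse=int, min_inclusive=1, max_inclusive=99))
```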

If the element has a complex type, then the processor checks that any required attributes are present and that their values conform to the requirements of their simple types. It also checks that all required subelements are present, and that the sequence of subelements (and any mixed text) matches the content model declared for the complex type. Regarding subelements, schemas can either require exact name matching, permit substitution by an equivalent element or permit substitution by any element allowed by an 'any' particle.

Unless a schema indicates otherwise (as it can for 'any' particles) conformance checking then proceeds one level more deeply by looking at each subelement in turn, according to the process described above.
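
The level-by-level procedure described above can be sketched as a recursive function. The Python sketch below is a toy, not a conforming processor: it ignores namespaces, xsi:type, attributes on complex types, occurrence constraints and wildcards. DECLARATIONS is a hypothetical stand-in for a schema, the child names assumed for shipTo and the instance data are illustrative, and all names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Toy "schema": an element name maps either to a callable that checks
# character content (simple type) or to the list of child element names
# required, once each and in order (complex type).
DECLARATIONS = {
    "shipTo": ["name", "street", "city", "state", "zip"],
    "name": lambda text: len(text) > 0,
    "street": lambda text: len(text) > 0,
    "city": lambda text: len(text) > 0,
    "state": lambda text: len(text) == 2,
    "zip": lambda text: text.isdigit(),
}

def check(element, declarations):
    """Check one element, then descend one level into its children."""
    decl = declarations.get(element.tag)
    if decl is None:
        return [f"<{element.tag}>: no declaration found"]
    if callable(decl):
        # Simple type: no attributes or child elements, content must match.
        if len(element) > 0 or element.attrib:
            return [f"<{element.tag}>: simple type allows no attributes or children"]
        if not decl(element.text or ""):
            return [f"<{element.tag}>: content does not match its simple type"]
        return []
    # Complex type: compare the child sequence with the content model.
    errors = []
    found = [child.tag for child in element]
    if found != decl:
        errors.append(f"<{element.tag}>: expected children {decl}, found {found}")
    for child in element:
        errors.extend(check(child, declarations))
    return errors

GOOD = ET.fromstring(
    "<shipTo><name>Alice Smith</name><street>123 Maple Street</street>"
    "<city>Mill Valley</city><state>CA</state><zip>90952</zip></shipTo>"
)
print(check(GOOD, DECLARATIONS))
```

Starting the recursion at an element other than the root corresponds to the partial checking mentioned earlier in this section.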


Appendices

A. Acknowledgements

Many people have contributed ideas, material and feedback that has improved this document. In particular, the editor would like to acknowledge contributions from David Beech, Paul Biron, Allen Brown, David Cleary, Dan Connolly, Roger Costello, Dave Hollander, John McCarthy, Andrew Layman, Eve Maler, Ashok Malhotra, Noah Mendelsohn, and Henry Thompson.

B. Simple Types & their Facets

The legal values for each simple type can be constrained through the application of one or more facets. Tables B.1a and B.1b list all of XML Schema's built-in simple types and the facets applicable to each type.

Table B.1a. Simple Types & Applicable Facets
 Simple Type           length     minlength  maxlength  pattern    enumeration
 string                y          y          y          y          y
 boolean               -          -          -          y          -
 float                 -          -          -          y          y
 double                -          -          -          y          y
 decimal               -          -          -          y          y
 timeInstant           -          -          -          y          y
 timeDuration          -          -          -          y          y
 recurringInstant      -          -          -          y          y
 binary                y          y          y          y          y
 uri-reference         y          y          y          y          y
 ID                    y          y          y          y          y
 IDREF                 y          y          y          y          y
 ENTITY                y          y          y          y          y
 NOTATION              y          y          y          y          y
 language              y          y          y          y          y
 IDREFS                y          y          y          -          y
 ENTITIES              y          y          y          -          y
 NMTOKEN               y          y          y          y          y
 NMTOKENS              y          y          y          -          y
 Name                  y          y          y          y          y
 QName                 y          y          y          y          y
 NCName                y          y          y          y          y
 integer               -          -          -          y          y
 non-positive-integer  -          -          -          y          y
 negative-integer      -          -          -          y          y
 long                  -          -          -          y          y
 int                   -          -          -          y          y
 short                 -          -          -          y          y
 byte                  -          -          -          y          y
 non-negative-integer  -          -          -          y          y
 unsigned-long         -          -          -          y          y
 unsigned-int          -          -          -          y          y
 unsigned-short        -          -          -          y          y
 unsigned-byte         -          -          -          y          y
 positive-integer      -          -          -          y          y
 date                  -          -          -          y          y
 time                  -          -          -          y          y

The facets listed in Table B.1b apply only to simple types which are ordered. Not all simple types are ordered, and so Table B.1b does not list all of the simple types.

Table B.1b. Simple Types & Applicable Facets
 Simple Type           maxInclusive maxExclusive minInclusive minExclusive precision scale encoding period
 string                y            y            y            y            -         -     -        -
 float                 y            y            y            y            -         -     -        -
 double                y            y            y            y            -         -     -        -
 decimal               y            y            y            y            y         y     -        -
 timeInstant           y            y            y            y            -         -     -        -
 timeDuration          y            y            y            y            -         -     -        -
 recurringInstant      y            y            y            y            -         -     -        y
 binary                -            -            -            -            -         -     y        -
 integer               y            y            y            y            y         y     -        -
 non-positive-integer  y            y            y            y            y         y     -        -
 negative-integer      y            y            y            y            y         y     -        -
 long                  y            y            y            y            y         y     -        -
 int                   y            y            y            y            y         y     -        -
 short                 y            y            y            y            y         y     -        -
 byte                  y            y            y            y            y         y     -        -
 non-negative-integer  y            y            y            y            y         y     -        -
 unsigned-long         y            y            y            y            y         y     -        -
 unsigned-int          y            y            y            y            y         y     -        -
 unsigned-short        y            y            y            y            y         y     -        -
 unsigned-byte         y            y            y            y            y         y     -        -
 positive-integer      y            y            y            y            y         y     -        -
 date                  y            y            y            y            -         -     -        y
 time                  y            y            y            y            -         -     -        y

C. Regular Expressions

XML Schema's <pattern> facet uses a regular expression language that supports Unicode. The language is similar to the regular expression language used in the Perl programming language [bibref to Perl], although expressions are matched against entire lexical representations rather than user-scoped lexical representations such as line and paragraph. For this reason, the expression language does not contain the metacharacters ^ and $, although ^ is used to express exception, e.g. [^0-9]x.

Table C1. Examples of Regular Expressions
 Expression          Match(es)
 Chapter \d          Chapter 0, Chapter 1, Chapter 2 ....
 Chapter\s\d         Chapter followed by a single whitespace character (space, tab, newline, etc.), followed by a single digit
 Chapter\s\w         Chapter followed by a single whitespace character (space, tab, newline, etc.), followed by a word character (XML 1.0 Letter or Digit)
 Espan&#x0303;ola    Espan~ola (where n and tilde are combined into a single character)
 \p{Lu}              any uppercase character, the value of \p{} (e.g. "Lu") is defined by Unicode
 \p{IsGreek}         any Greek character, the 'Is' construction may be applied to any block name (e.g. "Greek") as defined by Unicode
 \P{IsGreek}         any non-Greek character, the 'Is' construction may be applied to any block name (e.g. "Greek") as defined by Unicode
 a*x                 x, ax, aax, aaax ....
 a?x                 ax, x
 a+x                 ax, aax, aaax ....
 (a|b)+x             ax, bx, aax, abx, bax, bbx, aaax, aabx, abax, abbx, baax, babx, bbax, bbbx, aaaax ....
 [abcde]x            ax, bx, cx, dx, ex
 [a-e]x              ax, bx, cx, dx, ex
 [-ae]x              -x, ax, ex
 [ae-]x              ax, ex, -x
 [a-e-[bd]]x         ax, cx, ex
 [^0-9]x             any non-digit character followed by the character x
 \Dx                 any non-digit character followed by the character x
 .x                  any character followed by the character x
 .*abc.*             1x2abc, abc1x2, z3456abchooray ....
 ab{2}x              abbx
 ab{2,4}x            abbx, abbbx, abbbbx
 ab{2,}x             abbx, abbbx, abbbbx ....
 (ab){2}x            ababx
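
Some rows of Table C1 can be tried out with Python's re module, using re.fullmatch to imitate matching against the entire lexical representation. This is only an approximation: Python's regular expression language is not the XML Schema one, so rows that rely on \p{...} or on character class subtraction such as [a-e-[bd]] are left out. The sample strings and the matches helper are invented for this sketch.

```python
import re

# pattern -> (strings expected to match, strings expected not to match)
EXAMPLES = {
    r"Chapter \d": (["Chapter 0", "Chapter 9"], ["Chapter 10", "See Chapter 1"]),
    r"a*x": (["x", "ax", "aaax"], ["xa"]),
    r"(a|b)+x": (["ax", "abx", "bbax"], ["x"]),
    r"[a-e]x": (["ax", "ex"], ["fx"]),
    r"ab{2,4}x": (["abbx", "abbbbx"], ["abx", "abbbbbx"]),
    r".*abc.*": (["1x2abc", "abc1x2"], ["ab c"]),
}

def matches(pattern, text):
    """True if the pattern matches the whole string, not just a part of it."""
    return re.fullmatch(pattern, text) is not None

for pattern, (good, bad) in EXAMPLES.items():
    print(pattern,
          [matches(pattern, s) for s in good],
          [matches(pattern, s) for s in bad])
```

The string "See Chapter 1" is rejected by Chapter \d even though it contains a matching substring, which is the whole-value behaviour that makes the ^ and $ anchors unnecessary.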

 

D. Index

XML Schema Elements:

XML Schema Attributes:

XML Schema's simple types are described in Table 1.

E. Document History

February 25th draft submitted as public draft.

Rewrote "Section X covers .." in Introduction. Added xsd: to simple types in sec. 2 and textual explanation. In sec. 2.3 added URLs to text and Table 1 linking it to Datatypes spec. Corrected URI for xsi:. Added sec. 3.0 explanation for namespace convention change schema. Substantially rewrote sec. 3.5 to fix error, and created a new section 3.6 to cover Abstract Elements. Changed "exact" to "block" in sec 3.6. Fixed error in location of unique/key defns in sec. 4.0, and XPath expressions. Fixed ##local and ##TargetNS, and clarified anyAttb in sec 4.5. Appendix B added URLs to Table B1 linking to Datatypes spec. Fixed general typos.

February 23rd draft published to WG.

Added sections on import, schemaLocation, conformance, wildcard, and type content. Replaced "source" with "base". Modified HTML for W3C compliance. Moved Types of Content section from section 3 to section 2. Updated use of namespaces in instance and schema in all sections, reworked section 2 text to account for these changes. Fixed typos, added/deleted text at suggestion of WG members.

February 16th draft published to WG.

Added regular expression description as an appendix. Substantial re-ordering and rewrite of section 2. Added index as an appendix. Fixed large number of typos and adopted TypeName and elementName naming convention.

February 9th draft published to WG.