Copyright© 2000 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document presents a proposal for explicitly representing data models for XForms, the next generation of Web forms. Apart from other mechanisms described in this document, it is based upon the framework provided by XML Schema. While XML Schemas are used to define XML grammars, the XForms data model is intended to capture the device-independent data model and logic of form-based Web applications.
Although both specifications address different problems, they overlap in the definition of simple datatypes. Therefore, the datatypes defined in this specification are a close match to the datatypes found in XML Schema Part 2: Datatypes [XSchema-2]. In some cases, however, the XForms datatypes differ from the ones in XML Schema, due to different usage scenarios and target audiences. In Appendix A, an [XSLT] filter will be provided for translating the XForms data model into the corresponding syntax defined in the XML Schema specifications.
A later specification will focus on the user interface aspects of XForms.
This is a W3C Working Draft. It is intended for review by W3C members and other interested parties.
This working draft may be updated, replaced or rendered obsolete by other W3C documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current public W3C working drafts can be found at http://www.w3.org/TR.
This document is work in progress and does not imply endorsement by the W3C membership or the HTML Working Group (members only).
This document has been produced as part of the W3C HTML Activity. Further information on XForms can be found at http://www.w3.org/MarkUp/Forms.
Please send detailed comments on this document to [email protected], the public forum for discussion of W3C's work on Web forms.
Web forms are an important part of the Web, allowing fine-tuned interaction between document reader and document author, Web page visitor and Web server, software program and software program, buyer and seller over the Web.
The demand for richer Web forms that allow greater flexibility and richer interaction mechanisms has led to several proposals for the next generation of Web forms. XForms are the result of extended analysis, creating a new platform-independent markup language for user interaction and transactional behavior between a user agent and a remote entity.
XForms are the successor to HTML forms and, as such, are being designed as modules for integration in [XHTML 1.0]. However, the design of XForms allows their use in other XML grammars as well.
Proposals for the next generation of Web forms separate the user interface from the data and logic, allowing different presentations to be used with the same back-end. The form is represented in terms of the following pieces:
An explicit data model defining the form as a composite datatype with constraints on and between form data values
The user interface, expressed as a set of presentation controls that are bound to the data model
The use of XML and Unicode for exchanging form data with servers
In the past it was necessary to compromise the presentation to accommodate the various media with a one-size-fits-all approach. For instance, imagine a form that can be filled out either on a palm-top computer or on a paper print out. Now that XForms address the user interface separately from the data, the same form can include many presentations.
Apart from various problems with previous HTML forms, the design didn't separate the purpose from the presentation of a form. There is a fine distinction between the purpose of a form (i.e. the questions being asked to the user, the selections he may choose from, the logical sequence of decisions) and the presentation of the form (e.g. the visual appearance on a screen).
The purpose of a form can be expressed in various device- and media-specific ways without losing the original intention of the form designer. The presentation, however, loses its richness when trying to accommodate every possible device because the least common denominator has to be used.
XForms separate purpose and presentation with two specifications: "Data model" and "User Interface". The data model in XForms is covered in this document, and a future document will focus on the user interface. The data model allows the abstract structure of a form to be defined without explicitly specifying a user interface. XForms will introduce a new user interface layer for richer user interaction, but the device-independence will be limited to avoid the sacrifice of functionality in and beyond existing HTML forms. Since the data model is device-independent, it is possible to bind other XML grammars to the data model, for instance VoiceML, WML, SMIL or even existing XHTML forms.
Even though Web forms represent a world of their own, they are often only building blocks in larger frameworks, for instance database and workflow applications. The process of moving information from within a database to the inside of an HTML document containing a form and back from the form via submit to the database is one of many usage scenarios where forms are just one component.
The XForms design focuses on improving both the form itself as well as the match to database and workflow applications. XML is the universal data format, and now almost any data can be represented as XML. Since a form is a structured data exchange, XML and forms are a perfect match. In fact, it is possible to simply edit arbitrary XML document instances with XForms in a user agent.
Here is an example of how to use XForms to edit a simple XML document, based on a simplified version of the purchase order example found in "XML Schema Part 0: Primer" [XSchema-0].
<?xml version="1.0"?>
<purchaseOrder>
  <shipTo>
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
</purchaseOrder>
[Issue: Will it be possible for attributes in the instance data to be edited with XForms?]
We would like to allow the contents of the elements inside <shipTo> to be edited as text input fields on a Web page. To do this, we need to construct an XForms data model that maps to this XML. This is simple to do by hand, although advanced users might want to use more powerful tools, such as [XSLT], as explained in Appendix A.
This is what the XForms data model would look like for the preceding purchase order XML document instance:
<group name="purchaseOrder">
  <group name="shipTo">
    <string name="name"/>
    <string name="street"/>
    <string name="city"/>
    <string name="state"/>
    <string name="zip">
      <mask>ddddd</mask>
    </string>
  </group>
</group>
In XForms, the underlying data model that the form represents will be persisted into a generic, well-formed XML document where:
The data model can be embedded in a parent document, giving it form capabilities. Likewise, the instance data can also be embedded in a parent document, serving as form data. In the following example, both the data model and instance data have been embedded in an [XHTML 1.0] document:
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML-XForms 1.0//EN"
  "http://www.w3.org/TR/xhtml-forms1/DTD/xhtml-xforms1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Purchase Order</title>
    <xform xmlns="http://www.w3.org/2000/xforms"
           action="http://www.my.com/cgi-bin/receiver.pl"
           method="postXML" id="po_xform">
      <model>
        <group name="purchaseOrder">
          <group name="shipTo">
            <string name="name"/>
            <string name="street"/>
            <string name="city"/>
            <string name="state"/>
            <string name="zip">
              <mask>ddddd</mask>
            </string>
          </group>
        </group>
      </model>
      <instance>
        <purchaseOrder>
          <shipTo>
            <name>Alice Smith</name>
            <street>123 Maple Street</street>
            <city>Mill Valley</city>
            <state>CA</state>
            <zip>90952</zip>
          </shipTo>
        </purchaseOrder>
      </instance>
    </xform>
  </head>
  <body>
    <h1>Shipping Information</h1>
    <form name="po_xform">
      Name: <input name="purchaseOrder.shipTo.name"/><br/>
      Street: <input name="purchaseOrder.shipTo.street"/><br/>
      City: <input name="purchaseOrder.shipTo.city"/><br/>
      State: <input name="purchaseOrder.shipTo.state"/><br/>
      Zip: <input name="purchaseOrder.shipTo.zip"/><br/>
      <button onclick="submit('po_xform')">Submit</button>
    </form>
  </body>
</html>
Note that standard [XHTML 1.0] form elements have been used for the user interface. XForms makes it possible for different user interface modules, such as SVG and SMIL, to work with the same data model. Appendix B explains how XHTML form user interface elements work with XForms.
When the user changes the entries in the text input fields, the instance data gets updated. When the user hits the submit button, the instance data gets sent over to the server, i.e. the same XML document from the very beginning but with updated element contents. On the server, this returning XML document can be validated with the same Schema as the original XML document. In fact, once valid, the original XML document on the server can be overwritten with the new one returned from the XForm. Hence we can think of XForms as editing XML in the browser.
The starting point is a small set of built-in datatypes. For instance, here is how you could define a string-valued data item named "city":
<string name="city"/>
If the user entered "Boston", this would appear in the submitted data as:
<city>Boston</city>
Each datatype supports several facets which you can use to constrain data items. The name attribute can be used with all datatypes. The value for name attributes must match the syntax for object identifiers in ECMAScript as defined in the [ECMA-262] specification. This precludes the use of ".", ":" or "-" characters in names. This restriction makes it practical to use names within scripts.
Strings have the following facets:
Facet | Description | Default |
---|---|---|
min | minimum length in characters | 0 |
max | maximum length in characters | unlimited |
mask | simple mask, e.g. "ddd-ddd-dddd" | no restriction |
pattern | regular expression, e.g. "\d{3}-\d{3}-\d{4}" | no restriction |
A simple syntax may be used to constrain, or "mask", permissible lexical values at the character level. Most people will find masks easier to understand than regular expression patterns, although this comes at the expense of some expressive power.
A fundamental simplifying design aspect of masks is the absence of escape characters. Characters in the mask always map one-to-one with characters in the corresponding string data.
The mask is a string where certain characters are used to represent given classes of characters, and any remaining characters are literals. The following character classes are defined:
Character Class | Description | Equiv. Regex | Default Representation |
---|---|---|---|
letter | All letter characters | \p{L} | 'l' |
digit | All digit characters | \d | 'd' |
character | All characters allowed in XML names | \c | 'c' |
space | All whitespace characters | \s | 's' |
any | Any Unicode character except newline | . (dot) | '.' |
Note that any given mask can be transformed into an equivalent regular expression.
All non-literal positions in the mask must be filled. This facet is identified by a <mask> child element. Using this basic syntax, a telephone number could be represented like this:
<string name="phone"> <mask>ddd-dddd</mask> </string>
[Issue: We may want to consider allowing a child attribute for simple cases]
Multiple mask facets can be specified. They will be processed in document order, in a logical OR fashion. For instance, allowing multiple types of postal codes might be done like this:
<string name="postalcode">
  <mask>ddddd</mask>       <!-- US ZIP code -->
  <mask>ddddd-dddd</mask>  <!-- US ZIP+4 code -->
  <mask>lldsdll</mask>     <!-- UK postal code -->
  <mask>llddsdll</mask>    <!-- UK postal code -->
  <mask>ldlsdld</mask>     <!-- Canadian postal code -->
</string>
In some instances, users may want to use different representative characters than 'l', 'd', 'c', 's', and '.'. Perhaps one or more of those symbols are needed as a literal part of the string. Perhaps using different characters makes it easier to interoperate with an existing system or makes an [XSLT] transformation simpler. For these cases, the representative characters can be redefined, using an attribute with the same name as the "character class" heading above. For example:
<string name="phone"> <mask digit="#">###-####</mask> </string>
Only single characters can be used when redefining a representative character.
ILLEGAL: <mask digit="foo">...</mask>
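The mask rules described in this section can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the function names `mask_to_regex` and `matches_any_mask` are hypothetical, and because Python's `re` module supports neither `\p{L}` nor `\c`, the "letter" and "character" classes are approximated.

```python
import re

# Default representative characters from the character class table, mapped
# to Python regex fragments. Python's `re` lacks \p{L} and \c, so "letter"
# and "character" use approximations (an assumption of this sketch).
DEFAULT_CLASSES = {
    "letter": ("l", r"[^\W\d_]"),
    "digit": ("d", r"\d"),
    "character": ("c", r"[\w.:\-]"),
    "space": ("s", r"\s"),
    "any": (".", r"."),
}


def mask_to_regex(mask, overrides=None):
    """Translate a mask string into an equivalent regular expression.

    `overrides` maps a class name (e.g. "digit") to a replacement
    representative character (e.g. "#").
    """
    overrides = overrides or {}
    symbol_to_fragment = {}
    for class_name, (symbol, fragment) in DEFAULT_CLASSES.items():
        symbol = overrides.get(class_name, symbol)
        if len(symbol) != 1:
            raise ValueError("representative characters must be single characters")
        symbol_to_fragment[symbol] = fragment
    # Masks have no escape characters, so each mask character maps to exactly
    # one character of data: either a class fragment or an escaped literal.
    return "".join(symbol_to_fragment.get(ch, re.escape(ch)) for ch in mask)


def matches_any_mask(value, masks):
    """Return True if `value` satisfies at least one mask (logical OR)."""
    return any(re.fullmatch(mask_to_regex(m), value) for m in masks)
```

Because every position must be filled, the sketch uses `re.fullmatch` rather than a partial match. Redefining a representative character, as in `mask_to_regex("###-####", {"digit": "#"})`, turns the former symbol 'd' back into a literal.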
Regular expressions are more powerful, but considerably harder to understand. XForms use Perl-like regular expressions modified for Unicode compliance, as described in Appendix E of the XML Schema Datatypes part 2 document [XSchema-2].
Regular expression pattern facets are identified by a <pattern> child element.
[Issue: Currently, the element <pattern> is used for compatibility with XML Schema Datatypes. Is the difference between <mask> and <pattern> too subtle?]
<string name="phone"> <pattern>(\d{3}-)?\d{3}-\d{4}</pattern> </string>
These represent true or false values.
Here is an example of a Boolean data item:
<boolean name="married"/>
This would appear in the submitted data as:
<married>true</married>
or
<married>false</married>
Note this example could also have been represented by an enumeration, which is explained in Section 6.1.
Numeric calculations should be performed on the internal data values (not the presentation values) using decimal arithmetic, except where the resource constraints preclude this.
For example:
<number name="age"/>
When submitted this would appear like:
<age>24</age>
Numbers can be constrained in various ways using the following facets:
Facet | Description | Default |
---|---|---|
min | minimum value | minus infinity |
max | maximum value | plus infinity |
integer | if "true" only integer values are permitted | real numbers |
decimals | how many digits after the decimal point are significant | unlimited |
Here are some examples:
The following is a non-zero positive integer:
<number name="count" min="1" integer="true"/>
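As a rough illustration of how a processor might apply the min, max and integer facets, the following Python sketch checks a lexical value using decimal arithmetic. The function name `check_number` and its parameters are hypothetical, and the decimals facet is deliberately left out because its exact semantics are not pinned down here.

```python
from decimal import Decimal, InvalidOperation


def check_number(text, minimum=None, maximum=None, integer=False):
    """Return True if `text` is a number satisfying the given facets."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        return False
    if not value.is_finite():
        return False  # reject NaN and infinities as data values
    if minimum is not None and value < Decimal(minimum):
        return False
    if maximum is not None and value > Decimal(maximum):
        return False
    if integer and value != value.to_integral_value():
        return False
    return True
```

With the "count" definition above, `check_number("3", minimum="1", integer=True)` is accepted, while "0" and "1.5" are rejected.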
[Issue: Integers are used commonly enough that an abbreviated syntax may be desirable, e.g.: <integer name="quantity"/>]
Monetary values are represented using the <money> element and can be constrained in various ways using the following facets:
Facet | Description | Default |
---|---|---|
min | minimum value, e.g. "0" | minus infinity |
max | maximum value | plus infinity |
decimals | how many digits after the decimal point are significant | unlimited |
currency | a space separated list of 3 letter currency codes, e.g. USD or GBP | unspecified |
Calculations should be carried out using decimal arithmetic. The rounding method used for calculations involving monetary values will be specified after consultation with the financial community, but is likely to be ROUND_HALF_UP.
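To make the rounding rule concrete, here is a minimal Python sketch of half-up rounding with decimal arithmetic. The helper name `round_money` is hypothetical, and since the rounding method is not yet fixed, this only illustrates the likely behavior.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_money(amount, decimals=2):
    """Round a decimal amount to `decimals` places using ROUND_HALF_UP."""
    quantum = Decimal(1).scaleb(-decimals)  # e.g. 0.01 for two decimal places
    return Decimal(amount).quantize(quantum, rounding=ROUND_HALF_UP)
```

Passing amounts as strings, for example `round_money("2.675")`, keeps the value exact, whereas a binary floating-point literal would already carry a representation error before rounding.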
The currency attribute allows you to specify a list of acceptable currencies. The first in the list shall be considered to be the default when the data value doesn't specify the currency.
This is a value restricted to US Dollars:
<money name="price" currency="USD"/>
When submitted this would appear as:
<price>24.25</price>
This is a non-negative value in British Pounds:
<money name="price" currency="GBP" decimals="2" min="0"/>
If the monetary datatype allows more than one currency, the currency for a given data value is represented by the currency attribute, e.g.
<price currency="EUR">26.00</price>
Three letter currency codes are defined in [ISO 4217].
[Issue: Do we want to support a means to indirectly specify facets when there are many money values in a form, all of which accept the same currency?]
Dates are specified in years, months and days following the [ISO 8601] standard for date and time. The format consists of a decimal number denoting the year, optionally followed by the month and day, separated by hyphens. Thus 31st January 2000 is "2000-01-31", while the year 1976 is "1976". Note that months and days are always represented with two digits, with a leading zero for numbers in the range 1 to 9. Here is an example of how you declare a date.
<date name="date"/>
The facets for <date> allow you to constrain the value to be in the past or future, or in a given range. This range can be set to be relative to the present (when the form is filled in), and may be specified in days, months or years.
Facet | Description | Default |
---|---|---|
min | minimum value or "now" | the distant past |
max | maximum value or "now" | the distant future |
precision | "years", "months" or "days" | unconstrained |
The values for min and max can be explicit dates. Alternatively, you can specify the special value "now", which refers to the date the form is submitted. Finally, you can specify values relative to the submission date using positive or negative durations. The syntax for dates and durations is as per the subset of [ISO 8601] specified for XML Schemas for time instants and durations.
The value "now" can be used to restrict dates to be in the past or future. For example a date of birth could be constrained to be in the past:
<date name="birth" max="now"/>
For a credit card expiry date, the value could be constrained to be some time between now and 4 years hence:
<date name="expires" precision="months" min="now" max="+P4Y"/>
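The following Python sketch shows one way the min and max facets might be resolved against the current date. It is an assumption-laden illustration: the names `resolve_bound` and `date_in_range` are hypothetical, only full "YYYY-MM-DD" dates and durations in years, months and days are handled, and a shifted day that does not exist in the target month is clamped to that month's last day.

```python
import calendar
import re
from datetime import date, timedelta

# Signed duration limited to years, months and days, e.g. "+P4Y" or "-P6M".
_DURATION = re.compile(r"([+-]?)P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?")


def resolve_bound(bound, today):
    """Turn a min/max facet value into a concrete date."""
    if bound == "now":
        return today
    match = _DURATION.fullmatch(bound)
    if match and any(match.group(2, 3, 4)):
        sign = -1 if match.group(1) == "-" else 1
        years, months, days = (int(g or 0) for g in match.group(2, 3, 4))
        # Shift by whole months first, clamping the day to the month's length.
        month_index = today.year * 12 + (today.month - 1) + sign * (years * 12 + months)
        year, month0 = divmod(month_index, 12)
        day = min(today.day, calendar.monthrange(year, month0 + 1)[1])
        return date(year, month0 + 1, day) + timedelta(days=sign * days)
    return date.fromisoformat(bound)  # explicit date such as "2000-01-31"


def date_in_range(value, today, minimum=None, maximum=None):
    """Return True if `value` lies within the resolved min/max bounds."""
    if minimum is not None and value < resolve_bound(minimum, today):
        return False
    if maximum is not None and value > resolve_bound(maximum, today):
        return False
    return True
```

For the "expires" example, a call such as `date_in_range(expiry, today, minimum="now", maximum="+P4Y")` accepts dates from today up to four years ahead. As the specification stresses elsewhere, a server would still need to repeat this check itself.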
The lexical format for dates and times to be transmitted to the server is defined in [ISO 8601].
It is recommended that user agents offer date and time pickers which offer date validation and choices from the distant past to the distant future. Small portable devices will likely validate and pick only dates in the range likely for business appointments near the current time; whereas, a full-featured desktop browser, which supports use cases such as historical records search and long-term financial obligations, should offer an extended range of dates. As always, the server must assume that the client has not performed the validation specified in the data model and perform its own validation on the entered date.
The time datatype is used for points in time such as the time of an appointment. It uses the [ISO 8601] subset specified by XML Schemas.
Facet | Description | Default |
---|---|---|
min | minimum value | unconstrained |
max | maximum value | unconstrained |
precision | "hours", "minutes" or "seconds" | unconstrained |
The values for min and max are defined in exactly the same manner as for <date> values.
The following defines a data item called "meeting" which should be a time of day specified in hours for the Eastern standard time zone:
<time name="meeting" zone="EST"/>
The time could be restricted to between 9am and 5pm EST:
<time name="meeting" min="09:00-5" max="17:00-5"/>
[Issue: The time zone is expressed as hours relative to UTC. The format is less convenient than a time zone name e.g. "09:00 EST", but ISO8601 doesn't permit such names. Likewise you need to use a 24-hour clock and can't use "am" or "pm". User agents may provide time pickers with 12-hour clocks and named time zones as a convenience to users.]
This datatype is used for values representing a duration in years, months, hours, minutes, days or seconds. The precision can be specified via a facet:
Facet | Description | Default |
---|---|---|
precision | "years", "months", "days", "hours", "minutes" or "seconds" | unconstrained |
For instance, the duration of a meeting in hours could be specified as:
<duration name="lasting" precision="hours"/>
When submitted a two-hour meeting would be represented as:
<lasting>+P2H</lasting>
[Issue: Months only provide an approximate means to specify duration since individual months vary in length.]
This datatype is used for values representing an absolute Uniform Resource Identifier (URI) as defined in [RFC 2396].
Facet | Description | Default |
---|---|---|
scheme | space separated list of schemes | unconstrained |
Here is how you can define a URI data item:
<uri name="home"/>
When submitted this would look like:
<home>http://www.acme.com</home>
The scheme attribute allows you to restrict URIs to a limited set of schemes. For instance, to restrict a field to email and Web addresses you could write:
<uri name="contact" scheme="mailto http"/>
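A scheme restriction of this kind could be checked along the following lines. This Python sketch is illustrative only: the function name `scheme_allowed` is hypothetical, and the sample addresses used with it are made up.

```python
from urllib.parse import urlsplit


def scheme_allowed(uri, scheme_facet=None):
    """Return True if the URI's scheme is in the space separated facet list."""
    if scheme_facet is None:
        return True  # unconstrained by default
    allowed = {scheme.lower() for scheme in scheme_facet.split()}
    return urlsplit(uri).scheme.lower() in allowed
```

For the "contact" field, `scheme_allowed("mailto:alice@example.org", "mailto http")` passes, while an "ftp" URI is rejected. Scheme names are compared case-insensitively, as [RFC 2396] recommends.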
User agents are encouraged to provide a means to pick or browse addresses, for instance an email address picker. The user interface may allow users to enter relative URIs, but the internal values will always be absolute URIs.
[Issue: Should we split off special datatypes for email addresses and HTTP URLs?]
This is a datatype for use with data appropriate to specific Internet media types. The user agent could use the media type to determine how to prompt the user. For example, an image could be acquired from a digital camera, an image scanner, or a disk file.
Facet | Description | Default |
---|---|---|
<type> | one or more elements listing mime types | unconstrained |
Here is an example for JPEG and PNG images:
<binary name="photo"> <type>image/jpeg</type> <type>image/png</type> </binary>
Binary data could be packaged either in-place as part of XML form data or held separately and referenced from XML. Further work is needed to cover the details.
[Issue: Is there a need for facets to further constrain the data, for instance, to place limits on the size of the data?]
Sometimes it may be useful to provide an explicit default value. A natural way to do this is via a facet on the datatype, e.g.
<string name="color" default="black"/>
For enumerations, it is repetitive to specify the default value once as part of the enumeration and then again as the default. One proposal is to use the <default> element instead of the <value> element for marking up the default value. For example:
<string name="rating" enum="closed">
  <value>excellent</value>
  <value>good</value>
  <default>indifferent</default>
  <value>poor</value>
  <value>terrible</value>
</string>
Another possibility would be to treat default as a Boolean attribute of the <value> element, for example:
<string name="rating" enum="closed">
  <value>excellent</value>
  <value>good</value>
  <value default="true">indifferent</value>
  <value>poor</value>
  <value>terrible</value>
</string>
Within a union, no more than one value may be marked as the default.
If a value is not supplied for a given data item, it will default to <null/>, which represents a null value, and can be used to distinguish values which haven't been filled out from those which the user has set to an empty string.
[Issue: How should <null/> be specified? Should it exist in another namespace?]
[Issue: It may be worth allowing the default attribute on the datatype elements such as <string> and <number>.]
These correspond to data items whose value is fixed. This can be represented by setting the range attribute to "closed" and supplying a single value, for instance:
<string name="color" range="closed"> <value>black</value> </string>
A more concise way to represent this would be to specify the fixed value using a fixed attribute as a facet on the data item, for instance:
<string name="color" fixed="black"/>
The form may require certain values to be filled in before the form is submitted. This would be easy to represent as a facet on a data item, e.g.
<integer name="age" required="true"/>
More generally, fields may or may not be required according to the values of other fields. This could be represented as a Boolean expression, using the same syntax for expressions as used for computed values, for instance:
<string name="spouse" required="status is 'married'"/>
where "status" refers to a field which can be one of "married" or "single".
The form may include values that are computed from the values of other fields. For example, the sum over line items for quantity times unit price, or the amount of tax to be paid on an order. The computed value can be represented as an expression over the values of other data items.
Here is an example:
<currency name="totalPrice" calc="sum(lineItem, quantity * price)"/>
This sums the product of the values of the data items named quantity and price over the repeated group named lineItem. See section 6.5 which provides an example of how lineItem could be represented.
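To illustrate what the calc expression computes, the following Python sketch evaluates the same sum over a list of line items using decimal arithmetic. The function name `total_price` and the sample quantities and prices are made up for illustration. This is not the XForms expression engine.

```python
from decimal import Decimal

# Hypothetical instance data for a repeated lineItem group.
line_items = [
    {"quantity": 1, "price": Decimal("17.15")},
    {"quantity": 2, "price": Decimal("17.45")},
]


def total_price(items):
    """Equivalent of sum(lineItem, quantity * price) over the repeated group."""
    return sum((item["quantity"] * item["price"] for item in items), Decimal("0"))
```

Here `total_price(line_items)` yields 52.05, the sum of 1 x 17.15 and 2 x 17.45.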
Sometimes it will be valuable to be able to use an expression to verify that a group or field has a valid value. The validate facet will include an expression (which may be able to refer to this.value) that returns a Boolean. Our expression language will also permit a callout to a function defined elsewhere (such as in a traditional scripting language) which could also do the validation and return a Boolean. For example:
<string name="postcode" validate="ValidPostCode(this.value)"/>
where ValidPostCode is a Boolean function the form designer has written to verify that the postcode value is ok.
[Issue: this.value is used to access the value of the current data object. Scripts would be able to find their way around the form via functions to traverse the data model. This must be addressed by the expression syntax.]
An enumeration specifies a type and a set of values. Here is an example for a closed set of credit card types:
<string name="card" range="closed"> <value>Visa</value> <value>MasterCard</value> <value>Diners</value> <value>American Express</value> </string>
The range attribute specifies whether the enumeration is "open" or "closed". If "open" the datatype accepts values other than those listed, but the entered value must satisfy any facets specified for the datatype. The range attribute can be used with all of the built-in datatypes and defaults to "open".
You can specify a datatype as a union of types. For example, the following accepts a number or enumerated string:
<union name="weekday">
  <string range="closed">
    <value>Monday</value>
    <value>Tuesday</value>
    <value>Wednesday</value>
    <value>Thursday</value>
    <value>Friday</value>
    <value>Saturday</value>
    <value>Sunday</value>
  </string>
  <number min="1" max="7" integer="true"/>
</union>
Some examples of valid data are:
<weekday>Tuesday</weekday> <weekday>2</weekday>
[Issue: Is the name attribute required for each of the types within a union? Is there a better name than "range"?]
The <group> element is used to define composite datatypes aggregating several data items. Here is an example for a datatype used to represent a customer address:
<group name="customer">
  <string name="fullname"/>
  <string name="street"/>
  <string name="city"/>
  <string name="state"/>
  <string name="zip"/>
  <string name="phone"/>
  <string name="email"/>
  <string name="fax"/>
</group>
Groups can be nested as needed for creating hierarchical datatypes. <group> elements are intended to be treated as "objects" in scripting languages such as ECMAScript. For instance, you could access the street in the above data structure using the syntax: customer.street. If the group named "customer" is a member of a group named "order", the street could be accessed by order.customer.street. This is made possible by restrictions on the characters you can use for names. The names of data items must be unique for the group in which they are defined.
When submitted, the group is represented by an element whose tag is the same as the name of the group, for example:
<customer>
  <fullname>John Smith</fullname>
  <street>21 Filofax avenue</street>
  <city>Peoria</city>
  <state>Illinois</state>
  <zip>02139</zip>
  <phone>1 809 235 6178</phone>
  <email><null/></email>
  <fax><null/></fax>
</customer>
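The mapping from a group to its submitted form can be sketched in Python with the standard `xml.etree.ElementTree` module. This is only an illustration: the helper name `to_element` is hypothetical, nested dictionaries stand in for groups, and `None` stands in for a value that was not supplied.

```python
import xml.etree.ElementTree as ET


def to_element(name, value):
    """Build the submitted element for a data item or group."""
    element = ET.Element(name)
    if isinstance(value, dict):
        # A group becomes an element containing one child per data item.
        for child_name, child_value in value.items():
            element.append(to_element(child_name, child_value))
    elif value is None:
        # An unfilled value is marked with <null/> rather than left empty.
        ET.SubElement(element, "null")
    else:
        element.text = str(value)
    return element


customer = to_element(
    "customer",
    {"fullname": "John Smith", "city": "Peoria", "email": None},
)
```

Calling `ET.tostring(customer, encoding="unicode")` then produces the serialized instance data, with the element names taken directly from the group and data item names.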
The details of postal addresses vary from one country to another. If a Web-based order form is to be used internationally, one solution is to use a lowest common denominator approach, for example, to provide a multi-line text input field. This makes it harder to identify the subfields in the address, for instance, in the US, the street, city, zip code and state.
A more sophisticated approach would be for the form to adjust itself according to the user's locale. This impacts both the user interface and the data model. It is proposed that data models can exploit a variant mechanism that allows a given name to identify one of a set of variants as appropriate to the locale.
<variant name="address">
  <case locale="us">
    <string name="street"/>
    <string name="city"/>
    <string name="state"/>
    <string name="zip"/>
  </case>
  <case locale="uk">
    <string name="street"/>
    <string name="town"/>
    <string name="county"/>
    <string name="postcode"/>
  </case>
</variant>
If an expression used to constrain the data model needs to reference one of the variant fields, the locale appears as part of the name, for instance: address.uk.town.
[Issue: The above description needs to be extended to allow for a default case, perhaps using a default element. How should this appear in references, e.g. address.default.town?]
Normally each datatype definition corresponds to a single data value. You can allow a sequence of data values for the same datatype by specifying values for the minOccurs and maxOccurs attributes. These can be used with all the built-in types and with the <group>, <union> and <variant> elements. The default value for these attributes is "1". The special value "*" represents an unlimited repetition.
The data model for an order form will typically allow for a number of line items that detail the products and quantities being ordered. Setting maxOccurs to "*" will allow the form to have one or more such line items:
<group name="lineItem" maxOccurs="*">
  <integer name="quantity"/>
  <string name="product"/>
  <string name="description"/>
  <currency name="price"/>
</group>
When submitted, the data for this would look something like:
<lineItem>
  <quantity>1</quantity>
  <product>51645A</product>
  <description>Black HP InkJet cartridge</description>
  <price>17.15</price>
</lineItem>
<lineItem>
  <quantity>2</quantity>
  <product>51641A</product>
  <description>Tri-color HP InkJet cartridge</description>
  <price>17.45</price>
</lineItem>
...
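The occurrence constraints on a repeated item such as lineItem could be checked roughly as follows. The function name `occurrences_ok` is hypothetical, and the sketch assumes that both attributes default to "1" and that "*" is only meaningful for maxOccurs.

```python
def occurrences_ok(count, min_occurs="1", max_occurs="1"):
    """Return True if `count` occurrences satisfy minOccurs and maxOccurs."""
    if count < int(min_occurs):
        return False
    # "*" stands for unlimited repetition.
    return max_occurs == "*" or count <= int(max_occurs)
```

For the lineItem group, `occurrences_ok(2, max_occurs="*")` holds, while zero line items fails the default minOccurs of "1".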
[Issue: Would our target audience prefer an explicit <array> element? What about allowing expressions for the values for minOccurs and maxOccurs so that these can be sensitive to values entered into the form?]
Many forms applications are likely to have overlapping needs for the datatypes they use. One way to share definitions is to maintain a library of common datatypes for pasting into data models. Another would be to provide a means to import such datatypes by reference. An important consideration is a means to re-use server-side code for processing subforms when they use the same datatype.
Using a reference to a remote definition of a datatype could cause delays while the definition is retrieved. This suggests a combined approach whereby the shared definition is pasted into the data model, but an attribute is used to give a globally unique identifier which is the same for all occurrences of the shared datatype. For example:
<string name="isbn" pattern="\d*-\d*-\d*-\d*" uri="http://www.isbn.org/isbn"/>
The attribute's value, a URI reference, is the namespace name identifying the namespace. The namespace name, to serve its intended purpose, should have the characteristics of uniqueness and persistence. It is not a goal that it be directly usable for retrieval of a schema (if any exists). An example of a syntax that is designed with these goals in mind is that for Uniform Resource Names [RFC 2141]. However, it should be noted that ordinary URLs can be managed in such a way as to achieve these same goals.
[Issue: If the same datatype is used multiple times in the same data model, it might become tiresome to keep repeating the same definition over and over. Is it worth providing a short cut for this situation?]
[Issue: This is at a very early stage and much work remains to be done. In particular, methods to work with the data model in a reflective way are absent from this revision.]
Constraints on and between data values are easy to represent using expressions. The proposed syntax is close to that of ECMAScript expressions, with modifications to avoid the need for escaping characters such as "<" and "&" which occur in the names of certain ECMAScript operators [ECMA-262]. As a result, the expression syntax uses English words instead. A built-in set of functions would be provided for summing expressions over arrays, and common financial calculations and string operations. You could also call out to functions defined directly in ECMAScript or other scripting languages, for example, Microsoft's VBScript.
The <group> elements define scopes for names. Names in the local scope belonging to the same <group> element can be used directly. Each group implicitly defines names for the parent group ("parent()") and the top-most group ("root()"). These names are reserved and cannot be used for data items.
In order to avoid long and brittle sequences of parental names (parent().parent().parent()...), groups are used to delineate the boundaries of scope. A name is within scope if it is within the current group or within a group that is an ancestor of the current group. This allows a form designer to create a form where local references prevail (i.e., a group's internal field references won't change when that group is inserted into a new context). Yet it still allows a group access to data in its parent group, which is very useful for accessing common fields. This is also consistent with block-structured programming languages. For example:
<group name="outer">
  <money name="bar"/>
  <group name="inner">
    <money name="foo" calc="bar"/>
  </group>
</group>
Here, since no bar exists in the inner scope, the next level, the outer scope, is examined, where a bar is found and used in the expression.
Note that only the immediate scoping context is searched for names on the right of periods. For example, Summary.Name will find the nearest ancestor scoping context for Summary, even if it is not the current context. However, Name will then be located only in the context managed by Summary. In other words, the Name field in the root group will not be found, even though it is an ancestral scoping context of Summary and Summary does not have a Name field.
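The two lookup rules described above can be sketched as follows. This is a minimal illustration, not an implementation defined by this draft: the Group class, its members dictionary and the method names are assumptions introduced for the example.

```python
class Group:
    """A scoping context: a named group holding fields and child groups."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.members = {}  # name -> field value or child Group
        if parent is not None:
            parent.members[name] = self

    def lookup_first(self, name):
        """Search this group, then each ancestor in turn, for a name."""
        scope = self
        while scope is not None:
            if name in scope.members:
                return scope.members[name]
            scope = scope.parent
        raise KeyError(name)

    def resolve(self, dotted):
        """Resolve a reference such as 'bar' or 'Summary.Name'."""
        first, *rest = dotted.split(".")
        target = self.lookup_first(first)
        for part in rest:
            # Names to the right of a period: immediate context only,
            # with no fallback to ancestor groups.
            if not isinstance(target, Group) or part not in target.members:
                raise KeyError(part)
            target = target.members[part]
        return target


# Hypothetical data model mirroring the examples in the text.
root = Group("root")
root.members["Name"] = "root-level name"
summary = Group("Summary", root)   # Summary has no Name field
outer = Group("outer", root)
outer.members["bar"] = 10
inner = Group("inner", outer)
```

With this setup, inner.resolve("bar") finds the field in the outer group, while inner.resolve("Summary.Name") raises KeyError even though a Name field exists in the root group.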
In some situations, expressions may need to make a remote procedure call to a server, for instance to verify that a given value is acceptable based upon a database lookup operation. This can be handled in the scripting language and doesn't impact the data modeling language as such.
This is a partial BNF for expressions:
expr       ::= identifier
           ::= number
           ::= 'string' | "string"
           ::= function
           ::= (expr)
           ::= prefix expr
           ::= expr infix expr
           ::= expr is [not] within(expr, expr)
identifier ::= (this | ((root() | parent() | name) [[expr]])) [.((parent() | name)[[expr]] | function)]*
function   ::= name ([arg [, arg]*] )
arg        ::= expr
prefix     ::= - | +
infix      ::= and | or | xor | + | - | * | /
infix      ::= is [not] [above | below]
White space is permitted between tokens but not before the "[" character of an array index. Likewise, white space is not permitted before or after the "." character in compound identifiers. White space is required between adjacent alphanumeric tokens, e.g. white space is required between the operator "not" and the name of a function.
These are the built-in functions. You can also call functions defined in scripts if the user agent supports scripting.
"within(x,y,z)". Yet another would be to define within as a method on all XForm data objects, e.g. "x.within(y,z)".
"sum(lineItem, price*quantity)".
"average(lineItem, price)".
Note that, following ECMAScript, the + operator can be used to concatenate strings. The following have been adapted from the ECMAScript specification ([ECMA-262] edition 3), but as functions rather than as methods on strings. The following is an incomplete list of the string functions.
fromCharCodes() method in ECMAScript.
[Issue: Should these be in an optional financial library?]
"apr(35000, 269.50, 30 * 12)" returns 0.085 (or 8.5%) for the annual interest rate on a loan of $35,000 being repaid at $269.50 per month over 30 years.
"cterm(.02, 200, 100)" returns 35 as the required period for $100 invested at 2% to grow to $200.
"fv(100, .075 / 12, 10 * 12)" returns 17793.03* as the amount present after paying $100 a month for 10 years in an account bearing an annual interest of 7.5%.
"ipmt(30000, .085, 295.50, 7, 3)" returns 624.88 as the amount of interest paid starting in July (month 7) for 3 months on a loan of $30,000.00 at an annual interest rate of 8.5% being repaid at a rate of $295.50 per month.
"npv(0.15, 100000, 120000, 130000, 140000, 50000)" returns 368075.16 as the net present value of an investment projected to generate $100,000, $120,000, $130,000, $140,000 and $50,000 over each of the next five years, where the rate is 15% per annum.
"pmt(30000.00, .085 / 12, 12 * 12)" returns 333.01 as the monthly payment for a loan of $30,000, borrowed at a yearly interest rate of 8.5%, repayable over 12 years (144 months).
"ppmt(30000, .085, 295.50, 7, 3)" returns 261.62 as the amount of principal paid starting in July (month 7) for 3 months on a loan of $30,000 at an annual interest rate of 8.5%, being repaid at $295.50 per month. The annual interest rate is used in the function because of the need to calculate a range within the entire year.
"pv(1000, .08 / 12, 5 * 12)" returns 49318.43 as the present value of $1000.00 invested at 8% for 5 years.
"rate(110, 100, 1)" returns 0.10 as what the rate of interest must be for an investment of $100 to grow to $110 if invested for 1 term.
"term(475, .05, 1500)" returns 3 as the number of months for an investment of $475, deposited at the end of each period into an account bearing 5% compound interest, to grow to $1500.00.
It is very common for people who are not experienced programmers to be confused by the results of numeric calculations such as division by 10. They are not aware that computers use binary arithmetic and that this method can produce results that differ from decimal arithmetic, the method we were taught in school.
The proposal is for XForms expressions to conform to [ANSI X3-274] for arithmetic. This features full-function decimal floating point arithmetic with integers as a seamless subset. It preserves mantissa length, e.g. 1.20 x 2 gives 2.40 (not 2.4), and provides for an exact representation as expected for values such as 0.1 (not 1/16 + 1/32 + 1/256 + 1/512 + 1/4096 + 1/8192 + … ).
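The two properties just described can be observed with any decimal floating point implementation. The sketch below uses Python's decimal module purely as a stand-in; it is an assumption of this example that its behaviour matches what [ANSI X3-274] requires in these two cases, and the variable names are illustrative.

```python
from decimal import Decimal

# Mantissa length is preserved: the trailing zero of 1.20 survives.
product = Decimal("1.20") * 2

# 0.1 is held exactly, so ten of them sum to exactly 1.
decimal_sum_is_exact = sum([Decimal("0.1")] * 10) == Decimal("1")

# Binary floating point only approximates 0.1, so the same sum drifts.
binary_sum_is_exact = sum([0.1] * 10) == 1.0

print(product)               # 2.40
print(decimal_sum_is_exact)  # True
print(binary_sum_is_exact)   # False
```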
This standard has been used heavily over 16 years by IBM and its customers and is based on feedback from users, mathematicians, data processing experts, and financial experts. The overhead in processing time is expected to be negligible in practice, with a fixed code overhead of about 2 to 4 K bytes.
Further discussion on the choice of decimal arithmetic is in Appendix C.
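Several of the worked financial examples given earlier in this section can be cross-checked against the standard compound-interest formulas. The sketch below is one possible reading of those functions, assuming end-of-period payments and taking the argument order from the examples. The function bodies are assumptions and not definitions from this draft. apr, ipmt and ppmt are omitted because the examples alone do not pin down their conventions. The sketch uses binary floating point for brevity, so results agree with the quoted values only to within rounding.

```python
import math


def cterm(interest, future_value, present_value):
    """Periods needed for present_value to grow to future_value."""
    return math.log(future_value / present_value) / math.log(1 + interest)


def fv(payment, interest, periods):
    """Future value of equal end-of-period payments."""
    return payment * ((1 + interest) ** periods - 1) / interest


def pmt(principal, interest, periods):
    """Level payment that repays principal over the given periods."""
    return principal * interest / (1 - (1 + interest) ** -periods)


def pv(payment, interest, periods):
    """Present value of equal end-of-period payments."""
    return payment * (1 - (1 + interest) ** -periods) / interest


def npv(interest, *cash_flows):
    """Net present value of end-of-period cash flows."""
    return sum(flow / (1 + interest) ** (i + 1)
               for i, flow in enumerate(cash_flows))


def rate(future_value, present_value, periods):
    """Per-period interest needed to grow present_value to future_value."""
    return (future_value / present_value) ** (1 / periods) - 1


def term(payment, interest, future_value):
    """Periods of end-of-period deposits needed to reach future_value."""
    return (math.log(1 + future_value * interest / payment)
            / math.log(1 + interest))
```

For instance, round(cterm(.02, 200, 100)) gives 35 and round(term(475, .05, 1500)) gives 3, in line with the figures quoted for those two examples.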
This requirements document was written with the participation of the members of the Forms Subgroup of the W3C HTML Working Group (listed in alphabetical order):
This is a placeholder for a section where a future revision will explain how the syntax proposed in this document for data modeling is mapped into the concrete syntax for XML Schemas.
[Issue: This section contains preliminary information.]
One important aspect of XForms is providing a clean upgrade path for authors using Web forms today. The design of XForms is flexible in allowing various user interface technologies to work with a common data model. This appendix describes a simple binding between the XForms 1.0 data model and [XHTML 1.0] form elements.
The first step is to bind the <form> element to the appropriate <xform> element that defines the data model for the form. The value of the name attribute of the <form> element should match the id attribute on the <xform> element.
The next step is to ensure that the name attribute on each form control matches the full name of the corresponding field. XForms defines a hierarchical naming scheme using the name attribute for each level in the hierarchy. The full name of a field is given by the sequence of names from top to bottom of the hierarchy. In the example, the field for the street is identified by purchaseOrder.shipTo.street.
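The naming rule can be sketched as a walk over the data model markup that joins the name attributes from the top of the hierarchy down to each field. The model fragment below is a hypothetical one written for this example (only the name purchaseOrder.shipTo.street comes from the text), and the full_names helper is an assumption, not something defined by XForms.

```python
import xml.etree.ElementTree as ET

# Hypothetical data model fragment, assumed for illustration only.
MODEL = """
<group name="purchaseOrder">
  <group name="shipTo">
    <string name="name"/>
    <string name="street"/>
  </group>
  <money name="total"/>
</group>
"""


def full_names(element, prefix=""):
    """Yield the dotted full name of every leaf element under element."""
    name = element.get("name")
    path = f"{prefix}.{name}" if prefix else name
    children = list(element)
    if not children:
        # No children: treat the element as a field.
        yield path
    for child in children:
        yield from full_names(child, path)


names = list(full_names(ET.fromstring(MODEL)))
print(names)
```

Each resulting string is the value that the name attribute of the corresponding XHTML form control would need to carry.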
Specifying a name attribute on any of the following XHTML form elements binds that element to an element in the instance data with the same fully qualified name:
<input>
<select>
<textarea>
<object>
The initial value of an XHTML form control is the value of the bound instance data. The <button> element can be used to submit the form via a call to a script function, supplying the id value for the XForm as an argument. The script can use the DOM to traverse the markup defining the data model to build the XML representation of the data.
Contributed by Mike Cowlishaw, IBM
Why is decimal arithmetic the right thing to use?
-- Many common decimal quantities (for example, 0.1) cannot be represented exactly in a binary floating point representation; binary floating point is a lossy encoding of decimal numbers. This leads to anomalies, even after a single operation, for example:
Division: 1/0.1 ==> 10 (correct)
Remainder: 1%0.1 ==> 0.0999999999999995 (incorrect, it should be 0)
Anomalies build up even more rapidly under repeated operations.
-- These anomalies are visible even if rounding is applied (the latter result, for example, rounds to 0.1 instead of 0).
-- The anomalies lead to discrepancies between the results obtained 'manually' and those obtained by computer. This makes it difficult and expensive to verify algorithms and test software.
-- As a result, customers complain about unexpected results and there are significantly increased costs in application development, service calls, and maintenance.
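The division and remainder anomaly described above can be reproduced in most languages that use binary floating point. The sketch below does so in Python and contrasts it with the same remainder computed in decimal arithmetic; the use of Python's decimal module as the decimal implementation, and the variable names, are assumptions of this example.

```python
from decimal import Decimal

# Binary floating point: the quotient looks right, the remainder does not.
binary_quotient = 1 / 0.1
binary_remainder = 1 % 0.1

# Decimal arithmetic: 0.1 divides 1 exactly, leaving no remainder.
decimal_remainder = Decimal("1") % Decimal("0.1")

print(binary_quotient)             # 10.0
print(binary_remainder)            # very close to 0.1 rather than 0
print(round(binary_remainder, 1))  # rounding does not hide the anomaly
print(decimal_remainder == 0)      # True
```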
Issues of performance:
Binary floating point is often carried out in hardware, and is in that case faster than decimal arithmetic which on most computers is implemented in software. However:
-- Few commercial applications spend much time carrying out arithmetic; measurements in an interpreted environment using decimal arithmetic suggest that typically about 8% of execution time is spent in arithmetic.
-- Conversions between decimal and string (readable) forms are simpler and more efficient than those between binary and string.
-- The bulk of numeric data stored in databases is held in decimal form (to avoid the anomalies described above); converting these to and from a binary form is inefficient as well as lossy.
-- In practice, the 'default' decimal precision (9 digits) can be very efficiently implemented using 32-bit integers for mantissa and exponent. This implementation would be especially attractive on 'small' devices.
-- In addition, all widely used microprocessor, mini, and mainframe computers (other than RISC machines) provide native decimal instructions or decimal adjustment operations.