W3C

Voice Browser Call Control: CCXML Version 1.0

W3C Working Draft 21st February 2002

This version:
http://www.w3.org/TR/2002/WD-ccxml-20020221
Latest version:
http://www.w3.org/TR/ccxml
Previous version:
(none, this is the first version)
Editor:
RJ Auburn, Voxeo <[email protected]>

Abstract

This document describes CCXML, the Call Control eXtensible Markup Language. CCXML is designed to provide telephony call control support for VoiceXML or other dialog systems, and has been designed to complement and integrate with a VoiceXML system. For this reason you will find many references to VoiceXML's capabilities and limitations, as well as details on how VoiceXML and CCXML can be integrated. Note, however, that the two languages are separate: an implementation of either language is not required to implement the other. For example, CCXML could be integrated with a more traditional IVR system, and VoiceXML could be integrated with some other call control system.

Status of this Document

This specification describes markup designed to provide telephony call control support for VoiceXML or other dialog systems. CCXML is far from complete. This draft is meant to give people access to an early version of the language so that they can understand the direction in which the working group is moving. This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only). This document is for public review, and comments and discussion are welcomed on the public mailing list <[email protected]>. To subscribe, send an email to <www-voice-request@w3.org> with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe). The archive for the list is accessible online.

This is a W3C Working Draft for review by W3C Members and other interested parties. It is a draft document and may be updated, replaced or made obsolete by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress".

This is work in progress and does not imply endorsement by the W3C membership. A list of current W3C Recommendations and other technical documents, including Working Drafts and Notes, can be found at http://www.w3.org/TR/.


Section 1: Introduction

This document describes CCXML, or Call Control eXtensible Markup Language. CCXML is designed to provide telephony call control support for VoiceXML or other dialog systems. CCXML is intended to be an adjunct language to be used with a VoiceXML or other dialog implementation platform.

CCXML has been designed to complement and integrate with a VoiceXML system. For this reason you will find many references to VoiceXML's capabilities and limitations, as well as details on how VoiceXML and CCXML can be integrated. Note, however, that the two languages are separate: an implementation of either language is not required to implement the other. For example, CCXML could be integrated with a more traditional IVR system, and VoiceXML could be integrated with some other call control system.

CCXML is far from complete. This draft is meant to give people access to an early version of the language so that they can understand the direction in which the working group is moving.

Please note that this is a large document, and there is still much to improve in it. However, it is more important to get widespread discussion of CCXML's ideas than to spend eternity refining it. A number of known issues and improvements are listed in Appendix C; it holds ideas that should be incorporated but were not included in this draft because of time constraints.

Section 2: Background

This section outlines some of the capabilities required to support the scenarios described in the Introduction, and then describes the architectural model needed to make them easy to manipulate. There are a number of needed features that VoiceXML currently can't supply:

We originally intended to see whether we could add new tags to VoiceXML to support these features. However, we repeatedly ran into conflicts between VoiceXML's design goals and the requirements listed here.

Most of these conflicts came from trying to reconcile two very different event models. Event generation and handling in VoiceXML are focused on specific user interface transactions (e.g., within a given field, the filled, noinput, nomatch, or help events are thrown and handled). Events from telephony networks or external networked entities, however, are non-transactional in nature; they can occur at any time, regardless of the current state of VoiceXML interpretation, and may demand immediate attention. We could either abandon VoiceXML's admirably simple single-threaded programming model, or delay event servicing until the VoiceXML program explicitly asked to handle such events. Rather than make either of these bad choices, we move all call control functions out of VoiceXML into an accompanying CCXML program. VoiceXML can thus focus on being effective for voice dialogs, while CCXML tackles the very different problems described above.

An implementation of VoiceXML is not required to support CCXML. Such implementations may choose to support proprietary methods of call control, and still be deemed compliant with the W3C VoiceXML Recommendation.

An implementation of CCXML is not required to support VoiceXML. Such implementations may choose not to support interactive dialogs at all, or may do it in a proprietary way, and still be deemed compliant with the W3C CCXML Recommendation.

Section 3: Concepts

Properly adding advanced telephony features to VoiceXML entails adding not just a new telephone model, but new call management and event processing, as well. We'll briefly cover these three areas here. Since new event processing has the largest impact on our architecture, let's examine it first.

Section 3.1: Event Processing

Telephone applications need to receive and process large numbers of events. These events arrive from outside the program itself, either from the underlying telephony platform or from other programs. They must be processed quickly, since any delay may be noticeable to the user.

Unfortunately, VoiceXML is not designed for this task. Its event model assumes only "synchronous" events, which occur only when the program is in certain states. (For example, a nomatch event can only occur within a field.) The language would have to be augmented to properly handle events that can arrive at any time. (It's true that telephone.disconnect can occur at any time, but this is something of a special case.) Further, it would be difficult to reconcile the need for immediate processing with VoiceXML's single-threaded model.

Instead, we move all asynchronous handling to a CCXML program. Every executing VoiceXML program has an associated CCXML program. It runs on a thread separate from the VoiceXML dialog. When an event is delivered to a user's voice session (now a coupling of an active VoiceXML dialog and its CCXML program), it is appended to the CCXML program's queue of events. The CCXML program spends almost all its time removing the event at the queue's head, processing the event, and then removing the next event. Meanwhile, the VoiceXML dialog can interact with the user, undisturbed by the incoming flow. Most VoiceXML programs never need to consider event processing at all.

Writing a CCXML program, then, mainly involves writing the handlers which are executed when certain events arrive. There are mechanisms for passing information back and forth between VoiceXML and CCXML, but the important points are that CCXML:

Section 3.2: Conferencing

VoiceXML currently has very few options for conferencing. Only the transfer tag will establish any kind of connection between two voice callers; calls involving more than two parties, or other advanced applications, are out of reach. VoiceXML is largely a language for voice dialogs, and is silent on many telephony issues.

We would like a far more powerful and flexible method of creating calls. To do so, we introduce a few ideas here:

Section 3.3: Call Management

VoiceXML also fails to offer sufficient control over the underlying telephone network. Although VoiceXML offers the disconnect directive to hang up an existing phone call, there are a number of desirable features to which the VoiceXML programmer cannot get access. They include:

We want to expose the phone network in a provider-neutral way, while still offering enough methods and information to be useful. To that end, we have adopted the JavaSoft JCP call model for telephony access. The topic is discussed in more depth below.

Section 4: Object Model and Programming Contract

There are several identifiable objects in the CCXML universe, all of which have a globally unique identifier, probably in the form of a URI.

There also exist voice dialogs, which do not need to be globally-uniquely identified. Whenever active, they must interact with a two-way audio stream. Today, this audio always comes from a call leg. In the future, it might also come from a conference object.

CCXML can manipulate these language objects through tags defined in the CCXML language. With some of them, it can also send or receive the asynchronous events mentioned above.

[Figure: diagram showing call legs and voice dialogs]

CCXML directly touches call legs, conferences, and audio connections with various tags in the language, such as accept, createconference, and join. CCXML may also receive events from all three, in the case of line signaling, line-status informational messages, or error and failure scenarios. Call legs, conferences, and audio connections do not accept events; CCXML must use the built-in tags to direct them.

CCXML can start and kill voice dialogs using language tags. It can receive events from voice dialogs; these may be standardized events such as dialog.exit, or application-specific ones. It would also be useful for CCXML to be able to send an event to a voice dialog. We describe below how VoiceXML might be slightly modified to synchronously process arbitrary external events, much as it processes recognition events today. CCXML's only guaranteed control over a dialog is start and kill. Even if VoiceXML does process these external events, anything beyond lifetime management must be specifically supported by the voice dialog.

Finally, CCXML can create other CCXML programs using a language tag. That is the only guaranteed control one CCXML program ever wields over another. Any other interaction takes place through the event mechanism. CCXML programs can both send and receive events between one another.

Section 5: CCXML Elements Listing

<accept> Accept an incoming phone call
<authenticate> Authenticate a CCXML script
<assign> Assign a variable a value
<ccxml> Start a CCXML program
<createcall> Make an outbound call
<createccxml> Create a new CCXML program
<createconference> Create a multi-party audio conference
<destroyconference> Destroy a multi-party audio conference
<dialogstart> Start a VoiceXML program's execution
<dialogterminate> Stop a VoiceXML program's execution
<disconnect> Terminate a phone connection
<else> Used in <if> statements
<elseif> Used in <if> statements
<eventhandler> Block of event-processing statements
<exit> Ends execution of the CCXML program
<fetch> Pre-load a CCXML file
<goto> Move execution to a new location
<if> Conditional logic
<join> Connect two audio sources
<reject> Reject an incoming phone call
<send> Generate an event
<submit> Like goto, but submits variables
<transition> A single event-processor block
<unjoin> Disconnect two audio sources
<var> Declare a variable

 

Section 6: Document Control Flow and Execution

The execution of a CCXML document begins with the ccxml tag at the top and then proceeds tag by tag in document order. The flow of the execution can be changed with the help of if, elseif, else, submit, and goto tags. However, most of a CCXML program's execution will take place within an eventhandler, which processes a stream of incoming events. As with VoiceXML applications, a CCXML application can consist of multiple CCXML documents, traversed by use of goto or submit.

While CCXML executes its own programs on its own thread, it does have some influence over a VoiceXML program's execution. In the simplest form, CCXML can launch a new VoiceXML interpreter or kill an existing one. When launching a new interpreter, CCXML can also choose between destroying the currently executing dialog and "freezing" it while the new one executes. When the new dialog completes execution, the old one resumes. In this way, CCXML can interrupt an ongoing interaction with a new dialog, based on information just received, and resume the previous interaction once the interruption completes.

There are two modes in which to start a new VoiceXML program. Launching a new dialog while "freezing" the current one is appropriate for giving the user brief interruptions while the user is engaged in a different task. It's also possible simply to destroy the current VoiceXML program and start a new one. That is more appropriate if the current task is "hold music", present only to pass the time until an important event arrives.

It's easy to think of the caller as listening to a stack of VoiceXML interpreters. Only the topmost interpreter is ever active at any one time. If a new VoiceXML dialog is started while freezing the current dialog, we've pushed something onto the stack. If we just want to stop the current dialog and start a different one, we've replaced the topmost element with a new one.

CCXML starts up a VoiceXML interpreter with the <dialogstart> tag. The interpreter can die (1) when it receives an uncaught telephone.disconnect event, either because a <disconnect> tag was executed or because the caller hung up and the leg ceased to exist; (2) when the VoiceXML script encounters an <exit> tag; or (3) when CCXML explicitly terminates it. When CCXML starts a new VoiceXML interpreter on a call leg, the VoiceXML session becomes active with the new page, and any variables are restored as if a VoiceXML <goto> had been used to arrive at the new document.

CCXML programs will often want to query the user, and take action based on the response. One way VoiceXML can communicate information back to CCXML is through the exit tag. The exit tag includes an attribute called namelist, which indicates "variable names to be returned to interpreter context". When <dialogstart> is used, the CCXML program can be considered the interpreter context. When the dialog terminates, it returns information to the launching CCXML.

Here are the details of the CCXML tags for control flow and execution.

<ccxml>

This is the parent tag of a CCXML document and encloses the entire CCXML script in a document.

The definition of ccxml is:

<ccxml>
Attribute Name Details
version The version of this CCXML document (required). The initial version number is 1.0.
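To show how the pieces fit together, here is a minimal sketch of a complete CCXML document built only from elements and attributes defined in this draft. It is illustrative rather than normative: the URL, the eventhandler id and the variable name dialogid are hypothetical, and we assume that a transition with no state attribute matches regardless of the eventhandler's state, which this draft does not yet spell out.

```xml
<?xml version="1.0"?>
<ccxml version="1.0">
  <!-- Top-level statements run once, in document order. -->
  <var name="dialogid"/>

  <!-- Most execution then takes place inside the eventhandler. -->
  <eventhandler id="main">
    <!-- Start a VoiceXML dialog on the call leg that raised the event. -->
    <transition event="connection.CONNECTION_CONNECTED">
      <dialogstart src="http://www.example.com/welcome.vxml" name="dialogid"/>
    </transition>

    <!-- End the CCXML program once the dialog has finished. -->
    <transition event="dialog.exit">
      <exit/>
    </transition>
  </eventhandler>
</ccxml>
```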

<dialogstart>

The <dialogstart> tag is used to launch a dialog and associate it with an indicated call leg. The tag includes a URL reference to a start document for a dialog manager. The launched dialog runs on a separate thread of execution (a thread, process, or task, depending on the operating system implementation), not on CCXML's thread of execution. Once the dialog is started, the CCXML script can immediately go back to handling incoming events.

<dialogstart> can be used to initiate a VoiceXML dialog on a user call:

<dialogstart callid="call-leg-id" src="start-doc" type="mime-type" name="ECMAscript-variable"/>

The definition of <dialogstart> is:

<dialogstart>
Attribute Name Details
callid is the call leg object with which to associate the new dialog (e.g., a VoiceXML interpreter instance). The default value of the "callid" attribute is "_event.source", meaning that the dialog instance will be launched for whatever call object generated the previous event (typically a connection.CONNECTION_CONNECTED event).
src is a character string identifying the URL of the dialog document that the dialog interpreter should load and begin executing upon startup.
type specifies the MIME type of the document, and therefore which dialog interpreter is actually loaded. A MIME type of "application/xml+vxml" associates the start document with a VoiceXML interpreter instance. A MIME type of "audio/wav" associates the start document with a dialog manager that merely plays wave files. If omitted, the type attribute defaults to "application/xml+vxml", but the CCXML interpreter is free to guess the correct MIME type, e.g., by examining the filename extension of the start-doc URL.
name is the name of an ECMAScript variable in the CCXML script that will receive the session name identifying the launched dialog interpreter instance. The CCXML script can later pass this session name to <dialogterminate>.

The <dialogstart> tag does not block execution of the CCXML script until the started dialog completes. The CCXML script regains control immediately. If the dialog cannot be started for any reason, a dialog.notstarted error event is thrown.

When the dialog completes, a "dialog.exit" event is posted to the event queue of the CCXML instance that launched it. If the dialog is a VoiceXML instance, the dialog will complete either because:

Note that this event may have many properties because the VoiceXML language specification allows data values ("expr" and a namelist) to be passed during exit.
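The sketch below shows <dialogstart> used inside transitions, along with the dialog.notstarted and dialog.exit events it can give rise to. It assumes an enclosing eventhandler and a document-level <var name="holdmusic"/>; the URL and the variable name are hypothetical. callid is written out only to show its documented default, _event.source.

```xml
<transition event="connection.CONNECTION_CONNECTED">
  <!-- callid="_event.source" is the documented default: attach the
       dialog to whichever call leg raised this event. -->
  <dialogstart callid="_event.source"
               src="http://www.example.com/holdmusic.vxml"
               type="application/xml+vxml"
               name="holdmusic"/>
  <!-- Control returns here at once; the dialog runs on its own thread. -->
</transition>

<!-- Posted if the dialog could not be started. -->
<transition event="dialog.notstarted">
  <exit expr="'could not start dialog'"/>
</transition>

<!-- Posted to this CCXML instance when the dialog completes. -->
<transition event="dialog.exit">
  <!-- ... inspect the values returned by the dialog here ... -->
</transition>
```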

For reference, the definition of the VoiceXML <exit> tag has the following syntax:

<exit expr="ECMAscript-expression" namelist="namelist" />

where:

<dialogterminate>

A CCXML script may decide that it wants to destroy a current dialog, e.g., a VoiceXML interpreter instance. This is accomplished using the <dialogterminate> tag in a CCXML script.

When the CCXML interpreter encounters a <dialogterminate> tag, it sends a terminate event to the specified dialog.

When a VoiceXML interpreter instance receives the terminate event, it reacts as if it encountered an <exit /> tag in the VoiceXML script:

Note that a dialogterminate request cannot be honored if the launched dialog is not yet in a ready state to receive it and act upon it. If this happens, a "dialog.wrongstate" event will be thrown.

The definition of <dialogterminate> is:

<dialogterminate>
Attribute Name Details
sessionid is a short character string identifying the dialog, e.g., a VoiceXML interpreter instance. This sessionid was generated by <dialogstart> and stored in the ECMAScript variable identified by the "name" attribute.
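The following sketch applies <dialogterminate> to the hold-music scenario described earlier. Several details are assumptions rather than settled parts of this draft: the eventhandler is taken to declare statevariable="state"; operator.available is a hypothetical application event; sessionid is assumed to be evaluated as an ECMAScript expression, so that it receives the session name stored in holdmusic by an earlier <dialogstart>; operatordialog is a hypothetical variable declared with <var>; and a terminated dialog is assumed to post dialog.exit like any other exiting dialog.

```xml
<!-- operator.available is a hypothetical application-defined event. -->
<transition state="onhold" event="operator.available">
  <!-- Assumption: sessionid is an ECMAScript expression, so this passes
       the session name that dialogstart stored in holdmusic. -->
  <dialogterminate sessionid="holdmusic"/>
  <assign name="state" expr="'stopping'"/>
</transition>

<!-- The dialog was not ready to act on the request and is still running. -->
<transition state="stopping" event="dialog.wrongstate">
  <assign name="state" expr="'onhold'"/>
  <!-- ... arrange to try again later ... -->
</transition>

<!-- Assumption: the terminated dialog reports back with dialog.exit. -->
<transition state="stopping" event="dialog.exit">
  <dialogstart src="http://www.example.com/operator.vxml" name="operatordialog"/>
  <assign name="state" expr="'withoperator'"/>
</transition>
```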

<if>, <else>, <elseif>

The tags if, else, and elseif work just as they do inside VoiceXML. They are straightforward program-control elements. The else and elseif elements can appear optionally within an if element.

The definition of <if> is:

<if>
Attribute Name Details
cond An ECMAScript expression which can be evaluated to true or false.

The definition of <elseif> is:

<elseif>
Attribute Name Details
cond An ECMAScript expression which can be evaluated to true or false.

The definition of <else> is:

<else>
Attribute Name Details
none none
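Because these elements follow VoiceXML, <elseif> and <else> are empty elements that split the content of <if> into branches rather than containers of their own. The sketch below assumes a document-level counter retries and a variable dialogid, both hypothetical, as are the URLs. A "<" inside a cond attribute must be written as the entity &lt; to keep the document well-formed XML.

```xml
<transition event="dialog.exit">
  <if cond="retries &lt; 2">
    <!-- Replay the main menu. -->
    <assign name="retries" expr="retries + 1"/>
    <dialogstart src="http://www.example.com/menu.vxml" name="dialogid"/>
  <elseif cond="retries == 2"/>
    <!-- Last attempt: fall back to a simpler dialog. -->
    <assign name="retries" expr="retries + 1"/>
    <dialogstart src="http://www.example.com/simple.vxml" name="dialogid"/>
  <else/>
    <!-- Give up. -->
    <exit expr="'too many retries'"/>
  </if>
</transition>
```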

<fetch>, <goto>, <submit>

The fetch tag, together with goto, is used to transfer execution to a different CCXML document in a multi-document CCXML application. In VoiceXML this is performed via the goto tag, which blocks execution until the target page is loaded and ready to execute. CCXML programs, however, can be substantially more timing-sensitive than VoiceXML ones. All event handling would have to be suspended until a blocking goto had found, loaded, and parsed the target page. The time required could be hundreds of milliseconds or even seconds, far too long to ignore important incoming events.

Instead, we break the functions of VoiceXML's goto command into two parts. The fetch operator tells the interpreter to find, load, and parse a given page of CCXML. Execution returns from the tag immediately, and the CCXML interpreter can continue on while the browser works to get the target document ready for execution. When the fetch completes, the interpreter receives an event. It can then issue a goto to immediately start executing the now-fetched page.

Below is a small snippet of code from the CCXML program's event handler. We kick off a fetch operation, and continue on to assign to a state variable, and maybe handle more events. Eventually, the fetch completes, the CCXML interpreter services the event, and we perform the goto.

<fetch next="http://www.web.com/control.ccxml"/>
<--control continues here->
<assign name="state_var" expr="fetch_wait"/>
</transition>

<!-- ……… -->

<transition state="fetch_wait" event="ccxml.fetch.done"/>
<goto next="http://www.web.com/control.ccxml"/>
</transition>

There's no requirement to goto a previously fetched page, but fetching a page and never jumping to it wastes work.

The definition of <fetch> is:

<fetch>
Attribute Name Details
next URI of the CCXML document.

The definition of <goto> is:

<goto>
Attribute Name Details
next URI of the CCXML document.

The submit tag works just like goto. The difference is that the programmer can also include a namelist attribute, which indicates the data to send with the document request, and a method attribute, which selects the HTTP method.

The definition of <submit> is:

<submit>
Attribute Name Details
next URI of the CCXML document.
namelist A list of variables within the current scope. These variable names, plus their associated values, will be included in submit's request to the indicated URI.
method Equal to "get" or "post". Indicates the HTTP method to use. Defaults to "get".
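A sketch of <submit> carrying two variables to a hypothetical routing page. We assume namelist is a space-separated list of variable names, as it is in VoiceXML; this draft does not yet state the separator.

```xml
<var name="caller" expr="'unknown'"/>
<var name="retries" expr="0"/>

<!-- Request route.ccxml, sending caller and retries (names plus values)
     with an HTTP POST instead of the default GET. -->
<submit next="http://www.example.com/route.ccxml"
        namelist="caller retries"
        method="post"/>
```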

<createccxml>

The createccxml tag is used to create another CCXML program instance. The new CCXML program has no relation to its creator once spawned, and has a wholly separate lifetime and address space.

The definition of <createccxml> is:

<createccxml>
Attribute Name Details
src URI of the CCXML program to execute.
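A one-line sketch, with a hypothetical URL. Since the definition above gives <createccxml> no attribute for capturing the new program's identifier, the creator keeps no handle on the child; any further interaction, if needed, has to go through events, as described in Section 4.

```xml
<!-- Start an independent CCXML program with its own lifetime and
     address space; this program retains no handle on it. -->
<createccxml src="http://www.example.com/monitor.ccxml"/>
```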

<exit>

Finally, the exit tag ends execution of the CCXML program. All pending events are discarded, and there is no way to restart CCXML execution.

The definition of <exit> is:

<exit>
Attribute Name Details
expr A return expression (e.g. 0 or 'oops!'). This attribute is optional; if omitted, a value of zero is assumed.
namelist Variable names to be returned to interpreter context. The default is to return no variables; this means the interpreter context will receive an empty ECMAScript object.

A CCXML script executing the <exit> tag will generate a ccxml.exit event. This event terminates the script, and cannot be caught or ignored.
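A sketch of a final transition that ends the program and hands a result back. It assumes an eventhandler whose state variable has reached a hypothetical state named done; the variable outcome is likewise illustrative.

```xml
<transition state="done" event="dialog.exit">
  <var name="outcome" expr="'transferred'"/>
  <!-- Returns 0 and the variable outcome to the interpreter context.
       Execution stops here and any queued events are discarded. -->
  <exit expr="0" namelist="outcome"/>
</transition>
```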

Section 7: Variables and Expressions

<assign>, <var>

CCXML variables and expressions are similar to those in VoiceXML. All expressions must be valid ECMAScript expressions, assignable to variables with valid ECMAScript names.

Variables are declared explicitly by the var element, or implicitly by the assign element:

<var name="sessionid" />
<assign name="currentstate" expr="initial"/>

Variables declared without an explicit initial value are initialized to the ECMAScript undefined value. Variables must be declared before being used.

Variables are declared using the var element and assigned a value either at declaration time or later through the assign element. The attributes of both var and assign are:

The definition of <var> is:

<var>
Attribute Name Details
name Indicates the name of the variable. It should be a valid ECMAScript variable name.
expr Indicates the initial value of the variable. It can be any valid ECMAScript expression.

The definition of <assign> is:

<assign>
Attribute Name Details
name Indicates the name of the variable. It should be a valid ECMAScript variable name.
expr Indicates the new value of the variable. It can be any valid ECMAScript expression.

In addition to the variables defined within CCXML documents, there is a set of variables associated with events. This is a new concept in CCXML: events carry variables with them. (Such a system of rich events has been suggested for VoiceXML.) For example, when a failed print job is signaled via the external.printfailure event type, and the event is known as evt, the name of the job is accessible like this:

<assign name="printjob" expr="evt.jobname"/>

Section 8: Event Handling

This section covers the eventhandler, transition, and send elements.

Section 8.1: Event Concepts

Event handling is one of the most powerful features of CCXML, and one way in which it differs markedly from VoiceXML. CCXML events have little to do with VoiceXML's nomatch and filled events. VoiceXML events only occur within well-defined contexts, and are generated by the task at hand; it's impossible to receive a recognition-related event except within a field inside a form. CCXML events, however, can be delivered at any time and from a variety of sources. This flexible event-handling mechanism is essential for many telephony applications.

Every CCXML program can receive events. These might be in response to a previous action by the CCXML program (e.g., an outbound-call request, which generates an event to indicate when the call goes off-hook), or on the initiative of some external source (e.g., an incoming call to be answered). Events can be generated by the telephony system (as in the two previous examples), by other CCXML programs (which emit events via the send tag), or by the recipient CCXML interpreter itself (by sending an event to itself). There is a core set of telephony-related events (derived from the JTAPI/JCP/JCC event model; see JSR 021 and JSR 034 on the Sun web site) that a browser must generate. Implementors are otherwise free to define and generate any platform-specific events they like. In addition, programmers may use the send tag to send arbitrary events to external destinations, may send arbitrary events to CCXML scripts from internal or external sources, and may specify transition handlers to handle these events.

Each running CCXML interpreter has a queue, into which it places incoming events, and sorts them by arrival time. A CCXML programmer can only gain access to these queued events by using an eventhandler.

eventhandler elements are interpreted by an implicit Event Handler Interpretation Algorithm (EHIA). The EHIA's main loop removes the first event from the CCXML program's event queue, and then selects from the set of transition elements contained in the eventhandler. A transition element always indicates a set of accepted event types, and may indicate a further ECMAScript conditional expression to be evaluated. The selected transition is the transition element that accepts the type of the just-removed event, has a satisfied conditional expression (or none at all), and appears first in the eventhandler in document order.

Once a transition is selected, the tags inside it are executed in document order. At most one transition is chosen. If no transition element meets all the criteria, none is selected and the event is simply dropped; the EHIA loop then starts over, removing the next event from the head of the queue. Any events that arrive while an event is being processed are placed on the queue for later. If the event queue is empty when the EHIA wants to process an event, execution pauses until an event arrives.
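The sketch below illustrates these selection rules. Both transitions accept dialog.exit, so document order decides: the first is selected only when its cond is true, and the second handles every other dialog.exit. An event of any other type matches neither transition and is dropped. retries, dialogid and the URL are hypothetical and assumed to be declared elsewhere in the document.

```xml
<eventhandler id="router">
  <!-- Examined first: selected only if retries has reached 3. -->
  <transition event="dialog.exit" cond="retries &gt;= 3">
    <exit expr="'too many retries'"/>
  </transition>

  <!-- Examined second: selected for any other dialog.exit. -->
  <transition event="dialog.exit">
    <assign name="retries" expr="retries + 1"/>
    <dialogstart src="http://www.example.com/retry.vxml" name="dialogid"/>
  </transition>

  <!-- No transition accepts other event types, so the EHIA drops them. -->
</eventhandler>
```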

There may be default transitions that handle common events, such as an incoming call or a VoiceXML dialog exiting. It's not yet clear which events should have default handlers.

Code inside an eventhandler should run "instantaneously", without blocking execution. This will allow events to be rapidly processed.

The only way for CCXML execution to leave an eventhandler is via an explicit goto inside a transition.

An eventhandler may also declare a state variable. An eventhandler's state variable is used just as any other variable, but with a scope limited to the eventhandler and its elements. The eventhandler can be considered, and programmed as, a finite-state-automaton, with the state variable indicating the automaton's current state or node, and the transition elements, driven by incoming events, moving the machine to a new state and creating side effects along the way. We expect, but CCXML does not require, that each transition will contain an assignment to the state variable to drive the automaton to its next state.
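The state-machine style might look like the following sketch, which walks a call through three states and ends by leaving the eventhandler through goto, the only way out. The draft does not define the state variable's initial value, so we assume that a transition without a state attribute matches in any state; the URLs and the variable menu (assumed declared at document level) are hypothetical.

```xml
<eventhandler id="call" statevariable="state">
  <!-- Assumption: no state attribute means "match in any state". -->
  <transition event="connection.CONNECTION_CONNECTED">
    <dialogstart src="http://www.example.com/menu.vxml" name="menu"/>
    <assign name="state" expr="'in_menu'"/>
  </transition>

  <!-- The menu dialog finished: start pre-loading the next page. -->
  <transition state="in_menu" event="dialog.exit">
    <fetch next="http://www.example.com/next.ccxml"/>
    <assign name="state" expr="'fetch_wait'"/>
  </transition>

  <!-- The page is ready: leave the eventhandler with goto. -->
  <transition state="fetch_wait" event="ccxml.fetch.done">
    <goto next="http://www.example.com/next.ccxml"/>
  </transition>
</eventhandler>
```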

<eventhandler>

The definition of <eventhandler> is:

<eventhandler>
Attribute Name Details
id The name of the eventhandler.
statevariable Indicates the name of the <eventhandler>'s state variable.

Eventhandler elements can contain only transition elements.

<transition>

The definition of <transition> is:

<transition>
Attribute Name Details
state Indicates the current possible state(s) of the eventhandler.
event A string that indicates a matching event type. Event types are dot-separated strings of arbitrary length. The character * is a wildcard, and will match zero or more characters of the processed-event's type name.
cond An ECMAScript expression that evaluates to true or false. If this attribute is present, it must evaluate to true for the transition to be selected.
name The name of a variable that is set to the received event and associated variables
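Because the first matching transition in document order wins, wildcard patterns are best placed after more specific ones. The sketch below shows transitions inside an eventhandler, using event names from this draft; the omitted state attribute is taken, as an assumption, to mean any state.

```xml
<!-- Specific handler first. -->
<transition event="connection.CONNECTION_CONNECTED" name="evt">
  <!-- ... start a dialog on the call leg that raised evt ... -->
</transition>

<!-- Any other event type beginning with "connection." -->
<transition event="connection.*" name="evt">
  <!-- ... log or ignore other connection events ... -->
</transition>

<!-- "*" matches every type name; as the last transition it acts as a
     catch-all, so no event is dropped silently. -->
<transition event="*" name="evt">
  <!-- ... -->
</transition>
```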

It's now clear how CCXML can receive, process, and respond to external events. But it's not obvious how CCXML can generate those events directly. For that, we need the send tag. When the interpreter encounters a send, it will generate and deliver an event of the specified type to an indicated CCXML program. The set of addressable CCXML programs is up to the browser implementor, but at least all CCXML programs running within the same browser must be able to send events to each other.

<send>

The definition of <send> is:

<send>
Attribute Name Details
event A string that indicates the type of event being generated. The event type may include alphanumeric characters and the "." (dot) character. The first character may not be a dot or a digit. Event type names are case-insensitive.
target A handle for the target CCXML program. This should be a globally-unique identifier, in a still-to-be-specified URI format. It is valid for a CCXML program to send an event to itself.
name The unique identifier for the generated event is written to the variable indicated by name. If not present, the event's identifier is dropped.
delay The send tag returns immediately, but the event is not dispatched until delay milliseconds have elapsed. Timers, which are useful for a wide variety of programming tasks, can be implemented using this attribute.
namelist A list of variables to be included along with the event.
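Setting delay gives a simple timer: the program sends itself an event that is delivered later. In the sketch, app.timeout is a hypothetical event type (it satisfies the naming rules above), and selfid is a hypothetical variable assumed to hold this program's own identifier, since the URI format for target is still unspecified; we also assume target is evaluated as an expression and that timerid has been declared with <var>.

```xml
<transition event="connection.CONNECTION_CONNECTED">
  <!-- Ask for app.timeout to reach our own queue after 30000 ms. -->
  <send event="app.timeout" target="selfid" name="timerid" delay="30000"/>
  <!-- send has already returned; processing continues at once. -->
</transition>

<!-- Delivered only once the delay has elapsed. -->
<transition event="app.timeout">
  <exit expr="'timed out'"/>
</transition>
```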

Section 8.2: Proposed Enhancement to VoiceXML Event Handling

One of CCXML's main reasons to exist is VoiceXML's inability to process asynchronous events. When CCXML receives one of these events, it can process it silently or start a new VoiceXML dialog. However, on some occasions we would like to bind event processing tightly to the user interface. In these cases the VoiceXML author must be aware of event processing, but gains very fine-grained control over how the incoming event is dealt with, and VoiceXML must be enriched with a notion of synchronous event handling.

We would like VoiceXML to be as ignorant as possible of the events being tossed back and forth. For generic interruptible and hold-music applications, <dialogstart> is enough. For interfaces which might change in response to arriving events (e.g., an option becomes 'unghosted' when a processing job completes), we need VoiceXML to be event-aware.

We can make VoiceXML process event input in much the same way as it handles speech input today. A new kind of form item can check the interpreter's pending synchronous events, and try to process the first one, if any. The form item selects one from its set of valid event handlers, and executes the code there. If there are no incoming synchronous events for some period of time, the form item can generate a NOINPUT, and process it appropriately.

This kind of event handling is much less ambitious than the asynchronous variety described above. However, processing synchronous events inside the dialog can be a useful approach in three circumstances: the dialog must change according to incoming events, changing the VoiceXML code is acceptable, and <dialogstart> is too blunt an instrument.

Section 8.3: Event Transmission

Many useful applications are only possible when events can be passed to an arbitrary CCXML program, regardless of which browser is executing it. For example, a user could be notified about a failed remote print job, or transferred to a remote operator who has just become available. Both scenarios involve events generated at a remote location and delivered over a network to a CCXML program. CCXML browsers must be able to dispatch such events correctly.

The protocol for communicating these events is currently undetermined. It will eventually be specified so that interoperability between CCXML implementations can be guaranteed. (HTTP and SIP are two candidate network protocols.) Encoding options include SOAP and XForms. Whatever we decide, the chosen protocol must allow for the following:

Section 8.4: Standard Events

Both CCXML and VoiceXML can generate arbitrarily-named events. While any event name is possible, there is a small set of well-known events that are generated as a matter of course, and which any telephone application should handle. There are two kinds of these events: telephony events, which abstract interaction with the phone network, and language events, which are generated to keep the VoiceXML and CCXML interaction going smoothly.

The first and larger set exists so that CCXML can keep abreast of what is happening on the telephone network. VoiceXML and CCXML are designed to be neutral with respect to the telephony layer. The chosen event set must therefore be very generic and able to describe the behavior of a wide variety of systems (e.g., Q.931, SS7, and VoIP).

For now, we've chosen the JavaSoft JCP event model. It abstracts away many of the differences between the networks mentioned above, but does not offer much functionality. There may be better models than JCP, but it fits the bill for what we need at the moment: a small and easily-understood call model so we can write concrete sample programs. The ultimate choice of a standard model can be made later.

JCP was designed to be a cross-platform high-level event set to describe as generic a phone model as possible. The JCP/JCC call model consists of Addresses, Calls, Connections, and Providers. A Connection models the relationship between a single Call and a single Address. Neither value can change over the lifetime of the Call. A Call will maintain an association with zero or more Connections. It will have one Connection for every party in the call (so a two-party Call will have two Connections, and a four-party conference Call will have four).

All Calls are created by a Provider. A Call maintains an association with its Provider, which cannot change over the Call's lifetime. Calls, Connections, and Providers all generate events, which are emitted when the object transitions to a new state.

Note that the JCC model is designed for endpoint devices only. Here is a brief description of the events. The descriptions and names are borrowed directly from the JavaSoft documentation.

Call Class

The Call class emits events:

Connection Class

The Connection class emits events:

Provider Class

The Provider class emits events:

Standard Events

The second, smaller set exists only for language purposes. It consists of (a) events sent from the VoiceXML interpreter and (b) events sent from the CCXML interpreter. The standard events are described here.

This list will be extended as we discover new standard events.

Section 9: Telephony Operations and Resources

This section covers the accept, createcall, createconference, destroyconference, disconnect, join, reject, and unjoin tags.

The primary goal of CCXML is to provide call control throughout the duration of a call. Call control includes handling incoming calls, placing outgoing calls, bridging (or conferencing) multiple call legs, and ultimately disconnecting calls.

Section 9.1: Incoming Calls

One of the events a CCXML document can receive is a call setup indication. A CCXML program that handles incoming calls will contain an eventhandler transition for this event. The transition accepts or rejects the incoming call using the accept or reject CCXML tag, and the underlying platform signals the telephony system accordingly. Once the call is accepted, the CCXML document may initiate interactive VoiceXML sessions with the incoming caller. It may also perform other telephony operations, such as placing outgoing calls or joining calls.

<accept>

The <accept> tag will accept and connect an incoming phone call.

The definition of <accept> is:

<accept>
Attribute Name Details
callid The id of the incoming call leg that should be accepted. The callid attribute is optional. If it is omitted, the interpreter accepts the call leg indicated in the current event being processed.
An accepted incoming call will result in the generation of a connection.CONNECTION_CONNECTED event.
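The following minimal sketch accepts a call and then waits for it to connect. Like Example 4 below, it assumes that the alerting event exposes the call leg's id as evt.callid. The state names and the welcome.vxml dialog are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0">
  <assign name="currentstate" expr="'initial'"/>
  <eventhandler statevariable="currentstate">
    <transition state="'initial'"
    event="connection.CONNECTION_ALERTING" name="evt">
      <!-- remember the incoming leg (evt.callid is an assumption) -->
      <assign name="in_callid" expr="evt.callid"/>
      <accept callid="in_callid"/>
      <assign name="currentstate" expr="'answering'"/>
    </transition>

    <transition state="'answering'"
    event="connection.CONNECTION_CONNECTED" name="evt">
      <!-- the accept has completed: start talking to the caller -->
      <dialogstart sessionid="in_callid" src="'welcome.vxml'"/>
      <assign name="currentstate" expr="'connected'"/>
    </transition>
  </eventhandler>
</ccxml>
```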

<reject>

The <reject> tag will reject an incoming phone call.

The definition of <reject> is:

<reject>
Attribute Name Details
callid The id of the incoming call leg that should be rejected. The callid attribute is optional. If it is omitted, the interpreter rejects the call leg indicated in the current event being processed.
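A program can screen calls by choosing between the two tags. In this sketch, evt.callerid (read the same way in Example 3) is compared against a hypothetical blocked_number variable. Both tags omit callid, so each one acts on the leg named in the alerting event.

```xml
<transition state="'initial'"
event="connection.CONNECTION_ALERTING" name="evt">
  <if cond="evt.callerid == blocked_number">
    <!-- refuse the call; the caller is never connected -->
    <reject/>
  <else/>
    <!-- answer the call; connection.CONNECTION_CONNECTED follows -->
    <accept/>
  </if>
</transition>
```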

Section 9.2: Outgoing Calls

<createcall>

A CCXML document can attempt to place an outgoing call with the createcall tag. This tag will instruct the platform to attempt to place an outgoing call to the indicated party. The tag is non-blocking, and the CCXML document is immediately free to perform other tasks, such as initiating VoiceXML interaction with another caller. The CCXML interpreter will receive an asynchronous event when the call attempt is completed. An event handler transition block can handle this event and perform further call control, such as conferencing. If the call was successfully placed, the transition block can also initiate a VoiceXML interaction with the called party.

The definition of <createcall> is:

<createcall>
Attribute Name Details
dest The target of the outbound telephone call. This should be a telephone URL, as described in http://www.ietf.org/rfc/rfc2806.txt
name The name of the variable that receives the outbound call leg's callid.
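The sketch below shows the non-blocking pattern described above. The transition that issues createcall finishes at once, and a later transition reacts to the completion event. The user.place_call trigger is hypothetical, and the callee.vxml dialog is borrowed from Example 1. The failure branch assumes the JCC event name connection.CONNECTION_FAILED, which this draft does not list.

```xml
<transition state="'idle'" event="user.place_call" name="evt">
  <!-- dest is an RFC 2806 telephone URL; out_callid receives the leg's id -->
  <createcall dest="'tel:+1-650-555-1212'" name="out_callid"/>
  <!-- createcall does not block: change state and wait for an event -->
  <assign name="currentstate" expr="'calling'"/>
</transition>

<transition state="'calling'"
event="connection.CONNECTION_CONNECTED" name="evt">
  <!-- the called party answered -->
  <dialogstart sessionid="out_callid" src="'callee.vxml'"/>
  <assign name="currentstate" expr="'talking'"/>
</transition>

<transition state="'calling'"
event="connection.CONNECTION_FAILED" name="evt">
  <!-- the attempt did not succeed; return to idle -->
  <assign name="currentstate" expr="'idle'"/>
</transition>
```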

Section 9.3: Conferencing

The CCXML createconference tag can be used to create a conferencing object. This object is used to create multi-party conferences. Once created, the CCXML document can add call legs to the conference by using the join CCXML tag. Call legs can be removed by issuing the unjoin tag. A conference object can be destroyed using the destroyconference CCXML tag. Asynchronous events will be sent to the CCXML document upon completion of each of these operations.

<createconference>

The definition of <createconference> is:

<createconference>
Attribute Name Details
name The name of the variable that receives the conference identifier. A conference identifier should be globally unique, so that conferences can be uniquely addressed and possibly connected to. It should be in URI format.

<destroyconference>

The definition of <destroyconference> is:

<destroyconference>
Attribute Name Details
conferenceid The identifier for the conference that should be destroyed.

<join>

The definition of <join> is:

<join>
Attribute Name Details
sessionid1 The first of two audio "endpoints" to join. This id can refer to a conference object, or a call leg. The interpreter will set up the underlying audio paths appropriately.
sessionid2 The second of two audio "endpoints" to join. This id can refer to a conference object or a call leg. The interpreter will set up the underlying audio paths appropriately. If both endpoints are call legs, they will be bridged. If one is a conference and the other a call leg, then the call leg will be added to the conference. If both are conferences, the browser implementor may choose either to fail or to create an audio connection between the two conference objects.
duplex Equal to "half" or "full". A full-duplex connection allows both audio endpoints to hear each other. A half-duplex connection allows party 1 (sessionid1) to hear party 2 (sessionid2), but not vice versa. If the attribute is not supplied, the join defaults to a full-duplex connection.

<unjoin>

The definition of <unjoin> is:

<unjoin>
Attribute Name Details
sessionid1 Id for the first audio endpoint we want to separate.
sessionid2 Id for the second audio endpoint we want to separate. If both endpoints are call legs, they will be unbridged. If one is a conference and the other a call leg, then the call leg will be removed from the conference. See the description of join above for more on the two-conference case.
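The four conferencing tags can be combined as in the sketch below. It assumes that leg_a and leg_b already hold the ids of two connected call legs. The application events user.start_conf and user.end_conf are hypothetical. Because unjoin completes asynchronously, a more careful program would wait for the unjoin completion events (not named in this draft) before destroying the conference.

```xml
<transition state="'idle'" event="user.start_conf" name="evt">
  <!-- conf_id receives the new conference's identifier -->
  <createconference name="conf_id"/>
  <!-- conference + call leg: each leg is added, full duplex by default -->
  <join sessionid1="conf_id" sessionid2="leg_a"/>
  <join sessionid1="conf_id" sessionid2="leg_b"/>
  <assign name="currentstate" expr="'conferencing'"/>
</transition>

<transition state="'conferencing'" event="user.end_conf" name="evt">
  <!-- conference + call leg: each leg is removed -->
  <unjoin sessionid1="conf_id" sessionid2="leg_a"/>
  <unjoin sessionid1="conf_id" sessionid2="leg_b"/>
  <!-- release the conference object -->
  <destroyconference conferenceid="conf_id"/>
  <assign name="currentstate" expr="'idle'"/>
</transition>
```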

Section 9.4: Bridging

In many situations, the CCXML document may want to create a simple, two party call. In these cases, a separate conference is not needed. Instead, the CCXML document can use the join CCXML tag to bridge the two parties. The unjoin tag can be used to unbridge a previously bridged call. Asynchronous events will be sent to the CCXML document upon completion of both the join and unjoin operations.

Note that it is possible to transition from a bridged two party call to a multi-party conference by first using the createconference CCXML tag to create a conference object and then adding all of the parties to this conference. It is not possible to join a third call leg to an existing bridged call in order to create a 3-way call. If leg A is bridged to leg B, and then subsequently bridged to leg C, the result will be two 2-party calls, one between A and B and one between A and C.
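The upgrade from a bridge to a three-way conference might look like the sketch below. It assumes that leg_a and leg_b are bridged, that leg_c has just been placed with createcall, and that the connect event carries the leg's id in evt.callid. The draft does not say whether the existing bridge must be removed first. The sketch unjoins it as a precaution, which is an assumption.

```xml
<transition state="'bridged'"
event="connection.CONNECTION_CONNECTED" name="evt">
  <assign name="leg_c" expr="evt.callid"/>
  <!-- joining leg_c to leg_a would only create a second 2-party call,
       so move all three legs into a new conference instead -->
  <unjoin sessionid1="leg_a" sessionid2="leg_b"/>
  <createconference name="conf_id"/>
  <join sessionid1="conf_id" sessionid2="leg_a"/>
  <join sessionid1="conf_id" sessionid2="leg_b"/>
  <join sessionid1="conf_id" sessionid2="leg_c"/>
  <assign name="currentstate" expr="'three_way'"/>
</transition>
```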

Section 9.5: Coaching

Another common scenario in call center applications is coaching. Here, the 'coach' is a supervisor who eavesdrops on a conversation between the user and the support agent in order to 'whisper' advice to the agent. This requires three connections:

- a two-way connection between the agent and the user;
- a two-way connection between the coach and the agent;
- a listen-only connection from the coach to the user.

This type of conference can be created with the conferencing CCXML tags described above. It can also be built from a combination of the join and unjoin CCXML tags. First, use the join tag to bridge the user and the agent, and to bridge the agent and the supervisor. Once those join operations have completed, the CCXML program can use the join tag to set up a half-duplex connection between the supervisor and the user. These joined conversations can be undone using the unjoin CCXML tag.
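The join-only approach can be sketched as follows. The variables user_leg, agent_leg, and coach_leg are hypothetical and are assumed to hold connected call legs. joins_done counts the ccxml.joined completion events (the event name used in Example 1), so the listen-only leg is added only after both bridges exist. By the definition of join, a half-duplex join lets sessionid1 hear sessionid2, so the coach is given as sessionid1.

```xml
<transition state="'coach_connected'"
event="connection.CONNECTION_CONNECTED" name="evt">
  <assign name="joins_done" expr="0"/>
  <!-- two full-duplex bridges: user/agent and agent/coach -->
  <join sessionid1="user_leg" sessionid2="agent_leg"/>
  <join sessionid1="agent_leg" sessionid2="coach_leg"/>
  <assign name="currentstate" expr="'joining'"/>
</transition>

<transition state="'joining'" event="ccxml.joined" name="evt">
  <assign name="joins_done" expr="joins_done + 1"/>
  <if cond="joins_done == 2">
    <!-- listen-only: the coach hears the user, the user cannot hear the coach -->
    <join sessionid1="coach_leg" sessionid2="user_leg" duplex="half"/>
    <assign name="currentstate" expr="'coaching'"/>
  </if>
</transition>
```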

Section 9.6: Disconnects

<disconnect>

A CCXML document may disconnect a call leg using the disconnect CCXML tag. The underlying platform will send the appropriate protocol messages to perform the disconnect. It will then send an asynchronous event to the CCXML document when the disconnect operation completes.

The disconnect tag looks just like accept:

The definition of <disconnect> is:

<disconnect>
Attribute Name Details
callid Indicates the id of the call leg that should be disconnected. The callid attribute is optional; if omitted, the interpreter will disconnect using the id indicated in the current event being processed.

A disconnected call will result in the generation of a connection.CONNECTION_DISCONNECTED event.
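For example, a program bridging two legs can hang up the remaining party when either side disconnects, and exit once both are gone. This sketch assumes hypothetical variables in_callid and out_callid. It also assumes that the disconnect event identifies the leg in evt.callid.

```xml
<transition state="'talking'"
event="connection.CONNECTION_DISCONNECTED" name="evt">
  <!-- one side hung up: disconnect the other side -->
  <if cond="evt.callid == in_callid">
    <disconnect callid="out_callid"/>
  <else/>
    <disconnect callid="in_callid"/>
  </if>
  <assign name="currentstate" expr="'hanging_up'"/>
</transition>

<transition state="'hanging_up'"
event="connection.CONNECTION_DISCONNECTED" name="evt">
  <!-- our own disconnect has completed; end the program -->
  <exit/>
</transition>
```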

Section 10: Security

In some environments, it may be necessary to authenticate scripts with the execution platform, and vice-versa.

The first aspect of the trust relationship between the script and the platform is whether or not the document server (web server) trusts the platform making the URL request for a CCXML script. This can be strengthened by (1) requiring the platform to use HTTPS (SSL-encrypted HTTP) instead of cleartext HTTP to fetch the script; and (2) by requiring HTTP-style user authentication, i.e., userid and password.

<authenticate>

The second aspect is whether or not the platform trusts the script. This can be strengthened by requiring scripts to contain an <authenticate> tag which provides information necessary to validate the script with the platform or an off-board authentication device such as a RADIUS server, SIP Proxy, or H.323 gatekeeper.

The definition of <authenticate> is:

<authenticate>
Attribute Name Details
server The id of the server to authenticate with. This may be a URL, or an IP address with an optional port number.
userid The userid that the script will "login" as.
password The password corresponding to the userid above.

If the authentication is successful, a ccxml.authenticated event will be generated. If the authentication is unsuccessful, a ccxml.exit event will be generated, causing the script to terminate (ccxml.exit events cannot be caught or ignored).
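A script that authenticates before handling calls might be structured as follows. The server address (a documentation-range IP address), userid, and password are placeholders. This draft shows neither where <authenticate> is placed nor whether its attributes take literal strings, so placing it at document level and giving literal values are both assumptions. Port 1812 is the standard RADIUS authentication port.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0">
  <!-- placeholder credentials for an off-board RADIUS server -->
  <authenticate server="192.0.2.10:1812"
  userid="acme_ivr" password="changeme"/>
  <assign name="currentstate" expr="'authenticating'"/>
  <eventhandler statevariable="currentstate">
    <transition state="'authenticating'"
    event="ccxml.authenticated" name="evt">
      <!-- the platform trusts the script: start taking calls -->
      <assign name="currentstate" expr="'initial'"/>
    </transition>
    <!-- on failure ccxml.exit is generated; it cannot be caught,
         so there is no handler for it here -->
  </eventhandler>
</ccxml>
```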

Section 11: Examples

Example 1: Calling Card Application: A caller calls an 800 number and, after some interaction with an IVR system, places an outbound call to a friend. After talking with the friend, the caller presses a special key to return to the IVR system and place another call.

-----------------calling_card.cxml-----------------

<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0">
  <assign name="currentstate" expr="'initial'"/>
  <eventhandler statevariable="currentstate" id="'ccevthndlr'">

  <transition state="'initial'"
  event="connection.CONNECTION_ALERTING" name="evt">
    <assign name="in_sessionid" expr="evt.sessionid"/>
    <accept callid="in_sessionid"/>
    <assign name="currentstate" expr="'in_vxml_session'"/>
    <dialogstart sessionid="in_sessionid" src="'pin.vxml'"/> 
    <!-- VoiceXML dialog is started on a separate thread - see pin.vxml -->
  </transition>

  <transition state="'in_vxml_session'" event="dialog.exit" name="evt">
    <!-- happens when pin.vxml VoiceXML dialog thread exits -->
    <createcall dest="evt.telnum" name="out_sessionid"/> 
    <assign name="currentstate" expr="'calling'"/>
    <dialogstart sessionid="in_sessionid" src="'tryingcall.vxml'" />
    <!-- start another VoiceXML thread for telling the caller about the -->
    <!-- call progress and use of # key to place another call -->
  </transition>

  <transition state="'calling'"
  event="connection.CONNECTION_CONNECTED" name="evt"> 
    <!-- happens when called party picks up the phone -->
    <assign name="out_sessionid" expr="evt.sessionid"/>
    <dialogstart sessionid="out_sessionid" src="'callee.vxml'" />
    <!-- tell the callee he is receiving a call -->
    <assign name="currentstate" expr="'outb_ready_to_join'"/>
  </transition>

  <transition state="'outb_ready_to_join'" event="dialog.exit"  name="evt">
    <!-- happens when the callee's VoiceXML dialog (callee.vxml) exits -->
    <join sessionid1="in_sessionid" sessionid2="out_sessionid"/>
    <assign name="currentstate" expr="'wtg_for_joined'" />
  </transition>

  <transition state="'wtg_for_joined'" event="ccxml.joined"  name="evt">
    <assign name="currentstate" expr="'active'" />
  </transition>

  <transition state="'active'" event="dialog.exit"  name="evt">
    <!-- happens when caller hits '#' -->
    <dialogstart sessionid="in_sessionid" src="'checktime.asp'"/>
    <assign name="currentstate" expr="'in_vxml_session'"/>
  </transition>
       
  </eventhandler>
  <!-- hangup and cleanup have been left out for sake of brevity -->
</ccxml>  

----------------------pin.vxml-------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
  <form id="pin">
    <block> Welcome to Acme's Calling Card </block>
    <field name="pin" type="digits">
      <prompt> Please say your PIN number </prompt>
      <filled>
        <if cond="pin.length != 8">
          <clear namelist="pin"/>
        <else/>
       <assign name="application.pin" expr="pin" />
          <submit next="checktime.asp" namelist="pin"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>

-------------------------tryingcall.vxml--------------------------
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
  <form>
    <block> Please wait while we try your call. 
    After you are done press # if you want to make another call </block>
    <field name="attnkey" type="digits">  
      <option dtmf="#" value="done" />
      <filled>
        <if cond="attnkey=='done'">
          <exit namelist="attnkey"/>
        <else />
          <clear namelist="attnkey" />
        </if>
      </filled>
    </field>
  </form>
</vxml>
-----------------------callee.vxml---------------------------
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
  <form>
    <block>You have a call. Connecting</block>
  </form>
</vxml>

----------------vxml doc output by checktime.asp---------------
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
  <form id="form2">
    <!--.asp consults back-end database before filling this value-->
    <var name="timeleft" expr="600"/>
    <block> Time remaining is <value expr="timeleft"/> seconds </block>
    <field name="telnum" type="digits" >
      <prompt> Please speak the telephone number you want to call </prompt>
      <filled>
        <if cond="telnum.length != 7">
          <clear namelist="telnum"/>
        <else/>
          <exit namelist="telnum"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>

Example 2: Conferencing application. Callers join a conference by dialing an agreed-upon telephone number. When a caller joins, he is told how many people are already in the conference, and those already present are informed of the new entrant. Similarly, when someone hangs up, the departure of a participant is announced. A conference object is created at the beginning of the conference and destroyed when all the participants have hung up.

<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0">
  <assign name="currentstate" expr="'initial'"/>
    <eventhandler statevariable="currentstate">
    <transition state="'initial'"
    event="connection.CONNECTION_ALERTING" name="evt">
       <assign name="in_sessionid" expr="evt.sessionid"/>
        <accept callid="in_sessionid"/>
    <submit next="'http://acme.com/conference.asp'"
    namelist="in_sessionid"/>
    </transition>
  </eventhandler>
</ccxml>

==========================================================================
<!-- ccxml page returned by conference.asp to first caller's browser-->

<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0">
  <assign name="currentstate" expr="'starting'"/>
  <assign name="in_sessionid" expr="'[email protected]'"/>
    <!-- above value is the value submitted to conference.asp-->
    <eventhandler statevariable="currentstate">
    <transition state="'starting'" event="" name="evt">
       <!-- this gets triggered for any event or even no event -->
        <createconference name="conf_id"/>
        <submit next="'http://acme.com/conference.asp'"
        namelist="in_sessionid conf_id"/>
    </transition>
  </eventhandler>
</ccxml>

==========================================================================
<!-- ccxml page returned by conference.asp to all callers
(including the first caller, once the page above that creates
the conference has been sent to that caller) -->

<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0">
  <assign name="currentstate" expr="'ready_to_conf'"/>
  <assign name="in_sessionid" expr="'[email protected]'"/>
  <assign name="conf_id" expr="'[email protected]'"/>
    <!-- above values are the values submitted to conference.asp-->
    <eventhandler statevariable="currentstate">
    <transition state="'ready_to_conf'" event="" name="evt">
        <dialogstart sessionid="in_sessionid" src="'vconference.asp'"/>
        <join sessionid1="conf_id" sessionid2="in_sessionid" />
        <dialogstart sessionid="conf_id" src="'newcaller.vxml'"/>
        <assign name="currentstate" expr="'active'" />
    </transition>

    <transition state="'active'"
    event="connection.CONNECTION_DISCONNECTED" name="evt">
      <dialogstart sessionid="conf_id" src="'leave.vxml'"/>
      <submit next="'http://acme.com/teardown.asp'"
      namelist="in_sessionid conf_id"/>
    </transition>
  </eventhandler>
</ccxml>

==========================================================================
<!-- vxml page vconference.asp -->

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
<form>
  <block> Welcome to the W3C conference. There are already
          <value expr="'3'"/> participants in the conference.
          <!--above value is based on count kept by vconference.asp-->
  </block>
</form>
</vxml>


==========================================================================
<!-- vxml page newcaller.vxml -->

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
    <form>
        <block> A new participant has entered the conference. </block>
    </form>
</vxml>

==========================================================================
<!-- vxml page leave.vxml -->

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
    <form>
        <block> Someone just left the conference. </block>
    </form>
</vxml>

==========================================================================
<!-- ccxml page returned by teardown.asp to all but the last participant -->

<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0">
  <assign name="currentstate" expr="'destroying'"/>
  <assign name="conf_id" expr="'[email protected]'"/>
  <assign name="in_sessionid" expr="'[email protected]'"/>
    <!-- above values are the values submitted to teardown.asp-->
    <eventhandler statevariable="currentstate">
    <transition state="'destroying'" event="" name="evt">
      <exit/>     <!-- just exit and destroy the session -->
    </transition>
  </eventhandler>
</ccxml>

==========================================================================
<!-- ccxml page returned by teardown.asp to last participant -->
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0">
  <assign name="currentstate" expr="'destroying'"/>
  <assign name="conf_id" expr="'[email protected]'"/>
  <assign name="in_sessionid" expr="'[email protected]'"/>
    <!-- above values are the values submitted to teardown.asp -->
    <eventhandler statevariable="currentstate">
    <transition state="'destroying'" event="" name="evt">
      <destroyconference conferenceid="conf_id"/>
      <exit/>
    </transition>
  </eventhandler>
</ccxml>

Example 3: Call Center Customer Support Interactions: Acme's customer support line wants to run a customer information and support service. The service lets users call in and interact with an automated menu system using DTMF and voice. When the customer reaches a menu that requires an operator, the customer is placed in a hold queue for an available operator.

Alternatively, if the customer requests an operator at any point, Acme would like to let the customer either wait for an operator or continue navigating the system while in the hold queue. If the customer continues interacting with the automated system while waiting, Acme would like to interrupt periodically with status about the hold queue. It would also like to offer the customer the option of canceling the request if the automated system has already answered their question. When an operator becomes available, the customer's interactions are stopped and the operator is connected.

For training purposes, Acme would also like a trainer to be able to listen in. When the customer is connected to the operator, the operator can press a special key to get the trainer's help. The trainer can then interrupt and give the new operator hints about how to answer the question, and the customer cannot hear these hints.

<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0">

  <assign name="currentstate" expr="'initial'"/>
  <assign name="queuemode" expr="'default'"/>

  <eventhandler statevariable="currentstate" id="'cust_supp_handler'">

    <transition state="'initial'"
    event="connection.CONNECTION_ALERTING" name="evt">

      <!-- happens when the customer first calls the call
      center's 800 number -->

      <assign name="in_callerid" expr="evt.callerid"/>
      <assign name="in_callid" expr="evt.callid"/>

      <accept/>
    </transition>


    <transition state="'initial'"
    event="connection.CONNECTION_CONNECTED" name="evt">

      <!-- happens if our incoming call is finally connected -->

      <assign name="currentstate" expr="'in_mainchoice'"/>

      <!-- start a VoiceXML dialog with the caller
     (see mainchoice.vxml below) -->

      <dialogstart src="'http://acme/mainchoice.vxml'"
      name="dlg_mainchoice"/>
    </transition>


    <transition state="'in_mainchoice'" event="dialog.exit"
    name="evt">

      <!-- happens when mainchoice.vxml dialog exits -->

      <if cond="evt.mainchoice == 'agent'">
        <assign name="caller_telnum" expr="in_callerid"/>
        <submit next="'http://acme/queue_call.asp'"
        namelist="caller_telnum"/>
      </if>

        <!-- assume queue_call.asp (not shown) will interact with the
        call-center's call-queuing system to queue the call. It will also
        send queue status to ccxml program when it receives one from the
        call-queuing system and inform the ccxml program when an agent
        is available through an inter-browser communication mechanism. -->

      <assign name="currentstate" expr="'in_queue'"/>
   
      <dialogstart src="'queue1.vxml'" name="dlg_queue"/>
    </transition>


    <transition state="'in_queue'" event="user.status" name="evt">

      <!-- happens when status is received from an external program.
      Exit the customer's IVR interaction to announce the status of the
      queue -->

      <dialogterminate sessionid="dlg_queue"/>
    </transition>


    <transition state="'in_queue'" event="dialog.exit" name="evt" >

      <!-- happens when we either exited the IVR interaction or the
      queue status, so depending on the queue mode we fall back in either
      one of them -->
     
      <if cond="queuemode == 'default'">
         <assign name="queuemode" expr="'status'"/>
         <dialogstart src="'queuestate.asp'" name="dlg_queue"/>        
      <else/>
        <assign name="queuemode" expr="'default'"/>
        <dialogstart src="'queue1.vxml'" name="dlg_queue"/>
      </if>
    </transition>

    <transition state="'in_queue'" event="user.agent_avail" name="evt" >

      <!-- happens if an agent is finally available, so we try
     to connect to her/him -->

      <createcall dest="evt.telnum" name="out_callid"/>
      <assign name="currentstate" expr="'calling'"/>
    </transition>


    <transition state="'calling'"
    event="connection.CONNECTION_CONNECTED" name="evt">

      <!-- happens when agent is connected and picks up the phone,
      so we have to terminate the queue dialog -->

      <assign name="out_callid" expr="evt.callid"/>
     
      <dialogterminate sessionid="dlg_queue"/>
    </transition>


    <transition state="'calling'" event="dialog.exit" name="evt">

      <assign name="currentstate" expr="'conversing'"/>

      <dialogstart sessionid="out_callid" src="'agent.asp'"/>

      <join sessionid1="in_callid" sessionid2="out_callid"
      duplex="'full'"/>
    </transition>


    <transition state="'conversing'" name="evt"
    event="user.conf_supervisor">

      <!-- connect to the supervisor -->

      <createcall dest="evt.telnum" name="sup_callid"/>
    </transition>


    <transition state="'conversing'" name="evt"
    event="connection.CONNECTION_CONNECTED">

      <join sessionid1="sup_callid" sessionid2="in_callid"
      duplex="'half'"/>

      <!-- supervisor can hear the customer
      but customer can't hear supervisor -->

      <join sessionid1="out_callid" sessionid2="sup_callid"
      duplex="'full'"/>

      <!-- agent and supervisor can hear each other -->

      <assign name="currentstate" expr="'coaching'"/>
    </transition>


    <transition state="'coaching'" name="evt"
    event="connection.CONNECTION_DISCONNECTED">

      <if cond="evt.callid==sup_callid">

        <!-- the supervisor hung up, kick out the supervisor -->

        <unjoin sessionid1="in_callid" sessionid2="sup_callid"/>
        <unjoin sessionid1="out_callid" sessionid2="sup_callid"/>
        <disconnect callid="sup_callid"/>

        <assign name="currentstate" expr="'conversing'"/>

      <else/>

        <!-- the customer or agent hung up -->
        <exit/>
      </if>
    </transition>


    <transition event="connection.CONNECTION_DISCONNECTED"
    name="evt">

      <!-- this cleans up all call legs and conference objects -->
      <exit/>
    </transition>

  </eventhandler>
</ccxml>

----------------------mainchoice.vxml-------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
  <form>
    <field name="mainchoice" >
      <prompt>Please press or say 0 to talk to an agent,
      1 for self service</prompt>
      <option dtmf="0" value="agent"> zero </option>
      <option dtmf="1" value="self" > one  </option>

      <filled>
        <if cond="mainchoice=='agent'">
          <exit namelist="mainchoice"/>
        <else/>
          <goto next="http://acme/ivr.vxml" /> <!-- not shown -->
        </if>
      </filled>
    </field>
  </form>
</vxml>

----------------------queue1.vxml-------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
  <form>
    <block>All our customer service agents are busy right now.
    Your call will be answered in the order that it was received.
    We will connect you to an agent when one becomes available.
    Meanwhile you can continue to use our automated system.
    <goto next="http://acme/ivr.vxml"/>
    </block>
  </form>
</vxml>

----------------------queuestate.asp-------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
  <form>
    <block>
    All our customer service agents are still busy. There are
     <value expr="num_queued"/> callers ahead of you in the
     queue. If the automated system has answered your question,
     you can hangup. Otherwise please continue to hold.
    </block>
  </form>
</vxml>

----------------------agent.asp-------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
  <form>
    <field name="attnkey" type="digits">
      <!-- give the agent the ability to contact supervisor
      by hitting the '#' key -->
      <option dtmf="#" value="supervisor" />
      <filled>
        <if cond="attnkey=='supervisor'">
          <send event="conf_supervisor" dest="ccxml" />
        <else />
          <clear namelist="attnkey" />
        </if>
      </filled>
    </field>
  </form>
</vxml>

Example 4: Personal Assistant: This program differs from the Personal Assistant text in the Use Case document, which is somewhat confusing. Here, we present a Personal Assistant that operates as an automated answering service.

A subscriber to this service is given a phone number for the automated service. A caller who wants to talk to the subscriber calls that number. The automated system asks who the caller is and records the audio. It then calls the subscriber's current number and asks whether the call should be connected.

If so, the calls are bridged. If not, then the original caller is warned and disconnected.

<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0">
  <assign name="currentstate" expr="'initial'"/>

  <eventhandler statevariable="currentstate">

    <transition state="'initial'" name="evt"
    event="connection.CONNECTION_ALERTING">

      <assign name="in_callid" expr="evt.callid"/>
      <accept/>
    </transition>

    <transition state="'initial'" name="evt"
    event="connection.CONNECTION_CONNECTED">

      <assign name="currentstate" expr="'welcoming_caller'"/>
      <dialogstart src="'welcome_message.vxml'"/>
    </transition>


    <transition state="'welcoming_caller'" event="dialog.exit" >
      
      <!-- place the caller on hold -->

      <dialogstart src="'holdmusic.vxml'" name="dlg_onhold"/>

      <!-- Contact the target.  The number here is server-generated -->

      <assign name="currentstate" expr="'contacting_target'"/>
      <createcall dest="6505551212" name="out_callid"/>
    </transition>


    <transition state="'contacting_target'" name="evt"
    event="connection.CONNECTION_CONNECTED">

      <!-- Ask the target if (s)he would like to accept the call -->
      
      <assign name="currentstate" expr="'waiting_for_target_answer'"/>
      <dialogstart src="'outbound_greetings.vxml'"/>
    </transition>


    <transition state="'waiting_for_target_answer'" event="dialog.exit"
    name="evt">

      <assign name="accepted" expr="evt.willaccept"/>

      <if cond="accepted == 'false'">
     
        <!-- disconnect the called party
       (but still notify the other one) -->   
        <disconnect callid="out_callid"/>
      </if>

      <assign name="currentstate" expr="'stop_hold'"/>
      <dialogterminate sessionid="dlg_onhold"/>
    </transition>


    <transition state="'stop_hold'" event="dialog.exit" name="evt">

      <if cond="accepted == 'false'">

        <assign name="currentstate" expr="'voicemail'"/>
        <dialogstart callid="in_callid" src="'vm.vxml'"/>
      <else/>

        <assign name="currentstate" expr="'playing_connecting'"/>

        <dialogstart callid="in_callid" src="'connecting.vxml'"/>
      </if>
    </transition>


    <transition state="'playing_connecting'" event="dialog.exit"
    name="evt">

      <join sessionid1="in_callid" sessionid2="out_callid"/>
      
      <assign name="currentstate" expr="'talking'"/>
    </transition>


    <transition event="connection.CONNECTION_DISCONNECTED"
    name="evt">

      <if cond="evt.callid == in_callid">
        <exit/>
      </if>
    </transition>
 
  </eventhandler>
</ccxml>
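The eventhandler above behaves as a finite-state machine: each incoming event is matched against the transition whose state equals the current value of currentstate. The Python sketch below is a hypothetical, simplified model of that dispatch and is not part of CCXML. It follows only the path in which the target accepts the call, omits the state-independent CONNECTION_DISCONNECTED handler, and assumes that an event with no matching transition leaves the state unchanged.

```python
# Hypothetical model of the Personal Assistant <eventhandler>: the pair
# (currentstate, event) selects the next state. Only the "target accepts"
# path is modelled, and actions are plain descriptive strings.
from typing import Dict, List, Optional, Tuple

TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("initial", "connection.CONNECTION_ALERTING"):
        ("initial", "accept the inbound call"),
    ("initial", "connection.CONNECTION_CONNECTED"):
        ("welcoming_caller", "start welcome_message.vxml"),
    ("welcoming_caller", "dialog.exit"):
        ("contacting_target", "start hold music; create outbound call"),
    ("contacting_target", "connection.CONNECTION_CONNECTED"):
        ("waiting_for_target_answer", "start outbound_greetings.vxml"),
    ("waiting_for_target_answer", "dialog.exit"):
        ("stop_hold", "terminate hold music"),
    ("stop_hold", "dialog.exit"):
        ("playing_connecting", "start connecting.vxml"),
    ("playing_connecting", "dialog.exit"):
        ("talking", "join inbound and outbound calls"),
}


def step(state: str, event: str) -> Tuple[str, Optional[str]]:
    """Return (next_state, action); an unmatched event changes nothing."""
    return TRANSITIONS.get((state, event), (state, None))


def run(events: List[str], state: str = "initial") -> List[str]:
    """Feed events in order and record the state after each one."""
    trace = []
    for event in events:
        state, _action = step(state, event)
        trace.append(state)
    return trace


events = [
    "connection.CONNECTION_ALERTING",   # inbound call offered
    "connection.CONNECTION_CONNECTED",  # inbound call answered
    "dialog.exit",                      # caller has recorded a name
    "connection.CONNECTION_CONNECTED",  # outbound call answered
    "dialog.exit",                      # target said yes
    "dialog.exit",                      # hold music stopped
    "dialog.exit",                      # "connecting" prompt finished
]
print(run(events)[-1])  # talking
```

Keeping the table separate from the dispatch function mirrors the markup, where each transition element is one row keyed on state and event.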

-----------------welcome_message.vxml-----------------
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
  <form>
    <record name="recording">
      <prompt>
        This is the Personal Assistant.
        Please state your name.
      </prompt>

      <filled>
        <prompt>OK, thanks.</prompt>
        <submit next="postRecordingAndExit.vxml" namelist="recording"
        method="post" enctype="multipart/form-data"/>
      </filled>

      <noinput>
        <prompt>Please speak up.</prompt>
        <reprompt />
      </noinput>
    </record>
  </form>
</vxml>

-----------------outbound_greetings.vxml-----------------
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
  <form>
    <field name="answer">
      <grammar src="yesnogrammar"/>
      <prompt>
        Hi, you have a message from
        <audio src="dynamicallyRecordedName.wav"/>
        Would you like to take it?  Say Yes, or No.
      </prompt>

      <filled>
        <var name="willaccept"/>
        <if cond="answer=='yes'">
          <prompt>OK, connecting.</prompt>
          <assign name="willaccept" expr="'true'"/>
          <exit namelist="willaccept"/>
        <elseif cond="answer=='no'" />
          <prompt>OK, goodbye.</prompt>
          <assign name="willaccept" expr="'false'"/>
          <exit namelist="willaccept"/>
        </if>
      </filled>

      <noinput>
        <prompt>Please speak up.</prompt>
        <reprompt />
      </noinput>
    </field>
  </form>
</vxml>

-----------------vm.vxml-----------------
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
    <form>
      <block>
        <prompt>Transferring you to voice mail.</prompt>
        <goto next="voicemail.vxml" />
      </block>
    </form>
</vxml>

-----------------connecting.vxml-----------------
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
    <form>
      <block>
        <prompt>Your party is ready.  Connecting...</prompt>
        <exit />
      </block>
    </form>
</vxml>

<!-- The content of holdmusic.vxml does not matter,
so it is not included here -->

<!-- postRecordingAndExit.vxml writes and exposes the WAV file,
and then exits.  It's not interesting enough to include here. -->

Appendix A - Future Study

While the current proposal of CCXML adequately addresses the requirements and scenarios that we can imagine for call control, there are still several issues left to resolve. These include:

Separately executing CCXML programs may want to exchange information for any number of reasons. They might be bridging two calls or working together as part of a larger, multithreaded call application. We have not specified how a CCXML program obtains the addresses of other CCXML programs on the same browser. We could also specify a protocol that lets CCXML programs on separate browsers communicate. Both issues need further discussion.
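As an illustration of the same-browser case only, one possible approach is a registry that maps a session identifier to that session's event queue. The Python sketch below is hypothetical: SessionRegistry, register, unregister and send are invented names, and CCXML does not define any such mechanism. Communication between separate browsers would additionally need a network protocol.

```python
# Hypothetical sketch: lets CCXML sessions running on the same browser find
# and signal each other by session identifier. Not defined by CCXML.
import queue
from typing import Dict


class SessionRegistry:
    """Maps each session identifier to that session's inbound event queue."""

    def __init__(self) -> None:
        self._queues: Dict[str, "queue.Queue[dict]"] = {}

    def register(self, session_id: str) -> "queue.Queue[dict]":
        """Create and return the event queue for a newly started session."""
        inbox: "queue.Queue[dict]" = queue.Queue()
        self._queues[session_id] = inbox
        return inbox

    def unregister(self, session_id: str) -> None:
        """Forget a session once it exits."""
        self._queues.pop(session_id, None)

    def send(self, target: str, name: str, **fields: str) -> bool:
        """Queue event `name` for `target`; return False if it is unknown."""
        inbox = self._queues.get(target)
        if inbox is None:
            return False
        inbox.put({"name": name, **fields})
        return True
```

A thread-safe queue per session keeps delivery asynchronous, matching the event-driven model of the eventhandler element.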

Asynchronous communication from the document server or the network to the voice browser is sometimes needed to tell the voice browser that it must initiate an action. For example, if the caller has been put on hold pending the availability of an agent, the ACD or call router would like to inform the voice browser asynchronously when it should bridge the call to the call center. This communication could take the form of an HTTP request from the document server or the network to the voice browser. Servicing such requests could mean running an HTTP server on the voice browser, but it could also be provided through other mechanisms, e.g. SIP. It is not clear whether this specification should try to specify the mechanism at all, or simply leave the matter to browser implementors.
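If the HTTP approach were chosen, the voice browser side could look roughly like the following Python sketch. The /ccxml/event path, the session and event form fields, and the 202/400 status choices are all assumptions for illustration; nothing here is defined by this specification.

```python
# Hypothetical sketch: an HTTP endpoint on the voice browser that turns a
# request from a document server or ACD into an event for a CCXML session.
# The /ccxml/event path and the session/event parameters are assumptions.
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import parse_qs

# Events waiting to be delivered to CCXML sessions by the interpreter.
PENDING: "queue.Queue[dict]" = queue.Queue()


def body_to_event(path: str, body: str) -> Optional[dict]:
    """Parse a form-encoded body such as 'session=s1&event=agent.ready'."""
    if path != "/ccxml/event":
        return None
    params = parse_qs(body)
    if "session" not in params or "event" not in params:
        return None
    return {"session": params["session"][0], "event": params["event"][0]}


class EventRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        event = body_to_event(self.path, body)
        if event is None:
            self.send_response(400)   # malformed or unknown request
        else:
            PENDING.put(event)        # hand the event to the interpreter
            self.send_response(202)   # accepted for asynchronous processing
        self.end_headers()


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Run the listener until interrupted (not started on import)."""
    HTTPServer((host, port), EventRequestHandler).serve_forever()
```

Replying 202 rather than waiting for the CCXML session to act keeps the sender decoupled from how long the bridge or other action takes.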

Outbound Notification is the ability of a web application to initiate one or more telephone calls to different recipients in order to inform them of certain important events. For example, a traffic-report service could call its subscribers if there is an accident on their commute routes. A financial firm could offer a service where an application calls clients when certain financial events take place, e.g. when an interesting stock rises or drops past a certain value. After calling the client, the VoiceXML application could interact with the callee, allowing the callee to place trades or check other market conditions. For these scenarios, the web application must be able to inform the voice browser asynchronously that it should execute CCXML/VoiceXML documents at a specified URL. The CCXML and VoiceXML pages can then control the behavior of the voice browser and make it place outbound calls. Again, it is not clear whether we should specify this interface or leave the matter to browser implementors.
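On the web-application side, the asynchronous request could be as simple as an HTTP POST naming the CCXML document to run and the party to call. In the Python sketch below, the /ccxml/launch endpoint and the src and dest parameters are hypothetical; the request is only built, not sent.

```python
# Hypothetical sketch: a web application asking a voice browser to fetch
# and execute a CCXML document that will place an outbound call.
# The /ccxml/launch endpoint and parameter names are assumptions.
from urllib.parse import urlencode
from urllib.request import Request


def build_launch_request(browser: str, ccxml_url: str, phone: str) -> Request:
    """Build (but do not send) a form-encoded POST to the voice browser."""
    body = urlencode({"src": ccxml_url, "dest": phone}).encode("ascii")
    return Request(
        f"{browser}/ccxml/launch",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_launch_request(
    "http://vb.example", "http://acme.example/notify.ccxml", "tel:+16505551212"
)
```

Passing the returned request to urllib.request.urlopen would actually send it.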

The scenarios described above need further study before they can be included in the CCXML specification.

Appendix B - Related Work

CPL

The Call Processing Language (CPL) is an XML-based language that can be used to describe and control Internet telephony services. Its focus is user scripting of call-handling behavior for incoming calls. It is designed to be suitable for running on a server where users may not be allowed to execute arbitrary programs, and so is not Turing-complete.

The latest version of CPL can be found linked from the IETF IP Telephony (IPTEL) working group charter page.

CallXML

CallXML is a markup language created by Voxeo Corporation that includes both voice and call-control functionality. Most of the following is from the CallXML 2.0 specification: CallXML is an XML-based markup language used to describe the user interface of a telephone, voice over IP, or multimedia call application to a CallXML browser.

CallXML was designed to make it easy for web developers to create applications that can interact with and control any number or type of calls, including:

VoiceXML is designed to make it easy for web developers to create voice recognition based interfaces for either telephones or computer-based applications. As such, VoiceXML is an excellent solution for voice based applications which provide access to web content and information, including:

Because of the natural complexity of dealing with voice commands, VoiceXML uses a relatively complex form/field/grammar/filled interface model in its design. In contrast, CallXML uses a simpler block/action/event interface model, which can be easier to learn and which allows for visual design tools that directly represent CallXML markup as simple flowchart-like user interfaces. See http://community.voxeo.com for more information.

TXML

TXML (Telera's Extensible Markup Language) is an XML-based language designed by Telera for remotely controlling the behavior of Point of Presence (POP) call centers. The POP call-center system is a Telera invention capable of answering, servicing, queuing and routing calls at local points of presence to reduce the communication costs of toll-free inbound contact centers.

TXML provides the syntax for the XML Pages, which are generated by the customer's application at the premises and used by a POP server to execute actions on behalf of the customer's contact center. The XML Pages are simple ASCII text files that are either stored in a Web server's directory at the premises or generated by scripts on the premises server. The XML Pages are requested from the premises server via HTTP requests made by a client on the POP gateway.

The language includes elements for

More documentation is available from Telera.

SIP

SIP, the Session Initiation Protocol, is a signaling protocol for Internet conferencing, telephony, presence, event notification and instant messaging. As a signaling protocol, SIP sits "below" the application description level of VoiceXML and CCXML. We expect many CCXML and VoiceXML browsers to support SIP signaling. (The same comments apply to H.323 and MGCP/Megaco.)

dynamicsoft has described a method for an application server to use a VoiceXML browser as a network resource. This is an interesting approach, but a mechanism for describing the call-control logic is still needed. An application server product from dynamicsoft provides language bindings in Java Servlets, CPL, and a general CGI interface.

See http://www.dynamicsoft.com/resources/pdf/AppServWhitePaper.pdf

Appendix C - Known Issues

Appendix D - CCXML DTD

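<!ENTITY % uri          "CDATA" >
<!ENTITY % field.name   "NMTOKEN" >
<!ENTITY % field.names  "CDATA" >
<!ENTITY % expression   "CDATA" >
<!ENTITY % boolean      "(true | false)" >
<!ENTITY % content.type "CDATA" >
<!ENTITY % item.attrs
        "name   %field.name;   #IMPLIED
         cond   %expression;   #IMPLIED
         expr   %expression;   #IMPLIED" >
<!ENTITY % duration     "CDATA" >
<!ENTITY % executable.content
        "accept | createcall | join | unjoin | createconference |
         destroyconference | dialogstart | dialogterminate | send |
         disconnect | assign | var | if | fetch | goto | submit | exit" >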

<!ELEMENT ccxml
  (assign | var | eventhandler)* >
<!ATTLIST ccxml
          version       CDATA           #REQUIRED >
<!ELEMENT authenticate EMPTY>
<!ATTLIST authenticate
    server %uri; #REQUIRED
    userid CDATA #REQUIRED
    password CDATA #REQUIRED
>

<!ELEMENT assign       EMPTY >
<!ATTLIST assign
          name          %field.name;   #REQUIRED
          expr          %expression;   #REQUIRED >
<!ELEMENT var      EMPTY >
<!ATTLIST var
          name          %field.name;  #REQUIRED
          expr          %expression;   #REQUIRED >

<!ELEMENT eventhandler
          (transition)* >
<!ATTLIST eventhandler
          id              ID          #IMPLIED
          statevariable   NMTOKEN     #REQUIRED >

<!ELEMENT transition
          (accept | createcall | join | unjoin |
           createconference | destroyconference | dialogstart |
           dialogterminate | send | disconnect |
           assign | var | if | fetch | goto | submit | exit)* >
<!ATTLIST transition
          state           CDATA           #IMPLIED
          event           NMTOKEN         #REQUIRED
          cond            %expression;    #IMPLIED
          name            NMTOKEN         #IMPLIED >

<!ELEMENT accept          EMPTY >
<!ATTLIST accept
          callid       NMTOKEN     "_event.source" >

<!ELEMENT createcall EMPTY >
<!ATTLIST createcall
          name            NMTOKEN             #REQUIRED
          dest            %uri;               #REQUIRED >

<!ELEMENT join EMPTY >
<!ATTLIST join
          sessionid1      NMTOKEN             #REQUIRED
          sessionid2      NMTOKEN             #REQUIRED
          duplex          (full | half)       'full' >

<!ELEMENT unjoin EMPTY >
<!ATTLIST unjoin
          sessionid1      NMTOKEN             #REQUIRED
          sessionid2      NMTOKEN             #REQUIRED >

<!ELEMENT createconference   EMPTY >
<!ATTLIST createconference
          name       NMTOKEN     #REQUIRED >

<!ELEMENT destroyconference    EMPTY >
<!ATTLIST destroyconference
          conferenceid  NMTOKEN     #REQUIRED >


<!ELEMENT dialogstart ANY>
<!ATTLIST dialogstart
    callid NMTOKEN "_event.source"
    src %uri; #REQUIRED
    type %content.type; "application/xml+vxml"
    name NMTOKEN "_sessionid"
>
<!ELEMENT dialogterminate EMPTY>
<!ATTLIST dialogterminate
    sessionid NMTOKEN "_sessionid"
>

<!ELEMENT send  EMPTY >
<!ATTLIST send
          event       NMTOKEN          #REQUIRED
          target      CDATA            #IMPLIED
          name        NMTOKEN          #IMPLIED
          delay       %duration;       #IMPLIED
          namelist    %field.names;    #IMPLIED >

<!ELEMENT disconnect EMPTY >  
    <!ATTLIST disconnect
        callid      NMTOKEN      "_event.source" >

<!ELEMENT createccxml  EMPTY >
<!ATTLIST createccxml
        src            %uri;  #REQUIRED >

<!ELEMENT exit  EMPTY >
<!ATTLIST exit
    expr %expression; "0"
    namelist %field.names; #IMPLIED
>

<!ELEMENT if   (%executable.content; | elseif | else)* >
<!ATTLIST if
          cond        CDATA       #REQUIRED >
<!ELEMENT elseif   EMPTY >
<!ATTLIST elseif
          cond        CDATA       #REQUIRED >
<!ELEMENT else EMPTY >

<!ELEMENT fetch    EMPTY >
<!ATTLIST fetch
          next        %uri;        #REQUIRED >
<!ELEMENT goto EMPTY >
<!ATTLIST goto
          next        %uri;        #REQUIRED >
<!ELEMENT submit   EMPTY >
<!ATTLIST submit
          next        %uri;           #REQUIRED
          namelist    %field.names;   #IMPLIED
          method      (get|post)      'get'
          enctype     %content.type;  'application/x-www-form-urlencoded' >
<!ELEMENT reject EMPTY>
<!ATTLIST reject
    callid NMTOKEN "_event.source">
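Python's standard library parses XML but does not validate against a DTD, so the following sketch hand-checks a single content model from the declarations above: the elements permitted directly inside transition. The helper name bad_transition_children is hypothetical, and a full DTD validator would be needed to check the remaining declarations.

```python
# Hypothetical helper: checks only the <transition> content model from the
# DTD above, because xml.etree parses XML but does not validate it.
import xml.etree.ElementTree as ET
from typing import List

# Element names allowed directly inside <transition> by the DTD.
TRANSITION_CHILDREN = {
    "accept", "createcall", "join", "unjoin", "createconference",
    "destroyconference", "dialogstart", "dialogterminate", "send",
    "disconnect", "assign", "var", "if", "fetch", "goto", "submit", "exit",
}


def bad_transition_children(ccxml_text: str) -> List[str]:
    """Return the tags of direct <transition> children the DTD disallows."""
    root = ET.fromstring(ccxml_text)
    bad = []
    for transition in root.iter("transition"):
        for child in transition:      # direct children only; comments are dropped
            if child.tag not in TRANSITION_CHILDREN:
                bad.append(child.tag)
    return bad
```

Checking only direct children is deliberate: elements such as elseif and else are legal inside if but not directly inside transition.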

Appendix E - Acknowledgments

This W3C specification is based upon CCXML 1.0 submitted in April 2001. The CCXML authors were:

This version was written with the participation of members of the W3C Voice Browser Working Group.