Copyright © 2000 Nuance Communications, Inc. All rights reserved.
This document describes SpeechObjects, a core set of reusable dialog components that are callable through a dialog markup language such as VoiceXML, to perform specific dialog tasks, for example, get a date or a credit card number, etc. The major goal of SpeechObjects is to complement the capabilities of the dialog markup language and to leverage best practices and reusable component technology in the development of speech applications.
This document is a submission to the World Wide Web Consortium from Nuance Communications, Inc. (see Submission Request, W3C Staff Comment). For a full list of all acknowledged Submissions, please see Acknowledged Submissions to W3C.
This document is a Note made available by W3C for discussion only. This work does not imply endorsement by, or the consensus of the W3C membership, nor that W3C has, is, or will be allocating any resources to the issues addressed by the Note. This document is a work in progress and may be updated, replaced, or rendered obsolete by other documents at any time.
A list of current W3C technical documents can be found at the Technical Reports page.
SpeechObjects are reusable software components that encapsulate discrete pieces of conversational dialog. SpeechObjects are based on an open architecture that can be deployed on any of the major server and IVR (interactive voice response) platforms. This paper describes a specification based on Nuance's Java implementation of SpeechObjects.
Simply stated, a SpeechObject is a reusable software component that implements a dialog flow and is packaged with the audio prompts and recognition grammars that support that dialog. A Java call to a SpeechObject can be as simple as
// Initialize the SpeechObject
SODate date = new SODate();
// Invoke the SpeechObject (the result type is assumed here to be the nested SODate.Result class)
SODate.Result result = date.invoke(sc, dc, cs);
// Look at the results
int month = result.getMonth();
int day = result.getDayOfMonth();
int year = result.getYear();
In this document we will present both the configuration parameters (JavaBean properties) and the return values for each of the SpeechObjects.
The Java SpeechObjects architecture was designed to be portable and extensible, as well as easy to use. To this end SpeechObjects are all based on a primary interface, SpeechObject. This simple interface defines:
From the SpeechObject interface and a set of supporting interfaces, SpeechObject developers can build objects of any complexity that can be run with a single method call. The invoke method for any given SpeechObject executes the entire dialog for that SpeechObject. A simple invoke method might just play a standard prompt, wait for speech, and return the results after recognition completes. A more complicated invoke method could include multiple dialog states, smart prompts, intelligent error handling for both user and system errors, context-sensitive help, and any other features built in by the SpeechObject developer.
To call a SpeechObject from your application, however, doesn't require you to know anything about how the invoke method is implemented. You only need to provide the correct arguments and know what information you want to extract from the results.
The SpeechChannel is the object that provides recognition functionality to a SpeechObject. When an application is launched, the environment allocates a SpeechChannel for each supported port. This SpeechChannel is passed to the application for each incoming call and persists until the application terminates. The SpeechObjects that make up the application use the SpeechChannel to interact with the caller: requesting recognition services, playing prompts, setting configuration parameters, and so on. Telephony control is an optional component of the SpeechChannel.
In short, SpeechObjects
This section briefly describes the process of invoking a SpeechObject as a motivation for the runtime requirements and then presents the SpeechChannel.
To use a SpeechObject from your application, you simply instantiate it and call its invoke method.
The invoke method executes the dialog defined by the SpeechObject, and returns an instance of the Result class used by that SpeechObject. This result provides your application with the data that was accumulated during the dialog.
The invoke method takes several arguments:
In order for the invocation described above to work, a platform must implement a SpeechChannel and provide a launcher that creates the DialogContext and CallState and invokes the object.
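The invocation contract described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the actual API: the text names SpeechObject, SpeechChannel, DialogContext, CallState, and the invoke method, but the signatures, the Map-based result, and the EchoObject and Launcher classes are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, minimal stand-ins for the platform types named in the text.
interface SpeechChannel {
    String recognize(String prompt);
}

class DialogContext {
    final Map<String, Object> settings = new HashMap<>();
}

class CallState {
    final Map<String, Object> values = new HashMap<>();
}

// Assumed shape of the primary interface: one call runs the whole dialog.
interface SpeechObject {
    Map<String, Object> invoke(SpeechChannel sc, DialogContext dc, CallState cs);
}

// A trivial SpeechObject: play one prompt and return what was recognized.
class EchoObject implements SpeechObject {
    public Map<String, Object> invoke(SpeechChannel sc, DialogContext dc, CallState cs) {
        Map<String, Object> result = new HashMap<>();
        result.put("utterance", sc.recognize("Please say something."));
        return result;
    }
}

// Hypothetical launcher: creates the per-call context objects and invokes the object.
class Launcher {
    static Map<String, Object> launch(SpeechObject so, SpeechChannel sc) {
        return so.invoke(sc, new DialogContext(), new CallState());
    }
}
```

The caller of Launcher.launch needs no knowledge of how EchoObject runs its dialog; it only supplies a channel and reads the returned values.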
The SpeechChannel is an integral part of a SpeechObjects application. Application developers use SpeechObjects to implement the dialog flow, and SpeechObject developers use SpeechChannel methods to implement the recognition functionality of the dialog.
This section describes the abstract SpeechChannel architecture in more detail.
SpeechChannel interfaces
Functionality provided by the SpeechChannel is actually separated into five interfaces: the main speech channel interface that provides recognition functions, and four separate interfaces that define the functionality for:
The SpeechChannel is the primary object and provides access to the corresponding implementation of the other interfaces. SpeechObjects work with the single SpeechChannel object passed to them and can access the other interfaces when needed:
These interfaces can be implemented in the same class, or in separate classes, as appropriate for the platform. In either case, the SpeechChannel interface defines methods that return each of the other interfaces. For example, if a SpeechObject wanted to access dynamic grammar functionality, it would call the SpeechChannel getDynamicGrammarControl method and use the returned object to make dynamic grammar requests.
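The accessor pattern described above might look like the following sketch. Only the method name getDynamicGrammarControl comes from the text; the DynamicGrammarControl interface, its addPhrase method, and the SingleClassChannel class are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sub-interface; the addPhrase method is assumed for illustration.
interface DynamicGrammarControl {
    void addPhrase(String grammarName, String phrase);
}

// Assumed shape: the main channel hands out the other interfaces on request.
interface SpeechChannel {
    DynamicGrammarControl getDynamicGrammarControl();
}

// One class may implement both interfaces, as the text allows.
class SingleClassChannel implements SpeechChannel, DynamicGrammarControl {
    final List<String> phrases = new ArrayList<>();

    public DynamicGrammarControl getDynamicGrammarControl() {
        return this; // same object serves both roles on this platform
    }

    public void addPhrase(String grammarName, String phrase) {
        phrases.add(grammarName + ":" + phrase);
    }
}
```

A platform that prefers separate classes would return a different object from getDynamicGrammarControl; code written against the SpeechChannel interface would not need to change.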
This section briefly describes some concepts underlying SpeechObjects that may provide a context for understanding the specific parameters and return values described in Section 4.
A universal command is an utterance that the speaker can say at any point during any dialog. The framework includes a grammar allowing recognition of a small set of universal commands, and provides default handling when these utterances are recognized. The universal commands currently defined by the SpeechObjects framework are:
Through method calls the application can substitute its own handling for any of the supported universals, including disabling them.
As mentioned earlier, these universal commands are based on standards proposed by the Telephone Speech Standards Committee (TSSC).
All SpeechObjects provide default handling, including prompts and logic (and grammar adjustments, if necessary), for all of the common recognizer error conditions: rejection, no speech timeout, too much speech, spoke too early, recognizer too slow, and unexpected key.
The default error handlers for each of these error types play an error prompt and then attempt to reexecute the dialog. The error prompt is generated by combining an application-wide error prompt that is specific to the type of error with a generic prompt provided by the current SpeechObject. For example, if a no-speech timeout occurs while a Yes/No SpeechObject dialog is executing, the framework concatenates the application-wide error prompt "I'm sorry, I didn't hear you" and the Object error prompt "Please say 'yes' or 'no'."
The default error handling mechanism continually reexecutes the dialog until a valid response is generated or the error threshold for the object is reached. When the threshold is reached, an exception is thrown and you can implement whatever error handling behavior you prefer, such as transferring the caller to a live agent. You can also override the default error handlers for any of the defined error types through class method calls.
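The retry behavior described above can be sketched as a simple loop. This is an illustration only: the class and method names are hypothetical, recognition is simulated by an iterator in which null stands in for a no-speech timeout, and the exception type is an assumption.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical exception thrown once the error threshold is reached.
class ErrorThresholdException extends RuntimeException {
    ErrorThresholdException(String message) {
        super(message);
    }
}

class ErrorHandlingSketch {
    // Combine the application-wide prompt for the error type with the object's own prompt.
    static String errorPrompt(String appWidePrompt, String objectPrompt) {
        return appWidePrompt + " " + objectPrompt;
    }

    // Re-execute the dialog until a value is recognized or maxErrorCount errors occur.
    // Each element of attempts is a recognized value, or null for a simulated timeout.
    static String runDialog(Iterator<String> attempts, int maxErrorCount, List<String> played) {
        int errors = 0;
        while (attempts.hasNext()) {
            String heard = attempts.next();
            if (heard != null) {
                return heard;
            }
            errors++;
            if (errors >= maxErrorCount) {
                throw new ErrorThresholdException("error threshold reached");
            }
            played.add(errorPrompt("I'm sorry, I didn't hear you.", "Please say 'yes' or 'no'."));
        }
        throw new IllegalStateException("no more simulated input");
    }
}
```

An application would catch the threshold exception and apply its own policy, such as transferring the caller to a live agent.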
All prompts can be overridden through configuration parameters.
All prompts used by SpeechObjects are encapsulated using classes that implement the core interface Playable, which defines the protocol for objects that can be played over an audio channel. The framework defines a set of classes that implement Playable that provide additional prompt behavior, including:
SpeechObjects are designed to let you define grammars in a variety of ways, based on the requirements for each dialog and the need to customize the grammar at runtime. Most SpeechObjects use the Nuance dynamic grammar mechanism, meaning that the grammar for a given SpeechObject is compiled and loaded onto the recognition server when the SpeechObject is constructed. This allows SpeechObjects to be reused much more easily, as you don't have to compile the grammar for each SpeechObject into a recognition package before using it.
The SpeechObjects framework grammar classes let you build your grammars in a variety of ways:
You can also initialize a grammar from a file and subsequently update it programmatically.
The framework also allows compound grammars. A compound grammar is a single grammar object composed of multiple grammars that are used in parallel. For example, in a corporate dialing application you might use a compound grammar containing a set of employee names and a set of employee extensions, to allow the speaker to dial either by name or by number. The framework uses compound grammars to combine each SpeechObject's grammar with the grammar defining the set of universal commands.
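The idea of a compound grammar can be illustrated with a small sketch. The Grammar, PhraseGrammar, and CompoundGrammar types below are hypothetical and reduce a grammar to a fixed set of phrases; they are not the framework's grammar classes.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical grammar abstraction: a grammar accepts or rejects an utterance.
interface Grammar {
    boolean accepts(String utterance);
}

// A grammar built from a fixed list of lowercase phrases.
class PhraseGrammar implements Grammar {
    private final Set<String> phrases;

    PhraseGrammar(String... phrases) {
        this.phrases = new LinkedHashSet<>(Arrays.asList(phrases));
    }

    public boolean accepts(String utterance) {
        return phrases.contains(utterance.toLowerCase());
    }
}

// Several grammars active in parallel: the utterance matches if any member matches.
class CompoundGrammar implements Grammar {
    private final List<Grammar> members;

    CompoundGrammar(Grammar... members) {
        this.members = Arrays.asList(members);
    }

    public boolean accepts(String utterance) {
        for (Grammar g : members) {
            if (g.accepts(utterance)) {
                return true;
            }
        }
        return false;
    }
}
```

In the corporate dialing example, one member would hold employee names and another the extensions, so either form of input is accepted by the same dialog.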
The Result class is a subclass of a utility class KVSet, which defines an object used to encapsulate a set of key/value pairs. This structure is analogous to natural language slots and the values they are filled in with during recognition. Because the value stored in a KVSet can be any type of object, SpeechObjects have the flexibility to populate Result objects with any set of values that are appropriate. For example:
The value at any given key can also be another KVSet, providing the ability to nest result structures if appropriate.
Each Result class defined by the SpeechObjects includes convenience methods allowing easy access to the specific information it encapsulates.
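A nested key/value result with convenience accessors might be sketched as follows. The KVSet class here is a simplified, hypothetical stand-in for the utility class named in the text, and TripResult, its keys, and its accessors are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for KVSet: string keys mapped to arbitrary objects.
class KVSet {
    private final Map<String, Object> slots = new LinkedHashMap<>();

    KVSet put(String key, Object value) {
        slots.put(key, value);
        return this;
    }

    Object get(String key) {
        return slots.get(key);
    }
}

// A Result-style subclass adds typed convenience accessors over the raw slots.
class TripResult extends KVSet {
    String getDestination() {
        return (String) get("destination");
    }

    int getDepartureDay() {
        KVSet date = (KVSet) get("departureDate"); // the value is itself a nested KVSet
        return (Integer) date.get("day");
    }
}
```

The convenience accessors hide the key names and casts, so application code does not depend on how the slots are laid out.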
Result subclasses also have another characteristic, which is that they can be played over the current audio output device. They implement the Playable interface, which allows objects to be appended to the prompt queue and then played by a SpeechChannel or other object that supports audio playback.
This lets you easily play the recognized information, for example, for confirmation dialogs or during testing.
Many SpeechObjects are used in conjunction with ConfirmAndCorrect, which confirms all of the information obtained by those SpeechObjects and, upon a negative confirmation, identifies which information needs to be corrected (e.g., "Would you like to change the date, the time, or the telephone number?"). The SpeechObjects corresponding to the piece ("the date") or pieces ("the date and the time") of information that need correction are then re-invoked (e.g., the Date object), prompting the user for the information again.
To promote better dialog, rather than simply re-invoking the same SpeechObject again during this error-correcting phase, a SpeechObject may offer a "RedoObject" which should be used to re-obtain the desired information. This RedoObject may simply ask for the information in a different manner, by changing prompts as appropriate ("Please say the date again."). Alternatively, the "RedoObject" may actually employ a different dialog strategy, perhaps breaking up the task into a set of smaller tasks in order to facilitate recognition of complex items. RedoObjects typically share the same SOKey ('object instance name') as their original SpeechObject in order to share n-best information from the original SpeechObject. SpeechObjects that do not employ a RedoObject may return "null" to indicate that the same instance should be used during this error-correction phase of a confirmation dialog.
Many of the SpeechObjects implement the Identifiable interface, which enables them to be used in the Confirm and Correct SpeechObject. The Identification phase of the Confirm and Correct process makes use of
Here's a sample of how the prompt might be used by the system (with the prompt highlighted):
Which would you like to change - the departure city or the arrival city?
The corresponding grammar expression for a derived ArrivalCity SpeechObject might then accept phrases like "the arrival city", "the destination", or "destination city".
Many SpeechObjects inherit parameters, return values, and behavior from other SpeechObjects. These relationships are helpful in understanding what parameters might possibly be common (in syntax and behavior) across a large number of Objects. A simplified inheritance diagram for all of the SpeechObjects in this document is shown below.
Although SpeechObjects as implemented in Java have method calls for setting and getting various values, the specification below is restricted to listing only JavaBean properties of the SpeechObjects, i.e. properties for which there are both "get" and "set" methods. While this restriction limits configuration to discrete parameters which may be changed but not added to (1), it also results in a cleaner interface for the users of the Objects - these properties may be edited in a GUI, set and retrieved in a scripting environment, etc.
Parameter type and return type descriptions can be found in the appendix.
Configuration parameters:
| Parameter | Type | Description |
| RedoObject | SpeechObject | New object to call in case the caller negatively confirms the result from the original object in a confirmation scenario |
| SOKey | String | Name for this instance's family (i.e., the object itself plus any redo objects for this object) |
Return values:
| Return value | Type | Description |
| getNextResult | | Next Result in n-best list, or null if no more |
| requiredAdditionalInteraction | boolean | Boolean indicating whether or not additional interaction between the SO and the caller was required in obtaining this result. Typically, this means that the SO has already done any needed disambiguation |
| isAutoConfirmed | boolean | Boolean indicating whether or not this Result has already been confirmed |
Description:
This SpeechObject does not implement a specific dialog -- it simply provides the framework for a dialog. The default behavior is:
- Append the SpeechObject's initial prompt to the current prompt buffer and play the buffer
- Wait for speech and send it to the recognizer for recognition, using the top-level grammar currently set by the SpeechObject
- If recognition was successful, pass the result to the SpeechObject's result processing methods and return the final result
- If recognition was not successful, perform the necessary error handling and attempt the dialog again
Configuration parameters:
| Parameter | Type | Description |
| Filter | | Used to examine n-best SpeechObject.Results and filter out invalid results |
| Grammar | | The grammar used for recognition |
| HelpPrompt | | This prompt is played if the user requests help |
| InitialPrompt | | Unless an error occurs or the user requests help, this is the prompt that is played before recognition |
| MaxErrorCount | int | The maximum number of errors (rejections, timeouts, or unexpected DTMF keypresses) permitted before the SpeechObject gives up |
| MaxHelpCount | int | The maximum number of help requests permitted before the SpeechObject gives up |
| NoResultFoundPrompt | | This prompt is played after a valid recognition but when none of the candidates in the n-best list are successfully processed into a SpeechObject.Result (e.g. if the entries fail to pass this Object's Filter) |
| NoSpeechTimeoutPrompt | | This prompt is played when a recognition error code of "no speech timeout" is returned by the recognizer |
| RecognitionErrorPrompt | | This prompt is played by default when a recognition error occurs unless a more specific error prompt is defined |
| RecognizerTooSlowTimeoutPrompt | | This prompt is played when a recognition error code of "recognizer too slow timeout" is returned by the recognizer |
| RejectedPrompt | | This prompt is played when a recognition error code of "rejected" is returned by the recognizer |
| ReturnAllPossibleResults | boolean | If true, this Object returns an entire n-best list of SpeechObject.Results. Otherwise, it will return only the first valid result it interprets and processes |
| SpeechTooEarlyPrompt | | This prompt is played when a recognition error code of "speech too early" is returned by the recognizer |
| TooMuchSpeechTimeoutPrompt | | This prompt is played when a recognition error code of "too much speech timeout" is returned by the recognizer |
| UnexpectedKeyPrompt | | This prompt is played when a recognition error code of "unexpected_key" is returned by the recognizer |
Return results:
| Return value | Type | Description |
| toString | String | A String representation of this Object's recognized result |
Description:
This SpeechObject expects an answer to a yes-or-no question.
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| StrictGrammar | boolean | If true, loads and uses a limited (strict) grammar to maximize performance |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| YesNo | String | String indicating yes or no |
| saidYes | boolean | True if the user said yes, false otherwise |
| saidNo | boolean | True if the user said no, false otherwise |
Description:
This SpeechObject recognizes quantities of items. By default, this SpeechObject recognizes 1-4 digit (0-9,999) quantities and has an absolute range of 1-8 digits (0-99,999,999). A developer can (and should) configure this SpeechObject to recognize quantities only within a certain range by setting the MinDigits and MaxDigits properties, as appropriate for a specific domain and application. The Quantity SpeechObject does not itself perform any confirmation or validity checking. The range of numbers that the speaker is allowed to say is restricted by limiting the grammar used for recognition to that range. If the speaker says a number that is outside the current range, the utterance is rejected by the recognizer.
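The relationship between the digit-count properties and the numeric range can be sketched as below. The QuantityRange class and its methods are hypothetical; treating a one-digit minimum as including zero is an assumption based on the default range of 0-9,999 stated above.

```java
class QuantityRange {
    // Integer power of ten, avoiding floating-point arithmetic.
    private static int pow10(int exponent) {
        int value = 1;
        for (int i = 0; i < exponent; i++) {
            value *= 10;
        }
        return value;
    }

    private static void checkDigits(int digits) {
        if (digits < 1 || digits > 8) {
            throw new IllegalArgumentException("digit count must be between 1 and 8");
        }
    }

    // Smallest quantity with minDigits digits; one digit is assumed to include 0.
    static int lowest(int minDigits) {
        checkDigits(minDigits);
        return minDigits == 1 ? 0 : pow10(minDigits - 1);
    }

    // Largest quantity with maxDigits digits, for example 4 gives 9999.
    static int highest(int maxDigits) {
        checkDigits(maxDigits);
        return pow10(maxDigits) - 1;
    }

    // True when the quantity would fall inside the configured grammar range.
    static boolean inRange(int quantity, int minDigits, int maxDigits) {
        return quantity >= lowest(minDigits) && quantity <= highest(maxDigits);
    }
}
```

In the real object the range is enforced by the recognition grammar rather than by a check after recognition; the sketch only shows which values each setting admits.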
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| MaxDigits | int | Maximum allowed number of digits for the quantity that will be recognized (e.g. 4 => '9999') |
| MinDigits | int | Minimum allowed number of digits for the quantity that will be recognized (e.g. 2 => '10') |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| Quantity | int | The quantity recognized |
Description:
This SpeechObject can be configured to recognize a string of digits of a fixed length. When NumberDigits is set, the SpeechObject automatically creates a grammar for recognizing that number of digits (without natural numbers). The Simple Digit String Speech Object does not itself perform any confirmation or validity checking. If there are specific constraints on what constitutes a valid number string for the controlling application, using the result filter mechanism to filter out inconsistent hypotheses is highly recommended.
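Filtering n-best hypotheses against an application constraint could be sketched as follows. The DigitFilter interface and NBestFilter class are hypothetical, and the Luhn checksum is used only as an example constraint chosen for this sketch; the text does not prescribe any particular validity check.

```java
import java.util.List;

// Hypothetical filter interface: decides whether a hypothesis is acceptable.
interface DigitFilter {
    boolean isValid(String digits);
}

// Example constraint (an assumption for this sketch): the Luhn checksum.
class LuhnFilter implements DigitFilter {
    public boolean isValid(String digits) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (d < 0 || d > 9) {
                return false; // not a digit
            }
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return !digits.isEmpty() && sum % 10 == 0;
    }
}

class NBestFilter {
    // Return the first hypothesis of the expected length that passes the filter, or null.
    static String firstValid(List<String> nBest, int numberDigits, DigitFilter filter) {
        for (String hypothesis : nBest) {
            if (hypothesis.length() == numberDigits && filter.isValid(hypothesis)) {
                return hypothesis;
            }
        }
        return null;
    }
}
```

A null return corresponds to the case where no entry in the n-best list survives filtering, which is the situation the NoResultFoundPrompt parameter of the Dialog SpeechObject addresses.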
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| NumberDigits | int | Number of digits to be recognized |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| DigitString | String | The recognized digit string |
Description:
This SpeechObject prompts for and interprets a date. The date may be specified in one of many formats, including just a day of the week, a day of the month, a relative date (today, tomorrow, yesterday, next Tuesday), and so forth. More complex expressions that specify the date in multiple ways are allowed (tomorrow, December 12th); the consistency of such dates is checked, and if the date is inconsistent, the user is reprompted for the date with an appropriate error message. Invalid dates, such as April 31, are similarly disallowed and cause reprompting for a date.
The speech object makes an effort to interpret the date intelligently:
If the day of week is given, such as Thursday, the SpeechObject interprets the date as if it were the upcoming Thursday. For example, if today is Monday, February 15, 1999, and the caller said "Thursday", the SpeechObject would interpret this as Thursday, February 18, 1999.
If the day of month is given, such as the 27th, and this day is later than the current day (for example, February 15), this SpeechObject assumes the date is in the same month as the current date. For example, the 27th would be interpreted as Saturday, February 27, 1999.
If the day of month is given, such as the 5th, and this day is a number less than the current date (for example, February 15), the SpeechObject assumes the day is for the next month. In this example, the 5th would be interpreted as Friday, March 5, 1999. If the next month is January, then the SpeechObject assumes the date is in the following year as well.
If the month is before the current month, the Date SpeechObject assumes the caller intends this date in the following year. For example, if the caller said January 3, this would be interpreted as January 3, 2000. If the caller says "today", the SpeechObject determines the current date unless specified by the developer.
When the caller says only a month, the SpeechObject follows up by prompting the caller to specify the day of the month. This is implemented by invoking a default SODayOfMonth SpeechObject (the DayOfMonthSO parameter), which may be overridden.
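The interpretation rules above can be sketched with the standard java.time API. The DateInterpretation class and its method names are hypothetical, and two choices are assumptions not stated in the text: a day of week equal to today's resolves to the following week, and a day of month equal to today's resolves to today.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

class DateInterpretation {
    // "Thursday" resolves to the upcoming Thursday after today.
    static LocalDate fromDayOfWeek(LocalDate today, DayOfWeek day) {
        return today.with(TemporalAdjusters.next(day));
    }

    // "the 27th" resolves to this month if not already past, otherwise to next month.
    // plusMonths rolls December over into January of the following year.
    // Throws DateTimeException if the day does not exist in the target month.
    static LocalDate fromDayOfMonth(LocalDate today, int dayOfMonth) {
        LocalDate base = dayOfMonth >= today.getDayOfMonth() ? today : today.plusMonths(1);
        return base.withDayOfMonth(dayOfMonth);
    }

    // "January 3" resolves to next year if the month is before the current month.
    static LocalDate fromMonthAndDay(LocalDate today, int month, int dayOfMonth) {
        int year = month < today.getMonthValue() ? today.getYear() + 1 : today.getYear();
        return LocalDate.of(year, month, dayOfMonth);
    }
}
```

With today set to Monday, February 15, 1999, these methods reproduce the examples in the text: "Thursday" gives February 18, 1999, "the 27th" gives February 27, 1999, "the 5th" gives March 5, 1999, and "January 3" gives January 3, 2000.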
The SpeechObject performs the following validation checking of the recognized date:
When inconsistent information is provided by the caller, such as a conflicting day of month and day of week (for example, Tuesday, February 15, 1999), the Date SpeechObject plays a prompt that identifies the correct information (February 15th is a Monday) and then reprompts the caller.
Likewise, if a modifier such as "today" is inconsistent with the day of month, the SpeechObject plays a prompt specifying what today's date is and reprompts the caller.
Invalid date handling:
When the caller responds with an invalid date such as "April 31", the SpeechObject plays a prompt that explains why this date is invalid ("... there are only 30 days in April") and then reprompts the caller.
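The two validation checks described above can be sketched with java.time. The DateValidation class and its methods are hypothetical, and the wording of the returned explanation is modeled on the example prompt rather than taken from an actual prompt set.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.TextStyle;
import java.util.Locale;

class DateValidation {
    // Returns null when the day exists in the month, otherwise a short explanation.
    // The day is assumed to be at least 1.
    static String invalidDateReason(int year, int month, int day) {
        YearMonth ym = YearMonth.of(year, month);
        if (day <= ym.lengthOfMonth()) {
            return null;
        }
        String monthName = ym.getMonth().getDisplayName(TextStyle.FULL, Locale.ENGLISH);
        return "there are only " + ym.lengthOfMonth() + " days in " + monthName;
    }

    // True when the weekday stated by the caller matches the calendar date.
    static boolean dayOfWeekConsistent(DayOfWeek stated, LocalDate date) {
        return date.getDayOfWeek() == stated;
    }
}
```

When dayOfWeekConsistent returns false, the dialog would play the correcting prompt (for example, that February 15th is a Monday) before reprompting.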
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| DateTooEarlyPrompt | | Prompt played if the stated date is before the lower DateLimit, e.g. "I'm sorry, I thought you said 'February 12th, 1985' but that day is too far in the past" |
| DateTooLatePrompt | | Prompt played if the stated date is after the upper DateLimit, e.g. "I'm sorry, I thought you said 'February 12th, 2085' but that day is too far in the future" |
| DayOfMonthSO | | SODayOfMonth instance used to obtain a day of month when just the month or just the month and year are specified |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| InconsistentDayOfWeekPrompt | | Prompt played if the stated date includes a day of week that doesn't match, e.g. "I'm sorry, I thought you said 'Tuesday, December 10th', but December 10th is a Friday." |
| InconsistentModifiedDayOfWeekPrompt | | Prompt played if the stated date includes a modified day of week that doesn't match, e.g. "I'm sorry, I thought you said 'next Tuesday, December 10th', but next Tuesday is December 14th." |
| InconsistentNamedTodayEtcPrompt | | Prompt played if the stated date includes a today expression as well as a day of week that actually refers to another date, e.g. "I'm sorry, I thought you said 'today, Tuesday, December 10th', but today is December 14th." |
| InconsistentTodayEtcPrompt | | Prompt played if the stated date includes a today expression that refers to another date, e.g. "I'm sorry, I thought you said 'today, December tenth', but today is December fourteenth." |
| InvalidDatePrompt | | Prompt played if the stated date is invalid (the day of month exceeds the number of days in the month) |
| LowerDate | java.util.Calendar or int or SODate.DateLimit | Earliest permissible date, represented by a Calendar object, an offset in days, or a DateLimit object |
| UpperDate | java.util.Calendar or int or SODate.DateLimit | Latest permissible date, represented by a Calendar object, an offset in days, or a DateLimit object |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| Calendar | java.util.Calendar | The date as a Calendar object |
| DayOfMonth | int | Day of month specified by the caller |
| DayOfWeek | int | Day of week specified by the caller |
| Month | int | Month represented as an integer between 1 and 12 |
| Year | int | Year represented as a four-digit integer |
Description:
This SpeechObject defines a generic dialog for getting a time expression from the speaker.
The Time SpeechObject is generic and may be specialized (through modification of parameters, prompts, and/or grammars) for use in a range of applications, for example, flight information and reservation systems, personal agenda management, or package delivery/pickup scheduling.
In response to a prompt requesting the time, the caller speaks the time in a natural way, using expressions such as "in the morning" or "at night" as well as "am" or "pm". The Time SpeechObject recognizes a clock time, for example, "three forty-five am". If the time is ambiguous (am/pm not specified), the SpeechObject conducts any additional dialog with the caller needed to obtain an unambiguous time. This dialog is implemented by invoking an instance of the DisambiguateTime SpeechObject.
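Resolving an ambiguous hour from the caller's modifier might be sketched as follows. The TimeDisambiguation class is hypothetical, the set of modifiers is a small sample, and the afternoon window (noon through 5 pm) is an assumption chosen so that a reply such as "in the afternoon" for 11 o'clock is treated as inconsistent.

```java
class TimeDisambiguation {
    // Returns the hour on a 24-hour clock, or -1 when the modifier is
    // inconsistent with the stated hour.
    static int resolve(int hour12, String modifier) {
        if (hour12 < 1 || hour12 > 12) {
            throw new IllegalArgumentException("hour must be between 1 and 12");
        }
        switch (modifier) {
            case "am":
            case "in the morning":
                return hour12 % 12; // 12 am maps to 0
            case "pm":
            case "in the evening":
                return hour12 % 12 + 12; // 12 pm maps to 12
            case "in the afternoon": {
                int hour24 = hour12 % 12 + 12;
                // Assumed window for "afternoon": 12:00 through 17:00.
                return (hour24 >= 12 && hour24 <= 17) ? hour24 : -1;
            }
            default:
                throw new IllegalArgumentException("unknown modifier: " + modifier);
        }
    }
}
```

A result of -1 corresponds to the case in which the dialog would play an inconsistency prompt and ask again.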
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| DisambigObject | | Sets object that disambiguates ambiguous times, e.g. '10' => 10 am or 10 pm |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| InconsistentTimePrompt | | Prompt played when the user response in the disambiguation dialog is inconsistent with the original time they said. For example, the user is asked to disambiguate if 11 o'clock is in the morning or evening, and replies "in the afternoon". |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| AM_PM | String | Returns whether the time said by the caller was AM or PM |
| Calendar | java.util.Calendar | Returns the time in a Calendar representation |
| ClockTime | int | A numerical representation of the time |
| ClockTimePlayable | | The time as a Playable in the standard format (with trailing "am" or "pm") |
| Hours | int | The hour portion of the time said by the caller |
| Minutes | int | The minutes portion of the time said by the caller |
| SmartTimePlayable | | A time Playable in the "intelligent" (colloquial) format (for example, "7 in the evening", "5 in the morning", "noon") |
| UserStatedModifier | String | Any user-stated modifier that disambiguated the time |
Description:
The Menu SpeechObject does not itself define a default dialog. The dialog is generated dynamically based on the number of items defined. The dialog presents the list of menu items and allows the caller to choose one of them. It enables the developer to dynamically build menus from pairs of grammars and prompt atoms, and in addition it permits the developer to associate a listener with any of the items so that the listener's action is performed in response to selecting the item.
The menu may be defined dynamically by calling a method that adds menu items sometime before invocation. Each menu item is defined in terms of:
- an item name, which is the text string returned in the result if the item is selected
- an optional Playable for representing the item in the menu prompts if they are autogenerated
- an optional grammar expression to trigger selection of this item
Note that at this time the menu items cannot be set merely by setting JavaBean properties.
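Building a menu from items before invocation could look like the sketch below. The MenuSketch class, its addItem signature, and the use of plain strings in place of Playable prompts and grammar expressions are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class MenuSketch {
    // Hypothetical menu item: a result name, prompt text standing in for a
    // Playable, a phrase standing in for a grammar expression, and an optional listener.
    private static class Item {
        final String name;
        final String promptText;
        final String phrase;
        final Runnable listener;

        Item(String name, String promptText, String phrase, Runnable listener) {
            this.name = name;
            this.promptText = promptText;
            this.phrase = phrase;
            this.listener = listener;
        }
    }

    private final List<Item> items = new ArrayList<>();

    void addItem(String name, String promptText, String phrase, Runnable listener) {
        items.add(new Item(name, promptText, phrase, listener));
    }

    // Auto-generate a prompt that lists every item after the given prefix.
    String itemListPrompt(String prefix) {
        List<String> parts = new ArrayList<>();
        for (Item item : items) {
            parts.add(item.promptText);
        }
        return prefix + " " + String.join(", ", parts);
    }

    // Return the selected item's name and run its listener, or null if nothing matches.
    String select(String utterance) {
        for (Item item : items) {
            if (item.phrase.equals(utterance)) {
                if (item.listener != null) {
                    item.listener.run();
                }
                return item.name;
            }
        }
        return null;
    }
}
```

Because the items are added through method calls rather than properties, the menu contents can be computed at runtime, for example from a caller's account data.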
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| ErrorPromptPostfix | | Audio to use as the postfix of the error prompt (if the prompt is being auto-generated) |
| ErrorPromptPrefix | | Audio to use as the prefix of the error prompt (if the prompt is being auto-generated) |
| HelpPromptPostfix | | Audio to use as the postfix of the help prompt (if the prompt is being auto-generated) |
| HelpPromptPrefix | | Audio to use as the prefix of the help prompt (if the prompt is being auto-generated) |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| InitialPromptPostfix | | Audio to use as the postfix of the initial prompt (if the prompt is being auto-generated) |
| InitialPromptPrefix | | Audio to use as the prefix of the initial prompt (if the prompt is being auto-generated) |
| ItemListPrompt | | Prompt that is an explicit listing of all the menu items |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| ItemName | String | The name of the selected item |
| RecResult | | The recognition result of the interaction |
Description:
This SpeechObject prompts for and recognizes a dollar and cent amount in one utterance. If necessary, disambiguation is performed for utterances like "seven fifty". This disambiguation is performed by invoking a default instance of the DisambiguateCurrency SpeechObject (which may be overridden). This SpeechObject provides a DTMF backoff strategy if the caller encounters recognition problems.
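The ambiguity in an utterance like "seven fifty" can be made concrete with a small sketch. The CurrencyDisambiguation class is hypothetical; amounts are held in cents to avoid floating-point rounding, and using an allowed range to prune candidates is an assumption about how a range setting might interact with disambiguation.

```java
import java.util.ArrayList;
import java.util.List;

class CurrencyDisambiguation {
    // For an utterance such as "seven fifty" (leading = 7, trailing = 50),
    // return every reading, in cents, that lies within [minCents, maxCents].
    static List<Long> candidates(int leading, int trailing, long minCents, long maxCents) {
        long asDollarsAndCents = leading * 100L + trailing;         // $7.50 is 750 cents
        long asWholeDollars = (leading * 100L + trailing) * 100L;   // $750 is 75000 cents
        List<Long> result = new ArrayList<>();
        if (asDollarsAndCents >= minCents && asDollarsAndCents <= maxCents) {
            result.add(asDollarsAndCents);
        }
        if (asWholeDollars >= minCents && asWholeDollars <= maxCents) {
            result.add(asWholeDollars);
        }
        return result;
    }
}
```

If exactly one candidate remains, no follow-up question is needed; if both remain, the dialog would ask the caller which amount was meant.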
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| DisambigObject | | Object that disambiguates ambiguous currencies, e.g. 'ten fifty' => $10.50 or $1050 |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| Range | | Sets the allowed value range (also propagated to the disambiguation object) |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| Amount | float | Floating point number indicating the recognized amount of dollars and cents |
| Cents | int | Integer indicating the recognized amount of cents |
| Dollars | int | Integer indicating the recognized dollar amount, not including cents |
North American Telephone Number
Description:
This SpeechObject prompts for and obtains a telephone number from the user, in the standard 10-digit format used in Canada, Mexico, and the USA.
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| UseNatural | boolean | If true, allows natural numbers within each section, e.g. 'six five oh, eight four seven, eleven fifty five' |
Return results:
All return results of the Dialog SpeechObject, plus
Return value | Type | Description
AreaCode | String | The first 3 digits of the 10-digit recognized phone number
Exchange | String | The second set of 3 digits of the 10-digit recognized phone number
Subscriber | String | The last 4 digits of the 10-digit recognized phone number
PhoneNumber | String | Entire phone number as a string
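The four return values are simply different views of the same ten digits. The following sketch shows the split; PhoneNumberSketch is a hypothetical class used for illustration and is not the result class of the SpeechObject.

```java
/** Hypothetical holder mirroring the four return values; not the SpeechObjects API. */
final class PhoneNumberSketch {
    final String areaCode;    // first 3 digits
    final String exchange;    // second set of 3 digits
    final String subscriber;  // last 4 digits

    PhoneNumberSketch(String tenDigits) {
        if (tenDigits == null || !tenDigits.matches("\\d{10}")) {
            throw new IllegalArgumentException("expected exactly 10 digits");
        }
        areaCode = tenDigits.substring(0, 3);
        exchange = tenDigits.substring(3, 6);
        subscriber = tenDigits.substring(6);
    }

    /** The entire phone number as a string. */
    String phoneNumber() {
        return areaCode + exchange + subscriber;
    }
}
```

For the utterance 'six five oh, eight four seven, eleven fifty five', the recognized digits 6508471155 would split into 650, 847, and 1155.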
Description:
This SpeechObject can be configured to prompt for and recognize alphanumeric strings that may consist of sections, such as an account or credit card number, a Social Security number, and the like. Natural numbers are optionally allowed within each section.
As with the Sectioned Digit String SpeechObject, each format of the sectioning is specified as a '-' delimited string. Each format should be of the form:
DDD-DD-DDDD or DDD-DD-AADD
and so forth. The first format specifies that the grammar should recognize a section of three digits, then two digits, then four digits (e.g., a Social Security Number). The letter D stands for a digit (0-9), and the letter A stands for any letter (A-Z).
A user-defined group can also be used to recognize a subset of the alphabet, optionally allowing digits as well as letters in certain positions. For example, one could define "V" to correspond to "AEIOU", the vowels. This is useful when only certain letters are allowed in a position within the string: the automatically generated grammar can reflect this constraint directly.
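The way a format string and its groups constrain each position can be sketched with a regular expression standing in for the generated grammar. This is an illustration only: the class AlphaDigitFormatSketch and the use of a Map for groups are assumptions made here, not the Group type or the grammar generation of the SpeechObject.

```java
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical illustration; a regular expression stands in for the recognition grammar. */
final class AlphaDigitFormatSketch {

    /**
     * Translates a format such as "DDD-DD-AADD" into a pattern.
     * D = digit, A = letter, '-' = section boundary; any other symbol must be
     * a user-defined group that maps to the characters allowed at that position.
     */
    static Pattern compile(String format, Map<Character, String> groups) {
        StringBuilder regex = new StringBuilder();
        for (char symbol : format.toCharArray()) {
            if (symbol == '-') {
                regex.append('-');
            } else if (groups.containsKey(symbol)) {
                String members = groups.get(symbol);
                if (!members.matches("[A-Z0-9]+")) {
                    throw new IllegalArgumentException("group members must be A-Z or 0-9");
                }
                regex.append('[').append(members).append(']');
            } else if (symbol == 'D') {
                regex.append("[0-9]");
            } else if (symbol == 'A') {
                regex.append("[A-Z]");
            } else {
                throw new IllegalArgumentException("unknown format symbol: " + symbol);
            }
        }
        return Pattern.compile(regex.toString());
    }
}
```

With a group "V" defined as "AEIOU", the format "DDV" would accept "12E" but not "12B", which is the constraint a generated grammar would enforce at recognition time.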
Configuration parameters:
All configuration parameters of the Dialog and Sectioned Digit String SpeechObjects, plus
Parameter | Type | Description
Group | Group[] | Defines the groups
Group | (int, Group) | Defines a specific group
Return results:
All return results of the Dialog and Sectioned Digit String SpeechObjects.
Confirm and Correct
Description:
The Confirm and Correct SpeechObject lets the caller confirm one or more pieces of information together and correct any that are wrong (by invoking suitable SpeechObjects). The items to be confirmed, identified, or corrected must be SpeechObjects that implement the Identifiable interface. The dialog proceeds in three phases: Confirmation, Identification, and Correction.
- The first phase is the Confirmation phase. During the Confirmation phase, the inner Confirmation object is invoked to play the confirmation prompt ("...is this correct?"), so that the caller can indicate whether all the information is correct. If the caller answers in the affirmative, this SpeechObject is finished.
- If the caller indicates that information is not completely correct, Confirm and Correct moves on to the Identification phase, invoking the inner Identify object. During this phase, the caller identifies which piece(s) of information need to be corrected. The caller can respond by indicating up to two items that are incorrect -- or the caller can specify that all of the information is wrong; the caller can also answer that none of the items is incorrect, in which case execution returns to the Confirmation phase, and starts over.
- If the caller identifies at least one incorrect piece of information during the Identification phase, execution moves on to the Correction phase. During the Correction phase, Confirm and Correct obtains the re-do object for each SpeechObject whose Result is wrong. After getting and invoking each re-do object, execution returns to the Confirmation phase, to confirm all of the contained SpeechObject Results.
At the end of a successful invocation (i.e., after all results have been confirmed), the Confirm and Correct SpeechObject returns a Result that contains all the results of the contained SpeechObjects, with each contained SpeechObject Result stored under the contained SpeechObject's SO key. For example, if the contained SpeechObjects are SODate and SOTime, the Result instance returned by Confirm and Correct will contain an SODate.Result stored under SODate's SO key, and an SOTime.Result stored under SOTime's SO key.
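The three phases form a loop that ends only when the caller confirms everything. The following sketch shows that control flow under stated assumptions: the Caller interface, the use of plain strings as results, and the reading of MaxRetryCount as a bound on the number of correction rounds are all hypothetical and introduced only for this illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the Confirmation / Identification / Correction loop. */
final class ConfirmAndCorrectSketch {

    /** Stand-in for the inner Confirmation, Identify, and re-do objects. */
    interface Caller {
        boolean confirms(Map<String, String> results);            // Confirmation phase
        List<String> identifyWrong(Map<String, String> results);  // Identification phase
        String redo(String soKey);                                // Correction phase
    }

    static Map<String, String> run(Map<String, String> initial, Caller caller, int maxRetryCount) {
        Map<String, String> results = new LinkedHashMap<>(initial);
        for (int attempt = 0; attempt <= maxRetryCount; attempt++) {
            if (caller.confirms(results)) {
                return results;  // everything confirmed: results stay keyed by SO key
            }
            // An empty list ("none of the items is incorrect") falls through
            // and returns to the Confirmation phase on the next iteration.
            for (String soKey : caller.identifyWrong(results)) {
                results.put(soKey, caller.redo(soKey));
            }
        }
        throw new IllegalStateException("not confirmed after " + maxRetryCount + " retries");
    }

    /** Scripted example: the caller rejects the time once, then confirms. */
    static Map<String, String> demo() {
        Map<String, String> initial = new LinkedHashMap<>();
        initial.put("SODate", "May 5");
        initial.put("SOTime", "3 am");
        Caller scripted = new Caller() {
            public boolean confirms(Map<String, String> r) {
                return "3 pm".equals(r.get("SOTime"));
            }
            public List<String> identifyWrong(Map<String, String> r) {
                return Collections.singletonList("SOTime");
            }
            public String redo(String soKey) {
                return "3 pm";
            }
        };
        return run(initial, scripted, 2);
    }
}
```

In the scripted example only the entry stored under the SOTime key is re-collected; the SODate entry is carried through unchanged and confirmed together with the corrected time.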
Configuration parameters:
Parameter | Type | Description
Confirmation | | The object that performs confirmation
GetInitialResultsIfNeeded | boolean | If true, will initially invoke all contained SpeechObjects that have not yet obtained results
Identify | | The object that identifies which information needs to be corrected
MaxRetryCount | int | Maximum number of retries attempted by Confirm and Correct
SpeechObject | SpeechObject[] | SpeechObjects to be contained (confirmed/corrected)
SpeechObject | (int, SpeechObject) | Adds/sets a SpeechObject for confirmation/correction
Return results:
Return value | Type | Description
SOKeysEnum | Enumeration | Enumeration of contained SpeechObjects' Result keys
Browsable Selection
Description:
The Browsable Selection SpeechObject acts similarly to the Browsable List SpeechObject except that it also supports a "select" command the caller can use to select the current item being browsed.
Configuration parameters:
All configuration parameters of the Browsable List SpeechObject, plus
Parameter | Type | Description
IdentifyExpression | | Identification expression for the Identifiable interface
IdentifyPrompt | | Identification prompt for the Identifiable interface
SelectionExpression | | Application-specific grammar rule to specify that the current item is to be selected
Return results:
All return results of the Browsable List SpeechObject.
Browsable Action List
Description:
The Browsable Action List SpeechObject acts similarly to the Browsable List SpeechObject, except that the developer can add custom commands, each with an associated handler, to the list. When the caller speaks a custom command, the corresponding handler is invoked to handle it.
Note that at this time there is no way to specify these custom commands and handlers using JavaBean properties.
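The association between custom commands and handlers can be pictured as a simple lookup table. The sketch below is an assumption-laden illustration: ActionListSketch, addCommand, and dispatch are hypothetical names, and the handler signature (receiving the index of the current item) is chosen here for convenience rather than taken from the SpeechObjects API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;

/** Hypothetical command-to-handler registry; not the SpeechObjects API. */
final class ActionListSketch {
    private final Map<String, IntConsumer> handlers = new HashMap<>();

    /** Registers a custom command together with the handler to run for it. */
    void addCommand(String command, IntConsumer handler) {
        handlers.put(command, handler);
    }

    /**
     * Runs the handler registered for the spoken command, passing the index of
     * the item currently being browsed. Returns false for an unknown command.
     */
    boolean dispatch(String command, int currentIndex) {
        IntConsumer handler = handlers.get(command);
        if (handler == null) {
            return false;
        }
        handler.accept(currentIndex);
        return true;
    }
}
```

A "delete" command, for example, could be registered with a handler that removes the current item from the underlying list.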
Configuration parameters:
All configuration parameters of the Browsable List SpeechObject.
Return results:
All return results of the Browsable List SpeechObject.
Description:
This SpeechObject will collect either a 5- or 9-digit US ZIP code. The filter used to validate 5-digit codes is based on a list of currently existing codes issued by the U.S. Postal Service. The 4-digit extension, if spoken, is not validated. It is possible to disable the filter if you want to accept any 5-digit code.
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
Parameter | Type | Description
FilterDisabled | boolean | True prevents the recognized result from being validated
FiveDigitsOnly | boolean | True restricts recognition to 5 digits
IdentifyExpression | | Identification expression for the Identifiable interface
IdentifyPrompt | | Identification prompt for the Identifiable interface
Return results:
All return results of the Dialog SpeechObject, plus
Return value | Type | Description
Extension | String | Either a string representation of the last 4 digits (if a 9-digit ZIP code was recognized) or null (if a 5-digit ZIP code was recognized)
ZipCode | String | String representation of the 5-digit ZIP code (or of the first 5 digits, if a 9-digit ZIP code was recognized)
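The interplay of the filter, the FilterDisabled parameter, and the two return values can be sketched as follows. ZipCodeSketch is a hypothetical class, and the set of known codes passed to it is a stand-in for the list of codes issued by the U.S. Postal Service.

```java
import java.util.Set;

/** Hypothetical illustration of the ZipCode and Extension return values. */
final class ZipCodeSketch {
    final String zipCode;    // always the first 5 digits
    final String extension;  // last 4 digits, or null for a 5-digit code

    ZipCodeSketch(String digits, Set<String> knownCodes, boolean filterDisabled) {
        if (digits == null || !digits.matches("\\d{5}(\\d{4})?")) {
            throw new IllegalArgumentException("expected 5 or 9 digits");
        }
        String five = digits.substring(0, 5);
        // Only the 5-digit part is validated; the 4-digit extension never is.
        if (!filterDisabled && !knownCodes.contains(five)) {
            throw new IllegalArgumentException("unknown 5-digit code: " + five);
        }
        zipCode = five;
        extension = digits.length() == 9 ? digits.substring(5) : null;
    }
}
```

In the SpeechObject itself, a code rejected by the filter would be removed from the recognition hypotheses rather than raising an exception; the exception here only marks the point at which the filter applies.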
Description:
This SpeechObject encapsulates the acquisition of the credit card type, credit card number, and credit card expiration date.
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
Parameter | Type | Description
AcceptExpiredCard | boolean | If true, credit cards that expired before the current date are accepted
AllTypesEnabled | boolean | If true, all eight built-in credit card types are acceptable
CardTypeEnabled | (int, boolean) | Sets whether a specific card type is accepted or not
CreditCardExpirationDateSpeechObject | SpeechObject | Internal expiration date SpeechObject
CreditCardInfoCANDCSpeechObject | SpeechObject | Internal confirm and correct SpeechObject
CreditCardNumberSpeechObject | SpeechObject | Internal card number SpeechObject
CreditCardTypeSpeechObject | SpeechObject | Internal card type SpeechObject
InitialState | String | Initial state for the call-flow
PreamblePrompt | | Prompt that is played at the beginning of the dialog
TypeQueryExplicit | boolean | If true, the credit card type is queried explicitly
Return results:
All return results of the Dialog SpeechObject, plus
Return value | Type | Description
CreditCardExpirationMonth | int | Credit card expiration month
CreditCardExpirationYear | int | Credit card expiration year
CreditCardNumber | String | Credit card number as digit string
CreditCardType | String | Credit card type as string
ResultStatus | int | Result status
isResultOk | boolean | Whether the result is OK or not
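The effect of the AcceptExpiredCard parameter on the returned expiration month and year can be sketched as a single comparison. CardExpirySketch is a hypothetical helper, and the rule that a card remains valid through the end of its expiration month is an assumption made for this illustration.

```java
import java.time.YearMonth;

/** Hypothetical expiration check; not part of the SpeechObjects API. */
final class CardExpirySketch {

    /**
     * Returns true if a card with the given expiration month and year should
     * be accepted. Assumes a card is valid through the end of its expiration
     * month, so only earlier months count as expired.
     */
    static boolean isAcceptable(int expirationMonth, int expirationYear,
                                YearMonth currentMonth, boolean acceptExpiredCard) {
        YearMonth expiry = YearMonth.of(expirationYear, expirationMonth);
        return acceptExpiredCard || !expiry.isBefore(currentMonth);
    }
}
```

Passing the current month as a parameter, rather than reading the system clock inside the method, keeps the check easy to test.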
Sectioned Digit String
Description:
This SpeechObject can be configured to recognize a string of digits broken up into various sections. Alternate sectionings can be provided, and the use of natural numbers in the grammar can be enabled. The maximum length of a section is six digits.
Each format of the sectioning is specified as a '-' delimited string; the number of digits in each section of a given format is specified by a sequence of 'D' characters. For example,
"DDD-DDD-DDDD"
specifies a sectioning of three digits, three digits, and four digits.
"DDD-DD-DDDDD"
specifies a sectioning of three digits, two digits, and five digits.
Developers can also set the delimiter that callers may speak when reading the sectioned digit string. By default this is "dash", but it can be changed to any single word or valid GSL expression, such as "dot" or "[dash dot]", or to null.
Natural numbers can also be enabled through a simple property setting.
This SpeechObject does not itself perform any confirmation or validity checking. If there are specific constraints on what constitutes a valid digit string for your application, using the result filter mechanism to filter out inconsistent hypotheses is highly recommended.
The configuration of the digit string -- that is, the number of sections and the length of each section -- determines the construction of the grammar used for recognition. If the speaker says a digit string that does not match one of the defined patterns, the recognizer rejects the utterance.
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
Parameter | Type | Description
DelimiterExpression | String | The (optional) delimiter expression used in the grammar between sections of the alphadigit string (dash, dot, etc.)
DelimiterPrompt | | Audio played between sections of the recognized digit string (e.g. 'dash.wav')
Format | String[] | Defines formats of all sections of the string
Format | (int, String) | Defines the format for the given section of the string (e.g. 'DDD-DD-DDDD')
Format | | Defines formats of all sections of the string
Format | (int, WeightedFormat) | Defines the format for the given section of the string
IdentifyExpression | | Identification expression for the Identifiable interface
IdentifyPrompt | | Identification prompt for the Identifiable interface
UseNatural | boolean | If true, allows natural numbers within each numeric section, e.g. 'three six two, fifty seven, eleven hundred'
Return results:
All return results of the Dialog SpeechObject, plus
Return value | Type | Description
DigitString | String | The recognized string without any section delimiters
Section | String[] | The sections of the recognized string
SectionedDigitString | String | The recognized string with '-' between sections, e.g. "52-764"
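The three return values are related through the format that matched. The sketch below derives the sections from a digit string and a list of alternate formats; SectionedDigitsSketch is a hypothetical helper, and in the SpeechObject the formats shape the recognition grammar itself rather than being applied after the fact.

```java
/** Hypothetical illustration of DigitString, Section, and SectionedDigitString. */
final class SectionedDigitsSketch {

    /**
     * Splits digits according to one format such as "DD-DDD".
     * Returns null if the digit count does not fit the format.
     */
    static String[] split(String digits, String format) {
        if (!digits.matches("\\d+")) {
            return null;
        }
        String[] pattern = format.split("-");
        String[] sections = new String[pattern.length];
        int pos = 0;
        for (int i = 0; i < pattern.length; i++) {
            int end = pos + pattern[i].length();
            if (end > digits.length()) {
                return null;
            }
            sections[i] = digits.substring(pos, end);
            pos = end;
        }
        return pos == digits.length() ? sections : null;
    }

    /** Tries each alternate format in order and returns the first sectioning that fits. */
    static String[] firstMatch(String digits, String... formats) {
        for (String format : formats) {
            String[] sections = split(digits, format);
            if (sections != null) {
                return sections;
            }
        }
        return null;
    }
}
```

Joining the sections with '-' yields the SectionedDigitString form, and concatenating them without a delimiter yields the DigitString form.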
Browsable List
Description:
The Browsable List SpeechObject allows the caller to hear items in a list in sequence, and navigate through the list. The object provides methods through which the developer may dynamically define the list. The list of items to be browsed is encapsulated within a Browsable object.
When invoked, the list plays prompts associated with items, one after another. Depending on configuration, the list advances automatically or through a "next" command to the next item. Based on configuration, the list may terminate automatically at the end, or as the result of an "exit list" command.
The general dialog flow is as follows:
- The dialog begins with a preamble, if enabled (a recognition state that accepts the relevant navigational commands), which automatically advances to the first item.
- Some commands are invalid in the preamble (for example, "previous" or application-specific commands such as "delete"). Invalid commands are handled as errors.
- For each item, the item prompt is played, with optional prepended and appended prompts. Navigation and application-specific commands are active.
- If enabled, a timeout automatically advances the list to the next item.
- If enabled, a timeout automatically exits the list after the last item.
The default navigation commands are: next, previous, last, first, and exit.
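The navigation commands and the resulting list position can be modelled as a small state machine. The sketch below is illustrative only: BrowseCursorSketch is a hypothetical class, and its treatment of "previous" on the first item and of "next" on the last item (exit if ReturnAtEnd is set, otherwise an error) are assumptions, since those cases are not spelled out here.

```java
/** Hypothetical cursor over a browsable list; not the SpeechObjects API. */
final class BrowseCursorSketch {
    private final int size;
    private final boolean returnAtEnd;
    private int index = -1;   // -1 means the caller is still in the preamble
    private boolean exited;

    BrowseCursorSketch(int size, boolean returnAtEnd) {
        if (size <= 0) {
            throw new IllegalArgumentException("list must contain at least one item");
        }
        this.size = size;
        this.returnAtEnd = returnAtEnd;
    }

    /** Applies a navigation command; returns false if it is invalid in the current state. */
    boolean apply(String command) {
        if (exited) {
            return false;
        }
        switch (command) {
            case "next":
                if (index < size - 1) {
                    index++;
                    return true;
                }
                if (returnAtEnd) {  // assumption: advancing past the end exits the list
                    exited = true;
                    return true;
                }
                return false;
            case "previous":
                if (index <= 0) {   // invalid in the preamble; assumed invalid on the first item
                    return false;
                }
                index--;
                return true;
            case "first":
                index = 0;
                return true;
            case "last":
                index = size - 1;
                return true;
            case "exit":
                exited = true;
                return true;
            default:
                return false;
        }
    }

    boolean exitedFromPreamble() { return exited && index < 0; }

    boolean exitedFromList() { return exited && index >= 0; }

    int index() { return index; }

    /** Index of the current item in string form, or "PREAMBLE" before the first item. */
    @Override
    public String toString() {
        return index < 0 ? "PREAMBLE" : Integer.toString(index);
    }
}
```

An auto-advance timeout can be modelled as an implicit "next" command, which is why the same method covers both the spoken command and the AutoAdvance behavior.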
Configuration parameters:
Parameter | Type | Description
AutoAdvance | boolean | If set to true, forces the list to advance automatically to the next item if there is no response from the user
Browsable | | Object providing access to items to be browsed
ExitPrompt | | Prompt played when the list exits
FirstPrePrompt | | The pre-prompt played before the first item
Grammar | | The grammar used for both preamble and list item recognitions
LastPrePrompt | | The pre-prompt played before the last item
ListNoSpeechTimeoutPrompt | | The prompt played if there is a "no-speech" timeout when the user says a command after the list item
ListRecognizerTooSlowTimeoutPrompt | | The prompt played if there is a "recognizer-too-slow" timeout when the user says a command after the list item
ListRejectedPrompt | | The prompt played if there is a rejection when the user says a command after the list item
ListSpeechTooEarlyPrompt | | The prompt played if there is a "speech-too-early" condition when the user says a command after the list item
ListTooMuchSpeechTimeoutPrompt | | The prompt played if there is a "too-much-speech" timeout when the user says a command after the list item
ListUnexpectedKeyPrompt | | The prompt played if the user presses a DTMF key instead of speaking a command after the list item
MultiItemListErrorPrompt | | The recognition error prompt when the list has more than one item
MultiItemListHelpPrompt | | The help prompt when the list has more than one item
NextPrePrompt | | The pre-prompt played when the user says "next", or when the list auto-advances to the next item
OnlyItemListErrorPrompt | | The recognition error prompt when the list has only one item
OnlyItemListHelpPrompt | | The help prompt when the list has only one item
OnlyItemPrePrompt | | The pre-prompt played when there is only one item
PreambleHelpPrompt | | The prompt played if the user asks for help during the preamble
PreamblePrompt | | The prompt that is played only once when the user first enters the list
PreambleRecognitionErrorPrompt | | The prompt played if there is a recognition error in the preamble
PreviousPrePrompt | | The pre-prompt played when the user says "previous"
ReturnAtEnd | boolean | If true, list will automatically exit when it reaches the end
Return results:
Return value | Type | Description
exitedFromList | boolean | True if the user exited during the list portion
exitedFromPreamble | boolean | True if the user exited during the preamble
Index | int | The index of the item on which the list exited
toString | String | The index of the item on which the list exited, in string form. If the list exited in the preamble, returns the string "PREAMBLE"
This appendix describes the types of the configuration parameters and return values.
(1) For example, one could set a parameter to have a linked list as its value, but not add an element to the end of that list: setting a parameter to a value is allowed, but executing a function on the value is not. Of course, the calling application is free to check the value, compute a new value from it, and set the parameter to the new value.
(2) A set of Playables, one of which is selected at random each time the random prompt is to be played.
(3) An ordered set of Playables 1 ... n such that 1 is played the first time the escalating prompt is to be played, 2 the second time, and so on.
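The selection rules in notes (2) and (3) can be sketched as follows. PromptSelectionSketch is a hypothetical helper that works on any list of items standing in for Playables. The behavior once an escalating prompt has been played more than n times is not stated above; the sketch assumes that the last item is repeated.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical selection rules for random and escalating prompts. */
final class PromptSelectionSketch {

    /**
     * Escalating prompt: the first item on the first play, the second item on
     * the second play, and so on. Assumption: the last item repeats once the
     * list is exhausted.
     */
    static <T> T escalating(List<T> prompts, int timesPlayedBefore) {
        int position = Math.min(timesPlayedBefore, prompts.size() - 1);
        return prompts.get(position);
    }

    /** Random prompt: one item chosen at random on each play. */
    static <T> T random(List<T> prompts, Random rng) {
        return prompts.get(rng.nextInt(prompts.size()));
    }
}
```

Passing the random number generator as a parameter allows a seeded generator to be used when repeatable behavior is wanted, for example in tests.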