Natural Language Processing Requirements
for Voice Markup Languages

W3C Working Draft 23 December 1999

This version: http://www.w3.org/TR/1999/WD-voice-nlu-reqs-19991223
Latest version: http://www.w3.org/TR/voice-nlu-reqs
Editor: Deborah Dahl

Abstract

The W3C Voice Browser working group aims to develop specifications to enable access to the Web using spoken interaction. This document is part of a set of requirements studies for voice browsers, and provides details of the requirements for natural language processing.

Status of this document

This document describes the requirements for natural language processing for voice browsers, as a precursor to starting work on specifications. Related requirement drafts are linked from the introduction. The requirements are being released as working drafts but are not intended to become proposed recommendations.

This specification is a Working Draft of the Voice Browser working group for review by W3C members and other interested parties. This is the first public version of this document. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress".

Publication as a Working Draft does not imply endorsement by the W3C membership or by members of the Voice Browser Working Group.

This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group. This document is for public review. Comments should be sent to the public mailing list <www-voice@w3.org> (archive) by 14th January 2000.

A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.

Introduction

The main goal of the Natural Language Requirements Subgroup is to establish a prioritized list of requirements for natural language processing in a voice browser environment.

The process will consist of the following steps:

Collect requirements on natural language processing.
Prioritize these requirements.
Distribute requirements to, and take feedback from, relevant groups working on natural language in spoken dialog systems.
Define specifications for natural language processing components, based on the feedback received.

0.1 Scope

This document specifies requirements that define the capabilities of any component of a voice browser system that performs natural language interpretation, that is, the task of determining and representing the content of a natural language input from a user. Interpretation components include both stand-alone natural language understanding (NLU) components, which receive text string results from a speech recognizer or keyboard, and speech recognizers that incorporate natural language understanding functionality by returning interpretations rather than, or in addition to, text strings.

0.2 Interaction with Other Groups

The activities of the Natural Language Requirements Subgroup will be coordinated with the activities of the Grammar Representation Subgroup, the Synthesis Markup Subgroup, and the Dialog Subgroup.

General Requirements

The NLU system should be able to:

Return a message stating that it cannot interpret an input at all. (must specify)
Return multiple pieces of information from multi-functional utterances. (must specify)
Return partial information if it is unable to completely process an input. (must specify)
If partial information is returned, indicate how much of the input was left unanalyzed. (nice to specify)
Be extensible in the sense that it should be possible to add new types of utterances to the NLU specification. Specifically, the system should be able to incorporate modular subdialogs. (must specify)
Return a score reflecting its confidence in the overall interpretation. The exact format of confidence scores remains to be determined. It could be a rough scale or it could be probabilities, for example. (should specify)
Return a score for each attribute, reflecting its confidence in the interpretation of that attribute. (should specify)
Return multiple analyses (n-best). (nice to specify)
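
As an illustration only, the general requirements above could be met by a result structure along the following lines. This is a minimal sketch, not a proposed format: the class and field names (Status, Attribute, Interpretation, NluResult) are hypothetical, and the 0.0 to 1.0 confidence scale is an assumption, since the format of confidence scores remains to be determined.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    COMPLETE = "complete"            # input fully interpreted
    PARTIAL = "partial"              # only part of the input was analyzed
    NO_INTERPRETATION = "none"       # input could not be interpreted at all


@dataclass
class Attribute:
    value: str
    confidence: float                # assumed 0.0-1.0 scale (not fixed by the draft)


@dataclass
class Interpretation:
    status: Status
    confidence: float = 0.0          # confidence in the overall interpretation
    attributes: Dict[str, Attribute] = field(default_factory=dict)
    unanalyzed_words: List[str] = field(default_factory=list)


@dataclass
class NluResult:
    nbest: List[Interpretation]      # alternative analyses, best first

    def best(self) -> Optional[Interpretation]:
        return self.nbest[0] if self.nbest else None


# A multi-functional utterance: a confirmation plus a new request.
result = NluResult(nbest=[
    Interpretation(
        status=Status.PARTIAL,
        confidence=0.62,
        attributes={
            "Confirmation": Attribute("yes", 0.95),
            "Action": Attribute("transfer", 0.80),
        },
        unanalyzed_words=["um", "please"],
    ),
    Interpretation(status=Status.NO_INTERPRETATION),
])

top = result.best()
print(top.status.value, top.confidence, sorted(top.attributes))
```

Keeping the unanalyzed words in the result lets a dialog component judge how much of the input was left unanalyzed.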

Input Requirements

A standalone (i.e., not integrated with a speech recognizer) NLU system should be able to:

Accept N-best ASR output with or without acoustic scores for the whole utterance, for each word in the utterance, or both. (must specify)
Accept ASR output with out-of-vocabulary markers. (should specify)
Accept an ASCII text representation of an utterance. (should specify)
Accept a word lattice as input. (must specify)
Accept prosodic notations on the ASR output. (nice to specify)
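
To make the relationship between a word lattice and N-best output concrete, the sketch below stores a small lattice as a directed acyclic graph and enumerates its paths. The representation, the log-score values, and the "<oov>" marker are all assumptions made for illustration. Exhaustive enumeration is only practical for very small lattices.

```python
from typing import Dict, Iterator, List, Tuple

OOV = "<oov>"  # hypothetical out-of-vocabulary marker

# Lattice as a DAG: node -> [(next_node, word, log_score), ...]
Lattice = Dict[int, List[Tuple[int, str, float]]]


def paths(lattice: Lattice, start: int, end: int) -> Iterator[Tuple[List[str], float]]:
    """Yield (words, summed log score) for every path from start to end."""
    if start == end:
        yield [], 0.0
        return
    for nxt, word, score in lattice.get(start, []):
        for words, total in paths(lattice, nxt, end):
            yield [word] + words, score + total


def n_best(lattice: Lattice, start: int, end: int, n: int) -> List[Tuple[List[str], float]]:
    """Return the n highest-scoring word sequences, best first."""
    return sorted(paths(lattice, start, end), key=lambda p: p[1], reverse=True)[:n]


# Illustrative lattice with two points of ambiguity.
lattice: Lattice = {
    0: [(1, "I", -0.1)],
    1: [(2, "want", -0.2), (2, "won't", -1.5)],
    2: [(3, "five", -0.3), (3, OOV, -2.0)],
    3: [(4, "lines", -0.1)],
}

hypotheses = n_best(lattice, 0, 4, 2)
for words, score in hypotheses:
    print(" ".join(words), round(score, 2))
```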

Any NLU system should be able to:

Dynamically switch to a different task model. (must specify)
Dynamically modify the task model, e.g. add or remove possible choices from a slot. (must specify)
Accept sequential multi-modal input. (must specify)
Accept uncoordinated simultaneous multi-modal input. (should specify)
Accept coordinated simultaneous multi-modal input. For example, the NLU system should be able to represent or interpret a representation of the context so that anaphoric expressions in the user's utterances which refer to items in the context can be interpreted. The context can include the speech context, including the system's utterances, as well as the external context (e.g. I'll take one of these (click)). (nice to specify)
Return ASR timing information to the dialog controller. (nice to specify)
Accept multi-modal time-stamped information. (should specify)
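
One possible use of time-stamped multi-modal input is to resolve a deictic word against a pointer event that occurs close to it in time. The following is a rough sketch under stated assumptions: the Event structure, the modality names, and the one-second window are hypothetical and are not part of these requirements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    modality: str   # "speech" or "pointer" (illustrative labels)
    start: float    # seconds from the start of the turn
    end: float
    content: str    # a word, or an identifier for the clicked item


def resolve_deictic(word: Event, events: List[Event], window: float = 1.0) -> Optional[str]:
    """Return the pointer target closest in time to a deictic word, if any."""
    candidates = [
        e for e in events
        if e.modality == "pointer" and abs(e.start - word.start) <= window
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda e: abs(e.start - word.start)).content


# "I'll take one of these (click)"
events = [
    Event("speech", 0.0, 0.3, "I'll"),
    Event("speech", 0.3, 0.6, "take"),
    Event("speech", 0.6, 0.8, "one"),
    Event("speech", 0.8, 0.9, "of"),
    Event("speech", 0.9, 1.2, "these"),
    Event("pointer", 1.1, 1.1, "item_42"),
    Event("pointer", 5.0, 5.0, "item_7"),
]

target = resolve_deictic(events[4], events)
print(target)
```

A fixed time window is the simplest possible policy; a real system would likely need something more tolerant of recognizer latency.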

Task-specific information

These requirements are intended to ensure that the natural language component is capable of representing the results of processing task-specific utterances.

An NLU system should be able to:

Represent task information:

Represent values for slots in a task model: I want five lines. (must specify)
Support hierarchical attributes in task model: e.g. a slot can itself be a frame. (must specify)
Represent interpretations of sentences with anaphora and ellipsis. I want two hamburgers, one with ketchup and one without. (must specify)
Represent deictic utterances, which require reference to the non-linguistic context for their interpretation. I want this. (must specify)
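
Hierarchical attributes and resolved ellipsis can be pictured as nested frames, where a slot value is either a leaf or another frame. The sketch below is one hypothetical encoding of the hamburger example; the slot names (Item_1, Food, Ketchup) are invented for illustration.

```python
from typing import Any, Dict, Iterator, Tuple

Frame = Dict[str, Any]  # a slot value is either a leaf or another frame


def flatten(frame: Frame, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted.path, leaf_value) pairs for a nested frame."""
    for slot, value in frame.items():
        path = f"{prefix}.{slot}" if prefix else slot
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


# "I want two hamburgers, one with ketchup and one without."
# The elided material ("one [hamburger]", "without [ketchup]") is made explicit.
order: Frame = {
    "Item_1": {"Food": "hamburger", "Ketchup": True},
    "Item_2": {"Food": "hamburger", "Ketchup": False},
}

for path, value in flatten(order):
    print(path, "=", value)
```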

Represent meta-task information (all nice to specify):

Represent a request for a definition: What does 'access code' mean?
Represent a request for the status of a filled slot: How many lines did I ask for?
Represent information about a slot: How many lines am I allowed to ask for?
Represent a request for the possible fillers of a slot: What are my choices? Can I schedule a call on Sunday?
Represent a request for the status of all slots. What have I ordered so far?
Represent questions about possible, desirable, necessary and conditional situations. Can you pay my electric bill? Should I order the chicken? Do I have to get a drink with the special? If I stay over Saturday night will I get a lower fare?
Represent requests for explanation of a system response. Why?
Represent a request for the amount of task remaining. What else do you need to know? Am I almost finished?
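
Several of the meta-task requests above could share a single representation that names the kind of request and, where relevant, the slot and value concerned. The sketch below is an assumption-laden illustration: the MetaTaskQuery structure, its kind labels, and the CHOICES table are hypothetical. The answer function is included only to show that such a representation carries enough information for another component to respond.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class MetaTaskQuery:
    kind: str                    # "slot_status", "all_status", "fillers", "remaining"
    slot: Optional[str] = None
    value: Optional[str] = None


# Hypothetical task model: slot name -> permitted fillers.
CHOICES: Dict[str, List[str]] = {
    "Lines": ["1", "2", "3", "4", "5"],
    "Call_day": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
}


def answer(query: MetaTaskQuery, filled: Dict[str, str]) -> Any:
    """Resolve a meta-task query against the slots filled so far."""
    if query.kind == "slot_status":
        return filled.get(query.slot)
    if query.kind == "all_status":
        return dict(filled)
    if query.kind == "fillers":
        choices = CHOICES[query.slot]
        return choices if query.value is None else query.value in choices
    if query.kind == "remaining":
        return [slot for slot in CHOICES if slot not in filled]
    raise ValueError(f"unknown query kind: {query.kind}")


filled = {"Lines": "5"}

# "How many lines did I ask for?"
print(answer(MetaTaskQuery("slot_status", "Lines"), filled))
# "Can I schedule a call on Sunday?"
print(answer(MetaTaskQuery("fillers", "Call_day", "Sunday"), filled))
# "What else do you need to know?"
print(answer(MetaTaskQuery("remaining"), filled))
```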

Generic Information about the Communication Process

An NLU system should be able to represent meta-dialog information having to do with the communication process. (all nice to specify except as noted)

Represent utterances about the dialog: I want to revisit my previous answer. That's what I just said.
Represent a request for help. (must specify)
Represent a request to have the last prompt repeated.
Represent requests to make the output louder, quieter, faster, or slower, or to change languages.
Represent requests to pause the dialog.
Represent utterances indicating that the user:
- Didn't hear the prompt. (must have)
- Didn't understand the prompt. (must have)
- Refuses to supply the answer to a prompt.
- Doesn't know the answer to the prompt.

Dialog Control

The NLU system should be able to represent (all must specify except as noted):

A confirmation.
A request to change the value of a slot due to a speech recognition error, a user error, or the user changing his/her mind: I meant p.m.
A request to empty a slot that's already been filled (forget X).
A request to stop or end the dialog.
A request to start over.
A request to transfer to an operator.
A request to suspend and resume a dialog.
An explicit request to switch to another task.
An implicit request to switch to another task. (nice to specify)
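
The slot-changing requests in this list (correcting a value, emptying a slot, starting over) can be sketched as simple operations on a frame. The DialogAct structure and its kind labels below are hypothetical. Acts such as confirmation or a request to stop leave the frame untouched and would be handled elsewhere.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DialogAct:
    kind: str                    # "confirm", "correct", "clear", "start_over", "stop", ...
    slot: Optional[str] = None
    value: Optional[str] = None


def apply(act: DialogAct, frame: Dict[str, str]) -> Dict[str, str]:
    """Return a new frame reflecting the act; the input frame is not mutated."""
    updated = dict(frame)
    if act.kind == "correct":
        updated[act.slot] = act.value
    elif act.kind == "clear":
        updated.pop(act.slot, None)
    elif act.kind == "start_over":
        updated.clear()
    return updated


frame = {"Day": "Monday", "Time": "9 a.m."}

# "I meant p.m."
frame = apply(DialogAct("correct", "Time", "9 p.m."), frame)
print(frame)

# "Forget the day."
frame = apply(DialogAct("clear", "Day"), frame)
print(frame)
```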

Appendix: Sample Dialog with Examples of a Possible Approach to NLU Representation

This is an example of a banking application in which each user utterance is annotated with one possible NLU representation, based on the following task model.

Task Model/Frame:

      Identification:
            Name:
            Address:
                  Street:
                  City:
                  Zip Code:
            Phone:

      Action:
            Transfer:
                  Source_account:
                  Destination_account:
                  Amount:
                        Value:
                        Currency:

      Balance:
            Account:
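
For illustration, the task model can be written as a nested structure and used to validate slot assignments. The nesting shown here is an assumption (Street, City and Zip Code under Address; the transfer parameters under Transfer; Value and Currency under Amount), and the helper names leaf_paths and fill are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Assumed nesting of the banking task model; None marks a leaf slot.
TASK_MODEL: Dict[str, Any] = {
    "Identification": {
        "Name": None,
        "Address": {"Street": None, "City": None, "Zip Code": None},
        "Phone": None,
    },
    "Action": {
        "Transfer": {
            "Source_account": None,
            "Destination_account": None,
            "Amount": {"Value": None, "Currency": None},
        },
    },
    "Balance": {"Account": None},
}


def leaf_paths(model: Dict[str, Any], prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """List the path to every leaf slot in the model."""
    found: List[Tuple[str, ...]] = []
    for slot, sub in model.items():
        here = prefix + (slot,)
        if isinstance(sub, dict):
            found.extend(leaf_paths(sub, here))
        else:
            found.append(here)
    return found


def fill(frame: Dict[str, Any], path: Tuple[str, ...], value: Any) -> None:
    """Set a leaf slot in the frame, rejecting paths the model does not define."""
    if tuple(path) not in leaf_paths(TASK_MODEL):
        raise KeyError("unknown slot: " + ".".join(path))
    node = frame
    for slot in path[:-1]:
        node = node.setdefault(slot, {})
    node[path[-1]] = value


# "Transfer $200 from my savings account to my checking account."
frame: Dict[str, Any] = {}
fill(frame, ("Action", "Transfer", "Source_account"), "savings")
fill(frame, ("Action", "Transfer", "Destination_account"), "checking")
fill(frame, ("Action", "Transfer", "Amount", "Value"), 200)
fill(frame, ("Action", "Transfer", "Amount", "Currency"), "USD")
print(frame)
```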

Dialog	Dialog Comments	NLU output
1. System: Welcome to Ajax Bank. Please tell me your name.
2. User: I'm Jack Jones		Name: Jack Jones
3. System: I know three people with the name Jack Jones. Will you tell me your address, please?
4. User: I live at 18773 Union Street in Carlton		Address: Street: 18773 Union St City: Carlton
5. System: Then your phone number is 773-8427?
6. User: Yep, that's me.		Yes
7. System: What would you like me to do?
8. User: Transfer $200 from my savings account to my checking account.		Action: transfer Source_account: savings Destination_account: checking Amount: Value: 200 Currency: USD
9. System: You have two savings accounts, your individual savings account and a joint savings account with June Jones. From which savings account should I transfer the $200?
10. User: From my individual account, not the joint account	User clarifies by answering the question twice (once by identifying the correct account, and again by identifying the incorrect account)	Correction: Source_account: individual
11. System: I'm sorry, I can't do that. You have only $150 in your savings account.	System enforces a business rule that says a user can't transfer more money than is in his/her account.
12. User: OK, just transfer the $150.	User modifies the request by editing the amount to be transferred.	Correction: Amount: Value: 150 Currency: USD
13. System: OK, I have transferred $150 from your individual savings account to your checking account. Is there anything else I can do for you?	System confirms that the transaction is completed and invites the user to begin another transaction.
14. User: Yes, tell me how much money is in our joint savings account	User asks a new question	balance: joint account
15. System: You have $600 in your joint savings account with June Jones. What else can I do for you?	System correctly identifies the source because of the phrase "our joint" savings account. The system prompt message changes so that the user does not get tired of hearing the same prompt message.
16. User: Yes, transfer $50 from that account.		Action: transfer Source_account: savings Amount: Value: 50 Currency: USD
17. System: Do you want me to transfer $50 from your joint account with June Jones to your checking account?
18. User: No, to my individual savings account.		Destination_account: individual savings
19. System: Do you want me to transfer $50 from your joint savings account with June Jones to your individual savings account?	System assumes that the source is still joint savings account and that the amount is still $50
20. User: Yes		Yes
21. System: I've transferred $50 from your joint savings account with June Jones to your individual savings account.	System confirms transaction is completed.
22. System: Is there anything else I can do for you?	System prompts the user for a new transaction
23. User: Yes, can you pay my electric bill?	User isn't actually asking to have the bill paid, just whether the system can do it.	Capability_query: Action: transfer Destination_account: electric bill
24. System: Yes, I can transfer money to General Utilities
25. User: OK, pay it		Action: transfer Destination_account: electric bill
26. System: How much money should I transfer to General Utilities?	System solicits a value for the amount parameter from the user
27. User: transfer $80 from my joint checking account	User responds with the value for the requested parameter, and volunteers a value for another parameter, the source.	Action: transfer Source_account: joint checking Amount: Value: 80 Currency: USD
28. System: You want to transfer $80 from your joint checking account to General Utilities?	System asks user to confirm entire transaction
29. User: Make it so	The user watches Captain Picard on Star Trek and copies his command phrasing.	Yes
30. System: Do you want me to do anything else?	System prompts user for another transaction
31. User: Yes, transfer $800 from the IRS to my savings account	User tries to do the impossible.	Action: transfer Source_account: IRS Destination_account: savings Amount: Value: 800 Currency: USD
32. System: I am not allowed to transfer money from the IRS account	System enforces business rules about source accounts.
33. User: What can I do to the IRS?	While the user may be joking, the system treats this as a help request	Capability_query: action actor: user theme: IRS. If the request is not representable within the task model, it may be useful for the system to fall back to a less fully analyzed representation of the utterance, rather than simply to fail to understand it.
34. System: You may pay your taxes by transferring money to the IRS	System explains what operations the user can perform with the IRS account
35. User: I don't want to do that.	User cancels a transaction in progress	Cancel
36. System: OK. What else can I do for you?
37. User: You're cute, can I call you?	User tries to make fun of the system	did not understand
38. System: I'm sorry, I didn't understand your request. Can you please rephrase it?
39. User: No. Goodbye		Goodbye
40. System: Thanks for using Ajax Bank. Goodbye.	User is logged off.

Acknowledgments

Subgroup Members

Mike Brown, Lucent
Carolina Di Cristo, Telecom Italia
Deborah Dahl, Unisys
Linda Dorrian, Productivity Works
Robert Keiller, Canon
Bill Ledingham, SpeechWorks
Stephen Potter, Entropic
Dave Raggett, HP and W3C
Ramesh Sarukkai, Lernout and Hauspie
Volker Steinbliss, Philips
