ECMAScript Action Tags for JSGF

Proposal for discussion
2 September 1999

Bruce Lucas (IBM)
Will Walker (Sun Microsystems)
Andrew Hunt (Sun Microsystems)

Abstract

This document describes a proposed mechanism that allows grammars written in the Java™ Speech API Grammar Format (JSGF) to use the JSGF tagging mechanism together with ECMAScript (the standardized version of Netscape's JavaScript) to specify a transformation from an utterance to information that is meaningful to the application. The information is returned in the form of ECMAScript values, such as strings and sets of attribute-value pairs (ECMAScript objects).

By embedding semantic interpretation into the syntactic definition of grammars, this proposal is intended to address the following technical challenges in developing and using speech recognition applications.

Simplify the paraphrase problem: When a grammar allows many forms of an utterance to have the same meaning to the application, the action tags provided to the application should be the same.
Internationalization: Whenever possible, action tags can be language-neutral so that the application is less sensitive to the spoken language and so that new grammars can be developed for new languages with minimal changes to the application.
Enhance documentation and maintainability: Because syntax and semantics are jointly defined, modifications can be made simultaneously and documentation can be co-located.

Introduction

"Rule-based" or "phrase-structure" grammars in general, and JSGF in particular, by themselves only allow an application developer to specify the legal utterances (sequences of words) that the user may say. However, the sequence of words is typically not very useful to an application by itself. Consider the following examples:

Utterance	Application needs
five thousand three hundred and six	5306
December 24th, 1998 / the day before Christmas, last year	"1998/12/24" or {year:1998, month:12, day:24}
I want to fly from Boston to Chicago / Hikoki-de, Boston-kara, Chicago-made ikitai (Japanese: "By plane, from Boston, to Chicago, I want to go")	{action:"fly", from:"Boston", to:"Chicago"}

Note that this table illustrates two kinds of values that are useful to applications: simple values such as numbers or strings, and sets of attribute-value pairs. Simple values are useful, for example, in grammars for basic types such as numbers, dates, and times. Simple values are also useful in simple command & control applications and in directed-dialog applications, in which the user is asked a question and is then expected to supply a single piece of information. Sets of attribute-value pairs are useful in more complex command & control applications and in more sophisticated dialog applications, in which any utterance may simultaneously provide several pieces of information to the application.

The remainder of this document discusses a proposed method for embedding ECMAScript in JSGF tags that transforms utterances into information meaningful to an application. ECMAScript is a relatively powerful and flexible object-oriented programming language. For example, it provides the means to construct arrays, fill multiple optional slots, and build arbitrarily nested objects. It can also perform simple and complex numerical operations and manipulate dates, strings, and other standard object types. The ECMAScript specification is also thorough, which helps minimize differences in behavior between implementations. Finally, because ECMAScript is increasingly used in web page development, many developers will already know it. This should greatly shorten the learning curve for writing JSGF grammars with ECMAScript action tags.

Parse trees

The ECMAScript action tag mechanism will be described with reference to the parse tree corresponding to an utterance. A grammar together with an utterance defines a parse tree (assuming the parse is unambiguous). A parse tree can be viewed as a reduced version of the grammar. It preserves only the non-terminals, terminals, tags, and sequences from the original grammar that correspond to the content of the utterance, and it contains a separate copy of each non-terminal for each use of that non-terminal. For the purposes of this document, parse trees are represented in outline form. For example, consider the following grammar:

    <city> = New York {this.$value="NYC"} | Boston {this.$value="BOS"};

    public <top> = I want to (fly {this.action="fly"} | drive {this.action="drive"})
                   from <city> {this.from=$}
                   to   <city> {this.to=$city};

The utterance "I want to fly from New York to Boston", when parsed against this grammar, produces the following parse tree:

    <top>
        I
        want
        to
        fly
        {this.action="fly"}
        from
        <city>
            New
            York
            {this.$value="NYC"}
        {this.from=$}
        to
        <city>
            Boston
            {this.$value="BOS"}
        {this.to=$city}

As we will see below, when evaluated this parse tree will produce the ECMAScript value {action:"fly", from:"NYC", to:"BOS"} for the application to use.

In these parse trees, all JSGF structure except non-terminal references is flattened. In particular, parenthesized expressions, optional items, repeated items, and tagged items are flattened to a single level in the tree.
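The flattening rule can be sketched in ECMAScript, the language the action tags themselves use. This proposal does not define a data structure for parse trees. The node shapes below (`kind`, `text`, `code`, `name`, `items`) and the function `flatten` are hypothetical, chosen only to show that groups disappear while non-terminals keep their own level.

```javascript
// Hypothetical node shapes (not defined by this proposal):
//   {kind: "word", text}         a terminal
//   {kind: "tag", code}          an action tag (ECMAScript source text)
//   {kind: "rule", name, items}  a non-terminal with its children
//   {kind: "group", items}       a parenthesized, optional, repeated, or tagged item
function flatten(items) {
  const out = [];
  for (const item of items) {
    if (item.kind === "group") {
      // Splice the group's contents into the current level.
      out.push(...flatten(item.items));
    } else if (item.kind === "rule") {
      // Non-terminals keep their own level; their children are flattened separately.
      out.push({ kind: "rule", name: item.name, items: flatten(item.items) });
    } else {
      out.push(item);
    }
  }
  return out;
}

// The fragment "to (fly {this.action="fly"}) from" of the <top> example:
const raw = [
  { kind: "word", text: "to" },
  { kind: "group", items: [
      { kind: "word", text: "fly" },
      { kind: "tag", code: 'this.action="fly"' },
  ] },
  { kind: "word", text: "from" },
];
const flat = flatten(raw);
console.log(flat.map((i) => i.text || i.code).join(" | "));
// prints: to | fly | this.action="fly" | from
```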

Parse tree evaluation

The purpose of parse tree evaluation is to recursively compute a value for each non-terminal in the tree. The value for each parent non-terminal is computed by the action tags contained in the parent non-terminal, possibly using the values computed for its child non-terminals in the tree.

In a well-written grammar, therefore, action tags should be defined so that each non-terminal returns a value that is a machine-readable transformation of the spoken tokens that match the non-terminal.

For each non-terminal in a tree, the action tag mechanism allocates a new object that represents the value of the non-terminal. The purpose of the action tags is to construct the value object of the non-terminal in which they are directly contained, by assigning values to fields of that object. Taken together, the action tags for a non-terminal act much like the body of a constructor for the object associated with the non-terminal:

References to fields of the non-terminal value object are qualified using this.
Variables local to the scope of the action tags for a given non-terminal must be declared using var.
References to variables not declared in the local scope of the action tags for a given non-terminal are interpreted in the nearest enclosing scope that declares the variable (specifically in the scope of a parent non-terminal), or in the global scope if no enclosing scope declares the variable.
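The scope rules above can be modelled with an explicit chain of scopes, one per non-terminal instance, linked along the parse tree. The `Scope` class and its methods are hypothetical and for illustration only; a real implementation might instead rely on the ECMAScript interpreter's own scope chain. The rule names in the usage lines are borrowed from the calendar example later in this document, with a plain string standing in for a date value object.

```javascript
// Hypothetical model: one Scope per non-terminal instance, linked to the
// scope of its parent non-terminal; the root links to the global scope.
class Scope {
  constructor(parent) {
    this.parent = parent;   // null for the global scope
    this.vars = new Map();  // variables declared here with `var`
  }
  declare(name) {
    this.vars.set(name, undefined);
  }
  // Nearest scope that declares `name`, or the global scope if none does.
  owner(name) {
    let s = this;
    while (s.parent !== null && !s.vars.has(name)) s = s.parent;
    return s;
  }
  // Simplification: reading an undeclared name returns undefined here,
  // whereas ECMAScript itself would throw a ReferenceError.
  get(name) {
    return this.owner(name).vars.get(name);
  }
  set(name, value) {
    this.owner(name).vars.set(name, value);
  }
}

const globalScope = new Scope(null);
const appointment = new Scope(globalScope); // public <appointment>
appointment.declare("date");                // <NULL> {var date, time}
appointment.declare("time");
const appt = new Scope(appointment);        // <appt>
const ondate = new Scope(appt);             // <ondate>

ondate.set("date", "1/3");                  // {date=$} inside <ondate>
ondate.set("other", 42);                    // declared nowhere: goes global
console.log(appointment.get("date"));       // prints: 1/3
console.log(globalScope.vars.has("date"));  // prints: false
console.log(globalScope.vars.get("other")); // prints: 42
```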

The object constructed by the action tags for a child non-terminal may then be used in the enclosing parent non-terminal to construct its value object by referring to the child non-terminal in one of two ways:

The variable $name refers to the value object of the most recent preceding instance of the non-terminal <name> in the current scope, as in this.to=$city in the example above.
The variable $ refers to the value object of the most recent preceding non-terminal in the current scope, as in this.from=$ in the example above.
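The `$` and `$name` bindings can be maintained by updating a binding object each time a child non-terminal finishes evaluating. In this sketch, `makeRefs` and `record` are hypothetical names, and small literal objects stand in for the value objects. The example replays the two `<city>` children of `<top>`, showing that each tag sees only the bindings recorded before it.

```javascript
// Hypothetical helper: the bindings seen by one non-terminal's action tags.
function makeRefs() {
  const refs = {};
  // Call after each child non-terminal of the current rule is evaluated.
  function record(name, valueObject) {
    refs.$ = valueObject;             // most recent non-terminal of any name
    refs["$" + name] = valueObject;   // most recent instance of <name>
  }
  return { refs, record };
}

// Replaying <top> for "... from New York to Boston":
const { refs, record } = makeRefs();
record("city", { $value: "NYC" });    // first <city>
const from = refs.$;                  // {this.from=$}
record("city", { $value: "BOS" });    // second <city>
const to = refs.$city;                // {this.to=$city}
console.log(from.$value, to.$value);  // prints: NYC BOS
```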

While the value of each non-terminal is an object, it is also useful in some cases for a non-terminal to be treated as a simple value such as a number or string. The standard ECMAScript toString and valueOf object methods make this possible. To provide a simple value for a non-terminal, its action tags may assign to a special field, this.$value, in the non-terminal's value object. The action tag mechanism supplies each non-terminal's value object with toString and valueOf methods that return this.$value. The ECMAScript interpreter automatically calls the toString and valueOf methods when the non-terminal value is referenced in a context where a simple value such as a number or string is required.

The default value of this.$value (and therefore of the non-terminal value object when it is used in a context where a simple value is needed) is a string formed by concatenating the string values of all the items in the non-terminal, separated by spaces.
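The following sketch shows one way an implementation might supply `toString`, `valueOf`, and the default `$value`. It assumes the hypothetical names `valueProto` and `makeValueObject`. Placing the methods on a shared prototype keeps them out of the object's own fields.

The last lines show a consequence worth knowing. When two value objects are combined with `+`, ECMAScript calls `valueOf`. The result is therefore string concatenation when the `$value` fields are strings, and numeric addition when they are numbers.

```javascript
// Shared methods: both defer to $value, which action tags may overwrite.
const valueProto = {
  toString() { return this.$value; },
  valueOf() { return this.$value; },
};

// Hypothetical constructor: `itemValues` are the items directly below the
// non-terminal (words, or child value objects).
function makeValueObject(itemValues) {
  const obj = Object.create(valueProto);
  // Default $value: the items' string values separated by spaces.
  obj.$value = itemValues.map(String).join(" ");
  return obj;
}

const york = makeValueObject(["New", "York"]);
console.log(`${york}`);        // prints: New York  (uses toString)

const five = makeValueObject(["5"]);
const rest = makeValueObject(["306"]);
console.log(five + rest);      // prints: 5306  (valueOf gives strings)

five.$value = 5;               // as if a tag assigned numbers instead
rest.$value = 306;
console.log(five + rest);      // prints: 311  (valueOf gives numbers)
```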

In addition, the action tag mechanism computes a special field, this.$tokens, for each non-terminal object. It holds an array of strings containing the words (terminals) used by the non-terminal and by any non-terminals that it directly or indirectly references. In summary, each object has the following special fields:

The field $value may be assigned to by the action tags to supply a value that will be used when the non-terminal object is referred to in a context where a simple (non-object) value is required.
The field $tokens is an array containing the words for that non-terminal.
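The evaluation rules above can be drawn together in a compact evaluator. This is a sketch under stated assumptions, not a definitive implementation:

- The tree shapes and the names `evaluate` and `valueProto` are hypothetical.
- Tags are written as ECMAScript functions rather than as source text embedded in a grammar. Each function receives `$` and `$name` through a parameter `r`.
- `var` scoping across non-terminals is not modelled.

Applied to the `<top>` parse tree for "I want to fly from New York to Boston", it yields action "fly", from "NYC", and to "BOS", plus the utterance's words in `$tokens`.

```javascript
// Hypothetical parse-tree shapes:
//   "word"                     a terminal
//   {tag: function (r) {...}}  an action tag: `this` is the value object,
//                              r.$ and r.$name refer to child values
//   {rule: "name", items: []}  a non-terminal instance with its items
const valueProto = {
  toString() { return this.$value; },
  valueOf() { return this.$value; },
};

function evaluate(node) {
  const obj = Object.create(valueProto);
  const refs = {};    // $ and $name bindings visible to this rule's tags
  const items = [];   // item values, used for the default $value
  obj.$tokens = [];
  for (const item of node.items) {
    if (typeof item === "string") {          // terminal
      items.push(item);
      obj.$tokens.push(item);
    } else if (item.rule !== undefined) {    // non-terminal: evaluate it first
      const child = evaluate(item);
      refs.$ = child;
      refs["$" + item.rule] = child;
      items.push(child);
      obj.$tokens.push(...child.$tokens);
    } else {                                 // action tag
      item.tag.call(obj, refs);
    }
  }
  // Default $value: item values joined by spaces, unless a tag assigned one.
  if (!Object.prototype.hasOwnProperty.call(obj, "$value")) {
    obj.$value = items.map(String).join(" ");
  }
  return obj;
}

// The <top> parse tree shown earlier.
const city = (words, code) => ({
  rule: "city",
  items: [...words, { tag: function () { this.$value = code; } }],
});
const tree = {
  rule: "top",
  items: [
    "I", "want", "to", "fly",
    { tag: function () { this.action = "fly"; } },
    "from", city(["New", "York"], "NYC"),
    { tag: function (r) { this.from = r.$; } },
    "to", city(["Boston"], "BOS"),
    { tag: function (r) { this.to = r.$city; } },
  ],
};

const result = evaluate(tree);
// from and to hold <city> value objects; template literals use their toString.
console.log(result.action, `${result.from}`, `${result.to}`); // prints: fly NYC BOS
console.log(result.$tokens.join(" "));
// prints: I want to fly from New York to Boston
```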

Examples

Hello World

The following grammar:

    <hi> = yo {this.$value="hello"} | hello;
    <who> = world | fred;
    public <helloworld> = <hi> <who> {this.greeting=$hi; this.recipient=$who};

when used to parse the utterance

    yo world

produces the following parse tree:

    <helloworld>
        <hi>
            yo
            {this.$value="hello"}
        <who>
            world
        {this.greeting=$hi; this.recipient=$who}

which when evaluated produces the ECMAScript value

    {greeting:"hello", recipient:"world"}

This illustrates three points concerning action tags:

The default value for the <who> non-terminal is in this case just the word in the utterance that matches the non-terminal. (More generally it is the ordered concatenation of the values of all the items immediately below the non-terminal in the parse tree, separated by spaces.)
The <hi> non-terminal uses an action tag to override the default value to return "hello" if the utterance contains "yo".
The final action tag retrieves the value of the <hi> non-terminal ($hi) and assigns it to the "greeting" field of the result (this.greeting), and retrieves the value of the <who> non-terminal ($who) and assigns it to the "recipient" field of the result (this.recipient).
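One consequence of the rules above deserves a sketch. `this.greeting=$hi` stores the `<hi>` value object itself, not a string. The value `{greeting:"hello", recipient:"world"}` above should therefore be read as showing the fields' simple values. The fields behave as "hello" and "world" wherever a string is expected, but not under strict equality. The helper names `proto` and `valueObject` are hypothetical.

```javascript
// Value objects whose toString and valueOf return $value, as described above.
const proto = {
  toString() { return this.$value; },
  valueOf() { return this.$value; },
};
function valueObject(v) {   // hypothetical helper
  const o = Object.create(proto);
  o.$value = v;
  return o;
}

const $hi = valueObject("hello");   // <hi>: yo {this.$value="hello"}
const $who = valueObject("world");  // <who>: default value, the word "world"

// <helloworld>: {this.greeting=$hi; this.recipient=$who}
const helloworld = {};
helloworld.greeting = $hi;
helloworld.recipient = $who;

console.log(typeof helloworld.greeting);                        // prints: object
console.log(`${helloworld.greeting} ${helloworld.recipient}`);  // prints: hello world
console.log(helloworld.greeting == "hello");                    // prints: true
console.log(helloworld.greeting === "hello");                   // prints: false
```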

Numbers

The following simple number grammar accepts spoken number phrases less than one million and returns a string containing the number in numeric form. (A portion of the <10to99> rule has been omitted from the version shown here for the sake of brevity.)

    <1to9> = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ;

    <0to9> = oh {this.$value="0"} | 0 | <1to9>;

    <10to99> = 10 | 11 | 12 | ... | 99 ;

    <1to99> = <1to9> | <10to99>;

    <00to99> = [oh] <0to9> {this.$value="0"+$0to9} | <10to99>;

    <1to999>
        = <1to9> [hundred [and]] <00to99> {this.$value = $1to9 + $00to99}
        | <1to9> hundred                  {this.$value = $1to9 + "00"}
        | <1to99>                         {this.$value = $1to99}
        ;

    <000to999>
        = <0to9> [hundred [and]] <00to99> {this.$value = $0to9 + $00to99}
        | <0to9> hundred                  {this.$value = $0to9 + "00"}
        | <00to99>                        {this.$value = "0" + $00to99}
        ;

    <1to999999>
        = <1to999> thousand <000to999>    {this.$value = $1to999 + $000to999}
        | <1to999> thousand and <00to99>  {this.$value = $1to999 + "0" + $00to99}
        | <1to999> thousand               {this.$value = $1to999 + "000"}
        | <1to99> hundred [and] <00to99>  {this.$value = $1to99 + $00to99}
        | <1to99> <10to99>                {this.$value = $1to99 + $10to99}
        | <1to99> hundred                 {this.$value = $1to99 + "00"}
        | <1to99>                         {this.$value = $1to99}
        ;

    public <number> = oh {this.$value="0"} | 0 | <1to999999>;

This grammar illustrates:

Reference to the value of a non-terminal <name> using $name.
Construction of a string value using the ECMAScript + string concatenation operator.
Supplying a string value for a non-terminal by assigning a string to this.$value.
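The tags can be traced by hand for "five thousand three hundred and six". The grammar's terminals spell digits as "1" through "9", so that utterance arrives as the tokens "5 thousand 3 hundred and 6".

The trace assumes these matches:

- `<1to999999>` matches through its first alternative.
- "5" matches `<1to999>` through `<1to99>`.
- "3 hundred and 6" matches the first alternative of `<000to999>`.

Each constant below holds the `$value` of one non-terminal instance. Plain strings stand in for the value objects; because `valueOf` returns the same strings, `+` behaves identically.

```javascript
// Hand-evaluation of the number grammar's tags for "5 thousand 3 hundred and 6".
const v1to9_5 = "5";                     // <1to9> matches 5
const v1to99 = v1to9_5;                  // <1to99>: no tags, default value
const v1to999 = v1to99;                  // <1to999>: {this.$value = $1to99}

const v1to9_3 = "3";                     // <1to9> matches 3
const v0to9_3 = v1to9_3;                 // <0to9>: default value
const v1to9_6 = "6";                     // <1to9> matches 6
const v0to9_6 = v1to9_6;                 // <0to9>: default value
const v00to99 = "0" + v0to9_6;           // <00to99>: {this.$value="0"+$0to9}
const v000to999 = v0to9_3 + v00to99;     // <000to999>: $0to9 + $00to99

const v1to999999 = v1to999 + v000to999;  // <1to999999>: $1to999 + $000to999
const number = v1to999999;               // <number>: default value
console.log(v00to99, v000to999, number); // prints: 06 306 5306
```

The leading zero supplied by `<00to99>` lets plain concatenation place each digit correctly. Numeric addition of the same values would instead give 5 + 306 = 311.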

Calendar

The following grammar illustrates the use of action tags for a simple mixed-initiative form-filling dialog system for making appointments.

    <ondate> = [on] <test.date> {date=$};
    <attime> = [at] <test.time> {time=$};
    <gorp>   = [(I'd|I) (like|want) to] (make|schedule) (an appointment|a meeting);

    <appt> = <gorp> [<ondate> [<attime>]]
           | <gorp> <attime> [<ondate>]
           | <ondate> [<gorp> [<attime>]]
           | <ondate> <attime> [<gorp>]
           | <attime> [<gorp> [<ondate>]]
           | <attime> <ondate> [<gorp>]
           ;

    public <appointment> =
        <NULL> {var date, time} <appt> {this.date=date; this.time=time};

This grammar allows the user to take the initiative (by making a complete or partial request) or to respond when the computer takes the initiative (by prompting the user for missing information) as illustrated by the following table:

Utterance	Returned value
I'd like to make an appointment on January third at two o'clock	{date:"1/3", time:"2:00"}
schedule an appointment on the fourth of February	{date:"2/4"}
at five thirty	{time:"5:30"}

This grammar illustrates:

Declaring the variables date and time in the <appointment> scope. (The declarations are in a tag attached to <NULL> because this tag must be executed before tags in <appt> are executed.)
Using $ to refer to the value of the non-terminal immediately to the left of the tag.
Using the supplied values to fill in the this.date and this.time attributes in the value returned by <appointment>.
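A hand-evaluation of `<appointment>` shows how the returned values in the table arise. This minimal sketch rests on two assumptions:

- The `<test.date>` and `<test.time>` values are already simple strings like those in the table. Those rules are not shown in this document.
- The function name `evaluateAppointment` is hypothetical.

Ordinary ECMAScript closures stand in for the scope rule. Lexical scoping matches the parse-tree rule here only because the child tags are written inside the parent's function.

```javascript
// Hypothetical hand-evaluation of public <appointment>. The arguments stand in
// for the values of <test.date> and <test.time>, or are undefined when
// <ondate> or <attime> does not occur in the utterance.
function evaluateAppointment(dateValue, timeValue) {
  const self = {};        // the value object of <appointment>
  var date, time;         // <NULL> {var date, time}

  // Tags of the nested rules; as closures they assign to the variables
  // declared above, mirroring lookup in the nearest enclosing scope.
  const ondateTag = (v) => { date = v; };   // <ondate>: {date=$}
  const attimeTag = (v) => { time = v; };   // <attime>: {time=$}

  // <appt>: only the child tags the utterance actually reached are run.
  if (dateValue !== undefined) ondateTag(dateValue);
  if (timeValue !== undefined) attimeTag(timeValue);

  self.date = date;       // {this.date=date; this.time=time}
  self.time = time;
  return self;
}

const full = evaluateAppointment("1/3", "2:00");
const timeOnly = evaluateAppointment(undefined, "5:30");
console.log(JSON.stringify(full));     // prints: {"date":"1/3","time":"2:00"}
console.log("date" in timeOnly);       // prints: true
console.log(JSON.stringify(timeOnly)); // prints: {"time":"5:30"}
```

As written, the final tag always creates both fields. The entry {time:"5:30"} in the table therefore corresponds to an object whose date field exists but is undefined, which `JSON.stringify` omits.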

Airline reservation

The following grammar (the airline reservation grammar that was presented above):

    <city> = New York {this.$value="NYC"} | Boston {this.$value="BOS"};

    public <top> = I want to (fly {this.action="fly"} | drive {this.action="drive"})
                   from <city> {this.from=$}
                   to   <city> {this.to=$city};

illustrates a few points concerning action tags:

Action tags contained within parenthesized expressions.
Assignment to this.$value to return a simple string value for a rule.
Reference to a preceding non-terminal using $ and using $name.

Pizza toppings

The following grammar for ordering pizza:

    <topping> = mushrooms | pepperoni | onions | anchovies;
    <toppings> = <NULL>           {this.toppings = new Array()}
                 <topping>        {this.toppings=this.toppings.concat($topping)}
                 ([and] <topping> {this.toppings=this.toppings.concat($topping)})*;

    public <ask> = I would like a pizza with <toppings>
                   {this.item="pizza"; this.toppings=$toppings.toppings};
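The accumulation in `<toppings>` can be traced by hand for "I would like a pizza with mushrooms and onions". This sketch uses a hypothetical `valueObject` helper to stand in for the `<topping>` value objects. `Array.prototype.concat` appends a non-array argument as a single element, so each `$topping` value object lands in the array whole. Its `toString` then supplies the topping name wherever a string is needed.

```javascript
// Value objects whose toString and valueOf return $value, as described above.
const proto = {
  toString() { return this.$value; },
  valueOf() { return this.$value; },
};
function valueObject(v) {   // hypothetical helper
  const o = Object.create(proto);
  o.$value = v;
  return o;
}

// <toppings>
const toppings = {};
toppings.toppings = new Array();                                         // <NULL> tag
toppings.toppings = toppings.toppings.concat(valueObject("mushrooms")); // first <topping>
toppings.toppings = toppings.toppings.concat(valueObject("onions"));    // ([and] <topping>)*

// <ask>: {this.item="pizza"; this.toppings=$toppings.toppings}
const ask = {};
ask.item = "pizza";
ask.toppings = toppings.toppings;

console.log(ask.toppings.length);     // prints: 2
console.log(ask.toppings.join(", ")); // prints: mushrooms, onions
console.log(typeof ask.toppings[0]);  // prints: object
```

`concat` returns a new array rather than modifying the existing one, which is why each tag reassigns `this.toppings`. Using `this.toppings.push($topping)` would have the same effect here.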