Java front-end for web-based multimodal human-computer interaction

Xing Jing, Jie Yang, Minh Tue Vo, Alex Waibel
1997
Gesture. Our approach to pen-based gesture recognition decomposes pen strokes into sequences of basic shapes such as lines, arcs, arrows, circles, and crosses [5]. The same gesture shape may mean different things depending on the surrounding context, so each gesture component is augmented with gesture contexts indicating spatial relationships between the gesture and nearby objects in the user interface.
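To make the decomposition concrete, the sketch below pairs a recognized basic shape with the identifiers of the interface objects it overlaps; the class, field, and method names are illustrative assumptions, not the paper's actual API.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

// Hypothetical basic shapes produced by stroke decomposition.
enum BasicShape { LINE, ARC, ARROW, CIRCLE, CROSS }

/** One decomposed stroke plus the nearby interface objects forming its context. */
class GestureComponent {
    final BasicShape shape;
    final Rectangle bounds;                                   // bounding box of the stroke
    final List<String> nearbyObjectIds = new ArrayList<>();   // spatial gesture context

    GestureComponent(BasicShape shape, Rectangle bounds) {
        this.shape = shape;
        this.bounds = bounds;
    }

    /** Record an interface object whose bounds intersect this gesture. */
    void addContext(String objectId, Rectangle objectBounds) {
        if (bounds.intersects(objectBounds)) {
            nearbyObjectIds.add(objectId);
        }
    }
}
```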
Handwriting. Our MS-TDNN-based handwriting recognizer [6] processes writer-independent, continuous (cursive) handwriting at a recognition rate of 94% on a 20,000-word vocabulary and operates in run-on mode. We employ simple heuristics to decide when to invoke handwriting recognition on pen input, e.g., when the gesture recognizer cannot identify the input strokes as basic shapes.

Joint Interpretation. To make sense of input from all available sources, we need a multimodal interpreter capable of producing an interpretation of user intent (e.g., a command to execute in the application interface) from the output of the modality processors. In our joint interpretation scheme, the user intent is represented by a frame consisting of slots specifying pieces of information such as the action to carry out or the parameters for that action. Recognition output from the modality processors is parsed into partially filled frames that are merged to produce the combined interpretation as described in [5]. This technique leads to uniform handling of high-level information from all input sources, which is very important for modularity and extensibility. To add another input modality we need only provide a module that converts low-level recognizer output into a partially filled frame to be merged with the others. In addition, context information can be retained across input events by merging with previous interpretation frames.
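As a rough illustration of the slot-and-frame merge, the sketch below represents partial frames as slot maps and combines them with a keep-existing-slots policy; the slot names, merge policy, and example values are assumptions rather than the paper's exact representation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A partially filled interpretation frame: named slots mapped to values. */
class Frame {
    private final Map<String, String> slots = new LinkedHashMap<>();

    void fill(String slot, String value) {
        slots.put(slot, value);
    }

    boolean isFilled(String slot) {
        return slots.containsKey(slot);
    }

    /** Merge another partial frame into this one, keeping slots already filled here. */
    void merge(Frame other) {
        other.slots.forEach(slots::putIfAbsent);
    }

    @Override
    public String toString() {
        return slots.toString();
    }

    public static void main(String[] args) {
        Frame speech = new Frame();           // partial frame from the speech recognizer
        speech.fill("action", "delete");

        Frame gesture = new Frame();          // partial frame from the gesture recognizer
        gesture.fill("target", "object-42");  // object circled by the user

        speech.merge(gesture);                // combined interpretation of user intent
        System.out.println(speech);           // {action=delete, target=object-42}
    }
}
```

Because every modality reduces to the same frame representation, adding a new input source only requires a converter from its recognizer output to a partial frame, and retained frames from earlier turns can be merged in the same way to carry context forward.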
doi:10.5445/ir/229197