Multimodal interaction

Multimodal interaction provides the user with multiple modes of interfacing with a system. A multimodal interface provides several distinct tools for input and output of data.

Multimodal input

Two major groups of multimodal interfaces have emerged, one concerned with alternate input methods and the other with combined input and output. The first group of interfaces combines various user input modes beyond the traditional keyboard and mouse, such as speech, pen, touch, manual gestures, gaze, and head and body movements. The most common such interface combines a visual modality (e.g. a display, keyboard, and mouse) with a voice modality (speech recognition for input, speech synthesis and recorded audio for output). However, other modalities, such as pen-based input or haptic input/output, may be used. Multimodal user interfaces are a research area in human-computer interaction (HCI).

The advantage of multiple input modalities is increased usability: the weaknesses of one modality are offset by the strengths of another. On a mobile device with a small visual interface and keypad, a word may be quite difficult to type but very easy to say (e.g. Poughkeepsie). Consider how you would access and search through digital media catalogs from these same devices or set-top boxes. In one real-world example, patient information in an operating room environment is accessed verbally by members of the surgical team to maintain an antiseptic environment, and presented in near real time aurally and visually to maximize comprehension.

Multimodal input user interfaces have implications for accessibility.[1] A well-designed multimodal application can be used by people with a wide variety of impairments. Visually impaired users rely on the voice modality with some keypad input. Hearing-impaired users rely on the visual modality with some speech input. Other users will be "situationally impaired" (e.g. wearing gloves in a very noisy environment, driving, or needing to enter a credit card number in a public place) and will simply use the appropriate modalities as desired. On the other hand, a multimodal application that requires users to be able to operate all modalities is very poorly designed.

The most common form of input multimodality in the market makes use of the XHTML+Voice (also known as X+V) Web markup language, an open specification developed by IBM, Motorola, and Opera Software. X+V is currently under consideration by the W3C and combines several W3C Recommendations including XHTML for visual markup, VoiceXML for voice markup, and XML Events, a standard for integrating XML languages. Multimodal browsers supporting X+V include IBM WebSphere Everyplace Multimodal Environment, Opera for Embedded Linux and Windows, and ACCESS Systems NetFront for Windows Mobile. To develop multimodal applications, software developers may use a software development kit, such as IBM WebSphere Multimodal Toolkit, based on the open source Eclipse framework, which includes an X+V debugger, editor, and simulator.
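
As a rough illustration, the sketch below shows how an X+V page couples a VoiceXML dialog (placed in the XHTML head) to a visual form field through XML Events attributes. It is a minimal sketch rather than a verbatim excerpt from the X+V specification; the attribute names on the synchronization element and the grammar file cities.grxml are assumptions made for this example.

```xml
<!-- Minimal X+V-style sketch (illustrative only, not verbatim from the spec):
     a text field that can be filled either by typing or by speaking. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">
  <head>
    <title>City search</title>
    <!-- VoiceXML dialog: prompt the user and listen for a city name.
         cities.grxml is a hypothetical speech grammar for this example. -->
    <vxml:form id="say_city">
      <vxml:field name="city">
        <vxml:prompt>Which city?</vxml:prompt>
        <vxml:grammar src="cities.grxml" type="application/srgs+xml"/>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <form action="/search" method="get">
      <!-- Focusing the text field activates the voice dialog via XML Events;
           the sync element (attribute names assumed here) keeps the spoken
           result and the visual field in step. -->
      <input type="text" id="city_box" name="city"
             ev:event="focus" ev:handler="#say_city"/>
      <xv:sync xv:input="city_box" xv:field="#say_city.city"/>
    </form>
  </body>
</html>
```

In this pattern the same application state can be reached through either modality: a user may type into the field, or focus it and answer the spoken prompt, with the recognized value written back into the visual form.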

Multimodal input and output

The second group of multimodal systems presents users with multimedia displays and multimodal output, primarily in the form of visual and auditory cues. Interface designers have also started to make use of other modalities, such as touch and olfaction. Proposed benefits of multimodal output systems include synergy and redundancy. The information presented via several modalities is merged and refers to various aspects of the same process. The use of several modalities for processing exactly the same information provides an increased bandwidth of information transfer.[2][3][4] Currently, multimodal output is used mainly for improving the mapping between communication medium and content and to support attention management in data-rich environments where operators face considerable visual attention demands.[5]

An important step in multimodal interface design is the creation of natural mappings between modalities and the information and tasks. The auditory channel differs from vision in several respects: it is omnidirectional, transient, and always reserved.[5] Speech output, one form of auditory information, has received considerable attention, and several guidelines have been developed for its use. Michaelis and Wiggins (1982) suggested that speech output should be used for simple, short messages that will not be referred to later, and that it is best suited to messages that are time-critical and require an immediate response.

The sense of touch was first utilized as a medium for communication in the late 1950s.[6] It is not only a promising but also a unique communication channel. In contrast to vision and hearing, the two traditional senses employed in HCI, the sense of touch is proximal: it senses objects that are in contact with the body, and it is bidirectional in that it supports both perception of and action on the environment.

Examples of auditory feedback include auditory icons in computer operating systems that indicate users' actions (e.g. deleting a file, opening a folder, an error), speech output presenting navigational guidance in vehicles, and speech output warning pilots in modern airplane cockpits. Examples of tactile signals include vibration of the turn-signal lever to warn drivers of a car in their blind spot, vibration of the car seat as a warning to drivers, and the stick shaker on modern aircraft alerting pilots to an impending stall.[5]

Invisible interface spaces have become available through sensor technology; infrared, ultrasound, and cameras are all now commonly used.[7] Interaction with content becomes more transparent when a meaningful mapping provides an immediate and direct link between input and response: the user receives direct and immediate feedback, and the content's response acts as an interface affordance (Gibson 1979).

References

  1. Vitense, H.S.; Jacko, J.A.; Emery, V.K. (2002). "Multimodal feedback: establishing a performance baseline for improved access by individuals with visual impairments". ACM Conf. on Assistive Technologies.
  2. Oviatt, S. (2002). "Multimodal interfaces". In Jacko, J.; Sears, A. (eds.), The Human-Computer Interaction Handbook. Lawrence Erlbaum.
  3. Bauckhage, C.; Fritsch, J.; Rohlfing, K.J.; Wachsmuth, S.; Sagerer, G. (2002). "Evaluating integrated speech- and image understanding". Int. Conf. on Multimodal Interfaces. http://dx.doi.org/10.1109/ICMI.2002.1166961.
  4. Ismail, N.A.; O'Brien, E.A. (2008). "Enabling Multimodal Interaction in Web-Based Personal Digital Photo Browsing". Int. Conf. on Computer and Communication Engineering. http://eprints.utm.my/5732/1/ICCCE2008_preprint_version_UTM_IR.pdf.
  5. Sarter, N.B. (2006). "Multimodal information presentation: Design guidance and research challenges". Int. J. of Industrial Ergonomics 36 (5): 439-445. http://www.sciencedirect.com/science/article/pii/S0169814106000217.
  6. Geldard, F.A. (1957). "Adventures in tactile literacy". American Psychologist 12 (3): 115-124. http://www.sciencedirect.com/science/article/pii/S0003066X07652459.
  7. Brooks, A.; Petersson, E. (2007). "SoundScapes: non-formal learning potentials from interactive VEs". SIGGRAPH. http://doi.acm.org/10.1145/1282040.1282059.
