Multimedia Information Retrieval

Multimedia Information Retrieval (MMIR) is a research discipline of computer science that aims at extracting semantic information from multimedia data sources.[1] Data sources include directly perceivable media such as audio, images and video; indirectly perceivable sources such as text and biosignals; and non-perceivable sources such as bioinformation and stock prices. The methodology of MMIR can be organized into three groups:

  1. Methods for the summarization of media content (feature extraction). The result of feature extraction is a description.
  2. Methods for the filtering of media descriptions (for example, elimination of redundancy).
  3. Methods for the categorization of media descriptions into classes.


Feature Extraction Methods

Feature extraction is motivated by the sheer size of multimedia objects as well as their redundancy and, possibly, noisiness.[2] Generally, two possible goals can be achieved by feature extraction:

  • Summarization of media content. In the audio domain, summarization methods include, for example, Mel-Frequency Cepstral Coefficients, Zero-Crossing Rate and Short-Time Energy. In the visual domain, color histograms[3] such as the MPEG-7 Scalable Color Descriptor can be used for summarization.
  • Detection of patterns by auto-correlation and/or cross-correlation. Patterns are recurring media chunks that can either be detected by comparing chunks over the media dimensions (time, space, etc.) or comparing media chunks to templates (e.g. face templates, phrases). Typical methods include Linear Predictive Coding in the audio/biosignal domain,[4] texture description in the visual domain and n-grams in text information retrieval.
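The two goals above can be illustrated with a minimal sketch in plain Python. It computes the zero-crossing rate and short-time energy of one audio frame (summarization) and uses normalized auto-correlation to find the period of a recurring pattern (pattern detection). The function names, the synthetic sine signal and the lag range are assumptions made for illustration; they are not taken from the cited sources.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0

def autocorrelation(signal, lag):
    """Normalized auto-correlation of a mean-centered signal at one lag."""
    mean = sum(signal) / len(signal)
    centered = [x - mean for x in signal]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0
    num = sum(
        centered[i] * centered[i + lag] for i in range(len(centered) - lag)
    )
    return num / denom

# Hypothetical test signal: a sine wave that repeats every 8 samples.
# The half-sample offset keeps samples away from exact zeros.
signal = [math.sin(2 * math.pi * (n + 0.5) / 8) for n in range(64)]

zcr = zero_crossing_rate(signal)
energy = short_time_energy(signal)
# The lag with the strongest auto-correlation estimates the pattern period.
best_lag = max(range(1, 17), key=lambda k: autocorrelation(signal, k))
```

In practice such features are computed per frame over a sliding window, and the resulting values form the description of the media object.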

Merging and Filtering Methods

Multimedia Information Retrieval implies that multiple channels are employed for the understanding of media content.[5] Each of these channels is described by media-specific feature transformations. The resulting descriptions have to be merged into one description per media object. Merging can be performed by simple concatenation if the descriptions are of fixed size. Variable-sized descriptions, as they frequently occur in motion description, have to be normalized to a fixed length first.
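Merging by concatenation can be sketched as follows. Linear interpolation is used here as one possible way to normalize a variable-sized description to a fixed length; the text does not prescribe a particular normalization, so this choice, like all names and sample values below, is an assumption.

```python
def resample_to_fixed_length(values, target_len):
    """Linearly interpolate a variable-length sequence to target_len points."""
    if not values:
        raise ValueError("cannot resample an empty description")
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    if len(values) == 1 or target_len == 1:
        return [float(values[0])] * target_len
    step = (len(values) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out

def merge_descriptions(fixed_parts, variable_parts, target_len):
    """Concatenate fixed-size parts, then the length-normalized variable parts."""
    merged = []
    for part in fixed_parts:
        merged.extend(part)
    for part in variable_parts:
        merged.extend(resample_to_fixed_length(part, target_len))
    return merged

# Hypothetical per-channel descriptions of two media objects.
color_histogram = [0.5, 0.25, 0.25]                    # fixed size
audio_features = [0.1, 0.9]                            # fixed size
motion_short = [0.0, 1.0, 2.0]                         # variable size
motion_long = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]      # variable size

merged_a = merge_descriptions([color_histogram, audio_features], [motion_short], 4)
merged_b = merge_descriptions([color_histogram, audio_features], [motion_long], 4)
```

Both merged descriptions have the same length, which is what most classifiers require of their input.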

Frequently used methods for description filtering include factor analysis (e.g. by PCA), singular value decomposition (e.g. as latent semantic indexing in text retrieval) and the extraction and testing of statistical moments. Advanced concepts such as the Kalman filter are used for merging of descriptions.
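As a small illustration of filtering by statistical moments, the sketch below computes the first four moments of each description element across a set of media objects and drops elements whose variance is close to zero, since they cannot help to distinguish objects. The variance threshold and all names are assumptions; this is one simple instance of redundancy elimination, not a substitute for PCA or singular value decomposition.

```python
def moments(values):
    """Return mean, variance, skewness and kurtosis (population estimates)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return mean, 0.0, 0.0, 0.0
    std = var ** 0.5
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n
    return mean, var, skew, kurt

def drop_low_variance(descriptions, min_variance=1e-6):
    """Remove description elements that are (nearly) constant over all objects."""
    columns = list(zip(*descriptions))
    keep = [i for i, col in enumerate(columns) if moments(col)[1] > min_variance]
    filtered = [[row[i] for i in keep] for row in descriptions]
    return filtered, keep

# Hypothetical descriptions of three media objects; element 0 is constant.
descriptions = [
    [1.0, 5.0, 0.2],
    [1.0, 7.0, 0.4],
    [1.0, 9.0, 0.9],
]
filtered, kept = drop_low_variance(descriptions)
```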

Categorization Methods

Generally, all forms of machine learning can be employed for the categorization of multimedia descriptions,[6] though some methods are more frequently used in one area than another. For example, Hidden Markov models are state-of-the-art in speech recognition, while Dynamic Time Warping, a semantically related method, is state-of-the-art in gene sequence alignment. The list of applicable classifiers includes the following:

The selection of the best classifier for a given problem (test set with descriptions and class labels, so-called ground truth) can be performed automatically, for example, using the Weka Data Miner.
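To make the categorization step concrete, the following sketch implements the Dynamic Time Warping distance mentioned above and uses it in a nearest-neighbor classifier over a tiny, hypothetical ground truth. Combining DTW with a nearest-neighbor rule is an illustrative choice, not a method prescribed by the cited sources, and the sequences and labels are invented.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] aligned with the same b element again
                cost[i][j - 1],      # b[j-1] aligned with the same a element again
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]

def classify_1nn(query, ground_truth):
    """Return the label of the ground-truth sequence closest to the query."""
    best_seq, best_label = min(
        ground_truth, key=lambda item: dtw_distance(query, item[0])
    )
    return best_label

# Hypothetical ground truth: descriptions paired with class labels.
ground_truth = [
    ([0, 1, 2, 3], "rising"),
    ([3, 2, 1, 0], "falling"),
]
predicted = classify_1nn([0, 0, 1, 2, 2, 3], ground_truth)
```

Because DTW allows elements to be repeated during alignment, the stretched query is matched to the "rising" template at zero cost even though the two sequences differ in length.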

Open Problems

The quality of MMIR systems[7] depends heavily on the quality of the training data. Discriminative descriptions can be extracted from media sources in various forms, and machine learning provides categorization methods for all types of data. However, a classifier can only be as good as the training data it is given, and providing class labels for large databases requires considerable effort. The future success of MMIR will depend on the provision of such data. The annual TRECVID competition is currently one of the most relevant sources of high-quality ground truth.

Related Areas

MMIR provides an overview of the methods employed in the areas of information retrieval. Methods from one area are adapted and applied to other types of media. Multimedia content is merged before the classification is performed. MMIR methods are, therefore, usually reused from other areas such as:

The new Journal of Multimedia Information Retrieval[8] should help the development of MMIR as a research discipline independent of these areas.


  1. H Eidenberger. "Fundamental Media Understanding", atpress, 2011, p. 1.
  2. H Eidenberger. "Fundamental Media Understanding", atpress, 2011, p. 2.
  3. A Del Bimbo. "Visual Information Retrieval", Morgan Kaufmann, 1999.
  4. HG Kim, N Moreau, T Sikora. "MPEG-7 Audio and Beyond", Wiley, 2005.
  5. MS Lew (Ed.). "Principles of Visual Information Retrieval", Springer, 2001.
  6. H Eidenberger. "Fundamental Media Understanding", atpress, 2011, p. 125.
  7. JC Nordbotten. "Multimedia Information Retrieval Systems". Retrieved 14 October 2011.
  8. "Journal of Multimedia Information Retrieval", Springer, 2011. Retrieved 21 October 2011.

Wikimedia Foundation. 2010.
