Human Speechome Project

The Human Speechome Project (pronounced "speech-ome", rhymes with "genome") is being conducted at the Massachusetts Institute of Technology's Media Laboratory by the Cognitive Machines Group, headed by Associate Professor Deb Roy. It is an effort to observe and model, unobtrusively and in great detail, the language acquisition of a single child in his English-speaking home over the first three years of his life. The resulting data is being used to create computational models that could yield further insight into language acquisition (Deb Roy et al., "The Human Speechome Project", accessed 2008-01-03).


Most studies of human speech acquisition in children have been done in laboratory settings and with sampling rates of only a couple of hours per week. The need for studies in the more natural setting of the child's home, and at a much higher sampling rate approaching the child's total experience, led to the development of this project concept.


A digital network consisting of eleven video cameras, fourteen microphones, and an array of data capture hardware has been installed in the home of the subject, giving coverage of the child's experiences that is as complete as possible, 24 hours a day. The motion-activated cameras are ceiling-mounted, wide-angle, unobtrusive units providing overhead views of all primary living areas. Sensitive boundary-layer microphones are located in the ceilings near the cameras.

Video image resolution is sufficient to capture gestures and head orientation of people and the identity of mid-sized objects anywhere in a room, but insufficient to resolve direction of eye gaze and similar subtle details. Audio is sampled at greater than CD quality, yielding recordings of speech that are easily transcribed. A cluster of ten computers and audio samplers with a capacity of five terabytes is located in the basement of the house to capture the data (Sarah H., "Media Lab project explores language acquisition", MIT News Office, accessed 2008-01-03). Data from the cluster is moved manually to the MIT campus as necessary for storage in a one-million-gigabyte (one-petabyte) storage facility.

Privacy Issues

To give the occupants of the house control over the observation system, eight touch-activated displays have been wall-mounted throughout the house. These allow video and/or audio recording to be stopped and started, and also provide an "oops" capability with which the occupants can permanently erase any number of minutes of recording from the system. Motorized "privacy shutters" move to cover the cameras when video recording is turned off, providing natural feedback on the state of the system. On most days, audio recording is turned off throughout the house at night after the child is asleep and turned back on in the morning. Audio and/or video are also turned off periodically at the discretion of the participants, for example during the adults' dinner time.

Data Analysis Tools

Data is being gathered at an average rate of 200 gigabytes per day. This has necessitated the development of sophisticated data-mining tools, including analysis of audio spectrograms, to reduce analysis efforts to a manageable level. Transcripts of significant speech (all that is heard and produced by the child) add a labor-intensive dimension to the study, and advanced techniques are being developed to cope with this burden. To store the project's data securely, a large storage array is being constructed at the MIT Media Lab in collaboration with Bell Microproducts, Seagate, and Zetera Corporation ("News Announcement", accessed 2008-01-03).
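
To put the stated figures in proportion, the following is a minimal back-of-the-envelope sketch. It uses only numbers given in the article (200 gigabytes per day, a five-terabyte in-home cluster, a one-petabyte archive, a three-year study). Decimal units (1 TB = 1000 GB), a constant daily rate, and the variable and function names are assumptions made for illustration.

```python
# Back-of-the-envelope storage arithmetic from the figures quoted in the text.
# Assumptions: decimal units (1 TB = 1000 GB, 1 PB = 1,000,000 GB) and a
# constant capture rate with no recording pauses.

DAILY_RATE_GB = 200                # average capture rate stated in the text
CLUSTER_CAPACITY_GB = 5_000        # five-terabyte cluster in the basement
ARCHIVE_CAPACITY_GB = 1_000_000    # one-petabyte facility at MIT
STUDY_DAYS = 3 * 365               # roughly three years, ignoring leap days


def days_until_full(capacity_gb: float, daily_rate_gb: float) -> float:
    """Days of continuous capture before a store of the given size fills."""
    return capacity_gb / daily_rate_gb


cluster_days = days_until_full(CLUSTER_CAPACITY_GB, DAILY_RATE_GB)
total_gb = DAILY_RATE_GB * STUDY_DAYS
archive_fraction = total_gb / ARCHIVE_CAPACITY_GB

print(f"In-home cluster fills in about {cluster_days:.0f} days")
print(f"Three years of capture is about {total_gb / 1000:.0f} TB")
print(f"That is about {archive_fraction:.1%} of the one-petabyte archive")
```

Under these assumptions the in-home cluster would fill in under a month, which is consistent with data being moved to the campus "as necessary", while the full three-year corpus would occupy well under a quarter of the archive.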

Modeling Efforts

Building upon earlier efforts of the Cognitive Machines Group, researchers are advancing from a simpler modeling of noun-picture relationships to address issues of semantic grounding in terms of physical and social action, and recognition of intentions. Semi-automation of learning behavior grammars from video data is being advanced to construct a behavior lexicon. Extensions of this work are focusing on developing a video parser that uses grammars constructed from acquired behavior patterns to infer latent structure underlying movement patterns. Cross-situational learning algorithms are being developed to learn mappings from spoken words and phrases to these latent structures.
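
As a rough illustration of the idea behind cross-situational learning, the sketch below counts how often each word co-occurs with each candidate structure across several situations and maps each word to its most frequent partner. This is a generic co-occurrence baseline, not the Cognitive Machines Group's algorithm; the function name, the behavior labels, and the toy data are all hypothetical.

```python
from collections import defaultdict


def learn_mappings(situations):
    """Map each word to the label it co-occurs with most often.

    `situations` is a list of (words, labels) pairs, where `labels` stands in
    for the latent behavior structures observed while the words were spoken.
    """
    cooccur = defaultdict(lambda: defaultdict(int))
    for words, labels in situations:
        for word in set(words):
            for label in set(labels):
                cooccur[word][label] += 1
    # Highest count wins; ties are broken alphabetically for determinism.
    return {
        word: min(counts, key=lambda label: (-counts[label], label))
        for word, counts in cooccur.items()
    }


# Hypothetical toy data: no single situation is unambiguous, but the
# correct pairing recurs across situations while distractors vary.
situations = [
    (["look", "ball"], {"ROLL_BALL", "PET_DOG"}),
    (["ball", "roll"], {"ROLL_BALL", "LIFT_CUP"}),
    (["dog", "look"], {"PET_DOG", "LIFT_CUP"}),
    (["dog", "run"], {"PET_DOG", "ROLL_BALL"}),
]

mappings = learn_mappings(situations)
print(mappings["ball"], mappings["dog"])
```

In this toy data "ball" resolves to ROLL_BALL and "dog" to PET_DOG because those pairings repeat across situations. Words such as "look" pick up spurious mappings, which is one reason practical systems need far more data and richer models than simple counting.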


See also

* MIT Media Lab
* Massachusetts Institute of Technology

External links

* Deb Roy's MIT home page
* Article in New Scientist
* Article in Wired Magazine
* Language Acquisition, an article by Steven Pinker of MIT (non-final draft version)

Wikimedia Foundation. 2010.
