Speech corpus
A speech corpus (or spoken corpus) is a database of speech audio files and text transcriptions in a format that can be used to create acoustic models (which can then be used with a speech recognition engine).
The plural of corpus is corpora, so "speech corpora" refers to several such databases.
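To make the definition concrete, here is a minimal sketch of loading a corpus as audio/transcription pairs. It assumes a hypothetical layout: a UTF-8 manifest with one tab-separated line per utterance (ID, path to a PCM WAV file, transcription). The names `Utterance` and `load_corpus` are illustrative only. Real corpora each define their own file layouts and transcription conventions.

```python
import wave
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Utterance:
    """One corpus entry: a speech recording paired with its transcription."""
    utt_id: str
    audio_path: Path
    transcript: str

    def duration_seconds(self) -> float:
        # Read the WAV header: number of frames divided by frames per second.
        with wave.open(str(self.audio_path), "rb") as wav:
            return wav.getnframes() / wav.getframerate()


def load_corpus(manifest: Path) -> list[Utterance]:
    """Load utterances from a hypothetical tab-separated manifest.

    Assumed line format: <utterance_id> TAB <audio path> TAB <transcription>.
    Audio paths are resolved relative to the manifest's directory.
    """
    root = manifest.parent
    utterances = []
    with manifest.open(encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            # maxsplit=2 keeps any tabs inside the transcription intact
            utt_id, rel_path, transcript = line.split("\t", 2)
            utterances.append(Utterance(utt_id, root / rel_path, transcript))
    return utterances
```

Under these assumptions, `sum(u.duration_seconds() for u in load_corpus(path)) / 3600` gives the total hours of audio, a common way of stating a speech corpus's size.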
There are two types of speech corpora:
* (1) Read speech, which includes:
:* Book excerpts
:* Broadcast news
:* Lists of words
:* Sequences of numbers
* (2) Spontaneous speech, which includes:
:* Dialogs: conversations between two or more people (including meetings);
:* Narratives: a person telling a story;
:* Map tasks: one person explains a route on a map to another;
:* Appointment tasks: two people try to find a common meeting time based on their individual schedules.
A special kind of speech corpus is the non-native speech database, which contains speech with a foreign accent.
* [http://www.phonetik.uni-muenchen.de/Bas/BasHomeeng.html BAS – Bavarian Archive for Speech Signals]
* [http://buckeyecorpus.osu.edu/ Buckeye Corpus] - The Buckeye Corpus of Conversational Speech
* [http://www.ece.msstate.edu/research/isip/projects/switchboard/ Switchboard] - ISIP's Switchboard database
* [http://www.voxforge.org/ VoxForge] - open source speech corpora
Wikimedia Foundation. 2010.