Unified Medical Language System

Unified Medical Language System

The Unified Medical Language System (UMLS) is a compendium of many controlled vocabularies in the biomedical sciences. It provides a mapping structure among these vocabularies and thus allows one to translate among the various terminology systems; it may also be viewed as a comprehensive thesaurus and ontology of biomedical concepts. UMLS further provides facilities for natural language processing. It is intended to be used mainly by developers of systems in medical informatics.

UMLS consists of the following components:
* "Metathesaurus", the core database of the UMLS, a collection of concepts and terms from the various controlled vocabularies, and their relationships;
* "Semantic Network", a set of categories and relationships that are being used to classify and relate the entries in the Metathesaurus;
* "SPECIALIST Lexicon", a database of lexicographic information for use in natural language processing;
* a number of supporting software tools.

The UMLS was designed and is maintained by the US National Library of Medicine, is updated quarterly and may be used for free. The project was initiated in 1986 by Donald Lindberg, M.D., then Director of the Library of Medicine.

Purpose and applications

The number of biomedical resources available to researchers is enormous. Often this is a problem due to the large volume of documents retrieved when the medical literature is searched. The purpose of the UMLS is to enhance access to this literature by facilitating the development of computer systems that understand biomedical language. This is achieved by overcoming two significant barriers: "the variety of ways the same concepts are expressed in different machine-readable sources & by different people" and "the distribution of useful information among many disparate databases & systems".

UMLS can be used to design information retrieval or patient record systems, to facilitate the communication between different systems, or to develop systems that parse the biomedical literature. For many of these applications, the UMLS will have to be used in a customized form, for instance by excluding certain source vocabularies that are not relevant to the application. The Library of Medicine itself uses it for its PubMed and ClinicalTrials.gov systems.

Users of the system have to sign a "UMLS agreement" and file brief annual reports on their use. Academic users can employ the UMLS free of charge for research. Commercial or production use requires copyright licenses for some of the incorporated source vocabularies.


The Metathesaurus forms the base of the UMLS and comprises over 1 million biomedical concepts and 5 million concept names, all of which stem from the over 100 incorporated controlled vocabularies and classification systems. Some examples of the incorporated controlled vocabularies are ICD-9-CM, ICD-10, MeSH, SNOMED CT, LOINC, WHO Adverse Drug Reaction Terminology, UK Clinical Terms, RxNORM, Gene Ontology, and OMIM (see [http://www.nlm.nih.gov/research/umls/metab4.html#sb4_0 full list] ).

The Metathesaurus is organized by concept, and each concept has specific attributes defining its meaning and is linked to the corresponding concept names in the various source vocabularies. Numerous relationships between the concepts are represented, for instance hierarchical ones such as "isa" for subclasses and "is part of" for subunits, and associative ones such as "is caused by" or "in the literature often occurs close to" (the latter being derived from Medline).

The scope of the Metathesaurus is determined by the scope of the source vocabularies. If different vocabularies use different names for the same concept, or if they use the same name for different concepts, then this will be faithfully represented in the Metathesaurus. All hierarchical information from the source vocabularies is retained in the Metathesaurus. Metathesaurus concepts can also link to resources outside of the database, for instance gene sequence databases.

The Metathesaurus itself is produced by the automated processing of machine-readable versions of the source vocabularies, followed by human intervention of editing and review. It is distributed as an SQL relational database and can also be accessed via a Java object-oriented API.

emantic network

Each concept in the Metathesaurus is assigned to at least one "Semantic type" (a category), and certain "Semantic relationships" may obtain between members of the various Semantic types. The Semantic network is a catalog of these Semantic types and Semantic relationships. This is a rather broad classification; there are 135 semantic types and 54 relationships.

The major semantic types are organisms, anatomical structures, biologic function, chemicals, events, physical objects, and concepts or ideas.The links among semantic types provide the structure for the network and show important relationships between the groupings and concepts. The primary link between semantic types is the "isa" link, establishing a hierarchy of types and allowing to locate the most specific semantic type to assign to a given Metathesaurus concept. The network also has 5 major categories of non-hierarchical (or "associational") relationships. These are "physically related to", "spatially related to", "temporally related to", "functionally related to" and "conceptually related to".

The information about a Semantic type includes an identifier, definition, examples, hierarchical information about the encompassing Semantic type(s), and its associational relationships. Associational relationships within the Semantic Network are very weak. They capture at most some-some relationships, i.e. they capture the fact that some instance of the first type may be connected by the salient relationship to some instance of the second type. Phrased differently, they capture the fact that a corresponding relational assertion is meaningful (though it need not be true in all cases).


The SPECIALIST Lexicon contains information about common English vocabulary, biomedical terms, terms found in MEDLINE and in the UMLS Metathesaurus. Each entry contains syntactic (how words are put together to create meaning), morphological (form and structure) and orthographic (spelling) information. A set of Java programs use the lexicon to work through the variations in biomedical texts by relating words by their parts of speech, which can be helpful in web searches or searches through an electronic medical record.

Entries may be one-word or multiple-word terms. Records contain four parts: base form (i.e. "run" for "running"); parts of speech (of which Specialist recognizes eleven); a unique identifier; and any available spelling variants. For example, a query for "anesthetic" would return the following: {base=anaesthetic spelling_variant=anesthetic entry=E0008769 cat=noun variants=reg} {base=anaesthetic spelling_variant=anesthetic entry=E0008770 cat=adj variants=inv position=attrib(3)}(Browne et al., 2000)

The SPECIALIST lexicon is available in two formats. The "unit record" format can be seen above, and comprises "slots" and "fillers". A "slot" is the element (i.e. "base=" or "spelling variant=") and the "fillers" are the values attributable to that slot for that entry. The "relational table" format is not yet normalized and contains a great deal of duplication of data.

upporting software tools

"MetamorphoSys" is a program that can be used to customize the Metathesaurus for specific applications, for instance by excluding certain source vocabularies.

"lvg" is a program that uses the SPECIALIST lexicon to generate lexical variants of a given term and to support the parsing of natural language text.

"MetaMap" is an online tool that, when given an arbitrary piece of text, finds and returns the relevant Metathesaurus concepts. "MetaMap Transfer (MMTx)" provides the same functionality as a Java program.

"Knowledge Source Server" is an online application that allows one to browse the Metathesaurus.


*Bodenreider, Olivier. (2004) [http://nar.oxfordjournals.org/cgi/content/full/32/suppl_1/D267 The Unified Medical Language System (UMLS): integrating biomedical terminology] . Nucleic Acids Research, 32, D267-D270.
*Browne, McCray and Srinivasan (2000). [http://lexsrv3.nlm.nih.gov/LexSysGroup/Projects/lexicon/2003/release/LEXICON/DOCS/techrpt.pdf The Specialist Lexicon] . Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, p. 1.
*Kumar, Anand and Smith, Barry (2003) [http://ontology.buffalo.edu/medo/UMLS_GO.pdf The Unified Medical Language System and the Gene Ontology: Some Critical Reflections] , in: KI 2003: Advances in Artificial Intelligence (Lecture Notes in Artificial Intelligence 2821), Berlin: Springer, 135–148.
*Smith, Barry Kumar, Anand and Schulze-Kremer, Steffen (2004) [http://ontology.buffalo.edu/medo/UMLS_SN.pdf Revising the UMLS Semantic Network] , in M. Fieschi, et al. (eds.), Medinfo 2004, Amsterdam: IOS Press, 1700.
*Coiera, Enrico. (2003) "Guide to Health Informatics, 2nd ed.". [http://www.coiera.com/Chap17term.htm Chapter 17 - Healthcare terminologies and classification systems]

ee also

* Medical classification

External links

* [http://www.nlm.nih.gov/research/umls/ Official UMLS site]
* [http://www.nlm.nih.gov/research/umls/about_umls.html UMLS Summary description] , with links to factsheets and documentation for Metathesaurus, Semantic Network, SPECIALIST Lexicon and MetamorphoSys
* [http://www.nlm.nih.gov/research/umls/pdf/AMIA_T12_2006_UMLS.pdf UMLS Overview and Tutorial] , by Rachel Kleinsorge, Jan Willis, Allen Browne, Alan Aronson
* [http://www.stanford.edu/~nigam/UMLS A Perl module to query a UMLS mysql installation]

Wikimedia Foundation. 2010.

Look at other dictionaries:

  • Unified Medical Language System — Das 1989 von der US National Library of Medicine initiierte Unified Medical Language System (UMLS) ist ein Projekt, das Terminologien biomedizinischer Ressourcen wie Online Datenbanken und medizinischen Wörterbüchern aneinander angleichen will.… …   Deutsch Wikipedia

  • Medical terminology — is a vocabulary for accurately describing the human body and associated components, conditions, processes and process in a science based manner. Some examples are: R.I.C.E., trapezius, and latissimus dorsi. It is to be used in the medical and… …   Wikipedia

  • Medical classification — Medical classification, or medical coding, is the process of transforming descriptions of medical diagnoses and procedures into universal medical code numbers. The diagnoses and procedures are usually taken from a variety of sources within the… …   Wikipedia

  • Medical laboratory scientist — MLS in his work environment. A Medical Laboratory Scientist (MLS) (also referred to as medical technologist or clinical laboratory technologist) is a healthcare professional who performs chemical, hematological, immunologic, microscopic, and… …   Wikipedia

  • Language acquisition — Linguistics …   Wikipedia

  • Clinical Care Classification System — The Clinical Care Classification (CCC) System is a standardized, coded nursing terminology that identifies the discrete elements of nursing practice. The CCC provides a unique framework and coding structure for documenting the plan of care… …   Wikipedia

  • Omaha System — The Omaha System is a standardized health care terminology consisting of an assessment component (Problem Classification Scheme), an intervention component (Intervention Scheme), and an outcomes component (Problem Rating Scale for Outcomes).… …   Wikipedia

  • Health care system — A health care system is the organization of people, institutions, and resources to deliver health care services to meet the health needs of target populations. There is a wide variety of health care systems around the world, with as many… …   Wikipedia

  • Web Ontology Language — OWL Web Ontology Language Current Status Published Year Started 2002 Editors Mike Dean, Guus Schreiber Base Standards Resource Description Framework, RDFS Domain Semantic Web A …   Wikipedia

  • Cornish language — For the Anglo Cornish accent and dialect, see Anglo Cornish. Cornish Kernowek, Kernewek Pronunciation [kərˈnuːək] Spoken in …   Wikipedia