Expressed sequence tag


An expressed sequence tag (EST) is a short sub-sequence of a transcribed, spliced nucleotide sequence (either protein-coding or not). ESTs may be used to identify gene transcripts, and are instrumental in gene discovery and gene sequence determination. [Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH, Xiao H, Merril CR, Wu A, Olde B, Moreno RF, et al. Complementary DNA sequencing: expressed sequence tags and human genome project. Science. 1991 Jun 21;252(5013):1651-6. [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=2047873&ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum PMID:2047873] ] The identification of ESTs has proceeded rapidly, with approximately 52 million ESTs available in public databases as of May 2008 (e.g. GenBank, all species).

An EST is produced by one-shot sequencing of a cloned mRNA (i.e. sequencing several hundred base pairs from one end of a cDNA clone taken from a cDNA library). The resulting sequence is a relatively low-quality fragment whose length is limited by current technology to approximately 500 to 800 nucleotides. Because these clones consist of DNA that is complementary to mRNA, the ESTs represent portions of expressed genes. An EST may be present in the database either as the cDNA/mRNA sequence or as the reverse complement of the mRNA, the template strand.
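Because an EST may be stored in either orientation, comparisons between ESTs usually have to consider both strands. The following is a minimal Python sketch of that idea, not code from any EST pipeline; the function names `reverse_complement` and `same_region_either_strand` are hypothetical, and the exact-match comparison is a deliberate simplification (real EST reads contain sequencing errors and require alignment).

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
# "N" (unknown base) is mapped to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def same_region_either_strand(est_a: str, est_b: str) -> bool:
    """Toy check: are two reads identical on the same or the opposite strand?

    Illustrative assumption: exact string equality stands in for a real
    error-tolerant alignment.
    """
    a, b = est_a.upper(), est_b.upper()
    return a == b or a == reverse_complement(b)


if __name__ == "__main__":
    print(reverse_complement("ATGGCC"))
    print(same_region_either_strand("ATGGCC", "ggccat"))
```

In this sketch, a read deposited as the template strand is recognised as matching the mRNA-sense read once it is reverse-complemented.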

ESTs can be mapped to specific chromosome locations using physical mapping techniques, such as radiation hybrid mapping or FISH. Alternatively, if the genome of the organism from which the EST originated has been sequenced, the EST sequence can be aligned to that genome.

The understanding of the human set of genes (as of 2006) includes the existence of thousands of genes based solely on EST evidence. In this respect, ESTs are a tool for refining the predicted transcripts of those genes, which leads to prediction of their protein products and, eventually, of their function. Moreover, the conditions under which those ESTs were obtained (tissue, organ, disease state, e.g. cancer) give information on the conditions in which the corresponding gene is active. ESTs contain enough information to permit the design of precise probes for DNA microarrays, which can then be used to measure gene expression.

Some authors use the term "EST" to describe genes for which little or no further information exists besides the tag. [ [http://www.ncbi.nlm.nih.gov/dbEST/how_to_submit.html dbEST ] ]

The significance of ESTs, their properties, methods for analysing EST datasets and their applications in different areas of biology have been reviewed by Nagaraj et al. (2007). [Nagaraj, Shivashankar H, Gasser, Robin B, and Ranganathan, Shoba. A hitchhiker's guide to expressed sequence tag (EST) analysis. Brief Bioinform 8 (2007 Jan): 6-21. [http://bib.oxfordjournals.org/cgi/content/short/8/1/6?rss=1 article] ]

Sources of data and annotations

dbEST

dbEST is a division of GenBank established in 1992. As with GenBank, data in dbEST are submitted directly by laboratories worldwide and are not curated.

EST contigs

Because of the way ESTs are sequenced, many distinct expressed sequence tags are partial sequences that correspond to the same mRNA of an organism. To reduce the number of expressed sequence tags for downstream gene discovery analyses, several groups have assembled expressed sequence tags into EST contigs. Examples of resources that provide EST contigs include:
* TIGR gene indices [Lee, Y, Tsai, J, Sunkara, S, et al. The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes. Nucleic Acids Res 33 (2005 Jan 1): D71-4. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15608288&query_hl=16 PMID:15608288] ]
* Unigene [Stanton, Jo-Ann L, Macgregor, Andrew B, and Green, David P L. Identifying tissue-enriched gene expression in mouse tissues using the NIH UniGene database. Appl Bioinformatics 2 (2003): S65-73. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15130819&query_hl=16 PMID:15130819] ]
* STACK [Christoffels, A, van Gelder, A, Greyling, G, et al. STACK: Sequence Tag Alignment and Consensus Knowledgebase. Nucleic Acids Res 29 (2001 Jan 1): 234-8. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11125101&query_hl=16 PMID:11125101] ]

Constructing EST contigs is not trivial and may yield artifacts (contigs that contain two distinct gene products). When the complete genome sequence of an organism is available and transcripts are annotated, it is possible to bypass contig assembly and directly match transcripts with ESTs. This approach is used in the TissueInfo system (see below) and makes it easy to link annotations in the genomic database to tissue information provided by EST data.
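The idea of merging overlapping ESTs into contigs can be illustrated with a toy greedy assembler. This is a sketch under strong assumptions, not the method used by TIGR, UniGene or STACK: it requires exact suffix-to-prefix overlaps, ignores sequencing errors and strand, and the names `overlap_length`, `merge_pair`, `assemble` and the `min_overlap` parameter are all hypothetical.

```python
from typing import List, Optional


def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that is a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, max(min_overlap, 1) - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_pair(left: str, right: str, min_overlap: int) -> Optional[str]:
    """Join two reads on their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    return left + right[size:] if size else None


def assemble(ests: List[str], min_overlap: int = 4) -> List[str]:
    """Greedily merge reads until no pair overlaps any more."""
    contigs = list(ests)
    merged = True
    while merged:
        merged = False
        for i in range(len(contigs)):
            for j in range(len(contigs)):
                if i == j:
                    continue
                joined = merge_pair(contigs[i], contigs[j], min_overlap)
                if joined is not None:
                    # Replace the two reads with their merged contig.
                    rest = [c for k, c in enumerate(contigs) if k not in (i, j)]
                    contigs = rest + [joined]
                    merged = True
                    break
            if merged:
                break
    return contigs


if __name__ == "__main__":
    print(assemble(["ATGGCGTAC", "GTACCTGA", "TTTTTTTT"]))
```

The sketch also hints at why artifacts arise: with a small `min_overlap`, a short sequence shared by chance (or a repeat common to two genes) is enough to join reads from unrelated transcripts into a single chimeric contig.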

TissueInfo

High-throughput analyses of ESTs often encounter similar data management challenges. A first challenge is that the tissue provenance of EST libraries is described in plain English in dbEST. [Skrabanek, L, and Campagne, F. TissueInfo: high-throughput identification of tissue expression profiles and specificity. Nucleic Acids Res 29 (2001 Nov 1): E102-2. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11691939&query_hl=16 PMID:11691939] ] This makes it difficult to write programs that can unambiguously determine that two EST libraries were sequenced from the same tissue. Similarly, disease conditions for the tissue are not annotated in a computationally friendly manner. For instance, the cancer origin of a library is often mixed with the tissue name (e.g., the tissue name "glioblastoma" indicates that the EST library was sequenced from brain tissue and that the disease condition is cancer). [Campagne, Fabien, and Skrabanek, Lucy. Mining expressed sequence tags identifies cancer markers of clinical interest. BMC Bioinformatics 7 (2006): 481. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17078886&query_hl=16 PMID:17078886] ] With the notable exception of cancer, the disease condition is often not recorded in dbEST entries.

The TissueInfo project was started in 2000 to help with these challenges. The project provides curated data (updated daily) to disambiguate tissue origin and disease state (cancer/non-cancer), offers a tissue ontology that links tissues and organs by "is part of" relationships (e.g., it formalizes the knowledge that the hypothalamus is part of the brain, and that the brain is part of the central nervous system), and distributes open-source software for linking transcript annotations from sequenced genomes to tissue expression profiles calculated with data in dbEST. [ [http://icb.med.cornell.edu/crt/tissueinfo/ :institute for computational biomedicine::TissueInfo ] ]
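The two kinds of curation described above, mapping free-text library descriptions to a tissue and disease state and following "is part of" links in a tissue ontology, can be sketched in a few lines of Python. The tables and function names below (`PART_OF`, `CURATED`, `annotate`, `ancestors`, `is_part_of`) are hypothetical and hand-written for illustration; they are not TissueInfo's actual data or API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical "is part of" links (child -> parent), assumed to be acyclic.
PART_OF: Dict[str, str] = {
    "hypothalamus": "brain",
    "brain": "central nervous system",
}

# Hypothetical curated table: free-text description -> (tissue, is_cancer).
CURATED: Dict[str, Tuple[str, bool]] = {
    "glioblastoma": ("brain", True),
    "hypothalamus": ("hypothalamus", False),
}


def annotate(description: str) -> Optional[Tuple[str, bool]]:
    """Look up the curated tissue and cancer status for a library description."""
    return CURATED.get(description.strip().lower())


def ancestors(tissue: str) -> List[str]:
    """List every structure that `tissue` is (transitively) part of."""
    chain = []
    while tissue in PART_OF:
        tissue = PART_OF[tissue]
        chain.append(tissue)
    return chain


def is_part_of(tissue: str, organ: str) -> bool:
    """True if `tissue` is `organ` itself or lies inside it in the ontology."""
    return tissue == organ or organ in ancestors(tissue)


if __name__ == "__main__":
    tissue, is_cancer = annotate("Glioblastoma")
    print(tissue, is_cancer, is_part_of(tissue, "central nervous system"))
```

With such a lookup, a program can decide that a "glioblastoma" library and a "hypothalamus" library both derive from brain, one from cancerous tissue and one not, which the free-text descriptions alone do not make explicit.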

See also

* gene expression
* complementary DNA (cDNA)
* IMAGE cDNA clones

External links

* [http://www.ncbi.nlm.nih.gov/About/primer/est.html ESTs Factsheet] from NCBI, a good and easy-to-read introduction to ESTs.
* [http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.section.858 The NCBI Handbook, Part 3, Chapter 21] has a very nice overview.
* [http://mips.gsf.de/proj/est/ ECLAT] a server for the classification of ESTs from mixed EST pools (from fungus infected plants) using codon usage.
* [http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html The current number of EST sequences in the GenBank division dbEST] .
* [http://biolinfo.org/EST/ Web Resources for EST data and analysis]
* [http://icb.med.cornell.edu/crt/tissueinfo/ TissueInfo project] curated EST tissue provenance, tissue ontology, open-source software.


Wikimedia Foundation. 2010.
