Horizontal correlation

Horizontal correlation is a methodology for gene sequence analysis. Rather than referring to one specific technique, "horizontal correlation" encompasses a variety of approaches to sequence analysis that share two themes:

* Sequence analysis is performed by making comparisons "horizontally", along the length of a single genetic sequence; this is in contrast to "vertical" methods that make comparisons across several different genetic sequences.
* The comparisons generally measure information-theoretic quantities, such as the value of the mutual information function between two regions of the sequence.
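
For reference, the mutual information function named in the second theme is commonly written as below. The notation is an assumption introduced here for illustration, not taken from the text: p_a is the frequency of nucleotide a in the sequence, and p_ab(k) is the frequency with which nucleotide a is followed by nucleotide b exactly k positions downstream.

```latex
I(k) = \sum_{a,\,b \,\in\, \{A, C, G, T\}} p_{ab}(k) \, \log_2 \frac{p_{ab}(k)}{p_a \, p_b}
```

I(k) is zero when positions k apart are statistically independent and grows as they become more strongly dependent.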

The core ideas of the horizontal correlation approach were first presented in a 2000 paper by Grosse, Herzel, Buldyrev, and Stanley (Grosse, et al. 2000). In this first formulation, Grosse and colleagues sought to characterize a large genetic sequence by dividing it into coding and non-coding regions. Traditional approaches to the coding-versus-non-coding problem generally relied on sophisticated pattern recognition systems that were first trained on small inputs and then run over the entire sequence (Ohler, et al. 1999). The horizontal correlation approach instead broke the sequence into many relatively short fragments, each only 500 base pairs in length, and sought to classify each fragment as either coding or non-coding. This was accomplished by comparing each window of size 3 along the length of a fragment with the first window of size 3 in that fragment, and measuring the value of the mutual information function between the two windows. Coding sequences were found to display a characteristic pattern of 3-periodicity that non-coding sequences did not. This pattern was easy to recognize, and it enabled significantly faster and more species-independent identification of coding regions (Grosse, et al. 2000).
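
The 3-periodicity described above can be illustrated with a minimal Python sketch. It is not the authors' exact procedure: instead of comparing size 3 windows, it estimates the nucleotide-pair mutual information I(k) at each distance k within one fragment. The function names and the synthetic "coding-like" generator (which simply biases every third position toward G) are hypothetical and serve only as a toy stand-in for real coding DNA.

```python
import math
import random
from collections import Counter


def mutual_information(seq, k):
    """Estimate I(k), in bits, between symbols k positions apart in seq."""
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                 # counts of (a, b) pairs
    left = Counter(a for a, _ in pairs)    # counts of the first symbol
    right = Counter(b for _, b in pairs)   # counts of the second symbol
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = left[a] / n
        p_b = right[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def mi_profile(seq, max_k):
    """Return [I(1), I(2), ..., I(max_k)] for one fragment."""
    return [mutual_information(seq, k) for k in range(1, max_k + 1)]


def synthetic_coding_like(length, rng):
    """Toy sequence (an assumption, not real DNA): every third
    position favors G, all other positions are uniform."""
    bases = "ACGT"
    out = []
    for i in range(length):
        if i % 3 == 0:
            out.append(rng.choices(bases, weights=[0.05, 0.05, 0.85, 0.05])[0])
        else:
            out.append(rng.choice(bases))
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    coding_like = synthetic_coding_like(500, rng)
    uniform = "".join(rng.choice("ACGT") for _ in range(500))
    print("coding-like:", [round(v, 3) for v in mi_profile(coding_like, 9)])
    print("uniform:    ", [round(v, 3) for v in mi_profile(uniform, 9)])
```

On the toy coding-like sequence, the profile should tend to peak at k = 3, 6, 9, while the uniform sequence should stay near zero at every distance. For fragments as short as 500 bases the estimate is noisy, so a practical classifier would need some thresholding or averaging that this sketch leaves out.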

Since 2000, horizontal correlation methodologies emphasizing the measurement of information theoretic quantities along the length of a gene sequence have been put to widespread use, and have even found application in shotgun sequencing fragment assembly (Otu & Sayood, 2004).

References

* I. Grosse, H. Herzel, S. Buldyrev, H. Stanley: "Species Independence of Mutual Information in Coding and non-Coding DNA," Physical Review E, Vol. 61, No. 5 (2000)
* U. Ohler, S. Harbeck, H. Niemann, E. Noth, M. Reese: "Interpolated Markov Chains for Eukaryotic Promoter Recognition," Bioinformatics, Vol. 15, pp. 362-369 (1999)
* H. Otu, K. Sayood: "A Divide and Conquer Approach to Fragment Assembly," Bioinformatics, Vol. 19, No. 1, pp. 22-29 (2004)


Wikimedia Foundation. 2010.
