Gap penalty

Gap penalties are used during sequence alignment to score the gaps placed in an alignment. They contribute to the overall alignment score, so the size of the gap penalty relative to the entries in the similarity matrix affects which alignment is finally selected. Selecting a larger gap penalty causes less favourable characters to be aligned with each other, so that fewer gaps are created.
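
A minimal Python sketch of this trade-off follows. The sequences, the match and mismatch scores (standing in for a similarity matrix) and the per-position gap charge are all illustrative assumptions, not values from the text. It scores two candidate alignments of the same pair of sequences under a small and a large gap penalty.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap=-1):
    """Score a pairwise alignment column by column.

    match, mismatch and gap are hypothetical values chosen for illustration;
    gap is charged once for every column that contains a '-' character.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


# Two candidate alignments of the made-up sequences AGCTA and GCTAA.
gapped = ("AGCTA-", "-GCTAA")   # four matches, two gap columns
ungapped = ("AGCTA", "GCTAA")   # one match, four mismatches

for gap in (-1, -4):
    scores = {
        "gapped": score_alignment(*gapped, gap=gap),
        "ungapped": score_alignment(*ungapped, gap=gap),
    }
    best = max(scores, key=scores.get)
    print(f"gap={gap}: {scores} -> selected: {best}")
```

With the small penalty the gapped alignment scores higher; with the large penalty the ungapped alignment, which pairs less favourable characters, is selected instead.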

Constant gap penalty

Constant gap penalties are the simplest type of gap penalty. The only parameter, d, is added to the alignment score when the gap is first opened. This means that every gap receives the same penalty, whatever its length.
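
As a rough sketch in Python, a constant gap penalty can be written as a function that ignores the gap length. The function name and the default value d = -5 are assumptions made for illustration only.

```python
def constant_gap_penalty(length, d=-5):
    """Return the same penalty d for any gap of one or more positions.

    d = -5 is a hypothetical value; a length of 0 means there is no gap.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    return d if length > 0 else 0


# A gap of length 1 and a gap of length 10 cost exactly the same.
print(constant_gap_penalty(1), constant_gap_penalty(10))
```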

Linear gap penalty

Linear gap penalties have only one parameter, d, which is a penalty per unit length of gap. This is almost always negative, so that the alignment with fewer gaps is favoured over the alignment with more gaps. Under a linear gap penalty, the overall penalty for one large gap is the same as for many small gaps of the same total length.
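
The equivalence between one large gap and many small gaps can be checked with a short Python sketch. The function name and the value d = -2 are illustrative assumptions.

```python
def linear_gap_penalty(length, d=-2):
    """Charge d for every position in the gap (d = -2 is a hypothetical value)."""
    if length < 0:
        raise ValueError("gap length must be non-negative")
    return d * length


# One gap of length 10 versus ten separate gaps of length 1.
one_large = linear_gap_penalty(10)
many_small = sum(linear_gap_penalty(1) for _ in range(10))
print(one_large, many_small, one_large == many_small)
```

Because the penalty depends only on the total number of gap positions, this scheme cannot prefer a single long gap over scattered short ones.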

Affine gap penalty

Affine gap penalties attempt to overcome this problem. In biological sequences, for example, it is much more likely that one big gap of length 10 occurs in one sequence, due to a single insertion or deletion event, than it is that 10 small gaps of length 1 are made. Therefore, affine gap penalties distinguish between opening a gap and extending it (unlike linear gap penalties, which charge every gap position equally) and use a gap opening penalty, o, and a gap extension penalty, e. A gap of length l is then given a penalty o + (l-1)e. So that gaps are discouraged, o and e are almost always negative. Furthermore, because a few large gaps are better than many small gaps, e is almost always smaller in magnitude than o, to encourage gap extension rather than gap introduction.
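
The formula o + (l-1)e can be sketched in Python as follows. The function name and the values o = -10 and e = -1 are hypothetical, chosen only so that e is smaller in magnitude than o.

```python
def affine_gap_penalty(length, o=-10, e=-1):
    """Penalty o + (length - 1) * e for a gap of the given length.

    o (gap opening) and e (gap extension) are hypothetical values;
    a length of 0 means there is no gap and costs nothing.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0
    return o + (length - 1) * e


# One gap of length 10 versus ten separate gaps of length 1.
one_large = affine_gap_penalty(10)        # -10 + 9 * (-1)
many_small = 10 * affine_gap_penalty(1)   # 10 * (-10)
print(one_large, many_small)
```

With these values the single long gap is penalised far less than the ten short gaps, which matches the single insertion or deletion event described above.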

Wikimedia Foundation. 2010.
