 Needleman–Wunsch algorithm

The Needleman–Wunsch algorithm performs a global alignment on two sequences (called A and B here). It is commonly used in bioinformatics to align protein or nucleotide sequences. The algorithm was published in 1970 by Saul B. Needleman and Christian D. Wunsch.^{[1]}
The Needleman–Wunsch algorithm is an example of dynamic programming, and was the first application of dynamic programming to biological sequence comparison.
Contents
A modern presentation
Scores for aligned characters are specified by a similarity matrix. Here, S(a,b) is the similarity of characters a and b. It uses a linear gap penalty, here called d.
For example, if the similarity matrix were
A G C T A 10 1 3 4 G 1 7 5 3 C 3 5 9 0 T 4 3 0 8 then the alignment:
AGACTAGTTAC CGA‒‒‒GACGT
with a gap penalty of 5, would have the following score:
To find the alignment with the highest score, a twodimensional array (or matrix) F is allocated. The entry in row i and column j is denoted here by F_{ij}. There is one column for each character in sequence A, and one row for each character in sequence B. Thus, if we are aligning sequences of sizes n and m, the amount of memory used is in O(nm). (Hirschberg's algorithm can compute an optimal alignment in Θ(min{n,m}) space, roughly doubling the running time.^{[ambiguous]})
As the algorithm progresses, the F_{ij} will be assigned to be the optimal score for the alignment of the first i = 0,...,n characters in A and the first j = 0,...,m characters in B. The principle of optimality is then applied as follows.
Basis: F_{0j} = d * j F_{i0} = d * i Recursion, based on the principle of optimality:
The pseudocode for the algorithm to compute the F matrix therefore looks like this:
for i=0 to length(A) F(i,0) ← d*i for j=0 to length(B) F(0,j) ← d*j for i=1 to length(A) for j=1 to length(B) { Match ← F(i1,j1) + S(A_{i}, B_{j}) Delete ← F(i1, j) + d Insert ← F(i, j1) + d F(i,j) ← max(Match, Insert, Delete) }
Once the F matrix is computed, the entry F_{nm} gives the maximum score among all possible alignments. To compute an alignment that actually gives this score, you start from the bottom right cell, and compare the value with the three possible sources (Match, Insert, and Delete above) to see which it came from. If Match, then A_{i} and B_{j} are aligned, if Delete, then A_{i} is aligned with a gap, and if Insert, then B_{j} is aligned with a gap. (In general, more than one choices may have the same value, leading to alternative optimal alignments.)
AlignmentA ← "" AlignmentB ← "" i ← length(A) j ← length(B) while (i > 0 and j > 0) { Score ← F(i,j) ScoreDiag ← F(i  1, j  1) ScoreUp ← F(i, j  1) ScoreLeft ← F(i  1, j) if (Score == ScoreDiag + S(A_{i}, B_{j})) { AlignmentA ← A_{i} + AlignmentA AlignmentB ← B_{j} + AlignmentB i ← i  1 j ← j  1 } else if (Score == ScoreLeft + d) { AlignmentA ← A_{i} + AlignmentA AlignmentB ← "" + AlignmentB i ← i  1 } otherwise (Score == ScoreUp + d) { AlignmentA ← "" + AlignmentA AlignmentB ← B_{j} + AlignmentB j ← j  1 } } while (i > 0) { AlignmentA ← A_{i} + AlignmentA AlignmentB ← "" + AlignmentB i ← i  1 } while (j > 0) { AlignmentA ← "" + AlignmentA AlignmentB ← B_{j} + AlignmentB j ← j  1 }
Historical notes
Needleman and Wunsch describe their algorithm explicitly for the case when the alignment is penalized solely by the matches and mismatches, and gaps have no penalty (d=0). The original publication^{[1]} from 1970 suggests the recursion
F_{ij} = max _{h < i,k < j}{F_{h,j − 1} + S(A_{i},B_{j}),F_{i − 1,k} + S(A_{i},B_{j})}.
The corresponding dynamic programming algorithm takes cubic time. The paper also points out that the recursion can accommodate arbitrary gap penalization formulas:
A penalty factor, a number subtracted for every gap made, may be assessed as a barrier to allowing the gap. The penalty factor could be a function of the size and/or direction of the gap. [page 444]
A better dynamic programming algorithm with quadratic running time for the same problem (no gap penalty) was first introduced^{[2]} by David Sankoff in 1972. Similar quadratictime algorithms were discovered independently by T. K. Vintsyuk^{[3]} in 1968 for speech processing ("time warping"), and by Robert A. Wagner and Michael J. Fischer^{[4]} in 1974 for string matching.
Needleman and Wunsch formulated their problem in terms of maximizing similarity. Another possibility is to minimize the edit distance between sequences, introduced by Vladimir Levenshtein. Peter H. Sellers showed^{[5]} in 1974 that the two problems are equivalent.
In modern terminology, "NeedlemanWunsch" refers to a global alignment algorithm that takes quadratic time for a linear or affine gap penalty.
References
 ^ ^{a} ^{b} Needleman, Saul B.; and Wunsch, Christian D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins". Journal of Molecular Biology 48 (3): 443–53. doi:10.1016/00222836(70)900574. PMID 5420325. http://linkinghub.elsevier.com/retrieve/pii/00222836(70)900574.
 ^ Sankoff, D. (1972). "Matching sequences under deletion/insertion constraints". Proceedings of the National Academy of Sciences of the USA 69 (1): 4–6. doi:10.1073/pnas.69.1.4. PMC 427531. PMID 4500555. http://www.pnas.org/content/69/1/4.abstract.
 ^ Vintsyuk, T. K. (1968). "Speech discrimination by dynamic programming". Kibernetika 4: 81–88.
 ^ Wagner, R. A. and Fischer, M. J. (1974). "The stringtostring correction problem". Journal of the ACM 21 (1): 168–173. doi:10.1145/321796.321811.
 ^ Sellers, P. H. (1974). "On the theory and computation of evolutionary distances". SIAM Journal on Applied Mathematics 26 (4): 787–793. doi:10.1137/0126070.
External links
 NWalign: A protein sequencetosequence alignment program by NeedlemanWunsch algorithm (online server & source code)
 NeedlemanWunsch Algorithm as Ruby Code
 Java Implementation of the NeedlemanWunsch Algorithm
 B.A.B.A. — an applet (with source) which visually explains the algorithm.
 A clear explanation of NW and its applications to sequence alignment
 Sequence Alignment Techniques at Technology Blog
 OPAL JavaScript implementation of algorithms: NeedlemanWunsch, NeedlemanWunschSellers and SmithWaterman
 Biostrings R package implementing NeedlemanWunsch algorithm among others
See also
Categories: Bioinformatics algorithms
 Algorithms on strings
 Computational phylogenetics
 Dynamic programming
Wikimedia Foundation. 2010.
Look at other dictionaries:
NeedlemanWunsch algorithm — The Needleman–Wunsch algorithm performs a global alignment on two sequences (called A and B here). It is commonly used in bioinformatics to align protein or nucleotide sequences. The algorithm was published in 1970 by Saul Needleman and Christian … Wikipedia
Algorithme de NeedlemanWunsch — L algorithme de Needleman Wunsch effectue un alignement global maximal de deux chaînes de caractères (appelées ici A et B). Il est couramment utilisé en bio informatique pour aligner des séquences de protéines ou de nucléotides. L algorithme a… … Wikipédia en Français
Hirschberg's algorithm — Hirschberg s algorithm, named after its inventor, Dan Hirschberg, is a dynamic programming algorithm that finds the least cost sequence alignment between two strings, where cost is measured as Levenshtein edit distance, defined to be the sumof… … Wikipedia
SmithWaterman algorithm — The Smith Waterman algorithm is a well known algorithm for performing local sequence alignment; that is, for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith Waterman… … Wikipedia
Saul Needleman — Saul B. Needleman is a bioinformaticist. Along with Christian Wunsch, in 1970 he published A general method applicable to the search for similarities in the amino acid sequence of two proteins , which introduced the Needleman Wunsch algorithm.… … Wikipedia
Christian Wunsch — Christian D. Wunsch is a bioinformaticist. Along with Saul Needleman, in 1970 he published A general method applicable to the search for similarities in the amino acid sequence of two proteins , which introduced the Needleman Wunsch algorithm.… … Wikipedia
Алгоритм Нидлмана — Алгоритм Нидлмана Вунша это алгоритм для выполнения выравнивания двух последовательностей (будем называть их и ), который используется в биоинформатике при построении выравниваний аминокислотных или нуклеотидных последовательностей.… … Википедия
Sequence alignment — In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.[1]… … Wikipedia
List of algorithms — The following is a list of the algorithms described in Wikipedia. See also the list of data structures, list of algorithm general topics and list of terms relating to algorithms and data structures.If you intend to describe a new algorithm,… … Wikipedia
Levenshtein distance — In information theory and computer science, the Levenshtein distance is a string metric for measuring the amount of difference between two sequences. The term edit distance is often used to refer specifically to Levenshtein distance. The… … Wikipedia