BLOSUM


BLOSUM (BLOcks of Amino Acid SUbstitution Matrix) is a substitution matrix used for sequence alignment of proteins. (The final 'M' in the acronym already stands for 'matrix', so writing 'BLOSUM matrix' is redundant; see RAS syndrome.) BLOSUM are used to score alignments between evolutionarily divergent protein sequences and are based on local alignments. BLOSUM was first introduced in a 1992 paper by Henikoff and Henikoff [Henikoff, S. (1992). "Amino Acid Substitution Matrices from Protein Blocks". PNAS 89: 10915–10919. doi:10.1073/pnas.89.22.10915. PMID 1438297. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=EBI&pubmedid=1438297]. They scanned the BLOCKS database for highly conserved regions of protein families (regions with no gaps in the sequence alignment) and counted the relative frequencies of amino acids and their substitution probabilities. From these counts they calculated a log-odds score for each of the 210 possible pairings of the 20 standard amino acids (190 substitutions plus 20 identities). All BLOSUM are based on observed alignments; unlike the PAM matrices, they are not extrapolated from comparisons of closely related proteins.
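The counting step described above can be illustrated with a minimal Python sketch. It assumes a block is given as a list of equal-length, ungapped segments and tallies every unordered amino acid pair in each column. The function name `count_pairs` and the toy block are hypothetical and are not taken from the original paper.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def count_pairs(block):
    """Count unordered amino acid pairs column by column in an ungapped block.

    `block` is a list of equal-length strings (hypothetical input format).
    """
    counts = Counter()
    for column in zip(*block):
        # Every pair of sequences contributes one observed pair per column.
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts


# Number of distinct unordered pairs, identities included: 20 * 21 / 2.
n_pairs = len(list(combinations_with_replacement(AMINO_ACIDS, 2)))

# Toy block with three segments and two columns (made-up data).
toy_counts = count_pairs(["AV", "AV", "SI"])
```

In the toy block, the first column (A, A, S) yields one A/A pair and two A/S pairs. Dividing each count by the total number of pairs would give the observed pair frequencies that feed the log-odds calculation.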

Several sets of BLOSUM exist, built from different alignment databases and distinguished by number. BLOSUM with high numbers are designed for comparing closely related sequences, while BLOSUM with low numbers are designed for comparing distantly related sequences. For example, BLOSUM80 is used for less divergent alignments, and BLOSUM45 is used for more divergent alignments. Each matrix was created by merging (clustering) all sequences that were more similar than a given percentage into a single sequence and then comparing only the resulting sequences, all of which were more divergent than that percentage. This reduces the contribution of closely related sequences. The percentage was appended to the name: in BLOSUM80, for example, sequences that were more than 80% identical were clustered.
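The clustering step can be sketched as follows. This is an illustration only: it assumes equal-length ungapped segments, measures similarity as plain percent identity, and groups segments transitively (single linkage) with a small union-find. The function names, the treatment of the exact threshold boundary, and the sample segments are assumptions of this sketch.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length segments agree."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cluster_segments(segments, threshold):
    """Group segments whose identity reaches `threshold` percent.

    Grouping is transitive: if A links to B and B links to C, all three
    end up in one cluster (an assumption of this sketch).
    """
    parent = list(range(len(segments)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if percent_identity(segments[i], segments[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, seg in enumerate(segments):
        clusters.setdefault(find(i), []).append(seg)
    return list(clusters.values())


# Made-up segments: the first three are close relatives, the last is not.
segments = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHIVV", "WYWYWYWYWY"]
loose = cluster_segments(segments, 85)   # close relatives merge
strict = cluster_segments(segments, 95)  # nothing merges
```

With the 85% threshold the three related segments collapse into one cluster and so count as a single sequence, while the 95% threshold leaves all four separate. A lower threshold therefore removes more of the closely related signal.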

Scores within a BLOSUM are log-odds scores: for a pair of aligned amino acids, the score is the logarithm of the ratio between the likelihood that the two appear together for a biological reason and the likelihood that they appear together by chance [Albert Y. Zomaya (2006). Handbook of Nature-Inspired And Innovative Computing, p. 673. ISBN 0387405321. http://books.google.com/books?id=kDFltuQo1dMC&pg=PA673&lpg=PA673&dq=blosum+matrix&source=web&ots=LBo5qtEF60&sig=o-c0PVGPT_HPaPuw_EgTHTPRIEc#PPA697,M1]. The matrices are based on the minimum percentage identity of the aligned protein sequences used in calculating them. Every possible identity or substitution is assigned a score based on its observed frequencies in the alignment of related proteins [http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Scoring2.html NIH "Scoring Systems"]. Substitutions that occur more often than expected by chance receive a positive score, and those that occur less often receive a negative score.

To calculate a matrix for BLOSUM, the following equation is used:

S_{ij} = \left( \frac{1}{\lambda} \right) \log\left( \frac{p_{ij}}{q_i \, q_j} \right)

Here, p_{ij} is the probability of two amino acids i and j replacing each other in a homologous sequence, and q_i and q_j are the background probabilities of finding the amino acids i and j in any protein sequence at random. The factor \lambda is a scaling factor, set such that the matrix contains easily computable integer values.
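A short Python sketch of the formula follows. It applies the equation as written in this article, uses the natural logarithm, and assumes \lambda = ln(2)/2 so that scores come out in half-bit units before rounding. That choice of \lambda, the function name `log_odds_score`, and the probabilities in the example are illustrative assumptions, not values from a published matrix.

```python
import math


def log_odds_score(p_ij, q_i, q_j, lam=math.log(2) / 2):
    """Return S_ij = (1 / lam) * log(p_ij / (q_i * q_j)), rounded to an integer.

    The default `lam` (half-bit units) is an assumption of this sketch.
    """
    if min(p_ij, q_i, q_j) <= 0:
        raise ValueError("probabilities must be positive")
    raw = math.log(p_ij / (q_i * q_j)) / lam
    return round(raw)


# Hypothetical inputs: both residues have background probability 0.1,
# so the chance expectation for the pair is 0.1 * 0.1 = 0.01.
favoured = log_odds_score(0.04, 0.1, 0.1)       # pair seen 4x more than chance
disfavoured = log_odds_score(0.0025, 0.1, 0.1)  # pair seen 4x less than chance
neutral = log_odds_score(0.01, 0.1, 0.1)        # pair seen exactly at chance
```

A pair observed four times more often than chance has a ratio of 4, which is 2 bits or 4 half-bits, so it scores +4. The pair observed four times less often scores -4, and the pair at chance level scores 0.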

External links

* [http://helix.biology.mcmaster.ca/721/distance/node10.html Page on BLOSUM]
* [http://blocks.fhcrc.org/ BLOCKS WWW server]
* [http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Scoring2.html Scoring systems for BLAST at NCBI]
* [ftp://ftp.ncbi.nih.gov/blast/matrices/ Data files of BLOSUM on the NCBI FTP server]

See also

* Sequence alignment
* Point accepted mutation

Wikimedia Foundation. 2010.
