- String metric
**String metrics**(also known as**similarity metrics**) are a class of textual based metrics resulting in asimilarity or dissimilarity (distance ) score between two pairs of text strings for approximate matching or comparison and infuzzy string searching .For example the strings "Sam" and "Samuel" can be considered (although not the same) to a degree similar. A string metric provides a floating point number indicating an algorithm-specific indication of similarity.The most widely known (although rudimentary) string metric is

Levenshtein Distance (also known as Edit Distance), which operates between two input strings, returning a score equivalent to the number of substitutions and deletions needed in order to transform one input string into another. Simplistic string metrics such asLevenshtein distance have expanded to include phonetic,token , grammatical and character-based methods of statisticalcomparison s.A widespread example of a string metric is

DNA sequence analysis and RNA analysis, which are performed by optimised string metrics to identify matching sequences.String metrics are used heavily in

information integration and are currently used infraud detection ,fingerprint analysis ,plagiarism detection ,ontology merging ,DNA analysis , RNA analysis,image analysis , evidence-based machine learning,database deduplication ,data mining , Web interfaces, e.g. Ajax-style suggestions as you type,data integration , semanticknowledge integration , etc..**List of string metrics***

Hamming distance

*Levenshtein distance andDamerau–Levenshtein distance

*Needleman-Wunsch distance orSellers algorithm

*Smith-Waterman distance

**FASTA

**Blast

*Gotoh distance or Smith-Waterman-Gotoh distance

**Monge Elkan distance

*Block distance orL1 distance orCity block distance

*Jaro distance metric

*Jaro-Winkler

*Soundex distance metric

*Matching coefficient

*Dice's coefficient

*Jaccard similarity orJaccard coefficient orTanimoto coefficient

*Overlap coefficient

*Euclidean distance orL2 distance

*Cosine similarity

*Variational distance

*Hellinger distance orBhattacharyya distance

*Information radius (Jensen-Shannon divergence )

*Harmonic mean

*Skew divergence

*Confusion probability

*Tau metric , an approximation of theKullback-Leibler divergence

*Fellegi and Sunters metric (SFS)

*TFIDF or TF/IDF

*Maximal matches

*Monge-Elkan **ee also***

String matching

*SimMetrics - an implementation

* [*http://www.speech.cs.cmu.edu/ Carnegy Mellon University open source library*]**External links***http://www.dcs.shef.ac.uk/~sam/stringmetrics.html A fairly complete overview

*Wikimedia Foundation.
2010.*

### См. также в других словарях:

**String (computer science)**— In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet. In computer programming, a string is traditionally a sequence of… … Wikipedia**String art**— String art, created with thread and paper. A string art representing a projection of the 8 dimen … Wikipedia**String theory**— This article is about the branch of theoretical physics. For other uses, see String theory (disambiguation). String theory … Wikipedia**String frame and Einstein frame**— In string theory, there are two different definitions of the metric tensor, both related by a Weyl rescaling given by the dilaton. One metric gives the Einstein frame and the other gives the string frame … Wikipedia**String Quartet No. 11 (Beethoven)**— Ludwig van Beethoven s opus 95, his String Quartet No. 11 in F minor, is his last before his exalted late string quartets. It is commonly referred to as the Serioso, stemming from his title Quartett [o] Serioso at the beginning and the tempo… … Wikipedia**String Quartet No. 1 (Carter)**— The First String Quartet by American composer Elliott Carter (1908 ) was written during a year spent in the Arizona desert from 1950 51. To some extent, it can be said that this was his first major breakthrough work as a composer.A primary… … Wikipedia**Fuzzy string searching**— Approximate string search is the name that is used for a category of techniques for finding strings that approximately match some given pattern string. It may also be known as approximate or inexact matching. Approximate string searching has two… … Wikipedia**Mirror symmetry (string theory)**— In physics and mathematics, mirror symmetry is a relation that can exist between two Calabi Yau manifolds. It happens, usually for two such six dimensional manifolds, that the shapes may look very different geometrically, but nevertheless they… … Wikipedia**Damerau–Levenshtein distance**— In information theory and computer science, the Damerau–Levenshtein distance (named after Frederick J. Damerau and Vladimir I. Levenshtein) is a distance (string metric) between two strings, i.e., finite sequence of symbols, given by counting the … Wikipedia**Levenshtein distance**— In information theory and computer science, the Levenshtein distance is a string metric for measuring the amount of difference between two sequences. The term edit distance is often used to refer specifically to Levenshtein distance. The… … Wikipedia