Directed acyclic word graph

Directed acyclic word graph
The strings "tap", "taps", "top", and "tops" stored in a Trie (left) and a DAWG (right), EOW stands for End-of-word.

In computer science, a directed acyclic word graph (sometimes abbreviated as DAWG) is a data structure that represents a set of strings, and allows for a query operation that tests whether a given string belongs to the set in time proportional to its length. In these respects, a DAWG is very similar to a trie, but it is much more space efficient.

A DAWG is represented as a directed acyclic graph with a single source vertex (a vertex with no incoming edges), in which each edge of the graph is labeled by a letter, symbol, or special end-of-string marker, and in which each vertex has at most one outgoing edge for each possible letter or symbol. The strings represented by the DAWG are formed by the symbols on paths in the DAWG from the source vertex to any sink vertex (a vertex with no outgoing edges). A DAWG can also be interpreted as an acyclic finite automaton that accepts the words that are stored in the DAWG.

Thus, a trie (a rooted tree with the same properties of having edges labeled by symbols and strings formed by root-to-leaf paths) is a special kind of DAWG. However, by allowing the same vertices to be reached by multiple paths, a DAWG may use significantly fewer vertices than a trie. Consider, for example, the four English words "tap", "taps", "top", and "tops". A trie for those four words would have 11 vertices, one for each of the strings formed as a prefix of one of these words, or for one of the words followed by the end-of-string marker. However, a DAWG can represent these same four words using only six vertices vi for 0 ≤ i ≤ 5, and the following edges: an edge from v0 to v1 labeled "t", two edges from v1 to v2 labeled "a" and "o", an edge from v2 to v3 labeled "p", an edge v3 to v4 labeled "s", and edges from v3 and v4 to v5 labeled with the end-of-string marker.

The primary difference between DAWG and trie is the elimination of suffix redundancy in storing strings. The trie eliminates prefix redundancy since all common prefixes are shared between strings, such as between doctors and doctorate the doctor prefix is shared. In a DAWG common suffixes are also shared, such as between desertion and destruction both the prefix des- and suffix -tion are shared. For dictionary sets of common English words, this translates into major memory usage reduction.

Because the terminal nodes of a DAWG can be reached by multiple paths, a DAWG cannot directly store auxiliary information relating to each path, e.g. a word's frequency in the English language. However, if at each node we store a count of the number of unique paths through the structure from that point, we can use it to retrieve the index of a word, or a word given its index.[1] The auxiliary information can then be stored in an array.

See also

  • GADDAG

References

  1. ^ Kowaltowski, T.; CL. Lucchesi (1993), "Applications of finite automata representing large vocabularies", Software-Practice and Experience 1993: 15–30, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.56.5272&rep=rep1&type=pdf. 

External links


Wikimedia Foundation. 2010.

Игры ⚽ Поможем написать курсовую

Look at other dictionaries:

  • directed acyclic word graph — noun A data structure that represents a set of strings and allows for a query operation that tests whether a given string belongs to the set in time proportional to its length (thus more efficient in some situations than a trie). Syn: DAWG …   Wiktionary

  • Directed acyclic graph — An example of a directed acyclic graph In mathematics and computer science, a directed acyclic graph (DAG i …   Wikipedia

  • Glossary of graph theory — Graph theory is a growing area in mathematical research, and has a large specialized vocabulary. Some authors use the same word with different meanings. Some authors use different words to mean the same thing. This page attempts to keep up with… …   Wikipedia

  • DAWG — Directed Acyclic Word Graph …   Acronyms

  • DAWG — Directed Acyclic Word Graph …   Acronyms von A bis Z

  • Направленный ациклический граф — или ориентированный ациклический граф (англ. directed acyclic graph, сокращённо англ.  …   Википедия

  • List of terms relating to algorithms and data structures — The [http://www.nist.gov/dads/ NIST Dictionary of Algorithms and Data Structures] is a reference work maintained by the U.S. National Institute of Standards and Technology. It defines a large number of terms relating to algorithms and data… …   Wikipedia

  • Список терминов, относящихся к алгоритмам и структурам данных —   Это служебный список статей, созданный для координации работ по развитию темы.   Данное предупреждение не устанавливается на информационные списки и глоссарии …   Википедия

  • Список терминов — Список терминов, относящихся к алгоритмам и структурам данных   Это сл …   Википедия

  • Trie — A trie for keys A , to , tea , ted , ten , i , in , and inn . In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”