Hash tree

Hash tree
A binary hash tree

In cryptography and computer science Hash trees or Merkle trees are a type of data structure[citation needed] which contains a tree of summary information about a larger piece of data – for instance a file – used to verify its contents. Hash trees are a combination of hash lists and hash chaining, which in turn are extensions of hashing. Hash trees in which the underlying hash function is Tiger are often called Tiger trees or Tiger tree hashes.



Hash trees can be used to verify any kind of data stored, handled and transferred in and between computers. Currently the main use of hash trees is to make sure that data blocks received from other peers in a peer-to-peer network are received undamaged and unaltered, and even to check that the other peers do not lie and send fake blocks. Suggestions have been made to use hash trees in trusted computing systems. Sun Microsystems has used Hash Trees in the ZFS filesystem.[1] Hash Trees are used in Google Wave protocol,[2] Git distributed revision control system, the Tahoe-LAFS and tarsnap backup systems, and a number of NoSQL systems like Apache Cassandra & Riak[3]

Hash trees were invented in 1979 by Ralph Merkle.[4] The original purpose was to make it possible to efficiently handle many Lamport one-time signatures. Lamport signatures are believed to still be secure in the event that quantum computers become reality. Unfortunately each Lamport key can only be used to sign a single message. But combined with hash trees they can be used for many messages and then become a fairly efficient digital signature scheme.

How hash trees work

A hash tree is a tree of hashes in which the leaves are hashes of data blocks in, for instance, a file or set of files. Nodes further up in the tree are the hashes of their respective children. For example, in the picture hash 0 is the result of hashing hash 0-0 and then hash 0-1. That is, hash 0 = hash( hash 0-0 || hash 0-1 ) where || denotes concatenation.

Most hash tree implementations are binary (two child nodes under each node) but they can just as well use many more child nodes under each node.

Usually, a cryptographic hash function such as SHA-1, Whirlpool, or Tiger is used for the hashing. If the hash tree only needs to protect against unintentional damage, much less secure checksums such as CRCs can be used.

In the top of a hash tree there is a top hash (or root hash or master hash). Before downloading a file on a p2p network, in most cases the top hash is acquired from a trusted source, for instance a friend or a web site that is known to have good recommendations of files to download. When the top hash is available, the hash tree can be received from any non-trusted source, like any peer in the p2p network. Then, the received hash tree is checked against the trusted top hash, and if the hash tree is damaged or fake, another hash tree from another source will be tried until the program finds one that matches the top hash.

The main difference from a hash list is that one branch of the hash tree can be downloaded at a time and the integrity of each branch can be checked immediately, even though the whole tree is not available yet. For example, in the picture the integrity of data block 001 can be verified immediately if the tree already contains hash 0-0 and hash 1 by hashing the data block and iteratively combining the result with hash 0-0 and then hash 1 and finally comparing the result with the top hash. Similarly, the integrity of data block 002 can be verified if the tree already has hash 1-1 and hash 0. This can be an advantage since it is efficient to split files up in very small data blocks so that only small blocks have to be redownloaded if they get damaged. If the hashed file is very big, such a hash tree or hash list becomes fairly big. But if it is a tree, one small branch can be downloaded quickly, the integrity of the branch can be checked, and then the downloading of data blocks can start.

There are several additional tricks, benefits and details regarding hash trees. See the references and external links below for more in-depth information.

Tiger tree hash

The Tiger tree hash is a widely used form of hash tree. It uses a binary hash tree (two child nodes under each node), usually has a data block size of 1024-bytes and uses the cryptographically secure Tiger hash.

Tiger tree hashes are used in the Gnutella, Gnutella2, and Direct Connect P2P file sharing protocols and in file sharing applications such as Phex, BearShare, LimeWire, Shareaza, DC++[5] and Valknut.[citation needed]

See also


  1. ^ Jeff Bonwick's Blog ZFS End-to-End Data Integrity
  2. ^ Google Wave Federation Protocol Wave Protocol Verification Paper
  3. ^ "When a replica is down for an extended period of time, or the machine storing hinted handoffs for an unavailable replica goes down as well, replicas must synchronize from one-another. In this case, Cassandra and Riak implement a Dynamo-inspired process called anti-entropy. In anti-entropy, replicas exchange Merkle Trees to identify parts of their replicated key ranges which are out of sync. A Merkle tree is a hierarchical hash verification: if the hash over the entire keyspace is not the same between two replicas, they will exchange hashes of smaller and smaller portions of the replicated keyspace until the out-of-sync keys are identified. This approach reduces unnecessary data transfer between replicas which contain mostly similar data." http://www.aosabook.org/en/nosql.html
  4. ^ R. C. Merkle, A digital signature based on a conventional encryption function, Crypto '87
  5. ^ "DC++'s feature list"

External links

Wikimedia Foundation. 2010.

Look at other dictionaries:

  • Prefix hash tree — A prefix hash tree (PHT) is a distributed data structure that enables more sophisticated queries over a distributed hash table (DHT). The prefix hash tree uses the lookup interface of a DHT to construct a trie based data structure that is both… …   Wikipedia

  • Hash-Baum — Ein binärer Hash Baum In der Kryptographie und Informatik ist ein Hash Baum (engl. hash tree oder merkle tree) eine Datenstruktur, die einen Baum aus Hashwerten von Datenblöcken bildet, beispielsweise von einer Datei. Hash Bäume sind eine… …   Deutsch Wikipedia

  • Hash table — Not to be confused with Hash list or Hash tree. Unordered map redirects here. For the proposed C++ class, see unordered map (C++). Hash Table Type unsorted dictionary Invented 1953 Time complexity in big O notation Average Worst case Space …   Wikipedia

  • Hash list — In computer science, a hash list is typically a list of hashes of the data blocks in a file or set of files. Lists of hashes are used for many different purposes, such as fast table lookup (hash tables) and distributed databases (distributed hash …   Wikipedia

  • Hash function — A hash function is any well defined procedure or mathematical function for turning some kind of data into a relatively small integer, that may serve as an index into an array. The values returned by a hash function are called hash values, hash… …   Wikipedia

  • Tree (data structure) — A simple unordered tree; in this diagram, the node labeled 7 has two children, labeled 2 and 6, and one parent, labeled 2. The root node, at the top, has no parent. In computer science, a tree is a widely used data structure that emulates a… …   Wikipedia

  • Hash trie — In computer science, hash trie refers to two kinds of data structure:* A space efficient implementation of a sparse trie, in which the descendants of each node may be interleaved in memory. (The name is suggested by a similarity to a closed hash… …   Wikipedia

  • Hash-Algorithmus — Eine Hashfunktion oder Streuwertfunktion ist eine Funktion bzw. Abbildung, die zu einer Eingabe aus einer üblicherweise großen Quellmenge eine Ausgabe, den Hashcode, erzeugt, meist aus einer kleineren Zielmenge. Die Hashwerte beziehungsweise… …   Deutsch Wikipedia

  • Hash-Funktion — Eine Hashfunktion oder Streuwertfunktion ist eine Funktion bzw. Abbildung, die zu einer Eingabe aus einer üblicherweise großen Quellmenge eine Ausgabe, den Hashcode, erzeugt, meist aus einer kleineren Zielmenge. Die Hashwerte beziehungsweise… …   Deutsch Wikipedia

  • Hash-Wert — Eine Hashfunktion oder Streuwertfunktion ist eine Funktion bzw. Abbildung, die zu einer Eingabe aus einer üblicherweise großen Quellmenge eine Ausgabe, den Hashcode, erzeugt, meist aus einer kleineren Zielmenge. Die Hashwerte beziehungsweise… …   Deutsch Wikipedia