Cdb (software)


Cdb (software)

cdb, short for "constant database", refers to both a library and data format created by Daniel J. Bernstein. cdb acts as an on-disk associative array, mapping keys to values, and allows multiple values to be stored for a single key. A constant database allows for only two operations: creation and reading. Both operations are designed to be very fast and highly reliable. Since the database does not change while it is in use, multiple processes can access a single database without locking. Additionally, since all modifications are actually the creation of a replacement database, it can take advantage of UNIX filesystem semantics to provide a guarantee of reliability.

Keys, values, and total database size have no arbitrary limits, though the use of 4-byte offsets restricts it to 4 gigabytes or smaller. cdb is used by djbdns, fastforward, mess822, qmail and ucspi-tcp to provide highly efficient, reliable, and simple data access.

tructure

A database contains an entire data set (e.g. a single associative array) in a single computer file. It consists of three parts: a fixed-size header, data, and a set of hash tables. Lookups are designed for exact keys only, though other types of searches could be performed by scanning the entire database. Lookups are performed using the following algorithm:

* Hash the key.
* Determine which hash table and slot this record should be located at.
* Test the indicated slot in the hash table.
** If the slot is empty, the record does not exist. Abort the search.
** If the slot's hash matches the key's hash, seek to the record. Read and compare the key. If it matches, the data has been found, so abort the search.
** The record is not in this slot. Proceed to the next slot, wrapping around to the beginning of the hash table if necessary.

For lookups of keys multiple values, additional values may be found by simply resuming the search at the next slot.

Format

All numbers -- offsets, lengths, and hash values -- are unsigned 32-bit integers, stored in little endian format. Keys and data are considered to be opaque byte strings, and have no special treatment.

The fixed-size header at the beginning of the database describes 256 hash tables by listing their position within the file and their length in slots. Data is stored as a series of records, each storing key length, data length, key, and data. There are no alignment or sorting rules. The records are followed by a set of 256 hash tables of varying lengths. Since zero is a valid length, there may be fewer than 256 hash tables physically stored in the database, but there are nonetheless considered to be 256 tables. Hash tables contain a series of slots, each of which contains a hash value and a record offset. "Empty slots" have an offset of zero.

Hashes are unsigned 32 bit integers, and start with a value of 5381. For each byte of the key, the current hash is multiplied by 33, then XOR'ed with the current byte of the key. Overflow bits are discarded. Slots and tables are trivially computed from hashes. The target table is simply the lowest eight bits of the hash (e.g. hash modulo 256), and the slot within the table is the remaining bits of the hash modulo the table length (e.g. hash divided by 256 modulo table length).

Library

The official cdb library code is public domain: the individual source files are marked as such, and are also available in the public domain djbdns package. However, the rest of the cdb package is license-free software, meaning it must be distributed verbatim. The unusual licensing and simplicity of the format has prompted others to re-implement the library and release it under more common terms, such as Michael Tokarev's [http://www.corpit.ru/mjt/tinycdb.html TinyCDB] library, available under the public domain.

Notably, the creator of cdb does not intend for cdb to be used as a shared library. This differs from virtually all similar databases, such as Berkeley DB.

External links

* [http://cr.yp.to/cdb.html cdb]
* [http://www.corpit.ru/mjt/tinycdb.html TinyCDB]
* [http://qdbm.sourceforge.net/benchmark.pdf QDBM benchmark comparing cdb against similar packages]


Wikimedia Foundation. 2010.

Look at other dictionaries:

  • cdb (software) — For other uses, see CDB (disambiguation). cdb, short for constant database , refers to both a library and data format created by Daniel J. Bernstein. cdb acts as an on disk associative array, mapping keys to values, and allows multiple values to… …   Wikipedia

  • CDB — can refer to:In music: * CDB (band), an Australian band * Charlie Daniels Band, the band of American musician Charlie DanielsIn organizations: * Caribbean Development Bank, an international financial institution * China Development Bank, a… …   Wikipedia

  • Comparison of open source configuration management software — This is a comparison of free (libre) and open source configuration management software. Contents 1 Basic properties 2 Platform support 3 Short descriptions 4 Refere …   Wikipedia

  • Disk encryption software — To protect confidentiality of the data stored on a computer disk a computer security technique called disk encryption is used. This article discusses software that is used to implement the technique (for cryptographic aspects of the problem see… …   Wikipedia

  • Postfix (software) — This article is about the Mail Transfer Agent. For other uses, see Postfix (disambiguation). Postfix Developer(s) Wietse Venema and many others Stable release 2.8.7 / November 7, 2011; 13 days ago …   Wikipedia

  • Tablón de anuncios de los bibliotecarios/Portal/Archivo/Solicitudes de nueva consulta/Actual — Wikipedia:Tablón de anuncios de los bibliotecarios/Portal/Archivo/Solicitudes de nueva consulta/Actual Saltar a navegación, búsqueda Solicitudes de nueva consulta Esta sección del tablón de anuncios de los bibliotecarios sirve para solicitar una… …   Wikipedia Español

  • Xiangqi — Chinese chess redirects here. For other uses, see Chinese chess (disambiguation). Xiangqi Xiangqi board with pieces in their starting positions Genre(s) Board game Players 2 …   Wikipedia

  • Liste von Dateinamenserweiterungen/C — In dieser Liste sind übliche Dateinamenserweiterungen aufgelistet, die in einigen Betriebssystemen zur Unterscheidung von Dateiformaten verwendet werden. In anderen Betriebssystemen erfolgt die Dateitypenidentifikation hauptsächlich über den… …   Deutsch Wikipedia

  • Ctb — In dieser Liste sind übliche Dateinamenserweiterungen aufgelistet, die in einigen Betriebssystemen (wie zum Beispiel Microsoft Windows) zur Unterscheidung von Dateiformaten verwendet werden. In anderen Betriebssystemen erfolgt die… …   Deutsch Wikipedia

  • Liste der Dateiendungen/C — In dieser Liste sind übliche Dateinamenserweiterungen aufgelistet, die in einigen Betriebssystemen (wie zum Beispiel Microsoft Windows) zur Unterscheidung von Dateiformaten verwendet werden. In anderen Betriebssystemen erfolgt die… …   Deutsch Wikipedia