Bitmap index


Bitmap index

A bitmap index is a special kind of database index.

Bitmap indexes have traditionally been considered to work well for data such as gender, which has a small number of distinct values, e.g., male and female, but many occurrences of those values. This would happen if, for example, you had gender data for each resident in a city. However, some researchers argue that Bitmap indexes are also useful for unique valued data which is not updated frequently. [http://www.oracle.com/technology/pub/articles/sharma_indexes.html Bitmap Index vs. B-tree Index: Which and When?] , Vivek Sharma, Oracle Technical Network.] Bitmap indexes have a significant space and performance advantage over other structures for such data. Bitmap indexes use bit arrays (commonly called "bitmaps") and answer queries by performing bitwise logical operations on these bitmaps.

Bitmap indexes are also useful in the data warehousing applications for joining a large fact table to smaller dimension tables [http://www.dwoptimize.com/2007/06/101010-answer-to-life-universe-and.html] such as those arranged in a star schema. In other scenarios, a B-tree index would be more appropriate.

Example

Continuing the gender example, a bitmap index may be logically viewed as follows:

On the left, "identifier" refers to the unique number assigned to each customer, "gender" is the data to be indexed, the content of the bitmap index is shown as two columns under the heading "bitmaps". Each column in the above illustration is a "bitmap" in the bitmap index. In this case, there are two such bitmaps, one for gender "Female" and one for gender "Male". It is easy to see that each bit in bitmap "M" shows whether a particular row refers to a male. This is the simplest form of bitmap index. Most columns will have more distinct values. For example, the sales amount is likely to have a much larger number of distinct values. Variations on the bitmap index can effectively index this data as well. We briefly review three such variations.

Note: many of the references cited here are reviewed at. [cite web |author=John Wu |date=2007 |title=Annotated References on Bitmap Index |url=http://www.cs.umn.edu/~kewu/annotated.html ] For those who might be interested in experimenting with some of the ideas mentioned here, many of them are implemented in an open source software called [https://codeforge.lbl.gov/projects/fastbit/ FastBit] or in the [http://code.google.com/p/lemurbitmapindex/ Lemur Bitmap Index C++ Library] .

Compression

Software can compress each bitmap in a bitmap index to save space. There has been considerable amount of work on this subject. [cite web |author=T. Johnson |date=1999 |title=Performance Measurements of Compressed Bitmap Indices | booktitle=VLDB |pages=278–289 | url=http://www.informatik.uni-trier.de/~ley/db/conf/vldb/Johnson99.html] [cite web |author=K. Wu, E. Otoo and A. Shoshani | title=On the performance of bitmap indices for high cardinality attributes | date=2004 | url=http://www.osti.gov/energycitations/product.biblio.jsp?osti_id=822860] Bitmap compression algorithms typically employ run-length encoding, such as the Byte-aligned Bitmap Code ( [http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=5363098.PN.&OS=PN/5363098&RS=PN/5363098 BBC] ) and Word-Aligned Hybrid code ( [http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=6831575.PN.&OS=PN/6831575&RS=PN/6831575 WAH] ). These compression methods require very little effort to compress and decompress. More importantly, bitmaps compressed with BBC and WAH can directly participate in bitwise operations without decompression. This gives them considerable advantages over generic compression techniques such as LZ77. BBC compression and its derivatives are used in a commercial database system. BBC was shown by Johnson (1999) to be effective in both reducing index sizes and maintaining query performance. BBC encodes the bitmaps in bytes, while WAH encodes in words, better matching current CPUs. According to a published report, "On both synthetic data and real application data, the new word aligned schemes use only 50% more space, but perform logical operations on compressed data 12 times faster than BBC." [cite web |author=K. Wu, E. Otoo and A. Shoshani |title=A Performance Comparison of bitmap indexes | date=2001 |url=http://doi.acm.org/10.1145/502585.502689 ]

The performance of schemes such as BBC and WAH is dependent on the order of the rows. A simple lexicographical sort can divide the index size by 9 and make indexes several times faster. [cite web|author=O. Kaser, D. Lemire, K. Aouiche |title=Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes | date=2008 |url=http://arxiv.org/abs/0808.2083 ] The larger the table, the more important it is to sort the rows.

Encoding

Basic bitmap indexes use one bitmap for each distinct value. It is possible to reduce the number of bitmaps used by using a different encoding method. [cite web|title=Bitmap index design and evaluation | author=C.-Y. Chan and Y. E. Ioannidis | year=1998 | booktitle=SIGMOD | url=http://doi.acm.org/10.1145/276304.276336 ] [cite web|title=An efficient bitmap encoding scheme for selection queries | author=C.-Y. Chan and Y. E. Ioannidis | year=1999 | booktitle=SIGMOD | url=http://doi.acm.org/10.1145/304182.304201 ] For example, it is possible to encode C distinct values using log(C) bitmaps with binary encoding. [ cite web |author = P. E. O'Neil and D. Quass| title = Improved Query Performance with Variant Indexes | booktitle = SIGMOD | date= 1997 | url=http://doi.acm.org/10.1145/253260.253268 ] This reduces the number of bitmaps, further saving space, but to answer any query, most of the bitmaps have to be accessed. This makes it potentially not as effective as scanning a vertical projection of the base data, also known as a materialized view or projection index. Finding the optimal encoding method that balances query performance and index size remains an interesting challenge.

Without considering compression, Chan and Ioannidis analyzed a class of multi-component encoding methods and came to the conclusion that two-component encoding sits at the kink of the performance vs. index size curve and therefore represents the best trade-off between index size and query performance.

Binning

For high-cardinality columns, it is useful to bin the values, where each bin covers multiple values and build the bitmaps to represent the values in each bin. This approach reduces the number of bitmaps used regardless of encoding method. [cite web |title = Space efficient bitmap indexing| booktitle= CIKM | date=2000 | author =N. Koudas | url=http://doi.acm.org/10.1145/354756.354819 ] However, binned indexes can only answer some queries without examining the base data. For example, if a bin covers the range from 0.1 to 0.2, then when the user asks for all values less than 0.15, all rows that fall in the bin are possible hits and have to be checked to verify whether they are actually less than 0.15. The process of checking the base data is known as the candidate check. In most cases, the time used by the candidate check is significantly longer than the time needed to work with the bitmap index. Therefore, binned indexes exhibit irregular performance. They can be very fast for some queries, but much slower if the query doesn't exactly match a bin.

History

The first commercial database product to implement a bitmap index is Computer Corporation of America's Model 204. Professor [http://www.cs.umb.edu/~poneil/ Patrick O'Neil] implemented the bitmap index around 1982.cite conference | last = O'Neil | first = Patrick | title = Model 204 Architecture and Performance | booktitle = Proceedings of the 2nd International Workshop on High Performance Transaction Systems | pages = 40 - 59 | publisher = Springer-Verlag | date = 1987 | url=http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=658338] This implementation is a hybrid between the basic bitmap index (without compression) and the list of Row Identifiers (RID-list). Overall, the index is organized as a B+tree. When the column cardinality is low, each leaf node of the B-tree would contain long list of RIDs. In this case, it requires less space to represent the RID-lists as bitmaps. Since each bitmap represents one distinct value, this is the basic bitmap index. As the column cardinality increases, each bitmap becomes sparse and it may take more disk space to store the bitmaps than to store the same content as RID-lists. In this case, it switches to use the RID-lists, which makes it a B+tree index. [cite conference | title=Bit-sliced index arithmetic | bootitle= SIGMOD | date = 2001 | author = D. Rinfret, P. O'Neil and E. O'Neil | url=http://doi.acm.org/10.1145/375663.375669 ] [cite conference | author = E. O'Neil, P. O'Neil and K. Wu | title = Bitmap Index Design Choices and Their Performance Implications | booktitle = IDEAS | date = 2007 | url=http://crd.lbl.gov/%7Ekewu/ps/LBNL-62756.html ]

In-memory bitmaps

One of the strongest reasons for using bitmap indexes is that the intermediate results produced from them are also bitmaps and can be efficiently combined to answer more complex queries. For this reason, some database systems that do not offer persistent bitmap indexes use bitmaps internally to speed up query processing. For example, PostgreSQL versions 8.1 and later implement a "bitmap index scan" optimization to speed up arbitrarily complex logical operations between available indexes on a single table. For a table with "n" columns, the total number of distinct indexes to satisfy all possible queries is "n!". A bitmap index scan combines expressions on different indexes, thus requiring only one index per column to support all possible queries on a table.

In this approach, a temporary in-memory bitmap is created with one bit for each row in the table (1 MiB can thus store over 8 million entries). Next, the results from each index are combined into the bitmap using bitwise operations. After all conditions are evaluated, the bitmap contains a "1" for rows that matched the expression. Finally, the bitmap is traversed and matching rows are retrieved. In addition to efficiently combining indexes, this also improves locality of reference of table accesses, because all rows are fetched sequentially. [cite web |author=Tom Lane |date=2005-12-26 |title=Re: Bitmap indexes etc. |publisher=PostgreSQL mailing lists |url=http://archives.postgresql.org/pgsql-performance/2005-12/msg00623.php |accessdate=2007-04-06 ] The internal bitmap is discarded after the query. If there are too many rows in the table to use 1 bit per row, a "lossy" bitmap is created instead, with a single bit per disk page. In this case, the bitmap is just used to determine which pages to fetch; the filter criteria are then applied to all rows in matching pages.

References

*O'Connell, S. (2005) "Advanced Databases Course Notes", Southampton, University of Southampton.
*O'Neil, P. and O'Neil, E. (2001) "Database Principles, Programming, and Performance", Morgan Kaufmann Publishers.


Wikimedia Foundation. 2010.

Look at other dictionaries:

  • Bitmap-Index — Ein Bitmap Index ist ein Datenbankindex, der dazu dient, mehrdimensionale Daten effizient zu indizieren. Auf Grund seiner Eigenschaften findet der Bitmap Index vor allem bei Data Warehouses Einsatz. Die Bezeichnung rührt daher, dass der Bitmap… …   Deutsch Wikipedia

  • Bitmap index — Ein Bitmapindex ist ein Datenbankindex, der dazu dient, mehrdimensionale Daten effizient zu indizieren. Auf Grund seiner Eigenschaften findet der Bitmapindex vor allem bei Data Warehouses Einsatz. Die Bezeichnung rührt daher, dass der Bitmapindex …   Deutsch Wikipedia

  • Index (database) — A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space. Indexes can be created using one or more columns of a database table,… …   Wikipedia

  • Index (base de donnees) — Index (base de données) En informatique, et en particulier dans le contexte des bases de données, un index est un élément de redondance que l on va spécifier pour permettre au Système de Gestion de Base de Données d optimiser certaines requêtes.… …   Wikipédia en Français

  • Bitmap Brothers — The Bitmap Brothers war ein Entwicklerstudio für Computerspiele aus Großbritannien. Das Entwicklerstudio programmierte mehrere Spiele für den Amiga und den Atari ST und gehörte zu den erfolgreichsten Entwicklern für diese Plattformen. Ihre… …   Deutsch Wikipedia

  • Index (base de données) — En informatique, dans les bases de données, un index est une structure de données utilisée et entretenue par le système de gestion de base de données (SGBD) pour lui permettre de retrouver rapidement les données. L utilisation d un index… …   Wikipédia en Français

  • BitMaP — Windows bitmap Pour les articles homonymes, voir BMP. Windows bitmap Extension de fichier .bmp Type MIME image/x ms bmp (non officiel) Type de format …   Wikipédia en Français

  • Windows Bitmap — Vorlage:Infobox Dateiformat/Wartung/Standard fehltVorlage:Infobox Dateiformat/Wartung/Website fehlt Windows Bitmap Dateiendung: .bmp, .dib MIME Type: image/x ms bmp, image/x bmp, image/bmp …   Deutsch Wikipedia

  • Windows/OS2 Bitmap — Windows Bitmap („BMP“) oder device independent bitmap (DIB) ist ein zweidimensionales Rastergrafikformat, das für die Betriebssysteme Microsoft Windows und OS/2 entwickelt und mit Windows 3.0 eingeführt wurde. Die Dateiendung ist .bmp, seltener… …   Deutsch Wikipedia

  • Windows bitmap — („BMP“) oder device independent bitmap (DIB) ist ein zweidimensionales Rastergrafikformat, das für die Betriebssysteme Microsoft Windows und OS/2 entwickelt und mit Windows 3.0 eingeführt wurde. Die Dateiendung ist .bmp, seltener .dib.… …   Deutsch Wikipedia