Incompressible string

An incompressible string is a string that a given compression method cannot make shorter, typically because it contains too few repeated sequences. Whether a string is compressible therefore often depends on the algorithm being used. Some examples illustrate this.

Suppose we have the string 12349999123499991234 and a compression method that replaces each repeated value with a special character (say '@') followed by a number that points to an entry in a lookup table (or dictionary) of repeating values. Imagine an algorithm that examines the string in 4-character chunks. Looking at our string, it might pick out the values 1234 and 9999 to place in its dictionary, with 1234 as entry 0 and 9999 as entry 1. The string then becomes:

@0@1@0@1@0

This is much shorter (10 characters instead of 20), although storing the dictionary itself also costs some space. The more repeats there are in the string, the better the compression will be.
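The dictionary-building step can be sketched in a few lines of Python. This is only an illustration of the idea, not a real compressor: the function name `build_dictionary` is hypothetical, and it assumes that chunks are aligned to the start of the string, do not overlap, and that any short trailing piece is ignored.

```python
from collections import Counter

def build_dictionary(text, chunk_size):
    """Return the fixed-size chunks of `text` that occur more than once,
    in order of first appearance. A chunk's list position is its entry number."""
    # Assumption: aligned, non-overlapping chunks; a short trailing piece is dropped.
    chunks = [text[i:i + chunk_size]
              for i in range(0, len(text) - chunk_size + 1, chunk_size)]
    counts = Counter(chunks)
    dictionary = []
    for chunk in chunks:
        # Only chunks that repeat are worth a dictionary entry.
        if counts[chunk] > 1 and chunk not in dictionary:
            dictionary.append(chunk)
    return dictionary

print(build_dictionary("12349999123499991234", 4))  # ['1234', '9999']
```

Entry 0 is 1234 and entry 1 is 9999, matching the numbering used in the example.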

Our algorithm can do better, though, if it can view the string in chunks larger than 4 characters. It can then put 12349999 and 1234 into the dictionary, giving us:

@0@0@1
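The substitution itself can be sketched as follows. The function `encode` is a hypothetical helper that takes the dictionary as a parameter, so it does not decide which chunks to store. It assumes that the text never contains the marker character, that entries are non-empty, and that the dictionary has fewer than ten entries, so every placeholder is exactly 2 characters long.

```python
def encode(text, dictionary, marker="@"):
    """Replace each occurrence of a dictionary entry with marker + entry number.

    Scans left to right and tries entries in list order at each position.
    Assumptions: `marker` does not occur in `text`, entries are non-empty,
    and there are fewer than 10 entries.
    """
    out = []
    pos = 0
    while pos < len(text):
        for index, entry in enumerate(dictionary):
            if text.startswith(entry, pos):
                out.append(f"{marker}{index}")
                pos += len(entry)
                break
        else:
            # No entry matches here, so copy one character through unchanged.
            out.append(text[pos])
            pos += 1
    return "".join(out)

encoded = encode("12349999123499991234", ["12349999", "1234"])
print(encoded, len(encoded))  # @0@0@1 6
```

Passing the dictionary `["1234", "9999"]` instead reproduces the encoding based on 4-character chunks.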

Even shorter! Now let's consider another string:

1234999988884321

This string is incompressible by our algorithm. The only repeats that occur are 88 and 99. If we were to store 88 as entry 0 and 99 as entry 1 in our dictionary, we would produce:

1234@1@1@0@04321

Unfortunately, this is just as long as the original string, because each placeholder for a dictionary item is 2 characters long, the same length as the item it replaces. Since the dictionary must be stored as well, the result would actually take more space overall.

Hence, this string is incompressible by our algorithm.
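The break-even reasoning can be checked with a little arithmetic. The sketch below uses a hypothetical helper, `net_saving`, and assumes a simple cost model that is not part of the description above: every placeholder is 2 characters long, and the dictionary stores each entry once at its full length with no other overhead.

```python
def net_saving(entry, occurrences, placeholder_len=2):
    """Characters saved by storing `entry` in the dictionary.

    Assumed cost model: each occurrence shrinks from len(entry) to
    placeholder_len characters, and the entry itself is stored once.
    A result of zero or less means the entry is not worth storing.
    """
    replaced = occurrences * (len(entry) - placeholder_len)
    return replaced - len(entry)

print(net_saving("88", 2))        # -2
print(net_saving("1234", 3))      # 2
print(net_saving("12349999", 2))  # 4
```

Under this model, a 2-character entry can never pay for itself, whatever the number of occurrences, which is why 88 and 99 do not help.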


Wikimedia Foundation. 2010.
