String interning

String interning

In computer science, string interning is a method of storing only one copy of each distinct string value, which must be immutable. Interning strings makes some string processing tasks more time- or space-efficient at the cost of requiring more time when the string is created or interned. The distinct values are stored in a string intern pool.

String interning is supported by some modern object-oriented programming languages, including Python, Java and .NET languages.

The single copy of each string is called its 'intern' and is typically looked up by a method of the string class, for example Javadoc:SE|java/lang|String|intern() in Java.

Interned strings speed up string comparisons, which are sometimes a performance bottleneck in applications (such as compilers and dynamic programming language runtimes) that rely heavily on hash tables with string keys. Without interning, checking that two different strings are equal involves examining every character of both strings. This is slow for several reasons: it is inherently O(n) in the length of the strings; it typically requires reads from several regions of memory, which take time; and the reads fills up the processor cache, meaning there is less cache available for other needs. With interned strings, a simple object identity test suffices after the original intern operation; this is typically implemented a pointer equality test, normally just a single machine instruction with no memory reference at all.

String interning also reduces memory usage if there are many instances of the same string value; for instance, it is read from a network or from storage). Such strings may include magic numbers or network protocol information.

The intern pool in other programming languages

Even if the standard library of a programming language does not support string interning, it is possible to implement an intern pool in user code.

An example of such an implementation follows, in the C++ programming language. Note that this implementation returns pointers to arrays of characters, not String objects.

class StringTable {

char** array; int* count; int size; int numStrings;

public:

StringTable() { size = 20; array = new char* [size] ; count = new int [size] ; numStrings = 0; }

int addString(const char* ch) { // this method adds a String to the pool and returns its index

if (numStrings = 0) { // if there is no string in the pool, add first array [0] = new char [strlen(ch)+1] ; strcpy(array [0] , ch); numStrings++; return 0; }

int i = 0;

for (i; i < numStrings; i++) { // compares, if the String is already in the pool if (strcmp(ch, array [i] ) = 0) { count [i] ++; return i; } }

if (i = size) { return -1; } array [i] = new char [strlen(ch)+1] ; // if not, add String to the pool and return its index strcpy(array [i] , ch); numStrings++; return numStrings-1; }

char* getString(int index) const {

return array [index] ; }

};

class String {

int length; char* text; int index;

public:

static StringTable* table;

String(const char* ch) { length = strlen(ch); text = new char [length+1] ; strcpy(text,ch); index = table->addString(text); }

char* intern() { // return pointer to the String in pool if (index = -1) return 0; return table->getString(index); }

};

StringTable* String::table = new StringTable();

History

Lisp introduced the notion of interned strings for its atomic symbols or atoms. The data structure used as a string intern pool was called an 'oblist' (when it was implemented as a linked list) or an 'obarray' (when it was implemented as an array.

Other links

* [http://javatechniques.com/public/java/docs/basics/string-equality.html Java String equality and interning]
* [http://msdn2.microsoft.com/en-us/library/ms177906.aspx Visual J# String class]
* [http://msdn2.microsoft.com/en-us/library/system.string.intern.aspx .NET String Class]


Wikimedia Foundation. 2010.

Игры ⚽ Нужно сделать НИР?

Look at other dictionaries:

  • Hash table — Not to be confused with Hash list or Hash tree. Unordered map redirects here. For the proposed C++ class, see unordered map (C++). Hash Table Type unsorted dictionary Invented 1953 Time complexity in big O notation Average Worst case Space …   Wikipedia

  • Immutable object — Immutable and Immutability redirect here. For the Christian doctrine, see Immutability (theology). In object oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created.[1] This is in… …   Wikipedia

  • Objet immuable — Un objet immuable, en programmation orientée objet et fonctionnelle, est un objet dont l état ne peut pas être modifié après sa création. Ce concept est à contraster avec celui d objet variable. Sommaire 1 Généralités 2 Dans les langages de… …   Wikipédia en Français

  • Chandra Levy — Senior portrait of Chandra Levy …   Wikipedia

  • Virtual method table — A virtual method table, virtual function table, dispatch table, or vtable, is a mechanism used in programming language to support dynamic dispatch (or run time method binding).Suppose a program contains several classes in an inheritance hierarchy …   Wikipedia

  • St. Xavier High School (Cincinnati) — Saint Xavier High School Academia Sancti Xaverii Cincinnatensis School seal Vidit Mirabilia Magna; Ad Majorem Dei Gloriam He has seen great wonders; For the Greater Glory of God …   Wikipedia

  • Colin Kaepernick — in 2010. No. 7     San Francisco 49ers Quarterback Personal informati …   Wikipedia

  • Intern (disambiguation) — Intern may refer to:* An intern, one who works in a temporary position with an emphasis on on the job training rather than merely employment ** For uses in medicine, see internship (medicine) and residency (medicine)* For the film, see The Intern …   Wikipedia

  • List of past General Hospital characters — The following is a list of notable past characters from the soap opera, General Hospital who are not notable enough for their own articles. Characters are listed based on the decade in which they first appeared. Contents 1 2010 present 1.1 Warren …   Wikipedia

  • Doug Fenske — Birth name Albert Douglas Fenske, J.R. Born January 11, 1982 (1982 01 11) (age 29) Harvey, Illinois, United States Origin Richton Park, IL …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”