Query expansion

Query expansion (QE) is the process of reformulating a seed query to improve retrieval performance in information retrieval operations. [Vechtomova, Olga; Wang, Ying (2006). "A study of the effect of term proximity on query expansion". Journal of Information Science 32 (4): 324–333. doi:10.1177/0165551506065787. http://jis.sagepub.com/cgi/content/abstract/32/4/324 (abstract; accessed 2006-12-09)] In the context of web search engines, query expansion involves evaluating a user's input (the words typed into the search query area, and sometimes other types of data) and expanding the search query to match additional documents. Query expansion involves techniques such as:

* Finding synonyms of words, and searching for the synonyms as well
* Finding all the various morphological forms of words by stemming each word in the search query
* Fixing spelling errors and automatically searching for the corrected form or suggesting it in the results
* Re-weighting the terms in the original query
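
The techniques listed above can be combined in a few lines. The following is a minimal sketch, not a reference implementation: the `SYNONYMS` table, the `VOCABULARY` list, the suffix-stripping `naive_stem` function and the `synonym_weight` parameter are all hypothetical stand-ins for a real thesaurus, dictionary, stemmer and weighting scheme. Only `difflib.get_close_matches` comes from the Python standard library.

```python
import difflib

# Hypothetical toy resources; a real system would use a thesaurus and a dictionary.
SYNONYMS = {"car": ["automobile"], "fast": ["quick", "rapid"]}
VOCABULARY = ["car", "cars", "fast", "engine", "repair"]


def naive_stem(word):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def correct_spelling(word):
    """Return the closest vocabulary word, or the word itself if none is close."""
    matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else word


def expand_query(query, synonym_weight=0.5):
    """Map each expanded term to a weight; the user's own terms get weight 1.0."""
    weights = {}
    for raw in query.lower().split():
        term = naive_stem(correct_spelling(raw))
        weights[term] = 1.0
        for synonym in SYNONYMS.get(term, []):
            # Synonyms are down-weighted and never overwrite an existing weight.
            weights.setdefault(synonym, synonym_weight)
    return weights


print(expand_query("engin cars fast"))
```

In this sketch the misspelled "engin" is corrected to "engine", "cars" is stemmed to "car", and the synonyms of "car" and "fast" are added at half weight. A stemmer this crude only works if the same function is applied to the indexed documents, and the spelling step can wrongly "correct" valid words that are missing from the vocabulary.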

Query expansion is a technology studied in the field of computer science, particularly within the realm of natural language processing and information retrieval.

Precision and recall tradeoffs

Search engines invoke query expansion to increase the quality of user search results. It is assumed that users do not always formulate search queries using the best terms. A term may fail to be the best one simply because the database does not contain it, for example when the documents use a synonym or a different word form.

Stemming a user-entered term matches more documents, because alternate forms of the word are matched as well, which increases total recall. This comes at the expense of precision. Expanding a search query to include synonyms of a user-entered term likewise increases recall at the expense of precision. The reason lies in how precision is calculated: it is the fraction of retrieved documents that are relevant, so the number of retrieved documents is its denominator, and every additional non-relevant match lowers it. It is also inferred that a larger result set negatively impacts overall search result quality, given that many users do not want more results to comb through, regardless of the precision.
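
The effect can be made concrete with a small worked example. The sketch below uses the standard set-based definitions (precision is relevant retrieved over retrieved, recall is relevant retrieved over relevant); the document IDs and result sets are invented purely for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: documents 1-4 are relevant to the user's need.
relevant = {1, 2, 3, 4}

original_results = {1, 2, 9}             # matches for the query as typed
expanded_results = {1, 2, 3, 9, 10, 11}  # matches after adding stems and synonyms

print(precision_recall(original_results, relevant))
print(precision_recall(expanded_results, relevant))
```

With these invented numbers, expansion finds one more relevant document but also two more non-relevant ones, so recall rises from 1/2 to 3/4 while precision falls from 2/3 to 1/2.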

The goal of query expansion in this regard is that, by increasing recall, precision can potentially increase (rather than decrease, as the calculation above would suggest) because the result set gains pages that are more relevant (of higher quality), or at least equally relevant. Pages that could be more relevant to the user's intended query, and that would be excluded without query expansion regardless of their relevance, are now included. At the same time, many current commercial search engines use word frequency (tf-idf) to assist in ranking. When occurrences of the user-entered words, their synonyms and their alternate morphological forms are all counted for ranking, documents with a higher density of these terms (high frequency and close proximity) tend to migrate higher up in the search results, leading to higher quality near the top of the results despite the larger recall.
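
A rough sketch of how ranking can absorb the extra matches is shown below. It scores documents with a simple tf-idf sum over a weighted query. The three-document collection, the smoothed `idf` variant and the half weight given to the synonym are assumptions made for illustration, and term proximity is ignored entirely.

```python
import math
from collections import Counter

# Hypothetical three-document collection, already tokenized and lowercased.
DOCS = {
    "d1": "car repair and car engine service".split(),
    "d2": "automobile repair manual".split(),
    "d3": "fast food delivery".split(),
}


def idf(term):
    """Inverse document frequency with add-one smoothing (one common variant)."""
    df = sum(1 for tokens in DOCS.values() if term in tokens)
    return math.log((1 + len(DOCS)) / (1 + df)) + 1.0


def score(tokens, weighted_query):
    """Sum of query weight * term frequency * idf over the query terms."""
    counts = Counter(tokens)
    return sum(w * counts[t] * idf(t) for t, w in weighted_query.items())


def rank(weighted_query):
    """Return IDs of documents with a non-zero score, best first."""
    scored = {doc_id: score(tokens, weighted_query) for doc_id, tokens in DOCS.items()}
    return sorted((d for d in scored if scored[d] > 0), key=lambda d: -scored[d])


original = {"car": 1.0}
expanded = {"car": 1.0, "automobile": 0.5}  # synonym added at a lower weight

print(rank(original))
print(rank(expanded))
```

Here the expanded query additionally retrieves `d2`, which only mentions "automobile", but `d1` stays on top because it contains the user's own term twice. The larger result set therefore does not displace the densest match from the head of the list.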

This tradeoff is one of the defining problems in query expansion: whether it is worthwhile to perform, given its uncertain effects on precision and recall. Some critics state that the dictionaries, thesauri and stemming algorithms are driven by human bias; although the query expansion algorithm applies them automatically, they still shape the results in a non-automated manner (similar to how statisticians can 'lie' with statistics). Other critics point out the potential for corporate influence on the dictionaries, promoting the advertising of web pages in the case of web search engines. [citation needed]

See also

* Search engine
* Search engine indexing
* Information retrieval
* Document retrieval
* Linguistics
* Natural language processing
* Stemming
* Morphology (linguistics)

Software libraries

* [http://lucene-qe.sourceforge.net/ LucQE] - open-source, Java. Provides a framework and several implementations for performing query expansion with Apache Lucene.

References

* D. Abberley, D. Kirby, S. Renals, and T. Robinson. The THISL broadcast news retrieval system. In "Proc. ESCA ETRW Workshop Accessing Information in Spoken Audio", Cambridge, pp. 14-19, 1999. Section on [http://homepages.inf.ed.ac.uk/srenals/pubs/1999/esca99-thisl/node6.html Query Expansion] - concise, mathematical overview.
* Y. Qiu and H.P. Frei. Concept Based Query Expansion. In "Proceedings of SIGIR-93, 16th ACM International Conference on Research and Development in Information Retrieval", Pittsburgh, SIGIR Forum, ACM Press, June 1993. Available [http://citeseer.ist.psu.edu/qiu93concept.html here] - academic document on a specific method of query expansion.
* Efthimis N. Efthimiadis. Query Expansion. In: Martha E. Williams (ed.), "Annual Review of Information Science and Technology (ARIST)", vol. 31, pp. 121-187, 1996. Available [http://faculty.washington.edu/efthimis/pubs/Pubs/qe-arist/QE-arist.html here] - an introduction for less technical readers.

Wikimedia Foundation. 2010.
