 Conjunctive query

In database theory, a conjunctive query is a restricted form of firstorder queries. A large part of queries issued on relational databases can be written as conjunctive queries, and large parts of other firstorder queries can be written as conjunctive queries. Conjunctive queries also have a number of desirable theoretical properties that larger classes of queries (e.g., the relational algebra queries) do not share.
Contents
Definition
The conjunctive queries are simply the fragment of firstorder logic given by the set of formulae that can be constructed from atomic formulae using conjunction and existential quantification , but not using disjunction , negation , or universal quantification . Each such formula can be rewritten (efficiently) into an equivalent formula in prenex normal form, thus this form is usually simply assumed.
Thus conjunctive queries are of the general form
 ,
with the free variables being called distinguished variables, and the bound variables being called undistinguished variables. are atomic formulae. Conjunctive queries without distinguished variables are called boolean conjunctive queries.
Conjunctive queries can express a large part of queries, which are frequently issued on relational databases. To give an example, imagine a relational database for storing information about students, their address, the courses they visit and their gender. Finding all male students and their addresses who attend a course that is also attended by a female student is expressed by the following conjunctive query:
(student, address) . (student2, course) . attends(student, course) gender(student, 'male') attends(student2, course) gender(student2, 'female') lives(student, address)
Note that since the only entity of interest is the male student and his address, these are the only distinguished variables, while the variables
course
,student2
are only existentially quantified, i.e. undistinguished.Relationship to other query languages
Conjunctive queries also correspond to selectprojectjoin queries in relational algebra (i.e., relational algebra queries that do not use the operations union or difference) and to selectfromwhere queries in SQL in which the wherecondition uses exclusively conjunctions of atomic equality conditions, i.e. conditions constructed from column names and constants using no comparison operators other than "=", combined using "and". Notably, this excludes the use of aggregation and subqueries. For example, the above query can be written as an SQL query of the conjunctive query fragment as
select l.student, l.address from attends a1, gender g1, attends a2, gender g2, lives l where a1.student=g1.student and a2.student=g2.student and l.student=g1.student and a1.course=a2.course and g1.gender='male' and g2.gender='female';
Conjunctive queries and datalog
Besides their logical notation, conjunctive queries can also be written as datalog rules. Many authors in fact prefer the following datalog notation for the query above:
result(student, address) : attends(student, course), gender(student, male), attends(student2, course), gender(student2, female), lives(student, address).
Although there are no quantifiers in this notation, variables appearing in the head of the rule are still implicitly universally quantified, while variables only appearing in the body of the rule are still implicitly existentially quantified.
While any conjunctive query can be written as a datalog rule, not every datalog program can be written as a conjunctive query. In fact, only single rules over extensional predicate symbols can be easily rewritten as an equivalent conjunctive query. The problem of deciding whether for a given datalog program there is an equivalent nonrecursive program (corresponding to a positive relational algebra query, or, equivalently, a formula of positive existential firstorder logic, or, as a special case, a conjunctive query) is known as the datalog boundedness problem and is undecidable^{[1]}.
Extensions of conjunctive queries
Extensions of conjunctive queries capturing more expressive power include unions of conjunctive queries, which are equivalent to positive (i.e., negationfree) relational algebra, conjunctive queries extended by union and negation, which by Codd's theorem correspond to relational algebra and firstorder logic, conjunctive queries with builtin predicates and conjunctive queries with aggregate functions. The formal study of all of these extensions is justified by their application in relational databases and is in the realm of database theory.
Complexity of conjunctive queries
For the study of the computational complexity of evaluating conjunctive queries, two problems have to be distinguished. The first is the problem of evaluating a conjunctive query on a relational database where both the query and the database are considered part of the input. The complexity of this problem is usually referred to as combined complexity, while the complexity of the problem of evaluating a query on a relational database, where the query is assumed fixed, is called data complexity^{[2]}.
Conjunctive queries are NPcomplete with respect to combined complexity^{[3]}, while the data complexity of conjunctive queries is very low, in the parallel complexity class AC0, which is contained in LOGSPACE and thus in polynomial time. The NPhardness of conjunctive queries may appear surprising, since relational algebra and SQL strictly subsume the conjunctive queries and are thus at least as hard (in fact, relational algebra is PSPACEcomplete with respect to combined complexity and is therefore even harder under widelyheld complexitytheoretic assumptions). However, in the usual application scenario, databases are large, while queries are very small, and the data complexity model may be appropriate studying and describing their difficulty.
Formal properties of conjunctive queries
Conjunctive queries are one of the great success stories of database theory in that many interesting problems that are computationally hard or undecidable for larger classes of queries are feasible for conjunctive queries^{[4]}. For example, consider the query containment problem. We write for two database relations R,S of the same schema if and only if each tuple occurring in R also occurs in S. Given a query Q and a relational database instance I, we write the result relation of evaluating the query on the instance simply as Q(I). Given two queries Q_{1} and Q_{2} and a database schema, the query containment problem is the problem of deciding whether for all possible database instances I over the input database schema, . The main application of query containment is in query optimization: Deciding whether two queries are equivalent is possible by simply checking mutual containment.
The query containment problem is undecidable for relational algebra and SQL but is decidable and NPcomplete for conjunctive queries. In fact, it turns out that the query containment problem for conjunctive queries is exactly the same problem as the query evaluation problem^{[4]}. Since queries tend to be small, NPcompleteness here is usually considered acceptable. The query containment problem for conjunctive queries is also equivalent to the constraint satisfaction problem^{[5]}.
An important class of conjunctive queries that have polynomialtime combined complexity are the acyclic conjunctive queries^{[6]}. The query evaluation, and thus query containment, is LOGCFLcomplete and thus in polynomial time^{[7]}. Acyclicity of conjunctive queries is a structural property of queries that is defined with respect to the query's hypergraph^{[4]}: a conjunctive query is acyclic if and only if it has hypertreewidth 1. For the special case of conjunctive queries in which all relations used are binary, this notion corresponds to the treewidth of the dependency graph of the variables in the query (i.e., the graph having the variables of the query as nodes and an undirected edge {x,y} between two variables if and only if there is an atomic formula R(x,y) or R(y,x) in the query) and the conjunctive query is acyclic if and only if its dependency graph is acyclic.
An important generalization of acyclicity is the notion of bounded hypertreewidth, which is a measure of how close to acyclic a hypergraph is, analogous to bounded treewidth in graphs. Conjunctive queries of bounded treewidth have LOGCFL combined complexity^{[8]}.
Unrestricted conjunctive queries over tree data (i.e., a relational database consisting of a binary child relation of a tree as well as unary relations for labeling the tree nodes) have polynomial time combined complexity^{[9]}.
References
 ^ Gerd G. Hillebrand, Paris C. Kanellakis, Harry G. Mairson, Moshe Y. Vardi: Undecidable Boundedness Problems for Datalog Programs. J. Log. Program. 25(2): 163190 (1995)
 ^ Vardi, Moshe Y. (1982), "The Complexity of Relational Query Languages (Extended Abstract)", STOC '82: Proceedings of the fourteenth annual ACM symposium on Theory of computing, pp. 137–146, doi:10.1145/800070.802186, http://www.cs.rice.edu/~vardi/papers/stoc82.pdf.gz
 ^ Ashok K. Chandra and Philip M. Merlin, 1977. Optimal Implementation of Conjunctive Queries in Relational Data Bases. STOC '77: Proceedings of the ninth annual ACM symposium on Theory of computing [1]
 ^ ^{a} ^{b} ^{c} Serge Abiteboul, Richard B. Hull, Victor Vianu: Foundations of Databases. AddisonWesley, 1995.
 ^ Kolaitis, Phokion G.; Vardi, Moshe Y. (2000), "ConjunctiveQuery Containment and Constraint Satisfaction", Journal of Computer and System Sciences 61: 302–332, doi:10.1006/jcss.2000.1713
 ^ Mihalis Yannakakis: Algorithms for Acyclic Database Schemes . Proc. VLDB 1981: 8294.
 ^ Georg Gottlob, Nicola Leone, and Francesco Scarcello (2001). "The complexity of acyclic conjunctive queries". Journal of the ACM (JACM) 48 (3): 431–498. doi:10.1145/382780.382783.
 ^ Georg Gottlob, Nicola Leone, and Francesco Scarcello: Hypertree Decompositions and Tractable Queries. J. Comput. Syst. Sci. 64(3): 579627 (2002)
 ^ Georg Gottlob, Christoph Koch, Klaus U. Schulz: Conjunctive queries over trees. J. ACM 53(2): 238272 (2006)
External links
Categories:
Wikimedia Foundation. 2010.
Look at other dictionaries:
Boolean conjunctive query — In the theory of relational databases, a Boolean conjunctive query is a conjunctive query without distinguished predicates, i.e., a query in the form , where each Ri is a relation symbol and each ti is a tuple of variables and constants; the… … Wikipedia
RDFLib — Infobox Software name = RDFLib developer = Daniel Krech latest release version = 2.4.0 latest release date = April 4, 2007 operating system = Cross platform genre = Library license = BSD website = [http://rdflib.net/ http://rdflib.net/] RDFLib is … Wikipedia
Data integration — involves combining data residing in different sources and providing users with a unified view of these data.[1] This process becomes significant in a variety of situations, which include both commercial (when two similar companies need to merge… … Wikipedia
Complexity of constraint satisfaction — The complexity of constraint satisfaction is the application of computational complexity theory on constraint satisfaction. It has mainly been studied for discriminating between tractable and intractable classes of constraint satisfaction… … Wikipedia
Database theory — encapsulates a broad range of topics related to the study and research of the theoretical realm of databases and database management systems. Theoretical aspects of data management include, among other areas, the foundations of query languages,… … Wikipedia
Structure (mathematical logic) — In universal algebra and in model theory, a structure consists of a set along with a collection of finitary operations and relations which are defined on it. Universal algebra studies structures that generalize the algebraic structures such as… … Wikipedia
List of NPcomplete problems — Here are some of the more commonly known problems that are NP complete when expressed as decision problems. This list is in no way comprehensive (there are more than 3000 known NP complete problems). Most of the problems in this list are taken… … Wikipedia
Liste De Problèmes NPComplets — Ceci est une liste des problèmes NP complets les plus connus en théorie de la complexité des algorithmes, exprimés sous la forme d un problèmes de la décision. Puisqu on connaît plus de 3000 problèmes NP complets, cette liste n est pas exhaustive … Wikipédia en Français
Liste de problemes NPcomplets — Liste de problèmes NP complets Ceci est une liste des problèmes NP complets les plus connus en théorie de la complexité des algorithmes, exprimés sous la forme d un problèmes de la décision. Puisqu on connaît plus de 3000 problèmes NP complets,… … Wikipédia en Français
Liste de problèmes NPcomplets — Ceci est une liste des problèmes NP complets les plus connus en théorie de la complexité des algorithmes, exprimés sous la forme d un problème de décision. Puisqu on connaît plus de 3000 problèmes NP complets, cette liste n est pas exhaustive. La … Wikipédia en Français