 Unification (computer science)

Unification, in computer science and logic, is an algorithmic process by which one attempts to solve the satisfiability problem. The goal of unification is to find a substitution which demonstrates that two seemingly different terms are in fact either identical or just equal. Unification is widely used in automated reasoning, logic programming and programming language type system implementation.
Several kinds of unification are commonly studied: that for theories without any equations (the empty theory) is referred to as syntactic unification: one wishes to show that (pairs of) terms are identical. If one has a nonempty equational theory, then one is typically interested in showing the equality of (a pair of) terms; this is referred to as semantic unification. Since substitutions can be ordered into a partial order, unification can be understood as the procedure of finding a join on a lattice.
Unification on ground terms is just the ground word problem; because the ground word problem is undecidable, so is unification.
The first formal investigation of unification can be attributed to John Alan Robinson, who used firstorder unification as a basic building block of his resolution procedure for firstorder logic, a great step forward in automated reasoning technology, as it eliminated one source of combinatorial explosion: searching for instantiation of terms.
Syntactic unification of firstorder terms
Firstorder terms
Main article: Term (logic)Given a set of variable symbols X = {x,y,z,...}, a set of distinct constant symbols C = {a,b,c,...}, and a set of distinct function symbols F = {f,g,h,...}, a term is defined as any expression that can be generated by a finite number of applications of the following rules:
 Basis: any variable , and also any constant is a term
 Induction: if are terms then is term for finite k, 0 < k.
E.g. x,y,a and b are terms that can be generated directly from the Basis rule; f(a,x), g(b,y,x) are terms by a single application of the Induction rule to terms generated from the Basis; f(a,f(a,x)) is a term generated by a second application of the Induction rule to terms we have already generated; and so on... Often, for brevity, the constant symbols are regarded as function symbols taking zero arguments, the induction rule is relaxed to allow terms with zero arguments , and a() is regarded as syntactically equal to a: see firstorder terms. The distinction between constants (zero arity functions) and function symbols with arity greater than zero is often made to clearly distinguish Basis and Induction cases of terms for purposes of proofs. Often mathematicians fix the arity (number of arguments) of a function symbol (see Signature) while typically in syntactic unification problems a function symbol may have any (finite) number of arguments, and possibly may have different numbers of arguments in different contexts. e.g. f(f(a),f(x,y,z)) is a well formed term for unification problems.
Substitution
Main article: Substitution (logic)A substitution is defined as a finite set of mappings from variables to terms where each mapping must be unique, because mapping the same variable to two different terms would be ambiguous. A substitution may be applied to a term u and is written , which means (simultaneously) replace every occurrence of each variable x_{i} in the term u with the term t_{i} for . E.g.
Syntactic unification problem on firstorder terms
A syntactic unification problem on firstorder terms is a conjunction of a finite number of potential equations on terms . To solve the problem, a substitution θ must be found so that the terms on LHS and RHS of each of the potential equalities become syntactically equivalent when the substitution is applied i.e. . Such a substitution θ is called a unifier. Alternatively a unification problem might have no solution. E.g. has a unifier , because
 and
Occurs check
Main article: Occurs checkIf there is ever an attempt to unify a variable x with a term with a function symbol containing x as a strict subterm x=f(...,x,...) then x would have to be an infinite term, which contradicts the strict definition of terms that requires a finite number of applications of the induction rule. e.g x=f(x) does not represent a strictly valid term.
Informal overview
Given two input terms s and t, (syntactic) unification is the process which attempts to find a substitution that structurally identifies s and t. If such a substitution exists, we call this substitution a unifier of s and t.
In theory, some pairs of input terms may have infinitely many unifiers. However, for most applications of unification, it is sufficient to consider a most general unifier (mgu). An mgu is useful because all other unifiers are instances of the mgu.
In particular, firstorder unification is the syntactic unification of firstorder terms (terms built from variable and function symbols). Higherorder unification, on the other hand, is the unification of higherorder terms (terms containing some higherorder variables).
The set of all syntactically equivalent terms is variously called the free theory (because it is a free object), the empty theory (because the set of equational sentences is empty), or the theory of uninterpreted functions (because unification is done on uninterpreted terms.)
The theoretical properties of a particular unification algorithm depend upon the variety of term used as input. Firstorder unification, for instance, is decidable, efficiently so, and all unifiable terms have (unique, up to variable renamings) most general unifiers. Higherorder unification, on the other hand, is generally undecidable and lacks most general unifiers, though Huet's higherorder unification algorithm seems to work sufficiently well in practice.
Aside from syntactic unification, another form of unification, called semantic unification (known also as equationalunification, Eunification) is also widely used. The key difference between the two notions of unification is how two unified terms should be considered 'equal'. Syntactic unification attempts to find a substitution making the two input terms structurally identical. However, semantic unification attempts to make the two input terms equal modulo some equational theory. Some particularly important equational theories have been widely studied in the literature. For instance, ACunification is the unification of terms modulo associativity and commutativity.
Unification is a significant tool in computer science. Firstorder unification is especially widely used in logic programming, programming language type system design, especially in type inferencing algorithms based on the HindleyMilner type system, and automated reasoning. Higherorder unification is also widely used in proof assistants, for example Isabelle and Twelf, and restricted forms of higherorder unification (higherorder pattern unification) are used in some programming language implementations, such as lambdaProlog, as higherorder patterns are expressive, yet their associated unification procedure retains theoretical properties closer to firstorder unification. Semantic unification is widely used in SMT solvers and term rewriting algorithms.
Definition of unification for firstorder logic
Let p and q be sentences in firstorder logic.
 UNIFY(p,q) = U where subst(U,p) = subst(U,q)
Where subst(U,p) means the result of applying substitution U on the sentence p. Then U is called a unifier for p and q. The unification of p and q is the result of applying U to both of them.^{[1]}
Let L be a set of sentences, for example, L = {p,q}. A unifier U is called a most general unifier for L if, for all unifiers U' of L, there exists a substitution s such that applying s to the result of applying U to L gives the same result as applying U' to L:
 subst(U',L) = subst(s,subst(U,L)).
Unification in logic programming
The concept of unification is one of the main ideas behind logic programming, best known through the language Prolog. It represents the mechanism of binding the contents of variables and can be viewed as a kind of onetime assignment. In Prolog, this operation is denoted by the equality symbol =, but is also done when instantiating variables (see below). It is also used in other languages by the use of the equality symbol =, but also in conjunction with many operations including +, , *, /. Type inference algorithms are typically based on unification.
In Prolog:
 A variable which is uninstantiated—i.e. no previous unifications were performed on it—can be unified with an atom, a term, or another uninstantiated variable, thus effectively becoming its alias. In many modern Prolog dialects and in firstorder logic, a variable cannot be unified with a term that contains it; this is the so called occurs check.
 Two atoms can only be unified if they are identical.
 Similarly, a term can be unified with another term if the top function symbols and arities of the terms are identical and if the parameters can be unified simultaneously. Note that this is a recursive behavior.
Type inference
Unification is used during type inference, for instance in the functional programming language Haskell (programming language). On one hand, the programmer does not need to provide type information for every function, on the other hand it is used to detect typing errors. The Haskell expression 1:['a','b','c'] is not correctly typed, because the list construction function ":" is of type a>[a]>[a] and for the first argument "1" the polymorphic type variable "a" has to denote the type Int whereas "['a','b','c']" is of type [Char], but "a" cannot be both Char and Int at the same time.
Like for prolog an algorithm for type inference can be given:
 Any type variable unifies with any type expression, and is instantiated to that expression. A specific theory might restrict this rule with an occurs check.
 Two type constants unify only if they are the same type.
 Two type constructions unify only if they are applications of the same type constructor and all of their component types recursively unify.
Due to its declarative nature, the order in a sequence of unifications is (usually) unimportant.
Note that in the terminology of firstorder logic, an atom is a basic proposition and is unified similarly to a Prolog term.
Higherorder unification
Many applications require one to consider the unification of typed lambdaterms instead of firstorder terms. Such unification is often called higherorder unification. A well studied branch of higherorder unification is the problem of unifying simply typed lambda terms modulo the equality determined by αβη conversions. Such unification problems do not have most general unifiers. While higherorder unification is undecidable,^{[2]}^{[3]} Gérard Huet gave a semidecidable (pre)unification algorithm^{[4]} that allows a systematic search of the space of unifiers (generalizing the unification algorithm of MartelliMontanari^{[5]} with rules for terms containing higherorder variables. Huet^{[6]} and Gilles Dowek^{[7]} have written articles surveying this topic.
Dale Miller has described what is now called higherorder pattern unification.^{[8]} This subset of higherorder unification is decidable and solvable unification problems have mostgeneral unifiers. Many computer systems that contain higherorder unification, such as the higherorder logic programming languages λProlog and Twelf, often implement only the pattern fragment and not full higherorder unification.
In computational linguistics, one of the most influential theories of ellipsis is that ellipses are represented by free variables whose values are then determined using HigherOrder Unification (HOU). For instance, the semantic representation of "Jon likes Mary and Peter does too" is like(j; m)R(p) and the value of R (the semantic representation of the ellipsis) is determined by the equation like(j; m) = R(j). The process of solving such equations is called HigherOrder Unification.^{[9]}
Examples of syntactic unification of firstorder terms
In the Prolog syntactical convention a symbol starting with an upper case letter is a variable name; a symbol that starts with a lowercase letter is a function symbol; the comma is used as the logical and operator. In maths the convention is to use lower case letters at the end of the alphabet as variable names (e.g. x,y,z); letters f,g,h as function symbols, and letters a,b,c as constants, and constants are often regarded as functions that take no arguments; logical "and" is represented by & or
Prolog Notation Maths Notation Unifying Substitution Explanation a = a a = a {} Succeeds. (tautology) a = b a = b fail a and b do not match X = X x = x {} Succeeds. (tautology) a = X a = x x is unified with the constant a X = Y x = y x and y are aliased f(a,X) = f(a,b) f(a,x) = f(a,b) function and constant symbols match, x is unified with the constant b f(a) = g(a) f(a) = g(a) fail f and g do not match f(X) = f(Y) f(x) = f(y) x and y are aliased f(X) = g(Y) f(x) = g(y) fail f and g do not match f(X) = f(Y,Z) f(x) = f(y,z) fail Fails. The terms have different arity f(g(X)) = f(Y) f(g(x)) = f(y) Unifies y with the term g(x) f(g(X),X) = f(Y,a) f(g(x),x) = f(y,a) Unifies x with constant a, and y with the term g(a) X = f(X) x = f(x) should fail Fails in strict firstorder logic and many modern Prolog dialects (enforced by the occurs check). Succeeds in traditional Prolog, unifying x with infinite expression x=f(f(f(f(...)))) which is not strictly a term.
X = Y, Y = a x = y & y = a Both x and y are unified with the constant a a = Y, X = Y a = y & x = y As above (unification is symmetric, and transitive) X = a, b = X x = a & b = x fail Fails. a and b do not match, so x can't be unified with both A Unification Algorithm
Given a unification problem G consisting of a finite multiset of potential equations on terms , the algorithm applies term rewriting rules to transform the multiset of potential equations to an equivalent multiset of equations of the form where are unique variables (appearing exactly once on the LHS of one equation, and nowhere else). A multiset of this form can be read as a substitution. If there is no solution the algorithm terminates with . The set of variables in a term t is written as Vars(t), and the set of variables in all terms on LHS or RHS of potential equations in a problem G is similarly written as Vars(G). The operation of substituting all occurrences of variable x in problem G with term t is denoted . For brevity constant symbols are regarded as function symbols having zero arguments.
Proof of Termination
For the proof of termination consider a 3tuple <NUVN,NLHS,EQN> where NUVN is the number of nonunique variables, NLHS is the number of function symbols and constants on the LHS of potential equations, and EQN is the number of equations. The application of these rewrite rules in any order terminates because NUVN remains the same or is reduced by each rewrite. Where NUVN remains the same, NLHS remains the same or is reduced by each rewrite. Where NUVN remains the same and NLHS remains the same, EQN is reduced by each rewrite.
See also
Notes
 ^ Russell, Norvig: Artificial Intelligence, A Modern Approach, p. 277
 ^ Warren Goldfarb: The undecidability of the secondorder unification problem [1]
 ^ Gérard Huet: The undecidability of unification in third order logic
 ^ Gérard Huet: A Unification Algorithm for typed LambdaCalculus []
 ^ Martelli, Montanari: An Efficient Unification Algorithm
 ^ Gérard Huet: Higher Order Unification 30 Years Later
 ^ Gilles Dowek: HigherOrder Unification and Matching. Handbook of Automated Reasoning 2001: 10091062
 ^ Dale Miller: A Logic Programming Language with LambdaAbstraction, Function Variables, and Simple Unification, Journal of Logic and Computation, 1991, pp. 497536 [2]
 ^ "Claire Gardent, Michael Kohlhase, Karsten Konrad, A multilevel, HigherOrder Unification approach to ellipsis (1997)". http://en.scientificcommons.org/48410.
References
 F. Baader and T. Nipkow, Term Rewriting and All That. Cambridge University Press, 1998.
 F. Baader and W. Snyder, Unification Theory. In J.A. Robinson and A. Voronkov, editors, Handbook of Automated Reasoning, volume I, pages 447–533. Elsevier Science Publishers, 2001.
 Joseph Goguen, What is Unification?.
 Alex Sakharov, "Unification" from MathWorld.
Categories:
Wikimedia Foundation. 2010.
Look at other dictionaries:
computer science — computer scientist. the science that deals with the theory and methods of processing information in digital computers, the design of computer hardware and software, and the applications of computers. [1970 75] * * * Study of computers, their… … Universalium
Assignment (computer science) — In computer programming, an assignment statement sets or re sets the value stored in the storage location(s) denoted by a variable name. In most imperative computer programming languages, assignment statements are one of the basic statements.… … Wikipedia
Reification (computer science) — Reification is the act of making an abstract concept or low level implementation detail of a programming language accessible to the programmer.For example, * the C programming language reifies the low level detail of memory addresses * the Scheme … Wikipedia
Nominal terms (computer science) — Nominal terms are a metalanguage for embedding object languages with binding constructs into. Intuitively, they may be seen as an extension of first order terms with support for name binding. Consequently, the native notion of equality between… … Wikipedia
Unification — In mathematical logic, in particular as applied to computer science, a unification of two terms is a join (in the lattice sense) with respect to a specialisation order. That is, we suppose a preorder on a set of terms, for which t * ≤ t means… … Wikipedia
Science — This article is about the general term, particularly as it refers to experimental sciences. For the specific topics of study by scientists, see Natural science. For other uses, see Science (disambiguation) … Wikipedia
science, philosophy of — Branch of philosophy that attempts to elucidate the nature of scientific inquiry observational procedures, patterns of argument, methods of representation and calculation, metaphysical presuppositions and evaluate the grounds of their validity… … Universalium
Unification (disambiguation) — In mathematical logic and in computer science, a unification of two terms is a join with respect to a specialisation order.Unification may also refer to:* Championship unification, a professional wrestling term used to describe the consolidation… … Wikipedia
Semantic unification — has a long history in fields like philosophy and linguistics. In computer science it has been used in different research areas like grammar unification [1] [2] , and Prolog extensions [3] . For business processes and workflow management the term… … Wikipedia
History of science — History of science … Wikipedia