Type inference

Type inference

Type inference, or implicit typing, refers to the ability to deduce automatically the type of a value in a programming language. It is a feature present in some strongly statically typed languages. It is often characteristic of — but not limited to — functional programming languages in general. Some languages that include type inference are: Ada, Boo, C# 3.0, Cayenne, Clean, Cobra, D, Epigram, F#, Haskell, ML, Nemerle , OCaml, Oxygene, Scala, and Visual Basic .NET 9.0. This feature is planned for Fortress, C++0x and Perl 6. The ability to infer types automatically makes many programming tasks easier, leaving the programmer free to omit type annotations while maintaining some level of type safety. Explicitly converting to another data type is called "casting" (or a "cast").

Nontechnical explanation

In most programming languages, all values have a type which describes the kind of data a particular value describes. In some languages, the type is known only at runtime; these languages are dynamically typed. In other languages, the type is known at compile time; these languages are statically typed. In statically typed languages, the input and output types of functions and local variables ordinarily must be explicitly provided by type annotations. For example, in C:int addone(int x) { int result; /*declare integer result (C language)*/

result = x+1; return result;}The beginning of this function definition, int addone(int x) declares that addone is a function which takes one argument, an integer, and returns an integer. int result; declares that the local variable result is an integer. In a proposed language where type inference is available, the code might be written like this instead: addone(x) { val result; /*inferred-type result (in proposed language)*/ val result2; /*inferred-type result #2 */ result = x+1; result2 = x+1.0; /* this line won't work */ return result; }This looks very similar to how code is written in a dynamically typed language, yet all types are known at compile time. In the imaginary language in which the last example is written, + always takes two integers and returns one integer. (This is how it works in, for example, OCaml.) From this, the type inferencer can infer that the value of x+1 is an integer, therefore result is an integer, therefore the return value of addone is an integer. Similarly, since + requires that both of its arguments be integers, x must be an integer, and therefore addone accepts one integer as an argument.

However, in the subsequent line, "result2" is calculated by adding a decimal "1.0" with floating-point arithmetic, causing a conflict in the use of x for both integer and floating-point expressions. Such a situation would generate a compile-time error message. In older languages, "result2" might have been implicitly declared as a floating-point variable, from implicitly converting integer x in the expression, simply because a decimal point was accidentally placed after the integer 1. Such a situation shows the difference between "type inference", which does not involve type conversion, and "implicit type conversion", which forces data to the higher-precision data type, often without restrictions.

Technical description

Type inference refers to the ability to deduce automatically, either partially or fully, the type of a value derived from the eventual evaluation of an expression. As this process is systematically performed at compile time, the compiler is often able to infer the type of a variable or the type signature of a function, without explicit type annotations having been given. In many cases, it is possible to omit type annotations from a program completely if the type inference system is robust enough, or the program or language simple enough.

To obtain the information required to infer correctly the type of an expression lacking an explicit type annotation, the compiler either gathers this information as an aggregate and subsequent reduction of the type annotations given for its subexpressions (which may themselves be variables or functions), or through an implicit understanding of the type of various atomic values (e.g., () : Unit; true : Bool; 42 : Integer; 3.14159 : Real; etc.). It is through recognition of the eventual reduction of expressions to implicitly typed atomic values that the compiler for a type inferring language is able to compile a program completely without type annotations. In the case of highly complex forms of higher order programming and polymorphism, it is not always possible for the compiler to infer as much, however, and type annotations are occasionally necessary for disambiguation.

Example

For example, let us consider the Haskell function map, which applies a procedure to each element of a list, and may be defined as: map f [] = [] map f (first:rest) = f first : map f rest

From this, it is evident that the function map takes a list as its second argument, that its first argument f is a function that can be applied to the type of elements of that list, and that the result of map is constructed as a list with elements that are results of f.So assuming that a list contains elements of the same type, we can reliably construct a type signature
a -> b" denotes a function that takes an a as its parameter and produces a b. "a -> b -> c" is equivalent to "a -> (b -> c)".

Note that the inferred type of map is parametrically polymorphic: The type of the arguments and results of f are not inferred, but left as type variables, and so map can be applied to functions and lists of various types, as long as the actual types match in each invocation.

Hindley–Milner type inference algorithm

The common algorithm used to perform type inference is the one now commonly referred to as Hindley–Milner, Damas–Milner algorithm. It has been referred to in the past as polymorphic type checking or Algorithm W.

The origin of this algorithm is the type inference algorithm for the simply typed lambda calculus, which was devised by Haskell B. Curry and Robert Feys in 1958.In 1969 Roger Hindley extended this work and proved that their algorithm always inferred the most general type.In 1978 Robin Milner [Citation
last1 = Milner | first1 = Robin
title = A Theory of Type Polymorphism in Programming
journal = jcss
pages = 348-375
volume = 17
year = 1978
] , independently of Hindley's work, provided an equivalent algorithm, Algorithm W. In 1982 Luis Damas [Citation
last1 = Damas | first1 = Luis
last2 = Milner | first2 = Robin
contribution = Principal type-schemes for functional programs
title = POPL '82: Proceedings of the 9th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
publisher = ACM
pages = 207--212
year = 1982
url = http://groups.csail.mit.edu/pag/6.883/readings/p207-damas.pdf
] finally proved that Milner's algorithm is complete and extended it to support systems with polymorphic references.

The algorithm

The algorithm proceeds in two steps. First, we need to generate a number of equations to solve, then we need to solve them.

Generating the equations

The algorithm used for generating the equations is similar to a regular type checker, so let's consider first a regular type checker for the typed lambda calculus given by

e , ::= E mid v mid (lambda v: au. e) mid (e, e)

and

au , ::= T mid au o au

where E is a primitive expression (such as "3") and T is a primitive type (such as "Integer").

We want to construct a function f which maps a pair (epsilon, t), where epsilon is a type environment and t is a term, to some type au. We assume that this function is already defined on primitives. The other cases are:

* f(epsilon, v) = au where (v, au) is in epsilon
* f(epsilon, g, e) = au where f(epsilon, g) = au_1 o au and f(epsilon, e) = au_1
* f(epsilon, lambda v: au. e) = au o f(epsilon_1, e) where epsilon_1 = epsilon cup (v, au)

Note that so far we do not specify what to do when we fail to meet the various conditions. This is because, in the simple type "checking" algorithm, the check simply fails whenever anything goes wrong.

Now, we develop a more sophisticated algorithm that can deal with type variables and constraints on them. Therefore, we extend the set T of primitive types to include an infinite supply of variables, denoted by lowercase Greek letters alpha, eta, ...

See [cite book | author=Pierce, Benjamin C. | title=Types and Programming Languages | chapter=Chapter 22 | publisher=MIT Press | year=2002 | id=ISBN 0-262-16209-1] for more details.

olving the equations

Solving the equations proceeds by unification. This is—maybe surprisingly—a rather simple algorithm. The function u operates on type equations and returns a structure called a "substitution". A substitution is simply a mapping from type variables to types. Substitutions can be composed and extended in the obvious ways.

Unifying the empty set of equations is easy enough: u, emptyset = mathbf{i}, where mathbf{i} is the identity substitution.

Unifying a variable with a type goes this way: u, ( [alpha = T] cup C) = u, (C') cdot (alpha mapsto T), where cdot is the substitution composition operator, and C' is the set of remaining constraints C with the new substitution alpha mapsto T applied to it.

Of course, u, ( [T = alpha] cup C) = u ( [alpha = T] cup C).

The interesting case remains as u, ( [S o S' = T o T'] cup C) = u , ({ [S = T] , [S' = T'] }cup C).

A simple example would be a [i] = b [i] (assume this to be c-like syntax for this example). First Hindley-Milner would find that i must be of type int, further more that 'a' must be an "array of alpha" and 'b' must an "array of eta". Now, since there is an assignment of eta to alpha, alpha must be of the same type (assuming no implicit type conversions) as eta. In the very least alpha must be a supertype of eta.

References

See also

*Duck typing, an analogous concept in languages with dynamic or weak typing

External links

* [http://www.cis.upenn.edu/~bcpierce/types/archives/1988/msg00042.html Archived e-mail message] by Roger Hindley explaining the history of type inference
* [http://web.archive.org/web/20050911123640/http://www.cs.berkeley.edu/~nikitab/courses/cs263/hm.html Implementation] of Hindley-Milner in Perl 5, by Nikita Borisov (via Internet Archive, version Sep 112005)


Wikimedia Foundation. 2010.

Игры ⚽ Нужно решить контрольную?

Look at other dictionaries:

  • Type system — Type systems Type safety Inferred vs. Manifest Dynamic vs. Static Strong vs. Weak Nominal vs. Structural Dependent typing Duck typing Latent typing Linear typing Uniqueness typing …   Wikipedia

  • Type polymorphism — In computer science, polymorphism is a programming language feature that allows values of different data types to be handled using a uniform interface. The concept of parametric polymorphism applies to both data types and functions. A function… …   Wikipedia

  • Inference de types — Inférence de types L inférence de types est un mécanisme qui permet à un compilateur ou un interpréteur de rechercher automatiquement les types associés à des expressions, sans qu ils soient indiqués explicitement dans le code source. Il s agit… …   Wikipédia en Français

  • Inférence De Types — L inférence de types est un mécanisme qui permet à un compilateur ou un interpréteur de rechercher automatiquement les types associés à des expressions, sans qu ils soient indiqués explicitement dans le code source. Il s agit pour le compilateur… …   Wikipédia en Français

  • Type de données — Type (informatique) Pour les articles homonymes, voir Type (homonymie). En programmation un type de données, ou simplement type, définit le genre de contenu d une donnée et les opérations pouvant être effectuées sur la variable correspondante.… …   Wikipédia en Français

  • Type recursif — Type récursif Dans un langage de programmation, un type récursif ou type inductif est un type de données pour des valeurs qui contiennent d autres valeurs du même type. Un exemple est le type liste en Haskell : data List a = Nil | Cons a… …   Wikipédia en Français

  • Type III — may stand for:* Glycogen storage disease type III, a genetic disorder * Hyperlipproteinemia type III, a risk factor for cardiovascular disease * The IBM Type III Library, a distribution mechanism for unsupported IBM mainframe software such as… …   Wikipedia

  • Inférence de types — L inférence de types est un mécanisme qui permet à un compilateur ou un interpréteur de rechercher automatiquement les types associés à des expressions, sans qu ils soient indiqués explicitement dans le code source. Il s agit pour le compilateur… …   Wikipédia en Français

  • Type statique — Typage statique Sommaire 1 Définition 2 Langages à objets et typage statique 3 Problèmes 4 Résolution, autres difficultés …   Wikipédia en Français

  • Inférence bayésienne — On nomme inférence bayésienne la démarche logique permettant de calculer ou réviser la probabilité d un événement. Cette démarche est régie en particulier par théorème de Bayes. Dans la perspective bayésienne, une probabilité n est pas… …   Wikipédia en Français

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”