Union (computer science)


Union (computer science)

In computer science, a union is a data structure that stores one of several types of data at a single location. There are only two safe ways of accessing a union object. One is to always read the field of a union most recently assigned; tagged unions enforce this restriction. The other is to only access functionality common to all types in the union. For example, if the fields are all subtypes of a common supertype, then it is always legal to perform operations on the union object that one can perform on the supertype.

The remainder of this article will refer strictly to primitive untagged unions, as opposed to tagged unions.

Because of the limitations of their use, untagged unions are generally only provided in untyped languages or in an unsafe way (as in C). They have the advantage over simple tagged unions of not requiring space to store the tag.

The name "union" stems from the type's formal definition. If one sees a type as the set of all values that type can take on, a union type is simply the mathematical union of its constituting types, since it can take on any value any of its fields can. Also, because a mathematical union discards duplicates, if more than one field of the union can take on a single common value, it is impossible to tell from the value alone which field was last written.

Unions in various programming languages

C/C++

In C and C++, untagged unions are expressed nearly exactly like structures (structs), except that each data member begins at the same location in memory. The data members, as in structures, need not be primitive values, and in fact may be structures or even other unions. However, C++ does not allow for a data member to be any type that has "a non-trivial constructor, a non-trivial copy constructor, a non-trivial destructor, or a non-trivial copy assignment operator." In particular, it is impossible to have the standard C++ string as a member of a union. The union object occupies as much space as the largest member, whereas structures require space equal to at least the sum of the size of its members. This gain in space efficiency, while valuable in certain circumstances, comes at a great cost of safety: the program logic must ensure that it only reads the field most recently written along all possible execution paths.

The primary usefulness of a union is to conserve space, since it provides a way of letting many different types be stored in the same space. Unions also provide crude polymorphism. However, there is no checking of types, so it is up to the programmer to be sure that the proper fields are accessed in different contexts. The relevant field of a union variable is typically determined by the state of other variables, possibly in an enclosing struct.

One common C programming idiom uses unions to perform what C++ calls a reinterpret_cast, by assigning to one field of a union and reading from another, as is done in code which depends on the raw representation of the values. This is not, however, a safe use of unions in general.

Note that the safer tagged unions can be constructed from untagged unions (see tagged union). The safe C dialect Cyclone encourages the preference of tagged unions to untagged.

:Structure and union specifiers have the same form. [ . . . ] The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.: - ANSI/ISO 9899:1990 (the ANSI C standard), section 6.5.2.1

See also

* Union (set theory)
* Union (SQL)

External links

* [http://boost.org/doc/html/variant.html boost::variant] , a type-safe alternative to C++ unions


Wikimedia Foundation. 2010.

Look at other dictionaries:

  • Computer Science and Engineering — is the discipline formed by the union of Computer Science and Computer Engineering.It is an ABET accredited major [ [http://www.seasoasa.ucla.edu/curric05 06.html/HTML/compsci.html#marker 1005090 UCLA CS E Curriculum] Accessed August 14 2006]… …   Wikipedia

  • Object (computer science) — In computer science, an object is any entity that can be manipulated by the commands of a programming language, such as a value, variable, function, or data structure. (With the later introduction of object oriented programming the same word,… …   Wikipedia

  • String (computer science) — In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet. In computer programming, a string is traditionally a sequence of… …   Wikipedia

  • Polymorphism (computer science) — This article is about the programming language theory concepts with direct application to functional programming languages. For a gentler introduction of these notions as commonly implemented in object oriented programming, see Polymorphism in… …   Wikipedia

  • Reference (computer science) — This article is about a general notion of reference in computing. For the more specific notion of reference used in C++, see Reference (C++). In computer science, a reference is a value that enables a program to indirectly access a particular… …   Wikipedia

  • Integer (computer science) — In computer science, an integer is a datum of integral data type, a data type which represents some finite subset of the mathematical integers. Integral data types may be of different sizes and may or may not be allowed to contain negative values …   Wikipedia

  • Set (computer science) — In computer science, a set is a collection (container) of certain values, without any particular order, and no repeated values. It corresponds with a finite set in mathematics. Disregarding sequence, and the fact that there are no repeated values …   Wikipedia

  • Record (computer science) — This article is about the data structure and the type. In computer science, a record type is a type whose values are records, i.e. aggregates of several items of possibly different types. The items being aggregated are called fields (or members ) …   Wikipedia

  • Production (computer science) — A production or production rule in computer science is a rewrite rule specifying a symbol substitution that can be recursively performed to generate new symbol sequences. A finite set of productions P is the main component in the specification of …   Wikipedia

  • Department of Computer Science (University of Toronto) — Bahen Centre at the University of Toronto. Most of the department is housed in this building. The Department of Computer Science (DCS) at the University of Toronto is an administrative unit within the Faculty of Arts and Science. Established on 1 …   Wikipedia


Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”

We are using cookies for the best presentation of our site. Continuing to use this site, you agree with this.