Register allocation



In compiler optimization, register allocation is the process of multiplexing a large number of target program variables onto a small number of CPU registers. The goal is to keep as many operands as possible in registers to maximise the execution speed of software programs. Register allocation can happen over a basic block ("local register allocation"), over a whole function/procedure ("global register allocation"), or across function boundaries, where it is coordinated through the calling convention ("interprocedural register allocation").

Most computer programs need to process large numbers of different data items. However, most CPUs can only perform operations on a small, fixed number of "slots" called registers. Even on machines that support memory operands, register access is considerably faster than memory access. Variables not allocated to registers must be loaded from and stored back to RAM whenever they are used.

"Register spilling" occurs when there are more live variables than the machine has registers. While generating machine code in that situation, the compiler has to transfer, or "spill", some variables from registers to memory. This incurs a cost, because memory access is typically slower than register access.

In compilers, "register pressure" occurs when there are more variables to allocate than there are registers available. This typically results in register spilling.
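As a rough illustration of register pressure, the sketch below counts the largest number of variables live at the same point in straight-line code, given each variable's live interval. The function name and the half-open (start, end) interval convention are assumptions made for this example. If the peak exceeds the number of registers, at least one variable must reside in memory at that point.

```python
from typing import List, Tuple

def max_pressure(intervals: List[Tuple[int, int]]) -> int:
    """Return the largest number of intervals live at the same point.

    Each interval is (start, end), assumed half-open: the variable is live
    from start up to, but not including, end.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # variable becomes live
        events.append((end, -1))    # variable dies
    # Sorting puts -1 before +1 at equal positions, so a variable dying where
    # another is defined is not counted as overlapping with it.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak
```

For the intervals (0, 4), (1, 3), (2, 6) and (4, 5), three variables are live at position 2, so a machine with two registers would have to spill there.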

Challenges

Register allocation is an NP-complete problem. [Gregory J. Chaitin, Mark A. Auslander, Ashok K. Chandra, John Cocke, Martin E. Hopkins, and Peter W. Markstein. "Register allocation via coloring." Computer Languages, 6:47-57, 1981.] The number of variables in a typical program is much larger than the number of registers available in a processor, so the contents of some variables have to be spilled (saved) to memory locations. The cost of spilling can be reduced by spilling the least frequently used variables first, but it is not easy to know in advance which variables will be used the least. In addition, the hardware and operating system may impose restrictions on the usage of some registers.
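Actual use frequencies are unknown at compile time, so allocators estimate them statically. The sketch below assumes one common heuristic, which is not taken from the cited paper: each definition or use is weighted by 10^d, where d is its loop-nesting depth. The variables with the lowest total are then spilled first. Both function names are hypothetical.

```python
from typing import Dict, List

def spill_cost(occurrence_depths: List[int]) -> int:
    """Estimate the cost of spilling one variable (assumed heuristic).

    occurrence_depths gives the loop-nesting depth of each definition and use.
    Each occurrence is weighted by 10**depth, assuming every loop level runs
    roughly ten times, because real frequencies are unknown at compile time.
    """
    return sum(10 ** d for d in occurrence_depths)

def choose_spills(variables: Dict[str, List[int]], count: int) -> List[str]:
    """Return the `count` variables with the lowest estimated spill cost."""
    ranked = sorted(variables, key=lambda name: spill_cost(variables[name]))
    return ranked[:count]
```

Under this weighting, a variable used twice outside any loop (cost 2) is spilled before one used three times inside a single loop (cost 30).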

Global register allocation

Like most other compiler optimizations, register allocation is based on the result of some compiler analysis. This is typically either live variable analysis (a form of data flow analysis) or the construction of static single assignment form, which encodes similar information into the name space of the code.

"Global register allocators", which consider an entire procedure, have been built using several paradigms. These include bin packing (e.g., the DEC VAX compilers), priority-based graph coloring (e.g., Chow-style allocators), and bottom-up graph coloring (e.g., Chaitin-style allocators). Of these approaches, Chaitin-style allocators appear to be the most common.
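Live variable analysis, on which these allocators are typically based, can be sketched as an iterative backward data flow computation over a control-flow graph of basic blocks. The sketch below assumes each block's upward-exposed uses and its definitions are already known. Upward-exposed uses are the variables read before being written in the block. The function and parameter names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def liveness(succ: Dict[str, List[str]],
             use: Dict[str, Set[str]],
             defs: Dict[str, Set[str]]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Iterative backward live-variable analysis over basic blocks.

    live_out[b] = union of live_in[s] over every successor s of b
    live_in[b]  = use[b] | (live_out[b] - defs[b])
    """
    live_in: Dict[str, Set[str]] = {b: set() for b in succ}
    live_out: Dict[str, Set[str]] = {b: set() for b in succ}
    changed = True
    while changed:  # repeat until a fixed point is reached
        changed = False
        for b in succ:
            out = set().union(*(live_in[s] for s in succ[b]))
            inn = use[b] | (out - defs[b])
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out
```

Consider an entry block B0 that defines `a`, a loop block B1 that reads `a` and then redefines it, and an exit block B2 that reads `a`. The analysis reports `a` as live out of B0 and live into B1 and B2.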

Chaitin's algorithm can be divided into two phases:
# Machine instructions are generated as if there were an infinite number of symbolic registers, so every variable suitable for a register is assigned to a numbered symbolic (logical) register. This phase is sometimes called "register variable recognition".
# Symbolic registers are replaced by physical registers of the target machine, minimizing the cost of spills.

In phase two, an "interference graph" is constructed. Its nodes are the symbolic registers created in the previous phase, and an edge connects two nodes if they are alive at the same time. More precisely, two variables interfere if one of them is alive at the point where the other is defined. If the graph can be colored with "R" colors, so that no two adjacent nodes share a color, then the variables can be stored in "R" registers. This insight was pointed out by John Cocke, "father of the RISC architecture". The difficulty is that finding an "R"-coloring of a graph is NP-hard in general.
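The interference rule can be sketched in Python as follows, assuming liveness has already been computed for each instruction. Each instruction is represented as a pair of the variables it defines and the variables live immediately after it. This input format and the function name are hypothetical. Production allocators usually also special-case copy instructions by not adding an edge between a copy's source and destination; this sketch omits that refinement.

```python
from typing import Dict, Iterable, Set, Tuple

def build_interference(instrs: Iterable[Tuple[Set[str], Set[str]]]) -> Dict[str, Set[str]]:
    """Build an undirected interference graph as symmetric adjacency sets.

    A variable defined by an instruction interferes with every other variable
    that is live immediately after that instruction.
    """
    graph: Dict[str, Set[str]] = {}
    for defined, live_after in instrs:
        for d in defined:
            graph.setdefault(d, set())  # every defined variable gets a node
            for v in live_after:
                if v != d:
                    graph[d].add(v)
                    graph.setdefault(v, set()).add(d)
    return graph
```

For the sequence `a = ...; b = a + 1; c = a * b; d = a + c`, the live-after sets are {a}, {a, b}, {a, c} and {d}. Here `a` interferes with both `b` and `c`, while `b` and `c` never interfere and could share a register.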

The key insight of Chaitin’s algorithm is the "degree < R" rule, which is as follows. Given a graph "G" that contains a node "N" with degree less than "R", "G" is "R"-colorable if and only if the graph "G’", obtained from "G" by removing "N", is "R"-colorable. One direction is obvious: if "G" can be colored with "R" colors, then the same coloring, restricted to the remaining nodes, is a valid coloring of "G’". In the other direction, suppose we have an "R"-coloring of "G’". Since "N" has fewer than "R" neighbors, at least one of the "R" colors is not used by any node adjacent to "N", and we can color "N" with that color.
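Written out symbolically, the rule and both directions of its proof read as follows. Here c is an R-coloring, and adj(N) denotes the set of nodes adjacent to N (notation introduced for this derivation).

```latex
\begin{aligned}
&\text{Rule: if } \deg_G(N) < R, \text{ then } G \text{ is } R\text{-colorable} \iff G' = G - N \text{ is } R\text{-colorable.}\\
&(\Rightarrow)\ \text{an } R\text{-coloring } c \text{ of } G, \text{ restricted to } V(G'), \text{ is an } R\text{-coloring of } G'.\\
&(\Leftarrow)\ \bigl|\{\, c(M) : M \in \operatorname{adj}(N) \,\}\bigr| \le \deg_G(N) \le R - 1,\\
&\qquad \text{so some } k \in \{1, \dots, R\} \setminus \{\, c(M) : M \in \operatorname{adj}(N) \,\} \text{ exists; set } c(N) = k.
\end{aligned}
```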

While G cannot be R-colored
    While graph G has a node N with degree less than R
        Remove N and its associated edges from G and push N on a stack S
    End While
    If the entire graph has been removed then the graph is R-colorable
        While stack S contains a node N
            Pop N, add it back to G, and assign it a color not used by any of its neighbors
        End While
    Else graph G cannot be colored with R colors
        Simplify the graph G by choosing an object to spill and remove its node N from G
        (spill nodes are chosen based on the object’s number of definitions and references)
    End If
End While
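A runnable Python sketch of the pseudocode above follows. It assumes the interference graph is given as symmetric adjacency sets and that R is at least 1. To pick the node to spill, it uses Chaitin's cost/degree ratio, with the cost supplied by the caller (for instance, the number of definitions and references). For simplicity, it restarts simplification from the reduced graph after each spill. The function name and signature are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def chaitin_color(graph: Dict[str, Set[str]], R: int,
                  cost: Dict[str, float]) -> Tuple[Dict[str, int], List[str]]:
    """Chaitin-style simplify / select / spill (sketch, assumes R >= 1).

    graph: symmetric adjacency sets of the interference graph.
    cost:  estimated spill cost per node.
    Returns (coloring, spilled); colors are register numbers 0..R-1.
    """
    g = {n: set(adj) for n, adj in graph.items()}  # copy; spills remove nodes
    spilled: List[str] = []
    while True:
        work = {n: set(adj) for n, adj in g.items()}
        stack: List[str] = []
        # Simplify: keep removing nodes of degree < R, pushing them on a stack.
        progress = True
        while progress:
            progress = False
            for n in list(work):
                if len(work[n]) < R:
                    for m in work.pop(n):
                        work[m].discard(n)
                    stack.append(n)
                    progress = True
        if not work:
            break  # the whole graph was removed, so it is R-colorable
        # Blocked: every remaining node has degree >= R. Spill the node with
        # the lowest cost/degree ratio, drop it from the graph, and retry.
        victim = min(work, key=lambda n: cost[n] / len(work[n]))
        spilled.append(victim)
        for m in g.pop(victim):
            g[m].discard(victim)
    # Select: pop nodes and give each the lowest color its neighbors lack.
    coloring: Dict[str, int] = {}
    while stack:
        n = stack.pop()
        taken = {coloring[m] for m in g[n] if m in coloring}
        coloring[n] = next(c for c in range(R) if c not in taken)
    return coloring, spilled
```

Take a triangle of mutually interfering nodes with R = 2, where node `a` has the lowest spill cost. No node has degree below 2, so `a` is spilled, and `b` and `c` then receive different registers.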

This algorithm is O(n^2). It can be improved through subsumption, the act of coalescing the nodes that are the source and target of a copy operation into a single node before running the algorithm. This reduces the number of nodes to color but can increase the degree of the coalesced node. Coalescing is only possible when the two nodes do not interfere with each other, however, and aggressive coalescing can lead to uncolorable graphs. (Preston Briggs’ thesis work introduces safer methods to determine which nodes to coalesce and spill. Based on his improvements, this algorithm is often called the Chaitin-Briggs algorithm.) The subsumption step is slow and is not done in fast register allocators.
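One such safer criterion, associated with Briggs, is a conservative coalescing test. Two copy-related nodes are merged only if they do not interfere and the merged node would have fewer than R neighbors of significant degree (degree ≥ R). Such a node can still be removed during simplification. The sketch below is a hypothetical illustration of that test, not Briggs' own code. It counts degrees before merging, which is slightly more conservative than necessary.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]

def briggs_can_coalesce(graph: Graph, a: str, b: str, R: int) -> bool:
    """Conservative (Briggs-style) test for merging copy-related nodes a and b."""
    if b in graph[a]:
        return False  # interfering nodes can never share a register
    neighbors = (graph[a] | graph[b]) - {a, b}
    # Count neighbors of significant degree (>= R), using pre-merge degrees.
    significant = sum(1 for n in neighbors if len(graph[n]) >= R)
    return significant < R

def coalesce(graph: Graph, a: str, b: str) -> Graph:
    """Return a new graph in which node b has been merged into node a."""
    g = {n: set(adj) for n, adj in graph.items()}
    for n in g.pop(b):
        g[n].discard(b)
        g[n].add(a)
        g[a].add(n)
    return g
```

With R = 2, merging `a` and `b` is accepted when their neighbors `x` and `y` each have degree 1. It is rejected once `x` and `y` both reach degree 2, because the merged node could then be left with two significant neighbors.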

Recent developments

Graph coloring allocators produce efficient code, but their allocation time is high. For static compilation, allocation time is not a significant concern. For dynamic compilation, such as in just-in-time (JIT) compilers, fast register allocation is important. An efficient technique proposed by Poletto and Sarkar is [http://www.research.ibm.com/jalapeno/papers/toplas99.pdf linear scan allocation]. This technique requires only a single pass over the list of variable live ranges. Ranges with short lifetimes are assigned to registers, whereas those with long lifetimes tend to be spilled, or reside in memory. The results are on average only 12% less efficient than graph coloring allocators.

The linear scan algorithm follows:

# Perform dataflow analysis to gather liveness information. Keep track of all variables’ live intervals, the interval when a variable is live, in a list sorted in order of increasing start point (note that this ordering is free if the list is built when computing liveness.) We consider variables and their intervals to be interchangeable in this algorithm.
# Iterate through liveness start points and allocate a register from the available register pool to each live variable.
# At each step, maintain a list of active intervals sorted by end point. (Inserting into a balanced binary tree keeps this list sorted at O(log R) cost per interval, so the scan stays linear in the number of intervals for a fixed number of registers R.) Remove any expired intervals from the active list and return each expired interval’s register to the available register pool.
# If the active list already contains R intervals, no register is free. In this case, add the current interval to the active list without allocating a register, then spill the interval in the active list with the furthest end point. If the spilled interval is not the current one, give its register to the current interval; if the current interval is the one spilled, leave the register assignments unchanged.
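The four steps above can be sketched in Python as follows. Intervals are assumed to be closed (start, end) pairs keyed by variable name, and an interval is treated as expired once its end point lies before the current start point. Instead of a balanced binary tree, the active list here is a sorted Python list maintained with `bisect.insort`, which is adequate for a small register count. The function name and return format are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

def linear_scan(intervals: Dict[str, Tuple[int, int]],
                R: int) -> Tuple[Dict[str, int], List[str]]:
    """Linear scan allocation in the style of Poletto and Sarkar (sketch).

    Returns (assignment, spilled): a register number 0..R-1 for each
    variable kept in a register, and the variables that live in memory.
    """
    order = sorted(intervals, key=lambda v: intervals[v][0])  # by start point
    free = list(range(R - 1, -1, -1))    # register pool; pop() yields r0 first
    active: List[Tuple[int, str]] = []   # (end, var), sorted by end point
    assignment: Dict[str, int] = {}
    spilled: List[str] = []
    for var in order:
        start, end = intervals[var]
        # Expire intervals that ended before this one starts; free their registers.
        while active and active[0][0] < start:
            _, old = active.pop(0)
            free.append(assignment[old])
        if len(active) < R:
            assignment[var] = free.pop()
            bisect.insort(active, (end, var))
        elif active and active[-1][0] > end:
            # Spill the active interval with the furthest end point and
            # hand its register to the current interval.
            _, victim = active.pop()
            assignment[var] = assignment.pop(victim)
            spilled.append(victim)
            bisect.insort(active, (end, var))
        else:
            spilled.append(var)  # the current interval ends last: spill it
    return assignment, spilled
```

Consider two registers and the intervals a: (0, 10), b: (1, 3), c: (2, 6), d: (4, 8) and e: (7, 9). The long-lived interval `a` is spilled when `c` arrives. Afterwards, `b` and `d` share one register, and `c` and `e` share the other.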

In 2006, Cooper and Dasgupta developed a "lossy" Chaitin-Briggs graph coloring algorithm suitable for use in a JIT [Cooper, Dasgupta, "Tailoring Graph-coloring Register Allocation For Runtime Compilation", http://llvm.org/pubs/2006-04-04-CGO-GraphColoring.html]. The "lossy" moniker refers to the imprecision the algorithm introduces into the interference graph. This imprecision reduces the cost of the expensive graph-building step of Chaitin-Briggs, making the algorithm suitable for runtime compilation. Experiments indicate that this lossy register allocator outperforms linear scan on the majority of tests used.

"Optimal" register allocation algorithms based on Integer Programming have been developed by Goodwin and Wilken for regular architectures. These algorithms have been extended to irregular architectures by Kong and Wilken.

While the worst-case execution time is exponential, experimental results show that the actual time is typically of order O(n^2.5), where n is the number of constraints [Kong, Wilken, "Precise Register Allocation for Irregular Architectures", http://www.ece.ucdavis.edu/cerl/cerl_arch/irreg.pdf].

References


Wikimedia Foundation. 2010.
