Loop unwinding


Loop unwinding

Loop unwinding, also known as loop unrolling, is a loop transformation technique that attempts optimize a program's execution speed at the expense of its size.

The goal of loop unwinding is to increase the programs speed by reducing (or eliminating) the "end of loop" test on each iteration. Loops can be re-written as a sequence of independent statements which eliminates the loop controller overhead. Significant gains can be realized if the speed gained (by eliminating the "end of loop" test) offsets the performance reduction caused by the increased size of the program. Furthermore, if the statements in the loop are independent of each other, where statements that occur earlier in the loop do not effect statements that follow them, then the statements can be executed in parallel.

ide effects

The major side effects of loop unrolling are:
* a) the increased register usage in a single iteration to store temporary variables (though much will depend on possible optimizations), which may hurt performance; and
* b) the code size expansion after the unrolling, which is undesirable for embedded applications and for purposes of code readability.

A simple example

A procedure in a computer program is to delete 100 items from a collection. This is accomplished by means of a "for"-loop which calls the function "delete(item_number)":

for (int x = 0; x < 100; x++) { delete(x); }

If this part of the program is to be optimized, and the overhead of the loop requires significant resources compared to that for "delete(x)", loop unwinding can be used to speed it up. This will result in an optimized code fragment like:

for (int x = 0; x < 100; x += 5) { delete(x); delete(x+1); delete(x+2); delete(x+3); delete(x+4); } As a result of this modification, the new program has to make only twenty loops, instead of a hundred. There are now a fifth as many jumps and conditional branches that need to be taken, which over many iterations would be a great improvement in the loop administration time.

On the other hand, the loop unrolling expands the source code size from three lines to seven lines that have to be produced checked and debugged, and the compiler has to allocate more registers to store variables in the expanded loop iteration. In addition, the loop control variables and number of operations inside the unrolled loop structure have to be chosen carefully so that the result is indeed the same as in the original code.

Consider the implications if the iteration count were not divisible by five. It also becomes somewhat more complicated if the test conditions are variables. See also Duff's device.

Early complexity

In the simple case the loop control is merely an administrative overhead, that arranges the productive statements. The loop itself contributes nothing to the results desired, merely saving the programmer the tedium of replicating the code a hundred times which could have been done by a pre-processor generating the replications, or a text editor. Similarly, if-statements and other flow control statements could be replaced by code replication, except that bloated messes soon result. Computer programs easily track the combinations, but humans find this repugnant and make mistakes. Consider for i:=1:8 do if i mod 2 = 0 then do_evenstuff(i) else do_oddstuff(i); next i;This could be fully unrolled, whereupon the if-test involves constants (unlike the previous example) and so the code becomes do_oddstuff(1); do_evenstuff(2); do_oddstuff(3); do_evenstuff(4); do_oddstuff(5); do_evenstuff(6); do_oddstuff(7); do_evenstuff(8);

But of course the stuff done need not be an invocation of a procedure, and this example involves the index variable in computation: x(1) = 1; For i:=2:9 do x(i):=x(i - 1)*i; "print" i,x(i); Next i;Which would unroll to x(1):=1; x(2):=x(1)*2; "print" 2,x(2); x(3):=x(2)*3; "print" 3,x(3); ...etc.which if compiled would produce a lot of code ("print" statements being notorious) but further optimisation is possible. This example make reference only to "x(i)" and "x(i - 1)" in the loop (the latter only to develop the new value "x(i)") so, given that there is no later reference to the array "x" developed here, its usages could be replaced by a simple variable. However it would be quite a surprise if the compiler were to recognise "x(n) = Factorial(n)".

In general, the content of a loop might be large, involving intricate array indexing: such cases should be left alone. Replicating innermost loops might raise many possible optimisations yet yield only a small gain.

Further reading

ee also

* Loop splitting
* Loop fusion
* Duff's device


Wikimedia Foundation. 2010.

Look at other dictionaries:

  • Loop splitting — (or loop peeling) is a compiler optimization technique. It attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different contiguous portions of the index range. A… …   Wikipedia

  • Loop interchange — In compiler theory, loop interchange is the process of exchanging the order of two iteration variables. For example, in the code fragment: for i from 0 to 10 for j from 0 to 20 a [i,j] = i + jloop interchange would result in: for j from 0 to 20… …   Wikipedia

  • Duff's device — In computer science, Duff s device is an optimized implementation of a serial copy that uses a technique widely applied in assembly language for loop unwinding. Its discovery is credited to Tom Duff in November of 1983, who at the time was… …   Wikipedia

  • Duff’s Device — Duff s Device (auf deutsch etwa: Duff Apparat) ist ein nach seinem Erfinder Tom Duff benanntes Programmierverfahren zur Effizienzsteigerung bei Schleifen unter Ausnutzung einer speziellen Eigenschaft der Programmiersprache C. Inhaltsverzeichnis 1 …   Deutsch Wikipedia

  • Размотка цикла — В программировании, размотка цикла (англ. loop unwinding) или раскрутка цикла (англ. loop unrolling) техника оптимизации компьютерных программ, состоящая в искусственном увеличении количества инструкций, исполняемых в течение одной… …   Википедия

  • Раскрутка цикла — В программировании, размотка цикла (англ. loop unwinding) или раскрутка цикла (англ. loop unrolling) техника оптимизации компьютерных программ, состоящая в искусственном увеличении количества инструкций, исполняемых в течение одной итерации цикла …   Википедия

  • Parallélisation interprocédurale de programmes scientifiques — Pour les articles homonymes, voir PIPS. PIPS Développeur Centre de Recherc …   Wikipédia en Français

  • Space-time tradeoff — In computer science, a space time or time memory tradeoff is a situation where the memory use can be reduced at the cost of slower program execution, or vice versa, the computation time can be reduced at the cost of increased memory use. As the… …   Wikipedia

  • Switch statement — In computer programming, a switch statement is a type of control statement that exists in most modern imperative programming languages (e.g., Pascal, C, C++, C#, and Java). Its purpose is to allow the value of a variable or expression to control… …   Wikipedia

  • Binary multiplier — A binary multiplier is an electronic circuit used in digital electronics, such as a computer, to multiply two binary numbers. It is built using binary adders. A variety of computer arithmetic techniques can be used to implement a digital… …   Wikipedia