Schönhage-Strassen algorithm

﻿
Schönhage-Strassen algorithm

The Schönhage-Strassen algorithm is an asymptotically fast multiplication algorithm for large integers. It was developed by Arnold Schönhage and Volker Strassen in 1971. [A. Schönhage and V. Strassen, "Schnelle Multiplikation großer Zahlen", "Computing" 7 (1971), pp. 281–292.] The run-time bit complexity is, in Big O notation, $O\left(Ncdotlog Ncdotloglog N\right)$, while the arithmetic complexity is $O\left(N cdot log N\right)$. [Peter Bürgisser, Michael Clausen and Amin Shokrollahi, Algebraic Complexity Theory, 1997. Springer-Verlag, ISBN 3540605827.] The algorithm uses recursive Fast Fourier transforms in rings with $2^\left\{2^n\right\}+1$ elements, a specific type of number theoretic transform.

The Schönhage-Strassen algorithm was the asymptotically fastest multiplication method known from 1971 to 2007 when a new method, Fürer's algorithm, was announced with lower asymptotic complexity; [Martin Fürer, " [http://www.cse.psu.edu/~furer/Papers/mult.pdf Faster integer multiplication] ", STOC 2007 Proceedings, pp. 57-66.] however, Fürer's algorithm currently only achieves an advantage for astronomically large values and is not used in practice.

In practice the Schönhage-Strassen algorithm starts to outperform older methods such as Karatsuba and Toom–Cook multiplication for numbers beyond $2^\left\{2^\left\{15$ to $2^\left\{2^\left\{17$ (10,000 to 40,000 decimal digits). [Rodney Van Meter and Kohei M. Itoh, " [http://www.appi.keio.ac.jp/Itoh_group/publications/PhysRevA_71_052320.pdf Fast quantum modular exponentiation] ", "Physical Review" A, Vol. 71 (2005).] [ [http://magma.maths.usyd.edu.au/magma/Features/node86.html Overview of Magma V2.9 Features, arithmetic section] : Discusses practical crossover points between various algorithms.] [Luis Carlos Coronado García, " [http://www.cdc.informatik.tu-darmstadt.de/~coronado/Vortrag/MoraviaCrypt-talk-s.pdf Can Schönhage multiplication speed up the RSA encryption or decryption?] ", "University of Technology, Darmstadt" (2005)]

Details

This section explains in detail how Schönhage-Strassen is implemented. It is based primarily on an overview of the method by Crandall and Pomerance in their "Prime Numbers: A Computational Perspective". [R. Crandall & C. Pomerance. "Prime Numbers - A Computational Perspective". Second Edition, Springer, 2005. Section 9.5.6: Schönhage method, pg.459. ISBN 0-387-94777-9] Another source for detailed information is Knuth's "The Art of Computer Programming". [Donald E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms (3rd Edition), 1997. Addison-Wesley Professional, ISBN 0201896842. Section 4.3.3.C: Discrete Fourier transforms, pg.305.]

Convolutions

Suppose we are multiplying two numbers like 123 and 456 using long multiplication with base "B" digits, but without performing any carrying. The result might look something like this:

If we perform carrying on the cyclic convolution, the result is equivalent to the product of the inputs mod B"n" + 1. In the example, 103 + 1 = 1001, performing carrying on (28, 23, 5) yields 3035, and 3035 &equiv; 56088 (mod 1001). The negacyclic convolution can contain negative numbers, which can be eliminated during carrying using borrowing, as is done in long subtraction.

Convolution theorem

Like other multiplication methods based on the Fast Fourier transform, Schönhage-Strassen depends fundamentally on the convolution theorem, which provides an efficient way to compute the cyclic convolution of two sequences. It states that:

:The cyclic convolution of two vectors can be found by taking the discrete Fourier transform (DFT) of each of them, multiplying the resulting vectors element by element, and then taking the inverse discrete Fourier transform (IDFT).

Or in symbols:

:CyclicConvolution(X, Y) = IDFT(DFT(X) &middot; DFT(Y))

If we compute the DFT and IDFT using a fast Fourier transform algorithm, and invoke our multiplication algorithm recursively to multiply the entries of the transformed vectors DFT(X) and DFT(Y), this yields an efficient algorithm for computing the cyclic convolution.

In this algorithm, it will be more useful to compute the "negacyclic" convolution; as it turns out, a slightly modified version of the convolution theorem can enable this as well. Suppose the vectors X and Y have length "n", and "a" is a primitive root of unity of order 2"n" (that is, "a"2"n" = 1 and "a" to all smaller powers is not 1). Then we can define a third vector A, called the "weight vector", as:

:A = ("a""j"), 0 &le; "j" < "n":A-1 = ("a""-j"), 0 &le; "j" < "n"

Now, we can state:

:NegacyclicConvolution(X, Y) = A-1 &middot; IDFT(DFT(A &middot; X) &middot; DFT(A &middot; Y))

In other words, it's the same as before except that the inputs are first multiplied by A, and the result is multiplied by A-1.

Choice of ring

The discrete Fourier transform is an abstract operation that can be performed in any algebraic ring; typically it's performed in the complex numbers, but actually performing complex arithmetic to sufficient precision to ensure accurate results for multiplication is slow and error-prone. Instead, we will use the approach of the number theoretic transform, which is to perform the transform in the integers mod N for some integer N. Just like there are primitive roots of unity of every order in the complex plane, given any order "n" we can choose a suitable N such that "b" is a primitive root of unity of order "n" in the integers mod N (in other words, "b""n" &equiv; 1 (mod N), and no smaller power of "b" is equivalent to 1 mod N).

The algorithm will spend most of its time performing recursive multiplications of smaller numbers; with a naive algorithm, these occur in a number of places:

# Inside the fast Fourier transform algorithm, where the primitive root of unity "b" is repeatedly powered, squared, and multiplied by other values.
# When taking powers of the primitive root of unity "a" to form the weight vector A and when multiplying A or A-1 by other vectors.
# When performing element-by-element multiplication of the transformed vectors.

The key insight to Schönhage-Strassen is to choose N, the modulus, to be equal to 2"n" + 1 for some integer "n". This has a number of benefits in standard systems that represent large integers in binary form:

* Any value can be rapidly reduced modulo 2"n" + 1 using only shifts and adds, as explained in the next section.
* All primitive roots of unity in this ring can be written in the form 2"k"; consequently we can multiply or divide any number by a root of unity using a shift, and power or square a root of unity by operating only on its exponent.
* The element-by-element recursive multiplications of the transformed vectors can be performed using a negacyclic convolution, which is faster than an acyclic convolution and already has "for free" the effect of reducing its result mod 2"n" + 1.

To make the recursive multiplications convenient, we will frame Schönhage-Strassen as being a specialized multiplication algorithm for computing not just the product of two numbers, but the product of two numbers mod 2n + 1 for some given "n". This is not a loss of generality, since one can always choose "n" large enough so that the product mod 2n + 1 is simply the product.

Shift optimizations

In the course of the algorithm, there are many cases in which multiplication or division by a power of two (including all roots of unity) can be profitably replaced by a small number of shifts and adds. This makes use of the observation that:

:2"n" &equiv; −1 (mod 2"n" + 1)

This makes it simple to reduce a number represented in binary mod 2"n" + 1: take the rightmost (least significant) "n" bits, subtract the next "n" bits, add the next "n" bits, and so on until the bits are exhausted. If the resulting value is still not between 0 and 2"n", normalize it by adding or subtracting a multiple of the modulus 2"n" + 1. For example, if "n"=3 (and so the modulus is 23+1 = 9) and the number being reduced is 656, we have:

:656 = 10100100002 &equiv; 0002 - 0102 + 0102 - 12 = 0 - 2 + 2 - 1 = −1 &equiv; 8 (mod 23 + 1).

Moreover, it's possible to effect very large shifts without ever constructing the shifted result. Suppose we have a number A between 0 and 2"n", and wish to multiply it by 2"k". Dividing "k" by "n" we find "k" = "qn" + "r" with "r" < "n". It follows that:

:A(2"k") = A(2"qn" + "r") = A [(2"n")"q"(2"r")] &equiv; (−1)"q"(A shift-left "r") (mod 2"n" + 1).

Since A is &le; 2"n" and "r" < "n", A shift-left "r" has at most 2"n"−1 bits, and so only one shift and subtraction (followed by normalization) is needed.

Finally, to divide by 2"k", observe that squaring the first equivalence above yields:

:22"n" &equiv; 1 (mod 2"n" + 1)

Hence,

:A/2"k" = A(2−"k") &equiv; A(22"n" − "k") = A shift-left (2"n" − "k") (mod 2"n" + 1).

Overview

Given an input numbers "x" and "y", and an integer "N", the following algorithm computes "xy" mod 2"N" + 1:

# Split each input number into vectors X and Y of 2"k" parts each, where 2"k" divides "N". (e.g. 12345678 -> (12, 34, 56, 78)).
# In order to make progress, it's necessary to use a smaller "N" for recursive multiplications. For this purpose choose "n" as the smallest integer at least 2"n"/2"k" + "k" and divisible by 2"k".
# Compute the product of X and Y mod 2"n" + 1 using the negacyclic convolution:
## Multiply X and Y each by the weight vector using shifts (shift the "j"th entry left by "jn"/2"k").
## Compute the DFT of X and Y using the number-theoretic FFT (perform all multiplies using shifts; for the 2"k"-th root of unity, use 22"n"/2"k").
## Recursively apply this algorithm to multiply corresponding elements of the transformed X and Y.
## Compute the IDFT of the resulting vector to get the result vector C (perform all multiplies using shifts).
## Multiply the result vector C by A-1 using shifts.
## Adjust signs: some elements of the result may be negative. We compute the largest possible positive value for the "j"th element of C, (j + 1)22N/2"k", and if it exceeds this we subtract the modulus 2"n" + 1.
# Finally, perform carrying mod 2N+1 to get the final result.

In step 2, the observation is used that:
* Each element of the input vectors has at most "n"/2"k" bits;
* The product of any two input vector elements has at most 2"n"/2"k" bits;
* Each element of the convolution is the sum of at most 2"k" such products, and so cannot exceed 2"n"/2"k" + "k" bits.
* "n" must be divisible by 2"k" to ensure that in the recursive calls the condition "2"k" divides "N" holds in step 1.

The parameter "k" is fixed and in practice is typically set to a small value such as 5, 6, or 7, depending on the size of the input numbers.

Optimizations

This section explains a number of important practical optimizations that have been considered when implementing Schönhage-Strassen in real systems. It is based primarily on a 2007 work by Gaudry, Kruppa, and Zimmermann describing enhancements to the GNU Multi-Precision Library. [Pierrick Gaudry, Alexander Kruppa, and Paul Zimmermann. [http://www.loria.fr/~gaudry/publis/issac07.pdf A GMP-based Implementation of Schönhage-Strassen’s Large Integer Multiplication Algorithm] . Proceedings of the 2007 International Symposium on Symbolic and Algebraic Computation, pp.167&ndash;174.]

Below a certain cutoff point, it's more efficient to perform the recursive multiplications using other algorithms, such as Toom–Cook multiplication. The results must be reduced mod 2"n" + 1, which can be done efficiently as explained above with shifts and adds/subtracts.

Computing the IDFT involves dividing each entry by the primitive root of unity 22"n"/2"k", an operation that is frequently combined with multiplying the vector by A-1 afterwards, since both involve division by a power of two.

In a system where a large number is represented as an array of 2"w"-bit words, it's useful to ensure that the vector size 2"k" is also a multiple of the bits per word by choosing "k" &ge; "w"; this allows the inputs to be broken up into pieces without bit shifts, and provides a uniform representation for values mod 2"n" + 1 where the high word can only be zero or one.

Normalization involves adding or subtracting the modulus 2"n"+1; this value has only two bits set, which means this can be done in constant time on average with a specialized operation.

Iterative FFT algorithms such as the Cooley-Tukey FFT algorithm, although frequently used for FFTs on vectors of complex numbers, tend to exhibit very poor cache locality with the large vector entries used in Schönhage-Strassen. The straightforward recursive, not in-place implementation of FFT is more successful, with all operations fitting in the cache beyond a certain point in the call depth, but still makes suboptimal use of the cache in higher call depths. Gaudry, Kruppa, and Zimmerman used a technique combining Bailey's 4-step algorithm with higher radix transforms that combine multiple recursive steps. They also mix phases, going as far into the algorithm as possible on each element of the vector before moving on to the next one.

The "square root of 2 trick", first described by Schönhage, is to note that, provided "k" &ge; 2, 23"n"/4−2"n"/4 is a square root of 2 mod 2"n"+1, and so a 4"n"-th root of unity (since 22"n" &equiv; 1). This allows the transform length to be extended from 2"k" to 2"k" + 1.

Finally, the authors are careful to choose the right value of "k" for different ranges of input numbers, noting that the optimal value of "k" may go back and forth between the same values several times as the input size increases.

References

Wikimedia Foundation. 2010.

Look at other dictionaries:

• Schönhage-Strassen — Der Schönhage Strassen Algorithmus ist ein Algorithmus zur Multiplikation zweier n stelliger ganzer Zahlen. Er wurde 1971 von Arnold Schönhage und Volker Strassen entwickelt.[1] Der Algorithmus basiert auf einer „superschnellen“ Variante der… …   Deutsch Wikipedia

• Schönhage-Strassen-Algorithmus — Der Schönhage Strassen Algorithmus ist ein Algorithmus zur Multiplikation zweier n stelliger ganzer Zahlen. Er wurde 1971 von Arnold Schönhage und Volker Strassen entwickelt.[1] Der Algorithmus basiert auf einer „superschnellen“ Variante der… …   Deutsch Wikipedia

• Multiplication algorithm — A multiplication algorithm is an algorithm (or method) to multiply two numbers. Depending on the size of the numbers, different algorithms are in use. Efficient multiplication algorithms have existed since the advent of the decimal system.… …   Wikipedia

• Fürer's algorithm — is an integer multiplication algorithm for very large numbers possessing a very low asymptotic complexity. It was created in 2007 by Martin Fürer of Pennsylvania State UniversityFuhrer, M. (2007). Faster Integer Multiplication in Proceedings of… …   Wikipedia

• Arnold Schönhage — (born 1934) is a mathematician and computer scientist and Professor Emeritus at Rheinische Friedrich Wilhelms Universität, Bonn. He was also professor in Tübingen and Konstanz. Schönhage now lives near Bonn, Germany.Schönhage together with Volker …   Wikipedia

• Volker Strassen — is a German mathematician. He received in 2003, with three others, the Paris Kanellakis Award of the ACM, for the Solovay Strassen primality test.In 1971 Strassen published a paper together with Arnold Schönhage on asymptotically fastinteger… …   Wikipedia

• Cipolla's algorithm — In computational number theory, Cipolla s algorithm is a technique for solving a congruence of the form x2 = n, where , so n is the square of x, and where p is an odd prime. Here denotes the finite field with p elements; . Th …   Wikipedia

• Williams' p + 1 algorithm — In computational number theory, Williams p + 1 algorithm is an integer factorization algorithm, one of the family of algebraic group factorisation algorithms. It was invented by Hugh C. Williams in 1982. It works well if the number N to be… …   Wikipedia

• Cornacchia's algorithm — In computational number theory, Cornacchia s algorithm is an algorithm for solving the Diophantine equation x2 + dy2 = m, where and d and m are coprime. The algorithm was described in 1908 by Giuseppe Cornacchia[1]. Contents 1 The algorithm …   Wikipedia

• Karatsuba algorithm — The Karatsuba multiplication algorithm, a technique for quickly multiplying large numbers, was discovered by Anatolii Alexeevich Karatsuba in 1960 and published in the joint paper with Yu. Ofman in 1962. It reduces the multiplication of two n… …   Wikipedia