General Matrix Multiply

The General Matrix Multiply (GEMM) is a subroutine in the Basic Linear Algebra Subprograms (BLAS) which performs matrix multiplication, that is the multiplication of two matrices. This includes:

SGEMM for single precision,
DGEMM for double-precision,
CGEMM for complex single precision, and
ZGEMM for complex double precision.

GEMM is often tuned by High Performance Computing vendors to run as fast as possible, because it is the building block for so many other routines. It is also the most important routine in the LINPACK benchmark. For this reason, implementations of fast BLAS library typically focus first on GEMM performance.

Operation

The xGEMM routine calculates the new value of matrix C based on the matrix-product of matrices A and B, and the old value of matrix C

$C \leftarrow \alpha A B + \beta C$

where $α$ and $β$ values are scalar coefficients.

The Fortran interface for these procedures are:

 SUBROUTINE xGEMM ( TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC )

where TRANSA and TRANSB determines if the matrices A and B are to be transposed. M is the number of rows in matrix C and, depending on TRANSA, the number of rows in the original matrix A or its transpose. N is the number of columns in matrix C and, depending on TRANSB, the number of columns in the matrix B or its transpose. K is the number of columns in matrix A (or its transpose) and rows in matrix B (or its transpose). LDA, LDB and LDC specifies the size of the first dimension of the matrices, as laid out in memory; meaning the memory distance between the start of each row/column, depending on the memory structure (Dongarra et al. 1990).

The C interface for these procedures are similar:

 void cblas_xgemm (
     const enum CBLAS_ORDER Order,
     const enum CBLAS_TRANSPOSE TransA,
     const enum CBLAS_TRANSPOSE TransB,
     const int M,
     const int N,
     const int K,
     const SCALAR alpha,
     const TYPE * A,
     const int lda,
     const TYPE * B,
     const int ldb,
     const SCALAR beta,
     TYPE * C,
     const int ldc)

where 'x' in name is s,d,c, or z meaning single precision real, double precision real, single precision complex, or double precision complex, and other types have similar meanings as in Fortran interface.^{[citation needed]}

For the C interface, a higher API also exists. For example, in the GNU Scientific Library, a corresponding interface is (take the double precision real case for instance):

 int gsl_blas_dgemm(
     CBLAS_TRANSPOSE_t TransA,
     CBLAS_TRANSPOSE_t TransB,
     double alpha,
     const gsl_matrix * A,
     const gsl_matrix * B,
     double beta,
     gsl_matrix * C)

where TransA is CblasNoTrans or CblasTrans for A or transposed A respectively, and TransB has similar choices (GSL Team 2007).

Optimization

Not only is GEMM an important building block of other numeric software, it is often an important building block for calls to GEMM for larger matrices. By decomposing one or both of the input matrices into block matrices, GEMM can be used repeatedly on the smaller blocks to build up a result for the full matrix. This is one of the motivations for including the $β$ parameter, so the results of previous blocks can be accumulated. Note that this decomposition requires the special case $β = 1$ which many implementations optimize for, thereby eliminating one multiplication for each value of C.

This decomposition allows for better locality of reference both in space and time of the data used in the product. This, in turn, takes advantage of the cache on the system (Golub & Van Loan 1996, Ch. 1). For systems with more than one level of cache, the blocking can be applied a second time to the order in which the blocks are used in the computation. Both of these levels of optimization are used in implementations such as ATLAS. More recently, implementations by Kazushige Goto have shown that blocking only for the L2 cache, combined with careful amortizing of copying to contiguous memory to reduce TLB misses, is superior to ATLAS. A highly tuned implementation based on these ideas are part of the GotoBLAS.

References

Dongarra, Jack J.; Croz, Jeremy Du; Hammarling, Sven; Duff, Iain S. (1990), "A set of level 3 basic linear algebra subprograms", ACM Transactions on Mathematical Software 16 (1): 1–17, doi:10.1145/77626.79170, ISSN 0098-3500 .
Golub, Gene H.; Van Loan, Charles F. (1996), Matrix Computations (3rd ed.), Johns Hopkins, ISBN 978-0-8018-5414-9 .
GSL Team (2007), "§12.1.3 Level 3 GSL BLAS Interface", GNU Scientific Library. Reference Manual, http://www.gnu.org/software/gsl/manual/html_node/Level-3-GSL-BLAS-Interface.html .
Goto, Kazushige; van de Geijn, Robert A. (2008), "Anatomy of High-Performance Matrix Multiplication", ACM Transactions on Mathematical Software 34 (3): Article 12, 25 pages, doi:10.1145/1356052.1356053 .

v · d · eNumerical linear algebra

Key concepts	Floating point · Numerical stability · BLAS

Problems	Matrix multiplication · Matrix decompositions · Linear equations · Sparse problems

Hardware	CPU cache · TLB · Cache-oblivious algorithm · SIMD · Multiprocessing

Software	Specialized libraries · General purpose software

Categories:

C libraries
Fortran libraries
Numerical linear algebra
Numerical software

Wikimedia Foundation. 2010.

Игры ⚽ Нужно сделать НИР?

Look at other dictionaries:

Matrix multiplication — In mathematics, matrix multiplication is a binary operation that takes a pair of matrices, and produces another matrix. If A is an n by m matrix and B is an m by p matrix, the result AB of their multiplication is an n by p matrix defined only if… … Wikipedia
Matrix chain multiplication — is an optimization problem that can be solved using dynamic programming. Given a sequence of matrices, we want to find the most efficient way to multiply these matrices together. The problem is not actually to perform the multiplications, but… … Wikipedia
Matrix exponential — In mathematics, the matrix exponential is a matrix function on square matrices analogous to the ordinary exponential function. Abstractly, the matrix exponential gives the connection between a matrix Lie algebra and the corresponding Lie group.… … Wikipedia
Multiply–accumulate operation — In computing, especially digital signal processing, the multiply–accumulate operation is a common step that computes the product of two numbers and adds that product to an accumulator. The hardware unit that performs the operation is known as a… … Wikipedia
Multiply-accumulate — In computing, especially digital signal processing, multiply accumulate is a common operation that computes the product of two numbers and adds that product to an accumulator.: a leftarrow a + b imes cWhen done with floating point numbers it… … Wikipedia
Matrix Math Extensions — Intel Prozessor mit MMX Die Multi Media Extension (kurz MMX) ist eine Anfang 1997 von Intel auf den Markt gebrachte Rechnerarchitektur, die es erlaubt, größere Datenmengen parallelisiert und somit schneller zu verarbeiten. Die… … Deutsch Wikipedia
Triangular matrix — In the mathematical discipline of linear algebra, a triangular matrix is a special kind of square matrix where the entries either below or above the main diagonal are zero. Because matrix equations with triangular matrices are easier to solve… … Wikipedia
Rotation matrix — In linear algebra, a rotation matrix is a matrix that is used to perform a rotation in Euclidean space. For example the matrix rotates points in the xy Cartesian plane counterclockwise through an angle θ about the origin of the Cartesian… … Wikipedia
Data General Nova — System Data General Nova 1200 front panel … Wikipedia
Positive-definite matrix — In linear algebra, a positive definite matrix is a matrix that in many ways is analogous to a positive real number. The notion is closely related to a positive definite symmetric bilinear form (or a sesquilinear form in the complex case). The… … Wikipedia

Academic Dictionaries and Encyclopedias

General Matrix Multiply

Operation

Optimization

References

Look at other dictionaries:

Share the article and excerpts

Academic Dictionaries and Encyclopedias

Wikipedia

General Matrix Multiply

Operation

Optimization

References

Look at other dictionaries:

Share the article and excerpts

Direct link