Streaming SIMD Extensions


Streaming SIMD Extensions

Streaming SIMD Extensions (SSE) is a SIMD (Single Instruction, Multiple Data) instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in their Pentium III series processors as a reply to AMD's 3DNow! (which had debuted a year earlier). SSE contains 70 new instructions.

It was originally known as KNI for "Katmai New Instructions" ("Katmai" being the code name for the first Pentium III core revision). During the Katmai project Intel was looking to distinguish it from their earlier product line, particularly their flagship Pentium II. AMD eventually added support for SSE instructions, starting with its Athlon XP processor. It was later renamed ISSE, for Internet Streaming SIMD Extensions, then SSE.

Intel was generally disappointed with their first IA-32 SIMD effort, MMX. MMX had two main problems: it re-used existing floating point registers making the CPU unable to work on both floating point and SIMD data at the same time, and it only worked on integers.

Registers

SSE originally added eight new 128-bit registers known as XMM0 through XMM7. The x86-64 extensions from AMD (originally called "AMD64") and later duplicated by Intel add a further eight registers XMM8 through XMM15. There is also a new 32-bit control/status register, MXCSR. All the 16 128 bit XMM registers are accessible only in 64-bit operating mode.

Each register packs together four 32-bit single-precision floating point numbers or two 64-bit double-precision floating point numbers or four 32-bit integers or eight 16-bit short integers or sixteen 8-bit bytes or characters. The Integer operations have instructions for signed and unsigned variants. Integer SIMD operations may still be performed with the eight 64-bit MMX registers.

Because these 128-bit registers are additional program states that the operating system must preserve across task switches, they are disabled by default until the operating system explicitly enables them. This means that the OS must know how to use the FXSAVE and FXRSTOR instructions, which is the extended pair of instructions which can save all x87 and SSE register states all at once. This support was quickly added to all major IA-32 operating systems.

Because SSE adds floating point support, it sees much more use than MMX. The addition of SSE2's integer support makes SSE even more flexible. While MMX is redundant, operations can be operated in parallel with SSE operations offering further performance increases in some situations.

The first CPU to support SSE, the Pentium III, shared execution resources between SSE and the FPU. While a compiled application can interleave FPU and SSE instructions side-by-side, the Pentium III will not issue an FPU and an SSE instruction in the same clock-cycle. This limitation reduces the effectiveness of pipelining, but the separate XMM registers do allow SIMD and scalar floating point operations to be mixed without the performance hit from explicit MMX/floating point mode switching.

SSE Instructions

SSE introduced both scalar and packed floating point instructions.

Floating point instructions

* Memory-to-Register / Register-to-Memory / Register-to-Register data movement
** Scalar – MOVSS
** Packed – MOVAPS, MOVUPS, MOVLPS, MOVHPS, MOVLHPS, MOVHLPS
* Arithmetic
** Scalar – ADDSS, SUBSS, MULSS, DIVSS, RCPSS, SQRTSS, MAXSS, MINSS, RSQRTSS
** Packed – ADDPS, SUBPS, MULPS, DIVPS, RCPPS, SQRTPS, MAXPS, MINPS, RSQRTPS
* Compare
** Scalar – CMPSS, COMISS, UCOMISS
** Packed – CMPPS
* Data shuffle and unpacking
** Packed – SHUFPS, UNPCKHPS, UNPCKLPS
* Data-type conversion
** Scalar – CVTSI2SS, CVTSS2SI, CVTTSS2SI
** Packed – CVTPI2PS, CVTPS2PI, CVTTPS2PI
* Bitwise logical operations
** Packed – ANDPS, ORPS, XORPS, ANDNPS

Integer instructions

* Arithmetic
** PMULHUW, PSADBW, PAVGB, PAVGW, PMAXUB, PMINUB, PMAXSW, PMINSW
* Data movement
** PEXTRW, PINSRW
* Other
** PMOVMSKB, PSHUFW

Other instructions

* MXCSR management
** LDMXCSR, STMXCSR
* Cache and Memory management
** MOVNTQ, MOVNTPS, MASKMOVQ, PREFETCH0, PREFETCH1, PREFETCH2, PREFETCHNTA, SFENCE

Example

The following simple example demonstrates the advantage of using SSE. Consider an operation like vector addition, which is used very often in computer graphics applications. To add two single precision, 4-component vectors together using x87 requires four floating point addition instructions

vec_res.x = v1.x + v2.x;
vec_res.y = v1.y + v2.y;
vec_res.z = v1.z + v2.z;
vec_res.w = v1.w + v2.w;

This would correspond to four x87 FADD instructions in the object code. On the other hand, as the following pseudo-code shows, a single 128 bit 'packed-add' instruction can replace the four scalar addition instructions.

movaps xmm0,address-of-v1 ;xmm0=v1.w | v1.z | v1.y | v1.x
addps xmm0,address-of-v2 ;xmm0=v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x movaps address-of-vec_res,xmm0

Later versions

*SSE2, introduced with the Pentium 4, is a major enhancement to SSE (which some programmers renamed "SSE1"). SSE2 adds new math instructions for double-precision (64-bit) floating point and also extends MMX instructions to operate on 128-bit XMM registers. Until SSE4 [see below] , SSE integer instructions introduced with later SSE extensions would still operate on 64-bit MMX registers because the new XMM registers require operating system support. SSE2 enables the programmer to perform SIMD math of virtually any type (from 8-bit integer to 64-bit float) entirely with the XMM vector-register file, without the need to touch the (legacy) MMX/FPU registers. Many programmers consider SSE2 to be "everything SSE should have been", as SSE2 offers an orthogonal set of instructions for dealing with common datatypes.

*SSE3, also called Prescott New Instructions, is an incremental upgrade to SSE2, adding a handful of DSP-oriented mathematics instructions and some process (thread) management instructions.

*SSSE3 is an incremental upgrade to SSE3, adding 16 new opcodes which include permuting the bytes in a word, multiplying 16-bit fixed-point numbers with correct rounding, and within-word accumulate instructions. SSSE3 is often mistaken for SSE4 as this term was used during the development of the Core microarchitecture.

*SSE4 is another major enhancement, adding a dot product instruction, lots of additional integer instructions, a popcnt instruction, and more. SSE4 ends MMX register support. [ [http://www.intel.com/technology/architecture-silicon/sse4-instructions/index.htm Intel® Streaming SIMD Extensions 4 Instruction Set Innovation ] ]

*SSE5 is a new iteration announced by AMD in August 2007. [cite news |url=http://www.theregister.co.uk/2007/08/30/amd_sse5/ |title=AMD plots single thread boost with x86 extensions |publisher=The Register |date=30 August 2007 |accessdate=2008-02-01] [http://developer.amd.com/sse5.jsp]

*AVX (Advanced Vector Extensions) is an advanced version of SSE announced by Intel featuring a widened data path from 128 bits to 256 bits and 3-operand instructions (up from 2). Products implementing AVX are slated for 2010. [http://www.extremetech.com/article2/0,1697,2276803,00.asp]

ee also

*Assembly language
*MMX (instruction set)
*3DNow!
*AltiVec
*Pentium III's SSE implementation

References


Wikimedia Foundation. 2010.

Look at other dictionaries:

  • Streaming SIMD Extensions — Streaming SIMD Extensions, généralement abrégé SSE, est un jeu de 70 instructions supplémentaires pour microprocesseurs x86, apparu en 1999 sur le Pentium III en réponse au 3DNow! d AMD apparu 1 an plus tôt. Le fonctionnement est de type SIMD. Il …   Wikipédia en Français

  • Streaming SIMD Extensions — Die Streaming SIMD Extensions (SSE), früher auch Internet Streaming Extensions (ISSE) ist eine von Intel entwickelte Befehlssatzerweiterung der X86er Architektur, welche mit der Einführung des Pentium III (Katmai) Prozessors vorgestellt wurde und …   Deutsch Wikipedia

  • Streaming SIMD Extensions 5 — SSE5 (Streaming SIMD Extensions 5) war eine geplante Befehlssatzerweiterung von AMD für x86 Prozessoren. Sie umfasste 47 Befehle, die vor allem zur Beschleunigung bestimmter Algorithmen, beispielsweise FFT, DCT oder der Verschlüsselung mit AES,… …   Deutsch Wikipedia

  • Streaming SIMD Extensions — SSE (англ. Streaming SIMD Extensions, потоковое SIMD расширение процессора) это англ. Single Instruction, Multiple Data, Одна инструкция множество данных) набор инструкций, разработанный Pentium III как ответ на аналогичный набор инструкций… …   Википедия

  • Streaming SIMD Extensions 2 — SSE2 (Streaming SIMD Extensions 2) ist eine x86 Befehlssatzerweiterung, die Intel mit dem Intel Pentium 4 einführte. Die mit SSE eingeführten 128 Bit Register können in SSE2 auch mit MMX Operationen verwendet werden. SSE2 ermöglicht die… …   Deutsch Wikipedia

  • Streaming SIMD Extensions 3 — Die Streaming SIMD Extensions 3 (kurz SSE3) ist die zweite Erweiterung des SSE Befehlssatzes. Sie ist auch unter dem Intel Codenamen Prescott New Instructions (PNI) bekannt, da sie zuerst bei der Prescott Variante des Pentium 4 ab Frühjahr 2004… …   Deutsch Wikipedia

  • Streaming SIMD Extensions 2 — SSE2 (англ. Streaming SIMD Extensions 2, потоковое SIMD расширение процессора)  это Pentium 4. SSE2 использует восемь 128 битных регистров (xmm0 до xmm7), включённых в архитектуру x86 с вводом расширения SSE, каждый из которых трактуется как 2… …   Википедия

  • Streaming SIMD Extensions 4 — SSE4 (Streaming SIMD Extensions 4) ist eine Befehlssatzerweiterung, die bei AMD seit K10 und bei Intel seit der Penryn Variante der Core 2 Prozessoren verwendet wird. Der zweite Teil, SSE4.2 genannt, wurde mit der Intel Nehalem Mikroarchitektur… …   Deutsch Wikipedia

  • Streaming SIMD Extensions —    Abbreviated SSE. A set of multimedia instructions introduced with Intel s Pentium III. SSE provides four main improvements over the Pentium II instruction set:    • Eight new directly addressable 128 bit floating point registers.    • Eight… …   Dictionary of networking

  • Internet Streaming SIMD Extensions — Streaming SIMD Extensions Streaming SIMD Extensions, généralement abrégé SSE, est un jeu de 70 instructions supplémentaires pour microprocesseurs x86, apparu en 1999 sur le Pentium III en réponse au 3DNow! d AMD apparu 1 an plus tôt. Le… …   Wikipédia en Français


Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”

We are using cookies for the best presentation of our site. Continuing to use this site, you agree with this.