AMD FireStream

The AMD FireStream (previously branded as both ATI FireStream and AMD Stream Processor [AMD press release]) is a stream processor developed by ATI Technologies. It targets industries such as high-performance computing (HPC), scientific research, and finance, which use stream processing/GPGPU (general-purpose computing on graphics processing units) for heavy floating-point computations. The AMD FireStream can also serve as a floating-point co-processor for offloading CPU calculations, which is part of the Torrenza initiative.


Since the release of the Radeon R520 and GeForce G70 GPU cores, programmable shader architectures with high floating-point (FP) throughput have drawn growing attention from academic and commercial groups, primarily for their ability to process data beyond their original purpose of rendering visual effects. In response, more resources were allocated to developing GPGPU products, which evaluate general-purpose mathematical formulas, to take over heavy calculations that previously ran on mainstream server and desktop central processing units (CPUs) and on specialized floating-point math co-processors. GPGPUs were projected to deliver performance gains of a factor of 10 or more over CPU-only computation.

Similar GPGPU efforts appeared in the early 2000s. BionicFX experimented with processing audio data on a GeForce 6800 video card, announcing the Audio Video EXchange (AVEX) framework [ExtremeTech report, retrieved July 17, 2007], and ATI performed similar trials at about the same time. Another example is the Folding@Home distributed computing research program from Stanford University. It was the first piece of software to use the Radeon R580 GPU and other ATI GPU cores, together with a special beta version of the ATI Catalyst driver (version 6.5), to perform computations unrelated to graphics. Since May 2006, it has used GPU cores to accelerate the simulation of protein folding in order to investigate protein-related diseases. At that time, the ATI FireStream was in its planning stages.

With the acquisition of ATI complete, AMD officially announced a rebranding and introduced the AMD Stream Processor (originally the ATI FireStream) on November 15, 2006 as the industry's first commercially available hardware stream processing solution. Based on an ATI Radeon X1900 video card, the AMD Stream Processor is a specialized add-on card built around the R580 graphics processing unit (GPU). However, it was targeted at the complex floating-point calculations used in scientific and financial fields rather than at 3D graphics acceleration. AMD claimed that this processor had 8 times the floating-point performance of traditional graphics data processing. [AMD press release: AMD “Close to Metal” Technology Unleashes the Power of Stream Computing, November 14, 2006]

In fact, ATI had put considerable effort into research and development (R&D) of a GPGPU product before its purchase by AMD [ATI DPVM SIGGRAPH 2006 sketch (PDF, 134 KiB); ATI DPVM SIGGRAPH 2006 presentation (PDF, 671 KiB)], and in 2006 announced the adoption of the stream processing/GPGPU concept in its line of GPU cores, codenamed "Radeon R580".

The brand was further renamed to AMD FireStream with the second generation of stream processors (based on a 55 nm process), released on November 8, 2007. [AMD press release, retrieved November 12, 2007] Future plans include the development of a stream processor on an MXM module, intended for embedded applications and next-generation products, in the fourth quarter of 2008.


The Radeon series graphics processors are 32-bit single-precision floating-point vector processors. Because vector processors are highly parallel, they have had a large impact on specific data processing applications. The mass client project Folding@Home has reported speed improvements of 20 to 40 times using an R580-based graphics card. [AnandTech report: ATI's Stream Processing & Folding@Home, September 30, 2006]

The Radeon R580 core includes a total of 48 pixel and vertex shaders, which become parallel processors in floating-point calculations. The ATI FireStream add-on card utilizes the PCI Express x16 interface to provide 8 GiB/s bandwidth. The card is equipped with 1 GB GDDR3 local memory while the GPU runs at 600 MHz core frequency and 1300 MHz memory frequency. The core has the ability to execute 512 threads simultaneously (Simultaneous multithreading, SMT), at a rated TDP of 165 W. The main difference between the AMD FireStream and ordinary Radeon series video cards is that the stream processor on the FireStream lacks video output connectors.

The stream processing hardware comes with a hardware interface called THIN (Thin Hardware INterface), or Close to Metal (CTM, previously named "Data Parallel Virtual Machine"), which opens the GPU architecture, in addition to its native instruction sets, to program developers. This allows direct control of the stream processors/ALUs and the memory controllers, and permits bypassing the 3D API layer.

The AMD Stream Processing lineup saw an update to the latest GPU architecture (the Radeon R600) with the release of the latest-generation FireGL video cards on August 7, 2007, which are also capable of stream processing. The architecture was manufactured on the same 80 nm fabrication process node as R580, with more parallel processors and stream processing units. In addition, the maximum GDDR4 memory was increased to 2 GB, providing a maximum of 128 GiB/s of memory bandwidth. The "R600 XTX" core-based FireGL products released (FireGL V8600 and FireGL V8650) consume more power than the first-generation ATI FireStream, with rated thermal design powers (TDP) of under 225 W and over 255 W respectively.

The second generation, the AMD FireStream 9170, is based on the RV670 core and is built on a 55 nm fabrication process. It features the industry's first hardware-based support for double-precision floating-point numbers, asynchronous DMA (giving the stream processors and onboard memory the ability to exchange data without CPU intervention), memory export functionality, and reduced power consumption (less than 150 W with 2 GB of GDDR3 memory onboard, on a PCI-E 2.0 interface providing 16 GiB/s of device I/O bandwidth).
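The interface bandwidth figures quoted for the two generations (8 and 16 GiB/s) can be reproduced from the per-lane rates of PCI Express. The following is a minimal sketch, assuming the quoted figures count both transfer directions and use decimal gigabytes; the per-lane rates are the standard post-encoding values for PCIe 1.x and 2.0, and the function name is illustrative only.

```python
# Per-lane, per-direction payload rate in MB/s (after 8b/10b encoding overhead).
PCIE_LANE_MB_PER_S = {"1.x": 250, "2.0": 500}


def pcie_bandwidth_gb_per_s(generation: str, lanes: int = 16,
                            both_directions: bool = True) -> float:
    """Return the theoretical link bandwidth in decimal GB/s.

    Assumption: the article's figures are aggregate (both directions).
    """
    per_direction = PCIE_LANE_MB_PER_S[generation] * lanes / 1000
    return per_direction * (2 if both_directions else 1)


print(pcie_bandwidth_gb_per_s("1.x"))  # x16 slot of the first-generation card
print(pcie_bandwidth_gb_per_s("2.0"))  # x16 slot of the FireStream 9170
```

These are theoretical link rates; sustained transfer rates in practice are lower because of protocol overhead.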

The latest product in the AMD FireStream line is the FireStream 9250, which is based on the RV770 core and manufactured on a 55 nm fabrication process. It features 1 TFLOPS of raw single-precision floating-point performance, 1 GiB of GDDR3 memory, and a single-slot cooler.

AMD stream processing lineup

The hardware specifications of stream processors released by AMD (and previously ATI) are summarized as follows:


:"NB1": The number of Stream Processing Units (SPUs) applies only to DirectX 10-compatible hardware, which contains unified shaders (implemented in products based on the Radeon R600 and later architectures). Each shader contains 5 ALUs, 4 for MADD/ADD/MUL/integer ADD/dot product and 1 for transcendental functions, making 320 Stream Processing Units in 64 unified shaders. Also note that the Stream Processing Units in ATI hardware are architecturally different from NVIDIA's implementation of Stream Processors in Tesla products. The SPs in NVIDIA's implementation run at a higher clock frequency (2.35x or higher) than the other parts of the core, while the SPUs in ATI's implementation run at the same clock frequency as the core.

:"NB2": The first generation of products originally used the "ATI FireStream" brand and was rebranded as "AMD Stream Processor" following AMD's acquisition of ATI. AMD refers to the Radeon RV670-based AMD FireStream 9170 as the second generation because no R600-based AMD Stream Processors were released under the stream processing lineup (although prototype cards were publicly demonstrated with configurations similar to the FireGL V8650, without video output capabilities). Since the FireGL 2007 series, the high-end and ultra-high-end FireGL products have implemented stream processing support.

:"NB3": Estimated figure.
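The unit count in note NB1 and the way a theoretical peak throughput follows from it can be sketched in a few lines. This is an illustration under stated assumptions, not AMD's own rating method: it assumes each counted ALU retires one multiply-add (2 floating-point operations) per core clock, and the 800 MHz clock in the usage example is a hypothetical value chosen for illustration, not a figure taken from this article.

```python
def stream_processing_units(unified_shaders: int, alus_per_shader: int = 5) -> int:
    """Count SPUs as described in NB1: 5 ALUs in each unified shader."""
    return unified_shaders * alus_per_shader


def peak_gflops(spus: int, core_clock_mhz: float,
                flops_per_spu_per_cycle: int = 2) -> float:
    """Theoretical single-precision peak in GFLOPS.

    Assumption: one MADD (2 FLOPs) per SPU per core clock cycle.
    """
    return spus * flops_per_spu_per_cycle * core_clock_mhz / 1000.0


spus = stream_processing_units(64)   # 64 unified shaders, as in NB1
print(spus)                          # 320
print(peak_gflops(spus, 800))        # hypothetical 800 MHz core clock
```

Because ATI's SPUs run at the core clock, the core frequency is the only clock needed here; for an architecture with a separate shader clock, that shader clock would be the value to pass in.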


The AMD FireStream was launched with a wide range of software platform support. One of the supporting firms was PeakStream (acquired by Google in June 2007), which was the first to provide an open beta version of software supporting CTM and the AMD FireStream as well as x86 and Cell (Cell Broadband Engine) processors. The FireStream was claimed to be 20 times faster than regular CPUs in typical applications when running PeakStream's software [citation needed]. RapidMind also provided stream processing software that worked with ATI and NVIDIA hardware, as well as Cell processors.

Software Development Kit

AMD first released its Stream Computing SDK (v1.0) in December 2007 under the AMD EULA, to be run on Windows XP. [Stream Computing SDK download page and Stream Computing SDK EULA, retrieved December 29, 2007] The SDK includes "Brook+", an AMD hardware-optimized version of the Brook language developed by Stanford University, which is itself an open-source variant of ANSI C optimized for stream computing. The AMD Core Math Library (ACML) and AMD Performance Library (APL), with optimizations for the AMD FireStream, and the COBRA video library for video transcoding acceleration will also be included. Another important part of the SDK, the Compute Abstraction Layer (CAL), is a software development layer that gives low-level access to the GPU architecture, through the CTM hardware interface, for performance-tuning software written in various high-level programming languages.

AMD has announced that future releases of the AMD Stream SDK will support OpenCL, a parallel computing language developed by Apple, as well as the compute shader concept in DirectX 11, Microsoft's next-generation API. [AMD press release, retrieved August 7, 2008]


In a system demonstrated by AMD [HardOCP report, retrieved July 17, 2007] with two dual-core AMD Opteron processors and two Radeon R600 GPU cores running Microsoft Windows XP Professional, 1 teraflop (TFLOP) was achieved on a general multiply-add (MADD) calculation. By comparison, an Intel Core 2 Quad 3.0 GHz processor can achieve up to 48 GFLOPS.
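The CPU figure in this comparison is consistent with a simple peak-rate product. The sketch below assumes 4 floating-point operations per cycle per core (an assumption about how the 48 GFLOPS figure was derived, not something stated in the source) and then compares it against the 1 TFLOP demonstration result.

```python
def cpu_peak_gflops(cores: int, clock_ghz: float,
                    flops_per_cycle_per_core: int) -> float:
    """Theoretical CPU peak: cores x clock x FLOPs issued per cycle per core."""
    return cores * clock_ghz * flops_per_cycle_per_core


# Quad-core at 3.0 GHz; 4 FLOPs/cycle/core is an assumed issue rate.
core2_quad = cpu_peak_gflops(cores=4, clock_ghz=3.0, flops_per_cycle_per_core=4)
demo_system_gflops = 1000.0  # the 1 TFLOP demonstration figure

print(core2_quad)                                  # 48.0
print(round(demo_system_gflops / core2_quad, 1))   # ratio of the two peaks
```

Both numbers are theoretical peaks, and the demonstration system used two GPUs, so the ratio indicates scale rather than a measured per-device speedup.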

Recent demonstrations showed that, in Kaspersky SafeStream anti-virus scanning tests optimized for AMD stream processors, a system with two AMD stream processors and dual Opteron processors reached 6.2 Gbit/s (775 MB/s) of bandwidth, 21 times faster than other dual-processor systems. The stream processor system also showed only 1-2% CPU utilization, which indicates that significant work was offloaded from the CPU to the stream processors [The Inquirer report, retrieved September 12, 2007].
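The scanning throughput can be checked by converting bits to bytes. The short sketch below (function names are illustrative) shows that 6.2 Gbit/s equals 775 decimal megabytes per second, which is roughly 739 MiB/s when expressed in binary units.

```python
def gbit_to_mb_per_s(gbit_per_s: float) -> float:
    """Convert Gbit/s to decimal megabytes per second (1 MB = 10**6 bytes)."""
    return gbit_per_s * 1e9 / 8 / 1e6


def gbit_to_mib_per_s(gbit_per_s: float) -> float:
    """Convert Gbit/s to binary mebibytes per second (1 MiB = 2**20 bytes)."""
    return gbit_per_s * 1e9 / 8 / 2**20


print(gbit_to_mb_per_s(6.2))             # decimal megabytes per second
print(round(gbit_to_mib_per_s(6.2), 1))  # binary mebibytes per second
```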


* Traditional floating-point accelerators, such as the 64-bit floating-point (FP64) CSX600 math processor from ClearSpeed used in today's supercomputers, handle double-precision data, whereas current and older GPUs from ATI (and NVIDIA) are 32-bit processors with only single-precision capabilities. [Universität Dortmund, Fachbereich Mathematik research: Accelerating Double precision on GPUs (Proceedings of ASIM 2005), Dominik Göddeke, Robert Strzodka, and Stefan Turek. 18th Symposium on Simulation Technique, 2005.]
** The second generation of stream processors (the AMD FireStream 9170), by contrast, can handle the 64-bit double-precision data used on supercomputers [TGDaily report: Nvidia activates a supercomputer in your PC, February 16, 2007]. This is a result of the FP32 filtering support required by the DirectX 10.1 API. However, double-precision operations (frequently used in supercomputer benchmarks) can in theory achieve only half the performance of single-precision operations, and the actual figures may be lower, as the GPU does not implement full double-precision units.
* Recursive functions are not supported.
* Only bilinear texture filtering is supported; mipmapped textures and anisotropic filtering are not supported at this time.
* Various deviations from the IEEE 754 standard. Denormal numbers and signaling NaNs are not supported; the rounding mode cannot be changed, and the precision of division/square root is slightly lower than single-precision.
* Functions cannot have a variable number of arguments.
* Conversion of floating-point numbers to integers on GPUs is done differently than on x86 CPUs; it is not fully IEEE-754 compliant.
* Doing "global synchronization" on the GPU is not very efficient, which forces the GPU to divide the kernel and do synchronization on the CPU. Given the variable number of multiprocessors and other factors, there may not be a perfect solution to this problem.
* The bus bandwidth and latency between the CPU and the GPU may become a bottleneck, which may be alleviated in the future by introducing interconnects with higher bandwidth.
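The practical effect of missing denormal support, one of the IEEE 754 deviations listed above, can be illustrated on the CPU. The sketch below emulates flush-to-zero, a common behavior of hardware without denormal support; the source states only that denormals are not supported, so the flush-to-zero behavior and the helper names are assumptions made for illustration.

```python
import math
import struct

FLOAT32_MIN_NORMAL = 2.0 ** -126  # smallest positive normal single-precision value


def to_float32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_denormal(x: float) -> float:
    """Emulate flush-to-zero: denormal float32 values become signed zero."""
    y = to_float32(x)
    if y != 0.0 and abs(y) < FLOAT32_MIN_NORMAL:
        return math.copysign(0.0, y)
    return y


tiny = 1e-40                    # representable in float32 only as a denormal
print(to_float32(tiny) != 0.0)  # True: IEEE 754 keeps a nonzero denormal
print(flush_denormal(tiny))     # 0.0: emulated hardware flushes it to zero
```

Algorithms that rely on gradual underflow, for example ones that test whether a difference of two nearly equal small values is nonzero, can behave differently under this emulated behavior than on a fully IEEE 754-compliant CPU.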

See also

* ATI Technologies
* Stream Processing
* NVIDIA Tesla
* Compute Unified Device Architecture (CUDA)


External links

* AMD FireStream SDK Whitepaper
* AMD FireStream papers and presentations
* Folding@Home
* AnandTech article on distributed computing
* Report: Stanford University: ATI GPU has the best general purpose computing performance
* MyDrivers review: Building a supercomputer with video cards

Wikimedia Foundation. 2010.
