Quadruple-precision floating-point format

In computing, quadruple precision (also commonly shortened to quad precision) is a binary floating-point computer number format that occupies 16 bytes (128 bits) in computer memory.
In IEEE 754-2008 the 128-bit base-2 format is officially referred to as binary128.
IEEE 754 quadruple-precision binary floating-point format: binary128
The IEEE 754 standard specifies binary128 as having:
 Sign bit: 1
 Exponent width: 15
 Significand precision: 113 (112 explicitly stored)
The format is written with an implicit lead bit with value 1 unless the exponent is stored with all zeros. Thus only 112 bits of the significand appear in the memory format, but the total precision is 113 bits (approximately 34 decimal digits, since log₁₀(2¹¹³) ≈ 34.016). The bits are laid out, from most to least significant, as: sign (1 bit), exponent (15 bits), significand fraction (112 bits).
Exponent encoding
The quadruple-precision binary floating-point exponent is encoded using an offset-binary representation, with the zero offset being 16383; this is also known as the exponent bias in the IEEE 754 standard.
 E_{min} = 0x0001 − 0x3fff = −16382
 E_{max} = 0x7ffe − 0x3fff = 16383
 Exponent bias = 0x3fff = 16383
Thus, as defined by the offset-binary representation, the true exponent is obtained by subtracting the offset of 16383 from the stored exponent.
The stored exponents 0x0000 and 0x7fff are interpreted specially.
Exponent              Significand zero    Significand non-zero
0x0000                0, −0               subnormal numbers: (−1)^{signbit} × 2^{−16382} × 0.significandbits₂
0x0001, ..., 0x7ffe   normalized value: (−1)^{signbit} × 2^{exponentbits − 16383} × 1.significandbits₂
0x7fff                ±infinity           NaN (quiet, signalling)

The maximum representable value is ≈ 1.1897 × 10^{4932}.
Quadruple-precision examples
These examples are given in bit representation, in hexadecimal, of the floating-point value. This includes the sign, (biased) exponent, and significand.
3fff 0000 0000 0000 0000 0000 0000 0000 = 1
c000 0000 0000 0000 0000 0000 0000 0000 = −2
7ffe ffff ffff ffff ffff ffff ffff ffff ≈ 1.189731495357231765085759326628007 × 10^{4932} (max quadruple precision)
0000 0000 0000 0000 0000 0000 0000 0000 = 0
8000 0000 0000 0000 0000 0000 0000 0000 = −0
7fff 0000 0000 0000 0000 0000 0000 0000 = infinity
ffff 0000 0000 0000 0000 0000 0000 0000 = −infinity
3ffd 5555 5555 5555 5555 5555 5555 5555 ≈ 1/3
By default, 1/3 rounds down like double precision, because of the odd number of bits in the significand. The bits beyond the rounding point are 0101..., which is less than 1/2 of a unit in the last place.

Double-double arithmetic
A common software technique to implement nearly quadruple precision using pairs of double-precision values is sometimes called double-double arithmetic.^{[1]}^{[2]}^{[3]} Using pairs of IEEE double-precision values with 53-bit significands, double-double arithmetic can represent operations with at least^{[1]} a 2 × 53 = 106-bit significand (and possibly 107 bits via clever use of the sign bit^{[4]}), only slightly less precise than the 113-bit significand of IEEE binary128 quadruple precision. The range of a double-double remains essentially the same as the double-precision format because the exponent still has 11 bits,^{[1]} significantly lower than the 15-bit exponent of IEEE quadruple precision (a range of about 10^{−308} to 10^{308} for double-double versus about 10^{−4932} to 10^{4932} for binary128).
In particular, a double-double/quadruple-precision value q in the double-double technique is represented implicitly as a sum q = x + y of two double-precision values x and y, each of which supplies half of q's significand.^{[2]} That is, the pair (x, y) is stored in place of q, and operations on q values (+, −, ×, ...) are transformed into equivalent (but more complicated) operations on the x and y values. Thus, arithmetic in this technique reduces to a sequence of double-precision operations; since double-precision arithmetic is commonly implemented in hardware, double-double arithmetic is typically substantially faster than more general arbitrary-precision arithmetic techniques.^{[2]}^{[1]}
Implementations
Quadruple precision is almost always implemented in software by a variety of techniques (such as the double-double technique above, although that technique does not implement IEEE quadruple precision), since direct hardware support for quadruple precision is extremely rare. One can use general arbitrary-precision arithmetic libraries to obtain quadruple (or higher) precision, but specialized quadruple-precision implementations may achieve higher performance.
Computer-language support
A separate question is the extent to which quadrupleprecision types are directly incorporated into computer programming languages.
Quadruple precision is specified in Fortran by the REAL*16 or REAL(KIND=16) type, although this type is not supported by all compilers. (Quadruple-precision REAL*16 is supported by the Intel Fortran Compiler^{[5]} and by the GNU Fortran compiler^{[6]} on x86, x86-64, and Itanium architectures, for example.)

In C/C++ on a few systems and compilers, quadruple precision may be specified by the long double type, but this is not required by the language (which only requires long double to be at least as precise as double), nor is it common. On x86 and x86-64, the most common C/C++ compilers implement long double as either 80-bit extended precision (e.g. the GNU C Compiler gcc^{[7]} and the Intel C++ compiler with a /Qlong-double switch^{[8]}) or simply as synonymous with double precision (e.g. Microsoft Visual C++^{[9]}), rather than as quadruple precision. On a few other architectures, some C/C++ compilers implement long double as quadruple precision, e.g. gcc on PowerPC (as double-double^{[10]}^{[11]}^{[12]}) and SPARC,^{[13]} or the Sun Studio compilers on SPARC.^{[14]} Even if long double is not quadruple precision, however, some C/C++ compilers provide a non-standard quadruple-precision type as an extension. For example, gcc provides a quadruple-precision type called __float128 for x86, x86-64 and Itanium CPUs,^{[15]} and some versions of Intel's C/C++ compiler for x86 and x86-64 supply a non-standard quadruple-precision type called _Quad.^{[16]}

Hardware support
Native support of 128-bit floats is defined in the SPARC V8^{[17]} and V9^{[18]} architectures (e.g. there are 16 quad-precision registers %q0, %q4, ...), but no SPARC CPU implements quad-precision operations in hardware.^{[19]}
As of 2005, there were no native 128-bit FPUs.^{[20]}
Non-IEEE extended precision (128 bits of storage: 1 sign bit, 7 exponent bits, 112 fraction bits, 8 bits unused) was added to the System/370 series and was available on some S/360 models (S/360-85,^{[21]} -195, and others by special request or simulated by OS software).
See also
 IEEE Standard for Floating-Point Arithmetic (IEEE 754)
 Extended precision (80-bit)
 ISO/IEC 10967, Language Independent Arithmetic
 Primitive data type
 long double
References
 ^ ^{a} ^{b} ^{c} ^{d} Yozo Hida, X. Li, and D. H. Bailey, Quad-Double Arithmetic: Algorithms, Implementation, and Application, Lawrence Berkeley National Laboratory Technical Report LBNL-46996 (2000). Also Y. Hida et al., Library for double-double and quad-double arithmetic (2007).
 ^ ^{a} ^{b} ^{c} J. R. Shewchuk, Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates, Discrete & Computational Geometry 18:305–363, 1997.
 ^ Knuth, D. E., The Art of Computer Programming (2nd ed.), chapter 4.2.3, problem 9.
 ^ Robert Munafo, F107 and F161 High-Precision Floating-Point Data Types (2011).
 ^ "Intel Fortran Compiler Product Brief". Su. http://h21007.www2.hp.com/portal/download/files/unprot/intel/product_brief_Fortran_Linux.pdf. Retrieved 20100123.
 ^ "GCC 4.6 Release Series  Changes, New Features, and Fixes". http://gcc.gnu.org/gcc4.6/changes.html. Retrieved 20100206.
 ^ i386 and x8664 Options, Using the GNU Compiler Collection.
 ^ Intel Developer Site
 ^ MSDN homepage, about Visual C++ compiler
 ^ RS/6000 and PowerPC Options, Using the GNU Compiler Collection.
 ^ Inside Macintosh — PowerPC Numerics
 ^ 128-bit long double support routines for Darwin
 ^ SPARC Options, Using the GNU Compiler Collection.
 ^ The Math Libraries, Sun Studio 11 Numerical Computation Guide (2005).
 ^ Additional Floating Types, Using the GNU Compiler Collection
 ^ Intel C++ Forums (2007).
 ^ The SPARC Architecture Manual: Version 8. SPARC International, Inc. 1992. http://www.sparc.com/standards/V8.pdf. Retrieved 2011-09-24. "SPARC is an instruction set architecture (ISA) with 32-bit integer and 32-, 64-, and 128-bit IEEE Standard 754 floating-point as its principal data types."
 ^ David L. Weaver, Tom Germond, ed (1994). The SPARC Architecture Manual: Version 9. SPARC International, Inc. http://www.sparc.org/standards/SPARCV9.pdf. Retrieved 2011-09-24. "Floating-point: The architecture provides an IEEE 754-compatible floating-point instruction set, operating on a separate register file that provides 32 single-precision (32-bit), 32 double-precision (64-bit), 16 quad-precision (128-bit) registers, or a mixture thereof."
 ^ "SPARC Behavior and Implementation". Numerical Computation Guide — Sun Studio 10. Sun Microsystems, Inc. 2004. http://download.oracle.com/docs/cd/E1905901/stud.10/8190499/ncg_sparc.html. Retrieved 2011-09-24. "There are four situations, however, when the hardware will not successfully complete a floating-point instruction: ... The instruction is not implemented by the hardware (such as ... quad-precision instructions on any SPARC FPU)."
 ^ J. Fujimoto, T. Ishikawa, D. Perret-Gallix (2005-05-05). High precision numerical computations — A case for an HAPPY design. http://fcppl.in2p3.fr/cgibin/twiki.source/pub/ACAT/PresentationsNotes/Highprecisionnumericalcomputatio3.pdf. Retrieved 2011-09-24. "But today there are no straight 128-bit floating point unit providing quadruple precision available."
 ^ "Structural aspects of the system/360 model 85: III extensions to floatingpoint architecture", Padegs, A., IBM Systems Journal, Vol:7 No:1 (March 1968), pp. 22–29
External links
 High-Precision Software Directory
 QPFloat, a free software (GPL) software library for quadrupleprecision arithmetic
 HPAlib, a free software (LGPL) software library for quadprecision arithmetic
 libquadmath, the GCC quadprecision math library