Error detection and correction

In mathematics, computer science, telecommunication, and information theory, error detection and correction has great practical importance in maintaining data (information) integrity across noisy channels and less-than-reliable storage media.

General definitions of terms

Definitions of Error detection and error correction:
* Error detection is the ability to detect the presence of errors caused by noise or other impairments during transmission from the transmitter to the receiver.
* Error correction is the additional ability to reconstruct the original, error-free data.

There are two basic ways to design the channel code and protocol for an error correcting system:
* Automatic repeat-request (ARQ): The transmitter sends the data and also an error detection code, which the receiver uses to check for errors, and requests retransmission of erroneous data. In many cases, the request is implicit; the receiver sends an acknowledgement (ACK) of correctly received data, and the transmitter re-sends anything not acknowledged within a reasonable period of time.
* Forward error correction (FEC): The transmitter encodes the data with an error-correcting code (ECC) and sends the coded message. The receiver never sends any messages back to the transmitter. The receiver decodes what it receives into the "most likely" data. The codes are designed so that it would take an "unreasonable" amount of noise to trick the receiver into misinterpreting the data.

It is possible to combine the two, so that minor errors are corrected without retransmission, and major errors are detected and a retransmission requested. The combination is called hybrid automatic repeat-request.

Error detection schemes

In telecommunication, a "redundancy check" is extra data added to a message for the purposes of error detection.

Several schemes exist to achieve error detection, and are generally quite simple. All error detection codes (which include all error-detection-and-correction codes) transmit more bits than were in the original data. Most codes are "systematic": the transmitter sends a fixed number of original data bits, followed by fixed number of "check bits" (usually referred to as redundancy in the literature) which are derived from the data bits by some deterministic algorithm. The receiver applies the same algorithm to the received data bits and compares its output to the received check bits; if the values do not match, an error has occurred at some point during the transmission. In a system that uses a "non-systematic" code, such as some raptor codes, data bits are transformed into at least as many code bits, and the transmitter sends only the code bits.

Repetition schemes

Variations on this theme exist. Given a stream of data that is to be sent, the data is broken up into blocks of bits, and in sending, each block is sent some predetermined number of times. For example, if we want to send "1011", we may repeat this block three times each.

Suppose we send "1011 1011 1011", and this is received as "1010 1011 1011". As one group is not the same as the other two, we can determine that an error has occurred. This scheme is not very efficient, and can be susceptible to problems if the error occurs in exactly the same place for each group (e.g. "1010 1010 1010" in the example above will be detected as correct in this scheme).

The scheme however is extremely simple, and is in fact used in some transmissions of numbers stations.Fact|date=July 2008

Parity schemes

:"Main article": Parity bitA "parity bit" is an "error detection" mechanism that can only detect an odd number of errors.

The stream of data is broken up into blocks of bits, and the number of 1 bits is counted. Then, a "parity bit" is set (or cleared) if the number of one bits is odd (or even). (This scheme is called even parity; odd parity can also be used.) If the tested blocks overlap, then the parity bits can be used to isolate the error, and even correct it if the error affects a single bit: this is the principle behind the Hamming code.

There is a limitation to parity schemes. A parity bit is only guaranteed to detect an odd number of bit errors (one, three, five, and so on). If an even number of bits (two, four, six and so on) are flipped, the parity bit appears to be correct, even though the data is corrupt.


: "Main article": ChecksumA checksum of a message is an arithmetic sum of message code words of a certain word length, for example byte values, and their carry value. The sum is negated by means of ones-complement, and stored or transferred as an extra code word extending the message.

On the receiver side, a new checksum may be calculated, from the extended message. If the new checksum is not 0, error is detected.

Checksum schemes include parity bits, check digits, and longitudinal redundancy check.

Cyclic redundancy checks

: "Main article": Cyclic redundancy checkMore complex error detection (and correction) methods make use of the properties of finite fields and polynomials over such fields.

The cyclic redundancy check considers a block of data as the coefficients to a polynomial and then divides by a fixed, predetermined polynomial. The coefficients of the result of the division is taken as the redundant data bits, the CRC.

On reception, one can recompute the CRC from the payload bits and compare this with the CRC that was received. A mismatch indicates that an error occurred.

Hamming distance based checks

If we want to detect "d" bit errors in an "n" bit word we can map every "n" bit word into a bigger "n+d+1" bit word so that the minimum Hamming distance between each valid mapping is "d+1". This way, if one receives a "n+d+1" word that doesn't match any word in the mapping (with a Hamming distance "x" <= "d+1" from any word in the mapping) it can successfully detect it as an errored word. Even more, "d" or fewer errors will never transform a valid word into another, because the Hamming distance between each valid word is at least "d+1", and such errors only lead to invalid words that are detected correctly.Given a stream of "m*n" bits, we can detect "x <= d" bit errors successfully using the above method on every "n" bit word. In fact, we can detect a maximum of "m*d" errors if every "n" word is transmitted with maximum "d" errors.

Hash function

Any hash function can be used as a redundancy check.

Horizontal and vertical redundancy check

Other types of redundancy check include horizontal redundancy check, vertical redundancy check and RAID#Non-standard_levels "double", "dual" or "diagonal" parity (used in Raid-DP).

Polarity schemes

One less commonly used form of error correction and detection is transmitting a polarity reversed bitstream simultaneously with the bitstream it is meant to correct. This scheme is very weak at detecting bit errors, and marginally useful for byte or word error detection and correction. However, at the physical layer in the OSI model, this scheme can aid in error correction and detection.

Polarity symbol reversal is (probably) the simplest form of Turbo code, but technically not a Turbo code at all.
* Turbo codes DO NOT work at the bit level.
* Turbo codes typically work at the character or symbol level depending on their placement in the OSI model.
* Character here refers to Baudot, ASCII-7, the 8-bit byte or the 16-bit word.

Original transmitted symbol "1011"
* transmit 1011 on carrier wave 1 (CW1)
* transmit 0100 on carrier wave 2 (CW2)

Receiver end
* do bits polarities of (CW1) <> (CW2)?
* if CW1 = CW2, signal "bit error" (triggers more complex ECC)

This polarity reversal scheme works fairly well at low data rates (below 300 baud) with very redundant data like telemetry data.Specify|date=February 2007

Error correction

Automatic repeat request

Automatic Repeat-reQuest (ARQ) is an error control method for data transmission which makes use of error detection codes, acknowledgment and/or negative acknowledgement messages and timeouts to achieve reliable data transmission. An acknowledgment is a message sent by the receiver to the transmitter to indicate that it has correctly received a data frame.

Usually, when the transmitter does not receive the acknowledgment before the timeout occurs (i.e. within a reasonable amount of time after sending the data frame), it retransmits the frame until it is either correctly received or the error persists beyond a predetermined number of retransmissions.

A few types of ARQ protocols are Stop-and-wait ARQ, Go-Back-N ARQ and Selective Repeat ARQ.

Hybrid ARQ is a combination of ARQ and forward error correction.

Error-correcting code

An error-correcting code (ECC) or forward error correction (FEC) code is a code in which each data signal conforms to specific rules of construction so that departures from this construction in the received signal can generally be automatically detected and corrected. It is used in computer data storage, for example in dynamic RAM, and in data transmission.

The basic idea is for the transmitter to apply one or more of the above error detecting codes;then the receiver uses those codes to narrow down exactly where in the message the error (if any) is.If there is a single bit error in transmission, the decoder uses those error detecting codes to narrow down the error to a single bit (1 or 0), then fix that error by flipping that bit. (Later we will discuss ways of dealing with more than one error per message.)

* repetition schemes -- if the transmitter repeats each data bit at least 3 different times (triple modular redundancy), the receiver can correct any single-bit error by taking the majority vote of the received data bits.
* parity schemes -- if the transmitter sends parity bits covering various overlapping groups of the data bits, a single-bit error will cause a parity error in every group that covers that erroneous bit. The receiver can correct any single-bit error by flipping the one bit covered by every group that indicates an error, but not covered by any group that checks out good. There are a wide variety of parity-based codes, differing in exactly how groups of data bits are chosen. We will discuss them later.
* cyclic redundancy checks -- when a transmitter adds a CRC code to a message, any single-bit error will cause the received CRC to differ from the receiver-calculated CRC. If the message is short enough, the receiver can figure out exactly which bit was flipped, and correct it -- Header Error Correction.
* Hamming distance based checks -- Since it takes many bit errors to convert one valid Hamming code word to any other valid Hamming code word, the receiver can correct any single-bit error in a word by finding the "closest" valid Hamming code, the one code word that has only 1 bit different from the received word.

Some codes can correct a certain number of bit errors and only detect further numbers of bit errors. Codes which can correct one error are termed single error correcting (SEC), and those which detect two are termed double error detecting (DED). Hamming codes can correct single-bit errors and detect double-bit errors (SEC-DED) -- more sophisticated codes correct and detect even more errors.

An error-correcting code which corrects all errors of up to "n" bits correctly is also an error-detecting code which can detect at least all errors of up to 2"n" bits.

Two main categories are convolutional codes and block codes. Examples of the latter are Hamming code, BCH code, Reed-Solomon code, Reed-Muller code, Binary Golay code, and low-density parity-check codes.

Shannon's theorem is an important theorem in error correction which describes the maximum attainable efficiency of an error-correcting scheme versus the levels of noise interference expected. In general, these methods put redundant information into the data stream following certain algebraic or geometric relations so that the decoded stream, if damaged in transmission, can be corrected. The effectiveness of the coding scheme is measured in terms of code rate, which is the code length divided by the useful information, and the Coding gain, which is the difference of the SNR levels of the uncoded and coded systems required to reach the same BER levels.

Error-correcting memory

Because soft errors are extremely common in the DRAM of computers used in satellites and space probes, such memory is structured as ECC memory (also called "EDAC protected memory").Typically every bit of memory is refreshed at least 15 times per second.During this memory refresh, the memory controller reads each word of memory and writes the (corrected) word back. Fact|date=April 2007Such memory controllers traditionally use a Hamming code, although some use triple modular redundancy.Even though a single cosmic ray can upset many physically neighboring bits in a DRAM, such memory systems are designed so that neighboring bits belong to different words, so such single event upsets (SEUs) cause only a single error in any particular word, and so can be corrected by a single-bit error correcting code.As long as no more than a single bit in any particular word is hit by an error between refreshes, such a memory system presents the illusion of an error-free memory. [ ]


Applications that require low latency (such as telephone conversations) cannot use Automatic Repeat reQuest (ARQ); they must use Forward Error Correction (FEC). By the time an ARQ system discovers an error and re-transmits it, the re-sent data will arrive too late to be any good.

Applications where the transmitter immediately forgets the information as soon as it is sent (such as most television cameras) cannot use ARQ; they must use FEC because when an error occurs, the original data is no longer available. (This is also why FEC is used in data storage systems such as RAID and distributed data store).

Applications that use ARQ must have a return channel. Applications that have no return channel cannot use ARQ.

Applications that require extremely low error rates (such as digital money transfers) must use ARQ.

The Internet

In a typical TCP/IP stack, error detection is performed at multiple levels:
* Each Ethernet frame carries a CRC-32 checksum. The receiver discards frames if their checksums do not match.
* The IPv4 header contains a header checksum of the contents of the header (excluding the checksum field). Packets with checksums that don't match are discarded.
* The checksum was omitted from the IPv6 header, because most current link layer protocols have error detection.
* UDP has an optional checksum. Packets with wrong checksums are discarded.
* TCP has a checksum of the payload, TCP header (excluding the checksum field) and source- and destination addresses of the IP header. Packets found to have incorrect checksums are discarded and eventually get retransmitted when the sender receives a triple-ack or a timeout occurs.

Deep-space telecommunications

NASA has used many different error correcting codes. For missions between 1969 and 1977 the Mariner spacecraft used a Reed-Muller code. The noise these spacecraft were subject to was well approximated by a "bell-curve" (normal distribution), so the Reed-Muller codes were well suited to the situation.

The Voyager 1 & Voyager 2 spacecraft transmitted color pictures of Jupiter and Saturn in 1979 and 1980.
* Color image transmission required 3 times the amount of data, so the Golay (24,12,8) code was used.Fact|date=December 2007
* This Golay code is only 3-error correcting, but it could be transmitted at a much higher data rate.
* Voyager 2 went on to Uranus and Neptune and the code was switched to a concatenated Reed-Solomon code-Convolutional code for its substantially more powerful error correcting capabilities.
* Current DSN error correction is done with dedicated hardware.
* For some NASA deep space craft such as those in the Voyager program, Cassini-Huygens (Saturn), New Horizons (Pluto) and Deep Space 1 -- the use of hardware ECC may not be feasible for the full duration of the mission.

The different kinds of deep space and orbital missions that are conducted suggest that trying to find a "one size fits all" error correction system will be an ongoing problem for some time to come.
* For missions close to the earth the nature of the "noise" is different from that on a spacecraft headed towards the outer planets.
* In particular, if a transmitter on a spacecraft far from earth is operating at a low power, the problem of correcting for noise gets larger with distance from the earth.

Satellite broadcasting (DVB)

The demand for satellite transponder bandwidth continues to grow, fueled by the desire to deliver television (including new channels and High Definition TV) and IP data. Transponder availability and bandwidth constraints have limited this growth, because transponder capacity is determined by the selected modulation scheme and Forward error correction (FEC) rate.

* QPSK coupled with traditional Reed Solomon and Viterbi codes have been used for nearly 20 years for the delivery of digital satellite TV.
* Higher order modulation schemes such as 8PSK, 16QAM and 32QAM have enabled the satellite industry to increase transponder efficiency by several orders of magnitude.
* This increase in the information rate in a transponder comes at the expense of an increase in the carrier power to meet the threshold requirement for existing antennas.
* Tests conducted using the latest chipsets demonstrate that the performance achieved by using Turbo Codes may be even lower than the 0.8 dB figure assumed in early designs.

Data storage

Error detection and correction codes are often used to improve the reliability of data storage media.

A "parity track" was present on the first magnetic tape data storage in 1951. The "Optimal Rectangular Code" used in group code recording tapes not only detects but also corrects single-bit errors.

Some file formats, particularly archive formats, include a checksum (most often CRC32) to detect corruption and truncation and can employ redundancy and/or parity files to recover portions of corrupted data.

Reed Solomon codes are used in compact discs to correct errors caused by scratches.

Modern hard drives use CRC codes to detect and Reed-Solomon codes to correct minor errors in sector reads, and to recover data from sectors that have "gone bad" and store that data in the spare sectors. [ [ My Hard Drive Died | Scott A. Moulton ] ]

RAID systems use a variety of error correction techniques, to correct errors when a hard drive completely fails.

The Vandermonde matrix is used in some RAID techniques.

Information theory and error detection and correction

Information theory tells us that whatever the probability of error in transmission or storage, it is possible to construct error-correcting codes in which the likelihood of failure is arbitrarily low, although this requires adding increasing amounts of redundant data to the original, which might not be practical when the error probability is very high. Shannon's theorem sets an upper bound to the error correction rate that can be achieved (and thus the level of noise that can be tolerated) using a fixed amount of redundancy, but does not tell us how to construct such an optimal encoder.

Error-correcting codes can be divided into block codes and convolutional codes. Other block error-correcting codes, such as Reed-Solomon codes, transform a chunk of bits into a (longer) chunk of bits in such a way that errors up to some threshold in each block can be detected and corrected.

However, in practice errors often occur in bursts rather than at random. This is often compensated for by shuffling (interleaving) the bits in the message after coding. Then any burst of bit-errors is broken up into a set of scattered single-bit errors when the bits of the message are unshuffled (de-interleaved) before being decoded.

List of error-correction, error-detection methods

This list contains methods of error correction (Reed-Solomon, for example is a method) and practical techniques for error correction (like the Check digit, a practical method).
* Berger code
* Chipkill, an application of ECC techniques to volatile system memory.
* Constant-weight code
* Convolutional codes are usually decoded with iterative Viterbi decoding techniques
* Differential space–time codes, related to space–time block codes.
* Dual modular redundancy, subset of N-modular redundancy, related to triple modular redundancy
* Erasure codes are a superset of Fountain codes
* Forward error correction
* Group code
* Golay code, the Binary Golay codes are the most commonly used Golay codes
* Goppa code that is used to create the McEliece cryptosystem
* Hadamard code
* Hagelbarger code
* Hamming code
* Lexicographic code
* Longitudinal redundancy check
* Low-density parity-check code
* LT codes are near optimal rateless erasure correcting codes.
* m of n codes
* Online codes are an example of rateless erasure codes.
* Parity bit
* Raptor codes are high speed (near real time) fountain codes.
* Reed-Solomon error correction
* Reed-Muller code
* Repeat-accumulate code
* Sparse graph code
* Space–time code
* Space–time trellis code
* Tornado codes are optimal Fountain codes
* Triple modular redundancy
* Turbo code
* Viterbi algorithm
* Walsh code used in cellular telephony for its high noise immunity, not just its ECC capabilities

Practical uses of Error Correction methods
* Concatenated error correction codes, the Compact Disc and Voyager Program spacecraft use concatenated error correction technologies
* Check digit, commonly used on UPC barcodes
* Luhn algorithm, the most commonly used base 10 checksum that can perform limited error detection but not error correction
* Luhn mod N algorithm, the above algorithm but implementable in a non base 10 form
* Verhoeff algorithm, a modular based form not related to the Luhn algorithms that can detect most forms of transposition errors in financial cryptographic applications

See also

* Repetition schemes:* Triple modular redundancy
* Error correction standardization:* Federal Standard 1037C:* MIL-STD-188
* Errors and residuals in statistics
* Research Conferences on Error Correction:* International Symposium On Turbo Codes,


External links

* [ The on-line textbook: Information Theory, Inference, and Learning Algorithms] , by David MacKay, contains chapters on elementary error-correcting codes; on the theoretical limits of error-correction; and on the latest state-of-the-art error-correcting codes, including low-density parity-check codes, turbo codes, and fountain codes.
* Article: [ Memory errors and SECDED]
* [ Compute parameters of linear codes] - an on-line interface for generating and computing parameters (e.g. minimum distance, covering radius) of linear error-correcting codes.
* [ ECC Page]
* [ Digital communication blog ]

Wikimedia Foundation. 2010.

Look at other dictionaries:

  • error detection and correction —    A mechanism used to determine whether transmission errors have occurred and, if so, to correct those errors.    Some programs or transmission protocols simply request a retransmission of the affected block of data if an error is detected. More …   Dictionary of networking

  • Error Checking and Correction — Dieser Artikel oder Abschnitt bedarf einer Überarbeitung. Näheres ist auf der Diskussionsseite angegeben. Hilf mit, ihn zu verbessern, und entferne anschließend diese Markierung. Fehlerkorrekturverfahren (englisch: Error Correction Code, kurz… …   Deutsch Wikipedia

  • Error Detection and Handling — In television technology, Error Detection and Handling (EDH) protocol is an optional but commonly used addition to the Standard Definition Serial Digital Interface (SDI) standard. This protocol allows an SD SDI receiver to verify that each field… …   Wikipedia

  • automatic error detection/correction —    A feature that automatically detects and corrects errors in a data transmission signal …   IT glossary of terms, acronyms and abbreviations

  • Error — The word error has different meanings and usages relative to how it is conceptually applied. The concrete meaning of the Latin word error means wandering or straying . To the contrary of an illusion, an error or a mistake can sometimes be… …   Wikipedia

  • Detection error tradeoff — A Detection Error Tradeoff graph is a graphical plot of error rates for binary classification systems, plotting false reject rate vs. false accept rate.[1] The x and y axes are scaled non linearly by their Normal Deviates, yielding tradeoff… …   Wikipedia

  • error —    The difference between the expected and the actual.    In computing, the way that the operating system reports unexpected, unusual, impossible, or illegal events is by displaying an error number or error message. Errors range from trivial,… …   Dictionary of networking

  • error correction — ˈerror corˌrection noun [uncountable] COMPUTING the process of finding and removing mistakes in a series of binary numbers (= where the numbers 0 and 1 are repeated many times ) in a piece of digital electronic equipment * * * error correction UK …   Financial and business terms

  • error — noun ADJECTIVE ▪ egregious (esp. AmE), fundamental, glaring, grave, great, grievous, major, serious ▪ The report contained some glaring errors …   Collocations dictionary

  • Error-correcting code — Dieser Artikel oder Abschnitt bedarf einer Überarbeitung. Näheres ist auf der Diskussionsseite angegeben. Hilf mit, ihn zu verbessern, und entferne anschließend diese Markierung. Fehlerkorrekturverfahren (englisch: Error Correction Code, kurz… …   Deutsch Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”