Unusual software bug


Unusual software bugs are a class of software bugs that are considered exceptionally difficult to understand and repair. There are several kinds, mostly named after scientists who discovered counterintuitive things.

Bohrbug

A bohrbug (named after the Bohr atom model) is a bug that manifests itself consistently under a well-defined (but possibly unknown) set of conditions. Thus, in contrast with heisenbugs, a bohrbug does not disappear or alter its characteristics when it is investigated. Bohrbugs include the easiest bugs to fix (where the nature of the problem is obvious), but also bugs that are hard to find and fix and that remain in the software during the operational phase. Sometimes an error occurs only when a unique data set is entered or unique circumstances are encountered. Such bugs are often present in parts of the source code that are rarely invoked and thus may remain undetected for an extended period of time; they are sometimes termed a ghost in the code.

For example, an overflow bug in a by-the-book binary search algorithm may exhibit itself only when the data array under search is very large and the item to be searched for is located near the end of the array. Because programmers tend to test their work using small arrays of data, and only recently have there existed machines with enough memory to hold a sufficiently large array, such a bug may go undetected for many years.[1]
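
The overflow described above can be sketched in Python. Python integers do not overflow, so this sketch imitates signed 32-bit arithmetic with a hypothetical helper, wrap_int32; the function names and the sample indices are assumptions made for illustration and are not taken from the cited article.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Wrap an integer into the signed 32-bit range, imitating fixed-width ints."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def midpoint_naive(low: int, high: int) -> int:
    """By-the-book midpoint: the intermediate sum low + high may wrap around."""
    total = wrap_int32(low + high)
    return int(total / 2)  # truncate toward zero, as C and Java integer division do


def midpoint_safe(low: int, high: int) -> int:
    """Midpoint computed from the difference, which never exceeds high."""
    return low + (high - low) // 2


# Small indices: both versions agree, so tests on small arrays pass.
print(midpoint_naive(0, 10), midpoint_safe(0, 10))

# Indices near the end of a very large array: the naive sum wraps negative.
print(midpoint_naive(1_500_000_000, 2_000_000_000))
print(midpoint_safe(1_500_000_000, 2_000_000_000))
```

The naive version returns a negative index only when the two bounds together exceed 2^31 - 1, which is why the fault behaves as a bohrbug: it is fully repeatable, but only under conditions that small test arrays never reach.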

Mandelbug

A mandelbug (named after fractal innovator Benoît Mandelbrot) is a computer bug whose causes are so complex that its behavior appears chaotic or even non-deterministic.[2] This word also implies that the speaker thinks it is a bohrbug rather than a heisenbug.

Mandelbug is sometimes used to describe a bug whose behavior does not appear chaotic, but whose causes are so complex that there is no practical solution. An example of this is a bug caused by a flaw in the fundamental design of the entire system.

In the literature, there are inconsistent statements about the relationships between bohrbug, heisenbug, and mandelbug: according to the above definition, mandelbugs are bohrbugs. Heisenbug and bohrbug are considered antonyms. Moreover, it is claimed that all heisenbugs are mandelbugs.[3]

In a column in IEEE Computer,[4] mandelbug is considered the complementary antonym to bohrbug; i.e., a software bug is either a bohrbug or a mandelbug. The apparently complex behavior of a mandelbug is assumed to be caused either by long delays between fault activation and the failure occurrence, or by influences of other software system elements (hardware, operating system, other applications) on the fault's behavior. Heisenbugs (whose behavior is influenced by a debugger, or other means of investigating the fault) are mandelbugs.

Heisenbug

A heisenbug (named after the Heisenberg uncertainty principle) is a computer bug that disappears or alters its characteristics when an attempt is made to study it.

One common example is a bug that occurs in a program that was compiled with an optimizing compiler, but not in the same program when compiled without optimization (e.g., for generating a debug-mode version). Another example is a bug caused by a race condition. A heisenbug may also appear in a system that does not conform to the command-query separation design guideline, since a routine called more than once could return different values each time, generating hard-to-reproduce bugs in a race condition scenario.

The name heisenbug is a pun on the Heisenberg uncertainty principle, a quantum physics concept that is commonly (yet inaccurately) invoked to describe the idea, from the Copenhagen interpretation of quantum mechanics, that observers affect what they observe by the mere act of observing it. That idea is actually the observer effect, which is commonly confused with the Heisenberg uncertainty principle.

One common reason for heisenbug-like behaviour is that executing a program in debug mode often cleans memory before the program starts, and forces variables onto stack locations, instead of keeping them in registers. These differences in execution can alter the effect of bugs involving out-of-bounds member access, incorrect assumptions about the initial contents of memory, or floating-point comparisons (for instance, when a floating-point variable in a 32-bit stack location is compared to one in an 80-bit register). Another reason is that debuggers commonly provide watches or other user interfaces that cause additional code (such as property accessors) to be executed, which can, in turn, change the state of the program. Yet another reason is a fandango on core, the effect of a pointer running out of bounds. Many heisenbugs are caused by uninitialized values.
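
The effect of a debugger watch on a property accessor can be illustrated with a minimal sketch. The class TokenStream, its property current, and the watched flag are hypothetical names invented for this example; the flag merely stands in for a debugger evaluating the property in a watch window.

```python
class TokenStream:
    """Hypothetical class whose property accessor has a side effect."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._position = 0

    @property
    def current(self):
        # Reading the property also advances the stream, so every
        # evaluation changes the state of the program.
        token = self._tokens[self._position]
        self._position += 1
        return token


def read_two(stream, watched=False):
    """Read two tokens; 'watched' imitates a debugger inspecting the property."""
    if watched:
        _ = stream.current  # the extra read a watch expression would perform
    return [stream.current, stream.current]


print(read_two(TokenStream("abcd")))                # ['a', 'b']
print(read_two(TokenStream("abcd"), watched=True))  # ['b', 'c']
```

The program under observation produces a different result from the unobserved one, although no line of the code being debugged was changed.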

Time can also be a factor in heisenbugs. Executing a program under control of a debugger can change the execution timing of the program as compared to normal execution. Time-sensitive bugs such as race conditions may not reproduce when the program is slowed down by single-stepping source lines in the debugger. This is particularly true when the behavior involves interaction with an entity not under the control of a debugger, such as when debugging network packet processing between two machines and only one is under debugger control.
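
A timing-sensitive fault of this kind can be sketched with two threads that update a shared counter without a lock. The function run_updates and its stagger parameter are assumptions made for illustration; the stagger stands in for the slowdown introduced by single-stepping one thread, and the sleep durations are arbitrary values chosen to make the interleaving likely rather than guaranteed.

```python
import threading
import time


def run_updates(stagger: float) -> int:
    """Let two threads each add 1 to a shared counter without synchronization."""
    state = {"counter": 0}

    def worker(start_delay: float) -> None:
        time.sleep(start_delay)
        seen = state["counter"]      # read the shared value
        time.sleep(0.2)              # window in which the other thread can interleave
        state["counter"] = seen + 1  # write back a possibly stale value

    threads = [
        threading.Thread(target=worker, args=(0.0,)),
        threading.Thread(target=worker, args=(stagger,)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["counter"]


# Normal timing: both threads usually read 0, so one update is lost.
print(run_updates(stagger=0.0))

# "Debugger" timing: the second thread starts late, and the bug usually hides.
print(run_updates(stagger=0.8))
```

On a lightly loaded machine the first call is expected to return 1 and the second 2, but because the outcome depends on scheduling, neither result is guaranteed. Protecting the read-modify-write sequence with a threading.Lock removes the dependence on timing.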

In an interview, Bruce Lindsay tells of being there when the term was first used, and says that it was created because Heisenberg said, "The more closely you look at one thing, the less closely can you see something else."[5]

This claim of origin is almost certainly wrong, as the term has been used for over two decades. For example, the earliest Google-archived mention is from the mailing list (later Usenet newsgroup) comp.risks, moderated by Peter G. Neumann. In RISKS Digest Volume 4, Issue 34, dated 23 December 1986,[6] Zhahai Stewart contributes an item titled "Another heisenbug", noting that many such contributions have appeared in recent issues of RISKS Digest. The term, and especially the distinction between heisenbug and bohrbug, was already mentioned in 1985 by Jim Gray in a paper about software failures.[7]

Schrödinbug

A schrödinbug is a bug that manifests only after someone reading source code or using the program in an unusual way notices that it never should have worked in the first place, at which point the program promptly stops working for everybody until fixed. The Jargon File adds: "Though... this sounds impossible, it happens; some programs have harbored latent schrödinbugs for years."

The name schrödinbug was introduced in version 2.9.9 of the Jargon File, published in April 1992. It is derived from the Schrödinger's cat thought experiment. A well-written program executing in a reliable computing environment is expected to behave deterministically, so it is unexpected for the quantum questions of observability posited by Schrödinger (killing the cat by opening the box) to affect the operation of a program (breaking the program by reading the source code).

Repairing an obviously defective piece of code is often more important than determining what arcane set of circumstances caused it to work at all (or appear to work) in the first place, and why it then stopped. Because of this, many of these bugs are never fully understood. When a bug of this type is examined in enough detail, it can usually be reclassified as a bohrbug, heisenbug, or mandelbug.

Phase of the Moon bug

The phase of the moon is sometimes jokingly cited as a parameter on which a bug might depend, typically by a programmer exasperated after failing to isolate the true cause. The Jargon File documents two rare instances in which data processing problems were actually caused by phase-of-the-moon timing.[8]

In general, programs that exhibit time-dependent behavior are vulnerable to time-dependent failures. These could occur during a certain part of a scheduled process, or at special times, such as on leap days or when a process crosses a daylight saving time, day, month, year, or century boundary (as with the Year 2000 bug).
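
A leap-day failure of the kind mentioned above can be sketched with Python's standard datetime module. The function names are hypothetical, and falling back to 28 February is only one possible policy, assumed here for illustration.

```python
from datetime import date


def one_year_later_naive(day: date) -> date:
    """Works on every day of the year except 29 February."""
    return day.replace(year=day.year + 1)


def one_year_later_safe(day: date) -> date:
    """Fall back to 28 February when the target year has no leap day."""
    try:
        return day.replace(year=day.year + 1)
    except ValueError:
        # Assumed policy: clamp to the last day of February.
        return day.replace(year=day.year + 1, day=28)


print(one_year_later_naive(date(2023, 6, 15)))  # an ordinary date works

try:
    one_year_later_naive(date(2024, 2, 29))
except ValueError as error:
    print("fails on a leap day:", error)

print(one_year_later_safe(date(2024, 2, 29)))
```

The naive version passes any test that does not happen to use 29 February, so the fault can stay latent for up to four years between occurrences.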

Statistical bug

Statistical bugs can only be detected in aggregates and not in single runs of a section of code. These are bugs that usually affect code that is supposed to produce random or pseudo-random output. An example is code intended to generate points uniformly distributed on the surface of a sphere that instead produces significantly more points in the northern hemisphere than in the southern one. Tracing in detail through a single run of the point generator can completely fail to shed light on the location of such a bug because it is impossible to identify the output of any one run as wrong – after all, it is intended to be random. Only when many points are generated does the problem become apparent. Popular debugging techniques such as checking pre- and postconditions can do little to help. Similar problems can also occur in numerical algorithms in which each individual operation is accurate to within a given tolerance but numerical errors accumulate over a large number of runs, especially if the errors have a systematic bias. A simple example of a statistical bug is the strfry() function in the GNU C Library.[9]
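
The following sketch shows how such a bug can be exposed only in aggregate. It uses a different bias from the hemisphere example above: a flawed generator that draws the polar angle uniformly, which crowds points toward both poles. The function names, the sample size, the seed, and the |z| > 0.5 test region are all assumptions chosen for illustration.

```python
import math
import random


def naive_point(rng: random.Random):
    """Flawed generator: a uniform polar angle crowds points toward the poles."""
    theta = rng.uniform(0.0, math.pi)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def uniform_point(rng: random.Random):
    """Uniform on the sphere: the height z is drawn uniformly from [-1, 1]."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    radius = math.sqrt(1.0 - z * z)
    return (radius * math.cos(phi), radius * math.sin(phi), z)


def polar_fraction(sampler, count: int = 100_000, seed: int = 0) -> float:
    """Fraction of points with |z| > 0.5; a uniform sampler gives about 0.5."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(count) if abs(sampler(rng)[2]) > 0.5)
    return hits / count


print(polar_fraction(uniform_point))  # close to 1/2
print(polar_fraction(naive_point))    # close to 2/3
```

Every individual point returned by naive_point lies on the unit sphere and looks valid, so a postcondition check on a single call passes. Only the aggregate fraction, roughly 2/3 instead of 1/2, reveals the fault.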

Alpha particle bug (single event upset)

The term alpha particle bug derives from the historical phenomenon of soft errors caused by cosmic rays, which are energetic charged subatomic particles originating from outer space. When cosmic rays collide with molecules in the atmosphere, they produce a shower of billions of high-energy particles. These particles can disturb an electron in RAM and so change a 0 to a 1, or vice versa. The term is therefore used for a class of bug in which an issue was seen only once and was verifiable at the time, yet source code analysis indicates that the bug should be impossible, leaving an alpha particle disturbing an electron as the only apparent explanation. The likely cause of such bugs is in fact build or integration errors, or some form of unusual memory corruption. Spacecraft developers often refer to this kind of bug as a single event upset.

According to a study done by Intel in 1990, the number of errors caused by cosmic rays increases with the altitude of the computer and drops to zero if the computer is running in a cave.[10] Therefore, computer chips in airplanes, spacecraft, and other sensitive systems have error checking in RAM, while common desktop computers do not. In 1998, only one error per month per 256 MiB of RAM was expected for a desktop computer. However, as chip density increases, Intel expects the errors caused by cosmic rays to increase and become a limiting factor in design.[11]

A 2011 Black Hat paper, "Bitsquatting: DNS Hijacking without Exploitation", discusses the real-life security implications of such bit flips in the Internet's Domain Name System (DNS). The paper found up to 3,434 incorrect requests a day due to bit-flip changes for various common domains. Many of these bit flips are probably attributable to hardware problems, but some could be attributed to alpha particles.
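
How a single flipped bit turns one domain name into another can be sketched as follows. This is an illustration of the idea only, not the method used in the paper; the function single_bit_flips is a hypothetical name, example.com is a placeholder domain, and the sketch does not check further hostname rules such as the position of hyphens.

```python
import string

# Characters allowed inside a hostname label (lowercase only, because
# DNS names are compared without regard to case).
HOSTNAME_CHARS = set(string.ascii_lowercase + string.digits + "-")


def single_bit_flips(domain: str) -> set:
    """Return the names that differ from 'domain' by exactly one flipped bit."""
    variants = set()
    for index, char in enumerate(domain):
        if char == ".":
            continue  # keep the label structure intact in this sketch
        for bit in range(8):
            flipped = chr(ord(char) ^ (1 << bit))
            if flipped in HOSTNAME_CHARS:
                variants.add(domain[:index] + flipped + domain[index + 1:])
    return variants


variants = single_bit_flips("example.com")
print(len(variants))
print("dxample.com" in variants)  # 'e' (0x65) with its lowest bit flipped is 'd' (0x64)
```

Each returned name is a different, potentially registrable domain, which is why a bit flip in memory that holds a hostname can send a request to an unintended server.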

References

This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.

  1. ^ Joshua Bloch, "Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken". Official Google Research Blog, June 2, 2006.
  2. ^ The Jargon File - mandelbug
  3. ^ M. Grottke and K. S. Trivedi, "Software Faults, Software Aging and Software Rejuvenation". Journal of the Reliability Engineering Association of Japan, Vol. 27, No. 7, pp. 425-438, 2005. This article investigates the various definitions of bohrbug, mandelbug and heisenbug proposed in the literature, as well as the statements made about the relationships between these fault types.
  4. ^ IEEE Computer vol. 40, no. 2 - February 2007
  5. ^ "A Conversation with Bruce Lindsay", ACM Queue vol. 2, no. 8 - November 2004
  6. ^ Risks-List on Google Groups RISKS DIGEST 4.34
  7. ^ "Why Do Computers Stop And What Can Be Done About It?". 1985. http://citeseer.ist.psu.edu/gray85why.html. 
  8. ^ CATB.org, "phase of the moon"
  9. ^ strfry() gives skewed distributions
  10. ^ http://findarticles.com/p/articles/mi_qn4158/is_19980728/ai_n14164062
  11. ^ http://www.newscientist.com/blog/technology/2008/03/do-we-need-cosmic-ray-alerts-for.html

Wikimedia Foundation. 2010.
