Censoring (statistics)



In statistics and engineering, censoring occurs when the value of an observation is only partially known. For example, suppose a study is conducted to measure the impact of a drug on mortality. In such a study, it may be known that an individual's age at death is at least 75 years. Such a situation could occur if the individual disenrolled from the study at age 75, or if the individual is currently alive at the age of 75.

Censoring also occurs when a value falls outside the range of a measuring instrument. For example, a bathroom scale might only measure up to 300 lb. If a 350 lb individual is weighed on the scale, the observer knows only that the individual's weight is at least 300 lb.

Types of censoring

* "Left censoring" - a data point is below a certain value but it is unknown by how much
* "Interval censoring" - a data point is somewhere on an interval between two values
* "Right censoring" - a data point is above a certain value but it is unknown by how much
** "Type I censoring" occurs if an experiment with a set number of subjects or items is stopped at a predetermined time, at which point any subjects remaining are right-censored.
** "Type II censoring" occurs if an experiment with a set number of subjects or items is stopped when a predetermined number are observed to have failed; the remaining subjects are then right-censored.
** "Random censoring" (or "non-informative censoring") occurs when each subject has a censoring time that is statistically independent of its failure time. The observed value is the minimum of the censoring and failure times; subjects whose failure time is greater than their censoring time are right-censored.
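
The three right-censoring schemes can be illustrated with a minimal Python sketch. The function names and the sample failure times below are hypothetical, chosen only for illustration; each function returns pairs of (observed time, whether the failure was observed).

```python
def type_i(failure_times, stop_time):
    """Type I: stop at a fixed time; items still running are right-censored."""
    return [(min(t, stop_time), t <= stop_time) for t in failure_times]


def type_ii(failure_times, n_failures):
    """Type II: stop at the n_failures-th failure; the rest are right-censored."""
    stop_time = sorted(failure_times)[n_failures - 1]
    return [(min(t, stop_time), t <= stop_time) for t in failure_times]


def random_censoring(failure_times, censoring_times):
    """Random censoring: observe the minimum of failure and censoring time."""
    return [(min(t, c), t <= c) for t, c in zip(failure_times, censoring_times)]


# Hypothetical true failure times (unknown to the experimenter in practice).
failures = [2.0, 5.0, 1.0, 8.0, 3.0]

fixed_time = type_i(failures, stop_time=4.0)
# -> [(2.0, True), (4.0, False), (1.0, True), (4.0, False), (3.0, True)]

fixed_count = type_ii(failures, n_failures=2)
# -> [(2.0, True), (2.0, False), (1.0, True), (2.0, False), (2.0, False)]

per_subject = random_censoring(failures, [3.0, 6.0, 0.5, 4.0, 3.0])
# -> [(2.0, True), (5.0, True), (0.5, False), (4.0, False), (3.0, True)]
```

In the sketch, Type I fixes the stopping time and leaves the number of failures random, whereas Type II fixes the number of failures and leaves the stopping time random.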

Censoring should not be confused with the related idea of truncation. With censoring, each observation yields either the exact value or the knowledge that the value lies above or below a given threshold (right and left censoring respectively). With truncation, observations never result in values outside a given range: values in the population outside the range are never seen, or never recorded if they are seen. Note that in statistics, truncation is not the same as rounding.
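
The difference can be sketched in Python using the bathroom-scale example. The helper names and the list of weights are assumptions made for illustration only.

```python
def right_censor(values, limit):
    """Keep every observation; values above the limit are recorded as the
    limit itself, with a flag saying the true value is at least that large."""
    return [(min(v, limit), v > limit) for v in values]


def truncate_above(values, limit):
    """Drop observations above the limit entirely; they leave no record."""
    return [v for v in values if v <= limit]


# Hypothetical true weights in lb, and a scale that reads up to 300 lb.
weights = [150, 280, 350, 320]

censored = right_censor(weights, 300)
# -> [(150, False), (280, False), (300, True), (300, True)]

truncated = truncate_above(weights, 300)
# -> [150, 280]
```

The censored sample still contains four records, so the analyst knows that two individuals weigh at least 300 lb. The truncated sample contains only two records and gives no indication that heavier individuals exist.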

The problem of censored data, in which the observed value of some variable is partially known, is related to the problem of missing data, where the observed value of some variable is unknown.

Epidemiology

One of the earliest attempts to analyse a statistical problem involving censored data was Daniel Bernoulli's 1766 analysis of smallpox morbidity and mortality data to demonstrate the efficacy of inoculation. [Bernoulli, D. (1766) "Essai d'une nouvelle analyse de la mortalité causée par la petite vérole", "Mem. Math. Phy. Acad. Roy. Sci. Paris", reprinted in Bradley (1971) 21 and Blower (2004)]

Operating life testing

Reliability testing often consists of conducting a test on an item (under specified conditions) to determine the time it takes for a failure to occur.
* Sometimes a failure is planned and expected but does not occur because of operator error, equipment malfunction, a test anomaly, etc. The test result is not the desired time-to-failure but can be (and should be) used as a time-to-termination. The use of censored data is unintentional but necessary.
* Sometimes engineers plan a test program so that, after a certain time limit or number of failures, all other tests will be terminated. These suspended times are treated as right-censored data. The use of censored data is intentional.

An analysis of the data from replicate tests includes both the times-to-failure for the items which failed and the time-of-test-termination for those which did not fail.

Analysis

Special techniques may be used to handle censored data. Tests with specific failure times are coded as actual failures; censored data are coded for the type of censoring and the known interval or limit. Special software programs (often reliability oriented) can conduct a maximum likelihood estimation for summary statistics, confidence intervals, etc.
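
As a minimal sketch of such an estimation, suppose (as an assumption not made in the text above) that lifetimes follow an exponential distribution and that censoring is non-informative. An observed failure at time t contributes the density rate·exp(−rate·t) to the likelihood, while a right-censored time contributes only the survival probability exp(−rate·t). Maximising the product gives the closed form rate = (number of failures) / (total time on test). The function names and data below are hypothetical.

```python
import math


def exponential_mle(records):
    """Maximum likelihood failure rate from (time, failed) records.

    failed is True for an observed failure and False for a right-censored
    (suspended) test. Assumes exponential lifetimes.
    """
    records = list(records)
    n_failures = sum(1 for _, failed in records if failed)
    total_time = sum(t for t, _ in records)  # censored times count too
    if n_failures == 0 or total_time <= 0:
        raise ValueError("need at least one failure and positive test time")
    return n_failures / total_time


def log_likelihood(rate, records):
    """Failures contribute log(rate) - rate*t; censored items only -rate*t."""
    return sum((math.log(rate) if failed else 0.0) - rate * t
               for t, failed in records)


# Hypothetical test data: three failures, two tests suspended at 4.0 hours.
data = [(2.0, True), (4.0, False), (1.0, True), (4.0, False), (3.0, True)]

rate = exponential_mle(data)   # 3 failures / 14.0 hours on test
mean_life = 1.0 / rate         # about 4.67 hours

# Ignoring the suspended tests would give a mean of (2 + 1 + 3) / 3 = 2.0
# hours, which understates the life of the items.
naive_mean = sum(t for t, failed in data if failed) / 3
```

Dedicated reliability software handles other lifetime distributions, left and interval censoring, and confidence intervals; the sketch covers only the simplest right-censored case.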

Bibliography

*Blower, S. (2004), D. Bernoulli's "An attempt at a new analysis of the mortality caused by smallpox and of the advantages of inoculation to prevent it" [http://www.semel.ucla.edu/biomedicalmodeling/pdf/Bernoulli&Blower.pdf], "Reviews in Medical Virology", 14: 275–288
*Bradley, L. (1971) "Smallpox Inoculation: An Eighteenth Century Mathematical Controversy", Nottingham
*Mann, N. R. et al. (1975) "Methods for Statistical Analysis of Reliability and Life Data", New York: Wiley, ISBN 047156737X

External links

*"Engineering Statistics Handbook", NIST/SEMATECH, [http://www.itl.nist.gov/div898/handbook/]

See also

*Survival analysis
*Data analysis
*Reliability (statistics)


Wikimedia Foundation. 2010.
