- Logistic regression
In statistics, **logistic regression** is a model used to predict the probability of occurrence of an event by fitting data to a logistic curve. It makes use of several predictor variables that may be either numerical or categorical. For example, the probability that a person has a heart attack within a specified time period might be predicted from knowledge of the person's age, sex and body mass index. Logistic regression is used extensively in the medical and social sciences, as well as in marketing applications such as predicting a customer's propensity to purchase a product or cease a subscription.

Other names for logistic regression used in various other application areas include **logistic model**, **logit model**, and **maximum-entropy classifier**.

Logistic regression is one of a class of models known as generalized linear models.

As an example, consider a model of the 10-year risk of death from heart disease in which the risk is $\frac{1}{1+e^{-z}}$ and the quantity $z$ depends on age, sex, and blood cholesterol level. In this model, increasing age is associated with an increasing risk of death from heart disease ($z$ goes up by 2.0 for every 10 years over the age of 50), female sex is associated with a decreased risk of death from heart disease ($z$ goes down by 1.0 if the patient is female), and increasing cholesterol is associated with an increasing risk of death ($z$ goes up by 1.2 for each 1 mmol/L increase in cholesterol above 5 mmol/L).

We wish to use this model to predict Mr Petrelli's risk of death from heart disease: he is 50 years old and his cholesterol level is 7.0 mmol/L. Mr Petrelli's risk of death is therefore

: $\frac{1}{1+e^{-z}}, \text{ where } z = -5.0 + (+2.0)(5.0-5.0) + (-1.0)(0) + (+1.2)(7.0-5.0).$

This means that by this model, Mr Petrelli's risk of dying from heart disease in the next 10 years is 0.07 (or 7%).
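The arithmetic in this example can be reproduced in a few lines of Python. This is only a minimal sketch: the function name `heart_disease_risk` and its parameter names are hypothetical, and the coefficients are simply those quoted in the example (age is converted to decades, as in the formula).

```python
import math

def heart_disease_risk(age_years, is_female, cholesterol_mmol_l):
    """Risk under the example model; all names here are illustrative only."""
    # The quantity z, built from the coefficients quoted in the text.
    z = (-5.0
         + 2.0 * (age_years - 50) / 10.0      # +2.0 per 10 years over age 50
         - 1.0 * (1 if is_female else 0)      # -1.0 if the patient is female
         + 1.2 * (cholesterol_mmol_l - 5.0))  # +1.2 per mmol/L above 5 mmol/L
    # The logistic function maps z to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

risk = heart_disease_risk(age_years=50, is_female=False, cholesterol_mmol_l=7.0)
print(round(risk, 2))  # 0.07
```

Here $z = -2.6$, so the risk is $1/(1+e^{2.6}) \approx 0.069$, which rounds to the 7% quoted above.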

**Formal mathematical specification**

Logistic regression analyzes binomially distributed data of the form

: $Y_i \sim B(n_i, p_i), \text{ for } i = 1, \dots, m,$

where the numbers of Bernoulli trials $n_i$ are known and the probabilities of success $p_i$ are unknown. An example of this distribution is the fraction of seeds ($p_i$) that germinate after $n_i$ are planted.

The model proposes that for each trial (value of $i$) there is a set of explanatory variables that might inform the final probability. These explanatory variables can be thought of as being in a $k$-vector $X_i$, and the model then takes the form

: $p_i = \operatorname{E}\left(\left.\frac{Y_i}{n_i}\right|X_i\right).$

The logits of the unknown binomial probabilities (i.e., the logarithms of the odds) are modelled as a linear function of the $X_i$:

: $\operatorname{logit}(p_i) = \ln\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i}.$

Note that a particular element of $X_i$ can be set to 1 for all $i$ to yield an intercept in the model. The unknown parameters $\beta_j$ are usually estimated by maximum likelihood.

The $\beta_j$ parameter estimates are interpreted as the additive effect on the log odds of a unit change in the $j$th explanatory variable. In the case of a dichotomous explanatory variable, for instance gender, $e^{\beta_j}$ is the estimate of the odds ratio of having the outcome for, say, males compared with females.

The model has an equivalent formulation

: $p_i = \frac{1}{1+e^{-(\beta_0 + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i})}}.$
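As a rough illustration of maximum-likelihood estimation for this formulation, the sketch below fits an intercept and a single coefficient to binary outcomes (the case $n_i = 1$). Several things here are assumptions rather than part of the text above: Newton-Raphson is just one common way to maximize the likelihood, the data are made up, and the names `sigmoid` and `fit_logistic` are hypothetical.

```python
import math

def sigmoid(z):
    # Equivalent formulation of the model: p = 1 / (1 + e^(-z)).
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson fit of p = sigmoid(b0 + b1 * x) to 0/1 outcomes."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Negative Hessian of the log-likelihood, with weights p(1 - p).
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system for the update.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up, non-separable data purely for illustration.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
b0, b1 = fit_logistic(xs, ys)
print("intercept:", b0, "slope:", b1)
print("odds ratio per unit increase in x:", math.exp(b1))
```

At the fitted values the gradient is approximately zero, and `math.exp(b1)` is the estimated odds ratio for a unit change in the explanatory variable, matching the interpretation of $e^{\beta_j}$ given above.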

This functional form is commonly called a single-layer perceptron or single-layer artificial neural network. A single-layer neural network computes a continuous output instead of a step function. The derivative of $p_i$ with respect to $X = x_1 \dots x_k$ is computed from the general form

: $y = \frac{1}{1+e^{-f(X)}}$

where $f(X)$ is an analytic function in $X$. With this choice, the single-layer network is identical to the logistic regression model. This function has a continuous derivative, which allows it to be used in backpropagation. This function is also preferred because its derivative is easily calculated:

: $y' = y(1-y)\frac{\mathrm{d}f}{\mathrm{d}X}$
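The derivative identity can be checked numerically. The sketch below compares $y(1-y)\,\mathrm{d}f/\mathrm{d}X$ with a central finite difference for a scalar $X$. The particular choice $f(x) = 0.5x^2 - 1$ is an arbitrary assumption made for illustration, as are all the function names.

```python
import math

def f(x):
    # Hypothetical smooth choice of f; any differentiable function would do.
    return 0.5 * x * x - 1.0

def df(x):
    # Derivative of the f chosen above.
    return x

def y(x):
    return 1.0 / (1.0 + math.exp(-f(x)))

def dy_analytic(x):
    # The identity from the text: y' = y (1 - y) df/dX.
    return y(x) * (1.0 - y(x)) * df(x)

def dy_numeric(x, h=1e-6):
    # Central finite-difference approximation of dy/dx.
    return (y(x + h) - y(x - h)) / (2.0 * h)

for x in (-2.0, 0.3, 1.5):
    print(x, dy_analytic(x), dy_numeric(x))
```

The two columns should agree to several decimal places at each test point.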

**Extensions**

Extensions of the model cope with multi-category dependent variables and ordinal dependent variables, such as polytomous regression. Multi-class classification by logistic regression is known as multinomial logit modeling. An extension of the logistic model to sets of interdependent variables is the conditional random field.

**See also**

* Logistic function
* Sigmoid function
* Artificial neural network
* Data mining
* Linear discriminant analysis
* Perceptron
* Probit model
* Variable rules analysis
* Jarrow-Turnbull model

**External links**

* [*http://statpages.org/logistic.html Web-based logistic regression calculator*]

* [*http://www.cs.utah.edu/~hal/megam A highly optimized Maximum Entropy modeling package*]

* [*http://mallet.cs.umass.edu/index.php/Main_Page MALLET Java library, includes a trainer for logistic models*]

*Wikimedia Foundation. 2010.*