Proportional hazards models


Proportional hazards models are a class of survival models in statistics. Survival models relate the time that passes before some event occurs to one or more covariates that may be associated with that quantity. In a proportional hazards model, the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate. For example, taking a drug may halve one's hazard rate for a stroke, or changing the material from which a manufactured component is constructed may double its hazard rate for failure. Other types of survival models, such as accelerated failure time models, do not exhibit proportional hazards. Such models could describe a situation in which a drug reduces a subject's immediate risk of having a stroke but produces no reduction in the hazard rate after one year for subjects who have not had a stroke in the first year of analysis.



Survival models can be viewed as consisting of two parts: the underlying hazard function, often denoted Λ0(t), describing how the hazard (risk) changes over time at baseline levels of covariates; and the effect parameters, describing how the hazard varies in response to explanatory covariates. A typical medical example would include covariates such as treatment assignment, as well as patient characteristics such as age, gender, and the presence of other diseases in order to reduce variability and/or control for confounding.

The proportional hazards condition[1] states that covariates are multiplicatively related to the hazard. In the simplest case of stationary coefficients, for example, treatment with a drug may halve a subject's hazard at any given time t, while the baseline hazard may vary. Note, however, that the covariate is not restricted to binary predictors; in the case of a continuous covariate x, the hazard responds exponentially, so that each unit increase in x results in proportional scaling of the hazard. The effect of covariates estimated by any proportional hazards model can thus be reported as hazard ratios.

The Cox partial likelihood shown below is obtained by using Breslow's estimate of the baseline hazard function, plugging it into the full likelihood, and then observing that the result is a product of two factors. The first factor is the partial likelihood, in which the baseline hazard has "canceled out". The second factor is free of the regression coefficients and depends on the data only through the censoring pattern.
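The multiplicative effect of a covariate on the hazard can be illustrated numerically. The following is a minimal sketch, not part of the original treatment: the baseline hazard `baseline` and the coefficient value are hypothetical choices made only for illustration.

```python
import math

def hazard(t, x, beta, baseline):
    """Hazard at time t for covariate value x under a proportional hazards model."""
    return baseline(t) * math.exp(beta * x)

def hazard_ratio(beta, delta=1.0):
    """Multiplicative change in the hazard when the covariate increases by delta."""
    return math.exp(beta * delta)

# Hypothetical baseline hazard that rises with time (an assumption for illustration).
def baseline(t):
    return 0.01 * t

# Coefficient chosen so that treatment (x = 1) halves the hazard relative to x = 0.
beta = math.log(0.5)

for t in (1.0, 5.0, 20.0):
    ratio = hazard(t, 1, beta, baseline) / hazard(t, 0, beta, baseline)
    print(t, ratio)  # the ratio does not depend on t
```

The baseline hazard changes with t, but the ratio between treated and untreated subjects stays at exp(beta), which is what "proportional" refers to.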

Sir David Cox observed that if the proportional hazards assumption holds (or, is assumed to hold) then it is possible to estimate the effect parameter(s) without any consideration of the hazard function. This approach to survival data is called application of the Cox proportional hazards model,[2] sometimes abbreviated to Cox model or to proportional hazards model.

The partial likelihood

Let Yi denote the observed time (either censoring time or event time) for subject i, and let Ci be the indicator that the time corresponds to an event (i.e. if Ci = 1 the event occurred, and if Ci = 0 the time is a censoring time). The hazard function for the Cox proportional hazards model has the form

\Lambda(t|X) = \Lambda_0(t)\exp(\beta_1X_1 + \cdots + \beta_pX_p) = \Lambda_0(t)\exp(\beta^\prime X).

This expression gives the hazard at time t for an individual with covariate vector (explanatory variables) X. Based on this hazard function, a partial likelihood can be constructed from the data as

L(\beta) = \prod_{i:C_i=1}\frac{\theta_i}{\sum_{j:Y_j\ge Y_i}\theta_j},

where X1, ..., Xn are the covariate vectors for the n independently sampled individuals in the dataset (treated here as column vectors), and θj = exp(β′Xj).

The corresponding log partial likelihood is

\ell(\beta) = \sum_{i:C_i=1} \left(\beta^\prime X_i - \log \sum_{j:Y_j\ge Y_i}\theta_j\right).

This function can be maximized over β to produce maximum partial likelihood estimates of the model parameters.
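As a rough illustration, the log partial likelihood can be evaluated directly from its definition. The sketch below uses plain Python lists; the function name, argument names, and the toy data are assumptions introduced here, and no special handling of tied event times is attempted.

```python
import math

def log_partial_likelihood(beta, times, events, covariates):
    """Log partial likelihood of a Cox model, computed from the definition.

    beta       -- list of p regression coefficients
    times      -- observed times Y_i
    events     -- indicators C_i (1 = event, 0 = censored)
    covariates -- list of covariate vectors X_i, each of length p
    """
    # Linear predictors beta'X_i and theta_i = exp(beta'X_i).
    eta = [sum(b * x for b, x in zip(beta, xi)) for xi in covariates]
    theta = [math.exp(e) for e in eta]

    total = 0.0
    for i, (y_i, c_i) in enumerate(zip(times, events)):
        if c_i != 1:
            continue  # censored subjects contribute only through the risk sets
        # Risk set: all subjects still under observation at time Y_i.
        risk_sum = sum(th for y_j, th in zip(times, theta) if y_j >= y_i)
        total += eta[i] - math.log(risk_sum)
    return total

# Toy data (hypothetical): four subjects, one binary covariate.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
covariates = [[1.0], [0.0], [1.0], [0.0]]

print(log_partial_likelihood([0.0], times, events, covariates))
```

At β = 0 every θj equals 1, so each event contributes minus the log of the size of its risk set; for the toy data the risk sets have sizes 4, 2 and 1, giving −log 8.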

The partial score function is

\ell^\prime(\beta) = \sum_{i:C_i=1} \left(X_i - \frac{\sum_{j:Y_j\ge Y_i}\theta_jX_j}{\sum_{j:Y_j\ge Y_i}\theta_j}\right),

and the Hessian matrix of the partial log likelihood is

\ell^{\prime\prime}(\beta) = -\sum_{i:C_i=1} \left(\frac{\sum_{j:Y_j\ge Y_i}\theta_jX_jX_j^\prime}{\sum_{j:Y_j\ge Y_i}\theta_j} - \frac{\sum_{j:Y_j\ge Y_i}\theta_jX_j\times \sum_{j:Y_j\ge Y_i}\theta_jX_j^\prime}{[\sum_{j:Y_j\ge Y_i}\theta_j]^2}\right).

Using this score function and Hessian matrix, the partial likelihood can be maximized using the Newton-Raphson algorithm. The inverse of the negative Hessian matrix (the observed information), evaluated at the estimate of β, can be used as an approximate variance-covariance matrix for the estimate, and used to produce approximate standard errors for the regression coefficients.
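The Newton-Raphson iteration can be sketched for the simplest case of a single scalar covariate, where the score and Hessian are numbers and no matrix inversion is needed. This is an illustrative sketch under that simplifying assumption; the function names, starting value, tolerance, and toy data are all hypothetical. The standard error is taken from the inverse of the negative Hessian at the estimate.

```python
import math

def score_and_hessian(beta, times, events, x):
    """Partial score and Hessian for one scalar covariate x_i (no tie correction)."""
    theta = [math.exp(beta * xi) for xi in x]
    score, hess = 0.0, 0.0
    for y_i, c_i, x_i in zip(times, events, x):
        if c_i != 1:
            continue
        risk = [j for j, y_j in enumerate(times) if y_j >= y_i]
        s0 = sum(theta[j] for j in risk)              # sum of theta_j
        s1 = sum(theta[j] * x[j] for j in risk)       # sum of theta_j * x_j
        s2 = sum(theta[j] * x[j] ** 2 for j in risk)  # sum of theta_j * x_j^2
        score += x_i - s1 / s0
        hess -= s2 / s0 - (s1 / s0) ** 2
    return score, hess

def fit_cox_1d(times, events, x, tol=1e-10, max_iter=50):
    """Maximize the partial likelihood by Newton-Raphson, starting from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, hess = score_and_hessian(beta, times, events, x)
        step = score / hess
        beta -= step  # Newton update: beta_new = beta - score / hessian
        if abs(step) < tol:
            break
    _, hess = score_and_hessian(beta, times, events, x)
    std_error = math.sqrt(-1.0 / hess)  # inverse of the negative Hessian
    return beta, std_error

# Toy data (hypothetical): six uncensored subjects, one binary covariate.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
events = [1, 1, 1, 1, 1, 1]
x = [1, 0, 1, 0, 0, 1]

beta_hat, se = fit_cox_1d(times, events, x)
print(beta_hat, se)
```

With several covariates the same update applies with a score vector and Hessian matrix, and the division becomes the solution of a linear system.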

Tied times

Several approaches have been proposed to handle situations in which there are ties in the time data. Breslow's method describes the approach in which the procedure described above is used unmodified, even when ties are present. An alternative approach that is considered to give better results is Efron's method.[3] Let tj denote the unique times, let Hj denote the set of indices i such that Yi = tj and Ci = 1, and let mj = |Hj| (written simply as m in the formulas below, where the time index j is clear from context). Efron's approach maximizes the following partial likelihood:

L(\beta) = \prod_j \frac{\prod_{i\in H_j}\theta_i}{\prod_{\ell=0}^{m-1}\left[\sum_{i:Y_i\ge t_j}\theta_i - \frac{\ell}{m}\sum_{i\in H_j}\theta_i\right]}.

The corresponding log partial likelihood is

\ell(\beta) = \sum_j \left(\sum_{i\in H_j} \beta^\prime X_i -\sum_{\ell=0}^{m-1}\log\left(\sum_{i:Y_i\ge t_j}\theta_i - \frac{\ell}{m}\sum_{i\in H_j}\theta_i\right)\right),

the score function is

\ell^\prime(\beta) = \sum_j \left(\sum_{i\in H_j} X_i -\sum_{\ell=0}^{m-1}\frac{\sum_{i:Y_i\ge t_j}\theta_iX_i - \frac{\ell}{m}\sum_{i\in H_j}\theta_iX_i}{\sum_{i:Y_i\ge t_j}\theta_i - \frac{\ell}{m}\sum_{i\in H_j}\theta_i}\right),

and the Hessian matrix is

\ell^{\prime\prime}(\beta) = -\sum_j \sum_{\ell=0}^{m-1} \left(\frac{\sum_{i:Y_i\ge t_j}\theta_iX_iX_i^\prime - \frac{\ell}{m}\sum_{i\in H_j}\theta_iX_iX_i^\prime}{\phi_{j,\ell,m}} - \frac{Z_{j,\ell,m}\times Z_{j,\ell,m}^\prime}{\phi_{j,\ell,m}^2}\right),


where

\phi_{j,\ell,m} = \sum_{i:Y_i\ge t_j}\theta_i - \frac{\ell}{m}\sum_{i\in H_j}\theta_i

and

Z_{j,\ell,m} = \sum_{i:Y_i\ge t_j}\theta_iX_i - \frac{\ell}{m}\sum_{i\in H_j}\theta_iX_i.

Note that when Hj is empty (all observations with time tj are censored), the summands in these expressions are treated as zero.
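Efron's log partial likelihood can be evaluated along the same lines as the untied version. The sketch below groups event indices by their (exactly equal) event times to form the sets Hj; the function name and toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict

def efron_log_partial_likelihood(beta, times, events, covariates):
    """Efron's log partial likelihood for data that may contain tied event times."""
    eta = [sum(b * x for b, x in zip(beta, xi)) for xi in covariates]
    theta = [math.exp(e) for e in eta]

    # H_j: indices of subjects with an event at each distinct time t_j.
    # Times with no events never appear here, so they contribute zero.
    tied = defaultdict(list)
    for i, (y, c) in enumerate(zip(times, events)):
        if c == 1:
            tied[y].append(i)

    total = 0.0
    for t_j, h_j in tied.items():
        m = len(h_j)
        risk_sum = sum(th for y, th in zip(times, theta) if y >= t_j)
        tied_sum = sum(theta[i] for i in h_j)
        total += sum(eta[i] for i in h_j)
        for ell in range(m):
            total -= math.log(risk_sum - (ell / m) * tied_sum)
    return total

# Toy data (hypothetical): two events tied at time 1, one event at 2, one censored at 3.
times = [1.0, 1.0, 2.0, 3.0]
events = [1, 1, 1, 0]
covariates = [[1.0], [0.0], [1.0], [0.0]]

print(efron_log_partial_likelihood([0.0], times, events, covariates))
```

At β = 0 the tied pair contributes −log 4 − log 3 and the single event at time 2 contributes −log 2, so the total is −log 24. When no event times are tied, every m equals 1 and the expression reduces to the untied log partial likelihood.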

Time-varying predictors and coefficients

Extensions to time dependent variables, time dependent strata, and multiple events per subject, can be incorporated by the counting process formulation of Andersen and Gill.[4]
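In software, the counting process formulation is commonly represented by giving each subject one row per interval over which the covariates are constant, in the form (start, stop, event). The following sketch builds such rows for one subject with a single time-varying covariate; the function name, argument layout, and example values are hypothetical and are not taken from Andersen and Gill.

```python
def to_counting_process(subject_id, end_time, event, changes):
    """Split one subject's follow-up into (id, start, stop, event, value) rows.

    end_time -- time of the event or of censoring
    event    -- 1 if the follow-up ends with an event, 0 if censored
    changes  -- list of (time, value) pairs sorted by time, the first at time 0,
                giving the covariate value from that time onward
    """
    rows = []
    for k, (start, value) in enumerate(changes):
        if start >= end_time:
            break  # covariate changes after follow-up ended are ignored
        next_change = changes[k + 1][0] if k + 1 < len(changes) else end_time
        stop = min(next_change, end_time)
        is_last = stop == end_time
        # Only the final interval can carry the event.
        rows.append((subject_id, start, stop, int(event == 1 and is_last), value))
    return rows

# Hypothetical subject: covariate switches from 0 to 1 at time 4, event at time 10.
print(to_counting_process("A", 10.0, 1, [(0.0, 0), (4.0, 1)]))
```

Each row enters a risk set only for times between its start and stop, so the covariate value used at any event time is the one in force at that time.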

In addition to allowing time-varying covariates (i.e., predictors), the Cox model may be generalized to time-varying coefficients as well. That is, the proportional effect of a treatment may vary with time; e.g. a drug may be very effective if administered within one month of morbidity, and become less effective as time goes on. The hypothesis of no change with time (stationarity) of the coefficient may then be tested. Details and software are available in Martinussen and Scheike (2006).[5]

Specifying the baseline hazard function

The Cox model may be specialized if a reason exists to assume that the baseline hazard follows a particular form. In this case, the baseline hazard Λ0(t) is replaced by a given function. For example, assuming the hazard function to be the Weibull hazard function gives the Weibull proportional hazards model.
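A Weibull proportional hazards model can be written down explicitly once a parametrization is fixed. The sketch below assumes the baseline hazard scale · shape · t^(shape − 1); the parameter names `scale` and `shape` and all numeric values are illustrative assumptions rather than notation from the text.

```python
import math

def weibull_baseline_hazard(t, scale, shape):
    """Weibull baseline hazard: scale * shape * t**(shape - 1)."""
    return scale * shape * t ** (shape - 1)

def weibull_ph_hazard(t, x, beta, scale, shape):
    """Hazard at time t for covariate vector x under the Weibull PH model."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return weibull_baseline_hazard(t, scale, shape) * math.exp(linear_predictor)

def weibull_ph_survival(t, x, beta, scale, shape):
    """Survival function: exp(-cumulative hazard) = exp(-scale * t**shape * exp(beta'x))."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return math.exp(-scale * t ** shape * math.exp(linear_predictor))

# Hypothetical parameter values.
scale, shape, beta = 0.1, 1.5, [0.7]

for t in (0.5, 2.0, 10.0):
    ratio = (weibull_ph_hazard(t, [1.0], beta, scale, shape)
             / weibull_ph_hazard(t, [0.0], beta, scale, shape))
    print(t, ratio)  # constant in t, equal to exp(0.7)
```

Because the baseline hazard is fully specified, such a model can be fitted by ordinary maximum likelihood rather than by partial likelihood.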

Incidentally, the Weibull baseline hazard is the only case in which the model is both a proportional hazards model and an accelerated failure time model.
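A short derivation shows why the Weibull case belongs to both families. It assumes the parametrization below, in which λ > 0 and ν > 0 are symbols introduced here for illustration, and writes S for the survival function and S0 for the survival function at baseline. With the baseline hazard

\Lambda_0(t) = \lambda\nu t^{\nu-1},

the survival function under the proportional hazards model is

S(t|X) = \exp\left(-\int_0^t \Lambda_0(u)\exp(\beta^\prime X)\,du\right) = \exp\left(-\lambda t^{\nu}\exp(\beta^\prime X)\right) = \exp\left(-\lambda\left[t\exp(\beta^\prime X/\nu)\right]^{\nu}\right) = S_0\left(t\exp(\beta^\prime X/\nu)\right).

The covariates therefore act by rescaling time by the factor exp(β′X/ν), which is the defining property of an accelerated failure time model.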

The generic term parametric proportional hazards models can be used to describe proportional hazards models in which the hazard function is specified. The Cox proportional hazards model is sometimes called a semiparametric model by contrast.

Some authors (e.g. Bender, Augustin and Blettner[6]) use the term Cox proportional hazards model even when specifying the underlying hazard function, to acknowledge the debt of the entire field to David Cox.

The term Cox regression model (omitting proportional hazards) is sometimes used to describe the extension of the Cox model to include time-dependent factors. However, this usage is potentially ambiguous since the Cox proportional hazards model can itself be described as a regression model.

Relationship to Poisson models

There is a relationship between proportional hazards models and Poisson regression models which is sometimes used to fit approximate proportional hazards models in software for Poisson regression. The usual reason for doing this is that calculation is much quicker. This was more important in the days of slower computers but can still be useful for particularly large data sets or complex problems. Authors giving the mathematical details include Laird and Olivier (1981),[7] who remark

"Note that we do not assume [the Poisson model] is true, but simply use it as a device for deriving the likelihood."

The book on generalized linear models by McCullagh and Nelder[8] has a chapter on converting proportional hazards models to generalized linear models.
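One common form of this relationship is the piecewise exponential model: follow-up time is cut into intervals, the hazard is taken to be constant within each interval, and each subject contributes one row per interval with an exposure time and an event indicator. A Poisson regression of the event indicator on the covariates, with the interval as a factor and log(exposure) as an offset, then has the same likelihood kernel. The sketch below only performs the data expansion; the function name and cut points are hypothetical, and it is not claimed to be the exact construction used by the authors cited above.

```python
def split_follow_up(time, event, cut_points):
    """Expand one subject's follow-up into rows for a Poisson fit.

    cut_points -- interior interval boundaries, sorted ascending; the intervals
                  are (0, c1], (c1, c2], ..., (ck, infinity)
    Returns a list of (interval_index, exposure, event_in_interval) rows.
    """
    rows = []
    bounds = [0.0] + list(cut_points) + [float("inf")]
    for k in range(len(bounds) - 1):
        lower, upper = bounds[k], bounds[k + 1]
        if time <= lower:
            break  # follow-up ended before this interval began
        exposure = min(time, upper) - lower
        event_here = int(event == 1 and time <= upper)
        rows.append((k, exposure, event_here))
    return rows

# Hypothetical subject with an event at time 7 and cut points at 2 and 5.
print(split_follow_up(7.0, 1, [2.0, 5.0]))
```

The number of rows grows with the number of intervals, so the speed advantage mentioned above depends on keeping the interval grid coarse.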

References


  1. Breslow, N. E. (1975). "Analysis of Survival Data under the Proportional Hazards Model". International Statistical Review / Revue Internationale de Statistique 43 (1): 45–57. doi:10.2307/1402659. JSTOR 1402659.
  2. Cox, David R. (1972). "Regression Models and Life-Tables". Journal of the Royal Statistical Society. Series B (Methodological) 34 (2): 187–220. JSTOR 2985181. MR0341758.
  3. Efron, Bradley (1977). "The Efficiency of Cox's Likelihood Function for Censored Data". Journal of the American Statistical Association 72 (359): 557–565. JSTOR 2286217.
  4. Andersen, P.; Gill, R. (1982). "Cox's regression model for counting processes, a large sample study". Annals of Statistics 10 (4): 1100–1120. doi:10.1214/aos/1176345976. JSTOR 2240714.
  5. Martinussen, T.; Scheike, T. (2006). Dynamic Regression Models for Survival Data. Springer.
  6. Bender, R.; Augustin, T.; Blettner, M. (2005). "Generating survival times to simulate Cox proportional hazards models". Statistics in Medicine 24: 1713–1723. doi:10.1002/sim.2369.
  7. Laird, Nan; Olivier, Donald (1981). "Covariance Analysis of Censored Survival Data Using Log-Linear Analysis Techniques". Journal of the American Statistical Association 76 (374): 231–240. doi:10.2307/2287816. JSTOR 2287816.
  8. McCullagh, P.; Nelder, J. A. (2000). "Chapter 13: Models for Survival Data". Generalized Linear Models (Second ed.). Boca Raton, Florida: Chapman & Hall/CRC. ISBN 0-412-31760-5. (Second edition 1989; first CRC reprint 1999.)


  • D. R. Cox and D. Oakes (1984). Analysis of survival data (Chapman & Hall).
  • D. Collett (2003). Modelling survival data in medical research (Chapman & Hall/CRC).
  • T. M. Therneau and P. M. Grambsch (2000). Modeling survival data: extending the Cox Model (Springer).
  • V. Bagdonavicius, R. Levuliene, M. Nikulin (2010). "Goodness-of-fit criteria for the Cox model from left truncated and right censored data". Journal of Mathematical Sciences 167 (4): 436–443.

Wikimedia Foundation. 2010.
