Duration models

Longitudinal data involves time by definition
Continuous-time observation means we measure duration, and we can model it: which characteristics raise expected duration, and which lower it?
However, observed durations usually include a significant proportion of censored spells, that is, spells which are on-going at the time of observation
- we know these spells are at least 'this' long, but we don't know their true duration or outcome.
As a result, simple models such as OLS regression with duration as the $y$-variable will be seriously biased.
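The size of the censoring bias can be illustrated with a small simulation. This is only a sketch under stated assumptions: durations are drawn from an exponential distribution with a known rate, observation stops at a fixed window, and the function name `simulate_spells` and all parameter values are made up for the illustration. Averaging durations is the simplest case of an OLS model (intercept only).

```python
import random

def simulate_spells(n, rate, window, seed=0):
    """Draw n exponential durations and censor them at `window`.

    Returns (observed_duration, exited) pairs; exited is False for
    spells still on-going when observation stops.
    """
    rng = random.Random(seed)
    spells = []
    for _ in range(n):
        true_duration = rng.expovariate(rate)
        if true_duration <= window:
            spells.append((true_duration, True))
        else:
            spells.append((window, False))
    return spells

# Assumed values: true hazard 0.1 (true mean duration 10), window of 10.
spells = simulate_spells(n=20000, rate=0.1, window=10.0)

# Naive approach 1: average all observed durations, censored or not.
naive_all = sum(d for d, _ in spells) / len(spells)

# Naive approach 2: drop the censored spells and average the rest.
completed = [d for d, exited in spells if exited]
naive_completed = sum(completed) / len(completed)

# Hazard-based estimate: exits divided by total time at risk, inverted
# to give an expected duration (appropriate for a constant hazard).
exits = sum(1 for _, exited in spells if exited)
exposure = sum(d for d, _ in spells)
hazard_based_mean = exposure / exits

print(f"true mean duration          : {1 / 0.1:.2f}")
print(f"mean of all observed spells : {naive_all:.2f}")
print(f"mean of completed spells    : {naive_completed:.2f}")
print(f"hazard-based estimate       : {hazard_based_mean:.2f}")
```

Both naive averages should fall well below the true mean, while the estimate based on exits per unit of time at risk should land close to it.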
The alternative is hazard modelling: model the instantaneous or period-by-period hazard of exit
- instantaneous: continuous time models
- period-by-period: discrete time models
- hazard: the probability of exit conditional on being at risk (i.e., not having already experienced exit)
Conceptually the focus shifts from duration to person-periods:
- a censored spell of length $t$ contributes $t$ observations of 'no-exit', with the last one indicated as censored
- a terminated spell of length $t$ contributes $t-1$ observations of 'no-exit' and one of 'exit'
- Thus we use all the information without suffering censoring bias
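The move from spells to person-periods can be sketched as a small data transformation. The sketch assumes discrete periods and that a spell of a given length contributes one record per period at risk; the function name `person_periods` and the record fields are hypothetical, chosen for the illustration.

```python
def person_periods(spell_id, length, censored):
    """Expand one spell into one record per period at risk.

    A censored spell of `length` periods gives `length` no-exit records,
    the last flagged as censored; a terminated spell gives `length - 1`
    no-exit records followed by one exit record.
    """
    records = []
    for period in range(1, length + 1):
        is_last = period == length
        records.append({
            "id": spell_id,
            "period": period,
            "exit": int(is_last and not censored),
            "censored": int(is_last and censored),
        })
    return records

# Two made-up spells: "a" ends in an exit after 3 periods,
# "b" is still on-going (censored) after 2 periods.
spells = [("a", 3, False), ("b", 2, True)]
rows = [row for spell in spells for row in person_periods(*spell)]

for row in rows:
    print(row)
```

The resulting long-format file has one row per person-period, and the `exit` indicator can serve as the dependent variable in a discrete-time hazard model.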
Kaplan-Meier survival estimates use this methodology:
- Given $N$ cases at the start of period 1, with $n$ experiencing a transition and $m$ disappearing from observation (censored) during the period
- The transition rate is estimated as $\frac{n}{N - m/2}$, which treats the censored cases as at risk for half the period on average
- The survival rate between times 1 and 2 is $1 - \frac{n}{N - m/2}$
- Repeating this calculation for each period and multiplying the survival rates together gives the 'survival function', which can be considered an estimate of the proportion surviving if there were no censoring.
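The period-by-period calculation above can be sketched in a few lines. This is an illustrative sketch, not a reference implementation: the function name `life_table` and the counts are invented, and it assumes the at-risk adjustment $N - m/2$ described in the text.

```python
def life_table(n_start, events, censored):
    """Survival curve from per-period counts of exits and censorings.

    events[i] and censored[i] are the numbers experiencing a transition
    and disappearing from observation during period i + 1.
    Returns rows of (at_risk, n, m, transition_rate, proportion_surviving).
    """
    at_risk = n_start
    surviving = 1.0
    table = []
    for n, m in zip(events, censored):
        rate = n / (at_risk - m / 2)   # transition rate for this period
        surviving *= 1 - rate          # cumulative proportion surviving
        table.append((at_risk, n, m, rate, surviving))
        at_risk -= n + m               # cases entering the next period
    return table

# Made-up counts: 100 cases at the start, three periods of observation.
table = life_table(n_start=100, events=[10, 8, 6], censored=[4, 6, 2])

for period, (at_risk, n, m, rate, surviving) in enumerate(table, start=1):
    print(f"period {period}: at risk {at_risk:3d}, exits {n:2d}, "
          f"censored {m:2d}, rate {rate:.3f}, surviving {surviving:.3f}")
```

Censored cases leave the risk set without counting as exits, so they lower the denominator in later periods but never pull the survival curve down directly.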
The key to coping with censoring is thinking in terms of person-time-units
Parametric approaches exist too: we can model the hazard rate.
In continuous time:

$$ h_t = \lim_{\Delta_t\rightarrow 0} \frac{P(T<t+\Delta_t \mid T>t)}{\Delta_t} $$
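It may help to see how this definition links the hazard to the survival function. The symbols $S_t = P(T>t)$ (survival function) and $f_t = -\frac{d}{dt}S_t$ (density of completed durations) are introduced here for the illustration and are not part of the notes' own notation. Since $P(T<t+\Delta_t \mid T>t) = \frac{S_t - S_{t+\Delta_t}}{S_t}$, dividing by $\Delta_t$ and taking the limit gives

$$ h_t = \frac{f_t}{S_t} = -\frac{d}{dt}\log S_t, \qquad S_t = \exp\left(-\int_0^t h_u \, du\right) $$

For a constant hazard $h_t = h$ this reduces to $S_t = e^{-ht}$, the exponential distribution of durations.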

Figure (timeobs): Observation window and censoring
Many types of model exist, in continuous and discrete time, with different parameterisations of the duration dependence
- the way in which the hazard changes with time, controlling for covariates
These include
- exponential model (constant hazard, therefore an exponential distribution of completed durations)
- Weibull and Gompertz models
- Cox proportional hazard model (semi-parametric: model is structured to let duration dependence drop out of the estimation)
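The different forms of duration dependence in the parametric models can be compared by evaluating their hazard functions. The parameterisations below are one common choice and are an assumption of this sketch (software packages differ in how they write them); the function names and parameter values are illustrative only. The Cox model is omitted because it leaves the baseline hazard unspecified.

```python
import math

def exponential_hazard(t, rate):
    """Constant hazard: no duration dependence."""
    return rate

def weibull_hazard(t, scale, shape):
    """h(t) = scale * shape * t**(shape - 1), for t > 0.

    shape < 1: hazard falls with duration; shape > 1: hazard rises;
    shape == 1: reduces to the exponential model.
    """
    return scale * shape * t ** (shape - 1)

def gompertz_hazard(t, scale, gamma):
    """h(t) = scale * exp(gamma * t): hazard changes exponentially."""
    return scale * math.exp(gamma * t)

times = [1, 2, 5, 10]
print(" t   exponential   Weibull(shape=0.5)   Gompertz(gamma=0.1)")
for t in times:
    print(f"{t:2d}   {exponential_hazard(t, 0.1):11.4f}"
          f"   {weibull_hazard(t, 0.1, 0.5):18.4f}"
          f"   {gompertz_hazard(t, 0.1, 0.1):19.4f}")
```

With these illustrative values the exponential hazard stays flat, the Weibull hazard declines with duration, and the Gompertz hazard rises.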
The exponential model is not enormously realistic, but it is very easy to fit and very flexible.
It is very easy to include time-dependent covariates (TDCs): explanatory variables which change during the spell
One strategy is to generate a long-format spell file, with one spell for each period during which all the covariates are constant, plus a 1-unit spell marking any transition that occurs; then weight by spell duration and fit a logistic regression with transition as the dependent variable.
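The spell-splitting strategy can be sketched end to end with a single binary time-dependent covariate. Several things here are assumptions made for the illustration: the data are invented, the names `episode_records` and `fit_weighted_logit` are hypothetical, the unit in which a transition occurs is split off from the preceding episode, and the logistic regression is a bare Newton-Raphson fit written out by hand. In practice a statistics package would do the fitting.

```python
import math

def episode_records(episodes):
    """Turn constant-covariate episodes into weighted records.

    Each episode is (duration, x, ends_in_transition). Assumption: an
    episode of d units ending in a transition gives d - 1 units of
    no-transition plus a 1-unit transition record.
    Returns (x, transition, weight) tuples.
    """
    records = []
    for duration, x, ends_in_transition in episodes:
        quiet = duration - 1 if ends_in_transition else duration
        if quiet > 0:
            records.append((x, 0, quiet))
        if ends_in_transition:
            records.append((x, 1, 1))
    return records

def fit_weighted_logit(records, iterations=25):
    """Newton-Raphson for logit P(transition) = b0 + b1 * x, weighted."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in records:
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            g0 += w * (y - p)            # gradient, intercept
            g1 += w * (y - p) * x        # gradient, covariate
            v = w * p * (1 - p)
            h00 += v                     # information matrix entries
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented episodes; the first person's covariate changes from 0 to 1
# after 4 units, so that spell is split into two episodes.
episodes = [
    (4, 0, False), (3, 1, True),   # person 1: x changes, then exits
    (6, 0, False),                 # person 2: censored
    (2, 1, True),                  # person 3: exits
    (5, 0, True),                  # person 4: exits
    (4, 1, False),                 # person 5: censored
]
records = episode_records(episodes)
b0, b1 = fit_weighted_logit(records)

hazard_x0 = 1 / (1 + math.exp(-b0))
hazard_x1 = 1 / (1 + math.exp(-(b0 + b1)))
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"per-unit hazard when x = 0: {hazard_x0:.4f}")
print(f"per-unit hazard when x = 1: {hazard_x1:.4f}")
```

Because the model has one binary covariate, the fitted hazards simply reproduce exits divided by units at risk within each covariate state, which makes the example easy to check by hand.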
Cox PH regression is very flexible and very good with TDCs, as long as you have continuous-time measurement.
Cox regression is available in SPSS and Stata, though programming the TDCs takes a little time.
Many more discrete-time models are available in Stata, including as user-written additions (e.g., pgmhaz and others by Stephen Jenkins)
TDA was written from scratch to fit duration models; it is relatively easy to use and is in some ways a reference implementation of certain models.
References for hazard modelling:
- Allison (1984) - good, accessible, clear
- Blossfeld and Rohwer (2002, 2nd ed) - uses TDA

© Brendan Halpin, 23-Apr-2012
Department of Sociology, University of Limerick