Continuous-model observation means we measure duration, and
we can model it: what characteristics raise expected duration?
which lower it?
However, observed durations usually include a substantial
proportion of censored spells, that is, spells which are still
on-going at the time of observation:
we know these spells are at least `this' long, but we don't
know their true duration or outcome.
As a result, simple models such as OLS regression with
duration as the dependent variable will be
seriously biased.
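The size of this bias can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: exponential durations with a true mean of 10 and a fixed observation window of 8 are hypothetical choices, not taken from any real data. The sample mean stands in for an intercept-only OLS fit.

```python
import random

random.seed(42)

# Hypothetical setup: true durations are exponential with mean 10,
# and every spell is observed for at most 8 time units.
TRUE_MEAN = 10.0
WINDOW = 8.0

true_durations = [random.expovariate(1 / TRUE_MEAN) for _ in range(10_000)]

# Spells still running at the end of the window are censored:
# we record the window length, not the true duration.
observed = [min(d, WINDOW) for d in true_durations]
completed = [d for d in true_durations if d <= WINDOW]

mean_true = sum(true_durations) / len(true_durations)
mean_observed = sum(observed) / len(observed)      # censored spells treated as complete
mean_completed = sum(completed) / len(completed)   # censored spells dropped
share_censored = 1 - len(completed) / len(true_durations)

print(f"share censored:                {share_censored:.2f}")
print(f"mean of true durations:        {mean_true:.2f}")
print(f"mean treating censored as end: {mean_observed:.2f}")
print(f"mean of completed spells only: {mean_completed:.2f}")
```

Both naive summaries understate the true mean: treating censored spells as complete truncates the long spells, and dropping them removes the long spells altogether.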
The alternative is hazard modelling: model the instantaneous
or period-by-period hazard of exit
instantaneous: continuous time models
period-by-period: discrete time models
in both cases the hazard is conditional on being at risk (i.e., not
having already experienced exit)
Conceptually the focus shifts from duration to
person-periods:
a censored spell of length $t$ contributes $t$
observations of `no-exit', with the last one indicated as
censored
a terminated spell of length $t$ contributes $t-1$
observations of `no-exit' and one of `exit'
Thus we use all the information without suffering censoring bias
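A minimal sketch of the expansion from spells to person-periods, assuming durations are whole numbers of periods. The function name `person_periods` and the tuple layout are hypothetical, chosen for illustration. The convention assumed here is that a terminated spell of length t yields t-1 `no-exit' rows and one `exit' row, while a censored spell of length t yields t `no-exit' rows.

```python
def person_periods(spell_id, length, censored):
    """Expand one spell into one row per period at risk.

    Each row is (spell_id, period, exit_flag). Only the final period
    of a spell that ended in a transition carries exit_flag = 1;
    a censored spell has exit_flag = 0 in every period.
    """
    rows = []
    for period in range(1, length + 1):
        is_last = period == length
        exit_flag = 1 if (is_last and not censored) else 0
        rows.append((spell_id, period, exit_flag))
    return rows


# Two hypothetical spells: "a" ends in an exit after 3 periods,
# "b" is censored after 2 periods.
spells = [("a", 3, False), ("b", 2, True)]
long_file = [row for s in spells for row in person_periods(*s)]

for row in long_file:
    print(row)
```

A discrete-time hazard model can then be fitted to the long file with the exit flag as the binary outcome, since every row represents one period in which the case was genuinely at risk.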
Kaplan-Meier survival estimates use this methodology:
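Given $n_1$ cases at the start of period 1, with $d_1$ experiencing
a transition and $c_1$ disappearing from observation (censored)
The transition rate is estimated as $d_1 / n_1$
The survival rate between times 1 and 2 is $1 - d_1 / n_1$, with $n_2 = n_1 - d_1 - c_1$ cases remaining at risk in period 2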
Continuing this calculation period by period, and multiplying the
successive survival rates together, gives the `survival
function', which can be considered an estimate of the proportion
surviving if there were no censoring.
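The product-limit calculation can be sketched in a few lines of plain Python. This is an illustrative implementation of the standard Kaplan-Meier estimator, not code from any particular package. The function name `kaplan_meier` and the example data are hypothetical, and cases censored at time t are assumed to be still at risk at t.

```python
def kaplan_meier(durations, events):
    """Return a list of (time, at_risk, exits, survival) tuples.

    durations: observed spell lengths
    events:    True if the spell ended in a transition, False if censored
    """
    exit_times = sorted({d for d, e in zip(durations, events) if e})
    survival = 1.0
    table = []
    for t in exit_times:
        # Everyone whose spell lasted at least to t is at risk at t,
        # including cases censored exactly at t.
        at_risk = sum(1 for d in durations if d >= t)
        exits = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - exits / at_risk
        table.append((t, at_risk, exits, survival))
    return table


# Hypothetical data: six spells, two of them censored.
durations = [1, 2, 2, 3, 4, 4]
events = [True, True, False, True, False, True]

table = kaplan_meier(durations, events)
for t, at_risk, exits, surv in table:
    print(f"t={t}  at risk={at_risk}  exits={exits}  S(t)={surv:.4f}")
```

Censored cases never count as exits, but they stay in the risk set for as long as they were observed, which is how the estimator uses their partial information.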
The key to coping with censoring is thinking in terms of
person-time-units
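Parametric approaches exist too: we can model the hazard
rate directly.
In continuous time the hazard is $h(t) = \lim_{\Delta t \to 0} \Pr(t \le T < t + \Delta t \mid T \ge t) / \Delta t$, where $T$ is the duration of the spell.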
Observation window and censoring
Many types of model exist, in continuous and discrete time,
with different parameterisations of the duration dependence
(the way in which the hazard changes with time, controlling for covariates)
These include
exponential model (constant hazard, therefore an
exponential distribution of completed durations)
Weibull and Gompertz models
Cox proportional hazard model (semi-parametric: model is
structured to let duration dependence drop out of the
estimation)
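The difference between the parametric models lies in the shape each assumes for the hazard over time. The sketch below writes out three hazard functions under one common parameterisation. The parameter names `rate`, `scale` and `shape` are assumptions made for illustration, and other texts and packages parameterise these models differently. The Cox model is omitted because it leaves the baseline hazard unspecified.

```python
import math


def exponential_hazard(t, rate):
    """Constant hazard: no duration dependence."""
    return rate


def weibull_hazard(t, scale, shape):
    """h(t) = scale * shape * t**(shape - 1).

    shape > 1 gives a rising hazard, shape < 1 a falling one,
    and shape == 1 reduces to the exponential model.
    """
    return scale * shape * t ** (shape - 1)


def gompertz_hazard(t, scale, shape):
    """h(t) = scale * exp(shape * t).

    The hazard rises or falls exponentially with time;
    shape == 0 reduces to the exponential model.
    """
    return scale * math.exp(shape * t)


for t in (1, 5, 10):
    print(
        t,
        exponential_hazard(t, 0.1),
        round(weibull_hazard(t, 0.1, 1.5), 4),
        round(gompertz_hazard(t, 0.1, 0.05), 4),
    )
```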
The exponential model is not enormously realistic, but it is very
easy to fit and very flexible.
It is very easy to include time-dependent covariates (TDCs):
explanatory variables which change during the spell
One strategy is to generate a long-format spell file, with one
sub-spell for each period over which all the covariates are constant, and
a 1-unit spell indicating any transition that occurs: weight by
spell duration and fit a logistic regression with transition as
the dependent variable.
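The splitting step can be sketched as follows. This covers only the construction of the long-format file, not the weighted logistic regression, and it marks the transition on the final sub-spell rather than adding a separate 1-unit spell. The function name `split_spell`, the dictionary keys and the example covariate are all hypothetical.

```python
def split_spell(start, end, exited, changes):
    """Split one spell at the times a covariate changes.

    changes: list of (time, value) pairs sorted by time, giving the
             covariate value that applies from each time onwards;
             the first entry should be at or before `start`.
    Returns one dict per sub-spell during which the covariate is constant.
    """
    rows = []
    for i, (t0, value) in enumerate(changes):
        # The value holds until the next change, or until the spell ends.
        t1 = changes[i + 1][0] if i + 1 < len(changes) else end
        seg_start, seg_end = max(t0, start), min(t1, end)
        if seg_start >= seg_end:
            continue  # this covariate value never applies within the spell
        rows.append({"start": seg_start, "end": seg_end,
                     "value": value, "exit": 0})
    if exited and rows:
        rows[-1]["exit"] = 1  # only the last sub-spell can end in a transition
    return rows


# Hypothetical spell running from time 0 to 10 and ending in an exit,
# with marital status changing at time 4.
rows = split_spell(0, 10, True, [(0, "single"), (4, "married")])
for r in rows:
    print(r, "duration:", r["end"] - r["start"])
```

Each sub-spell's duration (`end - start`) is the exposure that would serve as the weight in the strategy described above.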
Cox PH regression is very flexible and very good with TDCs, as long
as you have continuous time measurement.
Cox regression is available in SPSS and Stata, though
programming the TDCs takes a little time.
Many more discrete time models are available in Stata, including
as user-written additions (e.g., pgmhaz and others by
Stephen Jenkins)
TDA was written from scratch to fit duration models; it is
relatively easy to use and is in some ways a reference
implementation of certain models.