There are many different ways of representing longitudinal
data structures
Some depend on the nature of the data ...
...or on the nature of the analysis
Others are largely equivalent
Let's take pure panel information: discrete evenly spaced
state observations
Type 1: one file per wave, with identical structure, records identified by
PID
Type 2: one file, one record per respondent, with the wave identified by
variable name/position
Type 3: one file, one record per respondent per wave, identified by
PID and a wave-number index variable.
These are equivalent in their information content:
We can move between them relatively easily (especially
between types 2 and 3, in Stata and later versions of SPSS)
But they differ in their ease of use for different purposes.
For example, to cross-tabulate a variable for a given pair of
waves, type 2 is clearly better.
However, if you want to cross-tabulate current status with
last year's status, pooling across waves, type 3 is better.
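A minimal sketch of moving between the one-record-per-respondent and the one-record-per-respondent-per-wave layouts, and of the pooled "last year by this year" cross-tabulation. It uses plain Python rather than Stata or SPSS; the variable names (pid, wave, status1 to status3) and the data values are invented for illustration, and waves are assumed to be consecutive integers.

```python
from collections import Counter

# Hypothetical wide data: one record per respondent, with the wave
# encoded in the variable name (status1, status2, status3).
wide = [
    {"pid": 1, "status1": "emp", "status2": "emp", "status3": "unemp"},
    {"pid": 2, "status1": "unemp", "status2": "emp", "status3": "emp"},
]
WAVES = [1, 2, 3]


def wide_to_long(records, waves, stub="status"):
    """Wide -> long: one record per respondent per wave."""
    return [
        {"pid": r["pid"], "wave": w, stub: r[f"{stub}{w}"]}
        for r in records
        for w in waves
    ]


def long_to_wide(records, stub="status"):
    """Long -> wide: one record per respondent, wave in the variable name."""
    out = {}
    for r in records:
        row = out.setdefault(r["pid"], {"pid": r["pid"]})
        row[f"{stub}{r['wave']}"] = r[stub]
    return list(out.values())


def pooled_transitions(long_records, stub="status"):
    """Cross-tabulate previous-wave status by current status, pooling waves."""
    by_key = {(r["pid"], r["wave"]): r[stub] for r in long_records}
    table = Counter()
    for (pid, wave), current in by_key.items():
        previous = by_key.get((pid, wave - 1))  # assumes consecutive waves
        if previous is not None:
            table[(previous, current)] += 1
    return table


long_file = wide_to_long(wide, WAVES)
transitions = pooled_transitions(long_file)
```

The round trip loses nothing, which is the sense in which the layouts are equivalent; the pooled table is a single pass over the long file, whereas in the wide file it would need one tabulation per pair of adjacent waves.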
Status history data is relatively simple in principle: there
is an observation for each time unit per person
An easy way to represent it is as a wide horizontal file:
one variable per time unit
Broadly equivalent is a long vertical file: one record per person-time-unit
The practical complication is combining waves
If (as in ECHP design) the reference period is a calendar
year, some respondents do not report their recent experience
If (as in BHPS) a variable-length reference period is used,
there will be overlap
With overlap, a decision for the analyst: which report to accept?
In SPSS, handling wide 'calendars' by VECTOR/LOOP is
straightforward
In Stata, handling long vertical files is easy
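When the reference periods of successive waves overlap, the choice of report can be made explicit as a rule applied to the long vertical file. The sketch below is only an illustration in plain Python: the record layout (pid, month, wave, status), the data values and the two rules offered ("earliest" and "latest" reporting wave) are assumptions, not a recommendation of either rule.

```python
# Hypothetical long file of calendar reports: one record per person-month
# per reporting wave. Month 9 is reported in both waves (the overlap).
reports = [
    {"pid": 1, "month": 8, "wave": 1, "status": "emp"},
    {"pid": 1, "month": 9, "wave": 1, "status": "emp"},
    {"pid": 1, "month": 9, "wave": 2, "status": "unemp"},
    {"pid": 1, "month": 10, "wave": 2, "status": "unemp"},
]


def resolve_overlap(records, prefer="earliest"):
    """Keep exactly one report per person-month.

    prefer="earliest" accepts the report from the earlier wave;
    prefer="latest" accepts the report from the later wave.
    """
    if prefer not in ("earliest", "latest"):
        raise ValueError("prefer must be 'earliest' or 'latest'")
    chosen = {}
    for r in records:
        key = (r["pid"], r["month"])
        kept = chosen.get(key)
        if kept is None:
            chosen[key] = r
        elif prefer == "earliest" and r["wave"] < kept["wave"]:
            chosen[key] = r
        elif prefer == "latest" and r["wave"] > kept["wave"]:
            chosen[key] = r
    # Return records ordered by person, then month.
    return [chosen[k] for k in sorted(chosen)]


earliest = resolve_overlap(reports, prefer="earliest")
latest = resolve_overlap(reports, prefer="latest")
```

Here the two rules disagree about month 9, which is exactly the decision the analyst has to make and document.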
Event history data is a little more complicated
An efficient representation is to record the dates and
destinations of all transitions: this is a pure event
history (the act of observation must be recorded as an event)
Closely related is spell or episode history: store start of
spell, state and end-date (including 'on-going at time of
observation' or 'censored')
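The link between the two representations can be sketched in a few lines of plain Python. The layout is an assumption made for illustration: each event is a (date, destination) pair with dates in months from an arbitrary origin, and the act of observation is recorded as an event whose destination is None.

```python
# Hypothetical pure event history for one person: (date, destination state).
# The final event is the interview itself, with no destination state.
events = [(0, "education"), (24, "job"), (60, "unemp"), (66, "job"), (90, None)]


def events_to_spells(events):
    """Convert dated transitions into spells with start, end and state.

    A spell that is closed by the observation event (destination None)
    is flagged as censored: it was still on-going when observed.
    """
    ordered = sorted(events, key=lambda e: e[0])
    spells = []
    for (start, state), (end, next_state) in zip(ordered, ordered[1:]):
        if state is None:
            continue  # an observation event does not open a spell
        spells.append(
            {
                "start": start,
                "end": end,
                "state": state,
                "censored": next_state is None,
            }
        )
    return spells


spells = events_to_spells(events)
```

Each spell is simply the interval between one event and the next, so the spell file carries the same information as the event file, arranged one record per episode.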
However, for many purposes event/episode data can be
transformed into state histories, with a variable per time unit
This can be wasteful, if the average spell length is much
greater than one time unit: long strings of the same data
A bigger problem is that it loses information: for instance
two successive jobs with the same characteristics look like one
long job.
It's also harder to answer spell-type questions (how long did
this spell last, when did it start or end)
But if you need to relate status in many domains, it's very
convenient (e.g., you want to know job status and marital status at
a particular time)
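Both the information loss and the convenience can be seen in a small sketch. It is plain Python with invented data; the spell layout (start, end, state, with end exclusive) and the function name spells_to_states are assumptions made for the example.

```python
def spells_to_states(spells, n_units):
    """Expand spells into a state history: one state per time unit.

    Each spell covers the time units start, start + 1, ..., end - 1.
    """
    states = [None] * n_units
    for s in spells:
        for t in range(s["start"], min(s["end"], n_units)):
            states[t] = s["state"]
    return states


# Hypothetical job history: two successive jobs with the same characteristics.
jobs = [
    {"start": 0, "end": 3, "state": "clerk"},
    {"start": 3, "end": 6, "state": "clerk"},
]
# Hypothetical marital history over the same six time units.
marital = [
    {"start": 0, "end": 4, "state": "single"},
    {"start": 4, "end": 6, "state": "married"},
]

job_states = spells_to_states(jobs, 6)
marital_states = spells_to_states(marital, 6)

# The job change at t = 3 is no longer visible in the state history.
distinct_job_states = set(job_states)

# Relating the two domains at a particular time is simple indexing.
status_at_4 = (job_states[4], marital_states[4])
```

The two job spells collapse into six identical values, so the number of jobs cannot be recovered from the state history, while the joint status at any time unit is a single lookup.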