Methods of data representation

There are many different ways of representing longitudinal data structures
Some depend on the nature of the data ...
...or on the nature of the analysis
Others are largely equivalent
Let's take pure panel information (discrete, evenly spaced state observations). Three representations are common:
1. One file per wave, with identical structure, identified by PID
2. One file, one record per respondent, identified wave by variable name/position
3. One file, one record per respondent per wave, identified by PID and a wave-number index variable.
These are equivalent in their information content:
We can move between them relatively easily (especially between types 2 and 3, in Stata and later versions of SPSS)
But they differ in their ease of use for different purposes.
For example, to cross-tabulate a variable for a given pair of waves, type 2 is clearly better.
However, if you want to cross-tabulate current status with last year's status, pooling across waves, type 3 is better.
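As a rough illustration of moving between types 2 and 3 and of the two cross-tabulations just mentioned, here is a minimal Python sketch. The PIDs, the `status1`..`status3` variable names and the values are invented for the example, and the helpers `wide_to_long` and `long_to_wide` are hypothetical, not the Stata or SPSS commands themselves.

```python
from collections import Counter

# Type 2 (wide): one record per respondent, wave identified by variable name.
# All PIDs, names and values below are made up for illustration.
wide = [
    {"pid": 1, "status1": "emp", "status2": "emp", "status3": "unemp"},
    {"pid": 2, "status1": "unemp", "status2": "emp", "status3": "emp"},
]
WAVES = [1, 2, 3]


def wide_to_long(records, stub, waves):
    """Type 2 -> type 3: one record per respondent per wave."""
    return [
        {"pid": r["pid"], "wave": w, stub: r[f"{stub}{w}"]}
        for r in records
        for w in waves
    ]


def long_to_wide(records, stub):
    """Type 3 -> type 2: one record per respondent."""
    out = {}
    for r in records:
        row = out.setdefault(r["pid"], {"pid": r["pid"]})
        row[f"{stub}{r['wave']}"] = r[stub]
    return list(out.values())


long_recs = wide_to_long(wide, "status", WAVES)

# Wide form: cross-tabulating a given pair of waves is a one-liner.
tab_1_by_3 = Counter((r["status1"], r["status3"]) for r in wide)

# Long form: last year's status by current status, pooled across waves.
by_key = {(r["pid"], r["wave"]): r["status"] for r in long_recs}
pooled = Counter(
    (by_key[(pid, wave - 1)], status)
    for (pid, wave), status in by_key.items()
    if (pid, wave - 1) in by_key
)
```

The round trip `long_to_wide(wide_to_long(...))` returns the original records, which is the sense in which the two layouts carry the same information.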
Status history data is relatively simple in principle: there is an observation for each time unit per person
An easy way to represent it is as a wide horizontal file: one variable per time unit
Broadly equivalent is a long vertical file: one record per person-time-unit
The practical complication is combining waves:
- If (as in the ECHP design) the reference period is a calendar year, some respondents do not report their recent experience
- If (as in BHPS) a variable-length reference period is used, there will be overlap
- With overlap, the analyst must decide which report to accept
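One way to make the overlap decision explicit is to write it as a rule in code. The sketch below is only an assumption about how this might be done: the month indices, states and the `prefer` argument are invented, and neither rule (keep the later wave's report, or the earlier one) is claimed to be the right choice for any particular survey.

```python
def combine_calendars(reports, prefer="later"):
    """Merge per-wave calendars for one person into a single history.

    reports: list of (wave, {time_unit: state}) pairs.
    prefer:  "later" keeps the later wave's report where calendars overlap,
             "earlier" keeps the earlier wave's report.
    """
    # Process the preferred wave last so that its values overwrite the others.
    ordered = sorted(reports, key=lambda r: r[0], reverse=(prefer == "earlier"))
    history = {}
    for _wave, calendar in ordered:
        history.update(calendar)
    return dict(sorted(history.items()))


def conflicting_units(reports):
    """Time units for which two waves give different states."""
    seen = {}
    conflicts = set()
    for _wave, calendar in reports:
        for t, state in calendar.items():
            if t in seen and seen[t] != state:
                conflicts.add(t)
            seen.setdefault(t, state)
    return sorted(conflicts)


# Hypothetical respondent: wave 1 covers months 12-14, wave 2 covers 13-15.
reports = [
    (1, {12: "emp", 13: "emp", 14: "unemp"}),
    (2, {13: "emp", 14: "emp", 15: "emp"}),
]
later = combine_calendars(reports, prefer="later")
earlier = combine_calendars(reports, prefer="earlier")
disputed = conflicting_units(reports)
```

Listing the disputed time units first shows how much the choice of rule actually matters in a given data set.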
In SPSS, handling wide 'calendars' by VECTOR/LOOP is straightforward
In Stata, handling long vertical files is easy
Event history data is a little more complicated
An efficient representation is to record the dates and destinations of all transitions: this is a pure event history (the act of observation must be recorded as an event)
Closely related is spell or episode history: store start of spell, state and end-date (including 'on-going at time of observation' or 'censored')
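The link between the two representations can be sketched in a few lines of Python. The layout here is an assumption made for illustration: events are `(time, destination)` pairs sorted by time, the `OBSERVED` marker is a hypothetical code for the act of observation, and there is a single observation event at the end of the history.

```python
OBSERVED = "OBSERVED"  # hypothetical marker for the act of observation


def events_to_spells(events):
    """Turn a pure event history into a spell (episode) history.

    events: list of (time, destination) pairs sorted by time, ending with
            one (time, OBSERVED) event.
    Each spell runs from one event to the next; a spell ended by the
    observation event is still on-going, so it is flagged as censored.
    """
    spells = []
    for (start, state), (end, destination) in zip(events, events[1:]):
        spells.append(
            {
                "start": start,
                "end": end,
                "state": state,
                "censored": destination == OBSERVED,
            }
        )
    return spells


# Made-up history, in months since leaving school.
events = [(0, "training"), (24, "emp"), (60, "unemp"), (66, OBSERVED)]
spells = events_to_spells(events)
```

Without the final observation event there would be no way to tell how long the last spell had been observed, which is why the act of observation has to be stored as an event.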
However, for many purposes event/episode data can be transformed into state histories, with a variable per time unit
This can be wasteful, if the average spell length is much greater than one time unit: long strings of the same data
A bigger problem is that it loses information: for instance two successive jobs with the same characteristics look like one long job.
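The loss of information can be seen in a small Python sketch. The `(start, end, state)` spell layout and the helper names are assumptions for the example; the point is only that two different spell histories can expand to the same state history.

```python
from itertools import groupby


def spells_to_states(spells, n_units):
    """Expand (start, end, state) spells into one state per time unit."""
    states = [None] * n_units
    for start, end, state in spells:
        for t in range(start, min(end, n_units)):
            states[t] = state
    return states


def states_to_spells(states):
    """Collapse runs of identical states back into (start, end, state) spells."""
    spells = []
    t = 0
    for state, run in groupby(states):
        length = len(list(run))
        spells.append((t, t + length, state))
        t += length
    return spells


# Two successive jobs with the same characteristics versus one long job.
two_jobs = [(0, 3, "emp"), (3, 6, "emp")]
one_job = [(0, 6, "emp")]

states_two = spells_to_states(two_jobs, 6)
states_one = spells_to_states(one_job, 6)
recovered = states_to_spells(states_two)
```

Both histories expand to six units of "emp", and collapsing the result recovers a single six-unit spell, so the job change is no longer visible.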
It's also harder to think in spell terms with state histories (how long did this spell last, when did it start or end?)
But if you need to relate status in many domains, it's very convenient (e.g., you want to know job status and marital status at a particular time)
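A minimal sketch of why the state-history form is convenient here, again under invented data: with spells, finding the status at a given time means searching each domain's spells, whereas with one state per time unit the domains line up by position and can be cross-tabulated directly.

```python
from collections import Counter


def state_at(spells, t):
    """Look up the state at time t in a list of (start, end, state) spells."""
    for start, end, state in spells:
        if start <= t < end:
            return state
    return None


def expand(spells, n_units):
    """State history: one state per time unit."""
    return [state_at(spells, t) for t in range(n_units)]


# Made-up spell histories for one person in two domains.
job_spells = [(0, 4, "emp"), (4, 6, "unemp")]
marital_spells = [(0, 3, "single"), (3, 6, "married")]

# Spell form: one search per domain for each time point of interest.
at_month_3 = (state_at(job_spells, 3), state_at(marital_spells, 3))

# State-history form: the domains are aligned, so pairing them is direct.
job_states = expand(job_spells, 6)
marital_states = expand(marital_spells, 6)
joint = Counter(zip(job_states, marital_states))
```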

© Brendan Halpin, 23-Apr-2012
Department of Sociology, University of Limerick