(Long section: enough material for at least two lab sessions)
The file newpan.sav is organised in a spell format, where each record represents a spell of time of variable length, and there are many records per individual. Because the number of records is highly variable, this sort of data is best stored in long format.
Each record in this file represents either a report on current employment status (from a wINDRESP) or the completion of an employment-status spell (from a wJOBHIST, which also reports some entire spells from start to finish), and the records are organised in temporal order. If we're interested in employment-status spells, this file needs some processing before it is useful: each employment-status episode is composed of one or more of the records in this file, termed `splits'. The episno variable identifies the episode to which each split belongs, so we can collapse consecutive splits into episode records using aggregate.
get file 's:\bhps\spss\ltr\newpan.sav'.
agg out = *
  /break = pid, episno
  /endtype = last(rectype)
  /start = first(epdate)
  /end = first(epend)
  /lcens = first(lcens)
  /empstat = first(empstat).
recode empstat (-9 thru -1=copy)(1=1)(3=2)(4 thru hi=3).
value labels empstat 1 'Employed' 2 'Unemployed' 3 'Out of LM'.
missing values all (-9 thru -1).
We retain the last value of rectype in the new variable endtype. If the last `split' of an episode comes from a wJOBHIST, we observe the spell's true end; if it comes from a wINDRESP, the spell is either still on-going or we failed to collect data on its end, so it is right-censored. lcens is a split-level variable indicating that the start date for that record is (partly) missing, and thus that we don't know it exactly. If the first `split' in an episode is left-censored, the whole episode is left-censored: we don't know its exact start date, only that it started on or before the recorded date.
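The first/last logic of the aggregate step can be hard to picture without data in front of you. What follows is a minimal sketch in Python rather than SPSS, run on a handful of made-up split records, and is not a replacement for the syntax above. The record-type codes ("INDRESP", "JOBHIST"), the integer dates, the derived rcens flag and the collapse function are all assumptions introduced for illustration; the real file's coding may differ, and epend is left out for brevity.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical split records, already sorted by pid, episno and time.
# rectype uses made-up string codes and epdate made-up integer dates.
splits = [
    {"pid": 1, "episno": 1, "rectype": "INDRESP", "epdate": 1990, "lcens": 1, "empstat": 1},
    {"pid": 1, "episno": 1, "rectype": "JOBHIST", "epdate": 1991, "lcens": 0, "empstat": 1},
    {"pid": 1, "episno": 2, "rectype": "INDRESP", "epdate": 1992, "lcens": 0, "empstat": 2},
    {"pid": 2, "episno": 1, "rectype": "JOBHIST", "epdate": 1989, "lcens": 0, "empstat": 3},
    {"pid": 2, "episno": 2, "rectype": "INDRESP", "epdate": 1991, "lcens": 0, "empstat": 1},
]


def collapse(records):
    """Collapse consecutive splits into one record per (pid, episno)."""
    episodes = []
    for (pid, episno), group in groupby(records, key=itemgetter("pid", "episno")):
        group = list(group)
        first, last = group[0], group[-1]
        episodes.append({
            "pid": pid,
            "episno": episno,
            "start": first["epdate"],      # first(epdate)
            "lcens": first["lcens"],       # first(lcens): left-censoring comes from the first split
            "empstat": first["empstat"],   # first(empstat)
            "endtype": last["rectype"],    # last(rectype)
            # Assumed derived flag: an episode whose last split is an
            # interview report has no observed end, so it is right-censored.
            "rcens": last["rectype"] == "INDRESP",
        })
    return episodes


episodes = collapse(splits)
for ep in episodes:
    print(ep)
```

In this toy data the first episode of person 1 is left-censored (its first split has lcens = 1) but has an observed end, because its last split is a job-history record. Person 1's second episode consists of a single interview report and is therefore flagged as right-censored.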