Data handling strategies

These operations are relatively straightforward in SPSS
First: Summarising over lower-level units nested within higher
- individuals within households
- job or marriage spells within individuals

aggregate outfile = 'summary.sav' /break = ghid
  /totinc = sum(indinc)
  /number = n.
 . . .
get file = 'summary.sav'.
compute aveinc = totinc/number.

AGGREGATE OUTFILE={file} [/MISSING=COLUMNWISE] [/DOCUMENT]
                  {*   }
 [/PRESORTED] /BREAK=varlist[({A})][varlist...]
                              {D}
  /aggvar['label']aggvar['label']...=function(arguments)[/aggvar ...]

SUM     Sum                      MEAN   Mean
SD      Standard deviation       MAX    Maximum
MIN     Minimum                  PGT    % of cases gt value
PLT     % of cases lt value      PIN    % of cases between values
POUT    % of cases not in range  FGT    Fraction gt value
FLT     Fraction lt value        FIN    Fraction between values
FOUT    Fraction not in range    N      Weighted n
NU      Unweighted n             NMISS  Weighted n of missing
NUMISS  Unweighted n of missing  FIRST  First nonmissing
LAST    Last nonmissing

Stata's equivalent command is collapse:

. collapse (sum) totinc=indinc (count) n=apno, by(ahid)
. gen aveinc = totinc/n
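
As a minimal sketch of the collapse step, the following do-file fragment builds a toy individual-level data set and reduces it to one record per household. The income values are invented for illustration; the variable names simply follow the example above.

```stata
* Toy data: three people in household 1, two in household 2 (invented values)
clear
input ahid apno indinc
1 1 1000
1 2  500
1 3    0
2 1  800
2 2 1200
end

* One record per household: total income and number of people
collapse (sum) totinc=indinc (count) n=apno, by(ahid)
gen aveinc = totinc/n

list, clean

* Check the results: 1500/3 and 2000/2
assert aveinc == 500  if ahid == 1
assert aveinc == 1000 if ahid == 2
```

Note that collapse replaces the data in memory, so the individual-level file must be saved first if it is needed again.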

Second: Mapping higher-level data back to lower levels
For instance, mapping household-level data to the individual level: either aggregated data like the totals above, or data recorded at household level.
To do this we use a many-to-one match, adding data from the household record to the record of every individual in that household.
```
match files file = 'individual.sav'
           /table = 'household.sav'     /by = hhid.
```
Both files must be sorted according to hhid.

Stata syntax:

. use individual.dta
. merge hhid using household.dta

Or from scratch, with great efficiency:

. use individual.dta
. sort ahid apno
. egen totinc=sum(indinc), by(ahid)
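
The two routes (aggregate then match back, or egen in place) should give the same answer. A sketch on invented toy data, assuming Stata 11 or later: it uses the newer `merge m:1` form rather than the `merge hhid using` form shown above, and `total()`, which is the documented name of egen's sum function in recent releases.

```stata
* Toy individual-level data (invented values)
clear
input ahid apno indinc
1 1 1000
1 2  500
2 1  800
end
tempfile individual household
save `individual'

* Route 1: aggregate to household level, then map back (many-to-one)
collapse (sum) totinc=indinc, by(ahid)
save `household'
use `individual', clear
merge m:1 ahid using `household'
assert _merge == 3          // every individual finds a household record
drop _merge

* Route 2: compute the household total in place
egen totinc2 = total(indinc), by(ahid)

* Both routes agree
assert totinc == totinc2
```

The egen route avoids the intermediate file altogether, which is why it is the more efficient choice when the summary is only needed at the individual level.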

Third: matching individuals across waves - a key panel operation!
This is a one-to-one match, matching an individual in one file to his/her record in another file.

SPSS syntax:

match files file = 'wave1.sav'
           /file = 'wave2.sav'   /* "/file" not "/table" */
     /by = pid.

Both files must be sorted according to pid.

Stata syntax:

. use wave1.dta
. merge pid using wave2.dta

Three types of record may result:
1. with information from wave 1 only
2. with information from wave 2 only
3. with information from wave 1 and wave 2
Stata creates a variable, _merge, which distinguishes the three cases (1 = first file only, 2 = second file only, 3 = both).
In SPSS one can use the /in= subcommand to create marker variables.
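
A sketch of the three cases on invented data, assuming Stata 11 or later (`merge 1:1` syntax). The variable names w1inc and w2inc are hypothetical stand-ins for wave-specific variables.

```stata
* Wave 2 toy file: pids 2, 3 and 4 (invented values)
clear
input pid w2inc
2 210
3 310
4 410
end
tempfile wave2
save `wave2'

* Wave 1 toy file: pids 1, 2 and 3
clear
input pid w1inc
1 100
2 200
3 300
end

* One-to-one match on pid
merge 1:1 pid using `wave2'
tabulate _merge
* _merge == 1: pid 1 (wave 1 only)
* _merge == 2: pid 4 (wave 2 only)
* _merge == 3: pids 2 and 3 (both waves)

count if _merge == 3
assert r(N) == 2

* A balanced two-wave panel keeps only the matched records
keep if _merge == 3
```

Whether to keep only the matched records depends on the analysis; the unmatched cases are the attrition and new-entrant groups.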

Fourth: matching individuals in households
- identify individuals with partner in household
  - see wHGSPN: spouse's wPNO, ≤ 0 if not present (from the household grid)
- we thus identify partner's wPNO
- save wHID, partner's wPNO and substantive variables to a temporary file (renaming variables)
- merge temporary file back in, linking wHID and wPNO in main file to partner's wHID and wPNO in temporary file
```
get file = "s:\bhps\spss\aindresp.sav"
      /keep = ahid ahgspn ajbsemp ajbft asex.

select if (ahgspn gt 0).
sort cases by ahid ahgspn.

save out="m:\spsinfo.sav"
          /rename=(ahgspn ajbsemp ajbft asex
                    = apno aspjbsem aspjbft aspsex).
```

This is now a file of spouses' characteristics, keyed by household ID and the partner's person number: match it back.

get file = 's:\bhps\spss\aindresp.sav'
  /keep = ahid apno ajbsemp ajbft asex.
sort cases by ahid apno.
match files file = *
           /file = "m:\spsinfo.sav" /by=ahid,apno.

Stata code:

use ahid ahgspn ajbsemp ajbft asex using s:\bhps\stata\aindresp
keep  if ahgspn > 0
rename ahgspn  apno
rename ajbsemp aspjbsem
rename ajbft   aspjbft 
rename asex    aspsex  

sort ahid apno

save m:\spinfo, replace

use ahid apno ahgspn ajbsemp ajbft asex using s:\bhps\stata\aindresp
sort ahid apno
merge ahid apno using m:\spinfo
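
The logic can be checked on a toy file. This is a sketch under assumptions: the values are invented, 0 is used for "no spouse present", only two substantive variables are carried, and the `merge 1:1` form assumes Stata 11 or later.

```stata
* Household 1 is a couple (persons 1 and 2); household 2 is a single person
clear
input ahid apno ahgspn asex ajbft
1 1 2 1 1
1 2 1 2 2
2 1 0 2 1
end
tempfile main spinfo
save `main'

* Partner file: each partnered person's own data, re-keyed to the partner's apno
keep if ahgspn > 0
drop apno                   // own person number is no longer needed
rename ahgspn apno
rename asex   aspsex
rename ajbft  aspjbft
save `spinfo'

* Match back: each person picks up the record keyed to their own apno
use `main', clear
merge 1:1 ahid apno using `spinfo'
list, clean

* Person 1's spouse is person 2 (asex 2), and vice versa
assert aspsex == 2 if ahid == 1 & apno == 1
assert aspsex == 1 if ahid == 1 & apno == 2
* The single person has no spouse record
assert _merge == 1 if ahid == 2
```

The key step is the rename: the partner's person number becomes the matching key, so the record lands on the partner rather than on the person it describes.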

Couples are straightforward because the relationship is symmetric and involves only two people; the partner's PID is cheap to store.
For more general relationships wEGOALT can be used:
- A matrix identifying the relationship of every person in the household to every other person

© Brendan Halpin, 23-Apr-2012
Department of Sociology, University of Limerick