- These operations are relatively straightforward in SPSS
- First: summarising over lower-level units nested within
higher-level units, for example:
- individuals within households
- job or marriage spells within individuals
aggregate outfile = 'summary.sav' /break = ghid
/totinc = sum(indinc)
/number = n.
. . .
get file = 'summary.sav'.
compute aveinc = totinc/number.
AGGREGATE OUTFILE={file|*} [/MISSING=COLUMNWISE] [/DOCUMENT]
  [/PRESORTED] /BREAK=varlist[({A|D})] [varlist...]
  /aggvar['label'] [aggvar['label']...]=function(arguments) [/aggvar ...]
- Aggregate functions:
  SUM      Sum
  MEAN     Mean
  SD       Standard deviation
  MAX      Maximum
  MIN      Minimum
  PGT      % of cases greater than value
  PLT      % of cases less than value
  PIN      % of cases between values
  POUT     % of cases not in range
  FGT      Fraction greater than value
  FLT      Fraction less than value
  FIN      Fraction between values
  FOUT     Fraction not in range
  N        Weighted n
  NU       Unweighted n
  NMISS    Weighted n of missing
  NUMISS   Unweighted n of missing
  FIRST    First nonmissing
  LAST     Last nonmissing
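The aggregate-then-compute example above can be mirrored in plain Python to make the logic explicit. This is only an illustrative sketch, not how SPSS implements AGGREGATE: it assumes each record is a dict, that there are no missing values and no weights (so the count behaves like NU rather than a weighted N), and the income figures are made up.

```python
from collections import defaultdict


def aggregate_sum_n(records, break_var, value_var):
    """Collapse records to one row per value of break_var, holding the sum
    of value_var (like SUM) and the number of records (like an unweighted N)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        key = rec[break_var]
        totals[key] += rec[value_var]
        counts[key] += 1
    return {key: {"totinc": totals[key], "number": counts[key]} for key in totals}


# Made-up individual records: household id (ghid) and individual income (indinc)
individuals = [
    {"ghid": 1, "indinc": 1000.0},
    {"ghid": 1, "indinc": 500.0},
    {"ghid": 2, "indinc": 800.0},
]

summary = aggregate_sum_n(individuals, "ghid", "indinc")
# Equivalent of: compute aveinc = totinc/number.
for row in summary.values():
    row["aveinc"] = row["totinc"] / row["number"]
print(summary)
# {1: {'totinc': 1500.0, 'number': 2, 'aveinc': 750.0}, 2: {'totinc': 800.0, 'number': 1, 'aveinc': 800.0}}
```

The result has one entry per household, which is exactly the shape of summary.sav: the individual level has been summarised away.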
- Second: mapping higher-level data back to lower levels
- For instance, mapping household-level data to the individual level,
whether aggregated data like the above or data recorded at household
level
- To do this we use a many-to-one match, adding the data from the
household record to the record of every individual in that
household
match files file = individual.sav
/table = household.sav /by = hhid.
Both files must be sorted according to hhid.
- Stata syntax:
. use individual.dta
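. merge m:1 hhid using household.dta
(Stata 11 and later; older versions use merge hhid using household.dta,
with both files sorted by hhid)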
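A minimal Python sketch of what a many-to-one match does, assuming records are dicts and the household file has exactly one record per hhid; `tenure` is a hypothetical household variable and the data are made up. Following SPSS MATCH FILES with /TABLE, every individual is kept, household records with no individuals are dropped, and a variable present in both files keeps the individual file's value.

```python
def many_to_one_match(individuals, households, key):
    """Add household variables to every individual record sharing `key`."""
    lookup = {}
    for hh in households:
        if hh[key] in lookup:
            raise ValueError(f"duplicate {key} in household file: {hh[key]}")
        lookup[hh[key]] = hh
    matched = []
    for ind in individuals:
        row = dict(ind)
        # Unmatched individuals simply get no household variables
        for name, value in lookup.get(ind[key], {}).items():
            row.setdefault(name, value)  # individual file takes precedence
        matched.append(row)
    return matched


individuals = [
    {"hhid": 10, "pno": 1, "indinc": 1000.0},
    {"hhid": 10, "pno": 2, "indinc": 500.0},
    {"hhid": 11, "pno": 1, "indinc": 800.0},
]
# "tenure" is a hypothetical household-level variable
households = [{"hhid": 10, "tenure": "owned"}, {"hhid": 11, "tenure": "rented"}]

result = many_to_one_match(individuals, households, "hhid")
for row in result:
    print(row)
```

The sketch uses a dictionary lookup, so it needs no sorting; SPSS (and Stata before version 11) instead step through both files in key order, which is why the files must be sorted by hhid first. Stata's merge would also keep household records with no individuals, flagging them as _merge==2.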
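- Or, more efficiently, compute the household total directly in the
individual file, without aggregating and matching back:
. use individual.dta
. sort ahid apno
. egen totinc = total(indinc), by(ahid)
(total() was named sum() in older versions of Stata)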
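The egen shortcut amounts to two passes over the individual records: accumulate a total per household, then copy it onto every member. A Python sketch of that idea, assuming dict records with no missing incomes (egen total() would treat missing values as zero) and made-up data:

```python
from collections import defaultdict


def add_group_total(records, group_var, value_var, new_var):
    """Store the within-group total of value_var on every record, in the
    spirit of: egen new_var = total(value_var), by(group_var)."""
    totals = defaultdict(float)
    for rec in records:  # pass 1: accumulate one total per group
        totals[rec[group_var]] += rec[value_var]
    for rec in records:  # pass 2: copy the group total onto each record
        rec[new_var] = totals[rec[group_var]]
    return records


records = [
    {"ahid": 10, "apno": 1, "indinc": 1000.0},
    {"ahid": 10, "apno": 2, "indinc": 500.0},
    {"ahid": 11, "apno": 1, "indinc": 800.0},
]
add_group_total(records, "ahid", "indinc", "totinc")
print([r["totinc"] for r in records])  # [1500.0, 1500.0, 800.0]
```

No intermediate household file is written, which is where the saving over aggregate-then-match comes from.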
- Third: matching individuals across waves - a key panel operation!
- This is a one-to-one match, matching an individual in one
file to his/her record in another file.
- SPSS syntax
match files file = wave1.sav
/file = wave2.sav /* "/file" not "/table" */
/by = pid.
Both files must be sorted according to pid.
- Stata syntax:
. use wave1.dta
. merge 1:1 pid using wave2.dta
- Three types of record may result:
- with information from wave 1 only
- with information from wave 2 only
- with information from wave 1 and wave 2
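- Stata creates a variable _merge which distinguishes the three
cases (1 = wave 1 only, 2 = wave 2 only, 3 = both waves)
- In SPSS one can use the IN= subcommand after each /file
(e.g. /file = wave1.sav /in = inw1) to create 0/1 marker variables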
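The one-to-one match across waves, with a Stata-style _merge code, can be sketched in Python as a full outer match on pid. Assumptions: records are dicts, pid is unique within each wave, and the wave-prefixed variables and their values are made up.

```python
def one_to_one_match(wave1, wave2, key="pid"):
    """Full 1:1 match of two files on `key`, keeping unmatched records from
    both sides. Each output row gets a _merge code:
    1 = wave 1 only, 2 = wave 2 only, 3 = both waves."""
    w1 = {rec[key]: rec for rec in wave1}
    w2 = {rec[key]: rec for rec in wave2}
    if len(w1) != len(wave1) or len(w2) != len(wave2):
        raise ValueError(f"{key} does not uniquely identify records")
    merged = []
    for pid in sorted(w1.keys() | w2.keys()):
        row = {}
        row.update(w2.get(pid, {}))
        row.update(w1.get(pid, {}))  # wave-1 values win if a name occurs in both
        if pid not in w2:
            row["_merge"] = 1
        elif pid not in w1:
            row["_merge"] = 2
        else:
            row["_merge"] = 3
        merged.append(row)
    return merged


wave1 = [{"pid": 1, "ajbft": 1}, {"pid": 2, "ajbft": 2}]
wave2 = [{"pid": 2, "bjbft": 1}, {"pid": 3, "bjbft": 2}]
merged = one_to_one_match(wave1, wave2)
print([(row["pid"], row["_merge"]) for row in merged])  # [(1, 1), (2, 3), (3, 2)]
```

Because wave-specific variables carry a wave prefix (a, b, ...), only pid is shared between the files, so name clashes do not arise in practice.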
- Fourth: matching individuals in households
- identify individuals with a partner in the household
- see wHGSPN: spouse's wPNO, 0 if not present (from the household
grid)
- we thus identify the partner's wPNO
- save wHID, the partner's wPNO and substantive variables to a
temporary file (renaming the variables)
- merge the temporary file back in, linking wHID and wPNO in the main
file to wHID and the partner's wPNO in the temporary file
get file = "s:\bhps\spss\aindresp.sav"
/keep = ahid ahgspn ajbsemp ajbft asex.
select if (ahgspn gt 0).
sort cases by ahid ahgspn.
save outfile="m:\spsinfo.sav"
/rename=(ahgspn ajbsemp ajbft asex
= apno aspjbsem aspjbft aspsex).
- This is now a file keyed by ahid and the partner's person number
(renamed apno), holding that partner's characteristics: match it back.
get file = "s:\bhps\spss\aindresp.sav"
/keep = ahid apno ajbsemp ajbft asex.
sort cases by ahid apno.
match files file = *
/file = "m:\spsinfo.sav" /by = ahid apno.
- Stata code:
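use ahid ahgspn ajbsemp ajbft asex using s:\bhps\stata\aindresp, clear
* missing values (.) count as greater than 0 in Stata, so exclude them explicitly
keep if ahgspn > 0 & !missing(ahgspn)
rename ahgspn apno
rename ajbsemp aspjbsem
rename ajbft aspjbft
rename asex aspsex
sort ahid apno
save m:\spinfo, replace
use ahid apno ahgspn ajbsemp ajbft asex using s:\bhps\stata\aindresp, clear
sort ahid apno
* _merge: 3 = partner matched, 1 = no partner data, 2 = partner record with no matching person
merge 1:1 ahid apno using m:\spinfo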
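The two steps (build a partner file keyed by household and partner's person number, then match it back on household and own person number) can be sketched in Python. Assumptions: dict records using the wave-a names from the code above, made-up values, and None standing in for a missing value; values of ahgspn of 0 or below (no partner, or a negative missing-value code) are skipped.

```python
def attach_partner_vars(people, varmap):
    """Attach each person's partner's variables, renamed via varmap."""
    # Step 1: temporary table keyed by (household, partner's person number),
    # holding this person's own variables under the partner names.
    temp = {}
    for p in people:
        if p["ahgspn"] > 0:
            temp[(p["ahid"], p["ahgspn"])] = {new: p[old] for old, new in varmap.items()}
    # Step 2: match back on (household, own person number).
    missing = {new: None for new in varmap.values()}
    result = []
    for p in people:
        row = dict(p)
        row.update(temp.get((p["ahid"], p["apno"]), missing))
        result.append(row)
    return result


people = [
    {"ahid": 10, "apno": 1, "ahgspn": 2, "asex": 1},
    {"ahid": 10, "apno": 2, "ahgspn": 1, "asex": 2},
    {"ahid": 10, "apno": 3, "ahgspn": 0, "asex": 2},
]
result = attach_partner_vars(people, {"asex": "aspsex"})
print([(r["apno"], r["aspsex"]) for r in result])  # [(1, 2), (2, 1), (3, None)]
```

The renaming in step 1 is what makes the trick work: after it, person 1's own sex is stored under aspsex on a record addressed to person 2, and vice versa.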
- Couples are straightforward because the relationship is symmetric
and involves only two people; the partner's person number (wHGSPN)
is cheap to store
- For more general relationships, wEGOALT can be used:
- a matrix identifying the relationship of every person in
the household to every other person
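A relationship matrix stored as one record per (ego, alter) pair can be used to attach any relative's characteristics, not just a partner's. The Python sketch below assumes a hypothetical layout: field names hid, pno, ego, alter, rel and the relationship labels are illustrative only, not the actual wEGOALT variables or codes, and the data are made up.

```python
from collections import defaultdict


def alters_by_relation(pairs, people, relation):
    """Return {(hid, ego): [alter records]} for all pairs whose relationship
    equals `relation`. Hypothetical layout: one pair record per ordered
    (ego, alter) pair in a household; `rel` says what the alter is to the ego."""
    person = {(p["hid"], p["pno"]): p for p in people}
    linked = defaultdict(list)
    for pair in pairs:
        if pair["rel"] == relation:
            linked[(pair["hid"], pair["ego"])].append(person[(pair["hid"], pair["alter"])])
    return dict(linked)


people = [
    {"hid": 10, "pno": 1, "age": 40},
    {"hid": 10, "pno": 2, "age": 38},
    {"hid": 10, "pno": 3, "age": 10},
]
pairs = [
    {"hid": 10, "ego": 1, "alter": 2, "rel": "spouse"},
    {"hid": 10, "ego": 2, "alter": 1, "rel": "spouse"},
    {"hid": 10, "ego": 1, "alter": 3, "rel": "child"},
    {"hid": 10, "ego": 2, "alter": 3, "rel": "child"},
    {"hid": 10, "ego": 3, "alter": 1, "rel": "parent"},
    {"hid": 10, "ego": 3, "alter": 2, "rel": "parent"},
]

children = alters_by_relation(pairs, people, "child")
print({ego: [c["age"] for c in kids] for ego, kids in children.items()})
# {(10, 1): [10], (10, 2): [10]}
```

Unlike the couple case, an ego can have several alters with the same relationship, so the result is a list per person; summaries such as the number of children are then one step away.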