Non-response and imputation

Attrition is compensated for by longitudinal weighting, which restores the representativeness of the sample with respect to the initial population
Item non-response is in many cases compensated for by imputing a value, especially where a missing item would otherwise leave a household summary (e.g., household income) incomplete
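To show the idea behind attrition weighting, here is a minimal sketch of a simple weighting-class adjustment. This is an assumed, simplified technique, not necessarily how any particular panel constructs its longitudinal weights: each retained case is weighted by the inverse of the response rate in its wave-1 cell, so retained cases stand in for the cell members who dropped out. The function name and the cell labels are hypothetical.

```python
from collections import Counter

def attrition_weights(wave1_cells, responded):
    """Hypothetical weighting-class adjustment for attrition.

    wave1_cells: cell label for each wave-1 respondent (e.g. an age group)
    responded:   True if that person is still present in the later wave
    Returns {case index: weight} for retained cases, where the weight is
    (wave-1 cases in the cell) / (retained cases in the cell).
    """
    base = Counter(wave1_cells)
    kept = Counter(c for c, r in zip(wave1_cells, responded) if r)
    return {i: base[c] / kept[c]
            for i, (c, r) in enumerate(zip(wave1_cells, responded)) if r}

cells = ["young", "young", "young", "young", "old", "old"]
resp = [True, False, False, True, True, True]
w = attrition_weights(cells, resp)
print(w)  # {0: 2.0, 3: 2.0, 4: 1.0, 5: 1.0}
```

The weights sum to the wave-1 sample size, so the weighted later-wave sample reproduces the wave-1 cell distribution.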
The default behaviour of most statistical software when values are missing is to drop the whole case (listwise deletion); this is acceptable only if values are missing completely at random (MCAR)
If values are not MCAR, imputation can give better statistical estimates
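A small sketch of why listwise deletion is risky when values are not MCAR. The data are invented for illustration: income is missing only among older respondents, who also tend to earn more, so dropping incomplete cases pulls the mean down.

```python
def listwise_delete(rows):
    """Drop every case that has a missing (None) value in any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical true values (age group, income)
full = [("young", 10), ("young", 20), ("young", 30),
        ("old", 40), ("old", 50), ("old", 60)]
# Hypothetical non-response: two of the three older respondents do not report income
missing_idx = {4, 5}
rows = [{"age_group": g, "income": None if i in missing_idx else x}
        for i, (g, x) in enumerate(full)]

complete = listwise_delete(rows)
true_mean = mean([x for _, x in full])
cc_mean = mean([r["income"] for r in complete])
print(len(complete), true_mean, cc_mean)  # 4 35.0 25.0
```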
Two main methods of imputation:
- 'Hot-decking': take a value at random from cases with identical characteristics
- Regression based: fit a model using non-missing covariates and cases where the variable is not missing, then predict a value for cases where it is missing
Hot-decking introduces some randomness (good!) and ensures the imputed value is a possible real-world value
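A minimal hot-deck sketch, assuming cases are dicts and missing values are None. For each missing value it draws a donor at random from observed cases with identical values on the chosen cell variables; the function name and variables are hypothetical. Because the imputed value is copied from a real case, it is always a value that actually occurs in the data.

```python
import random

def hot_deck(rows, var, cell_vars, rng):
    """Replace missing `var` with the value of a randomly chosen donor
    that has identical values on all `cell_vars` and an observed `var`."""
    donors = {}
    for r in rows:
        if r[var] is not None:
            donors.setdefault(tuple(r[c] for c in cell_vars), []).append(r[var])
    out = []
    for r in rows:
        r = dict(r)  # copy, so the input data are left untouched
        if r[var] is None:
            pool = donors.get(tuple(r[c] for c in cell_vars))
            if pool:  # stays missing if no donor shares this cell
                r[var] = rng.choice(pool)
        out.append(r)
    return out

rows = [
    {"sex": "f", "age_group": "old", "income": 40},
    {"sex": "f", "age_group": "old", "income": 55},
    {"sex": "f", "age_group": "old", "income": None},
    {"sex": "m", "age_group": "young", "income": 20},
    {"sex": "m", "age_group": "young", "income": None},
]
imputed = hot_deck(rows, "income", ["sex", "age_group"], random.Random(1))
print([r["income"] for r in imputed])  # third value is 40 or 55, fifth is 20
```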
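Regression-based imputation is in some ways more precise, but the imputed values have too little variance: since $\hat{y}_i = y_i - e_i$, the predictions omit the residual variation, and for OLS with an intercept $\mathrm{Var}(\hat{y}) = \mathrm{Var}(y) - \mathrm{Var}(e)$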
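A small numerical check of the variance point, using an invented data set and a hand-written simple OLS fit (no external libraries). Predictions from the fitted line, which is what regression-based imputation plugs in, have the variance of $\hat{y}$ only, which falls short of the variance of the real values by exactly the residual variance.

```python
from statistics import mean, pvariance

def ols(x, y):
    """Simple least-squares fit y = a + b*x."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

# Hypothetical complete cases: covariate x and outcome y
x = [1, 2, 3, 4, 5, 6]
y = [2, 5, 3, 8, 6, 9]
a, b = ols(x, y)
fitted = [a + b * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Var(y) = Var(y_hat) + Var(e): predictions are less variable than real values
print(round(pvariance(y), 3), round(pvariance(fitted), 3), round(pvariance(resid), 3))
# about 6.25 4.402 1.848

# Imputed values for cases where y is missing but x is observed
x_missing = [2.5, 5.5]
imputed = [a + b * xm for xm in x_missing]
```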
Special attention is paid to the longitudinal logic:
- account is taken of the previous wave's value
- but not to the extent of under-representing wave-on-wave transition rates
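One way to respect both points, sketched here as an assumed illustration rather than the method any specific survey uses: hot-deck the current-wave value from donors who had the same previous-wave value. Simply carrying the previous value forward would imply that no imputed case ever changes state; drawing from same-origin donors lets imputed cases change state at roughly the donors' observed transition rate. The state labels and counts are hypothetical.

```python
import random

def longitudinal_hot_deck(prev, curr, rng):
    """Impute missing current-wave values (None) from donors with the
    same previous-wave value."""
    pools = {}  # previous-wave value -> observed current-wave values
    for p, c in zip(prev, curr):
        if c is not None:
            pools.setdefault(p, []).append(c)
    out = []
    for p, c in zip(prev, curr):
        if c is None and p in pools:
            c = rng.choice(pools[p])  # donor shares the previous state
        out.append(c)
    return out

# Hypothetical panel: among observed previously-employed cases, 20% became unemployed
prev = ["emp"] * 1000 + ["unemp"] * 100 + ["emp"] * 500
curr = ["emp"] * 800 + ["unemp"] * 200 + ["emp"] * 50 + ["unemp"] * 50 + [None] * 500
imp = longitudinal_hot_deck(prev, curr, random.Random(42))
moved = sum(v == "unemp" for v in imp[1100:]) / 500

# Carrying the previous value forward implies no transitions at all
carry = [c if c is not None else p for p, c in zip(prev, curr)]
moved_carry = sum(v == "unemp" for v in carry[1100:]) / 500
print(moved, moved_carry)  # moved is close to 0.2; moved_carry is 0.0
```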
If a variable contains imputed values, there is a parallel imputation-flag variable
If the case is imputed, the flag contains the missing-value code the original variable held; otherwise it is 0
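A minimal sketch of the flag convention described above. The missing-value codes (-1, -2, -9) and the function name are hypothetical placeholders, not the codes of any particular data set.

```python
MISSING_CODES = {-1, -2, -9}  # hypothetical missing-value codes

def apply_imputation(values, imputed):
    """values:  original variable, with negative missing-value codes
    imputed: dict mapping case index -> imputed value
    Returns the completed variable and its parallel imputation flag."""
    completed, flag = [], []
    for i, v in enumerate(values):
        if v in MISSING_CODES and i in imputed:
            completed.append(imputed[i])
            flag.append(v)  # flag keeps the code the original variable held
        else:
            completed.append(v)
            flag.append(0)  # not imputed
    return completed, flag

income = [1200, -9, 800, -1]
completed, flag = apply_imputation(income, {1: 950, 3: 1100})
print(completed)  # [1200, 950, 800, 1100]
print(flag)       # [0, -9, 0, -1]
```

Keeping the original code in the flag, rather than just 1, lets analysts tell apart, for example, imputed "don't know" answers from imputed refusals.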

© Brendan Halpin (e-mail)	23-Apr-2012
Department of Sociology, University of Limerick
Taught programme: MA in Sociology (Applied Social Research),
Short course, May 14/15 2012: Categorical Data Analysis for Social Scientists