We have seen that we can fit logistic regressions on individual-level or grouped data; that with grouped data there is a strong and simple link between logistic and loglinear models; and that the parameter estimates are unchanged whether we fit individually or grouped.
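As a minimal sketch of that last point, the following Python fragment (using statsmodels, with made-up data; the seed and variable names are illustrative only) fits the same logistic model to individual-level and to grouped data and recovers identical coefficients:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Individual-level data: binary outcome y, one binary predictor x
x = rng.integers(0, 2, size=1000).astype(float)
p = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))
y = rng.binomial(1, p)

fit_indiv = sm.GLM(y, sm.add_constant(x),
                   family=sm.families.Binomial()).fit()

# Grouped data: one row per setting of x, outcome = (successes, failures)
settings = np.array([0.0, 1.0])
succ = np.array([y[x == 0].sum(), y[x == 1].sum()])
n = np.array([(x == 0).sum(), (x == 1).sum()])
fit_group = sm.GLM(np.column_stack([succ, n - succ]),
                   sm.add_constant(settings),
                   family=sm.families.Binomial()).fit()

print(fit_indiv.params)  # identical estimates from both fits
print(fit_group.params)
```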
So why fit grouped?
It can be shown that for Poisson, binomial and multinomial models, as long as the number of ``settings'' - combinations of values of the explanatory variables - is (i) fixed and (ii) small relative to the sample size, the deviance $G^2$ has an asymptotically $\chi^2$ distribution, with degrees of freedom equal to the number of settings minus the number of parameters.
For grouped logistic and loglinear models, these requirements
are usually met: we can think of the data set as a table,
with a number of cells that is much smaller than the sample size.
Generally, as long as there are very few cells with a fitted value of 5 or less, it is assumed that $G^2$ approximates a $\chi^2$ distribution.
This allows us to use $G^2$ as a measure of the overall fit: if it is ``significant'', then the saturated model is better than this one.
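For instance, a sketch of this overall test on invented grouped data (statsmodels and scipy assumed available):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Four settings of a single covariate, with successes out of n at each
xvals = np.array([1.0, 2.0, 3.0, 4.0])
succ = np.array([12, 25, 40, 60])
n = np.array([100, 100, 100, 100])

fit = sm.GLM(np.column_stack([succ, n - succ]),
             sm.add_constant(xvals),
             family=sm.families.Binomial()).fit()

g2 = fit.deviance    # G^2 against the saturated model
df = fit.df_resid    # 4 settings - 2 parameters = 2
print(g2, df, stats.chi2.sf(g2, df))  # small p => saturated model better
```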
For individual-level data, however, there may be as many ``settings'' as cases, and a new sample will have different ``settings''. Thus, though we can calculate $G^2$, we cannot use it to measure the overall fit of individual-level models.
However, the change in $G^2$ derived in comparing pairs of nested models does still have a $\chi^2$ distribution, and should be used to aid in model search, since $\Delta G^2 = G^2_{\text{restricted}} - G^2_{\text{full}}$ is approximately $\chi^2$ with degrees of freedom equal to the difference in the number of parameters.
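A sketch of such a nested comparison on hypothetical individual-level data (here the extra variable is truly irrelevant, so $\Delta G^2$ should be unremarkable):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
p = 1 / (1 + np.exp(-(0.3 + 0.8 * x1)))  # x2 plays no role
y = rng.binomial(1, p)

fit_small = sm.GLM(y, sm.add_constant(x1),
                   family=sm.families.Binomial()).fit()
fit_big = sm.GLM(y, sm.add_constant(np.column_stack([x1, x2])),
                 family=sm.families.Binomial()).fit()

delta_g2 = fit_small.deviance - fit_big.deviance
delta_df = fit_small.df_resid - fit_big.df_resid  # = 1 here
print(delta_g2, stats.chi2.sf(delta_g2, delta_df))
```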
One strategy recommended to assess overall fit is to group all continuous variables and fit the corresponding grouped logistic or loglinear model.
The Hosmer-Lemeshow test is less drastic but has a similar logic: it groups the data into deciles according to the predicted probabilities, and conducts a $\chi^2$ test on the observed and expected values in the resulting table (decile by dependent variable). This statistic has been shown to be approximately $\chi^2$ distributed, with 8 degrees of freedom (the number of groups minus 2).
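A rough hand-rolled sketch of the statistic (no packaged implementation is assumed; data are synthetic):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=1000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.9 * x))))

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()
p_hat = fit.fittedvalues  # predicted probabilities

# Assign each case to a decile of predicted probability
cuts = np.percentile(p_hat, np.arange(10, 100, 10))
group = np.digitize(p_hat, cuts)

hl = 0.0
for g in range(10):
    idx = group == g
    obs = y[idx].sum()        # observed 1s in decile g
    exp = p_hat[idx].sum()    # expected 1s in decile g
    n_g = idx.sum()
    # denominator is n_g * pbar * (1 - pbar), the binomial variance
    hl += (obs - exp) ** 2 / (exp * (1 - exp / n_g))

print(hl, stats.chi2.sf(hl, df=8))  # high p-value => adequate fit
```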
In SPSS, if the fit is good, the reported significance (p-value) should be high (e.g., in excess of 0.2, and certainly greater than 0.1).
Another commonly used test of fit is the classification table: a table of observed by predicted values, where a predicted probability above 0.5 is classified as 1, and otherwise as 0. This cut-off point is arbitrary, and almost always unsuitable.
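A sketch of such a table with the conventional 0.5 cut-off (hypothetical data; the cut-off is precisely the arbitrary element criticised above):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.9 * x))))

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()
pred = (fit.fittedvalues >= 0.5).astype(int)  # classify as 1 above 0.5

# 2x2 table: rows = observed outcome, columns = predicted outcome
table = np.zeros((2, 2), dtype=int)
for obs, prd in zip(y, pred):
    table[obs, prd] += 1
print(table)
print("proportion correct:", np.trace(table) / table.sum())
```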
The classification plot is more successful, as it shows a histogram of predicted probability values, distinguishing between observed 0s and 1s.
An extension of the classification table approach is the Receiver Operating Characteristic (ROC) curve: varying the cut-off point from 0 to 100%, this plots the proportion of 1s correctly predicted (sensitivity) against the proportion of 0s wrongly classified as 1s (one minus specificity).
The more successful the prediction, the greater the area
between the curve and the 45 degree line
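A sketch of the curve and its area using scikit-learn (assumed available; data are synthetic). An area of 0.5 corresponds to the 45 degree line, i.e., no predictive power:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.9 * x))))

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()

# One point per candidate cut-off: (1 - specificity, sensitivity)
fpr, tpr, cutoffs = roc_curve(y, fit.fittedvalues)
print("area under the curve:", roc_auc_score(y, fit.fittedvalues))
```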
However, it is possible to have the ``best'' model for a
particular data set and yet not have much predictive power
Model search also involves looking at the contribution of each variable and its parameters. The Wald test uses the fact that the parameter estimates are asymptotically normally distributed, so for reasonably large samples we can conduct the usual $t$-style test of $\hat{\beta}/\mathrm{SE}(\hat{\beta})$ against the null hypothesis that $\beta = 0$.
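A sketch of the Wald test computed by hand from a fitted model (synthetic data; statsmodels and scipy assumed):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(size=500)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.9 * x))))

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()
z = fit.params / fit.bse          # Wald statistics, one per parameter
p = 2 * stats.norm.sf(np.abs(z))  # two-sided p-values for beta = 0
print(z, p)
```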