We have seen that we can fit logistic regressions on individual-level or grouped data; that with grouped data there is a strong and simple link between logistic and loglinear models; and that the parameter estimates are unchanged whether we fit individually or grouped.
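As a minimal sketch of that last point, the following Python fragment (using statsmodels, with made-up data; the seed and variable names are illustrative only) fits the same logistic model to individual-level and to grouped data and recovers identical coefficients:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Individual-level data: binary outcome y, one binary predictor x
x = rng.integers(0, 2, size=1000).astype(float)
p = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))
y = rng.binomial(1, p)

fit_indiv = sm.GLM(y, sm.add_constant(x),
                   family=sm.families.Binomial()).fit()

# Grouped data: one row per setting of x, outcome = (successes, failures)
settings = np.array([0.0, 1.0])
succ = np.array([y[x == 0].sum(), y[x == 1].sum()])
n = np.array([(x == 0).sum(), (x == 1).sum()])
fit_group = sm.GLM(np.column_stack([succ, n - succ]),
                   sm.add_constant(settings),
                   family=sm.families.Binomial()).fit()

print(fit_indiv.params)  # identical estimates from both fits
print(fit_group.params)
```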
So why fit grouped?
It can be shown that for Poisson, binomial and multinomial models, as long as the number of ``settings'' - combinations of values of the explanatory variables - is (i) fixed and (ii) small relative to the sample size, the deviance $G^2$ has an asymptotically $\chi^2$ distribution, with degrees of freedom equal to the number of settings minus the number of parameters.
For grouped logistic and loglinear models, these requirements
are usually met: we can think of the data set as a table,
with a number of cells that is much smaller than the sample size.
Generally, as long as there are very few cells with a fitted value of 5 or less, it is assumed that $G^2$ approximates a $\chi^2$ distribution.
This allows us to use $G^2$ as a measure of the overall fit: if it is ``significant'', then the saturated model is better than this one.
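For instance, a sketch of this overall test on invented grouped data (statsmodels and scipy assumed available):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Four settings of a single covariate, with successes out of n at each
xvals = np.array([1.0, 2.0, 3.0, 4.0])
succ = np.array([12, 25, 40, 60])
n = np.array([100, 100, 100, 100])

fit = sm.GLM(np.column_stack([succ, n - succ]),
             sm.add_constant(xvals),
             family=sm.families.Binomial()).fit()

g2 = fit.deviance    # G^2 against the saturated model
df = fit.df_resid    # 4 settings - 2 parameters = 2
print(g2, df, stats.chi2.sf(g2, df))  # small p => saturated model better
```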
For individual-level data, however, there may be as many ``settings'' as cases, and a new sample will have different ``settings''. Thus, though we can calculate $G^2$, we cannot use it to measure the overall fit of individual-level models.
However, the change in $G^2$ derived in comparing pairs of nested models does still have a $\chi^2$ distribution, and should be used to aid in model search, since $\Delta G^2 = G^2_{\text{restricted}} - G^2_{\text{full}}$ is approximately $\chi^2$ with degrees of freedom equal to the difference in the number of parameters.
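A sketch of such a nested comparison on hypothetical individual-level data (here the extra variable is truly irrelevant, so $\Delta G^2$ should be unremarkable):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
p = 1 / (1 + np.exp(-(0.3 + 0.8 * x1)))  # x2 plays no role
y = rng.binomial(1, p)

fit_small = sm.GLM(y, sm.add_constant(x1),
                   family=sm.families.Binomial()).fit()
fit_big = sm.GLM(y, sm.add_constant(np.column_stack([x1, x2])),
                 family=sm.families.Binomial()).fit()

delta_g2 = fit_small.deviance - fit_big.deviance
delta_df = fit_small.df_resid - fit_big.df_resid  # = 1 here
print(delta_g2, stats.chi2.sf(delta_g2, delta_df))
```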
One strategy recommended to assess overall fit is to group all continuous variables and fit the corresponding grouped logistic or loglinear model.
The Hosmer-Lemeshow test is less drastic but has a similar logic: it groups the data into deciles according to the predicted probabilities, and conducts a $\chi^2$ test on the observed and expected values in the resulting table (decile by dependent variable). This statistic has been shown to be approximately $\chi^2$ distributed, with 8 degrees of freedom (the number of groups minus 2).
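A rough hand-rolled sketch of the statistic (no packaged implementation is assumed; data are synthetic):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=1000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.9 * x))))

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()
p_hat = fit.fittedvalues  # predicted probabilities

# Assign each case to a decile of predicted probability
cuts = np.percentile(p_hat, np.arange(10, 100, 10))
group = np.digitize(p_hat, cuts)

hl = 0.0
for g in range(10):
    idx = group == g
    obs = y[idx].sum()        # observed 1s in decile g
    exp = p_hat[idx].sum()    # expected 1s in decile g
    n_g = idx.sum()
    # denominator is n_g * pbar * (1 - pbar), the binomial variance
    hl += (obs - exp) ** 2 / (exp * (1 - exp / n_g))

print(hl, stats.chi2.sf(hl, df=8))  # high p-value => adequate fit
```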
In SPSS, if the fit is good, the reported significance (p-value) should be high (e.g., in excess of 0.2, and certainly greater than 0.1).
Another commonly used test of fit is the classification table: a table of observed by predicted values, where a predicted probability above 0.5 is classified as 1, and otherwise as 0. This cut-off point is arbitrary, and almost always unsuitable.
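A sketch of such a table with the conventional 0.5 cut-off (hypothetical data; the cut-off is precisely the arbitrary element criticised above):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.9 * x))))

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()
pred = (fit.fittedvalues >= 0.5).astype(int)  # classify as 1 above 0.5

# 2x2 table: rows = observed outcome, columns = predicted outcome
table = np.zeros((2, 2), dtype=int)
for obs, prd in zip(y, pred):
    table[obs, prd] += 1
print(table)
print("proportion correct:", np.trace(table) / table.sum())
```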
The classification plot is more successful, as it shows a histogram of predicted probability values, distinguishing between observed 0s and 1s.
An extension of the classification table approach is the Receiver Operating Characteristic (ROC) curve: varying the cut-off point from 0 to 100%, this plots the proportion of 1s correctly predicted (sensitivity) against the proportion of 0s wrongly classified as 1s (one minus specificity).
The more successful the prediction, the greater the area
between the curve and the 45 degree line
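A sketch of the curve and its area using scikit-learn (assumed available; data are synthetic). An area of 0.5 corresponds to the 45 degree line, i.e., no predictive power:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.9 * x))))

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()

# One point per candidate cut-off: (1 - specificity, sensitivity)
fpr, tpr, cutoffs = roc_curve(y, fit.fittedvalues)
print("area under the curve:", roc_auc_score(y, fit.fittedvalues))
```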
However, it is possible to have the ``best'' model for a
particular data set and yet not have much predictive power
Model search also involves looking at the contribution of each variable and its parameters. The Wald test uses the fact that the parameter estimates are asymptotically normally distributed, so for reasonably large samples we can conduct the usual $t$-style test of $\hat{\beta}/\mathrm{SE}(\hat{\beta})$ against the null hypothesis that $\beta = 0$.
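A sketch of the Wald test computed by hand from a fitted model (synthetic data; statsmodels and scipy assumed):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(size=500)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.9 * x))))

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()
z = fit.params / fit.bse          # Wald statistics, one per parameter
p = 2 * stats.norm.sf(np.abs(z))  # two-sided p-values for beta = 0
print(z, p)
```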