Zero cells arise for two reasons: either that combination is
impossible (structural zero) or it is rare and happens not to
occur in the sample.
The first table shown in this course
had a structural zero:
no men on maternity leave. If you were to tabulate academics'
jobs at both ends of a five-year span, one triangle of the table
would be empty: no one gets demoted. Nearly half the table would
be occupied by structural zeros.
Structural zeros are not a problem: weight them out of the
analysis (using /CSTRUCTURE in SPSS, for instance). We
lose a degree of freedom for each cell weighted out, but since
these cells contribute no information, this is correct.
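As a rough illustration of weighting structural zeros out, the sketch
below fits independence to the remaining cells only
(quasi-independence) by iterative proportional fitting, and counts
the degrees of freedom. The function names and the data are invented
for illustration; the code assumes the flagged cells hold observed
counts of zero and that the remaining cells do not split into
separate blocks, and it is not a reimplementation of what SPSS does.

```python
def fit_quasi_independence(observed, structural, max_iter=1000, tol=1e-12):
    """Fit independence to the non-structural cells by iterative
    proportional fitting (hypothetical helper, illustration only)."""
    n_rows, n_cols = len(observed), len(observed[0])
    # Structural zeros start at 0 and stay there; free cells start at 1.
    fitted = [[0.0 if structural[i][j] else 1.0 for j in range(n_cols)]
              for i in range(n_rows)]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_rows))
                  for j in range(n_cols)]
    for _ in range(max_iter):
        previous = [row[:] for row in fitted]
        # Rescale each row so that it matches the observed row total.
        for i in range(n_rows):
            current = sum(fitted[i])
            if current > 0:
                fitted[i] = [m * row_totals[i] / current for m in fitted[i]]
        # Rescale each column so that it matches the observed column total.
        for j in range(n_cols):
            current = sum(fitted[i][j] for i in range(n_rows))
            if current > 0:
                for i in range(n_rows):
                    fitted[i][j] *= col_totals[j] / current
        change = max(abs(fitted[i][j] - previous[i][j])
                     for i in range(n_rows) for j in range(n_cols))
        if change < tol:
            break
    return fitted


def residual_df(structural):
    """Independence df minus one for each cell weighted out."""
    n_rows, n_cols = len(structural), len(structural[0])
    n_structural = sum(cell for row in structural for cell in row)
    return (n_rows - 1) * (n_cols - 1) - n_structural


# Invented 3x3 table: job grade at start (columns) by grade five years
# later (rows); the upper triangle is impossible because no one is demoted.
observed = [[20, 0, 0],
            [10, 30, 0],
            [8, 12, 25]]
structural = [[False, True, True],
              [False, False, True],
              [False, False, False]]

fitted = fit_quasi_independence(observed, structural)
df = residual_df(structural)
```

Here three cells are weighted out, so the four degrees of freedom of
the ordinary independence model drop to one.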
Sampling zeros are a different matter: they are an
extreme case of sparseness. Depending on how many there are and
where they fall in the table, they may not cause any trouble.
But if a model contains a term relating to a margin with
a zero total (e.g., if there is an entire row of sampling
zeros), the maximum-likelihood estimates of some of the
corresponding parameters will be infinite: the computer will
probably report a large parameter with a ridiculously large
standard error, and the estimate will be very unstable.
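To see why a zero margin is troublesome, take the simplest case: the
independence model with the first row as the reference category. The
estimated effect of row i is then log(r_i / r_1), where r_i is the
row total, and the usual large-sample standard error is
sqrt(1/r_i + 1/r_1). The sketch below (the function names and data are
made up for illustration) lets an empty row shrink towards zero and
tracks both quantities.

```python
import math


def row_effect(table, row, baseline=0):
    """Estimated row effect and approximate standard error under
    independence, with the baseline row's effect fixed at zero."""
    total = sum(table[row])
    base = sum(table[baseline])
    estimate = math.log(total / base)          # fails when total is 0
    std_error = math.sqrt(1.0 / total + 1.0 / base)
    return estimate, std_error


def with_nearly_empty_last_row(eps):
    """Invented 3x3 table whose last row holds eps in every cell."""
    return [[12, 8, 10],
            [9, 11, 10],
            [eps, eps, eps]]


# Shrink the last row towards a row of true zeros.
results = {eps: row_effect(with_nearly_empty_last_row(eps), row=2)
           for eps in (1e-2, 1e-4, 1e-8)}

for eps, (estimate, std_error) in results.items():
    print(f"eps={eps:g}  estimate={estimate:.2f}  se={std_error:.1f}")
```

As the row empties, the estimate heads towards minus infinity and the
standard error grows without limit; with a row of exact zeros the
logarithm is undefined, which is what the fitting program runs into.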
Sampling zeros can also cause practical problems for the
program, causing it to have difficulty in converging (this is
closely related to parameters being infinite).
An easy solution to this problem is to add a small constant
to the zero (or all) cells: Agresti suggests ; the
algorithm has no difficulty with very small numbers but true
zeros confuse it.
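The adjustment itself is a one-liner. The sketch below is only an
illustration: the helper name is invented, and the constant is an
arbitrary placeholder, not a recommended value.

```python
def add_constant(table, constant, zeros_only=True):
    """Return a copy of the table with a small constant added to the
    zero cells, or to every cell when zeros_only is False."""
    return [[cell + constant if (cell == 0 or not zeros_only) else cell
             for cell in row]
            for row in table]


# Hypothetical value, chosen purely for illustration.
PLACEHOLDER_CONSTANT = 1e-6

table = [[12, 0, 10],
         [9, 11, 0]]

patched_zeros = add_constant(table, PLACEHOLDER_CONSTANT)
patched_all = add_constant(table, PLACEHOLDER_CONSTANT, zeros_only=False)
```

The original table is left untouched, so the model can be refitted
with several constants to check that the conclusions do not depend on
the particular value used.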
Agresti (section 7.4) and Lindsey (Ch. 5) are good on zero
cells. Lindsey (in an earlier book) provides a macro which
detects which zero cells are causing problems in a model, and
refits the model excluding these cells. This can be regarded as a
conservative procedure: excluding cells reduces the degrees of
freedom, and because complex models are the most likely to be
affected, it makes it more likely that they will be rejected.
This automates a proposal Agresti makes, of excluding problem
cells and refitting. Lindsey's macro is written in GLIM, but I
have a Stata version which I can make available to anyone interested.