Collapsing categories

Where we can't exclude an entire variable, it may still be possible to address sparseness and improve parsimony by simplifying variables, that is, by reducing the number of categories. Consider this model:
```
genlog freq size mode
  /print=est/plot=none
  /design = freq size mode mode*size 
            freq*mode freq*size.
```
The two estimated parameters for the freq*size interaction are 0.5828 for large and 0.3895 for medium, each relative to small, the reference category. The difference between large and medium (0.1933) is thus approximately half the difference between medium and small (0.3895). What happens if we redefine size as large/medium versus small?
```
recode size (1,2=1) (3=2) into size2.
genlog freq size2 mode
  /print=est/plot=none
  /design = freq size2 mode mode*size2
            freq*mode freq*size2.
```
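The case for merging large and medium rests on how close their two estimates are. As a quick check of that arithmetic, here is a minimal sketch in Python rather than SPSS. The variable names are illustrative only, and the two values are the freq*size estimates quoted above.

```python
# freq*size estimates quoted in the text, each relative to small,
# the reference category.
large_vs_small = 0.5828
medium_vs_small = 0.3895

# Large and medium share the same reference, so their gap is the difference.
large_vs_medium = large_vs_small - medium_vs_small

# Compare the large/medium gap with the medium/small gap.
ratio = large_vs_medium / medium_vs_small

print(f"large - medium          = {large_vs_medium:.4f}")
print(f"ratio to medium - small = {ratio:.2f}")
```

The large/medium gap is about half the medium/small gap, which is why these two categories are the natural pair to merge.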
Deviance falls from 4.1415 for 4 df to 0.2633 for 2 df. Both models fit very well, and the difference between them is slight, but the collapsed model has the virtue of being simpler to interpret.
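To put rough numbers on "very well fitting", each deviance can be referred to a chi-square distribution with the model's residual df. The sketch below is in Python rather than SPSS. It assumes the usual large-sample chi-square approximation for the deviance, which may be less reliable in sparse tables. The helper `chi2_sf_even_df` is a hypothetical name, and it uses a closed form that holds only for even df.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate, even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    # For even df: P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)

# Deviance and residual df quoted in the text for the two models.
p_original = chi2_sf_even_df(4.1415, 4)   # three-category size
p_collapsed = chi2_sf_even_df(0.2633, 2)  # size2: large/medium versus small

print(f"original model:  p = {p_original:.3f}")
print(f"collapsed model: p = {p_collapsed:.3f}")
```

Both p-values are well above conventional thresholds (roughly 0.39 and 0.88), which is consistent with describing both models as fitting well.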
This gives us two methods for simplifying complex tables: excluding a variable from the table (collapsing the table), or collapsing the categories of a variable. Both strategies can be very useful in larger tables where sparseness or lack of parsimony may be a problem.
Agresti has an interesting discussion of collapsibility in terms of graph theory (section 7.1).

© Brendan Halpin, 23-Apr-2012
Department of Sociology, University of Limerick
Taught programme: MA in Sociology (Applied Social Research),
Short course, May 14/15 2012: Categorical Data Analysis for Social Scientists