There are many summary measures of association in contingency tables.
In the simplest case of a 2×2 table, a popular measure is φ (phi).
Given this table:
          1        2        Total
  1       a        b        a+b
  2       c        d        c+d
  Total   a+c      b+d      a+b+c+d
φ is defined as:

    \phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}

Where there are higher than expected numbers in cells a and/or d, φ will tend
to +1, and where the opposite is true (high counts in cells b and/or c), to −1.
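As a concrete check of the definition, here is a minimal Python sketch; the function name and the cell counts are invented for illustration:

```python
import math

def phi(a, b, c, d):
    # phi = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d))
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Counts concentrated in cells a and d pull phi towards +1:
print(phi(40, 10, 10, 40))   # 0.6
# Counts concentrated in cells b and c pull it towards -1:
print(phi(10, 40, 40, 10))   # -0.6
```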
For tables of variables with more than two categories an
analogous measure is Cramér's V.
For 2×2 tables it is equivalent (V = |φ|).
Same range, 0 to 1.
Based on Pearson's χ², a more general measure (Cramér's V
is a version of χ² scaled to be independent of sample and
table size):

    V = \sqrt{\frac{\chi^2}{n\,(\min(r, c) - 1)}}

where n is the total count and r and c are the numbers of rows and columns.
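A sketch of that scaling in Python, assuming scipy is available and using a made-up table; for a 2×2 table the result equals |φ| from the example above:

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    table = np.asarray(table)
    # Pearson chi-squared without Yates' continuity correction
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.sum()
    return float(np.sqrt(chi2 / (n * (min(table.shape) - 1))))

print(cramers_v([[40, 10], [10, 40]]))   # 0.6, i.e. |phi| for this 2x2 table
```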
Pearson's χ² is based on the deviation between the observed
values (O_{ij}) and the expected values (E_{ij}) under the assumption of
independence:

    \chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
That is, for each cell it calculates a measure of the
observed-expected difference, and adds them up.
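The cell-by-cell sum is easy to do by hand; a short sketch with an invented table, where the expected counts under independence are (row total × column total) / n:

```python
import numpy as np

observed = np.array([[40, 10],
                     [10, 40]])

n = observed.sum()
# Expected counts under independence: row total * column total / n
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
chi2 = ((observed - expected) ** 2 / expected).sum()
print(chi2)   # 36.0 for this table
```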
Pearson's χ² has a χ² (chi-squared) distribution,
which allows us to make inferences about association:
Sampling from a population where two variables are truly
independent will result in tables which do not exactly
match the table expected under independence.
The probability distribution of the calculated χ² value
under these circumstances follows a χ² distribution, with
degrees of freedom equal to (r − 1)(c − 1).
By comparing the χ² for a real table with the
cumulative χ² distribution we can test the null
hypothesis that there is no association.
e.g., if the χ² is at least as big as a value you could
theoretically get from truly independent variables no more
than, say, 1% of the time, then you can be 99% confident that
there really is an association.
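Putting the pieces together, a sketch of the full test, assuming scipy is available; the table is the same invented one as above:

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([[40, 10],
                     [10, 40]])

n = observed.sum()
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
stat = ((observed - expected) ** 2 / expected).sum()

df = (observed.shape[0] - 1) * (observed.shape[1] - 1)   # (r-1)(c-1)
p = chi2.sf(stat, df)   # chance of a value at least this large under independence
print(stat, df, p)      # p << 0.01 here, so reject "no association" at the 1% level
```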
Many measures relevant to loglinear models approximately follow a
χ² distribution.