Next:
Poisson Regression
Up:
Introduction: the analysis of
Previous:
Introduction: the analysis of
What's Categorical Data?
Categorical data consist of variables with a finite number of values (really, a small number of discrete values).
It can be nominal, ordinal, interval or ratio data, but it can't be continuous.
Categorical data arise in a number of ways:
Simple counts
Binary variables - yes/no, pass/fail, live/die
Unordered multinomial: Christian, Jew, Muslim, atheist
Ordinal:
Pure: degree, complete second level, incomplete second level, primary only, no education.
Imperfect scale measurement: e.g., Likert's 5-point scale
Grouped variables: e.g., income in bands
Some of these are interval-ratio variables:
Count of number of children
Income in bands (imprecise ratio measurement)
and we can use appropriate summaries, such as means and correlations.
However, the really powerful technique of Ordinary Least Squares (OLS) Regression won't do:
OLS requires a dependent variable that is conditionally normally distributed
OLS may well predict impossible values - negative counts, probabilities outside the 0-1 range
So something else is required
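The point about impossible predictions can be illustrated with a minimal sketch. The data below are made up for illustration (they are not from the course), and the helper `ols_fit` is a hypothetical name for a hand-rolled one-predictor least-squares fit, used here only to avoid depending on any particular statistics library.

```python
# Made-up illustrative data: a count outcome that falls towards zero as x rises.
x = list(range(10))
y = [9, 6, 4, 3, 2, 1, 1, 0, 0, 0]


def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = ols_fit(x, y)

# Fitted values from the straight line, one per observed x.
fitted = [intercept + slope * xi for xi in x]

# Collect the fitted values that are impossible for a count (below zero).
negative_fits = [(xi, round(f, 2)) for xi, f in zip(x, fitted) if f < 0]
print(negative_fits)
```

Because the straight line keeps falling at the same rate while the observed counts level off at zero, the fitted values at the largest values of x drop below zero, which no count can do.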
Consider first count data: non-negative integers
Where the mean of a count variable is sufficiently large, its distribution approximates the normal and OLS will do okay
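Counts are distributed as Poisson: $P(X = k) = \frac{e^{-\mu} \mu^{k}}{k!}$, for $k = 0, 1, 2, \ldots$, where $\mu$ is the mean count
This is a discrete function
Asymmetric but approximated by the normal distribution for large values of $\mu$
Standard deviation is $\sqrt{\mu}$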
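These properties of the Poisson distribution can be checked numerically. The following is a sketch only: it writes the mean as `mu` (a symbol chosen here for illustration), and the function names `poisson_pmf`, `poisson_sd`, `normal_pdf` and `max_gap` are hypothetical helpers, not part of any library. The probabilities are computed on the log scale so that large counts do not overflow.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of observing count k when the mean is mu."""
    # Log scale: log P = -mu + k*log(mu) - log(k!), with log(k!) = lgamma(k + 1).
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def poisson_sd(mu, k_max=200):
    """Standard deviation computed directly from the probabilities (truncated at k_max)."""
    probs = [poisson_pmf(k, mu) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return math.sqrt(var)


def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def max_gap(mu, k_max=200):
    """Largest absolute gap between the Poisson probabilities and a normal
    density with the same mean and standard deviation sqrt(mu)."""
    sd = math.sqrt(mu)
    return max(
        abs(poisson_pmf(k, mu) - normal_pdf(k, mu, sd)) for k in range(k_max + 1)
    )


# The standard deviation should match the square root of the mean.
print(poisson_sd(4.0), math.sqrt(4.0))

# The normal approximation should be poorer for a small mean than a large one.
print(max_gap(2.0), max_gap(50.0))
```

The truncation point `k_max=200` is an assumption that is adequate for the means used here; a much larger mean would need a larger cut-off.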
© Brendan Halpin, 23-Apr-2012
Department of Sociology, University of Limerick
Taught programme: MA in Sociology (Applied Social Research)
Short course, May 14/15 2012: Categorical Data Analysis for Social Scientists