SO5032: Lab Materials
Table of Contents
1. Week 10 Lab
1.1. Logistic regression and odds ratios
Load this death penalty data from Agresti:
source("https://teaching.sociology.ul.ie/so5032/tabxw.R")
dpdf <- data.frame(
count = c( 53, 414, 0, 16, 11, 37, 4, 139),
defendant = factor(c( 1, 1, 1, 1, 2, 2, 2, 2), labels=c("White", "Black")),
victim = factor(c( 1, 1, 2, 2, 1, 1, 2, 2), labels=c("White", "Black")),
penalty = factor(c( 1, 0, 1, 0, 1, 0, 1, 0), labels=c("No", "Yes")))
It summarises murder convictions by defendant's and victim's race, and whether the death penalty was handed down. We've seen it briefly before.
First tabulate by defendant's race and penalty, and calculate the odds ratio by hand (black yes over black no, all over white yes over white no):
tabxw(dpdf, defendant, penalty, wt=count)
tabxw(dpdf, defendant, penalty, wt=count, "row")
Interpret the odds ratio: what does it say about the relationship between defendant's race and penalty?
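As a check on the hand calculation, here is a minimal sketch in base R. It assumes only the data frame defined above (re-entered so the block runs on its own) and uses xtabs() rather than the course's tabxw() helper; the names tab, odds and or_hand are illustrative.

```r
# Death penalty counts from the lab (Agresti), re-entered so this block is self-contained
dpdf <- data.frame(
  count     = c(53, 414, 0, 16, 11, 37, 4, 139),
  defendant = factor(c(1, 1, 1, 1, 2, 2, 2, 2), labels = c("White", "Black")),
  victim    = factor(c(1, 1, 2, 2, 1, 1, 2, 2), labels = c("White", "Black")),
  penalty   = factor(c(1, 0, 1, 0, 1, 0, 1, 0), labels = c("No", "Yes")))

# Weighted two-way table: rows are defendant's race, columns are penalty
tab <- xtabs(count ~ defendant + penalty, data = dpdf)

# Odds of the death penalty within each row (yes over no)
odds <- tab[, "Yes"] / tab[, "No"]

# Odds ratio: black defendants' odds over white defendants' odds
or_hand <- unname(odds["Black"] / odds["White"])
or_hand  # (15/176) / (53/430), roughly 0.69
```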
Then fit the logistic regression with defendant's race explaining the penalty. Exponentiate the slope coefficient, and satisfy yourself that it matches the OR you calculated by hand:
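## Expand the table data frame into one row per case
library(dplyr)
dpdfexp <- slice(dpdf, rep(1:n(), times=dpdf$count))
## Store the fitted model so its coefficients can be reused
mod1 <- glm(formula = penalty ~ defendant, family = "binomial", data = dpdfexp)
summary(mod1)
## Exponentiated slope: the odds ratio for black versus white defendants
exp(mod1$coefficients[2])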
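Why the exponentiated slope should equal the odds ratio can be sketched as follows. The notation is an assumption for illustration, not taken from the lab: p is the probability of the death penalty, and x is 1 for a black defendant and 0 for a white defendant.

```latex
\begin{align*}
\log\frac{p}{1-p} &= \beta_0 + \beta_1 x \\
\text{odds}(x = 0) &= e^{\beta_0}, \qquad
\text{odds}(x = 1) = e^{\beta_0 + \beta_1} \\
\text{OR} &= \frac{\text{odds}(x = 1)}{\text{odds}(x = 0)}
           = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}
\end{align*}
```

With a single binary predictor the model has as many parameters as there are groups, so the fitted odds reproduce the observed odds in the table exactly.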
We can take account of victim's race as well. Since this is correlated with both defendant's race and penalty, it could change the results:
mod2 <- glm(data=dpdfexp, penalty ~ defendant + victim, family="binomial")
What happens to the effect of defendant's race when victim's race is included?
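One way to put the two estimates side by side is sketched below. It is a self-contained illustration: the data are re-entered, the expansion uses base R indexing instead of dplyr, and or_compare is a hypothetical name introduced here.

```r
# Death penalty counts from the lab (Agresti), re-entered so this block is self-contained
dpdf <- data.frame(
  count     = c(53, 414, 0, 16, 11, 37, 4, 139),
  defendant = factor(c(1, 1, 1, 1, 2, 2, 2, 2), labels = c("White", "Black")),
  victim    = factor(c(1, 1, 2, 2, 1, 1, 2, 2), labels = c("White", "Black")),
  penalty   = factor(c(1, 0, 1, 0, 1, 0, 1, 0), labels = c("No", "Yes")))

# Base R equivalent of the dplyr expansion: repeat each row `count` times
dpdfexp <- dpdf[rep(seq_len(nrow(dpdf)), times = dpdf$count), ]

# Model without and with victim's race
mod1 <- glm(penalty ~ defendant, family = "binomial", data = dpdfexp)
mod2 <- glm(penalty ~ defendant + victim, family = "binomial", data = dpdfexp)

# Odds ratio for defendant's race, before and after adjusting for victim's race
or_compare <- c(
  unadjusted = unname(exp(coef(mod1)["defendantBlack"])),
  adjusted   = unname(exp(coef(mod2)["defendantBlack"])))
or_compare
```

Compare the two values with each other and with 1 (no association) when answering the question above.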
1.2. Model search
Load the following extract from the European Social Survey (Ireland, wave 9):
library(foreign)
ess <- read.dta("http://teaching.sociology.ul.ie/so5032/logitessie.dta")
Examine the variables, and use logistic regression to model what affects the odds of being married. Use z-tests and likelihood ratio tests to build a good model.
For example, a likelihood ratio test for adding var4 and var5 (mod2) to a model that already contains var1, var2 and var3 (mod1):
library(lmtest)
lrtest(mod2, mod1)
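To see what the likelihood ratio test computes, here is a minimal sketch done by hand in base R. Because the variable names in the ESS extract are not listed here, it reuses the death penalty data from section 1.1 as a stand-in; lr_stat, lr_df and p_value are illustrative names.

```r
# Death penalty counts from section 1.1, re-entered so this block is self-contained
dpdf <- data.frame(
  count     = c(53, 414, 0, 16, 11, 37, 4, 139),
  defendant = factor(c(1, 1, 1, 1, 2, 2, 2, 2), labels = c("White", "Black")),
  victim    = factor(c(1, 1, 2, 2, 1, 1, 2, 2), labels = c("White", "Black")),
  penalty   = factor(c(1, 0, 1, 0, 1, 0, 1, 0), labels = c("No", "Yes")))
dpdfexp <- dpdf[rep(seq_len(nrow(dpdf)), times = dpdf$count), ]

# Nested models: mod2 adds victim's race to mod1
mod1 <- glm(penalty ~ defendant, family = "binomial", data = dpdfexp)
mod2 <- glm(penalty ~ defendant + victim, family = "binomial", data = dpdfexp)

# LR statistic: twice the gain in log-likelihood from the extra term
lr_stat <- 2 * (as.numeric(logLik(mod2)) - as.numeric(logLik(mod1)))

# Degrees of freedom: number of extra parameters in the larger model
lr_df <- attr(logLik(mod2), "df") - attr(logLik(mod1), "df")

# p-value from the chi-squared reference distribution
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(statistic = lr_stat, df = lr_df, p = p_value)

# Base R runs the same test through the change in deviance
anova(mod1, mod2, test = "Chisq")
```

A small p-value suggests that the added variables improve the fit by more than chance alone would explain, which is the criterion to apply when deciding whether to keep them.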
Think about the model that you end up with. What matters and what doesn't? Think about the order of causality: why should age have an effect? Is your health now a legitimate predictor of whether you got married in the past (and stayed married)?