SO5032: Lab Materials
1. Week 9 Lab
1.1. Logistic Regression
This file contains R code to run the credit card example in A&F:
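library(foreign)  # provides read.dta(), needed later for the Stata-format file
# Read the two-column text file and name the columns
cc <- read.table("https://teaching.sociology.ul.ie/so5032/creditcard.dat", col.names=c("income", "card"))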
Execute the code to create the working data file, and examine the
relationship between having a credit card and income (compare means,
group income and cross tabulate, etc.). The variable card records
whether a credit card is held, and income is the explanatory variable.
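The exploratory steps mentioned above could be sketched roughly as follows. To keep the example runnable on its own it uses a small simulated data frame called toy (a hypothetical stand-in, not the A&F data), and the income bands are arbitrary choices; with the real data, substitute cc for toy.

```r
# Simulated stand-in for the lab data (hypothetical values, NOT the A&F file)
set.seed(1)
toy <- data.frame(income = round(runif(100, 10, 70)))
toy$card <- rbinom(100, 1, plogis(-3.5 + 0.1 * toy$income))

# Compare means: average income for non-holders (0) and holders (1)
mean_by_card <- tapply(toy$income, toy$card, mean)
print(mean_by_card)

# Group income into bands (arbitrary cut points), then cross-tabulate
toy$incgrp <- cut(toy$income, breaks = c(0, 20, 30, 40, 100))
tab <- table(toy$incgrp, toy$card)
print(tab)

# Row proportions: share of each income band that holds a card
print(prop.table(tab, margin = 1))
```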
Now fit the linear probability model and interpret it:
modlpm <- lm(data=cc, card ~ income)
Save the predicted values:
cc$plin <- predict(modlpm)
Examine them, e.g., with scatterplots, histograms.
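One thing worth looking for when examining linear predictions is whether any fall below 0 or above 1, which a probability cannot do. The following is a minimal sketch on simulated data (the toy data frame and its generating coefficients are assumptions made for illustration); with the lab data, apply the same checks to cc$plin.

```r
# Simulated stand-in for the lab data (hypothetical values, NOT the A&F file)
set.seed(1)
toy <- data.frame(income = round(runif(100, 10, 70)))
toy$card <- rbinom(100, 1, plogis(-3.5 + 0.1 * toy$income))

# Linear probability model: ordinary least squares on the 0/1 outcome
lpm <- lm(card ~ income, data = toy)
print(coef(lpm))  # slope = change in predicted probability per unit of income

# Save and summarise the predicted values
toy$plin <- predict(lpm)
print(range(toy$plin))

# Count predictions that are not valid probabilities
n_outside <- sum(toy$plin < 0 | toy$plin > 1)
print(n_outside)
```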
Now fit the logistic regression, and interpret it:
modlgt <- glm(data=cc, card ~ income, family="binomial")
Note that the glm() function fits many different kinds of generalised linear model. With family="binomial" it uses the logit link by default, which makes the model a logistic regression.
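As a rough illustration of the family argument and of reading the coefficients, the sketch below fits the same model two ways on simulated data (toy is a hypothetical stand-in, not the lab file) and converts the income coefficient to an odds ratio.

```r
# Simulated stand-in for the lab data (hypothetical values, NOT the A&F file)
set.seed(1)
toy <- data.frame(income = round(runif(100, 10, 70)))
toy$card <- rbinom(100, 1, plogis(-3.5 + 0.1 * toy$income))

# family = "binomial" uses the logit link unless told otherwise,
# so these two calls fit the same logistic regression
m1 <- glm(card ~ income, data = toy, family = "binomial")
m2 <- glm(card ~ income, data = toy, family = binomial(link = "logit"))
print(all.equal(coef(m1), coef(m2)))

# Coefficients are on the log-odds scale; exponentiate for an odds ratio
or_income <- exp(coef(m1)["income"])
print(or_income)  # multiplicative change in the odds per unit of income
```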
Save the predicted values and graph them against income.
library(ggplot2)
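# type="response" returns predicted probabilities rather than log-odds
cc$plog <- predict(modlgt, type="response")
# Observed 0/1 outcomes in red, logistic predictions in blue
ggplot(data=cc, aes(x=income, y=card)) +
  geom_point(color="red", alpha=0.2) +
  geom_point(color="blue", aes(x=income,y=plog))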
Compare these predictions with those of the first (linear probability) model by plotting both sets against income on the same graph:
ggplot(data=cc, aes(x=income, y=card)) +
  geom_point(color="red", alpha=0.2) +
  geom_point(color="darkorange", aes(x=income,y=plin)) +
  geom_point(color="blue", aes(x=income,y=plog))
1.1.1. Predicted values by hand
Calculate predicted values by hand (log-odds, odds and probabilities) for income = 10, 11, 60, and 61.
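The arithmetic involved can be sketched as below. The intercept a and slope b are made-up placeholders, not estimates from the credit card data; replace them with your own values from coef(modlgt).

```r
# HYPOTHETICAL coefficients for illustration only; use coef(modlgt) instead
a <- -3.5  # intercept (log-odds when income = 0)
b <- 0.1   # change in log-odds per unit of income

income <- c(10, 11, 60, 61)

logodds <- a + b * income       # linear predictor
odds    <- exp(logodds)         # odds = exp(log-odds)
prob    <- odds / (1 + odds)    # probability = odds / (1 + odds)

by_hand <- data.frame(income, logodds, odds, prob)
print(by_hand)

# Ratio of odds for a one-unit step, at low and at high income
print(odds[2] / odds[1])
print(odds[4] / odds[3])

# Difference in probability for the same one-unit steps
print(prob[2] - prob[1])
print(prob[4] - prob[3])
```

Both odds ratios equal exp(b), whatever the starting income, whereas the two probability differences are not the same. Checking the hand calculations against predict(modlgt, newdata=data.frame(income=c(10, 11, 60, 61)), type="response") is a useful way to confirm them.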
1.1.2. Age and health
Repeat the exercise with this alternative data file:
limit <- read.dta("https://teaching.sociology.ul.ie/so5032/limit.dta")
which has data on age and whether the respondent finds their health limits their daily activities.