SO5032: Lab Materials

1. Week 9 Lab

1.1. Logistic Regression

This file contains R code to run the credit card example in A&F:

library(foreign)
cc <- read.table("https://teaching.sociology.ul.ie/so5032/creditcard.dat", col.names=c("income", "card"))

Execute the code to create the working data frame cc, then examine the relationship between holding a credit card and income (compare means, group income and cross-tabulate, etc.). The variable card records whether a credit card is held, and income is the explanatory variable.
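One way to make these first checks is sketched below. It is only an illustration: cc_demo is a simulated stand-in (the coefficients -3.5 and 0.1 used to generate it are arbitrary, not estimates from the real data), card is assumed to be coded 0/1, and the income bands are hypothetical cut points. With the real data, replace cc_demo with cc.

```r
# Simulated stand-in for the lab data (NOT the real creditcard.dat values)
set.seed(1)
cc_demo <- data.frame(income = round(runif(100, 10, 70)))
cc_demo$card <- rbinom(100, 1, plogis(-3.5 + 0.1 * cc_demo$income))

# Compare means: average income of non-holders (0) and holders (1)
mean_income <- tapply(cc_demo$income, cc_demo$card, mean)
print(mean_income)

# Group income into bands (hypothetical cut points) and cross-tabulate
cc_demo$incgrp <- cut(cc_demo$income, breaks = c(0, 20, 30, 40, 50, Inf))
tab <- table(cc_demo$incgrp, cc_demo$card)
print(tab)

# Row proportions: share holding a card within each income band
print(round(prop.table(tab, margin = 1), 2))
```

If card holding rises with income, the row proportions in the last table should increase from the lowest band to the highest.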

Now fit the linear probability model and interpret it:

modlpm <- lm(data=cc, card ~ income)

Save the predicted values:

cc$plin <- predict(modlpm)

Examine them, e.g., with scatterplots, histograms.
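A minimal sketch of what examining the linear predictions might involve, again on simulated stand-in data (cc_demo and its generating coefficients are assumptions, not the lab data). The point to look for is whether any fitted value falls below 0 or above 1, since such values cannot be probabilities.

```r
# Simulated stand-in for the lab data (NOT the real creditcard.dat values)
set.seed(1)
cc_demo <- data.frame(income = round(runif(100, 10, 70)))
cc_demo$card <- rbinom(100, 1, plogis(-3.5 + 0.1 * cc_demo$income))

# Linear probability model: OLS with a 0/1 outcome
lpm_demo <- lm(card ~ income, data = cc_demo)
cc_demo$plin <- predict(lpm_demo)

# The income slope is the change in predicted probability per unit of income
print(coef(lpm_demo))

# Range of fitted values, and how many lie outside the [0, 1] interval
print(range(cc_demo$plin))
n_outside <- sum(cc_demo$plin < 0 | cc_demo$plin > 1)
print(n_outside)

# Histogram and scatterplot of the fitted values
hist(cc_demo$plin, main = "LPM predicted values", xlab = "Predicted P(card)")
plot(cc_demo$income, cc_demo$plin, xlab = "income", ylab = "Predicted P(card)")
```

Because the model is linear, the scatterplot of fitted values against income is a straight line, which is why it can leave the 0 to 1 range at the extremes.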

Now fit the logistic regression, and interpret it:

modlgt <- glm(data=cc, card ~ income, family="binomial")

Note that the glm() function fits many different models. With family="binomial" it assumes a binomial error distribution and, by default, uses the logit link, so it fits a logistic regression.
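The role of the family argument can be checked directly. The sketch below uses simulated stand-in data (cc_demo is an assumption, not the lab data) to show that family="binomial" and an explicit logit link give the same fit, while a different link such as probit gives a different model.

```r
# Simulated stand-in for the lab data (NOT the real creditcard.dat values)
set.seed(1)
cc_demo <- data.frame(income = round(runif(100, 10, 70)))
cc_demo$card <- rbinom(100, 1, plogis(-3.5 + 0.1 * cc_demo$income))

# Binomial family with the default link
m_default <- glm(card ~ income, data = cc_demo, family = "binomial")

# Same family with the logit link written out explicitly
m_logit <- glm(card ~ income, data = cc_demo, family = binomial(link = "logit"))

# Same family with a probit link: a different model for the same data
m_probit <- glm(card ~ income, data = cc_demo, family = binomial(link = "probit"))

print(family(m_default)$link)                     # link used when none is given
print(all.equal(coef(m_default), coef(m_logit)))  # identical coefficients
print(rbind(logit = coef(m_logit), probit = coef(m_probit)))
```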

Save the predicted probabilities (type="response" returns probabilities rather than log-odds) and graph them against income:

library(ggplot2)
cc$plog <- predict(modlgt, type="response")
ggplot(data=cc, aes(x=income, y=card)) +
    geom_point(color="red", alpha=0.2) +
    geom_point(color="blue", aes(x=income,y=plog))

Compare these with the linear probability model's predicted values by adding them to the same graph:

ggplot(data=cc, aes(x=income, y=card)) +
    geom_point(color="red", alpha=0.2) +
    geom_point(color="darkorange", aes(x=income,y=plin)) +
    geom_point(color="blue", aes(x=income,y=plog))
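The difference between the two sets of predictions is easiest to see beyond the observed income range. The following is a sketch under stated assumptions: cc_demo is simulated stand-in data, the grid from 0 to 100 is an arbitrary choice, and base graphics are used so the block runs without ggplot2.

```r
# Simulated stand-in for the lab data (NOT the real creditcard.dat values)
set.seed(1)
cc_demo <- data.frame(income = round(runif(100, 10, 70)))
cc_demo$card <- rbinom(100, 1, plogis(-3.5 + 0.1 * cc_demo$income))

lpm_demo <- lm(card ~ income, data = cc_demo)
lgt_demo <- glm(card ~ income, data = cc_demo, family = "binomial")

# Predict over a grid that is wider than the observed incomes
grid <- data.frame(income = seq(0, 100, by = 10))
grid$plin <- predict(lpm_demo, newdata = grid)
grid$plog <- predict(lgt_demo, newdata = grid, type = "response")
print(round(grid, 3))

# Overlay: straight line (linear) against S-shaped curve (logistic)
plot(grid$income, grid$plin, type = "l", col = "darkorange",
     xlab = "income", ylab = "Predicted P(card)")
lines(grid$income, grid$plog, col = "blue")
abline(h = c(0, 1), lty = 2)  # bounds of a valid probability
```

The logistic predictions stay strictly between 0 and 1 by construction, whereas nothing constrains the linear predictions to that interval.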

1.1.1. Predicted values by hand

Calculate predicted values by hand (log-odds, odds and probabilities) for income = 10, 11, 60, and 61.
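The hand calculation follows three steps: log-odds = a + b × income, odds = exp(log-odds), probability = odds / (1 + odds). A minimal sketch is given below, using coefficients from a model fitted to simulated stand-in data (cc_demo is an assumption, so the numbers will differ from those for the real data). With the lab data, take the coefficients from coef(modlgt) instead.

```r
# Simulated stand-in for the lab data (NOT the real creditcard.dat values)
set.seed(1)
cc_demo <- data.frame(income = round(runif(100, 10, 70)))
cc_demo$card <- rbinom(100, 1, plogis(-3.5 + 0.1 * cc_demo$income))
lgt_demo <- glm(card ~ income, data = cc_demo, family = "binomial")

# Extract intercept (b0) and income slope (b1)
b <- unname(coef(lgt_demo))
b0 <- b[1]
b1 <- b[2]

income <- c(10, 11, 60, 61)
logodds <- b0 + b1 * income     # linear predictor
odds <- exp(logodds)            # odds of holding a card
prob <- odds / (1 + odds)       # probability of holding a card
print(data.frame(income, logodds, odds, prob))

# Check the hand calculation against predict()
check <- predict(lgt_demo, newdata = data.frame(income = income),
                 type = "response")
print(all.equal(prob, unname(check)))

# A one-unit rise in income multiplies the odds by exp(b1) at any income level
print(c(odds[2] / odds[1], odds[4] / odds[3], exp(b1)))

# The change in probability for the same one-unit rise is not constant
print(c(prob[2] - prob[1], prob[4] - prob[3]))
```

Comparing 10 with 11 and 60 with 61 shows that the odds ratio is the same at both income levels while the difference in probability is not.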

1.1.2. Age and health

Repeat the exercise with this alternative data file:

limit <- read.dta("https://teaching.sociology.ul.ie/so5032/limit.dta")

which has data on age and whether the respondent finds their health limits their daily activities.

Author: Brendan Halpin

Created: 2026-03-24 Tue 11:46