SO5032: Lab Materials

1. Week 12 Lab: Multinomial and Ordinal Regression

1.1. Multinomial logistic regression

Load this BHPS excerpt (same as last week):

library(foreign)
bhpsq <- read.dta("https://teaching.sociology.ul.ie/so5032/bhpsqual.dta")
bhpsq <- subset(bhpsq, sex != "inapplicable")
bhpsq$sex <- droplevels(bhpsq$sex)

vote has four categories. Examine bivariate relationships between vote and some of the other variables, either by cross-tabulating vote with categorical variables or by looking at how the mean of quantitative variables differs by vote.
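As a rough sketch of both kinds of summary, the following uses a small simulated data frame rather than the BHPS excerpt. The name toy and the level names other than "Lab" are made up for illustration. Swap in bhpsq and its real variables for the exercise.

```r
# Simulated stand-in for bhpsq (assumption: not the real BHPS values)
set.seed(1)
toy <- data.frame(
  vote = factor(sample(c("Con", "Lab", "LD", "Other"), 200, replace = TRUE)),
  sex  = factor(sample(c("male", "female"), 200, replace = TRUE)),
  age  = round(runif(200, 18, 80))
)

# Categorical by categorical: counts, then row percentages
tab <- table(toy$sex, toy$vote)
tab
rowprop <- prop.table(tab, margin = 1)
round(100 * rowprop, 1)

# Quantitative by categorical: mean age within each vote category
agemeans <- tapply(toy$age, toy$vote, mean)
agemeans
```

Row percentages are usually easier to read than raw counts when the groups differ in size.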

Then search for a multinomial logistic regression that makes sense:

library(nnet)
summary(modm1 <- multinom(vote ~ age + hten, data = bhpsq))

It is better to have a base category that is easily interpretable, but by default multinom() uses the first level of the factor as the base. To change this, use relevel(), which makes the category you name in ref the first level, and hence the base:

bhpsq$v2 <- relevel(bhpsq$vote, ref = "Lab")

Refit the model with v2 as the dependent variable and see what happens to the parameter estimates when you change the base.
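A minimal sketch of the effect of changing the base, again on simulated data (toy, and all level names except "Lab", are assumptions). The coefficients change because each row is now a contrast with a different category, but the fitted model is the same, so the deviance should agree up to numerical error.

```r
library(nnet)

# Simulated stand-in for bhpsq (assumption: not the real data)
set.seed(2)
n <- 600
toy <- data.frame(age = round(runif(n, 18, 80)))
# Let the chance of "Lab" fall with age so there is something to estimate
p_lab <- plogis(1 - 0.03 * toy$age)
toy$vote <- factor(ifelse(runif(n) < p_lab, "Lab",
                          sample(c("Con", "LD", "Other"), n, replace = TRUE)))

levels(toy$vote)[1]    # "Con" sorts first, so it is the default base
m_con <- multinom(vote ~ age, data = toy, trace = FALSE, maxit = 500)

toy$v2 <- relevel(toy$vote, ref = "Lab")
levels(toy$v2)[1]      # "Lab" is now the first level
m_lab <- multinom(v2 ~ age, data = toy, trace = FALSE, maxit = 500)

coef(m_con)            # one row per non-base category, each compared with Con
coef(m_lab)            # one row per non-base category, each compared with Lab

# Same model, different parameterisation: the fit itself is unchanged
c(deviance(m_con), deviance(m_lab))
```

The "Con" row of the second model should be close to the "Lab" row of the first with the signs reversed, since both describe the same contrast from opposite directions.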

1.1.1. Hypothesis testing

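Use a likelihood-ratio test for each variable: with four outcome categories every coefficient is estimated three times (once per non-base category), so a single test of the variable as a whole is more useful than inspecting each estimate separately.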

library(lmtest)
modm1 <- multinom(vote ~ age + hten + eun, data = bhpsq)
modm0 <- multinom(vote ~ age + hten, data = bhpsq)
lrtest(modm1, modm0)
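The same test can be sketched by hand from the model deviances, which shows where the degrees of freedom come from. This uses simulated data (toy and the covariate x are hypothetical) and base R's pchisq() in place of lrtest(). Both models must be fitted to the same cases, so missing values on the added variable need attention in real data.

```r
library(nnet)

# Simulated stand-in for bhpsq (assumption: not the real data)
set.seed(3)
n <- 800
toy <- data.frame(age = round(runif(n, 18, 80)),
                  x   = rnorm(n))          # hypothetical extra covariate
toy$vote <- factor(sample(c("Con", "Lab", "LD", "Other"), n, replace = TRUE))

m0 <- multinom(vote ~ age,     data = toy, trace = FALSE)
m1 <- multinom(vote ~ age + x, data = toy, trace = FALSE)

lr_stat <- deviance(m0) - deviance(m1)   # -2 times the difference in log-likelihood
lr_df   <- m1$edf - m0$edf               # extra parameters: one per non-base category
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(statistic = lr_stat, df = lr_df, p = p_value)
```

With four categories, adding one quantitative variable adds three parameters, so the test has three degrees of freedom.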

Use predict(modm1) to generate predicted values (the most likely outcome) and predict(modm1, type = "probs") to create predicted probabilities (note that this creates one column per category of the dependent variable). How often is the most probable predicted category the same as the observed one?
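One way to answer the last question, sketched on simulated data (toy and the level names other than "Lab" are assumptions): cross-tabulate observed against predicted categories and take the proportion on the diagonal.

```r
library(nnet)

# Simulated stand-in for bhpsq (assumption: not the real data)
set.seed(4)
n <- 600
toy <- data.frame(age = round(runif(n, 18, 80)))
p_lab <- plogis(2 - 0.05 * toy$age)
toy$vote <- factor(ifelse(runif(n) < p_lab, "Lab",
                          sample(c("Con", "LD", "Other"), n, replace = TRUE)))
m <- multinom(vote ~ age, data = toy, trace = FALSE)

pred_class <- predict(m)                  # most likely category for each case
pred_prob  <- predict(m, type = "probs")  # one column per category
head(pred_prob)

# Observed against predicted: the diagonal holds the agreements
table(observed = toy$vote, predicted = pred_class)

# Proportion of cases where the prediction matches the observed category
hit_rate <- mean(as.character(pred_class) == as.character(toy$vote))
hit_rate
```

It is worth comparing the hit rate with the share of the largest observed category, which is what you would achieve by always predicting the mode.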

1.2. Qual: a second variable

Use multinomial logistic regression to model the effects of relevant covariates on qual as the dependent variable. Select the variables carefully: some do not make sense for predicting highest qualification.

1.3. Ordinal logistic regression

qual is an ordinal variable. In your multinomial analysis, did you observe patterns in the parameter estimates? Search for a good proportional odds ordinal logistic model with syntax such as the following:

library(MASS)
summary(modb1 <- polr(qual ~ age + sex, data = bhpsq, Hess = TRUE))

Compare the ordinal logistic results with the multinomial results you have already produced. Do they tell the same story?
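A possible way to set the two models side by side, sketched on simulated data (toy and the qual labels here are made up). The ordinal model spends one slope per variable plus the cutpoints, while the multinomial model spends one slope per variable per non-base category, so comparing AIC is one rough check on whether the extra parameters earn their keep.

```r
library(MASS)
library(nnet)

# Simulated ordinal outcome (assumption: not the real qual variable)
set.seed(5)
n <- 800
toy <- data.frame(age = round(runif(n, 18, 80)))
latent <- -0.04 * toy$age + rlogis(n)     # older cases tend to score lower
toy$qual <- cut(latent, breaks = c(-Inf, -3, -2, -1, Inf),
                labels = c("none", "low", "mid", "high"),
                ordered_result = TRUE)

mo <- polr(qual ~ age, data = toy, Hess = TRUE)        # proportional odds
mm <- multinom(qual ~ age, data = toy, trace = FALSE)  # unordered alternative

coef(mo)   # a single slope for age
coef(mm)   # one slope per non-base category

# Lower AIC suggests the better trade-off between fit and parameters
aics <- c(ordinal = AIC(mo), multinomial = AIC(mm))
aics
```

If the multinomial slopes rise or fall steadily across the ordered categories, that is the pattern the proportional odds model summarises with one number.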

1.4. Additional: Exam performance as ordinal

Using data on exam grades, consider the variable G4.

marks <- read.dta("https://teaching.sociology.ul.ie/so5032/labs/marksdata.dta")

Fit a set of binary logistic regressions, comparing Fail, C and B respectively with A. To do this, for each comparison create a new variable that is 1 for the category in question (Fail, C or B), 0 for A, and missing otherwise. Use CAO1 (CAO points divided by 100) and MODSIZE as explanatory variables.
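The recoding for the first comparison might look like the sketch below. It runs on simulated data: the toy data frame, the variable name failA, and the assumption that G4 has levels labelled exactly "Fail", "C", "B" and "A" are all illustrative, so check levels(marks$G4) before adapting it.

```r
# Simulated stand-in for the marks data (assumption: not the real values)
set.seed(6)
n <- 400
toy <- data.frame(
  G4      = factor(sample(c("Fail", "C", "B", "A"), n, replace = TRUE),
                   levels = c("Fail", "C", "B", "A")),
  CAO1    = round(runif(n, 3, 6), 2),
  MODSIZE = sample(20:300, n, replace = TRUE)
)

# 1 for Fail, 0 for A, NA for everything else (C and B drop out of the fit)
toy$failA <- ifelse(toy$G4 == "Fail", 1,
                    ifelse(toy$G4 == "A", 0, NA))
table(toy$G4, toy$failA, useNA = "ifany")

# glm() omits rows with a missing response by default
mod_failA <- glm(failA ~ CAO1 + MODSIZE, family = binomial, data = toy)
summary(mod_failA)
```

Repeat with "C" and "B" in place of "Fail" to get the other two comparisons.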

Then fit a multinomial logistic regression with G4 as the dependent variable, and the same explanatory variables. Compare your results with the binary regressions.

Finally, noting the ordinal pattern in the parameter estimates in both the binary and multinomial models, fit an ordinal logistic regression. Compare your results with the preceding ones.

With the results of the ordinal regression, calculate the odds ratio of being higher rather than lower, for a 100-point difference in raw CAO points (a 1-unit difference in CAO1). Do the same for a 100-unit difference in MODSIZE.
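The arithmetic can be sketched as follows, on simulated data (toy and its effect sizes are invented, and G4 is assumed to be ordered from Fail up to A). polr() in MASS is parameterised so that a positive coefficient raises the odds of a higher category. The odds ratio for a 1-unit difference is therefore exp(b), and for a 100-unit difference it is exp(100 * b).

```r
library(MASS)

# Simulated stand-in for the marks data (assumption: not the real values)
set.seed(7)
n <- 1000
toy <- data.frame(CAO1    = runif(n, 3, 6),
                  MODSIZE = sample(20:300, n, replace = TRUE))
latent <- 0.8 * toy$CAO1 - 0.005 * toy$MODSIZE + rlogis(n)
toy$G4 <- cut(latent, breaks = quantile(latent, c(0, 0.2, 0.5, 0.8, 1)),
              labels = c("Fail", "C", "B", "A"),
              include.lowest = TRUE, ordered_result = TRUE)

mod <- polr(G4 ~ CAO1 + MODSIZE, data = toy)
b <- coef(mod)

or_cao <- exp(b["CAO1"])            # 1 unit of CAO1 = 100 raw CAO points
or_mod <- exp(100 * b["MODSIZE"])   # 100 units of MODSIZE
c(or_cao, or_mod)
```

Note that exp(100 * b) is the 1-unit odds ratio raised to the power 100, not multiplied by 100.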

Author: Brendan Halpin

Created: 2026-04-21 Tue 13:26
