1 Week 3 Lab Multiple regression

1.1 Hours, gender and income

We will use a small extract from the British Household Panel Survey with information on hours worked, income and gender. Load it into Stata as follows:


First fit a bivariate regression using hours worked to predict income.

reg ofimn ojbhrs

Then do a t-test comparing income across gender:

ttest ofimn, by(osex)

Compare these with the results you get from a regression on a dummy variable for female:

gen female = osex==2
reg ofimn female
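
As a rough sketch of what to look for when setting the two outputs side by side: in the regression, the constant is the mean income of the group coded female = 0, and the coefficient on female is the difference between the two group means. The commands below use only the variables defined above and print both quantities so they can be checked against the ttest output.

```stata
* Sketch: recover the two group means from the dummy-variable regression
reg ofimn female
display "Mean income, female==0: " _b[_cons]
display "Mean income, female==1: " _b[_cons] + _b[female]

* Group means computed directly, for comparison
tabstat ofimn, by(female) statistics(mean n)
```

By default ttest assumes equal variances in the two groups, as the regression does, so the t statistics should agree in size. The sign may differ, because ttest reports the first group minus the second while the regression coefficient is the female mean minus the male mean.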

Now fit the following multiple regression:

reg ofimn ojbhrs female

Interpret the results.

Draw the two regression lines (one for men, one for women) on paper.
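
Because this model has no interaction term, it implies two parallel lines. For men (female = 0) predicted income is \(b_0 + b_1 \times \text{hours}\), and for women (female = 1) it is \((b_0 + b_2) + b_1 \times \text{hours}\), where \(b_0\), \(b_1\) and \(b_2\) are labels chosen here for the constant and the coefficients on ojbhrs and female. A minimal sketch for checking a paper drawing against Stata follows; yhat_mr is an arbitrary new variable name, and the group labels assume osex takes only the two values 1 and 2.

```stata
* Sketch: plot the two fitted lines implied by the additive model
reg ofimn ojbhrs female
predict yhat_mr          // fitted values; the name yhat_mr is arbitrary
twoway (line yhat_mr ojbhrs if female==0, sort) ///
       (line yhat_mr ojbhrs if female==1, sort), ///
       legend(order(1 "Men" 2 "Women")) ytitle("Predicted income")
```

The `//` comments and `///` line continuations work in a do-file; when typing at the command line, enter each command on a single line without them.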

1.2 Crime

Agresti discusses a data set containing county level information on crime. Load it into Stata as follows:



  • Look at the bivariate correlations (correlation matrix or scatterplots)
  • Fit bivariate regressions with crime rate as the dependent variable and each of the other variables in turn as the independent variable
  • Fit a regression with all the explanatory variables – interpret the findings.
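
The three steps above might look like the following sketch. It uses only the variable names that appear in the example commands in this lab, read here as c for crime rate, hs for education and u for urbanisation. The data set also has an income variable whose name is not shown in this handout, so add it to each command under whatever name it has in the loaded data.

```stata
* Sketch, assuming c = crime rate, hs = education, u = urbanisation
correlate c hs u          // bivariate correlations
graph matrix c hs u       // scatterplot matrix

reg c hs                  // bivariate regressions, one predictor at a time
reg c u

reg c hs u                // multiple regression (add the income variable too)
```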

Graph crime rate against education, and against income. Then repeat the graphs, distinguishing counties with high urbanisation from the rest. For example:

gen hi_u = u>60
scatter c hs if hi_u || scatter c hs if !hi_u

(Even better, make a variable with several levels of urbanisation, to get more detail in the graph.)
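
One possible way to build such a variable is to cut urbanisation into three groups of roughly equal size. This is only a sketch: u3 is a hypothetical variable name, and the choice of three groups and their labels is an assumption rather than part of the exercise.

```stata
* Sketch: three urbanisation groups of roughly equal size
xtile u3 = u, nquantiles(3)     // u3 is a hypothetical variable name
twoway (scatter c hs if u3==1) (scatter c hs if u3==2) ///
       (scatter c hs if u3==3), ///
       legend(order(1 "Low" 2 "Medium" 3 "High"))
```

An alternative is `scatter c hs, by(u3)`, which draws one panel per group instead of overlaying them.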

What is going on in the relationship between education and crime?

1.3 Predicted values

Taking the final regression results, calculate the predicted values by hand (calculator!) for the first few cases, i.e. plug their values on the independent variables into the fitted equation. Then, after running the regression, run predict varname to have Stata generate the predicted values. Were your calculations correct?

  • Do a scatter plot of the predicted values versus the observed values
  • Are the predicted values close to the real ones?
  • Calculate the correlation between the predicted and observed values – relate it to the \(R^2\) from the regression
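
The steps in this section can be sketched as follows, assuming the final model is the crime regression and using only the variables named in the earlier examples (c, hs, u). If the model also includes income, add that variable to the regression and its term to the by-hand formula. The name c_hat is arbitrary.

```stata
* Sketch, assuming the final model is the crime regression on hs and u
reg c hs u

* Check of the by-hand calculation for the first case
display _b[_cons] + _b[hs]*hs[1] + _b[u]*u[1]

predict c_hat                 // c_hat is an arbitrary name
list c c_hat in 1/5           // observed and predicted, first five cases

scatter c_hat c               // predicted versus observed
correlate c_hat c
display "Squared correlation: " r(rho)^2
display "Regression R-squared: " e(r2)
```

The last two lines print the squared correlation and the \(R^2\) stored by the regression next to each other so they can be compared directly.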

Author: brendan

Created: 2021-02-09 Tue 15:27