1. Week 3 Lab Multiple regression

1.1. Maths and Height

Load the following data:


This is the maths/height example considered yesterday. Examine the correlations between the variables, numerically and graphically (corr varlist and scatter yvar xvar). Then regress maths on height: reg maths height. Interpret the output, and relate it to the scatter plot.
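As a minimal sketch of these first steps, assuming the variables are named maths, height and year as in the commands in this section, the following overlays the fitted line on the scatter plot so that the regression output can be read against the graph:

```stata
* Correlations among the three variables
corr maths height year

* Scatter plot of maths against height, with the fitted regression line overlaid
twoway (scatter maths height) (lfit maths height)

* The regression that produces that line
reg maths height
```

The slope reported by reg is the slope of the line drawn by lfit.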

Consider controlling for year. First, compare the maths/height scatterplot across year:

scatter maths height, by(year)

What does this tell you? Does the following command make it clearer?

bysort year: pwcorr height maths, sig

Then fit the regression including year as well as height as explanatory variables: reg maths height year. Interpret the output.

1.2. Hours, gender and income

We will use a small extract from the British Household Panel Survey with info on hours worked, income and gender. Load it into Stata as follows:


First fit a bivariate regression using hours to predict income.

reg ofimn ojbhrs

Then do a t-test comparing income across gender:

ttest ofimn, by(osex)

Compare these with the results you get by regression:

gen female = osex==2
reg ofimn female

Now fit the following multiple regression:

reg ofimn ojbhrs female

Interpret the results.

Draw the two regression lines on paper.
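To check a hand-drawn sketch, the two lines can also be plotted from the fitted model. This is a rough sketch only: inc_hat is a made-up variable name, female is assumed to exist already from the gen command above, and the legend labels assume that female == 0 identifies men.

```stata
* Refit the model and store the predicted income for each case
reg ofimn ojbhrs female
predict inc_hat

* One predicted line per gender, plotted against hours worked
twoway (line inc_hat ojbhrs if female == 0, sort) ///
       (line inc_hat ojbhrs if female == 1, sort), ///
       legend(order(1 "Men" 2 "Women")) ytitle("Predicted income")
```

Because the model has no interaction term, the two lines are parallel, and the vertical gap between them is the coefficient on female. The /// line continuation works in a do-file, not when typed directly into the command window.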

1.3. Crime

Agresti discusses a data set containing county level information on crime. Load it into Stata as follows:



  • Look at the bivariate correlations (correlation or scatterplot)
  • Fit bivariate regressions with crime rate as the dependent variable and each of the other variables in turn as the independent variable
  • Fit a regression with all the explanatory variables – interpret the findings.
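The steps above could be sketched as follows. The names c (crime rate), hs (education) and u (urbanisation) are taken from the example commands later in this section; any other variables in the file, such as income, would need to be added to each variable list under whatever names the data set actually uses.

```stata
* Pairwise correlations with significance levels
pwcorr c hs u, sig

* Scatterplot matrix (lower triangle only)
graph matrix c hs u, half

* One bivariate regression per explanatory variable
foreach x of varlist hs u {
    reg c `x'
}

* All the explanatory variables together
reg c hs u
```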

Graph crime rate against education, and against income. Repeat the exercise using a marker for high urbanisation. E.g.:

gen hi_u = u > 60
scatter c hs if hi_u || scatter c hs if !hi_u

(Even better, make a variable with several levels of urbanisation, to get more detail in the graph.)
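One possible way to build such a variable is with xtile, which cuts a variable into quantile groups. The name u3 and the choice of three groups are arbitrary, made here only for illustration.

```stata
* Three roughly equal-sized urbanisation groups (1 = lowest, 3 = highest)
xtile u3 = u, nq(3)

* One panel per urbanisation group
scatter c hs, by(u3)

* Or a single plot with a separate marker per group
twoway (scatter c hs if u3 == 1) ///
       (scatter c hs if u3 == 2) ///
       (scatter c hs if u3 == 3), ///
       legend(order(1 "Low" 2 "Medium" 3 "High"))
```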

What is going on in the relationship between education and crime?

1.4. Predicted values

Taking the final regression results, calculate the predicted values by hand (calculator!) for the first few cases, i.e. plug their values on the independent variables into the fitted equation. Then, after running the regression, run predict varname to get Stata to generate predicted values. Were your calculations correct?

  • Do a scatter plot of the predicted values versus the observed values
  • Are the predicted values close to the real ones?
  • Calculate the correlation between the predicted and observed values – relate it to the \(R^2\) from the regression
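A sketch of these checks, assuming the crime regression with c, hs and u (extend the variable list, and the hand calculation, to match the regression you actually fitted); c_hat is a made-up name for the predicted values:

```stata
* Fit the regression and store the predicted values
reg c hs u
predict c_hat

* Hand calculation for the first case, using the stored coefficients
display _b[_cons] + _b[hs]*hs[1] + _b[u]*u[1]

* Compare observed and predicted values for the first few cases
list c c_hat in 1/5

* Predicted versus observed
scatter c_hat c

* Squared correlation between observed and predicted, next to the R-squared
corr c c_hat
display r(rho)^2
display e(r2)
```

For a least-squares regression with a constant, the squared correlation between the observed and predicted values equals \(R^2\), so the last two numbers should agree.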

Author: Brendan Halpin

Created: 2022-02-08 Tue 15:52