1. Week 3 Lab: Multiple regression
1.1. Maths and Height
Load the following data:
use http://teaching.sociology.ul.ie/so5032/mathsheight.dta
This is the maths/height example considered in class. Examine the correlations between the variables, numerically and graphically:
corr varlist
scatter yvar xvar
Then regress maths on height:
reg maths height
Interpret the output, and relate it to the scatter plot.
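A minimal sketch of this step in Stata, assuming the variables are named maths, height and year as in the other commands in this lab. Overlaying the least-squares line with lfit is one way to relate the regression output to the scatter plot:

```stata
* Load the maths/height data (same file as above)
use http://teaching.sociology.ul.ie/so5032/mathsheight.dta, clear

* Numerical view: correlation matrix
corr maths height year

* Graphical view: scatter plot with the OLS line overlaid
twoway (scatter maths height) (lfit maths height)

* Bivariate regression of maths on height
regress maths height

* Intercept and slope are stored after regress and can be reused
display "intercept = " _b[_cons] "   slope = " _b[height]
```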
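Consider controlling for year. First, compare the maths/height scatterplot across years:
scatter maths height, by(year)
What does this tell you? Does the following command, which calculates the correlation (with its significance level) separately for each year, make it clearer?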
bysort year: pwcorr height maths, sig
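A sketch that combines the two views, assuming year takes only a few values: each by(year) panel below carries its own fitted line, and bysort with regress gives the within-year slopes, which can be compared with the overall slope from the bivariate regression:

```stata
use http://teaching.sociology.ul.ie/so5032/mathsheight.dta, clear

* One panel per year, each with its own least-squares line
twoway (scatter maths height) (lfit maths height), by(year)

* Within-year regressions of maths on height
bysort year: regress maths height
```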
Then fit the regression including year as well as height as explanatory variables:
reg maths height year
Interpret the output.
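One way to see what controlling for year does is to store both models and set them side by side. A sketch; the stored-estimate names m_height and m_both are arbitrary labels chosen here:

```stata
use http://teaching.sociology.ul.ie/so5032/mathsheight.dta, clear

* Model 1: height only
quietly regress maths height
estimates store m_height

* Model 2: height and year
quietly regress maths height year
estimates store m_both

* Coefficients, standard errors, N and R-squared for both models
estimates table m_height m_both, b se stats(N r2)
```

Compare the height coefficient across the two columns.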
1.2. Hours, gender and income
We will use a small extract from the British Household Panel Survey with info on hours worked, income and gender. Load it into Stata as follows:
use http://teaching.sociology.ul.ie/so5032/labs/jobhours.dta
First fit a bivariate regression using hours worked to predict income:
reg ofimn ojbhrs
Then do a t-test comparing income across gender:
ttest ofimn, by(osex)
Compare these with the results you get by regression on a female dummy variable:
gen female = osex==2 if !missing(osex)
reg ofimn female
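The regression on a 0/1 dummy reproduces the t-test. The sketch below checks this, assuming osex is coded 1 and 2 with no other values (as the gen command implies); the scalar names are chosen here for illustration. The intercept should equal the osex==1 mean and the female coefficient the difference in means. The two t statistics have the same size but opposite signs, because ttest reports mean(1) minus mean(2):

```stata
use http://teaching.sociology.ul.ie/so5032/labs/jobhours.dta, clear

* Two-sample t-test (equal variances assumed by default)
ttest ofimn, by(osex)
scalar mean_1 = r(mu_1)   // mean income, osex==1
scalar mean_2 = r(mu_2)   // mean income, osex==2
scalar t_ttest = r(t)

* Dummy-variable regression on the same cases
gen female = osex==2 if !missing(osex)
regress ofimn female

* Slope vs difference in means; the two t statistics
display "difference in means = " mean_2 - mean_1 "   slope = " _b[female]
display "t from ttest = " t_ttest "   t from regress = " _b[female]/_se[female]
```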
Now fit the following multiple regression:
reg ofimn ojbhrs female
Interpret the results.
Draw the two regression lines of income against hours (one for men, one for women) on paper.
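As a guide for the drawing: writing b_0, b_1 and b_2 for the fitted intercept, hours coefficient and female coefficient (notation introduced here), setting the dummy to 0 or 1 gives one line for each sex:

```latex
\begin{aligned}
\widehat{\text{ofimn}} &= b_0 + b_1\,\text{ojbhrs} + b_2\,\text{female}\\
\text{men (female}=0\text{):}\quad \widehat{\text{ofimn}} &= b_0 + b_1\,\text{ojbhrs}\\
\text{women (female}=1\text{):}\quad \widehat{\text{ofimn}} &= (b_0 + b_2) + b_1\,\text{ojbhrs}
\end{aligned}
```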
1.3. Crime
Agresti discusses a data set containing county-level information on crime. Load it into Stata as follows:
clear
use http://teaching.sociology.ul.ie/so5032/labs/agresticounties.dta
Then:
- Look at the bivariate correlations (correlation coefficients and scatterplots)
- Fit bivariate regressions with crime rate as the dependent variable and each of the other variables in turn as the independent variable
- Fit a regression with all the explanatory variables, and interpret the findings.
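A sketch of these three steps. Only c, hs and u appear in this lab's commands; the income variable's name, i below, is an assumption, so check the real names with describe and edit the local macro xvars before running:

```stata
use http://teaching.sociology.ul.ie/so5032/labs/agresticounties.dta, clear
describe

* ASSUMPTION: explanatory variable names (i = median income is a guess)
local xvars hs u i

* Bivariate correlations and a scatterplot matrix
corr c `xvars'
graph matrix c `xvars'

* One bivariate regression per explanatory variable
foreach v of local xvars {
    regress c `v'
}

* Multiple regression with all the explanatory variables
regress c `xvars'
```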
Graph crime rate against education, and against income. Repeat the exercise using a marker for high urbanisation. E.g.:
gen hi_u = u>60 if !missing(u)
scatter c hs if hi_u==1 || scatter c hs if hi_u==0
(Even better, make a variable with several levels of urbanisation, to get more detail in the graph.)
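A sketch of the several-levels version, assuming u is the percentage urban as in the hi_u example. xtile cuts it into three roughly equal-sized groups, and separate splits the crime rate into one variable per group so each group gets its own marker. The names urb3 and crime_u are chosen here:

```stata
use http://teaching.sociology.ul.ie/so5032/labs/agresticounties.dta, clear

* Three roughly equal-sized urbanisation groups, coded 1, 2, 3
xtile urb3 = u, nquantiles(3)

* One crime variable per group: crime_u1, crime_u2, crime_u3
separate c, by(urb3) generate(crime_u) sequential

* Crime against education, one marker style per urbanisation group
scatter crime_u1 crime_u2 crime_u3 hs

* Alternative: one panel per group, each with its own fitted line
twoway (scatter c hs) (lfit c hs), by(urb3)
```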
What is going on in the relationship between education and crime?
1.4. Predicted values
Taking the results of the final regression (crime on all the explanatory variables), calculate the predicted values by hand (calculator!) for the first few cases, i.e. plug their values on the independent variables into the fitted equation. Then, after running the regression, use
predict varname
to get Stata to generate predicted values. Were your calculations correct?
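A sketch of the hand calculation done inside Stata for the first case, to check against your calculator. It assumes the same explanatory variables as before, including the guessed income name i, so adjust xvars to match the data. The scalar name hand1 is chosen here, and storing the predictions as double avoids small rounding differences from the default float type:

```stata
use http://teaching.sociology.ul.ie/so5032/labs/agresticounties.dta, clear

* ASSUMPTION: explanatory variable names (i = median income is a guess)
local xvars hs u i
regress c `xvars'

* By hand for case 1: intercept plus each coefficient times that case's value
scalar hand1 = _b[_cons]
foreach v of local xvars {
    scalar hand1 = hand1 + _b[`v']*`v'[1]
}
display "hand calculation for case 1 = " hand1

* Stata's own predicted values for the first few cases
predict double yhat, xb
list c `xvars' yhat in 1/5
```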
- Do a scatter plot of the predicted values versus the observed values
- Are the predicted values close to the real ones?
- Calculate the correlation between the predicted and observed values, and relate it to the \(R^2\) from the regression
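A sketch of these checks, under the same assumption about variable names as above. The function plot adds the line on which predicted equals observed, as a reference; the scalar name r2_reg is chosen here:

```stata
use http://teaching.sociology.ul.ie/so5032/labs/agresticounties.dta, clear

* ASSUMPTION: explanatory variable names (i = median income is a guess)
local xvars hs u i
regress c `xvars'
scalar r2_reg = e(r2)

predict double yhat, xb

* Observed against predicted, with the line y = x for reference
twoway (scatter c yhat) (function y = x, range(yhat))

* Correlation between observed and predicted, next to the regression R-squared
corr c yhat
display "correlation = " r(rho) "   R-squared from regress = " r2_reg
```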