PC Labs for SO5041: Week 11
1 Week 11 Lab: Regression
1.1 Correlation
The following data file contains six pairs of variables, X1 and Y1, X2 and Y2, and so on:
use http://teaching.sociology.ul.ie/so5041/labs/correl
First, graph all six pairs in scatterplots. Also graph X1 against Y2. What sort of association do you see in each case (positive, negative, none, strong, weak)? Guess what the value of the correlation coefficient might be, and write it down.
For each graph, get the correlation coefficient, e.g., corr x1 y1. How do the reported correlation coefficients compare with your guesses?
1.2 Correlations with real data
With the following data file (use http://teaching.sociology.ul.ie/so5041/ocorr), explore the correlations between the variables it contains, both graphically and with the correlation coefficient.
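Because the handout does not list the variables in this file, one hedged approach is to collect whatever numeric variables it holds and look at all pairs at once. This is a sketch; it assumes the variables of interest are numeric.

```stata
use http://teaching.sociology.ul.ie/so5041/ocorr, clear
describe

* Collect the numeric variables (their names are not given in the handout)
quietly ds, has(type numeric)
local nums `r(varlist)'

* Every pairwise scatterplot at once (lower triangle only)
graph matrix `nums', half

* The matching correlations, with p-values and the N behind each one
pwcorr `nums', sig obs
```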
1.3 Linear Regression
Do sysuse nlsw88 to load the National Longitudinal Survey of Women data set that comes with Stata. Look at wage, the hourly wage rate. Predict wage using grade:
reg wage grade
Write out the Y = a + bX equation. Calculate the predicted value of wage for grade=0 and grade=20, and draw the line on a graph (on paper).
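Once you have done the calculation by hand, a sketch like the following can check it: after regress, _b[_cons] is a and _b[grade] is b, so each prediction is just a + b times the chosen grade.

```stata
sysuse nlsw88, clear
regress wage grade

* Intercept a and slope b of wage = a + b*grade
display "a = " %6.3f _b[_cons] "   b = " %6.3f _b[grade]

* Predicted wage at grade = 0 and grade = 20: a + b*0 and a + b*20
display "grade = 0:  " %6.3f (_b[_cons] + _b[grade]*0)
display "grade = 20: " %6.3f (_b[_cons] + _b[grade]*20)

* The same two predictions, with standard errors
margins, at(grade = (0 20))

* Data and fitted line together, to compare with your drawing
twoway (scatter wage grade, msize(small)) (lfit wage grade)
```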
1.4 R-squared
Consider the following variables:
age
ttl_exp, total lifetime work experience
tenure, tenure in current job
grade, years of education
union, whether a member of a union
Let's treat wage as the "dependent variable", to be explained by the others (ignoring union for the moment, as it has only two values). Create scatterplots of wage (on the Y-axis) against each of the other variables. Consider the correlations too (e.g., corr age wage). Can you see much of a relationship?
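A sketch of the scatterplots and correlations in one go. The graph names w_age and so on are arbitrary; pwcorr is used rather than correlate so that each correlation uses every observation available for that pair.

```stata
sysuse nlsw88, clear

* wage on the Y-axis against each candidate explanatory variable
foreach v of varlist age ttl_exp tenure grade {
    scatter wage `v', msize(small) name(w_`v', replace)
}
graph combine w_age w_ttl_exp w_tenure w_grade

* Pairwise correlations of wage with each variable, with the N for each
pwcorr wage age ttl_exp tenure grade, obs
```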
Now do regression analyses, reg wage varname, with each of the other variables in turn as the independent variable. There are two things to look at: the R-squared figure and the parameter estimate (B for the independent variable, along with its significance). Which variables affect wage much? Do any not affect it at all?
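To put the results side by side, a loop can run each bivariate regression quietly and print B, its p-value and R-squared on one line. This is a sketch; the p-value is computed from the usual two-sided t-test that regress itself reports.

```stata
sysuse nlsw88, clear

* One bivariate regression per explanatory variable: b, its p-value, R-squared
foreach v of varlist age ttl_exp tenure grade {
    quietly regress wage `v'
    * two-sided p-value for t = b/se on the residual degrees of freedom
    local p = 2*ttail(e(df_r), abs(_b[`v']/_se[`v']))
    display %-8s "`v'" "  b = " %7.3f _b[`v'] "  p = " %6.4f `p' "  R2 = " %5.3f e(r2)
}
```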
Interpret the results: in each case ask, "what happens to the predicted value of wage if the value of X changes by one unit?" For two different values of the independent variable (X), calculate the predicted value of wage; see where these fall on the scatterplot, and where the regression line would lie. Does the line seem like a good summary of the relationship?
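As an illustration with one of the variables, ttl_exp, the slope is the one-unit change and the two predictions come from a + bX. The values 5 and 15 years are chosen arbitrarily for this sketch; substitute any two values within the range of the data.

```stata
sysuse nlsw88, clear
regress wage ttl_exp

* Change in predicted wage for one more unit (year) of ttl_exp: the slope b
display "change per unit of ttl_exp: " %6.3f _b[ttl_exp]

* Predicted wage at two illustrative values (5 and 15 are arbitrary choices)
display "ttl_exp = 5:  " %6.3f (_b[_cons] + 5*_b[ttl_exp])
display "ttl_exp = 15: " %6.3f (_b[_cons] + 15*_b[ttl_exp])

* Where the line lies relative to the data
twoway (scatter wage ttl_exp, msize(small)) (lfit wage ttl_exp)
```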
If R-squared is large, the independent variable "explains" a lot of the variation in the dependent variable. However, R-squared can be small even when the independent variable has a systematic effect (i.e., a very low p-value): the independent variable may be only one of many things that affect the dependent variable.
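A sketch for checking this with your own results, using tenure purely as an example. With a single independent variable, R-squared equals the squared correlation coefficient, so it can be set directly beside the p-value for the slope.

```stata
sysuse nlsw88, clear

* With one independent variable, R-squared is the squared correlation
quietly correlate wage tenure
local r = r(rho)
regress wage tenure
display "r^2 = " %6.4f (`r')^2 "   R-squared = " %6.4f e(r2)

* Compare the size of R-squared with the p-value for the slope
display "p-value for tenure: " %8.6f (2*ttail(e(df_r), abs(_b[tenure]/_se[tenure])))
```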
1.4.1 Union effects
Test the effect of union on wage. Use a t-test in the first instance, and then fit a regression. Compare the results.
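A sketch of the comparison. With Stata's default equal-variance t-test, the regression coefficient on the 0/1 union variable equals the difference in mean wage (union minus non-union), and the two t statistics have the same size; ttest reports the difference the other way round, so the sign is flipped.

```stata
sysuse nlsw88, clear

* Two-sample t-test of mean wage by union status (equal variances, the default)
ttest wage, by(union)
local diff = r(mu_2) - r(mu_1)   // union mean minus non-union mean
local t = r(t)                   // based on non-union minus union

* The same comparison as a regression on the 0/1 union variable
regress wage union
display "difference in means: " %6.3f `diff' "   regression b: " %6.3f _b[union]
```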
Do the same for grade: test whether it differs by union status. Note that unionised workers tend to earn more and to be better educated. Could the union effect simply be due to their being better educated? That is, among workers with similar education, does union status matter?
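A sketch of both steps: first whether education differs by union status, then a rough look at mean wage for union and non-union workers at each level of grade, which is one informal way to compare like with like.

```stata
sysuse nlsw88, clear

* Does mean education differ by union status?
ttest grade, by(union)
regress grade union

* Mean wage for each combination of grade and union status
tabulate grade union, summarize(wage) means
```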
Fit the wage/grade regression for unionised and non-unionised workers separately, and think about the results (make scatterplots too):
reg wage grade if union==0
reg wage grade if union==1
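A do-file sketch that stores the two regressions for a side-by-side table and overlays both groups on one scatterplot. The stored-estimate names nonunion and unionmem are arbitrary, and the /// line continuations work in a do-file but not when typed at the command line.

```stata
sysuse nlsw88, clear

* Separate wage/grade regressions, stored for side-by-side comparison
regress wage grade if union == 0
estimates store nonunion
regress wage grade if union == 1
estimates store unionmem
estimates table nonunion unionmem, b(%7.3f) se stats(N r2)

* Both groups' data and fitted lines on one graph
twoway (scatter wage grade if union == 0, msize(small) mcolor(gs12)) ///
       (lfit wage grade if union == 0)                              ///
       (scatter wage grade if union == 1, msize(small))             ///
       (lfit wage grade if union == 1),                             ///
       legend(order(2 "Non-union" 4 "Union"))
```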
1.4.2 Two explanatory variables
You can also fit a model in which both union status and grade explain wage. Fit a regression with both grade and union as explanatory variables, and interpret the parameter estimates.
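A sketch of how the estimates translate into two lines. Because union is coded 0/1, the model gives non-union workers the line a + b1*grade and union members the line (a + b2) + b1*grade: the same slope, with the intercept shifted by the union coefficient.

```stata
sysuse nlsw88, clear
regress wage grade union

* Predicted wage = a + b1*grade + b2*union
*   non-union (union = 0): a + b1*grade
*   union     (union = 1): (a + b2) + b1*grade, same slope, shifted intercept
display "non-union intercept: " %6.3f _b[_cons]
display "union intercept:     " %6.3f (_b[_cons] + _b[union])
display "common slope:        " %6.3f _b[grade]
```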
Draw the regression lines for union members and non-union members. Compare your results with the earlier separate regressions and with the t-test.
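A do-file sketch for drawing the two fitted lines and for seeing how the union coefficient changes once grade is in the model. The names insample, wage_hat, both and unionly are hypothetical labels introduced here; restricting the union-only regression to insample keeps the two models on the same observations.

```stata
sysuse nlsw88, clear
regress wage grade union
estimates store both
generate byte insample = e(sample)

* Fitted values from the two-variable model: parallel lines, one per union status
predict wage_hat if insample, xb

twoway (scatter wage grade if insample, msize(small) mcolor(gs12)) ///
       (line wage_hat grade if union == 0, sort)                  ///
       (line wage_hat grade if union == 1, sort),                 ///
       legend(order(2 "Non-union" 3 "Union"))

* Union coefficient without and with grade in the model (same sample)
quietly regress wage union if insample
estimates store unionly
estimates table unionly both, b(%7.3f) se
```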