SO5032: Lab Materials

1. Week 4 Lab

1.1. Mental health

This file contains code that relates a mental impairment score to SES (socioeconomic status) and a negative life-events score. Run it as follows:


Fit the regression model predicting impairment from the other two variables. Interpret the model.

1.2. Predicted values

Taking the regression results, calculate the predicted values by hand (calculator!) for the first few cases, i.e. plug their values on the independent variables into the fitted equation. Then, after running the regression, use predict followed by a new variable name (e.g. predict yhat) to get Stata to generate predicted values. Were your calculations correct?

  • Do a scatter plot of the predicted values versus the observed values
  • Are the predicted values close to the real ones?
  • Calculate the correlation between the predicted and observed values, and relate it to the R² from the regression
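
The steps above can be sketched as follows. Since the mental health file is not reproduced here, this sketch assumes Stata's built-in auto data as a stand-in: price, mpg and weight are that dataset's variables, not the lab's, and yhat is just an arbitrary name for the new variable. Substitute the impairment, SES and life-events variables when working on the real data.

```stata
* Stand-in data (an assumption): the lab's own file is not shown here
sysuse auto, clear
regress price mpg weight

* Stata's predicted values (linear prediction is the default)
predict yhat

* "By hand" prediction for the first case, using the stored coefficients
display _b[_cons] + _b[mpg]*mpg[1] + _b[weight]*weight[1]
list price mpg weight yhat in 1/5

* Predicted versus observed
scatter yhat price

* Squared correlation between predicted and observed, next to the model R-squared
correlate yhat price
display r(rho)^2
display e(r2)
```

For a linear regression with a constant, the last two numbers should agree: R² is the squared correlation between the observed and predicted values.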

1.3. Hypothesis tests

With the model containing the two explanatory variables, carry out hypothesis tests on the conditional effects of the two variables. Can you reject the null hypothesis in either case?
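
As a rough sketch of what such a test involves, the t-statistic for a coefficient is the estimate divided by its standard error, referred to a t distribution with the residual degrees of freedom. The example below again assumes the built-in auto data as a stand-in (its variables are not the lab's), and reproduces by hand what the regression table already reports.

```stata
* Stand-in data (an assumption): replace with the mental health variables
sysuse auto, clear
regress price mpg weight

* t-statistic for weight, conditional on mpg: estimate / standard error
scalar t_w = _b[weight] / _se[weight]
display t_w

* Two-sided p-value on the residual degrees of freedom
display 2 * ttail(e(df_r), abs(t_w))

* The same test via Stata's test command (reports F, which equals t squared)
test weight
```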

1.4. Adjusted R²

F-tests can be used to test a model globally, and also to compare two nested models, one of which has extra variables. An approximate but quicker way to do this is to look at Adjusted R², which is R² scaled to take account of the number of cases and the number of parameters, in a calculation similar to that for the F-statistic. Unlike R², Adjusted R² can fall as variables are added to the model, if they contribute too little to the fit (for a single added variable, when its t-statistic is below 1 in absolute value).
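
The usual formula is Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of cases and k the number of explanatory variables. A minimal sketch of checking this against Stata's stored results, assuming the built-in auto data purely as an illustration:

```stata
* Illustrative data (an assumption), not the lab's file
sysuse auto, clear
regress price mpg weight

* After regress, e(N) is n and e(df_r) is the residual df, n - k - 1
display 1 - (1 - e(r2)) * (e(N) - 1) / e(df_r)

* Stata's own stored value, for comparison
display e(r2_a)
```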

1.5. F-tests

Stata's regression output presents the result of an F-test against the null model (top-right of output) but doesn't do incremental F-tests. A handy add-on for this can be installed using

ssc install ftest

Using it means you need to fit a model, store its details, fit another and compare the two:

use, clear
reg c u
estimates store urban
reg c u i hs
ftest urban

Interpret that result, and compare it with the result of testing reg c i hs against reg c i hs u.
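
To see what an incremental F-test is doing, it can also be calculated from the residual sums of squares of the two models: F = ((RSS_small - RSS_big)/q) / (RSS_big/df_big), where q is the number of extra variables. The sketch below assumes the built-in auto data as a stand-in (the variables and scalar names are illustrative, not from the lab), and compares the hand calculation with Stata's built-in test command, which gives the same F for a linear regression.

```stata
* Stand-in data (an assumption); both models use the same cases
sysuse auto, clear

* Smaller model
regress price mpg
scalar rss_small = e(rss)

* Bigger model, with q = 2 extra variables
regress price mpg weight length
scalar rss_big = e(rss)

* Incremental F and its p-value, by hand
scalar F_inc = ((rss_small - rss_big) / 2) / (rss_big / e(df_r))
display F_inc
display Ftail(2, e(df_r), F_inc)

* Joint test that the two extra coefficients are both zero
test weight length
```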

1.6. Note: Dummy variables

If you have a categorical explanatory variable, you can enter it as a set of n-1 "dummy" variables, where n is the number of values it takes. A dummy variable takes the value 1 when the original variable has the corresponding value, and 0 otherwise:

Original  d1  d2  d3
1          1   0   0
2          0   1   0
3          0   0   1
4          0   0   0

In this example, the original variable takes the values 1 to 4. There are three dummy variables, d1 to d3, taking the values 0 and 1, each corresponding to one value of the original variable. For value 4 of the original variable, all three dummy variables have the value 0, which makes category 4 the reference category. Once the dummy variables are entered in a regression analysis, each parameter estimate is interpreted as the effect on the dependent variable of being in that category compared with category 4.

You can create dummy variables easily in Stata:

sysuse nlsw88, clear
tab occupation, gen(docc)

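However, you don't need to. You can simply use "factor notation": reg wage ttl_exp i.occupation. This gives the same result as entering all the docc dummies except the first (docc2 onwards), because factor notation treats the first category as the reference by default. Try both ways to satisfy yourself.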
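
One way to run the comparison is sketched below. Dropping docc1 and using the docc* wildcard is just a convenience chosen here to leave out the first category without typing every dummy name. The last line is an optional extra, showing how the ib(last). prefix makes the last category the reference instead, as in the d1 to d3 table above.

```stata
sysuse nlsw88, clear
tab occupation, gen(docc)

* Hand-made dummies: leave out the first category as the reference
drop docc1
regress wage ttl_exp docc*

* Factor notation: first category is the reference by default
regress wage ttl_exp i.occupation

* Optional: make the last category the reference instead
regress wage ttl_exp ib(last).occupation
```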

Author: Brendan Halpin

Created: 2022-02-21 Mon 11:12