PC Labs for Summer School: Intro

1. Stata introductory session 1

1.1. Stata

Log on, and start Stata: Hit the Windows key, and type "Stata" in the search box.

The Stata window has four panels: the big one is for output, and the wide one at the bottom is for entering commands.

You can also use the mouse and menus, but we will focus on the command language.

1.2. Explore existing data

Load the lab1.dta file by entering the following command in the command window:

use http://teaching.sociology.ul.ie/ssrm/unitb0/lab1.dta

Use describe, list, tab, summarize to get an idea of what it contains.
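
For instance, a first look at the file might run as follows. This is only a sketch: sex and netearn are variables used later in this lab, and the range 1/10 is an arbitrary choice.

* overview of the variables: names, storage types, labels
describe
* print only the first ten observations
list in 1/10
* frequency table of a categorical variable
tab sex
* number of cases, mean, SD, min and max of a continuous variable
summarize netearn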

In particular, tab and summarize let you look at how variables are distributed:

tab empstat
su grsearn

Generate a bar chart of empstat using the following command:

graph hbar, over(empstat)

What does this command do?

graph hbar (mean) grsearn, over(lastexam)

Generate histograms of grsearn:

histogram grsearn
histogram grsearn if grsearn > 0

1.3. Generating new and recoding variables

Creating new variables is done using generate, usually shortened to gen:

gen deduct = grsearn - netearn
label variable deduct "Deductions from gross pay"
su deduct
scatter deduct grsearn
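
It is worth checking a newly generated variable against the variables it was built from. A minimal sketch (the range 1/5 is arbitrary, and whether negative deductions occur in this file is not something the lab states):

* show the first few cases side by side
list grsearn netearn deduct in 1/5
* count cases where net pay exceeds gross pay
count if deduct < 0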

We can also recode variables. Note for instance that the 5th, 6th and 7th categories of empstat have very small numbers of cases – it is convenient to move them all into a single category. We do this by creating a copy of empstat, and recoding that.

recode empstat 5/7=7, gen(emp2)
label values emp2 empstat

Note how we can give the new variable the old variable's labels.
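
One way to confirm that a recode did what was intended is to cross-tabulate the old variable against the new one. In the sketch below, each row (old category) should have cases in exactly one column (new category):

* the missing option also shows any missing values
tab empstat emp2, missing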

1.4. Bivariate analysis

Cross-tabulations are produced by tab var1 var2; add the col or row option to get column or row percentages: tab var1 var2, col or tab var1 var2, row. Explore some pairs of categorical variables in the data file. For example:

tab emp2 sex
tab emp2 sex, col
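
The row option works the same way, and the counts can be suppressed if only the percentages are of interest. A short sketch:

* row percentages: the distribution of sex within each employment status
tab emp2 sex, row
* nofreq drops the counts and leaves only the column percentages
tab emp2 sex, col nofreq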

Graphical equivalents of cross-tabulations are clustered and stacked bar charts:

graph hbar, over(sex) over(emp2)
graph hbar, over(sex) over(emp2) asyvars
graph hbar, over(sex) over(emp2) asyvars stack
graph hbar, over(sex) over(emp2) asyvars stack percentages
graph hbar, over(emp2) over(sex) asyvars stack percentages

We can compare mean income across qualifications like this:

bysort lastexam: su netearn
graph hbar (mean) netearn, over(lastexam)
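
If a single compact table is preferred to the separate blocks of output from bysort, tabstat is one alternative. A sketch, where the choice of statistics is only an illustration:

* mean, standard deviation and number of cases per qualification level
tabstat netearn, by(lastexam) statistics(mean sd n)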

Try also:

graph hbar (mean) netearn, over(sex)
graph hbar (mean) netearn, over(sex) over(lastexam)
graph box netearn, over(lastexam)

1.5. Missing values

2. Stata introductory session 2

2.1. Using the Do-File Editor

Find the icon for the do-file editor and open it. Try running commands from it (start from scratch with the lab1.dta file, for instance). It is immediately useful when you want to enter a series of commands, e.g.:

use http://teaching.sociology.ul.ie/ssrm/unitb0/lab1.dta
recode empstat 5/7=7, gen(emp2)
label values emp2 empstat
graph hbar, over(emp2)

It can also be a good way to build up files that carry out complex tasks, such as loading a data file, applying several data manipulations, and producing a specific result. To try this, build up commands in the do-file editor which load the lab1.dta file, do the recode and graph the mean income in each empstat group. Run it from the do-file editor (you may need to add the clear option to the first command, because a data set is already in memory), and then once it works, save it. Then run it from Stata: enter the command do file.do at the Stata command line, using the name under which you saved the file.
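
A do-file for this exercise might look roughly like the sketch below. It assumes that "income" means netearn (grsearn would work equally well) and that the graph is drawn over the recoded variable emp2. Lines starting with * are comments.

* load the data; clear discards whatever is currently in memory
use http://teaching.sociology.ul.ie/ssrm/unitb0/lab1.dta, clear
* collapse the small categories 5 to 7 into one, keeping the old labels
recode empstat 5/7=7, gen(emp2)
label values emp2 empstat
* mean net earnings in each employment-status group
graph hbar (mean) netearn, over(emp2)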

2.2. Logging

You can log your activities to a file. First, turn on logging:

log using mylogfile.log, replace

Continue doing analysis for a while. Then close the log:

tab emp2 sex, col
log close

The log represents a record of your activities. You can examine it in any text-editor, or in Stata like this:

view mylogfile.log
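
Inside a do-file, a common pattern is to wrap the analysis in the logging commands so that every run leaves a fresh record. A sketch, using the file name from above:

* close any log left open by an earlier run; capture suppresses the error if none is open
capture log close
* the .log extension gives a plain-text log; replace overwrites an existing file
log using mylogfile.log, replace
use http://teaching.sociology.ul.ie/ssrm/unitb0/lab1.dta, clear
tab empstat sex, col
log close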

2.3. Statistical tests

Generate a confidence interval around mean net earnings:

ci mean netearn
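
Two variations that may be useful are sketched below. The ci mean form belongs to more recent Stata releases; if it produces an error on an older installation, ci netearn is the earlier equivalent.

* a 99% interval instead of the default 95%
ci mean netearn, level(99)
* a separate interval for each category of sex
bysort sex: ci mean netearn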

This spreadsheet (CSV) file can be loaded into Stata directly from its URL. The clear option is needed because a data set is already in memory:

import delimited using http://teaching.sociology.ul.ie/ssrm/unitb0/ttest.csv, clear

This is paired data, so carry out a paired-sample t-test like this:

ttest before == after

Then calculate the difference between the two, and carry out a one-sample t-test thus:

gen diff = after - before
ttest diff == 0

How do they compare?
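
To make the comparison easier, the key numbers can be printed on one line after each test. ttest leaves its results in r(), so a sketch like the following should work once diff has been generated:

ttest before == after
* r(t), r(df_t) and r(p) hold the t statistic, degrees of freedom and two-sided p-value
display "paired:     t = " r(t) "  df = " r(df_t) "  p = " r(p)
ttest diff == 0
display "one-sample: t = " r(t) "  df = " r(df_t) "  p = " r(p)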

To carry out an independent sample t-test, reload lab1.dta. Test whether earnings differ across gender:

use http://teaching.sociology.ul.ie/ws/lab1.dta, clear
ttest netearn, by(sex)

Finally, test for association between empstat (recoded if you like) and gender:

tab empstat sex, chi
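
To run the test on the recoded variable instead, emp2 has to be created again, because reloading lab1.dta discarded it. A sketch, which also requests the expected counts so that small cells are easy to spot:

recode empstat 5/7=7, gen(emp2)
label values emp2 empstat
* expected adds the count expected under independence to each cell
tab emp2 sex, chi2 expected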

2.4. Data Entry

There are lots of ways to enter data into Stata. Above, we used import delimited to import a CSV file. You can also import whole Excel files, if they have the correct structure (optional headings row, variables in columns, no extraneous material). We can also enter data directly into the Stata Data Editor.

See the data in dataentry.html. Try entering it (or part of it) in Stata in the following ways:

  • Using the Data Editor
  • Copying the data to a file and using infile age sex reg maths dist using datafile.dat
  • Copying the data to a spreadsheet, saving as CSV and doing insheet age sex reg maths dist using datafile.csv
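
The CSV route can also be taken with import delimited, the command used earlier for ttest.csv. The sketch below assumes the file is called datafile.csv, has five columns, and has no headings row; if it does have one, drop varnames(nonames) and the rename line.

* without a headings row the columns are named v1 to v5
import delimited using datafile.csv, varnames(nonames) clear
* give the columns the names used in this exercise
rename (v1 v2 v3 v4 v5) (age sex reg maths dist)
describe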

You can label it in the data editor, or better in syntax (in a do-file!):

label define malefemale 1 "Male" 2 "Female"
label values sex malefemale
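
A quick way to confirm that the labels have been attached, sketched with the names defined above:

* the codes 1 and 2 should now display as Male and Female
tab sex
* nolabel shows the underlying numeric codes again
tab sex, nolabel
* list the label definition itself
label list malefemale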