PC Labs for SO5041: Week 3

1 Week 3 Lab: Data entry

1.1 Data collected in class

In previous years, I collected information from undergraduates in class. Here it is represented numerically:

Age  Sex  Reg  Maths  Dist

 37    0    2      3    27
 29    1    5      3     2
 22    0    7      2    -9
 20    1    1      2     1
 20    1    1      2     6
 20    1    1      3    24
 21    1    2      2     0
 21    1    2      2     0
 20    1    7      2     0
 22    0    1      3     1
 20    0    2      3    25
 20    1    3      3     1
 20    1    5      2     5
 20    1    3      3    75
 21    0    3      3    20
 21    1    1      3     5
 21    1    1      3     7
 21    0    1      3     3
 19    0    1      2    40
 19    1    2      3     5

Sex is coded with

  • 0: Male
  • 1: Female

Reg is region of birth and is coded with

  1. Munster
  2. Leinster
  3. Connacht-Ulster
  4. NI
  5. GB
  6. EU
  7. US
  8. Other

Dist is distance of home from UL, in miles.

Maths is last maths done, coded with

  1. Post LC
  2. LC(h)
  3. LC(o)
  4. JC(h)
  5. JC(o)
  6. Lower

1.2 What to do?

Enter this data into Stata.

Steps:

  • First run Stata (from the Start menu, under Programs -> Specialist Software).
  • The easiest way to enter data is in the Data Editor. To open it, click the spreadsheet/pencil icon in the toolbar or type the command edit. You can then enter the data as in a spreadsheet, in the same format as above, column for column.
    • When you have finished, do File -> Exit to return to the main Stata window, and examine what you have.
    • You have a pretty anonymous data set, with variable names like var1 and unlabeled numbers. Go back to the data editor, and right-click on the var1 cell, and click on the Variable properties option. This allows you to change the name of the variable (names are short, one word, so change "var1" perhaps to "age") and to give it a more descriptive label (e.g., "Age in Years").
    • For variables such as sex, the numbers are attached to categories arbitrarily, so it is good to label the numbers also. To do this, first create a label as a command:

label define mf 0 "Male" 1 "Female"

and then attach it to the variable:

label values sex mf

You can do this through the windows interface too, but it's more complicated to explain!

  • Where there are lots of labels you may need more space: you can add values to an existing set of labels by using the add option, as in the second and third lines here:

label define region 1 "Munster" 2 "Leinster"
label define region 3 "Connacht-Ulster" 4 "NI", add
label define region 5 "GB" 6 "EU", add
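
The naming and labelling steps above can also be done entirely with commands. What follows is a minimal sketch, not the only way to do it. It assumes the five columns were typed into the Data Editor in the order shown in the table, so that Stata called them var1 to var5. The new names reg, maths and dist, the label set name mathslab, and the wording of the variable labels are illustrative choices rather than anything fixed by the lab. It also assumes the three region lines above have already been run.

```stata
* Assumption: columns were entered in table order, so they are var1..var5
rename var1 age
rename var2 sex
rename var3 reg
rename var4 maths
rename var5 dist

* Descriptive variable labels (wording is illustrative)
label variable age "Age in Years"
label variable sex "Sex"
label variable reg "Region of birth"
label variable maths "Last maths done"
label variable dist "Distance of home from UL (miles)"

* Finish the region label set started above, then attach it to reg
label define region 7 "US" 8 "Other", add
label values reg region

* Value labels for maths, split over two lines with the add option
label define mathslab 1 "Post LC" 2 "LC(h)" 3 "LC(o)"
label define mathslab 4 "JC(h)" 5 "JC(o)" 6 "Lower", add
label values maths mathslab

* Show every value label set defined so far, to check the coding
label list
```

If you have already renamed a variable in the Data Editor, skip the matching rename line, since var1 and so on will no longer exist.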

1.3 Explore

Use describe, summarize, list and tabulate to examine the data.

Be clear about which of tabulate and summarize is appropriate for the different variables. Tabulation is usually helpful where there are relatively few distinct values, and summaries such as means and standard deviations are only meaningful for interval and ratio variables.
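
As a rough sketch of how this distinction might look in practice, the commands below tabulate the categorical variables and summarize the numeric ones. The variable names age, sex, reg, maths and dist are assumed (lower-case versions of the column headings); use whatever names you gave your own variables.

```stata
* Overview of variable names, storage types and labels
describe

* Categorical variables: frequency tables
tabulate sex
tabulate reg
tabulate maths

* A two-way table of two categorical variables
tabulate sex maths

* Interval/ratio variables: mean, standard deviation, min and max
summarize age dist

* Look at the first five cases as entered
list in 1/5
```

Running summarize on a variable such as reg will produce a mean without complaint, but that mean is of arbitrary category codes and tells you nothing useful, which is why the choice of command matters.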