PC Labs for SO5041: Week 3
Week 3 Lab: Data entry
Data collected in class
In previous years, I collected information from undergraduates in class. Here it is represented numerically:
Age  Sex  Reg  Maths  Dist
 37    0    2      3    27
 29    1    5      3     2
 22    0    7      2    -9
 20    1    1      2     1
 20    1    1      2     6
 20    1    1      3    24
 21    1    2      2     0
 21    1    2      2     0
 20    1    7      2     0
 22    0    1      3     1
 20    0    2      3    25
 20    1    3      3     1
 20    1    5      2     5
 20    1    3      3    75
 21    0    3      3    20
 21    1    1      3     5
 21    1    1      3     7
 21    0    1      3     3
 19    0    1      2    40
 19    1    2      3     5
Sex is coded with
- 0: Male
- 1: Female
Reg is region of birth and is coded with
- 1: Munster
- 2: Leinster
- 3: Connacht-Ulster
- 4: NI
- 5: GB
- 6: EU
- 7: US
- 8: Other
Dist is the distance of home from UL, in miles.
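Maths is the last maths done, coded with
- 1: Post LC
- 2: LC(h)
- 3: LC(o)
- 4: JC(h)
- 5: JC(o)
- 6: Lower
(LC = Leaving Certificate, JC = Junior Certificate; h = Higher level, o = Ordinary level.)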
What to do?
Enter this data into Stata.
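As an alternative to typing into the Data Editor, the same data could be entered from a do-file with Stata's input command. This is a sketch, not the method the lab asks for: the variable names age, sex, reg, maths and dist are our own choice, and the values are copied row by row from the table above.

```stata
* Sketch: enter the class data with -input- instead of the Data Editor.
* Variable names (age sex reg maths dist) are our own choice.
clear
input age sex reg maths dist
37 0 2 3 27
29 1 5 3 2
22 0 7 2 -9
20 1 1 2 1
20 1 1 2 6
20 1 1 3 24
21 1 2 2 0
21 1 2 2 0
20 1 7 2 0
22 0 1 3 1
20 0 2 3 25
20 1 3 3 1
20 1 5 2 5
20 1 3 3 75
21 0 3 3 20
21 1 1 3 5
21 1 1 3 7
21 0 1 3 3
19 0 1 2 40
19 1 2 3 5
end
```

Saving these lines in a do-file makes the data entry repeatable, which is harder to do with point-and-click entry.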
Steps:
- First, run Stata.
The easiest way to enter data is in the Data Editor. To open it, click the spreadsheet/pencil icon in the toolbar, or enter the command
edit
in the Command window. You can then simply enter the data as in a spreadsheet, in the same format as above, column for column.
NB shortcut:
do https://teaching.sociology.ul.ie/so5041/week3data.do
- When you have finished, choose File -> Exit to return to the main Stata window, and examine what you have.
- You have a pretty anonymous data set, with variable names like var1 and unlabeled numbers. Go back to the Data Editor, right-click on the var1 cell, and choose the Variable properties option. This lets you change the name of the variable (names should be short and a single word, so change "var1" perhaps to "age") and give it a more descriptive label (e.g., "Age in Years").
- For variables such as sex, the numbers are attached to categories arbitrarily, so it is good to label the values as well. To do this, first define a label with a command:
label define mf 0 "Male" 1 "Female"
and then attach it to the variable:
label values sex mf
You can do this through the windows interface too, but it's more complicated to explain!
- Where there are lots of labels you may need more space: you can add values to an existing set of labels by using the add option, as in the second and third lines here:
label define region 1 "Munster" 2 "Leinster"
label define region 3 "Connacht-Ulster" 4 "NI", add
label define region 5 "GB" 6 "EU", add
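The renaming and labelling steps above can also be done entirely with commands, which is easier to repeat later. The sketch below is illustrative only: so that it runs on its own, it enters just the first three rows of the data under the default names var1 to var5 that the Data Editor assigns. The new variable names, the variable labels and the label name mathslev are our own choices, and coding US and Other as 7 and 8, and the maths categories as 1 to 6, is an assumption based on the order of the coding lists above.

```stata
* Sketch: command-line equivalent of the Variable properties dialog.
* Only the first three rows are entered, with the Data Editor's default names.
clear
input var1 var2 var3 var4 var5
37 0 2 3 27
29 1 5 3 2
22 0 7 2 -9
end

* Give each variable a short, one-word name ...
rename var1 age
rename var2 sex
rename var3 reg
rename var4 maths
rename var5 dist

* ... and a more descriptive variable label.
label variable age   "Age in Years"
label variable sex   "Sex"
label variable reg   "Region of birth"
label variable maths "Last maths done"
label variable dist  "Distance of home from UL (miles)"

* Value labels; region is built up with the add option as in the handout.
label define mf 0 "Male" 1 "Female"
label define region 1 "Munster" 2 "Leinster"
label define region 3 "Connacht-Ulster" 4 "NI", add
label define region 5 "GB" 6 "EU", add
* Assumption: US = 7 and Other = 8, following the order of the list.
label define region 7 "US" 8 "Other", add
* Assumption: maths categories numbered 1-6 in the order listed; mathslev is a hypothetical name.
label define mathslev 1 "Post LC" 2 "LC(h)" 3 "LC(o)" 4 "JC(h)" 5 "JC(o)" 6 "Lower"

* Attach each set of value labels to its variable.
label values sex mf
label values reg region
label values maths mathslev
```

Once the labels are attached, list and tabulate show the category names instead of the bare codes.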
Explore
Use describe, summarize, list and tabulate to examine the data.
Be clear about which of tabulate and summarize is appropriate for the different variables. Tabulation is usually helpful where there are relatively few distinct values, while summaries such as means and standard deviations are only meaningful for interval and ratio variables.
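A possible exploration do-file, as a sketch. It re-enters the class data (with our own variable names age, sex, reg, maths and dist) so that it runs on its own, then uses tabulate for the coded categorical variables and summarize for the two ratio-scale ones. Reading the -9 in dist as a missing-value code is our assumption, since the handout does not say what it means, so the mvdecode line is left commented out.

```stata
* Sketch: explore the class data. Variable names are our own choice.
clear
input age sex reg maths dist
37 0 2 3 27
29 1 5 3 2
22 0 7 2 -9
20 1 1 2 1
20 1 1 2 6
20 1 1 3 24
21 1 2 2 0
21 1 2 2 0
20 1 7 2 0
22 0 1 3 1
20 0 2 3 25
20 1 3 3 1
20 1 5 2 5
20 1 3 3 75
21 0 3 3 20
21 1 1 3 5
21 1 1 3 7
21 0 1 3 3
19 0 1 2 40
19 1 2 3 5
end

* Overview of variables, storage types and labels, then the raw data.
describe
list, clean

* Sex, region and maths are codes for categories: use frequency tables.
tabulate sex
tabulate reg
tabulate maths

* Age and distance are ratio variables: means and SDs are meaningful.
summarize age dist

* Assumption: -9 in dist may be a missing-value code. If so, convert it
* to Stata's missing value (.) before summarizing dist:
* mvdecode dist, mv(-9)
```

Running summarize on sex or reg would still produce a mean, but that number has no useful interpretation because the codes are arbitrary.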