Week 2 PC Lab

In previous years, I collected information from students in class. Here it is represented numerically:

Age  Sex  Reg  Maths  Dist
 37    0    2      3    27
 29    1    5      3     2
 22    0    7      2    -9
 20    1    1      2     1
 20    1    1      2     6
 20    1    1      3    24
 21    1    2      2     0
 21    1    2      2     0
 20    1    7      2     0
 22    0    1      3     1
 20    0    2      3    25
 20    1    3      3     1
 20    1    5      2     5
 20    1    3      3    75
 21    0    3      3    20
 21    1    1      3     5
 21    1    1      3     7
 21    0    1      3     3
 19    0    1      2    40
 19    1    2      3     5

Sex is coded with

Reg is region of birth and is coded with

  1. Munster
  2. Leinster
  3. Connacht-Ulster
  4. NI
  5. GB
  6. EU
  7. US
  8. Other

Dist is distance of home from UL, in miles.

Maths is last maths done, coded with

  1. Post LC
  2. LC(h)
  3. LC(o)
  4. JC(h)
  5. JC(o)
  6. Lower

What to do

Enter this data into Stata.
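
As a rough sketch, the data can also be typed in from the Command window or a do-file with Stata's input command, rather than through the Data Editor. The variable names region and distance are taken from the commands used later in this handout; age, sex and maths are assumed names chosen to match the column headings.

```stata
* Start with an empty dataset
clear

* Read the 20 rows shown above; age, sex and maths are assumed names
input age sex region maths distance
37 0 2 3 27
29 1 5 3 2
22 0 7 2 -9
20 1 1 2 1
20 1 1 2 6
20 1 1 3 24
21 1 2 2 0
21 1 2 2 0
20 1 7 2 0
22 0 1 3 1
20 0 2 3 25
20 1 3 3 1
20 1 5 2 5
20 1 3 3 75
21 0 3 3 20
21 1 1 3 5
21 1 1 3 7
21 0 1 3 3
19 0 1 2 40
19 1 2 3 5
end

* Check that the data arrived as expected
describe
list in 1/5
```

Typing the commands into a do-file has the advantage that the data entry can be re-run and corrected later.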

Steps:

Looking at the data

What can we do with the data? Descriptive statistics! For variables whose numbers represent categories, tabulate them, e.g.,

tab region
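
By default the table shows the numeric codes. One possible refinement, sketched here, is to attach value labels so that the region names listed above appear instead. The label name reglab is a made-up name for illustration, and the sketch assumes the variable is called region.

```stata
* Define a set of value labels matching the region coding above
* (reglab is a hypothetical label name)
label define reglab 1 "Munster" 2 "Leinster" 3 "Connacht-Ulster" 4 "NI" ///
    5 "GB" 6 "EU" 7 "US" 8 "Other"

* Attach the labels to the region variable
label values region reglab

* The frequency table now shows names rather than codes
tab region
```

The same approach would work for the maths variable, using the codes listed for it above.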

For variables where the number itself is meaningful, use summarize (abbreviated su) to get the mean, along with the standard deviation, minimum and maximum:

su distance
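
If you want a little more than the default output, the following sketch shows two standard variations. It assumes the variable is called distance, as in the command above.

```stata
* The detail option adds percentiles (including the median),
* the variance, skewness and kurtosis
summarize distance, detail

* tabstat gives a compact table of just the statistics requested
tabstat distance, statistics(mean median sd min max)
```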

We will be covering the mean, median, standard deviation, etc., in the coming weeks.

Graphics

The graphical equivalent of a frequency table (e.g., tab region) is a bar chart or a pie chart. Pie charts are easy:

graph pie, over(region)

Bar charts require a trick, which involves creating a dummy variable using generate:

gen x = 1
graph bar (count) x, over(region)

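For variables where the numbers are meaningful, particularly when there are many different values, the histogram is the graphical analogue of the summarize command (dist here is an abbreviation of distance; Stata accepts any unambiguous abbreviation of a variable name):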

histogram dist
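
The default histogram shows density on the vertical axis and picks the bins automatically. As a sketch of how this might be adjusted, the options below put counts on the vertical axis and set the bin width; the width of 10 miles is an arbitrary choice for illustration.

```stata
* Show frequencies (counts) rather than density, in bins 10 miles wide
histogram distance, frequency width(10)
```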

Graphics tip

If you're interested in finding out more about Stata graphics, there are lots of examples on the Stata website.


Brendan Halpin
Department of Sociology, University of Limerick
F1-002, x 3147; brendan.halpin@ul.ie