# PC Labs for SO5041: Week 3

## Week 3 Lab: Data entry

### Data collected in class

In previous years, I collected information from undergraduates in class. Here it is represented numerically:

```
Age  Sex  Reg  Maths  Dist
37   0     2   3   27
29   1     5   3    2
22   0     7   2   -9
20   1     1   2    1
20   1     1   2    6
20   1     1   3   24
21   1     2   2    0
21   1     2   2    0
20   1     7   2    0
22   0     1   3    1
20   0     2   3   25
20   1     3   3    1
20   1     5   2    5
20   1     3   3   75
21   0     3   3   20
21   1     1   3    5
21   1     1   3    7
21   0     1   3    3
19   0     1   2   40
19   1     2   3    5
```

`Sex` is coded with

• 0: Male
• 1: Female

`Reg` is region of birth and is coded with

1. Munster
2. Leinster
3. Connacht-Ulster
4. NI
5. GB
6. EU
7. US
8. Other

`Dist` is distance of home from UL, in miles.

`Maths` is last maths done, coded with

1. Post LC
2. LC(h)
3. LC(o)
4. JC(h)
5. JC(o)
6. Lower

### What to do?

Enter this data into Stata.

Steps:

• First, start Stata.
• The easiest way to enter data is in the Data Editor. To open it, click on the spreadsheet/pencil icon in the toolbar or type the command `edit`. You can then enter the data as you would in a spreadsheet. Enter the data in the same layout as above, column for column.
• When you have finished, choose `File -> Exit` to return to the main Stata window, and examine what you have.
• You now have a rather anonymous data set, with variable names like `var1` and unlabeled numbers. Go back to the Data Editor, right-click on the `var1` cell, and click on the `Variable properties` option. This allows you to change the name of the variable (names are short single words, so change `var1` to something like `age`) and to give it a more descriptive label (e.g., "Age in Years").
• For variables such as sex, the numbers are attached to categories arbitrarily, so it is good practice to label the values as well. To do this, first define a set of value labels with a command:
```
label define mf 0 "Male" 1 "Female"
```

and then attach it to the variable:

```
label values sex mf
```

You can do this through the windows interface too, but it's more complicated to explain!

• Where there are lots of labels you may need more space: you can add values to an existing set of labels by using the `add` option as in the second and third lines here:
```
label define region 1 "Munster" 2 "Leinster"
label define region 3 "Connacht-Ulster" 4 "NI", add
label define region 5 "GB" 6 "EU", add
```
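
The labelling steps can also be collected in one place as commands, for example in a do-file. What follows is only a sketch under some assumptions: the variable names `age`, `sex`, `reg`, `maths` and `dist` are my choice (if you used the Data Editor, rename `var1` and so on first, e.g. `rename var1 age`), the label name `mathslev` and the wording of the variable labels are illustrative, and only the first three rows of the table are typed in with `input` so that the block runs on its own.

```stata
* Sketch only: first three rows of the class data, typed with -input-
* so that this block runs on its own (variable names are assumptions)
clear
input age sex reg maths dist
37 0 2 3 27
29 1 5 3  2
22 0 7 2 -9
end

* Descriptive variable labels (wording is illustrative)
label variable age   "Age in years"
label variable sex   "Sex"
label variable reg   "Region of birth"
label variable maths "Last maths done"
label variable dist  "Distance of home from UL (miles)"

* Value labels for sex
label define mf 0 "Male" 1 "Female"
label values sex mf

* Value labels for region, built up in stages with the add option
label define region 1 "Munster" 2 "Leinster"
label define region 3 "Connacht-Ulster" 4 "NI", add
label define region 5 "GB" 6 "EU", add
label define region 7 "US" 8 "Other", add
label values reg region

* Value labels for maths (the label name mathslev is hypothetical)
label define mathslev 1 "Post LC" 2 "LC(h)" 3 "LC(o)" 4 "JC(h)" 5 "JC(o)" 6 "Lower"
label values maths mathslev

* Check what has been defined
describe
label list
```

Keeping the commands together like this means the labelling can be re-run if the data have to be entered again.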

### Explore

Use `describe`, `summarize`, `list` and `tabulate` to examine the data.

Be clear about which of `tabulate` and `summarize` is appropriate for the different variables. Tabulation is usually helpful where there are relatively few distinct values, and summaries such as means and standard deviations are only meaningful for interval and ratio variables.
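
As a rough illustration of that distinction, the commands might be used as follows. This is a sketch that assumes the class data are already in memory and that the variables have been named `age`, `sex`, `reg`, `maths` and `dist`; substitute your own names if they differ.

```stata
* Assumes the class data are in memory with these (assumed) variable names

* Overview of the variables and their labels
describe

* Look at the first five observations
list in 1/5

* Categorical variables with few distinct values: frequency tables
tabulate sex
tabulate reg
tabulate maths

* Interval/ratio variables: means, standard deviations, minimum and maximum
summarize age dist

* More detail (percentiles, median) for a single variable
summarize dist, detail
```

Running `summarize` on a variable such as `reg` will still produce a mean, but the number has no sensible interpretation because the codes are arbitrary.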