# PC Labs for SO5041: Week 10

## Week 10 Lab: Independent sample t-test, correlation

### Stata hypothesis tests

Load the data from the last lab into Stata as follows:

```
import delimited using http://teaching.sociology.ul.ie/so5041/labs/hypotest.csv
```

(This is an example of how to read a CSV file directly into Stata.)

First, generate a new variable that is the difference between before and after:

```
gen diff = after - before
```

Then, use the `ttest` command to compare this with zero:

```
ttest diff == 0
```

Interpret the output. If you still have your calculations from last week, compare them with these results.
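To see where the `ttest` figures come from, the test statistic can be rebuilt from the summary statistics of `diff`. This is a minimal sketch, assuming `diff` has been generated as above; it relies on the `r(mean)`, `r(sd)` and `r(N)` results that `summarize` leaves behind.

```stata
* Summary statistics of the before/after difference
summarize diff

* One-sample t statistic against zero: mean / (sd / sqrt(n))
display "t  = " (r(mean) / (r(sd) / sqrt(r(N))))
display "df = " (r(N) - 1)
```

The two displayed values should match the `t` and degrees of freedom reported by `ttest diff == 0`.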

Note that you can do a paired-sample t-test in one step with the following:

```
ttest after == before
```

### Hypothesis testing: exam marks

Say that in last year's Leaving Cert English exam, the average mark achieved was 62.1%. A year later, the Dept of Education wants quick feedback on whether the standard has changed. A random sample of 100 scripts is assessed and marked. Their average mark is 65.2%, with a standard deviation of 12.4%.

Conduct a test of the hypothesis that the standard has changed, at the 95% confidence level (that is, a 5% significance level, two-sided). Report and interpret your findings.
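Once you have worked this through by hand, one way to check the arithmetic is Stata's immediate form of the t-test, `ttesti`, which takes summary statistics rather than data. The sketch below simply plugs in the figures from the question (sample size, sample mean, sample standard deviation, then the hypothesised mean); it is a check on your calculation, not a substitute for the interpretation.

```stata
* One-sample t-test from summary statistics:
* n = 100, mean = 65.2, sd = 12.4, tested against last year's mean of 62.1
ttesti 100 65.2 12.4 62.1

* The same test statistic by hand: (sample mean - null value) / standard error
display (65.2 - 62.1) / (12.4 / sqrt(100))
```

The standard error here is 12.4 / 10 = 1.24, so the statistic is 3.1 / 1.24 = 2.5; compare this with the critical value for a two-sided test at the 5% level.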

### Independent Sample t-test in Stata

Load the following file, which contains information on gender and work hours for a UK sample (`use http://teaching.sociology.ul.ie/so5041/labs/week11a`). Use summaries and graphs to get a sense of the male-female differences, and then use an independent sample t-test (i.e., `ttest varname, by(groupvar)`, replacing `varname` and `groupvar` with the names of the appropriate variables).
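A possible sequence of commands is sketched below. It assumes that the work-hours and sex variables are called `ojbhrs` and `osex`, the names used in the plotting commands in the next section; run `describe` first to confirm this for yourself.

```stata
* Load the data, replacing anything already in memory
use http://teaching.sociology.ul.ie/so5041/labs/week11a, clear

* Check variable names and labels (ojbhrs and osex are assumed below)
describe

* Summaries of work hours, overall and separately by sex
summarize ojbhrs
bysort osex: summarize ojbhrs

* Box plots of work hours side by side for the two groups
graph box ojbhrs, over(osex)

* Independent-sample t-test of mean work hours by sex
ttest ojbhrs, by(osex)
```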

### Comparing distributions

You can visually compare the two workhours distributions with these commands:

```
hist ojbhrs, by(osex)
kdensity ojbhrs if osex==1, addplot(kdensity ojbhrs if osex==2)
```

The distributions have quite different shapes. However, with this large sample, you will notice that running the t-test with the `unequal` option makes very little difference.
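To check this, run the test both ways and compare the t statistics, degrees of freedom and p-values. This sketch again assumes the variable names `ojbhrs` and `osex` from the plotting commands above.

```stata
* Pooled-variance t-test (assumes equal variances in the two groups)
ttest ojbhrs, by(osex)

* Same comparison allowing the two group variances to differ
ttest ojbhrs, by(osex) unequal
```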

### Testing proportions in Stata

Refer to the extract of the school-leavers' survey:

```
use https://teaching.sociology.ul.ie/so5041/labs/example6
```

According to CSO population estimates, the proportion female in the relevant age group in the population was 48.4% when this survey was carried out. Calculate the proportion female in the data, and construct a confidence interval around it. Is there any evidence that the sample was drawn from a population with a different proportion? In other words, is this sample consistent with (representative of) the contemporary population?

Do this by hand first, then get Stata to do the work. The `prtest` command is analogous to the `ttest` command.

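```
* 1 = female, 0 = male; cases with missing sex stay missing rather than being coded 0
gen female = sex == 2 if !missing(sex)
prtest female == 0.484
```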

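Your results should be very close but not necessarily identical. `prtest` is a large-sample z-test that uses the hypothesised proportion (0.484) in the standard error of the test statistic, so small differences can arise from rounding, or if your hand calculation used the sample proportion in the standard error instead.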
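The hand calculation can also be reproduced inside Stata, which makes it easier to see where any discrepancy comes from. The following is a sketch under the assumption that `female` has been created as above; the scalar names `p_hat`, `n_obs` and `se_hat` are made up for this illustration.

```stata
* The mean of a 0/1 variable is the sample proportion
quietly summarize female
scalar p_hat = r(mean)
scalar n_obs = r(N)

* Standard error and 95% confidence interval based on the sample proportion
scalar se_hat = sqrt(p_hat * (1 - p_hat) / n_obs)
display "proportion = " (p_hat)
display "95% CI     = " (p_hat - invnormal(0.975) * se_hat) " to " (p_hat + invnormal(0.975) * se_hat)

* z statistic against the CSO figure, with the hypothesised proportion in the SE
display "z          = " ((p_hat - 0.484) / sqrt(0.484 * (1 - 0.484) / n_obs))
```

If 0.484 lies inside the interval, the sample is consistent with the population figure at the 5% level.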

### Comparing proportions across groups

With the same data, calculate the proportion unemployed or looking for a first job, and compare it by sex:

```
recode empstat 1=0 2/3=1 4/99=0, gen(ue)
tab sex ue, row
```

There is apparently a difference in the proportion unemployed between men and women. Run `prtest ue, by(sex)` to test this, and interpret the result.

Then run a chi-squared test on the table, and compare the inferences:

```
tab sex ue, chi row
```

That is, comparing a proportion across two groups amounts to analysing a 2-by-2 table, so `prtest` and the chi-squared test should lead to the same inference.
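The link between the two tests can be checked numerically: for a 2-by-2 table, the square of the z statistic from the two-group test of proportions should equal the Pearson chi-squared statistic. The sketch below assumes `ue` has been created as above and that your version of Stata stores the statistics in `r(z)` and `r(chi2)`; the scalar name `z_stat` is made up for this illustration.

```stata
* Two-group test of proportions; keep the z statistic
quietly prtest ue, by(sex)
scalar z_stat = r(z)

* Pearson chi-squared on the same 2-by-2 table
quietly tabulate sex ue, chi2

* The two values should agree up to rounding
display "z squared   = " (z_stat^2)
display "chi-squared = " (r(chi2))
```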