# PC Labs for SO5041: Week 10

## Week 10 Lab: Independent sample t-test, correlation

### Stata hypothesis tests

Load the data from the last lab into Stata as follows:

import delimited using http://teaching.sociology.ul.ie/so5041/labs/hypotest.csv

(This is an example of how to read a CSV file directly into Stata.)

First, generate a new variable that is the difference between before and after:

gen diff = after - before

Then, use the `ttest` command to compare this with zero:

ttest diff == 0

Interpret the output. If you still have your calculations from last week, compare them with Stata's results.

Note that you can do a paired-sample t-test in one step with the following:

ttest after == before
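As a rough check that the two approaches are equivalent, the sketch below runs both and compares the stored t statistics. It assumes the variables `before` and `after` are present as used above; the scalar names `t_onesample` and `t_paired` are illustrative only.

```stata
* Sketch: confirm that the one-sample test on diff and the paired test agree
import delimited using http://teaching.sociology.ul.ie/so5041/labs/hypotest.csv, clear
gen diff = after - before

* One-sample t-test of the difference against zero
ttest diff == 0
scalar t_onesample = r(t)

* Paired-sample t-test in one step
ttest after == before
scalar t_paired = r(t)

* The two t statistics should be identical
display "one-sample on diff: " t_onesample "   paired: " t_paired
```

Both tests work from the same differences, so the t statistic, degrees of freedom, and p-values should match.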

### Hypothesis testing exam marks

Say that in last year's Leaving Cert English exam, the average mark achieved was 62.1%. A year later, the Department of Education wants quick feedback on whether the standard has changed. A random sample of 100 scripts is assessed and marked. Their average mark is 65.2%, with a standard deviation of 12.4%.

Conduct a test of the hypothesis that the standard has changed, using a 95% level of confidence. Report and interpret your findings.
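Once you have worked this out by hand, one way to check your arithmetic is Stata's immediate form of the t-test, `ttesti`, which takes summary statistics rather than data. The sketch below simply plugs in the figures quoted above; the interpretation is still up to you.

```stata
* Sketch: check the hand calculation from summary statistics
* ttesti takes: number of observations, sample mean, sample SD, hypothesised mean
ttesti 100 65.2 12.4 62.1

* The same test statistic computed directly: (mean - hypothesised) / (SD / sqrt(n))
display (65.2 - 62.1) / (12.4 / sqrt(100))

* Two-sided 5% critical values: t with 99 degrees of freedom, and the normal approximation
display invttail(99, 0.025)
display invnormal(0.975)
```

With a sample of 100, the t and normal critical values are close, so either should lead to the same conclusion.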

### Independent Sample t-test in Stata

Load the following file, which contains information on gender and work hours for a UK sample (`use http://teaching.sociology.ul.ie/so5041/labs/week11a`). Use summaries and graphs to get a sense of the male-female differences, and then use an independent sample t-test (i.e., `ttest varname, by(groupvar)`, replacing `varname` and `groupvar` with the names of the appropriate variables).
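A minimal sketch of this workflow follows. It assumes the relevant variables are `ojbhrs` (work hours) and `osex` (sex), which are the names used in the commands of the "Comparing distributions" section; check with `describe` that these match what is in the file.

```stata
* Sketch: explore the data, then run an independent sample t-test
use http://teaching.sociology.ul.ie/so5041/labs/week11a, clear

* See what variables are in the file
describe

* Group sizes, and summary statistics of work hours for each sex
tabulate osex
bysort osex: summarize ojbhrs

* Box plots of work hours side by side
graph box ojbhrs, over(osex)

* Independent sample t-test of mean work hours by sex
ttest ojbhrs, by(osex)
```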

### Comparing distributions

You can visually compare the two work-hours distributions with these commands:

hist ojbhrs, by(osex)

kdensity ojbhrs if osex==1, addplot(kdensity ojbhrs if osex==2)

The distributions have quite different shapes. However, with this large sample, you will notice that running the t-test with the `unequal` option makes very little difference.
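To see this for yourself, the sketch below runs the test both ways and adds a variance-comparison test. It assumes the week11a data are already loaded; `sdtest` is included only as one possible way to check whether the two group variances differ.

```stata
* Sketch: compare the pooled-variance and unequal-variance t-tests
* Default: assumes equal variances in the two groups
ttest ojbhrs, by(osex)

* Unequal variances, with adjusted degrees of freedom
ttest ojbhrs, by(osex) unequal

* Test whether the two group standard deviations differ
sdtest ojbhrs, by(osex)
```

Compare the standard error of the difference, the degrees of freedom, and the p-values across the two `ttest` outputs.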

### Testing proportions in Stata

Refer to the extract of the school-leavers' survey:

use https://teaching.sociology.ul.ie/so5041/labs/example6

According to CSO population estimates, the proportion female in the relevant age group in the population was 48.4% when this survey was collected. Calculate the proportion female in the data, and construct a confidence interval around it. Is there any evidence that the sample was drawn from a population with a different proportion? In other words, is this sample consistent with (representative of) the contemporary population?

Do this by hand first, then get Stata to do the work. The `prtest` command is analogous to the `ttest` command.

gen female = sex == 2

prtest female == 0.484

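Your results should be very close to your hand calculation. `prtest` also relies on the normal approximation (it reports a z statistic), so any small differences will come from rounding or from which standard error is used: the test statistic uses the hypothesised proportion, while the confidence interval uses the sample proportion.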
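The hand calculation can also be reproduced inside Stata, which makes it easier to see where any discrepancy arises. This is a sketch only: it assumes `female` has already been created as above, and the scalar names (`phat`, `n`, `z`, `se_ci`) are illustrative. Note that `gen female = sex == 2` codes any case with missing `sex` as 0, so check for missing values first.

```stata
* Sketch: large-sample test and confidence interval for a proportion, step by step
quietly summarize female
scalar phat = r(mean)
scalar n    = r(N)

* Test statistic: standard error based on the hypothesised proportion 0.484
scalar z = (phat - 0.484) / sqrt(0.484 * (1 - 0.484) / n)
display "p-hat = " phat "   z = " z "   two-sided p = " 2 * (1 - normal(abs(z)))

* 95% confidence interval: standard error based on the sample proportion
scalar se_ci = sqrt(phat * (1 - phat) / n)
display "95% CI: " phat - invnormal(0.975) * se_ci " to " phat + invnormal(0.975) * se_ci
```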

### Comparing proportions across groups

With the same data, calculate the proportion unemployed or looking for a first job, and compare it by sex:

recode empstat 1=0 2/3=1 4/99=0, gen(ue)

tab sex ue, row

There is apparently a difference in the proportion unemployed between men and women. Run `prtest ue, by(sex)` to test this, and interpret the result.

Then run a chi-squared test on the table, and compare the inferences:

tab sex ue, chi row

That is, comparing a proportion across two groups amounts to analysing a 2-by-2 table, and the inferences from `prtest` and the chi-squared test should be the same.
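The link between the two tests can be checked numerically: for a 2-by-2 table, the square of the z statistic from the two-sample proportion test equals the (uncorrected) Pearson chi-squared statistic. The sketch below assumes `ue` has been created as above and that `sex` takes exactly two values; `z_pr` is an illustrative scalar name.

```stata
* Sketch: compare the z statistic from prtest with the chi-squared statistic
quietly prtest ue, by(sex)
scalar z_pr = r(z)

quietly tabulate sex ue, chi2

* The squared z statistic should match the Pearson chi-squared
display "z squared = " z_pr^2 "   chi-squared = " r(chi2)
```

Because the statistics correspond in this way, the two-sided p-value from `prtest` should match the p-value of the chi-squared test.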