PC Labs for SO5041: Week 10

Week 10 Lab: Independent sample t-test, correlation

Stata hypothesis tests

Load the last lab's data into Stata as follows:

import delimited using http://teaching.sociology.ul.ie/so5041/labs/hypotest.csv

(This is an example of how to read a CSV file directly into Stata.)

First, generate a new variable that is the difference between before and after:

gen diff = after - before

Then, use the ttest command to compare this with zero:

ttest diff == 0

Interpret the output. If you still have your calculations from last week, compare with your results.

Note that you can do a paired-sample t-test in one step with the following:

ttest after == before
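
To see that the two approaches are the same test, a minimal sketch such as the following stores the t statistic from each and displays them side by side. It assumes diff has been created as above; the scalar names t_diff and t_paired are only illustrative.

```stata
* Assumes diff was created earlier with: gen diff = after - before
* One-sample test of the differences against zero
quietly ttest diff == 0
scalar t_diff = r(t)

* Paired-sample test in one step
quietly ttest after == before
scalar t_paired = r(t)

* The two statistics should agree
display "t from diff test:   " t_diff
display "t from paired test: " t_paired
```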

Hypothesis testing exam marks

Say that in last year's Leaving Cert English exam, the average mark achieved was 62.1%. A year later, the Dept of Education wants quick feedback on whether the standard has changed. A random sample of 100 scripts is assessed and marked. Their average mark is 65.2%, with a standard deviation of 12.4%.

Conduct a test of the hypothesis that the standard has changed, using a 95% level of confidence. Report and interpret your findings.
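
Once you have worked this through by hand, one way to check the arithmetic is to use Stata as a calculator and then run the immediate form of the t-test, ttesti, which takes summary statistics rather than a data set. This is a sketch using only the numbers given above; it treats the test as a one-sample t-test with 99 degrees of freedom, for which the critical value is about 1.98, very close to the normal value of 1.96.

```stata
* Standard error of the sample mean: s / sqrt(n)
display 12.4/sqrt(100)

* Test statistic: (sample mean - hypothesised mean) / standard error
display (65.2 - 62.1)/(12.4/sqrt(100))

* Two-sided 5% critical value of t with n - 1 = 99 degrees of freedom
display invttail(99, 0.025)

* The same test in one step: ttesti #obs #mean #sd #hypothesised-value
ttesti 100 65.2 12.4 62.1
```

The test statistic is 3.1/1.24 = 2.5, which you can compare with the critical value.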

Independent Sample t-test in Stata

Load the following file, which contains information on gender and work hours for a UK sample:

use http://teaching.sociology.ul.ie/so5041/labs/week11a

Use summaries and graphs to get a sense of the male-female differences, and then run an independent sample t-test (i.e., ttest varname, by(groupvar), replacing varname and groupvar with the names of the appropriate variables).
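
As a rough sketch of the steps just described, the commands below assume that the work-hours variable is ojbhrs and the grouping variable is osex, the names used in the graph commands in this lab. The clear option is an addition here: it discards any data already in memory, so save your work first if you need it.

```stata
* Load the data, replacing whatever is currently in memory
use http://teaching.sociology.ul.ie/so5041/labs/week11a, clear

* Group sizes, means, standard deviations and medians of work hours by sex
tabstat ojbhrs, by(osex) statistics(n mean sd median)

* Side-by-side box plots of work hours for the two groups
graph box ojbhrs, over(osex)

* Independent sample t-test of the difference in mean work hours
ttest ojbhrs, by(osex)
```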

Comparing distributions

You can visually compare the two workhours distributions with these commands:

hist ojbhrs, by(osex)
kdensity ojbhrs if osex==1, addplot(kdensity ojbhrs if osex==2)

The distributions have quite different shapes. However, with this large sample, you will notice that running the t-test with the unequal option makes very little difference.
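
A sketch of that comparison, again assuming the variables ojbhrs and osex: sdtest compares the two group variances, and the t-test is then run with and without the unequal option. The scalar names are only illustrative.

```stata
* Test whether the two groups have equal variances
sdtest ojbhrs, by(osex)

* t-test assuming equal variances (the default)
quietly ttest ojbhrs, by(osex)
scalar t_equal = r(t)
scalar df_equal = r(df_t)

* t-test allowing unequal variances
quietly ttest ojbhrs, by(osex) unequal
scalar t_unequal = r(t)
scalar df_unequal = r(df_t)

* Compare the test statistics and degrees of freedom
display "equal variances:   t = " t_equal ", df = " df_equal
display "unequal variances: t = " t_unequal ", df = " df_unequal
```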

Testing proportions in Stata

Refer to the extract of the school-leavers' survey:

use https://teaching.sociology.ul.ie/so5041/labs/example6

According to CSO population estimates, the proportion female in the relevant age group in the population was 48.4% when this survey was collected. Calculate the proportion female in the data, and construct a confidence interval around it. Is there any evidence that the sample was drawn from a population with a different proportion? In other words, is this sample consistent with (representative of) the contemporary population?

Do this by hand first, then get Stata to do the work. The prtest command is analogous to the ttest command.

gen female = sex == 2 if !missing(sex)
prtest female == 0.484

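Your results should be very close but may not be identical: prtest calculates the standard error for its z statistic from the hypothesised proportion (0.484), whereas a confidence interval worked out by hand uses the sample proportion.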
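
If you want to check your hand calculation, the following sketch reproduces it in Stata. It assumes female has been created as above and uses the normal critical value 1.96; the scalar names are only illustrative. The last lines compute a z statistic with the standard error evaluated at the hypothesised value, which you can compare with the prtest output.

```stata
* Sample proportion and sample size (the mean of a 0/1 variable is a proportion)
quietly summarize female
scalar phat = r(mean)
scalar nobs = r(N)

* 95% confidence interval using the standard error from the sample proportion
scalar se_ci = sqrt(phat*(1 - phat)/nobs)
scalar ci_lower = phat - 1.96*se_ci
scalar ci_upper = phat + 1.96*se_ci
display "proportion female = " phat
display "95% CI from " ci_lower " to " ci_upper

* z statistic using the standard error at the hypothesised value 0.484
scalar se_null = sqrt(0.484*(1 - 0.484)/nobs)
scalar z_hand = (phat - 0.484)/se_null
display "z = " z_hand
```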

Comparing proportions across groups

With the same data, calculate the proportion unemployed or looking for a first job, and compare it by sex:

recode empstat 1=0 2/3=1 4/99=0, gen(ue)
tab sex ue, row
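
Before testing, it is worth confirming that the recode did what was intended. A minimal check, assuming ue has just been created as above, is to cross-tabulate the original variable against the new one:

```stata
* Each value of empstat should map to a single value of ue;
* the missing option also shows any cases left uncoded
tabulate empstat ue, missing
```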

There is apparently a difference in the proportion unemployed between men and women. Run prtest ue, by(sex) to test this, and interpret the result.

Then run a chi-squared test on the table, and compare the inferences:

tab sex ue, chi row

That is, comparing a proportion across two groups amounts to analysing a 2-by-2 table, and the inferences from prtest and the chi-squared test should be the same.
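
One way to see the link, sketched below, is to square the z statistic from prtest and compare it with the Pearson chi-squared statistic from the table. This assumes ue has been created with the recode command above and that sex has exactly two categories; the scalar name z_pr is only illustrative.

```stata
* Two-sample test of proportions; keep the z statistic
quietly prtest ue, by(sex)
scalar z_pr = r(z)

* Chi-squared test on the corresponding 2-by-2 table
quietly tabulate sex ue, chi2

* The squared z statistic should match the chi-squared statistic
display "z squared   = " z_pr^2
display "chi-squared = " r(chi2)
```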