PC Labs for SO5041: Week 10
Week 10 Lab: Independent sample t-test, correlation
Stata hypothesis tests
Load the last lab's data into Stata as follows:
import delimited using http://teaching.sociology.ul.ie/so5041/labs/hypotest.csv
(This is an example of how to read a CSV file directly into Stata.)
First, generate a new variable that is the difference between before and after:
gen diff = after - before
Then, use the ttest command to compare this with zero:
ttest diff == 0
Interpret the output. If you still have your calculations from last week, compare with your results.
Note that you can do a paired-sample t-test in one step with the following:
ttest after == before
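The equivalence of the two approaches can be checked directly. The following is a minimal sketch, assuming the data load as above with variables named before and after; the scalar name t_diff is only illustrative.

```stata
* Sketch: the one-sample test on the difference and the paired test
* should report the same t statistic (t_diff is an illustrative name)
import delimited using http://teaching.sociology.ul.ie/so5041/labs/hypotest.csv, clear
gen diff = after - before

* One-sample t-test of the difference against zero
ttest diff == 0
scalar t_diff = r(t)

* Paired-sample t-test in one step
ttest after == before
display "t from diff: " t_diff "   t from paired test: " r(t)
```

Both commands test whether the mean of after minus before is zero, so the two t statistics should match.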
Hypothesis testing exam marks
Say that in last year's Leaving Cert English exam, the average mark achieved was 62.1%. A year later, the Department of Education wants quick feedback on whether the standard has changed. A random sample of 100 scripts is assessed and marked. Their average mark is 65.2%, with a standard deviation of 12.4%.
Conduct a test of the hypothesis that the standard has changed, using a 95% level of confidence. Report and interpret your findings.
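Once you have worked this through by hand, Stata's immediate form of the t-test, ttesti, offers one way to check the arithmetic from the summary figures alone. This is a sketch rather than the required method: ttesti uses the t distribution (here with 99 degrees of freedom), so if your hand calculation uses the normal distribution the p-value may differ slightly.

```stata
* Arguments: sample size, sample mean, sample SD, hypothesised mean
ttesti 100 65.2 12.4 62.1, level(95)

* The test statistic by hand:
* (sample mean - hypothesised mean) / standard error
display (65.2 - 62.1) / (12.4 / sqrt(100))
```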
Independent Sample t-test in Stata
Load the following file, which contains information on gender and work hours for a UK sample (use http://teaching.sociology.ul.ie/so5041/labs/week11a). Use summaries and graphs to get a sense of the male-female differences, and then use an independent sample t-test (i.e., ttest varname, by(groupvar), replacing varname and groupvar with the names of the appropriate variables).
Comparing distributions
You can visually compare the two workhours distributions with these commands:
hist ojbhrs, by(osex)
kdensity ojbhrs if osex==1, addplot(kdensity ojbhrs if osex==2)
The distributions have quite different shapes. However, with this large sample, you will notice that running the t-test with the unequal option makes very little difference.
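To see how little the unequal option changes, the two versions of the test can be run back to back. This is a minimal sketch, assuming that osex takes only the values 1 and 2 (ttest with by() needs exactly two groups); the scalar name t_equal is illustrative.

```stata
* Sketch: compare the equal-variance and unequal-variance t-tests
* (assumes osex has exactly two groups; t_equal is an illustrative name)
use http://teaching.sociology.ul.ie/so5041/labs/week11a, clear

* Default: assumes equal variances in the two groups
ttest ojbhrs, by(osex)
scalar t_equal = r(t)

* Allowing the group variances to differ
ttest ojbhrs, by(osex) unequal
display "equal variances t = " t_equal "   unequal variances t = " r(t)
```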
Testing proportions in Stata
Refer to the extract of the school-leavers' survey:
use https://teaching.sociology.ul.ie/so5041/labs/example6
According to CSO population estimates, the proportion female in the relevant age group in the population was 48.4% when this survey was collected. Calculate the proportion female in the data, and construct a confidence interval around it. Is there any evidence that the sample was drawn from a population with a different proportion? In other words, is this sample consistent with (representative of) the contemporary population?
Do this by hand first, then get Stata to do the work. The prtest command is analogous to the ttest command.
gen female = sex == 2
prtest female == 0.484
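Your results should be very close but not necessarily identical: prtest is a large-sample test whose z statistic uses the standard error calculated under the hypothesised proportion, so small differences from a hand calculation can arise from that choice and from rounding.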
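The hand calculation can also be reproduced inside Stata, which makes it easier to see where any difference from the prtest output comes from. This is a sketch under the usual large-sample formulas; the scalar names (phat, nobs, se_hat, se_null) are illustrative, and it assumes sex has no missing values (gen female = sex == 2 would code them as 0).

```stata
* Sketch: large-sample confidence interval and z statistic "by hand"
* (phat, nobs, se_hat and se_null are illustrative names)
use https://teaching.sociology.ul.ie/so5041/labs/example6, clear
gen female = sex == 2

quietly summarize female
scalar phat = r(mean)
scalar nobs = r(N)

* Confidence interval: standard error from the sample proportion
scalar se_hat = sqrt(phat * (1 - phat) / nobs)
display "Lower limit: " phat - invnormal(0.975) * se_hat
display "Upper limit: " phat + invnormal(0.975) * se_hat

* z statistic: standard error under the hypothesised proportion 0.484
scalar se_null = sqrt(0.484 * (1 - 0.484) / nobs)
display "z = " (phat - 0.484) / se_null

prtest female == 0.484
```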
Comparing proportions across groups
With the same data, calculate the proportion unemployed or looking for a first job, and compare it by sex:
recode empstat 1=0 2/3=1 4/99=0, gen(ue)
tab sex ue, row
There is apparently a difference in the proportion unemployed between men and women. Run prtest ue, by(sex) to test this, and interpret the result.
Then run a chi-squared test on the table, and compare the inferences:
tab sex ue, chi row
That is, comparing a proportion across two groups is equivalent to analysing a 2-by-2 table, and the inferences from prtest and the chi-squared test should be the same.
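One way to see the connection is to compare the two test statistics: for a 2-by-2 table, the square of the two-sample z statistic equals the Pearson chi-squared statistic. The following is a sketch, assuming sex has exactly two categories in these data; the scalar name z_prtest is illustrative.

```stata
* Sketch: the squared z from prtest should equal the Pearson chi-squared
* (assumes sex has exactly two groups; z_prtest is an illustrative name)
use https://teaching.sociology.ul.ie/so5041/labs/example6, clear
recode empstat 1=0 2/3=1 4/99=0, gen(ue)

* Two-sample test of proportions
prtest ue, by(sex)
scalar z_prtest = r(z)

* Chi-squared test on the same 2-by-2 table
tabulate sex ue, chi2 row
display "z squared = " z_prtest^2 "   chi-squared = " r(chi2)
```

Because the two statistics are equivalent, the p-values also agree, which is why the two tests lead to the same inference.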