# SO5041: Lab Materials

## Sampling distributions

Links to the two sampling distribution applications:

## Confidence Intervals

Confidence intervals are bands around a point estimate (e.g. sample mean, median, proportion) within which we are reasonably sure the true population value lies. "Reasonably" often means 95% sure, or 99% sure, which is to say that, over repeated samples, respectively 95 or 99 out of a hundred such intervals will contain the true value.

We calculate a CI as the point estimate (e.g. sample mean) plus or minus Z times the standard error.

The Standard Error is estimated as the sample standard deviation divided by the square root of the sample size.
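In symbols, the two definitions above can be written as follows. The notation is an assumption of this sketch rather than something the lab sheet defines: $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size and $z$ the multiplier Z.

$$
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad \mathrm{CI} = \bar{x} \pm z \times \mathrm{SE}
$$

For instance, with made-up values $\bar{x} = 50$, $s = 12$, $n = 400$ and $z = 1.96$, the standard error is $12 / 20 = 0.6$ and the interval is $50 \pm 1.176$, i.e. from 48.824 to 51.176.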

Z depends on the Confidence Coefficient, and is the z score from the standard normal distribution for which 95% or 99% of the distribution is in the range -Z to +Z. For 95% we want the z score that leaves a "right tail" of 0.025 (the right and left tails together make 0.05 = 1 - 0.95). For 99% we want a right tail of 0.005 (half of 1%).

A table of the standard normal distribution is available here.
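Stata can also return these z scores directly, which is a handy check on a table lookup. The sketch below uses `invnormal()`, which takes the cumulative probability to the left of the z score, so the argument is 1 minus the right tail.

```stata
* z score with 2.5% in the right tail (95% confidence), roughly 1.96
display invnormal(0.975)

* z score with 0.5% in the right tail (99% confidence), roughly 2.58
display invnormal(0.995)
```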

1. A mean age for a sample of voters is calculated as 34.2, with a standard deviation of 10.7. The sample size is 1000.
    1. Calculate the confidence interval for 95% confidence
    2. Calculate the confidence interval for 99% confidence
    3. Repeat the exercise assuming the sample size was actually 2000, for both confidence levels
2. In the file http://teaching.sociology.ul.ie/so5041/labs/slsextract.dta find the mean of gross earnings, and construct a 95% and a 99% confidence interval -- all the information is available through the `summarize` command.
3. With the same variable, do `ci grsearn`. This is Stata's way of calculating the confidence interval for the gross earnings variable. How do the results compare with your estimate?
4. Do `help ci` and see if you can figure out how to get `ci` to give you a 99% confidence interval.
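One possible way to do a by-hand calculation inside Stata is to use the results that `summarize` leaves behind in `r(mean)`, `r(sd)` and `r(N)`. The sketch below deliberately uses Stata's built-in `auto` example data and its `price` variable, not the lab file, and the scalar names are made up for illustration.

```stata
* Load a built-in example data set (an assumption; not the lab data)
sysuse auto, clear

* summarize stores the mean, standard deviation and N in r()
summarize price

* Copy the stored results into named scalars straight away
scalar xbar = r(mean)
scalar se = r(sd) / sqrt(r(N))

* Multiplier for 95% confidence
scalar z = invnormal(0.975)

* Lower and upper limits: mean plus or minus z times the standard error
scalar ci_lo = xbar - z*se
scalar ci_hi = xbar + z*se
display "95% CI: " ci_lo " to " ci_hi
```

Storing the values in scalars right after `summarize` avoids losing them when a later command overwrites `r()`.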

## Proportions

When we are constructing the CI for a proportion (e.g. percent voting yes, proportion female, percent unemployed) we have a shortcut: the standard deviation of a proportion is the square root of p times q, where q is 1 minus p (proportion voting no, or male, or not unemployed). Use that information in the following:

1. From a sample of 1600, 43% say they will vote against the EU Constitution: construct a 99% confidence interval.
2. Using the data set already downloaded, calculate the proportion unemployed (including those looking for a first job). Construct a confidence interval around your point estimate.
3. Interpret your findings.
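The shortcut above, combined with the general recipe of point estimate plus or minus Z times the standard error, can be written out as follows. The symbols are assumptions of this sketch: $p$ is the sample proportion, $q = 1 - p$, $n$ the sample size and $z$ the multiplier Z.

$$
\mathrm{SE}_p = \frac{\sqrt{pq}}{\sqrt{n}} = \sqrt{\frac{pq}{n}}, \qquad \mathrm{CI} = p \pm z \sqrt{\frac{pq}{n}}
$$

As a made-up illustration, $p = 0.5$ and $n = 400$ give $\sqrt{0.25 / 400} = 0.025$, so a 95% interval ($z = 1.96$) is $0.5 \pm 0.049$, or 0.451 to 0.549.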

### How does this work?

In Stata, with the School-Leavers' Survey data, calculate the proportion who are either unemployed or looking for a first job. Using the formula, calculate the standard error and confidence interval.

Construct a new variable so that it is equal to 1 for unemployed/looking for first job, and 0 otherwise. Use the Stata `ci` command to calculate the confidence interval around it. Compare this with your result from the formula.
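A minimal sketch of the recoding step, assuming a hypothetical status variable called `empstat` in which codes 2 and 3 stand for unemployed and looking for a first job. The real variable name and codes need to be checked in the data (for example with `describe` and `tabulate`), and `unemp` is just an illustrative name.

```stata
* 1 if unemployed or looking for first job, 0 otherwise;
* cases with missing status stay missing
* (empstat and the codes 2 and 3 are hypothetical)
generate unemp = inlist(empstat, 2, 3) if !missing(empstat)

* Check the new variable against the original
tabulate empstat unemp, missing

* Confidence interval, using the ci syntax given in this lab
ci unemp
```

Newer Stata releases may expect `ci means unemp` in place of the last line.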

What do you see, and why may this happen?

Brendan Halpin
Department of Sociology,
University of Limerick
brendan.halpin@ul.ie
Last modified: Mon Oct 20 09:36:42 IST 2014