SO5041: Lab Materials

Week 6 Lab

Sampling distributions

Links to the two sampling distribution applications:

  1. Coin-toss example
  2. Binomial sampling simulation

Confidence Intervals

Confidence intervals are bands around a point estimate (e.g. sample mean, median, proportion) within which we are reasonably sure the true population value lies. "Reasonably" often means 95% sure, or 99% sure, which is to say that if we drew many samples and built an interval from each one, about 95 or 99 out of every hundred of those intervals, respectively, would contain the true value.
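The "95 times out of a hundred" idea can be checked by simulation. The following is a minimal Python sketch, not part of the lab's Stata work: it assumes a hypothetical normally distributed population (mean 50, standard deviation 10), draws many samples, builds an interval of the sample mean plus or minus 1.96 standard errors from each, and counts how often the interval contains the true mean. All names and numbers are illustrative assumptions.

```python
import random
import statistics

random.seed(5041)  # fixed seed so the run is repeatable

# Hypothetical population and design, chosen only for illustration
TRUE_MEAN = 50.0   # population mean (known here because we invented it)
TRUE_SD = 10.0     # population standard deviation
N = 100            # size of each sample
REPS = 2000        # number of repeated samples
Z = 1.96           # z score conventionally used for 95% confidence

covered = 0
for _ in range(REPS):
    # Draw one sample and compute its mean and estimated standard error
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5
    # Does this sample's interval contain the true population mean?
    if mean - Z * se <= TRUE_MEAN <= mean + Z * se:
        covered += 1

coverage = covered / REPS
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should be close to 0.95, though it will vary a little with the seed and the number of repetitions.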

We calculate a CI as the point estimate (e.g. sample mean) plus or minus Z times the standard error.

The Standard Error is estimated as the sample standard deviation divided by the square root of the sample size.

Z depends on the Confidence Coefficient, and is the z score from the standard normal distribution for which 95 or 99% of the distribution is in the range -Z to +Z. For 95% we want to find the z score corresponding to a "right tail" of 0.025 (add the right and left tails to get 0.05 = 1 - 95%). For 99% we want a right tail of 0.005 (half of 1%).
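As a rough illustration of the three steps above (standard error, Z, interval), here is a small Python sketch. It is an aid to checking hand calculations, not a replacement for the table lookup; the function names and the example figures (mean 50, standard deviation 12, sample size 400) are hypothetical and are not taken from the exercises.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """z score that leaves (1 - level) / 2 in each tail of the standard normal."""
    right_tail = (1 - level) / 2
    return NormalDist().inv_cdf(1 - right_tail)


def mean_ci(mean, sd, n, level=0.95):
    """Confidence interval for a mean: point estimate +/- Z * standard error."""
    se = sd / n ** 0.5            # standard error = sd / sqrt(n)
    z = z_for_confidence(level)   # Z for the chosen confidence level
    return mean - z * se, mean + z * se


# Hypothetical sample summary, for illustration only
low, high = mean_ci(mean=50.0, sd=12.0, n=400, level=0.95)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

Changing `level` to 0.99 widens the interval, and increasing `n` narrows it, which is the pattern the exercises below ask you to observe.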

A table of the standard normal distribution is available here.

  1. A mean age for a sample of voters is calculated as 34.2, with a standard deviation of 10.7. The sample size is 1000.
    1. Calculate the confidence interval for 95% confidence
    2. Calculate the confidence interval for 99% confidence
    3. Repeat the exercise assuming the sample size was actually 2000, for both confidence levels
  2. In the file find the mean of gross earnings, and construct a 95% and a 99% confidence interval -- all the information is available through the summarize command.
  3. With the same variable, do ci grsearn. This is Stata's way of calculating the confidence interval for the gross earnings variable. How do the results compare with your estimate?
  4. Do help ci and see if you can figure out how to get ci to give you a 99% confidence interval.


When we are constructing the CI for a proportion (e.g. percent voting yes, proportion female, percent unemployed) we have a shortcut: the standard deviation of a proportion is the square root of p times q, where q is 1 minus p (proportion voting no, or male, or not unemployed). Use that information in the following:

  1. From a sample of 1600, 43% say they will vote against the EU Constitution: construct a 99% confidence interval
  2. Using the data set already downloaded, calculate the proportion unemployed (include those looking for a first job). Construct a confidence interval around your point estimate.
  3. Interpret your findings.
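The proportion shortcut can be sketched in the same way. This Python fragment is an illustrative assumption rather than the lab's own method: it takes the standard deviation as the square root of p times q, divides by the square root of the sample size to get the standard error, and builds the interval from that. The function name and the example values (p = 0.30, n = 900) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p, n, level=0.95):
    """Confidence interval for a proportion using sd = sqrt(p * q)."""
    q = 1 - p                                      # the complementary proportion
    sd = sqrt(p * q)                               # shortcut standard deviation
    se = sd / sqrt(n)                              # standard error
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # Z for the confidence level
    return p - z * se, p + z * se


# Hypothetical survey result, for illustration only
low, high = proportion_ci(p=0.30, n=900, level=0.95)
print(f"95% CI for the proportion: {low:.4f} to {high:.4f}")
```

Note that only p and n are needed: unlike the mean, a proportion does not require a separately reported standard deviation.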

How does this work?

In Stata, with the School-Leavers' Survey data, calculate the proportion who are either unemployed or looking for a first job. Using the formula, calculate the standard error and confidence interval.

Construct a new variable so that it is equal to 1 for unemployed/looking for first job, and 0 otherwise. Use the Stata ci command to calculate the confidence interval around it. Compare this with your result from the formula.

What do you see, and why may this happen?

Brendan Halpin
Department of Sociology,
University of Limerick
Last modified: Mon Oct 20 09:36:42 IST 2014