PC Labs for SO5041: Week 6

Week 6: Sampling distributions and confidence intervals

Sampling distributions

Links to the sampling distribution applications:

Confidence Intervals

Confidence intervals are bands around a point estimate (e.g. a sample mean, median or proportion) within which we are reasonably confident the true population value lies. "Reasonably" usually means 95% or 99% confident: if we drew repeated samples and built an interval from each one in the same way, about 95 (or 99) out of every hundred of those intervals would contain the true value.
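The "95 out of a hundred" idea can be checked by simulation. The R sketch below is illustrative only: it assumes a made-up normal population (mean 50, sd 10), draws 1000 samples of size 100, builds a 95% interval from each using the rule described below, and counts how many intervals contain the true mean. The share should come out close to 0.95.

```r
# Simulation sketch: how often does a 95% interval contain the true mean?
# Assumptions (illustrative only): a normal population with mean 50 and
# sd 10, samples of size 100, and 1000 repeated samples.
set.seed(1)
true_mean <- 50
n <- 100
reps <- 1000
z <- qnorm(0.975)  # z score leaving 2.5% in the right tail

covered <- logical(reps)
for (i in seq_len(reps)) {
  x <- rnorm(n, mean = true_mean, sd = 10)
  se <- sd(x) / sqrt(n)            # estimated standard error of the mean
  lower <- mean(x) - z * se
  upper <- mean(x) + z * se
  covered[i] <- lower <= true_mean && true_mean <= upper
}

coverage <- mean(covered)  # share of intervals that contain the true mean
print(coverage)
```

Any single interval either contains the true value or it does not; the 95% describes how often the procedure succeeds across repeated samples.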

We calculate a confidence interval (CI) as the point estimate (e.g. the sample mean) plus or minus Z times the standard error.

The standard error (SE) of the mean is estimated as the sample standard deviation divided by the square root of the sample size.
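The rule in the two paragraphs above can be written compactly, with $\bar{x}$ the sample mean, $s$ the sample standard deviation and $n$ the sample size:

$$
\text{CI} = \bar{x} \pm Z \times \text{SE}, \qquad \text{SE} = \frac{s}{\sqrt{n}}
$$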

Z depends on the confidence coefficient: it is the z score from the standard normal distribution such that 95% (or 99%) of the distribution lies in the range -Z to +Z. For 95% we want the z score corresponding to a "right tail" of 0.025 (the right and left tails add up to 0.05 = 1 - 95%). For 99% we want a right tail of 0.005 (half of 1%).

A table of the standard normal distribution is available here. See also the online calculator.
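As an alternative to the printed table, R's qnorm() returns the z score for a given cumulative probability, so a right tail of 0.025 corresponds to qnorm(0.975). The sketch below wraps the CI rule in a small function; the names z_for() and ci_from_summary() are hypothetical helpers for illustration, not built-in R functions.

```r
# Sketch: qnorm() gives the same Z values as a standard normal table.
# z_for() and ci_from_summary() are hypothetical helpers, not base R.

# Z for a given confidence level: the z score with (1 - level) / 2 in the right tail
z_for <- function(level) qnorm(1 - (1 - level) / 2)

z95 <- z_for(0.95)  # about 1.96
z99 <- z_for(0.99)  # about 2.58

# Confidence interval from summary statistics:
# estimate +/- Z * SE, with SE = sd / sqrt(n)
ci_from_summary <- function(est, s, n, level = 0.95) {
  se <- s / sqrt(n)
  margin <- z_for(level) * se
  c(lower = est - margin, upper = est + margin)
}
```

For exercise 1, a call such as ci_from_summary(34.2, 10.7, 1000, level = 0.99) should reproduce your hand calculation, and changing n to 2000 lets you check the last part.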

  1. Mean age for a sample of voters is calculated as 34.2, with a standard deviation of 10.7. The sample size is 1000:
    • Calculate the confidence interval for 95% confidence
    • Calculate the confidence interval for 99% confidence
    • Repeat the exercise assuming the sample size was actually 2000, for both confidence levels
  2. Load the slsextract.dta file as follows:
library(foreign)  # provides read.dta() for reading Stata .dta files
sls <- read.dta("https://teaching.sociology.ul.ie/so5041/slsextract.dta")

Find the mean of gross earnings (sls$grsearn) and construct a 95% and a 99% confidence interval. All the information you need is available through the mean(), sd() and nrow() functions.

  3. With the same variable, enter the command t.test(sls$grsearn)$conf.int. This gets R to calculate the confidence interval for the gross earnings variable in a single command (we'll cover t-tests next week).

    How do the results compare with your estimate?

  4. Type ?t.test in the RStudio console to get help on this command. See if you can work out how to get a 99% confidence interval.
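Two things may explain small differences between a hand calculation and t.test(). First, t.test() uses the t distribution (qt()) rather than the normal (qnorm()), which gives a slightly wider interval. Second, if grsearn has missing values (an assumption; check with summary(sls$grsearn)), mean() and sd() need na.rm = TRUE, and nrow() counts every row rather than only the valid cases. The sketch below uses made-up stand-in data so it runs on its own; with the lab data you would set x <- sls$grsearn instead.

```r
# Sketch: CI for a mean from a raw variable, compared with t.test().
# x is simulated stand-in data (hypothetical), including two missing values.
set.seed(42)
x <- c(rnorm(200, mean = 300, sd = 120), NA, NA)

n <- sum(!is.na(x))                      # count only non-missing cases
xbar <- mean(x, na.rm = TRUE)
se <- sd(x, na.rm = TRUE) / sqrt(n)

z_ci <- xbar + c(-1, 1) * qnorm(0.975) * se         # normal (z) interval
manual_t <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se  # t interval by hand
t_ci <- as.numeric(t.test(x)$conf.int)              # t.test() drops NAs itself

print(rbind(z = z_ci, t = t_ci))
```

With large samples the two rows are very close, because the t distribution approaches the standard normal as the degrees of freedom grow.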

Proportions

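When we are constructing the CI for a proportion (e.g. percent voting yes, proportion female, percent unemployed) we have a shortcut: the standard deviation of a proportion is the square root of p times q, where q is 1 minus p (the proportion voting no, male, or not unemployed). As before, the standard error is this standard deviation divided by the square root of the sample size, i.e. the square root of pq/n. Work with proportions (0.43) rather than percentages (43). Use that information in the following: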

  1. From a sample of 1600, 43% say they will vote against the EU Constitution: construct a 99% confidence interval
  2. Using the data set already downloaded, calculate the proportion unemployed (including those looking for their first job). Construct a confidence interval around your point estimate.
  3. Interpret your findings.
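The proportion shortcut can be sketched in R as below. ci_prop() is a hypothetical helper, not a built-in function. The second half shows one way to get a proportion from a categorical variable, using a made-up eight-case factor with hypothetical category labels; the real variable name and its categories in sls need to be checked (e.g. with names(sls) and table()) before adapting it.

```r
# Sketch: confidence intervals for a proportion.
# ci_prop() is a hypothetical helper, not a built-in R function.
# p must be a proportion (e.g. 0.43), not a percentage (43).
ci_prop <- function(p, n, level = 0.95) {
  se <- sqrt(p * (1 - p) / n)       # SD of a proportion is sqrt(pq); divide by sqrt(n)
  z <- qnorm(1 - (1 - level) / 2)   # z score leaving (1 - level) / 2 in the right tail
  c(lower = p - z * se, upper = p + z * se)
}

# From summary figures, as in exercise 1:
vote_ci <- ci_prop(0.43, 1600, level = 0.99)

# From raw data: made-up stand-in factor with hypothetical labels.
status <- factor(c("employed", "unemployed", "looking for first job",
                   "employed", "student", NA, "employed", "unemployed"))
is_unemp <- status %in% c("unemployed", "looking for first job")
is_unemp[is.na(status)] <- NA       # %in% gives FALSE for NA, so restore missing cases

n_valid <- sum(!is.na(is_unemp))    # count valid cases only
p_unemp <- mean(is_unemp, na.rm = TRUE)
unemp_ci <- ci_prop(p_unemp, n_valid, level = 0.95)
```

The normal approximation behind this shortcut works well for large samples such as the survey data; with a toy sample of seven valid cases, as here, the interval is only illustrative.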