Tables and formulas are available here, and will also be available during the exam. For today's purposes, the relevant ones are the tables of the chi-squared distribution and of Student's t distribution. Some of the formulas will be covered in subsequent lectures.

As we have seen, spreadsheets are very useful for manipulating data and presenting numbers. Today we use a spreadsheet to manipulate tables, and to calculate expected values and the chi-squared statistic.

The following table is available here as a spreadsheet.
Fill in the missing row, column and grand totals using the `=sum()`
function, marking the relevant cells with the mouse. For example,
type in the first row, go to where you want the row total to be,
enter `=sum(`, mark the row cells with the mouse, enter `)` and
press return.

| | Male | Female | Total |
|---|---|---|---|
| Employed | 388 | 380 | 768 |
| Unemployed | 67 | 46 | 113 |
| Looking for 1st job | 170 | 151 | 321 |
| Student | 471 | 490 | 961 |
| Other | 8 | 26 | 34 |
| Total | 1104 | 1093 | 2197 |

Another tip: rather than entering the same formula repeatedly,
you can copy it -- for instance, if you copy the row 1 total
formula down one, it now totals row 2. Formulas by default use
*relative references* to other cells. For instance, a
formula in B1 referring to A1 is really referring to "one cell to
the left", so if we copy it to C23 the new formula refers to B23.

Copy the entire table to a nearby location and delete the
numbers in the body. Calculate the row proportion (percentage) in each
cell by using a formula that divides the corresponding cell in the
original table by the row total. If you want to copy this formula
from the "male" column to the "female" one,
you need to "de-relativise" it a bit because the position of the
row total does not move. To do this, just put a `$` in
the reference to the row total: for instance, `=C4/$E4`.
If you copy this one cell right it will become `=D4/$E4`.

To get percentages, you can either multiply by 100, or set the
format (`Format -> Cells -> Percentage`).
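For comparison with the spreadsheet, the row-percentage calculation can be sketched in Python. This is only an illustration: the dictionary layout and the function name `row_percentages` are choices made here, not part of the lab materials.

```python
# Observed counts from the table above, as (male, female) per row.
observed = {
    "Employed": (388, 380),
    "Unemployed": (67, 46),
    "Looking for 1st job": (170, 151),
    "Student": (471, 490),
    "Other": (8, 26),
}


def row_percentages(table):
    """Express each cell as a percentage of its own row total."""
    result = {}
    for label, cells in table.items():
        row_total = sum(cells)  # plays the role of the $E column
        result[label] = tuple(100 * cell / row_total for cell in cells)
    return result


row_pct = row_percentages(observed)
for label, (male, female) in row_pct.items():
    print(f"{label:<20} {male:6.1f} {female:6.1f}")
```

Each row of the output sums to 100, which is a quick check that the row total was held fixed while the cell reference moved.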

If there is no association between two variables, we would
expect the row percentages in each row to follow more or less the
same pattern as the percentages across the column totals (also true
for column percentages). We can therefore calculate the *expected
values* from the column and row totals. One way is to multiply,
for each cell, the column's share of the grand total by the row
total. Another way is to multiply the row total by the
column total and divide by the grand total.
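The two ways of getting expected values give the same numbers. A minimal Python sketch, assuming the observed counts are held as a list of (male, female) rows; the variable and function names are illustrative only:

```python
# Observed counts: one [male, female] pair per row of the table.
observed = [
    [388, 380],  # Employed
    [67, 46],    # Unemployed
    [170, 151],  # Looking for 1st job
    [471, 490],  # Student
    [8, 26],     # Other
]


def expected_counts(obs):
    """Expected value per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def expected_via_shares(obs):
    """Same thing, written as column share of grand total * row total."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    grand_total = sum(row_totals)
    col_shares = [c / grand_total for c in col_totals]
    return [[share * r for share in col_shares] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print("  ".join(f"{value:8.2f}" for value in row))
```

The expected values in each row add up to the observed row total, and likewise for columns, so the margins of the table are unchanged.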

Copy the table again (for simplicity, perhaps key in the totals). For each cell replace the observed value with a formula for the expected value. Can you see big differences between the observed and expected values?

Calculate percentages based on the expected values, and verify that they are the same column by column, row by row.

A standard test for association in tables is called the
chi-squared test. This involves calculating for each cell the
quantity $(O-E)^2/E$, where $O$ is the observed value and $E$ the
expected value, and summing these quantities over all cells.

Calculate the chi-squared statistic and assess whether there seems to be association or not.
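As a check on the spreadsheet work, the statistic can also be computed in a few lines of Python. This is a sketch: it sums (O − E)²/E over the cells, with E taken as row total × column total / grand total, and it reports the degrees of freedom as (rows − 1) × (columns − 1) for looking up the critical value in the tables. The function name `chi_squared` is a choice made here.

```python
# Observed counts: one [male, female] pair per row of the table.
observed = [
    [388, 380],  # Employed
    [67, 46],    # Unemployed
    [170, 151],  # Looking for 1st job
    [471, 490],  # Student
    [8, 26],     # Other
]


def chi_squared(obs):
    """Return (statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(obs):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected value
            statistic += (o - e) ** 2 / e

    df = (len(obs) - 1) * (len(obs[0]) - 1)
    return statistic, df


statistic, df = chi_squared(observed)
print(f"chi-squared = {statistic:.2f} on {df} degrees of freedom")
```

Compare the printed statistic with the tabulated chi-squared value for the same degrees of freedom at your chosen significance level.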

The table above is drawn from the School Leavers' Survey, a subset of
which is available here (or do `use http://teaching.sociology.ul.ie/so5041/labs/example6`).
Download and load it into Stata; recode the `empstat` variable to match
the table and re-create the crosstab, with the chi-squared statistic
(hint: the option `chi` after a comma). Compare Stata's value to the one
you calculated by hand.

Using the tables, find the critical value (the z- or t-score) for the 90, 95 and 99% confidence levels for:

- A large sample (use normal distribution)
- A sample of 50 (use t-distribution)
- A sample of 25 (use t-distribution)
- A sample of 4 (use t-distribution)

Make a table of your results -- what patterns do you see?
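The large-sample (normal) values can be checked against the printed table with Python's standard library. This sketch covers only the normal distribution; the standard library has no t-distribution, so the t values still come from the tables. The helper name `z_critical` is illustrative.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical z value for a confidence level such as 0.95."""
    tail = (1 - confidence) / 2  # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For a given confidence level, the t values you look up for the smaller samples should be larger than these.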

Suppose you have been given information from a sample: mean age is 45 years, standard deviation 14 and sample size 50. Make a 95% confidence interval around the mean age, first using the normal distribution. Then repeat the exercise using the t-distribution. How do the two confidence intervals differ and which should you use?
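The arithmetic for both intervals is the mean plus or minus a critical value times the standard error, s/√n. A minimal Python sketch under stated assumptions: the t value of 2.010 (95%, 49 degrees of freedom) is keyed in by hand as if read from a t table, and a printed table may only list a nearby row such as 40 or 50 degrees of freedom. All names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Sample information given in the exercise.
mean, sd, n = 45.0, 14.0, 50
standard_error = sd / sqrt(n)


def confidence_interval(centre, se, critical_value):
    """Return (lower, upper) = centre -/+ critical_value * se."""
    margin = critical_value * se
    return centre - margin, centre + margin


z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% from the normal
T_95_DF49 = 2.010  # ASSUMPTION: value read from a t table, df = n - 1 = 49

z_interval = confidence_interval(mean, standard_error, z_95)
t_interval = confidence_interval(mean, standard_error, T_95_DF49)

print(f"standard error: {standard_error:.3f}")
print(f"normal: {z_interval[0]:.2f} to {z_interval[1]:.2f}")
print(f"t:      {t_interval[0]:.2f} to {t_interval[1]:.2f}")
```

Only the critical value changes between the two intervals, so any difference in width comes entirely from the choice of distribution.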

Suppose someone has claimed that the average age in this population is 40 years -- what does your data say about this claim?

Brendan Halpin

Department of Sociology, University of Limerick

F2-025, x 3147; brendan.halpin@ul.ie

Last modified: Mon Nov 5 11:49:09 GMT 2012