In a previous note, I used Stata to simulate 2×2 tables of a one-off outcome. The simulation showed that, given certain assumptions, odds ratios (ORs) are a much better estimate of the underlying causal effect or statistical association than relative rates are. One key assumption is that the outcome is one-off, so that it is reasonable to model the propensity for the event with a normal or logistic distribution. Where the outcome is the result of potentially repeated exposure to a risk (such as being ever married, or ever infected with a particular pathogen), the resulting propensity is not likely to be normal. That is, if you are exposed to many opportunities to marry, saying yes once makes you ever-married forever after. Even if the propensity to marry at a specific opportunity is normally distributed, the combined distribution of the propensity to be ever-married after an unknown number of opportunities is unlikely to be well described as normal.
A frequent theme in the medical statistics and epidemiological literature is that odds ratios (ORs) as effect measures for binary outcomes are counterintuitive and an impediment to understanding. Barros and Hirakata (2003), for instance, refer to the relative rate (RR) as the “measure of choice” and complain that the OR will “overestimate” the RR as the baseline probability rises. Clearly, ORs are less intuitive than RRs, but in this note I take issue with the conclusion sometimes drawn, that models with relative-rate interpretations should be used instead of logistic regression and other OR models. This is because RRs are not measures of the size of the statistical association between a variable and an outcome (since they also vary inversely with the baseline probability), and because, under certain assumptions, ORs and related measures are. That is, RRs may feel more real but they are likely to be misleading.
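The dependence of the RR on the baseline probability can be illustrated with a little arithmetic. The following is a minimal sketch in Python (not the Stata simulation from the note), and it assumes the effect is a fixed shift on the log-odds scale, here chosen arbitrarily to double the odds. The function names and the baseline values are illustrative assumptions only.

```python
import math

def logistic(x):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)

def effect_measures(baseline_p, log_odds_shift):
    """Return (RR, OR) when a fixed log-odds shift is applied to a baseline.

    Assumption: the exposure acts additively on the logit scale.
    """
    baseline_logit = math.log(odds(baseline_p))
    exposed_p = logistic(baseline_logit + log_odds_shift)
    rr = exposed_p / baseline_p
    odds_ratio = odds(exposed_p) / odds(baseline_p)
    return rr, odds_ratio

# Assumed effect size: the exposure doubles the odds.
SHIFT = math.log(2.0)

# Illustrative baseline probabilities, from rare to common outcomes.
for p0 in (0.01, 0.10, 0.30, 0.50, 0.80):
    rr, odds_ratio = effect_measures(p0, SHIFT)
    print(f"baseline={p0:.2f}  RR={rr:.3f}  OR={odds_ratio:.3f}")
```

Under this assumption the OR is the same at every baseline, while the RR falls towards 1 as the baseline probability rises, even though the underlying effect has not changed.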
gnuplot and Stata to generate a heatmap representation of a square matrix containing a measure of closeness between 26 departments in a university.
gnuplot is a general-purpose plotting program, and can be wheedled into doing a lot of things, but Stata’s graphics routines are also very general. Given data in i, j, n format (in blocks, that is, with a blank line inserted before every change of value of i), gnuplot can generate a heatmap with code like the following: