// Feb 20 2011 21:07:20 // Simulate an infection passing through a population // 100 cases are exposed to the pathogen in each time period // 2 groups, with different probability of infection at exposure // (iff not already infected, hence these are hazard rates not probabilities) // Note that the time-periods need not be of equal length, but that if // they are, the underlying hazard is fixed. /* What this demonstrates is that where the binary outcome is due to exposure over time, rather than a one-off outcome, the OR and the RR are defective. Instead a hazard rate is desired, and it happens that the cloglog link in glm will give it. I'm not sure what to think about a hazard model that doesn't know anything about the time of event. A 1-unit change in x leads to a B change in log(-log(1-p)) exp(B) in -log(1-p) exp(-exp(B)) in 1-p if we reverse code, it is e(-e(B)) change in p */ postfile infect run t infnsm infsmo relrate OR clog using infect, replace every(1) // Run the simulation 10 times forvalues r = 1/10 { // Create the simulated population clear set obs 5000 gen smoker = _n>_N/2 gen infected = 0 gen randso = 0 // tinf is time of infection -- if this is available proper hazard models are possible gen tinf = 0 // At each of 200 steps ... forvalues x = 1/200 { replace randso = uniform() sort randso // ... expose a random 100 cases to the pathogen replace tinf = `x' if infected == 0 & _n<=100 & !smoker & uniform()<0.25 replace tinf = `x' if infected == 0 & _n<=100 & smoker & uniform()<0.50 replace infected = 1 if tinf == `x' // Calculate cloglog parameter, OR and RR qui tab smoker infected, matcell(tabdat) qui cloglog infected i.smoker matrix cll = e(b) scalar cll2 = cll[1,2] scalar infsmo = tabdat[2,2] scalar infnsm = tabdat[1,2] scalar relrate = (tabdat[2,2]/(tabdat[2,1]+tabdat[2,2]))/(tabdat[1,2]/(tabdat[1,1]+tabdat[1,2])) scalar OR = (tabdat[2,2]/(tabdat[2,1] ))/(tabdat[1,2]/(tabdat[1,1] )) capture post infect (`r') (`x') (infnsm) (infsmo) (relrate) (OR) (cll2) } save infhaz`r', replace } postclose infect