Updated, Nov 6, 2020

Q 10 Kenya Adult Circumcision Trial

To replicate the numbers in the Figure, we first look at what the authors say they did.

[You are also encouraged to read the other sections, as well as the full Statistical Analysis section.]

Interestingly, the target sample size calculation assumed a 15% non-informative loss to follow-up, 5% non-adherence to treatment assignment in either direction, and an annual HIV seroincidence of 2.5 per 100 person-years in the control group.

The Kaplan-Meier method was used to estimate the HIV event distribution over time by treatment, accounting for staggered enrolment and incomplete, discrete follow-up. The time of HIV-positive status was credited to the follow-up visit when HIV was first detected. HIV-negative participants were censored in the analysis at the last regular follow-up visit completed where HIV status was ascertained. Estimates of 2-year HIV seroincidences and corresponding standard errors obtained by Greenwood’s formula were used to test for differences between the treatments on the primary outcome (HIV seroconversion). The primary analysis was by intention-to-treat; participants were included in the analysis in the group to which they were randomly assigned and all participants with follow-up for HIV status were included in the analysis.

Thus, for the cumulative incidences, and their difference:

ds=read.table("http://www.biostat.mcgill.ca/hanley/c634/HIV/KenyaTrialData.txt",header=TRUE)

# re-arrange so the data for both arms are in the same row.

data=data.frame(
  cbind(ds[seq(1,11,2),c(1,2,4,5)],
        ds[seq(2,12,2),4:5]))

names(data)=c("t.0","t.end","n.C","n.HIV.C", "n.NotC","n.HIV.NotC")
data
##    t.0 t.end  n.C n.HIV.C n.NotC n.HIV.NotC
## 1    0     1 1367       4   1380          1
## 3    1     3 1351       2   1368          3
## 5    3     6 1323       5   1350          9
## 7    6    12 1287       3   1302         18
## 9   12    18 1029       0   1039          7
## 11  18    24  764       8    740          9
# calculations...  

#  C and NotC : shorthand for Circumcised and Not Circumcised
#  p (conditional, window-specific) probability of seroconversion
#  S (unconditional) probability of remaining seronegative
#  CI cumulative incidence
#  SE Greenwood Standard Error for S-hat

data$p.C = (data$n.C - data$n.HIV.C)/data$n.C #condn'l pr(Staying HIV-)
data$p.NotC = (data$n.NotC - data$n.HIV.NotC)/data$n.NotC #condn'l pr(Staying HIV-)

data$S.C    = cumprod(data$p.C); # un-condn'l prob(Staying HIV-)
data$S.NotC = cumprod(data$p.NotC); # un-condn'l prob(Staying HIV-)

( data$CI.C = 100*(1 - data$S.C)       ) ; # Cum.Incidence (%) of HIV+
## [1] 0.2926116 0.4402169 0.8164821 1.0476791 1.0476791 2.0838291
( data$CI.NotC = 100*(1 - data$S.NotC) ) ; # Cum.Incidence (%) of HIV+
## [1] 0.07246377 0.29160310 0.95632575 2.32559313 2.98364977 4.16357836
data$V.log.p.C = (1/data$n.C)*(1-data$p.C)/data$p.C #  Greenwood SE for S or CI=1-S
data$V.log.p.NotC = (1/data$n.NotC)*(1-data$p.NotC)/data$p.NotC # ditto, NotC

( data$SE.S.C =round(100 * data$S.C * sqrt(cumsum(data$V.log.p.C)),2) ) # Greenwood SE(%S)
## [1] 0.15 0.18 0.25 0.28 0.28 0.46
( data$SE.S.NotC =round(100 * data$S.NotC * sqrt(cumsum(data$V.log.p.NotC)),2) ) # SE(%S)=SE(%CI)
## [1] 0.07 0.15 0.26 0.41 0.48 0.61
Q = qnorm(c(0.025,0.975))

( round(data$CI.C[6] + Q * data$SE.S.C[6],1) ) #  95% CI for cum.inc
## [1] 1.2 3.0
( round(data$CI.NotC[6] + Q * data$SE.S.NotC[6],1) )  # likewise
## [1] 3.0 5.4
# z statistic for difference in cumulative incidence

SE.diff = sqrt( data$SE.S.C[6]^2 + data$SE.S.NotC[6]^2 ) 
  
( round( ( data$CI.C[6] - data$CI.NotC[6] ) /  SE.diff, 2) )
## [1] -2.72
# IT MATTERS THAT YOU ROUND NUMBERS APPROPRIATELY. AND THINK OF END-USERS

A secondary analysis, using the same statistical approach described above, excluded participants subsequently confirmed as HIV positive by PCR at baseline, and a further analysis excluded those confirmed positive at either baseline or at 1 month.

This is because an HIV test can still be negative for a while after someone is infected.

Furthermore, an as-treated analysis was done with a time-dependent covariate in a Cox regression model for circumcision status at each follow-up visit to take into account those individuals who did not adhere to their randomisation assignment; in this analysis, a time-dependent variable for the circumcision status of each participant at each follow-up visit was constructed and included as a single time-dependent predictor variable in a Cox regression model with all participants. Thus, irrespective of treatment assignment, participants were accounted in this analysis as they were treated with respect to circumcision. Cox regression models with fixed covariates were used to consider various baseline adjustments to the treatment effect. Age-group and variables that seemed to be slightly imbalanced were used — ie, ethnic group, occupation, infection with herpes simplex virus type 2, and infection with Chlamydia trachomatis. These variables were considered independently for association with HIV incidence, then singly, as adjustments to the treatment effect. Finally, the set of variables was included in a model as an adjustment to the treatment effect.

Note: An as-treated analysis means that if someone in the delayed- circumcision arm (the control arm) decided to get circumcised, he would be included in the circumcision arm; likewise, if someone in the circumcision arm (the treatment arm) decided not to get circumcised, he would be included in the control arm.
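
Here is a purely hypothetical sketch (made-up numbers, NOT the trial's data) of the counting-process, (start, stop], data layout that such an as-treated, time-dependent-covariate Cox analysis uses: a participant whose circumcision status changes contributes one record per period of constant status.

library(survival)

# Hypothetical mini-dataset: participant 1, randomised to control, is
# circumcised at month 6, so the time-dependent covariate 'circ' switches
# from 0 to 1 and he contributes two records; the others keep one record each.
toy = data.frame( id    = c(1, 1, 2, 3, 4),
                  start = c(0, 6, 0, 0, 0),
                  stop  = c(6, 18, 12, 24, 24),
                  event = c(0, 1, 1, 0, 0),   # HIV seroconversion
                  circ  = c(0, 1, 0, 1, 0) )  # circumcision status in (start, stop]
toy
coxph( Surv(start, stop, event) ~ circ, data = toy )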

All hazard or risk ratios were estimated with the parameter estimates from Cox regression. An exact method for computing the likelihood was specified to handle ties.

It is not quite clear how the risk ratio of 0.49 (shown) was calculated, but here are a few possibilities.

Let's take the ‘risk ratio’ term literally, i.e., as a ratio of the two 2-year risks. After all, a risk must have a time referent, and here the 1-KM(2y.) numbers are risks.

If we already have the 4.2%(SE 0.46) and the 2.1%(SE 0.61), we can immediately calculate the ratio, and use the delta method applied to its log as a way to calculate its SE.

\[Var[ \log(RiskRatio)] = Var[ \log(Risk.C) - \log(Risk.Not.C)]\] With \(Var[\log(RV)] \approx Var(RV)/E^2\), where \(E = E[RV]\), and using a few more decimal places, we have

( SE.log.ratio = sqrt( 
        data$SE.S.C[6]^2    / data$CI.C[6]^2 + 
        data$SE.S.NotC[6]^2 / data$CI.NotC[6]^2 
                      )   )
## [1] 0.2649419
# multiplicative margin of error (MME)

( MME = exp( qnorm(c(0.025, 0.5, 0.975)) * SE.log.ratio ) )
## [1] 0.5949508 1.0000000 1.6808113
( CI = round((data$CI.C[6]/data$CI.NotC[6]) * MME, 4)  )
## [1] 0.2978 0.5005 0.8412

CLEARLY, no matter how roughly or finely we round this ratio, and its limits, this interpretation doesn’t replicate the 0.28, 0.47, 0.78 Risk Ratio shown in their Figure.

Another possibility is that they calculated a rate ratio (a ratio of two incidence densities) but called it a risk ratio. This somewhat loose terminology is common, and is even found in the 2 authoritative books by Breslow and Day.

You can see why the interchangeability arose: just as here, when we have low cumulative incidences, the Risk Ratio is mathematically close to the Hazard or IR or ‘rate’ Ratio. Assuming a hazard ratio that is constant over the follow-up period \(t=0\) to \(t=2\) years, and denoting this constant rate ratio as \(\theta\), \[\lambda_1(u) = \lambda_0(u) \times \theta,\]
so \[Risk_1(t=2) = 1 - \exp\bigg[ - \int_0^2\lambda_1(u)du\bigg] \approx \int_0^2\lambda_1(u)du = \int_0^2 \theta \times \lambda_0(u)du \approx \theta \times Risk_0(t=2).\] So, maybe they calculated \(\hat{\lambda_1}/\hat{\lambda_0}\) and just called it a ‘Risk Ratio’ rather than its proper name, a ‘Rate Ratio’ or ‘Hazard Ratio’ or ‘Incidence Density Ratio’?
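
Before checking, here is a quick numeric illustration of this approximation, with made-up values roughly in the range of this trial (a control-arm hazard of 0.021 per PY and a hazard ratio of 0.47, assumed constant over the 2 years):

lambda.0 = 0.021 ; theta = 0.47              # hypothetical, for illustration only
Risk.0 = 1 - exp(-2 * lambda.0)              # approx. 0.041
Risk.1 = 1 - exp(-2 * theta * lambda.0)      # approx. 0.020
c( Risk.Ratio = Risk.1/Risk.0, Hazard.Ratio = theta )  # close, but not identical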

Let’s see if that would fit. We need to calculate the 2 amounts of exposed population time, in terms of person years:

( PY.C    = round( (1/12) * sum( data$n.C   *(data$t.end - data$t.0)), 1))  
## [1] 2209.8
( PY.NotC = round( (1/12) * sum( data$n.NotC*(data$t.end - data$t.0)), 1)) 
## [1] 2221
n.cases.HIV.C    = sum( data$n.HIV.C )     ; n.cases.HIV.C
## [1] 22
n.cases.HIV.NotC = sum( data$n.HIV.NotC )  ; n.cases.HIV.NotC
## [1] 47
incidence.HIV.C    = 100* n.cases.HIV.C    / PY.C; incidence.HIV.C   # per 100 PY
## [1] 0.9955652
incidence.HIV.NotC = 100* n.cases.HIV.NotC / PY.NotC; incidence.HIV.NotC # per 100 PY
## [1] 2.116164
( Rate.Ratio.hat =  incidence.HIV.C / incidence.HIV.NotC )
## [1] 0.4704575
( SE.log.Rate.Ratio.hat =
    sqrt( 1/n.cases.HIV.C + 1/n.cases.HIV.NotC ) )
## [1] 0.2583237
( MME=exp(qnorm(c(0.025,0.5,0.975)) * SE.log.Rate.Ratio.hat) ) 
## [1] 0.6027184 1.0000000 1.6591496
round(Rate.Ratio.hat * MME, 4)
## [1] 0.2836 0.4705 0.7806

Rounded to 2 decimal places, this calculation matches the reported point and interval estimates.

Notice that one can also arrive at these same estimates using POISSON REGRESSION.

data$n.HIV.C
## [1] 4 2 5 3 0 8
data$n.HIV.NotC
## [1]  1  3  9 18  7  9
( events        = c(data$n.HIV.C, data$n.HIV.NotC ) )
##  [1]  4  2  5  3  0  8  1  3  9 18  7  9
( py            = c(data$n.C   *(data$t.end - data$t.0)/12,
                  data$n.NotC*(data$t.end - data$t.0)/12) )
##  [1] 113.9167 225.1667 330.7500 643.5000 514.5000 382.0000 115.0000 228.0000
##  [9] 337.5000 651.0000 519.5000 370.0000
( circumcision  = c(rep(1,6),     rep(0,6)) )
##  [1] 1 1 1 1 1 1 0 0 0 0 0 0
fit= glm(events ~ circumcision+offset(log(py)), family=poisson) 
summary(fit)
## 
## Call:
## glm(formula = events ~ circumcision + offset(log(py)), family = poisson)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.2007  -1.1053   0.1220   0.9265   2.0861  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -3.8556     0.1459 -26.432  < 2e-16 ***
## circumcision  -0.7541     0.2583  -2.919  0.00351 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 35.640  on 11  degrees of freedom
## Residual deviance: 26.498  on 10  degrees of freedom
## AIC: 68.432
## 
## Number of Fisher Scoring iterations: 5
( Rate.0.and.RateRatio = exp(fit$coefficients) )
##  (Intercept) circumcision 
##   0.02116164   0.47045042
( MME = exp(qnorm(c(0.025, 0.5, 0.975))*summary(fit)$coefficients[2,2]) )
## [1] 0.6027192 1.0000000 1.6591475
round(Rate.0.and.RateRatio[2] * MME,2)
## [1] 0.28 0.47 0.78

Rounded to 2 decimal places, this calculation ALSO matches the reported point and interval estimates.

But they mentioned using COX REGRESSION, so let's see what that gives.

# Expand data set (could use weights instead, but a dataset with 1 row 
# (or even several time slices) per participant is more natural)

month = NULL
hiv.pos = NULL
circumcised=NULL
for(i in 1:6){
  for(j in 0:1){
    
    n     = (j==1)*data$n.C[i]     + (j==0)*data$n.NotC[i]
    n.pos = (j==1)*data$n.HIV.C[i] + (j==0)*data$n.HIV.NotC[i]

    circumcised = c( circumcised, rep(j,n.pos) )
    hiv.pos     = c( hiv.pos,rep(1,n.pos ) )
    month       = c( month,rep(data$t.end[i],n.pos) )
    
    if(i  < 6) n.censored = n - n.pos - (j==1)*data$n.C[i+1] - (j==0)*data$n.NotC[i+1]
    if(i == 6) n.censored = n - n.pos 
    
    circumcised = c(circumcised, rep(j,n.censored))
    hiv.pos = c(hiv.pos,rep(0,n.censored))
    month=c(month,rep(data$t.end[i],n.censored))
  }
}
length(hiv.pos)
## [1] 2747
df=data.frame(month,circumcised,hiv.pos)
head(df,20) ; tail(df,20)
##    month circumcised hiv.pos
## 1      1           0       1
## 2      1           0       0
## 3      1           0       0
## 4      1           0       0
## 5      1           0       0
## 6      1           0       0
## 7      1           0       0
## 8      1           0       0
## 9      1           0       0
## 10     1           0       0
## 11     1           0       0
## 12     1           0       0
## 13     1           1       1
## 14     1           1       1
## 15     1           1       1
## 16     1           1       1
## 17     1           1       0
## 18     1           1       0
## 19     1           1       0
## 20     1           1       0
##      month circumcised hiv.pos
## 2728    24           1       0
## 2729    24           1       0
## 2730    24           1       0
## 2731    24           1       0
## 2732    24           1       0
## 2733    24           1       0
## 2734    24           1       0
## 2735    24           1       0
## 2736    24           1       0
## 2737    24           1       0
## 2738    24           1       0
## 2739    24           1       0
## 2740    24           1       0
## 2741    24           1       0
## 2742    24           1       0
## 2743    24           1       0
## 2744    24           1       0
## 2745    24           1       0
## 2746    24           1       0
## 2747    24           1       0
summary(df)
##      month        circumcised        hiv.pos       
##  Min.   : 1.00   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:18.00   1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :24.00   Median :0.0000   Median :0.00000  
##  Mean   :19.36   Mean   :0.4976   Mean   :0.02512  
##  3rd Qu.:24.00   3rd Qu.:1.0000   3rd Qu.:0.00000  
##  Max.   :24.00   Max.   :1.0000   Max.   :1.00000
library(survival)

# 
ties=c("efron","breslow","exact")
for(method in 1:3){
  print( noquote("********************************"))
  print( noquote(paste("Ties dealt with by",
                       ties[method],"method")) )
  print( noquote("********************************"))
  ph.fit = coxph(Surv(month,hiv.pos) ~ circumcised, 
                   data=df, ties=ties[method] )
  print(ph.fit)               
}
## [1] ********************************
## [1] Ties dealt with by efron method
## [1] ********************************
## Call:
## coxph(formula = Surv(month, hiv.pos) ~ circumcised, data = df, 
##     ties = ties[method])
## 
##                coef exp(coef) se(coef)      z       p
## circumcised -0.7592    0.4680   0.2583 -2.939 0.00329
## 
## Likelihood ratio test=9.27  on 1 df, p=0.00233
## n= 2747, number of events= 69 
## [1] ********************************
## [1] Ties dealt with by breslow method
## [1] ********************************
## Call:
## coxph(formula = Surv(month, hiv.pos) ~ circumcised, data = df, 
##     ties = ties[method])
## 
##                coef exp(coef) se(coef)     z       p
## circumcised -0.7568    0.4691   0.2583 -2.93 0.00339
## 
## Likelihood ratio test=9.21  on 1 df, p=0.002407
## n= 2747, number of events= 69 
## [1] ********************************
## [1] Ties dealt with by exact method
## [1] ********************************
## Call:
## coxph(formula = Surv(month, hiv.pos) ~ circumcised, data = df, 
##     ties = ties[method])
## 
##                coef exp(coef) se(coef)     z       p
## circumcised -0.7616    0.4669   0.2591 -2.94 0.00329
## 
## Likelihood ratio test=9.27  on 1 df, p=0.002331
## n= 2747, number of events= 69
# Since all 3 sets of results are SO close to each other, let's use the last one.

round(exp(ph.fit$coefficients),3) # Fitted HAZARD RATIO
## circumcised 
##       0.467
MME = exp(qnorm(c(0.025,0.5,0.975))*summary(ph.fit)$coefficients[1,3] )

round(MME*exp(ph.fit$coefficients),2) # Point Estimate and CI
## [1] 0.28 0.47 0.78

SO, it looks like the fitted hazard ratio for the Cox Model might have been used, and reported as a ‘Risk Ratio’.

JH hopes to cover in class the various methods for handling ties, and to show the logLikelihood that lies behind the parameter-fitting for the Cox model.
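
For the curious, here is a minimal sketch (using the expanded df built above) of the Breslow-ties version of that partial log-likelihood for a single binary covariate, evaluated over a grid of candidate coefficients; its maximum sits essentially at the coxph (Breslow) estimate.

# Breslow partial log-likelihood for the single covariate 'circumcised'
breslow.logPL = function(beta) {
  logPL = 0
  for(t in sort(unique(df$month[df$hiv.pos==1]))) {
    cases   = df$hiv.pos==1 & df$month==t      # the events at follow-up visit t
    at.risk = df$month >= t                    # the riskset at visit t
    logPL = logPL + beta*sum(df$circumcised[cases]) -
            sum(cases)*log( sum( exp(beta*df$circumcised[at.risk]) ) )
  }
  logPL
}
betas = seq(-1.5, 0.5, 0.01)
betas[ which.max( sapply(betas, breslow.logPL) ) ]   # roughly -0.76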

Q 11 Uganda Adult Circumcision Trial

  1. Regarding the authors’ use of the term ‘Cumulative incidence per 100 person-years’ in the last row of Table 3
    .
    It is misleading and inaccurate. The 2 rates are incidence densities or event rates over the man-years of follow-up.

From Abstract:

HIV incidence over 24 months was 0·66 cases per 100 person-years in the intervention group and 1·33 cases per 100 person-years in the control group (estimated efficacy of intervention 51%, 95% CI 16–72; p=0·006).

The as-treated efficacy was 55% (95% CI 22–75; p=0·002); efficacy from the Kaplan-Meier time-to-HIV-detection as-treated analysis was 60% (30–77; p=0·003)

  1. The reporting of a single incidence (hazard) rate ratio of 0.49 (or ‘an estimated efficacy of intervention of 51%’) over the full 2 years might be questioned. Admittedly, the CIs for the three follow-up-time-specific ratios all overlap, but the reports speak of some participants not respecting the advice to refrain from sexual contact until the circumcision had fully healed, and this behaviour may have contributed to the lower protection in the early follow-up months.

It would be good to learn what the ratio was once the 2-year follow-up was complete. As it stands, the overall ratio is more heavily influenced (weighted) by the larger number of events in the earlier follow-up experience, so it is somewhat arbitrary (see the quick check after the data are read in below).

ds=read.table(
  "http://www.biostat.mcgill.ca/hanley/c634/HIV/UgandaTrialData.txt",
  header=TRUE)

ds
##   t.from t.to circumcision n.at.risk n.seroconversions person.years
## 1    0.0  0.5            1      2263                14       1172.1
## 2    0.0  0.5            0      2319                19       1206.7
## 3    0.5  1.0            1      2235                 5       1190.7
## 4    0.5  1.0            0      2229                14       1176.3
## 5    1.0  2.0            1       964                 3        989.7
## 6    1.0  2.0            0       980                12       1008.7
# re-arrange so the data for both arms are in the same row.

data=cbind(ds[seq(1,5,2),c(1,2,4,5,6)],ds[seq(2,6,2),4:6])
names(data)=c("t.0","t.end", "n.C","n.HIV.C","py.C",  "n.NotC","n.HIV.NotC","py.NotC") ; data
##   t.0 t.end  n.C n.HIV.C   py.C n.NotC n.HIV.NotC py.NotC
## 1 0.0   0.5 2263      14 1172.1   2319         19  1206.7
## 3 0.5   1.0 2235       5 1190.7   2229         14  1176.3
## 5 1.0   2.0  964       3  989.7    980         12  1008.7
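
Before plotting, a quick check of the overall rates quoted in the Abstract, and of how the rate ratio varies across the three follow-up windows (a small sketch, using the data just read in):

# overall rates, per 100 PY, in the circumcised and not-circumcised arms
with(data, 100*c( sum(n.HIV.C)/sum(py.C), sum(n.HIV.NotC)/sum(py.NotC) ) )
# roughly 0.66 and 1.33, as in the Abstract

# window-specific rate ratios (0-6, 6-12 and 12-24 months)
with(data, (n.HIV.C/py.C) / (n.HIV.NotC/py.NotC) )
# roughly 0.76, 0.35 and 0.25 : the apparent protection is weakest in the
# earliest window, which also contributes the most events (and hence the most
# weight) to the single overall ratio
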
plot(c(0,2),c(-2500,2500),
     col="white", xlab="Follup-Up Time (Years)",
                  ylab="PopulationSize",
     cex.lab=1.75,
     cex.axis=1.75)
for(i in 1:3){
   rect( data$t.0[i],   0,
       data$t.end[i],   data$n.C[i],
       col="lightgreen",border=NA)
   rx = runif(data$n.HIV.C[i],  data$t.0[i],data$t.end[i] )
   ry = runif(data$n.HIV.C[i],            
              0.01*data$n.C[i],0.99*data$n.C[i])
   points(rx,ry,cex=1,pch=19)
   rect( data$t.0[i],   0,
       data$t.end[i],   -data$n.C[i],
       col="lightblue",border=NA)
   rx = runif(data$n.HIV.NotC[i], data$t.0[i],data$t.end[i] )
   ry = runif(data$n.HIV.NotC[i],      
              0.01*data$n.NotC[i], 0.99*data$n.NotC[i])
   points(rx,-ry,cex=1,pch=19)
}

By the way, a number of us are trying to promote this type of graph and have proposed calling it a Population-Time plot. See Figure 1 here, or Figure 3 here, and the various vignettes illustrating the casebase package for R. You saw another one last week when we were dealing with the discontinuations in the population time in which the IUD was being used.

Q 12 Bangladesh Cholera Vaccine Trial

See the 2015 Lancet article on the ‘Feasibility and effectiveness of oral cholera vaccine in an urban endemic setting in Bangladesh: a cluster randomised open-label trial’.

In the ‘Statistical analysis’ section, the authors state that

We calculated sample size by methods described elsewhere.[Donner and Klar text]. We calculated the intra-cluster correlation for cholera hospital admissions for 2008, and 2009. We assumed 65% efficacy and 65% coverage, yielding 42% overall protective efficacy, with a one-sided test (\(\alpha=0.05\)), 80% power, incidence of 1.6 cases per 1000 people per year, 25% yearly attrition, and 2 years of post-vaccination surveillance. On the basis of these assumptions, we calculated that we would need 236,340 participants (78,780 in each group).

The following R code answers these questions

  1. What (assumed constant) attrition rate [expressed as an instantaneous rate of, say, x losses per 100 participant-days] would generate an annual attrition risk of 25% (so that only 3/4 of those randomized remain under follow-up [‘at risk’] at the end of year 1, and only 9/16 at the end of year 2)?

  2. Assume 50,000 persons were to be randomized to the control arm. Using the attrition rate you just calculated, compute and graph the expected number under surveillance for each of the first 730 days of follow-up. Add up these daily numbers to get the expected total number of person-days (PD) of follow-up.

  3. For the moment, ignore the variance (reciprocal of sample size) inflation caused by the cluster randomization design (i.e., natural cluster-to-cluster variation in attack rates) and by the fact that cholera can also easily spread between persons in the same cluster. Assume instead that the individual attacks are governed by a single homogeneous Poisson process, with \(\lambda_0\) = 1.6/1000 PersonYears (PY) in the ‘control arm’ area.
    .
    Convert this attack rate to an attack rate per person-day or per 100,000 person-days. [For interest, is it higher or lower than the observed rate given in Table 2?]
    Let \(Y_0\) denote the number of attacks in the control arm. Using the rate you just calculated, and the PD from above, calculate E[\(Y_0\)], and (under the no extra-variance assumption) Var[\(Y_0\)].

  4. Assume, for the purposes of hypothesis testing and setting a ‘positivity’ cutoff for a statistical test of the null, that the attack rate (\(\lambda_1\)) for persons in the ‘vaccination only’ area is also 1.6/1000PY, and that the 50,000 persons randomized to this arm are subject to the same attrition rate. Let \(Y_1\) denote the number of attacks in this arm, and let \(d = Y_1 - Y_0.\) [If the 2 amounts of experience were different, we would need to consider the difference in the rates, rather than in the numerators. This `close to 50:50’ randomization makes the planning a bit easier.]
    .
    Under this null assumption, calculate \(\sigma_{d|H_0} = Var[d|H_0]^{1/2}\), and compute \[d_{crit} = E[d|H_0] - 1.96 \times \sigma_{d|H_0}= 0 - 1.96 \times \sigma_{d|H_0}.\] (Note that this implies a 2-sided test with \(\alpha=0.05;\) it appears that the authors used a 1-sided test, so they would have used 1.645 SD’s as their criterion for test positivity). Sketch the distribution of \(d|H_0,\) leaving some space to the left of, and below, the distribution so as to be able to overlay the non-null version. (Given the large expected numbers, it is safe to use Normal approximations to the exact distributions of \(d|H_0\) and \(d|H_1.\))

  5. Under the authors’ assumptions, what is the (non-null, \(H_1\)) value of \(\lambda_1\), of \(d\), and of \(\sigma_{d|H_1}\)? Sketch this distribution of \(d|H_1,\) to the left of the null distribution, and upside down [See diagram in section 4.3.2 (p14) of JH’s Notes on inference for a mean. It is a simpler (one-sample) context, and the alternative is on the positive side of the null, but it gives you the idea. For more on sample size calculations for a comparison of 2 means, see the Notes on comparison of 2 means in the Resources.] so it is easier to distinguish the various tail areas. Then use a normal approximation to the distribution of \(d|H_1\) to calculate what percentage of it lies to the left of \(d_{crit}.\) This percentage is called the power of this size study, i.e., the probability that – assuming the \(\lambda_1\) and \(\lambda_0\) are as specified – the study will yield a `positive’ (i.e., statistically significant) result. [It does not mean, as some investigators sloppily write, that the study has this power to detect a rate ratio reduction of 42%.]

# bangladesh cholera vaccine trial  jh 2015.11.09

# attrition rate

# exp(-730 * rate ) = 9/16

a.rate = -log(9/16)/730

N=50000 # trial and error

n.c = N * exp( - a.rate *(1:730) )

plot(1:730,n.c, ylim=c(0,N),
 col="white", xlab="Time: Follow-Up Day",
              ylab="Population Size (Persons)",
              cex.lab=1.5, cex.axis=1.5)
polygon(c(1:730,730,1),c(n.c,0,0),
 col="lightblue", border=NA )

P1000Y=365*1000
100000*1.6/P1000Y
## [1] 0.4383562
( observed.attack.rate=106/(39327744) )
## [1] 2.695298e-06
PD = sum(n.c); PD    # person days
## [1] 27743226
assumed.attack.rate=1.6/(1000*365)

c(assumed.attack.rate, observed.attack.rate)
## [1] 4.383562e-06 2.695298e-06
E.attacks = assumed.attack.rate * PD
E.attacks
## [1] 121.6141
# assuming for now, Y_0 and Y_1 are Poisson
#  i.e. no extra-Poisson (between-cluster) variation

d.crit = qnorm(0.025) * sqrt(E.attacks + E.attacks)  # SD of d = Y1 - Y0 under H0

#  alt 

Efficacy = 0.65; Coverage=0.65

alt.Rate.Ratio = 1-Efficacy*Coverage;
alt.Rate.Ratio
## [1] 0.5775
alt.E.attacks = alt.Rate.Ratio * E.attacks

c.crit.z.alt=(d.crit-
       (alt.E.attacks-E.attacks) )/
         sqrt(alt.E.attacks+E.attacks)
round(c.crit.z.alt, 3)
## [1] 1.503
round(pnorm(c.crit.z.alt), 3)   # the power, under these assumptions
## [1] 0.934
## in a diagram

E.diff.n.cases.null = E.attacks - E.attacks
V.diff.n.cases.null = E.attacks + E.attacks

E.diff.n.cases.alt  = alt.E.attacks - E.attacks
V.diff.n.cases.alt  = alt.E.attacks + E.attacks

possible.diffs = floor( E.diff.n.cases.alt - 
         3*sqrt(V.diff.n.cases.alt)
                      ):
                 ceiling( E.diff.n.cases.null + 
         3*sqrt(V.diff.n.cases.null)
                      )

prob.mass.fn.null = dnorm(possible.diffs,
                          E.diff.n.cases.null,
                      sqrt(V.diff.n.cases.null)
                         )

prob.mass.fn.alt = dnorm(possible.diffs,
                          E.diff.n.cases.alt,
                      sqrt(V.diff.n.cases.alt)
                         )



plot(possible.diffs,prob.mass.fn.null,
  type="h",col="green",
  ylim=c(-max(prob.mass.fn.alt),
          max(prob.mass.fn.null)),
   xlab="N.cases.vaccine.arm - N.cases.ctl.arm",
   ylab="Prob. of observing.This difference",
   cex.lab=1.5, cex.axis=1.5)
abline(v=0-1.96*sqrt(V.diff.n.cases.null),
col="green",lwd=3)
text(1*sqrt(V.diff.n.cases.null),
     2/3*max(prob.mass.fn.null),
     "NULL",col="green",adj=c(-0.5,0))
SD=sqrt(V.diff.n.cases.null)
h = exp(-1/2)*max(prob.mass.fn.null)
segments( 0,h,floor(SD),h,
         col="green",lwd=2)
text(SD/2,h,toString(round(SD,1)),
   col="green",adj=c(0.5,1.25) )
text(0,0,paste("0 = ",
               toString(round(E.attacks,1)),
               " - ",
               toString(round(E.attacks,1)),
               sep=""),
col="green",
     adj=c(0,1.25) )
points(possible.diffs+0.5,-prob.mass.fn.alt,
  type="h", col="red")
text(E.diff.n.cases.alt-1*sqrt(V.diff.n.cases.alt),
     -2/3*max(prob.mass.fn.null),
     "ALT",col="red",adj=c(1.5,0))

text(E.diff.n.cases.alt,0,
     paste(
     toString(alt.Rate.Ratio),
     " 
     x ",
     toString(round(E.attacks,1)),
     " 
     -     
      ",
     toString(round(E.attacks,1)),
     " 
     = ",
     toString(round(E.diff.n.cases.alt,1)),
     sep=""),
     col="red",
     adj=c(1,-0.1) )
text(E.diff.n.cases.alt,0,"|",col="red")     

OBSERVED DATA IN ACTUAL TRIAL

In the ‘Results’ section, the authors address a measure of the 2-year protection afforded by the vaccine.

They had two ways of obtaining a crude measure:

  1. as \(100 \times (1-RiskRatio),\) where the \(RiskRatio\) is estimated as the ratio of the 2-year (730 day) risks (i.e. approximately \(\widehat{R_1(2y)} = 1-0.9989 = 0.0011\) and \(\widehat{R_0(2y)} = 1-0.9981 = 0.0019\)) obtained from the two Kaplan-Meier curves in Figure 3. This gives a crude estimate of \(100\times(1 - 0.0011/0.0019)\) = 42% protection.

  2. as \(100 \times (1-RateRatio),\) or \(100 \times (1-HazardRatio),\) where the \(RateRatio\) or \(HazardRatio\) is estimated as the ratio of the attack rates calculated over the 730 days (i.e. approximately \(\widehat{\lambda_1} = 65/41,809,947PD = 0.1555\) attacks per 100,000 PD and \(\widehat{\lambda_0} = 106/39,327,744PD = 0.2695\) attacks per 100,000 PD) obtained from the data in the top row of Table 2. This also gives a crude estimate of \(100\times(1 - 0.1555/0.2695)\) = 42% protection.

Assuming no extra-Poisson variation, we have enough information to directly calculate a CI for the second version. We start by calculating a CI for the \(RateRatio\) or \(HazardRatio\), and then compute \(100 \times (1 - CI).\) But instead of working in the ratio scale, we work in the \(\log [Ratio]\) scale, so that \(Var\{ \log [ \widehat{\lambda_1} / \widehat{\lambda_0} ] \} = Var\{ \log [ \widehat{\lambda_1} ] \} + Var\{ \log [ \widehat{\lambda_0} ] \}.\)

Use your results from exercise 0.1 of the `models for intensity rates’ material to work out the variance (and thus a CI) for the log of the ratio, and from it, a CI for the ratio itself. Then, convert this (slightly asymmetric) CI for the ratio into a (similarly asymmetric) CI to accompany the point estimate of the percent protection. Can you think of reasons why their CI is slightly wider?

With a few approximations and interpolations, and again assuming no extra-binomial variation, we have enough information to directly calculate a Greenwood-based CI for the first version.

\[\widehat{RiskRatio} = \widehat{R_1(2y)} / \widehat{R_0(2y)} \]

\[\textrm{Var}\{ \log \widehat{RiskRatio} \} = \textrm{Var}\{ \log [ \widehat{R_1}] \} + \textrm{Var}\{ \log [ \widehat{R_0} ] \} \]

Writing \(S = 1 - R,\) noting that \(\textrm{Var} [ \hat{R} ] = \textrm{Var} [ 1- \hat{R} ] = \textrm{Var} [ \hat{S} ],\) and using the delta method, we can write each of the two Var{log} components as \[ (1/ \hat{R})^2 \times \textrm{Var} [ \hat{R} ] = (1/ \hat{R})^2 \times \textrm{Var} [ \hat{S} ] = (1/ \hat{R})^2 \times \hat{S}^2 \times \sum \{ n_i^{-2} \} \]

where the sum is over the risksets. In this application, each riskset, i.e., \(n_i\), is very large; if the attacks occur on separate days, so that each \(d_i\) is 1, then each \(d_i/[n_i(n_i-d_i)]\) term in the sum in the Greenwood formula can be approximated by \(1/n_i^2\). So, all we are missing for the two components are the 65 different \(n\)’s, i.e. the numbers at risk in the vaccination arm when each of the 65 attacks occurred, and the 106 numbers at risk in the control arm when each of the 106 attacks occurred. Figure 3 indicates that the events are distributed across the 730 days, but because there is more person time nearer to day 1 than day 730 (see your first set of calculations), the 65 events are also concentrated nearer day 1, and so the \(n\)’s at these times tend to be somewhat bigger than the average \(n\).

Generate a best guess for the 65 \(n\)’s at risk, and from them calculate the first variance component for \(\textrm{Var}\{ \log \widehat{RiskRatio} \}.\) Do the same for the other arm (with 106 attacks), and add the two variance components to get the variance of the \(\log RiskRatio\). From this, calculate a CI for \(\log RiskRatio\) and, from it, a CI for \(RiskRatio,\) and, from it, a CI for the \(Percent \ Protection\).

c=c(65,106); PD=c(41809947,39327744)
Rate=c/PD
round(Rate[1]/Rate[2],2)
## [1] 0.58
efficacy=round(100*(1-Rate[1]/Rate[2]),0)
log.ratio=log(Rate[1]/Rate[2]) 

SE.log = sqrt(sum(1/c))

round(100*(1-exp(log.ratio+c(-1.96,0,1.96)*SE.log)),0)
## [1] 58 42 21
round(100*(1-exp(log.ratio+c(-1.645,0,1.645)*SE.log)),0)
## [1] 55 42 25
N1=94675
a.rate = -log(38866/N1)/730
n.c = N1 * exp( - a.rate*(1:730) )
prob = n.c/sum(n.c)
#plot(1:730,prob, ylim=c(0,max(prob)))
t.events1 = sort( sample(1:730,65, prob=prob) )
n.riskset1 = n.c[t.events1]
Sum = sum(1/(n.riskset1^2))
Var.logR1 = (1/0.0011^2) * (0.9989^2) * Sum

N0=80056 
a.rate = -log(37303/N0)/730
n.c = N0 * exp( - a.rate*(1:730) )
prob = n.c/sum(n.c)
t.events0 = sort( sample(1:730,106, prob=prob) )
n.riskset0 = n.c[t.events0]
Sum = sum(1/(n.riskset0^2))
Var.logR0 = (1/0.0019^2) * (0.9981^2) * Sum


Var.logRR = Var.logR1 + Var.logR0

CI.RR = round( 100*exp( log(0.0011/0.0019)+
            c(-1.96,0,1.96)*sqrt(Var.logRR) 
          ), 0)
100-CI.RR   # note: these limits vary a little from run to run (no set.seed)
## [1] 57 42 21

Refer again to the ‘Statistical analysis’ section, where the authors address the intra-cluster correlation for hospital admissions.

Assume, for simplicity, that all clusters have the same (average) cluster size of \(n=9,001,\) so that the variance inflation factor is \(VIF = 1+(n-1) \times ICC = 1 + 9000 \times ICC.\)

Using the same steps as in the power calculation above, and assuming for now that ICC=0, work out what sample size per arm would be required

for the type II error of 20% (80% power) if one uses a test of size \(\alpha=0.05\) (1-sided). Compare this with the requirement calculated by the authors, and deduce the value of ICC they must have used.

[You may also find the ‘Rate ratios’ section in the article ‘Sample Size, Precision and Power Calculations: A Unified Approach’ by Hanley and Moodie in J Biomet Biostat 2:124 in 2011 to be of help. The article can also be found using the REPRINTS link on JH’s homepage. It calculates the requirements in terms of numbers of events, but it is easy to work back from numbers of events to the numbers of person-days required to generate this many events.]
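
One rough way to do this (a sketch only, assuming a 1-sided \(\alpha=0.05\) test of \(d = Y_1 - Y_0\), the same 2-year attrition pattern as above, and the authors' 1.6 per 1000 PY rate and 0.5775 rate ratio):

z.alpha = qnorm(0.95) ; z.beta = qnorm(0.80) ; RR = 0.5775

# solve (1-RR)*E.cases.0 = z.alpha*sqrt(2*E.cases.0) + z.beta*sqrt((1+RR)*E.cases.0)
# for E.cases.0, the expected number of control-arm cases
( E.cases.0 = ( ( z.alpha*sqrt(2) + z.beta*sqrt(1+RR) ) / (1-RR) )^2 )  # about 64

# person-days per arm needed to generate that many expected cases
PD.per.arm = E.cases.0 / ( 1.6/(1000*365) )

# average follow-up per randomized person, given 25% annual attrition
a.rate = -log(9/16)/730
mean.days = sum( exp( -a.rate*(1:730) ) )        # about 555 days per person

( n.per.arm.if.ICC.0 = PD.per.arm / mean.days )  # roughly 26,000 per arm

# the authors asked for 78,780 per arm, so the implied variance inflation
# factor and (with average cluster size 9001) the ICC they must have used:
( VIF = 78780 / n.per.arm.if.ICC.0 )             # roughly 3
( ICC = (VIF - 1) / 9000 )                       # roughly 0.0002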

Q 13 “Marriage-Free Survival”

This article was chosen in part to highlight the silliness of our obsession with ‘survival’ data and our use of emotionally-laden words. The title was ‘Marriage risk of cancer research fellows’; the label on the y-axis was ‘Marriage-Free Survival’.

The median in the K-M curve is about 6 years.

The text of the letter reads

11 of 13 participants (85%) got married by the 17-year cutoff.

Taken in isolation, this is not that helpful, since the ‘survival’ curve reaches 20% by the end of year 7 and 10% by the end of year 8 or so. The confusion is introduced by the mention of the ‘17-year cutoff.’

Where does this 17 years come from, given that the survival curve ends at zero at year 8? It comes from the fact that the first fellow entered follow-up in 1993 (the last one entered in 2008) and the analysis was carried out in 2010, some 17 years later. So the 17 is not relevant, and its mention in the results is misleading.

  1. It is easy to determine the times of the 11 marriages, since each marriage results in a downwards jump in the curve. If we add up the 6 single jumps, and the triple jump at the end of year 6, and the double at the end of year 7, our total comes to the 6+3+2 = 11 reported.

  2. It is also possible to determine how far out the two still-unmarried fellows were when the report was prepared. We can do so by noticing where the jumps increase. It is clear to the eye that the size of the jump at the end of year 5 (t = 5y) is larger than the size of the jump at t = 4.5y. So there must be a censored value somewhere in the second half of the 5th year. Likewise, the size of the triple-jump at the end of year 6 is more than 3 times the size of the jump at the end of year 5. So there must be a censored value somewhere in that 6th year (see the small hypothetical sketch below).
    .
    [Note: We shouldn’t say ‘at risk’, but we could say something like ‘still-unmarried’ or ‘had not changed their facebook status to married’ instead. The key is to avoid ‘jargon’ and to instead say what it is in clear language that a lay person would easily understand. The use of the term ‘still unmarried’ emphasizes that marriage (the event) is a transition from the unmarried state to the married state. We are using the word ‘state’ here in the same way that people talk about a person’s ‘Facebook status’.]
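
Here is a small hypothetical sketch (made-up marriage and censoring times, NOT the letter's actual data) of the point: censored observations sitting between events make the later K-M jumps bigger than 1/13 (or 3/13, or 2/13), and that is what lets us locate, roughly, where the two still-unmarried fellows' follow-up ended.

library(survival)

yrs   = c( 1, 2, 3, 4, 4.5, 5, 6, 6, 6, 7, 7,   4.75, 5.5 ) # 11 marriages, 2 censored
event = c( rep(1, 11),                          0,    0   )

fit = summary( survfit( Surv(yrs, event) ~ 1 ) )
cbind( time = fit$time, n.risk = fit$n.risk, n.event = fit$n.event,
       jump = -diff( c(1, fit$surv) ), surv = fit$surv )
# the jumps at t = 5, 6 and 7 exceed 1/13, 3/13 and 2/13, respectively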

Q 14 Aerosolized plus Intravenous Colistin versus Intravenous Colistin Alone for the Treatment of Ventilator-associated Pneumonia.

JH omitted the descriptor subtitle: ‘A Matched Case-Control Study’.

===========

1.. Comment on the authors’ description of their study.

===========

It is NOT a case-control study, or even a so-called ‘case-control study’.

First, some clarifications, from a renowned teacher of epidemiology, whose 2011 book Epidemiological Research: Terms and Concepts should be regarded as the go-to book.

Case – Concerning a sickness or an illness (or whatever else), an instance of it. Note: A person with a case of an illness is not a case of that illness; and a number of cases of an illness is not a group but a series of these (in a group of people).

Case – A common misnomer for a person with a case of a particular illness (especially in the context of ‘case-control’ studies). (Cf. sect. I – 1. 2.)

Case-control study∗ (synonyms: retrospective study, case-history study) – “The observational epidemiological study of persons with the disease (or other outcome variable) of interest and a suitable control group of persons without the disease (comparison group, reference group). The potential relationship of a suspected risk factor or an attribute to the disease is examined by comparing the diseased and nondiseased with regard to how frequently the attribute or factor is present . . . in each of the groups (diseased and nondiseased) [refs.]” [4].
.
Note 1: Instead of the persons involved, studied is an occurrence relation (abstract).
.
Note 2: Even though the term ‘case-control study’ is commonly – including in the I.E.A. dictionary [4] – used as a synonym for ‘case-base study’ and ‘case-referent study,’ the concept of ‘case-control study’ actually is profoundly different from the referent of those two other terms.
.
Note 3: The concept of ‘case-control’ study – or ‘trohoc’ (heteropalindrome of ‘cohort’) study – is seriously malformed. It represents what may be termed the trohoc fallacy (corresponding to the cohort fallacy). A case series (not group) in any epidemiological study has its meaning only in the context of its referent, the study base, in reference to which it presumably is the totality of cases. Its associated non-case series (not group), in turn, has meaning only if it is a fair sample of the person-moments (infinite in number) constituting the population-time of the study base. And once the two series are thus construed, comparison between them – numerator and denominator series – is seen to be profoundly misguided. Very notably, the alternative to causality – confounding, that is – can never be understood in terms of the “case-control” comparison – but only in terms of the etiogenetic contrast in reference to the study base. (See ‘Directionality,’ ‘Overmatching,’ and ‘Etiologic study.’)
.
Note 4: The terms ‘retrospective study’ and ‘case-history study’ have lost favor in comparison with ‘case-control study,’ while the respective concepts – fundamental fallacies – are the very same. (Cf. ‘Etiologic study.’)

JH’s shorter (and less formal) version is a study that allows the ratio of two (time-based) rates

Rate\(_1\) = No. of cases (\(C_1\)) occurring in a certain amount of ‘exposed’ person.time / this amount of person.time (\(PT_1\)).

and

Rate \(_0\) = No. of cases (\(C_0\)) occurring in a certain amount of ‘un-exposed’ person.time / this amount of person.time (\(PT_0\))

to be estimated from the ‘case’ series, and a ‘denominator’ series, a fair sample of the person-moments making up the base in which, or from which, the cases arose. The base comprises \(PT\) = \(PT_1\) + \(PT_0\) units of Population-Time. Say the sample is of size \(n\).

The \(n\) person-moments are classified as either ‘exposed’ or ‘not exposed’ and the \(n_1: n_0\) ratio is used to estimate the relative sizes of the exposed and unexposed segments of the base. In other words, \[\widehat{PT_1} = [n_1/n] \times PT; \ \ \widehat{PT_0} = [n_0/n] \times PT.\]

If we knew the actual sizes of the 2 segments of the base, we would calculate \[RateRatio = \frac{C_1/PT_1}{C_0/PT_0}.\] Since we don’t, we substitute our best estimates of them, to get

\[\widehat{RateRatio} = \frac{C_1/([n_1/n] \times PT)} {C_0/([n_0/n] \times PT)}.\]

Since \(n\) and \(PT\) cancel out, we have \[\widehat{RateRatio} = \frac{C_1/n_1} {C_0/n_0} = \frac{C_1 \times n_0} {C_0 \times n_1} .\] The computational version on the right hand side is the familiar cross-product ratio that is often mistaken for and confused with an odds ratio. It is not a ratio of two odds; it is a ratio of two ‘quasi-rates’ or ‘quasi-incidence densities’.
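
A tiny numerical illustration, with made-up numbers, of that cancellation:

# hypothetical: 90 exposed and 60 unexposed cases arising in a base of
# PT = 1,000,000 person-years; a denominator series of n = 1000 person-moments
# finds 300 of them 'exposed'
C.1 = 90 ; C.0 = 60 ; PT = 1e6
n.1 = 300 ; n.0 = 700 ; n = n.1 + n.0

rate.ratio    = ( C.1 / ((n.1/n)*PT) ) / ( C.0 / ((n.0/n)*PT) )
cross.product = ( C.1 * n.0 ) / ( C.0 * n.1 )
c( rate.ratio, cross.product )   # identical: n and PT cancel out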

===========

2.. Overall, the mortality rate in the ICU was 42% (18 of 43 patients) in the IV colistin group, compared with 24% (10 of 43 patients) in the AS-IV colistin group (P = .066).

Show how the authors arrived at their P = .066 for the all-cause mortality comparison. If you used an online calculator, include a screenshot. If you used R, show the code you used.

===========

Several of you got ‘close’, using various ‘canned’ routines

Some did the calculation by hand (with the help of R). For example:

( z = (18/43 -10/43)/sqrt(
  (18/43) *  (25/43) / 43 + (10/43) * (33/43) / 43 ) )
## [1] 1.878352
round(2*(1-pnorm(z)),4)
## [1] 0.0603

Another possibility might be that they used a ‘continuity correction’. It would reduce the \(X^2\) statistic and thus increase the p-value.

Others, whose teachers told them that you can use separate variances when calculating the SE(diff of 2 proportions) for a CI, but a common (pooled) variance when calculating the SE(diff of 2 proportions) for a test of the null, got

( z = (18/43 -10/43)/sqrt(
  (28/86) *  (58/86) / 43 + (28/86) *  (58/86) / 43 ) )
## [1] 1.840968
round(2*(1-pnorm(z)),3)
## [1] 0.066

which matched theirs.
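
As a quick cross-check (a sketch, using R's built-in prop.test): the uncorrected chi-square is just the square of the pooled-variance z above, while the Yates continuity correction gives a smaller \(X^2\) and a larger p-value.

prop.test( x = c(18, 10), n = c(43, 43), correct = FALSE )  # p close to .066
prop.test( x = c(18, 10), n = c(43, 43), correct = TRUE  )  # larger p-value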

Here are JH’s Notes from when he taught the Intro biostatistics course for our Epidemiology students.

Notice the difference between the SE on page 1 (for the CI) and the SE on page 2 (for the test of the null H). The common \(p\) in the variance is dictated by the null. The CI reflects whatever the 2 proportions are, whereas the SE for the null is calculated under the common \(p\) specified by the \(H_0.\)

It is not widely appreciated by non-statisticians that the square of the z-statistic is the same as the Chi-Square statistic.

Nor, nowadays, do statisticians themselves know the ‘old-timer’ way of calculating the Chi-Square statistic, using nothing but integers. It avoids having to calculate the 4 expected frequencies in the \(2 \times 2\) table, then the 4 \((O-E)^2/E\) values, and finally their sum. This integer-only version of the Chi-Square statistic is illustrated on page 5 of these Notes.

a=18; b= 10; c = 25; d=33;
r1=a+b; r2=c+d; c1=43; c2=43; n=c1+c2;

( Ch.Sq.statistic = n*(a*d - b*c)^2/(r1*r2*c1*c2) )
## [1] 3.389163
z^2
## [1] 3.389163
round( pchisq(Ch.Sq.statistic,1,lower.tail=FALSE),3)
## [1] 0.066

===========

3.. How the curves were fitted, and the distribution of the times of the 10 and of the 18 deaths.

===========

Suspiciously, the 6 single jumps in the solid line curve are all of size 1/10 = 0.1, and the 2 double jumps are both of size 2/10 = 0.2. This would have to mean that the survival times of the 43-10 = 33 patients who did not die are all censored before the 6th day, the day of the first death. THIS DOES NOT MAKE MUCH SENSE.

Also, the fact that the survival curve goes down to 0 is very suspicious. One would think that the censored survival times are all past the 30 days, so the curve should end with a horizontal straight line at a height of \(S\) = 33/43 = 0.77.

The most logical conclusion is that the Kaplan-Meier curve was calculated just using the survival times of the 10 patients who died. The answer to the next question will support this.

===========

4.. How can the log-rank test give the reported P = .888 for all-cause mortality?

===========

THANKS to James for his ‘data extraction’

library(survival)

for(version in c("Dead Patients Only","All Patients")){
  
# the 10 deaths, without/with 33 censored included
  
t1 =c(6,6, 7, 8, 9, 13, 14,14, 25, 27)
dead1 = rep(1,length(t1))

# the 18, with 25 censored
t0 =c(5,5, 6,6,6, 7,7, 8, 9, 10, 
      12,12, 16,16,16, 20, 29, 30)
dead0 = rep(1,length(t0))
      
if(version=="All Patients"){
   t1 = c(t1, rep(30,33) ) 
   dead1 = c(dead1, rep(0,33) )
   
   t0 =c(t0, rep(30,25))
   dead0 = c( dead0, rep(0,25) )
}  

Day = c(t1, t0)
iv.plus = c(rep(1,length(t1)), 
            rep(0,length(t0)) )
dead = c(dead1,dead0)

km = survfit(Surv(Day,dead)~iv.plus)

plot(km, ylab="Proportion Still Alive",
         xlab="Day",
         cex.lab=1.5, cex.axis=1.5,
         lwd=5,
     col=c("black","red"),
     main=version)
         
abline(h=seq(0,1,0.1),col="lightblue")

log.rank = survdiff(Surv(Day,dead)~iv.plus)
print(log.rank)

# more decimals

print(c(        round(log.rank$chisq,3), 
  round(pchisq(log.rank$chisq,1,lower.tail=FALSE),3)
       )
     )

if(version=="All Patients"){ # show details
  
  print(noquote(""))
  print(noquote("DETAILS  .. by JH"))
  print(noquote(""))
  risk.set.day = sort( unique(Day[dead==1]) )
  print(noquote(""))
   print(noquote("Risksets on these days..."))
  print(risk.set.day)
  
   
  n.tables = length(risk.set.day) 
  m = matrix(NA,n.tables,7)
  for(table in 1:n.tables){
    table.day = risk.set.day[table]
    
    m[table,1] = table.day
    
    m[table,2] = sum( dead[ dead==1 &  Day == table.day & 
                            iv.plus ==1 ] )               # a
    m[table,3] = sum( Day >= table.day & 
                      iv.plus ==1  ) - m[table,2]  # b
    m[table,4] = sum( dead[ dead==1 &  Day == table.day & 
                            iv.plus ==0 ] )              #  c
    m[table,5] = sum( Day >= table.day & 
                            iv.plus ==0  ) - m[table,4]  # d
    
    r1 = m[table,2]+ m[table,3]
    r2 = m[table,4]+ m[table,5]
    c1 = m[table,2]+ m[table,4]
    c2 = m[table,3]+ m[table,5]
    
    n = r1+r2
    
    E.0 = r1*c1/n
    
    V.0 = r1*r2*c1*c2/(n^3)
  
    m[table,6] = E.0  # R
    m[table,7] = V.0  # V
    
    text(table.day-0.1, 1 - table.day / 30,
         toString(m[table,2]),adj=c(1,-0.2))
    text(table.day, 1 - table.day / 30,
         toString(m[table,3]),adj=c(0,-0.2))
    text(table.day-0.1, 1 - table.day / 30,
         toString(m[table,4]),adj=c(1,1.2))
    text(table.day, 1 - table.day / 30,
         toString(m[table,5]),adj=c(0,1.2))
    
    text(table.day -1.5, 1 - table.day / 30,
         toString(round(m[table,2]-E.0,1)),
         adj=c(1,0.5), col="blue",cex=1.4)
    
    text(table.day -3, 1 - table.day / 30,
         toString(round(V.0,4)),adj=c(1,0.5), col="blue",cex=1.4)
  }
  
  text(0, 0.15," V[ a | H0 ] and ( a - E[ a | H0 ] )", 
       col="blue",cex=2.5,adj=c(0,0.5))
  
  print(noquote(""))
   print(noquote("Sum(a), Sum(E.a.0), Sum(V.0)"))
   print(noquote(""))
   
  sums = apply(m,2,sum)[c(2,6,7)]
  
  print (round( sums,4))
  
  print(noquote(""))
   print(noquote("( Sum(a) - Sum(E.a.0) )^2 / Sum(V.0)"))
   print(noquote(""))
  
  print( (sums[1]-sums[2])^2 / sums[3]  )
  
}

} # who (dead only, all)

## Call:
## survdiff(formula = Surv(Day, dead) ~ iv.plus)
## 
##            N Observed Expected (O-E)^2/E (O-E)^2/V
## iv.plus=0 18       18    18.33   0.00586    0.0199
## iv.plus=1 10       10     9.67   0.01111    0.0199
## 
##  Chisq= 0  on 1 degrees of freedom, p= 0.9 
## [1] 0.020 0.888

## Call:
## survdiff(formula = Surv(Day, dead) ~ iv.plus)
## 
##            N Observed Expected (O-E)^2/E (O-E)^2/V
## iv.plus=0 43       18     13.2      1.78      3.43
## iv.plus=1 43       10     14.8      1.58      3.43
## 
##  Chisq= 3.4  on 1 degrees of freedom, p= 0.06 
## [1] 3.427 0.064
## [1] 
## [1] DETAILS  .. by JH
## [1] 
## [1] 
## [1] Risksets on these days...
##  [1]  5  6  7  8  9 10 12 13 14 16 20 25 27 29 30
## [1] 
## [1] Sum(a), Sum(E.a.0), Sum(V.0)
## [1] 
## [1] 10.0000 14.8380  6.7355
## [1] 
## [1] ( Sum(a) - Sum(E.a.0) )^2 / Sum(V.0)
## [1] 
## [1] 3.475096

Not surprisingly, the two versions of the log-rank test give very different results. The comparison based only on those who died replicates the P=0.888 the authors reported.

JH’s suggestion is that some of you write a letter to the Editor explaining who you are and why the concerns in the original letter to the Editor are borne out by your analyses, and asking that the journal fix the error. You might also point out that this wrong way of analyzing data is very effective in minimizing any differences in the full data.

Your survey of the web/textbooks found both versions ‘out there’. C&H show us the rationale behind the more compact (1-term) version. It also fits with the more ‘modern’ (M-H) way of computing a chi-square statistic even in the case of a single 2 x 2 table.

Even though Mantel proposed this method, based on just the ‘a’ cell, back in 1959, it does not seem to have permeated the statistics community. Maybe some of the reason is that it is only suited to 2 x 2 tables, and not to larger ones. See page 8 of these notes for the details, and some worked examples.
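
Using the last survdiff fit above (the all-patients one, still stored in log.rank), here is a small sketch of the two versions: the compact 1-term (Mantel) statistic, which is what survdiff reports as its Chisq, and the 2-term sum of \((O-E)^2/E\) that many introductory accounts present.

( one.term = ( log.rank$obs[1] - log.rank$exp[1] )^2 / log.rank$var[1,1] )  # about 3.43
( two.term = sum( ( log.rank$obs - log.rank$exp )^2 / log.rank$exp ) )      # about 3.36
log.rank$chisq   # survdiff's reported Chisq is the 1-term version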

Q 15 Association Between Push-up Exercise Capacity and Future Cardiovascular Events Among Active Adult Men

The full article can be found here.

We will focus on the K-M curves in Figure 1 (and the Incidence Rate Ratios in the top half of Table 2) more to get experience with these entities than to make fair comparisons.

  1. Why, based on the data reported in Table 2, do these crude comparisons over-sell the benefits (for cardiovascular health) of being able to complete more push-ups? (Put another way, why were the authors asked to show Table 3 in addition to Table 2?)
    .
    Answer: because they are not fair comparisons. They compare incidence rates in younger and thus fitter men with those in older and thus less fit men. So (aside from sampling variation) the difference we see could be due to their age difference, or their fitness difference. From these ‘crude’ data, it is not possible to separate the independent effects of age and fitness. We are seeing some amalgam of the two. The COMPARISON is UNFAIR, CONFOUNDED, CONFUSED, DISTORTED. It is not a comparison of ‘like with like’. It does not keep all other factors constant.
    .
    Here are some of JH’s favourite examples of confounding that would easily be understood by anyone, and that he has continued to use [he has another one he is willing to share offline].
    .
    >> The 2 mortality comparisons ( Panama vs USA; and smokers vs. non smokers) in the first few pages of Chapter 1 of Rothman’s Introductory Epidemiology textbook
    .
    >> The female (30%) and male (44%) rates of admission to Berkeley Graduate School in the 1960s (http://www.medicine.mcgill.ca/epidemiology/hanley/c678/mh.pdf)
    .
    >> Exposure to Scientific Theories Affects Women’s Math Performance (Article, raw data supplied by authors, and re-analysis by JH in this all-in-1-pdf [data & JH notes at very end])
    .
    ONE IMPORTANT CORRECTION TO JH’s DIAGRAM at the end of the ‘Women and Math’ material, thanks to Haoyu
    .

Hi Jim,
.
Sorry that I missed your lecture yesterday. I was watching the lecture recording just now, and you talked a lot about one of my favorite words for the past two years, “confounding”. It was excellently explained in a statistical way, but I noticed a small difference in the abstract figure you showed at last, that the bi-directional arrow is not proper if it is a DAG figure. And to be called a confounder in Epi, the direction can only be from the confounder to the “cause” otherwise it’s just a part of the effect of the cause on the outcome. Nevertheless, I am pretty sure that you have already known these things, just in case if it’s a typo.
.
Thanks,
.
Haoyu .

See more on confounding here and here.
.
The ‘adjusted’ comparison in Table 3 does try to make a fair comparison, by using regression models to (synthetically, artificially, mathematically) ‘match’ or ‘handicap’ men on age.

  1. Read the last sentence of the Results section of the Abstract, and identify the table in which this result appears. Comment.
    .
    They took it from Table 2, which merely shows the crude comparison, i.e., the unadjusted (crude, confounded) incidence ratio. This is a sneaky practice and misleads university media offices, journalists and other casual readers who don’t have the time to read the full paper.

  2. From data in the Figure, construct a dataset of 1104 observations (1 per participant) that comes close to the actual dataset, and use it – and the survival package in R – to generate the K-M curves. It will help to work with an electronic version of the Figure, so that you can enlarge it.

  3. Log-rank test

Here is a version for the 0-10 and 11-20 push-ups groups: it is enough to show the issues.

Year=c(0.2,1:11)

set.seed(1234567)

( AT.RISK=t(matrix(
c( 75, 68, 63, 60, 53, 39, 32,28,26,23,17,17,
  200,200,186,184,172,139,118,96,89,78,63,42),12,2) ))
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,]   75   68   63   60   53   39   32   28   26    23    17    17
## [2,]  200  200  186  184  172  139  118   96   89    78    63    42
str(AT.RISK)
##  num [1:2, 1:12] 75 200 68 200 63 186 60 184 53 172 ...
( t.event=list(
 t1=c(1.01,1.10, 1.95,2.11, 2.95,4.6, 4.7,7.8),
 t2=c(1.85,2.03, 3.4 ,4.8,  4.9, 5.35,6.6,7.4, 10.5)) )
## $t1
## [1] 1.01 1.10 1.95 2.11 2.95 4.60 4.70 7.80
## 
## $t2
## [1]  1.85  2.03  3.40  4.80  4.90  5.35  6.60  7.40 10.50
L=c("0-10 Pushups","11-20 Pushups")
par(mfrow=c(1,1), mar=c(0,0,0,0))
plot(-5,-1,xlim=c(-0.5,13),ylim=c(-220,220))
for(no in seq(-75,200,25)){
    text(-0.05,no,
     toString(abs(no)),  adj=c(1,0.5))
    if(abs(no)<50) text(11.05,no,
     toString(abs(no)),  adj=c(0,0.5))
} 
for(t in 0:11) text(t,220, toString(t),  adj=c(0.5,0.5))
text(12.25,220,"Years\nFollowed",adj=c(0.5,0.5))
text(-0.75,-50,"<---------- |   ------   No. Being Followed    ---> ",
adj=c(0,0.5),srt=90)

RT = list(t1=NULL,t0=NULL)
TIMES=list(t1=NULL,t0=NULL)
Tt = sort(unlist(t.event))
 
for(row in 1:2){
    n.t = function(t) approx(Year,AT.RISK[row,],t)$y
    RT[[row]]=n.t( Tt )
    t.n = function(n) {
        if(i <= AT.RISK[row,12]) return(11)
        if(i > AT.RISK[row,12]) return(
          approx(AT.RISK[row,], Year,n)$y )
    }
    Sign = ((row==2)-(row==1))
    nn=AT.RISK[row,1]
    Times=NULL
    for(i in 1:nn){      
        Times=c(Times,t.n(i))   
    }
    TIMES[[row]]=
     c(Times[sample(1:nn,nn-length(t.event[[row]]))],
       t.event[[row]])
       
    for(i in 1:nn){
        ttt = sort(TIMES[[row]])[nn+1-i]
        segments(0, Sign*(i), 
                       ttt, Sign*(i), 
        col=c("grey20","blue")[row] ,lwd=0.1)           
    } 

    segments(unlist(t.event),0,
         unlist(t.event),Sign*200,col="white",lwd=1.5)
    PT= sum(TIMES[[row]])
    n.events = length(t.event[[row]])
    n=n.t(t.event[[row]])
    r = runif(n,0.1,0.9)
    points(t.event[[row]],Sign*r*n,pch=19,cex=0.5,col="red")
    
    lambda.hat = n.events/PT
    text(6.2,Sign*65*row,L[row])
    text(6.2,Sign*50*row,
    paste(toString(round(1000*lambda.hat,1)),"/ 1000 ManYears"),
    col="red")  
}
tt=sort(unlist(t.event))
dy=40
yy=-110 +c(0,-dy,  dy,0,-dy,-2*dy, 0,-dy, 
   dy,0,-dy,-2*dy,0,0,0,-dy,0)

CEX=0.85
A = 0; EA=0; V=0
for(i in 1:length(tt)){
    in.2 = tt[i] %in% t.event[[2]]
    a=0; if(in.2) a=1
    A=A+a
    text(tt[i],yy[i],
    paste("a=",toString(a),sep=""),cex=CEX,font=4,col="blue")
    n2 = sum(TIMES[[2]]>= tt[i])
    n1 = sum(TIMES[[1]]>= tt[i])
    EA = EA+n2/(n1+n2)
    text(tt[i],yy[i]-dy/3.5,
     paste("E: ",toString(round(n2/(n2+n1),2)),sep=""),
     cex=CEX*0.75,col="blue")
    v = 1*(n1+n2-1)*n1*n2/( (n1+n2-1)*(n1+n2)^2 )
    text(tt[i],yy[i]-2*dy/3.5,
     paste("v: ",toString(round(v,2)),sep=""),
     cex=CEX*0.75,col="blue")
    V=V+v
}

ALL.TIMES = c(TIMES[[1]],TIMES[[2]])
              
EVENT.indicator =c(rep(0, 75-8), rep(1,8 ),
                   rep(0,200-9), rep(1,9 ) )
GROUP = c( rep(" 0-10",length(TIMES[[1]])),
           rep("11-20",length(TIMES[[2]])) )

table(GROUP)
## GROUP
##  0-10 11-20 
##    75   200
n = length(GROUP)
df = data.frame(ALL.TIMES , EVENT.indicator, GROUP )

df=df[order(df$ALL.TIMES , df$EVENT.indicator),]
head(df,20); tail(df,20)
##     ALL.TIMES EVENT.indicator GROUP
## 59  0.2000000               0  0-10
## 43  0.3142857               0  0-10
## 190 0.6000000               0 11-20
## 56  0.6571429               0  0-10
## 252 0.7000000               0 11-20
## 66  0.7714286               0  0-10
## 84  0.8000000               0 11-20
## 13  0.8857143               0  0-10
## 194 0.9000000               0 11-20
## 31  1.0000000               0  0-10
## 183 1.0000000               0 11-20
## 68  1.0100000               1  0-10
## 129 1.1000000               0 11-20
## 69  1.1000000               1  0-10
## 40  1.2000000               0  0-10
## 235 1.2000000               0 11-20
## 93  1.3000000               0 11-20
## 256 1.4000000               0 11-20
## 122 1.5000000               0 11-20
## 8   1.6000000               0  0-10
##     ALL.TIMES EVENT.indicator GROUP
## 180        11               0 11-20
## 210        11               0 11-20
## 212        11               0 11-20
## 213        11               0 11-20
## 215        11               0 11-20
## 216        11               0 11-20
## 219        11               0 11-20
## 220        11               0 11-20
## 224        11               0 11-20
## 226        11               0 11-20
## 234        11               0 11-20
## 242        11               0 11-20
## 244        11               0 11-20
## 245        11               0 11-20
## 246        11               0 11-20
## 250        11               0 11-20
## 251        11               0 11-20
## 260        11               0 11-20
## 261        11               0 11-20
## 266        11               0 11-20
library(survival)
logRank = survdiff( Surv(ALL.TIMES,EVENT.indicator) ~ GROUP)

logRank
## Call:
## survdiff(formula = Surv(ALL.TIMES, EVENT.indicator) ~ GROUP)
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## GROUP= 0-10  75        8        4      3.99      5.23
## GROUP=11-20 200        9       13      1.23      5.23
## 
##  Chisq= 5.2  on 1 degrees of freedom, p= 0.02
text(11,-140,"||\n||\n||\n||\n||\n||\n||\n||\n||")
text(12.2,-80,
paste("Sum(a)\n",toString(A)),cex=CEX,font=4,
 col="blue")
text(12.2,-80-3*dy/3.5,
     paste("Sum(E[a])\n",toString(round(EA,1))),
     cex=CEX*0.75,col="blue")
text(12.2,-80-6*dy/3.5,
     paste("Sum(v)\n",toString(round(V,2))),
     cex=CEX*0.75,col="blue")
text(12.2,-100-9*dy/3.5,
     paste("(9-13.2)^2\n-------------\n 2.96\n\n=",
     toString( round((A-EA)^2/V,2))),
     cex=CEX*0.75,col="blue")
     
survdiff(Surv(ALL.TIMES, EVENT.indicator) ~ GROUP)
## Call:
## survdiff(formula = Surv(ALL.TIMES, EVENT.indicator) ~ GROUP)
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## GROUP= 0-10  75        8        4      3.99      5.23
## GROUP=11-20 200        9       13      1.23      5.23
## 
##  Chisq= 5.2  on 1 degrees of freedom, p= 0.02
text(5.7,225,
"\n
              N Observed Expected (O-E)^2/E (O-E)^2/V
GROUP= 0-10  75        8     3.83      4.53      5.86
GROUP=11-20 200        9    13.17      1.32      5.86

 Chisq= 5.9  on 1 degrees of freedom, p= 0.02",
adj=c(0,1),family="mono", cex=1.2 )

KM = survfit( Surv(ALL.TIMES,EVENT.indicator) ~ GROUP)

par(mfrow=c(1,1), mar=c(5,5,4,0))

plot(KM, ylab=
       "Proportion Still Free of Cardiovascular Disease",
         xlab="Follow-Up Year",
         cex.lab=1.5, cex.axis=1.5,
         lwd=5,
     col=c("black","red"),
     main="Those whose Push-up Exercise Capacity was 0-10 (black) vs. 11-20 (red)")

abline(h=seq(0,1,0.1),col="lightblue")

In case you are wondering why it is called the log rank test, maybe this abstract and this presentation at the (Montreal) CRM Workshop Survival Analysis in 2005 and the Annual Conference of Applied Statistics in Ireland in 2006 might help.

  1. Just using the data in the Figure, determine the (approx.) numbers of person years in each of the 5 push-up categories, and compare them with those back-calculated from Table 2.
    .
    If we just average the row of population sizes 75, 68, 63, 60, 53, 39, 32, 28, 26, 23, 17, we get an average population size of 44, and so the area under the curve is about 440 person-years, close to the 8 * 100000 / 1757 = 455.3215709 back-calculated from the reported event rate of 1757 per 100,000 person-years, and the ‘numerator’ of 8 events
plot( c(0,10), c(220,-80), 
      col="white",
      xlab="Follow-up Year",
      ylab="Population-Sizes (ignore the negative sign)",
      cex.axis=1.5, cex.lab=1.5)

AT.RISK=t(matrix(
c( 75, 68, 63, 60, 53, 39, 32,28,26,23,17,
  200,200,186,184,172,139,118,96,89,78,63),11,2) ) 
  
polygon( c(0, 0:10,       10), 
         c(0, AT.RISK[2,], 0), col="grey95",border=NA)

rect(0,0, 10,mean( AT.RISK[2,]) )

polygon( c(0, 0:10,       10), 
        c(0, -AT.RISK[1,], 0), col="grey85",border=NA)

rect(0,0,10,-mean( AT.RISK[1,]) )

text(0, -mean(AT.RISK[1,])/2, 
          toString( mean(AT.RISK[1,]) ), 
          adj=c(-0.25,0.5), cex=2  )
text(5, -mean(AT.RISK[1,]), "10", 
          adj=c(0.5,-0.2) , cex=2 )

Population-Time Plots, and approximations of the area or integral by the area of a rectangle.

8 * 100000 / 1757  # back-calculation of no. of person-years.
## [1] 455.3216
9 * 100000 /625
## [1] 1440
  1. Calculate a crude IRR and 95% CI for the IRR for the 11-20 (index category) versus the 0-10 (reference category) contrast. Compare them with those reported in Table 2. [Hint: work with logIRR, so that its variance is the sum of the variances of the logs of the 2 Poisson random variables – encountered already in exercise 0.1 in the notes on intensity rates:- models / inference / planning ; compute the CI in the log scale, then transfer it back to the IRR scale.]
    .
    With 8 events in 455.3PY, 9 in 1440PY, the rate.ratio = (9/1440)/(8/455.3) = 0.36.
    .
    The SE of the log of the ratio is SE.log = sqrt(1/9 + 1/8) = 0.49, so the multiplicative margin of error is MME = exp( qnorm(c(0.025,0.975))*SE.log) = 0.39, 2.59, so the CI for the rate ratio is round(rate.ratio*MME,2) = 0.14, 0.93, which closely matches the reported one (a short sketch of this calculation appears at the end of this section).
    .
    In case the concept of a multiplicative margin of error is new and mysterious to you, think of it as a ‘shortcut’ that avoids having to take the log of the observed rate ratio, then calculate the CI in the log scale, then transform these limits back to the natural, ratio, scale. Instead we leave the rate ratio in place, and ‘multiply and divide’ it by the back-transformed margins of error. The MMEs are symmetric in the log scale, and allow us to describe the lower limit as ‘x fold’ or ‘x-times’ lower than the point estimate, and the upper limit as ‘x fold’ or ‘x-times’ higher than the point estimate. It encourages us to think in a multiplicative scale, and to get away from thinking that the only limits are \(+/-\) style limits. Now we have \(\div \ \times\) style limits.

  2. Why are the corresponding adjusted IRR (2 vs 1) estimates from model 2 in Table 3 closer to the null (i.e., to IRR=1) than the crude one in Table 2?
    .
    Because the UNadjusted one compared YOUNGER and thus fitter men with OLDER and less fit men. If we level the playing field (to use a term we hear a lot these days) or take away the starting age advantage, then naturally, the event rate ratio (the Incidence Density Ratio, the Hazard Ratio) is less extreme.
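
Here, for completeness, is a short sketch of that crude IRR calculation in R (using the approximate person-years read off the Figure):

events = c(9, 8) ; PY = c(1440, 455.3)       # 11-20 (index) vs 0-10 (reference)
( rate.ratio = (events[1]/PY[1]) / (events[2]/PY[2]) )   # about 0.36
SE.log = sqrt( sum(1/events) )                           # about 0.49
( MME = exp( qnorm(c(0.025, 0.5, 0.975)) * SE.log ) )
round( rate.ratio * MME, 2 )   # roughly (0.14, 0.36, 0.92), in line with the
                               # 0.14 to 0.93 worked out by hand above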