By Daniala Weir and Deepa Jahagirdar

We all learn about basic power calculations in Stats 101. But when it comes time to actually do one, it’s as if we know nothing at all. In our methods group discussion on October 28th, we talked about this challenging aspect of every study. In the simplest case, for example when the effect measure is an unadjusted difference in proportions, software (Stata, SAS, R) and online power calculators are your best friends. Gillian Ainsworth provided us with a few great examples, including an overview by Dr. Hanley and Dr. Moodie.
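For that simplest case, the calculation can even be done directly in R: power.prop.test() in the base stats package solves for sample size, power, or detectable difference for an unadjusted comparison of two proportions. A quick sketch (the proportions and power target below are made-up illustrative values):

```r
# sample size per group to detect a difference in proportions
# of 0.20 vs 0.30 with 80% power at a two-sided alpha of 0.05
power.prop.test(p1 = 0.20, p2 = 0.30, power = 0.80, sig.level = 0.05)

# or, fixing the sample size per group, solve for the power achieved
power.prop.test(n = 300, p1 = 0.20, p2 = 0.30, sig.level = 0.05)
```

Leaving exactly one of n, power, p1/p2, or sig.level unspecified tells the function which quantity to solve for.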

But let’s face it: the simplest case rarely applies. Simultaneously accounting for confounding, correlation, and ranges of prevalence for outcomes and exposures requires methods beyond those online calculators. In that case, it is best to simulate data. While the word ‘simulation’ can scare a lot of people off, Brooke Levis provided some very useful examples from her own research, along with R code written with Andrea Benedetti to carry out her power calculations by simulation. The code is pasted below.

Finally, Dr. Hanley gave some important advice: “Every study is valuable for answering an overarching research question… Just because you don’t have enough power to conduct the study you want doesn’t mean it shouldn’t be conducted at all. Think about it as contributing to an overall meta-analysis on a particular research question.”

R Code for Power Calculation Simulations (credit to Brooke Levis and Dr. Andrea Benedetti):

#this function generates a dataset of size n with a binary outcome, binary exposure, and binary confounder
#the exposure prevalence is given by prevx
#the confounder prevalence is given by prevconf
#the outcome prevalence is given by prevy
#the OR between the confounder and x is ORconfx
#the OR between the confounder and y is ORconfy
#the OR between the exposure and y is ORxy
#nreps is the number of times the data is generated and analyzed
#for each data set, a crude model and an adjusted model are fit and the significance of the exposure beta is assessed

getpower <- function(n, prevx, prevconf, prevy, ORconfx, ORconfy, ORxy, nreps){

  #make a data frame to hold the results
  res <- matrix(NA, ncol=11, nrow=nreps)
  res <- as.data.frame(res)
  colnames(res) <- c("n", "prevx", "prevconf", "prevy", "ORconfx", "ORconfy", "ORxy",
                     "pvaladjmodel", "sigadjmodel", "pvalcrudemodel", "sigcrudemodel")

  for(i in 1:nreps){

    #generate the binary exposure - input prevalence of exposure
    x <- rbinom(n, 1, prevx)

    #generate the binary confounder - prevalence of confounder and OR between exposure and confounder
    b0confx <- log(prevconf/(1-prevconf))
    b1confx <- log(ORconfx)
    regeqxconf <- b0confx + b1confx*x
    conf <- rbinom(n, 1, exp(regeqxconf)/(1+exp(regeqxconf)))

    #generate the binary outcome - prevalence of outcome, OR between exposure and outcome, and OR between confounder and outcome
    b0 <- log(prevy/(1-prevy))
    b1confy <- log(ORconfy)
    b1xy <- log(ORxy)
    regeq <- b0 + b1confy*conf + b1xy*x
    y <- rbinom(n, 1, exp(regeq)/(1+exp(regeq)))

    #adjusted model
    m1 <- glm(y ~ x + conf, family=binomial)

    #get p value for exposure beta
    res[i,]$pvaladjmodel <- summary(m1)$coef[2,4]

    #is it significant?
    res[i,]$sigadjmodel <- ifelse(summary(m1)$coef[2,4] < 0.05, 1, 0)

    #crude model
    m0 <- glm(y ~ x, family=binomial)

    #get p value for exposure beta
    res[i,]$pvalcrudemodel <- summary(m0)$coef[2,4]

    #is it significant?
    res[i,]$sigcrudemodel <- ifelse(summary(m0)$coef[2,4] < 0.05, 1, 0)

    #hold onto data generation params
    res[i,]$n <- n
    res[i,]$prevx <- prevx
    res[i,]$prevconf <- prevconf
    res[i,]$prevy <- prevy
    res[i,]$ORconfx <- ORconfx
    res[i,]$ORconfy <- ORconfy
    res[i,]$ORxy <- ORxy
  }

  #return the results
  res
}

#call the function
p1 <- getpower(n=400, prevx=.5, prevconf=.1, prevy=.2, ORconfx=2, ORconfy=2, ORxy=2, nreps=500)

#the mean of sigadjmodel (and sigcrudemodel) estimates power under each model
colMeans(p1)
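To find the smallest sample size that reaches a target power, a simulation like the one above can be wrapped in a loop over candidate values of n. A simplified, self-contained sketch (crude logistic model only, no confounder; the function name estpower and the parameter values are illustrative, not from the discussion):

```r
#simplified power-by-simulation for a crude logistic model (no confounder)
estpower <- function(n, prevx, prevy, ORxy, nreps = 200){
  sig <- replicate(nreps, {
    x <- rbinom(n, 1, prevx)
    #plogis() is the inverse-logit, i.e. exp(z)/(1+exp(z))
    p <- plogis(log(prevy/(1-prevy)) + log(ORxy)*x)
    y <- rbinom(n, 1, p)
    summary(glm(y ~ x, family = binomial))$coef[2, 4] < 0.05
  })
  #proportion of replicates with a significant exposure beta
  mean(sig)
}

set.seed(123)
#estimated power at a few candidate total sample sizes
sapply(c(100, 200, 400), function(n) estpower(n, prevx = .5, prevy = .2, ORxy = 2))
```

Increasing nreps tightens the Monte Carlo error of each power estimate; Brooke’s getpower() above does the same thing while also simulating and adjusting for the confounder.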