I cringe when I see research proposals that describe a sophisticated statistical approach, yet do not evaluate this approach in their power/precision/sample size planning. It's often the case that a simplified version of the proposed statistical approach is used instead. Presumably, this is due to the limited availability of power/precision/sample size planning software for sophisticated statistical analyses.

In my own planning, I have defaulted to implementing power/precision analyses with Monte Carlo methods (i.e., simulation). I refer to the approach as "simulated power/precision analysis", but I concede that this may not be the best name. Indeed, there may be a more established name that is unknown to me. This approach initially requires more effort than using one of the many power/precision software packages. However, it's almost always more relevant to the proposed research. With practice, the simulation approach has become second nature, and I use it for complex and simple statistical strategies alike.

Below is an example of the simulation approach to compute the power of a test in a simple crossover design. Whenever a simulated power analysis is implemented, it's necessary to specify (1) how the data will arise, and (2) what statistical procedure will be applied. Note that there is no requirement that the statistical procedure should "match" the data generating mechanism. Rather, it's important that (1) is an accurate reflection of prior belief, and (2) is an accurate representation of the proposed statistical procedure. When (1) and (2) do match, as they do in this example, I am sometimes concerned that the resulting computations are optimistic.

In this example, patients will be recruited and given each of two treatments, where the order of treatments is randomized in a block fashion, so that the design is balanced in this regard. We assume that the data arise from a linear mixed effects model, where there is a random intercept for each patient, a treatment effect, an order effect, and a treatment-order interaction effect. The magnitude of each effect is specified, but may be zero. The statistical procedure is to fit a linear mixed effects model, compute a confidence interval for the magnitude of the treatment effect, and finally to make an inference about its significance. We conclude that the treatment effect is (level ) significant when the associated confidence interval fails to include the value zero:

# Simulate a crossover design with the formula: # Response ~ 1 + Treatment + Order + Treatment:Order + (1 | Patient) # Fit simulated data with linear mixed effects model. Make # significance decision about treatment effect on the basis # of 95% confidence interval (i.e., significant if 95% CI fails # to include zero). # n - number of patients in each order group # sdW - within patient standard deviation # sdB - between patient standard deviation # beta - coefficient vector c(Intercept, Treatment, Order, Treatment:Order) simulate <- function(n, sdW=4, sdB=1, beta=c(8, 4, 0, 0), alpha=0.05) { require("lme4") Patient <- as.factor(rep(1:(2*n), rep(2, 2*n))) Treatment <- c(rep(c("Treatment1", "Treatment2"), n), rep(c("Treatment2", "Treatment1"), n)) Order <- rep(c("First", "Second"), 2*n) Data <- data.frame(Patient, Treatment, Order) CMat <- model.matrix(~ Treatment * Order + Patient, data=Data) Response <- CMat %*% c(beta, rnorm(2*n-1, 0, sdB)) + rnorm(4*n, 0, sdW) Data$Response <- Response Fit <- lmer(Response ~ (1 | Patient) + Treatment * Order, data=Data) Est <- fixef(Fit)[2] Ste <- sqrt(vcov(Fit)[2,2]) prod(Est + c(-1,1) * qnorm(1-alpha/2) * Ste) > 0 } # type I error for n=20 (result: 0.059) #mean(replicate(1000, simulate(n=20, beta=c(8, 0, 0, 0)))) # type I error for n=50 (result: 0.057) #mean(replicate(1000, simulate(n=50, beta=c(8, 0, 0, 0)))) # type I error for n=20 and order effect 2 (result: 0.062) #mean(replicate(1000, simulate(n=20, beta=c(8, 0, 2, 0)))) # type I error for n=50 and order effect 2 (result: 0.05) #mean(replicate(1000, simulate(n=50, beta=c(8, 0, 2, 0)))) # power for n=20 and treatment effect 4 (result: 0.869) #mean(replicate(1000, simulate(n=20, beta=c(8, 4, 0, 0)))) # power for n=50 and treatment effect 4 (result: 0.997) #mean(replicate(1000, simulate(n=50, beta=c(8, 4, 0, 0))))

Several scenarios are considered, including some checks on the type I error associated with the proposed procedure, and its power under three hypothetical data generating mechanisms. ***update 2012/02/23: commenter Paul rightly points out below that 1000 replications is insufficient for the implied precision of three decimal places!*** It's quite late as I'm writing this, and so I will end the discussion here. Indeed, I am trying to shorten my posts in an effort to make them more frequent! Please do comment if I've left out an important detail!