During one of our Department's weekly biostatistics "clinics", a visitor was interested in creating confidence bands for a Gaussian density estimate (or a Gaussian mixture density estimate). The mean, variance, and two "nuisance" parameters, were simultaneously estimated using least-squares. Thus, the approximate sampling variance-covariance matrix (4x4) was readily available. The two nuisance parameters do not directly affect the Gaussian density, but the client was concerned that their correlation with the mean and variance estimates would affect the variance of the density estimate. Of course, this might be the case in general, and a nonparametric bootstrap method might be used to account for this. Nevertheless, I proposed using the delta method, in which the variability of the nuisance parameter estimates do not affect that of the density estimate; a consequence of the normality assumption. This can be verified by fiddling with the parameters below.

The code below implements a Wald-type pointwise 95% confidence band for a test case; I made up the values of the estimated parameters and their approximate variance-covariance matrix (note that the mean and variance estimators are statistically independent). After fiddling with this a bit, it's clear that this delta method approach can perform poorly when the sampling variance is large (e.g., the lower bound of the density estimate can be negative).

```
## bell curve function
bell <- function(dist, mu=0, sig=1, p1=0, p2=0)
exp(-(dist-mu)^2/sig/2)/sqrt(2*pi)/sig
## plot bell curve at default parameters
curve(bell(x), from=-5, to=5, ylim=c(0,0.6),
ylab="density", xlab="distance")
## compute gradient of bell_curve on a grid of distances
dgrid <- seq(-5, 5, 1/50)
bderv <- numericDeriv(
expr=quote(bell(dgrid, mu, sig, p1, p2)),
theta=c("mu","sig","p1","p2"),
rho=list2env(list(dgrid=dgrid,mu=0,sig=1,p1=0,p2=0)))
bgrad <- attr(bderv, 'gradient')
## variance-covariance matrix of mu, sig, p1, and p2
pvcov <- matrix(c(1.0,0.0,0.1,0.0,
0.0,1.0,0.0,0.1,
0.1,0.0,0.2,0.1,
0.0,0.1,0.1,0.2)/100, 4,4)
## approxiamte variance-covariance of bell curve
## induced by variability in parameters
bvcov <- bgrad %*% pvcov %*% t(bgrad)
## add pointwise 95% Wald confidence bands
polygon(x=c(dgrid, rev(dgrid)),
y=c(bderv + qnorm(0.975)*sqrt(diag(bvcov)),
bderv - qnorm(0.975)*sqrt(diag(bvcov))),
col="lightgray", border=NA)
lines(dgrid, bderv, lwd=2)
abline(h=0, lty=3)
```

]]>

Suppose I make a wager with my father in-law that a Democrat will win the 2016 presidential election, and that we both put up $500 with even odds. So, if a Democrat wins, I gain $500, but if a Republican wins, I lose $500.

Further suppose that I can take the other side of this bet with more favorable odds somewhere else, say 6-to-1 favoring the Democrat (the odds are closer than this right now, on Predictit.org). That is, if I bet $150 dollars that a Republican will win against 6:1 odds, then I will gain $750 if the republican wins and lose $150 if the democrat wins.

Now, consider the two possible outcomes (assuming that either a Democrat or Republican are sure to win): 1) A Democrat wins, and I gain $500 dollars on the first bet and lose $150 on the second for a net $350 gain. 2) A Republican wins, and I lose $500 on the first bet and gain $750 on the second for a net gain of $250. Thus, I win at least $250 either way. Of course this would only work if my father in-law agreed to take worse odds (i.e., paid more for the bet) than could be had elsewhere. This is a type of market arbitrage.

]]>Statisticians often need to integrate some function with respect to the multivariate normal (Gaussian) distribution, for example, to compute the standard error of a statistic, or the likelihood function in of a mixed effects model. In many (most?) useful cases, these integrals are intractable, and must be approximated using computational methods. Monte-Carlo integration is one such method; a stochastic method, but its computation can be prohibitively expensive, especially when the integral is computed many times.

Quadrature methods, which are deterministic rather than stochastic, are another set of methods that can be less computationally expensive, especially for lower-dimension integrals. The main idea is to approximate the integral as a weighted summation (the Monte-Carlo method uses an unweighted summation), where the integrand is evaluated on a grid of points selected from the domain of integration. The weights and points are carefully selected to approximate the integral. Gauss-Hermite quadrature is a well-known method for selecting the weights and points for integrals involving the univariate normal distribution. The details of selecting weights and points is complicated, and involves finding the roots of Hermite polynomials (see with Wikipedia link above for details). Fortunately, there already exists some R code (extracted from the ecoreg package; see the `hermite` and `gauss.hermite` functions below) that implements this.

There are natural extensions of univariate Gaussian quadrature for integrals involving the multivariate normal distribution. Peter Jäckel has written a great, short, accessible article about this, and some of the figures below look very similar to those in the article. The extension to multivariate integrals is based on the idea of creating an M-dimensional grid of points by expanding the univariate grid of Gauss-Hermite quadrature points, and then rotating, scaling, and translating those points according to the mean vector and variance-covariance matrix of the multivariate normal distribution over which the integral is calculated (see the `mgauss.hermite` function below, with comments). The weights of the M-variate quadrature points are the product of the corresponding M univariate weights. The following code block lists three functions, where the first two compute the Gauss-Hermite quadrature weights and points in one dimension, and the last computes the weights and points for multivariate Gaussian quadrature.

```
## perform quadrature of multivariate normal
## compute Gauss-Hermite quadrature points and weights
## for a one-dimensional integral.
## points -- number of points
## interlim -- maximum number of Newton-Raphson iterations
hermite <- function (points, z) {
p1 <- 1/pi^0.4
p2 <- 0
for (j in 1:points) {
p3 <- p2
p2 <- p1
p1 <- z * sqrt(2/j) * p2 - sqrt((j - 1)/j) * p3
}
pp <- sqrt(2 * points) * p2
c(p1, pp)
}
gauss.hermite <- function (points, iterlim = 50) {
x <- w <- rep(0, points)
m <- (points + 1)/2
for (i in 1:m) {
z <- if (i == 1)
sqrt(2 * points + 1) - 2 * (2 * points + 1)^(-1/6)
else if (i == 2)
z - sqrt(points)/z
else if (i == 3 || i == 4)
1.9 * z - 0.9 * x[i - 2]
else 2 * z - x[i - 2]
for (j in 1:iterlim) {
z1 <- z
p <- hermite(points, z)
z <- z1 - p[1]/p[2]
if (abs(z - z1) <= 1e-15)
break
}
if (j == iterlim)
warning("iteration limit exceeded")
x[points + 1 - i] <- -(x[i] <- z)
w[i] <- w[points + 1 - i] <- 2/p[2]^2
}
r <- cbind(x * sqrt(2), w/sum(w))
colnames(r) <- c("Points", "Weights")
r
}
## compute multivariate Gaussian quadrature points
## n - number of points each dimension before pruning
## mu - mean vector
## sigma - covariance matrix
## prune - NULL - no pruning; [0-1] - fraction to prune
mgauss.hermite <- function(n, mu, sigma, prune=NULL) {
if(!all(dim(sigma) == length(mu)))
stop("mu and sigma have nonconformable dimensions")
dm <- length(mu)
gh <- gauss.hermite(n)
#idx grows exponentially in n and dm
idx <- as.matrix(expand.grid(rep(list(1:n),dm)))
pts <- matrix(gh[idx,1],nrow(idx),dm)
wts <- apply(matrix(gh[idx,2],nrow(idx),dm), 1, prod)
## prune
if(!is.null(prune)) {
qwt <- quantile(wts, probs=prune)
pts <- pts[wts > qwt,]
wts <- wts[wts > qwt]
}
## rotate, scale, translate points
eig <- eigen(sigma)
rot <- eig$vectors %*% diag(sqrt(eig$values))
pts <- t(rot %*% t(pts) + mu)
return(list(points=pts, weights=wts))
}
```

For some of the M-variate points, the weights are very small, and thus contribute very little to the integral. The notion of ‘pruning’ can be used to eliminate those points with very small weights. The `mgauss.hermite` function does this by trimming a specified fraction of the smallest weights (I’ve found that pruning 20% works well). In two dimensions, when the variance of each variable is 1.0 and correlation 0.5, the pruned points look as follows, where the point diameter is monotonic in the corresponding weight:

```
sig <- matrix(c(1,0.5,0.5,1),2,2)
pts <- mgauss.hermite(10, mu=c(0,0), sigma=sig, prune=0.2)
plot(pts$points, cex=-5/log(pts$weights), pch=19,
xlab=expression(x[1]),
ylab=expression(x[2]))
```

Computing a 2D integral with these points would require 80 evaluations of the integrand (note that there were originally 10 points in each dimension, or 100 points total, but by pruning were reduced to 80). Now, the real question is whether integrating with such points and weights can achieve a similar or better result than a same-sized (or perhaps even much larger) Monte-Carlo method. The following three sections compare these methods (and additionally the delta method, in the last section) in computing means and variances, probabilities, and the standard error of an unusual statistic. Probabilities are an interesting case because of their discreteness, and computing standard errors is, obviously, an important application of quadrature.

The true mean vector is zero, and the true variances and covariance are one and one-half, respectively. The quadrature method is the winner here:

```
library(mvtnorm); set.seed(42)
x80 <- rmvnorm(80, sigma=sig)
x1000 <- rmvnorm(1000, sigma=sig)
### Means
## quadrature with 80 points
colSums(pts$points * pts$weights)
## [1] -6.140989e-21 1.291725e-20
## Monte-Carlo with 80 points
colMeans(x80)
## [1] -0.06886731 -0.03477292
## Monte-Carlo with 1000 points
colMeans(x1000)
## [1] -0.02371047 -0.01133503
### Variances
## quadrature with 80 points
cov.wt(pts$points, wt=pts$weights, method="ML")$cov
## [,1] [,2]
## [1,] 0.9999904 0.4999952
## [2,] 0.4999952 0.9999904
## Monte-Carlo with 80 points
cov(x80)
## [,1] [,2]
## [1,] 0.9838169 0.4958186
## [2,] 0.4958186 1.0174029
## Monte-Carlo with 1000 points
cov(x1000)
## [,1] [,2]
## [1,] 1.0083872 0.4938198
## [2,] 0.4938198 0.9727271
```

Computing probabilities is the same as using an indicator functions as the integrand in this context, which are obviously much more discrete than the integrand for means. It looks like the Monte-Carlo methods may be superior in computing such quantities. For the first probability, the true value is 1/3; for the second, the true value is 1/20:

```
### Probabilities
## P(x1<0, x2<0) = 1/3
## quadrature with 80 points
gfun <- function(x) prod(x<0)
sum(apply(pts$points, 1, gfun) * pts$weights)
## [1] 0.3927074
## Monte-Carlo with 80 points
mean(apply(x80, 1, gfun))
## [1] 0.3625
## Monte-Carlo with 1000 points
mean(apply(x1000, 1, gfun))
## [1] 0.336
## P(x1<q0.05, x2<q0.05) = 0.05
q0.05 <- qmvnorm(0.05, sigma=sig)$quantile
gfun <- function(x) prod(x<q0.05)
## quadrature with 80 points
sum(apply(pts$points, 1, gfun) * pts$weights)
## [1] 0.01911414
## Monte-Carlo with 80 points
mean(apply(x80, 1, gfun))
## [1] 0.0625
## Monte-Carlo with 1000 points
mean(apply(x1000, 1, gfun))
## [1] 0.06
```

Consider a model described by vector of parameters, and an estimator that has an approximate multivariate normal distribution. This is often the case, for example, with ordinary least-squares and maximum likelihood estimators.

As an example, consider the one-compartment pharmacokinetic model with first-order elimination and intravenous bolus injection. The `cfun` function below gives the concentration of drug as a function of time following an IV bolus. The `tfun` function computes the time at which the concentration reaches a given level. As our statistic of interest, consider the amount of time in which the concentration remains above 0.064 g/L, which is four-fold the minimum inhibitory concentration (MIC) of piperacillin, an antibiotic. The figure below illustrates the time-course of piperacillin concentration for a typical patient after a 3g IV bolus.

```
## drug concentration after bolus injection
## one-compartment model
cfun <- function(t,v,k,dose=3)
dose/v*exp(-k*t)
## compute time at which concentration reaches x
tfun <- function(x,v,k,dose=3)
-log(x*v/dose)/k
## compute time at which concentration reaches 0.064
## given parameters p=c(v,k)
gfun <- function(p)
tfun(0.064, p[1], p[2])
## from a recent PK study on piperacillin
log_mu <- c(log_v=3.444, log_k=-2.036)
log_sig <- structure(c(0.0033, -0.0022, -0.0022, 0.0034),
.Dimnames = rep(list(c("log_v", "log_k")),2),
.Dim = c(2L, 2L))
curve(cfun(x, v=exp(log_mu[1]), k=exp(log_mu[2])),
from=0, to=8, n=300,
xlab="Time (h)", ylab="Concentration (g/L)")
lines(x=c(-10,rep(gfun(exp(log_mu)),2)),
y=c(rep(0.064,2), 0), lty=2)
legend("topright", bty="n",
legend=paste0("Time to 0.064 g/L: ",
round(gfun(exp(log_mu)),1), "h"))
```

Since the amount of time in which the concentration remains above 0.064 g/L is a function of the model parameters, sampling variability in the parameter estimates propagate, and thus we can compute a standard error. The "true" standard error is approximately 0.1233, which was computed (ironically) using the Monte Carlo method with 10M sample points. Below, the standard error is approximated using quadrature with 80 points, Monte Carlo with 80 and 1000 points, and the delta method. Each of the methods perform fairly well here. However, this is a fairly 'smooth' statistic. In fact, my motivation for this little experiment is to lay the groundwork to examine a more complex statistic: the probabilities of pharmacokinetic target attainment in a population. These statistics are usually calculated using a Monte Carlo method ("Monte Carlo Simulation" or MCS, in the antibiotic literature), and are thus somewhat discrete. That is, even for large MCSs, the numerical delta method (i.e., where the gradient is computed numerically) can fail miserably.

```
## quadrature with 80 points
pts <- mgauss.hermite(n=10, mu=log_mu, sigma=log_sig, prune=0.2)
cov.wt(matrix(apply(exp(pts$points), 1, gfun), nrow(pts$points),1),
pts$weights, method="ML")$cov
## [,1]
## [1,] 0.1232139
## Monte-Carlo with 80 points
var(apply(exp(rmvnorm(80, mean=log_mu, sigma=log_sig)), 1, gfun))
## [1] 0.123917
## Monte-Carlo with 1000 points
var(apply(exp(rmvnorm(1e3, mean=log_mu, sigma=log_sig)), 1, gfun))
## [1] 0.1219701
## delta method
rho <- list2env(list(log_v=log_mu[1],log_k=log_mu[2],x=0.064))
nd <- attr(numericDeriv(quote(tfun(x,exp(log_v),exp(log_k))),
theta=c('log_v','log_k'),rho=rho), 'gradient')
nd %*% log_sig %*% t(nd)
## [,1]
## [1,] 0.121936
```

The code snippet below creates the above graphic:

```
## radially symmetric kernel (Gussian kernel)
RadSym <- function(u)
exp(-rowSums(u^2)/2) / (2*pi)^(ncol(u)/2)
## multivariate extension of Scott's bandwidth rule
Scott <- function(data)
t(chol(cov(data))) * nrow(data) ^ (-1/(ncol(data)+4))
## compute KDE at x given data
mvkde <- function(x, data, bandwidth=Scott, kernel=RadSym) {
# bandwidth may be a function or matrix
if(is.function(bandwidth))
bandwidth <- bandwidth(data)
u <- t(solve(bandwidth, t(data) - x))
mean(kernel(u))
}
## compute KDE at (matrix) x given data
smvkde <- function(x, ...)
apply(x, 1, mvkde, ...)
## Example with 'airquality' data
## compute bivariate KDE and plot contours
data("airquality")
aq <- subset(airquality, !is.na(Ozone) & !is.na(Solar.R),
select=c("Ozone", "Solar.R"))
## compute density on a grid of Ozone and Solar.R values
dens.Ozone <- seq(min(aq$Ozone),max(aq$Ozone),length.out=100)
dens.Solar.R <- seq(min(aq$Solar.R),max(aq$Solar.R),length.out=100)
dens.grid <- expand.grid(Ozone=dens.Ozone, Solar.R=dens.Solar.R)
dens.vals <- smvkde(dens.grid, data=aq)
## arrange density values into matrix for easy plotting
dens.mtrx <- matrix(dens.vals, 100, 100)
contour(x=dens.Ozone, y=dens.Solar.R, z=dens.mtrx,
xlab="Ozone", ylab="Solar.R")
points(aq$Ozone, aq$Solar.R, pch=20)
## sample and plot 1000 points from bivariate KDE
## assume Gaussian kernel and Scott bandwidth formula
## 1. sample the original data with replacement
n <- 1000; p <- dim(aq)[2]; set.seed(42)
dens.samp <- aq[sample(1:nrow(aq), size=n, replace=TRUE),]
## 2. add variability by sampling from kernel
dens.samp <- dens.samp + matrix(rnorm(n*p), n, p) %*% Scott(aq)
## 3. plot sampled points
points(dens.samp$Ozone, dens.samp$Solar.R, pch=3,
cex=0.4, col=gray(0.4))
legend("bottomright",
c("Original", "Sampled", "KDE Contours"),
pch=c(20,3,NA),lty=c(NA,NA,1),
col=gray(c(0,0.2,0)), bty="n")
```

]]>There is a nice package and paper about this here: http://www.jstatsoft.org/v57/i05/paper. However, the associated code is complex and uses `lattice`. Here's a brief recipe using base graphics that implements the above figure:

```
set.seed(40)
x <- matrix(rgamma(50,1,1),10,5)
x <- x/rowSums(x)
colnames(x) <- c("Strongly Disagree", "Disagree",
"Neutral", "Agree", "Strongly Agree")
rownames(x) <- paste0("Q", 1:nrow(x))
## colors for each category
clrs <- rev(gray.colors(ncol(x))) ## colors
## centering category
acat <- 3 ## "Neutral"
## separation between bars
sepr <- 0.2
## ncol and nrow
nr <- nrow(x)
nc <- ncol(x)
## reorder so that questions 1:nrow(x) go from top down
x <- x[nr:1,]
## compute center offsets
cnof <- apply(x, 1, function(y) {
lo <- if(acat > 1) sum(y[1:(acat-1)]) else 0
hi <- sum(y[1:acat])
lo + (hi-lo)/2
})
## create plot
plot(c(-1,1), c(1,nr), type="n",
ylim=c(1-(1-sepr)/2-sepr,
nr+(1-sepr)/2+sepr),
ylab="", yaxt="n",
xlab="", )
## plot bars
for(i in 1:nr) {
for(j in 1:nc) {
lo <- if(j > 1) sum(x[i,][1:(j-1)]) else 0
hi <- sum(x[i,][1:j])
polygon(x=c(lo, lo, hi, hi)-cnof[i],
y=c(i-(1-sepr)/2, i+(1-sepr)/2,
i+(1-sepr)/2, i-(1-sepr)/2),
col=clrs[j], border=NA)
}
}
## create y-axis
axis(2, at=1:nr, las=2, xpd=NA, labels=rownames(x))
legend("topleft", fill=clrs, bty="n", legend=colnames(x))
## add center line
abline(v=0, lty=2)
```

]]>```
## This script illustrates the nested versus non-nested
## random effects functionality in the R packages lme4 (lmer)
## and nlme (lme).
library("lme4")
library("nlme")
data("Oxide")
Oxide <- as.data.frame(Oxide)
## In the Oxide data, Site is nested in Wafer, which
## is nested in Lot. But, the latter appear crossed:
xtabs(~ Lot + Wafer, Oxide)
## Create a variable that identifies unique Wafers
Oxide$LotWafer <- with(Oxide, interaction(Lot,Wafer))
## For continuous response 'Thickness',
## fit nested model E[y_{ijk}] = a + b_i + g_{ij}
## for Lot i = 1:8 and Wafer j = 1:3 and Site k = 1:3
## where b_i ~ N(0, \sigma_1)
## g_{ij} ~ N(0, \sigma_2)
## and b_i is independent of g_{ij}
## The following four models are identical:
## lme4
lmer(Thickness ~ (1 | Lot/Wafer), data=Oxide)
lmer(Thickness ~ (1 | Lot) + (1 | LotWafer), data=Oxide)
## Note: the equivalence of the above formulations makes
## clear that the intercept indexed by Wafer within Lot
## has the same variance across Lots.
## nlme
lme(Thickness ~ 1, random= ~1 | Lot/Wafer, data=Oxide)
lme(Thickness ~ 1, random=list(~1|Lot, ~1|Wafer), data=Oxide)
## Note: the second formulation illustrates that lme assumes
## nesting in the order that grouping factors are listed. I
## think that this was a poor implementation decision, and
## that the latter should indicate non-nested grouping.
## Fit non-nested model E[y_{ijk}] = a + b_i + g_j
## for Lot i = 1:8 and Wafer j = 1:3 and Site k = 1:3
## where b_i ~ N(0, \sigma_1)
## g_j ~ N(0, \sigma_2)
## and b_i is independent of g_j
## lme4
lmer(Thickness ~ (1 | Lot) + (1 | Wafer), data=Oxide)
lmer(Thickness ~ (1 | Wafer) + (1 | Lot), data=Oxide)
## nlme: There is no 'easy' way to do this with nlme,
## and I couldn't get this to work with nlme. This is a
## trick that gives a random slope for each level of the
## grouping variables, which are indexed by the levels of
## a dummy grouping variable with only one group. We also
## specify, for each grouping factor, that covariance
## matrix is proportional to the identity matrix.
Oxide$Dummy <- factor(1)
Oxide <- groupedData(Thickness ~ 1 | Dummy, Oxide)
lme(Thickness ~ 1, data=Oxide,
random=pdBlocked(list(pdIdent(~ 0 + Lot),
pdIdent(~ 0 + Wafer))))
```

The image below is my interpretation of the nlme (lme) trick for non-nested (crossed) random effects. The idea is to assign a random slope (no intercept) to each level of the grouping factors, which are each indexed by the levels of a dummy variable with that has exactly one level. The pdIdent function ensures that these random effects are uncorrelated and common variance. The pdBlocked function specifies that the random effects are also independent across the two grouping factors.

]]>A Monte-Carlo simulation can be implemented to study the effects of the control group frequencies, the odds ratio associated with treatment allocation (i.e., the 'treatment effect'), and sample size on the power or precision associated with a null hypothesis test or confidence interval for the treatment effect.

In order to simulate this process, it's necessary to specify each of the following:

- control group frequencies
- treatment effect
- sample size
- testing or confidence interval procedure

Ideally, the control group frequencies would be informed by preliminary data, but expert opinion can also be useful. Once specified, the control group frequencies can be converted to intercepts in the POCL model framework. There is an analytical solution for this; see the link above. But, a quick and dirty method is to simulate a large sample from the control group population, and then fit an intercept-only POCL model to those data. The code below demonstrates this, using the `polr` function from the `MASS` package.

```
## load MASS for polr()
library(MASS)
## specify frequencies of 11 ordered categories
prbs <- c(1,5,10,15,20,40,60,80,80,60,40)
prbs <- prbs/sum(prbs)
## sample 1000 observations with probabilities prbs
resp <- factor(replicate(1000, sample(0:10, 1, prob=prbs)),
ordered=TRUE, levels=0:10)
## fit POCL model; extract intercepts (zeta here)
alph <- polr(resp~1)$zeta
```

As in most other types of power analysis, the treatment effect can represent the minimum effect that the study should be designed to detect with a specified degree of power; or in a precision analysis, the maximum confidence interval width in a specified fraction of samples. In this case, the treatment effect is encoded as a log odds ratio, i.e., a slope parameter in the POCL model.

Given the intercept and slope parameters, observations from the POCL model can be simulated with permuted block randomization in blocks of size four to one of two treatment groups as follows:

```
## convenience functions
logit <- function(p) log(1/(1/p-1))
expit <- function(x) 1/(1/exp(x) + 1)
## block randomization
## n - number of randomizations
## m - block size
## levs - levels of treatment
block_rand <- function(n, m, levs=LETTERS[1:m]) {
if(m %% length(levs) != 0)
stop("length(levs) must be a factor of 'm'")
k <- if(n%%m > 0) n%/%m + 1 else n%/%m
l <- m %/% length(levs)
factor(c(replicate(k, sample(rep(levs,l),
length(levs)*l, replace=FALSE))),levels=levs)
}
## simulate from POCL model
## n - sample size
## a - alpha
## b - beta
## levs - levels of outcome
pocl_simulate <- function(n, a, b, levs=0:length(a)) {
dat <- data.frame(Treatment=block_rand(n,4,LETTERS[1:2]))
des <- model.matrix(~ 0 + Treatment, data=dat)
nlev <- length(a) + 1
yalp <- c(-Inf, a, Inf)
xbet <- matrix(c(rep(0, nrow(des)),
rep(des %*% b , nlev-1),
rep(0, nrow(des))), nrow(des), nlev+1)
prbs <- sapply(1:nlev, function(lev) {
yunc <- rep(lev, nrow(des))
expit(yalp[yunc+1] - xbet[cbind(1:nrow(des),yunc+1)]) -
expit(yalp[yunc] - xbet[cbind(1:nrow(des),yunc)])
})
colnames(prbs) <- levs
dat$y <- apply(prbs, 1, function(p) sample(levs, 1, prob=p))
dat$y <- unname(factor(dat$y, levels=levs, ordered=TRUE))
return(dat)
}
```

The testing procedure we consider here is a likelihood ratio test with 5% type-I error rate:

```
## Likelihood ratio test with 0.05 p-value threshold
## block randomization in blocks of size four to one
## of two treatment groups
## dat - data from pocl_simulate
pocl_test <- function(dat) {
fit <- polr(y~Treatment, data=dat)
anova(fit, update(fit, ~.-Treatment))$"Pr(Chi)"[2] < 0.05
}
```

The code below demontrates the calculation of statistical power associated with sample of size 100 and odds ratio 0.25, where the control group frequencies of each category are as specified above. When executed, which takes some time, this gives about 80% power.

```
## power: n=50, OR=0.25
mean(replicate(10000, pocl_test(pocl_simulate(50, a=alph, b=c(0, log(0.25))))))
```

The figure below illustrates the power associated with a sequence of odds ratios. The dashed line represents the nominal type-I error rate 0.05.

Simulation-based power and precision analysis is a very powerful technique, which ensures that the reported statistical power reflects the intended statistical analysis (often times in research proposals, the proposed statistical analysis is not the same as that used to evaluate statistical power). In addition to the simple analysis described above, it is also possible to evaluate an adjusted analysis, i.e., the power to detect a treatment effect after adjustement for covariate effects. Of course, this requires that the latter effects be specified, and that there is some mechanism to simulate covariates. This can be a difficule task, but makes clear that there are many assumptions involved in a realistic power analysis.

Another advantage to simulation-based power analysis is that it requires implementation of the planned statistical procedure before the study begins, which ensures its feasibility and provides an opportunity to consider details that might otherwise be overlooked. Of course, it may also accelerate the 'real' analysis, once the data are collected.

Here is the complete R script:

```
## load MASS for polr()
library(MASS)
## specify frequencies of 11 ordered categories
prbs <- c(1,5,10,15,20,40,60,80,80,60,40)
prbs <- prbs/sum(prbs)
## sample 1000 observations with probabilities prbs
resp <- factor(replicate(1000, sample(0:10, 1, prob=prbs)),
ordered=TRUE, levels=0:10)
## fit POCL model; extract intercepts (zeta here)
alph <- polr(resp~1)$zeta
## convenience functions
logit <- function(p) log(1/(1/p-1))
expit <- function(x) 1/(1/exp(x) + 1)
## block randomization
## n - number of randomizations
## m - block size
## levs - levels of treatment
block_rand <- function(n, m, levs=LETTERS[1:m]) {
if(m %% length(levs) != 0)
stop("length(levs) must be a factor of 'm'")
k <- if(n%%m > 0) n%/%m + 1 else n%/%m
l <- m %/% length(levs)
factor(c(replicate(k, sample(rep(levs,l),
length(levs)*l, replace=FALSE))),levels=levs)
}
## simulate from POCL model
## n - sample size
## a - alpha
## b - beta
## levs - levels of outcome
pocl_simulate <- function(n, a, b, levs=0:length(a)) {
dat <- data.frame(Treatment=block_rand(n,4,LETTERS[1:2]))
des <- model.matrix(~ 0 + Treatment, data=dat)
nlev <- length(a) + 1
yalp <- c(-Inf, a, Inf)
xbet <- matrix(c(rep(0, nrow(des)),
rep(des %*% b , nlev-1),
rep(0, nrow(des))), nrow(des), nlev+1)
prbs <- sapply(1:nlev, function(lev) {
yunc <- rep(lev, nrow(des))
expit(yalp[yunc+1] - xbet[cbind(1:nrow(des),yunc+1)]) -
expit(yalp[yunc] - xbet[cbind(1:nrow(des),yunc)])
})
colnames(prbs) <- levs
dat$y <- apply(prbs, 1, function(p) sample(levs, 1, prob=p))
dat$y <- unname(factor(dat$y, levels=levs, ordered=TRUE))
return(dat)
}
## Likelihood ratio test with 0.05 p-value threshold
## block randomization in blocks of size four to one
## of two treatment groups
## dat - data from pocl_simulate
pocl_test <- function(dat) {
fit <- polr(y~Treatment, data=dat)
anova(fit, update(fit, ~.-Treatment))$"Pr(Chi)"[2] < 0.05
}
## power: n=50, OR=0.25
mean(replicate(10000, pocl_test(pocl_simulate(50, a=alph, b=c(0, log(0.25))))))
```

]]>As part of my course on statistical learning, we created 3D graphics to foster a more intuitive understanding of the various methods that are used to relax the assumption of linearity (in the predictors) in regression and classification methods.

The authors of our text (The Elements of Statistical Learning, 2nd Edition) provide a Mixture Simulation data set that has two continuous predictors and a binary outcome. This data is used to demonstrate classification procedures by plotting classification boundaries in the two predictors, which are determined by one or more surfaces (e.g., a probability surface such as that produced by logistic regression, or multiple intersecting surfaces as in linear discriminant analysis). In our class laboratory, we used the R package `rgl` to create a 3D representation of these surfaces for a variety of semiparametric classification procedures.

Chapter 6 presents local logistic regression and kernel density classification, among other kernel (local) classification and regression methods. Below is the code and graphic (a 2D projection) associated with the local linear logistic regression in these data:

library(rgl) load(url("http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/ESL.mixture.rda")) dat <- ESL.mixture ddat <- data.frame(y=dat$y, x1=dat$x[,1], x2=dat$x[,2]) ## create 3D graphic, rotate to view 2D x1/x2 projection par3d(FOV=1,userMatrix=diag(4)) plot3d(dat$xnew[,1], dat$xnew[,2], dat$prob, type="n", xlab="x1", ylab="x2", zlab="", axes=FALSE, box=TRUE, aspect=1) ## plot points and bounding box x1r <- range(dat$px1) x2r <- range(dat$px2) pts <- plot3d(dat$x[,1], dat$x[,2], 1, type="p", radius=0.5, add=TRUE, col=ifelse(dat$y, "orange", "blue")) lns <- lines3d(x1r[c(1,2,2,1,1)], x2r[c(1,1,2,2,1)], 1) ## draw Bayes (True) classification boundary in blue dat$probm <- with(dat, matrix(prob, length(px1), length(px2))) dat$cls <- with(dat, contourLines(px1, px2, probm, levels=0.5)) pls0 <- lapply(dat$cls, function(p) lines3d(p$x, p$y, z=1, color="blue")) ## compute probabilities plot classification boundary ## associated with local linear logistic regression probs.loc <- apply(dat$xnew, 1, function(x0) { ## smoothing parameter l <- 1/2 ## compute (Gaussian) kernel weights d <- colSums((rbind(ddat$x1, ddat$x2) - x0)^2) k <- exp(-d/2/l^2) ## local fit at x0 fit <- suppressWarnings(glm(y ~ x1 + x2, data=ddat, weights=k, family=binomial(link="logit"))) ## predict at x0 as.numeric(predict(fit, type="response", newdata=as.data.frame(t(x0)))) }) dat$probm.loc <- with(dat, matrix(probs.loc, length(px1), length(px2))) dat$cls.loc <- with(dat, contourLines(px1, px2, probm.loc, levels=0.5)) pls <- lapply(dat$cls.loc, function(p) lines3d(p$x, p$y, z=1)) ## plot probability surface and decision plane sfc <- surface3d(dat$px1, dat$px2, probs.loc, alpha=1.0, color="gray", specular="gray") qds <- quads3d(x1r[c(1,2,2,1)], x2r[c(1,1,2,2)], 0.5, alpha=0.4, color="gray", lit=FALSE)

In the above graphic, the solid blue line represents the true Bayes decision boundary (i.e., {x: Pr("orange"|x) = 0.5}), which is computed from the model used to simulate these data. The probability surface (generated by the local logistic regression) is represented in gray, and the corresponding Bayes decision boundary occurs where the plane f(x) = 0.5 (in light gray) intersects with the probability surface. The solid black line is a projection of this intersection. Here is a link to the interactive version of this graphic: local logistic regression.

Below is the code and graphic associated with the kernel density classification (note: this code below should only be executed after the above code, since the 3D graphic is modified, rather than created anew):

## clear the surface, decision plane, and decision boundary pop3d(id=sfc); pop3d(id=qds) for(pl in pls) pop3d(id=pl) ## kernel density classification ## compute kernel density estimates for each class dens.kde <- lapply(unique(ddat$y), function(uy) { apply(dat$xnew, 1, function(x0) { ## subset to current class dsub <- subset(ddat, y==uy) ## smoothing parameter l <- 1/2 ## kernel density estimate at x0 mean(dnorm(dsub$x1-x0[1], 0, l)*dnorm(dsub$x2-x0[2], 0, l)) }) }) ## compute prior for each class (sample proportion) prir.kde <- table(ddat$y)/length(dat$y) ## compute posterior probability Pr(y=1|x) probs.kde <- prir.kde[2]*dens.kde[[2]]/ (prir.kde[1]*dens.kde[[1]]+prir.kde[2]*dens.kde[[2]]) ## plot classification boundary associated ## with kernel density classification dat$probm.kde <- with(dat, matrix(probs.kde, length(px1), length(px2))) dat$cls.kde <- with(dat, contourLines(px1, px2, probm.kde, levels=0.5)) pls <- lapply(dat$cls.kde, function(p) lines3d(p$x, p$y, z=1)) ## plot probability surface and decision plane sfc <- surface3d(dat$px1, dat$px2, probs.kde, alpha=1.0, color="gray", specular="gray") qds <- quads3d(x1r[c(1,2,2,1)], x2r[c(1,1,2,2)], 0.5, alpha=0.4, color="gray", lit=FALSE)

Here are links to the interactive versions of both graphics: local logistic regression, kernel density classification

]]>