<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>BioStatMatt</title>
	<atom:link href="http://biostatmatt.com/feed" rel="self" type="application/rss+xml" />
	<link>http://biostatmatt.com</link>
	<description></description>
	<lastBuildDate>Fri, 17 May 2013 15:26:28 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Happy 2nd Birthday Luke!</title>
		<link>http://biostatmatt.com/archives/2483</link>
		<comments>http://biostatmatt.com/archives/2483#comments</comments>
		<pubDate>Tue, 23 Apr 2013 00:53:01 +0000</pubDate>
		<dc:creator>BioStatMatt</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[birthday]]></category>
		<category><![CDATA[Luke]]></category>

		<guid isPermaLink="false">http://biostatmatt.com/?p=2483</guid>
		<description><![CDATA[]]></description>
				<content:encoded><![CDATA[<p><a href="http://biostatmatt.com/uploads/LukeBDay2013.jpg"><img src="http://biostatmatt.com/uploads/LukeBDay2013-346x625.jpg" alt="LukeBDay2013" width="346" height="625" class="aligncenter size-large wp-image-2484" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://biostatmatt.com/archives/2483/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Individual vs. Group Incentive for Weight Loss</title>
		<link>http://biostatmatt.com/archives/2466</link>
		<comments>http://biostatmatt.com/archives/2466#comments</comments>
		<pubDate>Thu, 11 Apr 2013 13:45:32 +0000</pubDate>
		<dc:creator>BioStatMatt</dc:creator>
				<category><![CDATA[Criticism]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[article]]></category>
		<category><![CDATA[imputation]]></category>
		<category><![CDATA[journal]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://biostatmatt.com/?p=2466</guid>
		<description><![CDATA[A new Annals of Internal Medicine article describes a study that compares two employer-sponsored financial incentive programs for promoting weight loss among obese employees. I first read about the article at the Pacific Standard. The study design is a randomized controlled prospective trial. The two programs are as follows: Program 1. Obese employees are given [...]]]></description>
				<content:encoded><![CDATA[<p>A new <a href="http://annals.org/article.aspx?articleid=1671710">Annals of Internal Medicine article</a> describes a study that compares two employer-sponsored financial incentive programs for promoting weight loss among obese employees. I first read about the article at the <a href="http://www.psmag.com/health/the-weight-loss-incentive-that-works-better-than-cash-54885">Pacific Standard</a>. The study design is a randomized controlled prospective trial. The two programs are as follows:</p>
<ol>
<li>Program 1. Obese employees are given a monthly weight loss goal. If the goal is reached, the participant receives $100; otherwise, the employer keeps the money. This is called the <em>individual incentive</em>.</li>
<li>Program 2. Obese employees are organized into groups of five, and each participant is given a monthly weight loss goal. A sum of $500 is split evenly among those participants who achieve their monthly weight loss goal. If no participant achieves the monthly goal, the employer keeps the incentive money. This is called the <em>group incentive</em>.</li>
</ol>
<p>The researchers found that the group incentive was associated with greater average weight loss than the individual incentive. This result is especially interesting from a psychological perspective, but I was most drawn to the issue of cost. I found it odd that the authors emphasized that "both designs used the same up-front allocation of resources". Presumably, this is to argue that the second program was more effective at no additional up-front cost. For example, the authors write: "Similar to that in the individual-incentive group, the up-front allocation of incentives for meeting weight-loss goals was $100 per participant per month (totaling $21 000)." However, the authors later write that, over a 24-week period: "Mean earnings were $514.70 (SD, $522.60) in the group-incentive group and $128.60 (SD, $165.50) in the individual-incentive group (mean between-group difference, $386.10 [CI, $201.00 to $571.30]; P < 0.001)." Hence, the group incentive actually cost about $386 more per participant, as one might expect: the group's $500 pot is paid out in full whenever at least one of its five members meets the goal. It's also a little odd that the study sample was mostly women (89%), and the allocation of race/ethnicity across groups was somewhat imbalanced.</p>
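<p>To see why equal up-front allocations can lead to unequal costs, here is a rough sketch of the expected monthly payout per participant. It rests on a deliberately simple, hypothetical model that the article does not use: each participant meets the goal independently with the same probability <tt>p</tt>. In reality, group members may influence one another, so treat this only as an illustration. The function names <tt>payout_individual</tt> and <tt>payout_group</tt> are made up for this example.</p>
<pre>
# Hypothetical model: each participant independently meets the monthly
# goal with probability p (the study does not report such a p).
payout_individual = function(p) 100 * p

# Group incentive: the $500 pot for a group of five is paid out in full
# whenever at least one member meets the goal, i.e., with probability
# 1 - (1 - p)^5. Dividing by the group size gives cost per participant.
payout_group = function(p, size = 5, pot = 500) pot * (1 - (1 - p)^size) / size

p = c(0.10, 0.25, 0.50)
rbind(individual = payout_individual(p), group = payout_group(p))
</pre>
<p>For example, at <tt>p = 0.5</tt> the expected cost is $50 per participant per month under the individual incentive, but about $96.88 under the group incentive. Under this model the group incentive is never cheaper, because 1 - (1 - p)^5 is always at least p.</p>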
<p>I like that the authors used confidence intervals throughout to summarize the differences in average weight loss (and incentive earnings) between groups. They also reported p-values, but I think these were unnecessary. The authors used multiple imputation for missing weights at 24 and 36 weeks. I've always had trouble accepting multiple imputation of outcomes, because the imputed values depend so heavily on the method and model used for imputation. In the appendix, the authors write that weight was imputed "adjusting for incentive group, age, sex, race, education, household income, baseline weight, importance of controlling weight, and confidence in controlling weight". No additional details are given about the model, although the software used to implement the method is listed (SAS PROC MI and MIANALYZE). Finally, I felt this sentence was incomplete: "To maintain the type I error rate while testing the 3 hypotheses of primary interest, we used a Bonferroni correction to define an α of 0.0167 as our threshold for statistical significance." The authors neglected to mention that this approach attempts to control the <em>familywise</em> type I error rate, that is, the probability of at least one false rejection among the three tests. This is an important omission.</p>
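<p>The Bonferroni arithmetic behind the 0.0167 threshold, and what it buys in terms of familywise error, can be checked in a few lines of <tt>R</tt>. The <tt>fwer</tt> helper is hypothetical and assumes the three tests are independent with all null hypotheses true. The Bonferroni bound itself holds under any dependence. The p-values passed to <tt>p.adjust</tt> are made up for illustration.</p>
<pre>
# Bonferroni: test each of m hypotheses at level alpha / m
alpha = 0.05
m = 3
alpha / m            # per-test threshold, about 0.0167 as in the article

# Familywise error rate for m independent tests, all nulls true
# (hypothetical helper; independence is an assumption)
fwer = function(a, m) 1 - (1 - a)^m
fwer(alpha, m)       # about 0.143 with no correction
fwer(alpha / m, m)   # about 0.049 with the Bonferroni threshold

# Equivalent formulation: inflate the p-values instead of shrinking alpha
# (illustrative p-values, not from the article)
p.adjust(c(0.01, 0.02, 0.04), method = "bonferroni")   # 0.03 0.06 0.12
</pre>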
]]></content:encoded>
			<wfw:commentRss>http://biostatmatt.com/archives/2466/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hurray! An R Connections API!</title>
		<link>http://biostatmatt.com/archives/2448</link>
		<comments>http://biostatmatt.com/archives/2448#comments</comments>
		<pubDate>Wed, 03 Apr 2013 13:00:34 +0000</pubDate>
		<dc:creator>BioStatMatt</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://biostatmatt.com/?p=2448</guid>
		<description><![CDATA[I waited until April 3 to post this, so it wouldn't be taken as an April Fool's joke! A recent R News item announces that we now have a bona fide mechanism to create custom connections in R! This makes it possible to implement a custom connection in an R package. Until now, the only [...]]]></description>
				<content:encoded><![CDATA[<p>I waited until April 3 to post this, so it wouldn't be taken as an April Fool's joke! A recent <a href="http://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2013/03/24#n2013-03-24">R News item</a> announces that we now have a <i>bona fide</i> mechanism to create custom connections in R! This makes it possible to implement a custom connection in an R package. Until now, the only feasible alternative was to write a patch (but see also this <a href="http://biostatmatt.com/archives/280">trick</a>), as I had done to implement a <a href="http://biostatmatt.com/archives/564"><tt>tty</tt> connection</a>! Of course, this was a suboptimal solution that required an update for each new version of R. In fact, my last update was for <a href="http://biostatmatt.com/archives/1519">R v2.13.1</a>. This new development will make it much easier to share this code. I will also need to update my first draft of <a href="http://biostatmatt.com/R/R-conn-ints.pdf"><i>R Connection Internals</i></a>, which is now three years old. Perhaps the document is better suited as a package vignette. More to follow later...</p>
<p>So that we know who to thank:</p>
<pre>
matt@deb6box$ svn log -r 62016 http://svn.r-project.org/R/trunk/src/include/R_ext/Connections.h
------------------------------------------------------------------------
r62016 | urbaneks | 2013-02-21 14:29:44 -0500 (Thu, 21 Feb 2013) | 1 line

add API to create custom connections
------------------------------------------------------------------------
</pre>
<p>Here is the header file itself, with credits. It appears that the entire <tt>Rconnection</tt> struct is exposed. But notice the warning!</p>
<pre style="font-size:10px;">
matt@deb6box$ svn blame -r 62016 http://svn.r-project.org/R/trunk/src/include/R_ext/Connections.h
 11656     ripley /*
 11656     ripley  *  R : A Computer Language for Statistical Data Analysis
 62016   urbaneks  *  Copyright (C) 2000-2013   The R Core Team.
 11656     ripley  *
 11656     ripley  *  This program is free software; you can redistribute it and/or modify
 11656     ripley  *  it under the terms of the GNU General Public License as published by
 11656     ripley  *  the Free Software Foundation; either version 2 of the License, or
 11656     ripley  *  (at your option) any later version.
 11656     ripley  *
 11656     ripley  *  This program is distributed in the hope that it will be useful,
 11656     ripley  *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 11656     ripley  *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 11656     ripley  *  GNU General Public License for more details.
 11656     ripley  *
 11656     ripley  *  You should have received a copy of the GNU General Public License
 42308     ripley  *  along with this program; if not, a copy is available at
 42308     ripley  *  http://www.r-project.org/Licenses/
 11656     ripley  */
 11656     ripley 
 62016   urbaneks #ifndef R_EXT_CONNECTIONS_H_
 62016   urbaneks #define R_EXT_CONNECTIONS_H_
 62016   urbaneks 
 11668     ripley #include <R_ext/Boolean.h>
 11668     ripley 
 62016   urbaneks #ifndef NO_C_HEADERS
 62016   urbaneks # include <stddef.h> /* for size_t */
 62016   urbaneks # include <stdarg.h> /* for va_list */
 42677   urbaneks #endif
 42677   urbaneks 
 62016   urbaneks /* IMPORTANT: we do not expect future connection APIs to be
 62016   urbaneks    backward-compatible so if you use this, you *must* check the version
 62016   urbaneks    and proceed only if it matches what you expect
 62016   urbaneks 
 62016   urbaneks    We explicitly reserve the right to change the connection
 62016   urbaneks    implementation without a compatibility layer.
 62016   urbaneks  */
 62016   urbaneks #define R_CONNECTIONS_VERSION 1
 62016   urbaneks 
 45984     ripley /* this allows the opaque pointer definition to be made available 
 44013     ripley    in Rinternals.h */
 16472       luke #ifndef HAVE_RCONNECTION_TYPEDEF
 16472       luke typedef struct Rconn  *Rconnection;
 16472       luke #endif
 16472       luke struct Rconn {
 11656     ripley     char* class;
 11656     ripley     char* description;
 44013     ripley     int enc; /* the encoding of 'description' */
 11656     ripley     char mode[5];
 23228     ripley     Rboolean text, isopen, incomplete, canread, canwrite, canseek, blocking, 
 23228     ripley 	isGzcon;
 18583     ripley     Rboolean (*open)(struct Rconn *);
 11656     ripley     void (*close)(struct Rconn *); /* routine closing after auto open */
 11656     ripley     void (*destroy)(struct Rconn *); /* when closing connection */
 11656     ripley     int (*vfprintf)(struct Rconn *, const char *, va_list);
 11656     ripley     int (*fgetc)(struct Rconn *);
 32497     ripley     int (*fgetc_internal)(struct Rconn *);
 31166     ripley     double (*seek)(struct Rconn *, double, int, int);
 13305     ripley     void (*truncate)(struct Rconn *);
 11656     ripley     int (*fflush)(struct Rconn *);
 11656     ripley     size_t (*read)(void *, size_t, size_t, struct Rconn *);
 11656     ripley     size_t (*write)(const void *, size_t, size_t, struct Rconn *);
 59167     ripley     int nPushBack, posPushBack; /* number of lines, position on top line */
 11656     ripley     char **PushBack;
 12256         pd     int save, save2;
 32492     ripley     char encname[101];
 32492     ripley     /* will be iconv_t, which is a pointer. NULL if not in use */
 32492     ripley     void *inconv, *outconv;
 32497     ripley     /* The idea here is that no MBCS char will ever not fit */
 32497     ripley     char iconvbuff[25], oconvbuff[50], *next, init_out[25];
 32497     ripley     short navail, inavail;
 32497     ripley     Rboolean EOF_signalled;
 44101     ripley     Rboolean UTF8out;
 41765     ripley     void *id;
 41765     ripley     void *ex_ptr;
 11656     ripley     void *private;
 61527     ripley     int status; /* for pipes etc */
 16472       luke };
 11656     ripley 
 62016   urbaneks #ifdef  __cplusplus
 62016   urbaneks extern "C" {
 62016   urbaneks #endif
 11656     ripley 
 62016   urbaneks SEXP   R_new_custom_connection(const char *description, const char *mode, const char *class_name, Rconnection *ptr);
 62016   urbaneks size_t R_ReadConnection(Rconnection con, void *buf, size_t n);
 62016   urbaneks size_t R_WriteConnection(Rconnection con, void *buf, size_t n);
 13366     ripley 
 62016   urbaneks #ifdef  __cplusplus
 62016   urbaneks }
 25961     ripley #endif
 25961     ripley 
 62016   urbaneks #endif
</pre>
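<p>Several fields of the <tt>Rconn</tt> struct above can already be inspected from <tt>R</tt> for any existing connection. <tt>summary()</tt> on a connection reports its description, class, mode, text/binary status, open state, and whether it can read or write. These appear to correspond to the <tt>description</tt>, <tt>class</tt>, <tt>mode</tt>, <tt>text</tt>, <tt>isopen</tt>, <tt>canread</tt>, and <tt>canwrite</tt> fields. This is a quick way to see what a custom connection would need to fill in. The example below uses a built-in <tt>textConnection</tt>, not a custom one.</p>
<pre>
# Inspect a built-in connection; summary() reports several Rconn fields
con  = textConnection("hello")   # read-only text connection
info = summary(con)              # list: description, class, mode, text,
                                 #       opened, can read, can write
close(con)
info$class          # "textConnection" (Rconn 'class')
info$mode           # "r"              (Rconn 'mode')
info$text           # "text"           (Rconn 'text')
info[["can read"]]  # "yes"            (Rconn 'canread')
info[["can write"]] # "no"             (Rconn 'canwrite')
</pre>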
]]></content:encoded>
			<wfw:commentRss>http://biostatmatt.com/archives/2448/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Identify 2/3 vacation photos, win $5 Starbucks gift card</title>
		<link>http://biostatmatt.com/archives/2424</link>
		<comments>http://biostatmatt.com/archives/2424#comments</comments>
		<pubDate>Fri, 29 Mar 2013 23:00:34 +0000</pubDate>
		<dc:creator>BioStatMatt</dc:creator>
				<category><![CDATA[Recreation]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[pictures]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[vacation]]></category>

		<guid isPermaLink="false">http://biostatmatt.com/?p=2424</guid>
		<description><![CDATA[I have a $5 Starbucks gift card that I simply can't remember to use; I've had it since Christmas. So, I will mail it (within the U.S.A.) to the first commenter who can identify the longitude/latitude (within 5km) where two of these three vacation photos were taken (photos below). ***Update 3/30/2013: locations A and C [...]]]></description>
				<content:encoded><![CDATA[<p>I have a $5 Starbucks gift card that I simply can't remember to use; I've had it since Christmas. So, I will mail it (within the U.S.A.) to the first commenter who can identify the longitude/latitude (within 5km) where two of these three vacation photos were taken (photos below). <b>***Update 3/30/2013: locations A and C have been identified***</b>. Each location must be identified by longitude/latitude; names don't count. Here are some short instructions for obtaining longitude/latitude values from Google Maps: <a href="http://www.tech-recipes.com/rx/2403/google_maps_get_latitude_longitude_values/" target="_blank">1</a> <a href="http://www.tech-recipes.com/rx/5519/the-easy-way-to-find-latitude-and-longitude-values-in-google-maps/" target="_blank">2</a>. I know this is a stretch for R-bloggers, but I will be checking longitude/latitude entries using the following <tt>R</tt> code:</p>
<pre style="font-size:10px">
# convert degree, minute, second to decimal degrees
# (deg, min, sec assumed nonnegative; negate the result for west/south)
dms2dec &lt;- function(deg, min, sec) deg + (min + sec/60)/60

# convert degrees to radians
d2r &lt;- function(d) d * pi / 180

# compute great circle distance (km) between two coordinates
# given in radians using the Haversine formula
hgcd &lt;- function(lon1, lat1, lon2, lat2) {
    erth &lt;- 6371 # mean radius of the earth (km)
    dlon &lt;- (lon2 - lon1)
    dlat &lt;- (lat2 - lat1)
    a &lt;- sin(dlat/2)^2 + cos(lat1) * cos(lat2) * sin(dlon/2)^2
    c &lt;- 2 * asin(pmin(1, sqrt(a))) # pmin guards against rounding above 1
    erth * c
}
</pre>
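<p>Here is a sketch of how an entry might be checked against a reference location, assuming both are given as decimal degrees with negative values for west longitude and south latitude. The wrapper <tt>guess_error_km</tt> is a hypothetical helper, not part of the contest code. Compact copies of <tt>d2r</tt> and <tt>hgcd</tt> are included so the block runs on its own.</p>
<pre style="font-size:10px">
# copies of d2r() and hgcd() from above
d2r = function(d) d * pi / 180
hgcd = function(lon1, lat1, lon2, lat2) {
    a = sin((lat2 - lat1)/2)^2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1)/2)^2
    6371 * 2 * asin(pmin(1, sqrt(a)))
}

# hypothetical helper: km between a guess and the true location,
# each given as c(longitude, latitude) in decimal degrees
guess_error_km = function(guess, truth) {
    hgcd(d2r(guess[1]), d2r(guess[2]), d2r(truth[1]), d2r(truth[2]))
}

# sanity check: one degree of longitude along the equator, about 111.2 km
guess_error_km(c(0, 0), c(1, 0))
</pre>
<p>An entry would count if the returned distance is at most 5.</p>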
<div id="attachment_2435" class="wp-caption alignleft" style="width: 310px"><a href="http://biostatmatt.com/uploads/locationC.jpg"><img src="http://biostatmatt.com/uploads/locationC-300x225.jpg" alt="Location C (click to enlarge)" width="300" height="225" class="size-medium wp-image-2435" /></a><p class="wp-caption-text">Location C (click to enlarge)</p></div>
<div id="attachment_2436" class="wp-caption alignleft" style="width: 310px"><a href="http://biostatmatt.com/uploads/locationB.jpg"><img src="http://biostatmatt.com/uploads/locationB-300x225.jpg" alt="Location B (click to enlarge)" width="300" height="225" class="size-medium wp-image-2436" /></a><p class="wp-caption-text">Location B (click to enlarge)</p></div>
<div id="attachment_2437" class="wp-caption alignleft" style="width: 310px"><a href="http://biostatmatt.com/uploads/locationA.jpg"><img src="http://biostatmatt.com/uploads/locationA-300x225.jpg" alt="Location A (click to enlarge)" width="300" height="225" class="size-medium wp-image-2437" /></a><p class="wp-caption-text">Location A (click to enlarge)</p></div>
]]></content:encoded>
			<wfw:commentRss>http://biostatmatt.com/archives/2424/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Generalized Pairs Plot: It&#039;s about time!</title>
		<link>http://biostatmatt.com/archives/2398</link>
		<comments>http://biostatmatt.com/archives/2398#comments</comments>
		<pubDate>Thu, 28 Mar 2013 17:46:25 +0000</pubDate>
		<dc:creator>BioStatMatt</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://biostatmatt.com/?p=2398</guid>
		<description><![CDATA[JW Emerson, WA Green, B Schloerke, J Crowley, D Cook, H Hofmann, H Wickham (2013) The Generalized Pairs Plot. Journal of Computational and Graphical Statistics 22(1). Here's a free preprint version. Until this new paper and implementation by Emerson et al., there were no widely available pairs plots that accommodated both numerical and categorical fields. [...]]]></description>
				<content:encoded><![CDATA[<p>JW Emerson, WA Green, B Schloerke, J Crowley, D Cook, H Hofmann, H Wickham (2013) <a href="http://tandfonline.com/doi/abs/10.1080/10618600.2012.694762">The Generalized Pairs Plot</a>. <a href="http://www.tandfonline.com/toc/ucgs20/22/1">Journal of Computational and Graphical Statistics</a> 22(1). Here's a <a href="http://www.bricol.net/downloads/preprints,MSs,etc./11-7MSforJCGS.pdf">free preprint</a> version.</p>
<p>Until this new paper and implementation by Emerson et al., there were no widely available pairs plots that accommodated both numerical and categorical fields. <b>***Update 3/29/2013: <tt>ggpairs</tt> in the GGally package has been around since 2010***</b>. A browse through the <a href="http://gallery.r-enthusiasts.com/thumbs.php">R Graph Gallery</a> confirms this (as of 1/30/2013). See here too: a post on the <a href="http://www.statmethods.net/graphs/scatterplot.html">Quick-R blog</a>. I had been working on such a plot when I discovered the above article. Hence, I'm using this post to share my work, which I will probably abandon in favor of the above.</p>
<p>Any number of statistical graphics might be used instead of a scatterplot for numeric/numeric pairs, for example a <a href="http://gallery.r-enthusiasts.com/graph/hexbins,111">hexbin plot</a>. A <a href="http://gallery.r-enthusiasts.com/graph/Extended_Sieve_Plots,119">sieve plot</a> or an <a href="http://gallery.r-enthusiasts.com/graph/Association_Plots,56">association plot</a> might be used as an alternative to the mosaicplot for factor/factor pairs. A <a href="http://gallery.r-enthusiasts.com/graph/Beeswarm_Boxplot,163">beeswarm boxplot</a> might be used in place of side-by-side boxplots for numeric/factor pairs.</p>
<p>Here was my provisional version of the generalized pairs plot, which I had called an 'association matrix plot':</p>
<pre style="font-size:8px;">
pairsdf &lt;- function(df, abbr = TRUE, abbr.len = 4) {
    op &lt;- par(mfrow = rep(length(df), 2))
    on.exit(par(op)) # restore the caller's layout when done
    for (row in seq_along(df)) {
        xr &lt;- df[[row]]
        if (is.character(xr) || is.logical(xr))
            xr &lt;- as.factor(xr)
        if (is.factor(xr) &amp;&amp; abbr)
            levels(xr) &lt;- abbreviate(levels(xr), abbr.len)
        for (col in seq_along(df)) {
            xc &lt;- df[[col]]
            if (is.character(xc) || is.logical(xc))
                xc &lt;- as.factor(xc)
            if (is.factor(xc) &amp;&amp; abbr)
                levels(xc) &lt;- abbreviate(levels(xc), abbr.len)
            cnm &lt;- names(df)[col]
            rnm &lt;- names(df)[row]
            if (col == row) {
                plot(c(0, 1), c(0, 1), type = &quot;n&quot;, xaxt = &quot;n&quot;, 
                  yaxt = &quot;n&quot;, bty = &quot;n&quot;, xlab = &quot;&quot;, ylab = &quot;&quot;, 
                  main = &quot;&quot;)
                text(x = 0.5, y = 0.5, labels = cnm, adj = c(0.5, 
                  0.5), cex = 2)
            }
            else {
                iscf &lt;- is.factor(xc)
                iscn &lt;- is.numeric(xc)
                isrf &lt;- is.factor(xr)
                isrn &lt;- is.numeric(xr)
                if (isrf &amp;&amp; iscf) {
                  mosaicplot(table(xc, xr), xlab = cnm, ylab = rnm, 
                    main = &quot;&quot;, las = 2, color = TRUE, cex = 1.1)
                }
                else if (isrn &amp;&amp; iscn) {
                  plot(xc, xr, xlab = cnm, ylab = rnm, main = &quot;&quot;, 
                    las = 2, cex = 1.1)
                }
                else if (isrn &amp;&amp; iscf) {
                  boxplot(xr ~ xc, xlab = cnm, ylab = rnm, main = &quot;&quot;, 
                    las = 2, cex = 1.1)
                }
                else if (isrf &amp;&amp; iscn) {
                  boxplot(xc ~ factor(xr, levels = rev(levels(xr))), 
                    xlab = cnm, ylab = rnm, main = &quot;&quot;, las = 2, 
                    cex = 1.1, horizontal = TRUE)
                }
                else stop(&quot;unrecognized variable type&quot;)
            }
        }
    }
}
</pre>
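<p>The heart of <tt>pairsdf</tt> is its dispatch on variable type. This is a sketch of a hypothetical helper, <tt>panel_types</tt>, that mirrors those rules without drawing anything. It reports which panel would appear in each cell, which can be handy for checking a data set before committing to a large plot. Like <tt>pairsdf</tt>, it treats character and logical columns as factors.</p>
<pre>
# Hypothetical helper: which panel pairsdf() would draw in each cell
panel_types = function(df) {
  kind = vapply(df, function(x) {
    if (is.character(x) || is.logical(x) || is.factor(x)) "factor"
    else if (is.numeric(x)) "numeric"
    else "other"
  }, character(1))
  p = length(df)
  out = matrix("", p, p, dimnames = list(names(df), names(df)))
  for (row in seq_len(p)) {
    for (col in seq_len(p)) {
      key = paste(kind[row], kind[col])   # "row-type column-type"
      out[row, col] = if (row == col) "label" else switch(key,
        "factor factor"   = "mosaicplot",
        "numeric numeric" = "scatterplot",
        "numeric factor"  = "boxplot",
        "factor numeric"  = "horizontal boxplot",
        "unrecognized")
    }
  }
  out
}

panel_types(iris)
</pre>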
<p>Below are several association matrix plots generated by the above function (i.e., <tt>pairsdf</tt>) for data sets found in the <tt>MASS</tt> package. When there are many fields, I recommend using three to four square inches per plot.</p>
<p>It's easy to see that the <tt>coop</tt> data set describes a simple factorial experiment.<br />
<img style="display:block;margin-left:auto;margin-right:auto;" src="http://biostatmatt.com/uploads/coop.png" /><br/><br />
However, the <tt>Rabbit</tt> data clearly arose from a more complicated experiment.<br />
<img style="display:block;margin-left:auto;margin-right:auto;" src="http://biostatmatt.com/uploads/Rabbit.png" /><br/><br />
The fields of the <tt>farms</tt> data set are all of the <tt>factor</tt> class.<br />
<img style="display:block;margin-left:auto;margin-right:auto;" src="http://biostatmatt.com/uploads/farms.png" /></p>
]]></content:encoded>
			<wfw:commentRss>http://biostatmatt.com/archives/2398/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Simulated Power/Precision Analysis</title>
		<link>http://biostatmatt.com/archives/2315</link>
		<comments>http://biostatmatt.com/archives/2315#comments</comments>
		<pubDate>Fri, 22 Feb 2013 04:50:02 +0000</pubDate>
		<dc:creator>BioStatMatt</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[simulation]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://biostatmatt.com/?p=2315</guid>
		<description><![CDATA[I cringe when I see research proposals that describe a sophisticated statistical approach, yet do not evaluate this approach in their power/precision/sample size planning. It's often the case that a simplified version of the proposed statistical approach is used instead. Presumably, this is due to the limited availability of power/precision/sample size planning software for sophisticated [...]]]></description>
				<content:encoded><![CDATA[<p>I cringe when I see research proposals that describe a sophisticated statistical approach, yet do not evaluate this approach in their power/precision/sample size planning. It's often the case that a simplified version of the proposed statistical approach is used instead. Presumably, this is due to the limited availability of power/precision/sample size planning software for sophisticated statistical analyses. </p>
<p>In my own planning, I have defaulted to implementing power/precision analyses with Monte Carlo methods (i.e., simulation). I refer to the approach as "simulated power/precision analysis", but I concede that this may not be the best name. Indeed, there may be a more established name that is unknown to me. This approach initially requires more effort than using one of the many power/precision software packages. However, it's almost always more relevant to the proposed research. With practice, the simulation approach has become second nature, and I use it for complex and simple statistical strategies alike.</p>
<p>Below is an example of the simulation approach to compute the power of a test in a simple crossover design. Whenever a simulated power analysis is implemented, it's necessary to specify (1) how the data will arise, and (2) what statistical procedure will be applied. Note that there is no requirement that the statistical procedure should "match" the data generating mechanism. Rather, it's important that (1) is an accurate reflection of prior belief, and (2) is an accurate representation of the proposed statistical procedure. When (1) and (2) do match, as they do in this example, I am sometimes concerned that the resulting computations are optimistic. </p>
<p>In this example, <span class='MathJax_Preview'><img src='http://biostatmatt.com/wp-content/plugins/latex/cache/tex_7b8b965ad4bca0e41ab51de7b31363a1.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="n" /></span><script type='math/tex'>n</script> patients will be recruited to each of two treatment orders, and every patient receives both treatments. The order of treatments is randomized in a block fashion, so that the design is balanced in this regard. We assume that the data arise from a linear mixed effects model with a random intercept for each patient, a treatment effect, an order effect, and a treatment-order interaction effect. The magnitude of each effect is specified, but may be zero. The statistical procedure is to fit a linear mixed effects model, compute a <span class='MathJax_Preview'>100(1&minus;&alpha;)%</span><script type='math/tex'>100(1-\alpha)\%</script> confidence interval for the treatment effect, and finally make an inference about its significance. We conclude that the treatment effect is significant at level <span class='MathJax_Preview'>&alpha;</span><script type='math/tex'>\alpha</script> when the associated confidence interval excludes zero:</p>
<pre>
# Simulate a crossover design with the formula:
# Response ~ 1 + Treatment + Order + Treatment:Order + (1 | Patient)
# Fit simulated data with linear mixed effects model. Make
# significance decision about treatment effect on the basis
# of 95% confidence interval (i.e., significant if 95% CI fails
# to include zero).

# n - number of patients in each order group
# sdW - within patient standard deviation
# sdB - between patient standard deviation
# beta - coefficient vector c(Intercept, Treatment, Order, Treatment:Order)
# alpha - significance level (CI has coverage 1 - alpha)
simulate &lt;- function(n, sdW=4, sdB=1, beta=c(8, 4, 0, 0), alpha=0.05) {
    require("lme4")
    Patient   &lt;- factor(rep(1:(2*n), each=2))
    Treatment &lt;- c(rep(c("Treatment1", "Treatment2"), n),
                   rep(c("Treatment2", "Treatment1"), n))
    Order     &lt;- rep(c("First", "Second"), 2*n)
    Data      &lt;- data.frame(Patient, Treatment, Order)
    # fixed-effects design; columns are (Intercept), Treatment, Order,
    # Treatment:Order, in the same order as beta
    X         &lt;- model.matrix(~ Treatment * Order, data=Data)
    # one random intercept per patient, plus within-patient noise
    b         &lt;- rnorm(2*n, 0, sdB)
    Data$Response &lt;- drop(X %*% beta) + b[as.integer(Data$Patient)] +
                     rnorm(4*n, 0, sdW)
    Fit &lt;- lmer(Response ~ (1 | Patient) + Treatment * Order, data=Data)
    Est &lt;- fixef(Fit)[2]
    Ste &lt;- sqrt(vcov(Fit)[2,2])
    # TRUE when the Wald CI for the treatment effect excludes zero
    prod(Est + c(-1,1) * qnorm(1-alpha/2) * Ste) &gt; 0
}

# type I error for n=20 (result: 0.059)
#mean(replicate(1000, simulate(n=20, beta=c(8, 0, 0, 0))))

# type I error for n=50 (result: 0.057)
#mean(replicate(1000, simulate(n=50, beta=c(8, 0, 0, 0))))

# type I error for n=20 and order effect 2 (result: 0.062)
#mean(replicate(1000, simulate(n=20, beta=c(8, 0, 2, 0))))

# type I error for n=50 and order effect 2 (result: 0.05)
#mean(replicate(1000, simulate(n=50, beta=c(8, 0, 2, 0))))

# power for n=20 and treatment effect 4 (result: 0.869)
#mean(replicate(1000, simulate(n=20, beta=c(8, 4, 0, 0))))

# power for n=50 and treatment effect 4 (result: 0.997)
#mean(replicate(1000, simulate(n=50, beta=c(8, 4, 0, 0))))
</pre>
<p>Several scenarios are considered under three hypothetical data generating mechanisms (no treatment or order effect, an order effect only, and a treatment effect only): checks on the type I error of the proposed procedure, and its power when the treatment effect is 4. ***update 2013/02/23: commenter Paul rightly points out below that 1000 replications are insufficient for the implied precision of three decimal places!*** It's quite late as I'm writing this, and so I will end the discussion here. Indeed, I am trying to shorten my posts in an effort to make them more frequent! Please do comment if I've left out an important detail!</p>
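<p>Paul's point can be quantified. Each replicate is an independent yes/no outcome, so the Monte Carlo standard error of an estimated rejection rate <tt>p</tt> from <tt>B</tt> replicates is <tt>sqrt(p(1-p)/B)</tt>. The helpers <tt>mcse</tt> and <tt>reps_needed</tt> below are hypothetical names used only for this sketch.</p>
<pre>
# Monte Carlo standard error of a proportion estimated from B replicates
mcse = function(p, B) sqrt(p * (1 - p) / B)
mcse(0.05, 1000)    # about 0.0069 for a type I error near 0.05
mcse(0.87, 1000)    # about 0.0106 for power near 0.87

# replicates needed to reach a target standard error (rounded up)
reps_needed = function(p, se) ceiling(p * (1 - p) / se^2)
reps_needed(0.05, 0.001)   # roughly 47,500
</pre>
<p>With 1000 replicates, a type I error near 0.05 is only pinned down to within about &plusmn;0.014 (two standard errors), so an estimate of 0.059 is not clearly distinguishable from the nominal 0.05. Supporting three decimal places would take tens of thousands of replicates.</p>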
]]></content:encoded>
			<wfw:commentRss>http://biostatmatt.com/archives/2315/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>My fiscal cliff letter to congress</title>
		<link>http://biostatmatt.com/archives/2294</link>
		<comments>http://biostatmatt.com/archives/2294#comments</comments>
		<pubDate>Thu, 06 Dec 2012 14:32:21 +0000</pubDate>
		<dc:creator>BioStatMatt</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[fiscal cliff]]></category>
		<category><![CDATA[government]]></category>
		<category><![CDATA[NIH]]></category>
		<category><![CDATA[NSF]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://biostatmatt.com/?p=2294</guid>
		<description><![CDATA[The ASA recently sent out an email asking its members to contact their representatives in congress to urge them to avoid the 8.2% cuts to NIH, NSF, and federal statistical agencies. I had been meaning to do this, but felt that the ASA letter template was too long. Here is the edited version that I [...]]]></description>
				<content:encoded><![CDATA[<p>The <abbr title="American Statistical Association">ASA</abbr> recently sent out an email asking its members to contact their representatives in congress to urge them to avoid the 8.2% cuts to NIH, NSF, and federal statistical agencies. I had been meaning to do this, but felt that the ASA letter template was too long. Here is the edited version that I sent to <a href="http://blackburn.house.gov/">Rep. Blackburn</a>, <a href="http://www.alexander.senate.gov/">Sen. Alexander</a>, and <a href="http://www.corker.senate.gov/">Sen. Corker</a>:</p>
<blockquote><p>
I am writing to urge you to help avoid the pending 8.2% budget cuts to the NIH, NSF, and federal statistical agencies. I feel that these cuts would be very harmful to the U.S. research infrastructure. These cuts are certain to cause talented young scientists to be driven from research by the disruption to their training and lack of jobs. Funding for research is not the cause of the nation's debt, and slashing research budgets will compromise our future. In closing, I respectfully request that you work with your colleagues to stop these pending budget cuts from taking effect in January.
</p></blockquote>
<p>Here is the original message:</p>
<blockquote><p>
I am writing to urge you to work with your fellow members of Congress and the president to ensure the pending across-the-board budget cuts do not take effect. The 8.2% budget cuts to the National Institutes of Health (NIH), the National Science Foundation (NSF), and the federal statistical agencies would be very harmful to the U.S. research and statistical data infrastructure.</p>
<p>The cuts to NSF and NIH would mean fewer grants at a time when a high proportion of highly rated proposals already go unfunded. This will affect all areas of research and prevent critical projects from being completed. Labs may be forced to close, resulting in layoffs of tens of thousands of researchers. It will take generations to recover the lost talent, as highly trained and dedicated young scientists and engineers will be driven from science by the disruption to their training and lack of jobs. The damage to our nation's health, security, and international competitiveness will be devastating.</p>
<p>The cuts to the statistical agencies could affect our decision making and ultimately cost the taxpayer more money. Data from the federal statistical agencies facilitate i) economic growth and development, ii) smart and efficient government, and iii) the saving of taxpayer money. As an example of the third point, extensive research, testing, and planning are under way now for the 2020 Decennial Census. The GAO has said that, unless major design changes are made, the 2020 Decennial Census could cost the American taxpayer $17 billion more than the 2010 Census. Reducing the U.S. Census Bureau budget could therefore undermine the critical 2020 Decennial Census cost-cutting work now being done.</p>
<p>Funding for research and statistical data is not the cause of the nation's debt, and slashing research budgets will compromise our future. In closing, I respectfully request that you work with your colleagues to stop the pending across-the-board budget cuts from taking effect in January. Federal investment is essential to fund the kind of critical research needed to develop new treatments for debilitating and costly illnesses, foster innovation in engineering, and address the increased demand for better nutrition. We must safeguard and sustain this essential public-private partnership that keeps our nation globally competitive and promotes economic growth and job creation.
</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://biostatmatt.com/archives/2294/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Breakthroughs in the sas7bdat Reverse Engineering Effort</title>
		<link>http://biostatmatt.com/archives/2256</link>
		<comments>http://biostatmatt.com/archives/2256#comments</comments>
		<pubDate>Sat, 03 Nov 2012 13:00:55 +0000</pubDate>
		<dc:creator>BioStatMatt</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[compatibility]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[reproducible]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[sas7bdat]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://biostatmatt.com/?p=2256</guid>
		<description><![CDATA[Due largely to the work of Clint Cummins, the sas7bdat file format has become a bit less shrouded. In particular, we now know the following: how to detect files with compressed data (and fail gracefully) more details about the platform that generated the file (e.g., endianness, OS details) how to read files that were generated [...]]]></description>
				<content:encoded><![CDATA[<p>Due largely to the work of Clint Cummins, the <tt>sas7bdat</tt> file format has become a bit less shrouded. In particular, we now know the following:</p>
<ul>
<li>how to detect files with compressed data (and fail gracefully)</li>
<li>more details about the platform that generated the file (e.g., endianness, OS details)</li>
<li>how to read files that were generated on a 32-bit 'Linux' platform</li>
</ul>
<p>These are significant improvements. The details are documented in the 'sas7bdat' vignette, and online at the <a href="https://github.com/biostatmatt/sas7bdat"><tt>sas7bdat</tt></a> GitHub repository. The revised <tt>R</tt> package will be available on CRAN shortly, but is still EXPERIMENTAL.</p>
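<p>Endianness and 32-bit details matter because the same bytes decode to different integers depending on byte order and field width. The Python sketch below illustrates only that general idea. <tt>decode_uint</tt> is a hypothetical helper, and the 4-byte versus 8-byte field widths are assumptions for illustration. They do not describe the documented <tt>sas7bdat</tt> layout; see the vignette for the actual header offsets and flag values.</p>

```python
import struct


def decode_uint(buf: bytes, offset: int, little_endian: bool, is_64bit: bool) -> int:
    """Decode an unsigned integer whose byte order and width depend on the platform.

    Hypothetical helper: assumes 4-byte fields on 32-bit platforms and 8-byte
    fields on 64-bit platforms, purely for illustration.
    """
    order = "<" if little_endian else ">"   # '<' little-endian, '>' big-endian
    code = "Q" if is_64bit else "I"         # 8-byte vs 4-byte unsigned integer
    (value,) = struct.unpack_from(order + code, buf, offset)
    return value


if __name__ == "__main__":
    raw = bytes([0x01, 0x00, 0x00, 0x00])
    # Same four bytes, two byte orders
    print(decode_uint(raw, 0, little_endian=True, is_64bit=False))
    print(decode_uint(raw, 0, little_endian=False, is_64bit=False))
```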
]]></content:encoded>
			<wfw:commentRss>http://biostatmatt.com/archives/2256/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Informative Graphics on Taxes and the Economy</title>
		<link>http://biostatmatt.com/archives/2258</link>
		<comments>http://biostatmatt.com/archives/2258#comments</comments>
		<pubDate>Fri, 02 Nov 2012 17:10:41 +0000</pubDate>
		<dc:creator>BioStatMatt</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[economy]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[tax]]></category>

		<guid isPermaLink="false">http://biostatmatt.com/?p=2258</guid>
		<description><![CDATA[The nonpartisan Congressional Research Service0 made news1 when a report the service had prepared was withdrawn due to pressure from Republican leaders. The report - titled Taxes and the Economy: An Economic Analysis of the Top Tax Rates Since 19452,3 - addressed some of the evidence for the association between tax rates and economic growth, [...]]]></description>
				<content:encoded><![CDATA[<p>The nonpartisan Congressional Research Service<sup><a href="http://www.loc.gov/crsinfo/">0</a></sup> made news<sup><a href="http://www.nytimes.com/2012/11/02/business/questions-raised-on-withdrawal-of-congressional-research-services-report-on-tax-rates.html">1</a></sup> when a report the service had prepared was withdrawn due to pressure from Republican leaders. </p>
<p>The report - titled <em>Taxes and the Economy: An Economic Analysis of the Top Tax Rates Since 1945</em><sup><a href="http://graphics8.nytimes.com/news/business/0915taxesandeconomy.pdf">2</a>,<a href='http://biostatmatt.com/uploads/0915taxesandeconomy.pdf'>3</a></sup> - addressed some of the evidence for the association between tax rates and economic growth, private savings, and investment. No statistically significant associations were found between tax rate (either the top marginal tax rate or the capital gains tax rate) and the degree of personal savings, personal investment, labor productivity, or per capita GDP growth. Hence, the report authors conclude that there is insufficient evidence to suggest that changes in tax rates affect these economic indicators.</p>
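<p>To see what "no statistically significant association" means operationally, here is a minimal Python sketch of a permutation test for a Pearson correlation. This is a simple stand-in of my own and not the method used in the report. The data in the usage lines are randomly generated placeholders, not the CRS figures.</p>

```python
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, n_perm=10000, seed=1):
    """Two-sided permutation p-value for H0: no association between x and y.

    Shuffling y breaks any pairing with x, so the shuffled correlations
    approximate the null distribution of |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= observed:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    tax = [rng.uniform(0.2, 0.9) for _ in range(30)]   # made-up "tax rates"
    growth = [rng.gauss(2.0, 1.0) for _ in range(30)]  # made-up, independent "growth"
    print(f"r = {pearson_r(tax, growth):.3f}, p = {permutation_pvalue(tax, growth):.3f}")
```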
<p>But, while these authors did not find statistical significance, I found that the associated graphics tell an interesting story:</p>
<h3>Tax Rate vs. Personal Savings and Investment</h3>
<p><a href="http://biostatmatt.com/uploads/TaxSavingsInvestment.svg"><img src="http://biostatmatt.com/uploads/TaxSavingsInvestment.svg" alt="" title="TaxSavingsInvestment" class="aligncenter size-full wp-image-2268" /></a></p>
<h3>Tax Rate vs. Per Capita GDP Growth</h3>
<p><a href="http://biostatmatt.com/uploads/TaxPerCapitaGDPGrowth.svg"><img src="http://biostatmatt.com/uploads/TaxPerCapitaGDPGrowth.svg" alt="" title="TaxPerCapitaGDPGrowth" class="aligncenter size-full wp-image-2269" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://biostatmatt.com/archives/2258/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My Picture in the Vanderbilt Reporter</title>
		<link>http://biostatmatt.com/archives/2217</link>
		<comments>http://biostatmatt.com/archives/2217#comments</comments>
		<pubDate>Thu, 11 Oct 2012 16:43:24 +0000</pubDate>
		<dc:creator>BioStatMatt</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[pictures]]></category>
		<category><![CDATA[The Reporter]]></category>
		<category><![CDATA[Vanderbilt]]></category>

		<guid isPermaLink="false">http://biostatmatt.com/?p=2217</guid>
		<description><![CDATA[My picture made the front page of The Reporter, the Vanderbilt University Medical Center weekly paper, for an article about our department's biostatistics clinics. By sheer luck, I was wearing a tie that day. Unfortunately, it appears that I am gesturing crudely toward Bob Johnson. Although I was not making a crude gesture, Bob appears [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://biostatmatt.com/uploads/Biostat-clinic-SU006-585x321.jpg"><img src="http://biostatmatt.com/uploads/Biostat-clinic-SU006-585x321.jpg" alt="" title="Biostat-clinic-SU006-585x321" width="585" height="321" class="aligncenter size-full wp-image-2218" /></a><br />
My picture made the front page of <em>The Reporter</em>, the Vanderbilt University Medical Center weekly paper, for an <a href="http://news.vanderbilt.edu/2012/09/biostatistics-clinics-help-investigators-hone-studies/">article about our department's biostatistics clinics</a>. By sheer luck, I was wearing a tie that day. Unfortunately, it appears that I am gesturing crudely toward Bob Johnson. Although I was not making a crude gesture, Bob appears to be making a face in reaction.</p>
<p>Alas, my name doesn't appear anywhere in the article or image caption. Also pictured (from left to right; those that I can name off the top of my head) are Nate Mercaldo, Dave Afshartous, Frank Harrell, and Sarah Fletcher.</p>
]]></content:encoded>
			<wfw:commentRss>http://biostatmatt.com/archives/2217/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
