Outlier Detection with DPM Slides from JSM 2011

Here are the 14 slides I used during my talk at the Joint Statistical Meetings 2011: shotwell-jsm-2011.pdf. I'm trying hard to minimize the text in my presentation slides, but this usually requires that I practice more. Hence, you can tell which talks I have practiced thoroughly by how little text is in the slides 🙂. Below are a few notes to accompany the slides (in numerical order):

This is the title slide. The work presented is joint with my advisor Elizabeth Slate and was recently accepted to appear in the journal Bayesian Analysis.
Dirichlet Process Mixture (DPM). This slide presents hierarchical notation for the DPM, and illustrates how (implicit) clustering occurs among draws from DP-distributed distributions.
Product Partition Model (PPM). This is the PPM representation of the DPM in the previous slide. The partition parameter 'z' makes the data partition explicit. I think this model is easier to describe and understand than the DPM representation on the previous slide. Note that PPMs are a much larger class of models than DPMs. The PPM represents a DPM only when the prior distribution over 'z' takes the form of the expression given in the slide.
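The prior over 'z' can be made concrete with a minimal Python sketch. It assumes that the expression on the slide is the standard partition prior induced by a DPM with precision α, namely p(z) = α^K · ∏ Γ(n_k) · Γ(α) / Γ(α + n), where K is the number of clusters and n_k are the cluster sizes. The function name is hypothetical and is not part of profdpm.

```python
import math


def dpm_log_partition_prior(cluster_sizes, alpha):
    """Log prior mass of a partition with the given cluster sizes.

    Assumes the standard DPM-induced partition prior:
    p(z) = alpha^K * prod_k Gamma(n_k) * Gamma(alpha) / Gamma(alpha + n).
    """
    n = sum(cluster_sizes)   # total number of observations
    k = len(cluster_sizes)   # number of clusters
    return (
        k * math.log(alpha)
        + sum(math.lgamma(size) for size in cluster_sizes)
        + math.lgamma(alpha)
        - math.lgamma(alpha + n)
    )


# With two observations there are only two partitions: together or apart.
together = math.exp(dpm_log_partition_prior([2], alpha=0.5))
apart = math.exp(dpm_log_partition_prior([1, 1], alpha=0.5))
```

With only two observations the two prior masses sum to one, which is a quick sanity check that the expression is a proper distribution over partitions.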
Outlier Detection Using Partitioning. When we do clustering, we can think of 'small' clusters as outlying, relative to other clusters. The trick is to decide what 'small' means. The '1% of n' rule prescribes that clusters are considered small when they contain at most 1% of the total number of observations.
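As a rough illustration, the '1% of n' rule can be sketched in a few lines of Python. The function name, the label encoding, and the toy data are assumptions made for this example only.

```python
from collections import Counter


def small_clusters(labels, fraction=0.01):
    """Return the labels of clusters holding at most `fraction` of the data."""
    n = len(labels)
    sizes = Counter(labels)  # cluster label -> number of observations
    return sorted(label for label, size in sizes.items() if size <= fraction * n)


# Toy partition of 300 observations: two large clusters and two tiny ones.
labels = [0] * 197 + [1] * 100 + [2] * 2 + [3] * 1
outlying = small_clusters(labels)  # clusters 2 and 3 are flagged
```

The sketch applies the threshold without rounding. Whether 1% of n is rounded up or down is a separate choice that the rule as stated here does not settle.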
Quantifying Evidence to Detect Outliers: Questions. Partition estimation, or clustering, isn't enough to make inferences about outliers. These are some key unanswered questions.
A Criterion for Outlier Detection: Setup. These are some candidate partitions. The first consists of three clusters, where clusters 2 and 3 consist of just one observation apiece. The remaining four candidate partitions are formed by merging one or both of clusters 2 and 3 with cluster 1, or with one another. The key point here is that outlier detection may be cast as a decision between the first candidate (the 'outlier partition') and the remaining four candidate partitions.
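The candidate set for this three-cluster example can be enumerated with a short Python sketch. It lists every way of merging whole clusters, which for three clusters gives exactly the four alternatives described above. The function names and toy data are hypothetical, and with more than one non-outlying cluster this brute-force enumeration would also include merges that are not relevant to outlier detection.

```python
def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` in a block of its own ...
        yield [[first]] + partition
        # ... or add it to each existing block in turn.
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]


def merged_candidates(clusters):
    """Return all partitions formed by merging at least two clusters."""
    candidates = []
    for grouping in set_partitions(list(range(len(clusters)))):
        if len(grouping) == len(clusters):
            continue  # nothing merged: this is the outlier partition itself
        candidates.append(
            [sorted(x for idx in block for x in clusters[idx]) for block in grouping]
        )
    return candidates


# Cluster 1 holds five observations; clusters 2 and 3 are singletons.
outlier_partition = [[0, 1, 2, 3, 4], [5], [6]]
alternatives = merged_candidates(outlier_partition)
```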
A Criterion for Outlier Detection: The Trick. This slide illustrates how, under the decision principle of largest posterior mass (yes, yes, zero-one loss), the fixed-precision DPM imposes a lower bound on the Bayes factor favoring the outlier partition versus any partition formed by merging one or more outlier clusters. The inverse DPM precision parameter is then interpreted as the fold increase in said Bayes factor required for each detected outlier.
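One way to see where such a bound can come from is sketched below in Python. It assumes the standard DPM partition prior, under which the prior odds of keeping two clusters of sizes a and b separate, versus merging them, are α · Γ(a)Γ(b) / Γ(a + b), which is at most α. If the outlier partition has the larger posterior mass, the Bayes factor times these prior odds must be at least 1, so the Bayes factor is at least 1/α. The function names are hypothetical, and the argument on the slide may differ in its details.

```python
import math


def prior_odds_split_vs_merged(size_a, size_b, alpha):
    """Prior odds of keeping two clusters separate versus merging them.

    Assumes the standard DPM partition prior, where the odds reduce to
    alpha * Gamma(a) * Gamma(b) / Gamma(a + b).
    """
    log_odds = (
        math.log(alpha)
        + math.lgamma(size_a)
        + math.lgamma(size_b)
        - math.lgamma(size_a + size_b)
    )
    return math.exp(log_odds)


def min_bayes_factor(size_a, size_b, alpha):
    """Smallest Bayes factor for which the split partition wins a posteriori."""
    return 1.0 / prior_odds_split_vs_merged(size_a, size_b, alpha)


alpha = 1 / 150
two_singletons = min_bayes_factor(1, 1, alpha)     # equals 1 / alpha
singleton_vs_ten = min_bayes_factor(1, 10, alpha)  # equals 10 / alpha
```

The bound 1/α is attained when two singleton clusters are compared, and the required Bayes factor only grows as the clusters involved get larger.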
A Criterion for Outlier Detection: How to Fix α. Since the inverse precision parameter forms a lower bound on a Bayes factor, it's natural to consider an established scale of evidence for Bayes factors.
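As an illustration, the sketch below maps a Bayes factor to an evidence category and back to a precision value. It assumes that the scale in question is the one proposed by Kass and Raftery (1995), with cut points at 3, 20, and 150, and it treats a Bayes factor of exactly 150 as 'very strong'. Both function names are hypothetical.

```python
def evidence_category(bayes_factor):
    """Label a Bayes factor using cut points at 3, 20, and 150.

    Assumption: these are the Kass and Raftery (1995) categories.
    """
    if bayes_factor < 1:
        return "favors the alternative"
    if bayes_factor < 3:
        return "not worth more than a bare mention"
    if bayes_factor < 20:
        return "positive"
    if bayes_factor < 150:
        return "strong"
    return "very strong"


def precision_for_evidence(min_bayes_factor):
    """DPM precision whose inverse equals the desired Bayes factor bound."""
    return 1.0 / min_bayes_factor


alpha = precision_for_evidence(150)  # 1/150
label = evidence_category(1 / alpha)
```

Choosing the precision this way means that the desired level of evidence is fixed first, and the precision parameter follows from it.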
A Criterion for Outlier Detection: Nice Properties. This slide is self-explanatory.
Microarray Time Series in Cell Cycle Synchronized Yeast. The grey lines in this figure represent 297 yeast RNA microarray probes, monitored over a 120-minute time series. These probes were determined by the original authors (Spellman et al., 1998) to be regulated in the yeast cell cycle, because of their periodic expression. Our goal was to identify the outlier probes (if any) in these data. That is, each grey line is a potential outlier. Though I didn't mention this in the talk, the likelihood for these data was a normal linear model, where the time covariate is transformed onto a collection of periodic and non-linear basis functions, in order to capture periodic and non-linear expression.
Microarray Time Series in Cell Cycle Synchronized Yeast. For DPM precision fixed at 1/150, this figure represents the maximum a posteriori (MAP) data partition estimate. Using the '1% of n' rule, any cluster with fewer than four observations is considered outlying. Consider, for example, the collection of partitions that might result from merging the upper rightmost cluster with one of the other clusters. By fixing the precision parameter to 1/150, we have ensured that the Bayes factor favoring the MAP partition estimate versus any such partition is at least 150. Hence, there is 'very strong' evidence that this cluster is outlying.
MAP Estimation for 'z'. We considered several existing methods, and proposed a new method that is free of posterior sampling. More details will be available in the forthcoming Bayesian Analysis article. The R package profdpm implements each method.
Outlier Detection with Finite Mixtures. This slide mentions the comparison between outlier detection with DPMs and finite mixtures in the Fraley and Raftery framework. The DPM method is a bit more conservative than the finite mixture method. Again, more details will be had in the article.
This slide is a list of references used in the presentation.