Parameter vs. Observation Dimension?

October 24, 2011 | Technical | Bayesian, R, statistics | BioStatMatt

*** Updated 10/27/11: Original text appended in ~~strike~~. ***

Bill Bolstad's response to Xi'an's review of his book Understanding Computational Bayesian Statistics included the following comment, which I found interesting:

Frequentist p-values are constructed in the parameter dimension using a probability distribution defined only in the observation dimension. Bayesian credible intervals are constructed in the parameter dimension using a probability distribution in the parameter dimension. I think that is more straightforward.

Classical statistics is concerned with the distribution of statistics that estimate a fixed population parameter. And, statistics are clearly constructed in the observation dimension. But, consider that certain statistics may come very close in value to a population parameter as the sample grows. In this sense, a classical procedure may not need to consider the parameter dimension.

~~Classical statistics is concerned with the distribution of statistics that estimate a fixed population parameter. And, statistics are clearly constructed in the observation dimension. But, consider that a statistic evaluated at the population level (i.e., computed using all members of a finite population, or the limit in an infinite sequence of observations) is also a population parameter. In this sense, there is no distinction between the observation and parameter dimensions.~~
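
As a rough illustration of the finite-population case, here is a minimal R sketch. The population `pop`, the seed, and the sample sizes are all assumptions made up for the example; the point is only that the sample mean drifts toward the population mean as the sample grows, and matches it (up to floating-point rounding) when the sample is a census.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
pop <- rnorm(1000)               # hypothetical finite population
popmean <- mean(pop)             # the population quantity of interest

# absolute error of the sample mean, sampling without replacement
sizes <- c(10, 100, 500, 1000)
err <- sapply(sizes, function(n) abs(mean(sample(pop, n)) - popmean))
names(err) <- sizes
err
```

The last entry corresponds to a sample that contains every member of the population, so the statistic and the population mean coincide there.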

It seems natural to ask: "What is the region that contains the value of quantity X with some level of confidence?". Equivalently, the "value of quantity X" may be the "value of a parameter", or the "value of a statistic in large samples or a census". The Bayesian might construct a posterior predictive density for the value of a statistic on 'new' observations.

~~It seems natural to ask: "What is the value of a statistic if computed at the population level?" Classical statistics offer an answer. The Bayesian analog is not immediately clear.~~
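
One way a Bayesian might approach the same question is sketched below in R. It assumes a normal model with known standard deviation `sigma` and a conjugate normal prior on the mean; the data `y`, the prior settings `m0` and `s0`, and the future sample size `m` are hypothetical choices for illustration, not anything taken from Bolstad's book.

```r
set.seed(2)                              # arbitrary seed
sigma <- 1                               # assumed known sampling sd
y <- rnorm(30, mean = 0.5, sd = sigma)   # hypothetical observed sample
m0 <- 0; s0 <- 10                        # assumed vague normal prior on the mean

# conjugate normal update for the mean
prec <- 1 / s0^2 + length(y) / sigma^2
mn <- (m0 / s0^2 + sum(y) / sigma^2) / prec   # posterior mean
sn <- sqrt(1 / prec)                          # posterior sd

# posterior predictive draws of the mean of m 'new' observations
m <- 1000
mu <- rnorm(5000, mn, sn)
ybar_new <- rnorm(5000, mu, sigma / sqrt(m))
quantile(ybar_new, c(0.025, 0.975))
```

As `m` grows, the predictive distribution of the new sample mean approaches the posterior distribution of the mean itself, which is one way to see the two questions meeting.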

Perhaps 'classical statistics' should be just 'statistics', and 'Bayesian statistics', just 'parameters'. 😉

10 thoughts on “Parameter vs. Observation Dimension?”

Jared says:

October 25, 2011 at 12:53 am

There absolutely is a distinction between "the observation and parameter dimensions". Namely, you can observe observations. Asymptotically you can usually pinpoint parameters, but you can't observe them. Bayesian statistics has precisely what you seem to want in most problems with a finite number of parameters: the posterior distribution of a parameter (which, incidentally, is itself a statistic, being a function of the data and some known constants) concentrates to a point mass at the "true" value of that parameter, given some mild conditions.
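
The concentration of the posterior can be sketched in R for a simple case. This assumes a binomial model with a uniform Beta(1, 1) prior; the "true" proportion `p_true` and the sample sizes are hypothetical values chosen only to show the posterior standard deviation shrinking as n grows.

```r
set.seed(3)                       # arbitrary seed
p_true <- 0.3                     # hypothetical 'true' proportion
a <- 1; b <- 1                    # assumed uniform Beta(1, 1) prior
ns <- c(10, 100, 1000, 10000)

post_sd <- sapply(ns, function(n) {
    x <- rbinom(1, n, p_true)     # simulated number of successes
    a1 <- a + x; b1 <- b + n - x  # conjugate Beta posterior
    sqrt(a1 * b1 / ((a1 + b1)^2 * (a1 + b1 + 1)))
})
names(post_sd) <- ns
post_sd
```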

You might also rethink what is natural, since I have never once been handed an infinite sample, but often have wondered what my finite, non-comprehensive sample tells me given what I already know...
Jens says:

October 25, 2011 at 1:33 am

Haha, nice tongue-in-cheek comment!

I think I could argue that firstly, Bayesian methods can also be used for finite populations. Secondly, common frequentist methods such as the t-test and OLS regression are inadequate for finite populations ^^

On the other hand, as long as most people do not even distinguish parameter from statistic, your argument may be useful.
BioStatMatt says:

October 25, 2011 at 9:58 am
Jared,

For finite populations, it is possible to collect all observations, and to compute (i.e., observe) the value of population parameters. For example, the population mean is identical to sample mean, when the sample consists of all population members. Do you mean that the population mean is not 'observed' in this case?

However, I am not sure that the value of all parameters may be observed in this way, because not all parameters have clear statistics that behave as above.

I just don't see the conflict as clearly as Bolstad does. Consider a level alpha nonparametric bootstrap confidence interval for statistic Y given a sample of size n, and interpret it thus: In repeated samples of size n from population Z, the value of statistic Y, computed at the population level, lies within this interval 100(1-alpha)% of the time. In this context, it doesn't seem that the parameter dimension takes part.

As an example, consider this R function:
```
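library(boot)

# Estimate how often a basic bootstrap CI, built from a sample of size N,
# contains the statistic computed on the whole (finite) population.
coverage <- function(pop = rnorm(1000),
                     stat = function(dat, ind) mean(dat[ind]),
                     N = 30, R = 1000, alpha = 0.05) {
    # value of the statistic at the population level
    popstat <- stat(pop, seq_along(pop))
    mean(replicate(R, {
        # draw a sample of size N, then bootstrap the statistic R times
        b <- boot(sample(pop, N, replace = TRUE), stat, R)
        ci <- boot.ci(b, conf = 1 - alpha, type = "basic")$basic
        # columns 4 and 5 hold the lower and upper interval limits
        (ci[1, 4] < popstat) && (ci[1, 5] > popstat)
    }))
}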
```
Jared says:

October 25, 2011 at 4:03 pm

Sure, it's less clear in finite populations. You might say then that the population mean isn't really a parameter since it's an observable. That's maybe a little pedantic, but not incorrect in the way that statisticians usually define a parameter (roughly, as indexing different models for the data). Under the usual definition there can't possibly be any confusion - A parameter and whatever statistic you use to estimate it are 100% distinct, and this isn't disputed by Bayesians or frequentists. So your objection to the quote seems to come from a misunderstanding of the technical language in play. I'm unclear on what you think a parameter is, frankly, and absent that we'll just talk in circles I'm afraid. Incidentally, your suggestion of trying to define population parameters (in the infinite population case) as the limit of statistics fails out of the gate because these limits aren't unique! In special cases they're unique *with probability one* but this is very different indeed.

I don't like Bolstad's quote incidentally. To my mind the conflict is much more clearly stated as this: Do I want to make probability statements about the parameter (conditional on data in hand), or about the behavior of a statistic under repeated sampling (based on data I've never seen)? Contrary to your original post I claim the former is much more natural when we're studying a parametric model of the world (though both are valuable). Classical/frequentist statistics doesn't let us make probability statements about parameters, period. But a Bayesian procedure allows the former and can be studied from a frequentist perspective, providing the latter as well.
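
The two kinds of statement can be put side by side in a small R sketch. The data (12 successes in 40 trials) and the Beta(1, 1) prior are assumptions made up for illustration; `binom.test` gives the exact (Clopper-Pearson) confidence interval, and `qbeta` gives an equal-tailed credible interval from the conjugate Beta posterior.

```r
x <- 12; n <- 40                  # hypothetical data: 12 successes in 40 trials

# frequentist: exact 95% confidence interval, a statement about the
# behaviour of the interval under repeated sampling
ci_freq <- as.numeric(binom.test(x, n, conf.level = 0.95)$conf.int)

# Bayesian: equal-tailed 95% credible interval under an assumed Beta(1, 1)
# prior, a probability statement about the parameter given these data
ci_bayes <- qbeta(c(0.025, 0.975), 1 + x, 1 + n - x)

rbind(frequentist = ci_freq, bayesian = ci_bayes)
```

The two intervals are often numerically similar in a case like this; the difference under discussion is in what the 95% refers to.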
1. BioStatMatt says:
  
  October 25, 2011 at 11:33 pm
  
  Jared,
  
  Some of your comments take a patronizing or pejorative tone. Please do be mindful to avoid this.
  
  Bolstad refers to the p-value, making implicit that the parameter dimension includes any population quantity or property that we might hypothesize about, including those quantities for which there is a consistent statistic, or might be computed directly in finite populations.
  
  For statistics that converge almost surely, evaluation on an infinite sequence of observations is like 'observing' a population quantity, because it's impossible (i.e., with probability zero) that the two are different. Of course, we can't actually do this. But the idea, along with the notion that parameters are observable in finite populations, sidesteps the issue of parameter dimensions and weakens Bolstad's criticism of the frequentist method. That is, we may sometimes think of frequentist methods as operating entirely within the observation dimension, even if that's not how statisticians usually think.
  
  Regarding probability statements about parameters, I agree that this notion is more natural for the statistician. But, statisticians understand probability distributions, know what parameters are (or think they do ;)), and can accept that a population quantity is random. For everyone else, I'm not certain that this is more natural than the question I presented originally.
  
  I also agree that forming inferences on the basis of a null distribution (data that are never observed) is wrong. But, as the example in my last comment demonstrates, it is possible to make inferences in the classical paradigm using only the observed data.
Jared says:

October 26, 2011 at 11:23 am

I'm not sure what you think was pejorative but I'll do my best. Some points:

1) Impossible and with probability zero are not the same thing. And you can't "evaluate on an infinite sequence", unless I'm misunderstanding what you're saying. Things don't "just work" at infinity; some seriously wonky things can happen in the limit. But I think this is not the important part of your argument - you're saying that inference can proceed only on observed data, which is obviously true. But in that context there are no parameters, since there is no model for the data, and you're considering cases that I think Bolstad intended to be outside the scope of his argument (he doesn't make that explicit, unfortunately...)

2) "But, statisticians understand probability distributions, know what parameters are (or think they do ;)), and can accept that a population quantity is random. For everyone else, I'm not certain that this is more natural than the question I presented originally."

Three comments: First, statisticians know exactly what a parameter is, because they've defined it! You still have not. Bolstad is working within the common mathematical formalism, in the context of which your objection doesn't make much sense. A parameter indexes a model for the data. It lives in its own space, separate from the sample space, and the two are unequivocally distinct. As you point out (correctly) this is not the only way to do inference, and such a definition doesn't square with what one might commonly call a "parameter", but this doesn't weaken Bolstad's original point when it's put in the proper context. You seem to want to argue for nonparametric (in the model-free sense) versus parametric inference. That's certainly a reasonable stance. But then you go on to incorrectly identify parametric statistics with Bayesian statistics and nonparametric statistics with frequentist or classical statistics, which is just strange. The majority of classical or frequentist statistics has historically been parametric.

Second, most statisticians would *not* accept that a population parameter is random. Certainly no frequentist would, and most Bayesians wouldn't either. Rather, if I'm a Bayesian, since I am uncertain about its value I quantify my beliefs with a (prior) probability distribution. Then I update my beliefs as new data arrive (via Bayes rule), and my updated beliefs are reflected in a posterior distribution. Your comment reflects a fundamental misunderstanding of the (arguably dominant) mode of thought in Bayesian inference (not intended as a slight, but a simple fact. I'm quite sure that I fundamentally misunderstand plenty of things in your field, or anyone else's but my own...).

Third, if you're claiming that frequentist p-values or confidence intervals are more comprehensible than statements such as "Given what I knew about the parameter before I saw the data, and after what I've learned from the data, I am x% confident that the parameter lies in this set" then I definitely disagree (ever taught intro stats? 🙂 ). Plenty of things to argue re: frequentist p-values/CI's versus posterior probability statements, of course, but my own experience leads me to seriously doubt that the uninitiated would prefer to think in frequentist p-values.

"But, as the example in my last comment demonstrates, it is possible to make inferences in the classical paradigm using only the observed data."

Yes, this is obviously true. Frequentist nonparametric methods (such as the bootstrap) don't require a model or parameters to index them. Bolstad's quote in context refers to "historical applied statistics", which has been predominately parametric, and it seems to me that he's talking about parametric frequentist inference, (but again, he isn't explicit).

The point remains that you can't do inference on parameters without a model unless you're redefining what a parameter is. Complain about the statistician's definition all you want, but you are after all commenting on an argument between statisticians over a book about statistics.
1. BioStatMatt says:
  
  October 26, 2011 at 9:40 pm
  
  Almost sure convergence ensures that the value of a statistic does not become too wonky.
  
  Maybe Bolstad had intended to restrict the argument to parametric hypothesis tests, but this is smaller than the scope of 'frequentist p-values'.
  
  Regarding the definition of a parameter, the consensus is not as narrow as you claim. Even some Bayesians have a slightly more broad definition. Congdon (Bayesian Statistical Modeling) writes that "methods of Bayesian estimation provide a full distributional profile of a parameter (e.g. median, percentiles, mean)". Percentiles may be part of a probability model, but how often do we see models indexed by a percentile?
  
  Some intro textbooks define the term in a statement like "The constants n and p are called parameters of the binomial distribution." (from Hogg & Tanis, Probability and Statistical Inference 7e). Casella and Berger (Statistical Inference 2e) do likewise. But, neither is more precise.
  
  For a broader definition, Li (Statistical Inference) writes that "a parameter is a characteristic of a population such as the mean". Davison and Hinkley (Bootstrap Methods and their Application) write that "the sample is to be used to make inferences about a population characteristic, generally denoted \theta ... the parameter of interest". Dupont (Statistical Modeling for Biomedical Researchers 2e) writes that "in general, unknown attributes of a target population are called parameters". Hollander and Wolfe (Nonparametric Statistical Methods 2e) use the term often in the context of 'nonparametric' statistical methods. Clearly, statisticians do use 'parameters' in nonparametric and distribution-free contexts.
  
  "...you identify parametric statistics with Bayesian statistics and nonparametric statistics with frequentist or classical statistics...". Not at all!
  
  Regarding the Bayesian dogma, there is no misunderstanding. But, it's not possible to make a probability statement about X unless X is a random variable. How can a Bayesian then claim that a parameter is not random? This can't be reconciled by an appeal to the 'dominant mode of thought'.
  
  I made my position on p-values clear at the end of my last comment.
  
  There is no 'parameter' redefinition. The definition is simply broader, in the view of some statisticians (see above).
Jared says:

October 27, 2011 at 12:12 am

"Almost sure convergence ensures that the value of a statistic does not become too wonky."

I was thinking past the (pretty special) case of one parameter with a strongly consistent estimator, although that isn't immediately clear from what I wrote.

"Maybe Bolstad had intended to restrict the argument to parametric hypothesis tests, but this is smaller than the scope of 'frequentist p-values'."

True indeed. Like I said, he's looser than I'd like.

"Regarding the definition of a parameter, the consensus is not as narrow as you claim. Even some Bayesians have a slightly more broad definition. Congdon (Bayesian Statistical Modeling) writes that "methods of Bayesian estimation provide a full distributional profile of a parameter (e.g. median, percentiles, mean)". Percentiles may be part of a probability model, but how often do we see models indexed by a percentile?"

Implicitly, all the time. The model (again, indexed by its parameters) defines the sampling distribution of the data, which obviously defines the percentiles, mean, mode(s), or whatever else you like. That's what it means for parameters to index a model: y~P_t, where P_t is a probability distribution that depends on parameters t (and nothing else). Congdon's quote as you're interpreting it is therefore perfectly consistent with the usual definition of a parameter.
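
A small R sketch of that point, assuming a normal model indexed by t = (mu, sigma); the helper `features` is hypothetical and just collects a few features of the sampling distribution, each of which is a function of the parameters.

```r
# Hypothetical helper: features of a normal sampling distribution,
# each determined entirely by the parameters t = (mu, sigma)
features <- function(mu, sigma) {
    c(mean   = mu,
      median = qnorm(0.5, mu, sigma),
      p90    = qnorm(0.9, mu, sigma))
}

features(0, 1)   # one member of the model family
features(2, 3)   # another member: every feature changes with t
```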

But you've completely misinterpreted his quote: "methods of Bayesian estimation provide a full distributional profile OF a parameter (e.g. median, percentiles, mean)". (emph mine). He's saying that Bayesian inference gives you the posterior distribution ("distributional profile") OF a parameter. You might have another look at that passage! 🙂

"Regarding the definition of a parameter, the consensus is not as narrow as you claim. "

It is among statisticians. Certainly not in general usage, and maybe not in intro applied stat textbooks, but then that wasn't my claim. That is, the consensus is narrow within that narrow group 🙂

"Some intro textbooks define the term in a statement like "The constants n and p are called parameters of the binomial distribution." (from Hogg & Tanis, Probability and Statistical Inference 7e). Casella and Berger (Statistical Inference 2e) do likewise. But, neither is more precise."

Texts at that level aren't strong on the formalism, though the statement agrees with the usual definition of a parameter as indexing probability models. I would refer you to a more intermediate text like Bickel & Doksum or Jun Shao's book for the definitions that statisticians typically are using amongst one another.

"Regarding the Bayesian dogma, there is no misunderstanding. But, it's not possible to make a probability statement about X unless X is a random variable. How can a Bayesian then claim that a parameter is not random? This can't be reconciled by an appeal to the 'dominant mode of thought'."

Actually yes, it can (by the way, I was referring to the dominant mode of thought among Bayesians - most of whom would disagree vehemently with your mischaracterization). Probability distributions represent degrees of belief about the value of a parameter to a subjective Bayesian. They quantify *our* uncertainty, not (necessarily) some indeterminacy or stochasticity in the parameter itself. Claiming otherwise does indeed represent a fundamental (but common) misunderstanding of the philosophical underpinnings of Bayesian inference. Try e.g. Bernardo & Smith (or de Finetti's work if you want to go deep into the probability theory) to understand the principles behind Bayesian inference. Then you can decide if you disagree; it's hardly a settled issue of course. But you should try to understand it first - the vast majority of Bayesians simply don't believe that there is no such thing as a fixed parameter as you claim they do.

And again, your statement that statisticians are happy considering a parameter to be random is just wrong on its face: every principled frequentist would disagree with you (ironically, raising the objection you do just now), and like I've said, so would most Bayesians.

"...you identify parametric statistics with Bayesian statistics and nonparametric statistics with frequentist or classical statistics...". Not at all!

That's what I inferred from "Perhaps 'classical statistics' should be just 'statistics', and 'Bayesian statistics', just 'parameters'." but maybe that wasn't what you meant. Incidentally it doesn't make much sense, since any Bayes estimator is also a statistic. Sometimes frequentists can ditch the model altogether, which seems to be what you'd prefer. That's why I think that many of the points you've made are more readily cast as an argument for nonparametric vs parametric modes of inference.

You might prefer Larry Wasserman's (I think) quote about Bayesians being slaves to the likelihood (model). Much more apropos...

"There is no 'parameter' redefinition. The definition is simply broader, in the view of some statisticians (see above)."

You still haven't provided a definition at all, so I can't rightly comment 🙂 If I might try to do it for you: The broadest reasonable definition I can imagine (which appears to be consistent with your post and the quotes you provide) is that a parameter is a (possibly set valued) function of P, the sampling distribution of the data. But this isn't quite how the term is generally understood when statisticians communicate with one another (though the two definitions are essentially equivalent).

And it's still obvious from this definition that (at least beyond the special case of a finite, static population) a parameter is distinct from any collection of draws from P (or any statistic derived thereof) - it lives in a different space entirely - which is the point I set out to make. You've apparently edited away that part of the post though (which is unfortunate, since it kind of leaves my first comment and portions of my second comment hanging). Your blog your rules, but IMO it's pretty bad form not to at least leave strikethrough text (even though you mark it as edited and welcome requests for the changes). Why not post the edits, or make a new post entirely? I put some thought and time into my comments, so it's disappointing that they're no longer available in context unless someone makes a special request. Certainly discourages future comments.
1. BioStatMatt says:
  
  October 27, 2011 at 5:01 pm
  
  I never claimed that statisticians are happy to believe parameters are random, or that Bayesians do not believe in fixed parameters. I hoped the context of my comment made clear that in contrast with non-statisticians, statisticians are more willing to entertain the idea.
  
  A probability statement like "The probability that the parameter takes a value within an interval" suggests that the parameter is random. Combining this with Li's idea of a parameter, a frequentist might deduce that some unknown population quantity is random, and this is reinforced by statements that directly compare the frequentist and Bayesian approaches. For example, from Robert (The Bayesian Choice 2e, Section 5.5):
  
  Once again, the Bayesian formulation that \theta has a given probability to belong to a fixed region Cx is more appealing than the frequentist interpretation that a random region Cx has a given probability to contain the unknown parameter \theta.
  
  But you claim (and I have no dispute) that the dominant (Bayesian) mode of thought is to interpret this such that the parameter \theta is a random variable representing uncertainty about an unknown population quantity, and that the parameter \theta does not represent the quantity itself. In the interpretation of a frequentist confidence interval, the parameter \theta does represent the population quantity.
  
  There is still a disconnect. Dominance of an idea is not evidence in favor of the idea (but to argue so is the so-called 'appeal to belief' fallacy). I was looking for more than 'most Bayesians would disagree with you'.
  
  My original thought wasn't really related to the above, or to parametrics versus nonparametrics, or intended to be critical of Bayesian methods or even of Bolstad's comment. In fact, you may find that I am quite enthusiastic about Bayesian methods. This was an opportunity to express an idea. The idea was that, in some cases, a sequence of statistics may eventually take a value that is identical or very close to a population quantity. It might be reasonable then to construct confidence statements regarding the statistic as the sample grows large, rather than about a parameter.
  
  My last comment was poking fun at Bolstad's argument, and at the tendency of Bayesians to focus on the distribution of parameters, whereas frequentists put more focus on the distribution of statistics. Of course, the Bayesian method utilizes statistics, and the classical method uses 'parameters'.
  
  I've included edits in strikethrough to preserve the context of your comments. I think this is more than accommodating.
Jared says:

October 27, 2011 at 8:41 pm

"I never claimed that statisticians are happy to believe parameters are random, or that Bayesians do not believe in fixed parameters. I hoped the context of my comment made clear that in contrast with non-statisticians, statisticians are more willing to entertain the idea."

I'd refer you to your earlier comment:

"Regarding probability statements about parameters, I agree that this notion is more natural for the statistician. But, statisticians understand probability distributions, know what parameters are (or think they do ;)), and can accept that a population quantity is random. For everyone else, I'm not certain that this is more natural than the question I presented originally."

I took "population quantity" to mean a feature of the distribution generating the data (which may be defined empirically in finite populations). If you want to consider the general case you really must do so.

I seem to have read "can accept that a population quantity is random." as "can accept that (all) population quantities are random." What you actually wrote is weaker, and perhaps more agreeable, but I still maintain that at the very least a significant plurality of statisticians would disagree (namely frequentists, as you've said yourself). We (apparently) agree that even a Bayesian need not consider every population quantity as random, though I unfortunately used "parameter" several times throughout when I intended "population quantity" or the like. I guess it was the glare in my glass house.

"But you claim (and I have no dispute) that the dominant (Bayesian) mode of thought is to interpret this such that the parameter \theta is a random variable representing uncertainty about an unknown population quantity, and that the parameter \theta does not represent the quantity itself. In the interpretation of a frequentist confidence interval, the parameter \theta does represent the population quantity."

Indeed, I think we're reconciled on the larger point despite the muddled bits in my earlier post(s).

"There is still a disconnect. Dominance of an idea is not evidence in favor of the idea (but to argue so is the so-called 'appeal to belief' fallacy). I was looking for more than 'most Bayesians would disagree with you'."

No no, there is no such fallacy -- I claimed (incorrectly, I think) that your description of what most Bayesian statisticians believe was incorrect or imprecise. As a remedy I gave my accounting (quite poorly, I'm afraid) of the dominant view, along with a couple classic references which describe that view and its justifications. I did so to provide evidence of what those beliefs *are*. I definitely was not claiming that those beliefs were correct because they are held by an authority or the majority - just that they weren't what you wrote (though I think I was mistaken on that!). In fact, I suggested reading *their* arguments yourself and I even said that the issue was far from settled...

"My original thought wasn't really related to the above, or to parametrics versus nonparametrics, or intended to be critical of Bayesian methods or even of Bolstad's comment. In fact, you may find that I am quite enthusiastic about Bayesian methods. This was an opportunity to express an idea. The idea was that, in some cases, a sequence of statistics may eventually take a value that is identical or very close to a population quantity. It might be reasonable then to construct confidence statements regarding the statistic as the sample grows large, rather than about a parameter."

I'll assume that Li has defined the population as the true data generating distribution (this is quite common, since it is a natural generalization of the finite population case). From there I'm not sure why you think your observation is not a comment on parametric versus (classical) nonparametric inference, whether or not you intended it as such. After all, the motivation of classical nonparametrics is to do model-free inference, and its most common justifications are based on arguments such as the convergence of empirical distributions to the population or true data generating distribution. In that way they avoid the "parameter dimension" as much as possible, relying instead on the data.
