Student’s ttest is a staple of statistical analysis. A quick search on Google Scholar for “ttest” results in 170,000 hits in 2013 alone. In comparison, “Bayesian” gives 130,000 hits while “box plot” results in only 12,500 hits. To be honest, if I had to choose I would most of the time prefer a notched boxplot to a ttest. The ttest comes in many flavors: one sample, twosample, paired samples and Welch’s. We’ll start with the two most simple; here follows the Bayesian First Aid alternatives to the one sample ttest and the paired samples ttest.
Bayesian First Aid is an attempt at implementing reasonable Bayesian alternatives to the classical hypothesis tests in R. For the rationale behind Bayesian First Aid see the original announcement and the description of the alternative to the binomial test. The development of Bayesian First Aid can be followed on GitHub. Bayesian First Aid is a work in progress and I’m grateful for any suggestion on how to improve it!
The Model
A straight forward alternative to the ttest would be to assume normality, add some noninformative priors to the mix and be done with it. However, one of great things with Bayesian data analysis is that it is easy to not assume normality. One alternative to the normal distribution, that still will fit normally distributed data well but that is more robust against outliers, is the t distribution. Hang on, you say, isn’t the ttest already using the tdistribution?. Right, the ttest uses the tdistribution as the distribution of the sample mean divided by the sample SD, here the trick is to assume it as the distribution of the data.
Instead of reinventing the wheel I’ll here piggyback on the work of John K. Kruschke who has developed a Bayesian estimation alternative to the ttest called Bayesian Estimation Supersedes the Ttest, or BEST for short. The rationale and the assumptions behind BEST are well explained in a paper published 2013 in the Journal of Experimental Psychology (the paper is also a very pedagogical introduction to Bayesian estimation in general). That paper and more information regarding BEST is available on John Kruschkes web page. He has also made a nice video based on the paper (mostly focused on the two sample version though):
All information regarding BEST is given in the paper and the video, here is just a short rundown of the model for the one sample BEST: BEST assumes the data ($x$) is distributed as a t distribution which is more robust than a normal distribution due to its wider tails. Except for the mean ($\mu$) and the scale ($\sigma$) the t has one additional parameter, the degreeoffreedoms ($\nu$), where the lower $\nu$ is the wider the tails become. When $\nu$ gets larger the t distribution approaches the normal distribution. While it would be possible to fix $\nu$ to a single value BEST instead estimates $\nu$ allowing the tdistribution to become more or less normal depending on the data. Here is the full model for the one sample BEST:
The prior on $\nu$ is an exponential distribution with mean 29 shifted 1 to the right keeping $\nu$ away from zero. From the JEP 2013 paper: “This prior was selected because it balances nearly normal distributions ($\nu$ > 30) with heavy tailed distributions ($\nu$ < 30)”. The priors on $\mu$ and $\sigma$ are decided by the hyper parameters , , and . By taking a peek at the data these parameters are set so that the resulting priors are extremely wide. While having a look at the data preanalysis is generally not considered kosher, in practice this gives the same results as putting $\mathrm{Uniform}(\infty,\infty)$ distributions on $\mu$ and $\sigma$.
The Model for Paired Samples
Here I use the simple solution. Instead of modeling the distribution of both groups and the paired differences the Bayesian First Aid alternative uses the same trick as the original paired samples ttest: Take the difference between each paired sample and model just the paired differences using the one sample procedure. Thus the alternative to the paired samples ttest is the same as the one sample alternative, the only difference is in how the data is prepared and the how the result is presented.
The bayes.t.test
Function
The t.test
function is used to run all four versions of the ttest. Here I’ll just show the one sample and paired samples alternatives. The bayes.t.test
runs the Bayesian First Aid alternative to the ttest and has a function signature that is compatible with t.test
function. That is, if you just ran a ttest, say t.test(x, conf.int=0.8, mu=1)
, just prepend bayes.
and it should work out of the box.
The example data I’ll use to show off bayes.t.test
is from the 2002 Nature article The Value of Bees to the Coffee Harvest (doi:10.1038/417708a, pdf). In this article David W. Roubik argues that bees are important to the coffee harvest despite that the “selfpollinating African shrub Coffea arabica, a pillar of tropical agriculture, was considered to gain nothing from insect pollinators”. Supporting the argument is a data set of the mean average coffee yield (in kg / 10,000 m) for new world countries in 19611980, before the establishment of African honeybees, and in 19812001, when African honeybees had been more or less been naturalized. This data shows an increased yield after the introduction of bees and when analyzed using a paired ttest results in p = 0.04. This is compared with the increase in yield in old world countries, where the bees been busy buzzing all along, were a paired ttest gives p = 0.232 interpreted as “no change”. The full dataset is given in the table below and in this csv file (to match the analysis in the paper the csv does not include the Caribbean islands)
There are a couple of reasons for why it is not proper to use a ttest to analyze this data set. A ttest does not consider the geographical location of the countries nor is it clear what “population” the sample of countries is drawn from. I also feel tempted to mutter the old cliché “correlation does not imply causation”, surely there must have been many things except for the introduction of bees that changed in Bolivia between 1961 and 2001. Being aware of these objections I’m going to use it to nevertheless show off the paired bayes.t.test
.
Let’s first run the original analysis from the paper:
1 2 3 4 

1 2 3 4 5 6 7 8 9 10 11 

In the paper they used one tailed ttest hence the unhelpfully wide confidence interval. Now the Bayesian First Aid alternative:
1 2 

1


1 2 3 4 5 6 7 8 9 10 11 

So here we get estimates for the mean and SD with credible intervals on the side. We also get to know that the probability that the mean increase is more than zero is 99.4%. We could also take a look at the data from the old world:
1 2 3 

1 2 3 4 5 6 7 8 9 10 11 

So the trend here is also increasing yields, though the parameter estimate is much less precise and the 95% CI includes zero. Also notable is the much larger SD compare to the new world countries.
Generic Functions
Every Bayesian First Aid test have corresponding plot
, summary
, diagnostics
and model.code
functions. Here follows examples of these using the new world data.
Using plot
we get to look at the posterior distributions directly. We also get a posterior predictive check in the form of a histogram with a smattering of tdistributions drawn from the posterior. If there is a large discrepancy between the model fit and the data then we need to think twice before proceeding. As it is now I would say the post. pred. check looks OK.
1 2 

summary
gives us a detailed summary of the posterior. Note that the number of decimals places in this summary is a bit excessive, due to the posterior being approximated using MCMC the numbers will jump around slightly between runs. If this worries you increase the number of MCMC iterations by using the argument n.iter
when calling bayes.t.test
.
1


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 

diagnostics
prints and plots MCMC diagnostics (similar to the example in bayes.binom.test
). Finally model.code
prints out R and JAGS code that replicates the analysis and that can be directly copynpasted into an R script and modified from there. Here goes:
1


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 

If we believe, for example, that robustness is not such a big issue and would like to assume that the data is normally distributed rather than t distributed we just have to make some small adjustments to this script. In the model code we change dt
into dnorm
and remove the nu
parameter resulting in this model:
1 2 3 4 5 6 7 8 9 10 11 

We also has to remove the monitoring of the nu
parameter.
1 2 

And that’s all.
Some Comments
The bayes.t.test
does not include all the functionality of BEST. Check out the BEST package by John K. Kruschke and Mike Meredith for an R and JAGS implementation of BEST including power analysis and more. Ttestable data was not that easy to get by and I browsed through a lot of papers before I found the examples with the bees. Wouldn’t it be nice if people actually published some data, well well… I did find two other fun examples of ttestable data if someone would be interested:
 Biomechanics: Rubber bands reduce the cost of carrying loads. The data is in the supplementary material.
 Chimpanzees Recruit the Best Collaborators. The data is in the paper.