Publishable Stuff

Rasmus Bååth's Blog


Hello stranger, and welcome! 👋😊
I'm Rasmus Bååth, data scientist, engineering manager, father, husband, tinkerer, tweaker, coffee brewer, tea steeper, and, occasionally, publisher of stuff I find interesting down below👇


Bayesian First Aid: Test of Proportions

2014-06-27

Does pill A or pill B save more lives? Which web design results in the most clicks? Which in vitro fertilization technique results in the largest number of happy babies? A lot of questions out there involve estimating the proportion, or relative frequency, of success in two or more groups (where success could be a saved life, a click on a link, or a happy baby), and there exists a little-known R function that does just that: prop.test. Here I’ll present the Bayesian First Aid version of this procedure. A word of caution: the example data I’ll use is mostly from the Journal of Human Reproduction, and as such it might be slightly NSFW :)
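For reference, here is a minimal sketch of the classical procedure using base R's prop.test, next to a simple conjugate Bayesian estimate of the same two proportions. The counts are made up for illustration, and the uniform Beta(1, 1) prior with independent Beta posteriors is an assumption of this sketch, not necessarily the model that Bayesian First Aid uses.

```r
# Hypothetical data: successes and trials for two groups (made-up counts).
successes <- c(A = 42, B = 30)
trials    <- c(A = 100, B = 100)

# Classical test of the null hypothesis that the two proportions are equal.
classical <- prop.test(successes, trials)
print(classical$estimate)  # observed proportions, successes / trials
print(classical$p.value)

# Simple Bayesian sketch (assumption): with a uniform Beta(1, 1) prior,
# each group's success rate has a Beta(1 + successes, 1 + failures) posterior.
set.seed(1)
n_draws <- 10000
theta_a <- rbeta(n_draws, 1 + successes[["A"]], 1 + trials[["A"]] - successes[["A"]])
theta_b <- rbeta(n_draws, 1 + successes[["B"]], 1 + trials[["B"]] - successes[["B"]])

# Posterior of the difference in success rates, and P(rate A > rate B).
diff_ab <- theta_a - theta_b
print(quantile(diff_ab, c(0.025, 0.5, 0.975)))
print(mean(diff_ab > 0))
```

Instead of a single p-value, the Bayesian sketch gives a whole distribution for the difference, from which both an interval and the probability that A beats B can be read off.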

Bayesian First Aid logo

Read on →

The Most Comprehensive Review of Comic Books Teaching Statistics

2014-06-11

As I’m more or less an autodidact when it comes to statistics, I have a weak spot for books that try to introduce statistics in an accessible and pedagogical way. I have therefore collected what I believe are all the books that introduce statistics using comics (at least those written in English). What follows are highly subjective reviews of those four books. If you know of any other comic book on statistics, please do tell me!

I’ll start with a tl;dr version of the reviews, but first here are the four books:

Read on →

Jeffreys’ Substitution Posterior for the Median: A Nice Trick to Non-parametrically Estimate the Median

2014-05-03

While reading up on quantile regression, I found a really nice hack described in Bayesian Quantile Regression Methods (Lancaster & Jun, 2010). It is called Jeffreys’ substitution posterior for the median, was first described by Harold Jeffreys in his Theory of Probability, and is a non-parametric method for approximating the posterior of the median. What makes it cool is that it is really easy to understand and pretty simple to compute, while making no assumptions about the underlying distribution of the data. The method does not strictly produce a posterior distribution, but it has been shown to produce a conservative approximation to a valid posterior (Lavine, 1995). In this post I will try to explain Jeffreys’ substitution posterior, give R code that implements it and, finally, compare it with a classical non-parametric test, the Wilcoxon signed-rank test. But first, a picture of Sir Harold Jeffreys:
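A minimal sketch in R, based on my reading of the method rather than taken from the post: assuming a flat prior on the median, the posterior density is constant between consecutive sorted observations and proportional to the binomial probability of the number of observations lying below that interval. Dropping the two unbounded intervals outside the data is an extra assumption, and sample_median_posterior is a hypothetical name.

```r
# Draw samples from (a sketch of) Jeffreys' substitution posterior for the
# median. Assumes a flat prior and ignores the two unbounded intervals
# below min(x) and above max(x).
sample_median_posterior <- function(x, n_draws = 10000) {
  x <- sort(x)
  n <- length(x)
  i <- seq_len(n - 1)
  # Any median value between x[i] and x[i + 1] has exactly i observations
  # below it, so the density there is proportional to choose(n, i) * 0.5^n.
  density_height <- dbinom(i, size = n, prob = 0.5)
  # Probability mass of each interval = density height * interval width.
  interval_prob <- density_height * diff(x)
  picked <- sample.int(n - 1, size = n_draws, replace = TRUE, prob = interval_prob)
  # Draw uniformly within each picked interval.
  runif(n_draws, min = x[picked], max = x[picked + 1])
}

set.seed(42)
x <- rexp(30)  # hypothetical, skewed data
draws <- sample_median_posterior(x)
print(quantile(draws, c(0.025, 0.5, 0.975)))  # approximate 95% interval
```

Because only the ranks of the data enter the interval weights, nothing is assumed about the shape of the distribution generating x.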

Read on →

Bayesian First Aid: Pearson Correlation Test

2014-03-17

Correlation does not imply causation, right? But, as Edward Tufte writes, “it sure is a hint.” The Pearson product-moment correlation coefficient is perhaps one of the most common ways of looking for such hints, and this post describes the Bayesian First Aid alternative to the classical Pearson correlation test. Besides being based on Bayesian estimation (a good thing in my book), this alternative is more robust to outliers and comes with a pretty nice default plot. :)
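To see why robustness to outliers matters, the classical test can be run on Anscombe’s quartet, which ships with base R as the anscombe data frame. This sketch only shows the classical cor.test, not the Bayesian First Aid alternative.

```r
# Anscombe's quartet (datasets::anscombe): four x/y pairs with nearly
# identical Pearson correlations but very different shapes.
r_values <- sapply(1:4, function(k) {
  cor(anscombe[[paste0("x", k)]], anscombe[[paste0("y", k)]])
})
print(round(r_values, 3))

# Classical Pearson correlation test on the third pair, where one outlier
# disturbs an otherwise almost perfectly linear relation.
res <- cor.test(anscombe$x3, anscombe$y3, method = "pearson")
print(res$estimate)
print(res$conf.int)
print(res$p.value)
```

The classical test reports the same kind of result for all four pairs, even though only the first looks like a straightforward linear relation with noise.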

Bayesian First Aid with Anscombe’s quartet

Read on →

A Hack to Create Matrices in R, Matlab style

2014-03-07

The Matlab syntax for creating matrices is pretty and convenient. Here is a 2x3 matrix in Matlab syntax, where a comma separates the columns and a semicolon starts a new row:

[1, 2, 3;
 4, 5, 6]

Here is how to create the corresponding matrix in R:

matrix(c(1,4,2,5,3,6), 2, 3)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

Functional, but not as pretty, and by default the values are specified column-wise. A better solution is to use rbind:
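A minimal sketch of the rbind approach, together with base R’s byrow argument to matrix, which is another way of writing the values row by row. Neither is necessarily the exact hack the full post ends up with.

```r
# rbind: each vector becomes one row, so the code layout mirrors the matrix.
m_rbind <- rbind(c(1, 2, 3),
                 c(4, 5, 6))

# Alternative: keep matrix() but fill the values row-wise instead of column-wise.
m_byrow <- matrix(c(1, 2, 3,
                    4, 5, 6), nrow = 2, byrow = TRUE)

print(m_rbind)
print(all(m_rbind == m_byrow))  # TRUE: both give the same 2x3 matrix
```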

Read on →

Oldies but Goldies: Statistical Graphics Books

2014-03-02

I just wanted to plug three classic books on statistical graphics that I really enjoyed reading. The books are old (that is, older than me) but still relevant, and together they give a sense of the development of exploratory graphics in general, and of the graphics system in R specifically, as all three books were written at Bell Labs, where the S language was developed. What follows is not a review but just me highlighting some things that I liked about these books. So, without further ado, here they are:

Read on →

Bayesian First Aid: Two Sample t-test

2014-02-25

As spring follows winter once more here down in southern Sweden, the two sample t-test follows the one sample t-test. This is a continuation of the Bayesian First Aid alternative to the one sample t-test, where I’ll introduce the two sample alternative. It will be quite a short post, as the two sample alternative is just more of the one sample alternative: more of using John K. Kruschke’s BEST model, and more of the coffee yield data from the 2002 Nature article The Value of Bees to the Coffee Harvest.

BFA logo with two bees

Read on →

A Significantly Improved Significance Test. Not!

2014-02-13

It is my great pleasure to share with you a breakthrough in statistical computing. There are many statistical tests: the t-test, the chi-squared test, the ANOVA, etc. I here present a new test, a test that answers the question researchers are most anxious to figure out, a test of significance: the significance test. While a test like the two sample t-test tests the null hypothesis that the means of two populations are equal, the significance test does not tiptoe around the canoe. It jumps right in, paddle in hand, and directly tests whether a result is significant or not.

The significance test has been implemented in R as signif.test and is ready to be sourced and run. While other statistical procedures bombard you with useless information such as parameter estimates and confidence intervals, signif.test only reports what truly matters, the one value: the p-value.

I heart p values

For your convenience signif.test can be called exactly like t.test and will return the same p-value in order to facilitate p-value comparison with already published studies. Let me show you how signif.test works through a couple of examples using a dataset from the RANDOM.ORG database:
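Since signif.test is described as taking exactly the same arguments as t.test and returning the same p-value, a hypothetical stand-in is essentially a one-liner. The name signif_test_sketch and the simulated data are illustrative assumptions, not the actual signif.test source.

```r
# Hypothetical stand-in: accept the same arguments as t.test, return only
# its p-value, and throw away estimates and confidence intervals.
signif_test_sketch <- function(...) {
  t.test(...)$p.value
}

set.seed(123)
x <- rnorm(20, mean = 0.5)  # hypothetical data
p <- signif_test_sketch(x, mu = 0)
print(p)
```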

Read on →

Bayesian Mugs Galore!

2014-02-07

Having no personal mug at the department, I recently created a Bayesian-themed one with the message “Make the Puppies Happy. Do Bayesian Data Analysis.” This is of course a homage to the cover of John K. Kruschke’s extraordinary book Doing Bayesian Data Analysis. I also ordered some extra copies of the mug and sent them to some Bayesian “heroes” of mine, and yesterday I got a mug back from Christian Robert (!), and an awesome one too! Here they are together with a not so interested cat (no treats in the mugs…)

Stina the Cat with some Bayesian Cups

Read on →

Bayesian First Aid: One Sample and Paired Samples t-test

2014-02-04

Student’s t-test is a staple of statistical analysis. A quick search on Google Scholar for “t-test” results in 170,000 hits in 2013 alone. In comparison, “Bayesian” gives 130,000 hits, while “box plot” results in only 12,500 hits. To be honest, if I had to choose, I would most of the time prefer a notched boxplot to a t-test. The t-test comes in many flavors: one sample, two sample, paired samples and Welch’s. We’ll start with the two simplest; here follow the Bayesian First Aid alternatives to the one sample t-test and the paired samples t-test.
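The one sample and paired samples t-tests belong together because a paired t-test is a one sample t-test on the within-pair differences. A small check with base R’s t.test on simulated, hypothetical before/after data:

```r
# Hypothetical before/after measurements for the same 8 subjects.
set.seed(2014)
before <- rnorm(8, mean = 10, sd = 2)
after  <- before + rnorm(8, mean = 1, sd = 1)

# The paired samples t-test ...
paired <- t.test(after, before, paired = TRUE)
# ... is the one sample t-test applied to the differences.
one_sample <- t.test(after - before, mu = 0)

print(c(paired = paired$p.value, one_sample = one_sample$p.value))
```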

BFA logo with a bee

Read on →