Publishable Stuff

Rasmus Bååth's Research Blog

A Stan case study, sort of: The probability my son will be stung by a bumblebee

The Stan project for statistical computation has a great collection of curated case studies which anybody can contribute to. Maybe even me, I was thinking. But I don’t have time to worry about that right now because I’m on vacation, on the yearly visit to my old family home in the north of Sweden.

What I do worry about is that my son will be stung by a bumblebee. His name is Torsten, he’s almost two years old, and he loves running around barefoot on the big lawn. Which has its fair share of bumblebees. Maybe I should put shoes on him so he won’t step on one, but what are the chances, really?

Well, what are the chances? I guess if I only had

  • Data on the bumblebee density of the lawn.
  • Data on the size of Torsten’s feet and how many steps he takes when running around.
  • A reasonable Bayesian model, maybe implemented in Stan.

I could figure that out. “How hard can it be?”, I thought. And so I made an attempt.

Getting the data

To get some data on bumblebee density I marked out a 1 m² square on a representative part of the lawn. During the course of the day, now and then, I counted up how many bumblebees sat in the square.

Video Introduction to Bayesian Data Analysis, Part 3: How to do Bayes?

This is the last video of a three-part introduction to Bayesian data analysis aimed at you who aren’t necessarily that well-versed in probability theory but who do know a little bit of programming. If you haven’t watched the other parts yet, I really recommend you do that first: Part 1 & Part 2.

This third video covers the how? of Bayesian data analysis: How to do it efficiently and how to do it in practice. But covers is really a big word; briefly introduces is more appropriate. Along the way I will also briefly introduce Markov chain Monte Carlo, parameter spaces and the computational framework Stan:

Video Introduction to Bayesian Data Analysis, Part 2: Why use Bayes?

This is video two of a three-part introduction to Bayesian data analysis aimed at you who aren’t necessarily that well-versed in probability theory but who do know a little bit of programming. If you haven’t watched part one yet, I really recommend you do that first, here it is. This second video covers the why? of Bayesian data analysis: Why (and when) use it instead of some other method of analyzing data?

Video Introduction to Bayesian Data Analysis, Part 1: What is Bayes?

This is video one of a three-part introduction to Bayesian data analysis aimed at you who aren’t necessarily that well-versed in probability theory but who do know a little bit of programming. I gave a version of this tutorial at the UseR 2015 conference, but I didn’t get around to doing a screencast of it. Until now, that is! I should warn you that this tutorial is quite handwavey (but it’s also pretty short), and if you want a more rigorous video tutorial I can really recommend Richard McElreath’s YouTube lectures.

This first video covers the what? of Bayesian data analysis, with parts two and three covering the why? and the how?. I expect to be able to record parts two and three over the next couple of weeks but, for now, here is part one:

Beginners Exercise: Bayesian Computation with Stan and Farmer Jöns

Over the last two years I’ve occasionally been giving a very basic tutorial on Bayesian statistics using R and Stan. At the end of the tutorial I hand out an exercise for those who want to flex their newly acquired skills. I call this exercise Bayesian computation with Stan and Farmer Jöns and it’s pretty cool! Now, it’s not cool because of me, but because the expressiveness of Stan allowed me to write a small number of data analytic questions that quickly take you from running a simple binomial model up to running a linear regression. Throughout the exercise you work with the same model code and each question just requires you to make a minimal change to this code, yet you will cover most models taught in a basic statistics course! Well, briefly at least… :) If you want to try out this exercise yourself, or use it for some other purpose, you can find it here:

Beginners Exercise: Bayesian computation with Stan and Farmer Jöns (R-markdown source)
Solutions to Bayesian computation with Stan and Farmer Jöns (R-markdown source)

My friend and colleague Christophe Carvenius also helped me translate this exercise into Python:

Python Beginners Exercise: Bayesian computation with Stan and Farmer Jöns
Python Solutions to Bayesian computation with Stan and Farmer Jöns

Now, this exercise would surely have been better if I’d used real data, but unfortunately I couldn’t find enough datasets related to cows… Finally, here is a depiction of farmer Jöns and his two lazy siblings by the great master Hokusai.

A Fun Gastronomical Dataset: What’s on the Menu?

I just found a fun food-themed dataset that I’d never heard about and that I thought I’d share. It’s from a project called What’s on the menu where the New York Public Library has crowdsourced a digitization of their collection of historical restaurant menus. The collection stretches all the way back to the 19th century and well into the 1990s, and on the home page it is stated that there are “1,332,271 dishes transcribed from 17,545 menus”. Here is one of those menus, from a turn of the (old) century Chinese-American restaurant:

The data is freely available in csv format (yay!) and here I’ll just show how to get the data into R and then use it to plot the popularity of some foods over time.

How I made some Pokémon Business Cards

As I’m in the industry now I figured I needed some business cards and as it seems the 90s never left us and Japanese monsters are hip again, I decided to make them Pokémon themed.

I think they turned out pretty well, and here I’m just going to give some pointers on how I did them.

Bayesian Bootstrap: The Movie + Some Highlights from UseR! 2016

Not surprisingly, this year’s UseR! conference was a great event with heaps of talented researchers and R-developers showing off the latest and greatest R packages. (A surprise visit from Donald Knuth didn’t hurt either.) What was extra great this year was that all talks were recorded, including mine. So if you want to know more about how the non-parametric Bootstrap is really a Bayesian procedure, and how you can run the Bayesian bootstrap in R using my bayesboot package, just press play. :)


How to Cut Your Planks with R

Today I’m extraordinarily pleased because today I solved an actual real world problem using R. Sure, I’ve solved many esoteric statistical problems with R, but I’m not sure if any of those solutions have escaped the digital world and made some impact ex silico.

It is now summer, and in Sweden that means that many people tend to overhaul and rebuild their wooden decks, as you need somewhere to sit during those precious few weeks of +20°C (about 70°F) weather. And so, we also decided to rebuild our algae-ridden, half-rotten deck, and everything went well until we got to the point where we had to construct the last steps leading into the house. As we had been slightly sloppy when buying planks we only had five left, and when naïvely measuring out the lengths we needed it seemed that the planks were not long enough. Now the problem was this: Was there some way we could saw the planks into the lengths we needed or did we have to go all the way to the lumber yard to get more planks?

These were the planks we had (in centimeters):

planks_we_have <- c(120, 137, 220, 420, 480)
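To give a feel for the kind of question being asked, here is a minimal sketch of a greedy check, not necessarily the approach taken in the full post. The vector `pieces_needed` and the function `fits_first_fit` are hypothetical names, and the required lengths are made-up illustration values, not the real measurements. The sketch also ignores the width of the saw cut.

```r
# Planks from the post (in centimeters).
planks_we_have <- c(120, 137, 220, 420, 480)

# HYPOTHETICAL required lengths (cm), invented for illustration only.
pieces_needed <- c(200, 200, 150, 150, 100, 100, 90)

# Greedy "first-fit decreasing" heuristic: take the pieces longest first
# and cut each one from the first plank that still has enough length left.
# TRUE means a cutting plan was found; FALSE only means this heuristic
# failed, not that no plan exists.
fits_first_fit <- function(planks, pieces) {
  remaining <- planks
  for (piece in sort(pieces, decreasing = TRUE)) {
    candidates <- which(remaining >= piece)
    if (length(candidates) == 0) {
      return(FALSE)
    }
    first <- candidates[1]
    remaining[first] <- remaining[first] - piece
  }
  TRUE
}

fits_first_fit(planks_we_have, pieces_needed)
```

Because the heuristic is greedy, a `FALSE` would still call for an exhaustive search (or a trip to the lumber yard) before giving up.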

bayesboot: An R package for doing the Bayesian bootstrap

I recently wrapped up a version of my R function for easy Bayesian bootstrappin’ into the package bayesboot. This package implements a function, also named bayesboot, which performs the Bayesian bootstrap introduced by Rubin in 1981. The Bayesian bootstrap can be seen as a smoother version of the classical non-parametric bootstrap, but I prefer seeing the classical bootstrap as an approximation to the Bayesian bootstrap :)

The implementation in bayesboot can handle both summary statistics that work on a weighted version of the data (such as weighted.mean) and those that work on a resampled data set (like median). As bayesboot just got accepted on CRAN you can install it in the usual way:

install.packages("bayesboot")
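To make the idea concrete, here is a minimal base-R sketch of the Bayesian bootstrap for a mean, written without the package so it runs on its own. It is an illustration of the weighting idea from Rubin (1981), not the package's actual implementation. The function name `bayes_boot_mean` is hypothetical and the data are simulated.

```r
# Bayesian bootstrap of the mean using only base R.
# Each posterior draw reweights the observed data with weights drawn
# from a flat Dirichlet(1, ..., 1) distribution.
bayes_boot_mean <- function(x, n_draws = 4000) {
  n <- length(x)
  replicate(n_draws, {
    w <- rexp(n)       # independent Exponential(1) draws ...
    w <- w / sum(w)    # ... normalised to sum to 1 are Dirichlet(1, ..., 1)
    sum(w * x)         # the weighted mean for this draw
  })
}

set.seed(123)
x <- rnorm(30, mean = 10, sd = 2)  # simulated data, for illustration only

posterior_mean <- bayes_boot_mean(x)

# A 95% credible interval for the mean.
quantile(posterior_mean, c(0.025, 0.975))
```

Statistics that cannot take weights directly (like median) instead need each set of weights to be turned into a resampled data set, which is the second case mentioned above.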