This January I played the most intriguing computer game I’ve played in ages: The Return of the Obra Dinn. Except for being a masterpiece of murder-mystery storytelling it also has the most unique art-style as it only uses black and white pixels. To pull this off Obra Dinn makes use of *image dithering*: the arrangement of pixels of low color resolution to emulate the color shades in between. Since the game was over all too quickly I thought I instead would explore how basic image dithering can be implemented in R. If old school graphics piques your interest, read on! There will be some grainy looking ggplot charts at the end.

(*The image above is copyright Lucas Pope and is the title screen of The Return of the Obra Dinn*)

Image dithering tries to solve the problem that you want to show an image with many shades of color, but your device can only display a much smaller number of colors. This might sound like a silly problem now, but was a very real problem in the early days of computers. For example, the original Mac could only display black and white pixels, not even any shades of grey!

So let’s do some image dithering in R! The Return of Obra Dinn takes place on an early 19th century East Indiaman ship, so let’s use something related as an example image. Why not use a low-resolution painting of Vice Admiral Horatio Nelson (1758 - 1805) the British officer who defeated the French and Spanish navies during the battle of Trafalgar. To read in the image I will use the `imager`

package.

library(imager)
nelson <- load.image("horatio_nelson.jpg")
nelson

```
## Image. Width: 199 pix Height: 240 pix Depth: 1 Colour channels: 3
```

The `imager`

package is a really useful package when you want to manipulate (and mess with) images directly in R. The `nelson`

object is now a `cimg`

object, which is basically an `array`

with dimensions Width, Height, Depth (a time dimension, if you have a series of images), and Color channels. More importantly, `cimg`

objects can be `plot`

ted:

plot(nelson)

As an example, I’m going to do black-and-white dithering so let’s remove the color and any transparency (called “alpha” here) from the image.

```
nelson_gray <- grayscale( rm.alpha(nelson) )
plot(nelson_gray)
```

Now let’s imagine that we would want to display this image using only black and white pixels. Before getting to the dithering, what would the simplest method be to achieve this? Well, we could just *threshold* the image. A pixel with value `0.0`

(fully black) to `0.5`

are made black, and pixels with values above `0.5`

are set to white (`1.0`

). This is easy to do as `nelson_gray`

can be treated as a matrix:

nelson_threshold <- nelson_gray > 0.5
plot(nelson_threshold)

So, while the image looks kind of cool, it has lost a lot of nuance as large parts of it are now completely black. So how to fake shades of gray using only black and white pixels? Well, you can *dither* the image, that is, add some noise to the image as you reduce the number of colors. Let’s start by trying out the most basic kind of noise: White noise, here created using the `runif`

function:

rand_matrix <- matrix(
data = runif(length(nelson_gray)),
ncol = ncol(nelson_gray), nrow=nrow(nelson_gray))
rand_cimg <- as.cimg(rand_matrix)
plot(rand_cimg)

Each pixel in `rand_cimg`

is a value from `0.0`

to `1.0`

and we can now use `rand_cimg`

when thresholding instead of the static `0.5`

. If you try out many different noise images then every black and white pixel will *on average* have the same value as the original grayscale pixel. This sounds like a good property, but let’s see how it looks with the current `rand_cimg`

:

nelson_rand <- nelson_gray > rand_cimg
plot(nelson_rand)

To be correct *on average* doesn’t help much, *in practice*, we get a very noisy Nelson. But if you squint you can now see shades of gray in the picture, at least. Random noise is just too random, but maybe we can get better dithering by adding less random noise. What about a checker pattern?

checker_pattern <- rbind(c(1/3, 2/3),
c(2/3, 1/3))
plot(as.cimg(checker_pattern))

The pattern above uses cutoffs of 1/3 and 2/3, so Nelson-pixels that gets compared to a darker 1/3-pixel will be more likely to go white and Nelson-pixels that are compared to a lighter 2/3-pixel will tend to go black. Let’s scale this patter to Nelson-size.

# rep_mat takes a matrix (mat) and tiles it so that the resulting
# matrix has size nrow_out x ncol_out.
# It's basically a 2d version of rep()
rep_mat <- function(mat, nrow_out, ncol_out) {
mat[rep(seq_len(nrow(mat)), length.out = nrow_out),
rep(seq_len(ncol(mat)), length.out = ncol_out)]
}
checker_cimg <- as.cimg(rep_mat(checker_pattern, nrow(nelson_gray), ncol(nelson_gray)))
plot(checker_cimg)

And let’s do the thresholding with this new checker pattern:

nelson_checker <- nelson_gray > checker_cimg
plot(nelson_checker)

Well, it’s not good, but it kind of looks like we got at least one shade of gray now compared to using the static `0.5`

. Actually, that’s exactly what we got! We can see that by taking a smooth gradient…

gradient <- as.cimg( rep(seq(0, 1, 0.01), 101), x=101, y=101)
plot(gradient)

… and thresholding with the checker pattern:

checker_cimg <- as.cimg(rep_mat(checker_pattern,
nrow(gradient), ncol(gradient)))
gradient_checker <- gradient > checker_cimg
plot(gradient_checker)

This gives us three columns: black, “checker-gray”, and white. So, using the checker pattern we can achieve some more nuance than with simple thresholding. Is there perhaps an even better pattern that allows for even more nuance?

Yes, there is! The classical pattern used in many image dithering implementations is the *Bayer pattern* (or *Bayer matrix*) named after it’s inventor Bryce Bayer. It’s an evolution of the checker pattern defined for matrices of size 2×2, 4×4, 8×8, etc. The exact construction and properties of the Bayer matrix are well described in the Wikipedia article but here is how to create it in R and how it looks:

# Calculates a non-normalized Bayer pattern matrix of size 2^n
recursive_bayer_pattern <- function(n) {
if(n <= 0) {
return(matrix(0))
}
m <- recursive_bayer_pattern(n - 1)
rbind(
cbind(4 * m + 0, 4 * m + 2),
cbind(4 * m + 3, 4 * m + 1))
}
# Returns a Bayer pattern of size 2^n normalized so all values
# are between 0.0 and 1.0.
normalized_bayer_pattern <- function(n) {
pattern <- recursive_bayer_pattern(n)
(1 + pattern) / ( 1 + length(pattern) )
}
par(mfcol = c(1, 3), mar = c(0, 0, 2, 1), ps = 18)
plot(as.cimg(normalized_bayer_pattern(1)), main = "Bayer 2×2")
plot(as.cimg(normalized_bayer_pattern(2)), main = "Bayer 4×4")
plot(as.cimg(normalized_bayer_pattern(3)), main = "Bayer 8×8")

Basically, a Bayer matrix contains as many shades of gray it’s possible to fit in there, and the shades are as spread out as possible. Let’s see how a 4x4 Bayer matrix transforms the smooth gradient:

bayer_cimg <- as.cimg(rep_mat(normalized_bayer_pattern(2),
nrow(gradient), ncol(gradient)))
gradient_bayer <- gradient > bayer_cimg
plot(gradient_bayer)

Pretty smooth! We get the classical “crosshatch” patterns reminiscent of last-century computer graphics. Let’s give Admiral Nelson the same treatment:

bayer_matrix <- rep_mat(normalized_bayer_pattern(2),
nrow(nelson_gray), ncol(nelson_gray))
bayer_cimg <- as.cimg(bayer_matrix)
nelson_bayer <- nelson_gray > bayer_cimg
plot(nelson_bayer)

Now he looks doubly old-school. So far I’ve only worked with grayscale images and black-and-white dithering, but we can quickly hack together some color dithering by just performing the dither thresholding on one color channel at a time.

nelson_bayer_color <- nelson
for(rgb_i in 1:3) {
color_channel <- nelson_bayer_color[ , , 1, rgb_i, drop = FALSE]
nelson_bayer_color[ , , 1, rgb_i] <- color_channel > bayer_cimg
}
plot(nelson_bayer_color)

This method does not generalize to arbitrary color scales, but I still think it looks pretty cool!

Finally, I’ll show you how to dither some ggplots. Below is most of the code above wrapped up into functions:

# rep_mat takes a matrix (mat) and tiles it so that the resulting
# matrix has size nrow_out × ncol_out.
# It's basically a 2d version of rep()
rep_mat <- function(mat, nrow_out, ncol_out) {
mat[rep(seq_len(nrow(mat)), length.out = nrow_out),
rep(seq_len(ncol(mat)), length.out = ncol_out)]
}
# Calculates a Bayer pattern matrix of size 2^n
# Source: https://gist.github.com/MehdiNS/bd41bbc6db780c9409157d35d331ac80
recursive_bayer_pattern <- function(n) {
if(n <= 0) {
return(matrix(0))
}
m <- recursive_bayer_pattern(n - 1)
rbind(
cbind(4 * m + 0, 4 * m + 2),
cbind(4 * m + 3, 4 * m + 1))
}
# Returns a Bayer pattern of size 2^n normalized so all values
# are between 1 / (m + 1) and m / (m + 1) where m is the number
# of elements in the 2^n × 2^n matrix.
normalized_bayer_pattern <- function(n) {
pattern <- recursive_bayer_pattern(n)
(1 + pattern) / ( 1 + length(pattern) )
}
# Returns a nrow_out × ncol_out cimg image repeatig a 2×2 Bayer pattern
rep_bayer_cimg <- function(nrow_out, ncol_out) {
bayer_matrix <- rep_mat(normalized_bayer_pattern(2), nrow_out, ncol_out)
as.cimg(bayer_matrix)
}
# Transforms a cimg image into a dithered black and white image
img_to_bayer_bw <- function(img) {
img <- grayscale(rm.alpha(img))
bayer_cimg <- rep_bayer_cimg(nrow(img), ncol(img))
img >= bayer_cimg
}
# Transforms a cimg image into a dithered color image with 8 colors.
img_to_bayer_color <- function(img) {
img <- rm.alpha(img)
bayer_cimg <- rep_bayer_cimg(nrow(img), ncol(img))
for(rgb_i in 1:3) {
color_channel <- img[ , , 1, rgb_i, drop = FALSE]
img[ , , 1, rgb_i] <- color_channel >= bayer_cimg
}
img
}

Let’s then create the ggplot we will transform.

library(ggplot2)
ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(cyl))) +
geom_violin(color = "black") +
theme_classic() +
theme(axis.text= element_text(colour="black"))

Then we’ll turn it into a low res `cimg`

image.

# This function is a hack to read in a ggplot2 plot as a cimg image
# by saving it as a png to disk and reading it back in.
ggplot_to_cimg <- function(width, height, dpi) {
tmp_fname <- tempfile(fileext = ".png")
ggsave(tmp_fname, width = width, height = height, dpi = dpi, antialias = "none")
load.image(tmp_fname)
}
plot_img <- ggplot_to_cimg( width = 3, height = 2, dpi = 140)
plot(plot_img)

Finally, we can turn it into a retro-dithered black and white plot…

plot( img_to_bayer_bw(plot_img) )

…or a dithered eight-color plot.

plot( img_to_bayer_color(plot_img) )

Something for your next retro inspired presentation, maybe? If you want to have full control over your image dithering it would probably be more convenient to post-process your plots using an image editor, such as the free and open source GIMP rather than to do it directly in R.

This post has covered basic image dithering in R and, to be more specific, it has covered ordered dithering. There is also error-diffusion dithering which works in a rather different way. But that’s for another time. Now I’m going to go back to mourning that I’ve already finished The Return of the Obra Dinn.

]]>The Beta-Binomial model is the “hello world” of Bayesian statistics. That is, it’s the first model you get to run, often before you even know what you are doing. There are many reasons for this:

- It only has one parameter, the underlying proportion of success, so it’s easy to visualize and reason about.
- It’s easy to come up with a scenario where it can be used, for example: “What is the proportion of patients that will be cured by this drug?”
- The model can be computed analytically (no need for any messy MCMC).
- It’s relatively easy to come up with an informative prior for the underlying proportion.
- Most importantly: It’s fun to see some results before diving into the theory! 😁

That’s why I also introduced the Beta-Binomial model as the first model in my DataCamp course Fundamentals of Bayesian Data Analysis in R and quite a lot of people have asked me for the code I used to visualize the Beta-Binomial. Scroll to the bottom of this post if that’s what you want, otherwise, here is how I visualized the Beta-Binomial in my course given two successes and four failures:

The function that produces these plots is called `prop_model`

(`prop`

as in proportion) and takes a vector of `TRUE`

s and `FALSE`

s representing successes and failures. The visualization is created using the excellent `ggridges`

package (previously called joyplot). Here’s how you would use `prop_model`

to produce the last plot in the animation above:

data <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
prop_model(data)

The result is, I think, a quite nice visualization of how the model’s knowledge about the parameter changes as data arrives. At `n=0`

the model doesn’t know anything and — as the default prior states that it’s equally likely the proportion of success is anything from 0.0 to 1.0 — the result is a big, blue, and uniform square. As more data arrives the probability distribution becomes more concentrated, with the final posterior distribution at `n=6`

.

Some added features of `prop_model`

is that it also plots larger data somewhat gracefully and that it returns a random sample from the posterior that can be further explored. For example:

big_data <- sample(c(TRUE, FALSE), prob = c(0.75, 0.25),
size = 100, replace = TRUE)
posterior <- prop_model(big_data)

quantile(posterior, c(0.025, 0.5, 0.975))

## 2.5% 50% 98%
## 0.68 0.77 0.84

So here we calculated that the underlying proportion of success is most likely 0.77 with a 95% CI of [0.68, 0.84] (which nicely includes the correct value of 0.75 which we used to simulate `big_data`

).

To be clear, `prop_model`

is not intended as anything serious, it’s just meant as a nice way of exploring the Beta-Binomial model when learning Bayesian statistics, maybe as part of a workshop exercise.

`prop_model`

function# This function takes a number of successes and failuers coded as a TRUE/FALSE
# or 0/1 vector. This should be given as the data argument.
# The result is a visualization of the how a Beta-Binomial
# model gradualy learns the underlying proportion of successes
# using this data. The function also returns a sample from the
# posterior distribution that can be further manipulated and inspected.
# The default prior is a Beta(1,1) distribution, but this can be set using the
# prior_prop argument.
# Make sure the packages tidyverse and ggridges are installed, otherwise run:
# install.packages(c("tidyverse", "ggridges"))
# Example usage:
# data <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
# prop_model(data)
prop_model <- function(data = c(), prior_prop = c(1, 1), n_draws = 10000) {
library(tidyverse)
data <- as.logical(data)
# data_indices decides what densities to plot between the prior and the posterior
# For 20 datapoints and less we're plotting all of them.
data_indices <- round(seq(0, length(data), length.out = min(length(data) + 1, 20)))
# dens_curves will be a data frame with the x & y coordinates for the
# denities to plot where x = proportion_success and y = probability
proportion_success <- c(0, seq(0, 1, length.out = 100), 1)
dens_curves <- map_dfr(data_indices, function(i) {
value <- ifelse(i == 0, "Prior", ifelse(data[i], "Success", "Failure"))
label <- paste0("n=", i)
probability <- dbeta(proportion_success,
prior_prop[1] + sum(data[seq_len(i)]),
prior_prop[2] + sum(!data[seq_len(i)]))
probability <- probability / max(probability)
data_frame(value, label, proportion_success, probability)
})
# Turning label and value into factors with the right ordering for the plot
dens_curves$label <- fct_rev(factor(dens_curves$label, levels = paste0("n=", data_indices )))
dens_curves$value <- factor(dens_curves$value, levels = c("Prior", "Success", "Failure"))
p <- ggplot(dens_curves, aes(x = proportion_success, y = label,
height = probability, fill = value)) +
ggridges::geom_density_ridges(stat="identity", color = "white", alpha = 0.8,
panel_scaling = TRUE, size = 1) +
scale_y_discrete("", expand = c(0.01, 0)) +
scale_x_continuous("Underlying proportion of success") +
scale_fill_manual(values = hcl(120 * 2:0 + 15, 100, 65), name = "", drop = FALSE,
labels = c("Prior ", "Success ", "Failure ")) +
ggtitle(paste0(
"Binomial model - Data: ", sum(data), " successes, " , sum(!data), " failures")) +
theme_light() +
theme(legend.position = "top")
print(p)
# Returning a sample from the posterior distribution that can be further
# manipulated and inspected
posterior_sample <- rbeta(n_draws, prior_prop[1] + sum(data), prior_prop[2] + sum(!data))
invisible(posterior_sample)
}

]]>So, after having held workshops introducing Bayes for a couple of years now, I finally pulled myself together and completed my DataCamp course: Fundamentals of Bayesian Data Analysis in R! 😁

While it’s called a course, it’s more like a 4 hour workshop and — without requiring anything but basic R skills and a vague notion of probability — it introduces Bayesian data analysis from scratch. The whole course is done *online*, coding and all, using DataCamp’s interactive course interface (like Rstudio, but in the browser) and, while you need a DataCamp subscription to do the full course, you can *do the first part for free* by just signing into DataCamp! If you feel that’s too much of a commitment you have the first two videos in the course right here:

Please try it out and let me know what you think!

As always I recomend either of the following two excelent introductions to Bayesia data analysis:

My DataCamp course roughly covers chapters 1-3 in *Statistical Rethinking* so you’ll already have a headstart on that one.

The Stan project for statistical computation has a great collection of curated case studies which anybody can contribute to, maybe even me, I was thinking. But I don’t have time to worry about that right now because I’m on vacation, being on the yearly visit to my old family home in the north of Sweden.

What I *do* worry about is that my son will be stung by a bumblebee. His name is Torsten, he’s almost two years old, and he loves running around barefoot on the big lawn. Which has its fair share of bumblebees. Maybe I should put shoes on him so he wont step on one, but what are the chances, really.

Well, what *are* the chances? I guess if I only had

- Data on the bumblebee density of the lawn.
- Data on the size of Torsten’s feet and how many steps he takes when running around.
- A reasonable Bayesian model, maybe implemented in Stan.

I could figure that out. “How hard can it be?”, I thought. And so I made an attempt.

To get some data on bumblebee density I marked out a 1 m² square on a representative part of the lawn. During the course of the day, now and then, I counted up how many bumblebees sat in the square.

Most of the time I saw zero bumblebees, but 1 m² is not that large of an area. Let’s put the data into R:

bumblebees <- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 1, 0, 0, 0, 0, 0, 0)

During the same day I kept an eye on Torsten, and sometimes when he was engaged in *active play* I started a 60 s. timer and counted how many steps he took while running around. Here are the counts:

toddler_steps <- c(26, 16, 37, 101, 12, 122, 90, 55, 56, 39, 55, 15, 45, 8)

Finally I needed to know the area of Torsten’s feet. At a moment when he was *not* running around I put his foot on a piece of graph paper and traced its edge, which made him giggle. I then calculated how many of the 0.5 cm² squares were fully covered and how many were partially covered.

To estimate of the area of his foot I took the average of the number of full squares and the number partial squares, and then converted to m²:

full_squares <- 174
partial_squares <- 251
squares <- (full_squares + partial_squares) / 2
foot_cm2 <- squares / 4
foot_m2 <- foot_cm2 / 100^2
foot_m2

```
## [1] 0.0053125
```

Turns out my son’s feet are about 0.0053 m² each.

The idea I had here was that *if* you know the average number of steps Torsten takes per minute while playing, and *if* you know the average number of bumblebees per m², *then* you could calculate the average area covered by Torsten’s tiny feet while running around, and then *finally* you could calculate the probability of a bumblebee being in that area. This would give you a probability for how likely Torsten is to be stung by a Bumblebee. However, while we have data on all of the above, we don’t know any of the above for certain.

So, to capture some of this uncertainty I thought I would whip together a small Bayesian model and, even though it’s a bit overkill in this case, why not fit it using Stan? (If you are unfamiliar with Bayesian data analysis and Stan I’ve made a short video tutorial you can find here.)

Let’s start our Stan program by declaring what data we have:

data {
int n_bumblebee_observations;
int bumblebees[n_bumblebee_observations];
int n_toddler_steps_observations;
int toddler_steps[n_toddler_steps_observations];
real foot_m2;
}

Now we need to decide on a model for the number of `bumblebees`

in a m² and a model for the number of `toddler_steps`

in 60 s. For the bees I’m going with a simple *Poisson* model, that is, the model assumes there is an average number of bees per m² ($\mu_\text{bees}$) and that this is all there is to know about bees.

For the number of steps I’m going to step it up a notch with a *negative binomial* model. The negative binomial distribution can be viewed in a number of ways, but one particularly useful way is as an extension of the Poisson distribution where the mean number of steps for each outcome is not fixed, but instead is varying, where this is controlled by the precision parameter $\phi_\text{steps}$. The *larger* $\phi_\text{steps}$ is the *less* the mean number of steps for each outcome is going to vary around the overall mean $\mu_\text{steps}$. That is, when $\phi_\text{steps} \rightarrow \infty$ then the negative binomial becomes a Poisson distribution. The point with using the negative binomial is to capture that Torsten’s activity level when playing on the lawn is definitely *not* constant. Sometimes he can run full speed for minutes, but sometimes he spends a long time being fascinated by the same little piece of rubbish and then he’s not taking many steps.

So we almost have a Bayesian model for our data, we just need to put priors on the parameters $\mu_\text{bees}$, $\mu_\text{steps}$, and $\phi_\text{steps}$. All three parameters are positive and a quick n’ dirty prior when you have positive parameters is a *half-Cauchy* distribution centered at zero: It’s just like half a normal distribution centered at zero, but with a much fatter tail. It has one free parameter, the scale, which also happens to be the median of the half-Cauchy. Set the scale/median to a good guess for the parameter in question and you are ready to go! If your guess *is* good then this prior will give the parameter the slightest nudge in the right direction, if your guess is bad then, as long as you have enough data, the fat tail of the half-Cauchy distribution will help you save face.

My guess for $\mu_\text{bees}$ is (4.0 + 4.0) / (10 × 10) = 0.08. I base this on that while crossing the lawn I usually scan an area of about 10×10 m², I then usually see around four bees, and I assume I’m missing half of the bees. So eight bees on a 100 m² lawn gives 0.08 bees per m². My guess for the number of steps per minute is going to be 60. For the precision parameter $\phi_\text{steps}$ I have no clue, but it shouldn’t be too large nor to small, so my guess is going to be 1.0. Here are the priors:

With the priors specified we now have all the required parts and here is the resulting Bayesian model:

$$% <![CDATA[ \begin{align} & \text{bees} \sim \text{Poisson}(\mu_\text{bees}) \ & \text{steps} \sim \text{Neg-Binomial}(\mu_\text{steps}, \phi_\text{steps}) \ & \mu_\text{bees} \sim \text{Half-Cauchy}((4 + 4) / (10 \times 10)) \ & \mu_\text{steps} \sim \text{Half-Cauchy}(60) \ & \phi_\text{steps} \sim \text{Half-Cauchy}(1) \end{align} %]]>$$

And here is the Stan code implementing the model above:

parameters {
real<lower=0> mu_bees;
real<lower=0> mu_steps;
real<lower=0> precision_steps;
}
model {
# Since we have contrained the parameters to be positive we get implicit
# half-cauchy distributions even if we declare them to be 'full'-cauchy.
mu_bees ~ cauchy(0, (4.0 + 4.0) / (10.0 * 10.0) );
mu_steps ~ cauchy(0, 60);
precision_steps ~ cauchy(0, 1);
bumblebees ~ poisson(mu_bees);
toddler_steps ~ neg_binomial_2(mu_steps, precision_steps);
}

The final step is to predict how many bumblebees Torsten will step on during, say, one hour of active play. We do this in the `generated quantities`

code block in Stan. The code below will step through 60 minutes (a.k.a. one hour) and for each minute: (1) Sample `pred_steps`

from the negative binomial, (2) calculate the area (m²) covered by these steps, and (3) sample the number of bees in this area from the Poisson giving a predicted `stings_by_minute`

. Finally we sum these 60 minutes worth of strings into `stings_by_hour`

.

generated quantities {
int stings_by_minute[60];
int stings_by_hour;
int pred_steps;
real stepped_area;
for(minute in 1:60) {
pred_steps = neg_binomial_2_rng(mu_steps, precision_steps);
stepped_area = pred_steps * foot_m2;
stings_by_minute[minute] = poisson_rng(mu_bees * stepped_area);
}
stings_by_hour = sum(stings_by_minute);
}

We have data, we have a model, and now all we need to do is fit it. After we’ve put the whole Stan model into `bee_model_code`

as a string we do it like this using the `rstan`

interface:

# First we put all the data into a list
data_list <- list(
n_bumblebee_observations = length(bumblebees),
bumblebees = bumblebees,
n_toddler_steps_observations = length(toddler_steps),
toddler_steps = toddler_steps,
foot_m2 = foot_m2
)
# Then we fit the model which description is in bee_model_code .
library(rstan)
fit <- stan(model_code = bee_model_code, data = data_list)

With the model fitted, let’s take a look at the posterior probability distribution of our parameters.

stan_hist(fit, c("mu_bees", "mu_steps", "precision_steps"))
print(fit, c("mu_bees", "mu_steps", "precision_steps"))

## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## mu_bees 0.12 0.00 0.05 0.04 0.08 0.11 0.15 0.23 3551 1
## mu_steps 50.88 0.22 11.65 32.56 43.09 49.34 57.13 78.00 2801 1
## precision_steps 1.80 0.01 0.68 0.78 1.32 1.70 2.17 3.41 2706 1

This seems reasonable, *but* note that the distributions are pretty wide, which means there is a lot of uncertainty! For example, looking at `mu_bees`

it’s credible that there is everything from 0.04 bees/m² to 0.2 bees/m².

Anyway, let’s take a look at what we are really interested in, the predictive distribution over how many stings per hour Torsten will suffer during active play:

# Getting the sample representing the prob. distribution over stings/hour .
stings_by_hour <- as.data.frame(fit)$stings_by_hour
# Now actually calculating the prob of 0 stings, 1 stings, 2 stiongs, etc.
strings_probability <- prop.table( table(stings_by_hour) )
# Plotin' n' printin'
barplot(strings_probability, col = "yellow",
main = "Posterior predictive stings per hour",
xlab = "Number of stings", ylab = "Probability")

```
round(strings_probability, 2)
```

## stings_by_hour
## 0 1 2 3 4 5 6 7 8 9 10
## 0.27 0.30 0.22 0.12 0.05 0.02 0.01 0.00 0.00 0.00 0.00

Ok, it seems like it is most probable that Torsten will receive one sting per hour, but we should not be surprised if it’s two or even three stings. I’d better put some shoes on him! The *problem* is that after a couple of days full of active barefoot play, my son Torsten’s feet look like this:

As you can see his feet are *not* swollen from all the bee stings they should receive according to the model. Actually, even after a week, he has not gotten a single bee sting! Which is good, I suppose, in a way, but, well, it means that my model is likely pretty crap. How can this be? Well,

- I should maybe have gotten more and better data. Perhaps the square I monitored for bumblebees didn’t yield data that was really representative of the bumblebee density in general.
- The assumption that bumblebees always sting when stepped upon might be wrong. Or maybe Torsten is so smart so that he actively avoids them…
- Maybe the model was too simplistic. I really should have made a hierarchical model incorporating data from multiple squares and several days of running around. To factor in the effect of the weather, the flower density, and the diet of Torsent also couldn’t hurt.

I guess I could spend some time improving the model. And I guess there is a lesson to be learned here about that it is hard to predict the predictive performance of a model. But all that has to wait. I am on vacation after all.

*The full data and code behind this blog post can be found here.*