The Tidyverse in a Table

This is a submission to the 2020 RStudio Table Contest in the Tutorials for interactive-HTML tables category.

Some tables are beautiful. And yes, I’m talking about the stats-and-numbers kind of tables and not the ones you get at IKEA. Some tables show carefully selected statistics, with headers in bold and spacious yet austere design; the numbers rounded to just the right number of decimal places.

But here we’re not going to make a beautiful table; instead, we’re making a useful one. In this tutorial, I’m going to show you how to take all the documentation, for all the functions in the tidyverse core packages, and condense it into one single table. Why is this useful? Because we’re going to use the excellent DT package, the result is an interactive table that makes it easy to search, sort, and explore the functions of the tidyverse.

Actually, let’s start with the finished table, and then I’ll show you how it’s made. Try it out below, for example, find all functions that take a pattern argument or find all ggplot2 functions with line in the name!

Package	Function	Arguments	Description

ggplot2	%+%	(e1, e2)	Add components to a plot '+' is the key to constructing sophisticated ggplot2 graphics. It allows you to start simple, then get more and more complex, checking your work at each step.	⊕
ggplot2	%+replace%	(e1, e2)	Get, set, and modify the active theme The current/active theme (see 'theme()') is automatically applied to every plot you draw. Use 'theme_get' to get the current theme, and 'theme_set' to completely override it. 'theme_update' and 'theme_replace' are shorthands for changing individual…	⊕
ggplot2	aes	(x, y, ...)	Construct aesthetic mappings Aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms. Aesthetic mappings can be set in 'ggplot()' and in individual layers.	⊕
ggplot2	aes_	(x, y, ...)	Define aesthetic mappings programmatically Aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms. 'aes()' uses non-standard evaluation to capture the variable names. 'aes_' and 'aes_string' require you to explicitly quote the inputs…	⊕
ggplot2	aes_q	(x, y, ...)	Define aesthetic mappings programmatically Aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms. 'aes()' uses non-standard evaluation to capture the variable names. 'aes_' and 'aes_string' require you to explicitly quote the inputs…	⊕

Showing 1 to 5 of 782 entries

Making the table above is a two-step affair, where both steps are somewhat tricky:

Take the documentation for all the tidyverse packages and put it into a tidy data frame. This is going to be tricky because parsing untidy data into a data frame is often messy and we will have to use undocumented functions from the tools package to parse the R documentation (Rd) files.
Format the documentation data frame and make it into a pretty DT table. This is going to be tricky because to format and customize the table using DT we will have to write both some HTML and some JavaScript (!).

Putting the tidyverse documentation into a data frame

Let’s load the packages we’re going to need: DT to make the final table, glue to easily “glue” complex strings together, and, of course, tidyverse which we’ll use for processing data, but that’s also where the documentation lives that we’re going to extract!

library(DT)
library(glue)
library(tidyverse)

The tidyverse contains many different packages, but here we’re going to use only the core tidyverse packages.

package_names <- tidyverse:::core
package_names

## [1] "ggplot2" "tibble"  "tidyr"   "readr"   "purrr"   "dplyr"   "stringr"
## [8] "forcats"

We’re being a bit reckless here because we’re using ::: a.k.a. the triple colon operator. A close cousin to the double colon operator (::), the :::-operator also plucks out a function or value from within a package, but ::: is the cousin dangereux! While :: allows you to pluck out exported functions, functions the package author has marked as being dependable and useful, ::: allows you to pluck out any obscure, undocumented, and possibly unreliable function. In general, it’s not good practice to use :::, but in practice it’s generally OK when getting from A to B is more important than reliability down the line. We will find a use for ::: again.
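
To make the difference between :: and ::: concrete, here is a minimal sketch in base R that checks whether a name is exported from a package or only lives inside its namespace. The helpers is_exported() and is_internal() are hypothetical names made up for this illustration, and the example assumes that the tools package in your R version still contains the internal helper .Rd_get_text.

```r
# Sketch: check how a name is exposed by a package namespace (base R only).
# is_exported() and is_internal() are hypothetical helpers for illustration.
is_exported <- function(package, name) {
  name %in% getNamespaceExports(package)
}

is_internal <- function(package, name) {
  # Present in the namespace, but not on the list of exports
  !is_exported(package, name) &&
    exists(name, envir = asNamespace(package), inherits = FALSE)
}

# Exported, so tools::Rd_db is the right way to reach it
rd_db_is_exported <- is_exported("tools", "Rd_db")

# Assumed to be internal, so only tools:::.Rd_get_text reaches it
rd_get_text_is_internal <- is_internal("tools", ".Rd_get_text")
```

A check like this can be a cheap safeguard: internal functions can be renamed or removed without notice, so it may be worth confirming they exist before relying on them.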

By loading the tidyverse we’ve also loaded all core packages and we can now extract the function names and arguments from these. We’ll use

lsf.str("package:A_PACKAGE_NAME") to extract a list of all function names in the given package.
str(get("A_FUNCTION_NAME")) to get the arguments for each function. As str() doesn’t return anything (it prints the arguments directly), we need capture.output() to get the printed text back as a string.
map_dfr to loop over all the core packages in package_names, create a data frame for each, and row bind them together.

tidyverse_functions <- map_dfr(package_names, function(package_name) {
  tibble(
    Package = package_name,
    Function = as.character(lsf.str(glue("package:{Package}"))),
    Arguments = map_chr(Function, function(function_name) {
      capture.output(str(get(function_name))) %>% 
        str_squish() %>% 
        str_c(collapse = " ") %>% 
        str_remove("^function ")
    })
  ) 
})
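
If you want to try the same idea without the tidyverse installed, here is a rough base R sketch that applies it to the stats package, which is attached in every R session. The helper describe_arguments() is a hypothetical name, not part of the code above. It also looks each function up in the package's own environment: get() on its own searches the whole search path, so when two attached packages define the same name it returns whichever comes first.

```r
# Sketch with base R only, applied to the stats package.
package_name <- "stats"
package_env <- as.environment(paste0("package:", package_name))

# All function names in the attached package
function_names <- as.character(lsf.str(paste0("package:", package_name)))

# describe_arguments() is a hypothetical helper for this illustration.
describe_arguments <- function(function_name, envir) {
  # Look the function up in this package only, not the whole search path
  f <- get(function_name, envir = envir, inherits = FALSE)
  # str() prints the signature, so we capture the printed lines
  printed <- capture.output(str(f))
  # Squish whitespace, glue the lines together, and drop the leading "function "
  signature <- paste(gsub("\\s+", " ", trimws(printed)), collapse = " ")
  sub("^function ", "", signature)
}

median_arguments <- describe_arguments("median", package_env)
median_arguments
```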

We now have a data frame of all 1,153 exported functions (+ some other values) in the tidyverse!

tidyverse_functions

## # A tibble: 1,153 x 3
##    Package Function    Arguments         
##    <chr>   <chr>       <chr>             
##  1 ggplot2 %+%         (e1, e2)          
##  2 ggplot2 %+replace%  (e1, e2)          
##  3 ggplot2 aes         (x, y, ...)       
##  4 ggplot2 aes_        (x, y, ...)       
##  5 ggplot2 aes_all     (vars)            
##  6 ggplot2 aes_auto    (data = NULL, ...)
##  7 ggplot2 aes_q       (x, y, ...)       
##  8 ggplot2 aes_string  (x, y, ...)       
##  9 ggplot2 after_scale (x)               
## 10 ggplot2 after_stat  (x)               
## # … with 1,143 more rows

What’s missing is the actual documentation. We can get a parsed version of the documentation for any installed package by using the tools::Rd_db() function. This gives us the documentation as a deeply nested list structure, which is better than having to work with the raw Rd-files, but it’s still tedious to figure out how to pick out the parts we want. Good news: The tools package has many functions that help with this. Bad news: They are all hidden away as non-exported functions (some are even “doubly” hidden by having function names starting with .). Good news, again: We know about the :::-operator and are renegades enough to use it!

# plucking out non-exported Rd helper functions from the tools package
Rd_get_metadata <- tools:::.Rd_get_metadata
Rd_contents <- tools:::Rd_contents
Rd_get_example_code <- tools:::.Rd_get_example_code
Rd_get_section <- tools:::.Rd_get_section
Rd_get_text <- tools:::.Rd_get_text

# Extracts the text of the named section from the rd_doc
Rd_get_section_text <- function(rd_doc, section) {
  Rd_get_section(rd_doc, section) %>% 
    Rd_get_text() %>% 
    discard(~ .x == "")
}
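
To get a feel for what these helpers return, it can help to try them on a single doc page first. The sketch below does that for the median page of the stats package, which ships with R. It assumes the internal tools functions behave as they do in the code above; since they are undocumented, this may differ between R versions.

```r
# Sketch: extract the description of one doc page from a base package.
rd_list <- tools::Rd_db("stats")

# Find the Rd file that documents median(), the same way the tutorial does:
# via the File column of the contents data frame.
contents <- tools:::Rd_contents(rd_list)
median_file <- contents$File[contents$Name == "median"][1]
median_page <- rd_list[[median_file]]

# Pull out the description section and flatten it into one string
description_lines <- tools:::.Rd_get_text(
  tools:::.Rd_get_section(median_page, "description")
)
description_lines <- description_lines[description_lines != ""]
median_description <- paste(description_lines, collapse = " ")
median_description
```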

Now comes a slightly messy section where we will, again, use map_dfr to loop over all the core packages, create a data frame (this time with extracted documentation), and row bind it all together.

rd_info <- map_dfr(package_names, function(package_name) {
  # A list with the parsed package documentation
  rd_list <- tools::Rd_db(package_name)
  rd_list %>% 
    # Turn the documentation contents into a data frame, one doc page per row.
    Rd_contents() %>% 
    as_tibble() %>% 
    # Remove all documentation of datasets and "internal" functions. 
    # We need to use map_lgl here as Keywords is a list of character vectors,
    # and any() as a page can have several keywords (or none at all).
    filter(map_lgl(Keywords, 
      ~ !any(.x %in% c("datasets", "internal"))
    )) %>%
    select(File, Name, Title, Aliases, Keywords) %>% 
    mutate(Package = package_name) %>% 
    # For each row/doc page we're going to extract some information
    rowwise() %>% 
    mutate(
      rd_doc = list(rd_list[[File]]),
      # The function examples. We're using the paste0("", x) trick as x might
      # be a 0-length vector, but we still want to get back an empty (1-length)
      # character vector.
      Examples = str_trim(paste0("", Rd_get_example_code(rd_doc))),
      Description = paste(Rd_get_section_text(rd_doc, "description"), collapse = " "),
      # A single doc page can document many functions, so here we make a list
      # of all functions documented by the doc page.
      names_and_aliases = list(unique(c(Name, Aliases)))
    ) %>%
    # A page can document many functions and this unnest will get us one row
    # per function instead of one row per doc page. All row values, except
    # those in names_and_aliases, will be duplicated.
    unnest(names_and_aliases)
})
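
The rowwise-then-unnest step is easier to follow on a toy example. Here is a small sketch with two made-up doc pages (the page and function names are hypothetical) showing how one row per doc page turns into one row per documented name.

```r
library(dplyr)
library(tidyr)

# Two hypothetical doc pages, each with a name and a list of aliases
pages <- tibble(
  Name = c("page_one", "page_two"),
  Aliases = list(c("page_one", "f", "g"), c("h"))
)

one_row_per_name <- pages %>%
  rowwise() %>%
  # Inside rowwise(), Aliases is the character vector for the current row
  mutate(names_and_aliases = list(unique(c(Name, Aliases)))) %>%
  ungroup() %>%
  # One row per element of names_and_aliases; Name and Aliases are duplicated
  unnest(names_and_aliases)

one_row_per_name
```

The first page contributes three rows (page_one, f, g) and the second contributes two (page_two, h), so the result has five rows.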

We now have one data frame listing all tidyverse functions and their arguments (tidyverse_functions) and one data frame with extracted documentation (rd_info). We can now combine the two with an inner_join, which matches up rows that share the same package and function name. Using inner_join has the added benefit of dropping everything that doesn’t occur in both data frames, which removes undocumented functions as well as documentation for non-functions.

tidyverse_functions_info <- inner_join(
  tidyverse_functions, rd_info,
  by = c("Package" = "Package", "Function" = "names_and_aliases")
)
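
As a quick illustration of what the join keeps and drops, here is a sketch on two made-up data frames (all package, function, and title values are hypothetical). The anti_join at the end is not part of the tutorial, but it is a handy way to see what the inner_join threw away.

```r
library(dplyr)

# Hypothetical list of functions, one of them undocumented
functions <- tibble(
  Package = c("pkg", "pkg", "pkg"),
  Function = c("f", "g", "undocumented_fn"),
  Arguments = c("(x)", "(x, y)", "()")
)

# Hypothetical documentation, one entry is for a dataset and not a function
docs <- tibble(
  Package = c("pkg", "pkg", "pkg"),
  names_and_aliases = c("f", "g", "some_dataset"),
  Title = c("Do f", "Do g", "A dataset")
)

join_keys <- c("Package" = "Package", "Function" = "names_and_aliases")

# Only rows present in both data frames survive: f and g
joined <- inner_join(functions, docs, by = join_keys)

# The functions that had no matching documentation
dropped <- anti_join(functions, docs, by = join_keys)
```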

Here’s the final data frame with info on all the 782 documented functions in the core tidyverse.

tidyverse_functions_info

## # A tibble: 782 x 11
##    Package Function Arguments File  Name  Title Aliases Keywords rd_doc Examples
##    <chr>   <chr>    <chr>     <chr> <chr> <chr> <list>  <list>   <list> <chr>   
##  1 ggplot2 %+%      (e1, e2)  gg-a… +.gg  Add … <chr [… <chr [0… <Rd>   "base <…
##  2 ggplot2 %+repla… (e1, e2)  them… them… Get,… <chr [… <chr [0… <Rd>   "p <- g…
##  3 ggplot2 aes      (x, y, .… aes.… aes   Cons… <chr [… <chr [0… <Rd>   "aes(x …
##  4 ggplot2 aes_     (x, y, .… aes_… aes_  Defi… <chr [… <chr [0… <Rd>   "# Thre…
##  5 ggplot2 aes_q    (x, y, .… aes_… aes_  Defi… <chr [… <chr [0… <Rd>   "# Thre…
##  6 ggplot2 aes_str… (x, y, .… aes_… aes_  Defi… <chr [… <chr [0… <Rd>   "# Thre…
##  7 ggplot2 after_s… (x)       aes_… aes_… Cont… <chr [… <chr [0… <Rd>   "# Defa…
##  8 ggplot2 after_s… (x)       aes_… aes_… Cont… <chr [… <chr [0… <Rd>   "# Defa…
##  9 ggplot2 annotate (geom, x… anno… anno… Crea… <chr [… <chr [0… <Rd>   "p <- g…
## 10 ggplot2 annotat… (grob, x… anno… anno… Anno… <chr [… <chr [0… <Rd>   "# Dumm…
## # … with 772 more rows, and 1 more variable: Description <chr>

So, we have a data frame with all the information, but we still need to turn this into a useful table we can easily search, filter, and skim.

Formatting the documentation and turning it into a DT table

The DT package by Yihui Xie is an amazing package that wraps the DataTables JavaScript library. It allows you to quickly turn any data frame into an interactive, sortable, and searchable HTML table that can be included in R Markdown documents and Shiny apps. To turn the data frame tidyverse_functions_info into a DataTable, we just select the columns we want and feed them to the datatable function:

tidyverse_functions_info %>% 
  select(Package, Function, Arguments, Title, Description) %>% 
  # Show a datatable with five visible rows
  datatable(options = list(pageLength = 5))

	Package	Function	Arguments	Title	Description
1	ggplot2	%+%	(e1, e2)	Add components to a plot	'+' is the key to constructing sophisticated ggplot2 graphics. It allows you to start simple, then get more and more complex, checking your work at each step.
2	ggplot2	%+replace%	(e1, e2)	Get, set, and modify the active theme	The current/active theme (see 'theme()') is automatically applied to every plot you draw. Use 'theme_get' to get the current theme, and 'theme_set' to completely override it. 'theme_update' and 'theme_replace' are shorthands for changing individual elements.
3	ggplot2	aes	(x, y, ...)	Construct aesthetic mappings	Aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms. Aesthetic mappings can be set in 'ggplot()' and in individual layers.
4	ggplot2	aes_	(x, y, ...)	Define aesthetic mappings programmatically	Aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms. 'aes()' uses non-standard evaluation to capture the variable names. 'aes_' and 'aes_string' require you to explicitly quote the inputs either with '""' for 'aes_string()', or with 'quote' or '~' for 'aes_()'. ('aes_q()' is an alias to 'aes_()'). This makes 'aes_()' and 'aes_string()' easy to program with.
5	ggplot2	aes_q	(x, y, ...)	Define aesthetic mappings programmatically	Aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms. 'aes()' uses non-standard evaluation to capture the variable names. 'aes_' and 'aes_string' require you to explicitly quote the inputs either with '""' for 'aes_string()', or with 'quote' or '~' for 'aes_()'. ('aes_q()' is an alias to 'aes_()'). This makes 'aes_()' and 'aes_string()' easy to program with.

Showing 1 to 5 of 782 entries

And that’s it. All the tidyverse documentation in a table! But wouldn’t it be nice if the function names were set in a monospaced font? And maybe we could have individual search for each column? And wouldn’t it be nifty if you could click on the function and get to the official documentation? We can do all that, and more, with DT, but the price we pay is that we have to write some HTML and JavaScript.

By setting escape = FALSE when creating a datatable all HTML in the table cells will be rendered. So, let’s add some HTML markup and links to our table.

formatted_functions_info <- tidyverse_functions_info %>%
  mutate(
    # The package and function name links to the tidyverse.org documentation
    Function = glue(
      "<a href='https://{Package}.tidyverse.org/reference/{Name}.html'>{Function}</a>"
    ),
    Package = glue("<a href='https://{Package}.tidyverse.org/'>{Package}</a>"),
    # Let's replace all space in the arguments with non-breaking space (&nbsp;)
    # except after a comma, so that text only wraps between arguments.
    Arguments = str_replace_all(Arguments, "(?<!,) " , "&nbsp;"),
    # Join the Title and Description, and format the examples
    Description = glue("<b>{Title}</b><br>{Description}"),
    Examples = glue("<b>Examples</b><pre><code>{Examples}</code></pre>"),
    # A mystery column consisting only of pluses (&oplus;).
    # Read on for the explanation!
    " " = '&oplus;'
  ) %>%
  select(Package, Function, Arguments, Description,  Examples, ` `)
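
Two of the string manipulations here are worth seeing in isolation. The sketch below first applies the lookbehind regex to a sample argument string, and then shows an optional extra that the code above does not do: escaping HTML special characters in example code before it is rendered with escape = FALSE. It assumes the htmltools package is available (it is installed along with DT); the sample strings are made up.

```r
library(stringr)

# Replace every space that is NOT preceded by a comma, so that the browser
# can only wrap the text between arguments.
arguments <- "(x, na.rm = FALSE, ...)"
nowrap_arguments <- str_replace_all(arguments, "(?<!,) ", "&nbsp;")
nowrap_arguments
# "(x, na.rm&nbsp;=&nbsp;FALSE, ...)"

# Optional: R code is full of <, > and &, which a browser may try to
# interpret as markup. htmlEscape() turns them into HTML entities.
example_code <- "big <- x[x > 1 & x < 10]"
escaped_code <- htmltools::htmlEscape(example_code)
escaped_code
# "big &lt;- x[x &gt; 1 &amp; x &lt; 10]"
```

Whether the escaping is needed depends on the examples; browsers are forgiving, but escaping keeps something like a literal HTML tag inside an example from being rendered.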

If we make a new datatable out of this we can see the formatting in action. But we also see some problems: The examples and description don’t really fit, and what is the weird (+) doing there?

datatable(
  formatted_functions_info, 
  escape = FALSE, 
  # The rows are so tall that we have to show just 1 row at a time...
  options = list(pageLength = 1)
)

	Package	Function	Arguments	Description	Examples
1	ggplot2	%+%	(e1, e2)	Add components to a plot '+' is the key to constructing sophisticated ggplot2 graphics. It allows you to start simple, then get more and more complex, checking your work at each step.	Examples `base <- ggplot(mpg, aes(displ, hwy)) + geom_point() base + geom_smooth() # To override the data, you must use %+% base %+% subset(mpg, fl == "p") # Alternatively, you can add multiple components with a list. # This can be useful to return from a function. base + list(subset(mpg, fl == "p"), geom_smooth())`	⊕

Showing 1 to 1 of 782 entries

Since a datatable runs on HTML and JavaScript, it’s possible to make it do anything, as long as we’re ready to write some JavaScript and add it as a callback: a piece of code that runs once a datatable has loaded. By modifying the code from this guide I ended up with a (somewhat impenetrable, but working) JavaScript callback where

A function is defined that triggers when we 'click' in the column with the HTML class details-control. That’s going to be the column with the (+).
This function shows or hides a row.child: An extra row under the row we clicked on.
What is shown in that row is what the format function pastes together: The content in column 5 (d[4], 4 because JavaScript starts counting items from 0) plus the content in column 6 (d[5]). That is, the Description plus the Examples.

datatable_callback <- JS("
  var format = function(d) {
    return '<div style=\"padding: .5em;\">' +
           '<p>' + d[4] + '</p>' +
           '<p>' + d[5] + '</p>' + 
           '</div>';
  };
  table.on('click', 'td.details-control', function() {
    var td = $(this), row = table.row(td.closest('tr'));
    if (row.child.isShown()) {
      row.child.hide();
      td.html('&oplus;');
    } else {
      row.child(format(row.data())).show();
      td.html('&CircleMinus;');
    }
  });"
)

Now we’re ready to make the final formatted table. What’s left is to set a couple of more datatable options to customize our table, commented inline below.

tidyverse_in_a_table <- 
  datatable(
    formatted_functions_info, 
    # Render HTML in the table
    escape = FALSE, 
    # Add search boxes for each column at the "top" of the table
    filter = "top",  
    # Register the javascript code we wrote above as a callback
    callback = datatable_callback,
    # To shorten the descriptions we're going to use the datatable ellipsis 
    # plugin which adds ... when the text in a cell is too long.
    plugins = "ellipsis",
    options =  list(
      # Show 5 rows by default
      pageLength = 5,
      # But it will be possible to show up to 100 rows!
      lengthMenu = c(5, 10, 20, 100),
      # Some column specific settings
      columnDefs = list(
        # column 0 (row numbers) and 5 (Examples) are hidden
        list(visible = FALSE, targets = c(0, 5)),
        # The special column with (+) gets the details-control class so that it
        # triggers the callback code
        list(orderable = FALSE, className = 'details-control', targets = 6),
        # Adds an ellipsis (...) when the Description (in column 4) is 
        # longer than 300 characters  
        list(render = JS("$.fn.dataTable.render.ellipsis(300, true)"), targets = 4)
      )
    )
  ) %>% 
  # Column specific formatting
  formatStyle("Package", `vertical-align` = "top", `font-family` =  "monospace") %>%
  formatStyle("Function", `vertical-align` = "top", `font-family` =  "monospace") %>%
  formatStyle("Arguments", `vertical-align` = "top", `font-family` =  "monospace") %>% 
  formatStyle("Description", `vertical-align` = "top") %>% 
  formatStyle(6, `font-size` = "20px", cursor = "pointer")
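
The column numbers used in targets (and d[4], d[5] in the callback) are easy to get wrong, since they are zero-based and the row names take up position 0. One way to avoid hard-coding them is a small helper that looks the positions up by name. This is only a sketch: zero_based_target() is a hypothetical helper, and the toy data frame below just mimics the column names of formatted_functions_info.

```r
# zero_based_target() is a hypothetical helper for this illustration.
# It converts column names into the zero-based positions that the
# JavaScript side uses, shifted by one when row names are shown.
zero_based_target <- function(data, column, rownames = TRUE) {
  position <- match(column, names(data))
  stopifnot(!anyNA(position))
  position - 1L + as.integer(rownames)
}

# A toy data frame with the same column names as formatted_functions_info
toy <- setNames(
  as.data.frame(matrix("", nrow = 1, ncol = 6)),
  c("Package", "Function", "Arguments", "Description", "Examples", " ")
)

targets <- zero_based_target(toy, c("Description", "Examples", " "))
targets
# 4 5 6
```

These are the same numbers as in the columnDefs above, so something like targets = zero_based_target(formatted_functions_info, "Examples") could stand in for the literal 5.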

OK, so now we’ve made the final table with the following features:

It includes the documentation for all the tidyverse core packages.
It can be sorted and searched, both globally and per column.
Function names link to the original tidyverse.org documentation.
It shows all the function arguments and a 300 character excerpt of the description.
If you click on the (+) you get a longer description and all the examples.

Here it is again:

tidyverse_in_a_table

Package	Function	Arguments	Description

ggplot2	%+%	(e1, e2)	Add components to a plot '+' is the key to constructing sophisticated ggplot2 graphics. It allows you to start simple, then get more and more complex, checking your work at each step.	⊕
ggplot2	%+replace%	(e1, e2)	Get, set, and modify the active theme The current/active theme (see 'theme()') is automatically applied to every plot you draw. Use 'theme_get' to get the current theme, and 'theme_set' to completely override it. 'theme_update' and 'theme_replace' are shorthands for changing individual…	⊕
ggplot2	aes	(x, y, ...)	Construct aesthetic mappings Aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms. Aesthetic mappings can be set in 'ggplot()' and in individual layers.	⊕
ggplot2	aes_	(x, y, ...)	Define aesthetic mappings programmatically Aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms. 'aes()' uses non-standard evaluation to capture the variable names. 'aes_' and 'aes_string' require you to explicitly quote the inputs…	⊕
ggplot2	aes_q	(x, y, ...)	Define aesthetic mappings programmatically Aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms. 'aes()' uses non-standard evaluation to capture the variable names. 'aes_' and 'aes_string' require you to explicitly quote the inputs…	⊕

Showing 1 to 5 of 782 entries

This table could now be included as part of an R Markdown document, or you can export it as a self-contained HTML file like this:

DT::saveWidget(tidyverse_in_a_table, "tidyverse_in_a_table.html")

Here’s a link to the full code to generate the final table. And while this tutorial showed you how to turn the tidyverse docs into a table, it should be easy to tweak the code to work for any R-package.

Thank you for reading through this tutorial, a huge thanks to Yihui Xie for the DT package, and a shout-out to Michael Chow for the tweet that inspired me to put the tidyverse docs in a table!

Rasmus Bååth, rasmus.baath@gmail.com, @rabaath

10/24/2020
