So R is awesome. Felt good to get that out of the way! But sometimes I long for some small syntax additions while programming away in R in the night…
The picture is of a restaurant called Risotto in Berlin.
Just to make it clear, I fully understand that changing the syntax of a language as popular as R is a pretty big thing and I’m not arguing that the changes I propose below actually should be implemented. There are probably many good reasons for why it would be a pretty bad idea to implement these syntax changes. It is also very likely that there are reasons for why these syntax changes would not be possible to implement at all. That still doesn’t mean it wouldn’t be cool if these were implemented :)
Addition 1: Shorthand Function Declaration
I find that I spend a lot of time defining functions in R. When defining standalone functions the current syntax is great, but most functions I define are anonymous functions declared within calls to functions such as apply and ddply. For example:
ddply(d, "factor1", function(x) {
  # ...do some stuff to x...
})
For such cases it would be nice if there were some shorthand function declaration syntax. It’s not that the function keyword takes a particularly long time to write, it’s more that it takes up visual space and distracts from what is actually happening. So what could this shorthand look like? One alternative would simply be:
f(a,b) {a + b}
If we write c(1, 2, 3) instead of combine(1, 2, 3), why couldn’t we write f(a,b) {a + b} instead of function(a,b) {a + b}?
Another alternative, inspired by Haskell, would be to use a backslash (\) as in:
\(a,b) {a + b}
We could even (but this is a bit crazy…) drop the parentheses…
\a,b {a + b}
… which would make the above call of ddply look like:
ddply(d, "factor1", \x {
  # ...do some stuff to x...
})
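As a point of comparison, R 4.1.0 and later accept a backslash shorthand very close to the second alternative above. Here is a minimal sketch, assuming R >= 4.1.0. The data frame d and its columns are made up for illustration, and base R's split and sapply stand in for ddply so that the example needs no packages:

```r
# Assumes R >= 4.1.0, where \(args) body is shorthand for function(args) body.
add <- \(a, b) { a + b }
total <- add(1, 2)

# Hypothetical data standing in for `d`.
d <- data.frame(factor1 = c("a", "a", "b"), y = c(1, 2, 3))

# Base-R stand-in for ddply: split by group, then apply an anonymous function.
group_means <- sapply(split(d, d$factor1), \(x) mean(x$y))
print(group_means)
```

The parentheses are still required in this shorthand, so the \a,b {a + b} variant remains hypothetical.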
Addition 2: Pipe operator
Since many functions are vectorized in R it is often easy to chain together multiple functions and do a lot of work in just one line. This is great but easily results in nested statements which are hard to decipher because you can’t read them from left to right but rather have to read them from the “inside out”. For example, I’ve written statements far worse than this:
sum(pmax(1, abs(some_variable), na.rm = TRUE))
But even this is not easy to read at a glance: which function call does na.rm = TRUE belong to, for example? (Here it belongs to pmax, not sum.)
As proposed by Robert Sugar on the R-help mailing list, statements like these would be easier to write if there existed a pipe-like operator that took the result of the statement on the left and “piped” it to the function on the right. Let’s arbitrarily use a period (.) for this operator. Then the following:
sum(rnorm(10))
exp(rnorm(10, mean(x)))
plot(rnorm(10, mean(x), sd(x)), log = "xy")
… could be written as:
rnorm(10).sum
rnorm(10, x.mean).exp
plot(rnorm(10, x.mean, x.sd), log="xy")
We could add a further rule: If the function on the right of the pipe operator already has arguments filled in we insert the result on the left of the operator as the first argument to the function on the right. Then the following:
mean(x, na.rm = T)
str_match(c("B", "A", "R"), "R")
plot(rnorm(10, mean(x), sd(x)), log = "xy")
… could be written as:
x.mean(na.rm=T)
c("B", "A", "R").str_match("R")
rnorm(10, x.mean, x.sd).plot(log="xy")
As R is already object oriented, a pipe operator would make it possible to write code in a similar left-to-right fashion as in other object oriented languages. Now, the period (.) would be practically impossible to use as the pipe operator, as it is a legal and common part of many identifiers. I’m not sure what would be a good alternative… perhaps ->? Nobody uses that for assignment anyway, right? :)
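For what it is worth, R 4.1.0 and later include a native pipe operator, |>, that behaves much like the second rule above: the value on the left becomes the first argument of the call on the right. Here is a minimal sketch, assuming R >= 4.1.0. The vector x is made up for illustration:

```r
# Assumes R >= 4.1.0 for the native pipe |>.
x <- c(1, 4, NA, 9)  # hypothetical data

# Same as mean(x, na.rm = TRUE).
avg <- x |> mean(na.rm = TRUE)

# Same as sum(rnorm(10)); the seed is reset so both draws are identical.
set.seed(1)
piped <- rnorm(10) |> sum()
set.seed(1)
nested <- sum(rnorm(10))

# Left-to-right version of sum(pmax(1, abs(x), na.rm = TRUE)).
# The piped value becomes the first argument of pmax, so the argument
# order is swapped, which does not change the result for pmax.
clipped <- x |> abs() |> pmax(1, na.rm = TRUE) |> sum()
```

Unlike the first rule sketched above, the right-hand side of |> must be a call with parentheses, so rnorm(10) |> sum is an error.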
Addition 3: Ruby style yields
The Ruby language has a very useful feature which allows you to pass an anonymous function as an argument by appending the anonymous function to the end of the function call. That is, a Ruby function that takes a function and calls it 10 times like this:
repeat_10_times( lambda { print("hello!") } )
… could be rewritten to be called like this:
repeat_10_times() { print("hello!") }
This could be implemented in R by the following rule: If an object follows the closing parenthesis of a function call, add it as the last argument of the function call and bind it to, for example, the .yield variable in the called function. Using this rule the following:
replicate(100, {
  sum(rnorm(10))
})
… could be written as:
replicate(100) { sum(rnorm(10)) }
Above we passed an expression but we could also pass a function:
ddply(my_df, "factor1") function(x) {
  # ... do stuff with x ...
}
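Current R has no trailing-block syntax, but lazy evaluation already lets a function receive a braced expression unevaluated, which is roughly how the replicate call above can run its second argument many times. Here is a minimal sketch. Note that repeat_10_times is a hypothetical helper mirroring the Ruby example, not an existing R function:

```r
# Hypothetical helper: evaluates the passed expression 10 times
# in the environment of the caller.
repeat_10_times <- function(expr) {
  code <- substitute(expr)   # capture the expression without evaluating it
  caller <- parent.frame()   # environment the function was called from
  for (i in seq_len(10)) eval(code, caller)
  invisible(NULL)
}

counter <- 0
repeat_10_times({
  counter <- counter + 1
})
print(counter)
```

The block still has to sit inside the parentheses, which is exactly the visual noise the proposed rule would remove.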
Using all three proposed syntax additions, the following:
ddply(my_df, "factor1", function(x) {
  log(mean(exp(x$y), na.rm = TRUE))
})
… could be rewritten as:
my_df.ddply("factor1") \(x) {
  exp(x$y) -> mean(na.rm=TRUE) -> log
}
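Two of the three ideas can be approximated in recent versions of R. Here is a rough equivalent, assuming R >= 4.1.0 for \(x) and |>. The data frame is made up, and base R's split and sapply stand in for ddply so that no packages are needed:

```r
# Hypothetical data standing in for `my_df`.
my_df <- data.frame(factor1 = c("a", "a", "b", "b"), y = c(1, 2, NA, 4))

# Nested, inside-out version.
nested <- sapply(split(my_df, my_df$factor1), function(x) {
  log(mean(exp(x$y), na.rm = TRUE))
})

# Left-to-right version using the lambda shorthand and the native pipe.
piped <- split(my_df, my_df$factor1) |>
  sapply(\(x) exp(x$y) |> mean(na.rm = TRUE) |> log())
```

Both versions compute the same per-group values. Only the reading order differs.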
So, these were just some ideas… Don’t take them too seriously :)