Odd Hypothesis: 2013/08

2013/08/26

mfg, mfcol, mfrow, and layout() - secret friends

I was working on an issue (enhancement) today in my groan R-package today that required adding additional plotting elements via lines() and points() to a device that had already been partitioned by layout(). The code I wanted to use was essentially:

# Y and S are lists of xy.coords() objects of the same length

lyt = matrix(1:length(Y), ncol=10)
layout(mat=lyt)

# function A
# plot Y first as points
lapply(Y, function(x) {
  plot(x, ...)
})

# function B
# overlay S as lines on the grid of plots for Y
lapply(S, function(x){
  lines(x, ...)
})

However, I would only get all of the above lines in one subplot. For a brief moment, I considered rewriting my whole set of plotting methods to use split.screen() or par(mfcol). Ugh!

On a whim, I decided to check what par('mfg') would return after a device had been partitioned and plotted to with:

layout(matrix(1:9, nrow=3))
par(mar=c(0,0,0,0))
plot(runif(10))

I was pleasantly surprised to find:

> par('mfg')
[1] 1 1 3 3

indicating that I could potentially direct the next plot in a layout()'ed device by setting the value of mfg= to the next plot id:

lyt = matrix(1:9, nrow=3)
par(mfg=which(lyt == NextPlotID, arr.ind=TRUE)[1,])

Unicorns and rainbows, this works! (despite all the dire warnings in the documentation regarding incompatibilities)

Thus, the resulting code:

# Y and S are lists of xy.coords() objects of the same length

lyt = matrix(1:length(Y), ncol=10)
layout(mat=lyt)

# function A
# plot Y first as points
lapply(Y, function(x) {
  plot(x, ...)
})

# function B
# overlay S as lines on the grid of plots for Y
lapply(seq_along(S), function(n){
  par(mfg=which(lyt == n, arr.ind=TRUE)[1,]) # sets next plot in grid!
  lines(S[[n]], ...)
})

and issue resolved.

Written with StackEdit.

2013/08/20

Groan - my first R package

Being one of two R experts at my current job I figured I should be familiar with package development. Frankly, I've been procrastinating on this topic since I started using R in 2007 - I was doing just fine with source() and the section of the R manual on package development fell into the category of TL;DR.

What finally drove me to learn the esoteric details of R packaging were the following two events at work:

A coworker sent out an email announcing a new R analysis script which included a few algorithms I wrote and passed on. It was accompanied by documentation in a README.txt file and employed console menus and user dialogs to ease use. Otherwise, details of what the code was doing were left to code comments.
I read an email train between coworkers after being out of the office and disconnected for a day asking about the correct set of parameters to use in a function I wrote. Fortunately, they figured it out thanks to my excessive commenting habits.

Lesson learned: I no longer write R code just for myself and using code comments as documentation just wasn't cutting it anymore. I needed an efficient way to:

distribute code
provide documentation that uses the in-built help system

Unlike Matlab or Python, R does not have an effective way to provide simple documentation for code - functions, objects, etc. There is this post on StackOverflow, but I expect that such functionality should be built into the environment, not hacked on.

Long winded introduction over, I finally dove in. Thankfully, my entry wasn't too rough. A couple months ago I read a couple posts submitted to R-bloggers regarding "easy" package development using roxygen2, devtools, and RStudio (my R IDE of choice).

My problem to solve: get parameters from biological growth curves
My package: groan

Package: groan
Type: Package
Title: Utilities for biological GROwth curve ANalysis
Version: 1.0
Date: 2013-05-14
Author: W. Lee Pang
Description: groan is a set of tools to assist in the analysis of biological
    growth curves.  It provides functions to smooth input data and extract key
    parameters such as specific growth rate (mu), carrying capacity (A), and lag
    time (tau).

Working with microorganisms, a common task is determining a culture's specific growth rate - e.g. how many times the population will double in an hour. While not a hard task, it can be tedious if numerous cultures are involved, or if the underlying data is noisy.

groan is essentially the R package I wish I had as a grad student and postdoc, but was too occupied otherwise to write.

Yes, the name groan is a pun:

"grown" : as in yay the cells grew
"groan" : as in ugh, I have to process yet another growth curve

Humor aside, it reduces a CSV of multiple growth curve data points into a table of growth parameters and a summary plot in under 10 lines of code.

From this ...

Via ...

Y = read.csv('path/to/your/data.csv', stringsAsFactors=F)
Y = groan.init(Y)
Y.s = groan.smooth(Y, adaptive=T, method='loess')

U = groan.mu(Y.s)
U.s = groan.smooth(U, adaptive=T, method='loess')
U.f = groan.fit(U.s, method='pulse')

stats = data.frame(mumax = max(U.f),
                   t.lag = groan.tlag(U.f),
                   gen   = groan.generations(U.f))

plot(Y)   # plot thumbnail grid of raw growth curves

To ...

For more information, examples, or to test out the code yourself head to the groan repository on Github.

2013/08/03

A new R trick ... for me at least

What were going to be talking about today are dynamic argument lists for functions. Specifically, how to unpack and prepare them in R using ..., list(), and do.call()

Biased by Matlab and varargin

Initially, I based my use of ... in R on my experience with Matlab's varargin. Using varargin, Matlab functions can have a signature of:

function f(varargin)
% do stuff here

Functions that use varargin are responsible for processing its contents, which is easy since it is simply a cell array. Thus, it can be "unpacked" and modified using cell array methods.

function f(varargin)
arg1 = varargin{1}
arg2 = varargin{2}
return(arg1*arg2)

At call, arguments captured by varargin can be specified as an expanded cell array:

args = {foo, bar}
f(args{:})

As a matter of fact, functions that do not use varargin can also be called this way since Matlab effectively interprets an expanded cell array as a comma-separated list

This comes in handy when you have a mixture of required and optional arguments for a function.

f(arg, opts{:})

Back to R ...

I used to think ... was analogous to varargin since:

it captures all function arguments not explicitly defined by the call signature
the number of arguments it captures can vary

However, unlike varargin:

... is a special R language expression/object
it needs to be converted to a list to access the arguments (names and/or values) that it captures

The former point is strength and quirk of R, as it allows for arguments encapsulated in ... to be passed on to additional functions:

f = function(x, ...) {
  y = g(x, ...)
  return(y)
}

The latter point above (unpacking ...) is actually easy to do:

f = function(x, ...) {
  args = list(...) # contains a=1, b=2
  return(args$a * args$b)
}

Where confusion arises for many is that ... is essentially immutable (cannot be changed). While conceptually a list(), you can't modify it directly using list accessors:

f = function(x, ...) {
  ...[[1]] = 3 # this produces an error, as would ...$var and ...[1]
  y = g(x, ...)
  return(y)
}

So, what if I wanted to unpack arguments in ..., check/change their values, and repackage it for another function call? Since ... is immutable the code below would throw an error.

f = function(x, ...) {
  args = list(...) # unpack, contains a='foo'
  args$a = bar

  ... = args # ERROR!

  y = g(x, ...)
  return(y)
}

Also, there isn't a way (that I've found yet) to unroll a list() object in R into a comma-separated list like you can with a cell array in Matlab.

# this totally doesn't work
args = list(a=1, b='foo')
result = f(args[*]) # making up syntax here. would be nice, no?

As it turns out, ... doesn't even come into play here. In fact, you need to use a rather deep R concept - calls.

Whenever a function is used in R, a call is produced, which is an unprocessed expression that is then interpreted by the underlying engine. Why the delay? Only the creators/developers of R can fully detail why, but it does allow for some neat effects - e.g. the automatic labeling of plots.

To package a programmatically generated argument list one uses the do.call() function:

result = do.call('fun', list(arg1, arg2, etc, etc))

where the first argument is the name of the function to call, and the second argument is a list of arguments to pass along. For all intents and purposes, the R statement above is equivalent to the Matlab statement below.

results = fun(args{:}) % where args = {arg1, arg2, etc, etc}

Thus, process to unpack ..., check/modify an argument, and repack for another function call becomes:

f = function(x, ...) {
  args = list(...) # unpack, contains a='foo'
  args$a = bar     # change argument "a"

  y = do.call(g, c(x, args)) # repack arguments for call to g()
  return(y)
}

I must credit this epiphany to the following StackOverflow question and answer: http://stackoverflow.com/questions/3414078/unpacking-argument-lists-for-ellipsis-in-r

Written with StackEdit.