Today I was putting some code together that made plots from slices of a 3-dimensional `array` object, `aa`. A couple of the dimensions in `aa` had names defined by named vectors. For example:
> aa = array(runif(2*3*4),
             dim=c(2,3,4),
             dimnames=list(id = c(good='id1', evil='id2'),
                           x = c(1,2,3),
                           var = c(up='a', dn='b', lt='c', rt='d')))
> str(aa)
num [1:2, 1:3, 1:4] 0.0138 0.2942 0.7988 0.3465 0.8751 ...
- attr(*, "dimnames")=List of 3
..$ id : Named chr [1:2] "id1" "id2"
.. ..- attr(*, "names")= chr [1:2] "good" "evil"
..$ x : chr [1:3] "1" "2" "3"
..$ var: Named chr [1:4] "a" "b" "c" "d"
.. ..- attr(*, "names")= chr [1:4] "up" "dn" "lt" "rt"
Thus, I could access “aliases” for the dimension names in `id` and `var` by:
> names(dimnames(aa)$id)
[1] "good" "evil"
> names(dimnames(aa)$var)
[1] "up" "dn" "lt" "rt"
The code I wrote would iterate over the 3rd dimension, using the resulting 2D `array`s to produce a series of plots with `matplot()`. To make legends more readable, I made use of the `names` attribute of the `dimnames` as above. In the first version, I used `apply()` to do the iterating:
> apply(aa, 3, function(xy) {
      x = as.numeric(dimnames(xy)$x)
      matplot(x, y=t(xy))
      legend('topleft', legend=names(dimnames(xy)$id), fill=1:nrow(xy))
      NULL
  })
This worked perfectly fine. However, I later decided it would be more informative to use the names in the iterating dimension for a plot title, so I refactored a bit to use `sapply()`:
> sapply(1:dim(aa)[3], function(k) {
      xy = aa[,,k]
      x = as.numeric(dimnames(xy)$x)
      matplot(x, y=t(xy))
      legend('topleft', legend=names(dimnames(xy)$id), fill=1:nrow(xy))
      title(main=names(dimnames(aa)$var[k]))
      NULL
  })
I was a little surprised that this threw an error indicating that the names associated with `dimnames(aa)$id` were nonexistent:
Error in legend("topleft", legend = names(dimnames(xy)$id), fill = 1:nrow(xy)) :
'legend' is of length 0
Upon inspection, it seems that it is R's default behavior to drop attributes on `dimnames` when an `array` is subsetted:
> str(aa[,,1])
num [1:2, 1:3] 0.0138 0.2942 0.7988 0.3465 0.8751 ...
- attr(*, "dimnames")=List of 2
..$ id: chr [1:2] "id1" "id2"
..$ x : chr [1:3] "1" "2" "3"
Adding a `drop=FALSE` to the indexing doesn't work either - a quick check (a hypothetical session; the `NULL` is consistent with the behavior above) shows the `names` attribute is still missing:
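> names(dimnames(aa[,,1, drop=FALSE])$id)
NULL

The only fix I could come up with was to reassign the additional attributes after subsetting: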
> sapply(1:dim(aa)[3], function(k) {
      xy = aa[,,k]
      # !! recover additional dimname attributes
      # dropped by subsetting !! #
      dimnames(xy) = dimnames(aa)[names(dimnames(aa)) %in% names(dimnames(xy))]
      x = as.numeric(dimnames(xy)$x)
      matplot(x, y=t(xy))
      legend('topleft', legend=names(dimnames(xy)$id), fill=1:nrow(xy))
      title(main=names(dimnames(aa)$var[k]))
      NULL
  })
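Were this needed often, the workaround could be wrapped in a small helper - a minimal sketch (the name `slice3` is made up here, not part of any package):

# subset the 3rd dimension of an array while restoring the
# names attributes that `[` drops from dimnames
slice3 = function(a, k) {
    xy = a[,,k]
    dimnames(xy) = dimnames(a)[names(dimnames(a)) %in% names(dimnames(xy))]
    xy
}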
To the greater R community, I ask - is this behavior a flaw, or was it done on purpose? If the latter, I pleadingly ask WHYYYYyyyyyyyy!
Here’s a weird R error/bug/nuance I came across today:
What would you expect the following lines of code to return?
x = c('1', '2', '3')
mean(x)
sd(x)
Well, apparently it is:
# mean(x)
[1] NA
# sd(x)
[1] 1
So, `sd()` silently converts its input to numeric, while `mean()` does not. More evidence of this is in the source:
> mean
function (x, ...)
UseMethod("mean")
<bytecode: 0x000000001165e790>
<environment: namespace:base>
> sd
function (x, na.rm = FALSE)
sqrt(var(if (is.vector(x)) x else as.double(x), na.rm = na.rm))
<bytecode: 0x000000001158eb00>
<environment: namespace:stats>
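The safe pattern, of course, is to make the conversion explicit before computing any statistics:

x = c('1', '2', '3')
xn = as.numeric(x)  # explicit conversion, no silent coercion
mean(xn)
# [1] 2
sd(xn)
# [1] 1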
One hour of my work day was spent sorting this out. You've been warned.
Previously, I came up with a solution to R's less-than-ideal handling of named capture in regular expressions with my `re.capture()` function. A little more than a year later, the problem is rearing its ugly - albeit subtly different - head again.
I now have a single character string:
x = '`a` + `[b]` + `[1c]` + `[d] e`'
from which I need to pull multiple matches - in the case above, anything encapsulated in backticks. Since my original `re.capture()` function was based on R's `regexpr()`, it would only return the first match:
> re.capture('`(?<tok>.*?)`', x)$names
$tok
[1] "a"
Simply switching the underlying `regexpr()` to `gregexpr()` wasn't straightforward, as `gregexpr()` returns a list:
> str(gregexpr('`(?<tok>.*?)`', x, perl=T))
List of 1
$ : atomic [1:4] 1 7 15 24
..- attr(*, "match.length")= int [1:4] 3 5 6 7
..- attr(*, "useBytes")= logi TRUE
..- attr(*, "capture.start")= int [1:4, 1] 2 8 16 25
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr "tok"
..- attr(*, "capture.length")= int [1:4, 1] 1 3 4 5
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr "tok"
..- attr(*, "capture.names")= chr "tok"
which happens to be as long as the input character vector against which the regex pattern is matched:
> x = '`a` + `[b]` + `[1c]` + `[d] e`'
> z = '`f` + `[g]` + `[1h]` + `[i] j`'
> str(gregexpr('`(?<tok>.*?)`', c(x,z) , perl=T), max.level=0)
List of 2
each element of which is a regex match object with its own set of attributes. Thus, the new solution was to write a new function that walks the `list()` generated by `gregexpr()` looking for named capture tokens:
gregexcap = function(pattern, x, ...) {
    args = list(...)
    args[['perl']] = T  # named capture requires perl-compatible regexps
    re = do.call(gregexpr, c(list(pattern, x), args))
    # pair each match object with the string it was matched against
    mapply(function(re, x) {
        # pull each named token out of the capture.* attributes
        cap = sapply(attr(re, 'capture.names'), function(n, re, x) {
            start = attr(re, 'capture.start')[, n]
            len = attr(re, 'capture.length')[, n]
            end = start + len - 1
            tok = substr(rep(x, length(start)), start, end)
            return(tok)
        }, re, x, simplify=F, USE.NAMES=T)
        return(cap)
    }, re, x, SIMPLIFY=F)
}
thereby returning my R coding universe to one-liner bliss:
> gregexcap('`(?<tok>.*?)`', x)
[[1]]
[[1]]$tok
[1] "a" "[b]" "[1c]" "[d] e"
> gregexcap('`(?<tok>.*?)`', c(x,z))
[[1]]
[[1]]$tok
[1] "a" "[b]" "[1c]" "[d] e"
[[2]]
[[2]]$tok
[1] "ff" "[gg]" "[11hh]" "[ii] jj"
I was working on an issue (enhancement) today in my groan R package that required adding plotting elements via `lines()` and `points()` to a device that had already been partitioned by `layout()`. The code I wanted to use was essentially:
# Y and S are lists of xy.coords() objects of the same length
lyt = matrix(1:length(Y), ncol=10)
layout(mat=lyt)

# function A
# plot Y first as points
lapply(Y, function(x) {
    plot(x, ...)
})

# function B
# overlay S as lines on the grid of plots for Y
lapply(S, function(x) {
    lines(x, ...)
})
However, I would only get all of the above lines in one subplot. For a brief moment, I considered rewriting my whole set of plotting methods to use `split.screen()` or `par(mfcol=)`. Ugh!
On a whim, I decided to check what `par('mfg')` would return after a device had been partitioned and plotted to with:
layout(matrix(1:9, nrow=3))
par(mar=c(0,0,0,0))
plot(runif(10))
I was pleasantly surprised to find:
> par('mfg')
[1] 1 1 3 3
indicating that I could potentially direct the next plot in a `layout()`'ed device by setting the value of `mfg=` to the next plot ID:
lyt = matrix(1:9, nrow=3)
par(mfg=which(lyt == NextPlotID, arr.ind=TRUE)[1,])
Unicorns and rainbows, this works! (despite all the dire warnings in the documentation regarding incompatibilities)
Thus, the resulting code:
# Y and S are lists of xy.coords() objects of the same length
lyt = matrix(1:length(Y), ncol=10)
layout(mat=lyt)

# function A
# plot Y first as points
lapply(Y, function(x) {
    plot(x, ...)
})

# function B
# overlay S as lines on the grid of plots for Y
lapply(seq_along(S), function(n) {
    par(mfg=which(lyt == n, arr.ind=TRUE)[1,])  # sets next plot in grid!
    lines(S[[n]], ...)
})
and issue resolved.
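For reference, here's a minimal self-contained sketch of the same trick with made-up data - it relies, as above, on all panels sharing the same coordinate limits so that the low-level lines() call lands where expected:

lyt = matrix(1:4, nrow=2)
layout(mat=lyt)
par(mar=c(2,2,1,1))
Y = replicate(4, runif(10), simplify=FALSE)
# fill the grid with scatter plots first
invisible(lapply(Y, plot, ylim=c(0,1)))
# then jump back to panel 2 and overlay a line
par(mfg=which(lyt == 2, arr.ind=TRUE)[1,])
lines(1:10, Y[[2]], col='red')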
Being one of two R experts at my current job, I figured I should be familiar with package development. Frankly, I've been procrastinating on this topic since I started using R in 2007 - I was doing just fine with `source()`, and the section of the R manual on package development fell into the category of TL;DR.
What finally drove me to learn the esoteric details of R packaging were the following two events at work:
1. A coworker sent out an email announcing a new R analysis script which included a few algorithms I wrote and passed on. It was accompanied by documentation in a README.txt file, and employed console menus and user dialogs to ease use. Otherwise, details of what the code was doing were left to code comments.
2. I read an email thread between coworkers, after being out of the office and disconnected for a day, asking about the correct set of parameters to use in a function I wrote. Fortunately, they figured it out, thanks to my excessive commenting habits.
Lesson learned: I no longer write R code just for myself, and using code comments as documentation just wasn't cutting it anymore. I needed an efficient way to document and distribute my code.
Unlike Matlab or Python, R does not have an effective way to provide simple documentation for code - functions, objects, etc. There is this post on StackOverflow, but I expect that such functionality should be built into the environment, not hacked on.
Long-winded introduction over, I finally dove in. Thankfully, my entry wasn't too rough. A couple months ago I read a couple posts submitted to R-bloggers regarding "easy" package development using `roxygen2`, `devtools`, and RStudio (my R IDE of choice).
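To give a flavor of that workflow, here's a hypothetical roxygen2-documented function (made up for illustration - not actual groan code); the #' comments live next to the source, and devtools::document() renders them into man pages:

#' Estimate specific growth rate by finite differences
#'
#' @param t numeric vector of time points
#' @param y numeric vector of log-transformed biomass readings
#' @return numeric vector of instantaneous specific growth rates
#' @export
growth.rate = function(t, y) {
    diff(y) / diff(t)  # simple finite-difference slope
}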
My problem to solve: get parameters from biological growth curves
My package: groan
Package: groan
Type: Package
Title: Utilities for biological GROwth curve ANalysis
Version: 1.0
Date: 2013-05-14
Author: W. Lee Pang
Description: groan is a set of tools to assist in the analysis of biological
growth curves. It provides functions to smooth input data and extract key
parameters such as specific growth rate (mu), carrying capacity (A), and lag
time (tau).
For anyone working with microorganisms, a common task is determining a culture's specific growth rate - e.g. how many times the population will double in an hour. While not a hard task, it can be tedious if numerous cultures are involved, or if the underlying data is noisy.
`groan` is essentially the R package I wish I had as a grad student and postdoc, but was too occupied otherwise to write. Yes, the name `groan` is a pun on GROwth curve ANalysis.
Humor aside, it reduces a CSV of multiple growth curve data points into a table of growth parameters and a summary plot in under 10 lines of code:
Y = read.csv('path/to/your/data.csv', stringsAsFactors=F)
Y = groan.init(Y)
Y.s = groan.smooth(Y, adaptive=T, method='loess')
U = groan.mu(Y.s)
U.s = groan.smooth(U, adaptive=T, method='loess')
U.f = groan.fit(U.s, method='pulse')
stats = data.frame(mumax = max(U.f),
                   t.lag = groan.tlag(U.f),
                   gen   = groan.generations(U.f))
plot(Y) # plot thumbnail grid of raw growth curves
For more information, examples, or to test out the code yourself head to the groan repository on Github.
What we're going to be talking about today are dynamic argument lists for functions - specifically, how to unpack and prepare them in R using `...`, `list()`, and `do.call()`.
Initially, I based my use of `...` in R on my experience with Matlab's `varargin`. Using `varargin`, Matlab functions can have a signature of:
function f(varargin)
% do stuff here
Functions that use `varargin` are responsible for processing its contents, which is easy since it is simply a cell array. Thus, it can be "unpacked" and modified using cell array methods:
function out = f(varargin)
    arg1 = varargin{1};
    arg2 = varargin{2};
    out = arg1 * arg2;
end
At the call site, arguments captured by `varargin` can be specified as an expanded cell array:
args = {foo, bar}
f(args{:})
As a matter of fact, functions that do not use `varargin` can also be called this way, since Matlab effectively interprets an expanded cell array as a comma-separated list.
This comes in handy when you have a mixture of required and optional arguments for a function.
f(arg, opts{:})
I used to think `...` was analogous to `varargin`, since both capture the extra arguments passed to a function. However, unlike `varargin`:

- `...` is a special R language expression/object
- it must be unpacked into a `list()` before its contents can be inspected or modified

The former point is a strength and quirk of R, as it allows arguments encapsulated in `...` to be passed on to additional functions:
f = function(x, ...) {
    y = g(x, ...)
    return(y)
}
The latter point above (unpacking `...`) is actually easy to do:
f = function(x, ...) {
    args = list(...)  # contains a=1, b=2
    return(args$a * args$b)
}
Where confusion arises for many is that `...` is essentially immutable - it cannot be changed. While conceptually a `list()`, you can't modify it directly using list accessors:
f = function(x, ...) {
    ...[[1]] = 3  # this produces an error, as would ...$var and ...[1]
    y = g(x, ...)
    return(y)
}
So, what if I wanted to unpack the arguments in `...`, check/change their values, and repackage them for another function call? Since `...` is immutable, the code below would throw an error:
f = function(x, ...) {
    args = list(...)  # unpack, contains a='foo'
    args$a = 'bar'
    ... = args        # ERROR! `...` cannot be reassigned
    y = g(x, ...)
    return(y)
}
Also, there isn't a way (that I've found yet) to unroll a `list()` object in R into a comma-separated list like you can with a cell array in Matlab:
# this totally doesn't work
args = list(a=1, b='foo')
result = f(args[*]) # making up syntax here. would be nice, no?
As it turns out, `...` doesn't even come into play here. In fact, you need to use a rather deep R concept: calls. Whenever a function is used in R, a `call` is produced - an unevaluated expression that is then interpreted by the underlying engine. Why the delay? Only the creators/developers of R can fully detail why, but it does allow for some neat effects - e.g. the automatic labeling of plots.
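As a quick illustration of that delayed evaluation, a function can recover the unevaluated expression behind an argument - the same mechanism plot() uses to build its default axis labels:

f = function(x) deparse(substitute(x))
f(runif(10) * 2)
# [1] "runif(10) * 2"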
To package a programmatically generated argument list, one uses the `do.call()` function:
result = do.call('fun', list(arg1, arg2, etc, etc))
where the first argument is the function to call (or its name as a string), and the second argument is a list of arguments to pass along. For all intents and purposes, the R statement above is equivalent to the Matlab statement below.
results = fun(args{:}) % where args = {arg1, arg2, etc, etc}
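For a concrete example with a familiar function:

args = list(x=c(2, 4, 6, NA), na.rm=TRUE)
do.call('mean', args)
# [1] 4
# identical to calling mean(c(2, 4, 6, NA), na.rm=TRUE) directly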
Thus, the process to unpack `...`, check/modify an argument, and repack it for another function call becomes:
f = function(x, ...) {
    args = list(...)                  # unpack, contains a='foo'
    args$a = 'bar'                    # change argument "a"
    y = do.call(g, c(list(x), args))  # repack for call to g(); list(x) keeps x as a single argument
    return(y)
}
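And a runnable end-to-end version of the pattern, with a hypothetical stand-in for the downstream function g():

g = function(x, a='default') paste(x, a)
f = function(x, ...) {
    args = list(...)              # unpack
    args$a = 'bar'                # override argument "a"
    do.call(g, c(list(x), args))  # repack and call
}
f('hello', a='foo')
# [1] "hello bar"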
I must credit this epiphany to the following StackOverflow question and answer: http://stackoverflow.com/questions/3414078/unpacking-argument-lists-for-ellipsis-in-r