2014/02/05

An R flaw: unexpected attribute droppings

Today I was putting some code together that made plots from slices of a 3-dimensional array object aa. Two of the dimensions in aa had dimnames defined by named character vectors, so each level carried both a label and an alias. For example:

> aa = array(runif(2*3*4), 
             dim=c(2,3,4), 
             dimnames=list(id  = c(good='id1', evil='id2'), 
                           x   = c(1,2,3), 
                           var = c(up='a', dn='b', lt='c', rt='d')))
> str(aa)
 num [1:2, 1:3, 1:4] 0.0138 0.2942 0.7988 0.3465 0.8751 ...
 - attr(*, "dimnames")=List of 3
  ..$ id : Named chr [1:2] "id1" "id2"
  .. ..- attr(*, "names")= chr [1:2] "good" "evil"
  ..$ x  : chr [1:3] "1" "2" "3"
  ..$ var: Named chr [1:4] "a" "b" "c" "d"
  .. ..- attr(*, "names")= chr [1:4] "up" "dn" "lt" "rt"

Thus, I could access “aliases” for dimension names in id and var by:

> names(dimnames(aa)$id)
[1] "good" "evil"
> names(dimnames(aa)$var)
[1] "up" "dn" "lt" "rt"

The code I wrote would iterate over the 3rd dimension, using the resulting 2D arrays to produce a series of plots with matplot(). To make the legends more readable, I used the names attribute of the dimnames, as above. In the first version, I used apply() to do the iterating:

> apply(aa, 3, function(xy) {
    x = as.numeric(dimnames(xy)$x)
    matplot(x, y=t(xy))
    legend('topleft', legend=names(dimnames(xy)$id), fill=1:nrow(xy))

    NULL
  })
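To see what apply() hands to the function without drawing anything, here is a minimal sketch. It rebuilds aa with the same layout as above so that it runs on its own (the values are random and do not matter), and it collects the id aliases visible inside each slice. The variable name seen is only for illustration.

```r
# Same layout as aa in the post, rebuilt so the snippet is self-contained.
aa <- array(runif(2 * 3 * 4),
            dim = c(2, 3, 4),
            dimnames = list(id  = c(good = "id1", evil = "id2"),
                            x   = c(1, 2, 3),
                            var = c(up = "a", dn = "b", lt = "c", rt = "d")))

# For each slice along the 3rd dimension, return the aliases attached to id.
seen <- apply(aa, 3, function(xy) names(dimnames(xy)$id))

# One column per slice; each column holds the aliases "good" and "evil".
print(seen)
```

Inside apply(), each slice still carries the named dimnames, which is why the legend call has something to show.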

The apply() version worked perfectly fine. Later, however, I decided it would be more informative to use the names in the iterating dimension for a plot title, so I refactored a bit to use sapply():

> sapply(1:dim(aa)[3], function(k) {
    xy = aa[,,k]

    x = as.numeric(dimnames(xy)$x)
    matplot(x, y=t(xy))
    legend('topleft', legend=names(dimnames(xy)$id), fill=1:nrow(xy))

    title(main=names(dimnames(aa)$var[k]))

    NULL
  })

I was a little surprised that this threw an error indicating that the names associated with dimnames(aa)$id were nonexistent:

 Error in legend("topleft", legend = names(dimnames(xy)$id), fill = 1:nrow(xy)) : 
  'legend' is of length 0 

Upon inspection, it seems that R’s default behavior is to drop attributes on dimnames when an array is subset:

> str(aa[,,1])
 num [1:2, 1:3] 0.0138 0.2942 0.7988 0.3465 0.8751 ...
 - attr(*, "dimnames")=List of 2
  ..$ id: chr [1:2] "id1" "id2"
  ..$ x : chr [1:3] "1" "2" "3"
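The same check can be made without str() or any plotting. This is a minimal sketch that rebuilds aa with the layout used above (random values, so the numbers will differ) and compares the aliases before and after taking a slice with `[`.

```r
# Same layout as aa in the post, rebuilt so the snippet is self-contained.
aa <- array(runif(2 * 3 * 4),
            dim = c(2, 3, 4),
            dimnames = list(id  = c(good = "id1", evil = "id2"),
                            x   = c(1, 2, 3),
                            var = c(up = "a", dn = "b", lt = "c", rt = "d")))

# Aliases on the full array.
names(dimnames(aa)$id)          # "good" "evil"

# Aliases on a slice: the labels survive, the names attribute does not.
xy <- aa[, , 1]
dimnames(xy)$id                 # "id1" "id2"
names(dimnames(xy)$id)          # NULL
```

A NULL here is what makes legend() complain that 'legend' is of length 0.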

Adding drop=FALSE to the indexing doesn’t help. The only fix I could come up with was to reassign the additional attributes after subsetting:

> sapply(1:dim(aa)[3], function(k) {
    xy = aa[,,k]

    # Recover the names attributes on the dimnames,
    # which were dropped by subsetting.
    dimnames(xy) = dimnames(aa)[names(dimnames(aa)) %in% names(dimnames(xy))]

    x = as.numeric(dimnames(xy)$x)
    matplot(x, y=t(xy))
    legend('topleft', legend=names(dimnames(xy)$id), fill=1:nrow(xy))

    title(main=names(dimnames(aa)$var[k]))

    NULL
  })
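The reassignment can be wrapped in a small helper so the plotting code stays uncluttered. This is only a sketch: slice_keep_names is a hypothetical name, not a base R function, and it assumes a single index k along one dimension (the last one by default). It takes the slice and then rebuilds it with array(), passing along the untouched dimnames of the remaining dimensions.

```r
# Hypothetical helper (not part of base R): take slice k along dimension
# `margin` and restore the original dimnames, including their names attributes.
slice_keep_names <- function(a, k, margin = length(dim(a))) {
  # Build the index list: TRUE selects everything, k selects the slice.
  idx <- rep(list(TRUE), length(dim(a)))
  idx[[margin]] <- k
  out <- do.call(`[`, c(list(a), idx, list(drop = FALSE)))
  # Rebuild without the sliced dimension, reusing the parent's dimnames.
  array(out, dim = dim(a)[-margin], dimnames = dimnames(a)[-margin])
}

# Same layout as aa in the post, rebuilt so the snippet is self-contained.
aa <- array(runif(2 * 3 * 4),
            dim = c(2, 3, 4),
            dimnames = list(id  = c(good = "id1", evil = "id2"),
                            x   = c(1, 2, 3),
                            var = c(up = "a", dn = "b", lt = "c", rt = "d")))

xy <- slice_keep_names(aa, 2)
names(dimnames(xy)$id)   # aliases are available again on the slice
```

A simpler alternative, when the parent array is still in scope, is to skip the repair and read the aliases straight from it, for example names(dimnames(aa)$id) in the legend() call.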

To the greater R community, I ask - is this behavior a flaw, or was it done on purpose? If the latter, I pleadingly ask WHYYYYyyyyyyyy!

Written with StackEdit.
