
2015/01/21

Easy error propagation in R

In a previous post I demonstrated how to use R’s simple built-in symbolic engine to generate Jacobian and (pseudo)-Hessian matrices that make non-linear optimization perform much more efficiently. Another related application is Gaussian error propagation.

Say you have data from a set of measurements in variables x and y for which you know the corresponding measurement errors (dx and dy, typically the standard deviation or error from a set of replicates or a calibration curve). Now you want to compute a derived value defined by an arbitrary function z = f(x,y). What is the corresponding error in z, i.e. dz = df?

If the function f(x,y) is a simple sum or product, there are simple textbook equations for determining df. However, if f(x,y) is something more complex, like:

$$z = f(x,y) = \frac{x - y}{(x + y)^2}$$

you’ll need to use a bit of calculus, specifically the chain rule:

$$df = \sqrt{\left(dx\,\frac{\partial f}{\partial x}\right)^2 + \left(dy\,\frac{\partial f}{\partial y}\right)^2 + \cdots}$$
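
For example, for a simple sum $z = x + y$ both partial derivatives are 1, and the general formula collapses to the familiar textbook sum rule:

$$dz = \sqrt{dx^2 + dy^2}$$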

Applying this general formula gives the Gaussian error propagation for an arbitrary function. So how does one do this in R? Again, the D() function and R expression() objects come to our rescue.

Say the definition of z (ergo f(x,y)) is defined in an R formula:

> f = z ~ (x-y)/(x+y)^2

If you probe the structure of a formula object you get:

> str(f)
Class 'formula' length 3 z ~ (x - y)/(x + y)^2
  ..- attr(*, ".Environment")=<environment: R_GlobalEnv>

What’s key is the “length 3” bit:

> f[[1]]; f[[2]]; f[[3]]
`~`
z
(x - y)/(x + y)^2

The code above shows us that a formula object can be subsetted into its constituent parts:

  1. the formula operator: ~
  2. the left-hand side (LHS) of the formula: z
  3. the right-hand side (RHS) of the formula: (x - y)/(x + y)^2

The class() of the RHS is "call", which is close enough to an R expression that both all.vars() and D() work on it as expected, generating the expressions for the partial derivatives with respect to each variable.
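A quick check confirms this:

> class(f[[3]])
[1] "call"

With that, all.vars() extracts the variable names and D() differentiates the RHS with respect to each of them: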

> all.vars(f[[3]])
[1] "x" "y"

> lapply(all.vars(f[[3]]), function(v) D(f[[3]], v))
[[1]]
1/(x + y)^2 - (x - y) * (2 * (x + y))/((x + y)^2)^2

[[2]]
-(1/(x + y)^2 + (x - y) * (2 * (x + y))/((x + y)^2)^2)

These expressions need to be modified a bit: in this case they need to be multiplied by dx and dy, respectively, and then squared. What's returned from D() is a call object, so to manipulate the results as strings they first need to be converted to character. This is done with deparse().

> lapply(all.vars(f[[3]]), function(v) deparse(D(f[[3]], v)))
[[1]]
[1] "1/(x + y)^2 - (x - y) * (2 * (x + y))/((x + y)^2)^2"

[[2]]
[1] "-(1/(x + y)^2 + (x - y) * (2 * (x + y))/((x + y)^2)^2)"

The final error propagation expression is created with a bit of string manipulation:

> sprintf('sqrt(%s)', 
    paste(
        sapply(all.vars(f[[3]]), function(v) {
            sprintf('(d%s*(%s))^2', v, deparse(D(f[[3]], v)))
        }), 
        collapse='+'
    )
  )
[1] "sqrt((dx*(1/(x + y)^2 - (x - y) * (2 * (x + y))/((x + y)^2)^2))^2+(dy*(-(1/(x + y)^2 + (x - y) * (2 * (x + y))/((x + y)^2)^2)))^2)"

Now that we’ve got the basics down, let’s test this out with some data …

> set.seed(0)
> data = data.frame(
    x  = runif(5), 
    y  = runif(5), 
    dx = runif(5)/10, 
    dy = runif(5)/10
  )
> data
          x         y          dx         dy
1 0.8966972 0.2016819 0.006178627 0.07698414
2 0.2655087 0.8983897 0.020597457 0.04976992
3 0.3721239 0.9446753 0.017655675 0.07176185
4 0.5728534 0.6607978 0.068702285 0.09919061
5 0.9082078 0.6291140 0.038410372 0.03800352

and with a little help from dplyr:

> library(dplyr)
> data %>%
+   mutate_(.dots=list(
+     # compute derived value
+     z  = deparse(f[[3]]),
+     
+     # generates a mathematical expression to compute dz
+     # as a character string
+     dz = sapply(all.vars(f[[3]]), function(v) {
+             dfdp = deparse(D(f[[3]], v))
+             sprintf('(d%s*(%s))^2', v, dfdp)
+           }) %>%
+           paste(collapse='+') %>%
+           sprintf('sqrt(%s)', .)
+       ))
          x         y          dx         dy           z         dz
1 0.8966972 0.2016819 0.006178627 0.07698414  0.57608929 0.14457245
2 0.2655087 0.8983897 0.020597457 0.04976992 -0.46718831 0.03190297
3 0.3721239 0.9446753 0.017655675 0.07176185 -0.33019871 0.01978697
4 0.5728534 0.6607978 0.068702285 0.09919061 -0.05778613 0.07604809
5 0.9082078 0.6291140 0.038410372 0.03800352  0.11809201 0.02424023

Taking this a step further, this method can be wrapped in a chainable function that determines the names of the new variables from the LHS of its formula argument:

mutate_with_error = function(.data, f) {
  exprs = list(
      # expression to compute new variable values
      deparse(f[[3]]),

      # expression to compute new variable errors
      sapply(all.vars(f[[3]]), function(v) {
        dfdp = deparse(D(f[[3]], v))
        sprintf('(d%s*(%s))^2', v, dfdp)
      }) %>%
        paste(collapse='+') %>%
        sprintf('sqrt(%s)', .)
  )
  names(exprs) = c(
    deparse(f[[2]]),
    sprintf('d%s', deparse(f[[2]]))
  )

  .data %>%
    # the standard evaluation alternative of mutate()
    mutate_(.dots=exprs)
}

Thus, adding new derived variables and propagating errors accordingly becomes relatively easy:

> set.seed(0)
> data = data.frame(x=runif(5), y=runif(5), dx=runif(5)/10, dy=runif(5)/10)
> data
          x         y          dx         dy
1 0.8966972 0.2016819 0.006178627 0.07698414
2 0.2655087 0.8983897 0.020597457 0.04976992
3 0.3721239 0.9446753 0.017655675 0.07176185
4 0.5728534 0.6607978 0.068702285 0.09919061
5 0.9082078 0.6291140 0.038410372 0.03800352

> data %>% mutate_with_error(z ~ (x-y)/(x+y)^2)
          x         y          dx         dy           z         dz
1 0.8966972 0.2016819 0.006178627 0.07698414  0.57608929 0.14457245
2 0.2655087 0.8983897 0.020597457 0.04976992 -0.46718831 0.03190297
3 0.3721239 0.9446753 0.017655675 0.07176185 -0.33019871 0.01978697
4 0.5728534 0.6607978 0.068702285 0.09919061 -0.05778613 0.07604809
5 0.9082078 0.6291140 0.038410372 0.03800352  0.11809201 0.02424023
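
Because the propagated errors follow the same d-prefix naming convention as the inputs, calls chain naturally: a second derived variable can use z and dz just like any measured column. A quick sketch with a made-up follow-up formula (w is my own example, not from the data above):

> data %>%
+   mutate_with_error(z ~ (x-y)/(x+y)^2) %>%
+   mutate_with_error(w ~ z^2/x)   # adds columns w and dw, propagated from z, dz, x and dx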

Written with StackEdit.

2014/08/20

Optimizing with R expressions

I recently discovered a powerful use for R expression() objects.

Say you are trying to fit some experimental data to the following nonlinear equation:

$$\frac{K\, y_0\, e^{u(x - t_l)}}{K + y_0\left(e^{u(x - t_l)} - 1\right)} + b_1 + (b_0 - b_1)\, e^{-kx} + b_2 x$$

with the independent variable x using nlminb() as the minimization optimizer.

This sort of work is significantly improved (i.e. faster with better convergence) if an analytical gradient vector and a Hessian matrix for the objective function are provided. This means a lot of partial differentiation of the model equation with respect to each parameter.
To get these partial derivatives one could:
  1. Review some calculus and derive them by hand
  2. Feed the equation into an online engine like Wolfram Alpha and copy/paste the results
The discovery I made is that there is a purely R solution via the combination of expression() objects and the functions D() and all.vars().

The all.vars() function extracts all variable and parameter names from an expression as a character vector. For example:

> all.vars(expression(b1 + (b0 - b1)*exp(-k*x) + b2*x))
[1] "b1" "b0" "k"  "x"  "b2"

The D() function takes two arguments: an expression to differentiate and a character specifying the variable term to differentiate by:

> D(expression(b1 + (b0 - b1)*exp(-k*x) + b2*x), 'x')
b2 - (b0 - b1) * (exp(-k * x) * k)

The following code produces a list of the partial derivatives of the above equation with respect to each variable/parameter.

# the model equation
expr = expression(
    (K*y0*exp(u*(x-tl)))/(K + y0*(exp(u*(x-tl))-1)) + 
        b1 + (b0 - b1)*exp(-k*x) + b2*x
)

sapply(all.vars(expr), function(v){
  D(expr, v)
})

# returns:
# $K
# y0 * exp(u * (x - tl))/(K + y0 * (exp(u * (x - tl)) - 1)) - (K * 
#     y0 * exp(u * (x - tl)))/(K + y0 * (exp(u * (x - tl)) - 1))^2
# 
# $y0
# K * exp(u * (x - tl))/(K + y0 * (exp(u * (x - tl)) - 1)) - (K * 
#     y0 * exp(u * (x - tl))) * (exp(u * (x - tl)) - 1)/(K + y0 * 
#     (exp(u * (x - tl)) - 1))^2
# 
# $u
# K * y0 * (exp(u * (x - tl)) * (x - tl))/(K + y0 * (exp(u * (x - 
#     tl)) - 1)) - (K * y0 * exp(u * (x - tl))) * (y0 * (exp(u * 
#     (x - tl)) * (x - tl)))/(K + y0 * (exp(u * (x - tl)) - 1))^2
# ...

Each element of the list returned by the sapply() statement above is itself an unevaluated call. Evaluating each one against the data gives one row of the Jacobian matrix, which we'll subsequently need to compute the gradient:

p0 = c(y0=0.01, u=0.3, tl=5, K=2, b0=0.01, b1=1, b2=0.001, k=0.1)
x = seq(0,10)

# notice that t() is applied to put parameters on rows
J = t(sapply(all.vars(expr), function(v, env){
  eval(D(expr, v), env)
}, env=c(as.list(p0), list(x=x))))

J
# returns:
#             [,1]          [,2]          [,3]          [,4] ...
# K  -4.367441e-06 -5.298871e-06 -6.067724e-06 -6.218461e-06 ...
# y0  2.248737e-01  3.033101e-01  4.089931e-01  5.512962e-01 ...
# u  -1.118747e-02 -1.207174e-02 -1.220845e-02 -1.097079e-02 ...
# x   1.006712e-01  9.148428e-02  8.327519e-02  7.598662e-02 ...
# tl -6.712481e-04 -9.053805e-04 -1.220845e-03 -1.645619e-03 ...
# b1  0.000000e+00  9.516258e-02  1.812692e-01  2.591818e-01 ...
# b0  1.000000e+00  9.048374e-01  8.187308e-01  7.408182e-01 ...
# k   0.000000e+00  8.957890e-01  1.621087e+00  2.200230e+00 ...
# b2  0.000000e+00  1.000000e+00  2.000000e+00  3.000000e+00 ...

Since x is the independent variable, the row corresponding to it can be safely removed from the Jacobian:

J = J[names(p0),,drop=F]
J
# returns:
#             [,1]          [,2]          [,3]          [,4] ...
# y0  2.248737e-01  3.033101e-01  4.089931e-01  5.512962e-01 ...
# u  -1.118747e-02 -1.207174e-02 -1.220845e-02 -1.097079e-02 ...
# tl -6.712481e-04 -9.053805e-04 -1.220845e-03 -1.645619e-03 ...
# K  -4.367441e-06 -5.298871e-06 -6.067724e-06 -6.218461e-06 ...
# b0  1.000000e+00  9.048374e-01  8.187308e-01  7.408182e-01 ...
# b1  0.000000e+00  9.516258e-02  1.812692e-01  2.591818e-01 ...
# b2  0.000000e+00  1.000000e+00  2.000000e+00  3.000000e+00 ...
# k   0.000000e+00  8.957890e-01  1.621087e+00  2.200230e+00 ...

The gradient vector is then simply the (negative of the) matrix product of the Jacobian and the vector of residuals:

gr = -J %*% r
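
To see where this expression comes from: for a least-squares objective $S(p) = \sum_i r_i^2$ with residuals $r_i = y_i - f(x_i, p)$,

$$\frac{\partial S}{\partial p_j} = 2 \sum_i r_i \frac{\partial r_i}{\partial p_j} = -2 \sum_i r_i \frac{\partial f}{\partial p_j}(x_i, p),$$

which, up to the constant factor of 2, is exactly the matrix product $-J r$ with the parameters-on-rows $J$ built above.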

For the Hessian, the full form in Einstein summation notation (summing over the repeated data index i) is:

$$H_{jk} = \frac{\partial f_i}{\partial p_j}\,\frac{\partial f_i}{\partial p_k} + r_i\,\frac{\partial^2 r_i}{\partial p_j\,\partial p_k}$$


The first term is simply $J^T J$. The second term is typically called the "Hessian of the residuals" and is referred to as a matrix B. I'm still trying to wrap my head around what it actually is in vector notation.

Thankfully, in optimization cases where the initial guess is near the optimum, the behavior of the system should be “linear enough” that one can ignore the second term:

$$H \approx J^T J$$


The equivalent R code would be:

H = J %*% t(J)

(the transpose goes on the second factor here because our J was built with the parameters on rows, i.e. it is the transpose of the conventional data-by-parameters Jacobian)
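
A quick dimension check, run against the J built a few code blocks above (8 parameters, 11 points in x = seq(0,10)), makes the orientation explicit:

> dim(J)            # parameters on rows, data points on columns
[1]  8 11
> dim(J %*% t(J))   # p x p, one row/column per parameter
[1] 8 8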

Putting it all together:

# the model equation
expr = expression(
    (K*y0*exp(u*(x-tl)))/(K + y0*(exp(u*(x-tl))-1)) + 
        b1 + (b0 - b1)*exp(-k*x) + b2*x
)

p0 = c(y0=0.01, u=0.3, tl=5, K=2, b0=0.01, b1=1, b2=0.001, k=0.1)
x = seq(0,48,by=0.25)

# let's say these are the residuals
r = runif(length(x))

# magic syntax that converts an equation expression into a Jacobian matrix
J = t(sapply(all.vars(expr), function(v, env){
  eval(D(expr, v), env)
}, env = c(as.list(p0), list(x=x))))

# drop the row for the independent variable x, keeping only parameter rows
J = J[names(p0),,drop=F]

# and then a gradient vector
gr = -J %*% r

# and then an approximate Hessian matrix
H = J %*% t(J)

Extending this further, one can now write a generic model object like so:

ModelObject = setRefClass('ModelObject', 
  fields = list(
    name = 'character',
    expr = 'expression'
  ),
  methods = list(
    value = function(p, data){
      eval(.self$expr, c(as.list(p), as.list(data)))
    },
    jacobian = function(p, data){
      J = t(sapply(all.vars(.self$expr), function(v, p, data){
              eval(D(.self$expr, v), c(as.list(p), as.list(data)))
            }, p=p, data=data))

      return(J[names(p),,drop=F])
    },
    gradient = function(p, data){
        r = data$y - value(p, data)
        return(-jacobian(p, data) %*% r)
    },
    hessian = function(p, data){
      J = jacobian(p, data)
      return(J %*% t(J))
    }
  )
)

which is instantiated with little more than a name and an expression, and can be used to provide gradient and Hessian functions to nlminb():

> xy = list(x=seq(0,10,by=0.25), y=dnorm(seq(0,10,by=0.25),10,2)) # test data
> p0 = c(y0=0.01, u =0.2, l=5, A=log(1.5/0.01))
> mo = ModelObject(
         name = 'gompertz', 
         expr = expression( y0*exp(A*exp(-exp((u*exp(1)/A)*(l-x)+1))) )
       )

> fit = nlminb(p0, function(p, data){
        r = data$y - mo$value(p,data)
        return(r %*% r)
    }, gradient = mo$gradient, hessian = mo$hessian, data=xy)

> fit$par
         y0           u           l           A 
0.001125766 1.345796890 3.646340494 5.408138500 

> fit$message
[1] "relative convergence (4)"

> plot(xy, main='Fit Results'); lines(xy$x, mo$value(fit$par, xy))

Painless!

Written with StackEdit.