Someone needs to explain to me why this is a Nature publication ...
Granted it is Nature Protocols which started its publication run when I was in grad school and is likely less high-impact as full blown Nature. Seriously tough - using the Solver Add-in in Excel to do non-linear curve fitting gets published? If a younger version of myself knew about this, I'd have a Nature publication as a junior in undergrad.
It should be noted that such Nature worthy protocols are accomplished in the following 6 lines of R code (in this case for fitting a Michaelis-Menton curve):
f = formula(v ~ vmax*s/(km + s))
data = data.frame(
s = c(1250, 625, 313, 156, 78, 39, 20, 10),
v = c(148.68, 123.66, 96.84, 69.76, 45.54, 27.84, 16.38, 9.24)
)
fit = nls(f, data=data, start=list(vmax=1, km=1),
algorithm='port', trace=F, lower=0)
summary(fit)
(note: I'm including the blank line in my 6-line count)
On a similar thread, does that mean I can get a publication for this?
On a whim, I thought it might be fun to try to implement a PID control algorithm ... in R.
Yes, I know I have a strange idea of fun.
Background
PID control is a heuristic method of automatically controlling processes as wide ranging as water levels in tanks to the direction of ships running against a current. What's important is that you don't need to know much about a process to control it with a PID controller. Hence, why it is so broadly used. I studied the theory behind it in college and even built a LabVIEW UI for use in a student teaching lab.
For more detailed information, the obligatory Wikipedia link should be a good starting point. In fact, that's where I got all the information I needed since my control theory text is currently buried in a closet somewhere.
A PID controller produces an output (aka a "Manipulated Variable", MV(t) or u(t)) - usually from 0 to 100% - based on a weighted sum of three terms:
Error between where you want the process to be (a setpoint) and where the process is
How much error has accumulated over time
How rapidly the error is changing
which naturally leads to the P, I, and D in the name and the three weighting parameters (gains) that need to be tuned:
Proportional
Integral
Derivative
The math for this looks like:
Don't be scared, it isn't as gnarly as it looks.
For the moment, let's ignore the Kp on the outside while we sort out the individual parts being added together.
The main variable in the equation above is e(t). This is the error between the process variable PV(t) - the thing you want to control - and the setpoint SP(t) - the value you want the process to be at. So this makes e(t):
e(t) = PV(t) - SP(t)
This happens to also be the first term - the Proportional response. This should make sense - move the control by about as much error that you currently see.
The second term - the Integral response - isn't as ugly as it looks. Integrals can be thought of as the area under a curve - in this case e(t). Numerically, this simplifies to the sum of e(t)*dt, where dt is the amount of time between readings of the process variable. To complete the Integral response, this sum is divided by Ti, the integral time, which is an estimate of how long it will take to abolish all the accumulated error. In terms of the overall sum, this adds more control output to account for error that was previously "missed".
Last is the third and final term - the Derivative response. Here the derivative is of e(t) with respect to time - or how fast is e(t) changing at time t. To compute this, take the difference between the current value of e(t) and its last value e(t-dt) and divide it by dt. To complete the Derivative response, multiply the derivative value by Td, the derivative time, which is an estimate of how far in the future the error can be reliably predicted. This term adds control output to account for error that might happen.
All of the above is multiplied by Kp, the proportional gain, which scales the whole control response.
Code
Process
First things first, we need a process variable.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The code above is a simple function that simulates a process that, uncontrolled, will grow exponentially over time. For added realism, I've included some uniform noise to the process growth. The value of the control variable (in this case `u`) decreases the process value.
Controller
To code the controller for the above process, we first need to initialize a few variables:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
In the interest of book-keeping, I'm storing the values of the Integral and Derivative responses in vectors EI and ED, respectively. This isn't necessary, and indeed if this were a continuously operating controller, I wouldn't want to for the sake of conserving memory space.
Next a setpoint. Below is code to create a setpoint profile that has a few step changes in time.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
From here on, it's a matter of running a simulation loop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Folks savvy in calculus may note that my implementation of the integral isn't entirely textbook - meaning, I'm not using the Trapezoid rule or Simpson's rule to compute a more accurate area for each time step. This is on purpose, making the algorithm easier to implement. It also assumes that my time step, dt, is small enough that this doesn't matter much.
Running the simulation and plotting the necessary variables, shows that the controller works reasonably well. Note, I disabled the noise in the process variable for this first run.
Adding noise to the process doesn't affect the results much, just a noisier output in U (the manipulated variable):
Using a more complicated setpoint profile:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Set operations are a common place thing to do in R, and the enabling functions in the base stats package are:
intersect(x, y)
union(x, y)
setdiff(x, y)
setequal(x, y)
That said, you'll note that each ONLY takes two arguments - i.e. set X and set Y - which was a limitation I needed to overcome recently.
Googling around I found that you can apply set operations to multiple (>2) sets using the Reduce() function. For example, an intersection of sets stored in the list x would be achieved with:
Reduce(intersect, x)
However, things get trickier if you want to do a setdiff() in this manner - that is find the elements that are only in set a, but not in sets b, c, d, ..., etc. Since I'm not a master at set theory, I decided to write my own, brute force method to get intersections, unions, and setdiffs for an arbitrary number of sets.
Implementation
The function I wrote uses a truth table to do this where:
rows are the union of all elements in all sets
columns are sets
So each "cell" of the table is TRUE if the element represented by the row is in the set represented by the column.
To find an element that intersects all sets, the values across the row need to be all TRUE.
To find an element that is only in one set, only one row value is TRUE. To determine which set that element is in, numeric values are applied to each column such that:
col1 = 1
col2 = 2
col3 = 4
col4 = 8
... and so on
The values above are multiplied by each row's TRUE/FALSE (ergo: 1/0) values and summed across to produce a row's inclusion value.
For example, if an element is only in set 2 (that is column 2) the corresponding row inclusion value is 2. If an element is in both sets 2 and 3 (columns 2 and 3) the corresponding row inclusion value is 6.
Visually ...
1
2
4
set:
x
y
z
inc.val
a
T
F
F
1
b
F
T
T
6
c
T
F
T
5
d
T
T
T
7
Thus, determining intersections or setdiffs becomes a simple matter of filtering by row sums.
Code
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters