safe version of subset
As the subset() manual states:
Warning: This is a convenience function intended for use interactively
From this great article I learned not only the secret behind this warning, but also gained a good understanding of substitute(), match.call(), eval(), quote(), call, promise and other related R subjects that are a little bit complicated.
Now I understand what the warning above is for. A super-simple implementation of subset() could be as follows:
subset = function(x, condition) x[eval(substitute(condition), envir=x),]
While subset(mtcars, cyl==4) returns the rows of mtcars that satisfy cyl==4, wrapping subset() in another function fails:
sub = function(x, condition) subset(x, condition)
sub(mtcars, cyl == 4)
# Error in eval(expr, envir, enclos) : object 'cyl' not found
Using the original version of subset() produces exactly the same error. This is due to a limitation of the substitute()-eval() pair: it works fine when condition is cyl==4, but when the condition is passed through the wrapping function sub(), the condition argument of subset() is no longer cyl==4 but the symbol condition from the sub() body. Evaluating that symbol ends up looking for cyl outside the data frame, and eval() fails. It's a bit complicated.
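To make the failure concrete, here is a minimal sketch of what substitute() actually captures in each case. The function names show_captured() and wrapper() are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: returns the unevaluated expression supplied as 'condition'.
show_captured <- function(condition) substitute(condition)

# Hypothetical wrapper: forwards its own argument one level down.
wrapper <- function(condition) show_captured(condition)

direct  <- show_captured(cyl == 4)  # the call cyl == 4
wrapped <- wrapper(cyl == 4)        # only the symbol 'condition'

print(direct)
print(wrapped)
```

In the wrapped case, substitute() sees only the name of the wrapper's argument, so the later eval() has to look cyl up outside the data frame.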
But is there any other implementation of subset() with exactly the same arguments that would be programming-safe, i.e. able to evaluate its condition when it is called by another function?
Just because it's such mind-bending fun (??), here is a slightly different solution that addresses a problem Hadley pointed to in comments to my accepted solution.
Hadley posted a gist demonstrating a situation in which my accepted function goes awry. The twist in that example (copied below) is that a symbol passed to SUBSET() is defined in the body (rather than the arguments) of one of the calling functions; it thus gets captured by substitute() instead of the intended global variable. Confusing stuff, I know.
f <- function() {
  cyl <- 4
  g()
}
g <- function() {
  SUBSET(mtcars, cyl == 4)$cyl
}
f()
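The capture described above can be reproduced in isolation. This is only a sketch of the mechanism, not the accepted function itself; capture_demo() is a hypothetical name. When substitute() is given a function's environment, any symbol bound in that environment is replaced by its value, including a local variable such as cyl.

```r
# Hypothetical demo: a local 'cyl' is spliced into the expression by substitute().
capture_demo <- function() {
  cyl <- 4
  substitute(cyl == 4, env = environment())
}

captured <- capture_demo()
print(captured)        # the column name has been replaced by the local value
print(eval(captured))  # a single logical, no longer a per-row test
```

Once the column name has been replaced by a constant, the expression no longer refers to the data frame at all, which is why every row is selected in the counterexample.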
Here is a better function that will only substitute the values of symbols found in calling functions' argument lists. It works in all of the situations that Hadley or I have so far proposed.
SUBSET <- function(`_dat`, expr) {
    ff <- sys.frames()
    n <- length(ff)
    ex <- substitute(expr)
    ii <- seq_len(n)
    for(i in ii) {
        ## sys.function() takes 'which' (a frame number);
        ## sys.parent() takes 'n' (the number of frames to go back).
        margs <- as.list(match.call(definition = sys.function(n - i),
                                    call = sys.call(sys.parent(i))))[-1]
        ex <- eval(substitute(substitute(x, env = ll),
                              env = list(x = ex, ll = margs)))
    }
    `_dat`[eval(ex, envir = `_dat`),]
}
## Works in Hadley's counterexample ...
f()
# [1] 4 4 4 4 4 4 4 4 4 4 4
## ... and in my original test cases.
sub <- function(x, condition) SUBSET(x, condition)
sub2 <- function(AA, BB) sub(AA, BB)
a <- SUBSET(mtcars, cyl == 4) ## Direct call to SUBSET()
b <- sub(mtcars, cyl == 4) ## SUBSET() called one level down
c <- sub2(mtcars, cyl == 4)
all(identical(a, b), identical(b, c))
# [1] TRUE
IMPORTANT: Please note that this still is not (nor can it be made into) a generally useful function. There's simply no way for the function to know which symbols you want it to use in all of the substitutions it performs as it works up the call stack. There are many situations in which users would want it to use the values of symbols assigned to within function bodies, but this function will always ignore those.
The [ function is what you're looking for; see ?"[". mtcars[mtcars$cyl == 4,] is equivalent to the subset command and is "programming" safe.
sub = function(x, condition) {
  x[condition,]
}
sub(mtcars, mtcars$cyl==4)
This does what you're asking without the implicit with() in the function call. The specifics are complicated; however, a function like:
sub = function(x, quoted_condition) {
  x[with(x, eval(parse(text=quoted_condition))),]
}
sub(mtcars, 'cyl==4')
Sorta does what you're looking for, but there are edge cases where this will have unexpected results.
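A related variant, sketched here as an assumption rather than something proposed in this answer, passes the condition as a quoted expression instead of a string, which avoids parse(). The name sub_quoted() is hypothetical, and using parent.frame() as the enclosure is a design choice made for this sketch so that variables not found in the data frame are looked up where the function was called.

```r
# Hypothetical variant: the caller quotes the condition explicitly.
sub_quoted <- function(x, quoted_condition) {
  # Look up columns in 'x' first, then fall back to the caller's environment.
  rows <- eval(quoted_condition, envir = x, enclos = parent.frame())
  x[rows, ]
}

# The quoted expression passes through intermediate functions unchanged.
outer_sub <- function(dat, cond) sub_quoted(dat, cond)

res_direct <- sub_quoted(mtcars, quote(cyl == 4))
res_nested <- outer_sub(mtcars, quote(cyl == 4))
```

Because the expression is an ordinary R object by the time it is passed around, nesting needs no call-stack inspection; the cost is that the caller has to write quote() explicitly.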
Using data.table and its [ subset function, you can get the implicit with(...) you're looking for.
library(data.table)
MT = data.table(mtcars)
MT[cyl==4]
There are better, faster ways to do this subsetting in data.table, but this illustrates the point well.
Using data.table you can also construct expressions to be evaluated later:
cond = expression(cyl==4)
MT[eval(cond)]
The data.table and the stored expression can now both be passed through functions:
wrapper = function(DT, condition) {
  DT[eval(condition)]
}
Here's an alternative version of subset() which continues to work even when it's nested, at least as long as the logical subsetting expression (e.g. cyl == 4) is supplied to the top-level function call.
It works by climbing up the call stack, substitute()ing at each step to ultimately capture the logical subsetting expression passed in by the user. In the call to sub2() below, for example, the for loop works up the call stack from expr to x to AA and finally to cyl == 4.
SUBSET <- function(`_dat`, expr) {
    ff <- sys.frames()
    ex <- substitute(expr)
    ii <- rev(seq_along(ff))
    for(i in ii) {
        ex <- eval(substitute(substitute(x, env=sys.frames()[[n]]),
                              env = list(x = ex, n=i)))
    }
    `_dat`[eval(ex, envir = `_dat`),]
}
## Define test functions that nest SUBSET() more and more deeply
sub <- function(x, condition) SUBSET(x, condition)
sub2 <- function(AA, BB) sub(AA, BB)
## Show that it works, at least when the top-level function call
## contains the logical subsetting expression
a <- SUBSET(mtcars, cyl == 4) ## Direct call to SUBSET()
b <- sub(mtcars, cyl == 4) ## SUBSET() called one level down
c <- sub2(mtcars, cyl == 4) ## SUBSET() called two levels down
identical(a,b)
# [1] TRUE
identical(a,c)
# [1] TRUE
a[1:5,]
#              mpg cyl  disp hp drat    wt  qsec vs am gear carb
# Datsun 710  22.8   4 108.0 93 3.85 2.320 18.61  1  1    4    1
# Merc 240D   24.4   4 146.7 62 3.69 3.190 20.00  1  0    4    2
# Merc 230    22.8   4 140.8 95 3.92 3.150 22.90  1  0    4    2
# Fiat 128    32.4   4  78.7 66 4.08 2.200 19.47  1  1    4    1
# Honda Civic 30.4   4  75.7 52 4.93 1.615 18.52  1  1    4    2
** For some explanation of the construct inside the for loop, see Section 6.2, paragraph 6 of the R Language Definition manual.
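The nested substitute() construct can also be tried on its own. The sketch below assumes a hypothetical list named bindings standing in for one frame of the call stack; it shows why a single substitute() is not enough when the expression is held in a variable.

```r
ex <- quote(condition)                         # expression stored in a variable
bindings <- list(condition = quote(cyl == 4))  # hypothetical stand-in for a frame

# substitute() does not evaluate its first argument, so this just
# returns the symbol 'ex' rather than working on the stored expression.
naive <- substitute(ex, bindings)

# The outer substitute() splices the *value* of 'ex' into the inner call,
# producing the call substitute(condition, bindings); eval() then runs it.
inner_call <- substitute(substitute(x, bindings), list(x = ex))
resolved <- eval(inner_call)

print(naive)
print(resolved)
```

Each pass of the for loop in SUBSET() performs this same two-step substitution, once per frame.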