Extract rows from a data.frame based on common values with a list
I'm looking for an easy way to filter rows from a data.frame based on a list of numeric sequences.
Here's an example:
My initial data frame:
data <- data.frame(x=c(0,1,2,0,1,2,3,4,5,12,2,0,10,11,12,13),y="other_data")
My list:
list1 <- list(1:5,10:13)
My goal is to keep only the rows of "data" where the "x" column contains exactly the same numeric sequences as "list1". So the output data.frame should be:
finaldata <- data.frame(x=c(1:5,10:13),y="other_data")
Any ideas for doing this?
I'd start with a custom function that gets the subset for a single sequence; it is then easy to extend with lapply.
#function that takes sequence and a vector
#and returns indices of vector that have complete sequence
get_row_indices<- function(sequence,v){
#get run lengths of whether vector is in sequence
rle_d <- rle(v %in% sequence)
#test if it's complete, so both v in sequence and length of
#matches is length of sequence
select <- rep(length(sequence)==rle_d$lengths &rle_d$values,rle_d$lengths)
return(select)
}
#add row ID to data to show selection
data$row_id <- 1:nrow(data)
res <- do.call(rbind,lapply(list1,function(x){
return(data[get_row_indices(sequence=x,v=data$x),])
}))
res
> res
x y row_id
5 1 other_data 5
6 2 other_data 6
7 3 other_data 7
8 4 other_data 8
9 5 other_data 9
13 10 other_data 13
14 11 other_data 14
15 12 other_data 15
16 13 other_data 16
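One caveat with the rle approach: it checks that a run has the right length and that every value belongs to the sequence, but not the order of the values. The sketch below adds an element-by-element comparison; get_row_indices_strict is a hypothetical name and the reversed input vector is made up for illustration.

```r
# Hypothetical stricter variant: a run must also equal the sequence in order
get_row_indices_strict <- function(sequence, v) {
  rle_d <- rle(v %in% sequence)
  ends <- cumsum(rle_d$lengths)          # last position of each run
  starts <- ends - rle_d$lengths + 1     # first position of each run
  # candidate runs: all values in the sequence and the right length
  ok <- rle_d$values & rle_d$lengths == length(sequence)
  # keep a candidate only if it matches the sequence element by element
  ok[ok] <- vapply(which(ok), function(i) {
    all(v[starts[i]:ends[i]] == sequence)
  }, logical(1))
  rep(ok, rle_d$lengths)
}

# Made-up input: 5,4,3,2,1 has the right length and values, but the wrong order
v_rev <- c(0, 5, 4, 3, 2, 1, 0)
strict_rev <- get_row_indices_strict(1:5, v_rev)

# Data from the question: only positions 5 to 9 hold the full run 1:5
x_q <- c(0, 1, 2, 0, 1, 2, 3, 4, 5, 12, 2, 0, 10, 11, 12, 13)
strict_q <- which(get_row_indices_strict(1:5, x_q))
```

For the data in the question both versions select the same rows; the difference only shows up when a permuted run of the same length is present.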
Why not use rollapply from zoo:
library(zoo)
ind = lapply(list1, function(x) {
n = length(x)
which(rollapply(data$x, n, function(y) all(y==x))) + 0:(n-1)
})
data[unlist(ind),]
#x y
#5 1 other_data
#6 2 other_data
#7 3 other_data
#8 4 other_data
#9 5 other_data
#13 10 other_data
#14 11 other_data
#15 12 other_data
#16 13 other_data
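Note that adding 0:(n-1) to the result of which() assumes each sequence occurs exactly once in data$x; with several occurrences the offsets would be recycled across the start positions. As a rough sketch of the same sliding-window idea in base R (not the zoo implementation; window_starts is a hypothetical helper), each start position can be expanded separately:

```r
data <- data.frame(x = c(0, 1, 2, 0, 1, 2, 3, 4, 5, 12, 2, 0, 10, 11, 12, 13),
                   y = "other_data")
list1 <- list(1:5, 10:13)

# Hypothetical helper: start positions of every window of v that equals s
window_starts <- function(s, v) {
  n <- length(s)
  if (n > length(v)) return(integer(0))
  starts <- seq_len(length(v) - n + 1)
  starts[vapply(starts, function(i) all(v[i:(i + n - 1)] == s), logical(1))]
}

ind <- lapply(list1, function(s) {
  st <- window_starts(s, data$x)
  # expand each start position into the full run of row indices
  unlist(lapply(st, function(i) i + seq_along(s) - 1L))
})
res_window <- data[unlist(ind), ]
```

On the data from the question this selects the same rows as the rollapply version, and a sequence that appears twice would simply contribute both runs.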
extract_fun <- function(x, dat){
# Index where the sequences start
ind <- which(dat == x[1])
# Indexes (within dat) where the sequence should be
ind_seq <- lapply(ind, seq, length.out = length(x))
# Extract the values from dat at the position
dat_val <- mapply(`[`, list(dat), ind_seq)
# Check if values within dat == those in list1
i <- which(apply(dat_val, 2, function(col) isTRUE(all.equal(col, x)))) # which one is equal?
# Return the matching indices (all of them if the sequence occurs more than once)
unlist(ind_seq[i])
}
Get the indices for each item in list1 and combine them into the row indices needed:
all_ind <- do.call(c, lapply(list1, extract_fun, data$x))
data[all_ind,]
Result:
x y
5 1 other_data
6 2 other_data
7 3 other_data
8 4 other_data
9 5 other_data
13 10 other_data
14 11 other_data
15 12 other_data
16 13 other_data
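A note on all.equal in the function above: on a mismatch it returns a character description of the difference rather than FALSE, so its result is safest to test with isTRUE(). A small illustration, with made-up vectors:

```r
x <- 1:5

# Matching values: all.equal returns TRUE (double vs integer is tolerated)
match_res <- all.equal(c(1, 2, 3, 4, 5), x)

# Non-matching values: the result is a character message, not FALSE
mismatch_res <- all.equal(c(1, 2, 0, 1, 2), x)

isTRUE(match_res)       # TRUE
is.character(mismatch_res)  # TRUE
isTRUE(mismatch_res)    # FALSE
```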