Extract rows from a data.frame based on common values with a list

2018-07-01 12:06:46

I'm looking for an easy way to filter rows of a data.frame based on a list of numeric sequences.

Here's an example:

My initial data frame:

data <- data.frame(x=c(0,1,2,0,1,2,3,4,5,12,2,0,10,11,12,13),y="other_data")

My list:

list1 <- list(1:5,10:13)

My goal is to keep only the rows of "data" whose "x" column contains exactly the numeric sequences in "list1", as consecutive runs. So the output data.frame should be:

finaldata <- data.frame(x=c(1:5,10:13),y="other_data")

Any ideas for doing this?

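I started with a custom function that finds the matching subset for a single sequence; extending it to the whole list with lapply is then easy.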

#function that takes sequence and a vector
#and returns indices of vector that have complete sequence
get_row_indices<- function(sequence,v){
  #get run lengths of whether vector is in sequence
  rle_d <- rle(v %in% sequence)
  #test if it's complete, so both v in sequence and length of 
  #matches is length of sequence
  select <- rep(length(sequence)==rle_d$lengths &rle_d$values,rle_d$lengths)

  return(select)

}


#add row ID to data to show selection
data$row_id <- 1:nrow(data)
res <- do.call(rbind,lapply(list1,function(x){
  return(data[get_row_indices(sequence=x,v=data$x),])
}))

> res
    x          y row_id
5   1 other_data      5
6   2 other_data      6
7   3 other_data      7
8   4 other_data      8
9   5 other_data      9
13 10 other_data     13
14 11 other_data     14
15 12 other_data     15
16 13 other_data     16
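
To see why this works, it may help to look at the intermediate rle() values for the first sequence. The snippet below is only a sketch that repeats the steps of get_row_indices inline; the names runs, keep, reversed, runs_rev and keep_rev are made up for illustration. It also shows one caveat: %in% tests membership only, so a run containing the right values in the wrong order is selected as well.

```r
x <- c(0, 1, 2, 0, 1, 2, 3, 4, 5, 12, 2, 0, 10, 11, 12, 13)
sequence <- 1:5

# runs of consecutive members / non-members of the sequence
runs <- rle(x %in% sequence)
runs$lengths  # 1 2 1 5 1 1 5
runs$values   # FALSE TRUE FALSE TRUE FALSE TRUE FALSE

# keep only the runs of members whose length equals the sequence length
keep <- rep(runs$values & runs$lengths == length(sequence), runs$lengths)
which(keep)   # 5 6 7 8 9

# caveat: membership ignores order, so a reversed run is selected too
reversed <- c(0, 5, 4, 3, 2, 1, 0)
runs_rev <- rle(reversed %in% sequence)
keep_rev <- rep(runs_rev$values & runs_rev$lengths == length(sequence),
                runs_rev$lengths)
which(keep_rev)  # 2 3 4 5 6
```

If the order of the values matters for your data, an explicit element-by-element comparison of each window is the safer check.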

Why not use rollapply from zoo:

library(zoo)

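ind = lapply(list1, function(x) {
    n = length(x)
    # start positions of every window of data$x that equals x, in order
    starts = which(rollapply(data$x, n, function(y) all(y == x)))
    # expand each start into the n row indices it covers
    # (adding 0:(n-1) to all starts at once would recycle incorrectly
    # if a sequence occurs more than once)
    unlist(lapply(starts, function(s) s + 0:(n-1)))
})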

data[unlist(ind),]
#    x          y
#5   1 other_data
#6   2 other_data
#7   3 other_data
#8   4 other_data
#9   5 other_data
#13 10 other_data
#14 11 other_data
#15 12 other_data
#16 13 other_data
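
If adding a package dependency is not an option, the same sliding-window comparison can be sketched in base R. This is an illustration under assumptions, not a drop-in replacement for rollapply: window_rows is a hypothetical helper name, and the code assumes data$x contains no NA values.

```r
data <- data.frame(x = c(0, 1, 2, 0, 1, 2, 3, 4, 5, 12, 2, 0, 10, 11, 12, 13),
                   y = "other_data")
list1 <- list(1:5, 10:13)

# window_rows is a hypothetical helper, not part of base R or zoo
window_rows <- function(pattern, v) {
  n <- length(pattern)
  if (n > length(v)) return(integer(0))
  # every position at which a full window of length n can start
  starts <- seq_len(length(v) - n + 1)
  # TRUE where the n values starting at s equal the pattern, in order
  hit <- vapply(starts, function(s) all(v[s:(s + n - 1)] == pattern),
                logical(1))
  # expand every matching start into the rows it covers
  unlist(lapply(starts[hit], function(s) s:(s + n - 1)))
}

rows <- unlist(lapply(list1, window_rows, v = data$x))
data[rows, ]
```

Because each window is compared element by element, this keeps only runs that match the sequence in order, and it handles a sequence that occurs more than once.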

extract_fun <- function(x, dat){
  # Index where the sequences could start
  ind <- which(dat == x[1])
  # Indexes (within dat) where each candidate sequence should be
  ind_seq <- lapply(ind, seq, length.out = length(x))
  # Check, per candidate, whether the values in dat equal those in x.
  # isTRUE() turns the mismatch report of all.equal() into FALSE;
  # candidates running past the end of dat give NA and never match.
  ok <- vapply(ind_seq, function(i) isTRUE(all.equal(dat[i], x)), logical(1))
  # Return the indices of every matching run (none, one or several)
  unlist(ind_seq[ok])
}

Get the indices for each item in list1 and combine them into the row indices needed:

all_ind <- do.call(c, lapply(list1, extract_fun, data$x))
data[all_ind,]

Result:

    x          y
5   1 other_data
6   2 other_data
7   3 other_data
8   4 other_data
9   5 other_data
13 10 other_data
14 11 other_data
15 12 other_data
16 13 other_data
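
Whichever approach you pick, the result can be checked against the target from the question. The sketch below assumes the selected rows are 5 to 9 and 13 to 16, as in the outputs above; the names result and same are made up for illustration. Note that the row names of the subset (5..9, 13..16) differ from those of finaldata (1..9), so the comparison is done on the column values.

```r
data <- data.frame(x = c(0, 1, 2, 0, 1, 2, 3, 4, 5, 12, 2, 0, 10, 11, 12, 13),
                   y = "other_data")
finaldata <- data.frame(x = c(1:5, 10:13), y = "other_data")

# rows selected by the answers above
result <- data[c(5:9, 13:16), ]

# drop the original row names so both frames are numbered 1..9
rownames(result) <- NULL

# compare column by column
same <- nrow(result) == nrow(finaldata) &&
  all(result$x == finaldata$x) &&
  all(result$y == finaldata$y)
same
```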
