Efficiently construct GRanges/IRanges from Rle vector
I have a Run length encoded vector representing some value at every position on the genome, in order. As a toy example suppose I had just one chromosome of length 10, then I would have a vector looking like
library(GenomicRanges)
set.seed(1)
toyData = Rle(sample(1:3,10,replace=TRUE))
I would like to coerce this into a GRanges object. The best I can come up with is
gr = GRanges('toyChr',IRanges(cumsum(c(0,runLength(toyData)[-nrun(toyData)])),
width=runLength(toyData)),
toyData = runValue(toyData))
which works, but is quite slow. Is there a faster way to construct the same object?
As @TheUnfunCat pointed out, the OP's solution is pretty solid. The solution below is only moderately faster than the original solution. I tried almost every combination of base R
and couldn't beat the efficiency Rle
from the S4Vectors
package, thus I resorted to Rcpp
. Here is the main function :
GenomeRcpp <- function(v) {
x <- WhichDiffZero(v)
m <- v[c(1L,x+1L)]
s <- c(0L,x)
e <- c(x,length(v))-1L
GRanges('toyChr',IRanges(start = s, end = e), toyData = m)
}
The WhichDiffZero
is the Rcpp
function that pretty much does the exact same thing as which(diff(v) != 0)
in base R
. Much credit goes to @G.Grothendieck.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector WhichDiffZero(IntegerVector x) {
int nx = x.size()-1;
std::vector<int> y;
y.reserve(nx);
for(int i = 0; i < nx; i++) {
if (x[i] != x[i+1]) y.push_back(i+1);
}
return wrap(y);
}
Below are some benchmarks:
set.seed(437)
testData <- do.call(c,lapply(1:10^5, function(x) rep(sample(1:50, 1), sample(1:30, 1))))
microbenchmark(GenomeRcpp(testData), GenomeOrig(testData))
Unit: milliseconds
expr min lq mean median uq max neval cld
GenomeRcpp(testData) 20.30118 22.45121 26.59644 24.62041 27.28459 198.9773 100 a
GenomeOrig(testData) 25.11047 27.12811 31.73180 28.96914 32.16538 225.1727 100 a
identical(GenomeRcpp(testData), GenomeOrig(testData))
[1] TRUE
I've been working on this off and on for the past few days and I'm definitely not satisfied. I'm hoping that someone will take what I've done (as it is a different approach) and create something much better.
链接地址: http://www.djcxy.com/p/36042.html上一篇: 代码分割/预加载内容,而用户正在浏览?