Parallel processing in R using snow

I have thousands of lists, and each list contains multiple time series. I would like to apply forecasting to each element in each list, but this has become an intractable problem in terms of computing resources. I don't have a background in parallel computing or advanced R programming. Any help would be greatly appreciated.

I have created a dummy list; dat.list is similar to what I'm working on.

    library("snow")
    library("plyr")
    library("forecast")

    ## Create dummy data: two elements, each holding a 100 x 3 monthly
    ## ts matrix and a lambda value
    z <- ts(matrix(rnorm(30, 10, 10), 100, 3), start = c(1961, 1), frequency = 12)
    lam <- 0.8
    ap <- list(z = z, lam = lam)

    z <- ts(matrix(rnorm(30, 10, 10), 100, 3), start = c(1971, 1), frequency = 12)
    lam <- 0.5
    zp <- list(z = z, lam = lam)

    dat.list <- list(ap = ap, zp = zp)

    ## Forecast using lapply: fit ets() to each series of each element, and time it
    xa <- proc.time()
    tt <- lapply(dat.list, function(x) lapply(x$z, function(y) forecast::ets(y)))
    xb <- proc.time()

The code above gives me what I need. I would like to apply parallel processing to both lapply calls in it, so I attempted to use the snow package, following an example on this site.

    ## Parallel processing
    clus <- makeCluster(3)
    custom.function <- function(x) lapply(x$z, function(y) forecast::ets(y))
    clusterExport(clus, "custom.function")

    x1 <- proc.time()
    tm <- parLapply(clus, dat.list, custom.function)
    x2 <- proc.time()

    stopCluster(clus)

Below are my questions:

  • For some reason, the output of tm differs from that of the non-parallel version: the forecast function ets is applied to every single data point instead of to each series in the list.

    Non-parallel version:

        summary(tt)
           Length Class  Mode
        ap 3      -none- list
        zp 3      -none- list

    Parallel version:

        summary(tm)
           Length Class  Mode
        ap 300    -none- list
        zp 300    -none- list

  • My second question is how I should parallelize the inner lapply in the custom function, i.e. a nested parLapply:

        custom.function <- function(x) parLapply(clus, x$z, function(y) forecast::ets(y)) ## Not working

Many thanks for your help.


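The problem is that the forecast package isn't loaded on the cluster workers, so lapply iterates over every individual data point of each ts matrix instead of over its series (columns); that is why each element of tm has 300 entries rather than 3. You can load forecast on the workers using clusterEvalQ: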

    clusterEvalQ(clus, library(forecast))
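A minimal base-R sketch of the symptom described above: lapply first converts a ts matrix with as.list(), and when no method that splits a multivariate ts into its columns is available, the matrix is split into individual data points. Looping over column indices instead removes the dependence on which packages each session has loaded. The random data only mimics the shape of dat.list$ap$z; the variable names are illustrative.

```r
## Random monthly ts matrix shaped like dat.list$ap$z: 100 observations x 3 series
set.seed(42)
z <- ts(matrix(rnorm(300, 10, 10), 100, 3), start = c(1961, 1), frequency = 12)

## lapply() calls as.list(z) first. Base R has no as.list method for ts,
## so the matrix is split into one element per data point
## (the situation on workers where forecast was not loaded).
per_point <- lapply(z, identity)

## Looping over column indices yields one univariate ts per series,
## regardless of which packages are loaded in the session.
per_series <- lapply(seq_len(ncol(z)), function(i) z[, i])

length(per_point)   # 300 in a plain R session
length(per_series)  # 3
```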
    

To answer your second question: your attempt at nested parallelism failed because the workers don't have snow loaded or clus defined. But if you have thousands of lists, you already have plenty of independent tasks to keep all of your cores busy without nested parallelism. Nesting is more likely to hurt your performance than help it, and it doesn't seem necessary.
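A sketch of that advice: parallelize only the outer loop over dat.list and keep the inner loop over series serial. It assumes the parallel package that ships with R, whose makeCluster, clusterEvalQ and parLapply work like snow's, and uses stats::HoltWinters (simple exponential smoothing) purely as a stand-in for forecast::ets so it runs without extra packages. make_item is a hypothetical helper for building dummy data.

```r
library(parallel)  # bundled with R; provides snow-style cluster functions

## Hypothetical helper: build one element shaped like those in dat.list
make_item <- function(start_year, lam) {
  z <- ts(matrix(rnorm(300, 10, 10), 100, 3),
          start = c(start_year, 1), frequency = 12)
  list(z = z, lam = lam)
}
set.seed(1)
dat.list <- list(ap = make_item(1961, 0.8), zp = make_item(1971, 0.5))

## Inner loop stays serial: one model per column (series) of x$z.
## HoltWinters with beta = FALSE, gamma = FALSE is simple exponential
## smoothing, used here only as a stand-in for forecast::ets.
custom.function <- function(x) {
  lapply(seq_len(ncol(x$z)), function(i) {
    stats::HoltWinters(x$z[, i], beta = FALSE, gamma = FALSE)
  })
}

clus <- makeCluster(2)
## With forecast::ets, load the package on every worker first:
## clusterEvalQ(clus, library(forecast))
tm <- parLapply(clus, dat.list, custom.function)  # outer loop only
stopCluster(clus)

sapply(tm, length)  # ap: 3, zp: 3
```

With thousands of elements in dat.list, each parLapply task is one element, which should already give every worker enough work; no cluster object or snow setup is needed inside the workers.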

Link: http://www.djcxy.com/p/53426.html