Parallel processing in R using snow
I have thousands of lists, and each list contains multiple time series. I would like to apply forecasting to each element of each list. This has become an intractable problem in terms of computing resources. I don't have a background in parallel computing or advanced R programming. Any help would be greatly appreciated.
I have created a dummy list. dat.list is similar to what I'm working on.
library("snow")
library("plyr")
library("forecast")
## Create dummy data
## (draw 300 values so the 100 x 3 matrix is filled without recycling)
z <- ts(matrix(rnorm(300,10,10), 100, 3), start = c(1961, 1), frequency = 12)
lam <- 0.8
ap <- list(z=z,lam=lam)
z <- ts(matrix(rnorm(300,10,10), 100, 3), start = c(1971, 1), frequency = 12)
lam <- 0.5
zp <- list(z=z,lam=lam)
dat.list <- list(ap=ap,zp=zp)
## Forecast using lapply
xa <- proc.time()
tt <- lapply(dat.list,function(x) lapply(x$z,function(y) (forecast::ets(y))))
xb <- proc.time()
The above code gives me what I need. I would like to apply parallel processing to both lapply calls in the code above, so I attempted to use the snow package, following an example shown on this site.
## Parallel Processing
clus <- makeCluster(3)
custom.function <- function(x) lapply(x$z,function(y) (forecast::ets(y)))
clusterExport(clus,"custom.function")
x1 <- proc.time()
tm <- parLapply(clus,dat.list,custom.function)
x2<-proc.time()
stopCluster(clus)
Below are my questions. First, why do the non-parallel and parallel versions return results of different lengths?
Non-parallel version:
summary(tt)
   Length Class  Mode
ap 3      -none- list
zp 3      -none- list
Parallel version:
summary(tm)
   Length Class  Mode
ap 300    -none- list
zp 300    -none- list
My second question is: how should I parallelize the lapply inside the custom function, which is basically a nested parLapply?
custom.function <- function(x) parLapply(clus,x$z,function(y) (forecast::ets(y))) ## Not working
Many thanks for your help.
The problem is that the forecast package isn't loaded on the cluster workers, which causes lapply to iterate over the ts objects incorrectly. You can load forecast on the workers using clusterEvalQ:
clusterEvalQ(clus, library(forecast))
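For context, here is a minimal sketch of where that call fits in the full sequence. It assumes dummy data of the same shape as in the question; `make_entry` is a hypothetical helper introduced only for this illustration. The key point is that `clusterEvalQ` runs after `makeCluster` and before `parLapply`.

```r
library(snow)
library(forecast)

set.seed(1)

# Hypothetical helper (not in the question): builds one list entry with a
# 100 x 3 monthly ts matrix and a lambda value, like ap and zp in the question.
make_entry <- function(start_year, lam) {
  z <- ts(matrix(rnorm(300, 10, 10), 100, 3),
          start = c(start_year, 1), frequency = 12)
  list(z = z, lam = lam)
}
dat.list <- list(ap = make_entry(1961, 0.8), zp = make_entry(1971, 0.5))

custom.function <- function(x) lapply(x$z, function(y) forecast::ets(y))

clus <- makeCluster(3)
# Attach forecast on every worker before any work is dispatched
clusterEvalQ(clus, library(forecast))
tm <- parLapply(clus, dat.list, custom.function)
stopCluster(clus)

summary(tm)
```

If the explanation above holds, `summary(tm)` should then report the same lengths as `summary(tt)` from the sequential run. Passing `custom.function` directly to `parLapply` ships it to the workers, so the `clusterExport` call is not needed for this to work.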
To answer your second question, your attempt at nested parallelism failed because the workers don't have snow loaded or clus defined. But if you have thousands of lists, then you should have plenty of ways to keep all of your cores busy without worrying about nested parallelism. Nesting is more likely to hurt your performance than help it, and it doesn't seem necessary.
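One way to keep every core busy without nesting is to flatten the work into one task per series and make a single `parLapply` call. The sketch below is an illustration under assumptions, not a tuned implementation: `make_entry`, `tasks`, `fits`, and `tm.flat` are hypothetical names, and the columns are extracted explicitly by index so the result does not depend on how `lapply` splits a multivariate ts.

```r
library(snow)
library(forecast)

set.seed(1)

# Hypothetical helper: same data shape as the question's ap and zp entries
make_entry <- function(start_year, lam) {
  z <- ts(matrix(rnorm(300, 10, 10), 100, 3),
          start = c(start_year, 1), frequency = 12)
  list(z = z, lam = lam)
}
dat.list <- list(ap = make_entry(1961, 0.8), zp = make_entry(1971, 0.5))

# Flatten: one task per (list name, column) pair
tasks <- unlist(
  lapply(names(dat.list), function(nm) {
    z <- dat.list[[nm]]$z
    lapply(seq_len(ncol(z)), function(j) list(group = nm, series = z[, j]))
  }),
  recursive = FALSE
)

clus <- makeCluster(3)
clusterEvalQ(clus, library(forecast))
# A single level of parallelism over all series
fits <- parLapply(clus, tasks, function(task) forecast::ets(task$series))
stopCluster(clus)

# Regroup the fitted models under their original list names
groups <- vapply(tasks, function(task) task$group, character(1))
tm.flat <- split(fits, factor(groups, levels = names(dat.list)))
```

With thousands of lists, this gives the scheduler many small, independent tasks, which tends to balance load better than parallelizing only the outer level.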