Parallel regression in R (maybe with snowfall)

I'm trying to fit a regression in R in parallel. I'd like to use the snowfall library, but I'm open to any approach. The regression below currently takes an extremely long time to run. Can someone show me how to parallelise it?

 sales_day_region_ctgry_lm <- lm(log(sales_out+1)~factor(region_out) 
             + date_vector_out + factor(date_vector_out) +
             factor(category_out) + mean_temp_out)

I've started down the following path:

library(snowfall)
sfInit(parallel = TRUE, cpus=4, type="SOCK")

wrapper <- function() {
return(lm(log(sales_out+1)~factor(region_out) + date_vector_out +
               factor(date_vector_out) + factor(category_out) +   mean_temp_out))
}

output_lm <- sfLapply(*no idea what to do here*,wrapper)
sfStop()
summary(output_lm)

But this approach is riddled with errors.

Thanks!


The partools package offers an easy, off-the-shelf implementation of parallelised linear regression via its calm() function. (The "ca" prefix stands for "chunk averaging".)

In your case -- leaving aside @Roland's correct comment about mixing up factor and continuous predictors -- the solution should be as simple as:

library(partools)
#library(parallel) ## loads as dependency

cls <- makeCluster(4) ## Or, however many cores you want/have.

sales_day_region_ctgry_calm <- 
  calm(
    cls, 
    "log(sales_out+1) ~ factor(region_out) + date_vector_out + 
     factor(date_vector_out) + factor(category_out) + mean_temp_out, 
     data=YOUR_DATA_HERE"
    )

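Note that the model call (the formula and the data argument together) is passed as a single quoted string. Note also that you may need to randomise the row order of your data first if it is ordered in any way (e.g. by date), since each worker fits the model on its own chunk of rows. See the partools vignette for more details.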
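To make the "chunk averaging" idea concrete, here is a minimal sketch using only the base parallel package rather than partools. It is not how calm() is implemented; it only illustrates the principle under simple assumptions: simulated data, a hypothetical model y ~ x1 + x2, equal-sized chunks, and a plain unweighted average of the per-chunk coefficients.

```r
library(parallel)

set.seed(1)

# Simulated data standing in for the real data set (hypothetical columns)
n <- 20000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 0.5 * dat$x2 + rnorm(n)

n_chunks <- 2

# Shuffle the rows so that each chunk is a random sample of the data
dat <- dat[sample(n), ]
chunks <- split(dat, rep_len(seq_len(n_chunks), n))

# Fit the same model on each chunk, one chunk per worker
cls <- makeCluster(n_chunks)
chunk_coefs <- parLapply(cls, chunks, function(d) {
  coef(lm(y ~ x1 + x2, data = d))
})
stopCluster(cls)

# Average the per-chunk coefficient vectors
ca_coefs <- Reduce(`+`, chunk_coefs) / n_chunks

# Compare with the fit on the full data
full_coefs <- coef(lm(y ~ x1 + x2, data = dat))
print(rbind(chunk_averaged = ca_coefs, full_data = full_coefs))
```

The shuffle is what makes the simple average reasonable: if the rows were ordered by date, each chunk would cover a different time span and the per-chunk fits would not be estimating the same thing.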


Since you're fitting one big model (as opposed to several small models), and you're using linear regression, a quick-and-easy way to get parallelism is to use a multithreaded BLAS. Something like Microsoft R Open (previously known as Revolution R Open) should do the trick.*

* disclosure: I work for Microsoft/Revolution.
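As a rough way to see where a multithreaded BLAS comes into play, the sketch below reports which BLAS/LAPACK libraries the running R session is linked against and solves a least-squares problem through the normal equations with crossprod() and solve(), both of which call into BLAS/LAPACK. The data are simulated, and the normal-equations route is an illustrative assumption rather than what lm() does internally; it is less numerically stable than the QR approach, so it assumes a well-conditioned design matrix. How much lm() itself speeds up depends on how much of its work goes through BLAS. La_library() and the "BLAS" entry of extSoftVersion() need R 3.4.0 or later.

```r
set.seed(42)

# Simulated design matrix (intercept plus p - 1 predictors) and response
n <- 5000
p <- 50
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
beta_true <- rnorm(p)
y <- drop(X %*% beta_true + rnorm(n))

# Which BLAS and LAPACK libraries is this R session using?
print(extSoftVersion()["BLAS"])
print(La_library())

# Normal equations: solve (X'X) b = X'y using BLAS/LAPACK-backed routines
XtX <- crossprod(X)
Xty <- crossprod(X, y)
beta_ne <- drop(solve(XtX, Xty))

# Reference fit using R's standard QR-based least squares
beta_lm <- unname(coef(lm.fit(X, y)))

print(all.equal(beta_ne, beta_lm))
```

Wrapping the crossprod() and solve() lines in system.time() under the reference BLAS and again under a multithreaded one is a simple way to check whether the switch helps for data of your size.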
