My tools of the trade: Using multiple processor cores in R-Project

Monday, January 16, 2012

Using multiple processor cores in R-Project

My laptop has a Core i7 processor with 4 cores that can run 8 threads in parallel. With the default instructions, my R-Project computations use only a single core. It seemed a pity that the other cores sit (more or less) idle instead of speeding up my computations, even though I do not yet run really heavy ones in my research. So I started looking for an easy way to use all cores in R-Project, and there is indeed one: the doMC package, together with the foreach function and the %dopar% operator.

For example, for computing linear models with different dependent variables and a given set of exogenous ones, one can do the following computations:

library(doMC) # There are other parallel backends (e.g. doParallel)
registerDoMC() # You must register one of them for foreach
getDoParWorkers() # Reports how many workers foreach will use after registerDoMC()
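Called without arguments, registerDoMC() picks the number of workers itself. If you prefer to set it explicitly, here is a minimal sketch. It assumes a Unix-like system (doMC relies on process forking), and the value 2 is an arbitrary choice for illustration:

```r
library(doMC)      # also attaches foreach
library(parallel)  # ships with R; provides detectCores()

n_detected <- detectCores()  # logical cores reported by the system (may be NA)
registerDoMC(cores = 2)      # ask for exactly 2 workers (illustrative value)
n_workers <- getDoParWorkers()
print(n_workers)
```

Leaving a core or two free can keep the machine responsive while a long job runs.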

Suppose that you have a dataset called mydata, containing the dependent variables y1, y2, y3 and the independent variables x1, x2, x3.
We can estimate the linear model of each y on the set of independent variables in parallel with the following code:

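myVariableList <- c("y1", "y2", "y3")

# %dopar% must follow foreach(...) on the same line; on a new line R reports a syntax error
results <- foreach(i = seq_along(myVariableList), .errorhandling = "stop", .inorder = TRUE) %dopar% {
  # Build the formula (e.g. y1 ~ x1+x2+x3) and fit the model
  model <- lm(as.formula(paste(myVariableList[i], "~x1+x2+x3")), data = mydata)
  model # the value of the last expression is what foreach collects
}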

%dopar% runs these estimations in parallel on different cores, and foreach returns a list of the estimated models, which is saved in the variable results.
We can now look at the characteristics of the estimated models by printing their summaries one after the other:

for (i in seq_along(results)) { print(summary(results[[i]])) }
Voilà!
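To try this without a dataset of your own, here is a self-contained sketch. The simulated mydata and its coefficients are made up for illustration and are not from my research. It again assumes a Unix-like system with doMC installed, and it compares the fits by R-squared instead of printing the full summaries:

```r
library(doMC)            # also attaches foreach
registerDoMC(cores = 2)  # illustrative worker count

# Simulated stand-in for mydata (hypothetical values, for illustration only)
set.seed(1)
n <- 200
mydata <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
mydata$y1 <- 1 + 2 * mydata$x1 + rnorm(n)
mydata$y2 <- -1 + 0.5 * mydata$x2 + rnorm(n)
mydata$y3 <- 3 * mydata$x3 + rnorm(n)

myVariableList <- c("y1", "y2", "y3")

# One model per dependent variable, fitted in parallel
results <- foreach(v = myVariableList) %dopar% {
  lm(as.formula(paste(v, "~ x1 + x2 + x3")), data = mydata)
}
names(results) <- myVariableList

# One number per model instead of three full summaries
r2 <- sapply(results, function(m) summary(m)$r.squared)
print(round(r2, 3))
```

Iterating directly over the variable names (v) rather than over an index is only a matter of taste; both forms return the models in the order of myVariableList.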

Of course, this is especially useful for more demanding computations that can take some time, such as stepwise regressions with many independent variables or regression trees on big datasets.


Murat Yildizoglu - Google+ Public Posts
