Monday, January 16, 2012

Using multiple processor cores in R-Project

My laptop has a Core i7 processor with 4 cores that can run 8 threads in parallel. With the default instructions, my R-Project computations use only a single core. It seemed a pity that the other cores sit (more or less) idle instead of speeding up my computations, even though I do not yet run really heavy ones in my research. So I started looking for an easy way to use all the cores in R-Project. And, indeed, there is an easy solution to this problem: the doMC library, together with the foreach function and the %dopar% operator.
For example, to estimate linear models with different dependent variables and a given set of exogenous variables, first load and register the parallel backend:
library(doMC) # There are other parallel computing libraries
registerDoMC() # You must register one of them for foreach
getDoParWorkers() # Shows how many workers (cores) registerDoMC() has registered
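registerDoMC() also accepts a cores argument, so the number of workers can be chosen explicitly instead of being left to the default. A minimal sketch, assuming a Unix-like system (doMC relies on forking, which is not available on Windows). The names n_cores and n_workers, and the choice to leave one core free, are illustrative assumptions:

```r
library(doMC)  # attaches foreach as well

# Number of logical cores reported by the operating system
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1  # detectCores() can return NA on some platforms

# Assumption: keep one core free for the rest of the system
n_workers <- max(1, n_cores - 1)
registerDoMC(cores = n_workers)

getDoParWorkers()  # reports the number of registered workers
```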

Suppose that you have a dataset called mydata, containing the dependent variables y1, y2, y3 and the independent variables x1, x2, x3.
We can estimate the linear model of each y on the set of independent variables in parallel with the following code:
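myVariableList <- c("y1", "y2", "y3")
# %dopar% must follow foreach(...) on the same line; otherwise R treats foreach(...) as a complete expression
results <- foreach(i = seq_along(myVariableList), .errorhandling = "stop", .inorder = TRUE) %dopar% {
  # Build the formula "y ~x1+x2+x3" for the i-th dependent variable and fit the model
  model <- lm(as.formula(paste(myVariableList[i], "~x1+x2+x3")), data = mydata)
  model  # the value of the block is what foreach collects
}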
%dopar% executes these estimations on different cores, in parallel, and the list of estimated models is saved in the variable results.
We can now look at the characteristics of the estimated models by printing their summaries one after another in the R console:
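To try this without a dataset at hand, here is a self-contained sketch with simulated data. The values in mydata, the sample size n and the choice of 2 workers are made up for illustration. The loop iterates over the variable names directly rather than over an index, which is only a stylistic variation:

```r
library(doMC)
registerDoMC(cores = 2)  # assumption: two workers are enough for a toy example

# Simulated stand-in for mydata (hypothetical values, for illustration only)
set.seed(1)
n <- 200
mydata <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
mydata$y1 <- 1 + 2 * mydata$x1 + rnorm(n)
mydata$y2 <- -1 + 0.5 * mydata$x2 + rnorm(n)
mydata$y3 <- 3 * mydata$x3 + rnorm(n)

myVariableList <- c("y1", "y2", "y3")

# One lm() fit per dependent variable, each evaluated by a worker process
results <- foreach(v = myVariableList, .errorhandling = "stop", .inorder = TRUE) %dopar% {
  lm(as.formula(paste(v, "~ x1 + x2 + x3")), data = mydata)
}
names(results) <- myVariableList

# Coefficient matrix: one column per dependent variable
sapply(results, coef)
```

Because .inorder = TRUE, the fitted models come back in the same order as myVariableList, so naming the list afterwards is safe.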

for (i in 1:length(results)) { print(summary(results[[i]])) }
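When there are many models, full summaries get long. A compact alternative is to collect one fit statistic per model into a small table. The sketch below uses a stand-in for results built sequentially from the built-in mtcars data (an assumption made so the example runs on its own; any list of lm objects should behave the same way), and fit_table is a hypothetical name:

```r
# Stand-in for `results`: three lm fits on the built-in mtcars data
dep_vars <- c("mpg", "hp", "qsec")
results <- lapply(dep_vars, function(v) {
  lm(as.formula(paste(v, "~ wt + disp + drat")), data = mtcars)
})
names(results) <- dep_vars

# One row per model: R-squared and adjusted R-squared
fit_table <- data.frame(
  r.squared     = sapply(results, function(m) summary(m)$r.squared),
  adj.r.squared = sapply(results, function(m) summary(m)$adj.r.squared)
)
print(round(fit_table, 3))
```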
Voilà!

Of course, this is especially useful for more complex computations that can take some time, such as stepwise regressions with many independent variables or regression trees on big datasets.
