Shared Memory in Parallel Foreach in R

I think the solution to this problem can be found in a post by Steve Weston, the author of the foreach package, here. There he states:

The doParallel package will auto-export variables to the workers that are referenced in the foreach loop.

So I think the problem is that in your code the big matrix c is referenced in the assignment c <- m[1,1], which makes foreach export it to the workers. Just try xyz <- m[1,1] instead and see what happens.

Here is an example with a file-backed big.matrix:

library(bigmemory)
library(parallel)
library(foreach)
library(doSNOW)

# create a matrix of approximately 1 GB
n <- 10000
m <- 10000
c <- matrix(runif(n*m), n, m)
# convert it to a file-backed big.matrix
x <- as.big.matrix(x = c, type = "double",
                   separated = FALSE,
                   backingfile = "example.bin",
                   descriptorfile = "example.desc")
# get a description of the matrix
mdesc <- describe(x)
# create the required connections
cl <- makeCluster(detectCores())
registerDoSNOW(cl)

## 1) No referencing
out <- foreach(linID = 1:4, .combine=c) %dopar% {
  t <- attach.big.matrix("example.desc")
  for (i in seq_len(30L)) {
    for (j in seq_len(m)) {
      y <- t[i, j]
    }
  }
  return(0L)
}

[Screenshot of worker memory usage omitted]

## 2) Referencing
out <- foreach(linID = 1:4, .combine=c) %dopar% {
  invisible(c) ## c is referenced and thus exported to workers
  t <- attach.big.matrix("example.desc")
  for (i in seq_len(30L)) {
    for (j in seq_len(m)) {
      y <- t[i, j]
    }
  }
  return(0L)
}
closeAllConnections()

[Screenshot of worker memory usage omitted]

R converting code to run in shared memory

With foreach, you can do

ncores <- parallel::detectCores(logical = FALSE)
cl <- parallel::makeCluster(ncores)
doParallel::registerDoParallel(cl)
library(foreach)
wss_values2 <- foreach(k = k.values, .combine = 'c') %dopar% {
  kmeans(df, k, nstart = 10)$tot.withinss
}
parallel::stopCluster(cl)

If you wrap the kmeans call in a function, you need to pass all the variables it uses as arguments (df and k); variables the function picks up from your global environment will not be available on the workers.
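
For illustration, here is a hedged sketch of that pattern; compute_wss is a hypothetical helper and df is stand-in data, not code from the original question:

library(foreach)
library(doParallel)

cl <- parallel::makeCluster(parallel::detectCores(logical = FALSE))
registerDoParallel(cl)

# hypothetical helper: everything it needs is passed in as an argument
compute_wss <- function(df, k) {
  kmeans(df, k, nstart = 10)$tot.withinss
}

df <- scale(iris[, 1:4])   # stand-in for the question's df
k.values <- 1:10

wss_values2 <- foreach(k = k.values, .combine = 'c') %dopar% {
  compute_wss(df, k)       # df and k are passed explicitly to the helper
}

parallel::stopCluster(cl)

Because df still appears in the loop body, doParallel auto-exports it, and the helper function is exported the same way.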

R parallel shared memory object (windows)

Since R isn't multithreaded, parallel workers are implemented as separate processes by the various parallel programming packages. One of the features of processes is that their memory is protected from other processes, so programs have to use special mechanisms, such as memory-mapped files, to share memory between processes. Since R doesn't have direct, built-in support for any such mechanism, packages such as "bigmemory" have been written that let you create objects that can be shared between different processes. Unfortunately, the "data.table" package doesn't support such a mechanism, so I don't think there is a way to do what you want.

Note that memory can be shared read-only between a process and a forked child process on POSIX operating systems (such as Mac OS X and Linux), so you could sort of do what you want using the "doMC" backend, but that doesn't work on Windows, of course.
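
As a rough sketch of the fork-based approach on a POSIX system (the data and grouping here are made up for illustration), workers read an object created before the fork without an upfront copy:

library(data.table)
library(foreach)
library(doMC)        # fork-based backend; not available on Windows

registerDoMC(cores = 4)

DT <- data.table(grp = rep(1:4, each = 250000), x = rnorm(1e6))  # placeholder data

# each forked worker reads DT from the parent's memory (copy-on-write);
# any writes made inside a worker are NOT visible to the master or other workers
res <- foreach(g = 1:4, .combine = rbind) %dopar% {
  DT[grp == g, .(grp = g, mean_x = mean(x))]
}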

Parallelization using shared memory [bigmemory]

If the problem is concurrent access to the big.matrix descriptor file, you can simply pass the descriptor object (obtained with describe) rather than the name of the descriptor file that contains it.

Explanation:
When attaching from the descriptor file, attach.big.matrix first creates the big.matrix.descriptor object and then attaches the big.matrix from that object. So, if you pass the descriptor object directly, it is copied to all your workers and each of them can attach the big.matrix from it.
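
A minimal sketch of that pattern, assuming a file-backed big.matrix x already exists as in the example above (the loop body is only illustrative):

library(bigmemory)
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(4)
registerDoParallel(cl)

desc <- describe(x)   # descriptor object, created once on the master

col_sums <- foreach(j = 1:ncol(x), .combine = 'c',
                    .packages = 'bigmemory') %dopar% {
  bm <- attach.big.matrix(desc)   # attach from the exported descriptor object,
  sum(bm[, j])                    # not from the .desc file on disk
}

parallel::stopCluster(cl)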

R: nested foreach performs worse than non-parallel method?

  • OK, I managed to parallelize the outer loop myself.
  • The first step is what anyone would intuitively try: use the
    foreach() %dopar% { foreach() %do% {} } structure.
  • This will not work unless you also take the most important step: add .packages = c("doSNOW") to the
    outer loop's argument list, otherwise you get a "doSNOW function not found"
    sort of error.
  • Then you are free to enjoy the efficiency of real parallelism, with each core handling one line of
    work instead of all cores waiting for every batch of 100 combined results and for each inner loop
    to finish. A sketch of this structure follows the list.
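
As a hedged illustration of that nested structure (the grid sizes and do_work are placeholders, not the original computation):

library(foreach)
library(doSNOW)

cl <- makeCluster(parallel::detectCores())
registerDoSNOW(cl)

do_work <- function(i, j) i * j   # stand-in for the real inner computation

res <- foreach(i = 1:8, .combine = rbind,
               .packages = c("doSNOW")) %dopar% {  # outer loop runs in parallel
  # inner loop runs sequentially inside each worker
  foreach(j = 1:100, .combine = c) %do% {
    do_work(i, j)
  }
}

stopCluster(cl)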

for inside foreach parallel not populating a dataframe in R

I am honestly not sure I have understood the problem correctly; could you also provide Elec1 in your example data?

An idea:
foreach might not find df. You could create the data frame at the beginning of your loop with something like

df <- data.frame('Elec1'=rep(NA,18),'Elec2'=rep(NA,18),'est'=rep(NA,18),'ste'=rep(NA,18))

and then, further down inside the for loop, maybe add: df[j,c('est','ste')] <- c(est,ste)
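
A hedged sketch of that idea (the loop bounds and the est/ste computations are placeholders for whatever the original code computes):

library(foreach)
library(doParallel)

cl <- parallel::makeCluster(2)
registerDoParallel(cl)

results <- foreach(i = 1:4, .combine = rbind) %dopar% {
  # build the data frame inside the loop body so each worker has its own copy
  df <- data.frame('Elec1' = rep(NA, 18), 'Elec2' = rep(NA, 18),
                   'est' = rep(NA, 18), 'ste' = rep(NA, 18))
  for (j in 1:18) {
    est <- rnorm(1)   # placeholder for the real estimate
    ste <- runif(1)   # placeholder for the real standard error
    df[j, c('est', 'ste')] <- c(est, ste)
  }
  df                  # return the filled data frame from the worker
}

parallel::stopCluster(cl)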

Memory issue with foreach loop in R on Windows 8 (64-bit) (doParallel package)

Iterators can help to reduce the amount of memory that needs to be passed to the workers of a parallel program. Since you're using the data.table package, it's a good idea to use iterators and combine functions that are optimized for data.table objects. For example, here is a function like isplit that works on data.table objects:

library(iterators)

# isplit-like iterator that returns one data.table subset per value of colname
isplitDT <- function(x, colname, vals) {
  colname <- as.name(colname)
  ival <- iter(vals)
  nextEl <- function() {
    val <- nextElem(ival)
    list(value=eval(bquote(x[.(colname) == .(val)])), key=val)
  }
  obj <- list(nextElem=nextEl)
  class(obj) <- c('abstractiter', 'iter')
  obj
}

Note that it isn't completely compatible with isplit, since the arguments and return value are slightly different. There may also be a better way to subset the data.table, but I think this is more efficient than using isplit.

Here is your example using isplitDT and a combine function based on rbindlist, which combines data.tables faster than rbind:

dtcomb <- function(...) {
  rbindlist(list(...))
}

results <-
  foreach(dt.sub=isplitDT(dt.all, 'grp', unique(dt.all$grp)),
          .combine='dtcomb', .multicombine=TRUE,
          .packages='data.table') %dopar% {
    f_lm(dt.sub$value, dt.sub$key)
  }

Update

I wrote a new iterator function called isplitDT2 which performs much better than isplitDT but requires that the data.table have a key:

isplitDT2 <- function(x, vals) {
  ival <- iter(vals)
  nextEl <- function() {
    val <- nextElem(ival)
    list(value=x[val], key=val)
  }
  obj <- list(nextElem=nextEl)
  class(obj) <- c('abstractiter', 'iter')
  obj
}

This is called as:

setkey(dt.all, grp)
results <-
  foreach(dt.sub=isplitDT2(dt.all, levels(dt.all$grp)),
          .combine='dtcomb', .multicombine=TRUE,
          .packages='data.table') %dopar% {
    f_lm(dt.sub$value, dt.sub$key)
  }

This uses a binary search to subset dt.all rather than a vector scan, and so is more efficient. I don't know why isplitDT would use more memory, however. Since you're using doParallel, which doesn't call the iterator on-the-fly as it sends out tasks, you might want to experiment with splitting dt.all and then removing it to reduce your memory usage:

dt.split <- as.list(isplitDT2(dt.all, levels(dt.all$grp)))
rm(dt.all)
gc()
results <-
  foreach(dt.sub=dt.split,
          .combine='dtcomb', .multicombine=TRUE,
          .packages='data.table') %dopar% {
    f_lm(dt.sub$value, dt.sub$key)
  }

This may help by reducing the amount of memory needed by the master process during the execution of the foreach loop, while still only sending the required data to the workers. If you still have memory problems, you could also try using doMPI or doRedis, both of which get iterator values as needed, rather than all at once, making them more memory efficient.
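
For completeness, a minimal, hedged sketch of switching the same loop to the doMPI backend (this assumes a working MPI installation and the Rmpi package; doRedis is registered analogously with registerDoRedis):

library(doMPI)

cl <- startMPIcluster()   # spawn MPI workers; requires Rmpi and an MPI runtime
registerDoMPI(cl)

results <-
  foreach(dt.sub=isplitDT2(dt.all, levels(dt.all$grp)),
          .combine='dtcomb', .multicombine=TRUE,
          .packages='data.table') %dopar% {
    f_lm(dt.sub$value, dt.sub$key)
  }

closeCluster(cl)
mpi.quit()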


