Limiting Memory Usage in R Under Linux

The unix package provides unix::rlimit_as(), which sets a memory limit for a running R process using the same mechanism as ulimit in the shell. Windows and macOS are not supported.

In my .Rprofile I have

unix::rlimit_as(1e12, 1e12)

Note that the arguments are in bytes, so 1e12 caps the address space at roughly 1 TB; to limit memory usage to ~12 GB, pass about 12e9 instead.
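
A quick sanity check (a sketch, assuming the package's rlimit_as(cur, max) signature, where a call with no arguments merely reports the current soft and hard limits in bytes):

unix::rlimit_as()            # report the current soft/hard address-space limits
unix::rlimit_as(cur = 12e9)  # lower the soft limit to ~12 GB for this session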

Before that...

I had created a small R package, ulimit, with similar functionality.

Install it from GitHub using

devtools::install_github("krlmlr/ulimit")

To limit the memory available to R to 2000 MiB, call:

ulimit::memory_limit(2000)

Now:

> rep(0L, 1e9)
Error: cannot allocate vector of size 3.7 Gb

Alternative to R's `memory.size()` in Linux?

Using the pryr package:

library("pryr")

mem_used()
# 27.9 MB

x <- mem_used()
x
# 27.9 MB
class(x)
# [1] "bytes"

The result is the same as in @RHertel's answer (shown below), but with pryr we can assign it to a variable.
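
pryr also provides mem_change(), which reports how memory usage shifts while an expression is evaluated; a small sketch:

library("pryr")

mem_change(y <- numeric(1e7))  # allocating 1e7 doubles grows memory by roughly 80 MB
mem_change(rm(y))              # removing the vector releases roughly the same amount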

system('grep MemTotal /proc/meminfo')
# MemTotal: 263844272 kB

To assign the output of the system call to a variable, use intern = TRUE:

x <- system('grep MemTotal /proc/meminfo', intern = TRUE)
x
# [1] "MemTotal: 263844272 kB"
class(x)
# [1] "character"
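
As a small follow-up (not part of the original answer), the captured string can be parsed into a plain number, e.g. total memory in GB:

meminfo  <- system('grep MemTotal /proc/meminfo', intern = TRUE)
total_kb <- as.numeric(gsub("[^0-9]", "", meminfo))  # keep only the digits
total_kb / 1024^2                                    # kB -> GB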

Out of memory on R using Linux but not on Windows

With the help of a member from another forum (https://community.rstudio.com/t/out-of-memory-on-r-using-linux-but-not-on-windows/106549), I found the solution. The crash was a result of the memory limitation in the swap partition, as speculated earlier. I increased my swap from 2 GB to 16 GB, and now R/RStudio is able to complete the whole script. It is quite a demanding task, since all of my physical memory is exhausted and nearly 15 GB of the swap is used.

Increasing (or decreasing) the memory available to R processes

From:

http://gking.harvard.edu/zelig/docs/How_do_I2.html (mirror)

Windows users may get the error that R has run out of memory.

If you have R already installed and subsequently install more RAM, you may have to reinstall R in order to take advantage of the additional capacity.

You may also set the amount of available memory manually. Close R, then right-click on your R program icon (the icon on your desktop or in your programs directory). Select "Properties", and then select the "Shortcut" tab. Look for the "Target" field and, after the closing quotes around the location of the R executable, add

--max-mem-size=500M

You may increase this value up to 2 GB or the maximum amount of physical RAM you have installed.

If you get the error that R cannot allocate a vector of length x, close out of R and add the following line to the "Target" field:

--max-vsize=500M

or as appropriate. You can always check how much memory R has available by typing at the R prompt

memory.limit()

which gives you the amount of available memory in MB. In previous versions of R you needed to use round(memory.limit()/2^20, 2).
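
For reference, the corresponding calls from within R would be the following (Windows-only; note that memory.limit() and memory.size() were reduced to inert stubs in R 4.2, so this only applies to older versions):

memory.limit()             # current limit in MB
memory.limit(size = 4000)  # request a 4000 MB limit for this session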

Tricks to manage the available memory in an R session

To further illustrate the common strategy of frequent restarts, we can use littler, which allows us to run simple expressions directly from the command line. Here is an example I sometimes use to time different BLAS libraries for a simple crossprod.

 r -e'N<-3*10^3; M<-matrix(rnorm(N*N),ncol=N); print(system.time(crossprod(M)))'

Likewise,

 r -lMatrix -e'example(spMatrix)'

loads the Matrix package (via the --packages | -l switch) and runs the examples of the spMatrix function. As r always starts 'fresh', this method is also a good test during package development.

Last but not least, r also works great for automated batch mode in scripts using the '#!/usr/bin/r' shebang header; a sketch of such a script follows below. Rscript is an alternative where littler is unavailable (e.g. on Windows).
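
A minimal sketch of such a script (a hypothetical file; assumes littler is installed as /usr/bin/r). Make it executable with chmod +x and run it directly:

#!/usr/bin/r
# every invocation starts a fresh R session, so measurements are not
# polluted by objects left over from earlier work
x <- rnorm(1e6)
print(object.size(x), units = "Mb")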

Limit memory usage for a single Linux process

There are some problems with ulimit. Here's a useful read on the topic: Limiting time and memory consumption of a program in Linux, which led to the timeout tool, which lets you cage a process (and its forks) by time or memory consumption.

The timeout tool requires Perl 5+ and a mounted /proc filesystem. With those in place, copy the tool to e.g. /usr/local/bin like so:

curl https://raw.githubusercontent.com/pshved/timeout/master/timeout | \
sudo tee /usr/local/bin/timeout && sudo chmod 755 /usr/local/bin/timeout

After that, you can 'cage' your process by memory consumption as in your question like so:

timeout -m 500 pdftoppm Sample.pdf

Alternatively, you could use -t <seconds> and -x <hertz> to limit the process by time or CPU constraints, respectively.

The way this tool works is by checking multiple times per second whether the spawned process has oversubscribed its set boundaries. This means there is a small window in which a process could oversubscribe before timeout notices and kills it.

A more correct approach would hence likely involve cgroups, but that is much more involved to set up, even if you'd use Docker or runC, which, among other things, offer a more user-friendly abstraction around cgroups.
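
For what it's worth, on a systemd-based distribution with cgroup v2 a memory cap can be applied to a single command as a transient scope; a hedged sketch (property names and required privileges vary with the systemd version):

systemd-run --user --scope -p MemoryMax=500M pdftoppm Sample.pdf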


