How to Speed Up R Packages Installation in Docker

How to reduce build time for a Docker container installing R libraries?

I rewrote the last bit of your packages.R as follows:

install.packages(package_ls, Ncpus=16)

This gave me roughly a 3.8x speed improvement over a run with Ncpus=1 (189 s vs 719 s).
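Rather than hard-coding 16, the worker count can be derived from the machine running the build. The sketch below is a minimal example that assumes package_ls is the character vector already defined in packages.R. `choose_ncpus()` is a hypothetical helper name, not part of any package. It relies on two documented behaviours. First, `parallel::detectCores()` may return NA. Second, `install.packages()` takes its default `Ncpus` from `getOption("Ncpus", 1L)`. Ncpus parallelises across packages, so it helps most when many source packages are installed in one call. Note that detectCores() may report the host's cores rather than any CPU limit placed on the container.

```r
# Hypothetical helper: number of parallel workers for install.packages().
# parallel::detectCores() can return NA on some platforms, so fall back to 1.
choose_ncpus <- function() {
  n <- parallel::detectCores()
  if (is.na(n) || n < 1L) 1L else as.integer(n)
}

# install.packages() uses getOption("Ncpus", 1L) as its default, so setting
# the option once applies to every later install.packages() call in this session.
options(Ncpus = choose_ncpus())

# With the option set, the call from packages.R no longer needs an explicit Ncpus:
#   install.packages(package_ls)
```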

Preventing repeated package installation, or pre-installing packages in R

I always use:

if (!require(package)) install.packages("package")

So if the package isn't available in the library, it will be installed.
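The same idea can be applied to a whole vector of packages, and that keeps everything in a single install.packages() call so that Ncpus can still parallelise it. The sketch below is minimal and makes some assumptions: `missing_packages()` and `install_missing()` are hypothetical helper names. Unlike require(), the check only looks at the installed libraries and does not attach anything. Extra arguments such as Ncpus or repos are passed straight through to install.packages().

```r
# Hypothetical helper: packages from `pkgs` not installed in any library on .libPaths().
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only what is missing, in one call, forwarding
# any extra arguments (e.g. Ncpus = 4, repos = ...) to install.packages().
install_missing <- function(pkgs, ...) {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0L) install.packages(todo, ...)
  invisible(todo)
}
```

For example, `install_missing(c("tidyverse", "stringr"), Ncpus = 4)` would skip anything already present and install the rest together.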

Dockerfile for R 3.4.4: how to reduce docker build time (it takes at least 30 min)

Here are some ideas that might help you:

The fastest installation would probably use Ubuntu 18.04 as the base together with the c2d4u PPA. That way you could install most (all?) R packages as binaries instead of building them from source. IIRC there is even a Docker image from the rocker project that implements that idea. Another possibility would be rocker/r-ver:3.4.4, which is based on Debian.

If it has to be CentOS, you should enable EPEL. That way R 3.4.4 might be directly available. Even if it is not, all the build dependencies for R are available as binaries, including a split between development and run-time packages. If you build R from source, you should use a single RUN block that:

  • installs runtime and development dependencies
  • downloads, configures and installs R
  • deletes R sources and development packages

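For package installation I would use a single install.packages call with a suitable value for its Ncpus argument to enable parallel installation. Note that the final rm -rf /tmp/* should be part of the same RUN statement; otherwise the temporary files are already stored in that layer and deleting them later does not shrink the image.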

I would use two RUN statements, one for installing R and one for installing the packages. This should not increase the image size significantly. But if you only change the list of packages, Docker can reuse the cached R layer, so you do not have to reinstall R.

Building a Docker Image a little more quickly

First, I'd suggest building a base image containing all of the tools and packages that you think you'll need. There's no need to be picky, because you only need to do this once. That's kind of the whole point of Docker -- portability and reuse.

FROM ubuntu:bionic

RUN apt-get update && apt-get install -y libxml2-dev libcurl4-openssl-dev libssl-dev r-base

RUN Rscript -e "install.packages('tidyverse', repos = 'https://cloud.r-project.org')"
RUN Rscript -e "install.packages('stringr', repos = 'https://cloud.r-project.org')"
...

Build that image and tag it as grader:1.0.0 or whatever.

Then, when it's time to grade, just mount the assignments and grading code using the -v, --volume option to docker run. You don't need to alter the container to make files accessible within it.

docker run \
--rm \
-it \
-v /path/to/assignments:/data/assignments \
-v /path/to/autograder:/data/autograder \
grader:1.0.0 \
/bin/bash

If at some point you need to add some packages, you can rebuild it by modifying the original Dockerfile or extend it by using it as the base of your next image:

FROM grader:1.0.0

RUN apt-get update && apt-get install -y the-package-i-forgot

Build it, tag it.

Install R packages using Dockerfile

Strangely enough, with exactly the same Dockerfile, I had no problem installing the packages using the rocker/tidyverse repository instead of rocker/rstudio. Does anyone know why that is?
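Whatever the reason for the difference between the two base images, install.packages() reports a failed installation only as a warning. Because of that, a RUN Rscript step can finish successfully even though packages are missing. The sketch below is a minimal check that turns this into a build failure, meant to run at the end of packages.R. `assert_installed()` is a hypothetical helper name.

```r
# Hypothetical helper: stop with an error if any of `pkgs` is not installed.
# An uncaught error makes Rscript exit with a non-zero status, which in turn
# makes the docker build fail instead of producing an incomplete image.
assert_installed <- function(pkgs) {
  absent <- setdiff(pkgs, rownames(installed.packages()))
  if (length(absent) > 0L) {
    stop("packages failed to install: ", paste(absent, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# At the end of packages.R, after install.packages(package_ls):
#   assert_installed(package_ls)
```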


