How to reduce build time for a Docker container installing R libraries?
I rewrote the last bit of your packages.R as follows:
install.packages(package_ls, Ncpus=16)
This gave me a speedup of roughly 3.8x over a run with Ncpus=1 (189 s vs. 719 s).
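In a Dockerfile the same idea might look like the sketch below. The base image is borrowed from another answer on this page; the NCPUS build argument, the package names and the CRAN mirror are placeholders made up for illustration, not taken from the original packages.R.

```dockerfile
# Sketch only: base image, package list and mirror are assumptions.
FROM rocker/r-ver:3.4.4

# Hypothetical build argument so the core count can be set per build machine.
ARG NCPUS=4

# One install.packages() call; Ncpus lets R build independent packages in parallel.
RUN Rscript -e "install.packages(c('stringr', 'jsonlite'), repos = 'https://cloud.r-project.org', Ncpus = ${NCPUS})"
```

Building with docker build --build-arg NCPUS=16 . would then correspond to the 16-core run above.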
Preventing repeated package installation, or pre installing packages in R
I always use:
if (!require(package)) install.packages("package")
So if the package isn't available in the library, it will be installed.
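When a Dockerfile extends an image that already ships some packages, the same check can be applied to a whole vector so that only the missing ones are installed. A minimal sketch, assuming rocker/tidyverse as the base image and a made-up package list (stringr should already be present there as part of the tidyverse, so it would be skipped):

```dockerfile
# Sketch only: base image, package list and mirror are assumptions.
FROM rocker/tidyverse

# Compare the wanted packages against the installed ones and install only the difference.
RUN Rscript -e "pkgs <- c('stringr', 'data.table'); todo <- setdiff(pkgs, rownames(installed.packages())); if (length(todo) > 0) install.packages(todo, repos = 'https://cloud.r-project.org')"
```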
Dockerfile for R-3.4.4, How to reduce the time of docker build, it takes at least 30 min
Here are some ideas that might help you:
The fastest installation would probably be Ubuntu 18.04 as the base image together with the c2d4u PPA. That way you could install most (all?) R packages as binaries instead of installing them from source. IIRC there is even a Docker image from the rocker project that implements that idea. Another possibility would be rocker/r-ver:3.4.4, which is based on Debian.
If it has to be CentOS, you should enable EPEL. That way R 3.4.4 might be directly available. Even if it is not, all the build dependencies for R are available as binaries, split into development and run-time packages. If you build R from source, you should use one RUN block which
- installs runtime and development dependencies
- downloads, configures and installs R
- deletes R sources and development packages
For package installation I would use a single install.packages call with the Ncpus option set to a suitable value to enable parallel processing. Note that the final rm -rf /tmp/* should be part of the same RUN statement.
I would use two RUN statements, one for installing R and one for installing the packages. This should not increase the image size significantly. And if you only change the list of packages, you do not have to reinstall R.
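The two-statement layout could be sketched roughly like this. It assumes the Ubuntu 18.04 route with R installed from the distribution's r-base package rather than built from source; the package names, the mirror and the Ncpus value are placeholders.

```dockerfile
# Sketch only: base image, package list, mirror and Ncpus value are assumptions.
FROM ubuntu:bionic

# Layer 1: R itself. Docker reuses this layer from cache while the line is unchanged.
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y r-base \
 && rm -rf /var/lib/apt/lists/*

# Layer 2: R packages. Only this layer is rebuilt when the package list changes.
# The cleanup runs in the same RUN so the temporary files never end up in a layer.
RUN Rscript -e "install.packages(c('stringr', 'jsonlite'), repos = 'https://cloud.r-project.org', Ncpus = 4)" \
 && rm -rf /tmp/*
```

For a source build, the first RUN would instead install the dependencies, build R, and remove the sources and development packages, all within that one statement.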
Building a Docker Image a little more quickly
First, I'd suggest building a base image containing all of the tools and packages that you think you'll need. There's no need to be picky, because you only need to do this once. That's kind of the whole point of Docker -- portability and reuse.
FROM ubuntu:bionic
RUN apt-get update && apt-get install -y libxml2-dev libcurl4-openssl-dev libssl-dev r-base
RUN Rscript -e "install.packages('tidyverse')"
RUN Rscript -e "install.packages('stringr')"
...
Build that image and tag it as grader:1.0.0 or whatever.
Then, when it's time to grade, just mount the assignments and grading code using the -v, --volume option to docker run. You don't need to alter the container to make files accessible within it.
docker run \
--rm \
-it \
-v /path/to/assignments:/data/assignments \
-v /path/to/autograder:/data/autograder \
grader:1.0.0 \
/bin/bash
If at some point you need to add some packages, you can rebuild it by modifying the original Dockerfile or extend it by using it as the base of your next image:
FROM grader:1.0.0
RUN apt-get update && apt-get install -y the-package-i-forgot
Build it, tag it.
Install R packages using Dockerfile
Strangely enough, with exactly the same Dockerfile, I had no problem installing the packages using the rocker/tidyverse repository instead of rocker/rstudio. Does anyone know why that is?