How to Tell Which Packages I am Not Using in My R Script

How can I tell which packages I am not using in my R script?

Update 2020-04-13

I've now updated the referenced function to use the abstract syntax tree (AST) instead of regular expressions as before. This is a much more robust way of approaching the problem, although it is still not completely ironclad. It is available from version 0.2.0 of funchir, now on CRAN.


I've just got around to writing a quick-and-dirty function to handle this which I call stale_package_check, and I've added it to my package (funchir).

For example, if we save the following script as test.R:

library(data.table)
library(iotools)
DT = data.table(a = 1:3)

Then running funchir::stale_package_check('test.R') from the directory containing that script gives:

Functions matched from package data.table: data.table

**No exported functions matched from iotools**
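
To illustrate the AST idea from the update, here is a rough sketch using only base R. It is not funchir's implementation: collect_calls() is a hypothetical helper, and the example script uses the built-in stats and tools packages (an assumption made so that it runs without installing anything).

```r
# Hypothetical helper: walk an expression's syntax tree and collect the
# names of every function that is called in it.
collect_calls <- function(expr) {
  if (!is.call(expr)) return(character(0))
  fn <- expr[[1]]
  found <- if (is.name(fn)) as.character(fn) else collect_calls(fn)
  for (i in seq_along(expr)[-1]) {
    # Skip empty arguments such as the second index in x[1, ]
    if (identical(expr[[i]], quote(expr = ))) next
    found <- c(found, collect_calls(expr[[i]]))
  }
  found
}

# Assumed example script; stats and tools stand in for real dependencies
script <- "library(stats)\nlibrary(tools)\nx <- rnorm(10)\nm <- median(x)"
exprs <- parse(text = script, keep.source = FALSE)
called <- unique(unlist(lapply(exprs, collect_calls)))

# For each package, keep the called names that the package exports
used <- lapply(
  c(stats = "stats", tools = "tools"),
  function(pkg) intersect(called, getNamespaceExports(pkg))
)

# Packages with no matched exports are candidates for removal
unused <- names(used)[lengths(used) == 0]
print(used)
print(unused)
```

Matching on exported names alone can misattribute a function when two packages export the same name, which is one reason this kind of check is a hint rather than proof.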

How can I see what packages are being used for in an R script, and which packages are currently not used?

I had been looking for a clear answer to this and, building on the useful function pointed out by @eh21 here, I put together this small approach. It takes three lines of code and can be replicated with little effort by anyone, including inexperienced programmers like me.

The idea is to place this snippet after the packages have been loaded and before the actual project code, which does not need to be run to get the desired information:

# Load packages ----

packageload <- c("ggplot2", "readxl")
lapply(packageload, library, character.only = TRUE)

# Find which packages the used functions belong to ----

used.functions <- NCmisc::list.functions.in.file(filename = "thisfile.R", alphabetic = FALSE) |> print()

# Find which loaded packages are not used ----

used.packages <- used.functions |> names() |> grep(pattern = "package:", value = TRUE) |> gsub(pattern = "package:", replacement = "") |> print()

unused.packages <- packageload[!(packageload %in% used.packages)] |> print()

# Actual project code (no need to be run) ----

ggplot(diamonds, aes(x = cut)) +
  geom_bar()

The relevant outputs are:

> used.packages
[1] "base" "ggplot2"

> used.functions
$`character(0)`
[1] "list.functions.in.file"

$`package:base`
[1] "c" "lapply" "print" "names" "grep" "gsub"

$`package:ggplot2`
[1] "ggplot" "aes" "geom_bar"

> unused.packages
[1] "readxl"

Notes:

  • This requires install.packages("NCmisc"). I did not load that package (and used :: instead) so that it does not appear among the used.packages.
  • If you use RStudio and want to apply this to multiple scripts, rstudioapi::getSourceEditorContext()$path can replace "thisfile.R" in NCmisc::list.functions.in.file.
  • The approach above works when lapply() is used on a named object to load packages. If packages are instead loaded without a named object (e.g. with a series of library() or require() calls), the # Load packages ---- section of the code above can be modified as follows:
# Load packages ----

packageload <- search()

library(ggplot2)
library(readxl)

packageload <- search()[!(search() %in% packageload)] |> grep(pattern = "package:", value = TRUE) |> gsub(pattern = "package:", replacement = "")

Determine which packages are used

An answer based on ideas in the question comments. The key functions are getParseData() and packageName().

# create an R file that uses a few functions

fileConn <- file("test.R")
writeLines(c("df <- data.frame(v1=c(1, 1, 1), v2=c(1, 2, 3))",
             "\n",
             "m <- mean(df$v2)",
             "\n",
             "describe(df) #psych package"),
           fileConn)
close(fileConn)

# getParseData approach
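# keep.source = TRUE is needed so that parse data is recorded outside interactive sessions
pkg <- getParseData(parse("test.R", keep.source = TRUE))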
pkg <- pkg[pkg$token=="SYMBOL_FUNCTION_CALL",]
pkg <- pkg[!duplicated(pkg$text),]
pkgname <- pkg$text
pkgname
# [1] "data.frame" "c" "mean" "describe"

# the packages the script may use (here psych) must be loaded first
for (fn in pkgname) {
  # try() lets the loop continue past primitives such as c(),
  # which have no environment
  try(print(packageName(environment(get(fn)))))
}

#[1] "base"
#Error in packageName(environment(get(fn))) :
# 'env' must be an environment
#[1] "base"
#[1] "psych"
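
The error in the output above comes from c(), which is a primitive and therefore has no environment. One possible way around it, sketched here with a hypothetical helper named function_package(), is to treat primitives as belonging to base and to return NA for names that cannot be resolved. This is an illustration of the same packageName() idea, not a drop-in replacement.

```r
# Hypothetical helper: report the package that defines a function, or NA
# when the name cannot be resolved to a packaged function.
function_package <- function(fn_name) {
  if (!exists(fn_name, mode = "function")) return(NA_character_)
  fn <- get(fn_name, mode = "function")
  # Primitives such as c() have no environment, but they all live in base
  if (is.primitive(fn)) return("base")
  pkg <- utils::packageName(environment(fn))
  # Functions defined outside any package (e.g. in the global env) give NULL
  if (is.null(pkg)) NA_character_ else pkg
}

# The last name is deliberately made up to show the NA case
fn_names <- c("data.frame", "c", "mean", "no_such_function_xyz")
pkgs <- vapply(fn_names, function_package, character(1))
print(pkgs)
```

As with the loop above, a function is only found if its package has already been loaded.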

I'll mark this as correct for now, but happy to consider other solutions.

How to find out which package version is loaded in R?

You can use sessionInfo() to accomplish that.

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] graphics grDevices utils datasets stats grid methods base

other attached packages:
[1] ggplot2_0.9.0 reshape2_1.2.1 plyr_1.7.1

loaded via a namespace (and not attached):
[1] colorspace_1.1-1 dichromat_1.2-4 digest_0.5.2 MASS_7.3-18 memoise_0.1 munsell_0.3
[7] proto_0.3-9.2 RColorBrewer_1.0-5 scales_0.2.0 stringr_0.6
>

However, as per the comments and the answer below, there are better options when you only care about a single package:

> packageVersion("snow")

[1] ‘0.3.9’

Or, to check whether a package's namespace is loaded at all:

"Rmpi" %in% loadedNamespaces()

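The two ideas above can be combined to list the version of every attached package in a compact form. This is only a sketch using base R; the variable names attached and versions are made up for the example.

```r
# Names of everything currently attached via library()/require()
attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))

# Look up the version of each attached package as a character string
versions <- vapply(
  attached,
  function(pkg) as.character(utils::packageVersion(pkg)),
  character(1)
)
print(versions)
```

The result is a named character vector, so versions[["ggplot2"]] would give the version of ggplot2 if it is attached.
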
Elegant way to check for missing packages and install them?

Yes. If you have your list of packages, compare it to the output from installed.packages()[,"Package"] and install the missing packages. Something like this:

list.of.packages <- c("ggplot2", "Rcpp")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)

Otherwise:

If you put your code in a package and make them dependencies, then they will automatically be installed when you install your package.
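
A variation on the installed.packages() check above is to ask R whether each package can actually be loaded, using requireNamespace(). The sketch below is one way to do that; missing_packages() and install_if_missing() are hypothetical helper names, and note that requireNamespace() loads (but does not attach) the namespaces it finds.

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Hypothetical wrapper: install only what is missing
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)
}

# stats and utils ship with R, so nothing should be reported as missing
print(missing_packages(c("stats", "utils")))
```

With the list from the answer, the call would be install_if_missing(c("ggplot2", "Rcpp")).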

How to stop an R package from executing an .R script?

If you look at the Writing R Extensions manual for packages, it offers three basic steps: R CMD build to create a tarball, R CMD INSTALL to install it (not your goal here) and R CMD check to check it during development. They all offer numerous switches to tweak the behaviour. Use those; for example, I often run R CMD check --no-manual --no-vignettes to skip the PDF / LaTeX part.

And R CMD check has the very --no-examples flag you are looking for. I am not an active user of devtools but I would suspect it also offers you a pass-through of those options. And, worst case, if it doesn't, just use the standard tools. (In RStudio you will find a toggle, and you can set options to the R CMD ... calls as you would on the command-line.)

(In the narrow sense of stopping examples, I keep forgetting what is current but you can try all of \dontrun{}, \donttest{}, ... as well as explicit conditioning on an environment variable you set. All of that will be visible in the code and may not be what you want to show in your documentation though.)
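
As a rough illustration of conditioning on an environment variable, the sketch below only runs the expensive part when a variable is set. RUN_SLOW_EXAMPLES is a made-up name chosen for this example, not something R or R CMD check defines.

```r
# RUN_SLOW_EXAMPLES is a made-up variable name used only for illustration
run_slow_examples <- function() {
  identical(Sys.getenv("RUN_SLOW_EXAMPLES"), "true")
}

if (run_slow_examples()) {
  # expensive example code would go here
  message("Running slow examples")
} else {
  message("Skipping slow examples")
}
```

Setting the variable before starting R, or with Sys.setenv(RUN_SLOW_EXAMPLES = "true"), turns the slow branch on.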


