Small ggplot object (1 MB) turns into 7 GB .Rdata object when saved
I stumbled upon this problem as well. It is indeed related to the environment that the plot object carries with it. If you want to save your plots in an .Rdata file, create a new, minimal environment inside the function that generates the plot, so that the function's complete environment doesn't get saved along with the plot. Example:
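    library(ggplot2)

    makePlot <- function(plot.data){
      # Fresh environment whose parent is the global environment; it holds
      # only the data the plot needs, not makePlot()'s other local objects.
      env <- new.env(parent = globalenv())
      env$subset <- plot.data
      # Build the plot inside env so that ggplot() records env as plot_env.
      # "..." stands for the rest of your ggplot() call (aesthetics, layers).
      my.plot <- with(env, ggplot(subset, ...))
      return(my.plot)
    }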
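The same idea can be checked without ggplot2. The sketch below uses a base-R formula as a stand-in for a ggplot object, because a formula also records the environment it was created in. The function names build_fat()/build_slim() and the 1e6-element scratch vector are hypothetical; sizes come from serialize(), which is what save() and saveRDS() write before compression.

```r
# Hypothetical stand-in for a plot built inside a function that also holds a
# large temporary object: the formula captures the whole function frame.
build_fat <- function(plot.data) {
  scratch <- rnorm(1e6)   # large temporary (~8 MB of doubles)
  y ~ x                   # environment of the result is this function's frame
}

# Same function using the new.env(parent = globalenv()) trick from makePlot():
# the formula's environment holds only the data it needs.
build_slim <- function(plot.data) {
  scratch <- rnorm(1e6)
  env <- new.env(parent = globalenv())
  env$subset <- plot.data
  f <- y ~ x
  environment(f) <- env   # detach from the frame that holds `scratch`
  f
}

small_data <- data.frame(x = 1:10, y = (1:10)^2)
fat_bytes  <- length(serialize(build_fat(small_data), NULL))
slim_bytes <- length(serialize(build_slim(small_data), NULL))
cat("fat:", fat_bytes, "bytes  slim:", slim_bytes, "bytes\n")
```

The fat version drags the 8 MB scratch vector into the serialized output even though the returned object never uses it.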
RDS file size difference between ggplot2 objects created inside vs. outside a function
This one is tricky. My initial advice was to use pryr::object_size(), which is more thorough about including the size of objects stored in the environment of an object, but that shows only a tiny difference between the two ggplot objects.
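One reason in-memory size estimates can mislead here: the documentation of base R's object.size() states that associated space such as the environment of a function is not included, whereas serialize() (what saveRDS() writes, before compression) follows the environment and writes its contents. A minimal base-R sketch, with a formula standing in for a ggplot object and a hypothetical make_formula() helper:

```r
make_formula <- function() {
  big <- rnorm(1e6)   # ~8 MB that exists only in the function's frame
  y ~ x               # the formula records that frame as its environment
}

f <- make_formula()
in_memory  <- as.numeric(object.size(f))   # environment contents not counted
serialized <- length(serialize(f, NULL))   # environment written out in full
cat("object.size:", in_memory, "bytes; serialize:", serialized, "bytes\n")
```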
However, ggplot objects contain an environment, the $plot_env component, whose contents get stored along with the object.
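To see what a given plot_env would drag along, a small diagnostic can walk the environment and its parents and list every binding with its in-memory size. This is a hypothetical helper, not part of ggplot2; it stops at the global, base and empty environments and at namespaces, which serialize() writes as references rather than by value.

```r
# Hypothetical diagnostic helper: list bindings in an environment and in each
# enclosing environment, stopping at environments serialized by reference.
env_chain_sizes <- function(env) {
  rows <- list()
  depth <- 0
  while (!identical(env, globalenv()) && !identical(env, baseenv()) &&
         !identical(env, emptyenv()) && !isNamespace(env)) {
    for (nm in ls(env, all.names = TRUE)) {
      obj <- get(nm, envir = env, inherits = FALSE)
      rows[[length(rows) + 1]] <- data.frame(
        depth = depth, name = nm,
        bytes = as.numeric(object.size(obj)),  # in-memory size; nested envs not counted
        stringsAsFactors = FALSE
      )
    }
    env <- parent.env(env)
    depth <- depth + 1
  }
  do.call(rbind, rows)
}

# Usage: an environment shaped like p2$plot_env (a function frame holding p and y).
make_frame <- function() {
  y <- rnorm(1e5)
  p <- list(data = data.frame(y = y))
  environment()   # return the function's own evaluation frame
}
res <- env_chain_sizes(make_frame())
print(res)
```

For a real plot you would call env_chain_sizes(p2$plot_env).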
The environment p2$plot_env is the one corresponding to the inside of your function:
ls(p2$plot_env)
# [1] "p" "y"
while the environment p1$plot_env is the global environment, which contains a copy of the data as well as the other plot object ...
ls(p1$plot_env)
# [1] "data" "p1" "p2" "plot_fun"
But this still seems a bit mysterious to me. p1 (with more objects in its environment) produces the smaller file (7.4M), while p2 (with fewer objects) produces the larger file (22M), even though p1 naively seems to have more stuff stored:
sapply(p1$plot_env,object.size)
## plot_fun p1 p2 data
## 6568 8004632 8004632 8000728
sapply(p2$plot_env,object.size)
## p y
## 8004632 8000728
Is this some kind of recursive nightmare where environments reference other environments, which all have to get stored? As @Chris says:
p2's environment has a parent environment of the global environment, while p1's environment is the global environment... I imag[in]e what is happening is that, when R needs to serialize an environment that inherits from another env (i.e., a parent env), it saves the parent env along with the child. That would explain why saving p1 would result in a smaller file size as compared to p2.
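Chris's guess can be tested directly. In R's serialization format the global environment (like the base and empty environments and package namespaces) is written as a short reference rather than by value, while any other environment is written out in full, including its enclosing environments up to such a reference. That would account for p1 (plot_env is the global environment, so its contents are not saved) being smaller than p2 (plot_env is a function frame, saved in full). The sketch below uses hypothetical object names and base R only.

```r
# Put a large object in the global environment.
assign("global_big", rnorm(1e6), envir = globalenv())

# 1. The global environment itself: written as a reference, so its ~8 MB of
#    contents do not appear in the output.
bytes_global <- length(serialize(globalenv(), NULL))

# 2. A child environment whose parent is the global environment: only the
#    child's own (small) bindings are written.
child_of_global <- new.env(parent = globalenv())
child_of_global$tiny <- 1:10
bytes_child_global <- length(serialize(child_of_global, NULL))

# 3. A child environment whose parent is an ordinary environment holding
#    large data: the parent is written too, recursively.
fat_parent <- new.env(parent = globalenv())
fat_parent$big <- rnorm(1e6)
child_of_fat <- new.env(parent = fat_parent)
child_of_fat$tiny <- 1:10
bytes_child_fat <- length(serialize(child_of_fat, NULL))

cat(bytes_global, bytes_child_global, bytes_child_fat, "\n")
```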
If I replace the plotting environment of p2 with the global environment, the file size does get smaller ... and I think I didn't break the plotting object.
p2$plot_env <- p1$plot_env
saveRDS(p2, "plot2.rds")
system("ls -lht plot?.rds")
## -rw-r--r-- 1 bolker staff 7.4M 15 Jun 20:15 plot2.rds
## -rw-r--r-- 1 bolker staff 7.4M 15 Jun 20:14 plot1.rds
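The same comparison can be made without shelling out to ls, using saveRDS() and file.size() on temporary files. Since this sketch avoids depending on ggplot2, the "plot" here is a hypothetical stand-in: a plain list with a data component and a plot_env component that mimics p2 being built inside a function. It is not a real ggplot object.

```r
# Hypothetical stand-in for a ggplot object created inside a function.
make_plot_like <- function() {
  y <- rnorm(1e6)
  p <- list(data = data.frame(y = y), plot_env = environment())
  p
}

p_fat  <- make_plot_like()
p_slim <- p_fat
p_slim$plot_env <- globalenv()   # same swap as p2$plot_env <- p1$plot_env

f_fat  <- tempfile(fileext = ".rds")
f_slim <- tempfile(fileext = ".rds")
saveRDS(p_fat, f_fat)
saveRDS(p_slim, f_slim)
sizes <- c(fat = file.size(f_fat), slim = file.size(f_slim))
print(sizes)
unlink(c(f_fat, f_slim))
```

The fat file holds the data roughly three times (the data component, plus y and the copy of p stored in the frame), which mirrors the 22M vs. 7.4M pattern above.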
If your workflow allows it, you might consider storing rendered versions of these plots (as PDF/SVG/whatever) rather than the plot objects themselves ... although the plot objects are certainly more flexible.
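A minimal sketch of storing a rendered copy instead of the object, using only the base pdf() device. The helper name save_rendered() is hypothetical; for a ggplot object p you would pass function() print(p), or use ggplot2::ggsave() directly.

```r
# Hypothetical helper: render a plot to a PDF file and keep only the file.
save_rendered <- function(draw, path, width = 7, height = 5) {
  grDevices::pdf(path, width = width, height = height)
  on.exit(grDevices::dev.off())   # always close the device, even on error
  draw()
  invisible(path)
}

out <- tempfile(fileext = ".pdf")
save_rendered(function() plot(rnorm(1000), main = "rendered copy"), out)
```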