setting seed locally (not globally) in R
Something like this does it for me:
myfunction <- function () {
old <- .Random.seed
set.seed(2)
res <- runif(1)
.Random.seed <<- old
res
}
Or perhaps more elegantly:
myfunction <- function () {
old <- .Random.seed
on.exit( { .Random.seed <<- old } )
set.seed(2)
runif(1)
}
For example:
> myfunction()
[1] 0.1848823
> runif(1)
[1] 0.3472722
> myfunction()
[1] 0.1848823
> runif(1)
[1] 0.4887732
Isolate randomness of a local environment from the global R process
I thought it would be nice to have just an independent RNG inside your function, that is not affected by the global seed, but would have its own seed. Turns out, randtoolbox
offers this functionality:
library(randtoolbox)
replicate(3, {
set.seed(1)
c(runif(1), WELL(3), runif(1))
})
# [,1] [,2] [,3]
#[1,] 0.265508663 0.2655087 0.2655087
#[2,] 0.481195594 0.3999952 0.9474923
#[3,] 0.003865934 0.6596869 0.4684255
#[4,] 0.484556709 0.9923884 0.1153879
#[5,] 0.372123900 0.3721239 0.3721239
Top and bottom rows are affected by the seed, whereas middle ones are "truly random".
Based on that, here's the implementation of your function:
sample_WELL <- function(n, size=n) {
findInterval(WELL(size), 0:n/n)
}
createUniqueId_WELL <- function(bytes) {
paste(as.hexmode(sample_WELL(256, bytes) - 1), collapse = "")
}
length(unique(replicate(10000, createUniqueId_WELL(5))))
#[1] 10000
# independency on the seed:
set.seed(1)
x <- replicate(100, createGlobalUniqueId(5))
x_WELL <- replicate(100, createUniqueId_WELL(5))
set.seed(1)
y <- replicate(100, createGlobalUniqueId(5))
y_WELL <- replicate(100, createUniqueId_WELL(5))
sum(x==y)
#[1] 100
sum(x_WELL==y_WELL)
#[1] 0
Edit
To understand why we get duplicated keys, we should take a look what happens when we call set.seed(NULL)
. All RNG-related code is written in C, so we should go directly to svn.r-project.org/R/trunk/src/main/RNG.c and refer to the function do_setseed
. If seed = NULL
then clearly TimeToSeed
is called. There's a comment that states it should be located in datetime.c, however, it can be found in svn.r-project.org/R/trunk/src/main/times.c.
Navigating the R source can be difficult, so I'm pasting the function here:
/* For RNG.c, main.c, mkdtemp.c */
attribute_hidden
unsigned int TimeToSeed(void)
{
unsigned int seed, pid = getpid();
#if defined(HAVE_CLOCK_GETTIME) && defined(CLOCK_REALTIME)
{
struct timespec tp;
clock_gettime(CLOCK_REALTIME, &tp);
seed = (unsigned int)(((uint_least64_t) tp.tv_nsec << 16) ^ tp.tv_sec);
}
#elif defined(HAVE_GETTIMEOFDAY)
{
struct timeval tv;
gettimeofday (&tv, NULL);
seed = (unsigned int)(((uint_least64_t) tv.tv_usec << 16) ^ tv.tv_sec);
}
#else
/* C89, so must work */
seed = (Int32) time(NULL);
#endif
seed ^= (pid <<16);
return seed;
}
So each time we call set.seed(NULL)
, R does these steps:
- Takes current time in seconds and nanoseconds (if possible, platform dependency here in
#if defined
blocks) - Applies bit shift to nanoseconds and bit 'xor'es result with seconds
- Applies bit shift to pid and bit 'xor'es it with the previous result
- Sets the result as a new seed
Well, now it's almost obvious that we get duplicated values when the resulting seeds collide. My guess is this happens when two calls fall within 1 second, so that tv_sec is constant. To confirm that, I'm introducing a lag:
createUniqueIdWithLag <- function(bytes, lag) {
Sys.sleep(lag)
createUniqueId(bytes)
}
lags <- 1 / 10 ^ (1:5)
sapply(lags, function(x) length(unique(replicate(n, createUniqueIdWithLag(5, x)))))
[1] 1000 1000 996 992 990
What's confusing is that even the lag is substantial compared to nanoseconds, we still get collisions! Let's dig it further then, I wrote a "debugging emulator" for the seed:
emulate_seed <- function() {
tv <- as.numeric(system('echo $(($(date +%s%N)))', intern = TRUE))
pid <- Sys.getpid()
tv_nsec <- tv %% 1e9
tv_sec <- tv %/% 1e9
seed <- bitwXor(bitwShiftL(tv_nsec, 16), tv_sec)
seed <- bitwXor(bitwShiftL(pid, 16), seed)
c(seed, tv_nsec, tv_sec, pid)
}
z <- replicate(1000, emulate_seed())
sapply(1:4, function(i) length(unique(z[i, ])))
# unique seeds, nanosecs, secs, pids:
#[1] 941 1000 36 1
That is really confusing: nanoseconds are all unique, and that does not guarantee uniqueness of the final seed. To demonstrate that effect, here's one of the duplicates:
# [,1] [,2]
#[1,] -1654969360 -1654969360
#[2,] 135644672 962643456
#[3,] 1397894128 1397894128
#[4,] 2057 2057
bitwShiftL(135644672, 16)
#[1] -973078528
bitwShiftL(962643456, 16)
#[1] -973078528
The final note: the binary representation of these two numbers and the shift is
00001000000101011100011000000000 << 16 => 1100011000000000 + 16 zeroes
00111001011000001100011000000000 << 16 => 1100011000000000 + 16 zeroes
So yes, this is really an unwanted collision.
Well, after all that, the final conclusion is: set.seed(NULL)
is vulnerable to high load and does not guarantee the absence of collisions when dealing with multiple consecutive calls!
R set.seed() 's scope
set.seed
is indeed global. But note this from the example in ?set.seed
:
## If there is no seed, a "random" new one is created:
rm(.Random.seed); runif(1); .Random.seed[1:6]
This means you can call rm(.Random.seed, envir=.GlobalEnv)
either at the end of your function or after you call the function to decouple the rest of the program from the call to set.seed
in the function.
To see this in action, run the following code in two different R sessions. The outputs should be the same in both sessions. Then re-run the code again in two new R sessions with the rm
line uncommented. You'll see the output in the two new sessions now differ, indicating that the call to set.seed
in the function hasn't transferred the reproducibility to the main program.
subfun <- function() {
set.seed(100)
rnorm(1)
#rm(.Random.seed, envir=.GlobalEnv)
}
subfun()
#[1] -0.5022
rnorm(1)
# [1] 0.1315
How do I stop set.seed() after a specific line of code?
Simply use the current system time to "undo" the seed by introducing a new unique random seed:
set.seed(Sys.time())
If you need more precision, consider fetching the system timestamp by millisecond (use R's system(..., intern = TRUE)
function).
R package submission error concerning set.seed()
If you really, really, want to set the seed inside a function, which I believe you nor anyone should do, save the current seed, do whatever you want, and before exiting the function reset it to the saved value.
old_seed <- .Random.seed
rnorm(1)
#[1] -1.173346
set.seed(42)
rbinom(1, size = 1, prob = 0.5)
#[1] 0
.Random.seed <- old_seed
rnorm(1)
#[1] -1.173346
In a function it could be something like the following, without the message
instructions. Note that the function prints nothing, it never calls any pseudo-RNG and always outputs TRUE
. The point is to save the seed's current value and reset the seed in on.exit
.
f <- function(simul = FALSE){
if(simul){
message("simul is TRUE")
old_seed <- .Random.seed
on.exit(.Random.seed <- old_seed)
# rest of code
} else message("simul is FALSE")
invisible(TRUE)
}
f()
s <- .Random.seed
f(TRUE)
identical(s, .Random.seed)
#[1] TRUE
rm(s)
create knitr chunks that don't change the random seed?
If functions need persistent storage, you can use local()
to create an environment to hold it and it will not interfere with anything else. Here's a modification of your code that does this:
hook_fn <- local({
initial_seed <- NULL
function(before, options, envir) {
if (before){
initial_seed <<- if(exists(".Random.seed", envir = .GlobalEnv, inherits=FALSE))
.Random.seed
}else{
if(!is.null(initial_seed))
.Random.seed <<- initial_seed
else
rm(".Random.seed", envir = .GlobalEnv)
}
}
})
knitr::knit_hooks$set(no_seed = hook_fn)
This allows a fair bit of simplification:
initial_seed
is known to exist in the function's environment (the parent of the evaluation environment), so<<-
will make assignments to it..GlobalEnv
is known to be the parent of that local environment, so if.Random.seed
exists there, it's the one the function will see.- If
.Random.seed
doesn't exist in.GlobalEnv
, theinitial_seed
will be set toNULL
because theif
has noelse
clause.
Related Topics
Hiding Personal Functions in R
How to Create Textarea as Input in a Shiny Webapp in R
How to Jitter Text to Avoid Overlap in a Ggplot2 Scatterplot
R Sequence of Dates with Lubridate
Difference Between As.Data.Frame(X) and Data.Frame(X)
Is It Bad Practice to Access S4 Objects Slots Directly Using @
How to Syntax Highlight Inline R Code in R Markdown
HTML with Multicolumn Table in Markdown Using Knitr
Street Address to Geolocation Lat/Long
Rmarkdown: Pandoc: PDFlatex Not Found
R Change All Columns of Type Factor to Numeric
Convert and Save Distance Matrix to a Specific Format
Memory Profiling in R - Tools for Summarizing