Sample from vector of varying length (including 1)
This is a documented feature:
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x. Note that this convenience feature may lead to undesired behaviour when x is of varying length in calls such as sample(x).
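A minimal sketch of the documented behaviour, using made-up values for x and y and an arbitrary seed:

```r
set.seed(1)  # arbitrary seed, only for reproducibility

x <- 10             # length-one numeric vector
s1 <- sample(x)     # treated as sample(1:10): a permutation of 1..10
length(s1)          # 10, not 1

y <- c(10, 20)      # length-two vector
s2 <- sample(y)     # an ordinary shuffle of the elements 10 and 20
length(s2)          # 2
```

The same call, sample(x), therefore returns a result of a very different length depending on whether x happens to contain one element or several.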
An alternative is to write your own function to avoid the feature:
sample.vec <- function(x, ...) x[sample(length(x), ...)]
sample.vec(10)
# [1] 10
sample.vec(10, 3, replace = TRUE)
# [1] 10 10 10
Some functions with similar behavior are listed under the question "seq vs seq_along. When will using seq cause unintended results?"
sample() in R unpredictable when vector length is one
From help("sample"):
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1,
sampling via sample takes place from 1:x.
So, when you have remaining = 2, then sample(remaining) is equivalent to sample(x = 1:2).
Update
From the comments it's clear you are also looking for a way around this behavior. Here is a benchmark comparison of three mentioned alternatives:
library(microbenchmark)
# if remaining is of length one
remaining <- 2
microbenchmark(a = {if ( length(remaining) > 1 ) { sample(remaining) } else { remaining }},
b = ifelse(length(remaining) > 1, sample(remaining), remaining),
c = remaining[sample(length(remaining))])
Unit: nanoseconds
 expr  min   lq    mean median     uq   max neval cld
    a  349  489  625.12  628.0  663.5  3283   100   a
    b 1536 1886 2240.58 2025.0 2165.5 13898   100   b
    c 4051 4400 5193.41 4679.5 5064.0 38413   100   c
# If remaining is not of length one
remaining <- 1:10
microbenchmark(a = {if ( length(remaining) > 1 ) { sample(remaining) } else { remaining }},
b = ifelse(length(remaining) > 1, sample(remaining), remaining),
c = remaining[sample(length(remaining))])
Unit: microseconds
 expr    min      lq     mean median      uq    max neval cld
    a  5.238  5.7970  6.82703  6.251  6.9145 51.264   100   a
    b 11.663 12.2920 13.14831 12.851 13.3745 34.851   100   b
    c  5.238  5.9715  6.57140  6.426  6.8450 14.667   100   a
It looks like the suggestion from joran may be the fastest in your case if sample() is called much more often when remaining is of length > 1, and the if() {} else {} approach would be faster otherwise.
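Speed aside, it may be worth checking that the three alternatives return the same kind of result. The sketch below reuses the benchmark expressions; the names res_a, res_b and res_c are introduced here only for illustration. Because ifelse() returns a value shaped like its test, and the test here has length one, variant b yields a single element rather than a full shuffle.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
remaining <- 1:10

# a: explicit if/else, returns the whole shuffled vector
res_a <- if (length(remaining) > 1) sample(remaining) else remaining

# b: ifelse() with a length-one test, returns only one element
res_b <- ifelse(length(remaining) > 1, sample(remaining), remaining)

# c: index by a shuffled set of positions, returns the whole shuffled vector
res_c <- remaining[sample(length(remaining))]

c(a = length(res_a), b = length(res_b), c = length(res_c))
```

Under these assumptions only variants a and c are drop-in replacements for sample(remaining) when remaining has more than one element.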
Why does sample() not work for a single number?
Or do I just need to include an if statement to avoid this?
Yeah, unfortunately. Something like this:
result = if(length(x) == 1) {x} else {sample(x, ...)}
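Since ... is only valid inside a function, one way to reuse the check above is to wrap it. The following is a sketch; the name safe_sample is hypothetical and not part of base R.

```r
# Hypothetical helper: return a length-one vector unchanged,
# otherwise defer to sample() with any extra arguments.
# Caveat: for a length-one x, arguments such as size are ignored.
safe_sample <- function(x, ...) {
  if (length(x) == 1) x else sample(x, ...)
}

set.seed(1)                       # arbitrary seed
one  <- safe_sample(10)           # stays 10 instead of becoming a shuffle of 1:10
many <- safe_sample(c(3, 7, 9))   # an ordinary shuffle of the three values
```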
Creating a sample vector of variable length for metadata
Maybe using paste in Map is another way.
stage <- c(Blast = 2, HSC = 4, LSC = 3)
unlist(Map(function(x, y) paste(x, seq_len(y), sep = "_"), names(stage), stage), FALSE, FALSE)
#[1] "Blast_1" "Blast_2" "HSC_1" "HSC_2" "HSC_3" "HSC_4" "LSC_1"
#[8] "LSC_2" "LSC_3"
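For comparison, the same labels can probably be built without Map by repeating each name and numbering within each group. This is a sketch of a vectorised alternative using base R's rep() and sequence(); the names n and labels are introduced here for illustration.

```r
stage <- c(Blast = 2, HSC = 4, LSC = 3)

n <- unname(stage)                   # counts per stage, names dropped
labels <- paste(rep(names(stage), times = n),  # "Blast" twice, "HSC" four times, ...
                sequence(n),                   # 1 2, 1 2 3 4, 1 2 3
                sep = "_")
labels
```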
sampling bug in R?
Have a look at the Details of the sample function:
"If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x"
How to sample 1:x where x is a vector of random integers with length greater than 1
Maybe use sapply to loop over vec:
out <- sapply(vec,sample,size = 1)
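A small runnable sketch of this idea, with a made-up vec. It shows the sapply call from above next to a variant based on sample.int(), which draws from 1:n directly and so does not depend on the length-one convenience behaviour of sample().

```r
set.seed(7)          # arbitrary seed, only for reproducibility
vec <- c(5, 3, 8)    # hypothetical upper bounds

# As in the answer: each element of vec is a length-one x, so sample() draws from 1:x
out <- sapply(vec, sample, size = 1)

# Variant with an explicit draw from 1:n and a type-checked result
out2 <- vapply(vec, function(n) sample.int(n, size = 1), integer(1))
```

Each element of out and out2 lies between 1 and the corresponding element of vec.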
Sample a single value from list of vectors multiple times
When you have a vector of length 1, the sampling happens from 1:x. From ?sample:
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x
So when you do
set.seed(123)
sample(10, 1)
#[1] 3
It is selecting one number from 1 to 10. To prevent that, you can check the length of each vector inside sapply:
sapply(groups, function(x) if(length(x) == 1) rep(x, repetition)
else sample(x, repetition, replace = TRUE))
This returns the same number repetition times when the vector has length 1.
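To make this concrete, here is a sketch with made-up inputs; the contents of groups and the value of repetition are assumptions, since the original data is not shown.

```r
set.seed(123)                            # arbitrary seed
groups <- list(a = c(1, 5, 9), b = 10)   # hypothetical: one multi-element and one length-one vector
repetition <- 4

res <- sapply(groups, function(x) {
  if (length(x) == 1) rep(x, repetition)        # repeat the single value
  else sample(x, repetition, replace = TRUE)    # draw with replacement
})
res   # 4 x 2 matrix: column "a" drawn from 1, 5, 9; column "b" is 10 throughout
```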
Sampling without replacement from multiple vectors of different length using vector lengths as some sort of weight
This will get you approximately 50 students (depending on how a was split):
new = unlist(lapply(a, function(x) sample(x, round(length(x)/2))))
To get exactly 50 each time, you can do this
ll = sapply(a, length)  # length of each vector in "a"
target = 50
new_ll = 0
# Re-draw the jittered per-vector sample sizes until they sum to the target
while (sum(new_ll) != target)
  new_ll = round(ll * target / sum(ll) + runif(length(ll), -0.5, 0.5))
new = unlist(lapply(seq_along(a), function(i) sample(a[[i]], new_ll[i])))
Explanation: Get the length of each vector in a and assign it to ll. This amounts to doing ll[1] = length(vec1); ll[2] = length(vec2) and so on. We need to sample a certain amount from each vector in a such that we get 50 elements (target). This amount is determined with new_ll. It is approximately equal to target / num_students times each vector length, where num_students is the total number of students, sum(ll).
Since this does not guarantee that target students are selected each time, we add a little jitter with runif to move the numbers around slightly, and we continue looping until the sum of new_ll is equal to target.
The final line then iterates i from 1 through 10 (or the number of vectors in a) and samples new_ll[i] elements from each vector a[[i]].
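The procedure described above can be tried on toy data. In this sketch the vector sizes and the way a is built are assumptions made for illustration (100 student IDs split into 10 groups of unequal size); the sampling logic follows the answer.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical data: 100 student IDs in 10 groups of unequal size
sizes <- c(3, 7, 8, 10, 12, 9, 13, 16, 10, 12)   # sums to 100
a <- split(1:100, rep(seq_along(sizes), sizes))

ll <- lengths(a)   # length of each vector in a
target <- 50
new_ll <- 0

# Jitter the proportional allocation until the rounded sizes sum to target
while (sum(new_ll) != target) {
  new_ll <- round(ll * target / sum(ll) + runif(length(ll), -0.5, 0.5))
}

# Sample new_ll[i] students without replacement from each group
new <- unlist(lapply(seq_along(a), function(i) sample(a[[i]], new_ll[i])))
length(new)
```

Groups with an odd size have a fractional share (for example 1.5 out of 3), so the jitter decides whether they round up or down, and the loop repeats until the total comes out at exactly target.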