Sample Rows of Subgroups from Dataframe with Dplyr

How can I randomly sample a subgroup with multiple rows from within a larger group?

Like so perhaps:


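library(dplyr)

set.seed(100)
df %>%
  group_by(ID, Group) %>%
  sample_n(1) %>%        # one random row, and therefore one Color, per ID/Group
  select(-Score) %>%
  left_join(df, by = c("ID", "Group", "Color"))  # pull back every row of the chosen subgroup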


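I think I misunderstood you at first, but this seems to be what you want: it picks one Color at random within each ID and Group, then joins back to df to return every row of that subgroup.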

Output:


        ID Group  Color Score
1    Bravo     1 yellow  0.65
2    Bravo     1 yellow  0.70
3    Bravo     1 yellow  0.90
4  Charlie     1    red  0.55
5  Charlie     2    red  0.60
6  Charlie     3    red  0.80
7  Charlie     4    red  0.90
8    Delta     1    red  0.85
9    Delta     2    red  0.63
10   Delta     2    red  0.51
11    Echo     1 yellow  0.85
12    Echo     1 yellow  0.89
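In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(). The following is a minimal sketch of the same idea on a small made-up data frame (the values are invented for illustration and are not the asker's data). Note one difference: it samples from the distinct Color values, so every Color in an ID/Group is equally likely, whereas sampling a row makes a Color with more rows more likely to be picked.

```r
library(dplyr)

# Hypothetical toy data: each ID/Group contains more than one Color subgroup.
df <- data.frame(
  ID    = c("Bravo", "Bravo", "Bravo", "Bravo", "Delta", "Delta", "Delta"),
  Group = c(1, 1, 1, 1, 2, 2, 2),
  Color = c("yellow", "yellow", "blue", "blue", "red", "red", "green"),
  Score = c(0.65, 0.70, 0.90, 0.40, 0.63, 0.51, 0.85)
)

set.seed(100)

# Pick one Color at random per ID/Group.
picked <- df %>%
  distinct(ID, Group, Color) %>%
  group_by(ID, Group) %>%
  slice_sample(n = 1) %>%
  ungroup()

# Keep every row of df that belongs to a picked subgroup.
result <- inner_join(df, picked, by = c("ID", "Group", "Color"))
result
```

Using inner_join() from df keeps the original row order and all columns, including Score, without a separate select() step.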

Take random sample by group

Try this:

library(plyr)

# Draw 500 rows at random from each ID; sample() errors if a group has fewer than 500 rows.
ddply(df, .(ID), function(x) x[sample(nrow(x), 500), ])
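If some groups might be smaller than the requested sample size, one option is to cap the size at the group's row count. This is a base R sketch under that assumption; the data frame and the size n = 3 are made up for illustration.

```r
# Hypothetical toy data with groups of unequal size (5, 2 and 8 rows).
df <- data.frame(ID = rep(c("a", "b", "c"), times = c(5, 2, 8)),
                 value = seq_len(15))

set.seed(1)
n <- 3  # requested sample size per group (illustrative)

# Split by ID, sample at most n rows from each piece, then recombine.
pieces <- lapply(split(df, df$ID), function(x) {
  x[sample.int(nrow(x), min(n, nrow(x))), , drop = FALSE]
})
sampled <- do.call(rbind, pieces)

table(sampled$ID)
```

Group "b" has only two rows, so both are returned instead of raising an error.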

Randomly sample groups

Just use sample() to choose the groups to keep, here 2 of the species:

iris %>% filter(Species %in% sample(levels(Species),2))
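The one-liner above relies on Species being a factor, because levels() returns NULL for a character column. A slightly more general sketch, assuming only that the grouping column can be compared with %in%, draws from the unique values instead:

```r
library(dplyr)

set.seed(1)

# unique() works for both factor and character grouping columns.
keep <- sample(unique(as.character(iris$Species)), 2)

subset_iris <- iris %>%
  filter(Species %in% keep)

table(droplevels(subset_iris$Species))
```

Drawing the groups once, outside filter(), also makes it easy to inspect or reuse which groups were chosen.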

How to sample rows without replacement within (multiple) subgroups in R

Here is a base R solution. If you want to sample all elements of a vector exactly once, then just sample(vec) and it will return a permutation of vec.

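set.seed(42)
# For each participant, pair an independent permutation of colour with one of item.
res <- lapply(participant_id, function(p) {
  data.frame(participant_id = rep(p, length(item)),
             colour = sample(colour),
             item = sample(item))
})
res <- do.call(rbind, res)  # stack the per-participant data frames
res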
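The snippet above assumes that participant_id, item and colour already exist, and that item and colour have the same length. A self-contained sketch with made-up vectors (the values are purely illustrative) shows that each participant sees every colour and every item exactly once:

```r
# Hypothetical inputs; colour and item must have the same length.
participant_id <- 1:3
item   <- c("apple", "chair", "lamp", "spoon")
colour <- c("red", "green", "blue", "black")

set.seed(42)
res <- do.call(rbind, lapply(participant_id, function(p) {
  # sample(x) with no size returns a random permutation of x.
  data.frame(participant_id = p,
             colour = sample(colour),
             item = sample(item))
}))

# Every colour appears once per participant.
table(res$participant_id, res$colour)
```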

How to randomly subset data with dplyr?

Maybe this is what you are after:

library(dplyr)

# sample 5 of the distinct values of No
my_groups <-
  df %>%
  distinct(No) %>%
  sample_n(5)

# keep all rows of df that belong to the sampled values
my_df <-
  left_join(my_groups, df, by = "No")

Select subgroups with replacement in a dataframe R

You could generate your group sample:

x <- sample(unique(df$groups), 3, replace = TRUE)

Then select the appropriate parts of df:

do.call(rbind, lapply(x, function(i) df[df$groups == i,]))
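Because the groups are drawn with replacement, the same group can appear more than once in the result, and its copies are then indistinguishable. A sketch that tags each copy with a draw number, using a made-up data frame and a hypothetical column name draw:

```r
# Hypothetical toy data: four groups of two rows each.
df <- data.frame(groups = rep(c("g1", "g2", "g3", "g4"), each = 2),
                 value = 1:8)

set.seed(3)
x <- sample(unique(df$groups), 3, replace = TRUE)

# Bind the rows of each drawn group, labelled by the draw they came from.
out <- do.call(rbind, lapply(seq_along(x), function(k) {
  cbind(draw = k, df[df$groups == x[k], , drop = FALSE])
}))
out
```

The draw column makes it possible to treat repeated groups as separate units later, for example in a cluster bootstrap.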

sample with dplyr and rowwise


"The very first row shows that col_1 and col_2 are different, while I expect them to be the same."

set.seed(7) makes sure that every time you run your script, it will create the same my_df. It does not mean that every single time you run sample, it will sample the same number, so col_1 and col_2 do not need to be the same. However, if you run your code twice, both will get you the same col_1.
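A small sketch of that point: the seed fixes the whole sequence of random draws, not each individual call. The variable names are illustrative.

```r
set.seed(7)
first_run <- c(sample(100, 1), sample(100, 1))   # two consecutive draws

set.seed(7)
second_run <- c(sample(100, 1), sample(100, 1))  # same seed, same sequence

# The two runs match element by element, even though the two draws
# inside one run are independent of each other.
identical(first_run, second_run)
```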

"I expect col_1 and col_2 be sampled from set_diff column."

From the documentation of sample: If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x. Therefore, if set_diff equals 3, a sample is drawn from c(1,2,3).
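The behaviour can be checked directly, and the examples in ?sample suggest a small helper that indexes into the vector so that a length-one input is sampled from itself. A sketch:

```r
set.seed(7)

# A single number >= 1 is treated as "sample from 1:x".
draws <- replicate(20, sample(3, 1))
range(draws)  # always within 1..3

# Helper taken from the examples in ?sample: sampling positions instead of
# values means a length-one vector can only return its own element.
resample <- function(x, ...) x[sample.int(length(x), ...)]
resample(3, 1)  # 3
```

Inside a rowwise() pipeline, a helper like this would be one way to make a single-element set_diff return that element rather than a draw from 1:set_diff.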


