How can I randomly sample a subgroup with multiple rows from within a larger group?
Like so perhaps:
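library(dplyr)

set.seed( 100 )
df %>% group_by( ID, Group ) %>%
  sample_n(1) %>%      # one random row per ID/Group; its Color is the sampled subgroup
  select( -Score ) %>% # keep only the key columns
  left_join( df, by=c("ID","Group","Color") )  # bring back every row of that Color
# (in dplyr >= 1.0.0, sample_n(1) is superseded by slice_sample(n = 1))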
I think I misunderstood you at first, but this sounds like it could be what you want.
Output:
        ID Group  Color Score
1    Bravo     1 yellow  0.65
2    Bravo     1 yellow  0.70
3    Bravo     1 yellow  0.90
4  Charlie     1    red  0.55
5  Charlie     2    red  0.60
6  Charlie     3    red  0.80
7  Charlie     4    red  0.90
8    Delta     1    red  0.85
9    Delta     2    red  0.63
10   Delta     2    red  0.51
11    Echo     1 yellow  0.85
12    Echo     1 yellow  0.89
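One subtlety: sampling a row and joining back picks a Color with probability proportional to how many rows it has in that ID/Group. If each Color should be equally likely instead, a base R sketch could sample from the distinct Colors directly. The toy df below (including its red and blue rows) and the helper pick_subgroup are hypothetical, made up for illustration.

```r
# Hypothetical toy data: two Colors inside each ID/Group
df <- data.frame(
  ID    = c("Bravo", "Bravo", "Bravo", "Bravo", "Echo", "Echo", "Echo"),
  Group = c(1, 1, 1, 1, 1, 1, 1),
  Color = c("yellow", "yellow", "red", "red", "yellow", "yellow", "blue"),
  Score = c(0.65, 0.70, 0.10, 0.20, 0.85, 0.89, 0.30)
)

# Hypothetical helper: choose one Color uniformly at random within d,
# then return every row of d that has that Color.
pick_subgroup <- function(d) {
  colours <- unique(d$Color)
  # index into colours rather than calling sample(colours, 1) directly
  chosen <- colours[sample.int(length(colours), 1)]
  d[d$Color == chosen, , drop = FALSE]
}

set.seed(100)
pieces <- split(df, list(df$ID, df$Group), drop = TRUE)  # one data frame per ID/Group
res <- do.call(rbind, lapply(pieces, pick_subgroup))
rownames(res) <- NULL
res
```

Each ID/Group in res keeps exactly one Color, with all of that Color's rows.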
Take random sample by group
Try this:
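library(plyr)
# For each ID, draw 500 rows at random without replacement
# (this errors for any ID with fewer than 500 rows)
ddply(df, .(ID), function(x) x[sample(nrow(x), 500), ])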
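Because sampling is without replacement, sample(nrow(x), 500) fails for any ID that has fewer than 500 rows. A base R sketch that avoids plyr and caps the sample size at the group size; the toy df and n_per_group are hypothetical.

```r
# Hypothetical data: 3 IDs with different group sizes
df <- data.frame(ID = rep(c("a", "b", "c"), times = c(800, 600, 300)),
                 value = seq_len(1700))

n_per_group <- 500  # hypothetical target sample size per ID

set.seed(1)
res <- do.call(rbind, lapply(split(df, df$ID), function(x) {
  # Draw without replacement; cap at the group size so small groups
  # are returned whole instead of raising an error.
  x[sample.int(nrow(x), min(n_per_group, nrow(x))), , drop = FALSE]
}))
table(res$ID)  # a = 500, b = 500, c = 300 (c is smaller than 500, so kept whole)
```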
Randomly sample groups
Just use sample() to choose some number of groups:
library(dplyr)
# keep every row of 2 randomly chosen Species
iris %>% filter(Species %in% sample(levels(Species), 2))
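The same idea in base R, as a sketch without dplyr: draw the group labels first, then keep every row whose label was drawn.

```r
set.seed(2)
chosen <- sample(levels(iris$Species), 2)  # 2 of the 3 species, no repeats
sub <- iris[iris$Species %in% chosen, ]    # all 50 rows of each chosen species
table(droplevels(sub$Species))
```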
How to sample rows without replacement within (multiple) subgroups in R
Here is a base R solution. If you want to sample every element of a vector exactly once, just call sample(vec); it returns a random permutation of vec.
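set.seed(42)
res <- lapply(participant_id, function(p){
  # one row per item for participant p; colour and item are shuffled
  # independently (they must have the same length), so each appears once
  data.frame(participant_id = rep(p, length(item)),
             colour = sample(colour), item = sample(item))
})
res <- do.call(rbind, res)  # stack the per-participant data frames
res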
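To try the answer end to end, here it is with hypothetical inputs (the participant IDs, colours, and items are made up), followed by a check that each participant receives every colour and every item exactly once.

```r
# Hypothetical inputs: 3 participants, 4 colours, 4 items
participant_id <- c(101, 102, 103)
colour <- c("red", "green", "blue", "yellow")
item   <- c("cup", "hat", "pen", "key")

set.seed(42)
res <- do.call(rbind, lapply(participant_id, function(p) {
  data.frame(participant_id = rep(p, length(item)),
             colour = sample(colour),  # random permutation of colour
             item   = sample(item))    # independent permutation of item
}))

# Every participant should see each colour and each item once
all(tapply(res$colour, res$participant_id, function(v) setequal(v, colour)))
```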
How to randomly subset data with dplyr?
Maybe this is what you are after:
library(dplyr)

# sample 5 distinct values of No
my_groups <-
  df %>%
  select(No) %>%
  distinct %>%
  sample_n(5)
# keep every row of df whose No was sampled
my_df <-
  left_join(my_groups, df, by = "No")
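A base R sketch of the same idea, using hypothetical toy data. One difference: left_join(my_groups, df) returns rows in the order of my_groups, while the logical subset below keeps df's original row order.

```r
# Hypothetical data: 10 values of No, 3 rows each
df <- data.frame(No = rep(1:10, each = 3), x = seq_len(30))

set.seed(5)
my_groups <- sample(unique(df$No), 5)  # 5 distinct No values, no repeats
my_df <- df[df$No %in% my_groups, ]    # all rows belonging to those No values
```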
Select subgroups with replacement in a data frame in R
You could generate your group sample:
x <- sample(unique(df$groups), 3, replace = TRUE)
Then select the appropriate parts of df:
do.call(rbind, lapply(x, function(i) df[df$groups == i,]))
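With replace = TRUE the same group can be drawn more than once, and its rows then appear twice in the result. A sketch that also records which draw each block came from, so repeated groups stay distinguishable; the toy df and the draw column are hypothetical additions.

```r
# Hypothetical data: groups A-D with 2 rows each
df <- data.frame(groups = rep(c("A", "B", "C", "D"), each = 2), value = 1:8)

set.seed(3)
x <- sample(unique(df$groups), 3, replace = TRUE)

# Same as the answer, but tag each block with its draw number
res <- do.call(rbind, lapply(seq_along(x), function(k) {
  block <- df[df$groups == x[k], ]
  block$draw <- k
  block
}))
res
```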
sample with dplyr and rowwise
The very first row shows that col_1 and col_2 are different, while I expect them to be the same.
set.seed(7) makes sure that every time you run your script, it creates the same my_df. It does not mean that every call to sample returns the same number, so col_1 and col_2 need not be equal. However, if you run your script twice, both runs will give you the same col_1.
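A minimal base R illustration of this point: consecutive calls after one set.seed() continue the same random stream, while re-seeding restarts it.

```r
set.seed(7)
a <- sample(1:10, 1)
b <- sample(1:10, 1)   # continues the stream, so b need not equal a

set.seed(7)
a2 <- sample(1:10, 1)  # re-seeding restarts the stream: a2 is identical to a
identical(a, a2)
```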
I expect col_1 and col_2 to be sampled from the set_diff column.
From the documentation of sample: "If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x." Therefore, if set_diff equals 3, a sample is drawn from c(1,2,3).
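A sketch of the pitfall and a common workaround. The resample() helper follows the example given in R's own ?sample help page: it indexes into x instead of passing x to sample(), so a length-1 x is always returned as itself.

```r
set_diff <- 3
set.seed(1)
sample(set_diff, 1)    # drawn from 1:3, so not necessarily 3

# Helper from the ?sample examples: sample positions, not values
resample <- function(x, ...) x[sample.int(length(x), ...)]
resample(set_diff, 1)  # always 3
resample(c(2, 5), 1)   # either 2 or 5
```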