From a data.table, randomly select one row per group
The OP provided only a single column in the example. Assuming the original dataset has multiple columns, we group by 'z', sample one row position per group with sample(.N, 1), get the matching row index (.I), extract the column that holds those indices ($V1), and use it to subset the rows of 'dt'.
dt[dt[ , .I[sample(.N,1)] , by = z]$V1]
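The one-liner above can be split into its two steps to make the role of .I and $V1 easier to see. This is only a sketch: the toy table, its column 'x', and the names idx and picked are made up for illustration.

```r
library(data.table)

set.seed(42)  # only to make the sketch reproducible

# Hypothetical toy data: 'z' is the grouping column, 'x' stands in for other columns
dt <- data.table(z = c("a", "a", "a", "b", "b", "c"), x = 1:6)

# Step 1: within each group, pick one of the group's row numbers in 'dt' (.I);
# the unnamed result column is called V1 by default
idx <- dt[, .I[sample(.N, 1)], by = z]$V1

# Step 2: subset the original table with those row numbers
picked <- dt[idx]
print(picked)
```

Working with row indices avoids building .SD for each group, which is usually why this form is suggested for large tables.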
How to randomly choose only one row in each group
In plain R you can use sample() within tapply():
df$Chosen <- 0
df[-tapply(-seq_along(df$Region),df$Region, sample, size=1),]$Chosen <- 1
df
Region Combo Chosen
1 A 1 0
2 A 2 1
3 A 3 0
4 B 1 1
5 B 2 0
6 C 1 1
7 D 1 0
8 D 2 0
9 D 3 1
10 D 4 0
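Note the -(-selected_row_number) trick: given a single number n, sample() draws from 1:n, so a group with only one row could return the wrong row number. Negating the row numbers before sampling and negating the result again avoids this, because sample() returns a single negative number unchanged.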
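The behaviour of sample() that motivates the double negation can be checked directly in base R. A minimal sketch, with the value 7 standing in for an arbitrary lone row number:

```r
set.seed(1)  # for reproducibility of the sketch only

# A single number x >= 1 is treated as "draw from 1:x", not as the value x itself
draws <- replicate(200, sample(7, size = 1))
table(draws)

# A single negative number is returned unchanged, so negating it back is safe
neg_draws <- replicate(200, -sample(-7, size = 1))
unique(neg_draws)  # always 7
```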
Sample random rows within each group in a data.table
Maybe something like this?
> DT[,.SD[sample(.N, min(3,.N))],by = a]
a b
1: 1 744
2: 1 497
3: 1 167
4: 2 888
5: 2 950
6: 2 343
(Thanks to Josh for the correction, below.)
Random Sample 1 row for each unique column value in R
library(dplyr)
df %>% group_by(match_no) %>% sample_n(1)
Flag randomly selected N rows by group in data.table
dt[, C3 := 1:.N %in% sample(.N, min(.N, 2)), by = C1]
Or use head, but I think that should be slower:
dt[, C3 := 1:.N %in% head(sample(.N), 2) , by = C1]
If the number of flagged rows is not constant you can do
flagsz <- c(2, 1, 2, 3)
dt[, C3 := 1:.N %in% sample(.N, min(.N, flagsz[.GRP])), by = C1]
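To see how flagsz[.GRP] and min(.N, ...) interact, here is a small self-contained sketch. The toy table and the summary name counts are assumptions for illustration; .GRP numbers the groups in order of first appearance, so flagsz is matched to the groups in that order.

```r
library(data.table)

set.seed(7)  # only to make the sketch reproducible

# Hypothetical toy data: group sizes are 3, 1, 4 and 2
dt <- data.table(C1 = rep(c("a", "b", "c", "d"), times = c(3, 1, 4, 2)),
                 C2 = 1:10)

# Requested number of flagged rows per group, in order of first appearance
flagsz <- c(2, 1, 2, 3)

# Flag at most flagsz[.GRP] rows per group; min() caps the request at the group size
dt[, C3 := 1:.N %in% sample(.N, min(.N, flagsz[.GRP])), by = C1]

# Check how many rows were flagged in each group
counts <- dt[, .(n_rows = .N, n_flagged = sum(C3)), by = C1]
print(counts)
```

Group "d" asks for 3 rows but has only 2, so both of its rows end up flagged.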
Take random sample by group
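Try this (it assumes every ID has at least 500 rows, because sample() cannot draw more rows than exist without replacement):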
library(plyr)
ddply(df,.(ID),function(x) x[sample(nrow(x),500),])
How to efficiently sample from a datatable by column in R?
You can use sample on .N for each group and select 1 random row.
library(data.table)
set.seed(123)
dt[, .SD[sample(.N, 1)], A]
# A B C
#1: A 31 143
#2: D 16 175
#3: B 100 165
#4: E 27 190
#5: C 90 197
dplyr has the slice_sample function (previously sample_n) for this:
library(dplyr)
dt %>% group_by(A) %>% slice_sample(n = 1)
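A self-contained version of the dplyr approach, as a sketch on made-up data (the data frame and the name one_per_group are hypothetical; slice_sample needs dplyr 1.0.0 or later):

```r
library(dplyr)

set.seed(123)  # only to make the sketch reproducible

# Hypothetical toy data: three groups of different sizes
df <- data.frame(A = c("a", "a", "b", "c", "c", "c"), B = 1:6)

# Draw one random row from each group, then drop the grouping
one_per_group <- df %>%
  group_by(A) %>%
  slice_sample(n = 1) %>%
  ungroup()

print(one_per_group)
```

As far as I know, slice_sample quietly returns the whole group when n exceeds the group size, whereas sample_n raises an error in that case.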
R data.table: Random sample of rows from second table by group
A direct translation of your needs is:
DT2[DT1, on=.(group), allow.cartesian=TRUE, .(var1, obs=obs[sample(.N, 2L)]), by=.EACHI]
This might be faster:
gn <- DT1[, .(nsamp=2*.N), keyby=.(group)]
DT2[gn, on=.(group), .(obs=obs[sample(.N, nsamp, replace=TRUE)]), by=.EACHI][,
var1 := rep(DT1$var1, each=2L)]
data:
set.seed(0L)
library(data.table)
DT1 <- data.table(var1=101:120, group=c(1,1,1,1,1,2,2,2,2,3,3,3,4,4,4,4,4,4,4,4))
DT2 <- data.table(obs=201:213, group=c(1,1,1,2,2,2,3,3,3,4,4,4,5))
sample output:
group var1 obs
1: 1 101 203
2: 1 101 201
3: 1 102 202
4: 1 102 203
5: 1 103 203
6: 1 103 201
7: 1 104 203
8: 1 104 202
9: 1 105 202
10: 1 105 203
11: 2 106 204
12: 2 106 206
13: 2 107 204
14: 2 107 205
15: 2 108 205
16: 2 108 206
17: 2 109 205
18: 2 109 206
19: 3 110 209
20: 3 110 207
21: 3 111 209
22: 3 111 208
23: 3 112 207
24: 3 112 208
25: 4 113 210
26: 4 113 212
27: 4 114 211
28: 4 114 210
29: 4 115 211
30: 4 115 212
31: 4 116 211
32: 4 116 210
33: 4 117 211
34: 4 117 210
35: 4 118 210
36: 4 118 211
37: 4 119 212
38: 4 119 211
39: 4 120 210
40: 4 120 211
group var1 obs
How to randomly sample entire group based on multiple grouping conditions
You can use lubridate::floor_date to create groups and then filter one randomly sampled frame per group. You can manually set the interval you need in floor_date; here it's "1 minute".
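library(dplyr)
library(lubridate)

df %>%
  mutate(datetime = ymd_hms(datetime),
         fl = floor_date(datetime, "1 minute")) %>%
  group_by(uniquename, fl) %>%
  # index into unique(frame): sample(k, 1) would draw from 1:k if a group had the single frame value k
  filter(frame == unique(frame)[sample.int(n_distinct(frame), 1)])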
output:
# A tibble: 11 × 5
# Groups: uniquename, fl [4]
uniquename frame id datetime fl
<chr> <dbl> <chr> <dttm> <dttm>
1 unique1 2 b1 2021-05-05 07:05:03 2021-05-05 07:05:00
2 unique1 2 b2 2021-05-05 07:05:03 2021-05-05 07:05:00
3 unique1 2 b3 2021-05-05 07:05:03 2021-05-05 07:05:00
4 unique1 3 b2 2021-05-05 07:07:03 2021-05-05 07:07:00
5 unique1 3 b4 2021-05-05 07:07:03 2021-05-05 07:07:00
6 unique2 1 b3 2021-06-06 09:17:25 2021-06-06 09:17:00
7 unique2 1 b4 2021-06-06 09:17:25 2021-06-06 09:17:00
8 unique2 16 b1 2021-06-06 09:20:59 2021-06-06 09:20:00
9 unique2 16 b2 2021-06-06 09:20:59 2021-06-06 09:20:00
10 unique2 16 b3 2021-06-06 09:20:59 2021-06-06 09:20:00
11 unique2 16 b4 2021-06-06 09:20:59 2021-06-06 09:20:00
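The grouping step can be tried on a small made-up data set. This is a sketch, not the original data: the tibble below is hypothetical, and the frame is chosen by indexing into unique(frame) so that a group whose only frame value is a single number such as 16 is kept as-is instead of being sampled from 1:16.

```r
library(dplyr)
library(lubridate)

set.seed(3)  # only to make the sketch reproducible

# Hypothetical toy data: u1 has two frames within the same minute, u2 has one
df <- tibble(
  uniquename = c("u1", "u1", "u1", "u1", "u2", "u2"),
  frame      = c(1, 1, 2, 2, 16, 16),
  datetime   = c("2021-05-05 07:05:03", "2021-05-05 07:05:03",
                 "2021-05-05 07:05:40", "2021-05-05 07:05:40",
                 "2021-06-06 09:20:59", "2021-06-06 09:20:59")
)

res <- df %>%
  mutate(datetime = ymd_hms(datetime),
         fl = floor_date(datetime, "1 minute")) %>%
  group_by(uniquename, fl) %>%
  # keep every row of one randomly chosen frame per (uniquename, minute) group
  filter(frame == unique(frame)[sample.int(n_distinct(frame), 1)]) %>%
  ungroup()

print(res)
```

Both rows of the chosen frame survive for u1, and the lone frame of u2 is always retained.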