Subset by multiple ranges
Using the non-equi join capability of data.table:
values[range, on = .(value >= start, value <= end), .(results = x.value)]
which gives:
    results
 1:       6
 2:       7
 3:       8
 4:       9
 5:      10
 6:      29
 7:      30
 8:      31
 9:      32
10:      33
11:      34
12:      35
13:      87
14:      88
15:      89
16:      90
17:      91
18:      92
Or, as suggested by @Henrik: values[value %inrange% range]. This also works well on data.tables with multiple columns:
# create new data
set.seed(26042017)
values2 <- data.table(value = c(1:100), let = sample(letters, 100, TRUE), num = sample(100))
values2[value %inrange% range]
    value let num
 1:     6   v  70
 2:     7   f  77
 3:     8   u  21
 4:     9   x  66
 5:    10   g  58
 6:    29   f   7
 7:    30   w  48
 8:    31   c  50
 9:    32   e   5
10:    33   c   8
11:    34   y  19
12:    35   s  97
13:    87   j  80
14:    88   o   4
15:    89   h  65
16:    90   c  94
17:    91   k  22
18:    92   g  46
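The inputs behind the two calls above are not shown, so here is a minimal sketch with assumed data: `values` and `range` are reconstructed so that they reproduce the 18 rows in the output, and they are not the original poster's objects. It checks that the non-equi join and `%inrange%` select the same values.

```r
library(data.table)

# Assumed inputs, reconstructed to match the output shown above
values <- data.table(value = 1:100)
range  <- data.table(start = c(6L, 29L, 87L), end = c(10L, 35L, 92L))

# Non-equi join: keep every value that falls inside any [start, end] interval
res_join <- values[range, on = .(value >= start, value <= end), .(results = x.value)]

# Same selection with %inrange% (lower bounds in column 1, upper bounds in column 2)
res_inrange <- values[value %inrange% range]

# Both approaches should agree
all(res_join$results == res_inrange$value)
```

Note that calling the table `range` masks the base function of the same name inside this session; a different name may be safer in real code.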
Subset the data frame based on multiple ranges and save each range as an element in the list
You can split the data frame according to levels obtained by cutting df$x by range$start. You don't even need a loop for this:
nlist <- split(df, cut(df$x, breaks = c(-Inf, range$start, Inf)))
Or, if you want it in the same format (an unnamed list in reverse order), you can do:
nlist <- setNames(rev(split(df, cut(df$x, breaks=c(-Inf, range$start, Inf)))),NULL)
This also gives the correct answer for Reduce:
Reduce('+', lapply(nlist, nrow))
#> [1] 34
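As a small illustration of the split/cut idea, here is a sketch on made-up data (the `df` and `range` below are hypothetical, not the ones from the question). One detail worth knowing: cut() builds right-closed intervals by default, so a value equal to a start point lands in the lower group unless right = FALSE is passed.

```r
# Hypothetical data: ten rows and two range start points
df    <- data.frame(x = 1:10, y = letters[1:10])
range <- data.frame(start = c(3, 7))

# Bin x into (-Inf, 3], (3, 7], (7, Inf] and split the rows by bin
groups <- cut(df$x, breaks = c(-Inf, range$start, Inf))
nlist  <- split(df, groups)

# Number of rows per group; together they cover every row of df
rows_per_group <- sapply(nlist, nrow)
rows_per_group
```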
How to create subsets of multiple date ranges in R
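You can try looping through the indices of the ranges:
library(dplyr)
# print the rows of df that fall inside each date range in turn
for (i in seq_along(date_ranges$start_dates)) {
  print(
    df %>%
      filter(between(df_date, date_ranges$start_dates[i], date_ranges$end_dates[i]))
  )
}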
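If the subsets should be kept rather than printed, one option is to collect them in a list. The sketch below uses base R only and made-up data; the column names follow the loop above, but the dates and values are assumptions for illustration.

```r
# Hypothetical data: 60 consecutive days starting 2020-01-01
df <- data.frame(df_date = as.Date("2020-01-01") + 0:59, value = 1:60)
date_ranges <- data.frame(
  start_dates = as.Date(c("2020-01-05", "2020-02-01")),
  end_dates   = as.Date(c("2020-01-10", "2020-02-03"))
)

# One data frame per range, with both end points included
subsets <- lapply(seq_len(nrow(date_ranges)), function(i) {
  in_range <- df$df_date >= date_ranges$start_dates[i] &
              df$df_date <= date_ranges$end_dates[i]
  df[in_range, ]
})

sapply(subsets, nrow)
```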
Subset data frame in R based on matching multiple ranges for multiple variables
You can use outer to calculate all pairwise differences between df and realdata and examine whether the differences in both x and y are less than the tolerance:
tolerance <- .10
# x
xx <- abs(outer(df$x, realdata$x, "-")) < tolerance
# y
yy <- abs(outer(df$y, realdata$y, "-")) < tolerance
# if both are within the tolerance the sum of xx and yy will be 2
(mat <- xx + yy > 1)
# [,1] [,2] [,3]
#[1,] TRUE FALSE FALSE
#[2,] FALSE TRUE FALSE
#[3,] FALSE FALSE FALSE
#[4,] FALSE FALSE TRUE
#[5,] FALSE FALSE FALSE
#[6,] FALSE FALSE FALSE
So the first column of mat shows which rows of df are within the tolerance of the first row of realdata (in this case, the first row of df).
Rather inelegantly, the following returns the matching rows of df in the order of the rows of realdata:
lapply(1:ncol(mat), function(i) df[mat[,i], ])
# return all matched data
df[row(mat)[mat], ]
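For a self-contained version of the same idea, here is a sketch on made-up data (`df` and `realdata` below are hypothetical). It combines the two logical matrices with `&`, which is equivalent to the `xx + yy > 1` test above.

```r
# Hypothetical data: four candidate points and two reference points
df       <- data.frame(x = c(1, 2, 3, 4), y = c(1, 2, 3, 4))
realdata <- data.frame(x = c(1.05, 3.50), y = c(0.98, 3.02))
tolerance <- 0.10

# Rows index df, columns index realdata
xx <- abs(outer(df$x, realdata$x, "-")) < tolerance
yy <- abs(outer(df$y, realdata$y, "-")) < tolerance

# TRUE only where both coordinates are within the tolerance
mat <- xx & yy

# Row/column positions of the matches
which(mat, arr.ind = TRUE)
```

Only the first row of df is close to the first row of realdata in both coordinates here; the second reference point has no match because its x value is 0.5 away from every candidate.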
R subset by multiple columns
If I understand your explanation and the expected output correctly, you are looking for something like this:
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(ifelse(Sex == 'M' & between(Age, 6, 11),
                between(Score, 34, 100), TRUE)) %>%
  ungroup
# ID Sex Age Score
# <int> <chr> <dbl> <int>
#1 1 M 4.2 19
#2 1 M 4.8 21
#3 2 F 6.1 23
#4 2 F 6.7 45
#5 3 F 9.4 39
#6 5 M 10 56
between(Score, 34, 100) is only checked when Sex is 'M' and Age is between 6 and 11.
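The same conditional rule can be written as a single logical expression: keep a row if it is outside the targeted group, or if its score is in range. The sketch below uses base R and made-up rows (not the data from the question) to show which rows survive.

```r
# Hypothetical data for illustration only
df <- data.frame(
  ID    = 1:5,
  Sex   = c("M", "M", "F", "M", "M"),
  Age   = c(4.2, 7.5, 6.1, 10, 8),
  Score = c(19, 20, 23, 56, 33)
)

# Rows the score condition applies to: males aged 6 to 11 (inclusive)
in_group <- df$Sex == "M" & df$Age >= 6 & df$Age <= 11
# Score condition, inclusive on both ends like dplyr::between()
score_ok <- df$Score >= 34 & df$Score <= 100

# Keep rows outside the group unconditionally; inside it, require score_ok
kept <- df[!in_group | score_ok, ]
kept$ID
```

In this made-up data, IDs 2 and 5 are males in the age band with scores below 34, so they are the only rows dropped.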
Subsetting in Python using multiple row ranges
You can use numpy.r_ to select multiple ranges at once:
import numpy as np
import matplotlib.pyplot as plt
# rows 0-33 and 80-100; column 1 on the x-axis, column 0 on the y-axis
plt.scatter(df.iloc[np.r_[0:34, 80:101], 1], df.iloc[np.r_[0:34, 80:101], 0])
Subset multiple columns in R with multiple matches
You can use rowSums:
df[rowSums(df[-1] == criteria) >= 2, ]
# x Col1 Col2 Col3
#1 1 A A A
#4 4 B A A
If criteria has length > 1, you cannot use == directly; in that case, use sapply with %in%.
df[rowSums(sapply(df[-1], `%in%`, criteria)) >= 2, ]
In dplyr you can use filter with rowwise:
library(dplyr)
df %>%
  rowwise() %>%
  filter(sum(c_across(starts_with('col')) %in% criteria) >= 2)
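To try the two base R variants, here is a sketch with a small made-up data frame. It is an assumption chosen so that rows 1 and 4 match the output shown for the single-value case; the original data is not given.

```r
# Hypothetical data, built so rows 1 and 4 contain "A" at least twice
df <- data.frame(
  x    = 1:4,
  Col1 = c("A", "B", "C", "B"),
  Col2 = c("A", "B", "A", "A"),
  Col3 = c("A", "C", "B", "A")
)

# Single criterion: == compares every cell of the three columns with "A"
criteria <- "A"
single <- df[rowSums(df[-1] == criteria) >= 2, ]

# Several criteria: %in% is applied column by column via sapply
criteria2 <- c("A", "B")
multi <- df[rowSums(sapply(df[-1], `%in%`, criteria2)) >= 2, ]

single$x
multi$x
```

With both "A" and "B" accepted, every row of this made-up data has at least two matching cells, so the second subset keeps all four rows.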