R: Check if value from dataframe is within range other dataframe
Here's a simple base method that uses the OP's logic:
f <- function(vec, id) {
  if(length(.x <- which(vec >= x$from & vec <= x$to & id == x$number))) .x else NA
}
y$name <- x$name[mapply(f, y$location, y$id_number)]
y
# location id_number name
#1 1.5 30 region 1
#2 2.8 30 region 2
#3 10.0 38 <NA>
#4 3.5 40 <NA>
#5 2.0 36 region 7
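The same lookup can also be sketched without a per-row function call, by joining on the id first and then filtering on the range. The contents of x below are hypothetical (the question's lookup table is not shown here), chosen only so that the result matches the output above; the helper column row is likewise an assumption added to restore the original order.

```r
# Hypothetical lookup table of ranges (not the OP's actual data)
x <- data.frame(number = c(30, 30, 36),
                from   = c(1, 2.1, 1),
                to     = c(2, 3, 3),
                name   = c("region 1", "region 2", "region 7"),
                stringsAsFactors = FALSE)
y <- data.frame(location  = c(1.5, 2.8, 10, 3.5, 2),
                id_number = c(30, 30, 38, 40, 36))

# Remember the original row order, because merge() may reorder rows
y$row <- seq_len(nrow(y))

# Pair every y row with every x row that has the same id
cand <- merge(y, x, by.x = "id_number", by.y = "number")

# Keep only the pairs where location falls inside [from, to]
hit <- cand[cand$location >= cand$from & cand$location <= cand$to, ]

# Write the first matching name back; rows without a match become NA
y$name <- hit$name[match(y$row, hit$row)]
y$row <- NULL
y
```

If several ranges match one row, match() keeps only the first, which may or may not be what you want.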
Check if column value from one dataframe is in between (range) of two other columns of second dataframe
I believe this gives you your desired result:
library(dplyr)

df1 %>%
  left_join(df2 %>% rename_at(vars(Start, End, sample_id), paste0, "_2")) %>%
  mutate(sample_id_new = case_when(Start < End_2 & Start > Start_2 ~ sample_id_2)) %>%
  select(Chr, Start, End, Gene, sample_id, sample_id_new)
Output:
Chr Start End Gene sample_id sample_id_new
1 1 15 15 gene1 ss6 ss1
2 1 120 130 gene2 ss7 ss1
3 2 210 210 gene3 ss9 <NA>
4 3 210 210 gene3 ss9 ss1
5 4 450 450 gene3 ss10 <NA>
Check if a value in a dataframe is conditionally between a range of values specified by two columns of another dataframe
Use a non-equi join with data.table. Convert the first dataset to a data.table (setDT) and create the filter column with logical FALSE values. Then do a non-equi join and assign (:=) to filter, which changes FALSE to TRUE only where the condition (abs(weight - th_weight) < 2) is met:
library(data.table)
setDT(df1)[, filter := FALSE]
df1[df2, filter := abs(weight - th_weight) < 2,
on = .(low <= th_weight, high >= th_weight)]
Output:
> df1
weight low high filter
<num> <num> <num> <lgcl>
1: 94.99610 94.99608 94.99613 TRUE
2: 95.00561 95.00558 95.00566 FALSE
Data:
df1 <- structure(list(weight = c(94.9961, 95.00561), low = c(94.99608,
95.00558), high = c(94.99613, 95.00566)), class = "data.frame", row.names = c(NA,
-2L))
df2 <- structure(list(index = 1:5, th_weight = c(94.996092, 95.496336,
95.509906, 97.473292, 100.51906)), class = "data.frame", row.names = c(NA,
-5L))
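To see what the join condition is doing, here is a rough base R sketch of the same test using outer. It is not the data.table method itself, only an illustration of the logic on the data above; the names in_range and close are made up for this example.

```r
df1 <- data.frame(weight = c(94.9961, 95.00561),
                  low    = c(94.99608, 95.00558),
                  high   = c(94.99613, 95.00566))
th_weight <- c(94.996092, 95.496336, 95.509906, 97.473292, 100.51906)

# Rows correspond to df1 rows, columns to th_weight values
in_range <- outer(df1$low, th_weight, "<=") & outer(df1$high, th_weight, ">=")
close    <- abs(outer(df1$weight, th_weight, "-")) < 2

# A row is flagged if any th_weight satisfies both conditions
df1$filter <- rowSums(in_range & close) > 0
df1
```

This builds a full nrow(df1) by length(th_weight) matrix, so it is only reasonable for small inputs; the non-equi join avoids that.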
Check if column value is in between (range) of two other column values
We can loop over each x$number using sapply, check whether it lies in the range of any of y$number1 and y$number2, and assign the value accordingly.
x$found <- ifelse(sapply(x$number, function(p)
any(y$number1 <= p & y$number2 >= p)),"YES", NA)
x
# id number found
#1 1 5225 YES
#2 2 2222 <NA>
#3 3 3121 YES
Using the same logic but with replace
x$found <- replace(x$found,
sapply(x$number, function(p) any(y$number1 <= p & y$number2 >= p)), "YES")
EDIT
If we also want to compare the id value, we could do:
x$found <- ifelse(sapply(seq_along(x$number), function(i) {
inds <- y$number1 <= x$number[i] & y$number2 >= x$number[i]
any(inds) & (x$id[i] == y$id[which.max(inds)])
}), "YES", NA)
x$found
#[1] "YES" NA "YES"
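If the ranges happen to be sorted by their start and do not overlap, the per-element loop can be replaced by findInterval. This is only a sketch under that assumption, and the y used here is hypothetical since the question's ranges are not shown.

```r
x <- data.frame(id = 1:3, number = c(5225, 2222, 3121))

# Hypothetical ranges: sorted by number1 and non-overlapping (assumption)
y <- data.frame(number1 = c(3000, 5000), number2 = c(3500, 5500))

# Index of the last range whose start is <= the value (0 if there is none)
idx <- findInterval(x$number, y$number1)

# The value is inside a range only if it also does not exceed that range's end;
# pmax() keeps the index valid when idx is 0, and idx > 0 masks those cases
in_range <- idx > 0 & x$number <= y$number2[pmax(idx, 1)]

x$found <- ifelse(in_range, "YES", NA)
x
```

With overlapping or unsorted ranges this shortcut is not valid, and the sapply version above is the safer choice.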
Check if value in a dataframe is between two values in another dataframe
For a loop solution, use:
for v in df['volumne']:
    df3 = df2[(df2['range_low'] < v) & (df2['range_high'] > v)]
    print(df3)
For a non-loop solution, a cross join is possible, but with large DataFrames it may run into memory problems:
df = df.assign(a=1).merge(df2.assign(a=1), on='a', how='outer')
print (df)
volumne a range_low range_high price
0 11 1 10 20 1
1 11 1 21 30 2
2 24 1 10 20 1
3 24 1 21 30 2
4 30 1 10 20 1
5 30 1 21 30 2
df3 = df[(df['range_low'] < df['volumne']) & (df['range_high'] > df['volumne'])]
print (df3)
volumne a range_low range_high price
0 11 1 10 20 1
3 24 1 21 30 2
How to check if a column value is within a range of another two for each row in data table
This works idiomatically in data.table:
dat[, inConf := ifelse(true >= low & true <= up, TRUE, FALSE)]
# alternatively, with 0/1
dat[, inConf := ifelse(true >= low & true <= up, 1, 0)]
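Since the comparison already returns a logical vector, the ifelse wrapper is optional. A small base R sketch of the row-wise check, with made-up values for dat (the column names true, low and up come from the answer, and inConf01 is a name invented here):

```r
# Hypothetical data; only the column names are taken from the answer above
dat <- data.frame(true = c(1, 5, 10),
                  low  = c(0, 6, 9),
                  up   = c(2, 8, 10))

# Element-wise comparison: one TRUE/FALSE per row, bounds inclusive
dat$inConf <- dat$true >= dat$low & dat$true <= dat$up

# 0/1 version, if integers are preferred over logicals
dat$inConf01 <- as.integer(dat$inConf)
dat
```

The same expression should work unchanged on the right-hand side of := inside a data.table.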
R: find rows in data frame within range of each other across multiple columns
You can check all the conditions using outer to form a logical matrix (remembering to exclude the self-matching diagonal), and apply the result to subset the ID column, pasting the result together into strings:
df$ID.matches <- apply(outer(df$lat, df$lat, function(x, y) abs(x - y) < 1) &
                       outer(df$long, df$long, function(x, y) abs(x - y) < 1) &
                       outer(df$score, df$score, function(x, y) abs(x - y) < 0.7) &
                       diag(nrow(df)) == 0,
                       MARGIN = 1,
                       function(x) paste(df$ID[x], collapse = ", "))
df
#> ID lat long score ID.matches
#> 1 1 41.5 -62.3 22.4 3, 7
#> 2 2 41.0 -70.2 21.9
#> 3 3 42.2 -63.0 22.7 1
#> 4 4 36.7 -72.9 20.0
#> 5 5 36.2 -62.4 24.1 6
#> 6 6 35.8 -61.7 24.7 5
#> 7 7 40.8 -61.9 22.1 1
Created on 2020-07-07 by the reprex package (v0.3.0)
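To make the intermediate matrix visible, the same idea can be sketched in smaller steps. The data are retyped from the output above; the helper near and the objects m and pairs are names made up for this illustration.

```r
df <- data.frame(ID    = 1:7,
                 lat   = c(41.5, 41.0, 42.2, 36.7, 36.2, 35.8, 40.8),
                 long  = c(-62.3, -70.2, -63.0, -72.9, -62.4, -61.7, -61.9),
                 score = c(22.4, 21.9, 22.7, 20.0, 24.1, 24.7, 22.1))

# TRUE where two rows differ by less than tol in the given column
near <- function(v, tol) abs(outer(v, v, "-")) < tol

# m[i, j] is TRUE when rows i and j are close in all three columns
m <- near(df$lat, 1) & near(df$long, 1) & near(df$score, 0.7)
diag(m) <- FALSE  # a row always matches itself, so drop the diagonal

# Each matching pair once, as (row, col) indices into df
pairs <- which(m & upper.tri(m), arr.ind = TRUE)
pairs
```

Like the one-liner above, this creates an n by n matrix per column, so memory grows quadratically with the number of rows.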