Find which interval row in a data frame that each element of a vector belongs in
Here's a possible solution using the "non-equi" joins introduced in data.table v1.9.8. While I doubt you'll like the syntax, it should be a very efficient solution.
Also, regarding findInterval: this function assumes that your intervals are contiguous, which isn't the case here, so I doubt there is a straightforward solution using it.
library(data.table) #v1.10.0
setDT(intervals)[data.table(elements), on = .(start <= elements, end >= elements)]
# phase start end
# 1: a 0.1 0.1
# 2: a 0.2 0.2
# 3: a 0.5 0.5
# 4: NA 0.9 0.9
# 5: b 1.1 1.1
# 6: b 1.9 1.9
# 7: c 2.1 2.1
Regarding the above code, I find it pretty self-explanatory: join intervals and elements by the condition specified in the on argument. That's pretty much it.
There is one caveat, though: start, end and elements should all be of the same type, so if one of them is integer, it should be converted to numeric first.
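For a reproducible run, here is a minimal sketch of the same join. The intervals and elements values are hypothetical, chosen only to be consistent with the output shown above, and start is deliberately created as integer to illustrate the type caveat. It assumes data.table is installed.

```r
library(data.table)

# Hypothetical data: three non-contiguous intervals and a vector to look up
intervals <- data.table(
  phase = c("a", "b", "c"),
  start = c(0L, 1L, 2L),      # integer on purpose
  end   = c(0.5, 1.9, 2.5)    # double
)
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)

# Bring all join columns to the same type before joining
intervals[, start := as.numeric(start)]

# Non-equi join: one row per element; phase is NA when no interval matches
res <- intervals[data.table(elements), on = .(start <= elements, end >= elements)]

# In the join result, start and end both hold the looked-up element value,
# so pair the phases with the original elements for a cleaner table
lookup <- data.table(elements = elements, phase = res$phase)
print(lookup)
```

Because the intervals here do not overlap, each element matches at most one row, so the result keeps the order and length of elements.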
finding vector value belonging to each interval
Try:
> split(x, findInterval(x, y))
$`0`
[1] 1 2
$`1`
[1] 3.5 4.0 6.0
$`2`
[1] 7.5 8.0 9.0 10.0 11.5 12.0
Here's what happens when we change y:
> y = c(2.5, 6.5, 10.5)
> split(x, findInterval(x, y))
$`0`
[1] 1 2
$`1`
[1] 3.5 4.0 6.0
$`2`
[1] 7.5 8.0 9.0 10.0
$`3`
[1] 11.5 12.0
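The answer above does not show how x was defined. As a sketch, the following uses a hypothetical x that is consistent with the printed output, together with the second y, so the grouping can be run end to end.

```r
# Hypothetical x, chosen to match the output shown above
x <- c(1, 2, 3.5, 4, 6, 7.5, 8, 9, 10, 11.5, 12)
y <- c(2.5, 6.5, 10.5)

# findInterval returns, for each x, how many break points in y are <= x;
# split then groups the values of x by that count
groups <- split(x, findInterval(x, y))
print(groups)
```

Group "0" holds the values below the first break, and group "3" holds the values at or above the last one.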
How to get a vector which identify to which intervals the elements belong in R
You can also use cut.
x <- c(1,4,12,13,18,24)
interval.vector <- c(1,7,13,19,25)
x.cut <- cut(x, breaks = interval.vector, include.lowest = TRUE)
data.frame(x, x.cut, group = as.numeric(x.cut))
x x.cut group
1 1 [1,7] 1
2 4 [1,7] 1
3 12 (7,13] 2
4 13 (7,13] 2
5 18 (13,19] 3
6 24 (19,25] 4
Another option is the very efficient findInterval function, but I'm not sure how robust this solution is across different variations of x:
findInterval(x, interval.vector + 1L, all.inside = TRUE)
## [1] 1 1 2 2 3 4
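As a possible alternative to shifting the breaks by 1, findInterval has a left.open argument (available since R 3.3.0) that switches to right-closed intervals. The sketch below is not part of the original answer; it compares the result against the cut call shown earlier, using the same x and interval.vector.

```r
x <- c(1, 4, 12, 13, 18, 24)
interval.vector <- c(1, 7, 13, 19, 25)

# left.open = TRUE makes the intervals (lo, hi]; in that mode
# rightmost.closed = TRUE closes the leftmost interval, giving [1, 7]
grp_fi <- findInterval(x, interval.vector,
                       left.open = TRUE, rightmost.closed = TRUE)

# Same grouping via cut, for comparison
grp_cut <- as.integer(cut(x, breaks = interval.vector, include.lowest = TRUE))

print(grp_fi)
print(identical(grp_fi, grp_cut))
```

Unlike the + 1L shift, this does not rely on x being whole numbers.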
Check in which interval lies all values in vector R
One option is to bin the data into quantiles, where the number of quantiles is set based on the maximum number of values allowed in a given interval. Below is an example. Let me know if this is what you had in mind:
# Fake data
set.seed(1)
dat = data.frame(x=rnorm(83, 10, 5))
# Cut into intervals containing no more than n values
n = 5
dat$x.bin = cut(dat$x, quantile(dat$x, seq(0,1,length=ceiling(nrow(dat)/n)+1)),
include.lowest=TRUE)
# Check
table(dat$x.bin)
[-1.07,3.62] (3.62,5.87] (5.87,6.7] (6.7,7.29] (7.29,8.2] (8.2,9.32] (9.32,9.72]
5 5 5 5 5 4 5
(9.72,9.97] (9.97,10.8] (10.8,11.7] (11.7,12.1] (12.1,12.9] (12.9,13.5] (13.5,14]
5 5 5 5 4 5 5
(14,15.5] (15.5,17.4] (17.4,22]
5 5 5
To implement @LorenzoBusetto's suggestion, you could do the following. This method ensures that every interval except possibly the last contains exactly n values:
dat = dat[order(dat$x),]
dat$x.bin = 0:(nrow(dat)-1) %/% n
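A quick way to convince yourself of the bin sizes is to tabulate the result. This sketch regenerates the fake data from above so it runs on its own; drop = FALSE is only needed here because the data frame has a single column at the point of sorting.

```r
set.seed(1)
dat <- data.frame(x = rnorm(83, 10, 5))
n <- 5

# Sort by x, then number the rows 0, 1, 2, ... and integer-divide by n
dat <- dat[order(dat$x), , drop = FALSE]
dat$x.bin <- (seq_len(nrow(dat)) - 1) %/% n

# 83 rows with n = 5 gives 16 full bins and one final bin of 3
counts <- table(dat$x.bin)
print(counts)
```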
multiply each row of a dataframe by its vector R
You may do this in a single mutate statement, using dplyr's powerful cur_data() (deprecated since dplyr 1.1.0 in favour of pick()):
set.seed(2021)
x <- data.frame(age = c("one", "two", "three", "four", "five","one", "two", "three", "four", "five"),
replicate(10,sample(0:5,5,rep=TRUE)),
time = c("one", "two", "three", "four", "five","one", "two", "three", "four", "five"),
vector = c("1-2-9-4-5-1-5-6-1-2",
"3-2-3-4-5-2-6-6-1-2",
"1-2-4-4-2-4-5-4-2-1",
"9-2-3-1-5-5-5-3-1-2",
"1-1-3-4-5-1-5-6-3-2"))
library(tidyverse)
x %>% mutate(select(cur_data(), starts_with('X')) * t(map_dfc(strsplit(vector, '-'), as.numeric)))
#> age X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 time vector
#> 1 one 5 10 36 4 25 1 20 12 1 0 one 1-2-9-4-5-1-5-6-1-2
#> 2 two 15 10 0 8 5 4 0 6 4 0 two 3-2-3-4-5-2-6-6-1-2
#> 3 three 1 4 12 12 6 12 25 20 10 5 three 1-2-4-4-2-4-5-4-2-1
#> 4 four 27 10 6 4 20 20 5 15 2 8 four 9-2-3-1-5-5-5-3-1-2
#> 5 five 3 5 9 8 25 5 10 30 3 6 five 1-1-3-4-5-1-5-6-3-2
#> 6 one 5 10 36 4 25 1 20 12 1 0 one 1-2-9-4-5-1-5-6-1-2
#> 7 two 15 10 0 8 5 4 0 6 4 0 two 3-2-3-4-5-2-6-6-1-2
#> 8 three 1 4 12 12 6 12 25 20 10 5 three 1-2-4-4-2-4-5-4-2-1
#> 9 four 27 10 6 4 20 20 5 15 2 8 four 9-2-3-1-5-5-5-3-1-2
#> 10 five 3 5 9 8 25 5 10 30 3 6 five 1-1-3-4-5-1-5-6-3-2
or even using across, as G. Grothendieck has suggested (that would eliminate the use of cur_data()):
x %>% mutate(across(starts_with('X')) * t(map_dfc(strsplit(vector, '-'), as.numeric)))
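If you would rather avoid the tidyverse dependency, the same idea can be sketched in base R: split each string into numbers, stack them into a matrix with one row per data frame row, and multiply element-wise. The small data frame below is hypothetical and only there to keep the example short.

```r
# Hypothetical toy data: two numeric columns and a dash-separated multiplier
x <- data.frame(
  X1 = c(1, 2),
  X2 = c(3, 4),
  vector = c("2-10", "3-5"),
  stringsAsFactors = FALSE
)

# One row of multipliers per row of x
m <- do.call(rbind, lapply(strsplit(x$vector, "-", fixed = TRUE), as.numeric))

# Multiply the X columns element-wise by the multiplier matrix
num_cols <- startsWith(names(x), "X")
x[num_cols] <- as.matrix(x[num_cols]) * m
print(x)
```

This assumes every string contains exactly as many numbers as there are X columns.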
Find interval from data frame start and end points
You can use a ready-built function or create your own:
findInt <- function(value, start, end) {
start < value & end > value
}
indx <- findInt(81, DF$Start, DF$End)
DF$Result[indx]
#[1] "FL91" "FL12"
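Since DF is not shown in the answer, here is a self-contained sketch with a hypothetical DF whose values were picked to give the same result. Note that the strict inequalities exclude values that sit exactly on an interval boundary.

```r
findInt <- function(value, start, end) {
  # TRUE for every interval that strictly contains value
  start < value & end > value
}

# Hypothetical data, not from the original question
DF <- data.frame(
  Start  = c(10, 50, 80, 70),
  End    = c(40, 90, 120, 75),
  Result = c("FL01", "FL91", "FL12", "FL33"),
  stringsAsFactors = FALSE
)

hits <- DF$Result[findInt(81, DF$Start, DF$End)]
print(hits)

# 80 sits on the lower boundary of the third interval, so only one match remains
on_boundary <- DF$Result[findInt(80, DF$Start, DF$End)]
print(on_boundary)
```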
Check whether elements of vectors are inside intervals given by matrix
We can use sapply to loop over each element of x and check whether it lies in the range of any of the matrix rows.
x[sapply(x, function(i) any(i > A[, 1] & i < A[,2]))]
#[1] 4 15
If length(x) and nrow(A) are the same, and each x[i] only needs to be checked against row i of A, we don't even need the sapply loop and can use the comparison directly.
x[x > A[, 1] & x < A[,2]]
#[1] 4 15
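The answer does not show x or A, so the sketch below uses hypothetical values that reproduce the printed result. It also shows a loop-free variant built on outer, which checks every element against every row regardless of whether the lengths match.

```r
# Hypothetical data: each row of A is an open interval (lower, upper)
A <- matrix(c(1, 5,
              10, 20), ncol = 2, byrow = TRUE)
x <- c(0, 4, 7, 15, 25)

# Loop version: is x[i] strictly inside any row of A?
via_sapply <- x[sapply(x, function(i) any(i > A[, 1] & i < A[, 2]))]

# Vectorised version: a length(x) by nrow(A) logical matrix of memberships
inside <- outer(x, A[, 1], ">") & outer(x, A[, 2], "<")
via_outer <- x[rowSums(inside) > 0]

print(via_sapply)
print(via_outer)
```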
Calculation of distribution using data.table
This assumes that N refers to the number of pieces per bin and not the number of rows. There is probably a shorter way without creating an index, but here is one where you group the rows first and then sum:
setorder(dt, Price)
dt[,GROUP:=ceiling(seq_along(Price)/5)][,
list(PriceRange=paste(range(Price), collapse=" - "),
Volume=sum(Volume)),
by="GROUP"]
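To see the grouping at work, here is a small run on a hypothetical dt (the original table is not shown). With 12 rows and 5 rows per group, the last group ends up with only 2 rows. It assumes data.table is installed.

```r
library(data.table)

# Hypothetical prices and volumes
dt <- data.table(
  Price  = c(10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31),
  Volume = 1:12
)

setorder(dt, Price)

# Rows 1-5 get GROUP 1, rows 6-10 get GROUP 2, and so on
dt[, GROUP := ceiling(seq_along(Price) / 5)]

res <- dt[, list(PriceRange = paste(range(Price), collapse = " - "),
                 Volume = sum(Volume)),
          by = "GROUP"]
print(res)
```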
EDIT after OP's comments
If you want bands of equal width, you can use this:
dt[, sum(Volume), by=cut(Price, 5)]
If you want to show all bands, including empty ones, you can use this:
dt[,Band:=cut(Price, 5)]
dt[dt[, list(Band=levels(Band))], on="Band"][, sum(Volume, na.rm=TRUE), by="Band"]
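The difference between the two variants only shows up when some bands contain no rows. The following base R sketch (deliberately not data.table, and using hypothetical values) illustrates that: tapply keeps every factor level produced by cut, so empty bands appear and can be filled with 0.

```r
# Hypothetical data with a gap between 3 and 9
Price  <- c(1, 2, 3, 9, 10)
Volume <- c(10, 20, 30, 40, 50)

# Five equal-width bands over the observed price range
band <- cut(Price, 5)

# Sum per band; bands with no rows come back as NA
vol <- tapply(Volume, band, sum)
vol[is.na(vol)] <- 0
print(vol)
```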
HTH
Check if column value is in between (range) of two other column values
We can loop over each x$number using sapply, check whether it lies in the range of any of y$number1 and y$number2, and assign the value accordingly.
x$found <- ifelse(sapply(x$number, function(p)
any(y$number1 <= p & y$number2 >= p)),"YES", NA)
x
# id number found
#1 1 5225 YES
#2 2 2222 <NA>
#3 3 3121 YES
Using the same logic but with replace:
x$found <- replace(x$found,
sapply(x$number, function(p) any(y$number1 <= p & y$number2 >= p)), "YES")
EDIT
If we also want to compare the id value, we could do:
x$found <- ifelse(sapply(seq_along(x$number), function(i) {
inds <- y$number1 <= x$number[i] & y$number2 >= x$number[i]
any(inds) & (x$id[i] == y$id[which.max(inds)])
}), "YES", NA)
x$found
#[1] "YES" NA "YES"
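For a self-contained run of the first approach, the sketch below uses the x shown in the output above together with a hypothetical y (the original y is not shown). vapply is used in place of sapply only to make the logical return type explicit.

```r
x <- data.frame(id = 1:3, number = c(5225, 2222, 3121))

# Hypothetical ranges, not from the original question
y <- data.frame(id = c(1, 3),
                number1 = c(5000, 3000),
                number2 = c(5500, 3200))

# For each number, is it inside at least one [number1, number2] range?
in_range <- vapply(x$number,
                   function(p) any(y$number1 <= p & y$number2 >= p),
                   logical(1))

x$found <- ifelse(in_range, "YES", NA)
print(x)
```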
Assigning interval number to numbers in vector
Here's one idea:
library(data.table)
setDT(df)
df[.(start = x), on="start", roll=Inf][start > end, id := NA_integer_]$id
[1] NA 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 NA 3
[20] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[39] 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5
[58] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
[77] 5 5 5 5 5 5 5 5 5 5 5 5 5 5
I'm not sure if this has the desired output, though, since none was given explicitly in the OP.
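To make the rolling join easier to follow, here is a small sketch with a hypothetical df and x (neither is shown in the answer). The join rolls each value of x back to the nearest start at or below it, and the second step blanks out matches where the value lies past that interval's end. It assumes data.table is installed.

```r
library(data.table)

# Hypothetical intervals with gaps between them
df <- data.table(id = 1:3,
                 start = c(2, 7, 19),
                 end   = c(6, 17, 39))
x <- c(1, 2, 6, 7, 18, 19)

# In the result, the start column holds the values of x
res <- df[.(start = x), on = "start", roll = Inf]

# A value beyond the matched interval's end fell into a gap
res[start > end, id := NA_integer_]
print(res$id)
```

Here 1 precedes every interval and 18 falls between the second and third, so both get NA.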