Find Which Interval Row in a Data Frame That Each Element of a Vector Belongs In

Here's a possible solution using the new "non-equi" joins in data.table (v>=1.9.8). While I doubt you'll like the syntax, it should be a very efficient solution.

Also, regarding findInterval: it assumes your intervals are contiguous, which isn't the case here, so I doubt there is a straightforward solution using it.

library(data.table) #v1.10.0
# Non-equi join: for each element, find the interval rows with start <= element <= end
setDT(intervals)[data.table(elements), on = .(start <= elements, end >= elements)]
# phase start end
# 1: a 0.1 0.1
# 2: a 0.2 0.2
# 3: a 0.5 0.5
# 4: NA 0.9 0.9
# 5: b 1.1 1.1
# 6: b 1.9 1.9
# 7: c 2.1 2.1

Regarding the above code, I find it pretty self-explanatory: join intervals and elements by the conditions specified in the on argument. That's pretty much it.

There is one caveat, though: start, end and elements should all be of the same type, so if one of them is integer, convert it to numeric first.
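For readers without data.table, here is a minimal base-R sketch of the same lookup. It is a swapped-in technique, not the join above, and the intervals and elements data are hypothetical (an intervals table with phase, start and end columns, chosen so that 0.9 falls in a gap). One difference to keep in mind: the join returns one row per matching interval, while this helper keeps only the first match when intervals overlap.

```r
# Hypothetical data: a gap between 0.5 and 1, so 0.9 matches no interval.
intervals <- data.frame(phase = c("a", "b", "c"),
                        start = c(0, 1, 2),
                        end   = c(0.5, 2, 3),
                        stringsAsFactors = FALSE)
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)

# Hypothetical helper: phase of the first interval with start <= e <= end,
# or NA when no interval contains e.
lookup_phase <- function(x, iv) {
  vapply(x, function(e) {
    hit <- which(iv$start <= e & iv$end >= e)
    if (length(hit) == 0) NA_character_ else iv$phase[hit[1]]
  }, character(1))
}

phase <- lookup_phase(elements, intervals)
data.frame(elements, phase)
```

The loop makes one pass over the intervals per element, so for large inputs the data.table join is likely to be much faster.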

finding vector value belonging to each interval

Try:

> split(x, findInterval(x, y))
$`0`
[1] 1 2

$`1`
[1] 3.5 4.0 6.0

$`2`
[1] 7.5 8.0 9.0 10.0 11.5 12.0

Here's what happens when we change y

> y = c(2.5, 6.5, 10.5)
> split(x, findInterval(x, y))
$`0`
[1] 1 2

$`1`
[1] 3.5 4.0 6.0

$`2`
[1] 7.5 8.0 9.0 10.0

$`3`
[1] 11.5 12.0
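findInterval(x, y) returns, for each value, the index i with y[i] <= x < y[i + 1], and 0 for values below y[1], which is why the group names start at `0`. Because split() only sees the indices that actually occur, intervals that contain no values disappear. A minimal sketch of keeping them, using the x values printed above and hypothetical breaks in y (the stretch from 6.5 to 6.8 is deliberately empty):

```r
# x as printed in the output above; y is hypothetical.
x <- c(1, 2, 3.5, 4, 6, 7.5, 8, 9, 10, 11.5, 12)
y <- c(2.5, 6.5, 6.8, 10.5)

# There are length(y) + 1 possible groups: 0, 1, ..., length(y).
grp <- factor(findInterval(x, y), levels = 0:length(y))

# With all levels declared, split() also returns the empty group "2".
res <- split(x, grp)
res
```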

How to get a vector which identify to which intervals the elements belong in R

You can also use cut.

x <- c(1,4,12,13,18,24)
interval.vector <- c(1,7,13,19,25)
x.cut <- cut(x, breaks = interval.vector, include.lowest = TRUE)

data.frame(x, x.cut, group = as.numeric(x.cut))

x x.cut group
1 1 [1,7] 1
2 4 [1,7] 1
3 12 (7,13] 2
4 13 (7,13] 2
5 18 (13,19] 3
6 24 (19,25] 4

Another option is the very efficient findInterval function, but I'm not sure how robust this solution is for other values of x:

findInterval(x, interval.vector + 1L, all.inside = TRUE)
## [1] 1 1 2 2 3 4
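The robustness concern can be made concrete. cut() builds right-closed intervals (a, b], while findInterval() by default builds left-closed ones [a, b), so shifting the breaks by 1L only lines the two up when x holds integers. A sketch, with hypothetical values in x2, comparing the shift with findInterval()'s left.open and rightmost.closed arguments:

```r
x2 <- c(1, 4, 7, 7.5, 12, 13, 18, 24, 25)   # hypothetical, includes 7.5 and break points
interval.vector <- c(1, 7, 13, 19, 25)

by_cut <- as.integer(cut(x2, breaks = interval.vector, include.lowest = TRUE))

# left.open = TRUE gives intervals (a, b] like cut(); combined with left.open,
# rightmost.closed = TRUE closes the leftmost interval, like include.lowest = TRUE.
by_fi <- findInterval(x2, interval.vector, left.open = TRUE, rightmost.closed = TRUE)

# The shifted breaks put 7.5 in group 1, although cut() puts it in (7,13].
by_shift <- findInterval(x2, interval.vector + 1L, all.inside = TRUE)

rbind(x2, by_cut, by_fi, by_shift)
```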

Check in which interval lies all values in vector R

One option is to bin the data into quantiles, where the number of quantiles is set based on the maximum number of values allowed in a given interval. Below is an example. Let me know if this is what you had in mind:

# Fake data
set.seed(1)
dat = data.frame(x=rnorm(83, 10, 5))

# Cut into intervals containing no more than n values
n = 5
dat$x.bin = cut(dat$x, quantile(dat$x, seq(0, 1, length.out = ceiling(nrow(dat)/n) + 1)),
include.lowest=TRUE)

# Check
table(dat$x.bin)
[-1.07,3.62]  (3.62,5.87]   (5.87,6.7]   (6.7,7.29]   (7.29,8.2]   (8.2,9.32]  (9.32,9.72]
           5            5            5            5            5            4            5
 (9.72,9.97]  (9.97,10.8]  (10.8,11.7]  (11.7,12.1]  (12.1,12.9]  (12.9,13.5]    (13.5,14]
           5            5            5            5            4            5            5
   (14,15.5]  (15.5,17.4]    (17.4,22]
           5            5            5

To implement @LorenzoBusetto's suggestion, you could do the following. This method ensures that every interval except the last contains n values:

dat = dat[order(dat$x),]
dat$x.bin = 0:(nrow(dat)-1) %/% n
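The snippet above reorders dat. If the original row order matters, the same rank-based binning can be written back through order(); this is a sketch of that variant, not part of the suggestion itself:

```r
set.seed(1)
dat <- data.frame(x = rnorm(83, 10, 5))
n <- 5

# ord[1] is the row holding the smallest x, ord[2] the next smallest, ...
ord <- order(dat$x)

# The k-th smallest value (k counted from 0) goes to bin k %/% n; assigning
# through ord leaves the rows of dat in their original order.
dat$x.bin <- integer(nrow(dat))
dat$x.bin[ord] <- (seq_len(nrow(dat)) - 1L) %/% n

table(dat$x.bin)   # n values per bin, except possibly the last
```

With 83 rows and n = 5 this gives 16 bins of 5 and a final bin of 3.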

multiply each row of a dataframe by its vector R

You can do this in a single mutate() statement, using dplyr's cur_data(). (In dplyr 1.1.0 and later, cur_data() is deprecated in favour of pick().)

set.seed(2021)
x <- data.frame(age = c("one", "two", "three", "four", "five","one", "two", "three", "four", "five"),
replicate(10, sample(0:5, 5, replace = TRUE)),
time = c("one", "two", "three", "four", "five","one", "two", "three", "four", "five"),
vector = c("1-2-9-4-5-1-5-6-1-2",
"3-2-3-4-5-2-6-6-1-2",
"1-2-4-4-2-4-5-4-2-1",
"9-2-3-1-5-5-5-3-1-2",
"1-1-3-4-5-1-5-6-3-2"))

library(tidyverse)

x %>% mutate(select(cur_data(), starts_with('X')) * t(map_dfc(strsplit(vector, '-'), as.numeric)))

#> age X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 time vector
#> 1 one 5 10 36 4 25 1 20 12 1 0 one 1-2-9-4-5-1-5-6-1-2
#> 2 two 15 10 0 8 5 4 0 6 4 0 two 3-2-3-4-5-2-6-6-1-2
#> 3 three 1 4 12 12 6 12 25 20 10 5 three 1-2-4-4-2-4-5-4-2-1
#> 4 four 27 10 6 4 20 20 5 15 2 8 four 9-2-3-1-5-5-5-3-1-2
#> 5 five 3 5 9 8 25 5 10 30 3 6 five 1-1-3-4-5-1-5-6-3-2
#> 6 one 5 10 36 4 25 1 20 12 1 0 one 1-2-9-4-5-1-5-6-1-2
#> 7 two 15 10 0 8 5 4 0 6 4 0 two 3-2-3-4-5-2-6-6-1-2
#> 8 three 1 4 12 12 6 12 25 20 10 5 three 1-2-4-4-2-4-5-4-2-1
#> 9 four 27 10 6 4 20 20 5 15 2 8 four 9-2-3-1-5-5-5-3-1-2
#> 10 five 3 5 9 8 25 5 10 30 3 6 five 1-1-3-4-5-1-5-6-3-2

Or use across(), as G.Grothendieck has suggested, which eliminates the need for cur_data():

x %>% mutate(across(starts_with('X')) * t(map_dfc(strsplit(vector, '-'), as.numeric)))
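The same idea works in base R without the tidyverse: split each string, stack the pieces into a matrix with one row per data row, and multiply. This is a swapped-in base-R sketch on a hypothetical two-row data frame, not either of the dplyr answers:

```r
# Hypothetical data: X1, X2 hold values, 'vector' one multiplier per X column.
df <- data.frame(age = c("one", "two"),
                 X1 = c(1, 2),
                 X2 = c(3, 4),
                 vector = c("10-100", "2-3"),
                 stringsAsFactors = FALSE)

xcols <- grep("^X", names(df))

# One row of multipliers per data row (a 2 x 2 numeric matrix here).
mult <- do.call(rbind, lapply(strsplit(df$vector, "-"), as.numeric))

# Arithmetic between a data frame and a same-sized matrix is element-wise.
df[xcols] <- df[xcols] * mult
df
```

As in the dplyr version, each string must contain exactly as many numbers as there are X columns.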

Find interval from data frame start and end points

You can use a ready-built function or create your own:

findInt <- function(value, start, end) {
# TRUE for each row whose interval (start, end) strictly contains value
start < value & end > value
}

indx <- findInt(81, DF$Start, DF$End)
DF$Result[indx]
#[1] "FL91" "FL12"
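findInt() uses strict inequalities, so a value equal to Start or End is not matched. A minimal sketch of a hypothetical variant with a switch for closed intervals, run on hypothetical data (the real DF is not shown in the question):

```r
# Hypothetical variant of findInt() with an 'inclusive' switch.
findInt <- function(value, start, end, inclusive = FALSE) {
  if (inclusive) {
    start <= value & end >= value   # closed interval [start, end]
  } else {
    start < value & end > value     # open interval (start, end), as above
  }
}

# Hypothetical data with the same column names as DF.
DF <- data.frame(Start = c(0, 50, 80),
                 End = c(100, 90, 85),
                 Result = c("A", "B", "C"),
                 stringsAsFactors = FALSE)

DF$Result[findInt(85, DF$Start, DF$End)]                    # "A" "B"
DF$Result[findInt(85, DF$Start, DF$End, inclusive = TRUE)]  # "A" "B" "C"

# Several values at once: one vector of matches per value.
lapply(c(10, 85), function(v) DF$Result[findInt(v, DF$Start, DF$End)])
```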

Check whether elements of vectors are inside intervals given by matrix

We can use sapply to loop over each element of x and check whether it lies strictly inside any of the intervals given by the rows of A (start in column 1, end in column 2).

x[sapply(x, function(i) any(i > A[, 1] & i < A[,2]))]
#[1] 4 15

If length(x) equals nrow(A) and each element only needs to be checked against its own row of A, we don't need the sapply loop at all and can compare directly:

x[x > A[, 1] & x < A[,2]]
#[1] 4 15
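The two versions answer different questions, which is easy to miss when they happen to print the same result. A sketch with hypothetical A and x where they disagree:

```r
# Hypothetical data: each row of A is an open interval (A[, 1], A[, 2]).
A <- rbind(c(1, 5),
           c(10, 20))
x <- c(15, 4)

# Against any row: 15 is in (10, 20) and 4 is in (1, 5), so both are kept.
any_row <- x[sapply(x, function(i) any(i > A[, 1] & i < A[, 2]))]

# Row by row: 15 is compared only with (1, 5) and 4 only with (10, 20).
own_row <- x[x > A[, 1] & x < A[, 2]]

any_row   # 15 4
own_row   # numeric(0)
```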

Calculation of distribution using data.table

I'm assuming that N refers to the number of pieces per bin and not the number of rows. There is probably a shorter way that avoids creating an index, but here is one where you assign the groups first and then sum:

setorder(dt, Price)
# number the rows in Price order and put every 5 consecutive rows in one GROUP
dt[,GROUP:=ceiling(seq_along(Price)/5)][,
list(PriceRange=paste(range(Price), collapse=" - "),
Volume=sum(Volume)),
by="GROUP"]

EDIT after OP's comments

If you want bands of equal width, you can use this:

dt[, sum(Volume), by=cut(Price, 5)]

If you want to show all bands, you can use this

dt[,Band:=cut(Price, 5)]
# join all factor levels back in, so empty bands appear with a sum of 0
dt[dt[, list(Band=levels(Band))], on="Band"][, sum(Volume, na.rm=TRUE), by="Band"]

HTH
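For comparison, the "show all bands" case also has a short base-R form, since tapply() keeps every level of the grouping factor and its default argument fills empty cells. This is a swapped-in sketch on hypothetical Price and Volume vectors, not the data.table code above:

```r
# Hypothetical stand-ins for dt$Price and dt$Volume.
Price  <- c(1, 2, 3, 10)
Volume <- c(5, 5, 5, 7)

# cut(Price, 5): 5 equal-width bands over range(Price).
# tapply() returns one cell per band; default = 0 fills bands with no rows.
band_volume <- tapply(Volume, cut(Price, 5), sum, default = 0)
band_volume
```

Here the middle two bands contain no prices, so they come out as 0.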

Check if column value is in between (range) of two other column values

We can loop over each x$number using sapply, check whether it lies within any of the ranges defined by y$number1 and y$number2, and set the value accordingly.

x$found <- ifelse(sapply(x$number, function(p) 
any(y$number1 <= p & y$number2 >= p)),"YES", NA)
x

# id number found
#1 1 5225 YES
#2 2 2222 <NA>
#3 3 3121 YES

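The same logic, using replace() instead of ifelse() (starting from an all-NA vector, so it does not depend on an existing x$found column):

x$found <- replace(rep(NA_character_, nrow(x)),
sapply(x$number, function(p) any(y$number1 <= p & y$number2 >= p)), "YES")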

EDIT

If we want to also compare the id value we could do

x$found <- ifelse(sapply(seq_along(x$number), function(i) {
# ranges in y that contain x$number[i]
inds <- y$number1 <= x$number[i] & y$number2 >= x$number[i]
# YES only if one of those ranges also has the same id
any(inds & y$id == x$id[i])
}), "YES", NA)

x$found
#[1] "YES" NA "YES"
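Both checks can also be done without an explicit loop by building logical matrices with outer(). This is a swapped-in sketch on hypothetical x and y in the shape used above; note that it allocates nrow(x) by nrow(y) matrices, so it suits moderate sizes:

```r
# Hypothetical data in the same shape as x and y above.
x <- data.frame(id = 1:3, number = c(5225, 2222, 3121))
y <- data.frame(id = c(1, 3),
                number1 = c(5000, 3000),
                number2 = c(6000, 3500))

# in_range[i, j] is TRUE when x$number[i] lies in [y$number1[j], y$number2[j]].
in_range <- outer(x$number, y$number1, ">=") & outer(x$number, y$number2, "<=")

# Range check only.
x$found <- ifelse(rowSums(in_range) > 0, "YES", NA)

# Range check plus matching id.
same_id <- outer(x$id, y$id, "==")
x$found_id <- ifelse(rowSums(in_range & same_id) > 0, "YES", NA)
x
```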

Assigning interval number to numbers in vector

Here's one idea:

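library(data.table)
setDT(df)

# Rolling join: match each x to the last interval whose start <= x,
# then blank out the id when x lies past that interval's end
df[.(start = x), on="start", roll=Inf][start > end, id := NA_integer_]$id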

[1] NA 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 NA 3
[20] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[39] 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5
[58] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
[77] 5 5 5 5 5 5 5 5 5 5 5 5 5 5

I'm not sure if this has the desired output, though, since none was given explicitly in the OP.
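A roll = Inf join on start matches each x to the last start at or below it, which is what findInterval() computes. A base-R sketch of the same idea, assuming (hypothetically) that df has id, start and end columns, is sorted by start, and has non-overlapping intervals:

```r
# Hypothetical intervals, sorted by start and non-overlapping.
df <- data.frame(id = 1:3, start = c(2, 7, 20), end = c(5, 17, 30))
x <- c(1, 3, 6, 8, 18, 25, 31)

# Hypothetical helper mirroring the rolling join above.
interval_id <- function(x, df) {
  # Like roll = Inf: index of the last interval with start <= x (0 if none).
  idx <- findInterval(x, df$start)
  id <- rep(NA_integer_, length(x))
  ok <- idx > 0
  # Keep only matches where x does not run past that interval's end.
  ok[ok] <- x[ok] <= df$end[idx[ok]]
  id[ok] <- df$id[idx[ok]]
  id
}

interval_id(x, df)   # NA 1 NA 2 NA 3 NA
```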


