R: Select Values from Data Table in Range

R: Select values from data table in range

Construct some data

df <- data.frame( name=c("John",
"Adam"), date=c(3, 5) )

Extract exact matches:

subset(df, date==3)

name date
1 John 3

Extract matches in range:

subset(df, date>4 & date<6)

name date
2 Adam 5

The following syntax produces identical results:

df[df$date>4 & df$date<6, ]

name date
2 Adam 5

How to filter rows by column value ranges in R?

Here is a data.table approach

library(data.table)
# keep Gene that are not joined in the non-equi join on df1 below
df2[!Gene %in% df2[df1, on = .(Chromosome, Gene.Start >= Min, Gene.End <= Max)]$Gene, ]
# Gene Gene.Start Gene.End Chromosome
# 1: Gene2 950 990 1

Filtering by different ranges by group in data.table

Perhaps:

Y[range, K := TRUE, on = .(group, a >= start, a <= end)][!is.na(K),]
# a val group K
# <int> <num> <char> <lgcl>
# 1: 9 0.60189755 a TRUE
# 2: 5 0.99874081 a TRUE
# 3: 16 0.55512663 a TRUE
# 4: 4 0.42944396 a TRUE
# 5: 14 0.43101637 a TRUE
# 6: 3 0.47880269 a TRUE
# 7: 2 0.02220682 a TRUE
# 8: 6 0.63891131 a TRUE
# 9: 8 0.83470266 a TRUE
# 10: 17 0.98304402 a TRUE
# 11: 98 0.76785547 b TRUE
# 12: 94 0.30766574 b TRUE
# 13: 88 0.25814665 b TRUE
# 14: 89 0.49954639 b TRUE
# 15: 83 0.50892062 b TRUE
# 16: 95 0.49443856 b TRUE
# 17: 97 0.56695890 b TRUE
# 18: 87 0.98970989 b TRUE
# 19: 82 0.53190509 b TRUE
# 20: 100 0.59662376 b TRUE
# a val group K

There are other ways to do this, but they involve renaming or loss of information. For instance,

  • left-join range and Y, we lose a:

    Y[range, on = .(group, a >= start, a <= end)]
    # a val group a.1
    # <int> <num> <char> <int>
    # 1: 1 0.60189755 a 20
    # 2: 1 0.99874081 a 20
    # 3: 1 0.55512663 a 20
    # ...
    # 18: 80 0.98970989 b 100
    # 19: 80 0.53190509 b 100
    # 20: 80 0.59662376 b 100
    # a val group a.1

    The fix is to copy Y$a into a new variable and join on it instead:

    Y[,a1 := a][range, on = .(group, a1 >= start, a1 <= end)]
    # a val group a1 a1.1
    # <int> <num> <char> <int> <int>
    # 1: 9 0.60189755 a 1 20
    # 2: 5 0.99874081 a 1 20
    # 3: 16 0.55512663 a 1 20
    # ...
    # 18: 87 0.98970989 b 80 100
    # 19: 82 0.53190509 b 80 100
    # 20: 100 0.59662376 b 80 100
    # a val group a1 a1.1
  • left-join Y and range, we get a duplicated into start and end but no clear indicator to filter on:

    range[Y, on = .(group, start <= a, end >= a)]
    # group start end val
    # <char> <int> <int> <num>
    # 1: a 28 28 0.85026492
    # 2: a 80 80 0.23466126
    # 3: a 22 22 0.98816745
    # ...
    # 98: b 82 82 0.53190509
    # 99: b 100 100 0.59662376
    # 100: b 30 30 0.26388647
    # group start end val

    A remedy would be to copy in another field that would give us the indicator of merge that we need to be able to filter. But even with that we have to rename to regain a's data:

    range[, K := TRUE][Y, on = .(group, start <= a, end >= a)][ !is.na(K), ]
    # group start end K val
    # <char> <int> <int> <lgcl> <num>
    # 1: a 9 9 TRUE 0.60189755
    # 2: a 5 5 TRUE 0.99874081
    # 3: a 16 16 TRUE 0.55512663
    # ...
    # 18: b 87 87 TRUE 0.98970989
    # 19: b 82 82 TRUE 0.53190509
    # 20: b 100 100 TRUE 0.59662376
    # group start end K val

select records from data frame using a range as a variable

This would have to calculate the mean of my column "weight" when writing the following code in my function:

mean(table$weight[,id])

The comma here doesn't make sense. table$weight is a vector, which means it has only one dimension, not two. Hence you should use mean(table$weight[id]).

for(i in id){ table2 <- table[table$ID == i,] } 

followed by:

mean(table2$weight)

Notice that each time you loop inside the for function, you are replacing table2 by another one with a different row from table. To create a subset you can either use

table2 <- table[id,]

or

table2 <- subset(table, ID %in% id)

R: select range of columns in data.table


# (1)
DT[, c(paste0('year.', 1:3), 'ID'), with = F]

# (2)
DT[, year.1[1], by = ID]

mult is used when merging/joining two data.tables and signifies what to do when multiple matches exist. Therefore, as @Arun pointed out, the way to use mult for your 2nd question would be (given that you are already keyed by ID):

DT[J(unique(ID)), list(ID, year.1), mult = 'first']

R: selecting row values based on row range

Try:

df <- read.table(header=TRUE, text="V1   V2   V3   V4   max  min
1 3 6 8 7 5
23 30 5 17 30 16")

df.new<-apply(df[,1:4],2,function(x) ifelse(x>df[,5] | x<df[,6],NA,x))
df.new<-cbind(df.new,df[,5:6])
df.new$mean=rowMeans(df.new[1:4],na.rm=TRUE)
df.new

Subsetting data table by date range

If you have data.table, you can use as.IDate with %between% as

library(data.table)
setDT(df)

df[as.IDate(Date, "%d/%m/%Y") %between% as.IDate(c("2016-07-01","2019-06-30"))]

# Date Chla
# 1: 11/08/2016 0.0019
# 2: 2/12/2016 0.0013
# 3: 2/03/2017 0.0030
# 4: 6/06/2017 0.0030
# 5: 4/09/2017 0.0050
# 6: 6/12/2017 0.0000
# 7: 1/03/2018 0.0200
#...

You can also do this in base R

df$Date <- as.Date(df$Date, "%d/%m/%Y")
df[df$Date >= as.Date("2016-07-01") & df$Date <= as.Date("2019-06-30"), ]

Or with lubridate and dplyr without changing the original format of dates

library(dplyr)
library(lubridate)

df %>% filter(between(dmy(Date), date("2016-07-01"), date("2019-06-30")))

data

df <- structure(list(Date = structure(c(20L, 12L, 2L, 17L, 15L, 24L, 
23L, 25L, 1L, 5L, 21L, 16L, 3L, 22L, 7L, 8L, 10L, 19L, 18L, 9L,
6L, 11L, 13L, 14L, 4L), .Label = c("1/03/2018", "11/08/2016",
"11/09/2018", "12/04/2012", "12/06/2018", "13/05/2019", "13/11/2018",
"14/12/2018", "17/04/2019", "18/01/2019", "18/06/2019", "19/04/2016",
"19/07/2019", "19/08/2019", "2/03/2017", "2/08/2018", "2/12/2016",
"21/03/2019", "22/02/2019", "22/12/2015", "3/07/2018", "3/10/2018",
"4/09/2017", "6/06/2017", "6/12/2017"), class = "factor"), Chla = c(0.0084,
0.0036, 0.0019, 0.0013, 0.003, 0.003, 0.005, 0, 0.02, 0.09, 0.04,
0.026, 0.02, 0.02, 0.01, 0, 0, 0.05, 0, 0, 0.03, 0, 0.002, 0.0018,
0.012)), class = "data.frame", row.names = c(NA, -25L))

subset a dataframe in R within a specific time range

Here's a dplyr solution:

library(dplyr)
dataBase %>%
mutate(date = as.Date(date, format = "%d/%m/%Y")) %>%
filter(date >= "2020-07-30" & date <= "2020-08-30")
a date
V12 -0.23017749 2020-08-28
V13 1.55870831 2020-08-01
V21 0.07050839 2020-08-27
V32 -1.26506123 2020-08-01
V41 -0.44566197 2020-08-28
V43 0.35981383 2020-08-01
V52 0.11068272 2020-08-01

Data:

set.seed(123)
dataBase <- data.frame(a = rnorm(15), date = unlist(read.table(text = '"30/06/2020" "27/08/2020" "30/06/2020" "28/08/2020" "30/06/2020"
"28/08/2020" "30/06/2020" "01/08/2020" "30/06/2020" "01/08/2020"
"01/08/2020" "30/06/2020" "30/06/2020" "01/08/2020" "30/06/2019"')))


Related Topics



Leave a reply



Submit