R: Select Values from Data Table in Range

R: Select values from data table in range

Construct some data

df <- data.frame( name=c("John",
"Adam"), date=c(3, 5) )

Extract exact matches:

subset(df, date==3)

  name date
1 John    3

Extract matches in range:

subset(df, date>4 & date<6)

  name date
2 Adam    5

The following syntax produces identical results:

df[df$date>4 & df$date<6, ]

  name date
2 Adam    5

How to filter rows by column value ranges in R?

Here is a data.table approach

library(data.table)
# keep Gene that are not joined in the non-equi join on df1 below
df2[!Gene %in% df2[df1, on = .(Chromosome, Gene.Start >= Min, Gene.End <= Max)]$Gene, ]
#     Gene Gene.Start Gene.End Chromosome
# 1: Gene2        950      990          1

Filtering by different ranges by group in data.table

Perhaps:

Y[range, K := TRUE, on = .(group, a >= start, a <= end)][!is.na(K),]
#         a        val  group      K
#     <int>      <num> <char> <lgcl>
#  1:     9 0.60189755      a   TRUE
#  2:     5 0.99874081      a   TRUE
#  3:    16 0.55512663      a   TRUE
#  4:     4 0.42944396      a   TRUE
#  5:    14 0.43101637      a   TRUE
#  6:     3 0.47880269      a   TRUE
#  7:     2 0.02220682      a   TRUE
#  8:     6 0.63891131      a   TRUE
#  9:     8 0.83470266      a   TRUE
# 10:    17 0.98304402      a   TRUE
# 11:    98 0.76785547      b   TRUE
# 12:    94 0.30766574      b   TRUE
# 13:    88 0.25814665      b   TRUE
# 14:    89 0.49954639      b   TRUE
# 15:    83 0.50892062      b   TRUE
# 16:    95 0.49443856      b   TRUE
# 17:    97 0.56695890      b   TRUE
# 18:    87 0.98970989      b   TRUE
# 19:    82 0.53190509      b   TRUE
# 20:   100 0.59662376      b   TRUE
#         a        val  group      K

There are other ways to do this, but they involve renaming or loss of information. For instance,

left-join range and Y, we lose a:

Y[range, on = .(group, a >= start, a <= end)]
#         a        val  group   a.1
#     <int>      <num> <char> <int>
#  1:     1 0.60189755      a    20
#  2:     1 0.99874081      a    20
#  3:     1 0.55512663      a    20
# ...
# 18:    80 0.98970989      b   100
# 19:    80 0.53190509      b   100
# 20:    80 0.59662376      b   100
#         a        val  group   a.1

The fix is to copy Y$a into a new variable and join on it instead:

Y[,a1 := a][range, on = .(group, a1 >= start, a1 <= end)]
#         a        val  group    a1  a1.1
#     <int>      <num> <char> <int> <int>
#  1:     9 0.60189755      a     1    20
#  2:     5 0.99874081      a     1    20
#  3:    16 0.55512663      a     1    20
# ...
# 18:    87 0.98970989      b    80   100
# 19:    82 0.53190509      b    80   100
# 20:   100 0.59662376      b    80   100
#         a        val  group    a1  a1.1

left-join Y and range, we get a duplicated into start and end but no clear indicator to filter on:

range[Y, on = .(group, start <= a, end >= a)]
#       group start   end        val
#      <char> <int> <int>      <num>
#   1:      a    28    28 0.85026492
#   2:      a    80    80 0.23466126
#   3:      a    22    22 0.98816745
# ...
#  98:      b    82    82 0.53190509
#  99:      b   100   100 0.59662376
# 100:      b    30    30 0.26388647
#       group start   end        val

A remedy would be to copy in another field that would give us the indicator of merge that we need to be able to filter. But even with that we have to rename to regain a's data:

range[, K := TRUE][Y, on = .(group, start <= a, end >= a)][ !is.na(K), ]
#      group start   end      K        val
#     <char> <int> <int> <lgcl>      <num>
#  1:      a     9     9   TRUE 0.60189755
#  2:      a     5     5   TRUE 0.99874081
#  3:      a    16    16   TRUE 0.55512663
# ...
# 18:      b    87    87   TRUE 0.98970989
# 19:      b    82    82   TRUE 0.53190509
# 20:      b   100   100   TRUE 0.59662376
#      group start   end      K        val

select records from data frame using a range as a variable

This would have to calculate the mean of my column "weight" when writing the following code in my function:

mean(table$weight[,id])

The comma here doesn't make sense. table$weight is a vector, which means it has only one dimension, not two. Hence you should use mean(table$weight[id]).

for(i in id){ table2 <- table[table$ID == i,] }

followed by:

mean(table2$weight)

Notice that each time you loop inside the for function, you are replacing table2 by another one with a different row from table. To create a subset you can either use

table2 <- table[id,]

table2 <- subset(table, ID %in% id)

R: select range of columns in data.table

# (1)
DT[, c(paste0('year.', 1:3), 'ID'), with = F]

# (2) 
DT[, year.1[1], by = ID]

mult is used when merging/joining two data.tables and signifies what to do when multiple matches exist. Therefore, as @Arun pointed out, the way to use mult for your 2nd question would be (given that you are already keyed by ID):

DT[J(unique(ID)), list(ID, year.1), mult = 'first']

R: selecting row values based on row range

Try:

df <- read.table(header=TRUE, text="V1   V2   V3   V4   max  min
 1    3    6    8     7    5
23   30    5   17    30   16")

df.new<-apply(df[,1:4],2,function(x) ifelse(x>df[,5] | x<df[,6],NA,x))
df.new<-cbind(df.new,df[,5:6])
df.new$mean=rowMeans(df.new[1:4],na.rm=TRUE)
df.new

Subsetting data table by date range

If you have data.table, you can use as.IDate with %between% as

library(data.table)
setDT(df)

df[as.IDate(Date, "%d/%m/%Y") %between% as.IDate(c("2016-07-01","2019-06-30"))]

#        Date   Chla
# 1: 11/08/2016 0.0019
# 2:  2/12/2016 0.0013
# 3:  2/03/2017 0.0030
# 4:  6/06/2017 0.0030
# 5:  4/09/2017 0.0050
# 6:  6/12/2017 0.0000
# 7:  1/03/2018 0.0200
#...

You can also do this in base R

df$Date <- as.Date(df$Date, "%d/%m/%Y")
df[df$Date >= as.Date("2016-07-01") & df$Date <= as.Date("2019-06-30"), ]

Or with lubridate and dplyr without changing the original format of dates

library(dplyr)
library(lubridate)

df %>% filter(between(dmy(Date), date("2016-07-01"), date("2019-06-30")))

data

df <- structure(list(Date = structure(c(20L, 12L, 2L, 17L, 15L, 24L, 
23L, 25L, 1L, 5L, 21L, 16L, 3L, 22L, 7L, 8L, 10L, 19L, 18L, 9L, 
6L, 11L, 13L, 14L, 4L), .Label = c("1/03/2018", "11/08/2016", 
"11/09/2018", "12/04/2012", "12/06/2018", "13/05/2019", "13/11/2018", 
"14/12/2018", "17/04/2019", "18/01/2019", "18/06/2019", "19/04/2016", 
"19/07/2019", "19/08/2019", "2/03/2017", "2/08/2018", "2/12/2016", 
"21/03/2019", "22/02/2019", "22/12/2015", "3/07/2018", "3/10/2018", 
"4/09/2017", "6/06/2017", "6/12/2017"), class = "factor"), Chla = c(0.0084, 
0.0036, 0.0019, 0.0013, 0.003, 0.003, 0.005, 0, 0.02, 0.09, 0.04, 
0.026, 0.02, 0.02, 0.01, 0, 0, 0.05, 0, 0, 0.03, 0, 0.002, 0.0018, 
0.012)), class = "data.frame", row.names = c(NA, -25L))

subset a dataframe in R within a specific time range

Here's a dplyr solution:

library(dplyr)
dataBase %>%
  mutate(date = as.Date(date, format = "%d/%m/%Y")) %>%
  filter(date >= "2020-07-30" & date <= "2020-08-30")
              a       date
V12 -0.23017749 2020-08-28
V13  1.55870831 2020-08-01
V21  0.07050839 2020-08-27
V32 -1.26506123 2020-08-01
V41 -0.44566197 2020-08-28
V43  0.35981383 2020-08-01
V52  0.11068272 2020-08-01

Data:

set.seed(123)
dataBase <- data.frame(a = rnorm(15), date = unlist(read.table(text = '"30/06/2020" "27/08/2020" "30/06/2020" "28/08/2020" "30/06/2020"
"28/08/2020" "30/06/2020" "01/08/2020" "30/06/2020" "01/08/2020"
"01/08/2020" "30/06/2020" "30/06/2020" "01/08/2020" "30/06/2019"')))

R: Select Values from Data Table in Range