R: Select values from data table in range
Construct some data
df <- data.frame( name=c("John",
"Adam"), date=c(3, 5) )
Extract exact matches:
subset(df, date==3)
name date
1 John 3
Extract matches in range:
subset(df, date>4 & date<6)
name date
2 Adam 5
The following syntax produces identical results:
df[df$date>4 & df$date<6, ]
name date
2 Adam 5
How to filter rows by column value ranges in R?
Here is a data.table
approach
library(data.table)
# keep Gene that are not joined in the non-equi join on df1 below
df2[!Gene %in% df2[df1, on = .(Chromosome, Gene.Start >= Min, Gene.End <= Max)]$Gene, ]
# Gene Gene.Start Gene.End Chromosome
# 1: Gene2 950 990 1
Filtering by different ranges by group in data.table
Perhaps:
Y[range, K := TRUE, on = .(group, a >= start, a <= end)][!is.na(K),]
# a val group K
# <int> <num> <char> <lgcl>
# 1: 9 0.60189755 a TRUE
# 2: 5 0.99874081 a TRUE
# 3: 16 0.55512663 a TRUE
# 4: 4 0.42944396 a TRUE
# 5: 14 0.43101637 a TRUE
# 6: 3 0.47880269 a TRUE
# 7: 2 0.02220682 a TRUE
# 8: 6 0.63891131 a TRUE
# 9: 8 0.83470266 a TRUE
# 10: 17 0.98304402 a TRUE
# 11: 98 0.76785547 b TRUE
# 12: 94 0.30766574 b TRUE
# 13: 88 0.25814665 b TRUE
# 14: 89 0.49954639 b TRUE
# 15: 83 0.50892062 b TRUE
# 16: 95 0.49443856 b TRUE
# 17: 97 0.56695890 b TRUE
# 18: 87 0.98970989 b TRUE
# 19: 82 0.53190509 b TRUE
# 20: 100 0.59662376 b TRUE
# a val group K
There are other ways to do this, but they involve renaming or loss of information. For instance,
left-join
range
andY
, we losea
:Y[range, on = .(group, a >= start, a <= end)]
# a val group a.1
# <int> <num> <char> <int>
# 1: 1 0.60189755 a 20
# 2: 1 0.99874081 a 20
# 3: 1 0.55512663 a 20
# ...
# 18: 80 0.98970989 b 100
# 19: 80 0.53190509 b 100
# 20: 80 0.59662376 b 100
# a val group a.1The fix is to copy
Y$a
into a new variable and join on it instead:Y[,a1 := a][range, on = .(group, a1 >= start, a1 <= end)]
# a val group a1 a1.1
# <int> <num> <char> <int> <int>
# 1: 9 0.60189755 a 1 20
# 2: 5 0.99874081 a 1 20
# 3: 16 0.55512663 a 1 20
# ...
# 18: 87 0.98970989 b 80 100
# 19: 82 0.53190509 b 80 100
# 20: 100 0.59662376 b 80 100
# a val group a1 a1.1left-join
Y
andrange
, we geta
duplicated intostart
andend
but no clear indicator to filter on:range[Y, on = .(group, start <= a, end >= a)]
# group start end val
# <char> <int> <int> <num>
# 1: a 28 28 0.85026492
# 2: a 80 80 0.23466126
# 3: a 22 22 0.98816745
# ...
# 98: b 82 82 0.53190509
# 99: b 100 100 0.59662376
# 100: b 30 30 0.26388647
# group start end valA remedy would be to copy in another field that would give us the indicator of merge that we need to be able to filter. But even with that we have to rename to regain
a
's data:range[, K := TRUE][Y, on = .(group, start <= a, end >= a)][ !is.na(K), ]
# group start end K val
# <char> <int> <int> <lgcl> <num>
# 1: a 9 9 TRUE 0.60189755
# 2: a 5 5 TRUE 0.99874081
# 3: a 16 16 TRUE 0.55512663
# ...
# 18: b 87 87 TRUE 0.98970989
# 19: b 82 82 TRUE 0.53190509
# 20: b 100 100 TRUE 0.59662376
# group start end K val
select records from data frame using a range as a variable
This would have to calculate the mean of my column "weight" when writing the following code in my function:
mean(table$weight[,id])
The comma here doesn't make sense. table$weight
is a vector, which means it has only one dimension, not two. Hence you should use mean(table$weight[id])
.
for(i in id){ table2 <- table[table$ID == i,] }
followed by:
mean(table2$weight)
Notice that each time you loop inside the for
function, you are replacing table2
by another one with a different row from table
. To create a subset you can either use
table2 <- table[id,]
or
table2 <- subset(table, ID %in% id)
R: select range of columns in data.table
# (1)
DT[, c(paste0('year.', 1:3), 'ID'), with = F]
# (2)
DT[, year.1[1], by = ID]
mult
is used when merging/joining two data.tables and signifies what to do when multiple matches exist. Therefore, as @Arun pointed out, the way to use mult
for your 2nd question would be (given that you are already keyed by ID
):
DT[J(unique(ID)), list(ID, year.1), mult = 'first']
R: selecting row values based on row range
Try:
df <- read.table(header=TRUE, text="V1 V2 V3 V4 max min
1 3 6 8 7 5
23 30 5 17 30 16")
df.new<-apply(df[,1:4],2,function(x) ifelse(x>df[,5] | x<df[,6],NA,x))
df.new<-cbind(df.new,df[,5:6])
df.new$mean=rowMeans(df.new[1:4],na.rm=TRUE)
df.new
Subsetting data table by date range
If you have data.table
, you can use as.IDate
with %between%
as
library(data.table)
setDT(df)
df[as.IDate(Date, "%d/%m/%Y") %between% as.IDate(c("2016-07-01","2019-06-30"))]
# Date Chla
# 1: 11/08/2016 0.0019
# 2: 2/12/2016 0.0013
# 3: 2/03/2017 0.0030
# 4: 6/06/2017 0.0030
# 5: 4/09/2017 0.0050
# 6: 6/12/2017 0.0000
# 7: 1/03/2018 0.0200
#...
You can also do this in base R
df$Date <- as.Date(df$Date, "%d/%m/%Y")
df[df$Date >= as.Date("2016-07-01") & df$Date <= as.Date("2019-06-30"), ]
Or with lubridate
and dplyr
without changing the original format of dates
library(dplyr)
library(lubridate)
df %>% filter(between(dmy(Date), date("2016-07-01"), date("2019-06-30")))
data
df <- structure(list(Date = structure(c(20L, 12L, 2L, 17L, 15L, 24L,
23L, 25L, 1L, 5L, 21L, 16L, 3L, 22L, 7L, 8L, 10L, 19L, 18L, 9L,
6L, 11L, 13L, 14L, 4L), .Label = c("1/03/2018", "11/08/2016",
"11/09/2018", "12/04/2012", "12/06/2018", "13/05/2019", "13/11/2018",
"14/12/2018", "17/04/2019", "18/01/2019", "18/06/2019", "19/04/2016",
"19/07/2019", "19/08/2019", "2/03/2017", "2/08/2018", "2/12/2016",
"21/03/2019", "22/02/2019", "22/12/2015", "3/07/2018", "3/10/2018",
"4/09/2017", "6/06/2017", "6/12/2017"), class = "factor"), Chla = c(0.0084,
0.0036, 0.0019, 0.0013, 0.003, 0.003, 0.005, 0, 0.02, 0.09, 0.04,
0.026, 0.02, 0.02, 0.01, 0, 0, 0.05, 0, 0, 0.03, 0, 0.002, 0.0018,
0.012)), class = "data.frame", row.names = c(NA, -25L))
subset a dataframe in R within a specific time range
Here's a dplyr
solution:
library(dplyr)
dataBase %>%
mutate(date = as.Date(date, format = "%d/%m/%Y")) %>%
filter(date >= "2020-07-30" & date <= "2020-08-30")
a date
V12 -0.23017749 2020-08-28
V13 1.55870831 2020-08-01
V21 0.07050839 2020-08-27
V32 -1.26506123 2020-08-01
V41 -0.44566197 2020-08-28
V43 0.35981383 2020-08-01
V52 0.11068272 2020-08-01
Data:
set.seed(123)
dataBase <- data.frame(a = rnorm(15), date = unlist(read.table(text = '"30/06/2020" "27/08/2020" "30/06/2020" "28/08/2020" "30/06/2020"
"28/08/2020" "30/06/2020" "01/08/2020" "30/06/2020" "01/08/2020"
"01/08/2020" "30/06/2020" "30/06/2020" "01/08/2020" "30/06/2019"')))
Related Topics
Shiny: Passing Input$Var to Aes() in Ggplot2
Create Column with Grouped Values Based on Another Column
How to Assign a Value Using If-Else Conditions in R
Merging Rows with the Same Id Variable
Inputting Na Where There Are Missing Values When Scraping with Rvest
Efficiently Computing a Linear Combination of Data.Table Columns
Rounding Numbers in R to Specified Number of Digits
How to Use Subscripts in Ggplot2 Legends [R]
Set One or More of Coefficients to a Specific Integer
Rename Multiple Columns Given Character Vectors of Column Names and Replacement
R: What Do You Call the :: and ::: Operators and How Do They Differ
How to Check If CSV File Has a Comma or a Semicolon as Separator
How to Make a Discontinuous Axis in R with Ggplot2
MAC Os X R Error "Ld: Warning: Directory Not Found for Option"