Filtering a data frame on a vector
You can use the %in% operator:
> df <- data.frame(id=c(LETTERS, LETTERS), x=1:52)
> L <- c("A","B","E")
> subset(df, id %in% L)
id x
1 A 1
2 B 2
5 E 5
27 A 27
28 B 28
31 E 31
If your IDs are unique, you can use match():
> df <- data.frame(id=c(LETTERS), x=1:26)
> df[match(L, df$id), ]
id x
1 A 1
2 B 2
5 E 5
or make them the rownames of your dataframe and extract by row:
> rownames(df) <- df$id
> df[L, ]
id x
A A 1
B B 2
E E 5
Finally, for more advanced users, and if speed is a concern, I'd recommend looking into the data.table package.
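As a rough sketch of what the data.table route could look like (assuming the data.table package is installed; the object name dt is only illustrative), the same lookup can be written either with %in% or as a keyed join:

```r
library(data.table)

# Same toy data as above, stored as a data.table
dt <- data.table(id = LETTERS, x = 1:26)
L <- c("A", "B", "E")

# Vector scan, same idea as subset(df, id %in% L)
by_scan <- dt[id %in% L]

# Keyed lookup: sort by id once, then join on the values in L
setkey(dt, id)
by_key <- dt[.(L)]
```

The keyed form tends to pay off when the same table is queried many times; for a one-off filter the %in% form is usually enough.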
Filter data frame matching all values of a vector
Here's another dplyr solution without ever leaving the pipe:
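library(dplyr)
ID <- c('A','A','A','A','A','B','B','B','B','C','C')
Hour <- c('0','2','5','6','9','0','2','5','6','0','2')
x <- data.frame(ID, Hour)
testVector <- c('0','2','5')
x %>%
  group_by(ID) %>%
  mutate(contains = Hour %in% testVector) %>%  # flag rows whose Hour is in testVector
  summarise(all = sum(contains)) %>%           # count flagged rows per ID
  filter(all > 2) %>%                          # keep IDs with all three values (assumes no repeated Hour within an ID)
  select(-all) %>%
  inner_join(x, by = "ID")                     # bring back every row of the kept IDs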
## ID Hour
## <fctr> <fctr>
## 1 A 0
## 2 A 2
## 3 A 5
## 4 A 6
## 5 A 9
## 6 B 0
## 7 B 2
## 8 B 5
## 9 B 6
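A more compact variant, offered only as a sketch: instead of counting matches, test per group whether every element of testVector occurs in Hour. This does not rely on the hard-coded threshold of 2 and still works if an Hour value is repeated within an ID. The name result is illustrative.

```r
library(dplyr)

ID <- c('A','A','A','A','A','B','B','B','B','C','C')
Hour <- c('0','2','5','6','9','0','2','5','6','0','2')
x <- data.frame(ID, Hour)
testVector <- c('0','2','5')

# Keep a group only if every value of testVector appears in its Hour column
result <- x %>%
  group_by(ID) %>%
  filter(all(testVector %in% Hour)) %>%
  ungroup()
```

Group C lacks '5', so only the rows for A and B remain.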
Filtering a dataframe by list of character vectors
Try this:
#Code
L <- lapply(ls,function(x) data.frame(type=x[x %in% df$type]))
names(L) <- paste0('new_df_',c('fruit','vegetable'))
Output:
L
$new_df_fruit
type
1 Apple
2 Cherry
$new_df_vegetable
type
1 Courgette
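The snippet above depends on df and ls from the original question, which are not shown here. A self-contained sketch with made-up data (the contents of df and groups are assumptions, and groups stands in for the question's ls) could look like this:

```r
# Hypothetical data frame to filter against
df <- data.frame(type = c("Apple", "Cherry", "Courgette"))

# Hypothetical list of character vectors, one per category
groups <- list(
  fruit     = c("Apple", "Banana", "Cherry"),
  vegetable = c("Courgette", "Carrot")
)

# For each vector, keep only the values that occur in df$type
L <- lapply(groups, function(x) data.frame(type = x[x %in% df$type]))
names(L) <- paste0("new_df_", names(groups))
L
```

Taking the suffixes from names(groups) rather than typing them out keeps the names in sync with the list.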
How to filter a dataframe with a character vector
There are multiple issues. First, the second condition needs nested quotes, with single quotes inside the double-quoted string:
conditions <- c("Sepal.Width < 3.2", "Species == 'setosa'")
Then, you need to specify how the two conditions are combined. Here, I assumed an & (logical AND). Then you can use eval(parse(...)):
iris %>%
  # collapse (not sep) is what joins the vector into a single "cond1 & cond2" string
  filter(eval(parse(text = paste(conditions, collapse = " & "))))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 4.9 3.0 1.4 0.2 setosa
2 4.6 3.1 1.5 0.2 setosa
3 4.4 2.9 1.4 0.2 setosa
4 4.9 3.1 1.5 0.1 setosa
...
On the other hand, I think it is always important to quote @Martin Mächler to warn about the potential problems associated with this approach:
The (possibly) only connection is via parse(text = ....) and all good
R programmers should know that this is rarely an efficient or safe
means to construct expressions (or calls). Rather learn more about
substitute(), quote(), and possibly the power of using
do.call(substitute, ......).
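In the spirit of that advice, here is a minimal sketch that stores the conditions as unevaluated expressions with quote() instead of strings, so nothing has to be parsed. The names conds and combined are illustrative, and this is one possible approach rather than the method from the original answer.

```r
# Conditions kept as language objects, not character strings
conds <- list(
  quote(Sepal.Width < 3.2),
  quote(Species == "setosa")
)

# Fold the list into a single call of the form cond1 & cond2
combined <- Reduce(function(a, b) call("&", a, b), conds)

# Evaluate the combined call with the columns of iris in scope
res <- iris[eval(combined, iris), ]
head(res)
```

With dplyr loaded, the same expression can be passed along as filter(iris, !!combined).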
Filter data in loop over vector and bind data frames
I think you've got a typo/error in your filter; do you get the correct output when you change "block" to "value" in your grepl? E.g.
library(tidyverse)
area <- data.frame(
land = c("68N03E220090", "68N03E244635", "68N03E244352", "68N03E223241"),
type = c("home", "mobile", "home", "vacant"),
object_id = c(NA, 7, NA, 34)
)
block <- c("68N03E22", "68N03E24")
datalist = list()
for (value in block){
df <- area %>% filter(is.na(object_id) & grepl(paste0("^", value),land))
df$value <- value
datalist[[value]] <- df # add it to your list
}
df_filtered <- dplyr::bind_rows(datalist)
df_filtered
#> land type object_id value
#> 1 68N03E220090 home NA 68N03E22
#> 2 68N03E244352 home NA 68N03E24
For this example, you could also avoid the for-loop by using:
df_filtered_2 <- area %>%
filter(is.na(object_id) & grepl(pattern = paste0(block, collapse = "|"), x = land)) %>%
mutate(value = str_sub(land, 1, 8))
identical(df_filtered, df_filtered_2)
#> [1] TRUE
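One difference worth noting: the collapsed pattern in the loop-free version is not anchored with "^", so a block code appearing in the middle of land would also match. A base R sketch that keeps the anchor (the names pattern, keep and df_filtered_3 are illustrative):

```r
area <- data.frame(
  land = c("68N03E220090", "68N03E244635", "68N03E244352", "68N03E223241"),
  type = c("home", "mobile", "home", "vacant"),
  object_id = c(NA, 7, NA, 34)
)
block <- c("68N03E22", "68N03E24")

# Anchor the alternation so a block code only matches at the start of land
pattern <- paste0("^(", paste(block, collapse = "|"), ")")

keep <- is.na(area$object_id) & grepl(pattern, area$land)
df_filtered_3 <- area[keep, ]

# As in the str_sub() version, this assumes every block code is 8 characters long
df_filtered_3$value <- substr(df_filtered_3$land, 1, 8)
df_filtered_3
```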
How to filter a dataframe using a preset vector in R
Use %in%:
df %>%
filter(code %in% x)
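Since df, code and x come from the original question, here is a self-contained sketch with made-up values (all data below is hypothetical), including the negated form for dropping the listed codes:

```r
library(dplyr)

# Hypothetical data frame and preset vector
df <- data.frame(code = c("a1", "b2", "c3", "d4"), value = 1:4)
x <- c("a1", "c3")

# Keep rows whose code is in x
kept <- df %>% filter(code %in% x)

# Drop rows whose code is in x (%in% binds tighter than !)
dropped <- df %>% filter(!code %in% x)
```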
Filtering vector by values with filter()
I assume that your data is stored as character strings, so you first have to convert it to numeric. After that, you can apply both conditions in a single filter() call, combined with the & operator. You can use the following code:
dat <- data.frame(Duration..in.seconds. = c("114", "188", "453", "114" , "188" , "453" , "114" , "188" , "453" , "188" , "453", "2000" ,"2000" ,"1900" ))
library(dplyr)
dat = dat %>%
mutate(Duration..in.seconds. = as.numeric(Duration..in.seconds.)) %>%
filter(Duration..in.seconds. > 180 & Duration..in.seconds. < 1800)
Output:
Duration..in.seconds.
1 188
2 453
3 188
4 453
5 188
6 453
7 188
8 453
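If some entries cannot be converted, as.numeric() turns them into NA (with a warning). A small base R sketch, using made-up values, of handling that explicitly before filtering:

```r
# Hypothetical raw values; "n/a" cannot be converted to a number
raw <- c("114", "188", "n/a", "2000")

# Unparseable entries become NA; the coercion warning is silenced here on purpose
secs <- suppressWarnings(as.numeric(raw))

# Exclude NA explicitly, then apply both bounds
keep <- !is.na(secs) & secs > 180 & secs < 1800
raw[keep]
```

dplyr's filter() drops rows where the condition is NA, so the explicit !is.na() mainly matters with base R subsetting.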
Filter columns of a dataframe based on a vector
Here's one way -
filter(df, apply(df, 1, function(a) all(a > x)))
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 8 10 7 9 8 6 10 8 8 9
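The original df and x are not shown, so here is a base R sketch with made-up numbers (both objects are hypothetical) of the same row-wise test, where x holds one threshold per column:

```r
# Hypothetical data: three numeric columns
df <- data.frame(X1 = c(8, 2, 9), X2 = c(10, 7, 1), X3 = c(7, 9, 8))

# One threshold per column, in column order
x <- c(5, 5, 5)

# TRUE for rows in which every value exceeds its column's threshold
keep <- apply(df, 1, function(a) all(a > x))
df[keep, ]
```

apply() converts the data frame to a matrix first, so this assumes all columns are numeric.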
Filter values in a list of dataframes based on a vector, and add rows for vector values not contained in dataframes
1. Create a data.frame with all the rows you want:
data.frame(province=vector)
2. Merge this with the data frame you do have, setting all.x=TRUE (so every row from point 1 is retained, and filled with NA if necessary):
merge(data.frame(province=vector), df1, all.x=TRUE)
3. Done!
> merge(data.frame(province=vector), df1, all.x=TRUE)
province value value2
1 prov1 23 25
2 prov2 NA NA
3 prov3 56 57
4 prov4 NA NA
5 prov5 93 83
6 prov6 NA NA
Bonus 1: you can trivially loop this with lapply:
lapply(list_df, function(df) merge(data.frame(province=vector), df, all.x=TRUE))
(If you have a lot of data frames you want to apply this to, you will probably want to avoid re-building the vector data frame anonymously each time and create it as a named data frame instead.)
Bonus 2: all base R, with no dependencies whatsoever.
Bonus 3: you did say it doesn't matter, but the rows are in the same order as in vector.
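Putting the steps together as a runnable sketch: df1 is rebuilt from the output shown above, while df2, list_df and the helper name template are assumptions made for illustration.

```r
vector <- paste0("prov", 1:6)

# df1 reconstructed from the merge output above
df1 <- data.frame(province = c("prov1", "prov3", "prov5"),
                  value    = c(23, 56, 93),
                  value2   = c(25, 57, 83))

# Hypothetical second data frame, including a province not in vector
df2 <- data.frame(province = c("prov2", "prov6", "prov9"),
                  value    = c(11, 12, 99),
                  value2   = c(13, 14, 99))

list_df <- list(df1 = df1, df2 = df2)

# Build the template once instead of inside every call
template <- data.frame(province = vector)

# all.x = TRUE keeps every province in vector; rows not in vector are dropped
filled <- lapply(list_df, function(df) merge(template, df, all.x = TRUE))
filled$df1
```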