Filtering a Data Frame on a Vector

You can use the %in% operator:

> df <- data.frame(id=c(LETTERS, LETTERS), x=1:52)
> L <- c("A","B","E")
> subset(df, id %in% L)
   id  x
1   A  1
2   B  2
5   E  5
27  A 27
28  B 28
31  E 31

If your IDs are unique, you can use match():

> df <- data.frame(id=c(LETTERS), x=1:26)
> df[match(L, df$id), ]
  id x
1  A 1
2  B 2
5  E 5

Or make them the row names of your data frame and extract by row name:

> rownames(df) <- df$id
> df[L, ]
  id x
A  A 1
B  B 2
E  E 5
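To make the difference between %in% and match() concrete, here is a minimal base R sketch. The vector name wanted and the id "Z1" are made up for illustration; the point is only how each approach treats ordering and ids that are not present.

```r
# Hypothetical data: unique ids, and a lookup vector that is out of order
# and contains one id ("Z1") that is not in the data frame.
df <- data.frame(id = LETTERS, x = 1:26, stringsAsFactors = FALSE)
wanted <- c("E", "A", "Z1")

# %in% keeps the data frame's own row order and ignores missing ids.
by_in <- df[df$id %in% wanted, ]

# match() follows the order of `wanted`; an unmatched id gives NA,
# which would index an all-NA row, so drop it first.
idx <- match(wanted, df$id)
by_match <- df[idx[!is.na(idx)], ]

print(by_in$id)     # "A" "E"
print(by_match$id)  # "E" "A"
```

So match() is probably the better fit when the result should follow the order of the lookup vector, and %in% when the original row order (and any duplicates) should be kept.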

Finally, for more advanced users, and if speed is a concern, I'd recommend looking into the data.table package.

Filter data frame matching all values of a vector

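Here's another dplyr solution without ever leaving the pipe. It counts, per ID, how many rows have an Hour in testVector, so it assumes each ID/Hour pair appears at most once:

library(dplyr)

ID <- c('A','A','A','A','A','B','B','B','B','C','C')
Hour <- c('0','2','5','6','9','0','2','5','6','0','2')

x <- data.frame(ID, Hour)

testVector <- c('0','2','5')

x %>%
  group_by(ID) %>%
  mutate(contains = Hour %in% testVector) %>%   # flag rows whose Hour is in testVector
  summarise(all = sum(contains)) %>%            # number of matches per ID
  filter(all >= length(testVector)) %>%         # keep IDs that matched every value
  select(-all) %>%
  inner_join(x, by = "ID")                      # pull back all rows for those IDs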

## ID Hour
## <fctr> <fctr>
## 1 A 0
## 2 A 2
## 3 A 5
## 4 A 6
## 5 A 9
## 6 B 0
## 7 B 2
## 8 B 5
## 9 B 6
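If an ID could list the same Hour more than once, counting matches may overstate things. A base R sketch that checks set membership directly instead (same example data, rebuilt here so the block runs on its own; has_all and result are names I made up):

```r
# Same example data as above, rebuilt so the block is self-contained.
x <- data.frame(
  ID   = c('A','A','A','A','A','B','B','B','B','C','C'),
  Hour = c('0','2','5','6','9','0','2','5','6','0','2'),
  stringsAsFactors = FALSE
)
testVector <- c('0','2','5')

# For each ID, check that every value of testVector occurs among its Hours.
has_all <- vapply(split(x$Hour, x$ID),
                  function(h) all(testVector %in% h),
                  logical(1))

# Keep all rows belonging to the IDs that passed the check.
result <- x[x$ID %in% names(has_all)[has_all], ]
print(unique(result$ID))  # "A" "B"
```

Because all(testVector %in% h) asks whether each required value is present at least once, repeated rows do not change the answer.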

Filtering a dataframe by list of character vectors

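Try this, where ls is your list of character vectors and df is the data frame with the type column:

#Code
# For each vector in the list, keep only the values that also occur in df$type
L <- lapply(ls,function(x) data.frame(type=x[x %in% df$type]))
names(L) <- paste0('new_df_',c('fruit','vegetable'))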

Output:

L
$new_df_fruit
    type
1  Apple
2 Cherry

$new_df_vegetable
       type
1 Courgette
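Since the question's inputs are not shown here, the following is a self-contained sketch with made-up data. The list type_lists and the contents of df are assumptions chosen so that the result resembles the output above; the list is also named type_lists rather than ls to avoid masking the base function ls().

```r
# Hypothetical inputs (not from the original question).
df <- data.frame(type = c("Apple", "Cherry", "Courgette", "Rice"),
                 stringsAsFactors = FALSE)
type_lists <- list(
  fruit     = c("Apple", "Banana", "Cherry"),
  vegetable = c("Carrot", "Courgette")
)

# One small data frame per list element, holding only the values
# that are present in df$type.
L <- lapply(type_lists, function(x) {
  data.frame(type = x[x %in% df$type], stringsAsFactors = FALSE)
})

# Derive the new names from the list's own names instead of hard-coding them.
names(L) <- paste0("new_df_", names(type_lists))

print(L)
```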

How to filter a dataframe with a character vector

There are multiple issues. First, the string value in the second condition needs its own quotes, nested inside the outer quotation marks:

conditions <- c("Sepal.Width < 3.2", "Species == 'setosa'")

Then, you need to specify how the two conditions are combined. Here, I assumed an &. Note that the strings have to be joined with collapse, not sep: with sep, paste() leaves the two strings separate, and only the last condition ends up being applied. Then you can use eval(parse(...)):

iris %>%
  filter(eval(parse(text = paste(conditions, collapse = " & "))))

The first four rows of the result:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          4.9         3.0          1.4         0.2  setosa
2          4.6         3.1          1.5         0.2  setosa
3          4.4         2.9          1.4         0.2  setosa
4          4.9         3.1          1.5         0.1  setosa

On the other hand, I think it is always important to quote @Martin Mächler to warn about the potential problems associated with this approach:

The (possibly) only connection is via parse(text = ....) and all good
R programmers should know that this is rarely an efficient or safe
means to construct expressions (or calls). Rather learn more about
substitute(), quote(), and possibly the power of using
do.call(substitute, ......).
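In the spirit of that advice, here is one possible base R sketch that stores the conditions as unevaluated calls via quote() instead of strings, so nothing has to be parsed. The names masks, keep and result are mine, and this is only one way to do it, not necessarily what the quote had in mind.

```r
# Conditions stored as unevaluated calls rather than strings.
conditions <- list(
  quote(Sepal.Width < 3.2),
  quote(Species == "setosa")
)

# Evaluate each call with the columns of iris in scope, then AND the results.
masks <- lapply(conditions, function(cond) eval(cond, iris))
keep <- Reduce(`&`, masks)

result <- iris[keep, ]
print(head(result, 4))
```

Swapping `&` for `|` in Reduce() would combine the conditions with OR instead.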

Filter data in loop over vector and bind data frames

I think you've got a typo/error in your filter; do you get the correct output when you change "block" to "value" in your grepl? E.g.

library(tidyverse)
area <- data.frame(
  land = c("68N03E220090", "68N03E244635", "68N03E244352", "68N03E223241"),
  type = c("home", "mobile", "home", "vacant"),
  object_id = c(NA, 7, NA, 34)
)

block <- c("68N03E22", "68N03E24")

datalist <- list()

for (value in block) {
  df <- area %>% filter(is.na(object_id) & grepl(paste0("^", value), land))
  df$value <- value
  datalist[[value]] <- df # add it to your list
}

df_filtered <- dplyr::bind_rows(datalist)

df_filtered
#>           land type object_id    value
#> 1 68N03E220090 home        NA 68N03E22
#> 2 68N03E244352 home        NA 68N03E24

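For this example, you could also avoid the for-loop by combining the prefixes into a single anchored pattern (str_sub(land, 1, 8) works here because every prefix is 8 characters long):

df_filtered_2 <- area %>%
  filter(is.na(object_id) & grepl(pattern = paste0("^(", paste0(block, collapse = "|"), ")"), x = land)) %>%
  mutate(value = str_sub(land, 1, 8))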

identical(df_filtered, df_filtered_2)
#> [1] TRUE
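One thing to watch when several prefixes are collapsed into one regular expression is anchoring: a bare alternation matches anywhere in the string, whereas the loop version only matched at the start. A small base R sketch, where the second land value is made up to show the difference:

```r
block <- c("68N03E22", "68N03E24")
# Hypothetical parcel ids: the second one contains a block code mid-string.
land <- c("68N03E220090", "1068N03E2499", "68N03E244352")

# Unanchored alternation matches anywhere in the string.
loose <- grepl(paste(block, collapse = "|"), land)

# Grouping the alternatives and anchoring with ^ matches prefixes only.
strict <- grepl(paste0("^(", paste(block, collapse = "|"), ")"), land)

print(loose)   # TRUE TRUE TRUE
print(strict)  # TRUE FALSE TRUE
```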

How to filter a dataframe using a preset vector in R

Use %in%:

df %>%
  filter(code %in% x)
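For reference, the same idea in base R, with made-up data since the question's df and x are not shown here (the column value and the codes are assumptions):

```r
# Hypothetical data frame and preset vector.
df <- data.frame(code = c("a1", "b2", "c3", "d4"),
                 value = c(10, 20, 30, 40),
                 stringsAsFactors = FALSE)
x <- c("b2", "d4", "z9")

kept    <- df[df$code %in% x, ]     # rows whose code is in x
dropped <- df[!(df$code %in% x), ]  # the complement

print(kept$code)     # "b2" "d4"
print(dropped$code)  # "a1" "c3"
```

Values in x that never occur in df$code (here "z9") are simply ignored.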

Filtering vector by values with filter()

I assume that your data is stored as character strings, so you first have to convert the column to numeric. After that, you can apply both conditions in a single filter() call, combined with &. You can use the following code:

dat <- data.frame(Duration..in.seconds. = c("114", "188", "453", "114", "188", "453", "114", "188", "453", "188", "453", "2000", "2000", "1900"))

library(dplyr)

dat <- dat %>%
  mutate(Duration..in.seconds. = as.numeric(Duration..in.seconds.)) %>%
  filter(Duration..in.seconds. > 180 & Duration..in.seconds. < 1800)

Output:

  Duration..in.seconds.
1                   188
2                   453
3                   188
4                   453
5                   188
6                   453
7                   188
8                   453
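If the column might contain entries that are not numbers, the conversion step deserves a check. A base R sketch with a made-up vector raw, including one deliberately bad entry ("n/a"):

```r
# Hypothetical raw column: durations stored as text, with one bad entry.
raw <- c("114", "188", "453", "2000", "n/a")

# as.numeric() turns unparseable strings into NA (with a warning),
# so convert once and inspect the failures before filtering.
secs <- suppressWarnings(as.numeric(raw))
failed <- raw[is.na(secs) & !is.na(raw)]

# Comparisons with NA yield NA; which() drops them, keeping only definite matches.
kept <- secs[which(secs > 180 & secs < 1800)]

print(failed)  # "n/a"
print(kept)    # 188 453
```

dplyr's filter() also drops rows where the condition evaluates to NA, so a bad entry would disappear silently there; looking at failed first makes that visible.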

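Filter columns of a dataframe based on a vector

Here's one way. It keeps the rows in which every value is greater than x (compared element by element, if x has one entry per column):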

filter(df, apply(df, 1, function(a) all(a > x)))

  X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1  8 10  7  9  8  6 10  8  8   9
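A small self-contained sketch of the same row test in base R, using made-up data and assuming x holds one threshold per column. It also shows a vectorised alternative with sweep(), which may be worth considering on larger data frames:

```r
# Hypothetical data and thresholds (one threshold per column).
df <- data.frame(X1 = c(8, 2, 9), X2 = c(10, 5, 1), X3 = c(7, 9, 8))
x <- c(5, 4, 6)

# Row-wise version, as in the answer above, with base R subsetting.
keep_apply <- apply(df, 1, function(a) all(a > x))

# Vectorised version: compare column j against x[j], then require every
# comparison in a row to be TRUE.
above <- sweep(as.matrix(df), 2, x, FUN = ">")
keep_sweep <- rowSums(above) == ncol(df)

print(df[keep_sweep, ])
```

Both approaches keep only the first row here, since it is the only one that exceeds all three thresholds.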

Filter values in a list of dataframes based on a vector, and add rows for vector values not contained in dataframes

  1. Create a data.frame with all the rows you want

    data.frame(province=vector)

  2. Merge this with the data frame you do have, setting all.x=TRUE (so every row from point 1 is retained, and filled with NA if necessary)

    merge(data.frame(province=vector), df1, all.x=TRUE)

  3. Done!

> merge(data.frame(province=vector), df1, all.x=TRUE)
  province value value2
1    prov1    23     25
2    prov2    NA     NA
3    prov3    56     57
4    prov4    NA     NA
5    prov5    93     83
6    prov6    NA     NA

  • Bonus 1: you can trivially loop this with lapply

    lapply(list_df, function(df) merge(data.frame(province=vector), df, all.x=TRUE))

    (if you have a lot of data frames you want to apply this to, you will probably want to avoid re-building the vector data frame anonymously each time but create it as a named data frame instead)

  • Bonus 2: all base-r with no dependencies whatsoever

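  • Bonus 3: you did say it doesn't matter, but the rows come out in the same order as in vector (merge() sorts the result by the key column by default, which matches here because vector is already sorted)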
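Putting the steps and Bonus 1 together, a runnable sketch might look like the following. The names provinces, template and filled are mine, and df2 is invented for illustration (df1 reuses the values from the output above); provinces is used instead of vector to avoid masking the base function.

```r
# Hypothetical inputs: the full set of provinces, plus a list of data frames
# that each cover only some of them (df2 is purely illustrative).
provinces <- paste0("prov", 1:6)
list_df <- list(
  df1 = data.frame(province = c("prov1", "prov3", "prov5"),
                   value = c(23, 56, 93), value2 = c(25, 57, 83)),
  df2 = data.frame(province = c("prov2", "prov7"),
                   value = c(11, 12), value2 = c(13, 14))
)

# Build the template once, then left-join every data frame onto it.
# all.x = TRUE keeps every template row; rows of df that are not in the
# template (here "prov7") are dropped.
template <- data.frame(province = provinces)
filled <- lapply(list_df, function(df) merge(template, df, all.x = TRUE))

print(filled$df2)
```

Each element of filled has exactly one row per province, with NA wherever the original data frame had no entry.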


