Filter Dataframe Using Global Variable with The Same Name as Column Name

You can do:

df %>% filter(y == .GlobalEnv$y)

or:

df %>% filter(y == .GlobalEnv[["y"]])

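Both of these work at the top level, but not inside a function, where y is an argument or local variable rather than a global. get was the suggested workaround for that case:

df %>% filter(y == get("y"))
f = function(df, y){df %>% filter(y==get("y"))}

Be careful with get, though: current dplyr versions evaluate the expression inside the data mask, so get("y") is likely to find the column y before any outer variable. Check the result before relying on it; the !! operator and the .env pronoun covered further down are more reliable.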

Or just use df[df$y==y,] instead of dplyr.
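As a minimal sketch of the base R route, the example below uses a made-up data frame (the values and the helper name keep_y are illustrative, not from the question). Because df$y names the column explicitly, the bare y is resolved by ordinary R scoping, so the same expression also works inside a function.

```r
# Illustrative data; the values are assumptions, not taken from the question.
df <- data.frame(x = 1:4, y = c(1, 2, 2, 3))
y <- 2  # variable that shares its name with column y

# df$y is the column; the bare y is found by normal R scoping.
res_global <- df[df$y == y, ]

# Inside a function, the bare y is the argument rather than the outer variable.
# keep_y() is a hypothetical helper name.
keep_y <- function(df, y) df[df$y == y, ]
res_fun <- keep_y(df, 3)

print(res_global)
print(res_fun)
```

Note that rows where df$y is NA would come back as NA rows with this form of subsetting, so wrap the condition in which() if the column can contain missing values.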

R: How to use a global variable that clashes with a column name in a dplyr workflow?

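We can use the unquote operator (!!), which evaluates y in the calling environment before the columns of the data frame are consulted:

library(dplyr)
y <- 1  # global y; the value 1 is inferred from the output shown
data.frame(x = 1, y = 2) %>%
  mutate(xy = x + !!y)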

-output

#  x y xy
#1 1 2 2

Or extract directly from the .GlobalEnv

data.frame(x = 1, y = 2) %>%
  mutate(xy = x + .GlobalEnv$y)

-output

#  x y xy
#1 1 2 2
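The same unquoting idea carries over to filter() and to code inside a function. The sketch below assumes dplyr is installed and uses made-up data; keep_y is a hypothetical helper name. Since !!y is evaluated where the expression was written, it picks up the function argument when used inside a function, which .GlobalEnv$y cannot do.

```r
library(dplyr)

# Illustrative data; the values are assumptions.
df <- data.frame(x = c(1, 2, 3), y = c(2, 3, 2))
y <- 2  # outer variable with the same name as column y

# !!y is replaced by the value of the outer y before filter() evaluates
# the condition, so the left-hand y is the column and the right-hand side is 2.
res_unquote <- df %>% filter(y == !!y)

# Inside a function, !!y picks up the argument y.
# keep_y() is a hypothetical helper name.
keep_y <- function(data, y) {
  data %>% filter(y == !!y)
}
res_fun <- keep_y(df, 3)

print(res_unquote)
print(res_fun)
```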

Referring to columns and variables with the same name in dplyr filter

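This can be achieved with the .env pronoun from rlang, which refers explicitly to the environment the expression was written in rather than to the columns of the data frame: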

See e.g. this blog post.

library(dplyr)

id = "a"

df <- tibble(
  id = c("a", "b", "c"),
  value = c(1, 2, 3)
)

df %>%
  dplyr::filter(id == .env$id)
#> # A tibble: 1 × 2
#>   id    value
#>   <chr> <dbl>
#> 1 a         1
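The pronoun is arguably most useful inside functions, where the clashing name is an argument rather than a global. The following is a sketch under that assumption: keep_id is a hypothetical helper, and .data$id is used alongside .env$id to make both sides of the comparison explicit.

```r
library(dplyr)

df <- tibble(
  id = c("a", "b", "c"),
  value = c(1, 2, 3)
)

# keep_id() is a hypothetical helper name.
# .data$id is the column; .env$id is the function argument.
keep_id <- function(data, id) {
  data %>% filter(.data$id == .env$id)
}

res <- keep_id(df, "b")
print(res)
```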

Search for does-not-contain on a DataFrame in pandas

You can use the invert (~) operator (which acts like a not for boolean data):

new_df = df[~df["col"].str.contains(word)]

where new_df is the copy returned by the right-hand side.

contains also accepts a regular expression...


If the above throws a ValueError or TypeError, the likely cause is mixed data types or missing values in the column, so pass na=False:

new_df = df[~df["col"].str.contains(word, na=False)]

Or,

new_df = df[df["col"].str.contains(word) == False]

data.table: column name same as variable name

To use get and avoid .SD, you need to set the environment:

dt <- data.table::data.table(myvar = 1:10)
myvar <- 2
dt[myvar %in% get("myvar", envir = parent.env(environment()))]
#>    myvar
#> 1:     2

Using parent.env(environment()) instead of globalenv() is more robust. Consider its use inside a function, where looking in the global environment would not work:

myfun <- function() {
  dt <- data.table::data.table(myvar = 1:10)
  myvar <- 2
  dt[myvar %in% get("myvar", envir = parent.env(environment()))]
}
myfun()
#>    myvar
#> 1:     2

Subsetting a data.table with a variable (when varname identical to colname)

If you don't mind doing it in two steps, you can build the row selector outside the scope of your data.table, where V1 refers to the variable (though this is usually not what you want to do when working with data.table...):

wh_v1 <- my_data_table[, V1]==V1
my_data_table[wh_v1]
#    V1 V2
# 1:  A  1
# 2:  A  4

filtering with a variable does not give the same results as with a constant - R

The problem is that your data contains a column named i. Inside dplyr verbs, names are looked up in the data first, so patch_sparse %>% filter(period==i) keeps the rows where period equals the column i of your data, not your external variable i.

So if you want to filter based on an external scalar, make sure the name of the scalar is different from your data's column names, e.g. something like:

filter_i <- 0
patch_sparse %>% filter(period==filter_i)
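To make the difference concrete, here is a small sketch with a made-up stand-in for patch_sparse (the real data is not shown in the question, so the values below are assumptions). It compares the masked lookup with the renamed scalar.

```r
library(dplyr)

# Hypothetical stand-in for patch_sparse; the values are assumptions.
patch_sparse <- tibble(
  period = c(0, 1, 2),
  i      = c(0, 5, 2)
)
i <- 0

# i resolves to the column, so this keeps rows where period equals column i
# (rows 1 and 3 here).
by_column <- patch_sparse %>% filter(period == i)

# filter_i is not a column name, so it resolves to the outer variable
# (only row 1 here).
filter_i <- 0
by_scalar <- patch_sparse %>% filter(period == filter_i)

print(by_column)
print(by_scalar)
```

The two results differ even though i and filter_i hold the same value, which is the symptom described in the question.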

