R: Replace Multiple Values in Multiple Columns of Dataframes with Na

R: Replace multiple values in multiple columns of dataframes with NA

You can also do this using replace:

sel <- grepl("var",names(df))
df[sel] <- lapply(df[sel], function(x) replace(x,x %in% 3:4, NA) )
df

# name foo var1 var2
#1 a 1 1 NA
#2 a 2 2 NA
#3 a 3 NA NA
#4 b 4 NA NA
#5 b 5 5 NA
#6 b 6 6 NA
#7 c 7 7 5
#8 c 8 8 5
#9 c 9 9 5

Some quick benchmarking using a million row sample of data suggests this is quicker than the other answers.

Replace multiple characters with multiple values in multiple columns? R

You can use dplyr::recode

df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = letters[1:3], var2 = rep(3:5, each = 3))


library(dplyr, warn.conflicts = FALSE)

df %>%
mutate(across(c(name, var1), ~ recode(., a = 1, b = 2, c = 3)))
#> name foo var1 var2
#> 1 1 1 1 3
#> 2 1 2 2 3
#> 3 1 3 3 3
#> 4 2 4 1 4
#> 5 2 5 2 4
#> 6 2 6 3 4
#> 7 3 7 1 5
#> 8 3 8 2 5
#> 9 3 9 3 5

Created on 2021-10-19 by the reprex package (v2.0.1)

Across will apply the function defined by ~ recode(., a = 1, b = 2, c = 3) to both name and var1.

Using ~ and . is another way to define a function in across. This function is equivalent to the one defined by function(x) recode(x, a = 1, b = 2, c = 3), and you could use that code in across instead of the ~ form and it would give the same result. The only name I know for this is what it's called in ?across, which is "purrr-style lambda function", because the purrr package was the first to use formulas to define functions in this way.

If you want to see the actual function created by the formula, you can look at rlang::as_function(~ recode(., a = 1, b = 2, c = 3)), although it's a little more complex than the one above to support the use of ..1, ..2 and ..3 which are not used here.

Now that R supports the easier way of defining functions below, this purrr-style function is maybe no longer useful, it's just an old habit to write it that way.

df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = letters[1:3], var2 = rep(3:5, each = 3))

library(dplyr, warn.conflicts = FALSE)

df %>%
mutate(across(c(name, var1), \(x) recode(x, a = 1, b = 2, c = 3)))
#> name foo var1 var2
#> 1 1 1 1 3
#> 2 1 2 2 3
#> 3 1 3 3 3
#> 4 2 4 1 4
#> 5 2 5 2 4
#> 6 2 6 3 4
#> 7 3 7 1 5
#> 8 3 8 2 5
#> 9 3 9 3 5

Created on 2021-10-19 by the reprex package (v2.0.1)

replace values with NA in several columns

We may do this in two steps - loop across the columns that have 'VAR' followed by digits (\\d+) in column names, replace the values where the first two characters are not AA or DD to NA, then replace the corresponding DATE column to NA based on the NA in the 'VAR1', 'VAR2' columns

library(dplyr)
library(stringr)
DF %>%
mutate(across(matches("^VAR\\d+$"),
~ replace(., !substr(., 1, 2) %in% c("AA", "DD"), NA)),
across(ends_with("DATE"),
~ replace(., is.na(get(str_remove(cur_column(), "DATE"))), NA)))

-output

# A tibble: 5 × 5
ID VAR1 VAR1DATE VAR2 VAR2DATE
<int> <chr> <chr> <chr> <chr>
1 1 AABB 2001-01-01 <NA> <NA>
2 2 AACC 2001-01-02 AACC 2001-01-02
3 3 <NA> <NA> DDCC 2001-01-03
4 4 DDAA 2001-01-04 <NA> <NA>
5 5 <NA> <NA> <NA> <NA>

Replace multiple values in a dataframe with NA based on conditions given in another dataframe in R

Here is one method to assign i.e. loop across columns that starts_with 'col' in first dataset ('df1'), create a single string vector by pasteing the 'group', 'subgroup' and the corresponding column name (cur_column()), check if that elements are %in% the pasted rows of 'df2' to create logical vector. Use that in replace to replace those elements to NA

library(dplyr)
library(stringr)
library(purrr)
df1 <- df1 %>%
mutate(across(starts_with('col'),
~ replace(., str_c(group, subgroup, cur_column()) %in%
invoke(str_c, c(df2, sep = '')), NA) ))

-output

df1
# A tibble: 4 x 5
col_1 col_2 col_3 group subgroup
<dbl> <dbl> <dbl> <chr> <chr>
1 1 3 5 A p
2 NA 8 NA A q
3 5 NA NA B p
4 1 7 7 B q

How to replace multiple values among multiple columns in R dataframe?

One possibility could be using mutate_at() and then a nested ifelse():

df %>%
mutate_at(vars(contains("label")),
funs(ifelse(. %in% food, "food",
ifelse(. %in% clothing, "clothing",
ifelse(. %in% entertainment, "entertainment",
ifelse(. %in% forest, "forest", NA_character_))))))


id label1 label2
1 1 food food
2 2 clothing <NA>
3 3 food food
4 4 clothing clothing
5 5 <NA> <NA>
6 6 entertainment entertainment
7 7 forest forest
8 8 food food

With mutate_at(), it selects the variables that has "label" in their name and then simply applies a nested ifelse() given the conditions.

Replace multiple NA values with a correct values from a different set of columns

You could use map2 + coalesce:

library(dplyr)
library(purrr)

map2_dfc(select(df, 1:3), select(df, 4:6), coalesce)

# # A tibble: 5 × 3
# M_1 M_2 M_3
# <dbl> <dbl> <dbl>
# 1 1 5 6
# 2 2 1 7
# 3 3 3 2
# 4 6 4 3
# 5 6 1 4

Replace values in multiple columns with NA based on value in a different column

Here a solution that actually evaluates if the variable number is 0 or 1 (previous solutions evaluated whether the varible that end with "_1" or "_2" are 1 or 0).

library(dplyr)
df %>%
mutate(across((ends_with("_1")), ~ na_if(number, 1)),
(across((ends_with("_2")), ~ na_if(number, 0))))

# A tibble: 6 x 6
id X_1 Y_1 number X_2 Y_2
<int> <int> <int> <int> <int> <int>
1 1 NA NA 1 1 1
2 1 0 0 0 NA NA
3 2 NA NA 1 1 1
4 2 0 0 0 NA NA
5 3 NA NA 1 1 1
6 3 0 0 0 NA NA

Edit (keep original values)

df %>% 
mutate(across((ends_with("_1")), ~if_else(number == 1, NA_integer_, .))) %>%
mutate(across((ends_with("_2")), ~if_else(number == 0, NA_integer_, .)))

# A tibble: 6 x 6
id X_1 Y_1 number X_2 Y_2
<int> <int> <int> <int> <int> <int>
1 1 NA NA 1 1 3
2 1 1 3 0 NA NA
3 2 NA NA 1 2 4
4 2 2 4 0 NA NA
5 3 NA NA 1 1 3
6 3 1 3 0 NA NA

Data

df <- tibble::tribble(
~id, ~X_1, ~Y_1, ~number, ~X_2, ~Y_2,
1L, 1L, 3L, 1L, 1L, 3L,
1L, 1L, 3L, 0L, 1L, 3L,
2L, 2L, 4L, 1L, 2L, 4L,
2L, 2L, 4L, 0L, 2L, 4L,
3L, 1L, 3L, 1L, 1L, 3L,
3L, 1L, 3L, 0L, 1L, 3L
)

Replace certain values across multiple columns, before column values change, with NA in R

This is something you can try using cumall. This effectively will convert values not equal to zero to NA until the first zero. This logic will give the same results as desired above. However, if your actual needs differ or you want to explicitly check the first row for a given Target to ensure is equal to 1, you can change to first(.) == 1 & cumall(. != 0) instead.

library(tidyverse)

dat %>%
group_by(Target) %>%
mutate(across(.cols = starts_with("targ2"),
~replace(., cumall(. != 0), NA)))

Output

   Target targ2clicks targ2midclicks targ2rClicks
<chr> <dbl> <dbl> <dbl>
1 01 NA NA 0
2 01 NA NA 0
3 01 NA NA 0
4 01 NA NA 1
5 01 0 0 0
6 01 0 0 0
7 01 0 0 0
8 01 1 1 1
9 02 NA NA 0
10 02 0 0 0
11 02 0 0 0
12 02 0 0 0
13 02 1 1 1
14 03 0 NA NA
15 03 0 0 NA
16 03 0 0 NA
17 03 1 1 NA

Replace values of multiple columns from one dataframe using another dataframe with conditions

Your code didn't work for me so I change it a little but it works. If you are reading data from an external file use the stringAsFactor = FALSE when you read it so you don't run into problems.

df1 = data.frame("ID" = 1:4,"x" = rep("a",4), "y" =rep("b",4),"z" = rep("c",4),
stringsAsFactors=FALSE)
df2 = data.frame("ID" = 2:3,"x" = c("d",NA), "y" = c(NA,"e"),stringsAsFactors=FALSE)

for(i in 1:nrow(df2)){
new_data = df2[i,-which(apply(df2[i,],2,is.na))]
pos = as.numeric(new_data[1])
col_replace = intersect(colnames(new_data),colnames(df1))
df1[pos,col_replace] = new_data
}


Related Topics



Leave a reply



Submit