Replace Column Values With Na Based on a Different Column or Row Position With Tidyverse

Replace column values with NA based on a different column or row position with tidyverse

Because 'badBands' has length greater than 1, use %in% instead of ==. case_when is also type-sensitive, so it is safer to supply an NA of the matching type, i.e. NA_real_ for a double column.

myData %>%
  mutate(reflectanceSfp = case_when(bandNumber %in% badBands ~ NA_real_,
                                    TRUE ~ reflectanceSfp))
# A tibble: 6 x 5
#   reflectanceSfp wavelength bandNumber reflectanceDT wavelength1
#            <dbl>      <dbl>      <dbl>         <dbl>       <dbl>
# 1        NA            376.          1      0.000148        377.
# 2        NA            381.          2      0.00589         382.
# 3         0.0158       386.          3      0.0101          387.
# 4         0.0200       391.          4      0.0110          392.
# 5         0.0240       396.          5      0.0117          397.
# 6        NA            401.          6      0.0149          402.

Alternatively, replace is simpler here: we only specify the replacement value for the elements that satisfy the logical condition, and a plain NA is coerced to the column's type, so no typed NA is needed.

myData %>%
  mutate(reflectanceSfp = replace(reflectanceSfp,
                                  bandNumber %in% badBands, NA))
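
To try both variants side by side, here is a minimal sketch. The contents of myData and badBands are not shown in the answer, so the values below are made-up stand-ins (assumptions), chosen so that bands 1, 2 and 6 are the bad ones as in the output above.

```r
library(dplyr)

# Hypothetical stand-in data; the reflectance values are illustrative only
myData <- tibble(
  reflectanceSfp = c(0.010, 0.012, 0.0158, 0.0200, 0.0240, 0.030),
  bandNumber     = 1:6
)
badBands <- c(1, 2, 6)  # assumed set of bad bands

# Variant 1: case_when with a typed NA
viaCaseWhen <- myData %>%
  mutate(reflectanceSfp = case_when(bandNumber %in% badBands ~ NA_real_,
                                    TRUE ~ reflectanceSfp))

# Variant 2: replace, where a plain NA is coerced to double
viaReplace <- myData %>%
  mutate(reflectanceSfp = replace(reflectanceSfp,
                                  bandNumber %in% badBands, NA))

# Both variants should blank out the same rows
all.equal(viaCaseWhen$reflectanceSfp, viaReplace$reflectanceSfp)
```

Using == here instead of %in% would recycle badBands against bandNumber element by element, which is not a membership test.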

Replace values in multiple columns with NA based on value in a different column

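Here is a solution that actually evaluates whether the variable number is 0 or 1 (previous solutions evaluated whether the variables that end with "_1" or "_2" are 1 or 0). Note that na_if(number, 1) returns the values of number itself, so the matched columns are overwritten with number (with NA where the condition holds); the edit further down keeps the original values instead.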

library(dplyr)
df %>%
  mutate(across(ends_with("_1"), ~ na_if(number, 1)),
         across(ends_with("_2"), ~ na_if(number, 0)))

# A tibble: 6 x 6
     id   X_1   Y_1 number   X_2   Y_2
  <int> <int> <int>  <int> <int> <int>
1     1    NA    NA      1     1     1
2     1     0     0      0    NA    NA
3     2    NA    NA      1     1     1
4     2     0     0      0    NA    NA
5     3    NA    NA      1     1     1
6     3     0     0      0    NA    NA

Edit (keep original values)

df %>%
  mutate(across(ends_with("_1"), ~ if_else(number == 1, NA_integer_, .))) %>%
  mutate(across(ends_with("_2"), ~ if_else(number == 0, NA_integer_, .)))

# A tibble: 6 x 6
     id   X_1   Y_1 number   X_2   Y_2
  <int> <int> <int>  <int> <int> <int>
1     1    NA    NA      1     1     3
2     1     1     3      0    NA    NA
3     2    NA    NA      1     2     4
4     2     2     4      0    NA    NA
5     3    NA    NA      1     1     3
6     3     1     3      0    NA    NA

Data

df <- tibble::tribble(
~id, ~X_1, ~Y_1, ~number, ~X_2, ~Y_2,
1L, 1L, 3L, 1L, 1L, 3L,
1L, 1L, 3L, 0L, 1L, 3L,
2L, 2L, 4L, 1L, 2L, 4L,
2L, 2L, 4L, 0L, 2L, 4L,
3L, 1L, 3L, 1L, 1L, 3L,
3L, 1L, 3L, 0L, 1L, 3L
)
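
As a further sketch (not part of the original answer), the result of the edit can also be obtained with replace(), which avoids the typed NA_integer_. The data is the same df as above, rebuilt compactly with tibble() so the snippet runs on its own.

```r
library(dplyr)

# Same values as the tribble above, written column-wise
df <- tibble(
  id     = c(1L, 1L, 2L, 2L, 3L, 3L),
  X_1    = c(1L, 1L, 2L, 2L, 1L, 1L),
  Y_1    = c(3L, 3L, 4L, 4L, 3L, 3L),
  number = c(1L, 0L, 1L, 0L, 1L, 0L),
  X_2    = c(1L, 1L, 2L, 2L, 1L, 1L),
  Y_2    = c(3L, 3L, 4L, 4L, 3L, 3L)
)

# Keep the original values; only blank out the rows where the condition holds
res <- df %>%
  mutate(across(ends_with("_1"), ~ replace(.x, number == 1, NA)),
         across(ends_with("_2"), ~ replace(.x, number == 0, NA)))

res
```

Both across() calls can share one mutate() because number itself is not matched by either selector and so is never modified.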

Replacing NA from a specific column with latest non-NA value from that row in R

For a large data.frame, a vectorized solution may be more efficient than looping over rows. Get the logical index of the NA elements in 'col1' ('i1'), use max.col to return the column index of the first non-NA element among columns 3 to 5 ('j1'), and build a row/column index matrix ('m1') with cbind. Then fill the missing 'col1' values with the elements extracted from columns 3 to 5 using 'm1', and set those source elements to NA.

df1 <- as.data.frame(df)
i1 <- is.na(df1$col1)
j1 <- max.col(!is.na(df1[3:5]), "first")
m1 <- cbind(which(i1), j1[i1])
df1$col1[i1] <- df1[3:5][m1]
df1[3:5][m1] <- NA

-output

> df1
  fruits col1 col2 col3 col4
1  apple    4    5   10   20
2 banana  100   NA   NA    4
3 ananas   10   NA    5    1
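
The input data is not shown in the answer, so the following sketch uses a made-up input (an assumption) that is merely consistent with the output above, and runs the same steps on it.

```r
# Hypothetical input; chosen only so that the result matches the output above
df1 <- data.frame(
  fruits = c("apple", "banana", "ananas"),
  col1   = c(4, NA, NA),
  col2   = c(5, NA, 10),
  col3   = c(10, 100, 5),
  col4   = c(20, 4, 1)
)

i1 <- is.na(df1$col1)                      # rows where col1 is missing
j1 <- max.col(!is.na(df1[3:5]), "first")   # first non-NA position within columns 3:5
m1 <- cbind(which(i1), j1[i1])             # row/column pairs to move
df1$col1[i1] <- df1[3:5][m1]               # fill col1 from the selected cells
df1[3:5][m1] <- NA                         # blank out the cells that were moved

df1
```

Note that the column positions in 'm1' are relative to df1[3:5], not to df1 as a whole, which is why the same subset is used for both extraction and assignment.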

Replace multiple values in a dataframe with NA based on conditions given in another dataframe in R

Here is one method: loop across the columns that start with 'col' in the first dataset ('df1'), create a single string vector by pasting 'group', 'subgroup' and the current column name (cur_column()), and check whether those elements are %in% the pasted rows of 'df2' to get a logical vector. Use that vector in replace to set the matching elements to NA.

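library(dplyr)
library(stringr)
df1 <- df1 %>%
  mutate(across(starts_with('col'),
    # do.call() pastes the columns of df2 row by row;
    # it stands in for purrr::invoke(), which is deprecated as of purrr 1.0.0
    ~ replace(., str_c(group, subgroup, cur_column()) %in%
        do.call(str_c, c(df2, sep = '')), NA)))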

-output

df1
# A tibble: 4 x 5
  col_1 col_2 col_3 group subgroup
  <dbl> <dbl> <dbl> <chr> <chr>
1     1     3     5 A     p
2    NA     8    NA A     q
3     5    NA    NA B     p
4     1     7     7 B     q
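
Neither input is shown in the answer, so here is a self-contained sketch with made-up data (assumptions: the original values in df1, and a df2 whose third column is named column). It follows the same key-pasting idea but adds an explicit separator, which guards against two different combinations pasting to the same string.

```r
library(dplyr)

# Hypothetical inputs; only the NA pattern is taken from the output above
df1 <- tibble(
  col_1    = c(1, 2, 5, 1),
  col_2    = c(3, 8, 9, 7),
  col_3    = c(5, 6, 4, 7),
  group    = c("A", "A", "B", "B"),
  subgroup = c("p", "q", "p", "q")
)
df2 <- tibble(
  group    = c("A", "A", "B", "B"),
  subgroup = c("q", "q", "p", "p"),
  column   = c("col_1", "col_3", "col_2", "col_3")  # assumed column name
)

# One key per cell that should become NA
keys <- paste(df2$group, df2$subgroup, df2$column, sep = "|")

res <- df1 %>%
  mutate(across(starts_with("col"),
                ~ replace(.x,
                          paste(group, subgroup, cur_column(), sep = "|") %in% keys,
                          NA)))

res
```

Building the keys once outside mutate() also avoids re-pasting df2 for every column.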

Conditionally replace NA with value from other rows

Your mutate won't work because you did not assign any value to a variable. Your mutate() should look something like mutate(value = unique(value[!is.na(value)])), applied within each temporal/spatial group. That would not be my approach, though. What I did below was create a lookup table of the distinct non-NA values and then join it onto the original dataset. valuedis should hold the values you want.

temporal <- c("Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday","Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday")
spatial <- c("North", "South","North", "South","North", "South","North", "South","North", "South", "North", "South","North", "South","North", "South","North", "South","North", "South")
value <- c(NA,2,3,4,5,6,7,NA,9,10,1,NA,3,4,5,6,7,8,9,NA)

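# data.frame() keeps value numeric; cbind() would coerce every column to character
df <- data.frame(temporal, spatial, value)

library(dplyr)

# Lookup table: the distinct non-NA value for each temporal/spatial pair
dfdis <- df %>%
  filter(!is.na(value)) %>%
  distinct(temporal, spatial, value) %>%
  rename(valuedis = value)

# Join the lookup values back onto every row of the original data
df2 <- left_join(df, dfdis, by = c("temporal", "spatial"))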
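
The join leaves the looked-up values in a separate valuedis column. If you want them folded back into value, one option (an addition, not part of the original answer) is coalesce(). The sketch below uses a shortened, made-up version of the data and assumes each temporal/spatial pair has exactly one distinct non-NA value; with more than one, the join would duplicate rows.

```r
library(dplyr)

# Shortened hypothetical data in the same shape as above
df <- data.frame(
  temporal = c("Monday", "Monday", "Tuesday", "Monday", "Monday", "Tuesday"),
  spatial  = c("North",  "South",  "North",   "North",  "South",  "North"),
  value    = c(NA, 2, 3, 1, NA, 3)
)

# Lookup table of distinct non-NA values per pair
dfdis <- df %>%
  filter(!is.na(value)) %>%
  distinct(temporal, spatial, value) %>%
  rename(valuedis = value)

# Join, then take value where present and valuedis otherwise
filled <- df %>%
  left_join(dfdis, by = c("temporal", "spatial")) %>%
  mutate(value = coalesce(value, valuedis)) %>%
  select(-valuedis)

filled
```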

Replace values with NA in several columns

We may do this in two steps. First, loop across the columns whose names are 'VAR' followed by digits (\\d+) and replace the values whose first two characters are not AA or DD with NA. Then set the corresponding DATE column to NA wherever the matching 'VAR1' or 'VAR2' column is NA.

library(dplyr)
library(stringr)
DF %>%
  mutate(across(matches("^VAR\\d+$"),
                ~ replace(., !substr(., 1, 2) %in% c("AA", "DD"), NA)),
         across(ends_with("DATE"),
                ~ replace(., is.na(get(str_remove(cur_column(), "DATE"))), NA)))

-output

# A tibble: 5 × 5
     ID VAR1  VAR1DATE   VAR2  VAR2DATE
  <int> <chr> <chr>      <chr> <chr>
1     1 AABB  2001-01-01 <NA>  <NA>
2     2 AACC  2001-01-02 AACC  2001-01-02
3     3 <NA>  <NA>       DDCC  2001-01-03
4     4 DDAA  2001-01-04 <NA>  <NA>
5     5 <NA>  <NA>       <NA>  <NA>
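
Since DF itself is not shown, here is a runnable sketch on a made-up input (an assumption) that is consistent with the output above. It uses base sub() in place of str_remove() so that only dplyr is needed; otherwise the steps are the same.

```r
library(dplyr)

# Hypothetical input; the values that end up as NA are invented
DF <- tibble(
  ID       = 1:5,
  VAR1     = c("AABB", "AACC", "BBAA", "DDAA", "CCDD"),
  VAR1DATE = c("2001-01-01", "2001-01-02", "2001-01-03", "2001-01-04", "2001-01-05"),
  VAR2     = c("BBAA", "AACC", "DDCC", "CCAA", "XXYY"),
  VAR2DATE = c("2001-01-01", "2001-01-02", "2001-01-03", "2001-01-04", "2001-01-05")
)

res <- DF %>%
  mutate(
    # Step 1: keep only values starting with AA or DD
    across(matches("^VAR\\d+$"),
           ~ replace(.x, !substr(.x, 1, 2) %in% c("AA", "DD"), NA)),
    # Step 2: blank each DATE column where its VAR partner is now NA;
    # this relies on mutate() evaluating its arguments in order
    across(ends_with("DATE"),
           ~ replace(.x, is.na(get(sub("DATE$", "", cur_column()))), NA))
  )

res
```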

