Replacing Occurrences of a Number in Multiple Columns of Data Frame with Another Value in R


You want to search through the whole data frame for any value that matches the value you're trying to replace, the same way you can run a logical test to replace all missing values with 10:

data[ is.na( data ) ] <- 10

You can also replace all 4s with 10s:

data[ data == 4 ] <- 10

At least, I think that's what you're after.
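Here is a minimal, self-contained sketch of both replacements. It assumes a small all-numeric data frame; the column names and values are made up for illustration:

```r
# Hypothetical all-numeric data frame, used only for illustration
data <- data.frame(x = c(1, 4, NA), y = c(4, 2, 3), z = c(NA, 4, 5))

# is.na() on a data frame returns a logical matrix of the same shape,
# so it can be used directly as an index for replacement
data[is.na(data)] <- 10

# The same works for any logical test, e.g. every cell equal to 4
data[data == 4] <- 10

data
```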

And let's say you wanted to ignore the first column (since it's all letters):

# identify which columns contain the values you might want to replace
data[ , 2:3 ]

# subset it with extended bracketing..
data[ , 2:3 ][ data[ , 2:3 ] == 4 ]
# ..those were the values you're going to replace

# now overwrite 'em with tens
data[ , 2:3 ][ data[ , 2:3 ] == 4 ] <- 10

# look at the final data
data
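A runnable version of the column-restricted replacement, assuming hypothetical data whose first column holds letters and whose other two columns hold numbers:

```r
# Hypothetical data: the first column holds letters, the others numbers
data <- data.frame(id = c("a", "b", "c"), v1 = c(4, 1, 4), v2 = c(2, 4, 3),
                   stringsAsFactors = FALSE)

# data[, 2:3] == 4 is a logical matrix covering only columns 2 and 3;
# R modifies that subset and then writes it back into data
data[, 2:3][data[, 2:3] == 4] <- 10

data
```

The letter column is never compared or modified, so it keeps its original values and type.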

R replace specific value in many columns across dataframe

Instead of using select, we can pass matches directly to across inside mutate and replace the values that are "81" with NA using na_if:

library(dplyr)
df <- df %>%
mutate(across(matches("^Col_\\d+$"), ~ na_if(., "81")))

Output:

df
Id Date Col_01 Col_02 Col_03 Col_04
1 30 2012-03-31 1 A42.2 20.46 43
2 36 1996-11-15 42 V73 23 55
3 96 2010-02-07 X48 <NA> 13 3R
4 40 2010-03-18 AD14 18.12 20.12 36
5 69 2012-02-21 8 22.45 12 10
6 11 2013-07-03 <NA> V017 78.12 <NA>
7 22 2001-06-01 11 09 55 12
8 83 2005-03-16 80.45 V22.15 46.52 X29.11
9 92 2012-02-12 1 4 67 12
10 34 2014-03-10 82.12 N72.22 V45.44 10

Or we can use base R:

i1 <- grep("^Col_\\d+$", names(df))
df[i1][df[i1] == "81"] <- NA

The issue in the OP's code is that the assignment does not happen as expected. The extraction works:

(df %>% 
select(matches("^Col_\\d+$")))[(df %>%
select(matches("^Col_\\d+$"))) == "81" ]
[1] "81" "81" "81"

which is the same as

df[i1][df[i1] == "81"]
[1] "81" "81" "81"

but the assignment itself fails:

(df %>% 
select(matches("^Col_\\d+$")))[(df %>%
select(matches("^Col_\\d+$"))) == "81" ] <- NA
Error in (df %>% select(matches("^Col_\\d+$")))[(df %>% select(matches("^Col_\\d+$"))) == :
could not find function "(<-"

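In base R, the assignment works because every step of the nested assignment uses the replacement function [<-. With the piped expression wrapped in parentheses, R would also need a replacement function for (, which does not exist, hence the error about "(<-".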

Data:

df <- structure(list(Id = c(30L, 36L, 96L, 40L, 69L, 11L, 22L, 83L, 
92L, 34L), Date = c("2012-03-31", "1996-11-15", "2010-02-07",
"2010-03-18", "2012-02-21", "2013-07-03", "2001-06-01", "2005-03-16",
"2012-02-12", "2014-03-10"), Col_01 = c("1", "42", "X48", "AD14",
"8", "81", "11", "80.45", "1", "82.12"), Col_02 = c("A42.2",
"V73", "81", "18.12", "22.45", "V017", "09", "V22.15", "4", "N72.22"
), Col_03 = c("20.46", "23", "13", "20.12", "12", "78.12", "55",
"46.52", "67", "V45.44"), Col_04 = c("43", "55", "3R", "36",
"10", "81", "12", "X29.11", "12", "10")),
class = "data.frame", row.names = c(NA,
-10L))
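To make the explanation above concrete, this sketch compares the one-line base R assignment with an explicit step-by-step version on a cut-down, hypothetical copy of the data. The step-by-step form approximates how R evaluates a nested replacement; it is not a literal copy of R's internals:

```r
# Cut-down, hypothetical version of the data above
df <- data.frame(Id = c(30L, 36L, 96L),
                 Col_01 = c("1", "81", "X48"),
                 Col_02 = c("A42.2", "V73", "81"),
                 stringsAsFactors = FALSE)
i1 <- grep("^Col_\\d+$", names(df))

# One-line form from the answer
df1 <- df
df1[i1][df1[i1] == "81"] <- NA

# Explicit version: modify a copy of the column subset with `[<-`,
# then write the modified subset back with `[<-`
df2 <- df
tmp <- df2[i1]
tmp[tmp == "81"] <- NA
df2[i1] <- tmp

identical(df1, df2)
```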

R: Replace multiple values in multiple columns of dataframes with NA

You can also do this using replace:

sel <- grepl("var",names(df))
df[sel] <- lapply(df[sel], function(x) replace(x,x %in% 3:4, NA) )
df

# name foo var1 var2
#1 a 1 1 NA
#2 a 2 2 NA
#3 a 3 NA NA
#4 b 4 NA NA
#5 b 5 5 NA
#6 b 6 6 NA
#7 c 7 7 5
#8 c 8 8 5
#9 c 9 9 5

Some quick benchmarking using a million row sample of data suggests this is quicker than the other answers.
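The input data frame is not shown in the answer. This sketch assumes it was reconstructed from the printed output (var1 = 1:9 and var2 = rep(3:5, each = 3) before replacement) so the replace() approach can be run end to end:

```r
# Data reconstructed from the printed output above (an assumption)
df <- data.frame(name = rep(c("a", "b", "c"), each = 3),
                 foo  = 1:9,
                 var1 = 1:9,
                 var2 = rep(3:5, each = 3),
                 stringsAsFactors = FALSE)

# Columns whose names contain "var"
sel <- grepl("var", names(df))

# replace() swaps every element that is 3 or 4 for NA, column by column
df[sel] <- lapply(df[sel], function(x) replace(x, x %in% 3:4, NA))
df
```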

How to replace all values in multiple columns that are not among the values in another column

An alternative with base R:

df[,-1][matrix(!(unlist(df[,-1]) %in% df[,1]),nrow(df))] <- NA
df

gives:

  ID PN1 PN2
1 1 2 5
2 2 NA 4
3 4 NA 2
4 5 2 NA
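The original input is not shown, so this sketch uses hypothetical values chosen to reproduce the output above. It also shows a more readable lapply/replace equivalent of the matrix-index one-liner:

```r
# Hypothetical input chosen so that the result matches the output above
df <- data.frame(ID = c(1, 2, 4, 5), PN1 = c(2, 3, 3, 2), PN2 = c(5, 4, 2, 6))

# Readable equivalent: in every column except the first,
# set values that do not appear in ID to NA
df2 <- df
df2[-1] <- lapply(df2[-1], function(x) replace(x, !(x %in% df2$ID), NA))

# The one-liner from the answer, applied to the original copy
df[, -1][matrix(!(unlist(df[, -1]) %in% df[, 1]), nrow(df))] <- NA

df
df2
```

The one-liner relies on unlist() and matrix() both working in column-major order, so the logical vector lines up with the cells of df[, -1]; the lapply version avoids that reasoning by handling one column at a time.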

replace values in a column into Data Frame with another value (same for all)

You can target multiple columns with across and multiple values with %in%. For example, to replace the values 2, 3, 4 and 5 with 1 in columns a, b, c and d:

library(dplyr)
df <- df %>% mutate(across(a:d, ~replace(., . %in% 2:5, 1)))
#For dplyr < 1.0.0 use `mutate_at`
#df <- df %>% mutate_at(vars(a:d), ~replace(., . %in% 2:5, 1))

In base R, you can do this with lapply:

cols <- c('a','b','c','d')
df[cols] <- lapply(df[cols], function(x) replace(x, x %in% 2:5, 1))
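A self-contained run of the base R version, assuming a hypothetical data frame with the four named columns plus an extra column e that should stay untouched:

```r
# Hypothetical data with the four columns named in the answer plus one more
df <- data.frame(a = c(1, 2, 6), b = c(3, 7, 5), c = c(4, 4, 8),
                 d = c(9, 5, 2), e = c(2, 3, 4))

cols <- c("a", "b", "c", "d")
# Values equal to 2, 3, 4 or 5 in the selected columns become 1;
# column e is not in cols and is left as it is
df[cols] <- lapply(df[cols], function(x) replace(x, x %in% 2:5, 1))
df
```

Note that x %in% 2:5 matches the values 2, 3, 4 and 5 exactly, not a numeric range; a value such as 2.5 would not be replaced. For a range, a condition like x >= 2 & x <= 5 would be needed instead.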

Replace multiple similar values in a column in R

Perhaps pasting " years old" onto each age will satisfy your needs?

 data$txtAge <- paste(data$Age, "years old")

There is no need for a loop. Many R functions are vectorised: paste returns a character vector as long as its longest argument and "recycles" (repeats) the shorter arguments, so "years old" is repeated for every element of data$Age. The result is a column with one entry per row of data.
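A minimal sketch of the recycling behaviour, assuming a hypothetical data frame with an Age column:

```r
# Hypothetical ages; paste() is vectorised, so no loop is needed
data <- data.frame(Age = c(21, 35, 60))

# "years old" (length 1) is recycled to the length of data$Age;
# paste() inserts a single space between the pieces by default
data$txtAge <- paste(data$Age, "years old")
data$txtAge
```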

Replacing occurrences of a number in specific columns from a data table

We can use set(), which is usually more efficient here: it is a low-overhead, loopable version of := that updates the columns in place.

for(j in 2:4) {
set(DT, i = which(DT[[j]]==4), j=j, value = 10)
}
DT
# V1 V2 V3 V4
#1: A 2 2 10
#2: B 1 10 10
#3: C 3 10 3
#4: D 3 2 10
#5: E 3 3 3
#6: F 10 3 3

The same loop also works with column names:

for(j in names(DT)[2:4]){
set(DT, i = which(DT[[j]]==4), j=j, value = 10)
}

Another option is to pass the columns of interest to .SDcols (as numeric indices or column names), loop through the Subset of Data.table (.SD), replace the values of 4 with 10, and assign (:=) the result back to those columns:

DT[, (2:4) := lapply(.SD, function(x) replace(x, x==4, 10)), .SDcols = 2:4]

Or with column names:

DT[, (names(DT)[2:4]) := lapply(.SD, function(x) replace(x, x==4, 10)), 
.SDcols = names(DT)[2:4]]

Data:

library(data.table)
set.seed(24)
DT <- data.table(V1 = LETTERS[1:6], V2 = sample(1:4, 6, replace = TRUE),
V3 = sample(2:4, 6, replace = TRUE), V4 = sample(3:4, 6, replace = TRUE))
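Because the seeded sample makes the output hard to check by eye, this sketch uses a small hypothetical table with fixed values and applies both the set() loop and the .SDcols form:

```r
library(data.table)

# Small, hypothetical table with fixed values (instead of the seeded sample above)
DT <- data.table(V1 = c("A", "B", "C"),
                 V2 = c(4L, 1L, 4L),
                 V3 = c(2L, 4L, 3L),
                 V4 = c(4L, 4L, 3L))

# Loop variant: which() converts the logical test to row numbers and
# set() updates those cells in place (DT is not copied)
for (j in c("V2", "V3")) {
  set(DT, i = which(DT[[j]] == 4L), j = j, value = 10L)
}

# .SDcols variant for the remaining column
DT[, ("V4") := lapply(.SD, function(x) replace(x, x == 4L, 10L)), .SDcols = "V4"]
DT
```

The replacement is written as 10L so it matches the integer type of the columns. With a double 10, data.table has to coerce the value to integer; depending on the version, that may happen silently or with a warning.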

Replace multiple characters with multiple values in multiple columns? R

You can use dplyr::recode:

df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = letters[1:3], var2 = rep(3:5, each = 3))


library(dplyr, warn.conflicts = FALSE)

df %>%
mutate(across(c(name, var1), ~ recode(., a = 1, b = 2, c = 3)))
#> name foo var1 var2
#> 1 1 1 1 3
#> 2 1 2 2 3
#> 3 1 3 3 3
#> 4 2 4 1 4
#> 5 2 5 2 4
#> 6 2 6 3 4
#> 7 3 7 1 5
#> 8 3 8 2 5
#> 9 3 9 3 5

Created on 2021-10-19 by the reprex package (v2.0.1)

across() applies the function defined by ~ recode(., a = 1, b = 2, c = 3) to both name and var1.

Using ~ and . is another way to define a function in across. This function is equivalent to the one defined by function(x) recode(x, a = 1, b = 2, c = 3), and you could use that code in across instead of the ~ form and it would give the same result. The only name I know for this is what it's called in ?across, which is "purrr-style lambda function", because the purrr package was the first to use formulas to define functions in this way.

If you want to see the actual function created by the formula, you can look at rlang::as_function(~ recode(., a = 1, b = 2, c = 3)), although it's a little more complex than the one above to support the use of ..1, ..2 and ..3 which are not used here.

Now that R (4.1.0 and later) supports the shorter \(x) syntax for anonymous functions shown below, the purrr-style lambda is arguably no longer needed; writing it that way is mostly an old habit.

df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = letters[1:3], var2 = rep(3:5, each = 3))

library(dplyr, warn.conflicts = FALSE)

df %>%
mutate(across(c(name, var1), \(x) recode(x, a = 1, b = 2, c = 3)))
#> name foo var1 var2
#> 1 1 1 1 3
#> 2 1 2 2 3
#> 3 1 3 3 3
#> 4 2 4 1 4
#> 5 2 5 2 4
#> 6 2 6 3 4
#> 7 3 7 1 5
#> 8 3 8 2 5
#> 9 3 9 3 5

Created on 2021-10-19 by the reprex package (v2.0.1)
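The \(x) form is only shorthand for function(x). This base R sketch (R 4.1.0 or later) shows that both spellings produce the same result. It swaps dplyr::recode for a named-vector lookup, a different technique that gives the same codes for this data; with the lookup, a value that has no entry in codes becomes NA:

```r
df <- data.frame(name = rep(letters[1:3], each = 3), foo = rep(1:9),
                 var1 = letters[1:3], var2 = rep(3:5, each = 3),
                 stringsAsFactors = FALSE)

# Named lookup vector: names are the old values, elements the new ones
codes <- c(a = 1, b = 2, c = 3)
cols <- c("name", "var1")

# The same anonymous function written two ways
with_function  <- lapply(df[cols], function(x) unname(codes[x]))
with_backslash <- lapply(df[cols], \(x) unname(codes[x]))

identical(with_function, with_backslash)

df[cols] <- with_backslash
df
```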

Count occurrences of value in multiple columns with duplicates

You could just subset the to vector:

data.table(table(unlist(toy_data[,c(from,to[to!=from])])))

V1 N
1: A 3
2: B 1
3: C 2
4: D 1
5: E 2
6: F 1
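The original toy_data, from and to are not shown. This sketch assumes they are a data frame and character vectors of column names (with from a single name that also appears in to) and shows why subsetting to avoids counting a shared column twice. It uses base table() without wrapping the result in data.table:

```r
# Hypothetical data and column-name vectors; "P1" appears in both from and to
toy_data <- data.frame(P1 = c("A", "B", "C"), P2 = c("A", "C", "E"),
                       stringsAsFactors = FALSE)
from <- "P1"
to   <- c("P1", "P2")

# Without subsetting, column P1 is selected twice and counted twice
naive <- table(unlist(toy_data[, c(from, to)]))

# Dropping the entries of `to` that equal `from` selects each column once
counts <- table(unlist(toy_data[, c(from, to[to != from])]))

naive
counts
```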

