Replacing occurrences of a number in multiple columns of data frame with another value in R
You want to search through the whole data frame for any value that matches the one you're trying to replace. It works the same way as a logical test, like replacing all missing values with 10:
data[ is.na( data ) ] <- 10
You can also replace all 4s with 10s:
data[ data == 4 ] <- 10
At least I think that's what you're after?
And let's say you wanted to ignore the first column (since it's all letters):
# identify which columns contain the values you might want to replace
data[ , 2:3 ]
# subset it with extended bracketing..
data[ , 2:3 ][ data[ , 2:3 ] == 4 ]
# ..those were the values you're going to replace
# now overwrite 'em with tens
data[ , 2:3 ][ data[ , 2:3 ] == 4 ] <- 10
# look at the final data
data
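For a self-contained run of the steps above, here is a minimal sketch. The data frame and its column names (id, x, y) are made up for illustration and are not from the original question.

```r
# Hypothetical data: one character column and two numeric columns
data <- data.frame(
  id = c("a", "b", "c", "d"),
  x  = c(4, 1, 3, 4),
  y  = c(2, 4, 4, 5),
  stringsAsFactors = FALSE
)

# Only columns 2 and 3 are searched; the character column is skipped
cols <- 2:3

# Logical matrix marking every cell equal to 4 in those columns
mask <- data[, cols] == 4

# Overwrite the marked cells with 10
data[, cols][mask] <- 10

data
```

Storing the comparison in mask first is only for readability; it should behave the same as the one-line form shown earlier.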
R replace specific value in many columns across dataframe
Instead of select, we can directly specify the matches in mutate to replace the values that are '81' with NA (use na_if):
library(dplyr)
df <- df %>%
mutate(across(matches("^Col_\\d+$"), ~ na_if(., "81")))
-output
df
Id Date Col_01 Col_02 Col_03 Col_04
1 30 2012-03-31 1 A42.2 20.46 43
2 36 1996-11-15 42 V73 23 55
3 96 2010-02-07 X48 <NA> 13 3R
4 40 2010-03-18 AD14 18.12 20.12 36
5 69 2012-02-21 8 22.45 12 10
6 11 2013-07-03 <NA> V017 78.12 <NA>
7 22 2001-06-01 11 09 55 12
8 83 2005-03-16 80.45 V22.15 46.52 X29.11
9 92 2012-02-12 1 4 67 12
10 34 2014-03-10 82.12 N72.22 V45.44 10
Or we can use base R
i1 <- grep("^Col_\\d+$", names(df))
df[i1][df[i1] == "81"] <- NA
The issue in the OP's code is the assignment is not triggered as we expect i.e.
(df %>%
select(matches("^Col_\\d+$")))[(df %>%
select(matches("^Col_\\d+$"))) == "81" ]
[1] "81" "81" "81"
which is same as
df[i1][df[i1] == "81"]
[1] "81" "81" "81"
and not the assignment
(df %>%
select(matches("^Col_\\d+$")))[(df %>%
select(matches("^Col_\\d+$"))) == "81" ] <- NA
Error in (df %>% select(matches("^Col_\\d+$")))[(df %>% select(matches("^Col_\\d+$"))) == :
could not find function "(<-"
In base R, it does the assignment with [<-
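To make the difference concrete, here is a rough sketch of what the base R assignment amounts to: the replacement function [<- is called on the subset, and the result is stored back under a name. A piped expression has no name to store back into, which is why the assignment form fails. The small data frame below is hypothetical and only mirrors the Col_ naming used above.

```r
# Hypothetical character columns, mirroring the Col_ naming used above
df <- data.frame(
  Id     = 1:3,
  Col_01 = c("81", "7", "X1"),
  Col_02 = c("5", "81", "81"),
  stringsAsFactors = FALSE
)

i1   <- grep("^Col_\\d+$", names(df))
mask <- df[i1] == "81"   # logical matrix over the matched columns

# Roughly what `df[i1][mask] <- NA` does: call the replacement
# function `[<-` on the subset, then store the result back into df
df[i1] <- `[<-`(df[i1], mask, value = NA)

df
```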
data
df <- structure(list(Id = c(30L, 36L, 96L, 40L, 69L, 11L, 22L, 83L,
92L, 34L), Date = c("2012-03-31", "1996-11-15", "2010-02-07",
"2010-03-18", "2012-02-21", "2013-07-03", "2001-06-01", "2005-03-16",
"2012-02-12", "2014-03-10"), Col_01 = c("1", "42", "X48", "AD14",
"8", "81", "11", "80.45", "1", "82.12"), Col_02 = c("A42.2",
"V73", "81", "18.12", "22.45", "V017", "09", "V22.15", "4", "N72.22"
), Col_03 = c("20.46", "23", "13", "20.12", "12", "78.12", "55",
"46.52", "67", "V45.44"), Col_04 = c("43", "55", "3R", "36",
"10", "81", "12", "X29.11", "12", "10")),
class = "data.frame", row.names = c(NA,
-10L))
R: Replace multiple values in multiple columns of dataframes with NA
You can also do this using replace:
sel <- grepl("var",names(df))
df[sel] <- lapply(df[sel], function(x) replace(x,x %in% 3:4, NA) )
df
# name foo var1 var2
#1 a 1 1 NA
#2 a 2 2 NA
#3 a 3 NA NA
#4 b 4 NA NA
#5 b 5 5 NA
#6 b 6 6 NA
#7 c 7 7 5
#8 c 8 8 5
#9 c 9 9 5
Some quick benchmarking using a million row sample of data suggests this is quicker than the other answers.
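The input data is not shown in this answer. The sketch below assumes a data frame reconstructed from the printed result (the original may have differed) so the snippet can be run end to end.

```r
# Assumed input, inferred from the printed result above
df <- data.frame(
  name = rep(letters[1:3], each = 3),
  foo  = 1:9,
  var1 = 1:9,
  var2 = rep(3:5, each = 3),
  stringsAsFactors = FALSE
)

# TRUE for the columns whose name contains "var"
sel <- grepl("var", names(df))

# Within those columns only, turn every 3 or 4 into NA
df[sel] <- lapply(df[sel], function(x) replace(x, x %in% 3:4, NA))

df
```

Note that foo also contains 3 and 4 but is left alone, because sel only selects var1 and var2.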
How to replace all values in multiple columns that are not among the values in another column
An alternative with base R:
df[,-1][matrix(!(unlist(df[,-1]) %in% df[,1]),nrow(df))] <- NA
df
gives,
ID PN1 PN2
1 1 2 5
2 2 NA 4
3 4 NA 2
4 5 2 NA
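The one-liner is dense, so here is the same idea split into steps. The input values below are hypothetical, chosen so that the result matches the output printed above; the original question's data is not shown here.

```r
# Hypothetical input; column 1 holds the allowed values
df <- data.frame(
  ID  = c(1, 2, 4, 5),
  PN1 = c(2, 3, 7, 2),
  PN2 = c(5, 4, 2, 9)
)

# Flatten the other columns (column by column) and test membership in ID
not_in_id <- !(unlist(df[, -1]) %in% df$ID)

# Reshape the flat result into a matrix with one row per data frame row;
# matrix() fills column by column, matching the order unlist() produced
mask <- matrix(not_in_id, nrow = nrow(df))

# Blank out every value that does not appear in ID
df[, -1][mask] <- NA

df
```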
replace values in a column into Data Frame with another value (same for all)
You can replace values in multiple columns using across and match multiple values with %in%. For example, if you want to replace values in columns a, b, c and d, you can do:
library(dplyr)
df <- df %>% mutate(across(a:d, ~replace(., . %in% 2:5, 1)))
#For dplyr < 1.0.0 use `mutate_at`
#df <- df %>% mutate_at(vars(a:d), ~replace(., . %in% 2:5, 1))
In base R, you can do this with lapply:
cols <- c('a','b','c','d')
df[cols] <- lapply(df[cols], function(x) replace(x, x %in% 2:5, 1))
Replace multiple similar values in a column in R
Perhaps adding 3 and pasting " years old" will satisfy your needs?
data$txtAge <- paste(data$Age + 3, "years old")
There is no need for an iterative command. R's functions often iterate automagically. In this case the paste function is designed to return character results of the same length as the longest input argument, but it "recycles" (repeats) the shorter argument. You would get a column with as many elements as there are rows in the data object.
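To illustrate the recycling described above, here is a tiny sketch with a made-up vector of ages (not from the original question):

```r
# Hypothetical ages; the single string is recycled to match their length
ages <- c(21, 34, 50)
txt  <- paste(ages, "years old")

txt
length(txt) == length(ages)
```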
Replacing occurrences of a number in specific columns from a data table
We can use set, which would be more efficient:
for(j in 2:4) {
set(DT, i = which(DT[[j]]==4), j=j, value = 10)
}
DT
# V1 V2 V3 V4
#1: A 2 2 10
#2: B 1 10 10
#3: C 3 10 3
#4: D 3 2 10
#5: E 3 3 3
#6: F 10 3 3
The above can be done with column names as well
for(j in names(DT)[2:4]){
set(DT, i = which(DT[[j]]==4), j=j, value = 10)
}
Or another option is to specify .SDcols with the columns of interest (either the numeric index or the column names), loop through the Subset of Data.table (.SD), replace the values that are 4 with 10, and assign (:=) the output back to the columns of interest:
DT[, (2:4) := lapply(.SD, function(x) replace(x, x==4, 10)), .SDcols = 2:4]
Or with column names
DT[, (names(DT)[2:4]) := lapply(.SD, function(x) replace(x, x==4, 10)),
.SDcols = names(DT)[2:4]]
data
library(data.table)
set.seed(24)
DT <- data.table(V1 = LETTERS[1:6], V2 = sample(1:4, 6, replace = TRUE),
V3 = sample(2:4, 6, replace = TRUE), V4 = sample(3:4, 6, replace= TRUE))
Replace multiple characters with multiple values in multiple columns? R
You can use dplyr::recode
df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = letters[1:3], var2 = rep(3:5, each = 3))
library(dplyr, warn.conflicts = FALSE)
df %>%
mutate(across(c(name, var1), ~ recode(., a = 1, b = 2, c = 3)))
#> name foo var1 var2
#> 1 1 1 1 3
#> 2 1 2 2 3
#> 3 1 3 3 3
#> 4 2 4 1 4
#> 5 2 5 2 4
#> 6 2 6 3 4
#> 7 3 7 1 5
#> 8 3 8 2 5
#> 9 3 9 3 5
Created on 2021-10-19 by the reprex package (v2.0.1)
across will apply the function defined by ~ recode(., a = 1, b = 2, c = 3) to both name and var1.
Using ~ and . is another way to define a function in across. This function is equivalent to the one defined by function(x) recode(x, a = 1, b = 2, c = 3), and you could use that code in across instead of the ~ form and it would give the same result. The only name I know for this is what it's called in ?across, which is "purrr-style lambda function", because the purrr package was the first to use formulas to define functions in this way.
If you want to see the actual function created by the formula, you can look at rlang::as_function(~ recode(., a = 1, b = 2, c = 3)), although it's a little more complex than the one above to support the use of ..1, ..2 and ..3, which are not used here.
Since R 4.1.0 supports the shorter \(x) syntax for defining functions, shown below, the purrr-style lambda is maybe no longer needed; writing it that way is just an old habit.
df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = letters[1:3], var2 = rep(3:5, each = 3))
library(dplyr, warn.conflicts = FALSE)
df %>%
mutate(across(c(name, var1), \(x) recode(x, a = 1, b = 2, c = 3)))
#> name foo var1 var2
#> 1 1 1 1 3
#> 2 1 2 2 3
#> 3 1 3 3 3
#> 4 2 4 1 4
#> 5 2 5 2 4
#> 6 2 6 3 4
#> 7 3 7 1 5
#> 8 3 8 2 5
#> 9 3 9 3 5
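As a quick check that the long and short function forms are interchangeable, here is a base R sketch that needs no packages. The lookup vector is a hypothetical stand-in for recode, used only to keep the example dependency-free; the \(x) form requires R 4.1.0 or later.

```r
x <- c("a", "b", "c", "a")

# Hypothetical stand-in for recode(): a named lookup vector
lookup <- c(a = 1, b = 2, c = 3)

# The same function written in the long and the short form
f_long  <- function(x) unname(lookup[x])
f_short <- \(x) unname(lookup[x])

f_long(x)
identical(f_long(x), f_short(x))
```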
Count occurrences of value in multiple columns with duplicates
You could just subset the to vector:
data.table(table(unlist(toy_data[,c(from,to[to!=from])])))
V1 N
1: A 3
2: B 1
3: C 2
4: D 1
5: E 2
6: F 1
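The toy_data object is not shown in this answer. The sketch below assumes it has two character columns named from and to, and uses base R only; the values are invented and will not reproduce the counts printed above. The idea is that a row where to equals from contributes its value once instead of twice.

```r
# Hypothetical edge list; row 1 points from "A" to "A"
toy_data <- data.frame(
  from = c("A", "B", "C", "A"),
  to   = c("A", "C", "E", "D"),
  stringsAsFactors = FALSE
)

# Keep every `from` value, but only the `to` values that differ from
# `from` in the same row, then count occurrences of each value
counts <- with(toy_data, table(c(from, to[to != from])))

counts
```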