Replace values in a vector based on another vector
Working with factors might be faster:
xf <- as.factor(x)
y[xf]
Note that levels(xf) gives you a character vector similar to your x.lvl. For this method to work, the elements of y must correspond, in order, to the elements of levels(xf), because indexing by a factor uses its integer codes rather than its labels.
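The factor-indexing trick can be sketched as follows. The values of x and y are hypothetical, since the question's data is not shown here. The last line is an alternative lookup by name, which does not depend on the order of the levels.

```r
# Hypothetical data: x holds labels, y holds one replacement per level of x
x <- c("b", "a", "c", "a", "b")
xf <- as.factor(x)
levels(xf)                 # "a" "b" "c" (sorted, so y must follow this order)
y <- c(10, 20, 30)         # y[1] is for "a", y[2] for "b", y[3] for "c"
res <- y[xf]               # a factor index uses the integer codes, not the labels
res                        # 20 10 30 10 20
# Equivalent lookup by name, which does not rely on the level order
res_named <- unname(setNames(y, levels(xf))[x])
```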
R - Replace items in a list based on another vector
You can use lapply() to loop over your list, and stringi::stri_replace_all_fixed() to replace the text.
library(stringi)
# vectorize_all = FALSE applies every pattern/replacement pair in turn to each element
data_to_change$animal_split <- lapply(data_to_change$animal_split, stri_replace_all_fixed,
                                      new_names$V1, new_names$V2, vectorize_all = FALSE)
data_to_change$animal_split
[[1]]
[1] "doggy" "cat" "monkey"
[[2]]
[1] "goldfish"
[[3]]
[1] "mouse" "doggy" "bunny" "squirrel"
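If you would rather avoid the stringi dependency, a base-R sketch with match() is below. The list and the old/new lookup are hypothetical stand-ins for data_to_change$animal_split and new_names, and the helper replace_exact() is made up for illustration. One design difference: match() replaces only whole elements that equal a pattern. stri_replace_all_fixed() replaces substrings, so a value like "hotdog" would also be changed by it.

```r
# Hypothetical inputs: a list of character vectors and an old -> new lookup
animal_split <- list(c("dog", "cat", "monkey"),
                     "goldfish",
                     c("mouse", "dog", "rabbit", "squirrel"))
old <- c("dog", "rabbit")
new <- c("doggy", "bunny")

# Hypothetical helper: replace whole elements that exactly equal an entry of `old`
replace_exact <- function(v, old, new) {
  idx <- match(v, old)     # position in `old`, NA when there is no match
  hit <- !is.na(idx)
  v[hit] <- new[idx[hit]]
  v
}

res <- lapply(animal_split, replace_exact, old = old, new = new)
res
```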
Replace given value in vector
Perhaps replace() is what you are looking for:
> x = c(3, 2, 1, 0, 4, 0)
> replace(x, x==0, 1)
[1] 3 2 1 1 4 1
Or, if you don't have x (any specific reason why not?):
replace(c(3, 2, 1, 0, 4, 0), c(3, 2, 1, 0, 4, 0)==0, 1)
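Many people are familiar with gsub, so you can also try either of the following (note that gsub works on the text of each number, so it replaces every 0 digit, not only elements equal to 0):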
as.numeric(gsub(0, 1, x))
as.numeric(gsub(0, 1, c(3, 2, 1, 0, 4, 0)))
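The gsub route and replace() agree on single-digit data, but they can diverge once a value contains a 0 digit. A small check on a hypothetical vector that includes 10:

```r
x <- c(10, 0, 3)                        # hypothetical vector with a multi-digit value
via_gsub <- as.numeric(gsub(0, 1, x))   # 11 1 3: the 0 inside 10 is replaced too
via_replace <- replace(x, x == 0, 1)    # 10 1 3: only elements equal to 0 change
```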
Update
After reading the comments, perhaps with() is an option:
with(data.frame(x = c(3, 2, 1, 0, 4, 0)), replace(x, x == 0, 1))
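replace(x, list, values) amounts to `x[list] <- values; x`. It returns a modified copy, and `list` may be either a logical condition or integer positions. A short sketch with the vector from above:

```r
x <- c(3, 2, 1, 0, 4, 0)
# replace() returns a modified copy; x itself is unchanged
y <- replace(x, x == 0, 1)
# `list` can also be integer positions, with one value per position
z <- replace(x, which(x == 0), c(10, 20))
# the same result as y, written as an indexed assignment on a copy
w <- x
w[x == 0] <- 1
```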
Replacement of column values based on a named vector
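You could index the named vector vec by the values of col, converted with as.character() so they are matched against the names of vec rather than used as positions: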
df$col1 <- vec[as.character(df$col)]
Or, inside mutate():
library(dplyr)
df %>% mutate(col1 = vec[as.character(col)])
# col col1
# <int> <chr>
# 1 1 a
# 2 1 a
# 3 1 a
# 4 1 a
# 5 2 b
# 6 2 b
# 7 3 c
# 8 3 c
# 9 3 c
#10 3 c
#11 3 c
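The as.character() conversion matters when the names of vec are not in positional order. Here is a sketch with a hypothetical vec whose names are deliberately shuffled, plus a value (4) that has no entry:

```r
vec <- c("3" = "c", "1" = "a", "2" = "b")      # hypothetical, names out of order
df <- data.frame(col = c(1L, 1L, 2L, 3L, 4L))  # hypothetical column values

by_name <- unname(vec[as.character(df$col)])   # "a" "a" "b" "c" NA (matched by name)
by_position <- unname(vec[df$col])             # "c" "c" "a" "b" NA (used as positions)
```

Unmatched names and out-of-range positions both come back as NA, so it may be worth checking for NA after the lookup.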
Replace values in one column based on a vector conditionally matching another column
The first step is to realize that defining ranges as integer sequences (such as 296:310) will not work, since the wavelengths are not whole numbers. Instead, I'll go with a list of number pairs:
badData <- list(c(296,310), c(330,335), c(350,565))
with the understanding that we want to check whether each df$wavelength value falls within any of these three ranges. More ranges can be added to the list.
The second thing we can do is write a function that checks if a vector of values is within one or more pairs of numbers. (In this example, we "know" that it will not be in more than one, but that's not critical.)
within_ranges <- function(x, lims) {
Reduce(`|`, lapply(lims, function(lim) lim[1] <= x & x <= lim[2]))
}
To understand what this is doing, let's debug it, call it, and see what's going on.
debugonce(within_ranges)
within_ranges(df$wavelength, badData)
# debugging in: within_ranges(df$wavelength, badData)
# debug at #1: {
# Reduce(`|`, lapply(lims, function(lim) lim[1] <= x & x <=
# lim[2]))
# }
Let's just run that inner portion:
# Browse[2]>
lapply(lims, function(lim) lim[1] <= x & x <= lim[2])
# [[1]]
# [1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [[2]]
# [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
# [[3]]
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
So the first element (T,T,F,F,...) is whether the values (x) fall within the first number pair (296 to 310); the second element is the same check against the second pair (330 to 335); etc.
The Reduce( part calls its first argument, a function, on the first two elements of the list, saves the return, and then runs the same function on that return and the third element. It stores that, then runs the same function on the return and the fourth element (if it exists). It repeats this along the entire length of the provided list.
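To see the folding on its own, outside the debugger, here is a small illustration with toy values, first with `+` and then with `|` on short logical vectors:

```r
# Reduce folds a function over a list from left to right
total <- Reduce(`+`, list(1, 2, 3, 4))                       # ((1 + 2) + 3) + 4 = 10
steps <- Reduce(`+`, list(1, 2, 3, 4), accumulate = TRUE)    # 1 3 6 10
# With `|`, the same fold ORs logical vectors element by element
flags <- list(c(TRUE, FALSE, FALSE), c(FALSE, FALSE, TRUE))
any_flag <- Reduce(`|`, flags)                               # TRUE FALSE TRUE
```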
In this example, the function is the literal | (backquoted because it is an operator, not a syntactic name), so it is "OR"ing the [[1]] vector with the [[2]] vector, and then the result with [[3]]. You can actually see what is happening if you add accumulate=TRUE:
# Browse[2]>
Reduce(`|`, lapply(lims, function(lim) lim[1] <= x & x <= lim[2]), accumulate=TRUE)
# [[1]]
# [1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [[2]]
# [1] TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
# [[3]]
# [1] TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE TRUE
The first return is the first vector, unmodified. The second element is the original [[2]] vector ORed with the previous return, which is this [[1]] vector (the same as the original [[1]]). The third element is the original [[3]] vector ORed with the previous return, which is this [[2]]. This results in the three groupings of TRUE (elements 1, 2, 7, 11, 12) that you are expecting. So we want the [[3]] element, which is what we get without accumulating:
# Browse[2]>
Reduce(`|`, lapply(lims, function(lim) lim[1] <= x & x <= lim[2]))
# [1] TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE TRUE
Okay, so let's quit the debugger (type Q at the Browse prompt) and give it a full run:
within_ranges(df$wavelength, badData)
# [1] TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE TRUE
This output looks familiar.
(BTW: inside our function, we could also have used rowSums(sapply(lims, ...)) > 0 and it would have worked just as well. For that, though, you need to realize that sapply returns a matrix with one row per row of data in df and one column per range in lims, which is odd if you aren't familiar with it.)
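A sketch comparing the two formulations on a hypothetical vector of wavelengths (the question's df is not shown), using the badData pairs from above:

```r
badData <- list(c(296, 310), c(330, 335), c(350, 565))
x <- c(300, 305, 312, 332, 400)   # hypothetical wavelengths

# Reduce version, as defined above
within_ranges <- function(x, lims) {
  Reduce(`|`, lapply(lims, function(lim) lim[1] <= x & x <= lim[2]))
}

# sapply version: one row per element of x, one column per range
m <- sapply(badData, function(lim) lim[1] <= x & x <= lim[2])
dim(m)                            # 5 3
via_rowsums <- rowSums(m) > 0
via_reduce <- within_ranges(x, badData)
```

One caveat with the sapply route: if x has length 1, sapply simplifies to a plain vector instead of a matrix, and rowSums() then fails. The Reduce version does not have that edge case.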
Now we can set the matching reflectance values to NA, either with dplyr:
df %>%
mutate(
reflectance = if_else(within_ranges(wavelength, badData), NA_real_, reflectance)
)
# wavelength reflectance
# 1 300.0000 NA
# 2 305.0087 NA
# 3 310.0173 -11.01733
# 4 315.0260 -16.02600
# 5 320.0347 -21.03467
# 6 325.0433 -26.04333
# 7 330.0520 NA
# 8 335.0607 -36.06067
# 9 340.0693 -41.06934
# 10 345.0780 -46.07800
# 11 350.0867 NA
# 12 355.0953 NA
Edit: or another dplyr option, using your first thought of replace (not my first choice, but only out of habit):
df %>%
mutate(
reflectance = replace(reflectance, within_ranges(wavelength, badData), NA_real_)
)
or base R:
df$reflectance <- ifelse(within_ranges(df$wavelength, badData), NA_real_, df$reflectance)
df
# wavelength reflectance
# 1 300.0000 NA
# 2 305.0087 NA
# 3 310.0173 -11.01733
# 4 315.0260 -16.02600
# 5 320.0347 -21.03467
# 6 325.0433 -26.04333
# 7 330.0520 NA
# 8 335.0607 -36.06067
# 9 340.0693 -41.06934
# 10 345.0780 -46.07800
# 11 350.0867 NA
# 12 355.0953 NA
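For a dependency-free check, here is a base-R sketch on a small hypothetical df (the real one is not reproduced here). It shows that ifelse(), replace(), and direct indexed assignment give the same column:

```r
# Hypothetical data frame standing in for the question's df
df <- data.frame(wavelength = c(300, 305, 312, 332, 400),
                 reflectance = c(-1, -2, -3, -4, -5))
badData <- list(c(296, 310), c(330, 335), c(350, 565))
within_ranges <- function(x, lims) {
  Reduce(`|`, lapply(lims, function(lim) lim[1] <= x & x <= lim[2]))
}
bad <- within_ranges(df$wavelength, badData)

# Three base-R spellings of the same operation
r1 <- ifelse(bad, NA_real_, df$reflectance)
r2 <- replace(df$reflectance, bad, NA_real_)
r3 <- df$reflectance
r3[bad] <- NA_real_
```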
Notes:
- I'm specifically using NA_real_, partly for clarity (did you know there are different types of NA?), and partly because dplyr::if_else (at least in older versions of dplyr) will complain/fail if the "true" and "false" arguments are not of the same type (a bare NA is technically logical, not numeric as your reflectance is);
- I use dplyr::if_else for the first example, since you're already using dplyr, but in case you choose to forgo dplyr (or somebody else does), the base-R ifelse works, too. (It has its liabilities, but it appears to work just fine here.)
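A quick look at the typed NA constants, and at one well-known liability of base ifelse(): the result takes its attributes from the test, so classes such as Date are dropped. The date used here is arbitrary.

```r
typeof(NA)              # "logical"
typeof(NA_real_)        # "double"
typeof(NA_integer_)     # "integer"
typeof(NA_character_)   # "character"

# ifelse() keeps the attributes of `test`, not of `yes`, so the Date class is lost
d <- as.Date("2020-01-01")
res <- ifelse(TRUE, d, NA)
class(res)              # "numeric"
res                     # 18262, the day count underlying the Date
```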
Replacing vector elements based on indices of another vector
I guess you want this indexing:
> b[a]
[1] 0.5 2.0 3.0 2.5 2.5 1.0 2.0 3.0 0.5
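In b[a], each element of a is a position in b, and positions may repeat. The a and b below are hypothetical, because the question's vectors are not shown. They were chosen so the result matches the output above.

```r
b <- c(0.5, 1.0, 2.0, 2.5, 3.0)       # hypothetical values
a <- c(1, 3, 5, 4, 4, 2, 3, 5, 1)     # hypothetical positions into b
res <- b[a]                           # element i of the result is b[a[i]]
res                                   # 0.5 2.0 3.0 2.5 2.5 1.0 2.0 3.0 0.5
past_end <- b[c(1, 6)]                # 0.5 NA: a position beyond length(b) gives NA
```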
In R how would you replace values in a matrix that have a certain condition with values from another vector?
I thought you had to use apply(Matrix, 2, \(x) pmin(x, Vector)), but you can actually use pmin() directly on your Matrix. It recycles Vector down each column, and the result keeps the dimensions of Matrix, its first argument.
pmin(Matrix, Vector)
#> [,1] [,2]
#> [1,] 2 2
#> [2,] 3 3
#> [3,] 3 2
#> [4,] 1 1
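A sketch with hypothetical inputs (the question's Matrix and Vector are not shown; these values reproduce the output above). It confirms that pmin() matches the explicit column-wise apply(). It uses function(col) instead of \(x), so it also runs on R versions before 4.1.

```r
# Hypothetical inputs
Matrix <- matrix(c(5, 3, 4, 1,
                   2, 6, 2, 7), nrow = 4)   # filled column by column
Vector <- c(2, 3, 3, 1)                     # one cap per row

capped <- pmin(Matrix, Vector)              # Vector is recycled down each column
capped_apply <- apply(Matrix, 2, function(col) pmin(col, Vector))
```

The row alignment only holds when length(Vector) equals nrow(Matrix). A shorter vector is still recycled in column-major order, but its elements no longer line up with the rows.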