Dictionary style replace multiple items
map = setNames(c("0101", "0102", "0103"), c("AA", "AC", "AG"))
foo[] <- map[unlist(foo)]
assuming that map covers all the cases in foo (any value missing from map would become NA). This would feel less like a 'hack', and would be more efficient in both space and time, if foo were a matrix (of character()). In that case:
matrix(map[foo], nrow=nrow(foo), dimnames=dimnames(foo))
Both matrix and data frame variants run afoul of R's 2^31-1 limit on vector size when there are millions of SNPs and thousands of samples.
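For comparison, here is a minimal plain-Python sketch (no pandas) of the same name-based lookup. It assumes a small hypothetical genotype table stored as a list of rows, and like the R version it assumes the mapping covers every code. One difference: where R returns NA for an unmapped name, a missing key here raises KeyError.

```python
# Same mapping as the R named vector: keys are genotype codes, values are replacements
genotype_map = {"AA": "0101", "AC": "0102", "AG": "0103"}


def recode_table(table, mapping):
    """Return a new table with every cell replaced by mapping[cell].

    Raises KeyError if a cell has no entry in mapping.
    """
    return [[mapping[cell] for cell in row] for row in table]


# Hypothetical input: one row per SNP, one column per sample
snps = [["AA", "AC", "AG"],
        ["AG", "AA", "AC"]]

recoded = recode_table(snps, genotype_map)
print(recoded)  # [['0101', '0102', '0103'], ['0103', '0101', '0102']]
```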
Replace multiple characters with multiple values in multiple columns? R
You can use dplyr::recode
df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = letters[1:3], var2 = rep(3:5, each = 3))
library(dplyr, warn.conflicts = FALSE)
df %>%
mutate(across(c(name, var1), ~ recode(., a = 1, b = 2, c = 3)))
#> name foo var1 var2
#> 1 1 1 1 3
#> 2 1 2 2 3
#> 3 1 3 3 3
#> 4 2 4 1 4
#> 5 2 5 2 4
#> 6 2 6 3 4
#> 7 3 7 1 5
#> 8 3 8 2 5
#> 9 3 9 3 5
Created on 2021-10-19 by the reprex package (v2.0.1)
across() applies the function defined by ~ recode(., a = 1, b = 2, c = 3) to both name and var1.
Using ~ and . is another way to define a function in across(). This function is equivalent to function(x) recode(x, a = 1, b = 2, c = 3); you could pass that to across() instead of the ~ form and get the same result. The only name I know for this is the one used in ?across, "purrr-style lambda function", because the purrr package was the first to use formulas to define functions in this way.
If you want to see the actual function created by the formula, look at rlang::as_function(~ recode(., a = 1, b = 2, c = 3)). It is a little more complex than the one above because it also supports ..1, ..2 and ..3, which are not used here.
Now that R (since version 4.1.0) supports the shorter \(x) syntax for defining functions, used below, the purrr-style formula is arguably no longer needed; writing it that way is mostly an old habit.
df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = letters[1:3], var2 = rep(3:5, each = 3))
library(dplyr, warn.conflicts = FALSE)
df %>%
mutate(across(c(name, var1), \(x) recode(x, a = 1, b = 2, c = 3)))
#> name foo var1 var2
#> 1 1 1 1 3
#> 2 1 2 2 3
#> 3 1 3 3 3
#> 4 2 4 1 4
#> 5 2 5 2 4
#> 6 2 6 3 4
#> 7 3 7 1 5
#> 8 3 8 2 5
#> 9 3 9 3 5
Created on 2021-10-19 by the reprex package (v2.0.1)
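The across() pattern, applying one function to a chosen set of columns, can be sketched in Python for comparison. This is a minimal sketch under assumptions: the table is a plain dict of lists (no pandas), and mutate_across and lookup are hypothetical names, not a library API. Values without an entry in lookup are simply kept as-is, a choice made for this sketch.

```python
# Hypothetical column-oriented table mirroring the R data frame above
df = {
    "name": ["a"] * 3 + ["b"] * 3 + ["c"] * 3,
    "foo": list(range(1, 10)),
    "var1": ["a", "b", "c"] * 3,
    "var2": [3] * 3 + [4] * 3 + [5] * 3,
}
lookup = {"a": 1, "b": 2, "c": 3}


def mutate_across(table, columns, fn):
    """Return a copy of table in which fn has been applied to each listed column."""
    out = dict(table)  # shallow copy; untouched columns are shared, not copied
    for col in columns:
        out[col] = fn(table[col])
    return out


# A Python lambda plays the role of \(x) in R
result = mutate_across(df, ["name", "var1"], lambda col: [lookup.get(v, v) for v in col])
print(result["name"])  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
print(result["var1"])  # [1, 2, 3, 1, 2, 3, 1, 2, 3]
```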
Dictionary style replacement for columns in dataframe in R
library(tidyverse)
df <- tibble(
miles_ran = c(3,4,NA,9,NA),
miles_cycled = c(9,NA,NA,2,12)
)
df %>%
replace_na(list(miles_ran = 0, miles_cycled = 10))
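A plain-Python sketch of the same per-column fill, for comparison: the fill values come from a dict keyed by column name, just as replace_na() takes a named list. Using None as the missing-value marker is an assumption of this sketch, and the replace_na helper here is hypothetical.

```python
# Column-oriented table; None stands in for R's NA
df = {
    "miles_ran": [3, 4, None, 9, None],
    "miles_cycled": [9, None, None, 2, 12],
}


def replace_na(table, fill):
    """Replace None in each column named in fill with that column's fill value.

    Columns not listed in fill are returned unchanged.
    """
    return {
        col: [fill[col] if v is None else v for v in values] if col in fill else list(values)
        for col, values in table.items()
    }


filled = replace_na(df, {"miles_ran": 0, "miles_cycled": 10})
print(filled)
# {'miles_ran': [3, 4, 0, 9, 0], 'miles_cycled': [9, 10, 10, 2, 12]}
```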
How to replace multiple values at once
A possible solution using match:
x <- c(4, 2, 0, 7, 5, 7, 8, 9) # example input, consistent with the output below
old <- 1:8
new <- c(2,4,6,8,1,3,5,7)
x[x %in% old] <- new[match(x, old, nomatch = 0)]
which gives:
> x
[1] 8 4 0 5 1 5 7 9
What this does:
- Create two vectors: old with the values that need to be replaced, and new with the corresponding replacements.
- Use match to see where values from x occur in old. Use nomatch = 0 so that values not found in old get index 0 instead of NA; indexing with 0 selects nothing, so those entries are dropped. This results in an index vector of the positions in old for the x values.
- This index vector can then be used to index new.
- Only assign the values from new to the positions of x that are present in old: x[x %in% old].
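The same steps can be traced in Python: a match-style helper returns each value's position in old (None standing in for NA), and those positions index new. A minimal sketch; match is a hypothetical helper written here, and x is an example input chosen so that the result agrees with the R output above.

```python
old = list(range(1, 9))
new = [2, 4, 6, 8, 1, 3, 5, 7]
# Example input chosen so that the result matches the R output above
x = [4, 2, 0, 7, 5, 7, 8, 9]


def match(values, table):
    """Like R's match(): 0-based position of each value's first occurrence, or None."""
    first = {}
    for i, v in enumerate(table):
        first.setdefault(v, i)  # keep the first occurrence, as match() does
    return [first.get(v) for v in values]


idx = match(x, old)
# Replace only where a match was found; other values (0 and 9 here) stay as they are
result = [new[i] if i is not None else v for v, i in zip(x, idx)]
print(result)  # [8, 4, 0, 5, 1, 5, 7, 9]
```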
replace items in dictionary using dplyr
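Check stringr::str_replace_all, to which you can pass a named vector (names are the patterns, values are the replacements) to make multiple replacements at once. Here address is the character vector of addresses to clean: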
patterns = c("plaza", "street", "suite", "drive", "boulevard", "place", "south", "north",
"west", "east", "square", "avenue", "road", "floor", "parkway", "circle",
"highway")
replacement = c("plz", "st", "ste", "dr", "blvd", "pl", "s", "n", "w", "e", "sq", "ave",
"rd", "flr", "pkwy", "cir", "hwy")
stringr::str_replace_all(address, setNames(replacement, patterns))
#[1] "890 layton dr, wilmington de 19805"
#[2] "227 weehawken pl ste 145, comstock ny 78956"
#[3] "13 airport hwy, new castle de 19720"
#[4] "3640 New Hampshire Ave NW Apt 207, Washington DC 20011"
To also ignore case and match whole words only, add the (?i) modifier and word boundaries around each pattern:
stringr::str_replace_all(address, setNames(replacement, paste0('(?i)\\b', patterns, '\\b')))
#[1] "890 layton dr, wilmington de 19805"
#[2] "227 weehawken pl ste 145, comstock ny 78956"
#[3] "13 airport hwy, new castle de 19720"
#[4] "3640 New Hampshire Ave NW Apt 207, Washington DC 20011"
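A Python sketch of the case-insensitive, whole-word variant. It uses a swapped-in technique: one compiled alternation of all words with a replacement function, so each word is looked up once in a single pass rather than pattern by pattern. re.IGNORECASE plays the role of (?i); the abbreviate helper and the sample address are hypothetical.

```python
import re

# Same word list as the R example
abbrev = {
    "plaza": "plz", "street": "st", "suite": "ste", "drive": "dr",
    "boulevard": "blvd", "place": "pl", "south": "s", "north": "n",
    "west": "w", "east": "e", "square": "sq", "avenue": "ave",
    "road": "rd", "floor": "flr", "parkway": "pkwy", "circle": "cir",
    "highway": "hwy",
}

# One alternation of whole words, matched case-insensitively
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in abbrev) + r")\b",
    re.IGNORECASE,
)


def abbreviate(address):
    """Replace each listed word (any case) with its abbreviation in a single pass."""
    return pattern.sub(lambda m: abbrev[m.group(0).lower()], address)


print(abbreviate("12 North Main Street, Suite 4"))  # 12 n Main st, ste 4
```

Because of the word boundaries, a word such as "Eastwood" is left alone even though it starts with "east".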
How to replace multiple substrings in a Pandas series using a dictionary?
You can use:
#Borrowed from an external website
def multipleReplace(text, wordDict):
    # Replace each key with its value, one key at a time (substring matches included)
    for key in wordDict:
        text = text.replace(key, wordDict[key])
    return text

print(testdf.apply(lambda x: multipleReplace(x, to_sub)))
0 Alice went to hospital yesterday
1 John went to hospital yesterday
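Because the keys are applied one after another as plain substrings, an earlier replacement can create text that a later key then matches. A small check, assuming the hypothetical input sentence "Mary went to school today" and a dictionary that also contains 'tal': 'zzz':

```python
def multiple_replace(text, word_dict):
    """Sequential substring replacement, the same logic as multipleReplace above."""
    for key, value in word_dict.items():
        text = text.replace(key, value)
    return text


to_sub = {"Mary": "Alice", "school": "hospital", "today": "yesterday", "tal": "zzz"}

# 'school' becomes 'hospital' first, then the later key 'tal' matches inside it
print(multiple_replace("Mary went to school today", to_sub))
# Alice went to hospizzz yesterday
```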
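EDIT
With the dictionary from the comments, which adds 'tal': 'zzz', substring replacement would also match 'tal' inside 'hospital'. Replacing whole words instead avoids that: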
to_sub = {
'Mary': 'Alice',
'school': 'hospital',
'today': 'yesterday',
'tal': 'zzz'
}
testdf.apply(lambda x: ' '.join([to_sub.get(i, i) for i in x.split()]))
Outputs:
0 Alice went to hospital yesterday
1 John went to hospital yesterday
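The split/join version only matches tokens separated by whitespace, so a word followed by punctuation (for example "today.") is not replaced, and runs of spaces are collapsed to one. A minimal sketch of an alternative, assuming word tokens can be defined by the regex \w+: re.sub visits each word once, looks it up in the dictionary, and leaves everything between words untouched. replace_words is a hypothetical helper name.

```python
import re

to_sub = {"Mary": "Alice", "school": "hospital", "today": "yesterday", "tal": "zzz"}


def replace_words(text, mapping):
    """Replace whole word tokens found in mapping, leaving punctuation and spacing intact."""
    return re.sub(r"\w+", lambda m: mapping.get(m.group(0), m.group(0)), text)


sentence = "Mary went to school today."
print(" ".join(to_sub.get(w, w) for w in sentence.split()))  # Alice went to hospital today.
print(replace_words(sentence, to_sub))                        # Alice went to hospital yesterday.
```

With the Series from the question it would be applied the same way, e.g. testdf.apply(lambda s: replace_words(s, to_sub)).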
Python: replace columns using dictionary
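Another example:
import pandas as pd
df_currency = pd.DataFrame(currency_dict)  # currency_dict is actually a list of dicts
# With no 'on' argument, pd.merge joins on the columns both frames have in common
result = pd.merge(df, df_currency)[['date', 'price', 'symbol']]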
References:
https://realpython.com/pandas-merge-join-and-concat/
https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#database-style-dataframe-or-named-series-joining-merging
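To make the mechanics of the merge concrete without pandas, here is a sketch of an inner join on a shared column, which is what pd.merge does by default. Rows whose key is missing from the lookup table are dropped. The column names and data are hypothetical, since the original does not show which column df and df_currency share.

```python
def inner_join(left, right, key):
    """Inner join two lists of dicts on a shared key, similar in spirit to pd.merge(how='inner')."""
    lookup = {}
    for row in right:
        lookup.setdefault(row[key], []).append(row)
    # Every left row is combined with each right row that has the same key
    return [{**l_row, **r_row} for l_row in left for r_row in lookup.get(l_row[key], [])]


# Hypothetical data: prices tagged with a currency code, plus a currency lookup table
prices = [
    {"date": "2021-01-04", "price": 10.0, "currency": "USD"},
    {"date": "2021-01-04", "price": 8.5, "currency": "EUR"},
    {"date": "2021-01-05", "price": 9.0, "currency": "GBP"},
]
currency_dict = [
    {"currency": "USD", "symbol": "$"},
    {"currency": "EUR", "symbol": "€"},
]

joined = inner_join(prices, currency_dict, "currency")
# Keep only the columns of interest, like [['date', 'price', 'symbol']]
result = [{k: row[k] for k in ("date", "price", "symbol")} for row in joined]
print(result)
```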
In R, how will I replace set of values with another set
Here is a small example showing how to randomly replace the first 3 values in v1 with values sampled from v2:
> v1 <- c("A", "B", "C", "D", "E")
> v2 <- c(1, 15, "L1100", "PQ7", 243)
> replace(v1, 1:3, sample(v2, 3))
[1] "L1100" "15" "1" "D" "E"
Update
v1 <- c("A", "B", "A", "B", "C", "D", "A", "B", "C", "D", "E", "E", "C", "A", "B", "C", "D", "E", "D", "E")
v2 <- c(1, 15, "L1100")
replace(
v1,
v1 %in% c("A", "B", "C"),
v2[na.omit(match(v1, c("A", "B", "C")))]
)
gives
[1] "1" "15" "1" "15" "L1100" "D" "1" "15" "L1100"
[10] "D" "E" "E" "L1100" "1" "15" "L1100" "D" "E"
[19] "D" "E"
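The Update relies on the values returned by match() lining up with the TRUE positions of the logical index: na.omit() drops the non-matches, so both sequences follow the order of v1. A Python sketch of the same alignment; replace is a hypothetical helper that, unlike R's replace(), does not recycle values and assumes the number of replacements equals the number of True flags.

```python
v1 = ["A", "B", "A", "B", "C", "D", "A", "B", "C", "D",
      "E", "E", "C", "A", "B", "C", "D", "E", "D", "E"]
v2 = ["1", "15", "L1100"]
keys = ["A", "B", "C"]


def replace(values, mask, replacements):
    """The i-th True position in mask receives the i-th replacement; others are kept."""
    it = iter(replacements)
    return [next(it) if flag else v for v, flag in zip(values, mask)]


mask = [v in keys for v in v1]                       # v1 %in% c("A", "B", "C")
repl = [v2[keys.index(v)] for v in v1 if v in keys]  # v2[na.omit(match(v1, c("A", "B", "C")))]
out = replace(v1, mask, repl)
print(out)
```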
replace multiple characters in string with value from dictionary python
You are prematurely ending your code with the call to return within the for loop. You can fix it by storing your new string outside of the loop and only returning once the loop is done:
def cypher(string):
    a = string  # a new string to store the replaced string
    for i in string:
        if i in d:  # d is the replacement dictionary from the question
            a = a.replace(i, d[i])
    return a
There is also a problem with the logic, though. If a value in your dictionary is also a key in the dictionary, a character may get replaced twice. For example, with d = {'I': 'i', 'i': 'a'} and the input Ii, your output would be aa instead of ia.
Here's a much more concise implementation using join that does not have this problem, because each character of the input is looked up exactly once:
def cypher(string):
    return ''.join(d.get(l, l) for l in string)
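For single-character keys, the standard library offers the same one-lookup-per-character behaviour through str.maketrans and str.translate. A sketch, assuming every key in d is exactly one character (str.maketrans rejects longer string keys when given a dict):

```python
d = {"I": "i", "i": "a"}


def cypher(string):
    """Single-pass character substitution; characters not in d are kept."""
    return string.translate(str.maketrans(d))


print(cypher("Ii"))  # ia
```

If cypher is called many times, the table from str.maketrans(d) can be built once outside the function and reused.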