Dictionary Style Replace Multiple Items

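# named character vector used as a lookup table: names are the keys, values the replacements
map = setNames(c("0101", "0102", "0103"), c("AA", "AC", "AG"))
# look up every cell; assigning into foo[] keeps foo's data frame structure
foo[] <- map[unlist(foo)]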

This assumes that map covers every value in foo; any value not found in map becomes NA. The approach would feel less like a 'hack', and be more efficient in both space and time, if foo were a matrix (of character()); then

matrix(map[foo], nrow=nrow(foo), dimnames=dimnames(foo))

Both matrix and data frame variants run afoul of R's 2^31-1 limit on vector size when there are millions of SNPs and thousands of samples.
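For comparison, the same lookup-table idea can be sketched in Python, assuming the data is held as a list of rows and the mapping as a plain dict. The names CODE_MAP, recode_matrix and genotypes are hypothetical. In R, a name missing from map yields NA; a Python dict raises KeyError instead, so this sketch uses dict.get with a fallback value.

```python
# Hypothetical lookup table mirroring the R named vector `map`
CODE_MAP = {"AA": "0101", "AC": "0102", "AG": "0103"}


def recode_matrix(rows, mapping, missing=None):
    """Replace every cell of a row-major matrix (a list of lists) via `mapping`.

    Cells that are not keys of `mapping` become `missing`, similar to the NA
    R produces when a name is not found in the lookup vector.
    """
    return [[mapping.get(cell, missing) for cell in row] for row in rows]


genotypes = [["AA", "AG"], ["AC", "AA"]]  # hypothetical 2 x 2 input
print(recode_matrix(genotypes, CODE_MAP))
# [['0101', '0103'], ['0102', '0101']]
```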

Replace multiple characters with multiple values in multiple columns? R

You can use dplyr::recode() together with across():

df <- data.frame(name = rep(letters[1:3], each = 3), foo = 1:9, var1 = letters[1:3], var2 = rep(3:5, each = 3))

library(dplyr, warn.conflicts = FALSE)

df %>%
  mutate(across(c(name, var1), ~ recode(., a = 1, b = 2, c = 3)))
#>   name foo var1 var2
#> 1    1   1    1    3
#> 2    1   2    2    3
#> 3    1   3    3    3
#> 4    2   4    1    4
#> 5    2   5    2    4
#> 6    2   6    3    4
#> 7    3   7    1    5
#> 8    3   8    2    5
#> 9    3   9    3    5

Created on 2021-10-19 by the reprex package (v2.0.1)

across() applies the function defined by ~ recode(., a = 1, b = 2, c = 3) to both name and var1.

Using ~ and . is another way to define a function in across. This function is equivalent to the one defined by function(x) recode(x, a = 1, b = 2, c = 3), and you could use that code in across instead of the ~ form and it would give the same result. The only name I know for this is what it's called in ?across, which is "purrr-style lambda function", because the purrr package was the first to use formulas to define functions in this way.

If you want to see the actual function created by the formula, look at rlang::as_function(~ recode(., a = 1, b = 2, c = 3)). It is a little more complex than the one above because it also supports ..1, ..2 and ..3, which are not used here.

Now that R (version 4.1.0 and later) supports the shorter \(x) syntax for defining functions, used below, this purrr-style lambda is arguably no longer needed; writing it that way is mostly an old habit.

df <- data.frame(name = rep(letters[1:3], each = 3), foo = 1:9, var1 = letters[1:3], var2 = rep(3:5, each = 3))

library(dplyr, warn.conflicts = FALSE)

df %>%
  mutate(across(c(name, var1), \(x) recode(x, a = 1, b = 2, c = 3)))
#>   name foo var1 var2
#> 1    1   1    1    3
#> 2    1   2    2    3
#> 3    1   3    3    3
#> 4    2   4    1    4
#> 5    2   5    2    4
#> 6    2   6    3    4
#> 7    3   7    1    5
#> 8    3   8    2    5
#> 9    3   9    3    5

Dictionary style replacement for columns in dataframe in R

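library(tidyverse)

df <- tibble(
  miles_ran = c(3, 4, NA, 9, NA),
  miles_cycled = c(9, NA, NA, 2, 12)
)

# replace_na() takes a named list: the NAs in each column are replaced by that column's value.
# The pipe already passes df as the first argument, so it must not be passed again.
df %>%
  replace_na(list(miles_ran = 0, miles_cycled = 10))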

How to replace multiple values at once

A possible solution using match:

old <- 1:8                   # values to be replaced
new <- c(2,4,6,8,1,3,5,7)    # replacement for each value of old, position by position

x[x %in% old] <- new[match(x, old, nomatch = 0)]

which gives:

> x
[1] 8 4 0 5 1 5 7 9

What this does:

  • Create two vectors: old with the values that need to be replaced and new with the corresponding replacements.
  • Use match to find where each value of x occurs in old. With nomatch = 0, values not found in old get 0 instead of NA, and zero indices are silently dropped when indexing. The result is an index vector giving, for each x value present in old, its position in old.
  • This index vector can then be used to index new.
  • Assign the values from new only to the positions of x that are present in old: x[x %in% old]. Both sides have the same length and order, so each value lines up with its replacement.
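The steps above can be mirrored in Python as a rough sketch. The function name replace_by_match is hypothetical, and because the question's original x is not shown, the x below is an illustrative input chosen so that the result matches the output printed above. A dict of positions plays the role of match(); values missing from it are left unchanged, just as the nomatch = 0 / %in% combination leaves them untouched in R.

```python
def replace_by_match(x, old, new):
    """Replace each value of x found in old with the element of new at the same position.

    Mirrors x[x %in% old] <- new[match(x, old, nomatch = 0)] in R:
    values of x that do not occur in old are kept as they are.
    """
    # Position of each value in old; setdefault keeps the first occurrence,
    # as R's match() does when old contains duplicates.
    position = {}
    for i, value in enumerate(old):
        position.setdefault(value, i)
    return [new[position[v]] if v in position else v for v in x]


old = list(range(1, 9))          # R: 1:8
new = [2, 4, 6, 8, 1, 3, 5, 7]
x = [4, 2, 0, 7, 5, 7, 8, 9]     # illustrative input, not from the original question
print(replace_by_match(x, old, new))
# [8, 4, 0, 5, 1, 5, 7, 9]
```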

replace items in dictionary using dplyr

Check stringr::str_replace_all(), where you can pass a named vector (names are the patterns, values the replacements) to make multiple replacements in one call:

patterns = c("plaza", "street", "suite", "drive", "boulevard", "place", "south", "north", 
"west", "east", "square", "avenue", "road", "floor", "parkway", "circle",
"highway")
replacement = c("plz", "st", "ste", "dr", "blvd", "pl", "s", "n", "w", "e", "sq", "ave",
"rd", "flr", "pkwy", "cir", "hwy")

stringr::str_replace_all(address, setNames(replacement, patterns))
#[1] "890 layton dr, wilmington de 19805"
#[2] "227 weehawken pl ste 145, comstock ny 78956"
#[3] "13 airport hwy, new castle de 19720"
#[4] "3640 New Hampshire Ave NW Apt 207, Washington DC 20011"

To also ignore case and match whole words only, you can add the (?i) modifier and word boundaries (\b) around each pattern:

stringr::str_replace_all(address, setNames(replacement, paste0('(?i)\\b', patterns, '\\b')))
#[1] "890 layton dr, wilmington de 19805"
#[2] "227 weehawken pl ste 145, comstock ny 78956"
#[3] "13 airport hwy, new castle de 19720"
#[4] "3640 New Hampshire Ave NW Apt 207, Washington DC 20011"

How to replace multiple substrings in a Pandas series using a dictionary?

You can use:

# Borrowed from an external website
def multipleReplace(text, wordDict):
    # apply each replacement in turn, in dictionary order
    for key in wordDict:
        text = text.replace(key, wordDict[key])
    return text

print(testdf.apply(lambda x: multipleReplace(x, to_sub)))

0    Alice went to hospital yesterday
1     John went to hospital yesterday

EDIT

Using the dictionary mentioned in the comments below:

to_sub = {
'Mary': 'Alice',
'school': 'hospital',
'today': 'yesterday',
'tal': 'zzz'
}

testdf.apply(lambda x: ' '.join([to_sub.get(i, i) for i in x.split()]))

Outputs:

0    Alice went to hospital yesterday
1     John went to hospital yesterday
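Another option is a single regular-expression pass, sketched here with the standard re module; the function name replace_words and the example sentence are hypothetical. Because the original text is scanned only once, a replacement value is never scanned again, and the word boundaries stop a key such as 'tal' from matching inside 'hospital'. (With the multipleReplace loop above, the 'tal' entry would turn the freshly inserted 'hospital' into 'hospizzz'.)

```python
import re


def replace_words(text, mapping):
    """Replace whole-word occurrences of the keys of `mapping` in one pass."""
    if not mapping:
        return text
    # One alternation of all escaped keys, anchored at word boundaries
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, mapping)) + r")\b")
    # Each match is looked up in the dict; the replaced text is not re-scanned
    return pattern.sub(lambda m: mapping[m.group(0)], text)


to_sub = {"Mary": "Alice", "school": "hospital", "today": "yesterday", "tal": "zzz"}
print(replace_words("Mary went to school today.", to_sub))
# Alice went to hospital yesterday.
```

Unlike the split-and-join version, this also handles words followed by punctuation ("today." above). With pandas it can be applied the same way, e.g. testdf.apply(lambda s: replace_words(s, to_sub)).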

Python: replace columns using dictionary

Another example:

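import pandas as pd

df_currency = pd.DataFrame(currency_dict)  # currency_dict is actually a list of dicts
# merge on the columns df and df_currency share (an inner join by default), then keep the wanted columns
result = pd.merge(df, df_currency)[['date', 'price', 'symbol']]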

References:

https://realpython.com/pandas-merge-join-and-concat/

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#database-style-dataframe-or-named-series-joining-merging

In R, how will I replace set of values with another set

Here is a small example showing how to replace the first 3 values in v1 with values randomly sampled from v2:

> v1 <- c("A", "B", "C", "D", "E")

> v2 <- c(1, 15, "L1100", "PQ7", 243)

> replace(v1, 1:3, sample(v2, 3))
[1] "L1100" "15" "1" "D" "E"

Update

v1 <- c("A", "B", "A", "B", "C", "D", "A", "B", "C", "D", "E", "E", "C", "A", "B", "C", "D", "E", "D", "E")

v2 <- c(1, 15, "L1100")

replace(
  v1,
  v1 %in% c("A", "B", "C"),                  # positions to replace
  v2[na.omit(match(v1, c("A", "B", "C")))]   # the matching replacement for each of them
)

gives

 [1] "1"     "15"    "1"     "15"    "L1100" "D"     "1"     "15"    "L1100"
[10] "D"     "E"     "E"     "L1100" "1"     "15"    "L1100" "D"     "E"
[19] "D"     "E"

replace multiple characters in string with value from dictionary python

You are ending your function prematurely with the return call inside the for loop. You can fix it by storing your new string outside of the loop and returning only once the loop is done:

def cypher(string):
    a = string  # a new string to store the replaced string
    for i in string:
        if i in d:  # d is the replacement dictionary
            a = a.replace(i, d[i])
    return a  # return only after the loop has finished

There is also a problem with the logic, though. If a value in your dictionary is also a key in the dictionary, a character may get replaced twice. For example, with d = {'I': 'i', 'i': 'a'} and the input Ii, your output would be aa.

Here's a much more concise implementation using join that does not have this problem.

def cypher(string):
    # look up each character once; characters not in d are kept unchanged
    return ''.join(d.get(ch, ch) for ch in string)
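The standard library's str.translate does the same single-pass, per-character mapping. A minimal sketch follows; the name cypher_translate is hypothetical, and it assumes every key in the dictionary is a single character, which str.maketrans requires when given a dict.

```python
def cypher_translate(string, table):
    """Map each character of `string` through `table` in a single pass."""
    # str.maketrans accepts a dict whose keys are single characters and whose
    # values are strings (or None, which deletes the character)
    return string.translate(str.maketrans(table))


d = {"I": "i", "i": "a"}  # the example mapping from the text above
print(cypher_translate("Ii", d))
# ia
```

Since each character is looked up only once in the original string, the double-replacement problem described above cannot occur.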
