Change values in multiple columns of a dataframe using a lookup table
Here's a solution that works on each column successively using lapply():
as.data.frame(lapply(example,function(col) lookup$letter[match(col,lookup$number)]));
## a b c
## 1 A E A
## 2 B D D
## 3 C C C
## 4 D B B
## 5 E A E
Alternatively, if you don't mind switching over to a matrix, you can achieve a "more vectorized" solution, as a matrix will allow you to call match() and index lookup$letter just once for the entire input:
matrix(lookup$letter[match(as.matrix(example),lookup$number)],nrow(example));
## [,1] [,2] [,3]
## [1,] "A" "E" "A"
## [2,] "B" "D" "D"
## [3,] "C" "C" "C"
## [4,] "D" "B" "B"
## [5,] "E" "A" "E"
(And of course you can coerce back to data.frame via as.data.frame() afterward, although you'll have to restore the column names as well if you want them, which can be done with setNames(...,names(example)). But if you really want to stick with a data.frame, my first solution is probably preferable.)
Efficiently replace string in multiple columns based on lookup table
With stri_replace_all_fixed from stringi, you can replace many patterns at once. The syntax is a bit confusing, but when you set vectorise_all = FALSE it replaces all instances of all patterns with the corresponding replacements.
First, let's create some example data as you did not provide any:
library(tidyverse)
set.seed(1)
exp <- data.frame(matrix(sample(LETTERS, 1000, replace = TRUE), ncol = 100))
lookup <- tribble(
  ~pattern, ~replacement,
  "A", ":",
  "F", " ",
  "Y", "Test"
)
Use mutate + across, which is the new version of mutate_at in this case (mutate_at is slowly being phased out):
exp %>%
  mutate(across(c(X1, X3), ~ stringi::stri_replace_all_fixed(
    str = .x,
    pattern = lookup[["pattern"]],
    replacement = lookup[["replacement"]],
    vectorise_all = FALSE
  ))) %>%
  as_tibble()
#> # A tibble: 10 × 100
#> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Test A U L T Y N H M V W B U
#> 2 D U E O T W B F H L S J L
#> 3 G U I A Z X M W Y P V A G
#> 4 : J Test T L F R L P A R K X
#> 5 B V N C Y Z V F Y M Z Z U
#> 6 W N E F W G N H W U P O V
#> 7 K J E J F S F G N F K Z H
#> 8 N G B J Y J A K T Q J X A
#> 9 R I J F H F S Q G I G J S
#> 10 S O Test O L X S D M G S P Z
#> # … with 87 more variables: X14 <chr>, X15 <chr>, X16 <chr>, X17 <chr>,
#> # X18 <chr>, X19 <chr>, X20 <chr>, X21 <chr>, X22 <chr>, X23 <chr>,
#> # X24 <chr>, X25 <chr>, X26 <chr>, X27 <chr>, X28 <chr>, X29 <chr>,
#> # X30 <chr>, X31 <chr>, X32 <chr>, X33 <chr>, X34 <chr>, X35 <chr>,
#> # X36 <chr>, X37 <chr>, X38 <chr>, X39 <chr>, X40 <chr>, X41 <chr>,
#> # X42 <chr>, X43 <chr>, X44 <chr>, X45 <chr>, X46 <chr>, X47 <chr>,
#> # X48 <chr>, X49 <chr>, X50 <chr>, X51 <chr>, X52 <chr>, X53 <chr>, …
Created on 2022-02-16 by the reprex package (v2.0.1)
This is as fast as it gets I believe.
Replace values in column of Pandas DataFrame using a Series lookup table
You can use the map() method for that:
In [38]: df_normalised['name'] = df_normalised['code'].map(name)
In [39]: df_normalised
Out[39]:
code name
0 8 Human development
1 11 Environment and natural resources management
2 1 Economic management
3 6 Social protection and risk management
4 5 Trade and integration
5 2 Public sector governance
6 11 Environment and natural resources management
7 6 Social protection and risk management
8 7 Social dev/gender/inclusion
9 7 Social dev/gender/inclusion
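As a small self-contained sketch of the same idea: map() looks each value up in the index of the lookup Series, and codes with no match come back as NaN. The data below is made up for illustration (the code 99 and the name_filled column are hypothetical, not from the question).

```python
import pandas as pd

# Hypothetical lookup Series: the index holds the codes, the values hold the names.
name = pd.Series(
    {1: "Economic management", 2: "Public sector governance", 8: "Human development"}
)

# Hypothetical frame; code 99 is deliberately absent from the lookup.
df_normalised = pd.DataFrame({"code": [8, 1, 2, 99]})

# map() matches each code against the index of `name`; misses become NaN.
df_normalised["name"] = df_normalised["code"].map(name)

# Optionally replace the NaN left by unmatched codes with a placeholder.
df_normalised["name_filled"] = df_normalised["name"].fillna("Unknown")

print(df_normalised)
```

If unmatched codes should keep their original value instead, filling from the code column (for example with fillna(df_normalised["code"])) is one option.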
R- How do I use a lookup table containing threshold values that vary for different variables (columns) to replace values below those thresholds?
Perhaps this helps
library(dplyr)
dat %>%
  mutate(across(all_of(detect_level$Parameter),
    ~ pmax(., detect_level$LOD[match(cur_column(), detect_level$Parameter)])))
For the updated case
dat %>%
  mutate(across(all_of(detect_level$Parameter),
    ~ replace(.,
              . < detect_level$LOD[match(cur_column(), detect_level$Parameter)],
              detect_level$halfLOD[match(cur_column(), detect_level$Parameter)])))
Function to replace values in data.table using a lookup table
We don't need as.name. The object on the lhs of = is not evaluated correctly. Instead, we could use a named vector in on with setNames:
dt.replaceValueUsingLookup <- function(dt, col, dtLookup) {
  dt[
    dtLookup,
    on = setNames("old", col),
    (col) := new
  ]
}
-testing
dt %>%
dt.replaceValueUsingLookup("chapter", dtLookup)
dt
# chapter
#1: 101
#2: 102
#3: 13
#4: 105
#5: 104
How to match multiple columns based on lookup table
We could unlist the dataframe and match directly.
new_df <- results
names(new_df) <- paste0("id", seq_along(new_df))
new_df[] <- lookup$id[match(unlist(new_df), lookup$price)]
cbind(results, new_df)
# price_1 price_2 id1 id2
#1 2 3 B C
#2 2 1 B A
#3 1 1 A A
In dplyr, we can do
library(dplyr)
bind_cols(results, results %>% mutate_all(~lookup$id[match(., lookup$price)]))
Replace values in a dataframe based on lookup table
You posted an approach in your question which was not bad. Here's a similar approach:
new <- df # create a copy of df
# using lapply, loop over columns and match values to the look up table. store in "new".
new[] <- lapply(df, function(x) look$class[match(x, look$pet)])
An alternative approach which will be faster is:
new <- df
new[] <- look$class[match(unlist(df), look$pet)]
Note that I use empty brackets ([]) in both cases to keep the structure of new as it was (a data.frame).
(I'm using df instead of table and look instead of lookup in my answer.)
How do I replace the values in a dataframe based on a lookup table in another dataframe
Use pandas' replace method: it searches the dataframe for the keys and replaces any it finds with the associated values. Your dataframe had a few missing NaNs, so I edited it to match what you posted.
#create a dictionary from the lookup
repl = lookup.set_index('value')['description'].to_dict()
#print(repl)
{653: '30 to 39',
654: '40 to 49',
1056: 'Belgium',
1158: 'Taiwan',
1203: 'Czech Republic',
545: 'White',
530: 'Other'}
#pass it using pandas' replace method
df.replace(repl)
age cty eth
0 30 to 39 Belgium NaN
1 30 to 39 Belgium White
2 40 to 49 NaN Other
3 30 to 39 Taiwan Other
4 30 to 39 Czech Republic White
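One thing to keep in mind is that df.replace(repl) applies the dictionary to every column. The sketch below, on made-up data (the column n and its values are hypothetical), shows how a nested dictionary keyed by column name restricts the replacement to selected columns.

```python
import pandas as pd

# Hypothetical lookup table and data, shaped like the ones in the answer.
lookup = pd.DataFrame(
    {"value": [653, 654, 1056, 545],
     "description": ["30 to 39", "40 to 49", "Belgium", "White"]}
)
df = pd.DataFrame({"age": [653, 654], "cty": [1056, 1056], "n": [545, 3]})

# Dictionary from the lookup: {value: description}.
repl = lookup.set_index("value")["description"].to_dict()

# Flat dictionary: every column is searched, so the 545 in "n" is replaced too.
everywhere = df.replace(repl)

# Nested dictionary {column: {old: new}}: only "age" and "cty" are touched.
only_some = df.replace({"age": repl, "cty": repl})

print(everywhere)
print(only_some)
```

The nested form is worth considering whenever a code in the lookup could also occur as an ordinary number in an unrelated column.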
Replace column values in table with values from lookup based on matches in R using data.table
We can do a join on the 'code' and 'old' columns from table and lookup respectively
table[lookup, code := new, on = .(code = old)]
-output
table
code sn
1: CBa 1
2: CBe 2
3: CBa 3
4: CBe 4
5: OOO 5
6: PPP 6
7: CBa 7
Is there a way to calculate a new column of a dataframe on base of values in a(nother) lookup table
As I understand the formula, each column Ci is multiplied by PV[i], W[i] and RC_A[i], and the results are summed:
result = 0
for i in range(len(df_lookup)):
    result = result + (df_data[df_lookup.loc[i, "C"]] * df_lookup.PV.iloc[i] *
                       df_lookup.W.iloc[i] * df_lookup.RC_A.iloc[i])
# result is a column (a Series aligned with df_data)
# then we multiply element-wise
# assumption: T is a column of df_data; df_data["T"] is used because df_data.T is the transpose
df_data['A_calc'] = (df_data["T"] / (df_data.SF * df_data.SP)).multiply(result, axis="index")
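The weighted sum built up by the loop can also be computed without iterating. The following is only a sketch on invented data (the column names C1 and C2 and all numbers are assumptions): it multiplies the three lookup columns into one weight per row of df_lookup, scales the matching data columns, and sums across each row.

```python
import pandas as pd

# Hypothetical lookup: one row per data column named in "C".
df_lookup = pd.DataFrame(
    {"C": ["C1", "C2"], "PV": [1.0, 2.0], "W": [0.5, 0.25], "RC_A": [2.0, 4.0]}
)
# Hypothetical data with the columns referenced by the lookup.
df_data = pd.DataFrame({"C1": [1.0, 2.0, 3.0], "C2": [10.0, 20.0, 30.0]})

# Loop version, as in the answer above.
loop_result = 0
for i in range(len(df_lookup)):
    loop_result = loop_result + (
        df_data[df_lookup.loc[i, "C"]]
        * df_lookup.PV.iloc[i] * df_lookup.W.iloc[i] * df_lookup.RC_A.iloc[i]
    )

# Vectorised version: one combined weight per lookup row ...
weights = (df_lookup["PV"] * df_lookup["W"] * df_lookup["RC_A"]).to_numpy()
# ... applied column-wise to the selected data columns, then summed per row.
vectorised = (
    df_data[df_lookup["C"].tolist()].mul(weights, axis="columns").sum(axis=1)
)

print(vectorised)
```

Selecting the columns with df_lookup["C"].tolist() keeps them in the same order as the weights, which is what makes the positional multiplication line up.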