Matching multiple columns on different data frames and getting other column as result
Use merge
df1 <- read.table(text=' chr init
1 12 25289552
2 3 180418785
3 3 180434779', header=TRUE)
df2 <- read.table(text=' V1 V2 V3
10 1 69094 medium
11 1 69094 medium
12 12 25289552 high
13 1 69095 medium
14 3 180418785 medium
15 3 180434779 low', header=TRUE)
merge(df1, df2, by.x='init', by.y='V2') # this works!
init chr V1 V3
1 25289552 12 12 high
2 180418785 3 3 medium
3 180434779 3 3 low
To get the output in the layout you showed, select and rename the columns:
output <- merge(df1, df2, by.x='init', by.y='V2')[, c(2,1,4)]
colnames(output)[3] <- 'Mut'
output
chr init Mut
1 12 25289552 high
2 3 180418785 medium
3 3 180434779 low
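For readers working in pandas rather than R, the same lookup can be sketched as follows. This is only an illustration, not part of the original answer: it rebuilds the two tables by hand and, unlike the R call above, joins on both key columns (chr with V1 and init with V2), on the assumption that both must agree for a row to count as a match.

```python
import pandas as pd

# The two tables from the R example, rebuilt by hand.
df1 = pd.DataFrame({"chr": [12, 3, 3],
                    "init": [25289552, 180418785, 180434779]})
df2 = pd.DataFrame({
    "V1": [1, 1, 12, 1, 3, 3],
    "V2": [69094, 69094, 25289552, 69095, 180418785, 180434779],
    "V3": ["medium", "medium", "high", "medium", "medium", "low"],
})

# left_on/right_on pair the differently named key columns position by position.
output = (
    df1.merge(df2, left_on=["chr", "init"], right_on=["V1", "V2"], how="inner")
       .rename(columns={"V3": "Mut"})[["chr", "init", "Mut"]]
)
print(output)
```

Joining on both columns guards against a position that happens to repeat on a different chromosome, which a join on init alone would not catch.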
How to match multiple columns from two dataframes that have different sizes?
Use DataFrame.merge with the indicator parameter, which adds a new column recording whether each row was found in both frames. If you need different labels in that column, replace them with numpy.where:
df = df1.merge(df2, indicator='status', how='left')
df['status'] = np.where(df['status'].eq('both'), 'included', 'not included')
print (df)
a b status
0 1 4 included
1 2 5 included
2 3 6 included
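The snippet above does not show its inputs, so here is a self-contained sketch with made-up frames of different sizes. The data is hypothetical and chosen so that one row of df1 has no partner in df2.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: df2 is larger than df1 and only partly overlaps it.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 2, 9, 10], "b": [4, 5, 9, 10]})

# With no `on=` argument, merge joins on every column the frames share (a and b).
merged = df1.merge(df2, how="left", indicator="status")

# The indicator column holds 'both', 'left_only' or 'right_only';
# map it to the two labels used in the answer.
merged["status"] = np.where(merged["status"].eq("both"),
                            "included", "not included")
print(merged)
```

Because the join is a left join, every row of df1 survives and the status column says whether a matching row existed in df2.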
Dataframes in Python - matching multiple columns of rows between two data frames
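You can try to merge on columns "A", "B", and "C" with how="left". (df2_sum below covers only a subset of the key combinations in df1, so a left join keeps every row of df1; rows without a match get NaN, which fillna(0) turns into 0.)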
df2_sum = df2.groupby(["A", "B", "C"])["PRICE"].sum().reset_index()
df1.merge(df2_sum, on=["A","B","C"], how="left").fillna(0)
A B C SUM PRICE
0 1 1 1 0 0.0
1 1 1 2 0 315.0
2 1 2 2 0 0.0
3 2 2 2 0 30.0
You can then add PRICE to your SUM column.
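That last step can be sketched end to end as below. The input frames are assumptions, invented so that the grouped totals reproduce the 315.0 and 30.0 shown in the output above; the original question's data is not shown here.

```python
import pandas as pd

# Hypothetical inputs shaped like the output above.
df1 = pd.DataFrame({"A": [1, 1, 1, 2], "B": [1, 1, 2, 2],
                    "C": [1, 2, 2, 2], "SUM": [0, 0, 0, 0]})
df2 = pd.DataFrame({"A": [1, 1, 2], "B": [1, 1, 2], "C": [2, 2, 2],
                    "PRICE": [300.0, 15.0, 30.0]})

# Total price per (A, B, C) combination.
df2_sum = df2.groupby(["A", "B", "C"], as_index=False)["PRICE"].sum()

# Left join keeps every row of df1; unmatched rows get PRICE = 0.
out = df1.merge(df2_sum, on=["A", "B", "C"], how="left").fillna({"PRICE": 0})

# Fold PRICE into SUM and drop the helper column.
out["SUM"] = out["SUM"] + out["PRICE"]
out = out.drop(columns="PRICE")
print(out)
```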
Matching multiple columns in two dataframes in R using the merge or match function
Reconsider the merge approach:
# FIRST DATAFRAME (2014)
txt='Date Shop Item ProductKey Price
2014-09-01 Asda Apple 0f-7c-32-9c65 2.00
2014-09-01 Tesco Pear 7c-e9-a0-a11c 1.50'
df1 <- read.table(text=txt, header=TRUE)
df1$Date <- as.POSIXct(df1$Date) # CONVERT TO DATE
df1$Month <- format(df1$Date, "%m") # EXTRACT MONTH (CAN ADJUST FOR MM/DD)
# SECOND DATAFRAME (2015)
txt='Date Shop Item ProductKey Price
2015-09-01 Asda Apple 0f-7c-32-9c65 2.25
2015-09-01 Tesco Pear 7c-e9-a0-a11c 1.75'
df2 <- read.table(text=txt, header=TRUE)
df2$Date <- as.POSIXct(df2$Date) # CONVERT TO DATE
df2$Month <- format(df2$Date, "%m") # EXTRACT MONTH (CAN ADJUST FOR MM/DD)
# MERGE AND TRANSFORM FOR NEW COLUMN
finaldf <- transform(merge(df1, df2, by=c("Month", "Shop", "Item", "ProductKey"), suffixes=c("_14", "_15")),
PriceRelative = Price_15 / Price_14)
finaldf
# Month Shop Item ProductKey Date_14 Price_14 Date_15 Price_15 PriceRelative
# 1 09 Asda Apple 0f-7c-32-9c65 2014-09-01 2.0 2015-09-01 2.25 1.125000
# 2 09 Tesco Pear 7c-e9-a0-a11c 2014-09-01 1.5 2015-09-01 1.75 1.166667
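For comparison, a rough pandas counterpart of the R code above might look like this. It is a sketch only: it rebuilds the same two small tables by hand and assumes the month is wanted as a two-digit string, as in the R version.

```python
import pandas as pd

keys = ["Month", "Shop", "Item", "ProductKey"]

df1 = pd.DataFrame({"Date": pd.to_datetime(["2014-09-01", "2014-09-01"]),
                    "Shop": ["Asda", "Tesco"], "Item": ["Apple", "Pear"],
                    "ProductKey": ["0f-7c-32-9c65", "7c-e9-a0-a11c"],
                    "Price": [2.00, 1.50]})
df2 = pd.DataFrame({"Date": pd.to_datetime(["2015-09-01", "2015-09-01"]),
                    "Shop": ["Asda", "Tesco"], "Item": ["Apple", "Pear"],
                    "ProductKey": ["0f-7c-32-9c65", "7c-e9-a0-a11c"],
                    "Price": [2.25, 1.75]})

# Extract the month as a two-digit string so the years line up on it.
for frame in (df1, df2):
    frame["Month"] = frame["Date"].dt.strftime("%m")

# Non-key columns that clash (Date, Price) receive the suffixes.
finaldf = df1.merge(df2, on=keys, suffixes=("_14", "_15"))
finaldf["PriceRelative"] = finaldf["Price_15"] / finaldf["Price_14"]
print(finaldf)
```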
Partial match string between columns for multiple dataframes
Here's one approach using a for loop. You were close! Note that I renamed your list of data frames to dfs to avoid confusion with the list() function.
Do you think you might match more than once within the same data frame? If so, this first loop won't work without a couple more lines (see the updated answer further down).
df1 <- data.frame(name = c("TEXT333","b","c"), column_A = 1:3, stringsAsFactors=FALSE)
df2 <- data.frame(name = c("b","TEXT345","d"), column_A = 4:6, stringsAsFactors=FALSE)
df3 <- data.frame(name = c("c","TEXT123","a"), column_A = 7:9, stringsAsFactors=FALSE)
dfs <- list(df1, df2, df3)
df <- data.frame(name = c("TEXT333","TEXT123","a", "TEXT345", "k", "l", "b","c", "f"), column_B = 11:19, stringsAsFactors=FALSE)
# loop over all data frames in the list
for(i in seq_along(dfs)){
  # get the name that matches the regex ("TEXT" is a regex, so no glob-style "*" is needed)
  val <- grep(pattern = "TEXT", x = dfs[[i]]$name, value = TRUE)
  # use that name to update the value from the reference df
  dfs[[i]][dfs[[i]]$name == val, "column_A"] <- df[df$name == val, "column_B"]
}
Updated answer that can account for multiple matches in the same df:
for(i in seq_along(dfs)){
  vals <- grep(pattern = "TEXT", x = dfs[[i]]$name, value = TRUE)
  # update each matching name in turn
  for(val in vals){
    dfs[[i]][dfs[[i]]$name == val, "column_A"] <- df[df$name == val, "column_B"]
  }
}
Matching values from two column pairs in different data frames in R
The first result you can get with an anti_join, though you will need to anti-join on both combinations of source and target, since direction appears not to matter in your example. Note that I've had to use toupper because the capitalization in your example was inconsistent and the example suggested case should be ignored.
library(dplyr)
anti_join(anti_join(B, A %>% mutate_all(toupper),
by = c("source", "target")),
A %>% mutate_all(toupper),
by = c(target = "source", source = "target")) %>%
select(-variable)
#> source target
#> 1 V2 V5
#> 2 v1 V3
#> 3 V4 V3
#> 4 V5 V4
The second result you can get by binding two inner_joins:
bind_rows(inner_join(B, A %>% mutate_all(toupper),
by = c("source", "target")),
inner_join(B, A %>% mutate_all(toupper),
by = c(source = "target", target = "source")))
#> source target variable
#> 1 V1 V2 3
#> 2 V4 V2 1
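The same direction-insensitive, case-insensitive matching can be sketched in pandas. Note that this swaps in a different technique from the two joins above: it builds one normalized key per pair (upper-cased, endpoints sorted) and filters on it. The frames A and B here are small made-up stand-ins, since the originals are not shown, and pair_key is a hypothetical helper.

```python
import pandas as pd

# Hypothetical stand-ins for the A and B frames in the question.
A = pd.DataFrame({"source": ["v1", "V2"], "target": ["V2", "v4"]})
B = pd.DataFrame({"source": ["V1", "V4", "V2", "V5"],
                  "target": ["V2", "V2", "V5", "V4"],
                  "variable": [3, 1, 2, 4]})

def pair_key(df):
    # Upper-case both endpoints and sort them, so that (a, b) and (b, a)
    # produce the same key regardless of case.
    return df.apply(
        lambda r: "|".join(sorted([r["source"].upper(), r["target"].upper()])),
        axis=1,
    )

in_a = pair_key(B).isin(pair_key(A))

unmatched = B[~in_a]  # plays the role of the double anti_join
matched = B[in_a]     # plays the role of the two bound inner_joins
print(unmatched)
print(matched)
```

A single normalized key avoids having to join twice, at the cost of a row-wise apply, which may be slow on large frames.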
best way to match one column in dataframe to multiple columns in another dataframe
Try the following. The idea is to merge df1's 'email' column against each of the email-like columns of df2 (listed in cols), then concatenate the results:
cols=['email_id', 'alternate email', 'alternate email2']
out=(pd.concat([df1.merge(df2,left_on='email',right_on=x) for x in cols])
.drop_duplicates(subset=['name'],ignore_index=True).drop(columns=cols))
Output of out:
name email country
0 Sara sara@example.com US
1 John john@example.com BR
2 Christine Christine@example.com CA
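An alternative worth considering, shown here only as a sketch, is to reshape df2 to long form with melt so that a single merge is enough. The input frames below are assumptions built to reproduce the output above; the question's real data is not shown.

```python
import pandas as pd

cols = ["email_id", "alternate email", "alternate email2"]

# Hypothetical inputs consistent with the output above.
df1 = pd.DataFrame({
    "name": ["Sara", "John", "Christine"],
    "email": ["sara@example.com", "john@example.com", "Christine@example.com"],
})
df2 = pd.DataFrame({
    "email_id": ["sara@example.com", "x@example.com", "y@example.com"],
    "alternate email": [None, "john@example.com", None],
    "alternate email2": [None, None, "Christine@example.com"],
    "country": ["US", "BR", "CA"],
})

# One row per (country, email), whichever column the address came from.
lookup = (
    df2.melt(id_vars="country", value_vars=cols, value_name="email")
       .dropna(subset=["email"])[["email", "country"]]
       .drop_duplicates()
)

out = df1.merge(lookup, on="email", how="left")
print(out)
```

Unlike the concat approach, this keeps every row of df1 (with NaN for country when no address matches) and does not rely on name being unique.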
String match with multiple columns to look for possible result in r
Here is a possible tidyverse solution. It gets an answer similar to your DF_Result but not exactly ("purple,white" matched "abcd" with "silver,white" and "black,white").
The data frames are easier to merge in long form (using pivot_longer). You can use separate_rows to put the comma-separated values into separate rows.
library(tidyverse)
DF2_long <- DF2 %>%
pivot_longer(cols = -P) %>%
separate_rows(value)
DF1 %>%
mutate(value = A) %>%
separate_rows(value) %>%
left_join(DF2_long) %>%
select(-name, -value) %>%
group_by(A) %>%
distinct(A, P) %>%
mutate(Count = row_number()) %>%
pivot_wider(id_cols = A, names_from = Count, values_from = P, names_prefix = "R")
Output
A R1 R2 R3
<chr> <chr> <chr> <chr>
1 babypink wxyz NA NA
2 red,blue abcd qwert NA
3 purple,white wxyz abcd efgh
4 skyblue qwert NA NA
5 pink,violet,green abcd qwert efgh
6 silver,white,grey abcd wxyz efgh