Matching Multiple Columns on Different Data Frames and Getting Other Column as Result


Use merge:

df1 <- read.table(text='  chr    init
1 12 25289552
2 3 180418785
3 3 180434779', header=TRUE)

df2 <- read.table(text=' V1 V2 V3
10 1 69094 medium
11 1 69094 medium
12 12 25289552 high
13 1 69095 medium
14 3 180418785 medium
15 3 180434779 low', header=TRUE)

merge(df1, df2, by.x='init', by.y='V2') # this works!
       init chr V1     V3
1  25289552  12 12   high
2 180418785   3  3 medium
3 180434779   3  3    low

To get the output in the format you showed:

output <- merge(df1, df2, by.x='init', by.y='V2')[, c(2,1,4)]
colnames(output)[3] <- 'Mut'
output
  chr      init    Mut
1  12  25289552   high
2   3 180418785 medium
3   3 180434779    low
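The question title asks about matching on several columns, while the merge above joins on init alone. R's merge also accepts vectors of key names, e.g. merge(df1, df2, by.x=c('chr','init'), by.y=c('V1','V2')). As a rough plain-Python sketch of what such a composite-key inner join does (rows hand-copied from df1 and df2 above; the helper name inner_join is hypothetical), the right table is indexed by its (V1, V2) pair and each left row looks up its (chr, init) pair:

```python
# Hypothetical plain-Python sketch of a composite-key inner join, mirroring
# merge(df1, df2, by.x=c('chr','init'), by.y=c('V1','V2')) on the data above.
df1 = [(12, 25289552), (3, 180418785), (3, 180434779)]  # (chr, init)
df2 = [
    (1, 69094, "medium"),
    (1, 69094, "medium"),
    (12, 25289552, "high"),
    (1, 69095, "medium"),
    (3, 180418785, "medium"),
    (3, 180434779, "low"),
]  # (V1, V2, V3)


def inner_join(left, right):
    """Return (chr, init, Mut) rows where (chr, init) == (V1, V2)."""
    # Index the right table by its composite key.
    index = {}
    for v1, v2, v3 in right:
        index.setdefault((v1, v2), []).append(v3)
    # One output row per matching pair; unmatched left rows are dropped.
    return [
        (chrom, init, mut)
        for chrom, init in left
        for mut in index.get((chrom, init), [])
    ]


output = inner_join(df1, df2)
print(output)
# [(12, 25289552, 'high'), (3, 180418785, 'medium'), (3, 180434779, 'low')]
```

Matching on both columns matters when the same position value can occur on different chromosomes; joining on init alone would then pair rows that only share the position.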

How to match multiple columns from two dataframes that have different sizes?

Use DataFrame.merge with the indicator parameter to add a new column recording whether each row was found in both frames; if you need different labels, replace the values afterwards, e.g. with numpy.where:

import numpy as np

df = df1.merge(df2, indicator='status', how='left')
df['status'] = np.where(df['status'].eq('both'), 'included', 'not included')
print(df)
   a  b    status
0  1  4  included
1  2  5  included
2  3  6  included
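The indicator idea can be sketched without pandas: a left join keeps every row of df1 and records whether its key was found in df2 ('both') or not ('left_only'), which np.where then maps to the two labels. The df2 rows and the helper name left_merge_status below are hypothetical, and the sketch assumes df2 has no duplicate key rows (pandas would repeat the matching df1 row in that case).

```python
# Hypothetical sketch: tag each df1 row by whether its (a, b) key exists in df2,
# like merge(..., indicator='status', how='left') followed by np.where.
def left_merge_status(left, right, keys):
    # Set of key tuples present in the right table.
    right_keys = {tuple(row[k] for k in keys) for row in right}
    out = []
    for row in left:
        key = tuple(row[k] for k in keys)
        indicator = "both" if key in right_keys else "left_only"
        # Same mapping as np.where(status == 'both', 'included', 'not included').
        status = "included" if indicator == "both" else "not included"
        out.append({**row, "status": status})
    return out


df1 = [{"a": 1, "b": 4}, {"a": 2, "b": 5}, {"a": 3, "b": 6}]
df2 = [{"a": 1, "b": 4}, {"a": 3, "b": 6}, {"a": 9, "b": 9}]  # hypothetical data

for row in left_merge_status(df1, df2, ["a", "b"]):
    print(row)
```

Here the row with a=2, b=5 has no partner in the hypothetical df2, so it is labelled "not included".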

Dataframes in Python - matching multiple columns of rows between two data frames

You can merge on columns "A", "B", and "C" with how="left". (Every key combination in df2_sum below also appears in df1, so a left join is the right choice here.)

df2_sum = df2.groupby(["A", "B", "C"])["PRICE"].sum().reset_index()

df1.merge(df2_sum, on=["A","B","C"], how="left").fillna(0)
   A  B  C  SUM  PRICE
0  1  1  1    0    0.0
1  1  1  2    0  315.0
2  1  2  2    0    0.0
3  2  2  2    0   30.0

You can then add PRICE to your SUM column.
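The aggregate-then-left-join pattern, including the final SUM + PRICE step, looks roughly like this in plain Python. The df2 rows are hypothetical, chosen only so the totals match the 315.0 and 30.0 shown above; keys missing from the aggregate get 0.0, as fillna(0) does.

```python
from collections import defaultdict

# df1 key rows as shown in the output above: (A, B, C, SUM)
df1 = [(1, 1, 1, 0), (1, 1, 2, 0), (1, 2, 2, 0), (2, 2, 2, 0)]
# Hypothetical df2 rows: (A, B, C, PRICE)
df2 = [(1, 1, 2, 300.0), (1, 1, 2, 15.0), (2, 2, 2, 30.0)]

# Equivalent of df2.groupby(["A", "B", "C"])["PRICE"].sum()
df2_sum = defaultdict(float)
for a, b, c, price in df2:
    df2_sum[(a, b, c)] += price

# Left join on (A, B, C); keys missing from df2_sum get 0.0, like fillna(0).
merged = [(a, b, c, s, df2_sum.get((a, b, c), 0.0)) for a, b, c, s in df1]

# Finally add PRICE to SUM.
result = [(a, b, c, s + price) for a, b, c, s, price in merged]
print(result)
```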

Matching multiple columns in two dataframes in R using the merge or match function

Reconsider the merge approach:

# FIRST DATAFRAME (2014)
txt='Date Shop Item ProductKey Price
2014-09-01 Asda Apple 0f-7c-32-9c65 2.00
2014-09-01 Tesco Pear 7c-e9-a0-a11c 1.50'

df1 <- read.table(text=txt, header=TRUE)
df1$Date <- as.POSIXct(df1$Date) # CONVERT TO DATE
df1$Month <- format(df1$Date, "%m") # EXTRACT MONTH (CAN ADJUST FOR MM/DD)

# SECOND DATAFRAME (2015)
txt='Date Shop Item ProductKey Price
2015-09-01 Asda Apple 0f-7c-32-9c65 2.25
2015-09-01 Tesco Pear 7c-e9-a0-a11c 1.75'

df2 <- read.table(text=txt, header=TRUE)
df2$Date <- as.POSIXct(df2$Date) # CONVERT TO DATE
df2$Month <- format(df2$Date, "%m") # EXTRACT MONTH (CAN ADJUST FOR MM/DD)

# MERGE AND TRANSFORM FOR NEW COLUMN
finaldf <- transform(merge(df1, df2, by=c("Month", "Shop", "Item", "ProductKey"),
                           suffixes=c("_14", "_15")),
                     PriceRelative = Price_15 / Price_14)
finaldf
#   Month  Shop  Item    ProductKey    Date_14 Price_14    Date_15 Price_15 PriceRelative
# 1    09  Asda Apple 0f-7c-32-9c65 2014-09-01      2.0 2015-09-01     2.25      1.125000
# 2    09 Tesco  Pear 7c-e9-a0-a11c 2014-09-01      1.5 2015-09-01     1.75      1.166667
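For comparison, here is a plain-Python sketch of the same idea: key each year's rows on (Month, Shop, Item, ProductKey), keep the keys present in both years, and divide the prices. The rows are copied from the example above; the helper name by_key is hypothetical, and unlike merge it assumes each key occurs at most once per year.

```python
from datetime import date

rows_2014 = [
    ("2014-09-01", "Asda", "Apple", "0f-7c-32-9c65", 2.00),
    ("2014-09-01", "Tesco", "Pear", "7c-e9-a0-a11c", 1.50),
]
rows_2015 = [
    ("2015-09-01", "Asda", "Apple", "0f-7c-32-9c65", 2.25),
    ("2015-09-01", "Tesco", "Pear", "7c-e9-a0-a11c", 1.75),
]


def by_key(rows):
    """Index rows by (Month, Shop, Item, ProductKey) -> Price."""
    return {
        (date.fromisoformat(d).month, shop, item, key): price
        for d, shop, item, key, price in rows
    }


prices_14 = by_key(rows_2014)
prices_15 = by_key(rows_2015)

# Inner join on the key, then PriceRelative = Price_15 / Price_14.
price_relative = {
    k: prices_15[k] / prices_14[k] for k in prices_14 if k in prices_15
}
for k, v in price_relative.items():
    print(k, round(v, 6))
```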

Partial match string between columns for multiple dataframes

Here's one approach using a for loop. You were close! Note that I renamed your list of data frames to dfs to avoid confusion with the list() function.

Could a name match more than once within the same data frame? If so, the code below won't work without a couple more lines.

df1 <- data.frame(name = c("TEXT333","b","c"), column_A = 1:3, stringsAsFactors=FALSE)
df2 <- data.frame(name = c("b","TEXT345","d"), column_A = 4:6, stringsAsFactors=FALSE)
df3 <- data.frame(name = c("c","TEXT123","a"), column_A = 7:9, stringsAsFactors=FALSE)
dfs <- list(df1, df2, df3)
df <- data.frame(name = c("TEXT333","TEXT123","a", "TEXT345", "k", "l", "b","c", "f"), column_B = 11:19, stringsAsFactors=FALSE)

# loop over all data frames in the list
for (i in seq_along(dfs)) {

  # get the name containing "TEXT" (a plain substring; no wildcards needed)
  val <- grep(pattern = "TEXT", x = dfs[[i]]$name, value = TRUE)

  # use that name to copy the value from the reference df
  dfs[[i]][dfs[[i]]$name == val, "column_A"] <- df[df$name == val, "column_B"]
}

Updated answer that handles multiple matches in the same data frame:

for (i in seq_along(dfs)) {
  vals <- grep(pattern = "TEXT", x = dfs[[i]]$name, value = TRUE)
  for (val in vals) {
    dfs[[i]][dfs[[i]]$name == val, "column_A"] <- df[df$name == val, "column_B"]
  }
}
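The same update loop can be sketched in Python, with lists of dicts standing in for the data frames (data copied from the R example). The substring test "TEXT" in name plays the role of grep, several matches per frame are handled as in the updated R version, and names absent from the reference are simply left unchanged in this sketch.

```python
# Lists of dicts standing in for df1, df2, df3 from the R example.
dfs = [
    [{"name": "TEXT333", "column_A": 1}, {"name": "b", "column_A": 2}, {"name": "c", "column_A": 3}],
    [{"name": "b", "column_A": 4}, {"name": "TEXT345", "column_A": 5}, {"name": "d", "column_A": 6}],
    [{"name": "c", "column_A": 7}, {"name": "TEXT123", "column_A": 8}, {"name": "a", "column_A": 9}],
]
ref_names = ["TEXT333", "TEXT123", "a", "TEXT345", "k", "l", "b", "c", "f"]
# Reference lookup: name -> column_B (11:19 in the R example).
lookup = dict(zip(ref_names, range(11, 20)))

for frame in dfs:
    for row in frame:
        # Partial match: the name only needs to contain "TEXT".
        if "TEXT" in row["name"] and row["name"] in lookup:
            row["column_A"] = lookup[row["name"]]

print([[row["column_A"] for row in frame] for frame in dfs])
# [[11, 2, 3], [4, 14, 6], [7, 12, 9]]
```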

Matching values from two column pairs in different data frames in R

You can get the first result with anti_join, though you need to anti-join on both orderings of source and target, since direction appears not to matter in your example. Note that I used toupper because the capitalization in your example was inconsistent and the example suggested that case should be ignored.

library(dplyr)

anti_join(anti_join(B, A %>% mutate_all(toupper),
                    by = c("source", "target")),
          A %>% mutate_all(toupper),
          by = c(target = "source", source = "target")) %>%
  select(-variable)
#> source target
#> 1 V2 V5
#> 2 v1 V3
#> 3 V4 V3
#> 4 V5 V4
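To make the two-direction anti-join concrete, here is a plain-Python sketch. The A and B rows are hypothetical (the question's data isn't reproduced here), and, as in the dplyr code, only A is upper-cased, so a lower-case "v1" in B does not match.

```python
# Hypothetical data: A has (source, target); B has (source, target, variable).
A = [("V1", "V2"), ("v2", "V4")]
B = [("V1", "V2", 3), ("V4", "V2", 1), ("V2", "V5", 2), ("v1", "V3", 4)]

# Upper-case A only, like A %>% mutate_all(toupper).
a_pairs = {(s.upper(), t.upper()) for s, t in A}

# Keep B rows whose pair is in A in neither direction, then drop `variable`.
unmatched = [
    (s, t) for s, t, _ in B if (s, t) not in a_pairs and (t, s) not in a_pairs
]
print(unmatched)
# [('V2', 'V5'), ('v1', 'V3')]
```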

You can get the second result by binding two inner_joins:

bind_rows(inner_join(B, A %>% mutate_all(toupper),
                     by = c("source", "target")),
          inner_join(B, A %>% mutate_all(toupper),
                     by = c(source = "target", target = "source")))
#> source target variable
#> 1 V1 V2 3
#> 2 V4 V2 1
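The matching counterpart is a direct match followed by a reversed match, concatenated in that order like bind_rows of the two inner_joins. The data is again hypothetical, and because A's pairs are held in a set here, duplicate rows in A collapse (each inner_join would instead repeat a B row once per matching A row).

```python
# Hypothetical data: A has (source, target); B has (source, target, variable).
A = [("V1", "V2"), ("v2", "V4")]
B = [("V1", "V2", 3), ("V4", "V2", 1), ("V2", "V5", 2), ("v1", "V3", 4)]

a_pairs = {(s.upper(), t.upper()) for s, t in A}

# First inner join: B's (source, target) equals A's (source, target).
direct = [row for row in B if (row[0], row[1]) in a_pairs]
# Second inner join: B's (source, target) equals A's (target, source).
reversed_ = [row for row in B if (row[1], row[0]) in a_pairs]

matched = direct + reversed_  # like bind_rows()
print(matched)
# [('V1', 'V2', 3), ('V4', 'V2', 1)]
```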

Best way to match one column in a dataframe to multiple columns in another dataframe

Try the following. The idea is to merge df1's 'email' column against each of the columns in cols (the email-like columns of df2), concatenate the results, and then drop duplicate names:

import pandas as pd

cols = ['email_id', 'alternate email', 'alternate email2']
out = (pd.concat([df1.merge(df2, left_on='email', right_on=x) for x in cols])
       .drop_duplicates(subset=['name'], ignore_index=True)
       .drop(columns=cols))  # keyword form; drop(cols, 1) fails on pandas >= 2.0

Output of out:

        name                  email  country
0       Sara       sara@example.com       US
1       John       john@example.com       BR
2  Christine  Christine@example.com       CA
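The concat-of-merges idea can be sketched in plain Python: one inner join per candidate column, concatenated in the order of cols, then the first row per name is kept and the email columns are dropped. The df1 and df2 rows below are hypothetical, chosen to reproduce the output above.

```python
cols = ["email_id", "alternate email", "alternate email2"]

# Hypothetical data: df1 holds the emails to look up, df2 holds people with up
# to three email columns plus a country.
df1 = [
    {"email": "sara@example.com"},
    {"email": "john@example.com"},
    {"email": "Christine@example.com"},
]
df2 = [
    {"name": "Sara", "email_id": "sara@example.com", "alternate email": "",
     "alternate email2": "", "country": "US"},
    {"name": "John", "email_id": "j@example.com", "alternate email": "john@example.com",
     "alternate email2": "", "country": "BR"},
    {"name": "Christine", "email_id": "c@example.com", "alternate email": "",
     "alternate email2": "Christine@example.com", "country": "CA"},
]

# One inner join per candidate column, concatenated in the order of cols.
merged = [
    {**left, **right}
    for col in cols
    for left in df1
    for right in df2
    if right[col] == left["email"]
]

# Keep the first row per name (like drop_duplicates) and drop the email columns.
seen, out = set(), []
for row in merged:
    if row["name"] not in seen:
        seen.add(row["name"])
        out.append({k: v for k, v in row.items() if k not in cols})

for row in out:
    print(row["name"], row["email"], row["country"])
```

Because drop_duplicates keeps the first occurrence, the order of cols acts as a priority: a match on email_id wins over a match on an alternate address.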

String match with multiple columns to look for possible result in R

Here is a possible tidyverse solution. It gives an answer similar to your DF_Result, but not identical: "purple,white" also matched "abcd" through "silver,white" and "black,white".

The data frames are easier to merge in long form (using pivot_longer). You can use separate_rows to put the comma-separated values into separate rows.

library(tidyverse)

DF2_long <- DF2 %>%
  pivot_longer(cols = -P) %>%
  separate_rows(value)

DF1 %>%
  mutate(value = A) %>%
  separate_rows(value) %>%
  left_join(DF2_long, by = "value") %>%
  select(-name, -value) %>%
  group_by(A) %>%
  distinct(A, P) %>%
  mutate(Count = row_number()) %>%
  pivot_wider(id_cols = A, names_from = Count, values_from = P, names_prefix = "R")

Output

  A                 R1    R2    R3
  <chr>             <chr> <chr> <chr>
1 babypink          wxyz  NA    NA
2 red,blue          abcd  qwert NA
3 purple,white      wxyz  abcd  efgh
4 skyblue           qwert NA    NA
5 pink,violet,green abcd  qwert efgh
6 silver,white,grey abcd  wxyz  efgh
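The long-form idea can be sketched in plain Python: split every cell on commas into (P, token) pairs, then for each value of A collect the distinct P values it shares a token with, in order of first match, and pad the rows to equal width (None standing in for NA). The DF2 contents below are hypothetical, and tokens with no match are simply skipped here.

```python
# Hypothetical DF2: each P has several cells of comma-separated colours.
DF2 = [
    ("wxyz", ["babypink", "purple"]),
    ("abcd", ["red,pink", "silver,white"]),
    ("qwert", ["blue", "skyblue,green"]),
]
A_values = ["babypink", "red,blue", "purple,white"]

# Long form: one (P, token) pair per colour, like pivot_longer + separate_rows.
long_rows = [
    (p, token.strip())
    for p, cells in DF2
    for cell in cells
    for token in cell.split(",")
]

matches = {}
for a in A_values:
    hits = []
    for token in a.split(","):
        for p, value in long_rows:
            # distinct(A, P): keep each P once, in order of first match.
            if value == token.strip() and p not in hits:
                hits.append(p)
    matches[a] = hits

# Wide form: R1, R2, ... padded with None, like pivot_wider.
width = max(len(h) for h in matches.values())
for a, hits in matches.items():
    print(a, hits + [None] * (width - len(hits)))
```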

