Matching Multiple Columns on Different Data Frames and Getting Other Column as Result

Use merge, joining df1's init column to df2's V2 column:

df1 <- read.table(text='  chr    init
1  12  25289552
2   3 180418785
3   3 180434779', header=TRUE)

df2 <- read.table(text='    V1    V2     V3
10  1     69094 medium
11  1     69094 medium
12  12 25289552 high
13  1     69095 medium
14  3 180418785 medium
15  3 180434779 low', header=TRUE)

merge(df1, df2, by.x='init', by.y='V2') # this works!
       init chr V1     V3
1  25289552  12 12   high
2 180418785   3  3 medium
3 180434779   3  3    low

To get the output in the layout you showed, select the columns you need and rename the last one:

output <- merge(df1, df2, by.x='init', by.y='V2')[, c(2,1,4)]
colnames(output)[3] <- 'Mut' 
output
  chr      init    Mut
1  12  25289552   high
2   3 180418785 medium
3   3 180434779    low
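
The merge above keys only on init and V2. If chr also has to agree with V1, both pairs can be used as keys. Here is a minimal sketch of that idea in pandas (the other library used on this page), rebuilding the same toy data; the `how="left"` choice and the final column names are assumptions based on the desired output shown above.

```python
import pandas as pd

# Same toy data as in the R example above.
df1 = pd.DataFrame({"chr": [12, 3, 3],
                    "init": [25289552, 180418785, 180434779]})
df2 = pd.DataFrame({
    "V1": [1, 1, 12, 1, 3, 3],
    "V2": [69094, 69094, 25289552, 69095, 180418785, 180434779],
    "V3": ["medium", "medium", "high", "medium", "medium", "low"],
})

# Match on both column pairs: chr <-> V1 and init <-> V2.
# A left merge keeps every row of df1, in df1's order.
output = (
    df1.merge(df2, left_on=["chr", "init"], right_on=["V1", "V2"], how="left")
       .drop(columns=["V1", "V2"])      # the key copies from df2 are redundant
       .rename(columns={"V3": "Mut"})
)
print(output)
```

Keying on both pairs guards against the case where the same position appears on two different chromosomes.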

How to match multiple columns from two dataframes that have different sizes?

Use DataFrame.merge with the indicator parameter to add a column that records whether each row found a match. If you need different labels in that column, relabel it with numpy.where, for example:

import numpy as np

df = df1.merge(df2, indicator='status', how='left')
df['status'] = np.where(df['status'].eq('both'), 'included', 'not included')
print(df)
   a  b    status
0  1  4  included
1  2  5  included
2  3  6  included
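
The output above only shows matched rows. To see what happens when a row of df1 has no partner, here is a small self-contained sketch. The two frames are made up for illustration (they are not the asker's data), and df2 is deliberately longer than df1.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs of different sizes; the last row of df1 has no match in df2.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 7]})
df2 = pd.DataFrame({"a": [1, 2, 3, 4], "b": [4, 5, 6, 8]})

# With no `on=` argument, merge keys on all shared column names (a and b).
df = df1.merge(df2, indicator="status", how="left")

# The indicator is 'both' for matched rows and 'left_only' for the rest.
df["status"] = np.where(df["status"].eq("both"), "included", "not included")
print(df)
```

Because the merge is a left merge, the result has one row per row of df1 here, regardless of how many rows df2 has.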

Dataframes in Python - matching multiple columns of rows between two data frames

You can merge on columns "A", "B", and "C" with how="left". The key combinations in df2_sum below are a subset of those in df1, so a left merge keeps every row of df1.

df2_sum = df2.groupby(["A", "B", "C"])["PRICE"].sum().reset_index()

df1.merge(df2_sum, on=["A","B","C"], how="left").fillna(0)
    A   B   C   SUM PRICE
0   1   1   1   0   0.0
1   1   1   2   0   315.0
2   1   2   2   0   0.0
3   2   2   2   0   30.0

You can then add PRICE to your SUM column.
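
That last step could look roughly like this. The input frames are hypothetical, chosen only so that the merged result has the same shape as the output above.

```python
import pandas as pd

# Hypothetical inputs shaped like the example above.
df1 = pd.DataFrame({"A": [1, 1, 1, 2], "B": [1, 1, 2, 2],
                    "C": [1, 2, 2, 2], "SUM": [0, 0, 0, 0]})
df2 = pd.DataFrame({"A": [1, 1, 2], "B": [1, 1, 2],
                    "C": [2, 2, 2], "PRICE": [300.0, 15.0, 30.0]})

# Total PRICE per (A, B, C) combination, then attach it to df1.
df2_sum = df2.groupby(["A", "B", "C"])["PRICE"].sum().reset_index()
merged = df1.merge(df2_sum, on=["A", "B", "C"], how="left").fillna(0)

# Fold the matched totals into SUM and drop the helper column.
merged["SUM"] = merged["SUM"] + merged["PRICE"]
merged = merged.drop(columns="PRICE")
print(merged)
```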

Matching multiple columns in two dataframes in R using the merge or match function

Reconsider the merge approach:

# FIRST DATAFRAME (2014)
txt='Date        Shop    Item    ProductKey     Price
2014-09-01  Asda    Apple   0f-7c-32-9c65  2.00
2014-09-01  Tesco   Pear    7c-e9-a0-a11c  1.50'

df1 <- read.table(text=txt, header=TRUE)
df1$Date <- as.POSIXct(df1$Date)             # CONVERT TO DATE
df1$Month <- format(df1$Date, "%m")          # EXTRACT MONTH (CAN ADJUST FOR MM/DD)

# SECOND DATAFRAME (2015)
txt='Date        Shop    Item    ProductKey     Price
2015-09-01  Asda    Apple   0f-7c-32-9c65  2.25
2015-09-01  Tesco   Pear    7c-e9-a0-a11c  1.75'

df2 <- read.table(text=txt, header=TRUE)
df2$Date <- as.POSIXct(df2$Date)              # CONVERT TO DATE
df2$Month <- format(df2$Date, "%m")           # EXTRACT MONTH (CAN ADJUST FOR MM/DD)

# MERGE AND TRANSFORM FOR NEW COLUMN
finaldf <- transform(merge(df1, df2, by=c("Month", "Shop", "Item", "ProductKey"), suffixes=c("_14", "_15")), 
                     PriceRelative = Price_15 / Price_14)    
finaldf
#   Month  Shop  Item    ProductKey    Date_14 Price_14    Date_15 Price_15 PriceRelative
# 1    09  Asda Apple 0f-7c-32-9c65 2014-09-01      2.0 2015-09-01     2.25      1.125000
# 2    09 Tesco  Pear 7c-e9-a0-a11c 2014-09-01      1.5 2015-09-01     1.75      1.166667

Partial match string between columns for multiple dataframes

Here's one approach using a for loop. You were close! Note that I changed your reference dataframe name to dfs to avoid confusion with list().

Could more than one name match within the same dataframe? If so, the first loop below won't work without a couple more lines.

df1 <- data.frame(name = c("TEXT333","b","c"), column_A = 1:3, stringsAsFactors=FALSE)
df2 <- data.frame(name = c("b","TEXT345","d"), column_A = 4:6, stringsAsFactors=FALSE)
df3 <- data.frame(name = c("c","TEXT123","a"), column_A = 7:9, stringsAsFactors=FALSE)
dfs <- list(df1, df2, df3)
df <- data.frame(name = c("TEXT333","TEXT123","a", "TEXT345", "k", "l", "b","c", "f"), column_B = 11:19, stringsAsFactors=FALSE)

# loop over all dataframes in your list
for(i in seq_along(dfs)){

  # get the name that contains "TEXT" (grep takes a regex, not a glob, so no * is needed)
  val <- grep(pattern = "TEXT", x = dfs[[i]]$name, value = TRUE)

  # use name to update value from reference df
  dfs[[i]][dfs[[i]]$name == val,"column_A"] <- df[df$name == val,"column_B"]
}

Updated answer that can account for multiple matches in the same df

for(i in seq_along(dfs)){
  # all names in this dataframe that contain "TEXT"
  vals <- grep(pattern = "TEXT", x = dfs[[i]]$name, value = TRUE)
  for(val in vals){
    dfs[[i]][dfs[[i]]$name == val, "column_A"] <- df[df$name == val,"column_B"]
  }
}

Matching values from two column pairs in different data frames in R

You can get the first result with an anti_join. Because direction appears not to matter in your example, you need to anti-join on both orderings of source and target. Note that I've used toupper because the capitalization in your example was inconsistent and the example suggested case should be ignored.

library(dplyr)

anti_join(anti_join(B, A %>% mutate_all(toupper), 
                    by = c("source", "target")),
          A %>% mutate_all(toupper), 
          by = c(target = "source", source = "target")) %>%
  select(-variable)
#>   source target
#> 1     V2     V5
#> 2     v1     V3
#> 3     V4     V3
#> 4     V5     V4

The second result you can get from binding two inner_joins:

bind_rows(inner_join(B, A %>% mutate_all(toupper), 
                     by = c("source", "target")), 
          inner_join(B, A %>% mutate_all(toupper), 
                     by = c(source = "target", target = "source")))
#>   source target variable
#> 1     V1     V2        3
#> 2     V4     V2        1
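
For comparison, the same direction-insensitive, case-insensitive matching can be sketched in pandas by giving each pair a canonical key. The frames A and B below are hypothetical stand-ins (the asker's data is not reproduced here), and `pair_key` is a helper name made up for this sketch.

```python
import pandas as pd

# Hypothetical stand-ins for A (with a value column) and B (pairs to look up).
A = pd.DataFrame({"source": ["v1", "V2"], "target": ["V2", "v4"],
                  "variable": [3, 1]})
B = pd.DataFrame({"source": ["V1", "V4", "V2"], "target": ["V2", "V2", "V5"]})

def pair_key(df):
    # Upper-case both endpoints and sort them, so (a, b) and (b, a) share one key.
    return ["|".join(sorted((s.upper(), t.upper())))
            for s, t in zip(df["source"], df["target"])]

A_keyed = A.assign(key=pair_key(A))
B_keyed = B.assign(key=pair_key(B))

# One left merge with an indicator yields both results at once.
merged = B_keyed.merge(A_keyed[["key", "variable"]], on="key",
                       how="left", indicator=True)
matched = merged.loc[merged["_merge"] == "both", ["source", "target", "variable"]]
unmatched = merged.loc[merged["_merge"] == "left_only", ["source", "target"]]
print(matched)
print(unmatched)
```

Building one canonical key avoids joining twice, once per direction, at the cost of an extra helper column.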

Best way to match one column in dataframe to multiple columns in another dataframe

Try this. The idea is to merge df1's 'email' column against each of the email-like columns of df2 (listed in cols), stack the results, and keep one row per name:

import pandas as pd

cols=['email_id', 'alternate email', 'alternate email2']
out=(pd.concat([df1.merge(df2,left_on='email',right_on=x) for x in cols])
       .drop_duplicates(subset=['name'],ignore_index=True)
       .drop(columns=cols))  # positional axis (cols, 1) no longer works in pandas 2.x

Output of out:

    name        email                   country
0   Sara        sara@example.com        US
1   John        john@example.com        BR
2   Christine   Christine@example.com   CA
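
Since the input frames are not shown, here is a self-contained sketch with made-up data in which each person is found through a different column of df2. All addresses and the exact column layout of df1 and df2 are assumptions for illustration.

```python
import pandas as pd

# Hypothetical inputs: each person in df1 matches a different email column of df2.
df1 = pd.DataFrame({
    "name": ["Sara", "John", "Christine"],
    "email": ["sara@example.com", "john@example.com", "christine@example.com"],
})
df2 = pd.DataFrame({
    "email_id": ["sara@example.com", "j.doe@example.com", "c.lee@example.com"],
    "alternate email": ["sara.alt@example.com", "john@example.com",
                        "chris.alt@example.com"],
    "alternate email2": ["sara.old@example.com", "john.old@example.com",
                         "christine@example.com"],
    "country": ["US", "BR", "CA"],
})

cols = ["email_id", "alternate email", "alternate email2"]

# One merge per candidate column, stacked; then keep the first hit per name.
out = (
    pd.concat([df1.merge(df2, left_on="email", right_on=c) for c in cols])
      .drop_duplicates(subset=["name"], ignore_index=True)
      .drop(columns=cols)
)
print(out)
```

Note that drop_duplicates on name assumes names are unique in df1; with repeated names, deduplicating on email would be the safer key.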

String match with multiple columns to look for possible result in R

Here is a possible tidyverse solution. It gets an answer similar to your DF_Result but not exactly ("purple,white" matched "abcd" with "silver,white" and "black,white").

The data frames are easier to merge in long form (using pivot_longer). You can use separate_rows to split the comma-separated values into one row each.

library(tidyverse)

DF2_long <- DF2 %>%
  pivot_longer(cols = -P) %>%
  separate_rows(value)
  
DF1 %>%
  mutate(value = A) %>%
  separate_rows(value) %>%
  left_join(DF2_long) %>%
  select(-name, -value) %>%
  group_by(A) %>%
  distinct(A, P) %>%
  mutate(Count = row_number()) %>%
  pivot_wider(id_cols = A, names_from = Count, values_from = P, names_prefix = "R")

Output

  A                 R1    R2    R3   
  <chr>             <chr> <chr> <chr>
1 babypink          wxyz  NA    NA   
2 red,blue          abcd  qwert NA   
3 purple,white      wxyz  abcd  efgh 
4 skyblue           qwert NA    NA   
5 pink,violet,green abcd  qwert efgh 
6 silver,white,grey abcd  wxyz  efgh
