Comparing Two Columns in a Data Frame Across Many Rows

Compare multiple columns within same row and highlight differences in pandas

The simplest (and naïve) approach is to use Series.eq to test every value in a row against that row's first value. Setting an appropriate subset is very important here, as we only want to compare the DB columns against each other (not, for example, the NAME column).

def highlight_row(s: pd.Series) -> List[str]:
    bg_color = 'red'
    # s.iloc[0] is positional; plain s[0] on a label-indexed row is deprecated
    if s.eq(s.iloc[0]).all():
        bg_color = 'green'
    return [f'background-color: {bg_color}'] * len(s)

df.style.apply(
    func=highlight_row,
    subset=['DB1', 'DB2', 'DB3', 'DB4'],
    axis=1
)

Styled table with naïve styling (empty strings and NaN are included in the comparison)
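Because the Styler only reports its result as colors, it can help to compute the same naïve test directly. The following is a minimal sketch (not part of the original answer) that applies the identical row-wise comparison to the whole DB block at once, using the sample data from the setup section:

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['WORKFLOW_1', 'WORKFLOW_2', 'WORKFLOW_3', 'WORKFLOW_4'],
    'DB1': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-2.jar', ''],
    'DB2': ['workflow1-2.jar', 'workflow2-1.jar', 'workflow3-1.jar', 'workflow4-1.jar'],
    'DB3': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-1.jar', ''],
    'DB4': ['workflow1-3.jar', 'workflow2-1.jar', 'workflow3-1.jar', ''],
})
cols = ['DB1', 'DB2', 'DB3', 'DB4']

# Compare every DB column against DB1 (axis=0 matches the Series to the row index),
# then require all comparisons in a row to be True. Empty strings count as values here.
same = df[cols].eq(df[cols[0]], axis=0).all(axis=1)
print(same.tolist())  # [False, True, False, False]
```

WORKFLOW_4 comes out False only because its empty strings differ from 'workflow4-1.jar', which is exactly the weakness the next version addresses.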

We can be a bit less naïve by excluding empty strings and null values (and any other invalid values) from each row with boolean indexing, and then doing the equality comparison on just the filtered values:

def highlight_row(s: pd.Series) -> List[str]:
    # Keep only valid values: drop nulls and empty strings
    filtered_s = s[s.notnull() & ~s.eq('')]
    # Check for a completely empty row (prevents an IndexError from filtered_s.iloc[0])
    if filtered_s.empty:
        # No valid values in row
        css_str = ''
    elif filtered_s.eq(filtered_s.iloc[0]).all():
        # All values are the same
        css_str = 'background-color: green'
    else:
        # Row values differ
        css_str = 'background-color: red'
    return [css_str] * len(s)

We can also leverage an IndexSlice to more dynamically select the columns for the subset instead of manually passing a list of column names:

df.style.apply(
    func=highlight_row,
    subset=pd.IndexSlice[:, 'DB1':],
    axis=1
)

Styled table that considers only "valid" values for equality comparison
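pd.IndexSlice is only a convenience for building slice tuples; it does not select anything by itself. A short sketch of what the subset above expands to, using a trimmed version of the sample data (it assumes, as the answer does, that every column from DB1 onward should be compared):

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['WORKFLOW_1', 'WORKFLOW_2'],
    'DB1': ['workflow1-1.jar', 'workflow2-1.jar'],
    'DB2': ['workflow1-2.jar', 'workflow2-1.jar'],
})

# pd.IndexSlice returns the slices it is given, packed as (rows, columns).
idx = pd.IndexSlice[:, 'DB1':]
print(idx)  # (slice(None, None, None), slice('DB1', None, None))

# Label slicing is inclusive and follows column order, so 'DB1': means
# "DB1 and every column after it", which skips NAME.
print(df.loc[idx].columns.tolist())  # ['DB1', 'DB2']
```

Because the selection depends on column order, any column later added after DB4 would silently be included in the comparison.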

Lastly, if you want the entire row to be highlighted, you can pass the column selection to the styling function as an extra keyword argument (here idx, which Styler.apply forwards to the function) instead of using subset:

def highlight_row(s: pd.Series, idx: slice) -> List[str]:
    css_str = 'background-color: red'
    # Filter columns (idx is a label slice such as slice('DB1', None))
    filtered_s = s.loc[idx]
    # Filter values
    filtered_s = filtered_s[filtered_s.notnull() & ~filtered_s.eq('')]
    # Check for completely empty row
    if filtered_s.empty:
        css_str = ''  # Empty row styles
    elif filtered_s.eq(filtered_s.iloc[0]).all():
        css_str = 'background-color: green'
    return [css_str] * len(s)

df.style.apply(
    func=highlight_row,
    idx=pd.IndexSlice['DB1':],  # 1D IndexSlice!
    axis=1
)

Styled table with entire row highlighting
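The per-row logic of the filtered highlight_row can also be expressed with vectorized pandas calls, which is handy if you want the same/differ result as data (for example to filter or export it) rather than as colors. This is a sketch using DataFrame.nunique, a different technique from the answer's element-wise eq check, and it assumes every column from DB1 onward should be compared:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'NAME': ['WORKFLOW_1', 'WORKFLOW_2', 'WORKFLOW_3', 'WORKFLOW_4'],
    'DB1': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-2.jar', ''],
    'DB2': ['workflow1-2.jar', 'workflow2-1.jar', 'workflow3-1.jar', 'workflow4-1.jar'],
    'DB3': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-1.jar', ''],
    'DB4': ['workflow1-3.jar', 'workflow2-1.jar', 'workflow3-1.jar', ''],
})

# Treat empty strings as missing so they are ignored, like the filtered highlight_row.
vals = df.loc[:, 'DB1':].replace('', np.nan)
# Count distinct non-missing values per row (0 means the row had no valid values).
n_distinct = vals.nunique(axis=1, dropna=True)

# '' = nothing to compare, 'same' = the green case, 'differ' = the red case.
status = pd.Series('', index=df.index)
status.loc[n_distinct.eq(1)] = 'same'
status.loc[n_distinct.gt(1)] = 'differ'
print(status.tolist())  # ['differ', 'same', 'differ', 'same']
```

Unlike the naïve check, WORKFLOW_4 is now reported as 'same', since its only valid value is 'workflow4-1.jar'.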

Setup and Imports:

from typing import List

import pandas as pd  # version 1.4.2

df = pd.DataFrame({
    'NAME': ['WORKFLOW_1', 'WORKFLOW_2', 'WORKFLOW_3', 'WORKFLOW_4'],
    'DB1': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-2.jar', ''],
    'DB2': ['workflow1-2.jar', 'workflow2-1.jar', 'workflow3-1.jar',
            'workflow4-1.jar'],
    'DB3': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-1.jar', ''],
    'DB4': ['workflow1-3.jar', 'workflow2-1.jar', 'workflow3-1.jar', '']
})

How to compare two columns in a DataFrame and update a column based on matching fields

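import pandas as pd

d1 = {
    "a": (1, 4, 7),
    "b": (2, 5, 8),
    "c": (0, 0, 0)
}

d2 = {
    "a_1": (1, 4, 7),
    "b_1": (5, 2, 8)
}

df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

# Iterate through each entry in a and compare it to a_1;
# on a match, copy the corresponding b_1 value into c
for i in range(len(df1["a"])):
    for j in range(len(df2["a_1"])):
        if df1["a"][i] == df2["a_1"][j]:
            # .loc writes into df1 directly; chained df1["c"][i] = ...
            # may only modify a copy (and does under Copy-on-Write)
            df1.loc[i, "c"] = df2.loc[j, "b_1"]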
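The nested loop does len(df1) × len(df2) comparisons and writes one cell at a time. A vectorized alternative (a different technique from the loop above, not from the original answer) builds a lookup Series keyed by a_1 and maps df1["a"] through it:

```python
import pandas as pd

df1 = pd.DataFrame({"a": (1, 4, 7), "b": (2, 5, 8), "c": (0, 0, 0)})
df2 = pd.DataFrame({"a_1": (1, 4, 7), "b_1": (5, 2, 8)})

# Lookup Series: index = key from a_1, values = b_1.
# keep="last" mirrors the loop, where a later match overwrites an earlier one.
lookup = df2.drop_duplicates("a_1", keep="last").set_index("a_1")["b_1"]

# Unmatched keys map to NaN, so fall back to the existing value of c,
# then restore c's original dtype.
df1["c"] = df1["a"].map(lookup).fillna(df1["c"]).astype(df1["c"].dtype)

print(df1)
#    a  b  c
# 0  1  2  5
# 1  4  5  2
# 2  7  8  8
```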

Trying to compare two dataframes with many columns in R row by row and label the incorrect rows

Without seeing any data, it is hard to give a precise answer.

The which function tells you which rows match some criteria.
Below is an example of how to use which.
You can change it to which(df2$answers %in% df1$answer_key) or something similar.

# Load the data
data(iris)

# Take a look
head(iris)
which_example <- c(5.4, 4.6)

# The way I think of which is to ask R "which rows in iris$Sepal.Length are 5.4?"
which(iris$Sepal.Length %in% 5.4)
which(iris$Sepal.Length %in% which_example)

# Once you have the rows, you can display only those specific rows and all or some columns
# The format is df[row,column] 
# Which gives the rows. You can leave column blank to get all or enter specific ones
iris[which(iris$Sepal.Length %in% 5.4),]
iris[which(iris$Sepal.Length %in% 5.4),c(2,4)]

Pandas better method to compare two dataframes and find entries that only exist in one

Using how='outer' (together with indicator=True) turned out to be the solution:

z = pd.merge(ORIGINAL, NEW, on=cols, how='outer', indicator=True)
z = z[z['_merge'] != 'both']  # Drop records present in both frames
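ORIGINAL, NEW and cols are placeholders in the answer, so here is a self-contained sketch with two small hypothetical frames, invented only so that the result has the same shape as the output shown in the answer:

```python
import pandas as pd

# Hypothetical data; the original post does not show its frames.
ORIGINAL = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Site": ["A", "A", "A"]})
NEW = pd.DataFrame({"Name": ["Alice", "Bob", "Doug"], "Site": ["A", "A", "B"]})
cols = ["Name", "Site"]

# indicator=True adds a categorical _merge column: left_only, right_only or both.
z = pd.merge(ORIGINAL, NEW, on=cols, how="outer", indicator=True)
# Keep only rows that exist in exactly one of the two frames.
diff = z[z["_merge"] != "both"].reset_index(drop=True)
print(diff)
```

An outer merge sorts the join keys, so the row order of the result may differ from the order of the inputs.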

The output looks like this (showing only the columns I care about):

  Name       Site   _merge
  Charlie    A     left_only
  Doug       B     right_only

Compare two dataframe column values and join with condition in python?

While this isn't a highly efficient solution, you can use sets to solve this problem.

matches = df1["Id"].apply(set) <= df2["Id"].apply(set)

out = df1.copy()
out.loc[matches, df2.columns.difference(["Id"])] = df2

print(out)
                Id  Value Product_Name
0  [101, 102, 103]  10001         Shoe
1  [101, 102, 104]  10000        jeans
2  [101, 102, 105]  10002      make-up
3  [101, 107, 105]  10003          NaN

In the above snippet:

matches = df1["Id"].apply(set) <= df2["Id"].apply(set) returns a boolean Series that is True where every Id in a row of df1['Id'] also appears in the corresponding row of df2['Id'] (a subset test), and False otherwise.
Instead of performing an actual merge, we simply align the two DataFrames on their index and assign df2's remaining columns only where that boolean Series is True.
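To see the subset test in isolation, here is a small sketch using Id lists taken from the printed output (the pairing is illustrative, not the original df2):

```python
import pandas as pd

left = pd.Series([[101, 102, 103], [101, 107, 105]])
right = pd.Series([[101, 102, 103, 104], [101, 102, 109, 104]])

# On object-dtype Series, <= is applied element by element; for Python sets,
# a <= b means "a is a subset of b". Both Series must share the same index
# (pandas raises ValueError otherwise), which is what makes this row-by-row.
matches = left.apply(set) <= right.apply(set)
print(matches.tolist())  # [True, False]
```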

If you want to test every Id list in df1 against every Id list in df2 (not just the row with the same index), you can take the Cartesian product of both DataFrames, filter it down to an inner join via the subset criterion, and then append back any df1 rows that found no match, as a left join would.

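out = (
    pd.merge(df1, df2, how="cross")
    # Keep pairs where every Id in df1's list also appears in df2's list
    .loc[lambda df: df["Id_x"].map(set) <= df["Id_y"].map(set)]
    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    .pipe(
        lambda df: pd.concat([
            df,
            # df1 rows that matched nothing (compared as hashable tuples)
            df1.loc[~df1["Id"].map(tuple).isin(df["Id_x"].map(tuple))]
               .rename(columns={"Id": "Id_x"}),
        ])
    )
    .reset_index(drop=True)
)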

print(out)
              Id_x  Value                  Id_y Product_Name
0  [101, 102, 103]  10001  [101, 102, 103, 104]         Shoe
1  [101, 102, 104]  10000  [101, 102, 103, 104]         Shoe
2  [101, 102, 104]  10000  [101, 102, 109, 104]        jeans
3  [101, 102, 105]  10002  [101, 105, 102, 108]      make-up
4  [101, 107, 105]  10003                   NaN          NaN