Pandas - Filter Dataframe by Another Dataframe by Row Elements

You can do this efficiently using isin on a MultiIndex constructed from the desired columns:

import pandas as pd

df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
                    'k': [1, 2, 2, 2, 2],
                    'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
                    'l': ['b', 'a']})

keys = list(df2.columns.values)
i1 = df1.set_index(keys).index
i2 = df2.set_index(keys).index

# keep the rows of df1 whose (c, l) pair does not appear in df2
df1[~i1.isin(i2)]

I think this improves on @IanS's similar solution because it doesn't assume any column type (i.e. it will work with numbers as well as strings).
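
For instance, a minimal sketch with purely numeric key columns (hypothetical data) behaves the same way:

# hypothetical numeric example: same MultiIndex/isin idea, no string columns involved
df1 = pd.DataFrame({'x': [1, 1, 2, 3], 'y': [10, 20, 10, 30]})
df2 = pd.DataFrame({'x': [1, 3], 'y': [20, 30]})

keys = list(df2.columns.values)
i1 = df1.set_index(keys).index
i2 = df2.set_index(keys).index
df1[~i1.isin(i2)]   # keeps the rows (1, 10) and (2, 10)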


(The answer above is an edit; what follows was my initial answer.)

Interesting! This is something I haven't come across before... I would probably solve it by merging the two dataframes, then dropping the rows where df2's values are present. Here is an example, which makes use of a temporary marker column:

df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
                    'k': [1, 2, 2, 2, 2],
                    'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
                    'l': ['b', 'a']})

# create a column marking df2 values
df2['marker'] = 1

# join the two, keeping all of df1's indices
joined = pd.merge(df1, df2, on=['c', 'l'], how='left')
joined

# extract desired columns where marker is NaN
joined[pd.isnull(joined['marker'])][df1.columns]

There may be a way to do this without the temporary marker column, but I can't think of one. As long as your data isn't huge, the above method should be fast enough.
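
As a possible alternative (a sketch using the same df1 and df2 as above), merge's indicator parameter can stand in for the hand-made marker column:

# indicator=True adds a '_merge' column instead of a manual marker
joined = pd.merge(df1, df2, on=['c', 'l'], how='left', indicator=True)
joined[joined['_merge'] == 'left_only'][df1.columns]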

Filtering a dataframe with another dataframe

Try the following:

edges = pd.DataFrame(edges.to_list(), columns=['node1','node2'])
nodes = nodes.applymap(lambda n: n[0])
edges[(edges.node1.isin(nodes)) & (edges.node2.isin(nodes))]

Filtering the dataframe based on the column value of another dataframe

Update

You could use a simple mask:

m = df2.SKU.isin(df1.SKU)
df2 = df2[m]

You are looking for an inner join. Try this:

df3 = df1.merge(df2, on=['SKU','Sales'], how='inner')

# SKU Sales
#0 A 100
#1 B 200
#2 C 300

Or this:

df3 = df1.merge(df2, on='SKU', how='inner')

# SKU Sales_x Sales_y
#0 A 100 100
#1 B 200 200
#2 C 300 300

Filtering dataframe based on another dataframe

You can use .isin() to filter to the list of tickers available in df2.

df1_filtered = df1[df1['ticker'].isin(df2['ticker'].tolist())]
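
Note that isin also accepts a Series directly, so the .tolist() call is optional:

df1_filtered = df1[df1['ticker'].isin(df2['ticker'])]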

How should I filter one dataframe by entries from another one in pandas with isin?

If you want to select the entries in df1 with an index that is also present in df2, you should be able to do it with:

df1.loc[df2.index]

or if you really want to use isin:

df1[df1.index.isin(df2.index)]
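
Note that df1.loc[df2.index] raises a KeyError in recent pandas versions if df2 contains index labels that are missing from df1; a sketch that tolerates missing labels is:

# keep only the labels that actually exist in df1
df1.loc[df1.index.intersection(df2.index)]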

Select rows of a dataframe based on another dataframe in Python

Simplest is to use merge with an inner join.
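
A minimal sketch of that inner join, assuming df2 holds only the key columns A and B used in the filtering example below:

df = df1.merge(df2, on=['A', 'B'], how='inner')
print(df)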

Another solution with filtering:

import numpy as np

arr = [np.array([df1[k] == v for k, v in x.items()]).all(axis=0) for x in df2.to_dict('records')]
df = df1[np.array(arr).any(axis=0)]
print(df)
A B C D
0 foo one 0 0
5 bar two 5 10
6 foo one 6 12

Or create a MultiIndex and filter with Index.isin:

df = df1[df1.set_index(['A','B']).index.isin(df2.set_index(['A','B']).index)]
print(df)
A B C D
0 foo one 0 0
5 bar two 5 10
6 foo one 6 12

Filtering for and replacing values in one Pandas DataFrame based on common columns of another DataFrame

You could do an isin with the indices, and assign the value via boolean masking:


cols = ['Col1', 'Col2', 'Col3']

temp1 = df1.set_index(cols)
temp2 = df2.set_index(cols)

# get the booleans here
booleans = temp1.index.isin(temp2.index)

# assign 100 to Col4 only in the rows where the mask is True
df1.loc[booleans, 'Col4'] = 100

df1

Col1 Col2 Col3 Col4
0 A b x 1
1 A b y 100
2 A c z 3
3 B b x 100

Alternatively, you could resolve it with pd.merge and the indicator parameter:

import numpy as np

(df1.merge(df2,
           on=cols,
           how='left',
           indicator=True,
           suffixes=(None, '_y'))
    .assign(Col4=lambda df: np.where(df._merge == 'both', 100, df.Col4))
    .loc[:, df1.columns]
)

Col1 Col2 Col3 Col4
0 A b x 1
1 A b y 100
2 A c z 3
3 B b x 100

Mapping column dataframe with another dataframe

You can map the column reporting_date_id against the other DataFrame with Series.map and then use the result to replace missing values via Series.fillna:

s = df2.set_index('reporting_date_id')['filing_date_id']
df1['filing_date_id'] = df1['filing_date_id'].fillna(df1['reporting_date_id'].map(s))
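
A quick sketch with made-up data to illustrate the lookup-and-fill behavior (column names taken from the answer, values hypothetical):

import pandas as pd

# hypothetical data: one filing_date_id is missing and gets filled via the lookup in df2
df1 = pd.DataFrame({'reporting_date_id': [10, 20, 30],
                    'filing_date_id': [101, None, 103]})
df2 = pd.DataFrame({'reporting_date_id': [20, 30],
                    'filing_date_id': [202, 303]})

s = df2.set_index('reporting_date_id')['filing_date_id']
df1['filing_date_id'] = df1['filing_date_id'].fillna(df1['reporting_date_id'].map(s))
print(df1)   # the row with reporting_date_id 20 now has filing_date_id 202.0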

