Pandas - Filter Dataframe by Another Dataframe by Row Elements

You can do this efficiently using isin on a MultiIndex constructed from the desired columns:

import pandas as pd

df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
                    'k': [1, 2, 2, 2, 2],
                    'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
                    'l': ['b', 'a']})

keys = list(df2.columns.values)
i1 = df1.set_index(keys).index
i2 = df2.set_index(keys).index

# keep the rows of df1 whose (c, l) pair does not appear in df2
df1[~i1.isin(i2)]

I think this improves on @IanS's similar solution because it doesn't assume any column type (i.e. it will work with numbers as well as strings).
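
For instance, a minimal sketch with purely numeric key columns (hypothetical data) behaves the same way:

# hypothetical numeric example: same MultiIndex/isin idea, no string columns involved
df1 = pd.DataFrame({'x': [1, 1, 2, 3], 'y': [10, 20, 10, 30]})
df2 = pd.DataFrame({'x': [1, 3], 'y': [20, 30]})

keys = list(df2.columns.values)
i1 = df1.set_index(keys).index
i2 = df2.set_index(keys).index
df1[~i1.isin(i2)]   # keeps the rows (1, 10) and (2, 10)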


(The answer above is an edit; what follows was my initial answer.)

Interesting! This is something I haven't come across before... I would probably solve it by merging the two dataframes, then dropping the rows where df2's values are present. Here is an example, which makes use of a temporary marker column:

df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
                    'k': [1, 2, 2, 2, 2],
                    'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
                    'l': ['b', 'a']})

# create a column marking df2 values
df2['marker'] = 1

# join the two, keeping all of df1's indices
joined = pd.merge(df1, df2, on=['c', 'l'], how='left')
joined

# extract desired columns where marker is NaN
joined[pd.isnull(joined['marker'])][df1.columns]

There may be a way to do this without the temporary marker column, but I can't think of one. As long as your data isn't huge, the above method should be fast enough.
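
As a possible alternative (a sketch using the same df1 and df2 as above), merge's indicator parameter can stand in for the hand-made marker column:

# indicator=True adds a '_merge' column instead of a manual marker
joined = pd.merge(df1, df2, on=['c', 'l'], how='left', indicator=True)
joined[joined['_merge'] == 'left_only'][df1.columns]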

Filtering a dataframe with another dataframe

Try the following:

edges = pd.DataFrame(edges.to_list(), columns=['node1','node2'])
nodes = nodes.applymap(lambda n: n[0])
edges[(edges.node1.isin(nodes)) & (edges.node2.isin(nodes))]

Filtering the dataframe based on the column value of another dataframe

Update

You could use a simple mask:

m = df2.SKU.isin(df1.SKU)
df2 = df2[m]

You are looking for an inner join. Try this:

df3 = df1.merge(df2, on=['SKU','Sales'], how='inner')

# SKU Sales
#0 A 100
#1 B 200
#2 C 300

Or this:

df3 = df1.merge(df2, on='SKU', how='inner')

# SKU Sales_x Sales_y
#0 A 100 100
#1 B 200 200
#2 C 300 300

Filtering dataframe based on another dataframe

You can use .isin() to filter to the list of tickers available in df2.

df1_filtered = df1[df1['ticker'].isin(df2['ticker'].tolist())]
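
Note that isin also accepts a Series directly, so the .tolist() call is optional:

df1_filtered = df1[df1['ticker'].isin(df2['ticker'])]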

How should I filter one dataframe by entries from another one in pandas with isin?

If you want to select the entries in df1 with an index that is also present in df2, you should be able to do it with:

df1.loc[df2.index]

or if you really want to use isin:

df1[df1.index.isin(df2.index)]
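
Note that df1.loc[df2.index] raises a KeyError in recent pandas versions if df2 contains index labels that are missing from df1; a sketch that tolerates missing labels is:

# keep only the labels that actually exist in df1
df1.loc[df1.index.intersection(df2.index)]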

Select rows of a dataframe based on another dataframe in Python

Simplest is to use merge with an inner join.
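
A minimal sketch of that inner join, assuming df2 holds only the key columns A and B used in the filtering example below:

df = df1.merge(df2, on=['A', 'B'], how='inner')
print(df)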

Another solution with filtering:

import numpy as np

arr = [np.array([df1[k] == v for k, v in x.items()]).all(axis=0) for x in df2.to_dict('records')]
df = df1[np.array(arr).any(axis=0)]
print(df)
A B C D
0 foo one 0 0
5 bar two 5 10
6 foo one 6 12

Or create a MultiIndex and filter with Index.isin:

df = df1[df1.set_index(['A','B']).index.isin(df2.set_index(['A','B']).index)]
print(df)
A B C D
0 foo one 0 0
5 bar two 5 10
6 foo one 6 12

Filtering for and replacing values in one Pandas DataFrame based on common columns of another DataFrame

You could do an isin with the indices, and assign the value via boolean masking:


cols = ['Col1', 'Col2', 'Col3']

temp1 = df1.set_index(cols)
temp2 = df2.set_index(cols)

# get the booleans here
booleans = temp1.index.isin(temp2.index)

# assign 100 to Col4 only in the rows where the mask is True
df1.loc[booleans, 'Col4'] = 100

df1

Col1 Col2 Col3 Col4
0 A b x 1
1 A b y 100
2 A c z 3
3 B b x 100

Alternatively, you could resolve it with pd.merge and the indicator parameter:

import numpy as np

(df1.merge(df2,
           on=cols,
           how='left',
           indicator=True,
           suffixes=(None, '_y'))
    .assign(Col4=lambda df: np.where(df._merge == 'both', 100, df.Col4))
    .loc[:, df1.columns]
)

Col1 Col2 Col3 Col4
0 A b x 1
1 A b y 100
2 A c z 3
3 B b x 100

Mapping column dataframe with another dataframe

You can map the column reporting_date_id against the other DataFrame with Series.map and then use the result to replace missing values via Series.fillna:

s = df2.set_index('reporting_date_id')['filing_date_id']
df1['filing_date_id'] = df1['filing_date_id'].fillna(df1['reporting_date_id'].map(s))
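
A quick sketch with made-up data to illustrate the lookup-and-fill behavior (column names taken from the answer, values hypothetical):

import pandas as pd

# hypothetical data: one filing_date_id is missing and gets filled via the lookup in df2
df1 = pd.DataFrame({'reporting_date_id': [10, 20, 30],
                    'filing_date_id': [101, None, 103]})
df2 = pd.DataFrame({'reporting_date_id': [20, 30],
                    'filing_date_id': [202, 303]})

s = df2.set_index('reporting_date_id')['filing_date_id']
df1['filing_date_id'] = df1['filing_date_id'].fillna(df1['reporting_date_id'].map(s))
print(df1)   # the row with reporting_date_id 20 now has filing_date_id 202.0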

