pandas - filter dataframe by another dataframe by row elements
You can do this efficiently using isin
on a multiindex constructed from the desired columns:
df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
'k': [1, 2, 2, 2, 2],
'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
'l': ['b', 'a']})
keys = list(df2.columns.values)
i1 = df1.set_index(keys).index
i2 = df2.set_index(keys).index
df1[~i1.isin(i2)]
I think this improves on @IanS's similar solution because it doesn't assume any column type (i.e. it will work with numbers as well as strings).
(Above answer is an edit. Following was my initial answer)
Interesting! This is something I haven't come across before... I would probably solve it by merging the two arrays, then dropping rows where df2
is defined. Here is an example, which makes use of a temporary array:
df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
'k': [1, 2, 2, 2, 2],
'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
'l': ['b', 'a']})
# create a column marking df2 values
df2['marker'] = 1
# join the two, keeping all of df1's indices
joined = pd.merge(df1, df2, on=['c', 'l'], how='left')
joined
# extract desired columns where marker is NaN
joined[pd.isnull(joined['marker'])][df1.columns]
There may be a way to do this without using the temporary array, but I can't think of one. As long as your data isn't huge the above method should be a fast and sufficient answer.
Filtering a dataframe with another dataframe
Try the following:
edges = pd.DataFrame(edges.to_list(), columns=['node1','node2'])
nodes = nodes.applymap(lambda n: n[0])
edges[(edges.node1.isin(nodes)) & (edges.node2.isin(nodes)]
Filtering the dataframe based on the column value of another dataframe
Update
You could use a simple mask:
m = df2.SKU.isin(df1.SKU)
df2 = df2[m]
You are looking for an inner join. Try this:
df3 = df1.merge(df2, on=['SKU','Sales'], how='inner')
# SKU Sales
#0 A 100
#1 B 200
#2 C 300
Or this:df3 = df1.merge(df2, on='SKU', how='inner')
# SKU Sales_x Sales_y
#0 A 100 100
#1 B 200 200
#2 C 300 300
Filtering dataframe based on another dataframe
You can use .isin() to filter to the list of tickers available in df2.
df1_filtered = df1[df1['ticker'].isin(df2['ticker'].tolist())]
How should I filter one dataframe by entries from another one in pandas with isin?
If you want to select the entries in df1 with an index that is also present in df2, you should be able to do it with:
df1.loc[df2.index]
or if you really want to use isin:df1[df1.index.isin(df2.index)]
Select rows of a dataframe based on another dataframe in Python
Simpliest is use merge
with inner join.
Another solution with filtering:
arr = [np.array([df1[k] == v for k, v in x.items()]).all(axis=0) for x in df2.to_dict('r')]
df = df1[np.array(arr).any(axis=0)]
print(df)
A B C D
0 foo one 0 0
5 bar two 5 10
6 foo one 6 12
Or create MultiIndex
and filter with Index.isin
:df = df1[df1.set_index(['A','B']).index.isin(df2.set_index(['A','B']).index)]
print(df)
A B C D
0 foo one 0 0
5 bar two 5 10
6 foo one 6 12
Filtering for and replacing values in one Pandas DataFrame based on common columns of another DataFrame
You could do an isin
with the indices, and assign the value via boolean masking:
cols = ['Col1', 'Col2', 'Col3']
temp1 = df1.set_index(cols)
temp2 = df2.set_index(cols)
# get the booleans here
booleans = temp1.index.isin(temp2.index)
# this assigns 100 to only rows in Col4
# that are True
df1.loc[booleans, 'Col4'] = 100
df1
Col1 Col2 Col3 Col4
0 A b x 1
1 A b y 100
2 A c z 3
3 B b x 100
Alternatively, you could resolve it with pd.merge
and the indicator
parameter:(df1.merge(df2,
on = cols,
how = 'left',
indicator=True,
suffixes = (None, '_y'))
.assign(Col4 = lambda df: np.where(df._merge == 'both',
100,
df.Col4))
.loc[:, df1.columns]
)
Col1 Col2 Col3 Col4
0 A b x 1
1 A b y 100
2 A c z 3
3 B b x 100
Mapping column dataframe with another dataframe
You can mapping column reporting_date_id
by another DataFrame
by Series.map
and then use it for replace missing values in Series.fillna
:
s = df2.set_index('reporting_date_id')['filing_date_id']
df1['filing_date_id'] = df1['filing_date_id'].fillna(df1['reporting_date_id'].map(s))
Related Topics
Slicing of a Numpy 2D Array, or How to Extract an Mxm Submatrix from an Nxn Array (N>M)
How to Concatenate Three Excels Files Xlsx Using Python
Python Socket Receive - Incoming Packets Always Have a Different Size
Getting Gradient of Model Output W.R.T Weights Using Keras
Writing List of Strings to Excel CSV File in Python
Is It Bad Practice to Use a Built-In Function Name as an Attribute or Method Identifier
What Is the Max Length of a Python String
Reading Binary Data from Stdin
Bad Idea to Catch All Exceptions in Python
Python & Pandas: How to Query If a List-Type Column Contains Something
How to Convert an Integer to the Shortest Url-Safe String in Python
Get the String Within Brackets in Python