Filter dataframe rows if value in column is in a set list of values
Use the isin method:
rpt[rpt['STK_ID'].isin(stk_list)]
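As a minimal sketch of the line above, assuming a small hypothetical rpt frame and stk_list (only the column name STK_ID comes from the answer; the values are made up for illustration):

```python
import pandas as pd

# Hypothetical data: the column name STK_ID comes from the answer, the values do not.
rpt = pd.DataFrame({
    "STK_ID": ["000002", "600809", "600141", "600329"],
    "sales": [10, 20, 30, 40],
})
stk_list = ["600809", "600141"]

# Boolean mask: True where STK_ID is one of the listed values.
mask = rpt["STK_ID"].isin(stk_list)
filtered = rpt[mask]
print(filtered)
```

The mask is an ordinary boolean Series, so it can be stored, combined with other conditions using & and |, or inverted with ~.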
Use a list of values to select rows from a Pandas dataframe
You can use the isin method:
In [1]: df = pd.DataFrame({'A': [5,6,3,4], 'B': [1,2,3,5]})
In [2]: df
Out[2]:
A B
0 5 1
1 6 2
2 3 3
3 4 5
In [3]: df[df['A'].isin([3, 6])]
Out[3]:
A B
1 6 2
2 3 3
And to get the opposite, use ~:
In [4]: df[~df['A'].isin([3, 6])]
Out[4]:
A B
0 5 1
3 4 5
Filter pandas dataframe rows if any value on a list inside the dataframe is in another list
You can expand the inner list, and check if any items in the inner lists are contained in [480, 9, 104]:
l = [480, 9, 104]
df[df.categories.str.split('.', expand=True).isin(list(map(str, l))).any(axis=1)]
   album_id categories split_categories
0     66562    480.494        [480,494]
3      1709      9.000              [9]
4     59239    105.104        [105,104]
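A self-contained sketch of the same idea, assuming categories holds dot-separated strings. The rows that do not match (index 1 and 2) are hypothetical, since the original frame is not shown in full:

```python
import pandas as pd

# Hypothetical data modelled on the output above; rows 1 and 2 are invented.
df = pd.DataFrame({
    "album_id": [66562, 114582, 4846, 1709, 59239],
    "categories": ["480.494", "128", "5", "9.000", "105.104"],
})
l = [480, 9, 104]
wanted = [str(v) for v in l]  # the split parts are strings, so compare strings

# One column per dot-separated part; missing parts are filled with None.
parts = df["categories"].str.split(".", expand=True)

# Keep rows where any part is one of the wanted values.
result = df[parts.isin(wanted).any(axis=1)]
print(result)
```

Converting the lookup values to strings matters here: comparing the string parts against the integers in l directly would match nothing.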
Filter a Dataframe on a column, if a list value is contained in the column value. Pandas
Here you go:
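import pandas as pd
df = pd.DataFrame({'column':['abc', 'def', 'ghi', 'abc, def', 'ghi, jkl', 'abc']})
filter_list = ['abc', 'jkl']  # assumed values, consistent with the output below
contains_filter = '|'.join(filter_list)
df = df[pd.notna(df.column) & df.column.str.contains(contains_filter)]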
Output:
column
0 abc
3 abc, def
4 ghi, jkl
5 abc
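Note that str.contains treats the joined string as a regular expression by default, so list values containing characters such as . or + can match more than intended. A hedged sketch with hypothetical filter values, escaping each one with re.escape before joining:

```python
import re
import pandas as pd

# Hypothetical data chosen to contain regex metacharacters and a missing value.
df = pd.DataFrame({"column": ["a.c", "abc", "c++", None]})
filter_list = ["a.c", "c++"]

# Escape each value so it is matched literally, then join into one alternation.
pattern = "|".join(re.escape(v) for v in filter_list)

# na=False treats missing values as non-matches, replacing the pd.notna check.
result = df[df["column"].str.contains(pattern, na=False)]
print(result)
```

Without the escaping, the pattern a.c would also match abc, and c++ would raise a regex error.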
Filter rows where column value is in list in another column?
I use a version of the visible part of your df (in future, please follow the guide on how to provide a great pandas example).
I modified a few rows so that in some of them node is included in key_players.
import ast
from io import StringIO
import pandas as pd

# Columns are separated by two or more spaces; the first column becomes the index
df = pd.read_csv(StringIO(
"""
period  node  key_players
0  0  ZF1013  ['ZF1128', 'ZF176', 'ZF434','ZF469','ZF659']
1  0  ZF1014  ['ZF1014', 'ZF176', 'ZF434','ZF469','ZF659']
2  0  ZF1015  ['ZF1128', 'ZF176', 'ZF434','ZF469','ZF659']
3  0  ZF1020  ['ZF1128', 'ZF176', 'ZF434','ZF469','ZF659']
4  0  ZF1025  ['ZF1128', 'ZF1025', 'ZF434','ZF469','ZF659']
1565  4  ZF898  ['ZF1336', 'ZF1346','ZF3', 'ZF434,' 'ZF481']
1566  4  ZF945  ['ZF1336', 'ZF1346','ZF3', 'ZF434,' 'ZF481']
1567  4  ZF948  ['ZF1336', 'ZF1346','ZF3', 'ZF434,' 'ZF481']
1568  4  ZF97  ['ZF1336', 'ZF1346','ZF3', 'ZF434,' 'ZF481']
1569  4  ZFM264  ['ZF1336', 'ZF1346','ZF3', 'ZF434,' 'ZF481']
"""), sep=r'\s\s+', engine='python')
# Parse each list-like string into a real list (ast.literal_eval is safer than eval)
df['key_players'] = df['key_players'].apply(ast.literal_eval)
Solution 1
We unwrap the list in key_players via explode and keep those rows where we have a match with node:
df2 = df.assign(kp = df['key_players']).explode('kp')
df2[df2['kp'] == df2['node']].drop(columns = 'kp')
This prints:
period node key_players
-- -------- ------ -----------------------------------------------
1 0 ZF1014 ['ZF1014', 'ZF176', 'ZF434', 'ZF469', 'ZF659']
4 0 ZF1025 ['ZF1128', 'ZF1025', 'ZF434', 'ZF469', 'ZF659']
Solution 2
If you do not mind looping through rows (generally discouraged with pandas) you can do this
df[df.apply(lambda row: row['node'] in row['key_players'], axis=1)]
with the same output
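To check that the two solutions agree, here is a small sketch on a hypothetical three-row frame where key_players already holds real lists (the values are shortened versions of the data above):

```python
import pandas as pd

# Hypothetical frame with key_players already parsed into lists.
df = pd.DataFrame({
    "node": ["ZF1013", "ZF1014", "ZF1025"],
    "key_players": [
        ["ZF1128", "ZF176"],
        ["ZF1014", "ZF176"],
        ["ZF1128", "ZF1025"],
    ],
})

# Solution 1: one row per list element, then compare each element to node.
df2 = df.assign(kp=df["key_players"]).explode("kp")
by_explode = df2[df2["kp"] == df2["node"]].drop(columns="kp")

# Solution 2: row-wise membership test.
by_apply = df[df.apply(lambda row: row["node"] in row["key_players"], axis=1)]

print(by_explode.index.tolist(), by_apply.index.tolist())
```

One difference to keep in mind: if a node appeared more than once in its own list, the explode version would return that row once per occurrence, while the apply version would return it once.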
Filter for rows if any value in a list of substrings is contained in any column in a dataframe
You can use .T to transpose the dataframe and str.contains to check the values column-wise, and then transpose back. str.contains accepts several alternatives separated with |, which is why I change the list to a string with matches = '|'.join(matches).
The benefit of transposing the dataframe is that you can use column-wise pandas methods instead of looping through rows or writing a long lambda x: list comprehension. This technique should perform well compared to a lambda x answer with axis=1:
# df = df.set_index('Index')
matches = ['wat','air']
matches = '|'.join(matches)
df = df.reset_index(drop=True).T.fillna('')
df = df.T[[df[col].str.lower().str.contains(matches).values.any() for col in df.columns]]
df
Out[1]:
Name col1 col2 col3
0 A water watermelone
1 B bbbY hot AIR
2 B cccY water air conditioner
4 D EEEEE cold air eat
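A variant of the same idea that avoids the double transpose, offered as a sketch rather than as the approach above: apply str.contains to each column and keep rows where any column matched. The data below is hypothetical and only loosely modelled on the output shown.

```python
import pandas as pd

# Hypothetical data loosely modelled on the output above.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "col1": ["water", "bbbY", "cccY", "EEEEE"],
    "col2": [None, "hot AIR", "dry", "cold air"],
})
matches = ["wat", "air"]
pattern = "|".join(matches)

# Test each column (not each row) with the vectorised string method.
# fillna("") keeps missing values from being treated as text.
hits = df.apply(
    lambda col: col.fillna("").astype(str).str.contains(pattern, case=False)
)

# Keep rows where at least one column contained a match.
result = df[hits.any(axis=1)]
print(result)
```

Here df.apply runs once per column, so the number of Python-level calls grows with the number of columns rather than the number of rows.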
how to write a function to filter rows based on a list values one by one and make analysis
First, creating many DataFrames is not necessary here.
You can filter only the necessary values of column1 and pass both columns to groupby:
L = ['A','B','C']
s = df1[df1['column1'].isin(L)].groupby(['column1', 'column2']).size()
Finally, select by the values of the list:
s.loc['A']
s.loc['B']
s.loc['C']
If you want a function:
def f(df, x):
    return df[df['column1'].eq(x)].groupby(['column2']).size()
print(f(df1, 'A'))
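Put together as a runnable sketch, assuming a hypothetical df1 (only the column names column1 and column2 come from the answer):

```python
import pandas as pd

# Hypothetical data; column1/column2 follow the names used in the answer.
df1 = pd.DataFrame({
    "column1": ["A", "A", "B", "C", "D", "A"],
    "column2": ["x", "y", "x", "x", "y", "x"],
})
L = ["A", "B", "C"]

# Count rows per (column1, column2) pair, restricted to the values in L.
s = df1[df1["column1"].isin(L)].groupby(["column1", "column2"]).size()

def f(df, x):
    # Counts per column2 for a single column1 value x.
    return df[df["column1"].eq(x)].groupby("column2").size()

for value in L:
    print(value, f(df1, value).to_dict())
```

Both routes give the same counts: s.loc['A'] and f(df1, 'A') each report how often every column2 value occurs among the rows where column1 is 'A'.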
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
You can use pd.Series.isin.
For "IN" use: something.isin(somewhere)
Or for "NOT IN": ~something.isin(somewhere)
As a worked example:
import pandas as pd
>>> df
country
0 US
1 UK
2 Germany
3 China
>>> countries_to_keep
['UK', 'China']
>>> df.country.isin(countries_to_keep)
0 False
1 True
2 False
3 True
Name: country, dtype: bool
>>> df[df.country.isin(countries_to_keep)]
country
1 UK
3 China
>>> df[~df.country.isin(countries_to_keep)]
country
0 US
2 Germany
python pandas loc - filter for list of values
There is a df.isin(values) method which tests whether each element in the DataFrame is contained in values.
So, as @MaxU wrote in the comment, you can use
df.loc[df['channel'].isin(['sale','fullprice'])]
to filter one column by multiple values.
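A short sketch with hypothetical data (only the column name channel and the two values come from the answer), which also shows that .loc can take a column selection next to the row mask:

```python
import pandas as pd

# Hypothetical data; the "amount" column is invented for illustration.
df = pd.DataFrame({
    "channel": ["sale", "fullprice", "clearance", "sale"],
    "amount": [10, 25, 5, 12],
})

# Rows whose channel is one of the listed values.
rows = df.loc[df["channel"].isin(["sale", "fullprice"])]

# .loc also accepts a column label alongside the row mask.
amounts = df.loc[df["channel"].isin(["sale", "fullprice"]), "amount"]

print(rows)
print(amounts.tolist())
```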
Filter Dataframe if column is in any part of list
This is a more complicated string matching problem than usual, but you can use a list comprehension for performance:
lst = ["123 ABC", "456 DEF", "789 GHI"]
df['match'] = [any(x in l for l in lst) for x in df['idlist']]
df
id idlist match
0 0 ABC True
1 1 XYZ False
To simply filter, use
df[[any(x in l for l in lst) for x in df['idlist']]]
id idlist
0 0 ABC
List comprehensions are my go-to syntax for many string operations. I've written a detailed writeup about their advantages in "For loops with pandas - When should I care?".
If you need to handle NaNs, use a function with try-except handling.
def search(x, lst):
    try:
        return any(x in l for l in lst)
    except TypeError:
        return False
df[[search(x, lst) for x in df['idlist']]]
id idlist
0 0 ABC
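To see why the TypeError handler is needed, here is a sketch that adds a hypothetical third row with a missing idlist value. Testing a float NaN with in against a string raises TypeError, which search turns into False:

```python
import numpy as np
import pandas as pd

lst = ["123 ABC", "456 DEF", "789 GHI"]

# The third row is hypothetical and holds a missing value.
df = pd.DataFrame({"id": [0, 1, 2], "idlist": ["ABC", "XYZ", np.nan]})

def search(x, lst):
    try:
        # Substring test of x against every entry in lst.
        return any(x in l for l in lst)
    except TypeError:
        # Non-string values such as NaN cannot be substring-tested.
        return False

mask = [search(x, lst) for x in df["idlist"]]
print(df[mask])
```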