Filter Dataframe Rows If Value in Column Is in a Set List of Values

Use the isin method:

rpt[rpt['STK_ID'].isin(stk_list)]

Use a list of values to select rows from a Pandas dataframe

You can use the isin method:

In [1]: df = pd.DataFrame({'A': [5,6,3,4], 'B': [1,2,3,5]})

In [2]: df
Out[2]:
   A  B
0  5  1
1  6  2
2  3  3
3  4  5

In [3]: df[df['A'].isin([3, 6])]
Out[3]:
   A  B
1  6  2
2  3  3

And to get the opposite use ~:

In [4]: df[~df['A'].isin([3, 6])]
Out[4]:
   A  B
0  5  1
3  4  5
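The same pair of filters can also be written with DataFrame.query, which some find more readable for "in"/"not in" conditions. A sketch using the frame above (the @ prefix lets the query string reference a local variable):

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
values = [3, 6]

kept = df.query('A in @values')         # like isin
dropped = df.query('A not in @values')  # like ~isin
print(kept)
print(dropped)
```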

Filter pandas dataframe rows if any value on a list inside the dataframe is in another list

You can expand the inner list, and check if any items in the inner lists are contained in [480, 9, 104]:

l = [480, 9, 104]
df[df.categories.str.split('.', expand=True).isin(map(str,l)).any(axis=1)]

   album_id  categories  split_categories
0     66562     480.494        [480, 494]
3      1709       9.000               [9]
4     59239     105.104        [105, 104]
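A self-contained sketch of the idea, with a hypothetical frame reconstructed from the output above (the categories values are stored as strings so the split works):

```python
import pandas as pd

# hypothetical data modeled on the output shown above
df = pd.DataFrame({
    'album_id': [66562, 23, 555, 1709, 59239],
    'categories': ['480.494', '66.333', '7.432', '9.000', '105.104'],
})
l = [480, 9, 104]

# '480.494' -> columns ['480', '494']; keep rows where any piece is in l
mask = df['categories'].str.split('.', expand=True).isin(list(map(str, l))).any(axis=1)
print(df[mask])
```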

Filter a Dataframe on a column, if a list value is contained in the column value. Pandas

Here you go:

df = pd.DataFrame({'column':['abc', 'def', 'ghi', 'abc, def', 'ghi, jkl', 'abc']})
filter_list = ['abc', 'jkl']  # example list; not shown in the original answer
contains_filter = '|'.join(filter_list)
df = df[pd.notna(df.column) & df.column.str.contains(contains_filter)]

Output:

     column
0       abc
3  abc, def
4  ghi, jkl
5       abc
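One caveat worth knowing: str.contains treats the joined pattern as a regular expression. If the filter values may contain regex metacharacters, escape them first with re.escape. A sketch with hypothetical values:

```python
import re
import pandas as pd

df = pd.DataFrame({'column': ['a.c', 'abc', 'xyz']})
filter_list = ['a.c']  # '.' would match any character if left unescaped

pattern = '|'.join(map(re.escape, filter_list))
filtered = df[df['column'].notna() & df['column'].str.contains(pattern)]
print(filtered)
```

Without re.escape, the pattern 'a.c' would also match 'abc'.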

Filter rows where column value is in list in another column?

I used a version of the visible part of your df (for future questions, please see: how to provide a great pandas example).

I modified a few rows so that some have node included in key_players:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(
"""
period  node  key_players
0     0  ZF1013  ['ZF1128', 'ZF176', 'ZF434', 'ZF469', 'ZF659']
1     0  ZF1014  ['ZF1014', 'ZF176', 'ZF434', 'ZF469', 'ZF659']
2     0  ZF1015  ['ZF1128', 'ZF176', 'ZF434', 'ZF469', 'ZF659']
3     0  ZF1020  ['ZF1128', 'ZF176', 'ZF434', 'ZF469', 'ZF659']
4     0  ZF1025  ['ZF1128', 'ZF1025', 'ZF434', 'ZF469', 'ZF659']
1565  4  ZF898   ['ZF1336', 'ZF1346', 'ZF3', 'ZF434', 'ZF481']
1566  4  ZF945   ['ZF1336', 'ZF1346', 'ZF3', 'ZF434', 'ZF481']
1567  4  ZF948   ['ZF1336', 'ZF1346', 'ZF3', 'ZF434', 'ZF481']
1568  4  ZF97    ['ZF1336', 'ZF1346', 'ZF3', 'ZF434', 'ZF481']
1569  4  ZFM264  ['ZF1336', 'ZF1346', 'ZF3', 'ZF434', 'ZF481']
"""), sep=r'\s\s+', engine='python')
df['key_players'] = df['key_players'].apply(eval)

Solution 1

We unwrap the list in key_players via explode and keep those rows where we have a match with node

df2 = df.assign(kp = df['key_players']).explode('kp')
df2[df2['kp'] == df2['node']].drop(columns = 'kp')

this prints

      period    node  key_players
1          0  ZF1014  ['ZF1014', 'ZF176', 'ZF434', 'ZF469', 'ZF659']
4          0  ZF1025  ['ZF1128', 'ZF1025', 'ZF434', 'ZF469', 'ZF659']

Solution 2

If you do not mind looping through rows (generally discouraged with pandas) you can do this

df[df.apply(lambda row: row['node'] in row['key_players'], axis=1)]

with the same output
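Both solutions can be sketched end to end on a tiny hypothetical stand-in for the question's frame:

```python
import pandas as pd

# hypothetical data: only row 1 and row 2 have node inside key_players
df = pd.DataFrame({
    'period': [0, 0, 0],
    'node': ['ZF1013', 'ZF1014', 'ZF1025'],
    'key_players': [['ZF1128', 'ZF176'],
                    ['ZF1014', 'ZF176'],
                    ['ZF1128', 'ZF1025']],
})

# Solution 1: explode the list column, then compare element-wise
df2 = df.assign(kp=df['key_players']).explode('kp')
out1 = df2[df2['kp'] == df2['node']].drop(columns='kp')

# Solution 2: row-wise membership test
out2 = df[df.apply(lambda row: row['node'] in row['key_players'], axis=1)]

print(out1)
print(out2)
```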

Filter for rows if any value in a list of substrings is contained in any column in a dataframe

You can use .T to transpose the dataframe and str.contains to check the values column-wise, then transpose back. str.contains can match multiple values if they are separated with |, which is why the list is joined into a single pattern with matches = '|'.join(matches).

The benefit of transposing the dataframe is that you can use column-wise pandas methods instead of looping through rows or writing a long list comprehension inside a lambda. This should perform well compared to an apply with axis=1 answer:

# df = df.set_index('Index')
matches = ['wat','air']
matches = '|'.join(matches)
df = df.reset_index(drop=True).T.fillna('')
df = df.T[[df[col].str.lower().str.contains(matches).values.any() for col in df.columns]]
df
Out[1]:
  Name   col1         col2             col3
0    A  water  watermelone
1    B   bbbY      hot AIR
2    B   cccY        water  air conditioner
4    D  EEEEE     cold air              eat
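A runnable sketch of the transpose trick on a small hypothetical frame:

```python
import pandas as pd

# hypothetical data standing in for the question's frame
df = pd.DataFrame({
    'col1': ['water', 'bbbY', 'dry'],
    'col2': ['melon', 'hot AIR', 'sand'],
})
matches = '|'.join(['wat', 'air'])

# after transposing, each original row becomes a column,
# so the vectorized str methods test one row at a time
t = df.fillna('').T
mask = [t[col].str.lower().str.contains(matches).any() for col in t.columns]
print(df[mask])  # rows containing 'wat' or 'air' in any column
```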

How to write a function to filter rows based on a list of values one by one and make analysis

First, creating many DataFrames is not necessary here.

You can filter only the necessary values for column1 and pass both columns to groupby:

L = ['A','B','C']

s = df1[df1['column1'].isin(L)].groupby(['column1', 'column2']).size()

Last, select by the values of the list:

s.loc['A']
s.loc['B']
s.loc['C']

If you want a function:

def f(df, x):
    return df[df['column1'].eq(x)].groupby('column2').size()


print (f(df1, 'A'))
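A self-contained sketch of both patterns with hypothetical data (the original question's df1 is not shown):

```python
import pandas as pd

# hypothetical example frame
df1 = pd.DataFrame({
    'column1': ['A', 'A', 'B', 'C', 'D'],
    'column2': ['x', 'x', 'y', 'x', 'z'],
})
L = ['A', 'B', 'C']

# one groupby over the filtered frame, then select per value...
s = df1[df1['column1'].isin(L)].groupby(['column1', 'column2']).size()
print(s.loc['A'])  # column2 counts within group 'A'

# ...or a per-value function
def f(df, x):
    return df[df['column1'].eq(x)].groupby('column2').size()

print(f(df1, 'A'))
```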

How to filter Pandas dataframe using 'in' and 'not in' like in SQL

You can use pd.Series.isin.

For "IN" use: something.isin(somewhere)

Or for "NOT IN": ~something.isin(somewhere)

As a worked example:

import pandas as pd

>>> df
country
0 US
1 UK
2 Germany
3 China
>>> countries_to_keep
['UK', 'China']
>>> df.country.isin(countries_to_keep)
0 False
1 True
2 False
3 True
Name: country, dtype: bool
>>> df[df.country.isin(countries_to_keep)]
country
1 UK
3 China
>>> df[~df.country.isin(countries_to_keep)]
country
0 US
2 Germany
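For a spelling even closer to SQL, DataFrame.query also supports in / not in. A sketch with the same data as the worked example:

```python
import pandas as pd

df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
countries_to_keep = ['UK', 'China']

print(df.query('country in @countries_to_keep'))      # IN
print(df.query('country not in @countries_to_keep'))  # NOT IN
```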

python pandas loc - filter for list of values

There is a df.isin(values) method which tests whether each element in the DataFrame is contained in values. So, as @MaxU wrote in the comment, you can use

df.loc[df['channel'].isin(['sale','fullprice'])]

to filter one column by multiple values.
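For instance, on a hypothetical frame with a channel column:

```python
import pandas as pd

# hypothetical data; the question's frame is not shown
df = pd.DataFrame({'channel': ['sale', 'fullprice', 'clearance'],
                   'amount': [10, 20, 30]})

filtered = df.loc[df['channel'].isin(['sale', 'fullprice'])]
print(filtered)
```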

Filter Dataframe if column is in any part of list

This is a more complicated string matching problem than usual, but you can use a list comprehension for performance:

lst = ["123 ABC", "456 DEF", "789 GHI"]
df['match'] = [any(x in l for l in lst) for x in df['idlist']]
df

   id idlist  match
0   0    ABC   True
1   1    XYZ  False

To simply filter, use

df[[any(x in l for l in lst) for x in df['idlist']]]

   id idlist
0   0    ABC

List comprehensions are my go-to syntax for many string operations. I've written a detailed writeup about their advantages in For loops with pandas - When should I care?.

If you need to handle NaNs, use a function with try/except handling.

def search(x, lst):
    try:
        return any(x in l for l in lst)
    except TypeError:
        return False

df[[search(x, lst) for x in df['idlist']]]

   id idlist
0   0    ABC
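A self-contained sketch including a NaN to show why the try/except matters (hypothetical data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2], 'idlist': ['ABC', 'XYZ', np.nan]})
lst = ["123 ABC", "456 DEF", "789 GHI"]

def search(x, lst):
    try:
        return any(x in l for l in lst)  # raises TypeError when x is NaN (a float)
    except TypeError:
        return False

print(df[[search(x, lst) for x in df['idlist']]])
```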

