Filter Dataframe Rows If Value in Column Is in a Set List of Values

Use the isin method:

rpt[rpt['STK_ID'].isin(stk_list)]

Use a list of values to select rows from a Pandas dataframe

You can use the isin method:

In [1]: df = pd.DataFrame({'A': [5,6,3,4], 'B': [1,2,3,5]})

In [2]: df
Out[2]:
   A  B
0  5  1
1  6  2
2  3  3
3  4  5

In [3]: df[df['A'].isin([3, 6])]
Out[3]:
   A  B
1  6  2
2  3  3

And to get the opposite use ~:

In [4]: df[~df['A'].isin([3, 6])]
Out[4]:
   A  B
0  5  1
3  4  5
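The same pair of filters can also be written with DataFrame.query, which some find more readable for "in"/"not in" conditions. A sketch using the frame above (the @ prefix lets the query string reference a local variable):

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
values = [3, 6]

kept = df.query('A in @values')         # like isin
dropped = df.query('A not in @values')  # like ~isin
print(kept)
print(dropped)
```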

Filter pandas dataframe rows if any value on a list inside the dataframe is in another list

You can expand the inner list, and check if any items in the inner lists are contained in [480, 9, 104]:

l = [480, 9, 104]
df[df.categories.str.split('.', expand=True).isin(map(str,l)).any(axis=1)]

   album_id  categories  split_categories
0     66562     480.494        [480, 494]
3      1709       9.000               [9]
4     59239     105.104        [105, 104]
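A self-contained sketch of the idea, with a hypothetical frame reconstructed from the output above (the categories values are stored as strings so the split works):

```python
import pandas as pd

# hypothetical data modeled on the output shown above
df = pd.DataFrame({
    'album_id': [66562, 23, 555, 1709, 59239],
    'categories': ['480.494', '66.333', '7.432', '9.000', '105.104'],
})
l = [480, 9, 104]

# '480.494' -> columns ['480', '494']; keep rows where any piece is in l
mask = df['categories'].str.split('.', expand=True).isin(list(map(str, l))).any(axis=1)
print(df[mask])
```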

Filter a Dataframe on a column, if a list value is contained in the column value. Pandas

Here you go:

df = pd.DataFrame({'column':['abc', 'def', 'ghi', 'abc, def', 'ghi, jkl', 'abc']})
filter_list = ['abc', 'jkl']  # example list; not shown in the original answer
contains_filter = '|'.join(filter_list)
df = df[pd.notna(df.column) & df.column.str.contains(contains_filter)]

Output:

     column
0       abc
3  abc, def
4  ghi, jkl
5       abc
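One caveat worth knowing: str.contains treats the joined pattern as a regular expression. If the filter values may contain regex metacharacters, escape them first with re.escape. A sketch with hypothetical values:

```python
import re
import pandas as pd

df = pd.DataFrame({'column': ['a.c', 'abc', 'xyz']})
filter_list = ['a.c']  # '.' would match any character if left unescaped

pattern = '|'.join(map(re.escape, filter_list))
filtered = df[df['column'].notna() & df['column'].str.contains(pattern)]
print(filtered)
```

Without re.escape, the pattern 'a.c' would also match 'abc'.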

Filter rows where column value is in list in another column?

I used a version of the visible part of your df (for future questions, please see: how to provide a great pandas example).

I modified a few rows so that some have node included in key_players:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(
"""
period  node  key_players
0     0  ZF1013  ['ZF1128', 'ZF176', 'ZF434', 'ZF469', 'ZF659']
1     0  ZF1014  ['ZF1014', 'ZF176', 'ZF434', 'ZF469', 'ZF659']
2     0  ZF1015  ['ZF1128', 'ZF176', 'ZF434', 'ZF469', 'ZF659']
3     0  ZF1020  ['ZF1128', 'ZF176', 'ZF434', 'ZF469', 'ZF659']
4     0  ZF1025  ['ZF1128', 'ZF1025', 'ZF434', 'ZF469', 'ZF659']
1565  4  ZF898   ['ZF1336', 'ZF1346', 'ZF3', 'ZF434', 'ZF481']
1566  4  ZF945   ['ZF1336', 'ZF1346', 'ZF3', 'ZF434', 'ZF481']
1567  4  ZF948   ['ZF1336', 'ZF1346', 'ZF3', 'ZF434', 'ZF481']
1568  4  ZF97    ['ZF1336', 'ZF1346', 'ZF3', 'ZF434', 'ZF481']
1569  4  ZFM264  ['ZF1336', 'ZF1346', 'ZF3', 'ZF434', 'ZF481']
"""), sep=r'\s\s+', engine='python')
df['key_players'] = df['key_players'].apply(eval)

Solution 1

We unwrap the list in key_players via explode and keep those rows where we have a match with node

df2 = df.assign(kp = df['key_players']).explode('kp')
df2[df2['kp'] == df2['node']].drop(columns = 'kp')

this prints

      period    node  key_players
1          0  ZF1014  ['ZF1014', 'ZF176', 'ZF434', 'ZF469', 'ZF659']
4          0  ZF1025  ['ZF1128', 'ZF1025', 'ZF434', 'ZF469', 'ZF659']

Solution 2

If you do not mind looping through rows (generally discouraged with pandas) you can do this

df[df.apply(lambda row: row['node'] in row['key_players'], axis=1)]

with the same output
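Both solutions can be sketched end to end on a tiny hypothetical stand-in for the question's frame:

```python
import pandas as pd

# hypothetical data: only row 1 and row 2 have node inside key_players
df = pd.DataFrame({
    'period': [0, 0, 0],
    'node': ['ZF1013', 'ZF1014', 'ZF1025'],
    'key_players': [['ZF1128', 'ZF176'],
                    ['ZF1014', 'ZF176'],
                    ['ZF1128', 'ZF1025']],
})

# Solution 1: explode the list column, then compare element-wise
df2 = df.assign(kp=df['key_players']).explode('kp')
out1 = df2[df2['kp'] == df2['node']].drop(columns='kp')

# Solution 2: row-wise membership test
out2 = df[df.apply(lambda row: row['node'] in row['key_players'], axis=1)]

print(out1)
print(out2)
```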

Filter for rows if any value in a list of substrings is contained in any column in a dataframe

You can use .T to transpose the dataframe and str.contains to check the values column-wise, then transpose back. str.contains can match multiple values if they are separated with |, which is why the list is joined into a single pattern with matches = '|'.join(matches).

The benefit of transposing the dataframe is that you can use column-wise pandas methods instead of looping through rows or writing a long list comprehension inside a lambda. This should perform well compared to an apply with axis=1 answer:

# df = df.set_index('Index')
matches = ['wat','air']
matches = '|'.join(matches)
df = df.reset_index(drop=True).T.fillna('')
df = df.T[[df[col].str.lower().str.contains(matches).values.any() for col in df.columns]]
df
Out[1]:
  Name   col1         col2             col3
0    A  water  watermelone
1    B   bbbY      hot AIR
2    B   cccY        water  air conditioner
4    D  EEEEE     cold air              eat
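A runnable sketch of the transpose trick on a small hypothetical frame:

```python
import pandas as pd

# hypothetical data standing in for the question's frame
df = pd.DataFrame({
    'col1': ['water', 'bbbY', 'dry'],
    'col2': ['melon', 'hot AIR', 'sand'],
})
matches = '|'.join(['wat', 'air'])

# after transposing, each original row becomes a column,
# so the vectorized str methods test one row at a time
t = df.fillna('').T
mask = [t[col].str.lower().str.contains(matches).any() for col in t.columns]
print(df[mask])  # rows containing 'wat' or 'air' in any column
```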

How to write a function to filter rows based on a list of values one by one and make analysis

First, creating many DataFrames is not necessary here.

You can filter only the necessary values for column1 and pass both columns to groupby:

L = ['A','B','C']

s = df1[df1['column1'].isin(L)].groupby(['column1', 'column2']).size()

Last, select by the values of the list:

s.loc['A']
s.loc['B']
s.loc['C']

If you want a function:

def f(df, x):
    return df[df['column1'].eq(x)].groupby('column2').size()


print (f(df1, 'A'))
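A self-contained sketch of both patterns with hypothetical data (the original question's df1 is not shown):

```python
import pandas as pd

# hypothetical example frame
df1 = pd.DataFrame({
    'column1': ['A', 'A', 'B', 'C', 'D'],
    'column2': ['x', 'x', 'y', 'x', 'z'],
})
L = ['A', 'B', 'C']

# one groupby over the filtered frame, then select per value...
s = df1[df1['column1'].isin(L)].groupby(['column1', 'column2']).size()
print(s.loc['A'])  # column2 counts within group 'A'

# ...or a per-value function
def f(df, x):
    return df[df['column1'].eq(x)].groupby('column2').size()

print(f(df1, 'A'))
```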

How to filter Pandas dataframe using 'in' and 'not in' like in SQL

You can use pd.Series.isin.

For "IN" use: something.isin(somewhere)

Or for "NOT IN": ~something.isin(somewhere)

As a worked example:

import pandas as pd

>>> df
country
0 US
1 UK
2 Germany
3 China
>>> countries_to_keep
['UK', 'China']
>>> df.country.isin(countries_to_keep)
0 False
1 True
2 False
3 True
Name: country, dtype: bool
>>> df[df.country.isin(countries_to_keep)]
country
1 UK
3 China
>>> df[~df.country.isin(countries_to_keep)]
country
0 US
2 Germany
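For a spelling even closer to SQL, DataFrame.query also supports in / not in. A sketch with the same data as the worked example:

```python
import pandas as pd

df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
countries_to_keep = ['UK', 'China']

print(df.query('country in @countries_to_keep'))      # IN
print(df.query('country not in @countries_to_keep'))  # NOT IN
```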

python pandas loc - filter for list of values

There is a df.isin(values) method which tests whether each element in the DataFrame is contained in values. So, as @MaxU wrote in the comment, you can use

df.loc[df['channel'].isin(['sale','fullprice'])]

to filter one column by multiple values.
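For instance, on a hypothetical frame with a channel column:

```python
import pandas as pd

# hypothetical data; the question's frame is not shown
df = pd.DataFrame({'channel': ['sale', 'fullprice', 'clearance'],
                   'amount': [10, 20, 30]})

filtered = df.loc[df['channel'].isin(['sale', 'fullprice'])]
print(filtered)
```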

Filter Dataframe if column is in any part of list

This is a more complicated string matching problem than usual, but you can use a list comprehension for performance:

lst = ["123 ABC", "456 DEF", "789 GHI"]
df['match'] = [any(x in l for l in lst) for x in df['idlist']]
df

   id idlist  match
0   0    ABC   True
1   1    XYZ  False

To simply filter, use

df[[any(x in l for l in lst) for x in df['idlist']]]

   id idlist
0   0    ABC

List comprehensions are my go-to syntax for many string operations. I've written a detailed writeup about their advantages in For loops with pandas - When should I care?.

If you need to handle NaNs, use a function with try/except handling.

def search(x, lst):
    try:
        return any(x in l for l in lst)
    except TypeError:
        return False

df[[search(x, lst) for x in df['idlist']]]

   id idlist
0   0    ABC
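A self-contained sketch including a NaN to show why the try/except matters (hypothetical data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2], 'idlist': ['ABC', 'XYZ', np.nan]})
lst = ["123 ABC", "456 DEF", "789 GHI"]

def search(x, lst):
    try:
        return any(x in l for l in lst)  # raises TypeError when x is NaN (a float)
    except TypeError:
        return False

print(df[[search(x, lst) for x in df['idlist']]])
```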

