Search for String in All Pandas Dataframe Columns and Filter

The Series.str.contains method expects a regex pattern by default, not a literal string. In a regex, ^ anchors the start of the string, so str.contains("^") matches the beginning of any string. Since every string has a beginning, everything matches. To match the literal ^ character instead, escape it and use a raw string: str.contains(r"\^").

To check every column, you could use for col in df to iterate through the column names, and then call str.contains on each column:

mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])
df.loc[mask.any(axis=1)]
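
For reference, here is a minimal self-contained sketch of the same approach. The sample data and the column names A, B and C are made up for illustration, and it assumes every column holds strings (the .str accessor is not available on numeric columns):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": ["x^y", "plain", None],
    "B": ["foo", "bar", "a^b"],
    "C": ["1", "2", "3"],
})

# One boolean array per column; na=False counts missing values as "no match".
mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])

# Keep the rows where at least one column contains a literal caret.
result = df.loc[mask.any(axis=1)]
print(result)
```

Rows 0 and 2 are kept here because each has a ^ in at least one column.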

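Alternatively, you could pass regex=False to str.contains to make the test use the Python in operator, so that ^ is treated as a plain character and needs no escaping. Which of the two is faster can depend on the data and the pandas version, so time both if performance matters.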
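
A small sketch of the difference between the two modes, using a made-up Series:

```python
import pandas as pd

# Hypothetical sample values for illustration.
s = pd.Series(["a^b", "abc", "^start"])

# Default regex=True: "^" is the start-of-string anchor, so every value matches.
as_regex = s.str.contains("^")

# regex=False: "^" is searched for as a plain substring.
as_literal = s.str.contains("^", regex=False)

print(as_regex.tolist())    # [True, True, True]
print(as_literal.tolist())  # [True, False, True]
```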

How to filter dataframe columns between two rows that contain specific string in column?

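If both values are present, you can temporarily set "String" as the index and slice by label (this assumes the Start and End labels are unique and that Start comes first):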

df.set_index('String').loc['Start':'End'].reset_index()

output:

   String  Value
0   Start     65
1  Orange     33
2  Purple     65
3    Teal     34
4  Indigo     44
5     End     32

Alternatively, using isin (then the order of Start/End doesn't matter):

m = df['String'].isin(['Start', 'End']).cumsum().eq(1)
df[m | m.shift(fill_value=False)]

output:

   String  Value
3   Start     65
4  Orange     33
5  Purple     65
6    Teal     34
7  Indigo     44
8     End     32
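
To see why the mask works, here is a step-by-step sketch. The rows outside the Start/End range (and their values) are invented for illustration; only the rows shown in the output above come from the question. It uses shift(fill_value=False) so the shifted mask stays boolean:

```python
import pandas as pd

# Rows 3-8 mirror the output above; the other rows are hypothetical filler.
df = pd.DataFrame({
    "String": ["Red", "Blue", "Green", "Start", "Orange",
               "Purple", "Teal", "Indigo", "End", "Black"],
    "Value": [10, 20, 30, 65, 33, 65, 34, 44, 32, 50],
})

# True on the two marker rows.
is_marker = df["String"].isin(["Start", "End"])

# Running count of markers seen so far: 0 before the first marker,
# 1 from the first marker up to (not including) the second, 2 afterwards.
seen = is_marker.cumsum()

# m is True from the first marker up to the row before the second marker.
m = seen.eq(1)

# Shifting m down by one row also covers the second marker itself.
result = df[m | m.shift(fill_value=False)]
print(result)
```

Because the mask only counts markers, it selects the same rows whether Start or End appears first in the column.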

Searching for string in all columns of dataframe in Python

Create a boolean DataFrame with DataFrame.eq, test for at least one True per row with DataFrame.any(axis=1), and filter by boolean indexing:

df = df[df.eq('a').any(axis=1)]
print(df)
   A  B
0  a  b
2  e  a

Detail:

print(df.eq('a'))
       A      B
0   True  False
1  False  False
2  False   True

print(df.eq('a').any(axis=1))
0     True
1    False
2     True
dtype: bool

If you want to check for substrings instead, use str.contains on each column to build the boolean DataFrame:

df = pd.DataFrame([['ad', 'b'], ['c', 'd'], ['e', 'asw']], columns=["A", "B"])
print(df)
    A    B
0  ad    b
1   c    d
2   e  asw

df = df[df.apply(lambda x: x.str.contains('a')).any(axis=1)]

Or use applymap (deprecated in pandas 2.1 in favor of DataFrame.map) for an element-wise check with in:

df = df[df.applymap(lambda x: 'a' in x).any(axis=1)]

print(df)
    A    B
0  ad    b
2   e  asw
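
Both variants above assume every column holds strings. If the frame also has numeric columns, one possible workaround (a sketch, not the only option) is to search a string copy of the data and use the resulting mask on the original frame. The extra column N below is hypothetical:

```python
import pandas as pd

# Same data as above plus a hypothetical numeric column "N".
df = pd.DataFrame({
    "A": ["ad", "c", "e"],
    "B": ["b", "d", "asw"],
    "N": [1, 2, 3],
})

# Cast a copy to str so the .str accessor works on every column.
as_text = df.astype(str)

# Literal substring search in each column, then "any column matched" per row.
mask = as_text.apply(lambda col: col.str.contains("a", regex=False)).any(axis=1)

# Filter the original frame so the numeric column keeps its dtype.
result = df[mask]
print(result)
```

One caveat: depending on the pandas version, casting can turn missing values into the text "nan", which would then match a search for "a". Fill or drop missing values first if that matters.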

Pandas filter dataframe columns through substring match

You can apply a function to each row (axis=1):

>>> df[df.apply(lambda x: x['Name'].lower() in x['Fname'].lower(), axis=1)]

     Name  Age   Fname
1     Bob   12     Bob
2  Clarke   13  clarke

Note that str.contains takes a single pattern as its first argument, pat, not a Series, so it cannot compare two columns row by row.
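
If apply with axis=1 turns out to be slow on a large frame, a plain list comprehension over the two columns is a common alternative. This is only a sketch: row 0 is invented for illustration, and it assumes neither column contains missing values:

```python
import pandas as pd

# Rows 1 and 2 mirror the output above; row 0 is hypothetical.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Clarke"],
    "Age": [11, 12, 13],
    "Fname": ["Eve", "Bob", "clarke"],
})

# Case-insensitive "Name is a substring of Fname", evaluated pair by pair.
mask = [name.lower() in fname.lower()
        for name, fname in zip(df["Name"], df["Fname"])]

result = df[mask]
print(result)
```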

How to filter rows containing a string pattern from a Pandas dataframe

In [3]: df[df['ids'].str.contains("ball")]
Out[3]:
     ids  vals
0  aball     1
1  bball     2
3  fball     4
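
If the column can contain missing values, the mask from str.contains may contain missing values too, and boolean indexing can reject such a mask. Passing na=False is the usual way around that. A sketch with made-up data (rows 2 and 4 are hypothetical):

```python
import pandas as pd

# Hypothetical data: row 2 is missing and row 4 does not contain "ball".
df = pd.DataFrame({
    "ids": ["aball", "bball", None, "fball", "cnut"],
    "vals": [1, 2, 3, 4, 5],
})

# na=False makes missing ids count as "no match" instead of propagating NaN.
result = df[df["ids"].str.contains("ball", na=False)]
print(result)
```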

Efficient way to search string contains in multiple columns using pandas

You can do this with a lambda function that applies str.contains to each column, then combine the results with any(axis=1):

In [40]: df[['test_string_1', 'test_string_2']].apply(lambda x: x.str.contains('Rajini|God|Thalaivar',case=False)).any(axis=1).astype(int)
Out[40]:
0    1
1    1
2    0
3    1
4    0
5    1
dtype: int64
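
When the keywords come from a list rather than being typed out, the pattern can be built by joining them with |. The sketch below escapes each keyword with re.escape so that regex metacharacters (as in the hypothetical keyword "C++") are matched literally. The sample data and keyword list are made up:

```python
import re

import pandas as pd

# Hypothetical sample data and keywords for illustration.
df = pd.DataFrame({
    "test_string_1": ["Rajini is great", "nothing here", "C++ fan", "nobody"],
    "test_string_2": ["plain", "god mode", "other", "none"],
})
keywords = ["Rajini", "God", "C++"]

# Escape each keyword, then join with "|" to mean "any of these".
pattern = "|".join(re.escape(k) for k in keywords)

flags = (
    df[["test_string_1", "test_string_2"]]
    .apply(lambda col: col.str.contains(pattern, case=False, na=False))
    .any(axis=1)
    .astype(int)
)
print(flags.tolist())  # [1, 1, 1, 0]
```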

filtering data in pandas where string is in multiple columns

new_df_1 = df[df.team_1 =='ENG'][['team_1', 'score_1']]
new_df_1 = new_df_1.rename(columns={"team_1":"team", "score_1":"score"})
# team score
# 0 ENG 1

new_df_2 = df[df.team_2 =='ENG'][['team_2', 'score_2']]
new_df_2 = new_df_2.rename(columns={"team_2":"team", "score_2":"score"})
# team score
# 1 ENG 2

Then concatenate the two DataFrames:

pd.concat([new_df_1, new_df_2])

The output is:

  team  score
0  ENG      1
1  ENG      2
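
The same steps can be written as a loop over the column suffixes, which scales if more team_N / score_N pairs are added. This is a sketch under the assumption that the columns follow that naming scheme; the non-ENG teams and their scores are invented:

```python
import pandas as pd

# The ENG rows mirror the output above; the other values are hypothetical.
df = pd.DataFrame({
    "team_1": ["ENG", "AUS"],
    "score_1": [1, 3],
    "team_2": ["IND", "ENG"],
    "score_2": [0, 2],
})

parts = []
for suffix in ("1", "2"):
    team_col, score_col = f"team_{suffix}", f"score_{suffix}"
    # Rows where this team column is ENG, keeping only the matching pair.
    part = df.loc[df[team_col] == "ENG", [team_col, score_col]]
    # Give every part the same column names so concat lines them up.
    parts.append(part.rename(columns={team_col: "team", score_col: "score"}))

result = pd.concat(parts)
print(result)
```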

Filter pandas dataframe if value of column is within a string

You can use .apply + in operator:

s = "ZA1127B.48"

print(df[df.apply(lambda x: x.Part_Number in s, axis=1)])

Prints:

  Part_Number
0       A1127
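
Since only one column is involved, Series.map on that column does the same job without a row-wise apply. A sketch with made-up part numbers (only A1127 comes from the question), assuming the column has no missing values:

```python
import pandas as pd

# "A1127" is from the question; the other part numbers are hypothetical.
df = pd.DataFrame({"Part_Number": ["A1127", "B4422", "X100"]})
s = "ZA1127B.48"

# True where the part number occurs somewhere inside s.
mask = df["Part_Number"].map(lambda part: part in s)

result = df[mask]
print(result)
```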

