Pandas DataFrame str.contains() AND Operation

pandas dataframe str.contains() AND operation

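To keep only the rows where the column contains both substrings, build one str.contains mask per substring and combine the masks with &: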

df[(df['col_name'].str.contains('apple')) & (df['col_name'].str.contains('banana'))]
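As a minimal, self-contained sketch of the same idea, the snippet below builds a small hypothetical DataFrame (the sample rows are assumptions made for illustration) and names each mask before combining them. Passing na=False is an extra precaution so that missing values count as "no match" instead of producing NaN in the mask.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the answer above.
df = pd.DataFrame(
    {"col_name": ["apple banana", "apple", "banana split", "banana and apple", None]}
)

# One boolean mask per substring; na=False treats missing values as non-matches.
has_apple = df["col_name"].str.contains("apple", na=False)
has_banana = df["col_name"].str.contains("banana", na=False)

# & is the element-wise AND for boolean Series.
both = df[has_apple & has_banana]
print(both["col_name"].tolist())
```

Naming the masks keeps the parentheses manageable once more than two conditions are involved.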

How can I use the OR operator with the str.contains function when searching for multiple strings in a column or in multiple columns in Python?

Use | inside the pattern to represent 'or' in str.contains (the pattern is treated as a regular expression by default):

df[df['Category'].str.contains('holiday|business', case=False)]

For more information, see the pandas documentation for Series.str.contains.
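The question also asks about several columns. One possible approach, sketched here with hypothetical data and a hypothetical second column named Notes, is to compute one mask per column and combine the masks with the element-wise | operator on the boolean Series. Note that this | acts on masks, while the | inside the pattern string is regex alternation.

```python
import pandas as pd

# Hypothetical sample data; "Notes" is an assumed second column.
df = pd.DataFrame(
    {
        "Category": ["Holiday trip", "Office", "BUSINESS lunch", "Other"],
        "Notes": ["none", "business call", "none", "none"],
    }
)

pattern = "holiday|business"  # regex alternation: either word matches

in_category = df["Category"].str.contains(pattern, case=False, na=False)
in_notes = df["Notes"].str.contains(pattern, case=False, na=False)

# Element-wise OR of the two masks: a row is kept if either column matches.
either = df[in_category | in_notes]
print(either)
```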

Pandas str.contains() not working in some cases

You can pass regex=False to Series.str.contains so that the search value is treated as a literal string instead of being interpreted as a regular expression:

melted_Peptides['variable'].str.contains(pair[0], regex=False)
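To illustrate why this matters, here is a small sketch with made-up strings (the Series contents are assumptions, not data from the question). Characters such as + and . have special meaning in a regular expression, so the default regex search can both miss a literal match and report an unintended one.

```python
import pandas as pd

# Hypothetical values chosen to contain regex metacharacters.
s = pd.Series(["1+1", "11", "C.D", "CxD"])

# Default (regex=True): "1+1" means "one or more 1, then a 1".
plus_regex = s.str.contains("1+1")
# Literal search: looks for the three characters 1, +, 1.
plus_literal = s.str.contains("1+1", regex=False)

# Default (regex=True): "." matches any character, so "CxD" matches too.
dot_regex = s.str.contains("C.D")
dot_literal = s.str.contains("C.D", regex=False)

print(plus_regex.tolist(), plus_literal.tolist())
print(dot_regex.tolist(), dot_literal.tolist())
```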

str.contains function AND does not contain

You can use the bitwise operators & (and) and ~ (not) in combination.
The syntax looks like df.loc[(condition_A) & (~condition_B)]

An example relevant for your question would be:

df = df_merged

selected = df.loc[
(df[rule_col].astype(str).str.contains('Applicant Age', na=False)) &
(~df[rule_col].astype(str).str.contains('200', na=False))
]
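A runnable version of the same pattern, using a hypothetical column name and hypothetical rows in place of df_merged and rule_col, might look like this. It is only a sketch of the "contains A and does not contain B" filter.

```python
import pandas as pd

# Hypothetical stand-ins for df_merged and rule_col.
df = pd.DataFrame(
    {"rule": ["Applicant Age > 18", "Applicant Age < 200", "Income > 200", None]}
)
rule_col = "rule"

# astype(str) turns missing values into the text "None", so .str always works.
text = df[rule_col].astype(str)

contains_age = text.str.contains("Applicant Age", na=False)
contains_200 = text.str.contains("200", na=False)

# Keep rows that mention "Applicant Age" but do not mention "200".
selected = df.loc[contains_age & ~contains_200]
print(selected)
```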

Pandas str.contains - Search for multiple values in a string and print the values in a new column

Here is one way:

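import numpy as np

foods = ['apples', 'oranges', 'grapes', 'blueberries']

def matcher(x):
    # Return the first food name found in the text, ignoring case.
    for i in foods:
        if i.lower() in x.lower():
            return i
    # No food name matched.
    return np.nan

df['Match'] = df['Text'].apply(matcher)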

#    Text                                         Match
# 0  I want to buy some apples.                   apples
# 1  Oranges are good for the health.             oranges
# 2  John is eating some grapes.                  grapes
# 3  This line does not contain any fruit names.  NaN
# 4  I bought 2 blueberries yesterday.            blueberries
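As an alternative sketch (not the approach from the answer above), the same column can be produced without a Python-level loop by using Series.str.extract with a single capture group. The sample rows are taken from the output shown above. One difference to be aware of: this returns the food that appears first in the text, whereas the loop returns the first entry of the foods list that is found.

```python
import re

import pandas as pd

foods = ["apples", "oranges", "grapes", "blueberries"]
df = pd.DataFrame(
    {
        "Text": [
            "I want to buy some apples.",
            "Oranges are good for the health.",
            "John is eating some grapes.",
            "This line does not contain any fruit names.",
            "I bought 2 blueberries yesterday.",
        ]
    }
)

# One capture group with all food names, escaped so they match literally.
pattern = "(" + "|".join(re.escape(f) for f in foods) + ")"

# expand=False returns a Series; rows without a match become NaN.
df["Match"] = (
    df["Text"].str.extract(pattern, flags=re.IGNORECASE, expand=False).str.lower()
)
print(df)
```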

How to test if a string contains one of the substrings in a list, in pandas?

One option is just to use the regex | character to try to match each of the substrings in the words in your Series s (still using str.contains).

You can construct the regex by joining the words in searchfor with |:

>>> searchfor = ['og', 'at']
>>> s[s.str.contains('|'.join(searchfor))]
0    cat
1    hat
2    dog
3    fog
dtype: object

As @AndyHayden noted in the comments, take care if your substrings contain special characters such as $ and ^ that you want to match literally. These characters have specific meanings in regular expressions and will change what the pattern matches.

You can make your list of substrings safer by escaping non-alphanumeric characters with re.escape:

>>> import re
>>> matches = ['$money', 'x^y']
>>> safe_matches = [re.escape(m) for m in matches]
>>> safe_matches
['\\$money', 'x\\^y']

The strings in this new list will match each character literally when used with str.contains.
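Putting the two steps together, a short sketch with a hypothetical Series (the sample strings are assumptions) shows the escaped substrings being joined with | and passed to str.contains:

```python
import re

import pandas as pd

# Hypothetical data containing the special characters $ and ^.
s = pd.Series(["$money in the bank", "no money", "x^y is a power", "xy"])

matches = ["$money", "x^y"]

# Escape each substring, then join with | so any one of them may match.
pattern = "|".join(re.escape(m) for m in matches)

mask = s.str.contains(pattern)
print(s[mask].tolist())
```

Without re.escape, $ would be read as an end-of-string anchor and ^ as a start-of-string anchor, so neither literal substring would be found.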

How to use str.contains() with multiple expressions, in pandas dataframes?

The alternatives should be combined into one regular expression inside a single string:

"nt|nv"  # rather than "nt" | " nv"
f_recs[f_recs['Behavior'].str.contains("nt|nv", na=False)]

Python doesn't let you use the or (|) operator on strings:

In [1]: "nt" | "nv"
TypeError: unsupported operand type(s) for |: 'str' and 'str'

Pandas string.contains doesn't work if searched string contains the substring at the beginning of the string

TLDR: Experiment with pandas.Series.str.normalize(), trying different Unicode forms until the issue is solved. 'NFKC' worked for me.

The problem had to do with the format of the data in the column that I was doing the...

df['column'].str.contains('substring') 

...operation on. Using the pandas.Series.str.normalize() function works. Sometimes, under circumstances that I can't deliberately recreate, the strings would have '\xa0' and '\n' attached at the beginning or the end. To deal with that, I looped through every string column and tried each Unicode normalization form until I found one that worked: 'NFKC'.
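A minimal sketch of this kind of cleanup, using made-up strings rather than the original data, is shown below. NFKC normalization converts the non-breaking space '\xa0' into an ordinary space; the additional str.strip() call is an assumption on my part, added to drop leftover leading and trailing whitespace such as '\n'.

```python
import pandas as pd

# Hypothetical strings polluted with a non-breaking space and a newline.
s = pd.Series(["\xa0substring here\n", "substring\xa0with nbsp", "plain"])

# The search text uses an ordinary space, so the raw data does not match.
raw_hits = s.str.contains("substring with", regex=False)

# Normalize to NFKC, then strip surrounding whitespace.
cleaned = s.str.normalize("NFKC").str.strip()
clean_hits = cleaned.str.contains("substring with", regex=False)

print(raw_hits.tolist(), clean_hits.tolist())
```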

pandas str contains with maximum value

If you need a loop-based solution, build a list of dictionaries holding the maximum date for each search string and pass the list to the DataFrame constructor:

df2['Age'] = pd.to_datetime(df2['Age'], dayfirst=True)

out = []
for x in df1['string']:
    # Latest date among the rows whose Name contains the substring x.
    m = df2.loc[df2.Name.str.contains(x), 'Age'].max()
    out.append({'string': x, 'MaxDate': m})

df = pd.DataFrame(out)
print(df)
  string    MaxDate
0     Ti 1998-03-21
1    Kri 1996-04-18
2    ian 2000-06-19
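Because df1 and df2 are not shown, here is a self-contained sketch of the same loop with hypothetical input frames (the names and dates are invented for illustration and are not the original data). It also passes regex=False, an assumption that the search strings should be matched literally.

```python
import pandas as pd

# Hypothetical inputs standing in for df1 and df2.
df1 = pd.DataFrame({"string": ["Ti", "Kri"]})
df2 = pd.DataFrame(
    {
        "Name": ["Tim", "Tina", "Kris"],
        "Age": ["21/03/1998", "05/01/1990", "18/04/1996"],
    }
)

# Parse day-first date strings into datetimes so max() compares dates.
df2["Age"] = pd.to_datetime(df2["Age"], dayfirst=True)

out = []
for x in df1["string"]:
    # Latest date among rows whose Name contains x as a literal substring.
    m = df2.loc[df2["Name"].str.contains(x, regex=False), "Age"].max()
    out.append({"string": x, "MaxDate": m})

result = pd.DataFrame(out)
print(result)
```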

