How to Test If a String Contains One of the Substrings in a List, in Pandas

How to test if a string contains one of the substrings in a list, in pandas?

One option is to use the regex alternation character | so that any one of the substrings can match the strings in your Series s (still using str.contains).

You can construct the regex by joining the words in searchfor with |:

>>> searchfor = ['og', 'at']
>>> s[s.str.contains('|'.join(searchfor))]
0    cat
1    hat
2    dog
3    fog
dtype: object
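For a version you can paste and run, here is a minimal sketch of the same idea. The Series s is not shown in the snippet above, so the sample values below are an assumption chosen to reproduce that output.

```python
import pandas as pd

# Hypothetical sample data; the original Series `s` is not shown above.
s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
searchfor = ['og', 'at']

# 'og|at' matches any string that contains either substring.
pattern = '|'.join(searchfor)
mask = s.str.contains(pattern)
result = s[mask]
print(result)
```

The boolean mask can also be stored as a new column instead of being used to filter.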

As @AndyHayden noted in the comments, take care if your substrings contain special characters such as $ and ^ that you want to match literally. These characters have specific meanings in regular expressions and will affect the matching.

You can make your list of substrings safer by escaping non-alphanumeric characters with re.escape:

>>> import re
>>> matches = ['$money', 'x^y']
>>> safe_matches = [re.escape(m) for m in matches]
>>> safe_matches
['\\$money', 'x\\^y']

The strings in this new list will match each character literally when used with str.contains.
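To see the effect of escaping end to end, here is a small sketch. The Series s2 and its values are made up for illustration; only re.escape and str.contains come from the answer above.

```python
import re
import pandas as pd

# Hypothetical data containing regex metacharacters.
s2 = pd.Series(['lots of $money', 'x^y = z', 'plain text'])
matches = ['$money', 'x^y']

# Unescaped, $ and ^ act as anchors, so nothing matches.
unescaped_mask = s2.str.contains('|'.join(matches))

# Escape each substring first, then join with the alternation operator.
safe_pattern = '|'.join(re.escape(m) for m in matches)
escaped_mask = s2.str.contains(safe_pattern)

print(unescaped_mask.tolist())
print(escaped_mask.tolist())
```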

How to test if a string contains one of the substrings stored in a list column in pandas?

You can use zip and a list comprehension:

df['c'] = [int(any(w in a for w in b)) for a, b in zip(df.a, df.b)]

df
#                                        a             b  c
#0                     Bob Smith is great.  [Smith, foo]  1
#1  The Sun is a mass of incandescent gas.  [Jones, bar]  0

If you don't care about case:

df['c'] = [any(w.lower() in a for w in b) for a, b in zip(df.a.str.lower(), df.b)]
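Here is a self-contained sketch of both variants. The frame is rebuilt from the output shown above, with a third, hypothetical row added so that the case-insensitive version gives a different answer.

```python
import pandas as pd

# First two rows mirror the output above; the third row is hypothetical.
df = pd.DataFrame({
    'a': ['Bob Smith is great.',
          'The Sun is a mass of incandescent gas.',
          'the sun also rises'],
    'b': [['Smith', 'foo'], ['Jones', 'bar'], ['SUN']],
})

# Case-sensitive: does any word in the row's list occur in the row's text?
df['c'] = [int(any(w in a for w in b)) for a, b in zip(df['a'], df['b'])]

# Case-insensitive: lower-case both sides before comparing.
df['c_nocase'] = [int(any(w.lower() in a.lower() for w in b))
                  for a, b in zip(df['a'], df['b'])]
print(df)
```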

How to test string contains one of the substrings in a list, in pandas?

) is a special regex character, so you need to escape it. Raw strings keep the backslash intact:

searchfor = [r'og\)', r'at\)']
s[s.str.contains('|'.join(searchfor))]

Output:

0    cat)
1    hat)
2    dog)
3    fog)
dtype: object

pandas dataframe str.contains() AND operation

You can do that as follows:

df[(df['col_name'].str.contains('apple')) & (df['col_name'].str.contains('banana'))]
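If the required substrings live in a list, the same AND logic can be built up in a loop. This is only a sketch: the sample frame and the name required are assumptions, and regex=False is used so each word is treated literally.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'col_name': ['apple banana smoothie', 'apple pie',
                                'banana bread', 'cherry']})
required = ['apple', 'banana']

# Start from an all-True mask and AND in one condition per substring.
mask = pd.Series(True, index=df.index)
for word in required:
    mask &= df['col_name'].str.contains(word, regex=False)

result = df[mask]
print(result)
```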

Substituting values of a column if it contains a substring of a list

A simpler option is to use a loop here:

L = ['dog', 'cat', 'panda']

for x in L:
    df.loc[df['column'].str.contains(x), 'column'] = x
print (df)
           column
0             dog
1             cat
2  I have nothing
3           panda
4

Or use Series.str.extract, then fill the rows without a match from the original data with Series.fillna:

df['column'] = (df['column'].str.extract(f'({"|".join(L)})', expand=False)
                            .fillna(df['column']))
print (df)
           column
0             dog
1             cat
2  I have nothing
3           panda
4
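A runnable sketch of the extract-and-fill approach follows. The input frame is not shown above, so the starting values here are assumptions chosen to produce the same output.

```python
import pandas as pd

# Hypothetical input; the original frame is not shown above.
df = pd.DataFrame({'column': ['I have a dog', 'my cat is small',
                              'I have nothing', 'a panda bear', '']})
L = ['dog', 'cat', 'panda']

# One capture group holding all alternatives: '(dog|cat|panda)'.
pattern = f'({"|".join(L)})'

# Rows without a match come back as NaN and are refilled from the original.
extracted = df['column'].str.extract(pattern, expand=False)
df['column'] = extracted.fillna(df['column'])
print(df)
```

Note that the two approaches can differ when a row contains several of the words: the loop keeps whichever word comes first in L, while str.extract keeps whichever occurs first in the string.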

Pandas str.contains - Search for multiple values in a string and print the values in a new column

Here is one way:

import numpy as np

foods = ['apples', 'oranges', 'grapes', 'blueberries']

def matcher(x):
    # Return the first food (in list order) found in x, ignoring case.
    for i in foods:
        if i.lower() in x.lower():
            return i
    return np.nan

df['Match'] = df['Text'].apply(matcher)

#                                           Text        Match
# 0                   I want to buy some apples.       apples
# 1             Oranges are good for the health.      oranges
# 2                  John is eating some grapes.       grapes
# 3  This line does not contain any fruit names.          NaN
# 4            I bought 2 blueberries yesterday.  blueberries
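As a possible vectorized alternative to apply, str.extract with the re.IGNORECASE flag can pull out the matched word directly. This is a sketch rather than a drop-in replacement: it returns the first food that appears in the text (not the first in the list), and the result is lower-cased afterwards to match the output above.

```python
import re
import pandas as pd

foods = ['apples', 'oranges', 'grapes', 'blueberries']
df = pd.DataFrame({'Text': [
    'I want to buy some apples.',
    'Oranges are good for the health.',
    'John is eating some grapes.',
    'This line does not contain any fruit names.',
    'I bought 2 blueberries yesterday.',
]})

# One capture group with every food, escaped so each is matched literally.
pattern = '(' + '|'.join(re.escape(f) for f in foods) + ')'

# Rows without any match are left as NaN.
df['Match'] = (df['Text'].str.extract(pattern, flags=re.IGNORECASE, expand=False)
                         .str.lower())
print(df)
```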

Check if a string in a Pandas DataFrame column is in a list of strings

frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})

frame
                  a
0   the cat is blue
1  the sky is green
2  the dog is black

The str.contains method accepts a regular expression pattern:

mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

pattern
'dog|cat|fish'

frame.a.str.contains(pattern)
0     True
1    False
2     True
Name: a, dtype: bool

Because regex patterns are supported, you can also embed flags:

frame = pd.DataFrame({'a' : ['Cat Mr. Nibbles is blue', 'the sky is green', 'the dog is black']})

frame
                     a
0  Cat Mr. Nibbles is blue
1         the sky is green
2         the dog is black

pattern = '(?i)' + '|'.join(mylist)  # a global flag must lead the pattern (an error elsewhere since Python 3.11)

pattern
'(?i)dog|cat|fish'
 
frame.a.str.contains(pattern)
0     True  # Because of the (?i) flag, 'Cat' is also matched to 'cat'
1    False
2     True
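If you would rather not embed a flag in the pattern, str.contains also accepts case=False or flags=re.IGNORECASE. A short sketch using the same frame as above:

```python
import re
import pandas as pd

frame = pd.DataFrame({'a': ['Cat Mr. Nibbles is blue', 'the sky is green',
                            'the dog is black']})
mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

# Two equivalent ways to ignore case without touching the pattern.
by_case = frame['a'].str.contains(pattern, case=False)
by_flag = frame['a'].str.contains(pattern, flags=re.IGNORECASE)

print(by_case.tolist())
print(by_flag.tolist())
```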

Python Pandas: check if Series contains a string from list

You can loop through the lists simultaneously with zip. Make sure to pass regex=False to str.contains as . is a regex character.

abbreviation = ['n.', 'v.']
col_name = ['Noun', 'Verb']
for a, col in zip(abbreviation, col_name):
    Blaze[col] = np.where(Blaze['Info'].str.contains(a, regex=False), True, False)
Blaze
Out[1]: 
        Word                                               Info  Noun   Verb
0        Aam  Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham...  True  False
1  aard-vark     Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.)  True  False
2  aard-wolf      Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.)  True  False

If required, str.contains also has a case parameter, so you can specify case=False to search case-insensitively.
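To make the effect of regex=False concrete, here is a small sketch comparing literal and regex matching of 'n.'. The Series info and its values are invented for the example.

```python
import pandas as pd

# Hypothetical sample strings.
info = pd.Series(['Aam, n. Etym', 'run, v. to move', 'no abbreviation here'])

# Literal search: only the exact two characters 'n.' count as a match.
literal = info.str.contains('n.', regex=False)

# Regex search: '.' matches any character, so 'n,' and 'no' also match.
as_regex = info.str.contains('n.')

# Literal and case-insensitive at the same time.
literal_nocase = info.str.contains('N.', case=False, regex=False)

print(literal.tolist())
print(as_regex.tolist())
print(literal_nocase.tolist())
```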

Check if String in List of Strings is in Pandas DataFrame Column

If you need to match whole values against the list, use Series.isin:

df['Match'] = df["Brand"].isin(search_for_these_values)
print (df)
            Brand  Price Liscence Plate  Match
0     Honda Civic  22000        ABC 123  False
1  Toyota Corolla  25000        XYZ 789  False
2      Ford Focus  27000        CBA 321   True
3         Audi A4  35000        ZYX 987  False
4             NaN  29000        DEF 456  False

A solution based on matching checks for substrings rather than whole values, so its output is different.

An alternative for matching substrings is Series.str.contains with the parameter na=False, which treats missing values as non-matches:

df['Match'] = df["Brand"].str.contains(pattern, na=False)
print (df)
            Brand  Price Liscence Plate  Match
0     Honda Civic  22000        ABC 123   True
1  Toyota Corolla  25000        XYZ 789   True
2      Ford Focus  27000        CBA 321   True
3         Audi A4  35000        ZYX 987  False
4             NaN  29000        DEF 456  False

EDIT:

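To test the reverse direction, that is, whether each Brand value is a substring of some entry in search_for_these_values, you can use a list comprehension that loops over the list and tests with in, with any returning True if at least one entry matches. The x == x check is False only for NaN, so missing values give False: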

df['Match'] = [any(x in z for z in search_for_these_values)
               if x == x
               else False
               for x in df["Brand"]]
print (df)

            Brand  Price Liscence Plate  Match
0     Honda Civic  22000        ABC 123  False
1  Toyota Corolla  25000        XYZ 789  False
2      Ford Focus  27000        CBA 321   True
3         Audi A4  35000        ZYX 987   True
4             NaN  29000        DEF 456  False
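The three behaviours can be compared side by side in one sketch. Neither search_for_these_values nor pattern is defined in the answer above, so the values below are assumptions chosen to reproduce the three outputs, and building pattern with '|'.join is likewise an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus',
                             'Audi A4', np.nan]})

# Hypothetical search list; not shown in the original answer.
search_for_these_values = ['Honda', 'Toy', 'Ford Focus', 'Audi A4 2010']
pattern = '|'.join(search_for_these_values)

# 1. Whole-value match.
exact = df['Brand'].isin(search_for_these_values)

# 2. Any list entry is a substring of Brand.
entry_in_brand = df['Brand'].str.contains(pattern, na=False)

# 3. Brand is a substring of any list entry (NaN fails x == x, so False).
brand_in_entry = [any(x in z for z in search_for_these_values)
                  if x == x else False
                  for x in df['Brand']]

print(exact.tolist())
print(entry_in_brand.tolist())
print(brand_in_entry)
```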
