Filter a Column Which Contains Several Keywords

Filter a column which contains several keywords

grep can use | as an "or", so why not paste your filters together with | as the separator:

dfilter <- df1[grep(paste0(filter1, collapse = "|"), df1$type),]
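
The same "join the keywords with |" idea carries over to pandas via str.contains. A minimal sketch, not part of the original answer, using a hypothetical df1 with a type column and a keyword list filter1 mirroring the R example:

import pandas as pd

# hypothetical inputs mirroring the R example above
df1 = pd.DataFrame({'type': ['apple pie', 'banana split', 'cherry cake']})
filter1 = ['apple', 'cherry']

# join the keywords into one alternation pattern and keep the matching rows
pattern = '|'.join(filter1)
dfilter = df1[df1['type'].str.contains(pattern)]
print(dfilter)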

Filter a column with multiple keywords in Django

You can do it like this using Q():

from django.db.models import Q

query = Q()

for k in datas["keywords"]:
    query |= Q(task_name__contains=k)

tasks = Task.objects.filter(query)
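
Equivalently, the per-keyword Q objects can be folded together with functools.reduce instead of starting from an empty Q(); a minimal sketch, assuming the same datas["keywords"] input and Task model as above:

from functools import reduce
from operator import or_

from django.db.models import Q

# fold all per-keyword Q objects into one OR expression
# (assumes datas["keywords"] is non-empty, otherwise reduce() raises TypeError)
query = reduce(or_, (Q(task_name__contains=k) for k in datas["keywords"]))
tasks = Task.objects.filter(query)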

Filter Data by multiple keywords

I think you can create a separate mask for each keyword and then combine them by chaining with &; to keep rows where at least one cell is True, use DataFrame.any:

import pandas as pd

df_rest = pd.DataFrame({0:['OpenSSL XYZ dd','dd OpenSSL','g OpenSSL'],
                        1:['CVE-2017-XX OpenSSL dd','dd OpenSSL','g XYZ'],
                        2:['OpenSSL t','dd XYZ','g CVE-2017-XX XYZ OpenSSL']})

cols = [0,1,2]
m1 = df_rest[cols].apply(lambda r: r.str.contains('OpenSSL', case=False))
print (m1)
       0      1      2
0   True   True   True
1   True   True  False
2   True  False   True

m2 = df_rest[cols].apply(lambda r: r.str.contains('XYZ', case=False))
print (m2)
       0      1      2
0   True  False  False
1  False  False   True
2  False   True   True

m3 = df_rest[cols].apply(lambda r: r.str.contains('CVE-2017-XX', case=False))
print (m3)
       0      1      2
0  False   True  False
1  False  False  False
2  False  False   True

print (m1 & m2)
       0      1      2
0   True  False  False
1  False  False  False
2  False  False   True

print ((m1 & m2).any(axis=1))
0     True
1    False
2     True
dtype: bool

df = df_rest[(m1 & m2).any(axis=1)]
print (df)
                0                       1                          2
0  OpenSSL XYZ dd  CVE-2017-XX OpenSSL dd                  OpenSSL t
2       g OpenSSL                   g XYZ  g CVE-2017-XX XYZ OpenSSL
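
If there are more than two or three keywords, the per-keyword masks can be built in a loop and folded together instead of writing m1 & m2 & ... by hand. A minimal sketch, assuming the same df_rest and cols as above and a hypothetical keywords list:

from functools import reduce

keywords = ['OpenSSL', 'XYZ']   # hypothetical keyword list

# one mask per keyword, each testing every cell of the selected columns
masks = [df_rest[cols].apply(lambda r: r.str.contains(k, case=False))
         for k in keywords]

# cellwise AND of all masks: a cell is True only if it contains every keyword
m_all = reduce(lambda a, b: a & b, masks)

# keep rows where at least one cell contains all the keywords
df_filtered = df_rest[m_all.any(axis=1)]
print(df_filtered)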

EDIT:

It is possible that some keywords are interpreted as regex. To avoid that, use regex=False:

df_rest = pd.DataFrame({0:['XYZ dd','dd OpenSSL 0.9.4','g 0.9.4'],
                        1:['0.9.4 OpenSSL dd','dd 0.9','g XYZ'],
                        2:['OpenSSL t','dd XYZ','OpenSSL 0.9.7']})

print (df_rest)
                  0                 1              2
0            XYZ dd  0.9.4 OpenSSL dd      OpenSSL t
1  dd OpenSSL 0.9.4            dd 0.9         dd XYZ
2           g 0.9.4             g XYZ  OpenSSL 0.9.7

cols = [0,1,2]
m = df_rest[cols].apply(lambda r: (r.str.contains('0.9.4', case=False, regex=False) &
                                   r.str.contains('OpenSSL', case=False, regex=False)))

df = df_rest[m.any(axis=1)]
print (df)
                  0                 1          2
0            XYZ dd  0.9.4 OpenSSL dd  OpenSSL t
1  dd OpenSSL 0.9.4            dd 0.9     dd XYZ
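
An alternative to regex=False is to escape each keyword with re.escape, which keeps the search as a regex but treats the dots in '0.9.4' literally; a minimal sketch with the same df_rest and cols:

import re

# re.escape('0.9.4') yields the pattern 0\.9\.4, so the dots are matched literally
m = df_rest[cols].apply(lambda r: (r.str.contains(re.escape('0.9.4'), case=False) &
                                   r.str.contains(re.escape('OpenSSL'), case=False)))

df = df_rest[m.any(axis=1)]
print(df)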

EDIT1:

df_rest = pd.DataFrame({0:['XYZ dd','dd OpenSSL 0.9.1','g 0.9.4'],
                        1:['0.9.2 OpenSSL dd','dd 0.9','g XYZ'],
                        2:['OpenSSL t','dd XYZ','OpenSSL 0.9.1']})

print (df_rest)

df = pd.read_csv('keywords.txt', names=('a','b'))
print (df)
         a      b
0  OpenSSL  0.9.1
1  OpenSSL  0.9.2
2  OpenSSL  0.9.4

cols = [0,1,2]
for i, x in df.iterrows():
    m = df_rest[cols].apply(lambda r: (r.str.contains(x['a'], case=False, regex=False) &
                                       r.str.contains(x['b'], case=False, regex=False)))

    # write the rows matching this keyword pair to a file such as OpenSSL_0.9.1.txt
    df = df_rest[m.any(axis=1)]
    f = '{0[0]}_{0[1]}.txt'.format((x['a'], x['b']))
    df.to_csv(f, index=False, header=False)

EDIT2:

dfs = []
for i, x in dfkey.iterrows():

    cols = [0,1,2,3,4,5]
    m = df_rest[cols].apply(lambda r: (r.str.contains(x['a'], case=False, regex=False) &
                                       r.str.contains(x['b'], case=False, regex=False)))

    # each pass narrows df_rest, keeping only rows that also match this keyword pair
    df_rest = df_rest[m.any(axis=1)]

    dfs.append(df_rest)

pd.concat(dfs).to_csv('text.csv', index=False, header=False)

Advanced Filter for multiple keywords anywhere in a cell

Oh, you just put asterisks around the text you want to search for, not <> in front of it.
So

*vice*
*health*
*medical*

Etc.

Filter a dataframe column for a keyword, return separate column value (name) from the row where each keyword is found

You could do

list(df[df['words'].str.contains('apple', na=False)]['names'])

resulting in

['a', 'b']
  1. df['words'].str.contains('apple', na=False) builds a boolean pandas Series for the condition, taking care of any missing values in the column.
  2. The Series from the previous step is used to filter the original dataframe df.
  3. From the filtered dataframe, the 'names' column is selected.
  4. That column is cast to a list.

Full code:

import io
import pandas as pd
data = """
names words
a apple
b apple
c pear
"""
df = pd.read_csv(io.StringIO(data), sep=r'\s+')

lst = list(df[df['words'].str.contains('apple')]['names'])

print(lst)

['a', 'b']
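
If you have several keywords and want the matching names for each one separately, you can wrap the same expression in a dict comprehension; a minimal sketch, assuming the df built above and a hypothetical keyword list:

keywords = ['apple', 'pear']   # hypothetical keyword list

names_by_keyword = {
    k: list(df.loc[df['words'].str.contains(k, na=False), 'names'])
    for k in keywords
}
print(names_by_keyword)   # {'apple': ['a', 'b'], 'pear': ['c']}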

Filtering text from dataframe based on keywords in a list

How do you filter a DataFrame by a varying subset of words?

Dummy data

import numpy as np
import pandas as pd

columns = ['transaction_description', 'value']
data = [
    ['pac c.misalud conv. unificado', 12320.0],
    ['cargo seguro proteccion bancaria', 31222.0],
    ['pac sura cia seguros generales', 8657.0],
    ['cargo seguro proteccion bancaria', 31222.0],
    ['pac c.misalud conv. unificado', 12320.0],
    ['pac sura cia seguros generales', 8657.0],
    ['cargo seguro proteccion bancaria', 31222.0],
    ['pac c.misalud conv. unificado', 12320.0],
    ['pac sura cia seguros generales', 8657.0],
    ['cargo seguro proteccion bancaria', 31222.0],
    ['cargo seguro proteccion bancaria', 40222.0]]

df = pd.DataFrame(data, columns=columns)

keywords = [
    [('tarifa',), ('mantenimiento',), ('mensual',)],
    [('tasa',), ('anual',)],
    [('seguro',), ('bancaria',)],
    [('seguro',), ('generales',)],
    [('mi salud',), ('unific',)]]

Solving

I will use a structure where the words of each sublist are arranged as a column; to be precise, each word is placed in the sublist as the only element of a tuple, so that the resulting (n_words, 1) keyword array broadcasts against the description column.

Let's vectorize str.__contains__ to make the str1 in str2 code applicable to arrays:

contains = np.vectorize(str.__contains__)

Now, I'll test this function on df["transaction_description"] and the 4th set of keywords [('seguro',), ('generales',)] for example:

desc = df['transaction_description']
contains(desc, keywords[3])

In this case, we get the following result:

array([[False,  True,  True,  True, False,  True,  True, False,  True,  True,  True],
       [False, False,  True, False, False,  True, False, False,  True, False, False]])

Now, to see whether all words of this subset can be found in a description, we apply the all method along the first axis (axis=0) of the previous matrix:

df[contains(desc, keywords[3]).all(axis=0)]

And we obtain these filtered data:

          transaction_description   value
2   pac sura cia seguros generales  8657.0
5   pac sura cia seguros generales  8657.0
8   pac sura cia seguros generales  8657.0

Long story short

contains = np.vectorize(str.__contains__)
desc = df['transaction_description']
contain_all = lambda words: df[contains(desc, words).all(axis=0)]
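
For example, assuming the df and keywords defined above, the contain_all shortcut can be applied to any of the keyword groups:

# rows whose description contains both 'seguro' and 'bancaria'
print(contain_all(keywords[2]))

# rows whose description contains both 'seguro' and 'generales'
print(contain_all(keywords[3]))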
