Selecting Data Frame Rows Based on Partial String Match in a Column

R subset data.frame by column names using partial string match from another list

# Specify `interesting.list` items manually
df[,grep("P3170|C453", x=names(df))]
#>   P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1         1        3          5

# Use paste to create pattern from lots of items in `interesting.list`
il <- c("P3170", "C453")
df[,grep(paste(il, collapse = "|"), x=names(df))]
#>   P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1         1        3          5

Example data:

n <- c("P3170.Tp2" , "P3189.Tn10" ,"C453.Tn7" ,"F678.Tc23" ,"P3170.Tn10")
df <- data.frame(1,2,3,4,5)
names(df) <- n
Created on 2021-10-20 by the reprex package (v2.0.1)
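
For comparison, the same column selection can be sketched in pandas. This is only an illustration, not part of the original R answer: it rebuilds the example data by hand and escapes each item with re.escape so that any regex metacharacters in the list are matched literally. The variable names `interesting` and `selected` are made up for the example.

```python
import re

import pandas as pd

# Same example data as the R snippet above: one row, five named columns.
names = ["P3170.Tp2", "P3189.Tn10", "C453.Tn7", "F678.Tc23", "P3170.Tn10"]
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=names)

# Build one alternation pattern ("P3170|C453") from the list of items.
interesting = ["P3170", "C453"]
pattern = "|".join(re.escape(item) for item in interesting)

# Keep only the columns whose name contains any of the items.
selected = df.loc[:, df.columns.str.contains(pattern)]
print(selected)
```

The boolean mask from `df.columns.str.contains` plays the role that `grep` on `names(df)` plays in the R code.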

Python - keep rows in dataframe based on partial string match

Follow these steps for your two data frames:

1. Split your email_address column into two separate columns

     df1[['add', 'domain']] = df1['email_address'].str.split('@', n=1, expand=True)

2. Then drop your add column to keep your data frame clean

      df1 = df1.drop('add',axis =1)

3. Get a new data frame containing only the rows you want by keeping those whose 'domain' value appears in the 'approved_domain' column of df2

      df_new = df1[df1['domain'].isin(df2['approved_domain'])]

4. Drop the 'domain' column in df_new

      df_new = df_new.drop('domain',axis = 1)

This is what the result will be:

    mailbox     email_address
1   mailbox2    def@yahoo.com
2   mailbox3    ghi@msn.com
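
If you would rather not create and drop helper columns, the whole thing can be sketched in one filter. This assumes the goal is to keep rows whose domain is approved, as the result shown suggests. The sample data is an assumption: only the mailbox2 and mailbox3 rows appear in the output above, so the mailbox1 row and the contents of df2 are made up for the example.

```python
import pandas as pd

# Hypothetical input data; only mailbox2 and mailbox3 are taken from the
# result shown above, everything else is assumed for illustration.
df1 = pd.DataFrame({
    "mailbox": ["mailbox1", "mailbox2", "mailbox3"],
    "email_address": ["abc@gmail.com", "def@yahoo.com", "ghi@msn.com"],
})
df2 = pd.DataFrame({"approved_domain": ["yahoo.com", "msn.com"]})

# Take everything after the last "@" as the domain, without adding a column.
domain = df1["email_address"].str.rsplit("@", n=1).str[-1]

# Keep only the rows whose domain is in the approved list.
df_new = df1[domain.isin(df2["approved_domain"])]
print(df_new)
```

Because `domain` is a separate Series, df1 itself is left unchanged and there is nothing to clean up afterwards.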

Finding partial match strings in any column in a dataframe in R

If you want to use apply(), you can compute an index from your search string ("fish") and then subset on it. For each row, the index is the number of values that match "fish" according to grepl(). Set ignore.case = TRUE so that upper- and lower-case text both match. An index greater than or equal to 1 means at least one column matched, so that row is kept. Here is the code:

#Data
vessel<-c(letters[1:4])
type<-c("Fishery Vessel","NA","NA","Cargo")
class<-c("NA","FISHING","NA","CARGO")
status<-c("NA", "NA", "Engaged in Fishing", "Underway")
df<-data.frame(vessel,type, class, status,stringsAsFactors = F)
#Subset
#Create an index with apply
df$Index <- apply(df[1:4],1,function(x) sum(grepl('fish',x,ignore.case = T)))
#Filter
df.sub<-subset(df,Index>=1)

Output:

  vessel           type   class             status Index
1      a Fishery Vessel      NA                 NA     1
2      b             NA FISHING                 NA     1
3      c             NA      NA Engaged in Fishing     1
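
The same "match in any column" idea can be sketched in pandas, in case you are working in Python. This is a rough equivalent rather than a translation of the R code: it builds a boolean table of matches per column and keeps rows with at least one match, instead of storing a count in an Index column. The names `hits` and `df_sub` are made up for the example.

```python
import pandas as pd

# Same example data as the R snippet above ("NA" is a plain string here too).
df = pd.DataFrame({
    "vessel": list("abcd"),
    "type": ["Fishery Vessel", "NA", "NA", "Cargo"],
    "class": ["NA", "FISHING", "NA", "CARGO"],
    "status": ["NA", "NA", "Engaged in Fishing", "Underway"],
})

# For every column, flag the cells that contain "fish", ignoring case.
hits = df.apply(lambda col: col.astype(str).str.contains("fish", case=False))

# Keep the rows where at least one column matched.
df_sub = df[hits.any(axis=1)]
print(df_sub)
```

Using `hits.sum(axis=1)` instead of `hits.any(axis=1)` would give the per-row match count that the R Index column holds.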

Filtering rows based on partial matching between a data frame and a vector

We can paste the elements of 'vector' into a single string collapsed by | and use that in grepl or str_detect to filter the rows:

library(dplyr)
library(stringr)
df %>% 
   filter(str_detect(nam, str_c(vector, collapse="|")))
#           nam    aa
#1 mmu_mir-1-3p 12854
#2 mmu_mir-1-5p    36
#3 mmu-mir-3-5p  5489
#4 mmu-mir-6-3p  2563

In base R, this can be done with subset/grepl:

subset(df, grepl(paste(vector, collapse= "|"), nam))
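
The collapse-by-| trick works the same way in any regex engine. Here is a small sketch in plain Python to show what the combined pattern does. The search terms and the last two names are assumptions (the original 'vector' is not shown), and re.escape is added so that an element containing a regex metacharacter is still matched literally.

```python
import re

# The first four names come from the output above; the last two are
# hypothetical non-matching rows added for illustration.
names = [
    "mmu_mir-1-3p", "mmu_mir-1-5p", "mmu-mir-3-5p", "mmu-mir-6-3p",
    "mmu-mir-10-3p", "mmu-mir-22-5p",
]

# Hypothetical search terms standing in for 'vector'.
terms = ["mir-1-", "mir-3-", "mir-6-"]

# Collapse the terms into one alternation, escaping each one first.
pattern = re.compile("|".join(re.escape(term) for term in terms))

# Keep the names that contain at least one of the terms.
kept = [name for name in names if pattern.search(name)]
print(kept)
```

Note that a term such as "mir-1" without the trailing dash would also match "mir-10", so it is worth checking how specific each element of the vector is before collapsing.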

Based on Partial string Match fill one data frame column from another dataframe

I would do something like this:

1. Create a new column, indexes, that holds, for every Equipment in df2, the list of index values in df1 whose TagName contains that Equipment.
2. Flatten indexes into one row per item using stack() and reset_index().
3. Join the flattened df2 with df1 to get all the information you want.

from io import StringIO
import numpy as np
import pandas as pd
df1=StringIO("""Line;TagName;CLASS
187877;PT_WOA;.ZS01_LA120_T05.SB.S2384_LesSwL;10
187878;PT_WOA;.ZS01_RB2202_T05.SB.S2385_FLOK;10
187879;PT_WOA;.ZS01_LA120_T05.SB._CBAbsHy;10
187880;PT_WOA;.ZS01_LA120_T05.SB.S3110_CBAPV;10
187881;PT_WOA;.ZS01_LARB2204.SB.S3111_CBRelHy;10""")
df2=StringIO("""EquipmentNo;EquipmentDescription;Equipment
1311256;Lifting table;LA120
1311257;Roller bed;RB2200
1311258;Lifting table;LT2202
1311259;Roller bed;RB2202
1311260;Roller bed;RB2204""")
df1=pd.read_csv(df1,sep=";")
df2=pd.read_csv(df2,sep=";")

df2['indexes'] = df2['Equipment'].apply(lambda x: df1.index[df1.TagName.str.contains(str(x)).tolist()].tolist())
indexes = df2.apply(lambda x: pd.Series(x['indexes']),axis=1).stack().reset_index(level=1, drop=True)
indexes.name = 'indexes'
df2 = df2.drop('indexes', axis=1).join(indexes).dropna()
df2.index = df2['indexes']
matches = df2.join(df1, how='inner')
print(matches[['Line','TagName','EquipmentDescription','EquipmentNo']])

OUTPUT:

          Line                          TagName EquipmentDescription  EquipmentNo
187877  PT_WOA  .ZS01_LA120_T05.SB.S2384_LesSwL        Lifting table      1311256
187879  PT_WOA      .ZS01_LA120_T05.SB._CBAbsHy        Lifting table      1311256
187880  PT_WOA   .ZS01_LA120_T05.SB.S3110_CBAPV        Lifting table      1311256
187878  PT_WOA   .ZS01_RB2202_T05.SB.S2385_FLOK           Roller bed      1311259
187881  PT_WOA  .ZS01_LARB2204.SB.S3111_CBRelHy           Roller bed      1311260
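
As an alternative sketch (a different technique from the code above, not a cleanup of it), you could pull the equipment code out of each TagName with a single regular expression and then join on it. It assumes each TagName contains at most one equipment code, since str.extract only returns the first match. The data is rebuilt by hand from the example, and the names `pattern`, `tagged` and `matches` are made up.

```python
import re

import pandas as pd

# TagName values and their index labels, copied from the example above.
df1 = pd.DataFrame(
    {"TagName": [
        ".ZS01_LA120_T05.SB.S2384_LesSwL",
        ".ZS01_RB2202_T05.SB.S2385_FLOK",
        ".ZS01_LA120_T05.SB._CBAbsHy",
        ".ZS01_LA120_T05.SB.S3110_CBAPV",
        ".ZS01_LARB2204.SB.S3111_CBRelHy",
    ]},
    index=[187877, 187878, 187879, 187880, 187881],
)
df2 = pd.DataFrame({
    "EquipmentNo": [1311256, 1311257, 1311258, 1311259, 1311260],
    "EquipmentDescription": ["Lifting table", "Roller bed", "Lifting table",
                             "Roller bed", "Roller bed"],
    "Equipment": ["LA120", "RB2200", "LT2202", "RB2202", "RB2204"],
})

# One capture group listing every known equipment code.
pattern = "(" + "|".join(re.escape(code) for code in df2["Equipment"]) + ")"

# Tag each df1 row with the first equipment code found in its TagName.
tagged = df1.assign(Equipment=df1["TagName"].str.extract(pattern, expand=False))

# Join on the extracted code; rows without a known code are dropped.
matches = tagged.join(df2.set_index("Equipment"), on="Equipment", how="inner")
print(matches[["TagName", "EquipmentDescription", "EquipmentNo"]])
```

This avoids the per-equipment scan of df1, at the cost of missing a second equipment code if a TagName ever contains more than one.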

R data.table select rows based on partial string match from character vector

I have a solution in mind using lapply and tstrsplit. There is probably a more elegant one, but it does the job.

lapply(1:nrow(dt), function(i) {
  dt[i,'match' := any(trimws(tstrsplit(as.character(dt[i,'sha']),";")) %in% pselection)]
  })

dt[(match)]
          title                sha match
1:  First title              12345  TRUE
2: Second Title 2345; 66543; 33423  TRUE
3:  Third Title   22222; 12345678;  TRUE

The idea is to split the sha value of every row on ";" (trimming whitespace, otherwise row 3 will not match) and check whether any of the resulting sha values appears in pselection.
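
To make the split-trim-check logic concrete, here is a minimal sketch in plain Python. The titles and sha strings come from the output above; the contents of `pselection` and the fourth row are assumptions, since the original selection vector is not shown, and `has_selected_sha` is a made-up helper name.

```python
# Rows as (title, sha) pairs; the fourth row is a hypothetical non-match.
rows = [
    ("First title", "12345"),
    ("Second Title", "2345; 66543; 33423"),
    ("Third Title", "22222; 12345678;"),
    ("Fourth Title", "99999"),
]

# Hypothetical selection; the real pselection is not shown in the answer.
pselection = {"12345", "2345", "22222"}


def has_selected_sha(sha_field, selection):
    """Return True if any ';'-separated, trimmed token is in the selection."""
    tokens = (token.strip() for token in sha_field.split(";"))
    # Skip empty tokens, such as the one left by a trailing ";".
    return any(token in selection for token in tokens if token)


matched = [title for title, sha in rows if has_selected_sha(sha, pselection)]
print(matched)
```

Each token is compared as a whole, so "12345678" in the third row does not count as a match for "12345"; that row matches only through "22222".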