Selecting Data Frame Rows Based on Partial String Match in a Column

R subset data.frame by column names using partial string match from another list

# Specify `interesting.list` items manually
df[, grep("P3170|C453", x = names(df))]
#>   P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1         1        3          5

# Use paste to create pattern from lots of items in `interesting.list`
il <- c("P3170", "C453")
df[, grep(paste(il, collapse = "|"), x = names(df))]
#>   P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1         1        3          5

Example data:

n <- c("P3170.Tp2" , "P3189.Tn10" ,"C453.Tn7" ,"F678.Tc23" ,"P3170.Tn10")
df <- data.frame(1,2,3,4,5)
names(df) <- n
Created on 2021-10-20 by the reprex package (v2.0.1)
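
The pattern-building step carries over to other languages. As a minimal sketch in Python (not part of the original R answer), the snippet below joins the items with "|" and keeps the matching names. It also passes each item through re.escape, an added precaution so that an ID containing a regex metacharacter such as "." is matched literally. The variable names are illustrative.

```python
import re

# Column names and IDs taken from the example data above.
names = ["P3170.Tp2", "P3189.Tn10", "C453.Tn7", "F678.Tc23", "P3170.Tn10"]
interesting = ["P3170", "C453"]

# Escape each item, then join with "|" to build one alternation pattern.
pattern = re.compile("|".join(re.escape(item) for item in interesting))

# Keep every name in which the pattern is found anywhere.
selected = [name for name in names if pattern.search(name)]
print(selected)
```

With the example names this keeps P3170.Tp2, C453.Tn7 and P3170.Tn10, the same columns as in the R output.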

Python - keep rows in dataframe based on partial string match

Follow these steps for your two data frames:

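1. Split your email_address column into two separate columns at the first '@' (n=1 and expand=True replace the older unpacking of .str, which recent pandas versions no longer support)

     df1[['add', 'domain']] = df1['email_address'].str.split('@', n=1, expand=True)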

2. Then drop your add column to keep your data frame clean

      df1 = df1.drop('add',axis =1)

3. Get a new data frame by filtering on the 'domain' column. The ~ negates the isin() mask, so this keeps the rows whose domain does not appear in the 'approved_domain' column of df2; remove the ~ to keep only the rows that do match

      df_new = df1[~df1['domain'].isin(df2['approved_domain'])]

4. Drop the 'domain' column in df_new

      df_new = df_new.drop('domain',axis = 1)

This is what the result will be

    mailbox  email_address
1  mailbox2  def@yahoo.com
2  mailbox3    ghi@msn.com
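
To check the filtering rule independently of pandas, the steps above can be sketched in plain Python. The sample rows, the domain list, and the function names here are hypothetical, not the data from the original question; keep_matching=False mirrors the ~ in step 3.

```python
def domain_of(email):
    """Return everything after the first '@' (empty string if there is none)."""
    _local, _sep, domain = email.partition("@")
    return domain


def filter_by_domain(rows, domains, keep_matching=False):
    """Keep rows whose email domain is (or is not) in `domains`."""
    domain_set = set(domains)
    return [
        row for row in rows
        if (domain_of(row["email_address"]) in domain_set) == keep_matching
    ]


# Hypothetical sample data, for illustration only.
rows = [
    {"mailbox": "box1", "email_address": "ann@example.com"},
    {"mailbox": "box2", "email_address": "bob@example.org"},
    {"mailbox": "box3", "email_address": "cy@example.net"},
]
listed = ["example.com"]

not_listed_rows = filter_by_domain(rows, listed)                  # like ~isin(...)
listed_rows = filter_by_domain(rows, listed, keep_matching=True)  # like isin(...)
print([row["mailbox"] for row in not_listed_rows])
print([row["mailbox"] for row in listed_rows])
```

Because the comparison is an exact match on the domain, "example.com" does not match "mail.example.com"; a substring test would be needed for that.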

Finding partial match strings in any column in a dataframe in R

If you want to use apply(), you can compute an index for the string 'fish' and then subset on it. For each row, the index is the number of values that match 'fish' according to grepl(). Setting ignore.case = TRUE avoids problems with upper- and lower-case text. An index of 1 or more means that at least one column matched, so those rows go into the subset. Here is the code:

#Data
vessel<-c(letters[1:4])
type<-c("Fishery Vessel","NA","NA","Cargo")
class<-c("NA","FISHING","NA","CARGO")
status<-c("NA", "NA", "Engaged in Fishing", "Underway")
df<-data.frame(vessel,type, class, status,stringsAsFactors = F)
#Subset
#Create an index with apply
df$Index <- apply(df[1:4],1,function(x) sum(grepl('fish',x,ignore.case = T)))
#Filter
df.sub<-subset(df,Index>=1)

Output:

  vessel           type   class             status Index
1      a Fishery Vessel      NA                 NA     1
2      b             NA FISHING                 NA     1
3      c             NA      NA Engaged in Fishing     1
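
The index logic is easy to restate outside R. Below is a small Python sketch of the same idea, reusing the vessel data above as a list of dicts. It assumes 'fish' is meant as a literal substring, so it uses a case-insensitive `in` test instead of a regular expression; count_matches is a hypothetical helper name.

```python
# Same data as the R example; "NA" is a literal string here, as in the R code.
rows = [
    {"vessel": "a", "type": "Fishery Vessel", "class": "NA", "status": "NA"},
    {"vessel": "b", "type": "NA", "class": "FISHING", "status": "NA"},
    {"vessel": "c", "type": "NA", "class": "NA", "status": "Engaged in Fishing"},
    {"vessel": "d", "type": "Cargo", "class": "CARGO", "status": "Underway"},
]


def count_matches(row, needle):
    """Count how many values in the row contain `needle`, ignoring case."""
    needle = needle.lower()
    return sum(needle in str(value).lower() for value in row.values())


# Keep rows where at least one column matched, like subset(df, Index >= 1).
matched = [row for row in rows if count_matches(row, "fish") >= 1]
print([row["vessel"] for row in matched])
```

Counting matches rather than returning a boolean keeps the sketch parallel to the Index column; any() would do if only the filter is needed.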

Filtering rows based on partial matching between a data frame and a vector

We can paste the elements of 'vector' into a single string separated by | and use that in grepl or str_detect to filter the rows:

library(dplyr)
library(stringr)
df %>%
  filter(str_detect(nam, str_c(vector, collapse = "|")))
#           nam    aa
#1 mmu_mir-1-3p 12854
#2 mmu_mir-1-5p    36
#3 mmu-mir-3-5p  5489
#4 mmu-mir-6-3p  2563

In base R, this can be done with subset/grepl

subset(df, grepl(paste(vector, collapse= "|"), nam))
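
The same filter can be written without a regular expression. Here is a rough Python sketch, assuming the entries of the vector are meant as literal substrings. The fragments list and the last data row are hypothetical (the original vector is not shown); the first four rows come from the output above.

```python
# (nam, aa) pairs; the last row is a hypothetical non-matching entry.
rows = [
    ("mmu_mir-1-3p", 12854),
    ("mmu_mir-1-5p", 36),
    ("mmu-mir-3-5p", 5489),
    ("mmu-mir-6-3p", 2563),
    ("mmu-mir-9-5p", 7),
]

# Hypothetical stand-in for the R `vector`.
fragments = ["mir-1", "mir-3", "mir-6"]

# Keep a row when any fragment occurs somewhere in its name.
kept = [(nam, aa) for nam, aa in rows if any(frag in nam for frag in fragments)]
for nam, aa in kept:
    print(nam, aa)
```

A plain substring test sidesteps escaping entirely, at the cost of losing regex features such as anchors.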

Based on Partial string Match fill one data frame column from another dataframe

I would do something like this:

  1. Create a new column, indexes, that holds, for every Equipment in df2, the list of index labels in df1 whose df1.TagName contains that Equipment.

  2. Flatten the indexes by creating one row for each item using stack() and reset_index()

  3. Join the flattened df2 with df1 to get all the information you want

from io import StringIO
import numpy as np
import pandas as pd
df1=StringIO("""Line;TagName;CLASS
187877;PT_WOA;.ZS01_LA120_T05.SB.S2384_LesSwL;10
187878;PT_WOA;.ZS01_RB2202_T05.SB.S2385_FLOK;10
187879;PT_WOA;.ZS01_LA120_T05.SB._CBAbsHy;10
187880;PT_WOA;.ZS01_LA120_T05.SB.S3110_CBAPV;10
187881;PT_WOA;.ZS01_LARB2204.SB.S3111_CBRelHy;10""")
df2=StringIO("""EquipmentNo;EquipmentDescription;Equipment
1311256;Lifting table;LA120
1311257;Roller bed;RB2200
1311258;Lifting table;LT2202
1311259;Roller bed;RB2202
1311260;Roller bed;RB2204""")
df1=pd.read_csv(df1,sep=";")
df2=pd.read_csv(df2,sep=";")

df2['indexes'] = df2['Equipment'].apply(lambda x: df1.index[df1.TagName.str.contains(str(x)).tolist()].tolist())
indexes = df2.apply(lambda x: pd.Series(x['indexes']),axis=1).stack().reset_index(level=1, drop=True)
indexes.name = 'indexes'
df2 = df2.drop('indexes', axis=1).join(indexes).dropna()
df2.index = df2['indexes']
matches = df2.join(df1, how='inner')
print(matches[['Line','TagName','EquipmentDescription','EquipmentNo']])

OUTPUT:

          Line                          TagName EquipmentDescription  EquipmentNo
187877  PT_WOA  .ZS01_LA120_T05.SB.S2384_LesSwL        Lifting table      1311256
187879  PT_WOA      .ZS01_LA120_T05.SB._CBAbsHy        Lifting table      1311256
187880  PT_WOA   .ZS01_LA120_T05.SB.S3110_CBAPV        Lifting table      1311256
187878  PT_WOA   .ZS01_RB2202_T05.SB.S2385_FLOK           Roller bed      1311259
187881  PT_WOA  .ZS01_LARB2204.SB.S3111_CBRelHy           Roller bed      1311260
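
The matching rule behind the three steps can also be seen without pandas. This is a minimal sketch in plain Python, reusing the sample data above and treating each Equipment code as a literal substring (str.contains in the pandas code treats it as a regular expression by default, which makes no difference for these codes). The variable names are illustrative.

```python
# Index label -> TagName, as in df1 above.
tags = {
    187877: ".ZS01_LA120_T05.SB.S2384_LesSwL",
    187878: ".ZS01_RB2202_T05.SB.S2385_FLOK",
    187879: ".ZS01_LA120_T05.SB._CBAbsHy",
    187880: ".ZS01_LA120_T05.SB.S3110_CBAPV",
    187881: ".ZS01_LARB2204.SB.S3111_CBRelHy",
}

# (EquipmentNo, EquipmentDescription, Equipment), as in df2 above.
equipment = [
    (1311256, "Lifting table", "LA120"),
    (1311257, "Roller bed", "RB2200"),
    (1311258, "Lifting table", "LT2202"),
    (1311259, "Roller bed", "RB2202"),
    (1311260, "Roller bed", "RB2204"),
]

# One output row per (equipment, tag) pair where the code occurs in the tag.
matches = [
    (label, tag, description, number)
    for number, description, code in equipment
    for label, tag in tags.items()
    if code in tag
]
for row in matches:
    print(row)
```

Equipment codes with no matching tag (RB2200 and LT2202 here) simply produce no rows, which corresponds to the inner join in the pandas version.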

R data.table select rows based on partial string match from character vector

I have a solution in mind using lapply and tstrsplit. There is probably a more elegant way, but it does the job:

lapply(1:nrow(dt), function(i) {
dt[i,'match' := any(trimws(tstrsplit(as.character(dt[i,'sha']),";")) %in% pselection)]
})

dt[(match)]
          title                sha match
1:  First title              12345  TRUE
2: Second Title 2345; 66543; 33423  TRUE
3:  Third Title   22222; 12345678;  TRUE

The idea is to split every value in the sha column on ";" (trimming whitespace, otherwise row 3 will not match) and check whether any of the resulting sha values appears in pselection.
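
The split-trim-check idea can be sketched in a few lines of Python. The selection set and the function name below are hypothetical (the original pselection is not shown); the sha strings are the ones from the output above.

```python
def has_selected_sha(sha_field, selection):
    """True if any ';'-separated, whitespace-trimmed token is in `selection`."""
    tokens = [token.strip() for token in sha_field.split(";")]
    # Skip empty tokens, e.g. the one produced by a trailing ';'.
    return any(token in selection for token in tokens if token)


# Hypothetical selection, chosen for illustration only.
selection = {"12345", "66543", "12345678"}

sha_values = ["12345", "2345; 66543; 33423", "22222; 12345678;"]
flags = [has_selected_sha(value, selection) for value in sha_values]
print(flags)
```

Comparing whole tokens rather than substrings matters here: "2345" is contained in "12345", but has_selected_sha("2345", {"12345"}) is False.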


