R subset data.frame by column names using partial string match from another list
# Specify `interesting.list` items manually
df[,grep("P3170|C453", x=names(df))]
#> P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1 1 3 5
# Use paste to create pattern from lots of items in `interesting.list`
il <- c("P3170", "C453")
df[,grep(paste(il, collapse = "|"), x=names(df))]
#> P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1 1 3 5
Example data:
n <- c("P3170.Tp2" , "P3189.Tn10" ,"C453.Tn7" ,"F678.Tc23" ,"P3170.Tn10")
df <- data.frame(1,2,3,4,5)
names(df) <- n
Created on 2021-10-20 by the reprex package (v2.0.1)
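For readers doing the same column selection in pandas rather than R, a minimal sketch follows. It assumes the example data above rebuilt as a one-row pandas DataFrame. The call to re.escape is an addition that is not in the R answer; it is there so that any regex metacharacters in the items are matched literally.

```python
import re

import pandas as pd

# Same example data as above, rebuilt as a one-row pandas DataFrame
columns = ["P3170.Tp2", "P3189.Tn10", "C453.Tn7", "F678.Tc23", "P3170.Tn10"]
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=columns)

interesting = ["P3170", "C453"]

# Join the items into one alternation pattern: "P3170|C453"
pattern = "|".join(re.escape(item) for item in interesting)

# Keep only the columns whose name contains any of the items
subset = df.loc[:, df.columns.str.contains(pattern)]
print(subset)
```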
Python - keep rows in dataframe based on partial string match
These are the steps to follow for your two data frames:
1. Split your email_address column into two separate columns
# expand=True returns one column per piece; n=1 splits only on the first '@'
df1[['add', 'domain']] = df1['email_address'].str.split('@', n=1, expand=True)
2. Then drop your add column to keep your data frame clean
df1 = df1.drop('add',axis =1)
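3. Get a new data frame by filtering the 'domain' column against the 'approved_domain' column of df2. Note that the ~ negates the isin() mask, so the line below keeps the rows whose domain is not in df2; remove the ~ to keep the rows whose domain is in df2 instead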
df_new = df1[~df1['domain'].isin(df2['approved_domain'])]
4. Drop the 'domain' column in df_new
df_new = df_new.drop('domain',axis = 1)
This is what the result will be
mailbox email_address
1 mailbox2 def@yahoo.com
2 mailbox3 ghi@msn.com
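The steps above can also be condensed so that no helper columns are added and dropped. This is a minimal sketch, not the answer's own code: the sample frames are hypothetical (the original df1 and df2 are not shown), and it computes both the matching and the non-matching rows so you can pick whichever you need.

```python
import pandas as pd

# Hypothetical sample data; the original df1 and df2 are not shown in the answer
df1 = pd.DataFrame({
    "mailbox": ["mailbox1", "mailbox2", "mailbox3"],
    "email_address": ["abc@gmail.com", "def@yahoo.com", "ghi@msn.com"],
})
df2 = pd.DataFrame({"approved_domain": ["gmail.com"]})

# Take everything after the first "@" without adding helper columns
domain = df1["email_address"].str.split("@", n=1).str[1]

# True where the domain appears in the approved list
is_approved = domain.isin(df2["approved_domain"])

approved_rows = df1[is_approved]   # rows with an approved domain
other_rows = df1[~is_approved]     # rows without one, as in step 3 above
print(other_rows)
```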
Finding partial match strings in any column in a dataframe in R
If you want to use apply(), you could compute an index based on your string fish and then subset. The index is the number of values in each row that match fish according to grepl(). You can set ignore.case = T to avoid issues with upper- or lower-case text. When the index is greater than or equal to 1, at least one match occurred, so you can subset on it. Here is the code:
#Data
vessel<-c(letters[1:4])
type<-c("Fishery Vessel","NA","NA","Cargo")
class<-c("NA","FISHING","NA","CARGO")
status<-c("NA", "NA", "Engaged in Fishing", "Underway")
df<-data.frame(vessel,type, class, status,stringsAsFactors = F)
#Subset
#Create an index with apply
df$Index <- apply(df[1:4],1,function(x) sum(grepl('fish',x,ignore.case = T)))
#Filter
df.sub<-subset(df,Index>=1)
Output:
vessel type class status Index
1 a Fishery Vessel NA NA 1
2 b NA FISHING NA 1
3 c NA NA Engaged in Fishing 1
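The same row-wise "any column matches" idea can be sketched in pandas for comparison. This assumes the data above rebuilt as a DataFrame of strings; instead of counting matches per row it builds a boolean table and keeps rows with at least one True.

```python
import pandas as pd

# Same data as above; "NA" is kept as a literal string, as in the R example
df = pd.DataFrame({
    "vessel": ["a", "b", "c", "d"],
    "type": ["Fishery Vessel", "NA", "NA", "Cargo"],
    "class": ["NA", "FISHING", "NA", "CARGO"],
    "status": ["NA", "NA", "Engaged in Fishing", "Underway"],
})

# One boolean column per original column: does the cell contain "fish"?
hits = df.apply(lambda col: col.str.contains("fish", case=False))

# Keep rows where at least one column matched
df_sub = df[hits.any(axis=1)]
print(df_sub)
```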
Filtering rows based on partial matching between a data frame and a vector
We can paste the elements of 'vector' into a single string collapsed by | and use that in grepl or str_detect to filter the rows
library(dplyr)
library(stringr)
df %>%
filter(str_detect(nam, str_c(vector, collapse="|")))
# nam aa
#1 mmu_mir-1-3p 12854
#2 mmu_mir-1-5p 36
#3 mmu-mir-3-5p 5489
#4 mmu-mir-6-3p 2563
In base R, this can be done with subset/grepl
subset(df, grepl(paste(vector, collapse= "|"), nam))
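Collapsing the vector with | builds a regular expression, so items containing characters such as . or + would be interpreted as regex syntax. A regex-free variant, sketched here in pandas, tests plain substrings instead. The data frame and the search terms are hypothetical, since the original df and vector are not shown.

```python
import pandas as pd

# Hypothetical data; only the matching rows of the original df are shown above
df = pd.DataFrame({
    "nam": ["mmu_mir-1-3p", "mmu_mir-1-5p", "mmu-mir-3-5p",
            "mmu-mir-6-3p", "mmu-let-7a"],
    "aa": [12854, 36, 5489, 2563, 77],
})
vector = ["mir-1", "mir-3", "mir-6"]  # hypothetical search terms

# Plain substring test per row: no regex, so nothing needs escaping
mask = df["nam"].map(lambda name: any(term in name for term in vector))
filtered = df[mask]
print(filtered)
```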
Based on Partial string Match fill one data frame column from another dataframe
I would do something like this:
- Create a new column indexes where, for every Equipment in df2, you find the list of indexes in df1 where df1.TagName contains the Equipment.
- Flatten the indexes by creating one row for each item using stack() and reset_index().
- Join the flattened df2 with df1 to get all the information you want.
from io import StringIO
import numpy as np
import pandas as pd
df1=StringIO("""Line;TagName;CLASS
187877;PT_WOA;.ZS01_LA120_T05.SB.S2384_LesSwL;10
187878;PT_WOA;.ZS01_RB2202_T05.SB.S2385_FLOK;10
187879;PT_WOA;.ZS01_LA120_T05.SB._CBAbsHy;10
187880;PT_WOA;.ZS01_LA120_T05.SB.S3110_CBAPV;10
187881;PT_WOA;.ZS01_LARB2204.SB.S3111_CBRelHy;10""")
df2=StringIO("""EquipmentNo;EquipmentDescription;Equipment
1311256;Lifting table;LA120
1311257;Roller bed;RB2200
1311258;Lifting table;LT2202
1311259;Roller bed;RB2202
1311260;Roller bed;RB2204""")
df1=pd.read_csv(df1,sep=";")
df2=pd.read_csv(df2,sep=";")
df2['indexes'] = df2['Equipment'].apply(lambda x: df1.index[df1.TagName.str.contains(str(x)).tolist()].tolist())
indexes = df2.apply(lambda x: pd.Series(x['indexes']),axis=1).stack().reset_index(level=1, drop=True)
indexes.name = 'indexes'
df2 = df2.drop('indexes', axis=1).join(indexes).dropna()
df2.index = df2['indexes']
matches = df2.join(df1, how='inner')
print(matches[['Line','TagName','EquipmentDescription','EquipmentNo']])
OUTPUT:
Line TagName EquipmentDescription EquipmentNo
187877 PT_WOA .ZS01_LA120_T05.SB.S2384_LesSwL Lifting table 1311256
187879 PT_WOA .ZS01_LA120_T05.SB._CBAbsHy Lifting table 1311256
187880 PT_WOA .ZS01_LA120_T05.SB.S3110_CBAPV Lifting table 1311256
187878 PT_WOA .ZS01_RB2202_T05.SB.S2385_FLOK Roller bed 1311259
187881 PT_WOA .ZS01_LARB2204.SB.S3111_CBRelHy Roller bed 1311260
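An alternative to building and flattening index lists is to pair every tag with every equipment code and keep the pairs where the code is a substring of the tag. The sketch below is not the approach used above; it assumes pandas 1.2 or later (for how="cross") and uses a reduced copy of the sample data. A cross join produces len(df1) * len(df2) rows, so it only suits modest table sizes.

```python
import pandas as pd

# Reduced copy of the sample data above
df1 = pd.DataFrame({
    "TagName": [
        ".ZS01_LA120_T05.SB.S2384_LesSwL",
        ".ZS01_RB2202_T05.SB.S2385_FLOK",
        ".ZS01_LARB2204.SB.S3111_CBRelHy",
    ],
})
df2 = pd.DataFrame({
    "EquipmentNo": [1311256, 1311257, 1311259, 1311260],
    "Equipment": ["LA120", "RB2200", "RB2202", "RB2204"],
})

# Every (tag, equipment) combination
pairs = df1.merge(df2, how="cross")

# Keep the pairs where the equipment code occurs inside the tag name
mask = [eq in tag for tag, eq in zip(pairs["TagName"], pairs["Equipment"])]
matches = pairs[mask].reset_index(drop=True)
print(matches)
```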
R data.table select rows based on partial string match from character vector
I have a solution in mind using lapply and tstrsplit. There is probably a more elegant way, but it does the job
lapply(1:nrow(dt), function(i) {
dt[i,'match' := any(trimws(tstrsplit(as.character(dt[i,'sha']),";")) %in% pselection)]
})
dt[(match)]
title sha match
1: First title 12345 TRUE
2: Second Title 2345; 66543; 33423 TRUE
3: Third Title 22222; 12345678; TRUE
The idea is to split every row of the sha column on ";" (trimming whitespace, otherwise row 3 will not match) and check whether any of the resulting sha values appears in pselection
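The split, trim, and membership check can be sketched in pandas as well. The fourth row and the contents of pselection are hypothetical, since the original selection vector is not shown. Matching is on whole values, so 12345 does not match 12345678.

```python
import pandas as pd

dt = pd.DataFrame({
    "title": ["First title", "Second Title", "Third Title", "Fourth Title"],
    "sha": ["12345", "2345; 66543; 33423", "22222; 12345678;", "99999"],
})
pselection = {"12345", "66543", "12345678"}  # hypothetical selection


def has_selected_sha(cell):
    # Split on ";", trim whitespace, and ignore empty pieces
    tokens = [t.strip() for t in cell.split(";") if t.strip()]
    return any(t in pselection for t in tokens)


dt["match"] = dt["sha"].map(has_selected_sha)
selected = dt[dt["match"]]
print(selected)
```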