Add Column to Data Frame Which Returns 1 If String Match a Certain Pattern

Add column to data frame which returns 1 if string match a certain pattern

How about

iris$check <- as.numeric(grepl(".*(sa)", iris$Species))

grepl returns a logical vector (TRUE/FALSE), which as.numeric converts to 1/0. The pattern could also be shortened to "sa": the leading .* and the parentheses do not change what grepl matches.

Also possible:

iris$check <- grepl(".*(sa)", iris$Species) + 0L
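
For comparison, here is a minimal Python sketch of the same idea using only the standard library. The `species` list is a hypothetical stand-in for `iris$Species`; the point is just that a match/no-match result converts cleanly to 1/0.

```python
import re

# Hypothetical stand-in for iris$Species (one value per species).
species = ["setosa", "versicolor", "virginica"]

# re.search returns a match object or None; bool() maps that to True/False,
# and int() turns True/False into 1/0, much like as.numeric() on grepl() output.
check = [int(bool(re.search("sa", s))) for s in species]
print(check)  # [1, 0, 0]
```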

Create new column in dataframe based on partial string matching other column

Since you have only two conditions, you can use a nested ifelse:

#random data; it wasn't easy to copy-paste yours  
DF <- data.frame(GL = sample(10), GLDESC = paste(sample(letters, 10), 
  c("gas", "payroll12", "GaSer", "asdf", "qweaa", "PayROll-12", 
     "asdfg", "GAS--2", "fghfgh", "qweee"), sample(letters, 10), sep = " "))

DF$KIND <- ifelse(grepl("gas", DF$GLDESC, ignore.case = TRUE), "Materials", 
         ifelse(grepl("payroll", DF$GLDESC, ignore.case = TRUE), "Payroll", "Other"))

DF
#   GL         GLDESC      KIND
#1   8        e gas l Materials
#2   1  c payroll12 y   Payroll
#3  10      m GaSer v Materials
#4   6       t asdf n     Other
#5   2      w qweaa t     Other
#6   4 r PayROll-12 q   Payroll
#7   9      n asdfg a     Other
#8   5     d GAS--2 w Materials
#9   7     s fghfgh e     Other
#10  3      g qweee k     Other

EDIT 10/3/2016 (..after receiving more attention than expected)

A possible solution for handling more patterns is to iterate over all of them and, whenever there is a match, progressively reduce the number of comparisons. The first matching pattern wins, so the order of the patterns matters:

ff = function(x, patterns, replacements = patterns, fill = NA, ...)
{
    stopifnot(length(patterns) == length(replacements))

    ans = rep_len(as.character(fill), length(x))    
    empty = seq_along(x)

    for(i in seq_along(patterns)) {
        greps = grepl(patterns[[i]], x[empty], ...)
        ans[empty[greps]] = replacements[[i]]  
        empty = empty[!greps]
    }

    return(ans)
}

ff(DF$GLDESC, c("gas", "payroll"), c("Materials", "Payroll"), "Other", ignore.case = TRUE)
# [1] "Materials" "Payroll"   "Materials" "Other"     "Other"     "Payroll"   "Other"     "Materials" "Other"     "Other"

ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"), 
   c("pat1a|pat1b", "pat2", "pat3"), 
   c("1", "2", "3"), fill = "empty")
#[1] "1"     "1"     "3"     "empty"

ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"), 
   c("pat2", "pat1a|pat1b", "pat3"), 
   c("2", "1", "3"), fill = "empty")
#[1] "2"     "1"     "3"     "empty"
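
The same first-match-wins logic can be sketched in Python with the standard library. This is only an illustration of the technique, not a port of `ff`; the name `label_first_match` and its parameters are made up for this example.

```python
import re

def label_first_match(values, patterns, replacements, fill=None, flags=0):
    """Label each value with the replacement of the first pattern that matches it."""
    if len(patterns) != len(replacements):
        raise ValueError("patterns and replacements must have the same length")
    labels = [fill] * len(values)
    remaining = list(range(len(values)))  # indices that have no label yet
    for pattern, replacement in zip(patterns, replacements):
        regex = re.compile(pattern, flags)
        still_unmatched = []
        for i in remaining:
            if regex.search(values[i]):
                labels[i] = replacement
            else:
                still_unmatched.append(i)
        remaining = still_unmatched  # later patterns only see unmatched values
    return labels

values = ["pat1a pat2", "pat1a pat1b", "pat3", "pat4"]
a = label_first_match(values, ["pat1a|pat1b", "pat2", "pat3"], ["1", "2", "3"], fill="empty")
b = label_first_match(values, ["pat2", "pat1a|pat1b", "pat3"], ["2", "1", "3"], fill="empty")
print(a)  # ['1', '1', '3', 'empty']
print(b)  # ['2', '1', '3', 'empty']
```

Swapping the pattern order changes the label of the first value, which is the same order dependence the two R calls above show.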

Search for string pattern in dataframe column, return each occurrence and join to another dataframe

The pattern is a good start, but you then have to join the extracted matches back to the original dataframe:

import re

df.index.name = "inx"
pattern = re.compile(r'(\[[\w ]+\]\.\[[\w ]+\])')

# extract the attributes. 
extracts = df.MDX_TEXT.str.extractall(pattern).rename(columns={0:"attrname"})

# join the result with the original dataframe. 
res = df.join(extracts).reset_index()[["ID", "USER", "attrname"]].drop_duplicates()

# take just the last part of each attribute name. 
res["attrname"] = res["attrname"].str.split(".", expand = True).iloc[:, -1]

The result is:

   ID USER attrname
0   1  JOE  [ATTR1]
1   1  JOE  [ATTR2]
2   1  JOE  [ATTR3]
3   2  JAY  [ATTR1]
4   2  JAY  [ATTR3]

Create new column if DataFrame contains specific string

You could use pandas.Series.str.extract to achieve the desired output:

import pandas as pd

df = pd.DataFrame({
    "Name": ["name first RB LA a", "name LB second", "RB name third", "name LB fourth"]
})
df["Example"] = df["Name"].str.extract("(LB|RB)")[0] + " category"

    Name                Example
0   name first RB LA a  RB category
1   name LB second      LB category
2   RB name third       RB category
3   name LB fourth      LB category

Edit

To change category names within Example column use .str.replace:

df["Example"] = (df["Example"]
 .str.replace("RB", "Round Blade")
 .str.replace("LB", "Long Biased")
)
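
If there are many codes, a dictionary lookup may be easier to maintain than one replace per code. Here is a minimal standard-library sketch of that idea; the `full_names` mapping and the `categorize` helper are illustrative names, not part of pandas.

```python
import re

names = ["name first RB LA a", "name LB second", "RB name third", "name LB fourth"]

# Mapping from code to full category name (values taken from the replace calls above).
full_names = {"RB": "Round Blade", "LB": "Long Biased"}

def categorize(name):
    """Return '<full name> category' for the first LB/RB code found, else None."""
    match = re.search(r"LB|RB", name)
    if match is None:
        return None  # no code in the string
    return full_names[match.group(0)] + " category"

examples = [categorize(n) for n in names]
print(examples)
```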

How to find a pattern in a string and extract it as a new column of data frame

You can try the following:

library(tidyverse)

df %>%
  extract(col, c('First', 'cut-off', 'Second'), 
               '(\\d+.*?)% 1ST\\s*\\$(\\d+).*?(\\d+.*?)%.*?', remove = FALSE) %>%
  mutate(Bonus = str_extract(col, '\\d+(?=\\sBONUS)')) %>%
  select(-col)

#   First cut-off Second Bonus
#1   3.2  100000    1.1  <NA>
#2   3.3  100000    1.2  3000
#3  <NA>    <NA>   <NA>  <NA>
#4   3.3  100000    1.2  <NA>
#5   3.3  100000    1.2  <NA>
#6   3.2  100000    1.1  <NA>

data

df <- data.frame(col = c("3.2% 1ST $100000 AND 1.1% BALANCE", "3.3% 1ST $100000 AND 1.2% BALANCE AND $3000 BONUS FULL PRICE ONLY", 
                         "$4000", "3.3% 1ST $100000 AND 1.2% BALANCE", "3.3% 1ST $100000 AND 1.2% BALANCE", 
                         "3.2% 1ST $100000 1.1% BALANCE"))

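To see what the two regular expressions capture, here is a rough Python equivalent using only the `re` module. It assumes Python's engine treats these particular patterns the same way; the trailing `.*?` of the original pattern is dropped because it matches nothing extra. The `parse` helper is a made-up name.

```python
import re

texts = [
    "3.2% 1ST $100000 AND 1.1% BALANCE",
    "3.3% 1ST $100000 AND 1.2% BALANCE AND $3000 BONUS FULL PRICE ONLY",
    "$4000",
    "3.3% 1ST $100000 AND 1.2% BALANCE",
    "3.3% 1ST $100000 AND 1.2% BALANCE",
    "3.2% 1ST $100000 1.1% BALANCE",
]

# Three capture groups: first rate, cut-off amount, second rate.
rate_pattern = re.compile(r"(\d+.*?)% 1ST\s*\$(\d+).*?(\d+.*?)%")
# Digits followed by whitespace and "BONUS" (lookahead, so BONUS is not captured).
bonus_pattern = re.compile(r"\d+(?=\sBONUS)")

def parse(text):
    rates = rate_pattern.search(text)
    bonus = bonus_pattern.search(text)
    first, cutoff, second = rates.groups() if rates else (None, None, None)
    return {
        "First": first,
        "cut-off": cutoff,
        "Second": second,
        "Bonus": bonus.group(0) if bonus else None,
    }

parsed = [parse(t) for t in texts]
for record in parsed:
    print(record)
```

Rows without a match come back as `None`, which plays the role of `<NA>` in the R output.
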
Create column based on presence of string pattern and ifelse

To check whether a string contains a certain substring, you can't use ==, because it tests for an exact match (i.e. it returns TRUE only if the string is exactly "non").

Instead, you could use grepl (from the grep family of functions), which performs pattern matching:

df$loc01 <- ifelse(grepl("non",df$loc_01),'outside','inside')

Result:

> df
     loc_01 loc01_land   loc01
1      apis  165730500  inside
2      indu   62101800  inside
3      isro  540687600  inside
4      miss  161140500  inside
5  non_apis 1694590200 outside
6  non_indu 1459707300 outside
7  non_isro 1025051400 outside
8  non_miss 1419866100 outside
9  non_piro 2037064500 outside
10 non_sacn 2204629200 outside
11 non_slbe 1918840500 outside
12 non_voya  886299300 outside
13     piro  264726000  inside
14     sacn  321003900  inside
15     slbe  241292700  inside
16     voya  530532000  inside
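
The difference between exact matching and substring matching is easy to see in a few lines of Python. This is a sketch only; the third value, `"non"`, is added here purely to show the one case where equality is true.

```python
# Two values from the table above plus a hypothetical exact "non".
locations = ["apis", "non_apis", "non"]

# Equality is true only when the whole string is exactly "non".
exact = [loc == "non" for loc in locations]

# Substring test is true whenever "non" appears anywhere in the string.
contains = ["non" in loc for loc in locations]

labels = ["outside" if "non" in loc else "inside" for loc in locations]

print(exact)     # [False, False, True]
print(contains)  # [False, True, True]
print(labels)    # ['inside', 'outside', 'outside']
```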

Pandas dataframe: Check if regex contained in a column matches a string in another column in the same row

The pandas string methods take a single pattern for the whole column, so you can't use them directly when each row has its own pattern. You will need to apply re.search per row:

import re

mask = df.apply(lambda r: bool(re.search(r['patterns'], r['strings'])), axis=1)
df2 = df[mask]

or using a (faster) list comprehension:

mask = [bool(re.search(p,s)) for p,s in zip(df['patterns'], df['strings'])]

output:

  strings patterns group
0   apple      \ba     1
3   train      n\b     2
4     tan      n\b     2
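
The input frame is not shown, so here is a self-contained sketch of the per-row search with plain lists. Rows 0, 3 and 4 are taken from the output above; rows 1 and 2 are invented non-matching examples.

```python
import re

# Rows 1 and 2 are hypothetical; the others come from the output above.
strings = ["apple", "banana", "grape", "train", "tan"]
patterns = [r"\ba", r"\ba", r"n\b", r"n\b", r"n\b"]

# Each string is searched with the pattern from its own row.
mask = [bool(re.search(p, s)) for p, s in zip(patterns, strings)]
kept = [i for i, keep in enumerate(mask) if keep]

print(mask)  # [True, False, False, True, True]
print(kept)  # [0, 3, 4]
```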

R: Add new column by specific patterns in another column of the dataframe

dfA <- data.frame(group=seq(1,4), pattern=c("Black & White", "Black OR Pink", "Red", "Pink"), stringsAsFactors=FALSE)
dfB <- data.frame(color=c("Pink", "Red", "Black", "White"), value=c(2,4,84,100), stringsAsFactors=FALSE)
    
getVal2return <- function(i, dfA, dfB){
  
  andv <- unlist(strsplit(dfA$pattern[i], split=" & "))
  orv <- unlist(strsplit(dfA$pattern[i], split=" OR "))
  if (length(andv) > 1) {
    val <- sum(dfB$value[match(andv, dfB$color)])
  } else if (length(orv)> 1){
    val <- max(dfB$value[match(orv, dfB$color)])
  } else {
    val <- dfB$value[match(dfA$pattern[i], dfB$color)]
  }
  return(val)
}
    
dfA$newVal <- sapply(seq_len(nrow(dfA)), function(x) { getVal2return(x, dfA, dfB) })

> dfA
      group       pattern newVal
    1     1 Black & White    184
    2     2 Black OR Pink     84
    3     3           Red      4
    4     4          Pink      2
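
The rule the function applies (sum the values for " & ", take the maximum for " OR ", otherwise look the colour up directly) can be sketched in Python as follows. `pattern_value` is an illustrative name; unlike R's match, which yields NA, this sketch raises a KeyError for an unknown colour.

```python
# Lookup table built from dfB above.
color_values = {"Pink": 2, "Red": 4, "Black": 84, "White": 100}
patterns = ["Black & White", "Black OR Pink", "Red", "Pink"]

def pattern_value(pattern, values):
    """Sum for '&' patterns, max for 'OR' patterns, direct lookup otherwise."""
    if " & " in pattern:
        return sum(values[color] for color in pattern.split(" & "))
    if " OR " in pattern:
        return max(values[color] for color in pattern.split(" OR "))
    return values[pattern]  # raises KeyError if the colour is unknown

new_vals = [pattern_value(p, color_values) for p in patterns]
print(new_vals)  # [184, 84, 4, 2]
```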

Based on Partial string Match fill one data frame column from another dataframe

I would do something like this:

1. Create a new column, indexes, that holds, for every Equipment in df2, the list of index values in df1 whose TagName contains that Equipment.
2. Flatten indexes into one row per item using stack() and reset_index().
3. Join the flattened df2 with df1 to get all the information you want.

from io import StringIO
import pandas as pd
df1=StringIO("""Line;TagName;CLASS
187877;PT_WOA;.ZS01_LA120_T05.SB.S2384_LesSwL;10
187878;PT_WOA;.ZS01_RB2202_T05.SB.S2385_FLOK;10
187879;PT_WOA;.ZS01_LA120_T05.SB._CBAbsHy;10
187880;PT_WOA;.ZS01_LA120_T05.SB.S3110_CBAPV;10
187881;PT_WOA;.ZS01_LARB2204.SB.S3111_CBRelHy;10""")
df2=StringIO("""EquipmentNo;EquipmentDescription;Equipment
1311256;Lifting table;LA120
1311257;Roller bed;RB2200
1311258;Lifting table;LT2202
1311259;Roller bed;RB2202
1311260;Roller bed;RB2204""")
df1=pd.read_csv(df1,sep=";")
df2=pd.read_csv(df2,sep=";")

df2['indexes'] = df2['Equipment'].apply(lambda x: df1.index[df1.TagName.str.contains(str(x)).tolist()].tolist())
indexes = df2.apply(lambda x: pd.Series(x['indexes']),axis=1).stack().reset_index(level=1, drop=True)
indexes.name = 'indexes'
df2 = df2.drop('indexes', axis=1).join(indexes).dropna()
df2.index = df2['indexes']
matches = df2.join(df1, how='inner')
print(matches[['Line','TagName','EquipmentDescription','EquipmentNo']])

OUTPUT:

          Line                          TagName EquipmentDescription  EquipmentNo
187877  PT_WOA  .ZS01_LA120_T05.SB.S2384_LesSwL        Lifting table      1311256
187879  PT_WOA      .ZS01_LA120_T05.SB._CBAbsHy        Lifting table      1311256
187880  PT_WOA   .ZS01_LA120_T05.SB.S3110_CBAPV        Lifting table      1311256
187878  PT_WOA   .ZS01_RB2202_T05.SB.S2385_FLOK           Roller bed      1311259
187881  PT_WOA  .ZS01_LARB2204.SB.S3111_CBRelHy           Roller bed      1311260
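
Stripped of the pandas machinery, the matching step is a substring test of every Equipment code against every TagName. Here is a standard-library sketch using the same data; the variable names are illustrative. A plain `in` test is used, which should agree with str.contains here because the codes contain no regex metacharacters.

```python
# TagName keyed by the row label that pandas uses as the index of df1.
tags = {
    187877: ".ZS01_LA120_T05.SB.S2384_LesSwL",
    187878: ".ZS01_RB2202_T05.SB.S2385_FLOK",
    187879: ".ZS01_LA120_T05.SB._CBAbsHy",
    187880: ".ZS01_LA120_T05.SB.S3110_CBAPV",
    187881: ".ZS01_LARB2204.SB.S3111_CBRelHy",
}

# (EquipmentNo, EquipmentDescription, Equipment) rows from df2.
equipment = [
    (1311256, "Lifting table", "LA120"),
    (1311257, "Roller bed", "RB2200"),
    (1311258, "Lifting table", "LT2202"),
    (1311259, "Roller bed", "RB2202"),
    (1311260, "Roller bed", "RB2204"),
]

# One output row per (equipment, tag) pair where the code occurs in the tag name.
matches = [
    (label, tag, description, number)
    for number, description, code in equipment
    for label, tag in tags.items()
    if code in tag
]

for row in matches:
    print(row)
```

Equipment codes with no matching tag (RB2200 and LT2202 here) simply produce no rows, like the inner join above.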
