How to Search for a String in One Column in Other Columns of a Data Frame

R - How to search for a string in one column in other columns of a data frame (ignoring spaces)

An empty pattern matches every string, so grepl returns TRUE for rows where word is empty. Use nzchar to check that the pattern actually has characters:

transform(df, word_exists=mapply(grepl, pattern=word, x=keywords) & nzchar(word))
#    word          keywords word_exists
# 1 Hello hello goodbye nyc       FALSE
# 2       hello goodbye nyc       FALSE
# 3   nyc hello goodbye nyc        TRUE
# 4       hello goodbye nyc       FALSE

Search for string in one column using strings from another column in another dataframe in R

One approach would be to form a regex alternation of the terms in the first dataframe. Then use grepl and sub to generate the output columns.

regex <- paste0("\\b(", paste(df1$SName, collapse="|"), ")\\b")
df2$match <- ifelse(grepl(regex, df2$Description), "Yes", "No")
df2$String <- ifelse(grepl(regex, df2$Description),
                     sub(paste0(".*", regex, ".*"), "\\1", df2$Description),
                     "")
df2

            Description match String
1     - ls svc368 -@#@#    No
2   mkdir test svc #*-/    No
3 mkdir df2 svc123 #*-/   Yes svc123
...
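
For pandas users, roughly the same idea can be sketched as follows. This is an analogue, not a translation of the R answer: the sample rows are hypothetical, re.escape is added so that terms containing regex metacharacters are matched literally, and str.extract returns the first matching term in each description.

```python
import re
import pandas as pd

# Hypothetical sample data modeled on the output shown above
df1 = pd.DataFrame({"SName": ["svc123", "svc456"]})
df2 = pd.DataFrame({"Description": ["- ls svc368 -@#@#",
                                    "mkdir test svc #*-/",
                                    "mkdir df2 svc123 #*-/"]})

# Build one alternation from all search terms, escaped and wrapped in word boundaries
pattern = r"\b(" + "|".join(re.escape(s) for s in df1["SName"]) + r")\b"

# First matching term per row, or an empty string when nothing matches
df2["String"] = df2["Description"].str.extract(pattern, expand=False).fillna("")
df2["match"] = df2["String"].ne("").map({True: "Yes", False: "No"})
print(df2)
```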

String matching from one data frame column to another data frame column

I'm going to assume that the values in the DataFrames are strings, since we typically don't use DataFrames to carry variables. With that assumption, I created a sample from your DataFrame values.

import numpy as np
import pandas as pd

data_a = {"Value": ["valid username", "valid username", "Password", "Password", "Login", "LOG IN"],
          "Filed": ["username", "input_txtuserid", "input_txtpassword", "txtPassword", "input_submit_log_in", "SIGNIN"]}

data_b = {"Value": ["input_txtuserid", "input_txtpassword", "input_submit_log_in", "Password", "City", "PLACE"],
          "Filed": ["JOHN", "78945", "Sucessfully", "txtPassword", "London", "4-A avenue Street"]}

A = pd.DataFrame(data_a)
B = pd.DataFrame(data_b)

Below is the code to create C:

# Left-join A and B on A's "Filed" column and B's "Value" column; the indicator column "Exist" records where each row was found
C = pd.merge(A, B, left_on=['Filed'], right_on=['Value'], how='left', indicator='Exist')

# If exists put true, otherwise false
C['Exist'] = np.where(C.Exist == 'both', True, False)
# Drop the rows marked False, i.e. those that don't exist in both DataFrames
C.drop(C[C['Exist'] == False].index, inplace=True)

# Keep only B's columns and restore their original names.
# Renaming on the selection directly avoids modifying a slice in place.
C = C[['Value_y', 'Filed_y']].rename(columns={"Value_y": "Value",
                                              "Filed_y": "Filed"})

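For reference, the steps above can be put together as one self-contained sketch using the sample data from this answer. Filtering on the indicator directly, instead of converting it to booleans first, is a simplification on my part and not how the answer itself does it.

```python
import pandas as pd

data_a = {"Value": ["valid username", "valid username", "Password", "Password", "Login", "LOG IN"],
          "Filed": ["username", "input_txtuserid", "input_txtpassword", "txtPassword", "input_submit_log_in", "SIGNIN"]}
data_b = {"Value": ["input_txtuserid", "input_txtpassword", "input_submit_log_in", "Password", "City", "PLACE"],
          "Filed": ["JOHN", "78945", "Sucessfully", "txtPassword", "London", "4-A avenue Street"]}
A = pd.DataFrame(data_a)
B = pd.DataFrame(data_b)

# Left join; the "Exist" indicator is "both" when A's Filed value was found in B's Value column
merged = pd.merge(A, B, left_on="Filed", right_on="Value", how="left", indicator="Exist")

# Keep the matched rows, take B's columns, and restore their names
C = (merged.loc[merged["Exist"] == "both", ["Value_y", "Filed_y"]]
           .rename(columns={"Value_y": "Value", "Filed_y": "Filed"})
           .reset_index(drop=True))
print(C)
```

Because both frames have columns named Value and Filed, the merge suffixes them with _x (from A) and _y (from B), which is why the selection uses Value_y and Filed_y.
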
How to search a string in one pandas dataframe column as a substring in another dataframe column

The idea is to build sets by splitting on "," and then match them with issubset:

d = {k: set(v.split(',')) for k, v in df2.set_index('A')['B'].items()}
df1['B'] = [next(iter([k for k, v in d.items() if set(x.split(',')).issubset(v)]), '')
            for x in df1['A']]
print(df1)
                        A          B
0      9.female.ceo.,ceo,
1      9.female.ned.,ned,
2    9.female.ned.,chair,
3       2.female.ed.,ned,      ,ned,
4       2.female.ned.,ed,
5    9.female.chair.,ceo,  ,ceo,ned,
6  2.female.chair.,chair,

A solution that tests for substrings with in:

d = df2.set_index('A')['B']
df1['B'] = [next(iter([k for k, v in d.items() if x in v]), '') for x in df1['A']]
print(df1)
                        A          B
0      9.female.ceo.,ceo,
1      9.female.ned.,ned,
2    9.female.ned.,chair,
3       2.female.ed.,ned,      ,ned,
4       2.female.ned.,ed,
5    9.female.chair.,ceo,  ,ceo,ned,
6  2.female.chair.,chair,

Another solution builds a cross join with merge and then tests for substrings with in:

df3 = df1.assign(tmp=1).merge(df2.assign(tmp=1), on='tmp', suffixes=('','_'))
df3 = df3.loc[[a in b for a, b in zip(df3['A'], df3['B_'])], ['A','A_']]

df = df1[['A']].merge(df3.rename(columns={'A_':'B'}), on='A', how='left')
print(df)
                        A          B
0      9.female.ceo.,ceo,        NaN
1      9.female.ned.,ned,        NaN
2    9.female.ned.,chair,        NaN
3       2.female.ed.,ned,      ,ned,
4       2.female.ned.,ed,        NaN
5    9.female.chair.,ceo,  ,ceo,ned,
6  2.female.chair.,chair,        NaN
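
In pandas 1.2 and later, merge accepts how='cross', so the temporary tmp key is not needed. Here is a minimal sketch of the same cross-join idea; the two small frames and the column name match are hypothetical and only serve to illustrate it.

```python
import pandas as pd

# Hypothetical sample data: short search strings and labelled longer strings
df1 = pd.DataFrame({"A": ["ned", "ceo", "cfo"]})
df2 = pd.DataFrame({"A": ["group1", "group2"], "B": ["ed,ned", "ceo,chair"]})

# Every row of df1 paired with every row of df2; df2's "A" becomes "A_"
pairs = df1.merge(df2, how="cross", suffixes=("", "_"))

# Keep the pairs where df1's string is a substring of df2's "B"
hits = pairs[[a in b for a, b in zip(pairs["A"], pairs["B"])]]

# Left-merge back so unmatched rows of df1 are kept with NaN
result = df1.merge(hits[["A", "A_"]].rename(columns={"A_": "match"}), on="A", how="left")
print(result)
```

A cross join produces len(df1) * len(df2) rows, so this approach is best suited to small frames.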

Python Pandas: Check if string in one column is contained in string of another column in the same row

Use apply with the in operator, row by row (axis=1):

df['C'] = df.apply(lambda x: x.A in x.B, axis=1)
print(df)
   RecID  A    B      C
0      1  a  abc   True
1      2  b  cba   True
2      3  c  bca   True
3      4  d  bac  False
4      5  e  abc  False

A list comprehension is faster, but it requires that neither column contains NaN:

df['C'] = [a in b for a, b in zip(df['A'], df['B'])]
print(df)
   RecID  A    B      C
0      1  a  abc   True
1      2  b  cba   True
2      3  c  bca   True
3      4  d  bac  False
4      5  e  abc  False
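
If the columns may contain missing values, one possible workaround is to guard the test with isinstance, so that rows with a missing value simply yield False. This is a sketch with hypothetical data, not part of the original answer.

```python
import pandas as pd

# Hypothetical data with a missing value in column A
df = pd.DataFrame({"A": ["a", "d", None], "B": ["abc", "bac", "abc"]})

# Only run the substring test when both cells are real strings
df["C"] = [isinstance(a, str) and isinstance(b, str) and a in b
           for a, b in zip(df["A"], df["B"])]
print(df)
```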

Extract column value based on another column in Pandas

You can use loc to get the Series of values that satisfy your condition, and then iloc to get the first element:

In [2]: df
Out[2]:
    A  B
0  p1  1
1  p1  2
2  p3  3
3  p2  4

In [3]: df.loc[df['B'] == 3, 'A']
Out[3]:
2    p3
Name: A, dtype: object

In [4]: df.loc[df['B'] == 3, 'A'].iloc[0]
Out[4]: 'p3'
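
Note that iloc[0] raises an IndexError when no row satisfies the condition. A small helper can return a default instead; first_match and its default parameter are hypothetical names used only for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"A": ["p1", "p1", "p3", "p2"], "B": [1, 2, 3, 4]})

def first_match(frame, value, default=None):
    """Return the first A whose B equals value, or default if there is none."""
    matches = frame.loc[frame["B"] == value, "A"]
    return matches.iloc[0] if not matches.empty else default

print(first_match(df, 3))
print(first_match(df, 99))
```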

Search columns for a specific set of text and if the text is found enter a new string of text in a new column pandas

There's definitely a more optimized solution, but I hope this puts you on the right path. It loops through each row, checks the combined text of all columns against each candidate fuel string, and decides which abbreviation to use:

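d = {'diesel': 'DSL', 'gasoline': 'GAS', 'ev': 'ELEC'}
# Concatenate all columns of each row into one helper string
df['all'] = df.apply(''.join, axis=1)
for i, row in df.iterrows():
    # Use the abbreviation of the first fuel key found in the row text
    df.at[i, 'FUEL'] = d[[key for key in d.keys() if key in row['all'].lower()][0]]

del df['all']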

Output:

                  SUMN              SOUN      MATN  FUEL
0   Light duty vehicle  Diesel Tire wear    Rubber   DSL
1    Heavy duty diesel      Non-catalyst    Diesel   DSL
2     Light duty truck          catalyst  Gasoline   GAS
3  Medium duty vehicle     EV brake wear    brakes  ELEC

This assumes that exactly one of the fuel types occurs in each row. If a row contains none of them, the lookup fails with an error.

EDIT: inspired by the other solution:

import re
d={'diesel':'DSL','gasoline':'GAS','ev':'ELEC'}
df['FUEL'] = df.apply(lambda x: d[re.search('gasoline|diesel|ev',''.join(x).lower()).group()], axis=1)

same output :)
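
As a possible variation, here is a vectorized sketch that tolerates rows without any fuel type. It joins the columns with spaces and uses word boundaries so that a short key such as "ev" only matches as a whole word; those two choices are my assumptions and differ from the code above. The sample frame is rebuilt from the output shown earlier.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "SUMN": ["Light duty vehicle", "Heavy duty diesel", "Light duty truck", "Medium duty vehicle"],
    "SOUN": ["Diesel Tire wear", "Non-catalyst", "catalyst", "EV brake wear"],
    "MATN": ["Rubber", "Diesel", "Gasoline", "brakes"],
})
d = {"diesel": "DSL", "gasoline": "GAS", "ev": "ELEC"}

# One alternation of all keys, matched as whole words
pattern = r"\b(" + "|".join(map(re.escape, d)) + r")\b"

# Join the text columns of each row with spaces
text = df[["SUMN", "SOUN", "MATN"]].apply(" ".join, axis=1)

# First key found per row (case-insensitive), mapped to its abbreviation
df["FUEL"] = text.str.extract(pattern, flags=re.IGNORECASE)[0].str.lower().map(d)
print(df)
```

Rows in which no key is found end up with NaN in FUEL instead of raising an error.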

Check if string is in a pandas dataframe

a['Names'].str.contains('Mel') returns a boolean Series with one value per row, so its length is len(a).

Therefore, you can use

mel_count = a['Names'].str.contains('Mel').sum()
if mel_count > 0:
    print("There are {m} Mels".format(m=mel_count))

Or use any() if you don't care how many records match your query:

if a['Names'].str.contains('Mel').any():
    print("Mel is there")
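
A few str.contains parameters are worth knowing here: by default the pattern is treated as a regular expression and missing values produce missing results. The sketch below, on hypothetical data, uses regex=False for a literal match, na=False to count missing names as non-matches, and case=False for a case-insensitive search.

```python
import pandas as pd

# Hypothetical data, including a missing name
a = pd.DataFrame({"Names": ["Mel", "Melissa", "Bob", None, "carmel"]})

# Literal, case-sensitive match; missing values count as False
mask = a["Names"].str.contains("Mel", regex=False, na=False)
mel_count = int(mask.sum())

# Case-insensitive variant also picks up "carmel"
any_case_count = int(a["Names"].str.contains("mel", case=False, na=False).sum())

print(mel_count, any_case_count)
```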

substring of an entire column in pandas dataframe

Use the str accessor with square brackets:

df['col'] = df['col'].str[:9]

Or str.slice:

df['col'] = df['col'].str.slice(0, 9)
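
Both forms leave shorter strings unchanged and keep missing values missing, as this small sketch with hypothetical data illustrates:

```python
import pandas as pd

# Hypothetical data: a long string, a short string, and a missing value
df = pd.DataFrame({"col": ["abcdefghijkl", "short", None]})

# Keep at most the first nine characters of each value
df["col"] = df["col"].str[:9]
print(df)
```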

