R - How to search for a string in one column in other columns of a data frame (ignoring spaces)
Just use nzchar to check that your pattern has characters:
transform(df, word_exists=mapply(grepl, pattern=word, x=keywords) & nzchar(word))
# word keywords word_exists
# 1 Hello hello goodbye nyc FALSE
# 2 hello goodbye nyc FALSE
# 3 nyc hello goodbye nyc TRUE
# 4 hello goodbye nyc FALSE
Search for string in one column using strings from another column in another dataframe in R
One approach would be to form a regex alternation of the terms in the first dataframe. Then use grepl and sub to generate the output columns.
regex <- paste0("\\b(", paste(df1$SName, collapse="|"), ")\\b")
df2$match <- ifelse(grepl(regex, df2$Description), "Yes", "No")
df2$String <- ifelse(grepl(regex, df2$Description),
sub(paste0(".*", regex, ".*"), "\\1", df2$Description),
"")
df2
Description match String
1 - ls svc368 -@#@# No
2 mkdir test svc #*-/ No
3 mkdir df2 svc123 #*-/ Yes svc123
...
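For readers working in pandas rather than R, the same alternation idea can be sketched as follows. This is only an illustration under assumptions: the sample frames are hypothetical stand-ins for df1 and df2, and re.escape is added so that terms containing regex metacharacters are matched literally.

```python
import re
import pandas as pd

# Hypothetical sample data; the column names mirror the R example above.
df1 = pd.DataFrame({"SName": ["svc123", "svc456"]})
df2 = pd.DataFrame({"Description": ["- ls svc368 -@#@#",
                                    "mkdir test svc #*-/",
                                    "mkdir df2 svc123 #*-/"]})

# Escape each term, then wrap the alternation in word boundaries.
pattern = r"\b(" + "|".join(re.escape(s) for s in df1["SName"]) + r")\b"

# str.extract returns the first capture group, or a missing value if nothing matches.
df2["String"] = df2["Description"].str.extract(pattern, expand=False).fillna("")
df2["match"] = df2["String"].ne("").map({True: "Yes", False: "No"})
print(df2)
```

Only the third row contains a whole-word term from df1, so it is the only row marked Yes.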
String matching from one data frame column to another data frame column
I'm going to assume the values in the DataFrames are strings, since DataFrames aren't typically used to hold variables. On that basis I built a sample from your DataFrame values.
import numpy as np
import pandas as pd
data_a = {"Value": ["valid username", "valid username", "Password", "Password", "Login", "LOG IN"],
"Filed": ["username", "input_txtuserid", "input_txtpassword", "txtPassword", "input_submit_log_in", "SIGNIN"]}
data_b = {"Value": ["input_txtuserid", "input_txtpassword", "input_submit_log_in", "Password", "City", "PLACE"],
"Filed": ["JOHN", "78945", "Sucessfully", "txtPassword", "London", "4-A avenue Street"]}
A = pd.DataFrame(data_a)
B = pd.DataFrame(data_b)
The code below creates C:
# Merge A and B with a left join on A's Filed column and B's Value column, adding an indicator column named Exist
C = pd.merge(A, B, left_on=['Filed'], right_on=['Value'], how='left', indicator='Exist')
# Set Exist to True when the row is present in both DataFrames, otherwise False
C['Exist'] = np.where(C.Exist == 'both', True, False)
# Drop every row where Exist is False, i.e. rows that don't exist in both DataFrames
C.drop(C[C['Exist'] == False].index, inplace=True)
# Making sure C has the right column and column names.
C = C[['Value_y', 'Filed_y']]
C.rename(columns = {"Value_y": "Value",
"Filed_y": "Filed"}, inplace = True)
Hope that helps! Please Mark this as answer if it does :)
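If the goal is only to keep the rows of B whose Value appears in A's Filed column, a shorter alternative sketch uses isin instead of merge. This is a swapped-in technique, not the approach above, and it differs in one respect: a merge repeats a row of B for every duplicate key in A, while isin keeps each row of B at most once.

```python
import pandas as pd

data_a = {"Value": ["valid username", "valid username", "Password", "Password", "Login", "LOG IN"],
          "Filed": ["username", "input_txtuserid", "input_txtpassword", "txtPassword", "input_submit_log_in", "SIGNIN"]}
data_b = {"Value": ["input_txtuserid", "input_txtpassword", "input_submit_log_in", "Password", "City", "PLACE"],
          "Filed": ["JOHN", "78945", "Sucessfully", "txtPassword", "London", "4-A avenue Street"]}
A = pd.DataFrame(data_a)
B = pd.DataFrame(data_b)

# Keep the rows of B whose Value occurs anywhere in A's Filed column.
C = B[B["Value"].isin(A["Filed"])].reset_index(drop=True)
print(C)
```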
How to search a string in one pandas dataframe column as a substring in another dataframe column
The idea is to create sets by splitting on , and to match them with issubset:
d = {k: set(v.split(',')) for k, v in df2.set_index('A')['B'].items()}
df1['B'] = [next(iter([k for k, v in d.items() if set(x.split(',')).issubset(v)]), '')
for x in df1['A']]
print (df1)
A B
0 9.female.ceo.,ceo,
1 9.female.ned.,ned,
2 9.female.ned.,chair,
3 2.female.ed.,ned, ,ned,
4 2.female.ned.,ed,
5 9.female.chair.,ceo, ,ceo,ned,
6 2.female.chair.,chair,
A solution that tests with in:
d = df2.set_index('A')['B']
df1['B'] = [next(iter([k for k, v in d.items() if x in v]), '') for x in df1['A']]
print (df1)
A B
0 9.female.ceo.,ceo,
1 9.female.ned.,ned,
2 9.female.ned.,chair,
3 2.female.ed.,ned, ,ned,
4 2.female.ned.,ed,
5 9.female.chair.,ceo, ,ceo,ned,
6 2.female.chair.,chair,
Another solution uses a cross join via merge and tests for substrings with in:
df3 = df1.assign(tmp=1).merge(df2.assign(tmp=1), on='tmp', suffixes=('','_'))
df3 = df3.loc[[a in b for a, b in zip(df3['A'], df3['B_'])], ['A','A_']]
df = df1[['A']].merge(df3.rename(columns={'A_':'B'}), on='A', how='left')
print (df)
A B
0 9.female.ceo.,ceo, NaN
1 9.female.ned.,ned, NaN
2 9.female.ned.,chair, NaN
3 2.female.ed.,ned, ,ned,
4 2.female.ned.,ed, NaN
5 9.female.chair.,ceo, ,ceo,ned,
6 2.female.chair.,chair, NaN
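On pandas 1.2 or later, the temporary tmp key is not needed because merge accepts how="cross". The sketch below assumes that version and uses small hypothetical frames rather than the data from the question.

```python
import pandas as pd

# Hypothetical data: for each df1["A"], find the df2 key whose "B" text contains it.
df1 = pd.DataFrame({"A": ["ceo", "ned", "chair"]})
df2 = pd.DataFrame({"A": ["k1", "k2"], "B": ["ceo,ned", "ed,cfo"]})

# Cross join: every row of df1 paired with every row of df2.
pairs = df1.merge(df2, how="cross", suffixes=("", "_"))

# Keep only the pairs where the left string is a substring of the right text.
mask = [a in b for a, b in zip(pairs["A"], pairs["B"])]
hits = pairs.loc[mask, ["A", "A_"]].rename(columns={"A_": "B"})

# Left join back so that unmatched rows of df1 are kept with NaN.
result = df1.merge(hits, on="A", how="left")
print(result)
```

A cross join produces len(df1) * len(df2) rows, so this is only practical for modest frame sizes.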
Python Pandas: Check if string in one column is contained in string of another column in the same row
You need apply with in:
df['C'] = df.apply(lambda x: x.A in x.B, axis=1)
print (df)
RecID A B C
0 1 a abc True
1 2 b cba True
2 3 c bca True
3 4 d bac False
4 5 e abc False
Another solution with a list comprehension is faster, but there must be no NaNs:
df['C'] = [x[0] in x[1] for x in zip(df['A'], df['B'])]
print (df)
RecID A B C
0 1 a abc True
1 2 b cba True
2 3 c bca True
3 4 d bac False
4 5 e abc False
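The list comprehension raises a TypeError when either column holds a missing value, because the in test needs strings on both sides. One possible workaround, sketched here with a hypothetical helper named contains and made-up data, is to treat any non-string value as a non-match:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in each column.
df = pd.DataFrame({"A": ["a", "d", np.nan, "e"],
                   "B": ["abc", "bac", "bca", np.nan]})

def contains(needle, haystack):
    # Only test when both values are real strings; anything else counts as no match.
    return isinstance(needle, str) and isinstance(haystack, str) and needle in haystack

df["C"] = [contains(a, b) for a, b in zip(df["A"], df["B"])]
print(df)
```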
Extract column value based on another column in Pandas
You could use loc to get the Series that satisfies your condition and then iloc to get its first element:
In [2]: df
Out[2]:
A B
0 p1 1
1 p1 2
2 p3 3
3 p2 4
In [3]: df.loc[df['B'] == 3, 'A']
Out[3]:
2 p3
Name: A, dtype: object
In [4]: df.loc[df['B'] == 3, 'A'].iloc[0]
Out[4]: 'p3'
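Note that .iloc[0] raises an IndexError when no row satisfies the condition. A minimal sketch of a guarded lookup follows; first_match is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({"A": ["p1", "p1", "p3", "p2"], "B": [1, 2, 3, 4]})

def first_match(frame, value, default=None):
    # Select column A for the rows where B equals the requested value.
    matches = frame.loc[frame["B"] == value, "A"]
    # Fall back to the default instead of raising when nothing matched.
    return matches.iloc[0] if not matches.empty else default

print(first_match(df, 3))
print(first_match(df, 99))
```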
Search columns for a specific set of text and if the text is found enter a new string of text in a new column pandas
There's definitely a more optimized solution, but I hope this puts you on the right path. It loops through each row, checks the joined columns against the candidate fuel strings, and decides which abbreviation to use:
d={'diesel':'DSL','gasoline':'GAS','ev':'ELEC'}
df['all'] = df.apply(''.join, axis=1)
for i,row in df.iterrows():
df.at[i,'FUEL'] = d[[key for key in d.keys() if key in row['all'].lower()][0]]
del df['all']
output:
SUMN SOUN MATN FUEL
0 Light duty vehicle Diesel Tire wear Rubber DSL
1 Heavy duty diesel Non-catalyst Diesel DSL
2 Light duty truck catalyst Gasoline GAS
3 Medium duty vehicle EV brake wear brakes ELEC
This assumes that exactly one of the fuel types occurs in each row.
EDIT: inspired by the other solution:
import re
d={'diesel':'DSL','gasoline':'GAS','ev':'ELEC'}
df['FUEL'] = df.apply(lambda x: d[re.search('gasoline|diesel|ev',''.join(x).lower()).group()], axis=1)
same output :)
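Both versions above fail when a row contains none of the fuel strings, and joining the columns with '' lets a short key such as ev match inside an unrelated word. A more defensive sketch is shown below. It assumes whole-word matching is what you want, and the column split in the sample frame is a guess, since only the printed output is shown above.

```python
import re
import pandas as pd

# Hypothetical data; the last row deliberately contains no fuel type.
df = pd.DataFrame({
    "SUMN": ["Light duty vehicle", "Heavy duty diesel", "Light duty truck",
             "Medium duty vehicle", "Seven seat bus"],
    "SOUN": ["Diesel", "Non-catalyst", "catalyst", "EV brake wear", "unknown"],
    "MATN": ["Tire wear Rubber", "Diesel", "Gasoline", "brakes", "dust"],
})

codes = {"diesel": "DSL", "gasoline": "GAS", "ev": "ELEC"}
# Whole-word, case-insensitive alternation built from the dictionary keys.
pattern = re.compile(r"\b(" + "|".join(map(re.escape, codes)) + r")\b", re.IGNORECASE)

def fuel_code(row):
    # Join with spaces so words from neighbouring columns do not run together.
    text = " ".join(row.astype(str))
    m = pattern.search(text)
    return codes[m.group(1).lower()] if m else None

df["FUEL"] = df.apply(fuel_code, axis=1)
print(df)
```

The last row gets a missing value rather than an exception, even though "Seven" contains the letters ev.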
Check if string is in a pandas dataframe
a['Names'].str.contains('Mel') will return an indicator vector of boolean values of size len(BabyDataSet).
Therefore, you can use:
mel_count=a['Names'].str.contains('Mel').sum()
if mel_count>0:
print ("There are {m} Mels".format(m=mel_count))
Or any(), if you don't care how many records match your query:
if a['Names'].str.contains('Mel').any():
print ("Mel is there")
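A few str.contains parameters are worth knowing when the column has missing values or the search text contains regex metacharacters. The frame below is hypothetical and only illustrates the na, case, and regex arguments.

```python
import numpy as np
import pandas as pd

# Hypothetical names column with one missing value.
a = pd.DataFrame({"Names": ["Mel", "Melissa", "carmel", np.nan, "A.J."]})

# na=False treats missing names as non-matches instead of propagating NaN.
mask = a["Names"].str.contains("Mel", na=False)

# case=False makes the search case-insensitive.
mask_ci = a["Names"].str.contains("mel", case=False, na=False)

# regex=False matches the text literally, so "." means a dot, not "any character".
mask_dot = a["Names"].str.contains(".", regex=False, na=False)

print(mask.sum(), mask_ci.sum(), mask_dot.sum())
```

Without na=False, the mask itself contains a missing value, which makes it unusable for boolean indexing.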
substring of an entire column in pandas dataframe
Use the str accessor with square brackets:
df['col'] = df['col'].str[:9]
Or str.slice:
df['col'] = df['col'].str.slice(0, 9)