Python Pandas: Check if string in one column is contained in string of another column in the same row
You need apply with in:
df['C'] = df.apply(lambda x: x.A in x.B, axis=1)
print (df)
RecID A B C
0 1 a abc True
1 2 b cba True
2 3 c bca True
3 4 d bac False
4 5 e abc False
Another solution with a list comprehension is faster, but it only works if neither column contains NaN values:
df['C'] = [x[0] in x[1] for x in zip(df['A'], df['B'])]
print (df)
RecID A B C
0 1 a abc True
1 2 b cba True
2 3 c bca True
3 4 d bac False
4 5 e abc False
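If the columns may contain NaN, the list comprehension can be guarded with a type check. This is a minimal sketch on made-up data (the frame below is hypothetical, not the one from the question), and it treats any row with a missing value as "no match", which is an assumption about the desired behaviour:

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in both columns.
df = pd.DataFrame({
    "A": ["a", "b", np.nan, "d"],
    "B": ["abc", np.nan, "bca", "bac"],
})

# Only test rows where both values are strings; anything else counts as no match.
df["C"] = [
    isinstance(a, str) and isinstance(b, str) and a in b
    for a, b in zip(df["A"], df["B"])
]
print(df)
```

Without the isinstance checks, `a in b` would raise a TypeError as soon as one side is a float NaN.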
Pandas: check if string value in one column is part of string of another column in same row of dataframe - current script returning all Yes
In your case
df['Compare'] = df.apply(lambda x: 'Yes' if x['DEPTH'] in x['Description'] else 'No',axis=1)
df
Out[133]:
WKEY Description DEPTH Compare
0 50030 36 @ 3159 W/270, LWD[GR,RES,PWD] @ 4015 3159 Yes
1 50030 36 @ 3159 W/270, LWD[GR,RES,PWD] @ 4015 3994 No
2 50030 36 @ 3159 W/270, LWD[GR,RES,PWD] @ 4015 5401 No
3 50030 26 @ 3994, LWD[GR,RES,PWD] @ 5430, 20 @ 5401 3159 No
4 50030 26 @ 3994, LWD[GR,RES,PWD] @ 5430, 20 @ 5401 3994 Yes
5 50030 26 @ 3994, LWD[GR,RES,PWD] @ 5430, 20 @ 5401 5401 Yes
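The `in` test above only works when both values are strings. The question does not show the dtypes, so as an assumption: if DEPTH is stored as a number, it has to be converted with str first. A small sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows; DEPTH is assumed to be numeric here.
df = pd.DataFrame({
    "Description": ["36 @ 3159 W/270", "26 @ 3994, 20 @ 5401"],
    "DEPTH": [3159, 3159],
})

# Convert the number to text before the substring test.
df["Compare"] = [
    "Yes" if str(depth) in desc else "No"
    for depth, desc in zip(df["DEPTH"], df["Description"])
]
print(df)
```

Note that this is a plain substring test, so a depth of 315 would also match a description containing 3159.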
Pandas: Determine if a string in one column is a substring of a string in another column
Let us try NumPy's char (defchararray) functions, which are vectorized. They are available as np.char, which avoids importing from the private numpy.core module:
import numpy as np
np.char.find(df['1'].values.astype(str), df['0'].values.astype(str)) != -1
Out[740]: array([False, True, True, False])
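For a version that can be run as-is, here is a sketch on a small hypothetical frame (the column names '0' and '1' follow the answer; the values are made up). Column '1' is the string being searched and column '0' is the substring:

```python
import numpy as np
import pandas as pd

# Hypothetical data: is the text in column '0' contained in column '1'?
df = pd.DataFrame({"0": ["ab", "cd", "x"], "1": ["xaby", "abc", "x"]})

haystack = df["1"].to_numpy().astype(str)
needle = df["0"].to_numpy().astype(str)

# np.char.find returns the first index of the substring, or -1 if absent.
mask = np.char.find(haystack, needle) != -1
print(mask)
```

One caveat: astype(str) turns NaN into the text 'nan', so missing values should probably be dropped or filled before using this approach.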
Check if a column contains data from another column in python pandas
Split the name into substrings, and use a list comprehension with any to get True if any substring matches:
df['result'] = [any(s in url for s in lst)
for lst, url in zip(df['name'].str.split(), df['url'])]
The (slower) equivalent with apply would be:
df['result'] = df.apply(lambda x: any(s in x['url']
for s in x['name'].split()), axis=1)
output:
name url result
0 pau lola www.paulola.com True
1 pou gine www.cheeseham.com False
2 pete raj www.pataraj.com True
Pandas dataframe: Check if regex contained in a column matches a string in another column in the same row
You can't use a pandas builtin method directly. You will need to apply re.search per row:
import re
mask = df.apply(lambda r: bool(re.search(r['patterns'], r['strings'])), axis=1)
df2 = df[mask]
or using a (faster) list comprehension:
mask = [bool(re.search(p,s)) for p,s in zip(df['patterns'], df['strings'])]
output:
strings patterns group
0 apple \ba 1
3 train n\b 2
4 tan n\b 2
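The answer does not show its input frame, so the following is only a sketch with hypothetical data using the same column names. It shows the list-comprehension variant end to end:

```python
import re
import pandas as pd

# Hypothetical data: each row carries its own regex pattern.
df = pd.DataFrame({
    "strings": ["apple", "banana", "train"],
    "patterns": [r"\ba", r"\ba", r"n\b"],
})

# Search each string with the pattern from the same row.
mask = [bool(re.search(p, s)) for p, s in zip(df["patterns"], df["strings"])]
df2 = df[mask]
print(df2)
```

Here "banana" is dropped because none of its a's sits at the start of a word, while "apple" and "train" match their patterns.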
Check if string in one column is contained in string of another column in the same row and add new column with matching column name
>>> df
to_find col1 col2
0 a ab ac
1 b aa ba
2 c bc ee
>>> df['found_in'] = df.apply(lambda x: ' '.join(x.iloc[1:][x.iloc[1:].str.contains(str(x['to_find']))].index), axis=1)
>>> df
to_find col1 col2 found_in
0 a ab ac col1 col2
1 b aa ba col2
2 c bc ee col1
For better readability,
>>> def get_columns(x):
...     y = x.iloc[1:]
...     return y.index[y.str.contains(str(x['to_find']))]
...
>>> df['found_in'] = df.apply(lambda x: ' '.join(get_columns(x)), axis=1)
Returning yes in one column if it contains a string in another column
MaterialsTracking_df['Valid Site'] = "Y" if ...
evaluates the condition only once and assigns the single resulting value to all rows.
Use pandas.DataFrame.apply with axis=1 instead, so the test runs once per row:
https://pandas.pydata.org/pandas-docs/version/0.24.2/reference/api/pandas.DataFrame.apply.html
Example (I added another dummy row where the condition is not met):
import pandas as pd
from io import StringIO
Materials_Tracking_df = pd.read_csv(StringIO("""
EXAM;Scoring Site DBN;Exams for this Site
MXRC;04M435;MXRC, MXRK, MXRN
MXRC;04M435;MXRC, MXRK, MXRN
SXRK;03M076;SXRK, SXRU
MXRC;04M435;MXRC, MXRK, MXRN
SXRK;04____;MXRC, MXRK, MXRN
"""), sep=';')
Materials_Tracking_df['Valid Site'] = Materials_Tracking_df.apply(
    lambda r: 'T' if r['EXAM'] in r['Exams for this Site'] else 'N',
    axis=1)
EXAM Scoring Site DBN Exams for this Site Valid Site
0 MXRC 04M435 MXRC, MXRK, MXRN T
1 MXRC 04M435 MXRC, MXRK, MXRN T
2 SXRK 03M076 SXRK, SXRU T
3 MXRC 04M435 MXRC, MXRK, MXRN T
4 SXRK 04____ MXRC, MXRK, MXRN N
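Keep in mind that `in` on two strings is a substring test, so a partial code such as "SXR" would also count as found in "SXRK, SXRU". If whole codes have to match, one option is to split the list first. This is a sketch on hypothetical rows, and the "Y"/"N" labels are just placeholders:

```python
import pandas as pd

# Hypothetical rows; the second EXAM value is only a partial code.
df = pd.DataFrame({
    "EXAM": ["MXRC", "SXR", "SXRK"],
    "Exams for this Site": ["MXRC, MXRK, MXRN", "SXRK, SXRU", "MXRC, MXRK"],
})

# Split the comma-separated list and compare whole codes, not substrings.
df["Valid Site"] = [
    "Y" if exam in [e.strip() for e in exams.split(",")] else "N"
    for exam, exams in zip(df["EXAM"], df["Exams for this Site"])
]
print(df)
```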
python pandas - Check if partial string in column exists in other column
This looks like an expensive operation. You can try:
df['col2'].apply(lambda x: 'Yes' if df['col1'].str.contains(x).any() else 'No')
Output:
0 No
1 Yes
2 Yes
Name: col2, dtype: object
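By default str.contains treats its argument as a regular expression, so values containing characters like + or ( can give surprising results. Assuming a literal match is what is wanted, regex=False can be passed. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical data; "a+b" would be read as a regex without regex=False.
df = pd.DataFrame({
    "col1": ["a+b=c", "hello world", "foo"],
    "col2": ["a+b", "world", "bar"],
})

# For each col2 value, check whether it occurs literally anywhere in col1.
df["found"] = [
    "Yes" if df["col1"].str.contains(x, regex=False).any() else "No"
    for x in df["col2"]
]
print(df)
```

This still scans the whole of col1 for every value in col2, so the cost grows with the product of the two column lengths.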
Most efficient way of checking whether a string is present in another column's values in Pandas
Try using issubset() with str.split():
df["check1_output"] = df.apply(lambda x: set(x["items_check1"].split(",")).issubset(x["all_items"].split(",")), axis=1)
df["check2_output"] = df.apply(lambda x: set(x["items_check2"].split(",")).issubset(x["all_items"].split(",")), axis=1)
>>> df
id all_items ... check1_output check2_output
0 1239 foobar,foo,foofoo,bar ... True True
1 3298 foobar,foo ... True False
2 9384 foo,bar ... False True
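Part of the frame is elided in the output above, so here is a self-contained sketch with hypothetical values for one of the check columns. It uses a list comprehension and the `<=` operator, which is the subset test for sets:

```python
import pandas as pd

# Hypothetical data; the values are made up for illustration.
df = pd.DataFrame({
    "all_items": ["foobar,foo,bar", "foobar,foo"],
    "items_check1": ["foo,bar", "foo,bar"],
})

# True only if every item to check appears as a whole item in all_items.
df["check1_output"] = [
    set(items.split(",")) <= set(all_items.split(","))
    for items, all_items in zip(df["items_check1"], df["all_items"])
]
print(df)
```

Because the comparison is on whole items, "foo" is not considered found just because "foobar" is present.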