Existing function for seeing if a row exists in a data frame?
Using the data from @Marek's answer:
nrow(merge(row_to_find,X))>0 # TRUE if exists
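A rough pandas analogue of this R one-liner is sketched below. It assumes a one-row frame `row_to_find` and a frame `X` with matching column names; both frames and their values are hypothetical. An inner `pd.merge` on the shared columns (its default) returns the matching rows, so a non-empty result means the row exists.

```python
import pandas as pd

# Hypothetical data: X is the frame to search, row_to_find holds one candidate row
X = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
row_to_find = pd.DataFrame({'a': [2], 'b': ['y']})

# Default pd.merge is an inner join on all shared columns; a non-empty result means a match
exists = len(pd.merge(row_to_find, X)) > 0
print(exists)
```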
Check if a row in one data frame exist in another data frame
You can use merge with the parameter indicator, then remove the column Rating and use numpy.where:
import numpy as np
import pandas as pd
df = pd.merge(df1, df2, on=['User','Movie'], how='left', indicator='Exist')
df.drop('Rating', inplace=True, axis=1)
df['Exist'] = np.where(df.Exist == 'both', True, False)
print (df)
User Movie Exist
0 1 333 False
1 1 1193 True
2 1 3 False
3 2 433 False
4 3 54 True
5 3 343 False
6 3 76 True
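The snippet above relies on existing `df1` and `df2`. A self-contained sketch of the same merge-with-indicator approach follows; the two small input frames and their values are hypothetical, not the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: df1 holds the rows to check, df2 holds the rated (User, Movie) pairs
df1 = pd.DataFrame({'User': [1, 1, 2], 'Movie': [333, 1193, 433]})
df2 = pd.DataFrame({'User': [1, 3], 'Movie': [1193, 54], 'Rating': [5, 4]})

# Left merge keeps every row of df1; the 'Exist' column records where each row was found
df = pd.merge(df1, df2, on=['User', 'Movie'], how='left', indicator='Exist')
df = df.drop(columns='Rating')

# 'both' means the (User, Movie) pair is also present in df2
df['Exist'] = np.where(df['Exist'] == 'both', True, False)
print(df)
```

If df2 can contain the same (User, Movie) pair more than once, the left merge repeats the matching df1 rows; calling `df2.drop_duplicates(['User', 'Movie'])` before merging avoids that.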
Check if a row exists in pandas
I think you need to compare the index values; the output is a numpy array of True and False values.
To reduce it to a scalar, use any to check whether at least one value is True, or all to check whether all values are True:
(df.index == 'entry').any()
(df.index == 'entry').all()
Another solution, from a comment by John Galt:
'entry' in df.index
To check for a substring:
df.index.str.contains('en').any()
Sample:
df = pd.DataFrame({'Apr 2013':[1,2,3]}, index=['entry','pdf','sum'])
print(df)
Apr 2013
entry 1
pdf 2
sum 3
print (df.index == 'entry')
[ True False False]
print ((df.index == 'entry').any())
True
print ((df.index == 'entry').all())
False
#check columns values
print ('entry' in df)
False
#same as explicitly calling columns (better readability)
print ('entry' in df.columns)
False
#check index values
print ('entry' in df.index)
True
#check columns values
print ('Apr 2013' in df)
True
#check columns values
print ('Apr 2013' in df.columns)
True
df = pd.DataFrame({'Apr 2013':[1,2,3]}, index=['entry','entry','entry'])
print(df)
Apr 2013
entry 1
entry 2
entry 3
print (df.index == 'entry')
[ True True True]
print ((df.index == 'entry').any())
True
print ((df.index == 'entry').all())
True
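The sample does not exercise the substring check. A short sketch using the same three-row frame shows the exact-label test next to the substring test; the `'zz'` pattern is just an illustrative non-match.

```python
import pandas as pd

df = pd.DataFrame({'Apr 2013': [1, 2, 3]}, index=['entry', 'pdf', 'sum'])

# Exact label lookup against the index returns a plain bool
has_entry = 'entry' in df.index
# Substring lookup: str.contains gives one bool per label, any() reduces them to one value
has_en = df.index.str.contains('en').any()
has_zz = df.index.str.contains('zz').any()
print(has_entry, has_en, has_zz)
```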
Pandas Check if a Row Exists Anywhere in a Column and Return True or False
You can use the Series.isin method against a list of values, so you need a proper list of the Description column values:
In [915]: vals = [x.split() for x in df.Description.values][0]
In [917]: df['Check'] = df.Keyword.isin(vals)
In [918]: df
Out[918]:
Keyword Description Check
0 spam eggs spam foo bar True
1 eggs True
2 house False
3 foo True
4 bar True
5 turtle False
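The `[0]` in the snippet above keeps only the words of the first Description. If matching words may come from any row, one variant is to split and flatten the whole column before calling `isin`. This is a sketch that assumes missing descriptions are stored as None/NaN; the frame mirrors the output above but is reconstructed, not the question's original data.

```python
import pandas as pd

# Hypothetical frame shaped like the one above: only the first row has a Description
df = pd.DataFrame({
    'Keyword': ['spam', 'eggs', 'house', 'foo', 'bar', 'turtle'],
    'Description': ['eggs spam foo bar', None, None, None, None, None],
})

# Split every non-missing Description into words and flatten them into one Series
vals = df['Description'].dropna().str.split().explode()

# isin marks each Keyword that appears anywhere among those words
df['Check'] = df['Keyword'].isin(vals)
print(df)
```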
How to check if values in one dataframe exist in another dataframe in R?
Try this, using %in% and a vector of all the values:
#Code
df1$reply <- df1$user_name %in% c(df2$name,df2$organisation)
Output:
df1
id reply user_name
1 1 TRUE John
2 2 TRUE Amazon
3 3 FALSE Bob
Some data used:
#Data1
df1 <- structure(list(id = 1:3, reply = c(NA, NA, NA), user_name = c("John",
"Amazon", "Bob")), class = "data.frame", row.names = c(NA, -3L
))
#Data2
df2 <- structure(list(name = c("John", "Pat"), organisation = c("Amazon",
"Apple")), class = "data.frame", row.names = c(NA, -2L))
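For comparison, a rough pandas counterpart of the `%in%` check uses `Series.isin` against both lookup columns pooled together. This is a sketch using the same sample values as the R data above; the `reply` column is simply created rather than pre-filled with NA.

```python
import pandas as pd

# Same sample values as the R data above
df1 = pd.DataFrame({'id': [1, 2, 3], 'user_name': ['John', 'Amazon', 'Bob']})
df2 = pd.DataFrame({'name': ['John', 'Pat'], 'organisation': ['Amazon', 'Apple']})

# Pool both lookup columns into one Series, then test membership like R's %in%
pool = pd.concat([df2['name'], df2['organisation']])
df1['reply'] = df1['user_name'].isin(pool)
print(df1)
```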
How to quickly check if row exists in PySpark Dataframe?
It would be better to create a Spark DataFrame from the entries that you want to look up, and then do a semi join or an anti join to get the rows that exist or do not exist in the lookup DataFrame. This should be more efficient than checking the entries one by one.
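import pyspark.sql.functions as F
# spark is an active SparkSession; lookup is the DataFrame being searched (columns A and B)
df = spark.createDataFrame([[2,5],[2,10]],['A','B'])
# semi join: rows of df that have a match in lookup
result1 = df.join(lookup, ['A','B'], 'semi').withColumn('exists', F.lit(True))
# anti join: rows of df with no match in lookup
result2 = df.join(lookup, ['A','B'], 'anti').withColumn('exists', F.lit(False))
result = result1.union(result2)
result.show()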
+---+---+------+
| A| B|exists|
+---+---+------+
| 2| 5| true|
| 2| 10| false|
+---+---+------+
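For small, local data, the same exists/does-not-exist split can be sketched in pandas. Note that this swaps in a hash-set lookup instead of a join, so it illustrates the idea rather than replacing the Spark approach for large data. The `lookup` contents are assumed, chosen only so the result matches the output above.

```python
import pandas as pd

# Entries to look up (same values as the Spark example)
df = pd.DataFrame([[2, 5], [2, 10]], columns=['A', 'B'])
# Hypothetical lookup frame: contains (2, 5) but not (2, 10)
lookup = pd.DataFrame([[2, 5], [3, 7]], columns=['A', 'B'])

# Build a set of (A, B) key tuples once, so each membership test is a hash lookup
keys = set(lookup[['A', 'B']].itertuples(index=False, name=None))
df['exists'] = [row in keys for row in df[['A', 'B']].itertuples(index=False, name=None)]
print(df)
```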
Pandas check if row exist in another dataframe and append index
You can do it this way:
Data (pay attention to the index of the B DataFrame):
In [276]: cols = ['SampleID', 'ParentID']
In [277]: A
Out[277]:
Real_ID SampleID ParentID Something AnotherThing
0 NaN 10 11 a b
1 NaN 20 21 a b
2 NaN 40 51 a b
In [278]: B
Out[278]:
SampleID ParentID
3 10 11
5 20 21
Solution:
In [279]: merged = pd.merge(A[cols], B, on=cols, how='outer', indicator=True)
In [280]: merged
Out[280]:
SampleID ParentID _merge
0 10 11 both
1 20 21 both
2 40 51 left_only
In [281]: B = pd.concat([B, merged.loc[merged._merge=='left_only', cols]])
In [282]: B
Out[282]:
SampleID ParentID
3 10 11
5 20 21
2 40 51
In [285]: A['Real_ID'] = pd.merge(A[cols], B.reset_index(), on=cols)['index']
In [286]: A
Out[286]:
Real_ID SampleID ParentID Something AnotherThing
0 3 10 11 a b
1 5 20 21 a b
2 2 40 51 a b
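A self-contained sketch of the same steps follows. It rebuilds A and B from the printed data and uses `.loc` in place of `.ix`, which was removed in pandas 1.0.

```python
import numpy as np
import pandas as pd

cols = ['SampleID', 'ParentID']
A = pd.DataFrame({
    'Real_ID': [np.nan] * 3,
    'SampleID': [10, 20, 40],
    'ParentID': [11, 21, 51],
    'Something': ['a'] * 3,
    'AnotherThing': ['b'] * 3,
})
B = pd.DataFrame({'SampleID': [10, 20], 'ParentID': [11, 21]}, index=[3, 5])

# Outer merge with indicator: 'left_only' rows exist in A but not in B
merged = pd.merge(A[cols], B, on=cols, how='outer', indicator=True)

# Append the missing pairs to B; they keep the merged frame's row labels as their index
B = pd.concat([B, merged.loc[merged['_merge'] == 'left_only', cols]])

# Map each (SampleID, ParentID) pair in A back to its index label in B
A['Real_ID'] = pd.merge(A[cols], B.reset_index(), on=cols)['index']
print(A)
```

The appended label (2 here) is just the row position in `merged`, so it could collide with an existing label in B. If B's labels must stay unique, assign fresh labels before concatenating.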
Check if row with correct values in dataframe exists and append if not
The idea is to use DataFrame.loc to set the value to 89: if the row does not exist, a new row is added; if it exists, its value is overwritten. DataFrame.astype is also added to convert back to the original dtypes when a new row is appended:
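import pandas as pd
df2 = pd.DataFrame({'id':[1,2,3,4],
                    'value':[23,34,45,56]})
df = pd.DataFrame({'id':[1,2,3,4,5],
                   'value':[23,34,45,56,67]})
def test(df, value_to_check):
    df = df.set_index('id')
    dtypes = df.dtypes
    # overwrites the value if the id exists, otherwise adds a new row
    df.loc[value_to_check, ['value']] = 89
    # restore the original dtypes (adding a row can upcast int64 to float64)
    return df.astype(dtypes).reset_index()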
df1 = test(df, 5)
print (df1)
id value
0 1 23
1 2 34
2 3 45
3 4 56
4 5 89
df1 = test(df2, 5)
print (df1)
id value
0 1 23
1 2 34
2 3 45
3 4 56
4 5 89
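A more explicit sketch of the same check-then-set idea is shown below. It skips set_index, tests for the id first, and then either overwrites the value or concatenates a new row. The helper name `set_or_append` is hypothetical. Unlike the loc approach, it needs no astype step, because concatenating two integer frames keeps the integer dtype.

```python
import pandas as pd

def set_or_append(df, id_to_check, new_value):
    """Hypothetical helper: overwrite 'value' for an existing id, else append a new row."""
    mask = df['id'] == id_to_check
    if mask.any():
        # Row exists: overwrite its value on a copy so the input frame is untouched
        out = df.copy()
        out.loc[mask, 'value'] = new_value
        return out
    # Row missing: concatenate a one-row frame; integer columns stay integer
    new_row = pd.DataFrame({'id': [id_to_check], 'value': [new_value]})
    return pd.concat([df, new_row], ignore_index=True)

df2 = pd.DataFrame({'id': [1, 2, 3, 4], 'value': [23, 34, 45, 56]})
df = pd.DataFrame({'id': [1, 2, 3, 4, 5], 'value': [23, 34, 45, 56, 67]})
print(set_or_append(df, 5, 89))
print(set_or_append(df2, 5, 89))
```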