Existing Function for Seeing If a Row Exists in a Data Frame

Existing function for seeing if a row exists in a data frame?

Using the data from @Marek's answer:

nrow(merge(row_to_find, X)) > 0 # TRUE if the row exists

Check if a row in one data frame exists in another data frame

You can use merge with the indicator parameter, then drop the Rating column and use numpy.where:

df = pd.merge(df1, df2, on=['User','Movie'], how='left', indicator='Exist')
df.drop('Rating', inplace=True, axis=1)
df['Exist'] = np.where(df.Exist == 'both', True, False)
print (df)
   User  Movie  Exist
0     1    333  False
1     1   1193   True
2     1      3  False
3     2    433  False
4     3     54   True
5     3    343  False
6     3     76   True
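
The input frames are not shown in the answer; for a self-contained run, a minimal pair consistent with the output above could look like this (the Rating values and the pairs in df2 here are only placeholders):

import pandas as pd
import numpy as np

# assumed inputs: df1 holds the ratings, df2 holds the (User, Movie) pairs to check against
df1 = pd.DataFrame({'User':  [1, 1, 1, 2, 3, 3, 3],
                    'Movie': [333, 1193, 3, 433, 54, 343, 76],
                    'Rating':[3, 5, 4, 2, 4, 3, 5]})   # placeholder ratings
df2 = pd.DataFrame({'User':  [1, 3, 3],
                    'Movie': [1193, 54, 76]})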

Check if a row exists in pandas

I think you need to compare the index values - the output is a boolean numpy array.
For a scalar, use any to check whether at least one value is True, or all to check whether all of them are True:

(df.index == 'entry').any()

(df.index == 'entry').all()

Another solution, from John Galt's comment:

'entry' in df.index

If you need to check for a substring:

df.index.str.contains('en').any()

Sample:

df = pd.DataFrame({'Apr 2013':[1,2,3]}, index=['entry','pdf','sum'])
print(df)
       Apr 2013
entry         1
pdf           2
sum           3

print (df.index == 'entry')
[ True False False]

print ((df.index == 'entry').any())
True
print ((df.index == 'entry').all())
False

#check columns values
print ('entry' in df)
False
#same as explicitly calling columns (better readability)
print ('entry' in df.columns)
False
#check index values
print ('entry' in df.index)
True
#check columns values
print ('Apr 2013' in df)
True
#check columns values
print ('Apr 2013' in df.columns)
True
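
#check substring in index values
print (df.index.str.contains('en').any())
True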

df = pd.DataFrame({'Apr 2013':[1,2,3]}, index=['entry','entry','entry'])
print(df)
       Apr 2013
entry         1
entry         2
entry         3

print (df.index == 'entry')
[ True True True]

print ((df.index == 'entry').any())
True
print ((df.index == 'entry').all())
True

Pandas Check if a Row Exists Anywhere in a Column and Return True or False

You can use the Series.isin method against a list of values, so you need a proper list of the Description column's words - here, the split words of the first Description value:

In [915]: vals = [x.split() for x in df.Description.values][0]
In [917]: df['Check'] = df.Keyword.isin(vals)

In [918]: df
Out[918]:
  Keyword        Description  Check
0    spam  eggs spam foo bar   True
1    eggs                       True
2   house                      False
3     foo                       True
4     bar                       True
5  turtle                      False
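
The df used here is not shown in the answer; for a self-contained run it can be rebuilt from the printed result above (the empty Description cells are an assumption):

import pandas as pd

df = pd.DataFrame({'Keyword': ['spam', 'eggs', 'house', 'foo', 'bar', 'turtle'],
                   'Description': ['eggs spam foo bar', '', '', '', '', '']})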

How to check if values in one dataframe exist in another dataframe in R?

Try this using %in% and a vector of all the values:

#Code
df1$reply <- df1$user_name %in% c(df2$name,df2$organisation)

Output:

df1
  id reply user_name
1  1  TRUE      John
2  2  TRUE    Amazon
3  3 FALSE       Bob

Some data used:

#Data1
df1 <- structure(list(id = 1:3, reply = c(NA, NA, NA), user_name = c("John",
"Amazon", "Bob")), class = "data.frame", row.names = c(NA, -3L
))

#Data2
df2 <- structure(list(name = c("John", "Pat"), organisation = c("Amazon",
"Apple")), class = "data.frame", row.names = c(NA, -2L))

How to quickly check if row exists in PySpark Dataframe?

It would be better to create a Spark DataFrame from the entries that you want to look up, and then do a semi join or an anti join to get the rows that do or do not exist in the lookup DataFrame. This should be more efficient than checking the entries one by one.

import pyspark.sql.functions as F

df = spark.createDataFrame([[2,5],[2,10]],['A','B'])
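
# 'lookup' is the reference DataFrame from the question; a minimal stand-in
# consistent with the result shown below would be (assumed values):
lookup = spark.createDataFrame([[2, 5]], ['A', 'B'])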

result1 = df.join(lookup, ['A','B'], 'semi').withColumn('exists', F.lit(True))

result2 = df.join(lookup, ['A','B'], 'anti').withColumn('exists', F.lit(False))

result = result1.unionAll(result2)

result.show()
+---+---+------+
|  A|  B|exists|
+---+---+------+
|  2|  5|  true|
|  2| 10| false|
+---+---+------+

Pandas check if row exists in another dataframe and append index

You can do it this way:

Data (pay attention to the index in the B DF):

In [276]: cols = ['SampleID', 'ParentID']

In [277]: A
Out[277]:
  Real_ID  SampleID  ParentID Something AnotherThing
0     NaN        10        11         a            b
1     NaN        20        21         a            b
2     NaN        40        51         a            b

In [278]: B
Out[278]:
   SampleID  ParentID
3        10        11
5        20        21
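
For a self-contained run, both frames can be rebuilt from the displays above (a reconstruction; the B index values 3 and 5 are the point of the example):

import pandas as pd
import numpy as np

A = pd.DataFrame({'Real_ID': [np.nan, np.nan, np.nan],
                  'SampleID': [10, 20, 40],
                  'ParentID': [11, 21, 51],
                  'Something': ['a', 'a', 'a'],
                  'AnotherThing': ['b', 'b', 'b']})
B = pd.DataFrame({'SampleID': [10, 20], 'ParentID': [11, 21]}, index=[3, 5])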

Solution:

In [279]: merged = pd.merge(A[cols], B, on=cols, how='outer', indicator=True)

In [280]: merged
Out[280]:
   SampleID  ParentID     _merge
0        10        11       both
1        20        21       both
2        40        51  left_only

In [281]: B = pd.concat([B, merged.loc[merged._merge=='left_only', cols]])

In [282]: B
Out[282]:
   SampleID  ParentID
3        10        11
5        20        21
2        40        51

In [285]: A['Real_ID'] = pd.merge(A[cols], B.reset_index(), on=cols)['index']

In [286]: A
Out[286]:
  Real_ID  SampleID  ParentID Something AnotherThing
0        3        10        11         a            b
1        5        20        21         a            b
2        2        40        51         a            b

Check if row with correct values in dataframe exists and append if not

The idea is to use DataFrame.loc to set the value to 89 - if the id does not exist, a new row is added; if it exists, the value is overwritten. DataFrame.astype is also used to convert back to the original dtypes in case a new row was appended:

df2 = pd.DataFrame({'id':[1,2,3,4],
                    'value':[23,34,45,56]})

df = pd.DataFrame({'id':[1,2,3,4,5],
                   'value':[23,34,45,56,67]})

def test(df, value_to_check):
    df = df.set_index('id')
    dtypes = df.dtypes
    df.loc[value_to_check, ['value']] = 89
    return df.astype(dtypes).reset_index()

df1 = test(df, 5)
print (df1)
   id  value
0   1     23
1   2     34
2   3     45
3   4     56
4   5     89

df1 = test(df2, 5)
print (df1)
   id  value
0   1     23
1   2     34
2   3     45
3   4     56
4   5     89

