How to Find Index of Match Between Two Sets of Data Frames

You can use match:

match(df2$V1, df1$V1)
#[1] 1 9 10 NA NA 16

If you want "-" instead of NA for values that have no match, you can use ifelse:

i1 <- match(df2$V1, df1$V1)
df2$myindex <- ifelse(is.na(i1), "-", i1)
df2
# V1 myindex
#1 AbC 1
#2 F 9
#3 GI666 10
#4 Dehli -
#5 Bangalore -
#6 Mumbai 16
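For comparison, a rough pandas analogue of R's match is Index.get_indexer, which returns 0-based positions and -1 for values that are not found (R's match is 1-based and returns NA). The sketch below uses hypothetical toy data and assumes the lookup values in df1['V1'] are unique, because get_indexer raises an error on a non-unique index.

```python
import pandas as pd

# Hypothetical toy data; only the column name V1 comes from the R example
df1 = pd.DataFrame({"V1": ["AbC", "x", "F", "GI666", "Mumbai"]})
df2 = pd.DataFrame({"V1": ["AbC", "F", "GI666", "Dehli", "Bangalore", "Mumbai"]})

# 0-based position of each df2 value within df1['V1']; -1 marks "no match"
pos = pd.Index(df1["V1"]).get_indexer(df2["V1"])

# Convert to R-style 1-based positions and show "-" for non-matches
df2["myindex"] = [str(p + 1) if p >= 0 else "-" for p in pos]
print(df2)
```

Like the ifelse version, the resulting column holds strings, since "-" and numbers are mixed.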

Get Index of Matching String from Two DataFrames

Use apply with map:

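# Series mapping each value of column 'X' in Y to its row label in Y
# (the values in Y['X'] must be unique for map to work)
Y = Y.reset_index().set_index('X')['index']
# Replace each cell of X with the matching row label (NaN if not found)
X = X.apply(lambda x: x.map(Y))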
print(X)
a b c
0 0 4 6.0
1 3 1 5.0
2 2 7 NaN
3 9 8 7.0
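A self-contained version of the map approach on small hypothetical frames (only the column name X is taken from the answer; the lookup Series is renamed to avoid overwriting Y). It assumes the values in Y['X'] are unique, since Series.map with a Series whose index has duplicates raises an error. A column that contains a non-match is upcast to float because of the NaN, which is why column c above shows 6.0 rather than 6.

```python
import pandas as pd

# Hypothetical toy data: X holds strings to look up, Y has a column named 'X'
X = pd.DataFrame({"a": ["p", "q"], "b": ["r", "zz"]})
Y = pd.DataFrame({"X": ["p", "r", "q"]})

# Series mapping each value of Y['X'] to its row label in Y
lookup = Y.reset_index().set_index("X")["index"]

# Replace every cell of X with the matching row label; unmatched cells become NaN
result = X.apply(lambda col: col.map(lookup))
print(result)
```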

Is there a way to get elements of a data frame where the index elements match the elements of a list?

Use MultiIndex.get_level_values with Index.isin; loc is not necessary here:

df[df.index.get_level_values('Index 1').isin(['c', 'a', 'd'])]
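A small runnable sketch on a hypothetical two-level index (the level name 'Index 1' comes from the answer; the data and the second level are made up). The result keeps the frame's own row order, not the order of the list.

```python
import pandas as pd

# Hypothetical two-level index; only the level name 'Index 1' is from the answer
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)],
    names=["Index 1", "Index 2"],
)
df = pd.DataFrame({"val": [10, 20, 30, 40, 50]}, index=idx)

# Boolean mask: True where the 'Index 1' label is in the list
mask = df.index.get_level_values("Index 1").isin(["c", "a", "d"])
subset = df[mask]
print(subset)
```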

Pandas: How to keep track of the indices for the matching data entries between two dataframes?

Use reset_index and groupby on column B of R2 to get the list of index labels for each value, then chain with .loc to keep only the values that appear in R1.B:

R2.reset_index().groupby('B')['index'].apply(list).loc[R1.B.unique()]  # append .to_dict() if you need a dict
B
2 [0]
6 [2, 3]
7 [4]
Name: index, dtype: object
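A runnable sketch with hypothetical R1 and R2 frames, chosen so that the result matches the output above. If some value of R1.B never appears in R2.B, .loc raises a KeyError; .reindex(...) would return NaN for that value instead.

```python
import pandas as pd

# Hypothetical data: R2 may repeat B values, R1 lists the B values of interest
R2 = pd.DataFrame({"B": [2, 5, 6, 6, 7]})
R1 = pd.DataFrame({"B": [2, 6, 7, 6]})

# For each B value in R2, collect the list of row labels where it occurs
groups = R2.reset_index().groupby("B")["index"].apply(list)

# Keep only the B values that also appear in R1 (in R1's order of appearance)
matches = groups.loc[R1["B"].unique()]
print(matches.to_dict())
```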

Compare two dataframes by index with unique indexes

It seems you only want to keep comparison results for indices that exist in both dataframes. In that case, you can get the common set of indices with

idx = df1.index.intersection(df2.index)

then

df1.loc[idx].eq(df2.loc[idx])

or

df1.eq(df2).loc[idx]
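A minimal check on hypothetical frames with unique but only partly overlapping labels. The == operator would raise here because the labels differ, while .eq aligns the two frames first, so both variants above give the same result on the shared labels.

```python
import pandas as pd

# Hypothetical frames with partly overlapping, unique index labels
df1 = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"x": [7, 1, 9]}, index=["b", "a", "d"])

# Labels present in both frames
idx = df1.index.intersection(df2.index)

# Variant 1: restrict both frames to the shared labels, then compare
result = df1.loc[idx].eq(df2.loc[idx])

# Variant 2: let .eq align the frames, then keep only the shared labels
result_alt = df1.eq(df2).loc[idx]
print(result)
```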

How to find the same index between two dataframes and combine it to a new dataframe, Python 3

Option 1
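You could concatenate the two dataframes side by side, drop the rows missing from either one, and average the columns that share a name. Since mean(level=...) was removed in pandas 2.0, group the transposed frame instead:

pd.concat([df1, df2], axis=1).dropna().T.groupby(level=0).mean().T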

A B
apple 0.0 7.5
banana 2.5 7.5

If you only want column A, this should suffice:

pd.concat([df1, df2], axis=1).dropna()['A'].mean(axis=1).to_frame('A')

A
apple 0.0
banana 2.5

Option 2
An alternative is to find the shared index labels with Index.intersection and select them with loc:

i = df1.index.intersection(df2.index)

df1.loc[i, ['A']].add(df2.loc[i, ['A']]).div(2)

A
Name
apple 0.0
banana 2.5
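To check that both options agree, here is a sketch on hypothetical data chosen to reproduce the numbers above. Option 1 is written without mean(level=...), which was removed in pandas 2.0, by transposing, grouping the duplicated column labels, and transposing back; Option 2 is applied to all columns rather than just A. Both are assumptions about the intended behavior, not the original answer's exact code.

```python
import pandas as pd

# Hypothetical frames indexed by fruit name; only labels in both should survive
df1 = pd.DataFrame({"A": [0.0, 2.0, 4.0], "B": [7.0, 8.0, 1.0]},
                   index=pd.Index(["apple", "banana", "cherry"], name="Name"))
df2 = pd.DataFrame({"A": [0.0, 3.0, 6.0], "B": [8.0, 7.0, 2.0]},
                   index=pd.Index(["apple", "banana", "durian"], name="Name"))

# Option 1: side-by-side concat, drop rows missing from either frame,
# then average same-named columns (transpose, group labels, transpose back)
opt1 = pd.concat([df1, df2], axis=1).dropna().T.groupby(level=0).mean().T

# Option 2: select the shared labels explicitly and average
i = df1.index.intersection(df2.index)
opt2 = df1.loc[i].add(df2.loc[i]).div(2)

print(opt1)
print(opt2)
```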

Python Pandas - Find difference between two data frames

Use drop_duplicates:

pd.concat([df1,df2]).drop_duplicates(keep=False)

Update:

The above method only works for data frames that do not already contain duplicate rows. For example:

df1=pd.DataFrame({'A':[1,2,3,3],'B':[2,3,4,4]})
df2=pd.DataFrame({'A':[1],'B':[2]})

It gives the output below, which is wrong.

Wrong output:

pd.concat([df1, df2]).drop_duplicates(keep=False)
Out[655]:
A B
1 2 3

Correct output:

Out[656]: 
A B
1 2 3
2 3 4
3 3 4


How can we achieve that?

Method 1: isin with row tuples

df1[~df1.apply(tuple,1).isin(df2.apply(tuple,1))]
Out[657]:
A B
1 2 3
2 3 4
3 3 4

Method 2: merge with indicator

df1.merge(df2,indicator = True, how='left').loc[lambda x : x['_merge']!='both']
Out[421]:
A B _merge
1 2 3 left_only
2 3 4 left_only
3 3 4 left_only
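The pitfall and both fixes can be reproduced end to end with the df1 and df2 from the update. The only change to Method 2 is dropping the helper _merge column at the end, which is optional.

```python
import pandas as pd

# Data from the update: df1 contains a duplicated row (3, 4)
df1 = pd.DataFrame({"A": [1, 2, 3, 3], "B": [2, 3, 4, 4]})
df2 = pd.DataFrame({"A": [1], "B": [2]})

# Pitfall: keep=False also drops rows that are duplicated within df1 itself
naive = pd.concat([df1, df2]).drop_duplicates(keep=False)

# Method 1: rows of df1 whose tuple of values does not occur in df2
m1 = df1[~df1.apply(tuple, axis=1).isin(df2.apply(tuple, axis=1))]

# Method 2: left merge with an indicator column, keep rows not found in df2
m2 = (df1.merge(df2, indicator=True, how="left")
         .loc[lambda x: x["_merge"] != "both"]
         .drop(columns="_merge"))

print(naive, m1, m2, sep="\n\n")
```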

Comparing two dataframes and getting the differences

The approach df1 != df2 works only for dataframes with identically labeled rows and columns. Both dataframes' axes are compared with the _indexed_same method, and an exception is raised if any difference is found, even in the order of the columns or index labels.

If I understand you correctly, you want not the changes but the symmetric difference. One approach is to concatenate the dataframes:

>>> df = pd.concat([df1, df2])
>>> df = df.reset_index(drop=True)

Group by all columns:

>>> df_gpby = df.groupby(list(df.columns))

Get the index of the records that occur only once:

>>> idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]

Then filter:

>>> df.reindex(idx)
Date Fruit Num Color
9 2013-11-25 Orange 8.6 Orange
8 2013-11-25 Apple 22.1 Red
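A self-contained sketch of the same steps on hypothetical frames, since the Date/Fruit/Num/Color data from the question is not shown in full. Two caveats: a row duplicated within a single frame is treated as shared, and groupby skips rows whose key columns contain NaN by default (dropna=True).

```python
import pandas as pd

# Hypothetical frames sharing two rows; each also has one row the other lacks
df1 = pd.DataFrame({"Fruit": ["Apple", "Banana", "Grape"], "Num": [1.0, 2.0, 3.0]})
df2 = pd.DataFrame({"Fruit": ["Apple", "Banana", "Kiwi"], "Num": [1.0, 2.0, 4.0]})

# Stack both frames with a fresh 0..n-1 index
df = pd.concat([df1, df2]).reset_index(drop=True)

# Group identical rows: the group key is the full tuple of column values
groups = df.groupby(list(df.columns)).groups

# Rows whose values occur exactly once belong to only one of the two frames
idx = [labels[0] for labels in groups.values() if len(labels) == 1]

sym_diff = df.reindex(idx)
print(sym_diff)
```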

R - Find index of element from one dataframe and place in another

We can use match to create the new column:

df1$OccurrenceIndex <- match(df2$Activity, df1$Activity)
df1
# Activity NoOfOccurrences OccurrenceIndex
#1 Walking 38 3
#2 Jogging 26 1
#3 Running 12 2

