How to find the index of matches between two data frames
You can use match:
match(df2$V1, df1$V1)
#[1] 1 9 10 NA NA 16
If you do not want NA and would rather show "-", you can use ifelse:
i1 <- match(df2$V1, df1$V1)
df2$myindex <- ifelse(is.na(i1), "-", i1)
df2
# V1 myindex
#1 AbC 1
#2 F 9
#3 GI666 10
#4 Dehli -
#5 Bangalore -
#6 Mumbai 16
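For readers working in pandas rather than R, a rough analogue of match can be sketched with Index.get_indexer. The sample data below is hypothetical (the original df1 and df2 are not shown in the answer). Unlike match, the positions are 0-based, a missing value is reported as -1 rather than NA, and the lookup column must hold unique values.

```python
import pandas as pd

# Hypothetical sample data; the original df1 and df2 are not shown in the answer.
df1 = pd.DataFrame({"V1": ["AbC", "x", "F", "GI666"]})
df2 = pd.DataFrame({"V1": ["AbC", "F", "Dehli"]})

# get_indexer returns the 0-based position of each df2 value inside df1,
# and -1 where there is no match (R's match is 1-based and returns NA).
# It requires the values in df1["V1"] to be unique.
positions = pd.Index(df1["V1"]).get_indexer(df2["V1"])

# Replace the "no match" marker with "-", mirroring the ifelse step above.
df2["myindex"] = ["-" if p == -1 else str(p) for p in positions]
print(df2)
```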
Get the index of matching strings from two dataframes
Use apply with map:
Y = Y.reset_index().set_index('X')['index']
X = X.apply(lambda x: x.map(Y))
print(X)
a b c
0 0 4 6.0
1 3 1 5.0
2 2 7 NaN
3 9 8 7.0
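Since the input frames are not shown, here is a minimal self-contained sketch of the same idea. The contents of X and Y are hypothetical; the only assumption taken from the answer is that Y has a column named X holding the strings to look up.

```python
import pandas as pd

# Hypothetical data: Y holds the lookup strings in column "X",
# X holds the strings whose positions in Y we want.
Y = pd.DataFrame({"X": ["apple", "pear", "kiwi"]})
X = pd.DataFrame({"a": ["kiwi", "apple"], "b": ["pear", "plum"]})

# Series that maps each string to its row index in Y.
lookup = Y.reset_index().set_index("X")["index"]

# Map every column of X through the lookup; strings absent from Y become NaN.
result = X.apply(lambda col: col.map(lookup))
print(result)
```

A column that contains an unmatched string ends up as float, because NaN cannot be stored in an integer column. That is why column c in the output above shows 6.0 rather than 6.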
Is there a way to get elements of a data frame where the index elements match the elements of a list
Use MultiIndex.get_level_values with Index.isin; loc is not necessary here:
df[df.index.get_level_values('Index 1').isin(['c', 'a', 'd'])]
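A small runnable sketch of that filter follows. The frame and its second level name ("Index 2") are made up for illustration; only the level name 'Index 1' and the list of wanted labels come from the answer.

```python
import pandas as pd

# Hypothetical frame with a two-level index; only "Index 1" is named in the answer.
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("b", 1), ("c", 2), ("e", 2)], names=["Index 1", "Index 2"]
)
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=idx)

wanted = ["c", "a", "d"]  # "d" is not present and is simply ignored

# Boolean mask over the rows, built from the first index level only.
mask = df.index.get_level_values("Index 1").isin(wanted)
subset = df[mask]
print(subset)
```

The rows come back in their original order, not in the order of the list.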
Pandas: How to keep track of the indices for the matching data entries between two dataframes?
Use reset_index and groupby on column B of R2 to build the list of index values for each value of B, then chain .loc to keep only the values of B that occur in R1:
R2.reset_index().groupby('B')['index'].apply(list).loc[R1.B.unique()]
# If you need a dict, add .to_dict() at the end
B
2 [0]
6 [2, 3]
7 [4]
Name: index, dtype: object
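Here is a self-contained version of that chain. R1 and R2 are hypothetical frames chosen so that the result matches the output shown above. Note that .loc raises a KeyError if R1 contains a value of B that never appears in R2.

```python
import pandas as pd

# Hypothetical frames chosen to reproduce the output shown above.
R1 = pd.DataFrame({"B": [2, 6, 7]})
R2 = pd.DataFrame({"B": [2, 5, 6, 6, 7]})

# For every value of B in R2, collect the original row indices into a list.
index_lists = R2.reset_index().groupby("B")["index"].apply(list)

# Keep only the values of B that occur in R1
# (.loc raises KeyError if one of them is missing from R2).
matches = index_lists.loc[R1["B"].unique()]

as_dict = matches.to_dict()
print(as_dict)
```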
Compare two dataframes by index with unique indexes
It seems you only want to keep comparison results for indices that exist in both dataframes. In that case, you can get the common set of indices with
idx = df1.index.intersection(df2.index)
then
df1.loc[idx].eq(df2.loc[idx])
or
df1.eq(df2).loc[idx]
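As a quick illustration with made-up data (the frames in the question are not shown), the first variant behaves like this:

```python
import pandas as pd

# Hypothetical frames with partly overlapping, unique indexes.
df1 = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"x": [2, 9, 3]}, index=["b", "c", "d"])

# Labels present in both frames.
idx = df1.index.intersection(df2.index)

# Element-wise comparison restricted to the shared labels.
result = df1.loc[idx].eq(df2.loc[idx])
print(result)
```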
How to find the same index between two dataframes and combine it to a new dataframe, Python 3
Option 1
You could concatenate the two dataframes and group by columns.
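# pandas 2.0 removed mean(level=...) and the positional axis argument of concat,
# so group the transposed frame by column name instead
pd.concat([df1, df2], axis=1).dropna().T.groupby(level=0).mean().T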
A B
apple 0.0 7.5
banana 2.5 7.5
If it's just A you want, then this should suffice:
pd.concat([df1, df2], axis=1).dropna()['A'].mean(axis=1).to_frame('A')
A
apple 0.0
banana 2.5
Option 2
An alternative would be to find the intersecting indices with index.intersection and then index with loc:
i = df1.index.intersection(df2.index)
df1.loc[i, ['A']].add(df2.loc[i, ['A']]).div(2)
A
Name
apple 0.0
banana 2.5
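A variation that is not part of the original answer, and that extends to more than two frames, is to stack the rows for the shared labels and group by the index. The data below is hypothetical, picked so the averages agree with the numbers shown above.

```python
import pandas as pd

# Hypothetical data chosen so the averages match the output shown above.
df1 = pd.DataFrame(
    {"A": [0, 2, 5], "B": [7, 8, 9]}, index=["apple", "banana", "cherry"]
)
df2 = pd.DataFrame({"A": [0, 3], "B": [8, 7]}, index=["apple", "banana"])

# Labels present in both frames ("cherry" is dropped).
common = df1.index.intersection(df2.index)

# Stack the matching rows and average them per index label.
avg = pd.concat([df1.loc[common], df2.loc[common]]).groupby(level=0).mean()
print(avg)
```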
Python Pandas - Find difference between two data frames
Use drop_duplicates:
pd.concat([df1,df2]).drop_duplicates(keep=False)
Update:
The above method only works for data frames that do not already contain duplicate rows themselves. For example:
df1=pd.DataFrame({'A':[1,2,3,3],'B':[2,3,4,4]})
df2=pd.DataFrame({'A':[1],'B':[2]})
For these frames the method gives the wrong result.
Wrong output:
pd.concat([df1, df2]).drop_duplicates(keep=False)
Out[655]:
A B
1 2 3
Correct output:
Out[656]:
A B
1 2 3
2 3 4
3 3 4
How to achieve that?
Method 1: Using isin with tuple
df1[~df1.apply(tuple, axis=1).isin(df2.apply(tuple, axis=1))]
Out[657]:
A B
1 2 3
2 3 4
3 3 4
Method 2: merge with indicator
df1.merge(df2,indicator = True, how='left').loc[lambda x : x['_merge']!='both']
Out[421]:
A B _merge
1 2 3 left_only
2 3 4 left_only
3 3 4 left_only
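The following sketch puts the naive approach and Method 2 side by side, using the df1 and df2 from the update above. The variable names naive, merged and only_in_df1 are introduced here for illustration.

```python
import pandas as pd

# The example frames from the update above.
df1 = pd.DataFrame({"A": [1, 2, 3, 3], "B": [2, 3, 4, 4]})
df2 = pd.DataFrame({"A": [1], "B": [2]})

# Naive approach: the duplicated (3, 4) rows inside df1 cancel each other out.
naive = pd.concat([df1, df2]).drop_duplicates(keep=False)

# Method 2: a left merge with indicator marks the rows found only in df1.
merged = df1.merge(df2, how="left", indicator=True)
only_in_df1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(naive)
print(only_in_df1)
```

If df2 itself can contain repeated rows, calling df2.drop_duplicates() before the merge keeps the left merge from multiplying rows of df1.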
Comparing two dataframes and getting the differences
This approach, df1 != df2, works only for dataframes with identical rows and columns. In fact, all of the dataframes' axes are compared with the _indexed_same method, and an exception is raised if any difference is found, even in the order of columns or indices.
If I understood you correctly, you want to find not the changes but the symmetric difference. One approach is to concatenate the dataframes:
>>> df = pd.concat([df1, df2])
>>> df = df.reset_index(drop=True)
Group by all columns:
>>> df_gpby = df.groupby(list(df.columns))
Get the index of the unique records:
>>> idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]
Filter:
>>> df.reindex(idx)
Date Fruit Num Color
9 2013-11-25 Orange 8.6 Orange
8 2013-11-25 Apple 22.1 Red
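Put together as one runnable script, the steps might look like the sketch below. The two small frames are hypothetical stand-ins for the data in the question. Be aware that groupby drops rows containing NaN in any grouping column by default, so such rows would not appear in the result.

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question.
df1 = pd.DataFrame({"Fruit": ["Apple", "Banana"], "Num": [22.1, 8.6]})
df2 = pd.DataFrame({"Fruit": ["Apple", "Orange"], "Num": [22.1, 8.6]})

# Stack both frames and renumber the rows 0..n-1.
df = pd.concat([df1, df2]).reset_index(drop=True)

# Group identical rows together; groups maps each distinct row to its labels.
groups = df.groupby(list(df.columns)).groups

# A row that occurs exactly once belongs to only one of the two frames.
idx = sorted(labels[0] for labels in groups.values() if len(labels) == 1)

sym_diff = df.loc[idx]
print(sym_diff)
```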
R - Find index of element from one dataframe and place in another
We can use match to create the new column:
df1$OccurrenceIndex <- match(df2$Activity, df1$Activity)
df1
# Activity NoOfOccurrences OccurrenceIndex
#1 Walking 38 3
#2 Jogging 26 1
#3 Running 12 2