Concat Dataframe Reindexing Only Valid with Uniquely Valued Index Objects

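pd.concat raises this error when it has to align the frames on an index that contains duplicates (for example, the row index when concatenating with axis=1). To remove rows with duplicate index values, keeping the first occurrence, use: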

df = df.loc[~df.index.duplicated(keep='first')]

import pandas as pd
from pandas import Timestamp

df1 = pd.DataFrame(
    {'price': [0.7286, 0.7286, 0.7286, 0.7286],
     'side': [2, 2, 2, 2],
     'timestamp': [1451865675631331, 1451865675631400,
                  1451865675631861, 1451865675631866]},
    index=pd.DatetimeIndex(['2000-1-1', '2000-1-1', '2001-1-1', '2002-1-1']))

df2 = pd.DataFrame(
    {'bid': [0.7284, 0.7284, 0.7284, 0.7285, 0.7285],
     'bid_size': [4000000, 4000000, 5000000, 1000000, 4000000],
     'offer': [0.7285, 0.729, 0.7286, 0.7286, 0.729],
     'offer_size': [1000000, 4000000, 4000000, 4000000, 4000000]},
    index=pd.DatetimeIndex(['2000-1-1', '2001-1-1', '2002-1-1', '2003-1-1', '2004-1-1']))

df1 = df1.loc[~df1.index.duplicated(keep='first')]
#              price  side         timestamp
# 2000-01-01  0.7286     2  1451865675631331
# 2001-01-01  0.7286     2  1451865675631861
# 2002-01-01  0.7286     2  1451865675631866

df2 = df2.loc[~df2.index.duplicated(keep='first')]
#                bid  bid_size   offer  offer_size
# 2000-01-01  0.7284   4000000  0.7285     1000000
# 2001-01-01  0.7284   4000000  0.7290     4000000
# 2002-01-01  0.7284   5000000  0.7286     4000000
# 2003-01-01  0.7285   1000000  0.7286     4000000
# 2004-01-01  0.7285   4000000  0.7290     4000000

result = pd.concat([df1, df2], axis=0)
print(result)
               bid  bid_size   offer  offer_size   price  side     timestamp
2000-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+15
2001-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+15
2002-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+15
2000-01-01  0.7284   4000000  0.7285     1000000     NaN   NaN           NaN
2001-01-01  0.7284   4000000  0.7290     4000000     NaN   NaN           NaN
2002-01-01  0.7284   5000000  0.7286     4000000     NaN   NaN           NaN
2003-01-01  0.7285   1000000  0.7286     4000000     NaN   NaN           NaN
2004-01-01  0.7285   4000000  0.7290     4000000     NaN   NaN           NaN
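
If the frames are meant to sit side by side (axis=1), pandas aligns them on the row index, so the same cleanup applies there. The following is a minimal sketch with made-up values; drop_duplicate_index is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def drop_duplicate_index(df, keep='first'):
    """Hypothetical helper: keep one row per index label."""
    return df.loc[~df.index.duplicated(keep=keep)]


# Illustrative data only; 'left' repeats the label 2000-01-01.
left = pd.DataFrame(
    {'price': [0.7286, 0.7286, 0.7286, 0.7286]},
    index=pd.DatetimeIndex(['2000-01-01', '2000-01-01',
                            '2001-01-01', '2002-01-01']))
right = pd.DataFrame(
    {'bid': [0.7284, 0.7284, 0.7284]},
    index=pd.DatetimeIndex(['2000-01-01', '2001-01-01', '2003-01-01']))

# Check uniqueness first, then drop the repeated labels.
print(left.index.is_unique)
left_unique = drop_duplicate_index(left)

# With unique row labels, axis=1 aligns the frames on the index (outer join).
side_by_side = pd.concat([left_unique, right], axis=1)
print(side_by_side)
```

Dropping rows discards data, so whether keep='first', keep='last', or an aggregation is appropriate depends on what the duplicated rows mean.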

Note there is also DataFrame.join, which joins DataFrames based on their indices
and handles non-unique indices according to the how parameter. Rows with a duplicate
index are not removed. In the examples below, df1 is the original frame, before the duplicates were dropped.

In [94]: df1.join(df2)
Out[94]: 
             price  side         timestamp     bid  bid_size   offer  \
2000-01-01  0.7286     2  1451865675631331  0.7284   4000000  0.7285   
2000-01-01  0.7286     2  1451865675631400  0.7284   4000000  0.7285   
2001-01-01  0.7286     2  1451865675631861  0.7284   4000000  0.7290   
2002-01-01  0.7286     2  1451865675631866  0.7284   5000000  0.7286   

            offer_size  
2000-01-01     1000000  
2000-01-01     1000000  
2001-01-01     4000000  
2002-01-01     4000000  

In [95]: df1.join(df2, how='outer')
Out[95]: 
             price  side     timestamp     bid  bid_size   offer  offer_size
2000-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7285     1000000
2000-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7285     1000000
2001-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7290     4000000
2002-01-01  0.7286     2  1.451866e+15  0.7284   5000000  0.7286     4000000
2003-01-01     NaN   NaN           NaN  0.7285   1000000  0.7286     4000000
2004-01-01     NaN   NaN           NaN  0.7285   4000000  0.7290     4000000

Pandas concat - InvalidIndexError: Reindexing only valid with uniquely valued Index objects

This works correctly for me with the sample data.

I changed the data to reproduce the error; the cause is duplicated column names:

df1 = pd.DataFrame({'col1': [1,2,3],
                    'col2': [4,5,6] 
                  }).rename(columns={'col2':'col1'})
print (df1)
   col1  col1 <- col1 is duplicated
0     1     4
1     2     5
2     3     6

df2 = pd.DataFrame({'col1': [7,8,9],
                    'col2': ['10','11','12'],
                    'col3': ['13','14','15'] 
                  })

# Concat and keep only cols from df1

df3 = pd.concat([df1, df2], ignore_index=True).reindex(df1.columns, axis='columns')
print (df3)

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

You can find the duplicated column names like this:

print (df1.columns[df1.columns.duplicated(keep=False)])
Index(['col1', 'col1'], dtype='object')

print (df2.columns[df2.columns.duplicated(keep=False)])
Index([], dtype='object')

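The solution is to deduplicate them. The call below relies on a private pandas helper, so it may not be available in every pandas version: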

print (pd.io.parsers.ParserBase({'names':df1.columns})._maybe_dedup_names(df1.columns))
['col1', 'col1.1']
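
Because _maybe_dedup_names is a private method, a version-independent alternative may be preferable. The sketch below renames repeated columns with plain Python; dedup_columns is a hypothetical helper written for illustration, and it assumes the generated names (such as col1.1) do not already exist in the frame.

```python
import pandas as pd


def dedup_columns(columns):
    """Hypothetical helper: append .1, .2, ... to repeated column names."""
    seen = {}
    result = []
    for name in columns:
        count = seen.get(name, 0)
        # First occurrence keeps its name; later ones get a numeric suffix.
        result.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    return result


# Same shape of data as in the answer above: two columns both named col1.
df1 = pd.DataFrame([[1, 4], [2, 5], [3, 6]], columns=['col1', 'col1'])
df2 = pd.DataFrame({'col1': [7, 8, 9],
                    'col2': ['10', '11', '12'],
                    'col3': ['13', '14', '15']})

df1.columns = dedup_columns(df1.columns)

# With unique column names, concat can align the two frames.
df3 = pd.concat([df1, df2], ignore_index=True).reindex(df1.columns, axis='columns')
print(df3)
```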

unique indexes throwing: Reindexing only valid with uniquely valued Index objects

Found the reason for the errors. As somewhat of a pandas noob, I thought the error only had to do with the index. However, the problem was that I had duplicate columns in each DataFrame.

Concat two dataframes: Reindexing only valid with uniquely valued Index objects

Here the InvalidIndexError is actually referring to the column index.

df1 has duplicate column names:

... Pressure Tilt_X Tilt_X

pd.concat cannot align frames on their columns when a frame has duplicate column names and the frames' columns are not identical.

In this case it looks like the second Tilt_X should actually be Tilt_Y, but you should check all of your dataframes' columns to make sure there are no other duplicates.
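
Checking every frame by hand is error-prone, so a small loop can do it. This is a minimal sketch; report_duplicate_columns is a hypothetical helper, and the sample values are invented (only the column names Pressure and Tilt_X come from the question).

```python
import pandas as pd


def report_duplicate_columns(frames):
    """Hypothetical helper: map each frame's name to its repeated column names."""
    report = {}
    for name, df in frames.items():
        # duplicated() marks the second and later occurrences of a label.
        dupes = df.columns[df.columns.duplicated()].unique().tolist()
        if dupes:
            report[name] = dupes
    return report


# Invented values; df1 mimics the repeated Tilt_X column.
df1 = pd.DataFrame([[1.0, 0.1, 0.2]], columns=['Pressure', 'Tilt_X', 'Tilt_X'])
df2 = pd.DataFrame([[2.0, 0.3, 0.4]], columns=['Pressure', 'Tilt_X', 'Tilt_Y'])

report = report_duplicate_columns({'df1': df1, 'df2': df2})
print(report)
```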

Pandas Concat returning InvalidIndexError: Reindexing only valid with uniquely valued Index objects Error

Duplicated columns shouldn't be an issue by themselves (even with ignore_index=False), as long as both frames have the same columns in the same order:

df1 = pd.DataFrame([range(7)], columns=['respondent ID', 'Column1', 'Column2', 'Column3', 'Column1', 'Column2', 'Column3'])
df2 = pd.DataFrame([['2']*7], columns=['respondent ID', 'Column1', 'Column2', 'Column3', 'Column1', 'Column2', 'Column3'])
pd.concat([df1, df2], ignore_index=True)

output:

  respondent ID Column1 Column2 Column3 Column1 Column2 Column3
0             0       1       2       3       4       5       6
1             2       2       2       2       2       2       2
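
The contrast with the failing cases can be sketched as follows: identical column labels need no realignment, while differing labels force pandas to reindex against a non-unique column index. The frames and names here are invented for illustration, and the example assumes a pandas version that exposes pd.errors.InvalidIndexError.

```python
import pandas as pd

cols = ['id', 'a', 'a']  # 'a' is repeated on purpose
same1 = pd.DataFrame([[0, 1, 2]], columns=cols)
same2 = pd.DataFrame([[3, 4, 5]], columns=cols)

# Identical columns in the same order: rows are simply stacked.
stacked = pd.concat([same1, same2], ignore_index=True)
print(stacked)

# Different columns: pandas must align them, which fails on the repeated 'a'.
other = pd.DataFrame([[6, 7]], columns=['id', 'b'])
try:
    pd.concat([same1, other], ignore_index=True)
    raised = False
except pd.errors.InvalidIndexError as exc:
    raised = True
    print(exc)
```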
