Concat Dataframe Reindexing Only Valid with Uniquely Valued Index Objects

Concat DataFrame Reindexing only valid with uniquely valued Index objects

pd.concat requires that the indices be unique. To remove rows with duplicate indices, use

df = df.loc[~df.index.duplicated(keep='first')]

import pandas as pd
from pandas import Timestamp

df1 = pd.DataFrame(
{'price': [0.7286, 0.7286, 0.7286, 0.7286],
'side': [2, 2, 2, 2],
'timestamp': [1451865675631331, 1451865675631400,
1451865675631861, 1451865675631866]},
index=pd.DatetimeIndex(['2000-1-1', '2000-1-1', '2001-1-1', '2002-1-1']))

df2 = pd.DataFrame(
{'bid': [0.7284, 0.7284, 0.7284, 0.7285, 0.7285],
'bid_size': [4000000, 4000000, 5000000, 1000000, 4000000],
'offer': [0.7285, 0.729, 0.7286, 0.7286, 0.729],
'offer_size': [1000000, 4000000, 4000000, 4000000, 4000000]},
index=pd.DatetimeIndex(['2000-1-1', '2001-1-1', '2002-1-1', '2003-1-1', '2004-1-1']))

df1 = df1.loc[~df1.index.duplicated(keep='first')]
# price side timestamp
# 2000-01-01 0.7286 2 1451865675631331
# 2001-01-01 0.7286 2 1451865675631861
# 2002-01-01 0.7286 2 1451865675631866

df2 = df2.loc[~df2.index.duplicated(keep='first')]
# bid bid_size offer offer_size
# 2000-01-01 0.7284 4000000 0.7285 1000000
# 2001-01-01 0.7284 4000000 0.7290 4000000
# 2002-01-01 0.7284 5000000 0.7286 4000000
# 2003-01-01 0.7285 1000000 0.7286 4000000
# 2004-01-01 0.7285 4000000 0.7290 4000000

result = pd.concat([df1, df2], axis=0)
print(result)
bid bid_size offer offer_size price side timestamp
2000-01-01 NaN NaN NaN NaN 0.7286 2 1.451866e+15
2001-01-01 NaN NaN NaN NaN 0.7286 2 1.451866e+15
2002-01-01 NaN NaN NaN NaN 0.7286 2 1.451866e+15
2000-01-01 0.7284 4000000 0.7285 1000000 NaN NaN NaN
2001-01-01 0.7284 4000000 0.7290 4000000 NaN NaN NaN
2002-01-01 0.7284 5000000 0.7286 4000000 NaN NaN NaN
2003-01-01 0.7285 1000000 0.7286 4000000 NaN NaN NaN
2004-01-01 0.7285 4000000 0.7290 4000000 NaN NaN NaN

Note there is also pd.join, which can join DataFrames based on their indices,
and handle non-unique indices based on the how parameter. Rows with duplicate
index are not removed.

In [94]: df1.join(df2)
Out[94]:
price side timestamp bid bid_size offer \
2000-01-01 0.7286 2 1451865675631331 0.7284 4000000 0.7285
2000-01-01 0.7286 2 1451865675631400 0.7284 4000000 0.7285
2001-01-01 0.7286 2 1451865675631861 0.7284 4000000 0.7290
2002-01-01 0.7286 2 1451865675631866 0.7284 5000000 0.7286

offer_size
2000-01-01 1000000
2000-01-01 1000000
2001-01-01 4000000
2002-01-01 4000000

In [95]: df1.join(df2, how='outer')
Out[95]:
price side timestamp bid bid_size offer offer_size
2000-01-01 0.7286 2 1.451866e+15 0.7284 4000000 0.7285 1000000
2000-01-01 0.7286 2 1.451866e+15 0.7284 4000000 0.7285 1000000
2001-01-01 0.7286 2 1.451866e+15 0.7284 4000000 0.7290 4000000
2002-01-01 0.7286 2 1.451866e+15 0.7284 5000000 0.7286 4000000
2003-01-01 NaN NaN NaN 0.7285 1000000 0.7286 4000000
2004-01-01 NaN NaN NaN 0.7285 4000000 0.7290 4000000

Pandas concat - InvalidIndexError: Reindexing only valid with uniquely valued Index objects

For me working correct with sample data.

I try change data for raise error, reason is duplicated columns names:

df1 = pd.DataFrame({'col1': [1,2,3],
'col2': [4,5,6]
}).rename(columns={'col2':'col1'})
print (df1)
col1 col1 <- col1 is duplicated
0 1 4
1 2 5
2 3 6

df2 = pd.DataFrame({'col1': [7,8,9],
'col2': ['10','11','12'],
'col3': ['13','14','15']
})

# Concat and keep only cols from df1

df3 = pd.concat([df1, df2], ignore_index=True).reindex(df1.columns, axis='columns')
print (df3)

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

You can find them:

print (df1.columns[df1.columns.duplicated(keep=False)])
Index(['col1', 'col1'], dtype='object')

print (df2.columns[df2.columns.duplicated(keep=False)])
Index([], dtype='object')

Solution is deduplicated them:

print (pd.io.parsers.ParserBase({'names':df1.columns})._maybe_dedup_names(df1.columns))
['col1', 'col1.1']

unique indexes throwing: Reindexing only valid with uniquely valued Index objects

Found the reason for the errors. As somewhat of a pandas noob I thought the error only had to do with the index. However the problem was that I had duplicate columns in each DataFrame.

Concat two dataframes: Reindexing only valid with uniquely valued Index objects

Here the InvalidIndexError is actually referring to the column index.

df1 has duplicate column names:

... Pressure Tilt_X Tilt_X

pd.concat does not work with duplicate column names.

In this case it looks like the second Tilt_X should actually be Tilt_Y, but you should check all of your dataframes' columns to make sure there are no other duplicates.

Pandas Concat returning InvalidIndexError: Reindexing only valid with uniquely valued Index objects Error

The duplicated columns shouldn't be an issue (even with ignore_index=False):

df1 = pd.DataFrame([range(7)], columns=['respondent ID', 'Column1', 'Column2', 'Column3', 'Column1', 'Column2', 'Column3'])
df2 = pd.DataFrame([['2']*7], columns=['respondent ID', 'Column1', 'Column2', 'Column3', 'Column1', 'Column2', 'Column3'])
pd.concat([df1, df2], ignore_index=True)

output:

  respondent ID Column1 Column2 Column3 Column1 Column2 Column3
0 0 1 2 3 4 5 6
1 2 2 2 2 2 2 2


Related Topics



Leave a reply



Submit