Concat DataFrame Reindexing only valid with uniquely valued Index objects
pd.concat requires the indices to be unique whenever it has to align (reindex) the frames on them. To remove rows with duplicate indices, use
df = df.loc[~df.index.duplicated(keep='first')]
import pandas as pd
from pandas import Timestamp
df1 = pd.DataFrame(
{'price': [0.7286, 0.7286, 0.7286, 0.7286],
'side': [2, 2, 2, 2],
'timestamp': [1451865675631331, 1451865675631400,
1451865675631861, 1451865675631866]},
index=pd.DatetimeIndex(['2000-1-1', '2000-1-1', '2001-1-1', '2002-1-1']))
df2 = pd.DataFrame(
{'bid': [0.7284, 0.7284, 0.7284, 0.7285, 0.7285],
'bid_size': [4000000, 4000000, 5000000, 1000000, 4000000],
'offer': [0.7285, 0.729, 0.7286, 0.7286, 0.729],
'offer_size': [1000000, 4000000, 4000000, 4000000, 4000000]},
index=pd.DatetimeIndex(['2000-1-1', '2001-1-1', '2002-1-1', '2003-1-1', '2004-1-1']))
df1 = df1.loc[~df1.index.duplicated(keep='first')]
# price side timestamp
# 2000-01-01 0.7286 2 1451865675631331
# 2001-01-01 0.7286 2 1451865675631861
# 2002-01-01 0.7286 2 1451865675631866
df2 = df2.loc[~df2.index.duplicated(keep='first')]
# bid bid_size offer offer_size
# 2000-01-01 0.7284 4000000 0.7285 1000000
# 2001-01-01 0.7284 4000000 0.7290 4000000
# 2002-01-01 0.7284 5000000 0.7286 4000000
# 2003-01-01 0.7285 1000000 0.7286 4000000
# 2004-01-01 0.7285 4000000 0.7290 4000000
result = pd.concat([df1, df2], axis=0)
print(result)
bid bid_size offer offer_size price side timestamp
2000-01-01 NaN NaN NaN NaN 0.7286 2 1.451866e+15
2001-01-01 NaN NaN NaN NaN 0.7286 2 1.451866e+15
2002-01-01 NaN NaN NaN NaN 0.7286 2 1.451866e+15
2000-01-01 0.7284 4000000 0.7285 1000000 NaN NaN NaN
2001-01-01 0.7284 4000000 0.7290 4000000 NaN NaN NaN
2002-01-01 0.7284 5000000 0.7286 4000000 NaN NaN NaN
2003-01-01 0.7285 1000000 0.7286 4000000 NaN NaN NaN
2004-01-01 0.7285 4000000 0.7290 4000000 NaN NaN NaN
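The snippet above concatenates along axis=0, which stacks the rows and does not need to align the row labels. As a minimal sketch of where the error tends to appear, assuming the frames are instead combined side by side with axis=1, the following uses two small hypothetical frames (`left` and `right` are illustration names, not from the original data) and then applies the same duplicated() filter:

```python
import pandas as pd

# Hypothetical frames: `left` repeats the label 'x' in its row index.
left = pd.DataFrame({"price": [1.0, 2.0, 3.0]}, index=["x", "x", "y"])
right = pd.DataFrame({"bid": [10.0, 20.0]}, index=["x", "z"])

try:
    # axis=1 has to align both frames on the row index.
    pd.concat([left, right], axis=1)
    raised = False
except Exception as exc:  # an InvalidIndexError in recent pandas versions
    raised = True
    print(type(exc).__name__, exc)

# Keep only the first row for each duplicated label, then retry.
left_unique = left.loc[~left.index.duplicated(keep="first")]
combined = pd.concat([left_unique, right], axis=1)
print(combined)
```

Note that dropping duplicates discards data (here the second 'x' row), so it is only appropriate when the repeated rows are not needed.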
Note there is also DataFrame.join, which can join DataFrames based on their indices and handles non-unique indices according to the how parameter. Rows with a duplicate index are not removed.
In [94]: df1.join(df2)
Out[94]:
price side timestamp bid bid_size offer \
2000-01-01 0.7286 2 1451865675631331 0.7284 4000000 0.7285
2000-01-01 0.7286 2 1451865675631400 0.7284 4000000 0.7285
2001-01-01 0.7286 2 1451865675631861 0.7284 4000000 0.7290
2002-01-01 0.7286 2 1451865675631866 0.7284 5000000 0.7286
offer_size
2000-01-01 1000000
2000-01-01 1000000
2001-01-01 4000000
2002-01-01 4000000
In [95]: df1.join(df2, how='outer')
Out[95]:
price side timestamp bid bid_size offer offer_size
2000-01-01 0.7286 2 1.451866e+15 0.7284 4000000 0.7285 1000000
2000-01-01 0.7286 2 1.451866e+15 0.7284 4000000 0.7285 1000000
2001-01-01 0.7286 2 1.451866e+15 0.7284 4000000 0.7290 4000000
2002-01-01 0.7286 2 1.451866e+15 0.7284 5000000 0.7286 4000000
2003-01-01 NaN NaN NaN 0.7285 1000000 0.7286 4000000
2004-01-01 NaN NaN NaN 0.7285 4000000 0.7290 4000000
Pandas concat - InvalidIndexError: Reindexing only valid with uniquely valued Index objects
It works correctly for me with the sample data.
I changed the data to raise the error; the reason is duplicated column names:
df1 = pd.DataFrame({'col1': [1,2,3],
'col2': [4,5,6]
}).rename(columns={'col2':'col1'})
print (df1)
col1 col1 <- col1 is duplicated
0 1 4
1 2 5
2 3 6
df2 = pd.DataFrame({'col1': [7,8,9],
'col2': ['10','11','12'],
'col3': ['13','14','15']
})
# Concat and keep only cols from df1
df3 = pd.concat([df1, df2], ignore_index=True).reindex(df1.columns, axis='columns')
print (df3)
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
You can find the duplicated column names like this:
print (df1.columns[df1.columns.duplicated(keep=False)])
Index(['col1', 'col1'], dtype='object')
print (df2.columns[df2.columns.duplicated(keep=False)])
Index([], dtype='object')
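The solution is to deduplicate them. The call below relies on a private pandas parser helper, so it may not be available in every pandas version: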
print (pd.io.parsers.ParserBase({'names':df1.columns})._maybe_dedup_names(df1.columns))
['col1', 'col1.1']
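Because that helper is private, here is a minimal sketch of the same renaming done with plain Python. The function name `dedup_names` is hypothetical (it is not part of pandas); it follows the `name.N` suffix convention seen in the output above.

```python
import pandas as pd

def dedup_names(names):
    """Append .1, .2, ... to repeated names, leaving the first one unchanged."""
    counts = {}
    result = []
    for name in names:
        seen = counts.get(name, 0)
        result.append(name if seen == 0 else f"{name}.{seen}")
        counts[name] = seen + 1
    return result

# Same shape of data as above: df1 has the column name 'col1' twice.
df1 = pd.DataFrame([[1, 4], [2, 5], [3, 6]], columns=["col1", "col1"])
df2 = pd.DataFrame({"col1": [7, 8, 9],
                    "col2": ["10", "11", "12"],
                    "col3": ["13", "14", "15"]})

df1.columns = dedup_names(df1.columns)

# Concat and keep only the columns from df1; the column labels are now unique.
df3 = pd.concat([df1, df2], ignore_index=True).reindex(df1.columns, axis="columns")
print(df3)
```

This sketch does not check whether a generated name such as 'col1.1' already exists as a column, so it could still produce a clash in that case.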
unique indexes throwing: Reindexing only valid with uniquely valued Index objects
Found the reason for the errors. As somewhat of a pandas noob, I thought the error only had to do with the index. However, the problem was that I had duplicate columns in each DataFrame.
Concat two dataframes: Reindexing only valid with uniquely valued Index objects
Here the InvalidIndexError is actually referring to the column index. df1 has duplicate column names:
... Pressure Tilt_X Tilt_X
pd.concat does not work with duplicate column names when it has to align differing columns.
In this case it looks like the second Tilt_X should actually be Tilt_Y, but you should check all of your dataframes' columns to make sure there are no other duplicates.
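One way to check every frame at once is sketched below. The helper `duplicated_columns`, the dictionary of frames, and the numeric values are all hypothetical; only the column names Pressure, Tilt_X and Tilt_Y come from the answer above.

```python
import pandas as pd

def duplicated_columns(frames):
    """Map each frame's name to the column names that occur more than once."""
    report = {}
    for name, frame in frames.items():
        dupes = frame.columns[frame.columns.duplicated()].unique().tolist()
        if dupes:
            report[name] = dupes
    return report

# Made-up one-row frames, just to exercise the check.
frames = {
    "df1": pd.DataFrame([[1.0, 0.1, 0.2]], columns=["Pressure", "Tilt_X", "Tilt_X"]),
    "df2": pd.DataFrame([[2.0, 0.3, 0.4]], columns=["Pressure", "Tilt_X", "Tilt_Y"]),
}

print(duplicated_columns(frames))
```

Frames with unique column names are left out of the report, so an empty dictionary means there is nothing to fix.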
Pandas Concat returning InvalidIndexError: Reindexing only valid with uniquely valued Index objects Error
Duplicated columns shouldn't be an issue on their own (even with ignore_index=False) when both DataFrames have the same columns in the same order:
df1 = pd.DataFrame([range(7)], columns=['respondent ID', 'Column1', 'Column2', 'Column3', 'Column1', 'Column2', 'Column3'])
df2 = pd.DataFrame([['2']*7], columns=['respondent ID', 'Column1', 'Column2', 'Column3', 'Column1', 'Column2', 'Column3'])
pd.concat([df1, df2], ignore_index=True)
output:
respondent ID Column1 Column2 Column3 Column1 Column2 Column3
0 0 1 2 3 4 5 6
1 2 2 2 2 2 2 2
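A small sketch of the difference, using hypothetical frames (`same1`, `same2` and `other` are illustration names): identical duplicated columns can be stacked as-is, while a frame with a different column list forces an alignment that, as far as I can tell, fails on the duplicates.

```python
import pandas as pd

cols = ["id", "a", "a"]  # 'a' appears twice
same1 = pd.DataFrame([[0, 1, 2]], columns=cols)
same2 = pd.DataFrame([[3, 4, 5]], columns=cols)
other = pd.DataFrame([[6, 7, 8]], columns=["id", "a", "b"])

# Identical column lists: no alignment is needed, so this succeeds.
stacked = pd.concat([same1, same2], ignore_index=True)
print(stacked)

try:
    # Different column lists: the duplicated 'a' cannot be aligned.
    pd.concat([same1, other], ignore_index=True)
    failed = False
except Exception as exc:  # an InvalidIndexError in recent pandas versions
    failed = True
    print(type(exc).__name__, exc)
```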