Pandas DataFrame - renaming multiple identically named columns
I was looking for a solution within pandas rather than a general Python solution.
The column Index's get_loc() method returns a boolean mask when it finds duplicates in an unsorted index, with True marking each position where the label occurs. I then use the mask to assign new values at those positions. In my case, I know ahead of time how many dups I'm going to get and what I'm going to assign to them. For a more generic dup-weeding action, df.columns.get_duplicates() returned a list of all dups in older pandas versions (it has since been removed; df.columns[df.columns.duplicated()].unique() gives the same information), and you can use that list in conjunction with get_loc().
'''UPDATED AS-OF SEPT 2020'''
cols = pd.Series(df.columns)
for dup in df.columns[df.columns.duplicated(keep=False)]:
    cols[df.columns.get_loc(dup)] = ([dup + '.' + str(d_idx)
                                      if d_idx != 0
                                      else dup
                                      for d_idx in range(df.columns.get_loc(dup).sum())]
                                     )
df.columns = cols
   blah  blah2  blah3  blah.1  blah.2
0     0      1      2       3       4
1     5      6      7       8       9
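As a small sketch of why the mask-based approach works on this particular frame, the snippet below shows what get_loc() returns in three situations. The index names `cols` and `sorted_cols` are made up for illustration. It suggests that the `.sum()` call above relies on the duplicates being in an unsorted index; on a sorted index with duplicates get_loc() returns a slice instead.

```python
import pandas as pd

# Unsorted index with duplicates: get_loc returns a boolean mask.
cols = pd.Index(['blah', 'blah2', 'blah3', 'blah', 'blah'])
mask = cols.get_loc('blah')
print(mask.tolist())  # [True, False, False, True, True]

# A label that occurs once: get_loc returns a plain integer position.
pos = cols.get_loc('blah2')
print(pos)  # 1

# Sorted (monotonic) index with duplicates: get_loc returns a slice,
# so calling .sum() on the result would fail in this case.
sorted_cols = pd.Index(['a', 'a', 'b'])
loc = sorted_cols.get_loc('a')
print(type(loc).__name__)  # slice
```

If the column order is not known in advance, comparing directly (`df.columns == dup`) always yields a boolean mask and avoids the three-way return type.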
New, better method (update 03 Dec 2019)
The code below is better than the code above. It is copied from another answer below (@SatishSK):
#sample df with duplicate blah column
df=pd.DataFrame(np.arange(2*5).reshape(2,5))
df.columns=['blah','blah2','blah3','blah','blah']
df
# you just need the following 4 lines to rename duplicates
# df is the dataframe that you want to rename duplicated columns
cols=pd.Series(df.columns)
for dup in cols[cols.duplicated()].unique():
    cols[cols[cols == dup].index.values.tolist()] = [dup + '.' + str(i) if i != 0 else dup for i in range(sum(cols == dup))]
# rename the columns with the cols list.
df.columns=cols
df
Output:
   blah  blah2  blah3  blah.1  blah.2
0     0      1      2       3       4
1     5      6      7       8       9
Rename duplicate column name by order in Pandas
You could use an itertools.count() counter and a list comprehension to create new column headers, then assign them to the data frame.
For example:
>>> import itertools
>>> df = pd.DataFrame([[1, 2, 3]], columns=["Nice", "Nice", "Hello"])
>>> df
   Nice  Nice  Hello
0     1     2      3
>>> count = itertools.count(1)
>>> new_cols = [f"Nice{next(count)}" if col == "Nice" else col for col in df.columns]
>>> df.columns = new_cols
>>> df
   Nice1  Nice2  Hello
0      1      2      3
(Python 3.6+ required for the f-strings)
EDIT: Alternatively, per the comment below, the list comprehension can replace any label that contains "Nice", in case there are unexpected spaces or other characters:
new_cols = [f"Nice{next(count)}" if "Nice" in col else col for col in df.columns]
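The same idea can be extended to every repeated label without hard-coding "Nice". This is only a sketch: the names `totals` and `counters` are made up for illustration, and it assumes that suffixed names such as "Nice1" do not already exist among the columns.

```python
import itertools
from collections import Counter, defaultdict

import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=["Nice", "Nice", "Hello", "Nice"])

# Count occurrences so that only repeated labels get a numeric suffix.
totals = Counter(df.columns)
# One independent counter per label, each starting at 1.
counters = defaultdict(lambda: itertools.count(1))

df.columns = [
    f"{col}{next(counters[col])}" if totals[col] > 1 else col
    for col in df.columns
]
print(list(df.columns))  # ['Nice1', 'Nice2', 'Hello', 'Nice3']
```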
Renaming columns in a Pandas dataframe with duplicate column names?
X_R.columns = ['Retail','Cost']
Renaming column names in Pandas
Just assign the new names to the .columns attribute:
>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df
   $a  $b
0   1  10
1   2  20
>>> df.columns = ['a', 'b']
>>> df
   a   b
0  1  10
1  2  20
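Two details of this assignment may be worth a quick sketch: the new list has to match the number of columns, and set_axis() is a non-mutating alternative. The variable names `mismatch_raised` and `renamed` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})

# A list of the wrong length is rejected instead of being silently truncated.
try:
    df.columns = ['a']
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# set_axis returns a relabelled DataFrame and leaves df itself unchanged.
renamed = df.set_axis(['a', 'b'], axis=1)
print(list(renamed.columns))  # ['a', 'b']
print(list(df.columns))       # ['$a', '$b']
```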
Renaming Multiple Columns in Pandas
I think the best approach here is to use rename with unique new column names, like:
df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})
print (df)
   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b
d = dict(zip(df.columns[1::3], range(len(df.columns[1::3]))))
print (d)
{'B': 0, 'E': 1}
df = df.rename(columns=d)
print (df)
   A  0  C  D  1  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b
Or:
d = dict(zip(df.columns[1::3],
             ['name{}'.format(x) for x in range(len(df.columns[1::3]))]))
print (d)
{'B': 'name0', 'E': 'name1'}
df = df.rename(columns=d)
print (df)
   A  name0  C  D  name1  F
0  a      4  7  1      5  a
1  b      5  8  3      3  a
2  c      4  9  5      6  a
3  d      5  4  7      9  b
4  e      5  2  1      2  b
5  f      4  3  0      4  b
A solution that is not recommended is to rename several columns to the same name:
d = dict.fromkeys(df.columns[1::3], 'Name')
print (d)
{'B': 'Name', 'E': 'Name'}
df = df.rename(columns=d)
print (df)
   A  Name  C  D  Name  F
0  a     4  7  1     5  a
1  b     5  8  3     3  a
2  c     4  9  5     6  a
3  d     5  4  7     9  b
4  e     5  2  1     2  b
5  f     4  3  0     4  b
because if you then select the column Name, it returns all columns with that name as a DataFrame:
print (df['Name'])
   Name  Name
0     4     5
1     5     3
2     4     6
3     5     9
4     5     2
5     4     4
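A short sketch of the consequence, using a made-up two-column frame: label-based selection returns a DataFrame, so one of the duplicates can only be reached by position. Checking `df.columns.is_unique` is one way to detect the situation before it causes surprises.

```python
import pandas as pd

df = pd.DataFrame([[4, 5], [5, 3], [4, 6]], columns=['Name', 'Name'])

# False here, because the label 'Name' occurs twice.
print(df.columns.is_unique)

by_label = df['Name']   # DataFrame containing both 'Name' columns
first = df.iloc[:, 0]   # Series: first 'Name' column only
second = df.iloc[:, 1]  # Series: second 'Name' column only
```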
Is there a built in Python/pandas function to rename duplicate columns in a pandas.DataFrame?
but I could not find a handy way to do this to a DataFrame object already
For an existing DataFrame there is no built-in, so we have to resort to some code:
s = pd.Series(df.columns)
df.columns= df.columns+s.groupby(s).cumcount().replace(0,'').astype(str)
   x  x1  y
0  2   5  1
1  1   9  3
2  4   1  2
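The snippet above does not show how df was built. The following self-contained sketch assumes a frame with columns x, x, y holding the values from the output, and spells the suffix step out as a list comprehension; the name `occurrence` is made up for illustration.

```python
import pandas as pd

# Assumed input: the values are taken from the output shown above.
df = pd.DataFrame([[2, 5, 1], [1, 9, 3], [4, 1, 2]], columns=['x', 'x', 'y'])

s = pd.Series(df.columns)
# Number each occurrence of a label: 0 for the first, 1 for the second, ...
occurrence = s.groupby(s).cumcount()
# Append the number only to repeats, leaving first occurrences unchanged.
df.columns = [name if n == 0 else f"{name}{n}" for name, n in zip(s, occurrence)]
print(list(df.columns))  # ['x', 'x1', 'y']
```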
Pandas Dataframe automatically renames duplicate columns name
I don't think it is a good idea to have several columns with the same name, and I wouldn't suggest it, but if you want to go that way, you can do it like this:
df = df.rename(columns = {"Jun'17.1":"Jun'17"})
To access the two different columns, do this:
df["Jun'17"].iloc[:,0]
df["Jun'17"].iloc[:,1]
Changing multiple column names but not all of them - Pandas Python
Say you have a dictionary that maps each old column name to the new name that should replace it:
df.rename(columns={'old_col':'new_col', 'old_col_2':'new_col_2'}, inplace=True)
But, if you don't have that, and you only have the indices, you can do this:
column_indices = [1,4,5,6]
new_names = ['a','b','c','d']
old_names = df.columns[column_indices]
df.rename(columns=dict(zip(old_names, new_names)), inplace=True)
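One caveat for frames with duplicate labels: rename works by label, so a dictionary built from positions would still rename every column that shares the old name. A positional variant can be sketched as follows, on a made-up frame, by editing a plain list of the names and assigning it back.

```python
import pandas as pd

# Hypothetical frame in which the label 'a' occurs twice.
df = pd.DataFrame([[1, 2, 3, 4]], columns=['a', 'b', 'a', 'c'])

column_indices = [2, 3]
new_names = ['a_2', 'd']

# Replace names strictly by position, so only the chosen columns change.
names = list(df.columns)
for idx, new_name in zip(column_indices, new_names):
    names[idx] = new_name
df.columns = names
print(list(df.columns))  # ['a', 'b', 'a_2', 'd']
```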
Automatically rename columns to ensure they are unique
You can uniquify the columns manually:
df_columns = ['a', 'b', 'a', 'a_2', 'a_2', 'a', 'a_2', 'a_2_2']
def uniquify(df_columns):
    seen = set()
    for item in df_columns:
        fudge = 1
        newitem = item
        while newitem in seen:
            fudge += 1
            newitem = "{}_{}".format(item, fudge)
        yield newitem
        seen.add(newitem)
list(uniquify(df_columns))
#>>> ['a', 'b', 'a_2', 'a_2_2', 'a_2_3', 'a_3', 'a_2_4', 'a_2_2_2']
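To apply this to an actual DataFrame, the generator's output can be assigned back to .columns. The sketch below repeats the function so that it runs on its own, and uses a made-up three-column frame.

```python
import pandas as pd

def uniquify(labels):
    """Yield each label, adding _2, _3, ... until it is unique."""
    seen = set()
    for item in labels:
        fudge = 1
        newitem = item
        while newitem in seen:
            fudge += 1
            newitem = "{}_{}".format(item, fudge)
        yield newitem
        seen.add(newitem)

df = pd.DataFrame([[1, 2, 3]], columns=['a', 'b', 'a'])
df.columns = list(uniquify(df.columns))
print(list(df.columns))  # ['a', 'b', 'a_2']
```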