Duplicate a Column in Data Frame and Rename It to Another Column Name

Answered with the help of user @thelatemail.

df = read.table(sep="",
header=TRUE,
text="Name Age Rate
Aira 23 90
Ben 32 98
Cat 27 95")

df$Rate2 = df$Rate #create column 'Rate2' and make it equal to 'Rate' (duplicate).

Another option, to duplicate, triplicate, or "n-plicate" a column:

# replicate() evaluates an expression n times and collects the results (see ?replicate);
# here it builds a matrix holding n copies of the column.
n = 3 # number of copies to add
df3 = cbind(df, replicate(n, df$Rate)) # append n copies of column "Rate" to df
df3 # print df3

  Name Age Rate  1  2  3
1 Aira  23   90 90 90 90
2  Ben  32   98 98 98 98
3  Cat  27   95 95 95 95

Renaming columns in a Pandas dataframe with duplicate column names?

X_R.columns = ['Retail','Cost']
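
Assigning a full list works here because the labels are replaced by position, whereas rename() maps by label and so cannot tell two identically named columns apart. A minimal sketch of the difference, assuming a hypothetical two-column X_R whose columns both start out labelled 'Price' (the label and the data are invented for illustration):

```python
import pandas as pd

# Hypothetical frame: both columns carry the same label.
X_R = pd.DataFrame([[10, 4], [12, 5]], columns=["Price", "Price"])

# rename() maps by label, so every 'Price' column gets the same new name.
renamed = X_R.rename(columns={"Price": "Retail"})
print(list(renamed.columns))

# Assigning a full list replaces the labels by position instead.
X_R.columns = ["Retail", "Cost"]
print(list(X_R.columns))
```

The assigned list must contain exactly one label per column, in column order.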

Rename duplicate column name by order in Pandas

You could use an itertools.count() counter and a list comprehension to create the new column headers, then assign them to the data frame.

For example:

>>> import itertools
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 3]], columns=["Nice", "Nice", "Hello"])
>>> df
   Nice  Nice  Hello
0     1     2      3
>>> count = itertools.count(1)
>>> new_cols = [f"Nice{next(count)}" if col == "Nice" else col for col in df.columns]
>>> df.columns = new_cols
>>> df
   Nice1  Nice2  Hello
0      1      2      3

(Python 3.6+ required for the f-strings)

EDIT: Alternatively, per a comment, the list comprehension can match any label that contains "Nice", in case there are unexpected spaces or other characters:

new_cols = [f"Nice{next(count)}" if "Nice" in col else col for col in df.columns]
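
The comprehension above hard-codes the label "Nice". If the duplicated labels are not known in advance, the same idea can be generalised with one counter per label. The following is only a sketch: number_duplicates is a hypothetical helper name, not part of pandas, and it does not check whether a generated name such as "Nice1" already exists among the columns.

```python
from collections import Counter


def number_duplicates(labels):
    """Append 1, 2, ... to every label that occurs more than once."""
    totals = Counter(labels)  # how often each label occurs overall
    seen = Counter()          # how often each label has been seen so far
    result = []
    for label in labels:
        if totals[label] > 1:
            seen[label] += 1
            result.append(f"{label}{seen[label]}")
        else:
            result.append(label)  # unique labels are left untouched
    return result


print(number_duplicates(["Nice", "Nice", "Hello"]))
# ['Nice1', 'Nice2', 'Hello']
```

With a data frame this would be used as df.columns = number_duplicates(df.columns).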

Python - Pandas - Copy column names to new dataframe without bringing data

You could do it like this:

new_df = df.copy()
new_df[['5:10', '6:10', '7:10']] = ''

or more concise:

new_df = df.copy()
new_df[new_df.columns[1:]] = ''

But why not just create a new dataframe with new_df = df.copy() and then perform your computations without blanking the dataframe? I don't think you need to do that, and it just adds time to the process.
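
If only the column labels are needed and none of the rows, an empty frame can also be built directly instead of copying and blanking. A short sketch, assuming a small made-up df (the 'time' column and the values are invented; '5:10' and '6:10' are borrowed from the snippet above):

```python
import pandas as pd

# Made-up source frame for illustration.
df = pd.DataFrame({"time": ["a", "b"], "5:10": [1, 2], "6:10": [3, 4]})

# Option 1: build an empty frame from the labels alone.
empty_from_labels = pd.DataFrame(columns=df.columns)

# Option 2: take a zero-row slice, which also keeps each column's dtype.
empty_slice = df.iloc[0:0].copy()

print(list(empty_from_labels.columns))
print(len(empty_slice))
```

The first option does not carry over the column dtypes, so the second may be preferable when rows of the same types will be appended later.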

Pandas copy column names from one dataframe to another

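Just as you took the columns from the dataframe that has column names, take the values from the dataframe that lacks them:

new_df_with_col_names = pd.DataFrame(data=no_col_names_df.values, columns=col_names_df.columns)

Passing the dataframe itself rather than its .values produces only NaN, because pandas then looks up the new column names in the source frame and finds none of them, as the session below shows: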


In [4]: new_df_with_col_names = pd.DataFrame(data=no_col_names_df, columns=col_names_df.columns)

In [5]: new_df_with_col_names
Out[5]:
   col1  col2  col3
0   NaN   NaN   NaN
1   NaN   NaN   NaN
2   NaN   NaN   NaN

In [6]: new_df_with_col_names = pd.DataFrame(data=no_col_names_df.values, columns=col_names_df.columns)

In [7]: new_df_with_col_names
Out[7]:
   col1  col2  col3
0     1     2     3
1     4     5     6
2     7     8     9
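
The two results in the session can be reproduced end to end. The sketch below assumes no_col_names_df holds the values 1 to 9 under the default integer labels and that col_names_df is an empty frame carrying the names col1 to col3; both are reconstructions from the output shown, not the asker's actual data.

```python
import pandas as pd

# Assumed inputs, reconstructed from the session output.
no_col_names_df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # labels 0, 1, 2
col_names_df = pd.DataFrame(columns=["col1", "col2", "col3"])

# Passing the DataFrame itself: pandas selects columns named col1..col3
# from the source, finds none, and fills the result with NaN.
aligned = pd.DataFrame(data=no_col_names_df, columns=col_names_df.columns)

# Passing the raw array: there are no labels to match, so values go in by position.
positional = pd.DataFrame(data=no_col_names_df.values, columns=col_names_df.columns)

# Another way to get the same result: copy, then overwrite the labels.
relabeled = no_col_names_df.copy()
relabeled.columns = col_names_df.columns

print(aligned)
print(positional)
```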

Pandas DataFrame - renaming multiple identically named columns

I was looking for a solution within Pandas rather than a general Python one.
The columns' get_loc() method returns a boolean mask when it finds duplicates, with True marking each position where the label occurs (for a monotonic, i.e. sorted, index it can return a slice instead, which the code below does not handle). I then use the mask to assign new values to those positions. In my case I know ahead of time how many duplicates I will get and what I will assign to them. For a more generic duplicate-weeding routine, df.columns.get_duplicates() used to return a list of all duplicated labels, but it has since been removed from pandas; df.columns[df.columns.duplicated()].unique() gives the same list and can be used together with get_loc().

'''UPDATED AS-OF SEPT 2020'''

cols = pd.Series(df.columns)
for dup in df.columns[df.columns.duplicated(keep=False)]:
    # get_loc() returns a boolean mask marking every position of the duplicated label
    cols[df.columns.get_loc(dup)] = [
        dup + '.' + str(d_idx) if d_idx != 0 else dup
        for d_idx in range(df.columns.get_loc(dup).sum())
    ]
df.columns = cols

   blah  blah2  blah3  blah.1  blah.2
0     0      1      2       3       4
1     5      6      7       8       9
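
To see what get_loc() hands back for a repeated label, and how the duplicated labels themselves can be listed, here is a small sketch on a bare Index with the same labels as the example. It relies on the labels not being sorted, which is the case in which get_loc() returns a mask.

```python
import pandas as pd

columns = pd.Index(["blah", "blah2", "blah3", "blah", "blah"])

# For a repeated label in an unsorted index, get_loc() returns a boolean
# array with one flag per column.
mask = columns.get_loc("blah")
print(mask)

# Labels that occur more than once, each listed a single time.
dups = columns[columns.duplicated()].unique()
print(list(dups))
```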

New Better Method (Update 03Dec2019)

The code below is better than the code above. It is copied from another answer (@SatishSK):

import numpy as np
import pandas as pd

# sample df with duplicate 'blah' columns
df = pd.DataFrame(np.arange(2*5).reshape(2, 5))
df.columns = ['blah', 'blah2', 'blah3', 'blah', 'blah']
df

# you just need the following 4 lines to rename duplicates
# df is the dataframe whose duplicated columns you want to rename

cols = pd.Series(df.columns)

for dup in cols[cols.duplicated()].unique():
    cols[cols[cols == dup].index.values.tolist()] = [dup + '.' + str(i) if i != 0 else dup for i in range(sum(cols == dup))]

# rename the columns with the cols list.
df.columns = cols

df

df

Output:

   blah  blah2  blah3  blah.1  blah.2
0     0      1      2       3       4
1     5      6      7       8       9

Merging multiple data frames causing duplicate column names

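You can concatenate the frames on the key column, tag each one with its position through keys, and then flatten the resulting two-level column labels: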

s = pd.concat([x.set_index('key') for x in df_list],axis = 1,keys=range(len(df_list)))
s.columns = s.columns.map('{0[1]}_{0[0]}'.format)
s = s.reset_index()
s
Out[236]:
  key   value_0   value_1   value_2   value_3
0   A -1.957968       NaN -0.852135 -0.976960
1   B  1.545932 -0.276838       NaN  0.197615
2   C -2.149727       NaN -0.364382  0.349993
3   D  0.524990 -0.476655       NaN       NaN
4   E       NaN -2.135870  0.798782       NaN
5   F       NaN  1.456544 -0.255705  0.447279
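
A self-contained version of the same steps may make the intermediate column labels easier to follow. The two input frames below are made up for illustration, and the flattening is written as a list comprehension rather than the format-string map used above; the resulting names are the same.

```python
import pandas as pd

# Made-up inputs: each frame has a 'key' column and a 'value' column.
df_list = [
    pd.DataFrame({"key": ["A", "B"], "value": [1.0, 2.0]}),
    pd.DataFrame({"key": ["B", "C"], "value": [3.0, 4.0]}),
]

# keys=... adds an outer column level (0, 1, ...) identifying the source frame.
s = pd.concat([x.set_index("key") for x in df_list], axis=1, keys=range(len(df_list)))
print(s.columns.tolist())  # tuples of (frame position, original label)

# Flatten each (position, label) tuple into 'label_position'.
s.columns = [f"{label}_{pos}" for pos, label in s.columns]
s = s.reset_index()
print(s)
```

Keys missing from one of the frames show up as NaN in that frame's column, as in the output above.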

Is there a built in Python/pandas function to rename duplicate columns in a pandas.DataFrame?

but I could not find a handy way to do this to a DataFrame object already

For an existing dataframe there is no built-in, so we have to resort to some code:

s = pd.Series(df.columns)
df.columns= df.columns+s.groupby(s).cumcount().replace(0,'').astype(str)

   x  x1  y
0  2   5  1
1  1   9  3
2  4   1  2
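
The key step is groupby(...).cumcount(), which numbers the occurrences of each label starting from 0. The sketch below rebuilds the frame from the output above (the starting labels x, x, y are an assumption) and spells the suffixing out as a list comprehension instead of the replace/astype chain, which should give the same names.

```python
import pandas as pd

# Frame reconstructed from the output above, with a duplicated 'x' label.
df = pd.DataFrame([[2, 5, 1], [1, 9, 3], [4, 1, 2]], columns=["x", "x", "y"])

s = pd.Series(df.columns)
counts = s.groupby(s).cumcount()  # 0 for a first occurrence, 1 for the second, ...
print(counts.tolist())

# Append the counter only to repeats, leaving first occurrences untouched.
df.columns = [name if n == 0 else f"{name}{n}" for name, n in zip(s, counts)]
print(list(df.columns))
```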

