Make a Column with Duplicated Values Unique in a Dataframe

How to make duplicate values in a column unique?
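
For reference, here is a minimal reconstruction of the sample input, inferred from the printed output below (the exact DataFrame is an assumption, since the question's data isn't reproduced here):

import pandas as pd

# Hypothetical input, reconstructed from the output shown further down
df = pd.DataFrame({'colA': ['A', 'B', 'D', 'A', 'G', 'H', 'G', 'D'],
                   'ColB': ['B', 'C', 'B', 'B', 'B', 'K', 'B', 'J'],
                   'ColC': [345, 876, 983, 371, 972, 193, 367, 293]})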

If there are at most 26 duplicated values per group (one per letter of the alphabet), create a mapping from counter values to letters with enumerate and string.ascii_uppercase, select only the duplicated rows with DataFrame.duplicated, and append new suffixes built with GroupBy.cumcount and Series.map:

import string

d = dict(enumerate(string.ascii_uppercase))

print(len(d))
26

m = df.duplicated(['colA', 'ColB'], keep=False)
df.loc[m, 'colA'] += '_' + df[m].groupby(['colA', 'ColB']).cumcount().map(d)
print(df)
  colA ColB  ColC
0  A_A    B   345
1    B    C   876
2    D    B   983
3  A_B    B   371
4  G_A    B   972
5    H    K   193
6  G_B    B   367
7    D    J   293

If numbers instead of letters are acceptable, the solution simplifies:

m = df.duplicated(['colA', 'ColB'], keep=False)
df.loc[m, 'colA'] += '_' + df[m].groupby(['colA', 'ColB']).cumcount().astype(str)
print(df)
  colA ColB  ColC
0  A_0    B   345
1    B    C   876
2    D    B   983
3  A_1    B   371
4  G_0    B   972
5    H    K   193
6  G_1    B   367
7    D    J   293

Make a column with duplicated values unique in a dataframe

We can use make.names with unique=TRUE. By default a . is appended before the suffix numbers; it can be replaced with _ using sub:

 employee$name <- sub('[.]', '_', make.names(employee$name, unique=TRUE))

Or, a better option suggested by @DavidArenburg: if the name column is of factor class, convert it to character (as.character) before applying make.unique:

 make.unique(as.character(employee$name), sep = "_")
#[1] "John" "Joe" "Mat" "John_1" "Joe_1"

Produce unique values for duplicates in a column using Pandas/Python
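
Assuming an input reconstructed from the output below (the exact frame is a guess):

import pandas as pd

df = pd.DataFrame({'type':  ['a'] * 5 + ['b'] * 5,
                   'total': [10] * 5 + [20] * 5,
                   'free':  [5, 4, 1, 8, 3, 5, 3, 2, 6, 2],
                   'use':   [5, 6, 9, 2, 7, 5, 7, 8, 4, 8]})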

You can use groupby.cumcount combined with numpy.where:

import numpy as np

df['type'] += np.where(df['type'].duplicated(),
                       df.groupby('type').cumcount().astype(str),
                       '')

Or, similarly, with a loc update:

df.loc[df['type'].duplicated(), 'type'] += df.groupby('type').cumcount().astype(str)

Output:

  type  total  free  use
0    a     10     5    5
1   a1     10     4    6
2   a2     10     1    9
3   a3     10     8    2
4   a4     10     3    7
5    b     20     5    5
6   b1     20     3    7
7   b2     20     2    8
8   b3     20     6    4
9   b4     20     2    8

How can unique show duplicate values in a dataframe?

Moving my comment to an answer, as it solved the problem:

print(df['ID'].astype(int).unique())
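
The likely root cause (an assumption here, since the question's data isn't shown) is that the ID column holds strings or mixed types, so values that print identically still compare as distinct; casting to int collapses them:

import pandas as pd

# Hypothetical mixed-type ID column: '1' (str), 1 (int) and 2.0 (float)
df = pd.DataFrame({'ID': ['1', 1, '2', 2.0]})

print(df['ID'].unique())              # ['1' 1 '2' 2.0] -- four "distinct" values
print(df['ID'].astype(int).unique())  # [1 2]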

Pandas: Split dataframe with duplicate values into dataframe with unique values
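
Assume an input like the following (reconstructed from the outputs below):

import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'a', 'b', 'a', 'a', 'b']})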

I don't think you can achieve this in a fully vectorized way.

One possibility is to use a custom function that iterates over the items and keeps track of the values already seen, then use its result to split with groupby:

def cum_uniq(s):
    # Assign an increasing group id: start a new group whenever the
    # current value has already been seen since the last split.
    i = 0
    seen = set()
    out = []
    for x in s:
        if x in seen:
            i += 1
            seen = set()
        out.append(i)
        seen.add(x)
    return pd.Series(out, index=s.index)

out = [g for _, g in df.groupby(cum_uniq(df['Col1']))]

output:

[  Col1
 0    a,
   Col1
 1    a
 2    b,
   Col1
 3    a,
   Col1
 4    a
 5    b]

intermediate:

cum_uniq(df['Col1'])

0    0
1    1
2    1
3    2
4    3
5    3
dtype: int64

If order doesn't matter

Let's add a Col2 to the example:

  Col1  Col2
0    a     0
1    a     1
2    b     2
3    a     3
4    a     4
5    b     5

the previous code gives:

[  Col1  Col2
 0    a     0,
   Col1  Col2
 1    a     1
 2    b     2,
   Col1  Col2
 3    a     3,
   Col1  Col2
 4    a     4
 5    b     5]

If order does not matter, you can vectorize it: group by the per-value occurrence counter, so the n-th occurrence of each value lands in the n-th chunk:

out = [g for _,g in df.groupby(df.groupby('Col1').cumcount())]

output:

[  Col1  Col2
 0    a     0
 2    b     2,
   Col1  Col2
 1    a     1
 5    b     5,
   Col1  Col2
 3    a     3,
   Col1  Col2
 4    a     4]

making duplicate values into unique

Here is what I tried, and it worked for me. With some help, I declared a class that renames duplicate values:

class renamer():
    def __init__(self):
        self.d = dict()

    def __call__(self, x):
        # First occurrence: remember it and keep the name unchanged
        if x not in self.d:
            self.d[x] = 0
            return x
        # Repeat occurrence: bump the counter and append it as a suffix
        else:
            self.d[x] += 1
            return "%s_%d" % (x, self.d[x])

and then I just applied it to the DataFrame column:

df['ID'] = df['ID'].apply(renamer())
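
A quick sketch of the behavior on a toy column (the data here is made up for illustration):

import pandas as pd

df = pd.DataFrame({'ID': ['a', 'a', 'b', 'a']})
df['ID'] = df['ID'].apply(renamer())
print(df['ID'].tolist())
# ['a', 'a_1', 'b', 'a_2']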


