Make a Column with Duplicated Values Unique in a Dataframe

How to make duplicate values in a column unique?
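
For reference, here is a minimal reconstruction of the sample input, inferred from the printed output below (the exact DataFrame is an assumption, since the question's data isn't reproduced here):

import pandas as pd

# Hypothetical input, reconstructed from the output shown further down
df = pd.DataFrame({'colA': ['A', 'B', 'D', 'A', 'G', 'H', 'G', 'D'],
                   'ColB': ['B', 'C', 'B', 'B', 'B', 'K', 'B', 'J'],
                   'ColC': [345, 876, 983, 371, 972, 193, 367, 293]})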

If there are at most 26 duplicated values per group (one per letter of the alphabet), create a mapping from counter values to letters with enumerate and string.ascii_uppercase, select only the duplicated rows with DataFrame.duplicated, and append new suffixes built with GroupBy.cumcount and Series.map:

import string

d = dict(enumerate(string.ascii_uppercase))

print(len(d))
26

m = df.duplicated(['colA', 'ColB'], keep=False)
df.loc[m, 'colA'] += '_' + df[m].groupby(['colA', 'ColB']).cumcount().map(d)
print(df)
  colA ColB  ColC
0  A_A    B   345
1    B    C   876
2    D    B   983
3  A_B    B   371
4  G_A    B   972
5    H    K   193
6  G_B    B   367
7    D    J   293

If numbers instead of letters are acceptable, the solution simplifies:

m = df.duplicated(['colA', 'ColB'], keep=False)
df.loc[m, 'colA'] += '_' + df[m].groupby(['colA', 'ColB']).cumcount().astype(str)
print(df)
  colA ColB  ColC
0  A_0    B   345
1    B    C   876
2    D    B   983
3  A_1    B   371
4  G_0    B   972
5    H    K   193
6  G_1    B   367
7    D    J   293

Make a column with duplicated values unique in a dataframe

We can use make.names with unique=TRUE. By default a . is appended before the suffix numbers; it can be replaced with _ using sub:

 employee$name <- sub('[.]', '_', make.names(employee$name, unique=TRUE))

Or, a better option suggested by @DavidArenburg: if the name column is of factor class, convert it to character (as.character) before applying make.unique:

 make.unique(as.character(employee$name), sep = "_")
#[1] "John" "Joe" "Mat" "John_1" "Joe_1"

Produce unique values for duplicates in a column using Pandas/Python
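
Assuming an input reconstructed from the output below (the exact frame is a guess):

import pandas as pd

df = pd.DataFrame({'type':  ['a'] * 5 + ['b'] * 5,
                   'total': [10] * 5 + [20] * 5,
                   'free':  [5, 4, 1, 8, 3, 5, 3, 2, 6, 2],
                   'use':   [5, 6, 9, 2, 7, 5, 7, 8, 4, 8]})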

You can use groupby.cumcount combined with numpy.where:

import numpy as np

df['type'] += np.where(df['type'].duplicated(),
                       df.groupby('type').cumcount().astype(str),
                       '')

Or, similarly, with a loc update:

df.loc[df['type'].duplicated(), 'type'] += df.groupby('type').cumcount().astype(str)

Output:

  type  total  free  use
0    a     10     5    5
1   a1     10     4    6
2   a2     10     1    9
3   a3     10     8    2
4   a4     10     3    7
5    b     20     5    5
6   b1     20     3    7
7   b2     20     2    8
8   b3     20     6    4
9   b4     20     2    8

How can unique show duplicate values in a dataframe?

Moving my comment to an answer, as it solved the problem:

print(df['ID'].astype(int).unique())
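
The likely root cause (an assumption here, since the question's data isn't shown) is that the ID column holds strings or mixed types, so values that print identically still compare as distinct; casting to int collapses them:

import pandas as pd

# Hypothetical mixed-type ID column: '1' (str), 1 (int) and 2.0 (float)
df = pd.DataFrame({'ID': ['1', 1, '2', 2.0]})

print(df['ID'].unique())              # ['1' 1 '2' 2.0] -- four "distinct" values
print(df['ID'].astype(int).unique())  # [1 2]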

Pandas: Split dataframe with duplicate values into dataframe with unique values
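
Assume an input like the following (reconstructed from the outputs below):

import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'a', 'b', 'a', 'a', 'b']})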

I don't think you can achieve this in a fully vectorized way.

One possibility is to use a custom function that iterates over the items and keeps track of the values already seen, then use its result to split with groupby:

def cum_uniq(s):
    # Assign an increasing group id: start a new group whenever the
    # current value has already been seen since the last split.
    i = 0
    seen = set()
    out = []
    for x in s:
        if x in seen:
            i += 1
            seen = set()
        out.append(i)
        seen.add(x)
    return pd.Series(out, index=s.index)

out = [g for _, g in df.groupby(cum_uniq(df['Col1']))]

output:

[  Col1
 0    a,
   Col1
 1    a
 2    b,
   Col1
 3    a,
   Col1
 4    a
 5    b]

intermediate:

cum_uniq(df['Col1'])

0    0
1    1
2    1
3    2
4    3
5    3
dtype: int64

If order doesn't matter

Let's add a Col2 to the example:

  Col1  Col2
0    a     0
1    a     1
2    b     2
3    a     3
4    a     4
5    b     5

the previous code gives:

[  Col1  Col2
 0    a     0,
   Col1  Col2
 1    a     1
 2    b     2,
   Col1  Col2
 3    a     3,
   Col1  Col2
 4    a     4
 5    b     5]

If order does not matter, you can vectorize it: group by the per-value occurrence counter, so the n-th occurrence of each value lands in the n-th chunk:

out = [g for _,g in df.groupby(df.groupby('Col1').cumcount())]

output:

[  Col1  Col2
 0    a     0
 2    b     2,
   Col1  Col2
 1    a     1
 5    b     5,
   Col1  Col2
 3    a     3,
   Col1  Col2
 4    a     4]

making duplicate values into unique

Here is what I tried, and it worked for me. With some help, I declared a class that renames duplicate values:

class renamer():
    def __init__(self):
        self.d = dict()

    def __call__(self, x):
        # First occurrence: remember it and keep the name unchanged
        if x not in self.d:
            self.d[x] = 0
            return x
        # Repeat occurrence: bump the counter and append it as a suffix
        else:
            self.d[x] += 1
            return "%s_%d" % (x, self.d[x])

and then I just applied it to the DataFrame column:

df['ID'] = df['ID'].apply(renamer())
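
A quick sketch of the behavior on a toy column (the data here is made up for illustration):

import pandas as pd

df = pd.DataFrame({'ID': ['a', 'a', 'b', 'a']})
df['ID'] = df['ID'].apply(renamer())
print(df['ID'].tolist())
# ['a', 'a_1', 'b', 'a_2']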


