How to Make Column Duplicate Values Unique?
If there are at most 26 duplicates per group (one per letter), create a dictionary with enumerate over string.ascii_uppercase, select only the duplicated rows with DataFrame.duplicated, and append suffixes built from a counter created by GroupBy.cumcount and mapped to letters with Series.map:
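import string

# Map counter values 0..25 to the letters 'A'..'Z'
d = dict(enumerate(string.ascii_uppercase))
print(len(d))  # 26

# Flag every row whose (colA, ColB) pair appears more than once
m = df.duplicated(['colA', 'ColB'], keep=False)
# Number the rows within each duplicated pair (0, 1, ...) and append the matching letter
df.loc[m, 'colA'] += '_' + df[m].groupby(['colA', 'ColB']).cumcount().map(d)
print(df)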
colA ColB ColC
0 A_A B 345
1 B C 876
2 D B 983
3 A_B B 371
4 G_A B 972
5 H K 193
6 G_B B 367
7 D J 293
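With the 26-entry dictionary, a counter of 26 or more has no letter, so Series.map returns NaN for it and the concatenated value in colA becomes missing. A minimal sketch, assuming spreadsheet-style suffixes (A..Z, then AA, AB, ...) are acceptable, replaces the dictionary with a function; to_letters and the demo frame are hypothetical.

```python
import string

import pandas as pd


def to_letters(n: int) -> str:
    """Convert a 0-based counter to spreadsheet-style letters:
    0 -> 'A', 25 -> 'Z', 26 -> 'AA', 27 -> 'AB', ..."""
    letters = ""
    n += 1  # switch to 1-based "bijective base 26"
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = string.ascii_uppercase[rem] + letters
    return letters


# Hypothetical demo frame: one (colA, ColB) pair repeated 28 times,
# which would overflow a 26-entry letter dictionary.
df = pd.DataFrame({"colA": ["X"] * 28, "ColB": ["Y"] * 28})

m = df.duplicated(["colA", "ColB"], keep=False)
# Series.map also accepts a function, so no lookup table is needed
df.loc[m, "colA"] += "_" + df[m].groupby(["colA", "ColB"]).cumcount().map(to_letters)
print(df["colA"].tail(3).tolist())  # ['X_Z', 'X_AA', 'X_AB']
```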
If numbers are acceptable instead of letters, the solution simplifies to:
m = df.duplicated(['colA', 'ColB'], keep=False)
# Append the counter itself (0, 1, ...) as the suffix
df.loc[m, 'colA'] += '_' + df[m].groupby(['colA', 'ColB']).cumcount().astype(str)
print(df)
colA ColB ColC
0 A_0 B 345
1 B C 876
2 D B 983
3 A_1 B 371
4 G_0 B 972
5 H K 193
6 G_1 B 367
7 D J 293
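Because keep=False marks every member of a duplicated pair, the first occurrence gets a suffix too (A_0, G_0). If the first occurrence should stay unchanged, a possible variant, sketched below, uses keep='first' and counts over the whole frame. The input frame is reconstructed from the printed result above, which is an assumption.

```python
import pandas as pd

# Input reconstructed from the printed output above (assumed original data)
df = pd.DataFrame({
    "colA": ["A", "B", "D", "A", "G", "H", "G", "D"],
    "ColB": ["B", "C", "B", "B", "B", "K", "B", "J"],
    "ColC": [345, 876, 983, 371, 972, 193, 367, 293],
})

# keep='first' flags only the 2nd, 3rd, ... occurrence of each pair
later = df.duplicated(["colA", "ColB"], keep="first")

# Counting over the full frame gives first occurrences 0 (skipped) and
# later occurrences 1, 2, ...
counter = df.groupby(["colA", "ColB"]).cumcount()
df.loc[later, "colA"] += "_" + counter[later].astype(str)
print(df["colA"].tolist())
# ['A', 'B', 'D', 'A_1', 'G', 'H', 'G_1', 'D']
```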
Make a column with duplicated values unique in a dataframe
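We can use make.names with unique=TRUE. By default, a . is inserted before the suffix numbers, and it can be replaced with _ using sub. (make.names also converts characters that are invalid in R names, such as spaces, to dots, and sub replaces only the first dot, so this suits simple names best.)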
employee$name <- sub('[.]', '_', make.names(employee$name, unique=TRUE))
A better option, suggested by @DavidArenburg, is make.unique, which accepts the separator directly. If the name column is of class factor, convert it to character (as.character) before applying make.unique:
make.unique(as.character(employee$name), sep = "_")
#[1] "John" "Joe" "Mat" "John_1" "Joe_1"
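For comparison, a rough pandas analogue of make.unique(..., sep = "_") for this simple case numbers each occurrence with cumcount and leaves first occurrences untouched. This is a sketch, not a port: it does not check whether a generated name such as John_1 already exists in the column.

```python
import pandas as pd

# Names taken from the R output above
names = pd.Series(["John", "Joe", "Mat", "John", "Joe"])

# Occurrence index within each name: 0 for the first, then 1, 2, ...
n = names.groupby(names).cumcount()

# Keep first occurrences as-is; append "_<n>" to the rest
unique_names = names.where(n == 0, names + "_" + n.astype(str))
print(unique_names.tolist())
# ['John', 'Joe', 'Mat', 'John_1', 'Joe_1']
```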
Produce Unique value for duplicates in column using Pandas/Python
You can use groupby.cumcount:
import numpy as np

# Append the running count only to values that have already appeared
df['type'] += np.where(df['type'].duplicated(),
                       df.groupby('type').cumcount().astype(str),
                       '')
Or, similarly, with a loc update:
df.loc[df['type'].duplicated(), 'type'] += df.groupby('type').cumcount().astype(str)
Output:
type total free use
0 a 10 5 5
1 a1 10 4 6
2 a2 10 1 9
3 a3 10 8 2
4 a4 10 3 7
5 b 20 5 5
6 b1 20 3 7
7 b2 20 2 8
8 b3 20 6 4
9 b4 20 2 8
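The sample data happens to be sorted by type, but this is not required: cumcount numbers rows within each group in their original order. A small sketch with hypothetical interleaved values illustrates this, using the loc form with the counter restricted to the duplicated rows.

```python
import pandas as pd

# Hypothetical, unsorted input: the two values are interleaved
df = pd.DataFrame({"type": ["a", "b", "a", "b", "a"]})

# Running count per value, in original row order: a=0, b=0, a=1, b=1, a=2
counter = df.groupby("type").cumcount()

# Only rows whose value has been seen before get the counter appended
dup = df["type"].duplicated()
df.loc[dup, "type"] += counter[dup].astype(str)
print(df["type"].tolist())
# ['a', 'b', 'a1', 'b1', 'a2']
```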
How can unique show duplicate values in a dataframe?
Moving my comment to an answer, as it solved the problem:
print(df['ID'].astype(int).unique())
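The question's data is not shown, but one common reason for this symptom, assumed here, is a column that mixes strings and integers (for example after concatenating frames read from different sources): '1' and 1 are different objects, so unique() lists both. Casting to a single type first collapses them.

```python
import pandas as pd

# Hypothetical ID column mixing str and int values
ids = pd.Series(["1", 1, "2", 2])

print(ids.unique())              # 4 entries: '1' and 1 count as different values
print(ids.astype(int).unique())  # [1 2]
```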
Pandas: Split dataframe with duplicate values into dataframe with unique values
I don't think this can be done in a vectorized way.
One option is a custom function that iterates over the items, keeps track of the values already seen in the current chunk, and starts a new chunk whenever a value repeats. Then use its result as the key for groupby:
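import pandas as pd

def cum_uniq(s):
    """Return a chunk id per item; a new chunk starts when a value repeats."""
    i = 0
    seen = set()
    out = []
    for x in s:
        if x in seen:
            # value already in the current chunk: start a new chunk
            i += 1
            seen = set()
        out.append(i)
        seen.add(x)
    return pd.Series(out, index=s.index)

out = [g for _, g in df.groupby(cum_uniq(df['Col1']))]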
output:
[ Col1
0 a,
Col1
1 a
2 b,
Col1
3 a,
Col1
4 a
5 b]
intermediate:
cum_uniq(df['Col1'])
0 0
1 1
2 1
3 2
4 3
5 3
dtype: int64
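The same greedy rule can be expressed without pandas, which may make the chunking easier to follow. This is a sketch on a plain list; split_unique_runs is a hypothetical helper name.

```python
def split_unique_runs(values):
    """Greedily cut a sequence into consecutive chunks with no repeated item.

    A new chunk starts as soon as the next item already appears in the
    current chunk, the same rule cum_uniq applies.
    """
    chunks = []
    current = []
    seen = set()
    for x in values:
        if x in seen:
            # close the current chunk and start a fresh one
            chunks.append(current)
            current = []
            seen = set()
        current.append(x)
        seen.add(x)
    if current:
        chunks.append(current)
    return chunks


print(split_unique_runs(["a", "a", "b", "a", "a", "b"]))
# [['a'], ['a', 'b'], ['a'], ['a', 'b']]
```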
If order doesn't matter
Let's add a Col2 to the example:
Col1 Col2
0 a 0
1 a 1
2 b 2
3 a 3
4 a 4
5 b 5
The previous code gives:
[ Col1 Col2
0 a 0,
Col1 Col2
1 a 1
2 b 2,
Col1 Col2
3 a 3,
Col1 Col2
4 a 4
5 b 5]
If order does not matter, you can vectorize it:
out = [g for _,g in df.groupby(df.groupby('Col1').cumcount())]
output:
[ Col1 Col2
0 a 0
2 b 2,
Col1 Col2
1 a 1
5 b 5,
Col1 Col2
3 a 3,
Col1 Col2
4 a 4]
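To see why this works, it can help to print the grouping key itself: cumcount gives the n-th occurrence of each Col1 value, so group k collects the k-th occurrence of every value and no value can repeat inside a group. The sketch below uses the example frame from above.

```python
import pandas as pd

df = pd.DataFrame({"Col1": ["a", "a", "b", "a", "a", "b"],
                   "Col2": [0, 1, 2, 3, 4, 5]})

# n-th occurrence of each Col1 value (0, 1, 2, ...)
occurrence = df.groupby("Col1").cumcount()
print(occurrence.tolist())  # [0, 1, 0, 2, 3, 1]

# Group k holds the k-th occurrence of every value
out = [g for _, g in df.groupby(occurrence)]
print([g.index.tolist() for g in out])  # [[0, 2], [1, 5], [3], [4]]
```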
Making duplicate values unique
Here is what I tried, and it worked for me: with some help, I declared a class that renames duplicate values.
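class renamer():
    """Callable: returns x on its first occurrence, then x_1, x_2, ..."""
    def __init__(self):
        self.d = dict()  # value -> how many repeats seen so far

    def __call__(self, x):
        if x not in self.d:
            self.d[x] = 0
            return x
        else:
            self.d[x] += 1
            return "%s_%d" % (x, self.d[x])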
Then I just used apply on the DataFrame column:
df['ID'] = df['ID'].apply(renamer())
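Because renamer keeps its counts in the instance, each column needs a fresh renamer(); reusing one instance would carry counts over. A plain function that creates its counter on every call avoids that. This is a sketch of an alternative, not the poster's code; rename_duplicates and its sep parameter are hypothetical names.

```python
from collections import Counter


def rename_duplicates(values, sep="_"):
    """Return a list where repeats of a value become value<sep>1, value<sep>2, ..."""
    counts = Counter()  # fresh state on every call
    out = []
    for x in values:
        k = counts[x]  # number of times x was seen before
        counts[x] += 1
        out.append(x if k == 0 else f"{x}{sep}{k}")
    return out


print(rename_duplicates(["x", "y", "x", "x", "y"]))
# ['x', 'y', 'x_1', 'x_2', 'y_1']
```

The result is a list of the same length as the input, so it can be assigned back with df['ID'] = rename_duplicates(df['ID']).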