How to Assign a Unique Id Number to Each Group of Identical Values in a Column

How to assign a unique ID number to each group of identical values in a column

How about

df2 <- transform(df,id=as.numeric(factor(sample)))

?

I think this (cribbed from Add ID column by group) should be slightly more efficient, although perhaps a little harder to remember:

df3 <- transform(df, id=match(sample, unique(sample)))
all.equal(df2,df3) ## TRUE

If you want to do this in tidyverse:

library(dplyr)
df %>% group_by(sample) %>% mutate(id=cur_group_id())

assign a unique ID number for every repeated value in a column R

It can be done using rowid

library(data.table)
library(dplyr)
weighted_df %>%
mutate(ID = rowid(Name))

-output

#     Name        Room1       Room2        Room3 ID
#1 H001 0.579649851 0.84602529 0.620850211 1
#2 H001 0.579649851 0.84602529 0.620850211 2
#3 H001 0.579649851 0.84602529 0.620850211 3
#4 H001 0.579649851 0.84602529 0.620850211 4
#5 H001 0.579649851 0.84602529 0.620850211 5
#6 H001 0.579649851 0.84602529 0.620850211 6
#7 H001 0.579649851 0.84602529 0.620850211 7
#8 H001 0.579649851 0.84602529 0.620850211 8
#9 H001 0.579649851 0.84602529 0.620850211 9
#10 H001 0.579649851 0.84602529 0.620850211 10
#11 H001 0.579649851 0.84602529 0.620850211 11
#12 H001 0.579649851 0.84602529 0.620850211 12
#13 H001 0.579649851 0.84602529 0.620850211 13
#14 H001 0.579649851 0.84602529 0.620850211 14
#15 H001 0.579649851 0.84602529 0.620850211 15
#16 H001 0.579649851 0.84602529 0.620850211 16
#17 H001 0.579649851 0.84602529 0.620850211 17
#18 H002 1.457267473 -1.18612874 0.553957293 1
#19 H002 1.457267473 -1.18612874 0.553957293 2
# ...

Create a new column with unique identifier for each group

Try with groupby ngroup + 1, use sort=False to ensure groups are enumerated in the order they appear in the DataFrame:

df['idx'] = df.groupby(['ID', 'phase'], sort=False).ngroup() + 1

df:

   ID phase side  values  idx
0 r1 ph1 l 12 1
1 r1 ph1 r 34 1
2 r1 ph2 l 93 2
3 s4 ph3 l 21 3
4 s3 ph2 l 88 4
5 s3 ph2 r 54 4

How to assign a unique ID to each corresponding unique column value?

Use window functions.

scala> df.show(false)
+-------+--------+---------+---------+--------+-----------------+
|address|lastname|firstname|patientid|policyno|visitid |
+-------+--------+---------+---------+--------+-----------------+
|addr1 |Pits |Rodney |patid1 |policy1 |visituid_rodney_1|
|addr1 |Pits |Rodney |patid1 |policy1 |visituid_rodney_2|
|addr1 |Pits |Rodney |patid1 |policy1 |visituid_rodney_3|
|addr2 |Dobs |Aliya |patid2 |policy2 |visituid_aliya_1 |
|addr2 |Dobs |Aliya |patid2 |policy2 |visituid_aliya_2 |
|addr2 |Dobs |OP |patid3 |policy3 |visituid_OP_1 |
|addr4 |AN |OTHER |patid4 |policy4 |visituid_OP_1 |
|addr2 |ANO |THER |patid5 |policy5 |visituid_OP_1 |
+-------+--------+---------+---------+--------+-----------------+
val expr = Seq("lastname","firstname")
.map(c => hash(col(c)).as(c))
.reduce(concat(_,_).asc)
val winSpec = dense_rank()
.over(Window.orderBy(expr))

Output

scala> df.withColumn("id",winSpec).show(false)

+-------+--------+---------+---------+--------+-----------------+---+
|address|lastname|firstname|patientid|policyno|visitid |id |
+-------+--------+---------+---------+--------+-----------------+---+
|addr1 |Pits |Rodney |patid1 |policy1 |visituid_rodney_1|1 |
|addr1 |Pits |Rodney |patid1 |policy1 |visituid_rodney_2|1 |
|addr1 |Pits |Rodney |patid1 |policy1 |visituid_rodney_3|1 |
|addr4 |AN |OTHER |patid4 |policy4 |visituid_OP_1 |2 |
|addr2 |Dobs |OP |patid3 |policy3 |visituid_OP_1 |3 |
|addr2 |Dobs |Aliya |patid2 |policy2 |visituid_aliya_1 |4 |
|addr2 |Dobs |Aliya |patid2 |policy2 |visituid_aliya_2 |4 |
|addr2 |ANO |THER |patid5 |policy5 |visituid_OP_1 |5 |
+-------+--------+---------+---------+--------+-----------------+---+

PostgreSQL -Assign Unique ID to each groups of records, based on a condition

We can use LAG here along with SUM as an analytic function:

WITH cte AS (
SELECT *, CASE WHEN TELEGRAMMZAEHLER =
LAG(TELEGRAMMZAEHLER) OVER (ORDER BY TMSTP DESC) - 1
THEN 0 ELSE 1 END AS idx
FROM yourTable
)

SELECT ID, TMSTP, CAT_TYPE, TELEGRAMMZAEHLER,
SUM(idx) OVER (ORDER BY TMSTP DESC) AS GRUPPE
FROM cte
ORDER BY TMSTP DESC;

Demo

Assign unique id each time two fields/columns are an exact match

Using DENSE_RANK() should do the trick:

SELECT *, 
DENSE_RANK() OVER(ORDER by person_id, datetime) as unique_combination_id
FROM tbl;

DEMO

How to generate a unique ID based on duplicate values in either of two columns?

Further to my comment above, to reproduce the same ID order you need to ensure the proper factor level ordering

library(dplyr)
df %>% mutate(ID = as.integer(factor(Name, levels = unique(Name))))
# E.mail Phone Name ID
#1 mortena 3076 morten 1
#2 kaspera 2688 kasper 2
#3 christoffera 1212 christoffer 3
#4 mortenb 3076 morten 1
#5 mortena 3075 morten 1
#6 kasperb 2688 kasper 2
#7 christoffera 1213 christoffer 3

or with group_indices

df %>% mutate(ID = group_indices(., factor(Name, levels = unique(Name))))

If the order of IDs doesn't matter you can do

df %>% mutate(ID = as.integer(as.factor(Name)))

or

df %>% mutate(ID = group_indices(., Name))

In these cases, the order of ID is determined by the alphabetical order of unique Names (rather than by their occurrence).


Sample data

df <- read.table(text = "E-mail         Phone      Name
mortena 3076 morten
kaspera 2688 kasper
christoffera 1212 christoffer
mortenb 3076 morten
mortena 3075 morten
kasperb 2688 kasper
christoffera 1213 christoffer", header = T)

Assign Unique Group Id To Sets of Rows with Same Column Value Separated by Other value

Try this:

with t as
(
select
uid,
radius,
lag(radius,1) over (order by uid) as prev_rad
from
radtable
)

select
uid,
radius,
sum
(
case when radius = coalesce(prev_rad,radius) then 0 else 1 end
)
over
(
order by uid
) + 1 as GroupID
from
t


Related Topics



Leave a reply



Submit