How to assign a unique ID number to each group of identical values in a column
How about
df2 <- transform(df,id=as.numeric(factor(sample)))
?
I think this (cribbed from Add ID column by group) should be slightly more efficient, although perhaps a little harder to remember:
df3 <- transform(df, id=match(sample, unique(sample)))
all.equal(df2,df3) ## TRUE
If you want to do this in tidyverse:
library(dplyr)
df %>% group_by(sample) %>% mutate(id=cur_group_id())
assign a unique ID number for every repeated value in a column R
It can be done using rowid
library(data.table)
library(dplyr)
weighted_df %>%
mutate(ID = rowid(Name))
-output
# Name Room1 Room2 Room3 ID
#1 H001 0.579649851 0.84602529 0.620850211 1
#2 H001 0.579649851 0.84602529 0.620850211 2
#3 H001 0.579649851 0.84602529 0.620850211 3
#4 H001 0.579649851 0.84602529 0.620850211 4
#5 H001 0.579649851 0.84602529 0.620850211 5
#6 H001 0.579649851 0.84602529 0.620850211 6
#7 H001 0.579649851 0.84602529 0.620850211 7
#8 H001 0.579649851 0.84602529 0.620850211 8
#9 H001 0.579649851 0.84602529 0.620850211 9
#10 H001 0.579649851 0.84602529 0.620850211 10
#11 H001 0.579649851 0.84602529 0.620850211 11
#12 H001 0.579649851 0.84602529 0.620850211 12
#13 H001 0.579649851 0.84602529 0.620850211 13
#14 H001 0.579649851 0.84602529 0.620850211 14
#15 H001 0.579649851 0.84602529 0.620850211 15
#16 H001 0.579649851 0.84602529 0.620850211 16
#17 H001 0.579649851 0.84602529 0.620850211 17
#18 H002 1.457267473 -1.18612874 0.553957293 1
#19 H002 1.457267473 -1.18612874 0.553957293 2
# ...
Create a new column with unique identifier for each group
Try with groupby ngroup
+ 1, use sort=False
to ensure groups are enumerated in the order they appear in the DataFrame:
df['idx'] = df.groupby(['ID', 'phase'], sort=False).ngroup() + 1
df
:
ID phase side values idx
0 r1 ph1 l 12 1
1 r1 ph1 r 34 1
2 r1 ph2 l 93 2
3 s4 ph3 l 21 3
4 s3 ph2 l 88 4
5 s3 ph2 r 54 4
How to assign a unique ID to each corresponding unique column value?
Use window
functions.
scala> df.show(false)
+-------+--------+---------+---------+--------+-----------------+
|address|lastname|firstname|patientid|policyno|visitid |
+-------+--------+---------+---------+--------+-----------------+
|addr1 |Pits |Rodney |patid1 |policy1 |visituid_rodney_1|
|addr1 |Pits |Rodney |patid1 |policy1 |visituid_rodney_2|
|addr1 |Pits |Rodney |patid1 |policy1 |visituid_rodney_3|
|addr2 |Dobs |Aliya |patid2 |policy2 |visituid_aliya_1 |
|addr2 |Dobs |Aliya |patid2 |policy2 |visituid_aliya_2 |
|addr2 |Dobs |OP |patid3 |policy3 |visituid_OP_1 |
|addr4 |AN |OTHER |patid4 |policy4 |visituid_OP_1 |
|addr2 |ANO |THER |patid5 |policy5 |visituid_OP_1 |
+-------+--------+---------+---------+--------+-----------------+
val expr = Seq("lastname","firstname")
.map(c => hash(col(c)).as(c))
.reduce(concat(_,_).asc)
val winSpec = dense_rank()
.over(Window.orderBy(expr))
Output
scala> df.withColumn("id",winSpec).show(false)
+-------+--------+---------+---------+--------+-----------------+---+
|address|lastname|firstname|patientid|policyno|visitid |id |
+-------+--------+---------+---------+--------+-----------------+---+
|addr1 |Pits |Rodney |patid1 |policy1 |visituid_rodney_1|1 |
|addr1 |Pits |Rodney |patid1 |policy1 |visituid_rodney_2|1 |
|addr1 |Pits |Rodney |patid1 |policy1 |visituid_rodney_3|1 |
|addr4 |AN |OTHER |patid4 |policy4 |visituid_OP_1 |2 |
|addr2 |Dobs |OP |patid3 |policy3 |visituid_OP_1 |3 |
|addr2 |Dobs |Aliya |patid2 |policy2 |visituid_aliya_1 |4 |
|addr2 |Dobs |Aliya |patid2 |policy2 |visituid_aliya_2 |4 |
|addr2 |ANO |THER |patid5 |policy5 |visituid_OP_1 |5 |
+-------+--------+---------+---------+--------+-----------------+---+
PostgreSQL -Assign Unique ID to each groups of records, based on a condition
We can use LAG
here along with SUM
as an analytic function:
WITH cte AS (
SELECT *, CASE WHEN TELEGRAMMZAEHLER =
LAG(TELEGRAMMZAEHLER) OVER (ORDER BY TMSTP DESC) - 1
THEN 0 ELSE 1 END AS idx
FROM yourTable
)
SELECT ID, TMSTP, CAT_TYPE, TELEGRAMMZAEHLER,
SUM(idx) OVER (ORDER BY TMSTP DESC) AS GRUPPE
FROM cte
ORDER BY TMSTP DESC;
Demo
Assign unique id each time two fields/columns are an exact match
Using DENSE_RANK()
should do the trick:
SELECT *,
DENSE_RANK() OVER(ORDER by person_id, datetime) as unique_combination_id
FROM tbl;
DEMO
How to generate a unique ID based on duplicate values in either of two columns?
Further to my comment above, to reproduce the same ID
order you need to ensure the proper factor
level ordering
library(dplyr)
df %>% mutate(ID = as.integer(factor(Name, levels = unique(Name))))
# E.mail Phone Name ID
#1 mortena 3076 morten 1
#2 kaspera 2688 kasper 2
#3 christoffera 1212 christoffer 3
#4 mortenb 3076 morten 1
#5 mortena 3075 morten 1
#6 kasperb 2688 kasper 2
#7 christoffera 1213 christoffer 3
or with group_indices
df %>% mutate(ID = group_indices(., factor(Name, levels = unique(Name))))
If the order of ID
s doesn't matter you can do
df %>% mutate(ID = as.integer(as.factor(Name)))
or
df %>% mutate(ID = group_indices(., Name))
In these cases, the order of ID
is determined by the alphabetical order of unique Name
s (rather than by their occurrence).
Sample data
df <- read.table(text = "E-mail Phone Name
mortena 3076 morten
kaspera 2688 kasper
christoffera 1212 christoffer
mortenb 3076 morten
mortena 3075 morten
kasperb 2688 kasper
christoffera 1213 christoffer", header = T)
Assign Unique Group Id To Sets of Rows with Same Column Value Separated by Other value
Try this:
with t as
(
select
uid,
radius,
lag(radius,1) over (order by uid) as prev_rad
from
radtable
)
select
uid,
radius,
sum
(
case when radius = coalesce(prev_rad,radius) then 0 else 1 end
)
over
(
order by uid
) + 1 as GroupID
from
t
Related Topics
Getting Warning: " 'Newdata' Had 1 Row But Variables Found Have 32 Rows" on Predict.Lm
Standard Evaluation in Dplyr: Summarise a Variable Given as a Character String
Labeling Outliers of Boxplots in R
How to Quickly Form Groups (Quartiles, Deciles, etc) by Ordering Column(S) in a Data Frame
What Ways Are There to Edit a Function in R
Dplyr: Inner_Join With a Partial String Match
Rcpp Pass by Reference Vs. by Value
Posix Character Class Does Not Work in Base R Regex
Plotting Smooth Line Through All Data Points
How to Center Stacked Percent Barchart Labels
Why Do R Objects Not Print in a Function or a "For" Loop
Limit Ggplot2 Axes Without Removing Data (Outside Limits): Zoom
How to Count Runs in a Sequence
How to Create an R Function Programmatically
How to Efficiently Calculate Distance Between Pair of Coordinates Using Data.Table :=
Cumulatively Paste (Concatenate) Values Grouped by Another Variable
How to Omit Na Values While Pasting Numerous Column Values Together