Add ID column by group
Here's one way, using interaction():
d <- read.table(text='LAT LONG
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4170
13.5330 -15.4170
13.5330 -15.4170
13.5340 -14.9350
13.5340 -14.9350
13.5340 -15.9170
13.3670 -14.6190', header=TRUE)
d <- transform(d, Cluster_ID = as.numeric(interaction(LAT, LONG, drop=TRUE)))
# LAT LONG Cluster_ID
# 1 13.533 -15.418 2
# 2 13.533 -15.418 2
# 3 13.533 -15.418 2
# 4 13.533 -15.418 2
# 5 13.533 -15.417 3
# 6 13.533 -15.417 3
# 7 13.533 -15.417 3
# 8 13.534 -14.935 4
# 9 13.534 -14.935 4
# 10 13.534 -15.917 1
# 11 13.367 -14.619 5
EDIT: Incorporated @Spacedman's suggestion to supply drop=TRUE to interaction.
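For anyone wanting the same thing in pandas, a minimal sketch of the equivalent (the frame d here mirrors part of the R data above; ngroup() numbers each unique LAT/LONG pair, much like interaction(..., drop=TRUE)):

```python
import pandas as pd

# Data mirroring (part of) the R example above
d = pd.DataFrame({
    "LAT":  [13.533, 13.533, 13.533, 13.534, 13.534],
    "LONG": [-15.418, -15.417, -15.417, -14.935, -15.917],
})

# ngroup() assigns one integer per unique (LAT, LONG) pair;
# by default, groups are numbered in sorted key order
d["Cluster_ID"] = d.groupby(["LAT", "LONG"]).ngroup() + 1
print(d)
```

Like the R version, the IDs follow the sorted order of the key pairs, not the order the rows appear.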
R - Group by variable and then assign a unique ID
dplyr has a group_indices function for creating unique group IDs (in recent dplyr versions it is superseded by cur_group_id()):
library(dplyr)
data <- data.frame(personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
gender = c("M", "F", "M", "M"),
temperature = c(99.6, 98.2, 97.8, 95.5))
data$group_id <- data %>% group_indices(personal_id)
data <- data %>% select(-personal_id)
data
gender temperature group_id
1 M 99.6 1
2 F 98.2 3
3 M 97.8 2
4 M 95.5 1
Or within the same pipeline (https://github.com/tidyverse/dplyr/issues/2160):
data %>%
mutate(group_id = group_indices(., personal_id))
Pandas groupby and create a unique ID column for every row
If you need a per-row count within each group, use cumcount():
df['new'] = df.groupby('fruit').cumcount()
df
Out[346]:
fruit count new
0 apple 1 0
1 apple 20 1
2 apple 21 2
3 mango 31 0
4 mango 17 1
Or:
df['new'] = df.assign(new=1).groupby('fruit')['new'].cumsum()-1
df
Out[352]:
fruit count new
0 apple 1 0
1 apple 20 1
2 apple 21 2
3 mango 31 0
4 mango 17 1
Pandas: How to add a new column of Group ID to a dataframe based on the loop of a column of index sequence
As you mentioned, "represent Group ID based on the loop of Index". If you mean that a new group starts whenever Index resets to 0, you can try:
df['Group ID'] = df['Index'].eq(0).cumsum()
Or, if you mean a new group starts whenever Index stops increasing and restarts from a value smaller than the previous one, you can use:
df['Group ID'] = df['Index'].diff().lt(0).cumsum() + 1
Result:
print(df)
Index Column1 Column2 Group ID
0 0 xxxxxxx yyyy 1
1 1 xxxxxx yyy 1
2 2 xxxxx yy 1
3 0 xxx y 2
4 1 xx yyyyy 2
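Both variants can be sanity-checked on a tiny frame (column name Index as in the question); on data that always restarts at 0 the two definitions agree:

```python
import pandas as pd

df = pd.DataFrame({"Index": [0, 1, 2, 0, 1]})

# new group whenever Index equals 0
grp_eq0 = df["Index"].eq(0).cumsum()

# new group whenever Index decreases (diff < 0); +1 so numbering starts at 1
grp_dec = df["Index"].diff().lt(0).cumsum() + 1

print(grp_eq0.tolist(), grp_dec.tolist())
```

They diverge when a new run starts at a nonzero value (e.g. 0, 1, 2, 1, 2): eq(0) sees one group there, while diff().lt(0) starts a new one.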
Create a new column with unique identifier for each group
Try groupby with ngroup() + 1; use sort=False to ensure groups are enumerated in the order they appear in the DataFrame:
df['idx'] = df.groupby(['ID', 'phase'], sort=False).ngroup() + 1
df:
ID phase side values idx
0 r1 ph1 l 12 1
1 r1 ph1 r 34 1
2 r1 ph2 l 93 2
3 s4 ph3 l 21 3
4 s3 ph2 l 88 4
5 s3 ph2 r 54 4
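To see what sort=False changes, compare it with the default on the same data (reconstructed from the output above): without sort=False, ngroup() numbers the groups by sorted (ID, phase) keys rather than first appearance:

```python
import pandas as pd

df = pd.DataFrame({
    "ID":    ["r1", "r1", "r1", "s4", "s3", "s3"],
    "phase": ["ph1", "ph1", "ph2", "ph3", "ph2", "ph2"],
})

# first-appearance order, as in the answer above
df["idx"] = df.groupby(["ID", "phase"], sort=False).ngroup() + 1

# default sort=True: groups numbered by sorted (ID, phase) keys
df["idx_sorted"] = df.groupby(["ID", "phase"]).ngroup() + 1
print(df)
```

With the default, s3 sorts before s4, so the last two rows get a lower number than the s4 row even though s4 appears first.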
Group dataframe rows by creating a unique ID column based on the amount of time passed between entries and variable values
Here's a dplyr approach that calculates the gap and the running average gap within each Name/Item group, flags large gaps, and starts a new group at each large gap or change of Name or Item.
df1 %>%
group_by(Name,Item) %>%
mutate(purch_num = row_number(),
time_since_first = Date - first(Date),
gap = Date - lag(Date, default = as.Date(-Inf, origin = "1970-01-01")), # first row gets an infinite gap
avg_gap = time_since_first / (purch_num-1),
new_grp_flag = gap > 180 | gap > 3*avg_gap) %>%
ungroup() %>%
mutate(group = cumsum(new_grp_flag))
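A rough pandas translation of the same idea, for comparison (the column names Name, Item, Date and the 180-day / 3x-average thresholds are taken from the dplyr code above; treat this as a sketch, not a drop-in port):

```python
import pandas as pd

def assign_groups(df):
    """Start a new group at each large purchase gap or Name/Item change."""
    df = df.sort_values(["Name", "Item", "Date"]).copy()
    g = df.groupby(["Name", "Item"])
    df["purch_num"] = g.cumcount() + 1
    df["gap"] = g["Date"].diff().dt.days
    df["time_since_first"] = (df["Date"] - g["Date"].transform("first")).dt.days
    # running average gap; NaN for the first purchase (0 / 0)
    df["avg_gap"] = df["time_since_first"] / (df["purch_num"] - 1)
    # the first row of each Name/Item pair (gap is NaN) always starts a group
    df["new_grp_flag"] = (
        df["gap"].isna() | (df["gap"] > 180) | (df["gap"] > 3 * df["avg_gap"])
    )
    df["group"] = df["new_grp_flag"].cumsum()
    return df

# Hypothetical data: one big gap for A, then a new Name
df1 = pd.DataFrame({
    "Name": ["A", "A", "A", "B"],
    "Item": ["x", "x", "x", "x"],
    "Date": pd.to_datetime(["2020-01-01", "2020-01-10",
                            "2020-12-01", "2020-01-01"]),
})
out = assign_groups(df1)
print(out[["Name", "Item", "Date", "gap", "group"]])
```

The 10-month gap and the switch from A to B each trigger a new group, while the 9-day gap does not.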
How to assign a unique ID number to each group of identical values in a column
How about this?
df2 <- transform(df, id=as.numeric(factor(sample)))
I think this (cribbed from Add ID column by group) should be slightly more efficient, although perhaps a little harder to remember:
df3 <- transform(df, id=match(sample, unique(sample)))
all.equal(df2,df3) ## TRUE
If you want to do this in tidyverse:
library(dplyr)
df %>% group_by(sample) %>% mutate(id=cur_group_id())
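Both R idioms have close pandas analogues, in case you need the same thing in Python — a sketch: astype("category").cat.codes matches as.numeric(factor(...)) (IDs follow sorted label order), while pd.factorize matches match(sample, unique(sample)) (IDs follow first appearance):

```python
import pandas as pd

sample = pd.Series(["b", "a", "b", "c", "a"])

# like as.numeric(factor(sample)): IDs follow the sorted labels (a=1, b=2, c=3)
id_sorted = sample.astype("category").cat.codes + 1

# like match(sample, unique(sample)): IDs follow first appearance (b=1, a=2, c=3)
id_first = pd.factorize(sample)[0] + 1
```

As in R, the two only differ in how the groups are numbered, not in which rows share an ID.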
Assign unique ID to Pandas group but add one if repeated
You can use .groupby() followed by ngroup():
df["id"] = df.groupby((df["fruit"] != df["fruit"].shift(1)).cumsum()).ngroup()
print(df)
Prints:
fruit id
0 apple 0
1 apple 0
2 orange 1
3 orange 1
4 lemon 2
5 apple 3
6 apple 3
7 lemon 4
8 lemon 4
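The key being grouped on, (df["fruit"] != df["fruit"].shift(1)).cumsum(), increments at every change of value, so separate runs of the same fruit get distinct IDs. A self-contained check against a plain groupby (data reconstructed from the output above):

```python
import pandas as pd

df = pd.DataFrame({"fruit": ["apple", "apple", "orange", "orange",
                             "lemon", "apple", "apple", "lemon", "lemon"]})

# run key: increments whenever the value differs from the previous row
key = (df["fruit"] != df["fruit"].shift(1)).cumsum()
df["id"] = df.groupby(key).ngroup()

# a plain groupby on fruit would merge the two apple runs into one ID
df["plain"] = df.groupby("fruit", sort=False).ngroup()
print(df)
```

The "plain" column gives the second apple run the same ID as the first, which is exactly what the shift/cumsum key avoids.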
Or, if you prefer itertools.groupby:
from itertools import groupby

data, i = [], 0
for _, g in groupby(df["fruit"]):
    data.extend([i] * sum(1 for _ in g))
    i += 1
df["id"] = data
print(df)
SQL Server: Identity column by group
You could create a FUNCTION that takes a name and returns the next identity value (MAX(NameId) + 1) for that name:
CREATE FUNCTION [dbo].[GetIdentityForName] (@Name VARCHAR(MAX))
RETURNS INT
AS
BEGIN
RETURN
(SELECT ISNULL(MAX(NameId),0)+1
FROM YourTable
WHERE Name = @Name);
END
and then add a default constraint on NameId so the function is called whenever a record is inserted:
ALTER TABLE YourTable ADD CONSTRAINT
DF_Identity_NameId DEFAULT ([dbo].[GetIdentityForName](Name)) FOR NameId
This assumes that YourTable is (Id, Name, NameId).
I hope this helps :)