Add Id Column by Group

Here's one way, using interaction() to combine the two columns into a single factor:

d <- read.table(text='LAT LONG
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4170
13.5330 -15.4170
13.5330 -15.4170
13.5340 -14.9350
13.5340 -14.9350
13.5340 -15.9170
13.3670 -14.6190', header=TRUE)

d <- transform(d, Cluster_ID = as.numeric(interaction(LAT, LONG, drop=TRUE)))

#       LAT    LONG Cluster_ID
# 1  13.533 -15.418          2
# 2  13.533 -15.418          2
# 3  13.533 -15.418          2
# 4  13.533 -15.418          2
# 5  13.533 -15.417          3
# 6  13.533 -15.417          3
# 7  13.533 -15.417          3
# 8  13.534 -14.935          4
# 9  13.534 -14.935          4
# 10 13.534 -15.917          1
# 11 13.367 -14.619          5

EDIT: Incorporated @Spacedman's suggestion to supply drop=TRUE to interaction.

R - Group by variable and then assign a unique ID

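dplyr has a group_indices function for creating unique group IDs. Since dplyr 1.0.0, passing the grouping columns directly to group_indices() is deprecated; call group_by() first, or use cur_group_id() inside a grouped mutate().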

library(dplyr)
data <- data.frame(personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
                   gender = c("M", "F", "M", "M"),
                   temperature = c(99.6, 98.2, 97.8, 95.5))

data$group_id <- data %>% group_indices(personal_id)
data <- data %>% select(-personal_id)

data
  gender temperature group_id
1      M        99.6        1
2      F        98.2        3
3      M        97.8        2
4      M        95.5        1

Or within the same pipeline (https://github.com/tidyverse/dplyr/issues/2160):

data %>%
  mutate(group_id = group_indices(., personal_id))
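
For comparison, here is a minimal pandas sketch of the same idea: number the groups, then drop the identifying column. It is only an analogue of the dplyr code, not part of the original answer. It relies on groupby().ngroup() numbering groups from 0 in sorted key order, so 1 is added to match the R output.

```python
import pandas as pd

# Same sample data as the R example above.
data = pd.DataFrame({
    "personal_id": ["111-111-111", "999-999-999", "222-222-222", "111-111-111"],
    "gender": ["M", "F", "M", "M"],
    "temperature": [99.6, 98.2, 97.8, 95.5],
})

# ngroup() labels each row with the 0-based number of its group
# (groups are sorted by key by default); +1 makes the IDs 1-based.
data["group_id"] = data.groupby("personal_id").ngroup() + 1

# Drop the identifying column once the surrogate ID exists.
data = data.drop(columns="personal_id")
print(data)
```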

Pandas groupby and create a unique ID column for every row

If you need a running count of the rows within each group, use cumcount:

df['new'] = df.groupby('fruit').cumcount()
df
Out[346]:
   fruit  count  new
0  apple      1    0
1  apple     20    1
2  apple     21    2
3  mango     31    0
4  mango     17    1
4 mango 17 1

Or:

df['new'] = df.assign(new=1).groupby('fruit')['new'].cumsum()-1
df
Out[352]:
   fruit  count  new
0  apple      1    0
1  apple     20    1
2  apple     21    2
3  mango     31    0
4  mango     17    1
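
The snippets above assume df already exists. A self-contained sketch follows, with sample data reconstructed from the printed output. The row_id column is an illustrative addition (not in the original answer) showing one way to turn the per-group counter into a label that is unique for every row.

```python
import pandas as pd

# Sample data reconstructed from the output shown above.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "apple", "mango", "mango"],
    "count": [1, 20, 21, 31, 17],
})

# cumcount() numbers the rows inside each group, starting at 0.
df["new"] = df.groupby("fruit").cumcount()

# Hypothetical extra column: group key plus a 1-based counter gives
# a label that is unique across the whole frame.
df["row_id"] = df["fruit"] + "_" + (df["new"] + 1).astype(str)
print(df)
```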

Pandas: How to add a new column of Group ID to a dataframe based on the loop of a column of index sequence

As you mentioned:

represent Group ID based on the loop of Index

If you mean that a new group starts whenever Index resets to 0, you can try:

df['Group ID'] = df['Index'].eq(0).cumsum()

Or, if you mean that a new group starts whenever Index stops increasing and drops to a value smaller than the previous one, you can use:

df['Group ID'] = df['Index'].diff().lt(0).cumsum() + 1

Result:

print(df)

   Index  Column1 Column2  Group ID
0      0  xxxxxxx    yyyy         1
1      1   xxxxxx     yyy         1
2      2    xxxxx      yy         1
3      0      xxx       y         2
4      1       xx   yyyyy         2
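
The two expressions agree on the data above but not in general. In this small sketch the Index values are made up for illustration: the sequence drops from 2 to 1 without passing through 0, so only the second rule starts a new group there.

```python
import pandas as pd

# Hypothetical Index column: one reset to 0, then a drop to 1.
df = pd.DataFrame({"Index": [0, 1, 2, 0, 1, 2, 1, 2]})

# Rule 1: a new group starts only where Index equals 0.
df["by_zero"] = df["Index"].eq(0).cumsum()

# Rule 2: a new group starts wherever Index is smaller than the
# previous row (diff() is NaN on the first row, which counts as False).
df["by_drop"] = df["Index"].diff().lt(0).cumsum() + 1
print(df)
```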

Create a new column with unique identifier for each group

Try groupby with ngroup() + 1. Pass sort=False so that the groups are numbered in the order they appear in the DataFrame:

df['idx'] = df.groupby(['ID', 'phase'], sort=False).ngroup() + 1

df:

   ID phase side  values  idx
0  r1   ph1    l      12    1
1  r1   ph1    r      34    1
2  r1   ph2    l      93    2
3  s4   ph3    l      21    3
4  s3   ph2    l      88    4
5  s3   ph2    r      54    4
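
To see what sort=False changes, here is a runnable sketch with the data reconstructed from the table above. The idx_sorted column is added purely for comparison and is not part of the original answer.

```python
import pandas as pd

# Data reconstructed from the output shown above.
df = pd.DataFrame({
    "ID": ["r1", "r1", "r1", "s4", "s3", "s3"],
    "phase": ["ph1", "ph1", "ph2", "ph3", "ph2", "ph2"],
    "side": ["l", "r", "l", "l", "l", "r"],
    "values": [12, 34, 93, 21, 88, 54],
})

# sort=False: groups are numbered in order of first appearance.
df["idx"] = df.groupby(["ID", "phase"], sort=False).ngroup() + 1

# Default (sort=True): groups are numbered in sorted key order,
# so ("s3", "ph2") comes before ("s4", "ph3").
df["idx_sorted"] = df.groupby(["ID", "phase"]).ngroup() + 1
print(df)
```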

Group dataframe rows by creating a unique ID column based on the amount of time passed between entries and variable values

Here's a dplyr approach that calculates the gap and rolling avg gap within each Name/Item group, then flags large gaps, and assigns a new group for each large gap or change in Name or Item.

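library(dplyr)

df1 %>%
  group_by(Name, Item) %>%
  mutate(purch_num = row_number(),
         time_since_first = Date - first(Date),
         # gap to the previous purchase; infinite for the first row of a group
         gap = Date - lag(Date, default = as.Date(-Inf)),
         avg_gap = time_since_first / (purch_num - 1),
         # flag a new group after a long gap, in absolute or relative terms
         new_grp_flag = gap > 180 | gap > 3 * avg_gap) %>%
  ungroup() %>%
  mutate(group = cumsum(new_grp_flag))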
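
The same flag-then-cumsum pattern can be sketched in pandas. This is a simplified illustration, not a translation of the answer: the purchase data is invented, and only the fixed 180-day rule is applied (the running-average rule is left out).

```python
import pandas as pd

# Hypothetical purchases; column names mirror the dplyr code above.
df1 = pd.DataFrame({
    "Name": ["A", "A", "A", "A", "B", "B"],
    "Item": ["x", "x", "x", "x", "x", "x"],
    "Date": pd.to_datetime([
        "2020-01-01", "2020-01-10", "2020-01-20", "2021-01-01",
        "2020-03-01", "2020-03-05",
    ]),
}).sort_values(["Name", "Item", "Date"], ignore_index=True)

# Days since the previous purchase within the same Name/Item group.
# The first row of each group has no predecessor, so its gap is NaN.
gap = df1.groupby(["Name", "Item"])["Date"].diff().dt.days

# Assumption: a new group starts at the first row of each Name/Item
# group or after a gap of more than 180 days.
new_grp_flag = gap.isna() | (gap > 180)

# The running total of flags is the group ID.
df1["group"] = new_grp_flag.cumsum()
print(df1)
```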

How to assign a unique ID number to each group of identical values in a column

How about this?

df2 <- transform(df, id = as.numeric(factor(sample)))

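I think this (cribbed from Add ID column by group) should be slightly more efficient, although perhaps a little harder to remember. Note that factor() numbers the groups in sorted order while match() numbers them in order of first appearance, so the two results agree only when the groups first appear in sorted order: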

df3 <- transform(df, id=match(sample, unique(sample)))
all.equal(df2,df3) ## TRUE

If you want to do this in tidyverse:

library(dplyr)
df %>% group_by(sample) %>% mutate(id=cur_group_id())
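
The difference between sorted-order IDs and first-appearance IDs is easy to see in pandas, where pd.factorize offers both behaviours. The sketch below is an analogue rather than part of the R answers, and the sample values are invented and deliberately unsorted.

```python
import pandas as pd

# Hypothetical, deliberately unsorted values.
sample = pd.Series(["b", "a", "b", "c", "a"])

# Codes in order of first appearance, like match(sample, unique(sample)).
first_seen_ids = pd.factorize(sample)[0] + 1

# Codes in sorted order of the values, like as.numeric(factor(sample)).
sorted_ids = pd.factorize(sample, sort=True)[0] + 1

print(first_seen_ids.tolist())  # [1, 2, 1, 3, 2]
print(sorted_ids.tolist())      # [2, 1, 2, 3, 1]
```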

Assign unique ID to Pandas group but add one if repeated

You can use .groupby() followed by ngroup():

df["id"] = df.groupby((df["fruit"] != df["fruit"].shift(1)).cumsum()).ngroup()
print(df)

Prints:

    fruit  id
0   apple   0
1   apple   0
2  orange   1
3  orange   1
4   lemon   2
5   apple   3
6   apple   3
7   lemon   4
8   lemon   4

Or if you prefer itertools.groupby:

from itertools import groupby

data, i = [], 0
for _, g in groupby(df["fruit"]):
    data.extend([i] * sum(1 for _ in g))
    i += 1

df["id"] = data
print(df)
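
Since the key built with shift() and cumsum() already goes up by one at every change of value, it can also serve as the ID directly, without the extra groupby. A minimal sketch, with the fruit column reconstructed from the printed output:

```python
import pandas as pd

# Data reconstructed from the output shown above.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "orange", "orange", "lemon",
              "apple", "apple", "lemon", "lemon"],
})

# True on every row whose value differs from the previous row
# (the first row compares against a missing value, so it is True).
starts_new_run = df["fruit"] != df["fruit"].shift()

# Running total of run starts, shifted to be 0-based like ngroup().
df["id"] = starts_new_run.cumsum() - 1
print(df)
```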

SQL Server : Identity column by group

You could create a function that takes a name and returns the next NameId for it (the current maximum plus one):

CREATE FUNCTION [dbo].[GetIdentityForName] (@Name VARCHAR(MAX))
RETURNS INT
AS
BEGIN
    RETURN
        (SELECT ISNULL(MAX(NameId), 0) + 1
         FROM YourTable
         WHERE Name = @Name);
END

Then set it as the default value for NameId, so that the function is called whenever a record is inserted:

ALTER TABLE YourTable ADD CONSTRAINT
    DF_Identity_NameId DEFAULT ([dbo].[GetIdentityForName](Name)) FOR NameId

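This assumes that YourTable has the columns (Id, Name, NameId). Note that SQL Server normally rejects column references such as Name inside a DEFAULT constraint, so check that this runs on your version; a trigger is one alternative.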

I hope this helps :)
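
If the per-name number does not have to be stored, it can be computed at query time with a window function instead. The sketch below swaps in a different technique (ROW_NUMBER() OVER (PARTITION BY ...)) and uses Python's built-in sqlite3 module as a stand-in for SQL Server, so treat it as an illustration of the idea only. The table and column names follow the answer, the rows are invented, and it needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Id INTEGER PRIMARY KEY, Name TEXT)")

# Hypothetical rows; Id is assigned automatically as 1..5.
conn.executemany(
    "INSERT INTO YourTable (Name) VALUES (?)",
    [("ann",), ("bob",), ("ann",), ("ann",), ("bob",)],
)

# Number the rows within each Name, in insertion (Id) order.
rows = conn.execute(
    """
    SELECT Id, Name,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Id) AS NameId
    FROM YourTable
    ORDER BY Id
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```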


