Add Id Column by Group

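Here's one way, using interaction() to turn each distinct LAT/LONG pair into a factor level and as.numeric() to convert that level to an ID.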

d <- read.table(text='LAT LONG
13.5330 -15.4180 
13.5330 -15.4180 
13.5330 -15.4180 
13.5330 -15.4180 
13.5330 -15.4170 
13.5330 -15.4170 
13.5330 -15.4170 
13.5340 -14.9350 
13.5340 -14.9350 
13.5340 -15.9170 
13.3670 -14.6190', header=TRUE)

d <- transform(d, Cluster_ID = as.numeric(interaction(LAT, LONG, drop=TRUE)))

#       LAT    LONG Cluster_ID
# 1  13.533 -15.418          2
# 2  13.533 -15.418          2
# 3  13.533 -15.418          2
# 4  13.533 -15.418          2
# 5  13.533 -15.417          3
# 6  13.533 -15.417          3
# 7  13.533 -15.417          3
# 8  13.534 -14.935          4
# 9  13.534 -14.935          4
# 10 13.534 -15.917          1
# 11 13.367 -14.619          5

EDIT: Incorporated @Spacedman's suggestion to supply drop=TRUE to interaction, which drops unused LAT/LONG combinations so that the IDs run consecutively from 1.

R - Group by variable and then assign a unique ID

dplyr has a group_indices() function for creating unique group IDs. As of dplyr 1.0.0, passing the grouping columns directly to group_indices() is deprecated, so group the data first:

library(dplyr)
data <- data.frame(personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
                   gender = c("M", "F", "M", "M"),
                   temperature = c(99.6, 98.2, 97.8, 95.5))

data$group_id <- data %>% group_by(personal_id) %>% group_indices()
data <- data %>% select(-personal_id)

data
  gender temperature group_id
1      M        99.6        1
2      F        98.2        3
3      M        97.8        2
4      M        95.5        1

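Or within the same pipeline. Older dplyr versions used mutate(group_id = group_indices(., personal_id)) for this (https://github.com/tidyverse/dplyr/issues/2160); from dplyr 1.0.0 on, cur_group_id() is the replacement:

data %>%
    group_by(personal_id) %>%
    mutate(group_id = cur_group_id()) %>%
    ungroup()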

Pandas groupby and create a unique ID column for every row

If you need a running count of the rows within each group, pandas has cumcount:

df['new'] = df.groupby('fruit').cumcount()
df
Out[346]: 
   fruit  count  new
0  apple      1    0
1  apple     20    1
2  apple     21    2
3  mango     31    0
4  mango     17    1

Or:

df['new'] = df.assign(new=1).groupby('fruit')['new'].cumsum()-1
df
Out[352]: 
   fruit  count  new
0  apple      1    0
1  apple     20    1
2  apple     21    2
3  mango     31    0
4  mango     17    1
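
The two snippets above assume an existing df. As a minimal self-contained sketch, with the sample data reconstructed from the printed output, both approaches can be run side by side to confirm they number the rows identically:

```python
import pandas as pd

# Sample data reconstructed from the output shown above.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "apple", "mango", "mango"],
    "count": [1, 20, 21, 31, 17],
})

# Zero-based position of each row within its fruit group.
by_cumcount = df.groupby("fruit").cumcount()

# Same numbering via a running sum of ones, shifted to start at 0.
by_cumsum = df.assign(new=1).groupby("fruit")["new"].cumsum() - 1

df["new"] = by_cumcount
print(df)
```

Note that this numbers rows within a group; it does not give each group its own ID.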

Pandas: How to add a new column of Group ID to a dataframe based on the loop of a column of index sequence

You asked to "represent Group ID based on the loop of Index".

If you mean that a new group is formed whenever Index resets to 0, you can try:

df['Group ID'] = df['Index'].eq(0).cumsum()

Or, if you mean that a new group is formed whenever Index stops increasing and takes a value smaller than the previous one, you can use:

df['Group ID'] = df['Index'].diff().lt(0).cumsum() + 1

Result:

print(df)

   Index  Column1 Column2  Group ID
0      0  xxxxxxx    yyyy         1
1      1   xxxxxx     yyy         1
2      2    xxxxx      yy         1
3      0      xxx       y         2
4      1       xx   yyyyy         2
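
The two rules give the same result on the data above, but they can differ. Here is a small sketch on hypothetical Index values (not from the question) where the second run restarts at 1 rather than 0:

```python
import pandas as pd

# Hypothetical Index values: the second run restarts at 1, not 0.
df = pd.DataFrame({"Index": [0, 1, 2, 1, 2, 0, 1]})

# Rule 1: a new group starts at every row where Index equals 0.
df["by_zero"] = df["Index"].eq(0).cumsum()

# Rule 2: a new group starts whenever Index drops below the previous value.
# The first diff is NaN, and NaN < 0 is False, so the first row stays in group 1.
df["by_drop"] = df["Index"].diff().lt(0).cumsum() + 1

print(df)
```

Here by_zero sees only two groups, while by_drop sees three, so the choice depends on whether every run is guaranteed to start at 0.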

Create a new column with unique identifier for each group

Use groupby with ngroup() and add 1 so that the IDs start at 1. Pass sort=False so that groups are numbered in the order they first appear in the DataFrame:

df['idx'] = df.groupby(['ID', 'phase'], sort=False).ngroup() + 1

df:

   ID phase side  values  idx
0  r1   ph1    l      12    1
1  r1   ph1    r      34    1
2  r1   ph2    l      93    2
3  s4   ph3    l      21    3
4  s3   ph2    l      88    4
5  s3   ph2    r      54    4
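
To see what sort=False changes, the following sketch rebuilds the frame from the output above and compares it with the default sorted numbering. The column name idx_sorted is made up for the comparison:

```python
import pandas as pd

# Data reconstructed from the output shown above.
df = pd.DataFrame({
    "ID": ["r1", "r1", "r1", "s4", "s3", "s3"],
    "phase": ["ph1", "ph1", "ph2", "ph3", "ph2", "ph2"],
    "side": ["l", "r", "l", "l", "l", "r"],
    "values": [12, 34, 93, 21, 88, 54],
})

keys = ["ID", "phase"]

# Numbered in order of first appearance.
df["idx"] = df.groupby(keys, sort=False).ngroup() + 1

# Numbered in sorted key order (the groupby default).
df["idx_sorted"] = df.groupby(keys).ngroup() + 1

print(df)
```

With the default, s3 sorts before s4, so the last two groups swap their numbers.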

Group dataframe rows by creating a unique ID column based on the amount of time passed between entries and variable values

Here's a dplyr approach that calculates the gap and the running average gap within each Name/Item group, flags large gaps, and starts a new group at each large gap or change in Name or Item. The first row of each Name/Item group gets an infinite gap, so it is always flagged.

library(dplyr)

df1 %>%
  group_by(Name, Item) %>%
  mutate(purch_num = row_number(),
         time_since_first = Date - first(Date),
         # origin is supplied so that as.Date() also accepts a number in R < 4.3
         gap = Date - lag(Date, default = as.Date(-Inf, origin = "1970-01-01")),
         avg_gap = time_since_first / (purch_num - 1),
         new_grp_flag = gap > 180 | gap > 3 * avg_gap) %>%
  ungroup() %>%
  mutate(group = cumsum(new_grp_flag))
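
For readers working in pandas, the same flag-then-cumsum idea can be sketched as follows. This is a simplified version that keeps only the fixed 180-day threshold and leaves out the 3 * avg_gap rule; the data is hypothetical, and only the column names Name, Item, and Date are taken from the code above:

```python
import pandas as pd

# Hypothetical purchase log; column names follow the dplyr code above.
df1 = pd.DataFrame({
    "Name": ["Ann", "Ann", "Ann", "Ann", "Bob"],
    "Item": ["pen", "pen", "pen", "pen", "pen"],
    "Date": pd.to_datetime(
        ["2020-01-01", "2020-02-01", "2021-01-01", "2021-01-15", "2020-01-01"]
    ),
})
df1 = df1.sort_values(["Name", "Item", "Date"]).reset_index(drop=True)

# Days since the previous purchase of the same Name/Item (NaN for the first).
gap = df1.groupby(["Name", "Item"])["Date"].diff().dt.days

# A new group starts at the first row of each Name/Item or after a long gap.
new_grp_flag = gap.isna() | (gap > 180)
df1["group"] = new_grp_flag.cumsum()

print(df1)
```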

How to assign a unique ID number to each group of identical values in a column

How about

df2 <- transform(df,id=as.numeric(factor(sample)))

I think this (cribbed from Add ID column by group) should be slightly more efficient, although perhaps a little harder to remember:

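df3 <- transform(df, id=match(sample, unique(sample)))
all.equal(df2,df3)  ## TRUE, provided the values of sample first appear in sorted order:
                    ## factor() numbers groups by sorted level, match() by first appearance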

If you want to do this in tidyverse:

library(dplyr)
df %>% group_by(sample) %>% mutate(id=cur_group_id())
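
The difference between numbering by sorted value and numbering by first appearance is easy to check. As a rough pandas analogue (not part of the original answer), pd.factorize can produce both orderings; the sample labels below are hypothetical and deliberately unsorted:

```python
import pandas as pd

# Hypothetical sample labels that do not first appear in sorted order.
sample = pd.Series(["b", "a", "b", "c", "a"])

# Like as.numeric(factor(sample)): IDs follow the sorted unique values.
codes_sorted, _ = pd.factorize(sample, sort=True)

# Like match(sample, unique(sample)): IDs follow order of first appearance.
codes_first, _ = pd.factorize(sample)

# factorize codes are zero-based; add 1 to match R's one-based IDs.
id_sorted = (codes_sorted + 1).tolist()
id_first = (codes_first + 1).tolist()

print(id_sorted)
print(id_first)
```

Both results are valid group IDs; they only assign different numbers to the same groups.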

Assign unique ID to Pandas group but add one if repeated

You can group on a key that increases every time fruit differs from the previous row, then call ngroup() to number the groups from 0:

df["id"] = df.groupby((df["fruit"] != df["fruit"].shift(1)).cumsum()).ngroup()
print(df)

Prints:

    fruit  id
0   apple   0
1   apple   0
2  orange   1
3  orange   1
4   lemon   2
5   apple   3
6   apple   3
7   lemon   4
8   lemon   4

Or if you prefer itertools.groupby:

from itertools import groupby

data, i = [], 0
for _, g in groupby(df["fruit"]):
    data.extend([i] * sum(1 for _ in g))
    i += 1

df["id"] = data
print(df)
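
Both versions assume an existing df. The sketch below rebuilds the data from the printed output and checks that they agree. It also relies on the observation that the cumulative-sum key is already a run number starting at 1, so subtracting 1 gives the same IDs as ngroup() here:

```python
from itertools import groupby

import pandas as pd

# Data reconstructed from the output shown above.
df = pd.DataFrame({"fruit": ["apple", "apple", "orange", "orange", "lemon",
                             "apple", "apple", "lemon", "lemon"]})

# True wherever the value differs from the row above; cumsum numbers the runs.
run_key = (df["fruit"] != df["fruit"].shift()).cumsum()
ids_pandas = (run_key - 1).tolist()

# One ID per consecutive run, repeated once for each element in the run.
ids_itertools = [i for i, (_, run) in enumerate(groupby(df["fruit"]))
                 for _ in run]

print(ids_pandas)
print(ids_itertools)
```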

SQL Server : Identity column by group

You could create a function that takes a name and returns the next NameId for it, that is, the current maximum plus one:

CREATE FUNCTION [dbo].[GetIdentityForName] (@Name VARCHAR(MAX))
RETURNS INT
AS
  BEGIN
      RETURN
        (SELECT ISNULL(MAX(NameId),0)+1
         FROM  YourTable
         WHERE Name = @Name);
  END

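Then call the function when inserting a record. SQL Server does not allow a DEFAULT constraint to reference a column of the table, so a definition such as DEFAULT ([dbo].[GetIdentityForName](Name)) FOR NameId is rejected; pass the name explicitly in the INSERT instead:

INSERT INTO YourTable (Name, NameId)
VALUES (@Name, [dbo].[GetIdentityForName](@Name));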

Assuming that YourTable is (Id, Name, NameId).
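
The "maximum plus one per name" logic can be tried out without a SQL Server instance. The sketch below uses Python's built-in sqlite3 module and an in-memory SQLite database, so it is only an illustration of the numbering, not SQL Server code; the helper insert_name is hypothetical, and COALESCE stands in for ISNULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (Id INTEGER PRIMARY KEY, Name TEXT, NameId INTEGER)"
)


def insert_name(name):
    # Next NameId for this name: current maximum plus one, or 1 if none exists.
    conn.execute(
        "INSERT INTO YourTable (Name, NameId) "
        "SELECT ?, COALESCE(MAX(NameId), 0) + 1 FROM YourTable WHERE Name = ?",
        (name, name),
    )


for name in ["a", "b", "a", "a", "b"]:
    insert_name(name)

rows = conn.execute("SELECT Name, NameId FROM YourTable ORDER BY Id").fetchall()
print(rows)
```

Keep in mind that a maximum-plus-one scheme can hand out duplicate numbers when two inserts for the same name run concurrently, unless the inserts are serialized.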

I hope this helps :)
