Create Group Names For Consecutive Values

Create group names for consecutive values

Using diff and cumsum :

paste0("Group_", cumsum(c(1, diff(x) != 0)))
#[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"

(If your values are floating point values, you might have to avoid != and use a tolerance instead.)

group together consecutive numbers in a list

You could use negative indexing:

def group_by_missing(seq):
if not seq:
return seq
grouped = [[seq[0]]]
for x in seq[1:]:
if x == grouped[-1][-1] + 1:
grouped[-1].append(x)
else:
grouped.append([x])
return grouped

Example Usage:

>>> lst = [1, 2, 3, 4, 5, 12, 13, 14, 15, 20, 21, 22, 23, 30, 35, 36, 37, 38, 39, 40]
>>> group_by_missing(lst)
[[1, 2, 3, 4, 5], [12, 13, 14, 15], [20, 21, 22, 23], [30], [35, 36, 37, 38, 39, 40]]

How to create a consecutive group number

Try Data$number <- as.numeric(as.factor(Data$site))

On a sidenote : the difference between the solution of me and @Chase on one hand, and the one of @DWin on the other, is the ordering of the numbers. Both as.factor and factor will automatically sort the levels, whereas that doesn't happen in the solution of @DWin :

Dat <- data.frame(site = rep(c(1,8,4), each = 3), score = runif(9))

Dat$number <- as.numeric(factor(Dat$site))
Dat$sitenum <- match(Dat$site, unique(Dat$site) )

Gives

> Dat
site score number sitenum
1 1 0.7377561 1 1
2 1 0.3131139 1 1
3 1 0.7862290 1 1
4 8 0.4480387 3 2
5 8 0.3873210 3 2
6 8 0.8778102 3 2
7 4 0.6916340 2 3
8 4 0.3033787 2 3
9 4 0.6552808 2 3

How can we group and name a series for n consecutive numbers in excel?

Assuming your series is in Column A and you want the group number in Column B:

=INT((ROW(1:1)-1)/$C$2)+1

Sample Image

How to groupby consecutive values in pandas DataFrame

You can use groupby by custom Series:

df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})
print (df)
a
0 1
1 1
2 -1
3 1
4 -1
5 -1

print ((df.a != df.a.shift()).cumsum())
0 1
1 1
2 2
3 3
4 4
5 4
Name: a, dtype: int32
for i, g in df.groupby([(df.a != df.a.shift()).cumsum()]):
print (i)
print (g)
print (g.a.tolist())

a
0 1
1 1
[1, 1]
2
a
2 -1
[-1]
3
a
3 1
[1]
4
a
4 -1
5 -1
[-1, -1]

Identify groups of continuous numbers in a list

more_itertools.consecutive_groups was added in version 4.0.

Demo

import more_itertools as mit


iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
[list(group) for group in mit.consecutive_groups(iterable)]
# [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17], [20]]

Code

Applying this tool, we make a generator function that finds ranges of consecutive numbers.

def find_ranges(iterable):
"""Yield range of consecutive numbers."""
for group in mit.consecutive_groups(iterable):
group = list(group)
if len(group) == 1:
yield group[0]
else:
yield group[0], group[-1]


iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
list(find_ranges(iterable))
# [(2, 5), (12, 17), 20]

The source implementation emulates a classic recipe (as demonstrated by @Nadia Alramli).

Note: more_itertools is a third-party package installable via pip install more_itertools.

Group rows based on consecutive line numbers

Convert the numbers to numeric, calculate difference between consecutive numbers and increment the group count when the difference is greater than 1.

transform(df, group = cumsum(c(TRUE, diff(as.numeric(line)) > 1)))

# line group
#1 0001 1
#2 0002 1
#3 0003 1
#4 0011 2
#5 0012 2
#6 0234 3
#7 0235 3
#8 0236 3

If you want to use dplyr :

library(dplyr)
df %>% mutate(group = cumsum(c(TRUE, diff(as.numeric(line)) > 1)))

Group consecutive integers together

As you have lists of consecutive numbers, I suggest you to use range objects instead of lists:

d, head = {}, None
for x in l:
if head is None or x != d[head].stop:
head = x
d[head] = range(head, x+1)


Related Topics



Leave a reply



Submit