Create group names for consecutive values
Using diff and cumsum:
paste0("Group_", cumsum(c(1, diff(x) != 0)))
#[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"
(If your values are floating-point numbers, you may need to avoid != and compare with a tolerance instead.)
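For comparison, the same run-labelling idea can be sketched in Python. This is an illustration, not the answer's code: the function name `run_labels` and its `rel_tol`/`abs_tol` parameters are hypothetical, and `math.isclose` stands in for `!=` as the tolerance check suggested above.

```python
import math
from typing import List, Sequence


def run_labels(x: Sequence[float], prefix: str = "Group_",
               rel_tol: float = 1e-9, abs_tol: float = 0.0) -> List[str]:
    """Label runs of equal consecutive values, like cumsum(c(1, diff(x) != 0))."""
    labels = []
    group = 0
    for i, value in enumerate(x):
        # Start a new group at the first element, or whenever the value differs
        # from its predecessor beyond the tolerance (the role of diff(x) != 0).
        if i == 0 or not math.isclose(value, x[i - 1], rel_tol=rel_tol, abs_tol=abs_tol):
            group += 1
        labels.append(f"{prefix}{group}")
    return labels


print(run_labels([0.1 + 0.2, 0.3, 0.5]))  # ['Group_1', 'Group_1', 'Group_2']
```

With a purely relative tolerance, values near zero never compare as close, so set `abs_tol` when your data can contain zeros.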
group together consecutive numbers in a list
You could use negative indexing:
def group_by_missing(seq):
    if not seq:
        return seq
    grouped = [[seq[0]]]
    for x in seq[1:]:
        if x == grouped[-1][-1] + 1:
            grouped[-1].append(x)
        else:
            grouped.append([x])
    return grouped
Example Usage:
>>> lst = [1, 2, 3, 4, 5, 12, 13, 14, 15, 20, 21, 22, 23, 30, 35, 36, 37, 38, 39, 40]
>>> group_by_missing(lst)
[[1, 2, 3, 4, 5], [12, 13, 14, 15], [20, 21, 22, 23], [30], [35, 36, 37, 38, 39, 40]]
How to create a consecutive group number
Try Data$number <- as.numeric(as.factor(Data$site))
As a side note: the difference between my solution and @Chase's on one hand, and @DWin's on the other, is the ordering of the numbers. Both as.factor and factor sort the levels automatically, whereas @DWin's solution numbers the sites in order of first appearance:
Dat <- data.frame(site = rep(c(1,8,4), each = 3), score = runif(9))
Dat$number <- as.numeric(factor(Dat$site))
Dat$sitenum <- match(Dat$site, unique(Dat$site) )
Gives
> Dat
site score number sitenum
1 1 0.7377561 1 1
2 1 0.3131139 1 1
3 1 0.7862290 1 1
4 8 0.4480387 3 2
5 8 0.3873210 3 2
6 8 0.8778102 3 2
7 4 0.6916340 2 3
8 4 0.3033787 2 3
9 4 0.6552808 2 3
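The two numbering schemes in the table can be sketched in Python as follows. The helper names are hypothetical: `sorted_codes` mimics the sorted levels of factor(), and `first_appearance_codes` mimics match(x, unique(x)).

```python
from typing import Hashable, List, Sequence


def sorted_codes(values: Sequence[Hashable]) -> List[int]:
    # Like as.numeric(factor(x)): levels are sorted, then numbered from 1.
    levels = sorted(set(values))
    index = {v: i + 1 for i, v in enumerate(levels)}
    return [index[v] for v in values]


def first_appearance_codes(values: Sequence[Hashable]) -> List[int]:
    # Like match(x, unique(x)): each new value gets the next number
    # in the order it is first seen.
    index = {}
    for v in values:
        index.setdefault(v, len(index) + 1)
    return [index[v] for v in values]


site = [1, 1, 1, 8, 8, 8, 4, 4, 4]
print(sorted_codes(site))            # [1, 1, 1, 3, 3, 3, 2, 2, 2]
print(first_appearance_codes(site))  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```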
How can we group and name a series for n consecutive numbers in Excel?
Assuming your series is in column A, the group size n is in cell C2, and you want the group number in column B, enter this in B1 and fill down:
=INT((ROW(1:1)-1)/$C$2)+1
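The formula is integer division on the row offset: row r (counting from 1) lands in group floor((r - 1) / n) + 1. A small Python sketch of that arithmetic, where `block_numbers` is a hypothetical name and `n` plays the role of the value in C2:

```python
from typing import List


def block_numbers(count: int, n: int) -> List[int]:
    # Same as =INT((ROW(1:1)-1)/$C$2)+1 filled down over `count` rows:
    # for positive values INT() is floor, which matches Python's //.
    return [(r - 1) // n + 1 for r in range(1, count + 1)]


print(block_numbers(7, 3))  # [1, 1, 1, 2, 2, 2, 3]
```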
How to groupby consecutive values in pandas DataFrame
You can use groupby with a custom Series:
import pandas as pd

df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})
print (df)
a
0 1
1 1
2 -1
3 1
4 -1
5 -1
print ((df.a != df.a.shift()).cumsum())
0 1
1 1
2 2
3 3
4 4
5 4
Name: a, dtype: int32
for i, g in df.groupby((df.a != df.a.shift()).cumsum()):
    print (i)
    print (g)
    print (g.a.tolist())
1
a
0 1
1 1
[1, 1]
2
a
2 -1
[-1]
3
a
3 1
[1]
4
a
4 -1
5 -1
[-1, -1]
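Outside pandas, the standard library's `itertools.groupby` does the same run splitting directly, because it starts a new group whenever the key changes between neighbouring items. A sketch on the same values as column a above:

```python
from itertools import groupby

values = [1, 1, -1, 1, -1, -1]  # same data as column a above

# groupby() only merges *adjacent* equal keys, which is the same
# "new group when value != previous value" rule as shift()/cumsum().
runs = [list(g) for _, g in groupby(values)]
print(runs)  # [[1, 1], [-1], [1], [-1, -1]]
```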
Identify groups of continuous numbers in a list
more_itertools.consecutive_groups was added in version 4.0.
Demo
import more_itertools as mit
iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
[list(group) for group in mit.consecutive_groups(iterable)]
# [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17], [20]]
Code
Applying this tool, we make a generator function that finds ranges of consecutive numbers.
def find_ranges(iterable):
    """Yield range of consecutive numbers."""
    for group in mit.consecutive_groups(iterable):
        group = list(group)
        if len(group) == 1:
            yield group[0]
        else:
            yield group[0], group[-1]
iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
list(find_ranges(iterable))
# [(2, 5), (12, 17), 20]
The source implementation emulates a classic recipe (as demonstrated by @Nadia Alramli).
Note: more_itertools is a third-party package, installable via pip install more_itertools.
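If you would rather avoid the third-party dependency, the classic recipe mentioned above can be sketched with the standard library alone. It relies on the fact that, within a run of consecutive integers, value minus position stays constant, so that difference works as a `groupby` key. The name `consecutive_runs` is hypothetical, and the input is assumed to be integers in ascending order.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def consecutive_runs(iterable: Iterable[int]) -> Iterator[List[int]]:
    # Pair each value with its position; value - position is constant
    # inside a run of consecutive integers and changes at every gap.
    for _, group in groupby(enumerate(iterable), key=lambda pair: pair[1] - pair[0]):
        yield [value for _, value in group]


iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
print(list(consecutive_runs(iterable)))
# [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17], [20]]
```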
Group rows based on consecutive line numbers
Convert the line numbers to numeric, compute the difference between consecutive values, and start a new group whenever the difference is greater than 1.
transform(df, group = cumsum(c(TRUE, diff(as.numeric(line)) > 1)))
# line group
#1 0001 1
#2 0002 1
#3 0003 1
#4 0011 2
#5 0012 2
#6 0234 3
#7 0235 3
#8 0236 3
If you want to use dplyr:
library(dplyr)
df %>% mutate(group = cumsum(c(TRUE, diff(as.numeric(line)) > 1)))
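The same rule, written as a plain Python loop for readers outside R. This is a sketch with a hypothetical function name `gap_groups`; like the R code, it only opens a new group when a value exceeds the previous one by more than 1.

```python
from typing import List, Sequence


def gap_groups(lines: Sequence[str]) -> List[int]:
    groups = []
    group = 0
    prev = None
    for line in lines:
        n = int(line)  # "0011" -> 11, like as.numeric(line)
        # Start a new group at the first row or after a jump of more than 1,
        # mirroring cumsum(c(TRUE, diff(...) > 1)).
        if prev is None or n - prev > 1:
            group += 1
        groups.append(group)
        prev = n
    return groups


print(gap_groups(["0001", "0002", "0003", "0011", "0012", "0234", "0235", "0236"]))
# [1, 1, 1, 2, 2, 3, 3, 3]
```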
Group consecutive integers together
Since you have lists of consecutive numbers, I suggest using range objects instead of lists:
d, head = {}, None
for x in l:
    if head is None or x != d[head].stop:
        head = x
    d[head] = range(head, x+1)
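Wrapped in a function with a sample input, the loop above can be exercised like this. The name `group_ranges` and the sample list are illustrative, and the input is assumed to be sorted ascending without duplicates: each value either extends the current range (when it equals that range's `stop`) or opens a new one keyed by its first element.

```python
from typing import Dict, Iterable


def group_ranges(values: Iterable[int]) -> Dict[int, range]:
    d, head = {}, None
    for x in values:
        # x continues the current run only if it is exactly the current stop.
        if head is None or x != d[head].stop:
            head = x
        d[head] = range(head, x + 1)
    return d


print(group_ranges([1, 2, 3, 7, 8, 10]))
# {1: range(1, 4), 7: range(7, 9), 10: range(10, 11)}
```

Because a range stores only start, stop and step, each group costs constant memory no matter how many numbers it covers.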