How to Count Runs in a Sequence

How can I count runs in a sequence?

Use rle():

y <- rle(c(1,0,0,0,1,0,0,0,0,0,2,0,0))
y$lengths[y$values==0]  # lengths of the runs of zeros: 3 5 2
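
For readers working in Python, the same idea can be sketched with itertools.groupby. This is only a rough equivalent of rle(), not a port of it; the helper name rle and its return shape are my own choices for illustration.

```python
from itertools import groupby

def rle(seq):
    """Return (values, lengths) for the runs of equal consecutive items."""
    values, lengths = [], []
    for value, run in groupby(seq):
        values.append(value)
        lengths.append(sum(1 for _ in run))  # size of this run
    return values, lengths

values, lengths = rle([1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0])
# Keep only the lengths whose run value is 0, like y$lengths[y$values==0].
zero_run_lengths = [n for v, n in zip(values, lengths) if v == 0]
print(zero_run_lengths)  # [3, 5, 2]
```

len(zero_run_lengths) then gives the number of zero runs, if that is what you are after.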

R function to calculate number of runs in a sequence of numbers

If your vector is x, you can use the rle function:

rle(x)
Run Length Encoding
  lengths: int [1:109] 34 3 2 1 5 1 2 1 2 5 ...
  values : num [1:109] 1 -1 1 -1 1 -1 1 -1 1 -1 ...

But are you sure the result should be 291? The output above shows 109 runs.

How to count how many times a sequence appears in a given string in Python?

Use count on your string. It returns the number of non-overlapping occurrences of the substring seq:

def count_sequence(text, seq):
    # str.count returns the number of non-overlapping occurrences of seq
    return text.count(seq)

print(count_sequence("kjdsflsdnf lskmfldsknffsdlkfnsldkmf", "ds"))

Output:

2
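
One caveat: str.count only counts non-overlapping matches. If overlapping matches should count too, something along these lines might work. The helper name count_overlapping is hypothetical, and this is a sketch rather than the only way to do it.

```python
def count_overlapping(text, seq):
    """Count occurrences of seq in text, allowing overlaps."""
    if not seq:
        raise ValueError("seq must be non-empty")
    count = 0
    start = text.find(seq)
    while start != -1:
        count += 1
        # Resume one character after the last match so overlaps are found.
        start = text.find(seq, start + 1)
    return count

print("aaaa".count("aa"))               # 2 (non-overlapping)
print(count_overlapping("aaaa", "aa"))  # 3 (matches at positions 0, 1, 2)
```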

Count events that occurred in sequence

Try:

library(data.table)

setDT(df)[, desirable_output := cumsum(event), by = .(city, rleid(city, event))]
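
To see what that line does, here is a rough pure-Python sketch of the same logic: split the rows into runs of identical (city, event) pairs, then take a running sum of event inside each run. The row layout and the sample values are assumptions on my part, since the question's data is not shown here.

```python
from itertools import accumulate, groupby

def cumsum_within_runs(rows):
    """rows: list of (city, event) tuples, in order. Returns one value per row."""
    out = []
    # groupby starts a new group whenever the (city, event) pair changes,
    # which plays the role of rleid(city, event).
    for _, run in groupby(rows):
        out.extend(accumulate(event for _, event in run))
    return out

# Hypothetical sample data: event is 1 when the event occurred, else 0.
rows = [("A", 1), ("A", 1), ("A", 0), ("A", 1), ("B", 1), ("B", 1), ("B", 0)]
print(cumsum_within_runs(rows))  # [1, 2, 0, 1, 1, 2, 0]
```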

In PostgreSQL, how to count runs in a sequence across repeating partitions?

Try this. It uses the Tabibitosan method (grouping sequence ranges).

PostgreSQL 9.6 Schema Setup:

CREATE TABLE user_events
    (user_name varchar(3), eventname varchar(1), event_time time)
;

INSERT INTO user_events
    (user_name, eventname, event_time)
VALUES
    ('Ted', 'a', '12:01'),
    ('Ted', 'b', '12:02'),
    ('Ted', 'b', '12:03'),
    ('Ted', 'b', '12:04'),
    ('Ted', 'c', '12:05'),
    ('Ted', 'b', '12:06'),
    ('Ted', 'b', '12:07'),
    ('Ted', 'c', '12:08'),
    ('Ted', 'b', '12:09'),
    ('Ted', 'b', '12:11'),
    ('Ted', 'b', '12:12')
;

Query 1:

SELECT t.user_name
    ,t.eventname
    ,row_number() OVER (
        ORDER BY MIN(event_time)
        ) AS event_sequence_number
    ,MIN(event_time) AS time_started
    ,COUNT(*) as frequency
FROM (
    SELECT user_name
        ,eventname
        ,event_time
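        -- Overall row number minus per-event row number: the difference
        -- stays constant within each run of consecutive identical events.
        ,row_number() OVER (
            ORDER BY event_time
            ) - row_number() OVER (
            PARTITION BY eventname ORDER BY event_time
            ) AS seq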
    FROM user_events
    ) t
GROUP BY user_name
    ,eventname
    ,seq
ORDER BY time_started

Results:

| user_name | eventname | event_sequence_number | time_started | frequency |
|-----------|-----------|-----------------------|--------------|-----------|
|       Ted |         a |                     1 |     12:01:00 |         1 |
|       Ted |         b |                     2 |     12:02:00 |         3 |
|       Ted |         c |                     3 |     12:05:00 |         1 |
|       Ted |         b |                     4 |     12:06:00 |         2 |
|       Ted |         c |                     5 |     12:08:00 |         1 |
|       Ted |         b |                     6 |     12:09:00 |         3 |
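
If the row-number subtraction looks like magic, this small Python sketch replays it on Ted's events. The function name and the tuple layout are mine, and it only illustrates the idea for a single user; it is not a replacement for the query.

```python
from collections import Counter

def tabibitosan_runs(events):
    """Return (value, start_position, length) per run, positions 1-based."""
    seen = Counter()  # per-value row number, like PARTITION BY eventname
    runs = {}         # (value, difference) -> [start, length]; keeps insertion order
    for overall, value in enumerate(events, start=1):
        seen[value] += 1
        # The difference is constant while the same value keeps repeating.
        key = (value, overall - seen[value])
        if key not in runs:
            runs[key] = [overall, 0]
        runs[key][1] += 1
    return [(value, start, length) for (value, _), (start, length) in runs.items()]

events = list("abbbcbbcbbb")  # Ted's events in time order
print(tabibitosan_runs(events))
# [('a', 1, 1), ('b', 2, 3), ('c', 5, 1), ('b', 6, 2), ('c', 8, 1), ('b', 9, 3)]
```

The lengths 1, 3, 1, 2, 1, 3 line up with the frequency column above.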

How do I find the length of a run of numbers in a list? (Is there a faster way than what I'm doing?)

I might use itertools.groupby for this one:

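from itertools import groupby
from operator import itemgetter

lst = [1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0]

# Group (index, value) pairs by value; consecutive equal values form one run.
for k, v in groupby(enumerate(lst), key=itemgetter(1)):
    if k:  # keep only the runs of 1s
        v = list(v)
        print(v[0][0], v[-1][0])  # start and end index of the run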

This prints the start and end indices of each run of 1s (here 0 4 and 12 15). The length of a run is end - start + 1.
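
If you want the lengths directly, a variant along these lines might be simpler, since it skips enumerate and just tracks the position as it goes. The name run_spans and the target parameter are hypothetical.

```python
from itertools import groupby

def run_spans(seq, target=1):
    """Return (start_index, length) for every run of target in seq."""
    spans = []
    pos = 0  # index where the current run starts
    for value, run in groupby(seq):
        length = sum(1 for _ in run)
        if value == target:
            spans.append((pos, length))
        pos += length
    return spans

lst = [1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0]
print(run_spans(lst))  # [(0, 5), (12, 4)]
```

I have not timed it against the version above, so treat any speed difference as untested.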

Counting number of sequences in a vector

The run length encoding function (rle) is built for this. Helpfully, while it computes the lengths of runs of equal values in a vector, it also returns the value of each run. So use rle(bin).

Compare the $values output to your desired value (1) with == and sum the result; each run of 1s yields TRUE, which sum() counts as 1:

sum( rle(bin)$values == 1 )
[1] 5
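
For comparison, counting runs of a given value does not need a full run length encoding. A sketch in Python that just counts the places where a run of the target starts; the vector below is made-up sample data (the question's bin is not shown here), chosen so that it also has 5 runs of 1s.

```python
def count_runs_of(seq, target):
    """Count the runs of consecutive items equal to target."""
    count = 0
    previous = object()  # sentinel that never equals a real element
    for item in seq:
        # A run starts where target appears right after a non-target item.
        if item == target and previous != target:
            count += 1
        previous = item
    return count

bits = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1]  # hypothetical data
print(count_runs_of(bits, 1))  # 5
```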