Best Way to Determine If a Sequence Is in Another Sequence

Best way to determine if a sequence is in another sequence?

I second the Knuth-Morris-Pratt algorithm. By the way, your problem (and the KMP solution) is exactly recipe 5.13 in Python Cookbook 2nd edition. You can find the related code at http://code.activestate.com/recipes/117214/

It finds all the correct subsequences in a given sequence, and should be used as an iterator:

>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,6]): print s
3
>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,7]): print s
(nothing)

How to test if a sequence starts with values in another sequence?

len(prefix) <= len(sequence) and all(i==j for i, j in zip(prefix, sequence))

Fastest way to check if a sequence contains a non-consecutive subsequence?

The code you found creates an iterator for A; you can see this as a simple pointer to the next position in A to look at, and in moves the pointer forward across A until a match is found. It can be used multiple times, but only ever moves forward; when using in containment tests against a single iterator multiple times, the iterator can't go backwards and so can only test if still to visit values are equal to the left-hand operand.

Given your last example, with B = [2, 7, 2], what happens is this:

  • it = iter(A) creates an iterator object for the A list, and stores 0 as the next position to look at.
  • The all() function tests each element in an iterable and returns False early, if such a result was found. Otherwise it keeps testing every element. Here the tests are repeated x in it calls, where x is set to each value in B in turn.
  • x is first set to 2, and so 2 in it is tested.

    • it is set to next look at A[0]. That's 4, not equal to 2, so the internal position counter is incremented to 1.
    • A[1] is 2, and that's equal, so 2 in it returns True at this point, but not before incrementing the 'next position to look at' counter to 2.
  • 2 in it was true, so all() continues on.
  • The next value in B is 7, so 7 in it is tested.

    • it is set to next look at A[2]. That's 8, not 7. The position counter is incremented to 3.
    • it is set to next look at A[3]. That's 2, not 7. The position counter is incremented to 4.
    • it is set to next look at A[4]. That's 7, equal to 7. The position counter is incremented to 5 and True is returned.
  • 7 in it was true, so all() continues on.
  • The next value in B is 2, so 2 in it is tested.

    • it is set to next look at A[5]. That's 0, not 2. The position counter is incremented to 6.
    • it is set to next look at A[6]. That's 1, not 2. The position counter is incremented to 7.
    • it is set to next look at A[7]. That's 5, not 2. The position counter is incremented to 8.
    • it is set to next look at A[8]. That's 3, not 2. The position counter is incremented to 9.
    • There is no A[9] because there are not that many elements in A, and so False is returned.
  • 2 in it was False, so all() ends by returning False.

You could verify this with an iterator with a side effect you can observe; here I used print() to write out what the next value is for a given input:

>>> A = [4, 2, 8, 2, 7, 0, 1, 5, 3]
>>> B = [2, 7, 2]
>>> with_sideeffect = lambda name, iterable: (
print(f"{name}[{idx}] = {value}") or value
for idx, value in enumerate(iterable)
)
>>> is_sublist(with_sideeffect(" > A", A), with_sideeffect("< B", B))
< B[0] = 2
> A[0] = 4
> A[1] = 2
< B[1] = 7
> A[2] = 8
> A[3] = 2
> A[4] = 7
< B[2] = 2
> A[5] = 0
> A[6] = 1
> A[7] = 5
> A[8] = 3
False

Your problem requires that you test every element of B consecutively, there are no shortcuts here. You also must scan through A to test for the elements of B being present, in the right order. You can only declare victory when all elements of B have been found (partial scan), and defeat when all elements in A have been scanned and the current value in B you are testing for is not found.

So, assuming the size of B is always smaller than A, the best case scenario then is where all K elements in B are equal to the first K elements of A. The worst case, is any case where not all of the elements of B are present in A, and require a full scan through A. It doesn't matter what number of elements are present in B; if you are testing element K out of K you already have been scanning part-way through A and must complete your scan through A to find that the last element is missing.

So the best case with N elements in A and K elements in B, takes O(K) time. The worst case, using the same definitions of N and K, takes O(N) time.

There is no faster algorithm to test for this condition, so all you can hope for is lowering your constant times (the time taken to complete each of the N steps). Here that'd be a faster way to scan through A as you search for the elements of B. I am not aware of a better way to do this than by using the method you already found.

Is there a Python builtin for determining if an iterable contained a certain sequence?

Referenced from https://stackoverflow.com/a/6822773/24718
modified to use a list.

from itertools import islice

def window(seq, n=2):
"""
Returns a sliding window (of width n) over data from the iterable
s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
"""
it = iter(seq)
result = list(islice(it, n))
if len(result) == n:
yield result
for elem in it:
result = result[1:] + [elem]
yield result

def contains_sequence(all_values, seq):
return any(seq == current_seq for current_seq in window(all_values, len(seq)))

test_iterable = [1,2,3]
search_sequence = [1,2]

result = contains_sequence(test_iterable, search_sequence)

Is there a way to compare how similar a sequence of numbers is to another sequence of numbers in Ruby?

You could do something like:

num_str1 = '1234567890'
num_str2 = '1234657890'

num_str1.chars.zip(num_str2.chars).count { |a, b| a == b }
#=> 8

This converts each string to character-arrays, then pairing elements by index, before comparing them. The percentage calculation is left as an exercise. See ruby-docs.org for more info on the methods used.

Determine if the sequence is incremented? (python)

Returning as soon as you find one is better. You can leverage pythons built in functions to do that:

def isIncreasing(seq):
return all(a<b for a,b in zip(seq,seq[1:]))

This zippes your sequence into pairs:

[1,2,3,4] => [(1,2),(2,3),(3,4)]

and checks each pair. all() terminates as soon as it finds a False.

Documentation:

  • all()
  • zip()

How to test membership of sequence in python list?

NOTE: I realized that this will actually fail if your list contains [15, 2, 36] which does contain the string 5, 2, 3 so it is just for special cases.

Since you have a dictionary, maybe list comprehension on the keys and string matching? It is actually the same speed as walking through the elements, according to timeit...

s_list = [5,2,3]   # sequence to search for

# Setting up your dictionary
MyD = {'DOC3187' : [1, 2, 3, 6, 7],
'DOC4552' : [5, 2, 3, 6],
'DOC4974' : [1, 2, 3, 6],
'DOC8365' : [1, 2, 3, 5, 6, 7],
'DOC3738' : [1, 4, 2, 3, 6],
'DOC5311' : [1, 5, 2, 3, 6, 7]}

query = str(s_list)[1:-1] # make a string of '5, 2, 3'
Matches = [ k for k in MyD if query in str(MyD[k]) ]

Result:

['DOC5311', 'DOC4552']


Related Topics



Leave a reply



Submit