Best way to determine if a sequence is in another sequence?
I second the Knuth-Morris-Pratt algorithm. By the way, your problem (and the KMP solution) is exactly recipe 5.13 in Python Cookbook 2nd edition. You can find the related code at http://code.activestate.com/recipes/117214/
It finds all the correct subsequences in a given sequence, and should be used as an iterator:
>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,6]): print s
3
>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,7]): print s
(nothing)
How to test if a sequence starts with values in another sequence?
len(prefix) <= len(sequence) and all(i==j for i, j in zip(prefix, sequence))
Fastest way to check if a sequence contains a non-consecutive subsequence?
The code you found creates an iterator for A
; you can see this as a simple pointer to the next position in A
to look at, and in
moves the pointer forward across A
until a match is found. It can be used multiple times, but only ever moves forward; when using in
containment tests against a single iterator multiple times, the iterator can't go backwards and so can only test if still to visit values are equal to the left-hand operand.
Given your last example, with B = [2, 7, 2]
, what happens is this:
it = iter(A)
creates an iterator object for theA
list, and stores0
as the next position to look at.- The
all()
function tests each element in an iterable and returnsFalse
early, if such a result was found. Otherwise it keeps testing every element. Here the tests are repeatedx in it
calls, wherex
is set to each value inB
in turn. x
is first set to2
, and so2 in it
is tested.it
is set to next look atA[0]
. That's 4, not equal to2
, so the internal position counter is incremented to 1.A[1]
is2
, and that's equal, so2 in it
returnsTrue
at this point, but not before incrementing the 'next position to look at' counter to2
.
2 in it
was true, soall()
continues on.- The next value in
B
is7
, so7 in it
is tested.it
is set to next look atA[2]
. That's8
, not7
. The position counter is incremented to3
.it
is set to next look atA[3]
. That's2
, not7
. The position counter is incremented to4
.it
is set to next look atA[4]
. That's7
, equal to7
. The position counter is incremented to5
andTrue
is returned.
7 in it
was true, soall()
continues on.- The next value in
B
is2
, so2 in it
is tested.it
is set to next look atA[5]
. That's0
, not2
. The position counter is incremented to6
.it
is set to next look atA[6]
. That's1
, not2
. The position counter is incremented to7
.it
is set to next look atA[7]
. That's5
, not2
. The position counter is incremented to8
.it
is set to next look atA[8]
. That's3
, not2
. The position counter is incremented to9
.- There is no
A[9]
because there are not that many elements inA
, and soFalse
is returned.
2 in it
wasFalse
, soall()
ends by returningFalse
.
You could verify this with an iterator with a side effect you can observe; here I used print()
to write out what the next value is for a given input:
>>> A = [4, 2, 8, 2, 7, 0, 1, 5, 3]
>>> B = [2, 7, 2]
>>> with_sideeffect = lambda name, iterable: (
print(f"{name}[{idx}] = {value}") or value
for idx, value in enumerate(iterable)
)
>>> is_sublist(with_sideeffect(" > A", A), with_sideeffect("< B", B))
< B[0] = 2
> A[0] = 4
> A[1] = 2
< B[1] = 7
> A[2] = 8
> A[3] = 2
> A[4] = 7
< B[2] = 2
> A[5] = 0
> A[6] = 1
> A[7] = 5
> A[8] = 3
False
Your problem requires that you test every element of B
consecutively, there are no shortcuts here. You also must scan through A
to test for the elements of B
being present, in the right order. You can only declare victory when all elements of B
have been found (partial scan), and defeat when all elements in A
have been scanned and the current value in B
you are testing for is not found.
So, assuming the size of B is always smaller than A, the best case scenario then is where all K elements in B
are equal to the first K elements of A
. The worst case, is any case where not all of the elements of B
are present in A
, and require a full scan through A
. It doesn't matter what number of elements are present in B
; if you are testing element K out of K you already have been scanning part-way through A
and must complete your scan through A
to find that the last element is missing.
So the best case with N elements in A
and K elements in B
, takes O(K) time. The worst case, using the same definitions of N and K, takes O(N) time.
There is no faster algorithm to test for this condition, so all you can hope for is lowering your constant times (the time taken to complete each of the N steps). Here that'd be a faster way to scan through A
as you search for the elements of B
. I am not aware of a better way to do this than by using the method you already found.
Is there a Python builtin for determining if an iterable contained a certain sequence?
Referenced from https://stackoverflow.com/a/6822773/24718
modified to use a list.
from itertools import islice
def window(seq, n=2):
"""
Returns a sliding window (of width n) over data from the iterable
s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
"""
it = iter(seq)
result = list(islice(it, n))
if len(result) == n:
yield result
for elem in it:
result = result[1:] + [elem]
yield result
def contains_sequence(all_values, seq):
return any(seq == current_seq for current_seq in window(all_values, len(seq)))
test_iterable = [1,2,3]
search_sequence = [1,2]
result = contains_sequence(test_iterable, search_sequence)
Is there a way to compare how similar a sequence of numbers is to another sequence of numbers in Ruby?
You could do something like:
num_str1 = '1234567890'
num_str2 = '1234657890'
num_str1.chars.zip(num_str2.chars).count { |a, b| a == b }
#=> 8
This converts each string to character-arrays, then pairing elements by index, before comparing them. The percentage calculation is left as an exercise. See ruby-docs.org for more info on the methods used.
Determine if the sequence is incremented? (python)
Returning as soon as you find one is better. You can leverage pythons built in functions to do that:
def isIncreasing(seq):
return all(a<b for a,b in zip(seq,seq[1:]))
This zippes your sequence into pairs:
[1,2,3,4] => [(1,2),(2,3),(3,4)]
and checks each pair. all()
terminates as soon as it finds a False
.
Documentation:
- all()
- zip()
How to test membership of sequence in python list?
NOTE: I realized that this will actually fail if your list contains [15, 2, 36]
which does contain the string 5, 2, 3
so it is just for special cases.
Since you have a dictionary, maybe list comprehension on the keys and string matching? It is actually the same speed as walking through the elements, according to timeit
...
s_list = [5,2,3] # sequence to search for
# Setting up your dictionary
MyD = {'DOC3187' : [1, 2, 3, 6, 7],
'DOC4552' : [5, 2, 3, 6],
'DOC4974' : [1, 2, 3, 6],
'DOC8365' : [1, 2, 3, 5, 6, 7],
'DOC3738' : [1, 4, 2, 3, 6],
'DOC5311' : [1, 5, 2, 3, 6, 7]}
query = str(s_list)[1:-1] # make a string of '5, 2, 3'
Matches = [ k for k in MyD if query in str(MyD[k]) ]
Result:
['DOC5311', 'DOC4552']
Related Topics
Python Input Never Equals an Integer
Python Matplotlib Framework Under MACosx
"Ssl: Certificate_Verify_Failed" Error When Scraping Https://Www.Thenewboston.Com/
Get All Object Attributes in Python
Downloading a Directory Tree with Ftplib
Getting Values from JSON Using Python
How to Replace Django's Primary Key with a Different Integer That Is Unique for That Table
How to Do N-D Distance and Nearest Neighbor Calculations on Numpy Arrays
Is Everything Greater Than None
Paging/Scrolling Through Set of 2D Heat Maps in Matplotlib
Python CSV.Reader: How to Return to the Top of the File
Seeking from End of File Throwing Unsupported Exception
How to Schedule a Function to Run Every Hour on Flask
Python: Sort Function Breaks in the Presence of Nan
Python How to Read N Number of Lines at a Time
Convert Bytes to Bits in Python
In Python, How to Escape Newline Characters When Printing a String