Binary Search (Bisection) in Python

Binary search (bisection) in Python

bisect_left finds the first position p at which an element could be inserted in a given sorted range while maintaining the sorted order. That will be the position of x if x exists in the range. If p is the past-the-end position, x wasn't found. Otherwise, we can test to see if x is there to see if x was found.

from bisect import bisect_left

def binary_search(a, x, lo=0, hi=None):
    if hi is None: hi = len(a)
    pos = bisect_left(a, x, lo, hi)                  # find insertion position
    return pos if pos != hi and a[pos] == x else -1  # don't walk off the end

Python bisect: search range of values

Python dictionaries are hashed, thus provide O(1) search/insert/delete time complexities while binary search is logarithmic.

When you search for a specific value looking up a dictionary is more performant than binary search. Binary search makes more sense if you also are looking for a range around a specific value. If you want the closest possible value to your search term for example. Or if you want to know how many elements are already before the position of the element that you are looking for. Or how many elements lie between 2 particular elements that you are looking up. Hashed dictionaries can't do that of course. As you can see, in a binary search, all elements are related to each other in terms of their total ordering (however it is defined for that particular type). This property of sorted lists enables binary search to act upon ranges. While in a hashed dictionary, the elements are unrelated to each other, except for when their hashes are exactly the same.

Binary search vs linear search

You binary search copies parts of the list in lines such as temp_arr=temp_arr[start:end]. Since this copying happens in exponential decay parts (e.g. if original list is 128, then the next copy will be 64, and the one after that 32, etc), whose total cost is O(N) over the whole run of the algorithm.

EDIT: clarification regarding the total cost of the exponential decay mentioned above. The total cost of copying the list parts over all the iterations is N /2 + N /4 + N /8 + ... + 1, which is a geometric series, which equals N-1. Since usually the list's length is not a power of two, there will be a slight variation of the total cost, but it will still be O(N).

Although both the inefficient implementation of "binary search" and linear search, are O(N), the constant factor is higher for the "binary search" since it uses many more operations than linear search to achieve its goal. So overall your "binary search" is slower.

To make the code run faster, find a way to avoid creating new lists. A possible solution is to change the code such that start and end point to the original numbers_one_to_hundred list, without ever creating temp_arr

EDIT: Clarificaiton on improvments: Without copying partial lists, the binary search should be significantly faster for any list with enough elements. The cut-off size is likely between N=10 and N=100, when I'll guess the number to be closer to 20 for correctly implemented binary search in python.

recursive binary search, biesction by slicing

The below returns False if the value is not found; otherwise it returns the index. The binary search executes recursively.

The key addition to your code is returning your function calls. I added some return a+b if a != False else a logic, to handle returning False if the value isn't in the iterable.

def binary_search(iterable, target):
    # if the list is down to one element, and it isn't the target, return False
    if len(iterable) == 1 and iterable[0] != target: return False
        
    left_index = 0
    right_index = len(iterable)
    mid = (left_index + right_index) // 2

    if iterable[mid] == target:
        return mid
    elif iterable[mid] < target:
        iterable = iterable[mid:right_index]
        v = binary_search(iterable,target)
        # if v is not False, add it to the left side of the window and return
        # else return False
        return v + mid if v != False else v
    elif iterable[mid] > target:
        iterable = iterable[left_index:mid]
        v = binary_search(iterable,target)
        return v + left_index if v != False else v

Binary search (bisect) for a Unicode character in Python 2.7

It appears your J list is at fault here; it is not correctly sorted.

When I use the requests library to load your test data file and sort the information, it works fine:

 >>> # your bisect function defined
 ...
 >>> import requests
 >>> words = u"そつう　れきだい　ほんやく　わかす　りくつ　ばいか　ろせん　やちん　そつう　れきだい　ほんやく　わかめ"
 >>> url = 'https://raw.githubusercontent.com/trezor/python-mnemonic/master/mnemonic/wordlist/japanese.txt'
 >>> J = sorted(requests.get(url).text.splitlines())
 >>> [ J.index(x) for x in words.split()]
[1024, 2015, 1787, 2039, 1983, 1601, 2031, 1919, 1024, 2015, 1787, 2040]
>>> [ binary_search(J, x) for x in words.split()]
[1024, 2015, 1787, 2039, 1983, 1601, 2031, 1919, 1024, 2015, 1787, 2040]

Note that the indices don't quite match your test results; where the indices are off by more than 3 the bisect fails to find the result.

Binary Search with results returned as Indices

Here is a more conventional binary search (see also Wikipedia) that will work better:

def binary_search(keys, query): 
    L, R = 0, len(keys) - 1
    while L <= R:
        M = L + (R - L) // 2
        if keys[M] == query:
            return M
        elif query < keys[M]:
            R = M - 1
        else:
            L = M + 1
    return -1

print("keys:", list(range(0,20,2)))
print(14, binary_search(list(range(0,20,2)), 14))
print(15, binary_search(list(range(0,20,2)), 15))
print(-1, binary_search(list(range(0,20,2)), -1))
print(20, binary_search(list(range(0,20,2)), 20))
print(0, binary_search(list(range(0,20,2)), 0))
print(18, binary_search(list(range(0,20,2)), 18))

Test case output:

keys: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
14 7
15 -1
-1 -1
20 -1
0 0
18 9

Some issues in your implementation are that slicing is expensive and best avoided, and also that your dictionary mechanic seems unnecessary, since binary search operates on the indexes of the array being searched, and therefore you shouldn't need to do any extra work to get the index of the match (if any).

UPDATE: Here is a recursive approach that will also work:

def binary_search(keys, query, L = 0, R = None): 
    R = R if R is not None else len(keys) - 1
    M = L + (R - L) // 2
    if L > R:
        return -1
    elif keys[M] == query:
        return M
    elif query < keys[M]:
        R = M - 1
    else:
        L = M + 1
    return binary_search(keys, query, L, R)

Binary Search (Bisection) in Python