Is There a Module for Balanced Binary Tree in Python's Standard Library

No, there is not a balanced binary tree in the stdlib. However, from your comments, it sounds like you may have some other options:

  • You say that you want a BST instead of a list for O(log n) searches. If searching is all you need and your data are already sorted, the bisect module provides a binary search algorithm for lists (see the sketch after this list).
  • Mike DeSimone recommended sets and dicts, and you explained why plain lists are algorithmically too slow for your use case. Sets and dicts are implemented as hash tables, which have O(1) average-case lookup. The solution to most problems in Python really is "use a dict".
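
For example, a membership test over a sorted list with bisect could look like the following sketch; the contains helper and its sample data are placeholders of mine, not anything from the question:

import bisect

def contains(sorted_data, x):
    # bisect_left finds the leftmost insertion point in O(log n)
    i = bisect.bisect_left(sorted_data, x)
    return i < len(sorted_data) and sorted_data[i] == x

contains([1, 3, 5, 7], 5)   # True
contains([1, 3, 5, 7], 4)   # False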

If neither solution works well for you, you'll have to go to a third party module or implement your own.

Built-in binary search tree in Python?

There's no special reason, to my knowledge; my guess is that the highly tuned dict and set implementations (which are hash tables) are good enough for most applications. There are definitely situations where you need the performance characteristics of a balanced binary search tree (such as ordered traversal by key rather than by insertion order), but those are far enough off the beaten path that people are happy to grab a third-party package in that case.

I've had a good experience using the bintrees package on PyPI. This has implementations of unbalanced, AVL and red-black binary trees, in both pure Python and as extensions written in Cython.
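
As a rough sketch of what using it looks like (assuming bintrees' dict-like tree interface; the keys and values here are made up):

import bintrees

tree = bintrees.RBTree()              # red-black tree; AVLTree works the same way
for key, value in [(5, "e"), (1, "a"), (3, "c")]:
    tree[key] = value                 # dict-style insertion, O(log n)

for key in tree:                      # iteration yields keys in sorted order
    print(key, tree[key])             # 1 a / 3 c / 5 e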

I think the rest of the reason is essentially historical accident. If the person who wrote bintrees lobbied for its inclusion in the stdlib, and was willing to put up with the constraints that imposes on maintenance and releases, it would probably go in. (Although the Cython dependency would cause a problem, I'd guess.)

Algorithmic complexity:

For hash tables (like dicts or sets), insertion and lookup are O(1) on average, while for a balanced tree they are O(log n). In-order traversal of keys is O(n) in a tree, but to do the same thing with a hash table you have to sort the keys first, so it's O(n log n). When you're picking which kind of data structure to use, think about which operations you'll actually perform and pick the tradeoff that makes the most sense for your application.
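
A tiny illustration with a made-up dict:

d = {"b": 2, "a": 1, "c": 3}

d["a"]                      # hash lookup: O(1) on average
for k in sorted(d):         # ordered traversal needs an O(n log n) sort first
    print(k, d[k])          # a 1 / b 2 / c 3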

Is there a function in Python's standard library that does something similar to the loop function?

itertools.product(*groups)
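
Without the original loop code I can only guess at its shape, but itertools.product stands in for nested for-loops over several iterables, roughly like this (groups is assumed to be a sequence of iterables):

import itertools

groups = [[1, 2], ["a", "b"]]             # assumed shape: a list of iterables

for combo in itertools.product(*groups):  # equivalent to nested for-loops
    print(combo)                          # (1, 'a') (1, 'b') (2, 'a') (2, 'b')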

Is there a Python data structure with O(log n) deletion/insertion that supports indexing?

Python does not appear to have such a structure in the standard library.

I'm not exactly sure of your needs, but since you considered using a set, that means you don't need duplicate items. Consider not changing the length of the list for a delete. Instead, replace the element you want to remove with its next-highest (non-equal) neighbor to the right. (If it happens to be the last element, you can just pop it.) The length and general indexing will be wrong, but a binary search will still work, which might be all that you need the indexing for.

Say you have [0,1,2,3,4,5] and you want to remove the 3. Make the list [0,1,2,4,4,5]. If you then want to remove the 4, make it [0,1,2,5,5,5].

You can use a binary search to find both ends of the run, which gives you the O(log n) removal you want. Use bisect_left first, and then pass in the answer from that to restrict the part of the list searched by bisect_right. Once you know the bounds, Python can assign the whole slice at once.
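
A sketch of that removal, with lazy_remove and a as names of my own choosing:

import bisect

def lazy_remove(a, x):
    # Find the run of x's: bisect_left gives the left end in O(log n),
    # and passing it as the lo bound restricts bisect_right's search.
    lo = bisect.bisect_left(a, x)
    if lo == len(a) or a[lo] != x:
        raise ValueError("value not in list")
    hi = bisect.bisect_right(a, x, lo)
    if hi == len(a):
        del a[lo:]                        # run is at the end: really remove it
    else:
        a[lo:hi] = [a[hi]] * (hi - lo)    # overwrite the run with its right neighbor

a = [0, 1, 2, 3, 4, 5]
lazy_remove(a, 3)   # [0, 1, 2, 4, 4, 5]
lazy_remove(a, 4)   # [0, 1, 2, 5, 5, 5]
lazy_remove(a, 5)   # [0, 1, 2]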

If you then want to remove the 5, just pop the whole run off the end, leaving [0,1,2]. Removal at the end of the list is cheap since no elements have to be shifted.

You can clean up the duplicates occasionally for better amortized performance, or whenever you need an accurate length; say, when the number of "deletions" reaches a certain fraction of the original length. Don't go through a set, because you'd have to sort again. (OrderedDict.fromkeys could work, but you'd still have to build a list from its .keys().) Just copy the list while skipping over the duplicates, like

itr = iter(old)
new = [next(itr)]        # assumes old is non-empty
for e in itr:
    if e != new[-1]:     # adjacent equal values are placeholders; keep one copy
        new.append(e)
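
For instance, cleaning up [0, 1, 2, 5, 5, 5] from the example above this way gives [0, 1, 2, 5]: the duplicated placeholder 5s are dropped while the one real 5 is kept.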

