Is there a module for balanced binary tree in Python's standard library?
No, there is not a balanced binary tree in the stdlib. However, from your comments, it sounds like you may have some other options:
- You say that you want a BST instead of a list for
O(log n)
searches. If searching is all you need and your data are already sorted, thebisect
module provides a binary search algorithm for lists. - Mike DeSimone recommended sets and dicts and you explained why lists are too algorithmically slow. Sets and dicts are implemented as hash tables, which have O(1) lookup. The solution to most problems in Python really is "use a dict".
If neither solution works well for you, you'll have to go to a third party module or implement your own.
Built-in binary search tree in Python?
There's no special reason, to my knowledge - I'd guess that the reason is that for so many applications the highly-tuned dict
and set
implementations (which are hash tables) work well. They're good enough in most cases. There are definitely situations where you need the performance characteristics of balanced binary search trees (like ordered traversal based on key- rather than addition-order), but those are far enough off the beaten path that people are happy with grabbing a third-party package in that case.
I've had a good experience using the bintrees package on PyPI. This has implementations of unbalanced, AVL and red-black binary trees, in both pure Python and as extensions written in Cython.
I think the rest of the reason is essentially historical accident. If the person who wrote bintrees lobbied for its inclusion in the stdlib, and was willing to put up with the constraints that imposes on maintenance and releases, it would probably go in. (Although the Cython dependency would cause a problem, I'd guess.)
Algorithmic complexity:
For hash tables (like dicts or sets), insertion and lookup are O(1), while for a balanced tree these are O(log(n)). In-order traversal of keys is O(n) in a tree, but to do the same thing with a hash table you need to sort the keys first, so it's O(n*log(n)). When you're picking which kind of data structure to use, you need to think about which operations you're going to be using, and pick the tradeoff that makes the most sense in your application.
Is there a function in Python's standard library that does something similar to the loop function?
itertools.product(*groups)
Is there a PYTHON data structure with O(log n) deletion/insertion and supports indexing?
Python does not appear to have such a structure in the standard library.
I'm not exactly sure of your needs, but since you considered using a set, that means you don't need duplicate items. Consider not changing the length of the list for a delete. Instead, replace the element you want to remove with its next-highest (non-equal) neighbor to the right. (If it happens to be the last element, you can just pop it.) The length and general indexing will be wrong, but a binary search will still work, which might be all that you need the indexing for.
Say you have [0,1,2,3,4,5]
and you want to remove the 3
. Make the list [0,1,2,4,4,5]
. If you then want to remove the 4
, make it [0,1,2,5,5,5]
.
You can use a binary search to find both ends of the run, which gives you the O(log n) removal you want. Use bisect_left
first, and then pass in the answer from that to restrict the part of the list searched by bisect_right
. Once you know the bounds, Python can assign the whole slice at once.
If you then want to remove the 5
, pop them off [0,1,2]
. Removal at the end of the list is more efficient since it won't require allocating a new array.
You can clean up the duplicates occasionally for better amortized performance, or if you need the length or something. Maybe when the number of "deletions" reaches a certain fraction of the original length. Don't go through a set
, because you'd have to sort again. (OrderedDict.fromkeys
could work, but you'd still have to build a list from the .keys()
). Just copy the list while skipping over the duplicates, like
itr = iter(old)
new = [next(itr)]
for e in itr:
if e is not new[-1]:
new.append(e)
Related Topics
What's a Good Equivalent to Subprocess.Check_Call That Returns the Contents of Stdout
Downloading a Directory Tree with Ftplib
How to Replace Back Slash Character with Empty String in Python
How to Print a Percentage Value in Python
Understanding Time.Perf_Counter() and Time.Process_Time()
Get All Object Attributes in Python
All Synonyms for Word in Python
How to Install a Python Package from Within Ipython
Python: Catching Specific Exception
How to Rotate a Matplotlib Plot Through 90 Degrees
Python: Read Lines from Compressed Text Files
Python Webdriver to Handle Pop Up Browser Windows Which Is Not an Alert
What Is the Most Efficient Way of Counting Occurrences in Pandas
Creating Lowpass Filter in Scipy - Understanding Methods and Units
Converting Currency with $ to Numbers in Python Pandas