Which Is Faster and Why Set or List

Which is faster and why? Set or List?

Membership testing in a set is vastly faster, especially for large sets. That is because the set uses a hash function to map to a bucket. Since Python implementations automatically resize that hash table, the speed can be constant (O(1)) no matter the size of the set (assuming the hash function is sufficiently good).

In contrast, to evaluate whether an object is a member of a list, Python has to compare every single member for equality, i.e. the test is O(n).

What makes sets faster than lists?

Sets are implemented using hash tables. Whenever you add an object to a set, the position within the memory of the set object is determined using the hash of the object to be added. When testing for membership, all that needs to be done is basically to look if the object is at the position determined by its hash, so the speed of this operation does not depend on the size of the set. For lists, in contrast, the whole list needs to be searched, which will become slower as the list grows.

This is also the reason that sets do not preserve the order of the objects you add.

Note that sets aren't faster than lists in general -- membership test is faster for sets, and so is removing an element. As long as you don't need these operations, lists are often faster.

Python Sets vs Lists

It depends on what you are intending to do with it.

Sets are significantly faster when it comes to determining if an object is present in the set (as in x in s), but are slower than lists when it comes to iterating over their contents.

You can use the timeit module to see which is faster for your situation.

Are sets really faster than lists?

What you've been told is correct, searching in a set is O(1) since members are stored using a hash table. Searching in an (unsorted) array is O(n).

The problem with your tests is that you're both creating the set/array and searching it in the same line. In this case, you're both testing the speed of inserting all the items, and then searching for a single entry.

Try something like this instead:

test_range = range(10000000)
test_set = set(test_range)
test_array = list(test_range)

timeit.timeit('10000 in test_set', number=10)
timeit.timeit('10000 in test_array', number=10)

Is set() faster than list(), python?

Python’s set class represents the mathematical notion of a set, namely a collection
of elements, without duplicates, and without an inherent order to those elements.
The major advantage of using a set, as opposed to a list, is that it has a highly
an optimized method for checking whether a specific element is contained in the set.
This is based on a data structure known as a hash table

However, there are two important restrictions due to the
algorithmic underpinnings. The first is that the set does not maintain the elements
in any particular order. The second is that only instances of immutable types can be
added to a Python set. Therefore, objects such as integers, floating-point numbers,
and character strings are eligible to be elements of a set. It is possible to maintain a
set of tuples, but not a set of lists or a set of sets, as lists and sets are mutable.

Which one is faster: iterating over a set and iterating over a list

I've run some tests with timeit, and (while list performs slightly faster) there is no significant difference:

>>> import timeit
>>> # For the set
>>> timeit.timeit("for i in s: pass", "s = set([1,4,7,10,13])")
0.20565616500061878
>>> # For the list
>>> timeit.timeit("for i in l: pass", "l = [1,4,7,10,13]")
0.19532391999928223

These values stay very much the same (0.20 vs. 0.19) even when trying multiple times.

However, the overhead of creating the sets can be significant.

Why is converting a list to a set faster than using just list to compute a list difference?

There is overhead to convert a list to a set, but a set is substantially faster than a list for those in tests.

You can instantly see if item x is in set y because there's a hash table being used underneath. No matter how large your set is, the lookup time is the same (basically instantaneous) - this is known in Big-O notation as O(1). For a list, you have to individually check every element to see if item x is in list z. As your list grows, the check will take longer - this is O(n), meaning the length of the operation is directly tied to how long the list is.

That increased speed can offset the set creation overhead, which is how your set check ends up being faster.

EDIT: to answer that other question, Python has no way of determining that your list is sorted - not if you're using a standard list object, anyway. So it can't achieve O(log n) performance with a list comprehension. If you wanted to write your own binary search method which assumes the list is sorted, you can certainly do so, but O(1) beats O(log n) any day.

EDIT 2:

I'm aware that a set's average O(1) lookup time beats that of a list's
O(n) but if the original list A contains about a million or so
integers, wouldn't the set creation actually take longer?

No, not at all. Creating a set out of a list is a O(n) operation, as inserting an item into a set is O(1) and you're doing that n times. If you have a list with a million integers in it, converting it into a set involves two O(n) steps, while repeatedly scanning the list is going to be n O(n) steps. In practice, creating the set is going to be about 250,000 times faster for a list with a million integers, and the speed difference will grow larger and larger the more items you have in your list.

Python list.append if not in list vs set.add performance

The set is far faster, in general. Testing for membership in a list is O(n), linear in the size of the list. Adding to a set is O(1), independent of the number of the items in the list. Aside from that, the list code makes two function calls: one to check if 12 is in the list, another to add it, while the set operation makes just one call.

Note that the list solution can be fast, though, when the item doesn't need to be added to the list because it was found early in the list.

# Add item to set
$ python -m timeit -s 's = set(range(100))' 's.add(101)'
10000000 loops, best of 3: 0.0619 usec per loop

# Add item not found in list
$ python -m timeit -s 'l = list(range(100))' 'if 101 not in l: l.append(101)'
1000000 loops, best of 3: 1.23 usec per loop

# "Add" item found early in list
$ python -m timeit -s 'l = list(range(100))' 'if 0 not in l: l.append(0)'
10000000 loops, best of 3: 0.0214 usec per loop

# "Add" item found at the end of the list
$ python -m timeit -s 'l = list(range(102))' 'if 101 not in l: l.append(101)'
1000000 loops, best of 3: 1.24 usec per loop

Better/Faster to Loop through set or list?

Just use a set. Its semantics are exactly what you want: a collection of unique items.

Technically you'll be iterating through the list twice: once to create the set, once for your actual loop. But you'd be doing just as much work or more with any other approach.

Performance and Memory allocation comparison between List and Set

If you don't care about the ordering, and don't delete elements, then it really boils down to whether you need to find elements in this data structure, and how fast you need those lookups to be.

Finding an element by value in a HashSet is O(1). In an ArrayList, it's O(n).

If you are only using the container to store a bunch of unique objects, and iterate over them at the end (in any order), then arguably ArrayList is a better choice since it's simpler and more economical.

Which Is Faster and Why Set or List