Python Sets VS Lists

Python Sets vs Lists

It depends on what you are intending to do with it.

Sets are significantly faster when it comes to determining if an object is present in the set (as in x in s), but its elements are not ordered so you cannot access items by index as you would in a list. Sets are also somewhat slower to iterate over in practice.

You can use the timeit module to see which is faster for your situation.

What makes sets faster than lists?

Sets are implemented using hash tables. Whenever you add an object to a set, the position within the memory of the set object is determined using the hash of the object to be added. When testing for membership, all that needs to be done is basically to look if the object is at the position determined by its hash, so the speed of this operation does not depend on the size of the set. For lists, in contrast, the whole list needs to be searched, which will become slower as the list grows.

This is also the reason that sets do not preserve the order of the objects you add.

Note that sets aren't faster than lists in general -- membership test is faster for sets, and so is removing an element. As long as you don't need these operations, lists are often faster.

What is difference between list and set in python?

A few key differences between a set and a list.

  • The order of a set is undefined. e.g. You can't be sure of what the "first" element of a set is.
  • A set must be unique, It cannot contain a duplicate of elements.

This is all very simplified, but with a set having only unique items and with its order undetermined, it allows the location of the object in memory to be defined uniquely based on itself. So when you search for a particular object in a set, it will already know the expected location of that object in memory. This is why it is O(1) (It does not need to iterate through all the elements)


  1. If you don't care about the order of your list and you will always have unique items, it is considered best to use sets.

set vs list performance difference

The reference for this would be the Python wiki "Time Complexity" page

In particular, adding an item to a set has complexity O(1) on average, meaning constant time regardless of the size of the set; meawhile, list x in l has complexity O(n), meaning it grows linearly with the size of the list.

Are sets really faster than lists?

What you've been told is correct, searching in a set is O(1) since members are stored using a hash table. Searching in an (unsorted) array is O(n).

The problem with your tests is that you're both creating the set/array and searching it in the same line. In this case, you're both testing the speed of inserting all the items, and then searching for a single entry.

Try something like this instead:

test_range = range(10000000)
test_set = set(test_range)
test_array = list(test_range)

timeit.timeit('10000 in test_set', number=10)
timeit.timeit('10000 in test_array', number=10)

When to use Lists, Sets, Dictionaries, or tuples in python?

Lists are used when you have data you want to further modify, alter like sorting and all.

Dictionary is used when you have to sets of data where data of the first set corresponds to data of other set. And the position of the data doesn't matter only the relation of the two sets matters.

A tuple is used when position of the data is very important and you don't want to alter the position throughout.

Beginner to python: Lists, Tuples, Dictionaries, Sets

  1. List : Used to store the items. Duplicates can also be stored. Order of storing will be maintained as is. It is mutable which means the same object can be altered after creation.
    Example :

    list1 = ['one',2,3.4,'hello']
    print(list1)
    print(list1[3])
    list1[1]=2 #allowed
    print(list1)

    Answer:

        'one', 2, 3.4, 'hello']
    hello
    [2, 2, 3.4, 'hello']
  2. Set : Just like list. Only difference is it cannot store the duplicate values. Also there will be no order in which values are stored.

Example:

>>>#set of integers
>>>s = {1, 2, 3}
>>>print(s)
{1, 2, 3}
>>>#print type of s
>>>print(type(s))
<class 'set'>
>>> x = set(['foo', 'bar', 'baz', 'foo', 'qux'])
>>> x
{'qux', 'foo', 'bar', 'baz'}

  1. Dictionary : Dictionary is basically a storage type in which you can store a value corresponding to a key just like a real life dictionary.

Example:

>>>dictval = {"Nishant":1,"Akash":2,"Ravi":3,"Hari":4}
>>>print(dictval)
{'Nishant': 1, 'Akash': 2, 'Ravi': 3, 'Hari': 4}
>>>print(dictval["Akash"])
2

  1. Tuple: A tuple is a sequence of values much like a list. The values stored in a tuple can be any type, and they are indexed by integers. The important difference is that tuples are immutable.That we can’t change the elements of tuple once it is assigned whereas in the list, elements can be changed.

Example:

>>>#tuple with mixed datatypes
>>>t = (1, 'raju', 28, 'abc')
>>>print(t)
>>>print(t[1])
(1, 'raju', 28, 'abc')
raju

>>>t = (1, 'raju', 28, 'abc')
>>>print(t)
>>>print(t[3])
>>>t[3] = 'def'
(1, 'raju', 28, 'abc')
abc
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-27-254e32ad7a3c> in <module>()
2 print(t)
3 print(t[3])
----> 4 t[3] = 'def'
TypeError: 'tuple' object does not support item assignment

What is the difference between Set and List?

List is an ordered sequence of elements whereas Set is a distinct list of elements which is unordered (thank you, Quinn Taylor).

List<E>:

An ordered collection (also known as a
sequence). The user of this interface
has precise control over where in the
list each element is inserted. The
user can access elements by their
integer index (position in the list),
and search for elements in the list.

Set<E>:

A collection that contains no
duplicate elements. More formally,
sets contain no pair of elements e1
and e2 such that e1.equals(e2), and at
most one null element. As implied by
its name, this interface models the
mathematical set abstraction.

Memory consumption of a list and set in Python

I think it's because of the inherent difference between list and set or dict i.e. the way in which the elements are stored.

List is nothing but a collection of references to the original object. Suppose you create 1000 integers, then 1000 integer objects are created and the list only contains the reference to these objects.

On the other hand, set or dictionary has to compute the hash value for these 1000 integers and the memory is consumed according to the number of elements.

For ex: In both set and dict, by default, the smallest size is 8 (that is, if you are only storing 3 values, python will still allocate 8 elements). On resize, the number of buckets increases by 4x until we reach 50,000 elements, after which the size is increased by 2x. This gives the following possible sizes,

16, 64, 256, 1024, 4096, 16384, 65536, 131072, 262144, ...

Some examples:

In [26]: a=[i for i in range(60000)]
In [27]: b={i for i in range(60000)}

In [30]: b1={i for i in range(100000)}
In [31]: a1=[i for i in range(100000)]

In [32]: getsizeof(a)
Out[32]: 514568
In [33]: getsizeof(b)
Out[33]: 2097376

In [34]: getsizeof(a1)
Out[34]: 824464
In [35]: getsizeof(b1)
Out[35]: 4194528

Answers:
Yes, it's the internal structure in the way set stores the elements consumes this much memory. And, sys.getsizeof is correct only; There's nothing wrong with using that here.

For more detailed reference about list, set or dict please refer this chapter: High Performance Python



Related Topics



Leave a reply



Submit