How Is Python's List Implemented

How is Python's List Implemented?

It's a dynamic array. Practical proof: Indexing takes (of course with extremely small differences (0.0013 µsecs!)) the same time regardless of index:

...>python -m timeit --setup="x = [None]*1000" "x[500]"
10000000 loops, best of 3: 0.0579 usec per loop

...>python -m timeit --setup="x = [None]*1000" "x[0]"
10000000 loops, best of 3: 0.0566 usec per loop

I would be astounded if IronPython or Jython used linked lists - they would ruin the performance of many many widely-used libraries built on the assumption that lists are dynamic arrays.

How come list element lookup is O(1) in Python?

A list in Python is implemented as an array of pointers1. So, what's really happening when you create the list:

["perry", 1, 23.5, "s"]

is that you are actually creating an array of pointers like so:

[0xa3d25342, 0x635423fa, 0xff243546, 0x2545fade]

Each pointer "points" to the respective objects in memory, so that the string "perry" will be stored at address 0xa3d25342 and the number 1 will be stored at 0x635423fa, etc.

Since all pointers are the same size, the interpreter can in fact add 3 times the size of an element to the address of li[0] to get to the pointer stored at li[3].


1 Get more details from: the horse's mouth (CPython source code on GitHub).

Why are Python Lists called 'lists' when they are implemented as dynamic arrays

They're named after the list abstract data type, not linked lists. This is similar to the naming of Java's List interface and C#'s List<T>.

What is the underlying data structure for Python lists?

List objects are implemented as
arrays. They are optimized for fast
fixed-length operations and incur O(n)
memory movement costs for pop(0) and
insert(0, v) operations which change
both the size and position of the
underlying data representation.

See also:
http://docs.python.org/library/collections.html#collections.deque

Btw, I find it interesting that the Python tutorial on data structures recommends using pop(0) to simulate a queue but does not mention O(n) or the deque option.

http://docs.python.org/tutorial/datastructures.html#using-lists-as-queues

Does Python use linked lists for lists? Why is inserting slow?

Python uses a linear list layout in memory so that indexing is fast (O(1)).

Is a Python list the same list as defined by my data structures and algorithms textbook?

There are multiple names for many data structures (hash[map/table]/dictionary/associative array comes to mind). You're referring to two data structures which are unrelated beyond the fact that they both store a linear sequence of items. The name similarity is an unfortunate source of confusion, but the TL;DR is that lists in Python's terminology are very similar to, and are built upon, arrays while linked lists, which your text refers to as "lists", aren't.

It's important to separate implementation from interface: the underlying data structure behind a queue, for example, could be an array or a linked list, but this should be an implementation detail that the client of the interface doesn't know or care about. The interface should enforce guarantees for the time complexity of various operations. It's common for data structure texts to implement abstract data types using various underlying "primitive" data structures like arrays or linked lists, many of which may impact the resulting complexity of various operations (prompting student analysis).

Linked lists are simple, dynamic (expandible) data structures that offer fast O(1) add and removal operations on both ends. Their nodes may be doubly-linked and are usually structs or objects in memory with pointers or references to next and (if doubly-linked) previous elements as well as a data property per node. There is no random access for linked lists--you have to walk through the list from an endpoint to locate a single element by following the pointers.

Linked lists are typically used to implement queues, stacks and more complex data structures like Java's LinkedHashMap, which combines a doubly linked list and a hash map. Python offers collections.deque and Queue which are linked list-based data structures offering fast access to front and back elements. Other libraries, like Java's ArrayDeque, use circular arrays to implement the same deque interface.

Tree nodes are a variant of singly linked list nodes with two child pointers rather than a single next pointer. As with linked lists (and their nodes), typically tree nodes and trees will be an implementation detail behind an interface such as the C++ map. You can implement trees with arrays, for example, as is typically the case with heaps.

Python's lists (also known as ArrayLists (Java), arrays (JS/Ruby), vectors (C++/Haskell) and List (C#)) are a dynamic abstraction on primitive arrays, which are fixed in size. The list abstraction adds a length property and many functions for manipulating the underlying array like append/push/push_back, pop, shift, splice, etc (names are dependent on language). The underlying array will be automatically resized to fit the number of elements it contains. Inserting or removing elements at the front or middle of the list is an O(n) operation since as many as all elements need to be shifted to accommodate the adjustment to the array. This shifting is part of the abstraction and is hidden to the client.

The advantages of lists are the same as those of arrays: fast random access to any element. Additions to the end of a list are also fast and constant-time because the occasional reallocations of the underlying array are amortized across multiple append operations.

An important consideration for lists versus primitive arrays is the memory scheme. Python lists are comprised of objects and suffer from many of the problems of linked lists in that pointers to heap-allocated data need to be referenced and may have poor locality. Using a numpy array provides the advantages of a C array in that you get chunks of sequential memory which benefit from fast access (sweeping a memory-aligned offset over the data) and locality. Increased overhead is the typical cost of abstractions like lists and it can have a major performance impact on certain applications.

To add to the confusion, at least two high level languages, C++ and Haskell, have a "list" data structure, but these are actually linked lists rather than dynamic array(list)s.

Regardless of the language you're using, it's important to distinguish what the data structures you're using really are (in terms of time complexity for typical operations, primarily) to avoid inadvertent misuse and select the correct tool for the job.

How does list indexing work under the hood in Python3?

the arrays (list in python) technically store pointers rather than the objects themselves, which allows the array to contain only elements of a specific size even with mixed types list in python.

from python docs:

CPython’s lists are really variable-length arrays, not Lisp-style
linked lists. The implementation uses a contiguous array of references
to other objects, and keeps a pointer to this array and the array’s
length in a list head structure.

This makes indexing a list a[i] an operation whose cost is independent
of the size of the list or the value of the index.

When items are appended or inserted, the array of references is
resized. Some cleverness is applied to improve the performance of
appending items repeatedly; when the array must be grown, some extra
space is allocated so the next few times don’t require an actual
resize.

source:
https://docs.python.org/3/faq/design.html#how-are-lists-implemented-in-cpython

more explanation :


what is a pointer?

A pointer is a variable that stores a memory address. Pointers are used to store the addresses of other variables or memory items.

and how indexing work ?

a[i] means the same as (p + i) when p represents pointer to the first element of an array:
*(a + i)
so if a pointer p points to an element of an array, then adding n to the pointer makes it point at the nth element following the original one. this involved adding or subtracting the correct offset (based on the size of a reference) in bytes between objects.

the size of a reference same as word size for the CPU 4 bytes on a 32bit system, and 8 bytes on a 64bit system

memory representation of array of pointers

hope this clear things to you ..
this is my first answer for me in stackoverflow, vote up if it is helpful. thank you.

Meaning of list[-1] in Python

One of the neat features of Python lists is that you can index from the end of the list. You can do this by passing a negative number to []. It essentially treats len(array) as the 0th index. So, if you wanted the last element in array, you would call array[-1].

All your return c.most_common()[-1] statement does is call c.most_common and return the last value in the resulting list, which would give you the least common item in that list. Essentially, this line is equivalent to:

temp = c.most_common()
return temp[len(temp) - 1]

Why is a list access O(1) in Python?

Get item is getting an item in a specific index, while lookup means searching if some element exists in the list. To do so, unless the list is sorted, you will need to iterate all elements, and have O(n) Get Item operations, which leads to O(n) lookup.

A dictionary is maintaining a smart data structure (hash table) under the hood, so you will not need to query O(n) times to find if the element exists, but a constant number of times (average case), leading to O(1) lookup.



Related Topics



Leave a reply



Submit