What Is the Time Complexity of Popping Elements from List in Python

What is the time complexity of popping elements from list in Python?

pop() for the last element ought to be O(1), since you only need to return the element referenced by the last slot in the array and decrement the size. I would expect pop(i) for an arbitrary element to be O(n), requiring on average n/2 operations, since every element beyond the one you remove has to move one position up in the array of pointers.
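A quick way to check this intuition is to time pop(i) at different positions with the standard timeit module; a minimal sketch (the list size and repetition count are arbitrary, and the exact numbers will vary by machine):

from timeit import timeit

# Time pop(i) at the back, middle and front of a 100,000-element list.
# Each pop is followed by an append so the list keeps its original size.
n = 100_000
for i in (n - 1, n // 2, 0):  # last, middle, first
    t = timeit(f'lst.pop({i}); lst.append(0)',
               setup=f'lst = list(range({n}))', number=1_000)
    print(f'pop({i}): {t:.4f}s for 1,000 calls')

On a typical machine the front pop should come out much slower than the back pop, matching the O(n) vs. O(1) expectation.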

Why is the big O of pop() different from pop(0) in Python?

Python's list implementation uses a dynamically resized C array under the hood. Removing an element usually requires shifting all the elements that follow it down by one slot, to avoid leaving a gap.

list.pop() with no arguments removes the last element. Accessing that element can be done in constant time. There are no elements following so nothing needs to be shifted.

list.pop(0) removes the first element. All remaining elements have to be shifted up one step, so that takes O(n) linear time.
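If your access pattern really needs fast pops from the front, the usual alternative is collections.deque, which supports O(1) pops at both ends (at the cost of O(n) indexing into the middle); a minimal sketch:

from collections import deque

d = deque([1, 2, 3, 4, 5])
d.popleft()   # O(1), unlike list.pop(0)
d.pop()       # O(1), same as list.pop()
print(d)      # deque([2, 3, 4])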

Python list.pop(i) time complexity?

Your algorithm genuinely does take O(n) time and the "pop in reverse order" algorithm genuinely does take O(n²) time. However, LeetCode isn't reporting that your time complexity is better than 89% of submissions; it is reporting your actual running time is better than 89% of all submissions. The actual running time depends on what inputs the algorithm is tested with; not just the sizes but also the number of duplicates.

It also depends how the running times across multiple test cases are averaged; if most of the test cases are for small inputs where the quadratic solution is faster, then the quadratic solution may come out ahead overall even though its time complexity is higher. @Heap Overflow also points out in the comments that the overhead time of LeetCode's judging system is proportionally large and quite variable compared to the time it takes for the algorithms to run, so the discrepancy could simply be due to random variation in that overhead.

To shed some light on this, I measured running times using timeit. The graph below shows my results; the shapes are exactly what you'd expect given the time complexities, and the crossover point is somewhere between 8000 < n < 9000 on my machine. This is based on sorted lists where each distinct element appears on average twice. The code I used to generate the times is given below.

[Plot: measured running times of both solutions against n; the curves match the expected linear and quadratic shapes, with a crossover between n = 8000 and n = 9000]

Timing code:

def linear_solution(nums):
    # Two-pointer, in-place deduplication of a sorted list: O(n).
    left, right = 0, 0
    while right < len(nums) - 1:
        if nums[right] != nums[right + 1]:
            nums[left + 1] = nums[right + 1]
            left += 1
        right += 1
    return left + 1

def quadratic_solution(nums):
    # Iterate backwards and pop duplicates in place; each pop shifts
    # the tail of the list, so this is O(n^2) in the worst case.
    prev_obj = []
    for i in range(len(nums) - 1, -1, -1):
        if prev_obj == nums[i]:
            nums.pop(i)
        prev_obj = nums[i]
    return len(nums)

from random import randint
from timeit import timeit

def gen_list(n):
    max_n = n // 2
    return sorted(randint(0, max_n) for i in range(n))

# I used a step size of 1000 up to 15000, then a step size of 5000 up to 50000
step = 1000
max_n = 15000
reps = 100

print('n', 'linear time (ms)', 'quadratic time (ms)', sep='\t')
for n in range(step, max_n + 1, step):
    # generate input lists
    lsts1 = [gen_list(n) for i in range(reps)]
    # copy the lists by value, since the algorithms will mutate them
    lsts2 = [list(g) for g in lsts1]
    # use iterators to supply the input lists one-by-one to timeit
    iter1 = iter(lsts1)
    iter2 = iter(lsts2)
    t1 = timeit(lambda: linear_solution(next(iter1)), number=reps)
    t2 = timeit(lambda: quadratic_solution(next(iter2)), number=reps)
    # timeit reports the total time in seconds across all reps
    print(n, 1000 * t1 / reps, 1000 * t2 / reps, sep='\t')

The conclusion is that your algorithm is indeed faster than the quadratic solution for large enough inputs, but the inputs LeetCode is using to measure running times are not "large enough" to overcome the variation in the judging overhead, and the fact that the average includes times measured on smaller inputs where the quadratic algorithm is faster.

What is the time complexity of pop() for the set in Python?

On modern CPython implementations, pop takes amortized constant-ish time (I'll explain further). On Python 2, it's usually the same, but performance can degrade heavily in certain cases.

A Python set is based on a hash table, and pop has to find an occupied entry in the table to remove and return. If it searched from the start of the table every time, this would take time proportional to the number of empty leading entries, and it would get slower with every pop.

To avoid this, the standard CPython implementation tries to remember the position of the last popped entry, to speed up sequences of pops. CPython 3.5+ has a dedicated finger member in the set memory layout to store this position, but earlier versions abuse the hash field of the first hash table entry to store this index.

On any Python version, removing all elements from a set with a sequence of pop operations will take time proportional to the size of the underlying hash table, which is usually within a small constant factor of the original number of elements (unless you'd already removed a bunch of elements). Mixing insertions and pop can interfere heavily with this on Python 2 if inserted elements land in hash table index 0, trashing the search finger. This is much less of an issue on Python 3.
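A rough way to check the amortized behavior on your own interpreter is to drain sets of increasing size and confirm that the total time grows roughly linearly; a minimal sketch (sizes arbitrary, numbers machine-dependent):

from timeit import timeit

# Drain a set with repeated pop() calls; total time should scale
# roughly linearly with the set size, i.e. amortized O(1) per pop.
for n in (100_000, 200_000, 400_000):
    t = timeit('while s: s.pop()',
               setup=f's = set(range({n}))',
               number=1)
    print(f'n={n}: drained in {t:.4f}s ({1e9 * t / n:.1f} ns per pop)')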

What is the time complexity of popping an element from a dict in Python?

The time complexity of dict.pop is exactly the same as that of dict.get. list.pop(i) is O(n) because it needs to shift every element after index i; dict.pop does nothing of the sort, so it runs in expected O(1) time.
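One way to sanity-check this is to time a pop (followed by a re-insert, so the dict doesn't shrink) against a get on dicts of growing size; both should stay essentially flat. A minimal sketch (sizes arbitrary):

from timeit import timeit

# dict.pop and dict.get both cost one hash lookup; neither shifts entries,
# so per-call time should not grow with the size of the dict.
for n in (10_000, 100_000, 1_000_000):
    setup = f'd = {{i: i for i in range({n})}}'
    t_get = timeit('d.get(0)', setup=setup, number=100_000)
    t_pop = timeit('d.pop(0, None); d[0] = 0', setup=setup, number=100_000)
    print(f'n={n}: get={t_get:.4f}s  pop+reinsert={t_pop:.4f}s')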

That said, dict.pop probably won't improve your lookup time. Deleting a key from a dict needs to leave a DKIX_DUMMY marker in its place, and the lookup routine needs to treat any DKIX_DUMMY it finds as if it were a hash collision, and keep going. At best, you'll save some == comparisons on deleted keys.

Even if dict.pop were an improvement, it wouldn't save you from O(N) worst-case lookup time. If you need to deal with adversarial key choices, dicts may not be a good fit for your use case.

Which is faster in Python for popping elements in between: a list or a set?

When you pop an item from a list, all the following elements must be shifted down, so on average the operation is O(n). Popping the first element is the worst case, since every remaining element has to move; popping the last element is effectively O(1), because nothing needs to shift. (CPython may also shrink the underlying buffer as the list empties, but that reallocation cost is amortized and doesn't change the overall picture.)

Sets are implemented as hash tables. Unless you hit catastrophic hash collisions, popping an element is O(1) on average. And since sets are unordered, pop just removes some arbitrary element, so it can be O(1) even in the face of collisions, depending on how the table probing is implemented.
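To see the difference directly, you can time popping from the middle of a list against popping from a set of the same size; a minimal sketch (sizes arbitrary):

from timeit import timeit

# list.pop from the middle shifts ~n/2 trailing elements (O(n));
# set.pop just clears an arbitrary occupied slot (expected O(1)).
for n in (10_000, 100_000, 1_000_000):
    t_list = timeit('lst.pop(len(lst) // 2)',
                    setup=f'lst = list(range({n}))', number=1_000)
    t_set = timeit('s.pop()', setup=f's = set(range({n}))', number=1_000)
    print(f'n={n}: list middle pop={t_list:.4f}s  set pop={t_set:.4f}s')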

What are effects on overhead of using list.pop(0)?

If you look into the actual pop code in cpython/Objects/listobject.c, you'll see there are memcpy/memmove calls for the case where you're not popping the last element (it's actually done in the call to list_ass_slice).

Hence, it's not a trivial expense (certainly not O(1) as suggested in a comment; that may be true for a linked-list style structure, but that's not what Python lists are). The removal is done in-place, which is why the list's id doesn't change, but in-place doesn't mean efficient.

For example, consider the list:

   0     1     2     3     4     5     <- index
+-----+-----+-----+-----+-----+-----+
|  A  |  B  |  C  |  D  |  E  |  F  |
+-----+-----+-----+-----+-----+-----+

Popping the last element is usually an O(1) operation since it simply needs to take out F and reduce the size.

However, popping the first element means taking out A and then moving all the elements B..F to the position where A was:

   0     1     2     3     4     <- index
+-----+-----+-----+-----+-----+
|  B  |  C  |  D  |  E  |  F  |
+-----+-----+-----+-----+-----+

But keep in mind it probably won't matter unless your lists get really big. The objects themselves aren't being reconstructed since the list only holds references to them.
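You can confirm the in-place behavior directly: the list object survives the pop with the same identity, only its contents are shifted. A minimal sketch:

lst = ['A', 'B', 'C', 'D', 'E', 'F']
before = id(lst)

lst.pop(0)                 # removes 'A'; 'B'..'F' are shifted down in place

assert id(lst) == before   # same list object; elements moved within its buffer
print(lst)                 # ['B', 'C', 'D', 'E', 'F']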


