Why Do Two Identical Lists Have a Different Memory Footprint

Why do two identical lists have a different memory footprint?

When you write [None] * 10, Python knows that it will need a list of exactly 10 objects, so it allocates exactly that.

When you use a list comprehension, Python doesn't know how much it will need. So it gradually grows the list as elements are added. For each reallocation it allocates more room than is immediately needed, so that it doesn't have to reallocate for each element. The resulting list is likely to be somewhat bigger than needed.

You can see this behavior when comparing lists created with similar sizes:

>>> import sys
>>> sys.getsizeof([None]*15)
184
>>> sys.getsizeof([None]*16)
192
>>> sys.getsizeof([None for _ in range(15)])
192
>>> sys.getsizeof([None for _ in range(16)])
192
>>> sys.getsizeof([None for _ in range(17)])
264

You can see that the first method allocates just what is needed, while the second one grows periodically. In this example, it allocates enough for 16 elements and has to reallocate when it reaches the 17th.

Why do lists with the same data have different sizes?

When you append to a list, memory is over-allocated for performance reasons, so that a run of repeated appends does not require a reallocation for every single element, which would slow things down considerably.

The CPython source clearly describes this behaviour in a comment:

/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
* Note: new_allocated won't overflow because the largest possible value
* is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
*/

In the other case, the list is constructed from a literal, and the size of the list reflects the size of the container itself and the references to each contained object.

The exact allocation behaviour may vary with other Python implementations (see the list.append implementations for Jython and PyPy), and over-allocation is not guaranteed to exist.
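
A rough way to observe this over-allocation is to watch sys.getsizeof while appending: the reported size jumps in steps rather than growing by one slot per element. Exact numbers vary by CPython version and platform, so this is only an illustrative sketch:

import sys

lst = []
last = sys.getsizeof(lst)
print(len(lst), last)

for i in range(20):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last:
        # Each jump is a reallocation that leaves extra headroom
        # for future appends.
        print(len(lst), size)
        last = size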

Two lists have different memory addresses but are still referenced

You need to use copy.deepcopy so that the nested lists are copied as well, not just the outer list:

from copy import deepcopy

myarr = [
    ["s"],
    ["s"],
    ["w"],
    ["p"],
    ["m"],
    ["g"],
    ["c"],
]

newarr = deepcopy(myarr)

print(myarr, "myarr")
print(newarr, "newarr")
print("##### starting manipulation #########")

for i in newarr:
    i[0] = "a"

print(myarr, "myarr")
print(newarr, "newarr")

Output:

[['s'], ['s'], ['w'], ['p'], ['m'], ['g'], ['c']] myarr
[['s'], ['s'], ['w'], ['p'], ['m'], ['g'], ['c']] newarr
##### starting manipulation #########
[['s'], ['s'], ['w'], ['p'], ['m'], ['g'], ['c']] myarr
[['a'], ['a'], ['a'], ['a'], ['a'], ['a'], ['a']] newarr
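
For contrast, a shallow copy (copy.copy(myarr) or list(myarr)) would create a new outer list but still share the inner one-element lists, so the same loop would change myarr too. A minimal sketch:

from copy import copy

myarr = [["s"], ["w"], ["p"]]
shallow = copy(myarr)   # same effect as list(myarr): new outer list, shared inner lists

for i in shallow:
    i[0] = "a"

print(myarr)    # [['a'], ['a'], ['a']] -- the inner lists were shared
print(shallow)  # [['a'], ['a'], ['a']]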

Why don't lists with the same values point to the same memory location in Python?

Lists are mutable. You don't want changes to one ostensibly independent list modifying another that is coincidentally identical.

Strings, on the other hand, are immutable. You can't make changes to var1 that would affect var2, so it's possible to share the underlying object. Note that it is not guaranteed that two str literals produce the same object, though. It is implementation-dependent when and whether such caching occurs.
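
A quick illustration: equal lists are distinct objects, while equal string literals may or may not be shared (CPython happens to intern short, identifier-like literals):

a = [1, 2, 3]
b = [1, 2, 3]
print(a == b, a is b)      # True False: equal values, separate objects

s1 = "hello"
s2 = "hello"
print(s1 == s2, s1 is s2)  # True True on CPython, but sharing is not guaranteed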

Does the memory increase when the same object is added to two lists?

Not by much; only what the list's internal bookkeeping needs in order to hold the new reference.

Java uses references, so when an instance is added to two lists you are just adding a reference (effectively an id) to the instance, not a copy of it.

This also means that if you change the instance through one list, the change is visible through the other.
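
Python works the same way: a list stores only references, so adding one object to two lists adds a pointer slot per list, not a copy of the object. The dict below is just an illustrative stand-in for any object:

payload = {"name": "example"}   # one object

list_a = [payload]
list_b = [payload]

print(list_a[0] is list_b[0])   # True: both lists reference the same dict

payload["name"] = "changed"
print(list_b[0]["name"])        # 'changed': the change is visible through either list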

Different lists in Python in size and content still share the id, does memory matter?

Make a list and append some items to it; the id remains the same:

In [21]: alist = []
In [22]: id(alist)
Out[22]: 139981602327808
In [23]: for i in range(29): alist.append(i)
In [24]: id(alist)
Out[24]: 139981602327808

But the memory use for this list occurs in several parts. There's some sort of storage for the list instance itself (that's what the id references). CPython is written in C, but all items are objects (as in C++).

The list also has a data buffer; think of it as a C array with a fixed size. It holds pointers to objects elsewhere in memory. That buffer has space for the current references plus some growth space. As you add items to the list, their references are inserted into the growth space. When that fills up, the list gets a new buffer with more growth space. List append is relatively fast, with periodic slowdowns as it copies references to the new buffer. All that occurs under the covers so that the Python programmer doesn't notice.

I suspect that in my example alist got a new buffer several times, but I don't think there's any easy way to track or measure that.

Storage for the objects referenced by the list is another matter. CPython creates small integer objects (from -5 up to 256) at startup, so my list (and yours) will have references to those unique integer objects. It also maintains a cache of some strings (interning). But other things, such as larger numbers, other lists, dicts, and custom class objects, are created as needed. Identically valued objects may well have different ids.

So while the data buffer of the list is contiguous, the items referenced in the buffer are not.

By and large, that memory management is unimportant to programmers. It's only when data structures get very large that we need to worry. And that tends to come up more with classes like numpy arrays, which have a very different memory use.
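
A quick way to see that small-integer cache in action (this is CPython behaviour, not a language guarantee):

a = 100
b = 100
print(a is b)           # True on CPython: small ints (-5..256) are cached

x = int("1000000")      # built at runtime, so no constant folding
y = int("1000000")
print(x == y, x is y)   # True False (typically): equal values, separate objects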

Difference in sizeof between a = [0] and a = [i for i in range(1)]

When the interpreter sees a = [0], it knows that it can build a list with just one element.

When it does a list comprehension, it first creates an empty list, then appends items as it goes. It doesn't know beforehand how big the list is going to be, even if it's iterating over something straightforward like range(1). So it tries to guess how much memory to allocate, and if it turns out that's not enough, it will have to dynamically increase the memory allocation. That's not cheap, so it may start with a generous guess.
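
You can check the difference directly with sys.getsizeof; the exact byte counts depend on the Python version and platform, but the comprehension-built list typically reports a larger size:

import sys

literal = [0]
comprehension = [i for i in range(1)]

# The comprehension version usually shows more bytes because of
# over-allocation during construction.
print(sys.getsizeof(literal), sys.getsizeof(comprehension))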

Why is Python storing these two variables at the same location?

x = 50000000

Here, Python creates an int and stores it at 0x23ef28ec930.

y = 50000000

Here, the interpreter was able to somehow detect that the number already exists in the program, and so used the same object for y that it uses for x. Since these are number literals, the number 50000000 may have been stored at compile time and then used for both of these assignments.

As far as I know, Python makes no guarantees about whether or not identical immutable values will have the same address. You should not rely on this behavior except when dealing with singletons, like None, True, and False.
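
For instance, when both literals are compiled together (e.g. in the same script), CPython will often reuse a single constant, but this is an implementation detail:

# Both literals sit in the same code object, so CPython may reuse one
# int object for both names; do not rely on this.
x = 50000000
y = 50000000
print(x is y)   # usually True when compiled together, e.g. in a script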

Further to this, if I add the line y += 1 just before the print statement, then x and y are stored in separate locations in memory and x is y is False. I would have thought that, as they are both the same object in memory, incrementing y would increment x too.

ints are immutable. When you do y += 1, you are binding y to a new int object, which makes it a different object from x.
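
A small check of that rebinding:

x = 50000000
y = x
print(x is y)   # True: same object

y += 1          # ints are immutable, so this binds y to a new object
print(x, y)     # 50000000 50000001
print(x is y)   # False: y now names a different int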

As a further point, does this behaviour depend on the variable type? What if x and y are not ints but lists, strings or any other object?

It depends a lot on whether the values are mutable or immutable. Strings are immutable, so identical ones might have the same address. Lists are mutable, so it makes a big difference whether x and y refer to the same list or to two newly constructed lists. These two snippets are very different:

# These have different addresses. They can be changed independently.
x = []
y = []

# These variables refer to the same list.
x = []
y = x

Note that operators like += attempt to change the object in place. This is obviously not possible with immutable values, but for lists, y += [6] modifies the existing list, while y = y + [6] makes a new list. += is not always equivalent to plain addition.
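
A short demonstration of the difference:

x = [1, 2, 3]
y = x

y += [4]        # in-place: extends the existing list
print(x)        # [1, 2, 3, 4] -- x sees the change because it is the same list

y = y + [5]     # builds a brand-new list and rebinds y
print(x)        # [1, 2, 3, 4] -- x is unaffected
print(y)        # [1, 2, 3, 4, 5]
print(x is y)   # False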


