Find Out How Much Memory Is Being Used by an Object in Python

There's no easy way to find out the in-memory size of a Python object. One of the problems you run into is that Python objects, such as lists and dicts, may hold references to other Python objects (in which case, what should the size be? The size including each referenced object, or not?). There is also per-object pointer overhead, plus internal structures related to object types and garbage collection. Finally, some Python objects have non-obvious behaviors. For instance, lists usually reserve space for more elements than they currently hold; dicts are even more complicated, since they can operate in different ways (they have a different implementation for a small number of keys, and they sometimes over-allocate entries).
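
The list over-allocation is easy to observe with sys.getsizeof; here is a minimal sketch (the exact numbers vary by CPython version and platform):

import sys

lst = []
for i in range(20):
    lst.append(i)
    # the size jumps in steps rather than per element: the list over-allocates
    print(i + 1, sys.getsizeof(lst))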

There is a big chunk of code (and an updated big chunk of code) out there that tries to best approximate the size of a Python object in memory.
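
As a minimal sketch of the same idea (not the full recipe referenced above), a recursive helper built on sys.getsizeof can walk common containers and sum the sizes of everything it reaches, tracking ids to avoid double-counting shared objects:

import sys

def deep_getsizeof(obj, seen=None):
    # A rough approximation: only dicts, lists/tuples/sets, and objects
    # with a __dict__ are walked; everything else is counted at its
    # shallow size.
    if seen is None:
        seen = set()
    if id(obj) in seen:          # already counted (shared reference)
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    elif hasattr(obj, "__dict__"):
        size += deep_getsizeof(obj.__dict__, seen)
    return size

print(deep_getsizeof({"a": [1, 2, 3], "b": "text"}))   # container plus contents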

You may also want to check out an old description of PyObject (the internal C struct that represents virtually all Python objects).

Total memory used by Python process?

Here is a useful solution that works across operating systems, including Linux and Windows:

import os, psutil
process = psutil.Process(os.getpid())
print(process.memory_info().rss) # in bytes

Notes:

  • do pip install psutil if it is not installed yet

  • handy one-liner if you quickly want to know how many MiB your process takes:

    import os, psutil; print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
  • with Python 2.7 and psutil 5.6.3, it was process.memory_info()[0] instead (the API changed later).
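
As a usage sketch of the same psutil call, you can sample RSS before and after an allocation to watch the process grow (the exact delta depends on the allocator and the platform):

import os, psutil

proc = psutil.Process(os.getpid())
before = proc.memory_info().rss
data = [0] * 10_000_000                # ten million references, roughly 80 MB on 64-bit
after = proc.memory_info().rss
print(f"RSS grew by {(after - before) / 1024 ** 2:.1f} MiB")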

How can I check the memory usage of objects in IPython?

Unfortunately this is not possible, but there are a number of ways of approximating the answer:

  1. For very simple objects (e.g. ints, strings, floats) which are represented more or less as simple C-level types, you can simply calculate the number of bytes, as in John Mulder's solution.

  2. For more complex objects, a good approximation is to serialize the object to a byte string using pickle.dumps (cPickle in Python 2). The length of that string is a good approximation of the amount of memory required to store the object (see the sketch below).

There is one big snag with solution 2: objects usually contain references to other objects. For example, a dict contains string keys and other objects as values, and those other objects might be shared. Since pickle always performs a complete serialization of the object, it will over-estimate the amount of memory required to store it.
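
Here is a minimal sketch of that pickle-based approximation, using the Python 3 pickle module (the byte count is a rough proxy, not an exact measurement, and it over-counts shared objects as noted above):

import pickle

def pickled_size(obj):
    # length of the serialized form, in bytes
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))

print(pickled_size({"key": [1, 2, 3], "other": "value"}))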

Variable's memory size in Python

Use sys.getsizeof to get the size of an object, in bytes.

>>> from sys import getsizeof
>>> a = 42
>>> getsizeof(a)
12
>>> a = 2**1000
>>> getsizeof(a)
146
>>>

Note that the size and layout of an object are purely implementation-specific. CPython, for example, may use totally different internal data structures than IronPython, so the size of an object may vary from implementation to implementation and from build to build (the figures above come from an old 32-bit build; on a typical 64-bit CPython 3, getsizeof(42) returns 28).

How to see how much memory Python is using?

There is no built-in way to do this short of making an external system call to get information about the current process's memory usage, such as reading /proc/self/status for the current process directly on Linux.

If you are content with a Unix-only solution in the standard library that can return only the peak resident memory used, you are looking for resource.getrusage(resource.RUSAGE_SELF).ru_maxrss.

This function returns an object that describes the resources consumed by either the current process or its children...

>>> import resource
>>> resource.getrusage(resource.RUSAGE_SELF)
resource.struct_rusage(ru_utime=0.058433,
ru_stime=0.021911999999999997, ru_maxrss=7600, ru_ixrss=0,
ru_idrss=0, ru_isrss=0, ru_minflt=2445, ru_majflt=1, ru_nswap=0,
ru_inblock=256, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0,
ru_nvcsw=148, ru_nivcsw=176)

This will not tell you how much memory is allocated between invocations, but it can be useful for tracking the growth in peak memory used over the lifetime of an application. Note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
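
Here is a small sketch that normalizes the peak figure across those platforms (the scale factor follows the documented platform behavior):

import resource
import sys

peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
scale = 1 if sys.platform == "darwin" else 1024   # macOS reports bytes, Linux KiB
print(f"peak RSS: {peak * scale / 1024 ** 2:.1f} MiB")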

Some Python profilers written in C have been developed to interface directly with CPython and are capable of retrieving information about the total memory used. One example is Heapy, which also includes graphical plotting capabilities.

If you only want to track the memory consumed by new objects as they are created, you can always use sys.getsizeof() on each new object and keep a running total of the space allocated, as in the sketch below.
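
A minimal running-total sketch, with the caveat from earlier sections that sys.getsizeof is shallow and does not follow references:

import sys

allocated = 0
for obj in (42, "hello", [1, 2, 3], {"a": 1}):
    allocated += sys.getsizeof(obj)    # shallow size of each new object
print(f"running total: {allocated} bytes")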

Find the memory size of a set of strings vs. set of bytestrings

sys.getsizeof does not measure the size of the full target data structure. It only measures the memory taken by the set object itself, which contains references to the strings/bytes objects; the referenced objects are not included in the returned memory consumption (i.e., it does not walk recursively through each object of the target data structure). A reference typically takes 8 bytes on a 64-bit platform, and a CPython set is not as compact as a list: it is implemented as a hash table with many buckets, some of which are unused. In fact, this is mandatory for the data structure to be fast (in general, the occupancy should be 50%-90%). Moreover, each bucket also stores a hash, which usually takes 8 bytes.

The strings themselves take much more space than a bucket (at least on my machine):

import sys, random, string
randomstring = lambda n: "".join(random.choices(string.ascii_letters, k=n))  # one plausible definition of the helper
sys.getsizeof(randomstring(50))           # 99
sys.getsizeof(randomstring(50).encode())  # 83

On my machine, it turns out that CPython str objects are 16 bytes bigger than the equivalent bytes objects.
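
To approximate the full footprint of such a set, one sketch is to add the shallow sizes of the elements to the size of the set itself (this still ignores sharing and anything the elements reference in turn):

import sys

def set_footprint(s):
    # shallow size of the set plus the shallow size of each element
    return sys.getsizeof(s) + sum(sys.getsizeof(e) for e in s)

strings = {f"item-{i:04d}" for i in range(1000)}
print(set_footprint(strings))   # much larger than sys.getsizeof(strings) alone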


