Python Garbage Collector Documentation

Python garbage collector documentation

  • Python Garbage Collection
  • gc module docs
  • Details on Garbage Collection for Python

There's no definitive resource on how Python does its garbage collection (other than the source code itself), but those 3 links should give you a pretty good idea.

Update

The source is actually pretty helpful. How much you get out of it depends on how well you read C, but the comments do most of the work. Skip down to the collect() function; its comments explain the process well (albeit in very technical terms).

Is garbage collection automatic in Python?

Yes, Python's garbage collector removes every object that is no longer referenced. The mechanism is based on reference counting, but it can also deal with cyclic references.

Of course, when the process terminates, all of its resources are released, although the details are rather OS-dependent.

[EDIT]

In case of long-living processes, there are at least two possibilities of a memory leak:

  • the process may keep references to objects permanently,
  • objects with finalizers (a .__del__() method) may get stuck in an uncollectable cycle (see http://arctrix.com/nas/python/gc/ for details). Since Python 3.4 (PEP 442), CPython can collect such cycles as well.

Garbage Collector and gc module

In CPython, objects are cleared from memory immediately when their reference count drops to 0.

The moment you rebind a to 'hello', the reference count for the 'hi' string object is decremented. If it reaches 0, it'll be removed from memory.

As such, the garbage collector only needs to deal with objects that (indirectly or directly) reference one another, and thus keep the reference count from ever dropping to 0.
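A rough way to watch the count move is sys.getrefcount(). This is a minimal sketch, assuming CPython. Payload is a hypothetical class, and the absolute numbers are not meaningful because the call itself adds temporary references, so only the differences matter.

```python
import sys

class Payload:
    """Hypothetical stand-in for any ordinary object."""

holder = [Payload()]                      # the list holds the only lasting reference
base = sys.getrefcount(holder[0])         # includes temporary references made by the call

alias = holder[0]                         # one more reference to the same object
with_alias = sys.getrefcount(holder[0])

del alias                                 # that reference is dropped again
after_del = sys.getrefcount(holder[0])
```

Here with_alias is one higher than base, and after_del is back to base.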

Strings cannot reference other objects, so they are of no interest to the garbage collector. But anything that can reference something else (container types such as lists or dictionaries, or any Python class or instance) can produce a circular reference:

a = []  # Ref count is 1
a.append(a) # A circular reference! Ref count is now 2
del a # Ref count is decremented to 1

The garbage collector detects these circular references; nothing else references a, so eventually the gc process breaks the circle, letting the reference counts drop to 0 naturally.
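Lists cannot be weakly referenced, so to actually observe this, here is a sketch with a hypothetical Node class standing in for the list. It assumes CPython and switches automatic collection off for a moment so that only the explicit gc.collect() call can clean up the cycle.

```python
import gc
import weakref

class Node:
    """Hypothetical container; instances can reference other objects, including themselves."""

def make_orphaned_cycle():
    node = Node()
    node.me = node               # circular reference: the instance refers to itself
    return weakref.ref(node)     # weak handle to observe the object without keeping it alive

was_enabled = gc.isenabled()
gc.disable()                     # keep automatic collections out of the demo
try:
    probe = make_orphaned_cycle()
    alive_before = probe() is not None   # the cycle keeps the ref count above 0
    gc.collect()                         # run the cycle detector by hand
    alive_after = probe() is not None
finally:
    if was_enabled:
        gc.enable()
```

Before the collection the object is still alive even though no name refers to it. After gc.collect() the weak reference is dead.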

Incidentally, the Python compiler bundles string literals such as 'hi' and 'hello' as constants with the bytecode it produces, so there is always at least one reference to such objects. In addition, string literals in source code that consist only of ASCII letters, digits and underscores (that is, they match the regular expression [a-zA-Z0-9_]*) are interned: they are made into singletons to reduce the memory footprint, so other code blocks that use the same string literal will hold a reference to the same shared string.
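Which literals get interned automatically is an implementation detail, so the sketch below uses the explicit sys.intern() function instead. The strings are built at run time so that the compiler cannot share them as constants.

```python
import sys

# Build two equal strings at run time, so they are not compile-time constants.
first = "".join(["garbage", "_", "collector"])
second = "".join(["garbage", "_", "collector"])

same_value = first == second                       # equal contents
shared = sys.intern(first) is sys.intern(second)   # interning yields one shared object
```

Both same_value and shared come out True: after interning, equal strings are represented by a single object.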

What is the most reliable way to register a clean up when a Python object's ref count reaches zero?

The first thing to know is that the "garbage collection" (gc) module in CPython exists solely to collect trash caught in reference cycles. In most programs, almost all trash is reclaimed by reference counting alone, and gc has nothing to do with it. In fact, I have a number of long-running programs I know don't create cyclic trash, and I start them with gc.disable(). They can then run for days without leaking memory - refcounting alone (which cannot be disabled) is all they need to reclaim all the significant trash they create.
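A small sketch of that claim, assuming CPython. Buffer is a hypothetical class with no cycles, and the weak reference is there only to observe when the object goes away.

```python
import gc
import weakref

class Buffer:
    """Hypothetical object that takes part in no reference cycle."""

was_enabled = gc.isenabled()
gc.disable()                       # turns off the cycle detector only
try:
    buf = Buffer()
    probe = weakref.ref(buf)       # observe the object without owning it
    buf = None                     # the last strong reference is gone
    freed_without_gc = probe() is None
finally:
    if was_enabled:
        gc.enable()
```

freed_without_gc is True: the object was reclaimed by refcounting even though gc was disabled.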

So, no, thinking about gc at all is mostly a red herring here.

__del__() is the best hook you have. That's why it exists: to give you a way to do something when CPython learns that an object is trash.

That doesn't mean it's bulletproof, but then nothing can be. For example, if the OS kills your program (via, e.g., SIGKILL on Linux), your program ends at once, with no chance to perform any cleanup actions.
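As an illustration, here is a minimal __del__() hook. TempResource and its released list are hypothetical names; a real hook would close a file or socket instead of appending to a list. The timing shown assumes CPython.

```python
class TempResource:
    """Hypothetical resource whose cleanup is recorded in a list."""
    released = []                  # class-level log, for demonstration only

    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Runs when CPython learns that the object is trash.
        TempResource.released.append(self.name)

res = TempResource("scratch")
del res                            # last reference gone, so __del__ runs here in CPython
```

After the del statement, TempResource.released contains "scratch".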

An alternative is to use weakref.finalize(obj, callback, ...) to register a callback to be invoked when a weakly-referencable object obj becomes trash. That has the sometimes-benefit of arranging (by default) for the atexit module to run the callback during the normal interpreter shutdown sequence even if the object is still alive then. See the "Comparing finalizers with __del__() methods" section of the weakref module's documentation for more on that.
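A short sketch of the weakref.finalize() route. Connection, close_handle and the handle id 42 are hypothetical. Note that the callback and its arguments should not reference the object itself, otherwise the object can never become trash.

```python
import weakref

class Connection:
    """Hypothetical object that needs cleanup."""

closed = []

def close_handle(handle_id):
    # Receives plain data only, never the Connection instance itself.
    closed.append(handle_id)

conn = Connection()
finalizer = weakref.finalize(conn, close_handle, 42)
alive_before = finalizer.alive     # True while conn exists and the callback has not run

del conn                           # the object becomes trash and the callback fires
alive_after = finalizer.alive      # False once the callback has been called
```

The callback runs at most once: after it has fired here, it will not run again at interpreter shutdown.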

Python: is the garbage collector run before a MemoryError is raised?

Actually, there are reference cycles, and it's the only reason why the manual gc.collect() calls are able to reclaim memory at all.

In Python (I'm assuming CPython here), the garbage collector's sole purpose is to break reference cycles. When none are present, objects are destroyed and their memory reclaimed at the exact moment the last reference to them is lost.

As for when the garbage collector is run, the full documentation is here: http://docs.python.org/2/library/gc.html

The TL;DR is that Python keeps track of object allocations and deallocations. Whenever (allocations - deallocations) exceeds threshold 0 (700 by default in the version documented there), a garbage collection is run and the count starts over.

Every time a collection happens (either automatically or manually via gc.collect()), generation 0, which holds all objects that haven't yet survived a collection, is collected. That is, the collector walks through those objects looking for reference cycles that cannot be reached from outside. If any are found, the cycles are broken, possibly leading to objects being destroyed because there are no references left. All objects that remain after that collection are moved to generation 1.

Every 10 collections (threshold 1), generation 1 is also collected, and all objects in generation 1 that survive that are moved to generation 2. Every 10 collections of generation 1 (that is, every 100 collections -- threshold 2), generation 2 is also collected. Objects that survive that are left in generation 2 -- there is no generation 3.
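The generations can be poked at from Python. This is only a sketch: the numbers you get depend on the interpreter version and on what the process has done so far, and newer CPython releases may organise generations differently from the description above.

```python
import gc

stats = gc.get_stats()                                # one dict per generation, youngest first
per_generation = [s["collections"] for s in stats]    # how often each generation was collected

found_young = gc.collect(0)     # collect generation 0 only
found_all = gc.collect()        # no argument: full collection of every generation
```

Both gc.collect() calls return the number of unreachable objects they found, which is often 0 in a program without cyclic trash.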

These 3 thresholds can be user-set by calling gc.set_threshold(threshold0, threshold1, threshold2).
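For example (the values 1000, 15, 15 are arbitrary and not a recommendation), the thresholds can be read, changed and restored like this:

```python
import gc

original = gc.get_threshold()        # tuple with the current thresholds
gc.set_threshold(1000, 15, 15)       # example values only
try:
    changed_threshold0 = gc.get_threshold()[0]
finally:
    gc.set_threshold(*original)      # put the previous settings back
restored = gc.get_threshold()
```

Raising threshold 0 makes automatic collections less frequent. Per the gc documentation, setting it to 0 disables collection.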

What this all means for your program:

  1. The GC is not the mechanism CPython uses to reclaim memory (refcounting is). The GC breaks reference cycles in "dead" objects, which may lead to some of them being destroyed.
  2. No, there are no guarantees that the GC will run before a MemoryError is raised.
  3. You have reference cycles. Try to get rid of them.
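One common way to follow point 3 is to make back-references weak. The sketch below uses a hypothetical TreeNode class in which children hold only a weak link to their parent, so parent and child never form a cycle and refcounting alone can reclaim the parent (the immediate reclamation assumes CPython).

```python
import gc
import weakref

class TreeNode:
    """Hypothetical tree node; the parent link is weak, so no cycle forms."""
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)   # strong link from parent to child only

    @property
    def parent(self):
        # Returns None for a root node or once the parent has been destroyed.
        return self._parent() if self._parent is not None else None

was_enabled = gc.isenabled()
gc.disable()                               # show that the cycle detector is not needed
try:
    root = TreeNode("root")
    leaf = TreeNode("leaf", parent=root)
    probe = weakref.ref(root)
    parent_name = leaf.parent.name
    del root                               # leaf holds only a weak link back to root
    root_freed = probe() is None
finally:
    if was_enabled:
        gc.enable()
```

The trade-off is that leaf.parent can become None at any time, so callers have to handle that case.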

Are there any Python reference counting/garbage collection gotchas when dealing with C code?

Your link to http://docs.python.org/extending/extending.html#reference-counts is the right place. The Extending and Embedding and Python/C API sections of the documentation are the ones that will explain how to use the C API.

Reference counting is one of the annoying parts of using the C API. The main gotcha is keeping everything straight: depending on the API function you call, you may or may not own the reference to the object you get. Be careful to understand whether you own it (and thus must not forget to DECREF it or give it to something that will steal it) or are borrowing it (and must INCREF it to keep it, and possibly even to use it safely during your function). The most common bugs involving this are 1) remembering incorrectly whether you own a reference returned by a particular function and 2) believing you're safe to borrow a reference for a longer time than you are.

You do not have to do anything special for the cyclic garbage collector. It's just there to patch up a flaw in reference counting and doesn't require direct access.

When does CPython garbage collect?

The GC runs periodically based on the (delta between the) number of allocations and deallocations that have taken place since the last GC run.

See the gc.set_threshold() function:

In order to decide when to run, the collector keeps track of the number of object allocations and deallocations since the last collection. When the number of allocations minus the number of deallocations exceeds threshold0, collection starts.

You can access the current counts with gc.get_count(); this returns a tuple of the 3 counts GC tracks (the other 2 are to determine when to run deeper scans).
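A quick look at those counters, as a sketch only: the actual values vary from run to run, since an automatic collection can reset them at any moment.

```python
import gc

counts = gc.get_count()              # (count0, count1, count2)
thresholds = gc.get_threshold()      # the limits the counts are compared against

# Allocating container objects that stay alive pushes count0 upward
# until the next automatic collection resets it.
keep = [[] for _ in range(50)]
counts_after = gc.get_count()
```

Comparing counts and counts_after against thresholds gives a feel for how close the process is to its next automatic collection.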

The PyPy garbage collector operates entirely differently, as the GC process in PyPy is responsible for all deallocations, not just cyclic references. Moreover, the PyPy garbage collector is pluggable, meaning that how often it runs depends on what GC option you have picked. The default Minimark strategy doesn't even run at all when below a memory threshold, for example.

See the RPython toolchain Garbage Collector documentation for some details on their strategies, and the Minimark configuration options for more hints on what can be tweaked.

Ditto for Jython or IronPython; these implementations rely on the host runtime (Java and .NET) to handle garbage collection for them.


