How to Explicitly Free Memory in Python

How can I explicitly free memory in Python?

According to Python Official Documentation, you can explicitly invoke the Garbage Collector to release unreferenced memory with gc.collect(). Example:

import gc

gc.collect()

You should do that after marking what you want to discard using del:

del my_array
del my_object
gc.collect()

Free memory in Python

You cannot really free memory manually in Python.

Using del decreases the reference count of an object. Once that reference count reaches zero, the object will be freed when the garbage collector is run.

So the best you can do is to run gc.collect() manually after del-ing a bunch of objects.

In these cases the best advice is usually to try and change your algorithms. For example use a generator instead of a list as Thijs suggests in the comments.

The other strategy is to throw hardware at the problem (buy more RAM). But this generally has financial and technical limits. :-)

Releasing memory in Python

Memory allocated on the heap can be subject to high-water marks. This is complicated by Python's internal optimizations for allocating small objects (PyObject_Malloc) in 4 KiB pools, classed for allocation sizes at multiples of 8 bytes -- up to 256 bytes (512 bytes in 3.3). The pools themselves are in 256 KiB arenas, so if just one block in one pool is used, the entire 256 KiB arena will not be released. In Python 3.3 the small object allocator was switched to using anonymous memory maps instead of the heap, so it should perform better at releasing memory.

Additionally, the built-in types maintain freelists of previously allocated objects that may or may not use the small object allocator. The int type maintains a freelist with its own allocated memory, and clearing it requires calling PyInt_ClearFreeList(). This can be called indirectly by doing a full gc.collect.

Try it like this, and tell me what you get. Here's the link for psutil.Process.memory_info.

import os
import gc
import psutil

proc = psutil.Process(os.getpid())
gc.collect()
mem0 = proc.memory_info().rss

# create approx. 10**7 int objects and pointers
foo = ['abc' for x in range(10**7)]
mem1 = proc.memory_info().rss

# unreference, including x == 9999999
del foo, x
mem2 = proc.memory_info().rss

# collect() calls PyInt_ClearFreeList()
# or use ctypes: pythonapi.PyInt_ClearFreeList()
gc.collect()
mem3 = proc.memory_info().rss

pd = lambda x2, x1: 100.0 * (x2 - x1) / mem0
print "Allocation: %0.2f%%" % pd(mem1, mem0)
print "Unreference: %0.2f%%" % pd(mem2, mem1)
print "Collect: %0.2f%%" % pd(mem3, mem2)
print "Overall: %0.2f%%" % pd(mem3, mem0)

Output:

Allocation: 3034.36%
Unreference: -752.39%
Collect: -2279.74%
Overall: 2.23%

Edit:

I switched to measuring relative to the process VM size to eliminate the effects of other processes in the system.

The C runtime (e.g. glibc, msvcrt) shrinks the heap when contiguous free space at the top reaches a constant, dynamic, or configurable threshold. With glibc you can tune this with mallopt (M_TRIM_THRESHOLD). Given this, it isn't surprising if the heap shrinks by more -- even a lot more -- than the block that you free.

In 3.x range doesn't create a list, so the test above won't create 10 million int objects. Even if it did, the int type in 3.x is basically a 2.x long, which doesn't implement a freelist.

How to destroy Python objects and free up memory

Now, it could be that something in the 50,000th is very large, and that's causing the OOM, so to test this I'd first try:

file_list_chunks = list(divide_chunks(file_list_1,20000))[30000:]

If it fails at 10,000 this will confirm whether 20k is too big a chunksize, or if it fails at 50,000 again, there is an issue with the code...

Okay, onto the code...

Firstly, you don't need the explicit list constructor, it's much better in python to iterate rather than generate the entire the list into memory.

file_list_chunks = list(divide_chunks(file_list_1,20000))
# becomes
file_list_chunks = divide_chunks(file_list_1,20000)

I think you might be misusing ThreadPool here:

Prevents any more tasks from being submitted to the pool. Once all the tasks have been completed the worker processes will exit.

This reads like close might have some thinks still running, although I guess this is safe it feels a little un-pythonic, it's better to use the context manager for ThreadPool:

with ThreadPool(64) as pool: 
    results = pool.map(get_image_features,f)
    # etc.

The explicit dels in python aren't actually guaranteed to free memory.

You should collect after the join/after the with:

with ThreadPool(..):
    ...
    pool.join()
gc.collect()

You could also try chunk this into smaller pieces e.g. 10,000 or even smaller!

Hammer 1

One thing, I would consider doing here, instead of using pandas DataFrames and large lists is to use a SQL database, you can do this locally with sqlite3:

import sqlite3
conn = sqlite3.connect(':memory:', check_same_thread=False)  # or, use a file e.g. 'image-features.db'

and use context manager:

with conn:
    conn.execute('''CREATE TABLE images
                    (filename text, features text)''')

with conn:
    # Insert a row of data
    conn.execute("INSERT INTO images VALUES ('my-image.png','feature1,feature2')")

That way, we won't have to handle the large list objects or DataFrame.

You can pass the connection to each of the threads... you might have to something a little weird like:

results = pool.map(get_image_features, zip(itertools.repeat(conn), f))

Then, after the calculation is complete you can select all from the database, into which ever format you like. E.g. using read_sql.

Hammer 2

Use a subprocess here, rather than running this in the same instance of python "shell out" to another.

Since you can pass start and end to python as sys.args, you can slice these:

# main.py
# a for loop to iterate over this
subprocess.check_call(["python", "chunk.py", "0", "20000"])

# chunk.py a b
for count,f in enumerate(file_list_chunks):
    if count < int(sys.argv[1]) or count > int(sys.argv[2]):
         pass
    # do stuff

That way, the subprocess will properly clean up python (there's no way there'll be memory leaks, since the process will be terminated).

My bet is that Hammer 1 is the way to go, it feels like you're gluing up a lot of data, and reading it into python lists unnecessarily, and using sqlite3 (or some other database) completely avoids that.

Cannot free memory in a function

Short answer

Freeing memory should be handled in separate functions that destroy a specific object, one for (adjacency) lists and one for graphs (which calls the adjacency list destroying function). The adjacency list destructor should iterate over a list, freeing nodes as it visits them (note the nodes are freed using the destructor's own local variables, not the newNodeI variables in the graph constructor). The graph destructor would be called from check_graph. Note that this parallels how creation is handled in the code.

Longer answer

The program would greatly benefit from following some fundamental design principles. In particular, break up the functions into more basic operations, each of which performs a single task (akin to the Single Responsibility Principle from OOP). When considering the sub-tasks of a task, they should be at the same level and in the same domain (more on this later). Additionally, functions shouldn't be overlong. Repeated code is a candidate for abstraction into a separate function (Don't Repeat Yourself), as long as it is conceptually a single task. Though the program may not be explicitly object-oriented, some OO conventions can be usefully applied. Variable names should be descriptive.

Start thinking about function names. The sample has createGraph and check_graph, a mix of naming conventions. This isn't inherently wrong, but naming conventions should only be mixed when each convention is doing something different, and are in different parts of a program. One C convention for naming methods in an OO manner is to use DromedaryCase for class names and camelCase for method names (as is done in C++), and connect the two with an underscore (basically, snake case) (e.g. ClassName_methodName). Extending this, the underscore indicates going down in scope, so nested class methods would be named as: Outer_Inner_methodName. Alternatives include using camelCase for class names, or snake case for everything, or snake case but with a double underscore for scope (e.g. outer_class__inner_class__method_name). "Private" methods can be indicated with a leading underscore.

The check_graph function performs the following sub-tasks:

opens a file
causes edges to be read from file
causes a Graph object to be created
allocates space for a member field of a queue (queue.vertices)
traverses the graph breadth-first
examines queue members to determine when it's empty
destroys the queue member, edges, and Graph

This mixes different levels of tasks (e.g. causing a Graph object to be created (which happens in a different function) but destroying the object itself; creating a part of the queue) and domains (e.g. file I/O, memory management, and graph algorithms), resulting in multiple responsibilities. Reading objects from files should be handled by a component whose responsibility it is to bridge I/O and object creation. Destroying the graph object should be handled by a separate function, a counterpart to createGraph (or Graph_create, if you use the convention above). This in particular should resolve the issue in question. Queue manipulation should be farmed out to queue functions, encapsulating the operations and data.

The majority of the lines in check_graph are concerned with the breadth-first traversal of the graph. This could be the basis for a function that implements the BFS algorithm, taking a callback that's called for each vertex as it's visited. check_graph would then call the BFS function.

A sketch of a refactored version:

typedef void (*Graph_visitor_t)(Graph_t *graph, int iVertex, void *additional);

/**
 * Breadth-first traversal of a graph
 *
 * visit: a callback, invoked for each vertex when visited
 * pAdditional: additional data passed along to the `visit` function
 */
void Graph_bfs(Graph_t *graph, Graph_visitor_t visit, void *pAdditional) {
    // TODO: detect & handle memory errors
    bool *visited = calloc(sizeof(*visited), graph->nVertices);
    IntQueue_t *queue = IntQueue_create(graph->nVertices);
    
    visit(graph, 0, pAdditional);
    visited[0] = 1;
    IntQueue_push(queue, 0);
    while (! IntQueue_empty(queue)) {
        int current_vertex = IntQueue_pop(queue);
        /* much the same as the original `check_graph` (only 
         * add a call to `visit`)
         */
        // ...
    }

    IntQueue_destroy(queue);
    free(visited);
}

void _Graph_countVisited(Graph_t* graph, int iVertex, int *pnVisited) {
    ++(*pnVisited);
}

// Demonstrates how to use Graph_bfs (check_graph woudl be similar).
void Graph_isConnected(Graph_t *graph) {
    int nVisited = 0;
    Graph_bfs(graph, &_Graph_countVisited, &nVisited);

    return nVisited == graph->nVertices;
}

createGraph performs the following sub-tasks:

allocates the graph object & members
allocates the adjacency list nodes
traverses adjacency lists
adds nodes to adjacency lists

Again, some of these tasks are at different levels and should be farmed out (e.g. adjacency list manipulation). The code that manipulates the adjacency list within the loop is also repetitive, and is a great candidate for being moved to another function.

Many of the variable names (e.g. l, wxk, newNode3) aren't very descriptive, leading to some bugs. For example, in createGraph, graph->head is allocated to hold l entries, but wxk entries are accessed when initializing it (in this case, the better fix is to use calloc instead of manually initializing all entries to NULL). If these variables were name more descriptively, e.g. nVertices and nEdges (I'm guessing as to purpose), the bug would be more obvious and likely wouldn't have occurred in the first place.

void _Graph_addAdjacency(Graph_t *graph, int from, int to, double weight) {
    Node_t *newNode = List_Node_create(to, weight);
    if (graph->head[from] == NULL ) {
        graph->head[from] = newSrcNode;
    } else {
        List_append(graph->head[from], newSrcNode);
    }
}

void _Graph_addEdge(Graph_t *graph, Edge_t *edge) {
    _Graph_addAdjacency(graph, edges[i].src, edges[i].dest, edges[i].weight);
    _Graph_addAdjacency(graph, edges[i].dest, edges[i].src, edges[i].weight);
}

Graph_t* Graph_create(Edge_t edges[], int nEdges, int nVertices) {
    //// allocation
    // TODO: detect & handle memory errors
    Graph_t *graph = malloc(sizeof *graph);
    graph->head = calloc(sizeof *(graph->head), nVertices);
    graph->nVertices = nVertices;

    //// initialization
    // add edges to the directed graph one by one
    for (int i = 0; i < nEdges; i++) {
        // TODO: add error detection
        _Graph_addEdge(graph, edges[i]);
    }

    return graph;
}

Rounding out the example are functions to read the graph from the file (Graph_readFromPath) and to tie it all together (main in this example, though in a larger program it wouldn't be the main function).


Graph_t* Graph_readFromPath(const char *fName) {
    FILE *in = fopen(fName, "r");
    
    int nVertices = Count_readFromFile(in);
    int nEdges = Count_readFromFile(in);
    Edge_t *edges = Edges_readFromFile(in, nEdges);

    fclose(in);

    Graph_t* graph = Graph_create(Edge_t edges[], nEdges, nVertices);

    free(edges);
    return graph;
}

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "No input file given.");
        return 1;
    }
    const char *fName = argv[1];
    if (access(fname, R_OK)) {
        fprintf(stderr, "Error reading input file '%s': %s", fName, strerror(errno));
        return 1;
    }
    
    Graph_t *graph = Graph_readFromFile(fName);
    if (! Graph_isConnected(graph)) {
        // ...
    }
    Graph_destroy(graph);

    return 0;
}