Possible to Share In-Memory Data Between 2 Separate Processes

Possible to share in-memory data between 2 separate processes?

Without some deep and dark rewriting of the Python core runtime (to allow forcing of an allocator that uses a given segment of shared memory and ensures compatible addresses between disparate processes) there is no way to "share objects in memory" in any general sense. That list will hold a million addresses of tuples, each tuple made up of addresses of all of its items, and each of these addresses will have been assigned by pymalloc in a way that inevitably varies among processes and spreads all over the heap.

On just about every system except Windows, it's possible to spawn a subprocess that has essentially read-only access to objects in the parent process's space... as long as the parent process doesn't alter those objects, either. That's obtained with a call to os.fork(), which in practice "snapshots" all of the memory space of the current process and starts another simultaneous process on the copy/snapshot. On all modern operating systems, this is actually very fast thanks to a "copy on write" approach: the pages of virtual memory that are not altered by either process after the fork are not really copied (access to the same pages is instead shared); as soon as either process modifies any bit in a previously shared page, poof, that page is copied, and the page table modified, so the modifying process now has its own copy while the other process still sees the original one.
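To make the fork behaviour concrete, here is a minimal sketch (POSIX only, since it relies on os.fork()). The function name sum_in_child and the pipe used to report the result are my own choices for illustration, not anything from the original question.

```python
import os

def sum_in_child(values):
    """Fork; the child sums `values`, mutates its own copy, and reports the sum."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it sees the parent's list as it was at fork time,
        # with no pickling or explicit copying on our part.
        os.close(read_fd)
        try:
            total = sum(values)
            values.append(-1)  # changes only the child's private copy
            os.write(write_fd, str(total).encode())
        finally:
            os._exit(0)  # leave without running the parent's cleanup code
    # Parent: collect the child's answer through the pipe.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        payload = reader.read()
    os.waitpid(pid, 0)
    return int(payload)
```

Calling sum_in_child(data) on a list returns the sum computed by the child, and len(data) in the parent is unchanged afterwards: the child's append landed on the child's own copy of the page.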

This extremely limited form of sharing can still be a lifesaver in some cases (although it's extremely limited: remember for example that adding a reference to a shared object counts as "altering" that object, due to reference counts, and so will force a page copy!)... except on Windows, of course, where it's not available. With this single exception (which I don't think will cover your use case), sharing of object graphs that include references/pointers to other objects is basically unfeasible -- and just about any set of objects of interest in modern languages (including Python) falls under this classification.

In extreme (but sufficiently simple) cases one can obtain sharing by renouncing the native memory representation of such object graphs. For example, a list of a million tuples each with sixteen floats could actually be represented as a single block of 128 MB of shared memory -- all the 16M floats in double-precision IEEE representation laid end to end -- with a little shim on top to "make it look like" you're addressing things in the normal way (and, of course, the not-so-little-after-all shim would also have to take care of the extremely hairy inter-process synchronization problems that are certain to arise;-). It only gets hairier and more complicated from there.
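A rough sketch of such a shim, assuming rows of sixteen little-endian doubles packed end to end. The class name FlatTupleView is hypothetical, and the sketch deliberately ignores the synchronization problems mentioned above; it works over any writable buffer, which could be a bytearray for testing or a shared-memory buffer in real use.

```python
import struct

# One row = 16 IEEE double-precision floats = 128 bytes.
ROW = struct.Struct("<16d")

class FlatTupleView:
    """Make a flat byte buffer look like a sequence of 16-float tuples."""

    def __init__(self, buffer, n_rows):
        self._buf = buffer
        self._n = n_rows

    def __len__(self):
        return self._n

    def _check(self, index):
        if not 0 <= index < self._n:
            raise IndexError(index)

    def __getitem__(self, index):
        self._check(index)
        # Decode one row on demand; nothing is stored as Python objects.
        return ROW.unpack_from(self._buf, index * ROW.size)

    def __setitem__(self, index, row):
        self._check(index)
        ROW.pack_into(self._buf, index * ROW.size, *row)
```

For example, FlatTupleView(bytearray(ROW.size * 4), 4) behaves like a list of four tuples; a million rows would need the 128 MB block mentioned above.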

Modern approaches to concurrency are more and more disdaining shared-anything approaches in favor of shared-nothing ones, where tasks communicate by message passing (even in multi-core systems using threading and shared address spaces, the synchronization issues and the performance hits the HW incurs in terms of caching, pipeline stalls, etc, when large areas of memory are actively modified by multiple cores at once, are pushing people away).

For example, the multiprocessing module in Python's standard library relies mostly on pickling and sending objects back and forth, not on sharing memory (surely not in a R/W way!-).
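The copy semantics are easy to see even inside a single process. This small sketch (the helper name round_trip is mine) pushes an object through a multiprocessing Pipe, which pickles on send and unpickles on receive, the same way objects travel between real worker processes. It is only meant for small payloads, since a large one could fill the pipe before anyone reads it.

```python
from multiprocessing import Pipe

def round_trip(obj):
    """Send `obj` through a multiprocessing pipe and return what arrives."""
    sender, receiver = Pipe()
    try:
        sender.send(obj)        # pickles the object
        return receiver.recv()  # unpickles into a brand-new object
    finally:
        sender.close()
        receiver.close()
```

The object that comes out compares equal to the one that went in, but it is a separate copy, so changes made to it are not seen by the sender.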

I realize this is not welcome news to the OP, but if he does need to put multiple processors to work, he'd better think in terms of having anything they must share reside in places where they can be accessed and modified by message passing -- a database, a memcache cluster, a dedicated process that does nothing but keep those data in memory and send and receive them on request, and other such message-passing-centric architectures.

Shared memory between python processes

From Python 3.8 onwards you can use multiprocessing.shared_memory.SharedMemory.
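A minimal sketch of how that looks, assuming Python 3.8+. To keep it self-contained, the "second process" is simulated by attaching to the block by name from the same process; a real peer would be handed owner.name and attach in exactly the same way. The function name demo_shared_block is just for illustration.

```python
import struct
from multiprocessing import shared_memory

def demo_shared_block():
    # Create a named block of raw bytes that other processes can attach to.
    owner = shared_memory.SharedMemory(create=True, size=64)
    try:
        struct.pack_into("<4d", owner.buf, 0, 1.0, 2.0, 3.0, 4.0)
        # A peer attaches using the block's name (here: same process).
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            values = struct.unpack_from("<4d", peer.buf, 0)
            struct.pack_into("<d", peer.buf, 0, 10.0)  # write via the peer
            first = struct.unpack_from("<d", owner.buf, 0)[0]
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()  # the creator is responsible for freeing the block
    return values, first
```

Note that what is shared is a flat buffer of bytes, not Python objects, so you still need something like struct (or a NumPy array built on the buffer) to interpret it.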

Sharing a complex python object in memory between separate processes

For complex objects there isn't a readily available method to directly share memory between processes. If you have simple ctypes types you can do this with C-style shared memory, but it won't map directly to Python objects.
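For the simple-ctypes case, something along these lines should work. It is a sketch that assumes a POSIX system, because it asks for the "fork" start method explicitly; the function names are mine.

```python
import multiprocessing as mp

def double_in_place(shared):
    # Runs in the child; writes land in memory the parent can also see.
    with shared.get_lock():
        for i in range(len(shared)):
            shared[i] *= 2

def run_demo():
    ctx = mp.get_context("fork")  # assumption: POSIX, "fork" is available
    shared = ctx.Array("d", [1.0, 2.0, 3.0])  # ctypes doubles plus a lock
    worker = ctx.Process(target=double_in_place, args=(shared,))
    worker.start()
    worker.join()
    return shared[:]  # copy the shared doubles out as a plain list
```

Unlike a plain list inherited through fork, the child's writes are visible to the parent here, because the array lives in memory that both processes map. The price is that it can only hold a fixed-size run of C values.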

There is a simple solution that works well if you only need a portion of your data at any one time, not the entire 36GB. For this you can use a SyncManager from multiprocessing.managers. Using this, you set up a server that serves up a proxy class for your data (your data isn't stored in the class, the proxy only provides access to it). Your client then attaches to the server using a BaseManager and calls methods in the proxy class to retrieve the data.

Behind the scenes the Manager classes take care of pickling the data you ask for and sending it through the open port from server to client. Because you're pickling data with every call this isn't efficient if you need your entire dataset. In the case where you only need a small portion of the data in the client, the method saves a lot of time since the data only needs to be loaded once by the server.

The solution is comparable to a database solution speed-wise but it can save you a lot of complexity and DB-learning if you'd prefer to keep to a purely pythonic solution.

Here's some example code that is meant to work with GloVe word vectors.

Server

#!/usr/bin/python
import  sys
from    multiprocessing.managers import SyncManager
import  numpy

# Global for storing the data to be served
gVectors = {}

# Proxy class to be shared with different processes
# Don't put the big vector data in here since that will force it to
# be piped to the other process when instantiated there, instead just
# return the global vector data, from this process, when requested.
class GloVeProxy(object):
    def __init__(self):
        pass

    def getNVectors(self):
        global gVectors
        return len(gVectors)

    def getEmpty(self):
        global gVectors
        # dict.values() is not indexable on Python 3, so take the first item
        return numpy.zeros_like(next(iter(gVectors.values())))

    def getVector(self, word, default=None):
        global gVectors
        return gVectors.get(word, default)

# Class to encapsulate the server functionality
class GloVeServer(object):
    def __init__(self, port, fname):
        self.port = port
        self.load(fname)

    # Load the vectors into gVectors (global)
    @staticmethod
    def load(filename):
        global gVectors
        # Use a context manager so the file is closed after loading
        with open(filename, 'r') as f:
            for line in f:
                vals = line.rstrip().split(' ')
                gVectors[vals[0]] = numpy.array(vals[1:]).astype('float32')

    # Run the server
    def run(self):
        class myManager(SyncManager): pass  
        myManager.register('GloVeProxy', GloVeProxy)
        # authkey must be a byte string on Python 3
        mgr = myManager(address=('', self.port), authkey=b'GloVeProxy01')
        server = mgr.get_server()
        server.serve_forever()

if __name__ == '__main__':
    port  = 5010
    fname = '/mnt/raid/Data/Misc/GloVe/WikiGiga/glove.6B.50d.txt'

    print('Loading vector data')
    gs = GloVeServer(port, fname)

    print('Serving data. Press <ctrl>-c to stop.')
    gs.run()

Client

from   multiprocessing.managers import BaseManager
import psutil   #3rd party module for process info (not strictly required)

# Grab the shared proxy class.  All methods in that class will be available here
class GloVeClient(object):
    def __init__(self, port):
        assert self._checkForProcess('GloVeServer.py'), 'Must have GloVeServer running'
        class myManager(BaseManager): pass
        myManager.register('GloVeProxy')
        # authkey must be a byte string on Python 3 and match the server's
        self.mgr = myManager(address=('localhost', port), authkey=b'GloVeProxy01')
        self.mgr.connect()
        self.glove = self.mgr.GloVeProxy()

    # Return the instance of the proxy class
    @staticmethod
    def getGloVe(port):
        return GloVeClient(port).glove

    # Verify the server is running
    @staticmethod
    def _checkForProcess(name):
        for proc in psutil.process_iter():
            if proc.name() == name:
                return True
        return False

if __name__ == '__main__':
    port = 5010
    glove = GloVeClient.getGloVe(port)

    for word in ['test', 'cat', '123456']:
        print('%s = %s' % (word, glove.getVector(word)))

Note that the psutil library is only used to check whether the server is running; it's not required. Be sure to name the server GloVeServer.py or change the psutil check in the code so it looks for the correct name.

Process vs. thread: can two processes share the same shared memory? Can two threads?

can two processes share the same shared memory segment?

Yes and no. Typically with modern operating systems, when another process is forked from the first, they share the same memory space with a copy-on-write set on all pages. Any updates made to any of the read-write memory pages cause a copy to be made for the page so there will be two copies and the memory page will no longer be shared between the parent and child process. This means that only read-only pages or pages that have not been written to will be shared.

If two processes are not related by a fork then they typically do not share any memory. One exception is if you are running two instances of the same program then they may share code and maybe even static data segments but no other pages will be shared. Another is how some operating systems allow applications to share the code pages for dynamic libraries that are loaded by multiple applications.

There are also specific memory-map calls to share the same memory segment. The call designates whether the map is read-only or read-write. How to do this is very OS dependent.
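As one example of such a call, here is a hedged sketch using Python's mmap module on a Unix-like system, where an anonymous mapping created with mmap.mmap(-1, ...) is shared (MAP_SHARED) by default. In contrast to the copy-on-write pages described above, a write made by the forked child is visible to the parent. The function name is mine.

```python
import mmap
import os
import struct

def child_writes_parent_reads():
    # Anonymous mapping; on Unix this defaults to MAP_SHARED, so the
    # pages stay shared across fork instead of becoming copy-on-write.
    region = mmap.mmap(-1, mmap.PAGESIZE)
    pid = os.fork()
    if pid == 0:
        try:
            struct.pack_into("<q", region, 0, 12345)  # child writes
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    value = struct.unpack_from("<q", region, 0)[0]   # parent reads
    region.close()
    return value
```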

can two threads share the same shared memory?

Certainly. Typically all of the memory inside of a multi-threaded process is "shared" by all of the threads except for some relatively small stack spaces which are per-thread. That is usually the definition of threads in that they all are running within the same memory space.
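A small illustration in Python (names are mine): every thread updates the very same dictionary, with a lock to keep the increments from interleaving.

```python
import threading

def count_with_threads(n_threads=4, per_thread=1000):
    counter = {"value": 0}   # one object, visible to every thread
    lock = threading.Lock()

    def work():
        for _ in range(per_thread):
            with lock:       # serialize the read-modify-write
                counter["value"] += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

No copying or message passing is involved; the threads simply see the same object because they live in the same address space.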

Threads also have the added complexity of having cached memory segments in high speed memory tied to the processor/core. This cached memory is not shared and updates to memory pages are flushed into central storage depending on synchronization operations.

Sharing shared object between multiple processes

Shared objects are loaded via mmap() with the MAP_PRIVATE flag. This means that these are copy-on-write mappings: they initially point to the same memory, but once any of them is modified, the affected page is copied and "unshared" before the modification.
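The copy-on-write behaviour of MAP_PRIVATE can be observed from Python on a Unix-like system. This sketch uses a temporary file standing in for the shared object (an assumption made purely for the demonstration): it maps the file privately, writes through the mapping, and then checks that the file itself is unchanged.

```python
import mmap
import os
import tempfile

def private_write_leaves_file_untouched():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"original")
        # MAP_PRIVATE: writes go to a private copy of the page.
        private = mmap.mmap(fd, 0, flags=mmap.MAP_PRIVATE)
        private[:8] = b"modified"
        seen_in_mapping = bytes(private[:8])
        private.close()
        os.lseek(fd, 0, os.SEEK_SET)
        on_disk = os.read(fd, 8)
    finally:
        os.close(fd)
        os.remove(path)
    return seen_in_mapping, on_disk
```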

Sharing memory between two processes (C, Windows)

You can try a memory-mapped file.
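The question is about C, but the idea is the same in any language, so here is a rough sketch in Python rather than C. Two independent mappings of the same file stand in for the two processes; in practice each process would open the agreed-upon path itself. The function name and the temporary file are assumptions for the demo.

```python
import mmap
import os
import tempfile

def share_through_file(payload=b"hello"):
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, mmap.PAGESIZE)  # a mapping needs a non-empty file
        os.close(fd)
        # Each open() plays the role of a separate process.
        with open(path, "r+b") as writer_file, open(path, "r+b") as reader_file:
            writer = mmap.mmap(writer_file.fileno(), mmap.PAGESIZE)
            reader = mmap.mmap(reader_file.fileno(), mmap.PAGESIZE)
            try:
                writer[:len(payload)] = payload
                return bytes(reader[:len(payload)])
            finally:
                writer.close()
                reader.close()
    finally:
        os.remove(path)
```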

This gives a bit more step-by-step detail.
