Sharing a complex object between Python processes?

I have a fairly complex Python object that I need to share between multiple processes, which I launch using multiprocessing.Process. When I share an object that has a multiprocessing.Queue and a multiprocessing.Pipe in it, those members are shared just fine. But when I try to share an object containing other, non-multiprocessing-module objects, it seems like Python forks (copies) these objects. Is that true?

I tried using multiprocessing.Value, but I'm not sure what the type should be. My object class is called MyClass, and when I try multiprocessing.Value(MyClass, instance), it fails with:

TypeError: this type has no size

Any idea what's going on?


You can't do this with multiprocessing.Value, which only wraps ctypes types (hence the "this type has no size" error for an arbitrary class). What you can do is use Python's multiprocessing "Manager" classes and a proxy class that you define. From the Python docs: http://docs.python.org/library/multiprocessing.html#proxy-objects

What you want to do is define a proxy class for your custom object, and then share the object using a "Remote Manager" -- look at the examples on the same linked doc page showing how to share a remote queue via a remote manager. You're going to be doing the same thing, but your call to your_manager_instance.register() will include your custom proxy class in its argument list.

In this manner, you're setting up a server to share the custom object with a custom proxy. Your clients need access to the server (again, see the excellent documentation examples of how to set up client/server access to a remote queue, but instead of sharing a queue, you are sharing access to your specific class).
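If it helps, here is a minimal local sketch of that registration step (the class names are placeholders I made up, and the manager is started locally rather than set up as a full remote server): subclass NamespaceProxy, list what you want exposed, and hand the proxy type to register():

from multiprocessing.managers import BaseManager, NamespaceProxy

class MyClass(object):
    """The custom object we want to share (placeholder)."""
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1

class MyClassProxy(NamespaceProxy):
    # NamespaceProxy already forwards plain attribute access;
    # we additionally expose the method we want to call remotely.
    _exposed_ = ('__getattribute__', '__setattr__', '__delattr__', 'increment')

    def increment(self):
        return self._callmethod('increment')

class MyManager(BaseManager):
    pass

MyManager.register('MyClass', MyClass, proxytype=MyClassProxy)

if __name__ == '__main__':
    manager = MyManager()
    manager.start()
    shared = manager.MyClass()   # a proxy to an instance in the server process
    shared.increment()
    print(shared.value)          # 1, read back through the proxy

For the remote case, the registration step is the same; the clients then connect to the manager's address as in the docs' remote-queue example.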


After a lot of research and testing, I found that "Manager" does this job, but only at the non-complex-object level.

The code below shows that the object inst is shared between processes: the attribute var of inst is changed in the parent when the child process changes it.

from multiprocessing import Process
from multiprocessing.managers import BaseManager

class SimpleClass(object):
    def __init__(self):
        self.var = 0

    def set(self, value):
        self.var = value

    def get(self):
        return self.var


def change_obj_value(obj):
    obj.set(100)


if __name__ == '__main__':
    # register the class so the manager can create instances of it
    # in its own server process
    BaseManager.register('SimpleClass', SimpleClass)
    manager = BaseManager()
    manager.start()
    inst = manager.SimpleClass()   # a proxy to an instance living in the server

    p = Process(target=change_obj_value, args=[inst])
    p.start()
    p.join()

    print(inst)                     # <__main__.SimpleClass object at 0x10cf82350>
    print(inst.get())               # 100

Okay, the code above is enough if you only need to share simple objects.

Why not complex ones? Because it may fail if your object is nested (an object inside an object):

from multiprocessing import Process
from multiprocessing.managers import BaseManager

class GetSetter(object):
    def __init__(self):
        self.var = None

    def set(self, value):
        self.var = value

    def get(self):
        return self.var


class ChildClass(GetSetter):
    pass

class ParentClass(GetSetter):
    def __init__(self):
        self.child = ChildClass()
        GetSetter.__init__(self)

    def getChild(self):
        return self.child


def change_obj_value(obj):
    obj.set(100)
    obj.getChild().set(100)


if __name__ == '__main__':
    BaseManager.register('ParentClass', ParentClass)
    manager = BaseManager()
    manager.start()
    inst2 = manager.ParentClass()

    p2 = Process(target=change_obj_value, args=[inst2])
    p2.start()
    p2.join()

    print(inst2)                    # <__main__.ParentClass object at 0x10cf82350>
    print(inst2.getChild())         # <__main__.ChildClass object at 0x10cf6dc50>
    print(inst2.get())              # 100
    # good!

    print(inst2.getChild().get())   # None
    # bad! you need to register the child class too, but there's almost no
    # way to do it cleanly -- and even if you did, you may get a PicklingError :)
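
One workaround, which is my addition rather than part of the original answer: since getChild() hands the caller a pickled copy of the nested object, keep all mutation on the server side by giving the parent delegating methods, so that only the top-level object is ever proxied. A minimal sketch:

class ParentClass(GetSetter):
    def __init__(self):
        self.child = ChildClass()
        GetSetter.__init__(self)

    def set_child_value(self, value):
        # runs inside the manager's server process, where the real
        # (unproxied) child object lives
        self.child.set(value)

    def get_child_value(self):
        return self.child.get()

With this, the child process calls inst2.set_child_value(100) through the parent's proxy, and inst2.get_child_value() in the parent returns 100, because the nested object never crosses the process boundary.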

I think the main reason for this behavior is that Manager is just a thin convenience layer built on top of low-level communication tools like pipes and queues.

So this approach is not really recommended for the multiprocessing case. It's usually better to use low-level tools like locks/semaphores/pipes/queues, or high-level tools like a Redis queue or Redis publish/subscribe, for complicated use cases (just my recommendation lol).
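
For what it's worth, here is a minimal sketch of the queue style recommended above: instead of sharing the object itself, keep it in one process and pass plain data through a multiprocessing.Queue.

from multiprocessing import Process, Queue

def worker(jobs, results):
    # consume items until the None sentinel arrives
    for item in iter(jobs.get, None):
        results.put(item * 2)

if __name__ == '__main__':
    jobs, results = Queue(), Queue()
    p = Process(target=worker, args=(jobs, results))
    p.start()

    for i in range(3):
        jobs.put(i)
    jobs.put(None)            # sentinel: tell the worker to stop

    for _ in range(3):
        print(results.get())  # 0, 2, 4
    p.join()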


Here's a Python package I made just for that (sharing complex objects between processes).

git: https://github.com/dRoje/pipe-proxy

The idea is that you create a proxy for your object and pass it to a process. Then you use the proxy as if you had a reference to the original object. You can only use method calls, though, so accessing object attributes is done through setters and getters.

Say we have an object called 'example'; creating the proxy and the proxy listener is easy:

from pipeproxy import proxy 
example = Example() 
exampleProxy, exampleProxyListener = proxy.createProxy(example) 

Now you send the proxy to another process.

p = Process(target=someMethod, args=(exampleProxy,))
p.start()

Use it in the other process as you would use the original object (example):

def someMethod(exampleProxy):
    ...
    exampleProxy.originalExampleMethod()
    ...

But you do have to listen to it in the main process:

exampleProxyListener.listen()

Read more and find examples here:

http://matkodjipalo.com/index.php/2017/11/12/proxy-solution-python-multiprocessing/
