Prevent TextIOWrapper from closing on GC in a Py2/Py3 compatible way

What I need to accomplish:

Given a binary file, decode it in a couple of different ways, each exposing a TextIOBase API. Ideally the resulting text files can be passed on without my having to keep track of their lifespan explicitly.

Unfortunately, wrapping a BufferedReader results in that reader being closed when the TextIOWrapper goes out of scope.

Here is a simple demo of this:

In [1]: import io

In [2]: def mangle(x):
   ...:     io.TextIOWrapper(x) # Will get GCed causing __del__ to call close
   ...:     

In [3]: f = io.open('example', mode='rb')

In [4]: f.closed
Out[4]: False

In [5]: mangle(f)

In [6]: f.closed
Out[6]: True

I can fix this in Python 3 by overriding __del__. This is a reasonable solution for my use case: I have complete control over the decoding process and only need to expose a very uniform API at the end.

In [1]: import io

In [2]: class MyTextIOWrapper(io.TextIOWrapper):
   ...:     def __del__(self):
   ...:         print("I've been GC'ed")
   ...:         

In [3]: def mangle2(x):
   ...:     MyTextIOWrapper(x)
   ...:     

In [4]: f2 = io.open('example', mode='rb')

In [5]: f2.closed
Out[5]: False

In [6]: mangle2(f2)
I've been GC'ed

In [7]: f2.closed
Out[7]: False

However, this does not work in Python 2:

In [7]: class MyTextIOWrapper(io.TextIOWrapper):
   ...:     def __del__(self):
   ...:         print("I've been GC'ed")
   ...:         

In [8]: def mangle2(x):
   ...:     MyTextIOWrapper(x)
   ...:     

In [9]: f2 = io.open('example', mode='rb')

In [10]: f2.closed
Out[10]: False

In [11]: mangle2(f2)
I've been GC'ed

In [12]: f2.closed
Out[12]: True

I've spent some time reading the Python source code, and the 2.7 and 3.4 versions look remarkably similar, so I don't understand why the __del__ inherited from IOBase cannot be overridden in Python 2 (it isn't even visible in dir), yet still seems to get executed. Python 3 works exactly as expected.

Is there anything I can do?


It turns out that basically nothing can be done about the destructor calling close in Python 2.7: this is hardcoded into the C code. Instead, we can modify close so that it won't close the buffer while __del__ is running (__del__ is executed before _PyIOBase_finalize in the C code, which gives us a chance to change the behaviour of close). This lets an explicit close work as expected without letting the GC close the buffer.

import io


class SaneTextIOWrapper(io.TextIOWrapper):
    def __init__(self, *args, **kwargs):
        self._should_close_buffer = True
        super(SaneTextIOWrapper, self).__init__(*args, **kwargs)

    def __del__(self):
        # Accept the inevitability of close being called by the destructor
        # because of this line in Python 2.7:
        # https://github.com/python/cpython/blob/2.7/Modules/_io/iobase.c#L221
        self._should_close_buffer = False
        # Actually close for Python 3, because there this method is an
        # override. We can't call super because Python 2 doesn't actually
        # have a `__del__` method for IOBase (hence this workaround).
        # Close is idempotent, so it won't matter that Python 2 will end
        # up calling it twice.
        self.close()

    def close(self):
        # We can't stop Python 2.7 from calling close in the destructor,
        # so instead we prevent the buffer from being closed with a flag.

        # Based on:
        # https://github.com/python/cpython/blob/2.7/Lib/_pyio.py#L1586
        # https://github.com/python/cpython/blob/3.4/Lib/_pyio.py#L1615
        if self.buffer is not None and not self.closed:
            try:
                self.flush()
            finally:
                if self._should_close_buffer:
                    self.buffer.close()
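
To check the behaviour described above, here is a small sketch that exercises both paths: a wrapper that is dropped and garbage collected, and a wrapper that is closed explicitly. The class body is repeated in condensed form only so the snippet runs on its own. The helper name wrap_and_discard and the temporary file are assumptions made for illustration, and I have only reasoned through the results for Python 3.

```python
import gc
import io
import os
import tempfile


class SaneTextIOWrapper(io.TextIOWrapper):
    # Condensed copy of the class above so this snippet is standalone.
    def __init__(self, *args, **kwargs):
        self._should_close_buffer = True
        super(SaneTextIOWrapper, self).__init__(*args, **kwargs)

    def __del__(self):
        self._should_close_buffer = False
        self.close()

    def close(self):
        if self.buffer is not None and not self.closed:
            try:
                self.flush()
            finally:
                if self._should_close_buffer:
                    self.buffer.close()


def wrap_and_discard(binary_file):
    # Hypothetical helper: the wrapper is unreferenced as soon as we return.
    SaneTextIOWrapper(binary_file, encoding='utf-8')


# Scratch file standing in for 'example' from the question.
fd, path = tempfile.mkstemp()
os.write(fd, b'hello\n')
os.close(fd)

f = io.open(path, mode='rb')
wrap_and_discard(f)
gc.collect()
closed_after_gc = f.closed        # the GC path should leave f open

wrapper = SaneTextIOWrapper(f, encoding='utf-8')
first_line = wrapper.readline()
wrapper.close()
closed_after_close = f.closed     # an explicit close still closes f

os.remove(path)
```

The explicit close is still the caller's responsibility here. The wrapper only stops the garbage collector from making that decision.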

My previous solution here used _pyio.TextIOWrapper, which is slower than the above because it is written in Python, not C.

It simply overrode __del__ with a no-op, which also works in Py2/3.
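
For reference, a minimal sketch of what that earlier approach might look like. The class name NoCloseOnGCTextIOWrapper and the helper wrap_and_discard are made up for illustration, and this is a reconstruction from the description above, not the original code.

```python
import _pyio
import gc
import io
import os
import tempfile


class NoCloseOnGCTextIOWrapper(_pyio.TextIOWrapper):
    # Hypothetical name. _pyio is the pure-Python io implementation, so
    # closing on GC happens in an ordinary __del__ that we can replace.
    def __del__(self):
        pass  # no-op: do not close the underlying buffer on GC


def wrap_and_discard(binary_file):
    # The wrapper becomes unreferenced as soon as this function returns.
    NoCloseOnGCTextIOWrapper(binary_file, encoding='utf-8')


# Scratch file standing in for 'example' from the question.
fd, path = tempfile.mkstemp()
os.write(fd, b'hello\n')
os.close(fd)

f = io.open(path, mode='rb')
wrap_and_discard(f)
gc.collect()
still_open = not f.closed   # the buffer should survive the wrapper

f.close()
os.remove(path)
```

The trade-off is speed: every read goes through Python-level code instead of the C implementation.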


A simple solution would be to return the variable from the function and store it in script scope, so that it does not get garbage collected until the script ends or the reference to it changes. But there may be other elegant solutions out there.
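
A minimal sketch of that idea, assuming a hypothetical helper named decode. The wrapper is returned to the caller, so the binary file stays open for as long as the caller holds the reference.

```python
import io
import os
import tempfile


def decode(binary_file, encoding='utf-8'):
    # Hypothetical helper: return the wrapper instead of dropping it, so
    # the caller's reference keeps it (and the buffer) alive.
    return io.TextIOWrapper(binary_file, encoding=encoding)


# Scratch file standing in for 'example' from the question.
fd, path = tempfile.mkstemp()
os.write(fd, b'hello\n')
os.close(fd)

f = io.open(path, mode='rb')
text = decode(f)                      # keep the reference in the caller
closed_while_referenced = f.closed    # wrapper is alive, so f is open
first_line = text.readline()

text.close()                          # closing the wrapper closes f too
os.remove(path)
```

This does mean tracking the wrapper's lifespan explicitly, which is what the question was hoping to avoid.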


EDIT:

I found a much better solution (comparatively), but I will leave this answer in case it is useful for anyone to learn from. (It is a pretty easy way to show off gc.garbage.)

Please do not actually use what follows.

OLD:

I found a potential solution, though it is horrible:

What we can do is set up a cyclic reference in the destructor, which will hold off the GC event. We can then look at the garbage of gc to find these unreferenceable objects, break the cycle, and drop that reference.

In [1]: import io

In [2]: class MyTextIOWrapper(io.TextIOWrapper):
   ...:     def __del__(self):
   ...:         if not hasattr(self, '_cycle'):
   ...:             print "holding off GC"
   ...:             self._cycle = self
   ...:         else:
   ...:             print "getting GCed!"
   ...:

In [3]: def mangle(x):
   ...:     MyTextIOWrapper(x)
   ...:     

In [4]: f = io.open('example', mode='rb')

In [5]: mangle(f)
holding off GC

In [6]: f.closed
Out[6]: False

In [7]: import gc

In [8]: gc.garbage
Out[8]: []

In [9]: gc.collect()
Out[9]: 34

In [10]: gc.garbage
Out[10]: [<_io.TextIOWrapper name='example' encoding='UTF-8'>]

In [11]: gc.garbage[0]._cycle=False

In [12]: del gc.garbage[0]
getting GCed!

In [13]: f.closed
Out[13]: True

Truthfully, this is a pretty horrific workaround, but it could be transparent to the API I am delivering. Still, I would prefer a way to override the __del__ of IOBase.
