Why do I have to worry about thread safety in CPython?

From what I understand, the Global Interpreter Lock allows only a single thread to access the interpreter and execute bytecode. If that's the case, then at any given time, only a single thread will be using the interpreter and its memory.

With that, I believe it is fair to rule out race conditions, since no two threads can access the interpreter's memory at the same time, yet I still see warnings about making sure data structures are "thread safe". Perhaps those warnings are meant to cover all Python implementations, including ones that have no GIL (such as Jython) or code that can release it (such as Cython extensions), where true multithreading is possible.

I understand the importance of thread safety in interpreter environments that do not have a GIL. However, for CPython, why is thread safety encouraged when writing multithreaded Python code? What is the worst that can happen in CPython?


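Of course race conditions can still take place, because access to data structures is not atomic: a single Python-level operation usually spans several bytecode instructions, and CPython may switch to another thread between them.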
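To make that concrete: even an augmented assignment such as counter += 1 compiles to separate load, add, and store instructions. A minimal sketch using the standard dis module (counter and increment are illustrative names, and the exact opcode names differ between CPython versions):

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1  # looks like one step, but compiles to several bytecodes

# Print the bytecode. CPython may switch threads between these instructions;
# the exact switch points depend on the interpreter version.
dis.dis(increment)

# Collect the opcode names so the separate load and store can be inspected
opnames = [ins.opname for ins in dis.get_instructions(increment)]
```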

Say you test for a key being present in a dictionary, then do something to add the key:

if key not in dictionary:
    # calculate new value
    value = elaborate_calculation()
    dictionary[key] = value

The thread can be switched at any point after the "not in" test has returned True, and another thread can then also conclude that the key isn't there. Now two threads are doing the calculation, and you don't know whose value will end up in the dictionary.
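That interleaving can be forced on purpose. This is a minimal sketch, not a realistic program: elaborate_calculation is a hypothetical stand-in, and the threading.Barrier exists only to make both threads pass the membership test before either one writes the key. In real code this happens only occasionally, which is exactly why such bugs are hard to reproduce.

```python
import threading

dictionary = {}
calls = []  # records which thread ran the (hypothetical) expensive calculation
both_checked = threading.Barrier(2)  # forces the unlucky interleaving on every run

def elaborate_calculation():
    # Hypothetical stand-in for the expensive work in the snippet above
    calls.append(threading.current_thread().name)
    return len(calls)

def worker(key):
    if key not in dictionary:
        # Both threads have passed the check before either one writes the key
        both_checked.wait()
        value = elaborate_calculation()
        dictionary[key] = value  # the last writer wins

threads = [threading.Thread(target=worker, args=("answer",), name=f"t{i}") for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls), dictionary)  # the calculation ran twice
```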

All that the GIL does is protect Python's internal interpreter state. This doesn't mean that data structures used by Python code itself are now locked and protected.
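The usual remedy is to make the check and the insert a single critical section guarded by a lock of your own. A minimal sketch (elaborate_calculation and get_or_compute are hypothetical names):

```python
import threading

dictionary = {}
dictionary_lock = threading.Lock()  # guards the check and the insert together
calls = 0

def elaborate_calculation():
    # Hypothetical stand-in for the expensive work
    global calls
    calls += 1  # safe: only ever called while dictionary_lock is held
    return 42

def get_or_compute(key):
    # Holding the lock makes "check, then insert" one indivisible step
    # for every thread that goes through this function.
    with dictionary_lock:
        if key not in dictionary:
            dictionary[key] = elaborate_calculation()
        return dictionary[key]

threads = [threading.Thread(target=get_or_compute, args=("answer",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(calls, dictionary)  # → 1 {'answer': 42}
```

Holding the lock during the calculation serializes all callers, even for different keys. That is the simplest correct choice; finer-grained schemes such as per-key locks trade simplicity for throughput.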


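An important note: with the multiprocessing module, code runs truly concurrently despite the GIL, because each process has its own interpreter and its own GIL. If the processes share state (for example through multiprocessing.Value, Array, or a Manager), the same variable can be accessed by different processes simultaneously.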

This is likely to corrupt your data, or at least disrupt your control flow, which is why thread safety is recommended.
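A minimal sketch of that risk with memory shared between processes, assuming a POSIX system where the "fork" start method is available (chosen only to keep the example short; add_many and run are hypothetical helper names). multiprocessing.Value places an int in shared memory, but counter.value += 1 is still a separate read and write, so unless the value's lock is held across both, updates from different processes can be lost:

```python
import multiprocessing

N_PROCS = 4
N_INCREMENTS = 1000

def add_many(counter, use_lock):
    for _ in range(N_INCREMENTS):
        if use_lock:
            # get_lock() returns the lock that guards this shared Value
            with counter.get_lock():
                counter.value += 1
        else:
            # Read-modify-write on shared memory: another process can
            # update the value between this read and this write
            counter.value += 1

def run(use_lock):
    # Assumption: POSIX system with the "fork" start method available
    ctx = multiprocessing.get_context("fork")
    counter = ctx.Value("i", 0)  # an int living in shared memory
    procs = [ctx.Process(target=add_many, args=(counter, use_lock)) for _ in range(N_PROCS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value

if __name__ == "__main__":
    print("with lock:   ", run(use_lock=True))   # always N_PROCS * N_INCREMENTS
    print("without lock:", run(use_lock=False))  # may come out lower
```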

As to why it happens: each process runs its own interpreter, so there is nothing stopping (at least as far as I can tell) two already-compiled pieces of code in different processes from accessing the same parts of shared memory at the same time. When doing, say:

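import multiprocessing

def my_func():
    print("hello world")

if __name__ == "__main__":
    # The guard is required when the "spawn" start method is used (Windows, macOS)
    my_process = multiprocessing.Process(target=my_func, args=())
    my_process.start()
    my_process.join()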

My understanding is that, in this case, the time it takes to interpret my_func is buried in the overhead of spawning a new process.

The term "process" is apt here: a new operating-system process is started, and data passed to or from it has to be copied (multiprocessing.Queue, for instance, pickles objects and uses a background feeder thread to send them), so there is some data handshaking going on. That makes it quite a different process (pun intended) from spawning a traditional thread.
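A small sketch of that copying, again assuming the "fork" start method on a POSIX system and using hypothetical helper names: the child modifies its own copy of a list, and the only way the change reaches the parent is by sending the object back through a multiprocessing.Queue, which pickles it.

```python
import multiprocessing

def append_in_child(items, results):
    # Mutates the child's own copy of the list, not the parent's
    items.append("added in child")
    # Queue.put pickles the object; a background feeder thread sends the bytes
    results.put(items)

def demo():
    # Assumption: POSIX system where the "fork" start method is available
    ctx = multiprocessing.get_context("fork")
    items = ["created in parent"]
    results = ctx.Queue()
    p = ctx.Process(target=append_in_child, args=(items, results))
    p.start()
    received = results.get()  # read before join() so the child can flush the queue
    p.join()
    return items, received

if __name__ == "__main__":
    original, received = demo()
    print(original)  # → ['created in parent']
    print(received)  # → ['created in parent', 'added in child']
```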

I hope this helps.

Link: http://www.djcxy.com/p/15170.html
