Python and Multi-threading

It has been a few days since Python 2.6 has been out, and the word on the street is that it’s meant to ease the transition into Python 3k. Python3k is not backwards compatible to the 2.X releases. I haven’t had much time on my hands to get down and dirty with the new 2.6 release, but have had some time to read up on it.

Most people know that Python does have a threading API that is pretty darn close to Java’s. However, the way it has been implemented is that all threads need to grab hold of the Global Interpreter Lockto ensure that only one thread at any one time can execute within the Python VM. This is to ensure that all threads have the same “view” of all variables. Apparently they tried to avoid this by making the Python VM thread safe, but it did get a terrible performance hit.

Java tends to get around this by having a rather complex memory model within the Java VM where each thread has it’s own virtual memory. That’s why you have to synchronize various sections of your code to ensure that threads see the same variable states. I highly recommend reading up Doug Lea’s article on synchronization and the Java Memory Model for anyone who wants to do very intensive multi-threaded applications in Java.

So, what are the implications of having to grab hold of the Global Interpreter Lock in Python ? The problem is that it is not TRUE multi threading. You, as the programmer and designer (you DO design your solutions first, right?), will have to  plans on when threads should just go to sleep and allow other threads to run. The VM will not do this for you, and one might say that it really is closer to a single threaded VM. From past experience, I’ve found Python’s Threads to be really useful when I’m making blocking calls (for e.g. grabbing a DB connection, blocking APIs (yuck!)), and can do something else in the background while the main thread is sleeping. You could get around this problem by using sub-processes, but there was no easy way to do it, and you had to add a lot of boiler plate code every single time. There was just no support for a clean true multi-threaded interface out of a standard installation.

Now, in Python 2.6, there’s a new package for creating sub-processes called multiprocessing. After a quick glance, it looks very similar to the threading API, BUT instead of running threads, it creates a child process which has it’s own memory and in turn does not need to share it’s Global Interpreter Lock. My own prediction is that it comes at a cost of creating a new process and memory space efficiency. However, you do end up with a TRUE multi-threaded application that really uses all the available processor cores on a multi-core CPU. Considering that RAM is getting cheaper, and processors getting more cores built into them, I think this is a fair trade off.

As always, and this applies to Java as well – writing a true multi-threaded application is not trivial, and always do your homework before you get started! In the past, I always had to fallback to Java for the more intensive applications that I wrote because I always thought creating sub-processes in Python was too tedious. From now on, I have no excuses! The new package in Python 2.6 looks very neat and removes the need to write tons of boiler plate.