Speed up your Python: Unladen vs. Shedskin vs. PyPy vs. Cython vs. C

Lately I’ve found that prototyping code in a higher-level language like Python is more enjoyable, readable, and time-efficient than coding directly in C (against my usual instinct when writing something that I think needs to go fast). Mostly this is because of the simpler syntax of the higher-level language, and because I’m not caught up in the mundane aspects of optimizing my code. That said, it is still desirable to make portions of the code run efficiently, and that is where creating a C/ctypes module is called for.

I recently created an application (I won’t go into the details now) that had a portion of it that could be significantly sped up if compiled as a C module. This spawned a whole exploration into speeding up my Python code (ideally while making minimal modifications to it).

I created a C module directly, used the Shedskin compiler to compile my Python code into C++, and tried the JIT solutions PyPy and Unladen Swallow. The timing results for running the first few iterations of this application were surprising to me:

cpython: 59.174s
shedskin: 78.428s (1m18.428s)
c-stl: 12.515s
pypy: 10.316s
unladen: 44.050s
cython: 39.824s

While this is not an exhaustive test, PyPy consistently beats a handwritten module using C++ and the STL! Moreover, PyPy required little modification to my source (itertools had some issues) [1]. I’m surprised that Unladen and Shedskin took so long (all code was compiled at -O3 optimization and run on multiple systems to make sure the performance numbers were relatively consistent).

Apparently out-of-the-box solutions these days can offer close to a 6x improvement over default Python for a particular application (59.174s vs. 10.316s here), and I wonder which aspects of PyPy’s system account for this large performance improvement (their JIT implementation?).

[1] Unladen required no modifications to my program to run properly and Shedskin required quite a few to get going. Of course, creating a C-based version took a moment :-).
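The itertools change needed for PyPy (re-coding ‘product’ as nested for loops, as noted in the comments) can be illustrated with a minimal sketch. The function names and the two-iterable case are assumptions made for illustration; this is not the code from the application.

```python
from itertools import product


def pairs_with_product(xs, ys):
    """Cartesian product of two sequences using itertools.product."""
    return list(product(xs, ys))


def pairs_with_loops(xs, ys):
    """Same result written as explicit nested for loops.

    Note: ys must be re-iterable (a list, tuple, or string), because the
    inner loop walks over it once per element of xs.
    """
    result = []
    for x in xs:
        for y in ys:
            result.append((x, y))
    return result
```

Both functions return the pairs in the same order, so one can be swapped for the other when an implementation lacks itertools.product.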

Update 1: Thanks for the comments below. I added Cython, re-ran the analysis, and emailed off the source to those who were interested.

Update 2: The main meat of the code is a nested for loop that does string slicing and comparisons, and it turns out that the slicing and comparisons were the bottleneck for Shedskin. The new numbers are below, with a faster matching function used for all tests (note that this kind of change requires what I call ‘code twiddling’, where we find ourselves fiddling with a very straightforward, readable set of statements to gain efficiency).

cpython: 59.593s
shedskin0.6: 8.602s
shedskin0.7: 3.332s
c-stl: 1.423s
pypy: 8.947s
unladen: 29.163s
cython: 26.486s (3.5s after adding a few types)

So C comes out the winner here, but Shedskin and Cython are quite competitive. PyPy’s JIT performance is impressive and I’ve been scrolling through some of the blog entries on their website to learn more about why this could be. Thanks to Mark (Shedskin) and Maciej (PyPy) for their comments in general, and to Mark for profiling the various Shedskin versions himself and providing a matching function. It would be interesting to see if the developers of Unladen and Cython have some suggestions for improvement.
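To put the Update 2 timings in relative terms, here is a small sketch that computes each tool’s speedup over CPython. The numbers are copied from the list above; the dictionary keys and the speedups helper are names chosen for illustration.

```python
# Wall-clock times in seconds, copied from the Update 2 timings.
timings = {
    "cpython": 59.593,
    "shedskin0.6": 8.602,
    "shedskin0.7": 3.332,
    "c-stl": 1.423,
    "pypy": 8.947,
    "unladen": 29.163,
    "cython": 26.486,
    "cython (typed)": 3.5,  # approximate figure quoted in the post
}


def speedups(times, baseline="cpython"):
    """Return how many times faster each entry is than the baseline."""
    base = times[baseline]
    return {name: base / seconds for name, seconds in times.items()}


if __name__ == "__main__":
    ranked = sorted(speedups(timings).items(), key=lambda kv: kv[1], reverse=True)
    for name, factor in ranked:
        print(f"{name:15s} {factor:5.1f}x")
```

With these numbers the hand-written C++ comes out roughly 42x faster than CPython, Shedskin 0.7 about 18x, typed Cython about 17x, and PyPy just under 7x.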

I also think it’s important not to look at this comparison as a ‘bake-off’ to see which one is better. PyPy is doing some very different things than Shedskin, for example. Which one you use at this point will likely be highly dependent on the application and your urge to create more optimized code. I think in general hand-writing C code and code-twiddling it will almost always get faster results, but this comes at the cost of time and headache. In the meantime, the folks behind these tools are making it more feasible to take our Python code and optimize it right out of the box.
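As an illustration of the ‘code twiddling’ trade-off, here is a hypothetical sketch of a slicing-based match next to a hand-tuned one that avoids building a new string on every iteration. The actual application code and Mark’s matching function were not published, so the function names and the counting task are assumptions.

```python
def count_matches_sliced(text, pattern):
    """Readable version: slice out a substring at each offset and compare."""
    m = len(pattern)
    count = 0
    for i in range(len(text) - m + 1):
        if text[i:i + m] == pattern:  # creates a new string each time
            count += 1
    return count


def matches_at(text, pattern, start):
    """Compare pattern against text at offset start, one character at a time."""
    for j in range(len(pattern)):
        if text[start + j] != pattern[j]:
            return False
    return True


def count_matches_twiddled(text, pattern):
    """'Twiddled' version: same result, but no intermediate slices."""
    m = len(pattern)
    count = 0
    for i in range(len(text) - m + 1):
        if matches_at(text, pattern, i):
            count += 1
    return count
```

Which form is faster depends on the implementation: a compiler such as Shedskin may benefit from skipping the slice allocation, while in plain CPython the sliced version may well be quicker. The second form is also clearly harder to read, which is the cost described above.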

Update 3: I also added (per request below :-)) just a few basic ‘cdef’s and types to my Cython version. It does a lot better, getting about 3.5s on average per run!

21 comments

fijal · November 25, 2010

Hey.

Can I find source code to reproduce those results?

Cheers,
fijal

mark dufour · November 25, 2010

hi,

I’d like to know why shedskin performs so badly here. if you don’t wish to publish your source code, could you perhaps send it to me in private?

thanks!

Geet · November 25, 2010

Hi Mark and Fijal — thanks for the quick responses!

I’ll try to comment-up the source and send it to you both in the next couple of days. (I do not wish to publish the source because it’s a portion of a class lab that I simply wanted to speed up :-))

I’ll also send a README with exactly what I did for each test. Again, they weren’t exhaustive and not meant to be a ‘show-down’. I just wanted to see what worked fastest for me.

Any corrections/comments to my procedures would be greatly appreciated and I’ll update my entry accordingly.

Nelle · November 25, 2010

Hello,

I would also be interested in seeing the code. Would it be possible for you to send it to me as well ?

Many thanks

Mike R · November 25, 2010

You’d mentioned itertools had issues with PyPy. Can you elaborate?

I’ve been trying to find less time-intensive ways of speeding up code than rewriting the slow parts in C. It sounds like PyPy might be a good starting point, but itertools makes it into virtually all the complicated code.

joaquin · November 25, 2010

Why didn’t you try Cython?

bryan · November 25, 2010

Adding Cython to your list would be interesting.

Geet · November 25, 2010

Hi Mike — my only issue with itertools was having to re-code ‘product’ as nested for loops. Joaquin and Bryan — I added Cython in as well. Thanks for the comments!

gregor · November 26, 2010

Hello. I would like to dig deeper into this subject. Could you send some source code that I could start with?
I want to check why Unladen is so slow.
Thanks.

Carl Friedrich Bolz · November 26, 2010

Just wanted to add that in the meantime, PyPy has added better itertools support (which took about 20 minutes 🙂 ). Not sure about product though.

Geet · November 26, 2010

Carl — nice to hear that itertools support is improved.
Gregor — I sent you a version of the test code so you can check out Unladen

mark dufour · November 27, 2010

note that shedskin can probably generate even faster code when you use -bw flags (avoid checking for index-out-of-bounds or wrap-around). this reduces the runtime by about 20% here.

Rob · November 28, 2010

Are there plans in PyPy to support the multiprocessing module? I’m looking to try it out for one of my projects, but the lack of multi-threading support makes it a no-go for the time being.

mark dufour · November 28, 2010

do you really need threads, or would processes work too..?

Pingback: Python Hatchlings part 0 | RoBlog

Robert · November 30, 2010

Did you annotate any types in the Cython code? This is how one typically gets speeds close to that of pure C. I’d love to see how fast I could get it going with Cython.

Geet · November 30, 2010

Hi Robert — thanks for the comment! I have not done any type annotation. I just tried out some very basic type annotations and I get running times around 3.5 seconds — does make a big difference! (I’ll update the post sometime today)

Francisco Costa · February 28, 2011

Hi, can you send me that source code?
thanks!

Geet · February 28, 2011

Sure, I’ll send it to your email

louis · June 24, 2011

Geet,
I have been searching high and low for such a comparison as I am moving from matlab. Another request for source please.

dr.benton · August 23, 2011

Hi Geet —

I’m trying to get my head around how to incorporate pypy into standard python to achieve these kind of speed-ups.

Could i ask you to please pass me your pypy source code, and maybe also the appropriate “import” statements (or equiv) to fold the pypy bits into vanilla python?

And any links that you found helpful or could recommend would be very greatly appreciated. I am seriously struggling to wade through what documentation there is/i can find!

Geet Duggal's Blog
