All Pythons are slow, but some are faster than others

by Itamar Turner-Trauring
Last updated 03 May 2024, originally created 08 Feb 2021

Python is not the fastest language around, so any performance boost helps, especially if you’re running at scale. It turns out that depending on where you install Python from, its performance can vary quite a bit: choosing the wrong build of Python can cut your speed by 10-20%.

Let’s look at some numbers.

Python 3.12 (May 2024)

I ran a quick experiment with Python 3.12 using the “official” Python Docker image (python:3.12) and Ubuntu 24.04 (ubuntu:24.04).

Python 3.12 build	Pystone (higher is better)
Docker `python:3.12`	670,000
Ubuntu 24.04	750,000

The Ubuntu 24.04 build is noticeably faster!
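
To put a number on “noticeably faster”, here is a minimal sketch of the arithmetic using the two Pystone scores from the table. The helper name `percent_faster` is just for illustration.

```python
# Pystone scores from the table above (higher is better).
docker_score = 670_000
ubuntu_score = 750_000


def percent_faster(fast: float, slow: float) -> float:
    """Return how much faster `fast` is than `slow`, as a percentage of `slow`."""
    return (fast / slow - 1) * 100


speedup = percent_faster(ubuntu_score, docker_score)
print(f"Ubuntu 24.04 scores about {speedup:.0f}% higher than python:3.12")
```

That works out to roughly a 12% difference on this one benchmark.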

Older runs

In previous versions of Python the results varied. For Python 3.9, there were performance differences; for Python 3.10, I saw none.

Comparing builds of Python (February 2021)

I ran three benchmarks from the pyperformance suite on four different builds of Python 3.9 (code is here):

python:3.9-buster, the “official” Python Docker image.
Ubuntu 20.04, via the ubuntu:20.04 Docker image.
Anaconda Python on ubuntu:20.04.
Conda-Forge on ubuntu:20.04.

If you’re not familiar with Conda, it’s a packaging system which includes precompiled versions of pretty much all libraries and executables (including Python), other than the standard C library. The company that created Conda, Anaconda, provides a package channel, and there is a community project called Conda-Forge that provides Python and thousands of other packages.

All of the benchmark runs were inside a Docker container, on Fedora 33, with an Intel Xeon CPU E3-1226 v3 @ 3.30GHz. Docker seems to have a surprisingly high performance hit on Fedora, but that ought to be equalized across runs of the same benchmarks, and I got similar results in previous benchmarking runs using Podman (which has lower overhead but mysteriously broke over the weekend).

Here are the results with mean and stddev; each run was done 10 times, but I also did multiple runs with similar results. Lower is better, since we’re measuring elapsed time. The Conda-Forge Python is fastest, followed by Ubuntu 20.04; Docker’s official Python image is the slowest.

Python build	2to3	django_template	unpickle_pure_python
Conda-Forge	491ms ± 3ms	78ms ± 0.8ms	514us ± 5us
Ubuntu 20.04	512ms ± 3ms	80ms ± 0.7ms	537us ± 7us
Anaconda	523ms ± 5ms	86ms ± 2.3ms	550us ± 3us
`python:3.9-buster`	543ms ± 3ms	92ms ± 0.6ms	590us ± 12us
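
To see how big these gaps are in relative terms, the following sketch computes each build’s slowdown against Conda-Forge from the mean timings in the table. It ignores the standard deviations, and the function name `slowdown_percent` is made up for this example.

```python
# Mean timings from the Python 3.9 table above (lower is better).
# 2to3 and django_template are in milliseconds, unpickle_pure_python in microseconds.
results = {
    "Conda-Forge": {"2to3": 491, "django_template": 78, "unpickle_pure_python": 514},
    "Ubuntu 20.04": {"2to3": 512, "django_template": 80, "unpickle_pure_python": 537},
    "Anaconda": {"2to3": 523, "django_template": 86, "unpickle_pure_python": 550},
    "python:3.9-buster": {"2to3": 543, "django_template": 92, "unpickle_pure_python": 590},
}


def slowdown_percent(build: str, baseline: str = "Conda-Forge") -> dict:
    """Percentage by which `build` is slower than `baseline`, per benchmark."""
    return {
        bench: (results[build][bench] / results[baseline][bench] - 1) * 100
        for bench in results[baseline]
    }


for build in results:
    row = slowdown_percent(build)
    print(build, ", ".join(f"{bench}: +{pct:.1f}%" for bench, pct in row.items()))
```

On these numbers the official Docker image comes out roughly 11% to 18% slower than Conda-Forge, depending on the benchmark.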

Why the differences?

It turns out that compiling Python for maximum performance is actually quite tricky: it involves profile-guided optimization (PGO), where profiling data from runs of real code is used to guide the compiler, and a variety of knobs you can tweak.

One knob is whether the core of the Python implementation lives in a shared library or in the python executable itself. The shared library version tends to be slower.

Python build	`python` links to `libpython.so`?
Conda-Forge	No
Ubuntu 20.04	No
Anaconda	No
`python:3.9-buster`	Yes
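
If you want to check your own interpreter, one rough way is to ask `sysconfig` how it was configured. This is a sketch, assuming a Linux build where the `Py_ENABLE_SHARED` config variable is set. It reports the build configuration and does not inspect the binary, so running `ldd` on the executable is a more direct check.

```python
import sysconfig


def uses_shared_libpython() -> bool:
    """Return True if this interpreter was configured with --enable-shared."""
    # Py_ENABLE_SHARED is 1 for shared-library builds and 0 otherwise;
    # it may be missing (None) on some platforms, which we treat as False.
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


print("Built to link against a shared libpython:", uses_shared_libpython())
```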

Fedora and RHEL also use the shared library version of Python. However, they use the -fno-semantic-interposition option to speed things up, as will Python 3.10 by default in the future. The “official” python Docker image doesn’t use this option yet. That explains at least part of why the official Docker image is so much slower.

That isn’t enough to explain all these differences in performance, however, nor the differences between the other builds. Other factors might include how and whether they do profile-guided optimization, other compiler flags, compiler versions, glibc differences, and more.
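
Some of these build choices can be inspected from within Python. The sketch below looks for a few well-known options in the recorded configure arguments and compiler flags. It assumes a POSIX build where `sysconfig` exposes `CONFIG_ARGS` and the various `CFLAGS` variables. Packagers can set flags in ways this won’t catch, so treat a False as “not detected” rather than “not used”.

```python
import sysconfig


def build_flags() -> dict[str, bool]:
    """Best-effort detection of a few performance-related build options."""
    # Arguments originally passed to ./configure (empty if unavailable).
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    # Compiler flags can be recorded under several variable names.
    cflags = " ".join(
        str(sysconfig.get_config_var(name) or "")
        for name in ("CFLAGS", "PY_CFLAGS", "CFLAGS_NODIST", "PY_CFLAGS_NODIST")
    )
    return {
        "profile-guided optimization (--enable-optimizations)":
            "--enable-optimizations" in config_args,
        "link-time optimization (--with-lto)": "--with-lto" in config_args,
        "-fno-semantic-interposition": "-fno-semantic-interposition" in cflags,
    }


for option, detected in build_flags().items():
    print(f"{option}: {'detected' if detected else 'not detected'}")
```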

My hope is that the various organizations doing Python builds continue to learn from each other, and perhaps even set up a cross-build performance comparison. The baseline performance of Python could be a lot better than it is today in almost all installed environments.

Python 3.10 (May 2022)

Python 3.10 has different optimization settings, and I was able to test with newer releases of Debian (Bullseye has superseded Buster) and Ubuntu (22.04) with newer compilers. So I decided to rerun these tests with newer images. These results may not be comparable in absolute terms to 3.9 numbers; more useful is comparing them to each other. Again, lower is better:

Python build	2to3	django_template	unpickle_pure_python
Conda-Forge	479ms ± 35ms	55.6ms ± 1.0ms	367us ± 5us
Ubuntu 22.04	486ms ± 29ms	55.6ms ± 1.4ms	379us ± 5us
`python:3.10-bullseye`	471ms ± 32ms	55.4ms ± 1.7ms	358us ± 4us

With Python 3.10, I am not seeing any meaningful difference between the different Python builds.

Takeaways: be careful which Python you choose

I was surprised at how much performance variation there was between different builds of Python 3.9 and 3.12. Perhaps this was a benchmarking failure on my part, but they are configured differently. And there are many other builds I haven’t tested, like the deadsnakes PPA that provides additional Python versions for Ubuntu, the Python in Red Hat Enterprise Linux, and more. The results also seem to vary over time, as different Linux distributions and packagers switch versions, compilers, and compiler versions. Your best bet: benchmark your own application.
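
As a starting point for that, here is a minimal sketch of a timing harness that reports mean and standard deviation over 10 runs, in the same form as the tables above. The `workload` function is a hypothetical stand-in: swap in a representative piece of your own application, then run the same script under each Python build you’re considering.

```python
import statistics
import timeit


def workload() -> int:
    """Hypothetical stand-in for your application's hot path."""
    return sum(i * i for i in range(100_000))


def benchmark(func, repeats: int = 10) -> tuple[float, float]:
    """Call `func` `repeats` times; return (mean, stddev) of elapsed seconds."""
    timings = timeit.repeat(func, number=1, repeat=repeats)
    return statistics.mean(timings), statistics.stdev(timings)


mean, stddev = benchmark(workload)
print(f"{mean * 1000:.1f}ms ± {stddev * 1000:.1f}ms")
```

For anything more serious, a dedicated tool such as the pyperformance suite used above handles warmup and system noise more carefully than this sketch does.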
