Speed Up Python with PyPy

python
code
Published

February 22, 2024

As an analyst, you’re likely familiar with the traditional Python development cycle:

  1. Write some code.
  2. Run the script.
  3. Review the outputs (via print statements, logs, etc.).
  4. Refine and optimize.

But when the script takes a long time to run, this cycle becomes frustratingly slow.

There’s a widespread myth that Python, by default, is slow, and little can be done to change that. But I’m here to dispel that myth.

The truth is, pure-Python code can often be sped up several-fold without much difficulty, as the example below shows. The key is knowing the right tools and techniques, such as PyPy. Why PyPy, you ask? Because sometimes simply changing how you execute a script is enough to speed it up significantly.

The best part is, there’s no need to spend valuable time rewriting your code. It’s essentially a free lunch.

You start by writing your Python code, which is then compiled into an internal format known as “Python bytecode”. The standard interpreter, CPython, then executes that bytecode one instruction at a time, working out at each step which machine-level operations to perform.
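To make the bytecode step more concrete, here is a minimal sketch using the standard library's dis module. The function total is a hypothetical example, not part of the original script, and the exact instructions printed differ between Python versions.

```python
import dis


def total(values):
    # Hypothetical loop-heavy function, used only for illustration.
    result = 0
    for v in values:
        result += v
    return result


# Collect the bytecode instructions the interpreter compiled for total().
instructions = list(dis.get_instructions(total))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```

The instructions between the loop's start and its jump back are the ones the interpreter has to dispatch again on every iteration.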

from IPython.display import display, Image as dImage
from PIL import Image
import scipy
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
path="../resources/images/a5714f29-c749-419b-a73b-d48dae184a56.png"
display(Image.open(path))

And yes, when your code contains a loop, the interpreter repeats this work for the same instructions on every single iteration.

That’s where PyPy and its Just-In-Time (JIT) compiler come into play. Think of PyPy as a more advanced bytecode executor, like an observant supervisor monitoring the process. When it notices a fragment of code being executed over and over, it steps in and optimizes it.

How does it do this? PyPy uses techniques such as type specialization and compilation of frequently executed code to machine code, so the same work is not repeated on every iteration, which speeds up execution.
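One rough way to observe this behaviour is to time the same loop several times in a row. The sketch below is only an illustration: work and time_rounds are hypothetical helpers, and the round count and loop size are arbitrary. Under a JIT the first round typically includes warm-up cost and later rounds tend to run faster, while under CPython the rounds usually take about the same time.

```python
import time


def work(n):
    # Pure-Python loop: the kind of code a JIT can specialize.
    acc = 0
    for i in range(n):
        acc += i * i % 7
    return acc


def time_rounds(rounds=5, n=200_000):
    # Time the same call repeatedly to expose any warm-up effect.
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        work(n)
        timings.append(time.perf_counter() - start)
    return timings


timings = time_rounds()
for k, t in enumerate(timings, 1):
    print(f"round {k}: {t:.4f}s")
```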

In essence, PyPy is a powerful tool that can help you eliminate redundancy and optimize your code’s performance. So, the next time you find your code stuck in the endless loop of translation, remember PyPy might just be the solution you need.

Here is a quick example:

import random

random.seed(1)

def main():
    N = 1_000_000
    res = [0] * N
    for i in range(N):
        # Sum 100 random integers in [0, 10], then scale by N.
        dat = (random.randint(0, 10) for _ in range(100))
        res[i] = sum(dat) / N
    print(res[:10])

if __name__ == "__main__":
    main()

Running it with CPython gives:

time python main.py
[0.000528, 0.000502, 0.000493, 0.000463, 0.000477, 0.000524, 0.000534, 0.000549, 0.000478, 0.000482]
python main.py  19,25s user 0,02s system 99% cpu 19,271 total

and the same script with PyPy as the active python interpreter:

time python main.py
[0.000528, 0.000502, 0.000493, 0.000463, 0.000477, 0.000524, 0.000534, 0.000549, 0.000478, 0.000482]
python main.py  2,39s user 0,03s system 100% cpu 2,420 total

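The results speak for themselves - total run time dropped from 19.3 s to 2.4 s, a speedup of roughly eight times using PyPy. However, it’s important to note that the size of the gain depends on the workload: this script is pure Python and spends nearly all of its time in a loop.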
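For reference, the speedup follows directly from the two "total" wall-clock times reported above. A minimal sketch of the arithmetic, with the values copied from the timing output and the decimal commas written as points:

```python
# Wall-clock totals from the two runs above, in seconds.
cpython_total = 19.271
pypy_total = 2.420

speedup = cpython_total / pypy_total
print(f"speedup: {speedup:.2f}x")  # prints "speedup: 7.96x"
```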

So, what’s the takeaway here?

If your code is predominantly written in pure Python and contains a good number of loops, you should definitely give PyPy a shot. It could result in easy gains and a considerable boost in efficiency.

And the best part? The transition from CPython to PyPy is a breeze. In my experience, it took just two commands to make the switch.
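After switching, it is worth confirming which interpreter actually runs your script, since the command can look identical in both cases. A small sketch using the standard library's platform module; the warning message is just an illustration:

```python
import platform
import sys

# Returns the implementation name, for example "CPython" or "PyPy".
implementation = platform.python_implementation()
print(implementation, sys.version.split()[0])

if implementation != "PyPy":
    print("Note: this script is not running under PyPy.")
```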

I genuinely hope that the insights shared here will prove beneficial in your coding journey, saving you precious time and making your development process more efficient.

Happy coding!