NumPy on PyPy with a side of Psymeric

I’ve been playing around with PyPy lately (more on that later) and decided to see how the NumPy implementation on PyPy (NumPyPy[1]) is coming along. NumPyPy is potentially very interesting. Because the JIT can remove most of the Python overhead, more of the code can be moved to the Python level. This in turn opens up all sorts of interesting avenues for optimization, including doing the things that Numexpr does, only better. I was therefore very excited when I saw this post from 2012, describing how NumPyPy was running “relatively real-world” examples over twice as fast as standard NumPy.

I downloaded and installed the NumPyPy code from https://bitbucket.org/pypy/numpy. This went smoothly, except that I had to spend a bit of time messing with permissions. I’m not sure if this was something I did on my end, or if the permissions of the source are odd. In either event, installation was pretty easy. I first tested the speed of NumPyPy using the micro-optimization code from last week, and that was my first indication that this wasn’t going to be as impressive as I’d hoped. NumPyPy was over 10× slower than standard NumPy when running this code!

I did some more digging around and found an email chain that described how the NumPyPy developers are focusing on completeness before speed. That’s understandable, but certainly isn’t as exciting as a very fast, if incomplete, version of NumPy.

I tried another, very simple example using timeit; in this case standard NumPy was about 7× faster than NumPyPy:

$ python -m timeit -s "import numpy as np; a = np.arange(100000.0); b=a*7"  "x = a + b"
10000 loops, best of 3: 71.3 usec per loop
$ pypy-2.3.1-osx64/bin/pypy -m timeit -s "import numpy as np; a = np.arange(100000.0); b=a*7"  "x = a + b"
1000 loops, best of 3: 953 usec per loop
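The same kind of measurement can be scripted with the standard-library timeit module, which is handy when the comparison needs to be rerun under several interpreters. This is only a minimal sketch: best_usec and NUMPY_SETUP are names made up for this illustration, and the loop count is fixed rather than auto-tuned the way the command-line tool does it.

```python
import timeit


def best_usec(stmt, setup="pass", repeat=3, number=1000):
    """Return the best per-loop time in microseconds.

    Roughly mirrors the "best of 3" figure that `python -m timeit`
    reports, but with a fixed number of loops per repetition.
    """
    timer = timeit.Timer(stmt, setup=setup)
    totals = timer.repeat(repeat=repeat, number=number)
    return min(totals) / number * 1e6


# Same setup string as in the command lines above.
NUMPY_SETUP = "import numpy as np; a = np.arange(100000.0); b = a * 7"

if __name__ == "__main__":
    try:
        print("%.1f usec per loop" % best_usec("x = a + b", NUMPY_SETUP))
    except ImportError:
        # The setup string imports numpy, so this fails cleanly
        # on an interpreter that has no numpy installed.
        print("numpy is not available under this interpreter")
```

Running the script once with python and once with pypy should give numbers comparable to the ones above, although the exact values will depend on the machine and the interpreter versions.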

Just for fun, I dusted off Psymeric.py, a very old replacement for Numeric.array that I wrote to see what kind of performance I could get using Psyco plus Python.  There is a copy of Psymeric hosted at https://bitbucket.org/dblank/pure-numpy/src, although I had to tweak that version slightly to ignore Psyco and run under both Python 2 and 3.  Running the equivalent problem with Psymeric using both CPython and PyPy gives an interesting result:

$ python -m timeit -s "import psymeric as ps; a = ps.Array(range(100000), ps.Float64); b=a*7"  "x = a + b"
10 loops, best of 3: 30.7 msec per loop
$ pypy-2.3.1-osx64/bin/pypy -m timeit -s "import psymeric as ps; a = ps.Array(range(100000), ps.Float64); b=a*7"  "x = a + b"
1000 loops, best of 3: 510 usec per loop

Running with CPython this is, predictably, pretty terrible (note the units are ms in this case versus µs in the other cases). However, when run with PyPy, this is actually faster than NumPyPy. Keep in mind that Psymeric is pure Python and we are just relying on PyPy’s JIT to speed it up.
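To give a feel for what “pure Python” means here, the following is a minimal sketch of an array class whose arithmetic is an explicit Python loop. It is not Psymeric’s actual code or API; TinyArray is a hypothetical name invented for this illustration, and it handles only 1-D float data. A loop like the one in __add__ is the kind of code that is slow under CPython’s interpreter and that a tracing JIT can compile down to something much tighter.

```python
from array import array


class TinyArray(object):
    """Hypothetical 1-D float array; NOT Psymeric's actual API."""

    def __init__(self, values):
        # array("d", ...) stores the values as contiguous C doubles.
        self.data = array("d", values)

    def __len__(self):
        return len(self.data)

    def __add__(self, other):
        if len(other) != len(self):
            raise ValueError("length mismatch")
        a, b = self.data, other.data
        out = array("d", a)  # start from a copy of the left operand
        # Explicit element-by-element loop, executed at the Python level.
        for i in range(len(out)):
            out[i] = a[i] + b[i]
        return TinyArray(out)

    def __mul__(self, scalar):
        return TinyArray(x * scalar for x in self.data)


# Same shape of problem as the timings above.
a = TinyArray(range(100000))
b = a * 7
x = a + b
```

Timing x = a + b on this class under CPython and PyPy should show the same qualitative gap as the Psymeric numbers, though the absolute figures will differ.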

These results made me suspect that NumPyPy was also written in Python, but that appears to not be quite right. It appears that the core of NumPyPy is written in RPython, the same subset of Python that PyPy itself is written in. This allows the core to be translated into C. However, as I understand it, in order for this to work, the core needs to be part of PyPy proper, not a separate module. And that in fact appears to be the case: the core parts of numpy are contained in the module _numpy, defined in the PyPy source directory micronumpy, and they are imported by the NumPyPy package, which is installed separately. If all that sounds wishy-washy, it’s because I’m still very unsure about how this works, but this is my best guess at the moment.

This puts NumPyPy in an odd position. One of the main attractions of PyPy from my perspective is that it’s quite fast. However, NumPyPy is still too slow for most of the applications I’m interested in. From comments on the mailing list, it sounds like the funding sources for NumPyPy are more interested in completeness than speed, so the speed situation may not improve soon. I should put my time where my mouth is and figure out how to contribute to the project, but I’m not sure if I’ll have the spare cycles soon.

[1] This was a name used for NumPy on PyPy for a while. I’m not sure if it’s still considered legit, but I can’t go around writing “NumPy on PyPy” over and over.

2 Replies to “NumPy on PyPy with a side of Psymeric”

  1. In the PyData ecosystem we are taking a different approach that I hope could also lead to a more unified future with the PyPy world.

    Numba is already able to speed up slow segments of NumPy code while still working within the NumPy + CPython ecosystem.

    A CPython and PyPy merged NumPy world is possible, but it is not easy. The Numba array object (with some libdynd) is my view of the way forward right now. Ultimately it is a project that could connect with the PyPy universe given enough effort.

    1. Thanks for the pointer; Numba does look like a very nice tool. I’ll give it a try next time I run into a speed bottleneck. Although I like playing with PyPy, nearly all of my work is done using CPython+NumPy.

      On a theoretical level, my take is that NumPyPy has more potential than tools like Numba (or Cython, Numexpr, etc.), since it can optimize at the Python level and thus optimize across function calls. This would allow either better optimizations or the same level of optimization with a more natural coding style. The key word there is potential, though, since NumPyPy is clearly pretty far from NumPy speed now and it may require a lot of work before it can catch up. How long it will take for NumPyPy to approach speed parity with NumPy, if it ever does, is a big question mark.

      That said, NumPyPy+PyPy is already pretty good at things that NumPy+CPython is pretty terrible at. If I use sum2d, the example from the front page of numba.pydata.org, but with 100×100 or larger arrays to make timing easier, I find that NumPyPy+PyPy runs that code about 140× faster than NumPy+CPython does without the Numba JIT. When I turn on the Numba JIT, it is about 200× faster than the non-JIT version. That’s faster than NumPyPy, but not THAT much faster. (This is a very artificial benchmark, so take it with a grain of salt, but it was fast and easy.)
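For readers who haven’t seen it, the sum2d example mentioned in the reply above is essentially a doubly nested loop that adds up every element of a 2-D array. The sketch below is a stand-in rather than the exact code that was benchmarked: it walks nested Python lists instead of indexing a NumPy array, so that it runs without NumPy installed, and the grid and total names are made up for this illustration.

```python
def sum2d(arr):
    """Sum every element of a 2-D sequence with explicit nested loops."""
    result = 0.0
    for row in range(len(arr)):
        for col in range(len(arr[row])):
            result += arr[row][col]
    return result


# 100x100 grid holding 0, 1, ..., 9999 as floats.
grid = [[float(i * 100 + j) for j in range(100)] for i in range(100)]
total = sum2d(grid)
```

Element-by-element indexing like this is the worst case for NumPy under CPython, because every access goes through the interpreter, which is why a JIT (PyPy’s or Numba’s) makes such a large difference on this particular kind of code.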
