A Partially Successful Adventure in RPython

Before I talk about RPython, I need to say at least a little bit about PyPy. PyPy is Python interpreter, written in Python that outperforms the standard, C implementation of Python (CPython) by about a factor of six.  Given that Python is generally thought to be much slower than C, one might think that was impossible.  However, it is in fact faster, for most things at least, as one can see on the PyPy benchmark page.

This is possible because PyPy is only nominally written in Python. Sure, it can run (very slowly) on top of the standard CPython interpreter, but it’s really written in a restricted subset of Python referred to as RPython.  The exact definition of RPython is a little vague, in fact the documentation says “RPython is everything that our translation toolchain can accept”, but essentially it removes the most dynamic and difficult to optimize portions of Python.  The resulting language is static enough that the aforementioned toolchain can translate RPython code into native code resulting in a tremendous speed up. Not only that, and this is the truly amazing part, if the RPython code looks like an interpreter, it will automagically add a JIT compiler to it

That sounds impressive, but I wanted to try out the RPython toolchain to get a better feel for what is involved in translating a interpreter and how large the performance gains one could expect for straightforward, non highly tuned implementation. To this end I looked around and I found lis.py, minimal scheme interpreter written by Peter Norvig.  I tried translating with this RPython and quickly discovered that this implementation is much too dynamic to be successfully translated with with RPython.  So, using the basic structure of lisp.py, I rewrote the interpreter so that RPython could translate it.  The result, named skeem.py (pronounced skimpy) was roughly five times longer and at least several times uglier, but did translate to a native code version named skeem-c. (This is RPython’s choice of name and I believe results from the fact that RPython first translates the Python code to C code, then compiles it).

To get an idea of the speedup, I wrote a lisp program to write out the mandelbrot set in PGM format.  This will not run with the standard lis.py; I had to add a couple of commands to make generating the set feasible.  Running the mandelbrot code using skeem.py takes 45,000 seconds using Python 2.7.  Using PyPy it takes 532 seconds, a considerable improvement.  Finally, using the translated skeem-c reduced the run time to 98 seconds. That’s a whopping 450-times improvement over the original!

Unfortunately, adding a JIT was not a success. I managed to get the skeem.py to translate with –opt=jit, but the resulting interpreter was five times slower than the interpreter without the JIT. I suspect that this is related to the main “loop” of the interpreter being recursive rather than iterative, but I didn’t dig into that.

Later, I will post some results from a second, uglier but faster, interpreter I wrote, which got a factor of two speed boost from –opt=jit.

(define x-centre -0.5)
(define y-centre 0.0)
(define width 4.0)
(define i-max 1600)
(define j-max 1200)
(define n 100)
(define r-max 2.0)
(define colour-max 255)
(define pixel-size (/ width i-max))
(define x-offset (- x-centre (* 0.5 pixel-size (+ i-max 1))))
(define y-offset (+ y-centre (* 0.5 pixel-size (+ j-max 1))))
 
(define (*inside? z-0 z n)
 (and (< (magnitude z) r-max)
 (or (= n 0)
 (*inside? z-0 (+ (* z z) z-0) (- n 1)))))
 
(define (inside? z)
 (*inside? z 0 n))
 
(define (boolean->integer b)
 (if b colour-max 0))
 
(define (pixel i j)
 (boolean->integer
 (inside?
 (make-rectangular (+ x-offset (* pixel-size i))
 (- y-offset (* pixel-size j))))))
 
(define plot
 (lambda ()
 (begin (display (quote P2)) (display nl)
 (display i-max) (display nl)
 (display j-max) (display nl)
 (display colour-max) (display nl)
 (do ((j 1 (+ j 1))) ((> j j-max))
 (do ((i 1 (+ i 1))) ((> i i-max))
 (begin (display (pixel i j)) (display nl)))))))
 
(plot)

[1] And, if Python in Python is six time faster, would Python in Python in Python be 36 times faster? One can only wish!