Update to Jupyter on GCE

Quick update: in an earlier post I showed one way to run Jupyter notebooks remotely on GCE. Since then I’ve found a simpler way to write the SSH command: anything after -- in a gcloud compute ssh command is passed directly to ssh. So rather than using multiple instances of --ssh-flag, one can instead use:

gcloud compute ssh img-detection-gpu-3 -- \
        -L 9999:localhost:8888

I’ve also taken to using rmate to edit files on GCE with my local Sublime Text; rmate sends files back through a reverse tunnel on port 52698 to the rsub plugin listening in Sublime. In this case the command becomes:

gcloud compute ssh img-detection-gpu-3 -- \
         -L 9999:localhost:8888 \
         -R 52698:localhost:52698

Curve Fitting

A while back a colleague tweaked me with the joke that machine learning is just glorified curve fitting. This is true as far as it goes, but a large, modern neural net (e.g., VGG-16 with 138 million parameters) has approximately the same relationship with a linear fit (2 parameters) that the bomb dropped on Hiroshima (Little Boy, with a yield of 63 TJ) had with a stick of dynamite (about 1 MJ): in both cases the ratio is roughly 6–7 × 10^7.

The relative danger is almost certainly not as great, but you are still considerably more likely to cause yourself and others grief with the careless application of modern machine learning methods than with a linear fit.

Jupyter on GCE

I was recently inspired to set up Jupyter to run remotely on a GCE instance. I have access to a lot of computing resources for work, so it’s silly to run things locally on my laptop, but running interactive Python sessions remotely can be painful due to latency and the vagaries of terminals. Running Jupyter seems like a perfect fit here, since the editing is done locally – no lag – and Jupyter can be nicely self-documenting for moderate-sized projects [1].

Jeff Delaney has a helpful post on setting up Jupyter on GCE, and the Jupyter docs on running a public server also have some useful information. However, the solutions there for exposing Jupyter to the web are either not terribly secure, painful to implement, or both. Since I’m only interested in being able to run the server myself, a simple, relatively secure alternative is SSH tunneling. Rather than exposing ports publicly on GCE, just start the Jupyter server on your GCE instance with the --no-browser option:

jupyter notebook --no-browser

Then, on your local machine, run:

gcloud compute ssh nnet-inference \
                       --ssh-flag="-L" \
                       --ssh-flag="9999:localhost:8888"

And point your browser to http://localhost:9999.

That’s it. Now you can use Jupyter remotely without opening up public ports on your GCE instance [2].

References

[1] Once projects hit a certain size though, Jupyter becomes inscrutable and really needs to be modularized.
[2] A couple of minor notes: I run my notebook inside tmux so that it stays alive if my connection drops. And if the connection drops you’ll need to restart the tunnel.

A Connection Between RMSPE and the Log Transform

This is (mostly) a Test

The primary purpose of this post is to test using IPython notebooks as part of WordPress blog posts. I already did some testing of this (see my previous post), but figured more couldn’t hurt. If someone finds it interesting, so much the better.

We transition from regular WordPress to a notebook just below:

IPython Notebooks in WordPress
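As a quick sketch of the connection named in the title (illustrative, not taken from the notebook): for small relative errors, log(ŷ) − log(y) = log(ŷ/y) ≈ (ŷ − y)/y, so the RMSE of log-transformed values approximates the RMSPE (root mean squared percentage error). A few lines of NumPy make the point:

import numpy as np

rng = np.random.default_rng(42)
y = rng.uniform(10.0, 100.0, 1000)             # true values
y_hat = y * (1 + rng.normal(0.0, 0.05, 1000))  # predictions with ~5% relative error

# RMSPE: root mean squared percentage (relative) error.
rmspe = np.sqrt(np.mean(((y_hat - y) / y) ** 2))
# RMSE of the log-transformed values.
rmse_log = np.sqrt(np.mean((np.log(y_hat) - np.log(y)) ** 2))

print(rmspe, rmse_log)  # nearly identical when relative errors are small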

Animating Weather Station Data

Every time there’s a big storm here, which is admittedly not that often, I think that there should be a better way to visualize the rainfall data that exists. The Flood Control District of Maricopa County maintains an extensive network of rainfall and streamflow gauges. The data is available for download, but the only forms I’ve seen it in are static maps or tables. As a little project, I decided to download the data from the big storm we had here on September 8th, 2014 and see if I could produce an interesting animation.

Due to some prodding from friends, the scope of this project crept: I ended up adding weather radar data as well as sound, and the whole thing took much longer than I’d planned. The result is below: the blue circles are the rainfall gauges (circle area is proportional to the previous hour’s rainfall at that location), the red is the weather radar composite reflectivity, and the rainfall sound at a given time is proportional to the total rain at all stations [1]. The animation was produced using Python, primarily with the matplotlib package. If you are interested in how this is done, I’ve described the process of animating just the rainfall data in an IPython notebook which can be viewed at nbviewer.ipython.org [2] or downloaded from github.com. I plan to add notebooks that show how to produce the rest of the animation shortly.
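In the meantime, here is a minimal sketch of the core idea, animating gauge circles whose area tracks rainfall with matplotlib’s FuncAnimation. The station coordinates and rainfall values below are random placeholders, not the District’s data:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Placeholder data: station locations and per-frame hourly rainfall totals.
rng = np.random.default_rng(0)
n_stations, n_frames = 40, 60
lons = rng.uniform(-112.5, -111.5, n_stations)
lats = rng.uniform(33.2, 33.9, n_stations)
rain = rng.gamma(1.5, 2.0, size=(n_frames, n_stations))

fig, ax = plt.subplots()
ax.set_xlim(-112.5, -111.5)
ax.set_ylim(33.2, 33.9)
# Scatter sizes are areas (points^2), so size maps linearly to rainfall.
scat = ax.scatter(lons, lats, s=rain[0] * 20, c="blue", alpha=0.5)

def update(frame):
    # Rescale each circle to the rainfall for this time step.
    scat.set_sizes(rain[frame] * 20)
    return (scat,)

anim = FuncAnimation(fig, update, frames=n_frames, interval=100, blit=True)
plt.show()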

References

[1] I just add up the rainfall at all of the stations. It would be more accurate to scale the contribution of each station based on the local density of stations, since rain in areas with more stations is overrepresented in the current scheme.
[2] I just discovered this and I really like it.

A Partially Successful Adventure in RPython

Before I talk about RPython, I need to say at least a little bit about PyPy. PyPy is a Python interpreter, written in Python, that outperforms the standard C implementation of Python (CPython) by about a factor of six [1]. Given that Python is generally thought to be much slower than C, one might think that was impossible. However, it is in fact faster, for most things at least, as one can see on the PyPy benchmark page.

This is possible because PyPy is only nominally written in Python. Sure, it can run (very slowly) on top of the standard CPython interpreter, but it’s really written in a restricted subset of Python referred to as RPython. The exact definition of RPython is a little vague (in fact, the documentation says “RPython is everything that our translation toolchain can accept”), but essentially it removes the most dynamic and difficult-to-optimize portions of Python. The resulting language is static enough that the aforementioned toolchain can translate RPython code into native code, resulting in a tremendous speedup. Not only that, and this is the truly amazing part: if the RPython code looks like an interpreter, the toolchain can automagically add a JIT compiler to it.
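To give a flavor of what the toolchain expects, here is roughly the hello-world target from the RPython documentation (a sketch from memory; the exact invocation depends on your PyPy checkout, and RPython is Python 2):

import os

# An RPython program: the entry point takes argv and returns an exit code.
# Only the static subset of Python is allowed inside.
def main(argv):
    os.write(1, "Hello, RPython!\n")
    return 0

# The translation toolchain looks for a function named "target" that
# returns the entry point.
def target(driver, args):
    return main, None

# Translate with something like:
#   python rpython/bin/rpython targethello.py
# which produces a native executable named targethello-c.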

That sounds impressive, but I wanted to try out the RPython toolchain to get a better feel for what is involved in translating an interpreter and how large a performance gain one could expect from a straightforward, not highly tuned implementation. To this end I looked around and found lis.py, a minimal Scheme interpreter written by Peter Norvig. I tried translating it and quickly discovered that this implementation is much too dynamic to be successfully translated with RPython. So, using the basic structure of lis.py, I rewrote the interpreter so that RPython could translate it. The result, named skeem.py (pronounced skimpy), was roughly five times longer and at least several times uglier, but it did translate to a native-code version named skeem-c. (This is RPython’s choice of name and, I believe, results from the fact that RPython first translates the Python code to C, then compiles it.)
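To give a feel for the kind of rewriting involved: RPython’s type inference can’t handle a variable that might hold either a number or a list, so a translatable interpreter typically boxes every Scheme value in a small class hierarchy, along these lines (a sketch; the names below are illustrative, not skeem.py’s actual ones):

# Instead of storing Scheme values as bare Python ints, floats, and lists
# the way lis.py does, wrap them so every variable has one inferable type.

class W_Object(object):
    """Base class for all Scheme values."""

class W_Number(W_Object):
    def __init__(self, value):
        self.value = value

class W_Symbol(W_Object):
    def __init__(self, name):
        self.name = name

class W_Pair(W_Object):
    def __init__(self, car, cdr):
        self.car = car  # a W_Object
        self.cdr = cdr  # a W_Object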

To get an idea of the speedup, I wrote a Lisp program to write out the Mandelbrot set in PGM format.  This will not run with the standard lis.py; I had to add a couple of commands to make generating the set feasible.  Running the Mandelbrot code using skeem.py takes 45,000 seconds under Python 2.7.  Using PyPy it takes 532 seconds, a considerable improvement.  Finally, using the translated skeem-c reduced the run time to 98 seconds. That’s a whopping 450-times improvement over the original!

Unfortunately, adding a JIT was not a success. I managed to get skeem.py to translate with --opt=jit, but the resulting interpreter was five times slower than the interpreter without the JIT. I suspect this is related to the main “loop” of the interpreter being recursive rather than iterative, but I didn’t dig into it.
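For context, the RPython JIT is driven by hints placed in the interpreter’s dispatch loop. A sketch of the standard pattern from the RPython docs follows (execute here is a hypothetical dispatch function, not skeem.py’s); a recursive eval has no such loop to hang these hints on:

from rpython.rlib.jit import JitDriver

# "Greens" identify a position in the interpreted program; "reds" are the
# mutable interpreter state. The JIT traces loops in terms of the greens.
jitdriver = JitDriver(greens=["pc", "program"], reds=["env"])

def execute(program, pc, env):
    # Stub dispatch: interpret one instruction, return the next pc.
    return pc + 1

def interpret(program):
    pc = 0
    env = {}
    while pc < len(program):
        # The merge point must sit inside an interpreter loop like this one.
        jitdriver.jit_merge_point(pc=pc, program=program, env=env)
        pc = execute(program, pc, env)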

Later, I will post some results from a second, uglier but faster interpreter I wrote, which got a factor-of-two speed boost from --opt=jit. The Mandelbrot program used for the benchmarks is included below for reference.

(define x-centre -0.5)
(define y-centre 0.0)
(define width 4.0)
(define i-max 1600)
(define j-max 1200)
(define n 100)
(define r-max 2.0)
(define colour-max 255)
(define pixel-size (/ width i-max))
(define x-offset (- x-centre (* 0.5 pixel-size (+ i-max 1))))
(define y-offset (+ y-centre (* 0.5 pixel-size (+ j-max 1))))

(define (*inside? z-0 z n)
  (and (< (magnitude z) r-max)
       (or (= n 0)
           (*inside? z-0 (+ (* z z) z-0) (- n 1)))))

(define (inside? z)
  (*inside? z 0 n))

(define (boolean->integer b)
  (if b colour-max 0))

(define (pixel i j)
  (boolean->integer
   (inside?
    (make-rectangular (+ x-offset (* pixel-size i))
                      (- y-offset (* pixel-size j))))))

(define plot
  (lambda ()
    (begin (display (quote P2)) (display nl)
           (display i-max) (display nl)
           (display j-max) (display nl)
           (display colour-max) (display nl)
           (do ((j 1 (+ j 1))) ((> j j-max))
             (do ((i 1 (+ i 1))) ((> i i-max))
               (begin (display (pixel i j)) (display nl)))))))

(plot)

[1] And, if Python in Python is six times faster, would Python in Python in Python be 36 times faster? One can only wish!