Wednesday, November 21, 2007

PyPy-Squeak Mini-Tutorial

A proper installation guide has been missing so far, so here it comes.

How to install PyPy and generate a Squeak VM

Make sure you are running Python version 2.5 or higher, and check out the project from Subversion

> svn co pypy-dist
> cd pypy-dist

Now, let's generate some Squeak VMs. Switch to the translation goal folder and run the tool chain

> cd pypy/translator/goal
> ./ --gc=generation

This yields tons of output and ends in the debugger. Just press Ctrl-D to leave, then run the generated VM as follows

> ./targetfibsmalltalk-c 25

If you browse the target's Python file, you'll find some fixture code together with a function called entry_point(argv). The fixture code is executed before the tool chain takes over, so it may use the full power of Python and is not restricted to RPython. Then the tool chain starts up, taking the entry_point function and the fixture's result as input to generate the VM. Therefore, all code eventually called by the entry point must conform to RPython.
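To make the split concrete, here is a minimal sketch of what such a target file could look like. Only entry_point(argv) comes from the description above. FIB_TABLE and the target(driver, args) hook are assumptions made for illustration and are not copied from the actual Smalltalk targets.

```python
# Hypothetical target file, for illustration only.

# --- Fixture code: runs once under full Python, before translation ---
# Any Python is allowed here; only the resulting objects are handed on.
FIB_TABLE = [0, 1]
while len(FIB_TABLE) < 40:
    FIB_TABLE.append(FIB_TABLE[-1] + FIB_TABLE[-2])


def entry_point(argv):
    # Everything reachable from here gets translated, so it has to stay
    # within RPython: simple control flow, one type per variable.
    if len(argv) != 2:
        print("usage: %s N" % argv[0])
        return 1
    n = int(argv[1])
    if n < 0 or n >= len(FIB_TABLE):
        print("N out of range")
        return 1
    print(str(FIB_TABLE[n]))
    return 0


def target(driver, args):
    # Assumed hook name: hands the entry point over to the tool chain.
    return entry_point, None
```

The table is built with ordinary Python before translation starts, while entry_point only indexes into it, which is the kind of simple code the tool chain can type.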

Other available goals for Smalltalk are

> ls *smalltalk*.py

For more options, please refer to the previous post Translating the Smalltalk interpreter to C, Java or .NET.

How to browse and test the source code

First, go back to the pypy root folder

> cd ../../../

Then, add py.test to your $PATH to run the tests, for example like this

> sudo ln -s py/bin/py.test /usr/bin

And now, switch to the Smalltalk folder and run the tests

> cd pypy/lang/smalltalk
> py.test

Browse the python files to read the source code - implements bytecodes and interpreter loop - implements primitives - implements image loading

For more information or questions, you may subscribe toor go to channel #pypy on

Thanks to Alexandre Bergel and Mathieu Suen for fighting through an earlier version of this tutorial, eliminating my command line typos – akuhn

Sunday, October 28, 2007

More Sprint Pictures

Some more pictures of the Bern sprint.

Discussing Shadows: Oscar Nierstrasz, Lukas Renggli, Marcus Denker, Tudor Girba, Niko Matsakis, Armin Rigo.

Marcus Denker, Adrian Kuhn, Armin Rigo, Toon Verwaest

Armin Rigo, Toon Verwaest, Carl Friedrich Bolz

At about one in the night, finally trying to leave: Toon, Armin, Adrian

Working on primitives: Oscar, Niko, Lukas

Trying to translate the Smalltalk interpreter: Carl Friedrich, Armin, Toon, Adrian and Lukas (in the front).

Saturday, October 27, 2007

Bern Sprint finished, Summary

The Bern sprint is finished, all the non-local sprinters have gone home and everybody is resting. The week was amazingly intense and productive, but also lots of fun. Thanks to all the participants! Many thanks also to the University of Bern for hosting the sprint, and especially to Adrian Kuhn for putting lots of effort into the organization.

I am not quite awake yet to completely summarize the sprint, but some of the things we managed to do are:
  • Define a simple representation of the Smalltalk object model in Python
  • implement the bytecode dispatcher and all the Squeak bytecodes
  • implement helper functionality for defining primitives in Python
  • implement many of the essential primitives
  • implement an image loader that can load Squeak images
  • translate all of the above to C (or .NET, or Java, for that matter)
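As a rough idea of what "helper functionality for defining primitives in Python" can mean, here is a minimal sketch of a decorator-based primitive table. It is not the code written at the sprint: PRIMITIVE_TABLE, primitive, PrimitiveFailed and call_primitive are hypothetical names, and the index 1 is chosen for illustration.

```python
class PrimitiveFailed(Exception):
    """Raised when a primitive cannot handle its arguments."""


# Hypothetical registry: primitive index -> Python function.
PRIMITIVE_TABLE = {}


def primitive(number):
    # Decorator that files a function under the given primitive index.
    def decorator(func):
        PRIMITIVE_TABLE[number] = func
        return func
    return decorator


@primitive(1)
def prim_add(receiver, argument):
    # Only handle plain integers; anything else falls back to the
    # method body on the Smalltalk side by signalling failure.
    if not isinstance(receiver, int) or not isinstance(argument, int):
        raise PrimitiveFailed()
    return receiver + argument


def call_primitive(number, receiver, argument):
    func = PRIMITIVE_TABLE.get(number)
    if func is None:
        raise PrimitiveFailed()
    return func(receiver, argument)
```

With a helper like this, adding a primitive is a matter of writing one small decorated function, which fits the remark that the remaining primitive work is straightforward.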

We managed to load the Squeak mini-image and successfully run the tiny benchmark at around a tenth of the speed of Squeak.

The area where the most work is left is obviously the primitives. We haven't even started on the graphical primitives yet, and we cannot really start an image (just load it and call selected methods on some functions from the outside). On the other hand, this is all rather straightforward work, so it probably won't take too much hard thinking. Another big question is image saving, where it is unclear whether we should define our own format or try to write Squeak images again.

I may post some more detailed information about what we did during the week, so if you have specific questions, just ask them. Also, I hope some of the sprinters (including myself) will work a bit more on the project in the near future. We are considering a follow-up sprint; let's see where that is going.

Friday, October 26, 2007

Translating the Smalltalk interpreter to C, Java or .NET

Yesterday, we started to point PyPy's translation tool-chain at the Smalltalk interpreter loop. In practice, this means starting the tool, waiting for a few seconds while it performs type inference, and looking at the error message that we get. The error message is sometimes a bit obscure (it's not a trivial job to report good error messages from type inferencers). Once understood, we fix the corresponding place in the RPython source (i.e. in pypy/lang/smalltalk/*) and try again. This trial-and-error process can go on for half an hour, at the end of which the RPython source is eventually accepted by the translation tool-chain. We then get a nice executable file, produced by gcc compiling the generated C source. The executable does the same job as the original RPython source running on top of the standard CPython - it is just much faster.
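As an illustration of the kind of problem type inference trips over, here is a minimal sketch. It is not taken from the Smalltalk sources, and both function names are made up. The first function is perfectly valid Python, but its result is an integer on one path and a string on the other, which a static type inferencer cannot reconcile into a single type. The second version is the sort of fix one typically applies.

```python
def describe_bad(n):
    # Valid Python, but 'result' has two incompatible types:
    # int on one branch, str on the other.
    if n > 0:
        result = n
    else:
        result = "none"
    return result


def describe_good(n):
    # Same intent, but every path returns a string, so a single
    # type can be inferred for the return value.
    if n > 0:
        return str(n)
    return "none"
```

Both versions run fine on top of CPython, which is why such problems only show up once translation is attempted.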

First a WARNING: as we are actively hacking on the Smalltalk code, all the examples below might be broken in some revisions. It is a bit too early in the project to have tests for translatability, given that not everybody hacking on the source knows how to use RPython or understands the error messages. For now a subgroup of people is trying to run translation from time to time and fixing the problems. (It works in revision 48070.)

The first thing we translated is an interpreter with an "entry point" that contains, hard-coded, the Squeak bytecode for the Fibonacci series:

    cd pypy/translator/goal
    ./ --gc=generation
    (lots of output...)
    (Ctrl-D to exit the debugger prompt at the end of translation)
    ./targetfibsmalltalk-c 25

Yay! In initial benchmarks, this ran some 10-15 times slower than the same bytecode in Squeak, which is not too bad for a first try. Since yesterday evening we added some tweaks here and there, so the current numbers might be better.

We can also translate the image loader code:

    cd pypy/translator/goal
    ./ --gc=generation
    (lots of output...)
    (Ctrl-D to exit the debugger prompt at the end of translation)
    ./targetimageloadingsmalltalk-c ../../lang/smalltalk/mini.image
    (lots of #)

The final executable loads the mini.image and runs the tinyBenchmark from there - and the tinyBenchmark even runs to completion since one hour ago :-)

An interesting feature of the translation tool-chain is that it is happy with targets that contain prebuilt data - even a LOT of prebuilt data. This means that we can also preload a Smalltalk image into the Python process - simply by calling the image loader before translating - and translate only the interpreter, as an RPython program, with all the Smalltalk objects from the image already loaded. The objects turn into static C data, which gives a large executable that contains essentially a built-in "image" - tons of static data that the OS will just mmap into the new process and only lazily load from the disk when the interpreter accesses the memory pages. Example:

    cd pypy/translator/goal
    ./ --gc=generation
    (lots of output...)
    (Ctrl-D to exit the debugger prompt at the end of translation)

Be sure to look at the size of targettinybenchsmalltalk-c. It runs without taking any input argument - the image is already in there.
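The idea of preloading can be sketched in a few lines of Python. This is a toy stand-in, not the real target: FakeImage and load_fake_image are hypothetical, and the "image" here is just a short list of strings instead of a parsed Squeak image file.

```python
class FakeImage(object):
    """Toy stand-in for a loaded Smalltalk image (hypothetical)."""

    def __init__(self, strings):
        self.strings = strings


def load_fake_image():
    # In a real target this would run the image loader on a file.
    # It may use arbitrary Python, since it runs before translation.
    return FakeImage(["tinyBenchmarks", "printString", "value"])


# Executed at import time: by the time translation starts, IMAGE is
# an ordinary prebuilt object that the entry point merely refers to.
IMAGE = load_fake_image()


def entry_point(argv):
    # No input argument is needed: the data is already built in.
    total = 0
    for s in IMAGE.strings:
        total += len(s)
    print(str(total))
    return 0
```

The loader itself never has to be translated in this setup; only the code that uses the already loaded objects does.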

By the way, don't take any performance numbers too seriously so far. The point is that we actually managed to write a reasonably good base for a Smalltalk virtual machine in just five days of work, with a team of 4 to 10 people, depending on the hour of the day or night :-)

Oh, by the way, all the examples above can also be translated to Java bytecode or .NET bytecode (use the --backend=jvm or --backend=cli option, respectively). For Java you need to install the Jasmin bytecode assembler (and make sure that "jasmin" is in your $PATH).

Armin Rigo

Toolchain applied to interpreter, Fibonacci running

Just a quick note (we were busy hacking and are now too tired for anything more): A few hours ago the interpreter was successfully translated to C and was running the Fibonacci function. It is around 15 times slower than Squeak, which is amazingly fast for the first try (PyPy's Python interpreter was at least 200 times slower than CPython when we first managed to translate it).

More details will follow tomorrow (in particular how to try it yourself).

Carl Friedrich & AA

Thursday, October 25, 2007


Today we report on the internal representation of objects and classes in the VM we are working on. Since everything is an object in Smalltalk, even classes, we are confronted with two diverging forces. From the Smalltalk point of view all objects are created equal, hence it would be most natural to model them using one class W_SqueakObject. From the VM point of view we would like to represent some objects using special classes: classes, stack frames, compiled methods, method dictionaries, method contexts, block contexts and so on. If those objects were not exposed to the Smalltalk view, that would not be a problem at all. However, as they are exposed to plain Smalltalk, we had to find a better solution.

This post quickly describes the outcome of the various discussions and refactorings that we had yesterday and this morning to address this issue. After writing a prototype yesterday, several (heated) discussions with all sprinters and, finally, a complete rewrite of the prototype, we arrived at the following solution.

Every Smalltalk object may have an associated "shadow" object. These shadow objects are not exposed to the Smalltalk world; they are used by the VM as an internal representation and can hold arbitrary information about the actual object. If an object has a shadow, the shadow is notified whenever the state of the actual object changes, to keep the two in sync. One way of looking at shadows is as a general cache mechanism. However, the approach is far more powerful: by hooking into the shadow's notifications, arbitrary meta-behaviour may be attached to objects. As an example, think of immutable objects which reject any modification.
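The mechanism can be sketched as follows. This is a simplified illustration, not the sprint code: W_Object, SumShadow, ImmutableShadow and the notify_store method are hypothetical names. It shows one shadow acting as a cache that is invalidated on every store, and another that implements immutability by rejecting the store.

```python
class ImmutableError(Exception):
    pass


class W_Object(object):
    """Smalltalk-visible object with indexed slots (simplified)."""

    def __init__(self, size):
        self.slots = [0] * size
        self.shadow = None  # never visible from Smalltalk code

    def fetch(self, index):
        return self.slots[index]

    def store(self, index, value):
        # Notify the shadow first, so that it may veto by raising.
        if self.shadow is not None:
            self.shadow.notify_store(index, value)
        self.slots[index] = value


class SumShadow(object):
    """Cache: remembers the sum of the slots until the next store."""

    def __init__(self, w_object):
        self.w_object = w_object
        self.valid = False
        self.cached = 0

    def notify_store(self, index, value):
        self.valid = False

    def total(self):
        if not self.valid:
            self.cached = sum(self.w_object.slots)
            self.valid = True
        return self.cached


class ImmutableShadow(object):
    """Meta-behaviour: reject every modification."""

    def notify_store(self, index, value):
        raise ImmutableError("object is immutable")
```

Attaching a SumShadow to an object makes repeated total() calls cheap between stores, while swapping in an ImmutableShadow makes every later store fail and leaves the slots untouched.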

In the current implementation, the shadows are used to attach nicely decoded information about classes to all objects which are used as classes. This allows any object to be used as a class, even if it is not a subclass of Smalltalk.Class, which is very Smalltalkish. The shadow of the class stores all required information about classes in a nice, easily accessible data structure (as opposed to the obscure bit format used at the Smalltalk level). The class shadow holds the format and size of instances, a Python dictionary containing the compiled methods (mirroring the method dictionary), the name of the class (if it has one), and so on. Storing the methods in a Python dictionary instead of the Smalltalk method dictionary allows the tool chain to generate better code for method lookups, taking advantage of its highly optimized builtin dictionary implementation.
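A class shadow along these lines could look roughly like the sketch below. The names ClassShadow, install_method and lookup are hypothetical, and the superclass link is an assumption added so that the lookup has something to walk. The compiled methods are represented by plain strings to keep the example short.

```python
class ClassShadow(object):
    """Decoded view of an object that is used as a class (sketch)."""

    def __init__(self, name, instance_size, superclass_shadow=None):
        self.name = name
        self.instance_size = instance_size
        self.superclass_shadow = superclass_shadow  # assumed link
        self.methods = {}  # selector -> compiled method

    def install_method(self, selector, method):
        self.methods[selector] = method

    def lookup(self, selector):
        # Walk up the chain, using one dictionary probe per class.
        shadow = self
        while shadow is not None:
            if selector in shadow.methods:
                return shadow.methods[selector]
            shadow = shadow.superclass_shadow
        return None  # the caller would send doesNotUnderstand:


object_shadow = ClassShadow("Object", 0)
object_shadow.install_method("printString", "<Object>>printString>")

point_shadow = ClassShadow("Point", 2, object_shadow)
point_shadow.install_method("x", "<Point>>x>")
```

Here point_shadow.lookup("printString") finds the method inherited from Object, and an unknown selector yields None after the chain is exhausted.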

Wednesday, October 24, 2007

Third day, work in progress

It is 19:36 local time and the sprint is still running. While I am taking a break, Armin Rigo and Carl Friedrich Bolz are working on the internal representation of classes for the VM. This is ongoing and tricky work, as in Smalltalk any object can (potentially) be used as a class. Lukas Renggli just left; he continued today on the implementation of primitives. Meanwhile, Toon Verwaest is poking around in the loaded mini.image, printing all strings in the image and trying to execute random methods using pypy.lang.smalltalk.interpreter.Interpreter. He selects all compiled methods having no arguments and tries to execute them; some of the methods even run successfully.

You can find the image poking hacks in
I will now join Toon to pair up, thinking about and hopefully implementing some benchmarks for the loaded image.

Adrian AA Kuhn