First a WARNING: as we are actively hacking on the Smalltalk code, all the examples below might be broken in some revisions. It is a bit too early in the project to have tests for translatability, given that not everybody hacking on the source knows how to use RPython or understand the error messages. For now a subgroup of people is trying to run translation from time to time and fixing the problems. (It works in revision 48070.)
The first thing we translated is an interpreter with an "entry point" that contains, hard-coded, the Squeak bytecode for the Fibonacci series:
cd pypy/translator/goal
./translate.py --gc=generation targetfibsmalltalk.py
(lots of output...)
(Ctrl-D to exit the debugger prompt at the end of translation)
./targetfibsmalltalk-c 25
121393
Yay! In initial benchmarks, this ran some 10-15 times slower than the same bytecode in Squeak, which is not too bad for a first try. Since yesterday evening we added some tweaks here and there, so the current numbers might be better.
We can also translate the image loader code:
cd pypy/translator/goal
./translate.py --gc=generation targetimageloadingmalltalk.py
(lots of output...)
(Ctrl-D to exit the debugger prompt at the end of translation)
./targetimageloadingmalltalk-c ../../lang/smalltalk/mini.image
(lots of #)
The final executable loads the mini.image and runs the tinyBenchmark from there - and the tinyBenchmark even runs to completion since one hour ago :-)
An interesting feature of the translation tool-chain is that it is happy with targets that contains prebuilt data - even a LOT of prebuilt data. This means that we can also preload a Smalltalk image into the Python process - simply by calling the image loader before translating - and translate only the interpreter, as an RPython program, with all the Smalltalk objects from the image already loaded. The objects turn into static C data, which gives a large executable that contains essentially a built-in "image" - tons of static data that the OS will just mmap into the new process and only lazily load from the disk when the interpreter accesses the memory pages. Example:
cd pypy/translator/goal
./translate.py --gc=generation targettinybenchsmalltalk.py
(lots of output...)
(Ctrl-D to exit the debugger prompt at the end of translation)
Be sure to look at the size of targettinybenchsmalltalk-c. It runs without taking any input argument - the image is already in there.
By the way, don't take any performance numbers too seriously so far. The point is that we actually managed to write a reasonably good base for a Smalltalk virtual machine in just five days of work, with a team of 4 to 10 people, depending on the hour of the day or night :-)
Oh, by the way, all the examples above can also be translated to Java bytecode or .NET bytecode (use the --backend=jvm or --backend=cli option to translate.py, respectively). For Java you need to install the Jasmin bytecode assembler (and make sure that "jasmin" is in your $PATH).
Armin Rigo
2 comments:
Would it be possible (at some point) to use the image loader to load in Smalltalk bytecode and have that affect the translation process somehow? As in add, remove, or customize a translation step based on what image was "mixed-in". I'm not sure if there are instances where this would be necessary, but it might be an interesting direction to look in.
This is fantastic!
Post a Comment