Sunday, October 17, 2010

Emscripten 0.4!

The focus of this release was on making the generated code faster. As the chart shows, we went from being 100X slower than hand-optimized JavaScript code, to around 5X slower. You can see the difference in the raytrace demo, which has been updated to use all the current optimizations.

5X slower is still slower. It will be hard to do much better, though, without either better JS engine support, or much more clever code analysis. Both will hopefully happen over time. Meanwhile, 5X slower is not too terrible, and there are some advantages over hand-written code - we hardly use garbage collection, so no GC pauses. Also, the speed really depends on the code - the comparison in the chart above uses benchmarks for which we have comparable code in both C++ and JavaScript. But the most interesting uses of Emscripten are to convert code for which we don't have a JavaScript equivalent. Also worth mentioning is that it is perfectly possible to hand-optimize the crucial parts of the code that Emscripten generates.

Some technical details about the optimizations implemented in this release:
  • Use typed arrays, if available in the JS engine (thanks to pcwalton and njn for the idea)
  • Optimize after the relooper runs, removing unneeded code flow overhead
  • Nativize many more variables than before (i.e., move them off the emulated stack, and into native JS variables)
  • Optimized stack emulation
  • Inlining of various runtime code fragments
  • Integration with the Closure Compiler: We generate output that it is very good at optimizing (thanks to Anders Riggelsen for the idea)

Also added in this release is support for the brand-new LLVM 2.8. That is now the version being tested against.

Tuesday, October 5, 2010

Emscripten 0.3!

Demo for this release: Raytracing. It isn't very fast, since the focus hasn't been on code speed yet, but it does show that a ray tracer written in C++, using SDL, can be emscriptened and run on the web.

So, I ended up doing more in this release than I had intended, causing it to take longer than planned. But it was for the best. Major changes include:
  • Clang support: All tests now work in both llvm-gcc and Clang. The two produce somewhat different llvm bitcode, to the degree that different methods are needed with Clang, causing it to run 1/2 as fast as llvm-gcc code. That is mainly because llvm-gcc is more explicit with what it does, while Clang uses memcpy and such, with hardcoded C size values (4 bytes for an int, etc.).

    Emscripten therefore now supports optional 'C memory layout' (QUANTUM_SIZE in settings.js). For example, an array of ints of values 1,2,3 with that enabled is [1,0,0,0,2,0,0,0,3,0,0,0] (since each int is 4 bytes), and when it is disabled, [1,2,3]. The latter works fine with llvm-gcc-generated llvm bitcode. Note that things get even more complicated with structures here, which need to be aligned and so forth. Anyhow, after that effort Emscripten should now be able to support anything that C/C++ can throw at it.

  • Faster compilation speed: The original goal of the release. Compilation speed is 2-3 times faster now. Still lots of room for improvement, but it isn't a major nuisance like it was.

  • Proper memory management: A call stack is implemented, and static memory allocation (for global variables, etc.) is also possible. sbrk() is emulated as well, allowing dlmalloc, a popular malloc() implementation, to be emscriptened properly. In particular that lets you use a real malloc() in your emscriptened code.

  • Much better native flow regeneration (the 'relooper'): A major challenge of translating LLVM to JavaScript is to implement native flow structures - if, while, for, etc. LLVM bitcode only provides chunks of code (I call them 'labels', but that's not the right name) and branchings between then. So Emscripten needs to figure out from that low-level data the high-level code flow patterns. Native flow structures are extremely important for good performance of the generated code.

    The first relooper worked on most tests, but was slow and buggy. I wrote a new version almost from scratch, and it now properly processes all the test code. It isn't very fast, though, I didn't focus on that. For that reason it is off by default, which means that Emscripten will not generate native flow structures (instead it will emulate code flow using a switch in a loop, which is very slow - but trivial to generate).

  • The above-mentioned raytracing demo: For this, initial work was done on supporting SDL - just showing video data so far. The SDL Surface is implemented in JavaScript using a Canvas. I found this very amusing, to write C++ code with SDL, compile and run it natively using gcc, and be able to run that same unmodified code on the web through Emscripten ;)

  • Lots more tests: There are now 37 separate tests, from small LLVM features to high-level tests like the CubeScript engine, dlmalloc, and raytracing; each test is run through both Clang and llvm-gcc, and with relooping&optimization both on and off, for a total of 148 tests. This takes 7 minutes on my slow laptop, which is starting to be significant, but it's extremely important in a project like this.

Next goals include performance of the generated code - lots, lots of low-hanging fruit there - and compiling yet more real-world code.