Showing posts with label emscripten. Show all posts

Saturday, September 24, 2011

Road to Emscripten 2.0

Lots and lots of work has been taking place on emscripten (the LLVM-to-JS compiler). I haven't been breaking things out into smaller releases; instead, there will be a 2.0 release in the near future (the last release was 1.5). The remaining issues before that are:
  • Fix any remaining regressions in the llvm-svn branch compared to the master branch, and merge llvm-svn to master. The llvm-svn branch uses LLVM's svn, which will soon become 3.0. Some of the code changes in LLVM have hurt our generated code; however, most of the issues are now fixed. The latest update is a fix for exception handling which has led to a 5% smaller ammo.js build (compared to 2.9), with no speed decrease :)
  • Bundle headers with emscripten. As more people have begun to use emscripten, we have been seeing more platform-specific problems, almost all due to using different system headers (for example, issues #82 and #85 on github). Bundling working headers will fix that, much as SDKs and NDKs typically bundle a complete build environment. Currently the plan is to use the newlib headers for libc and start from there.
As mentioned in a previous blog post, emscripten 2.0 will require LLVM 3.0, and will no longer officially support the deprecated llvm-gcc compiler (it might still work though). This is a significant change which may affect projects using emscripten. Please let me know if you are aware of any issues there.

Note also that I will be merging llvm-svn to master before LLVM 3.0 goes stable. That means that you will need to build LLVM from source at that time, since there won't be LLVM binaries. Of course, you will still be able to use a previous revision of emscripten that works with LLVM 2.9, with no issues.

Sunday, July 31, 2011

Emscripten 1.5!

Version 1.5 of Emscripten, the LLVM to JavaScript compiler, is out. Lots of new stuff:
  • A Text-to-Speech demo using eSpeak. Not much had to be done to get this to work: a few library functions were missing, but that is pretty much it. I did need to bundle getopt and strtok C sources in the project though. Also, I had to use typed arrays type 2, since the eSpeak source code is not as platform independent as we would like (so this ended up being a good test of typed arrays 2 actually). For more details, source code etc., see the demo page.
  • max99x has written a nice Filesystem API. See that link for documentation. It makes the emulated filesystem much more flexible and useful. The text-to-speech demo uses it, as do all the automatic tests. Aside from the API itself, this update comes with a ton of library additions for IO related things.
  • max99x also wrote parsing code to detect field names in LLVM metadata. This lets you use the original C/C++ field names in your JavaScript, so integrating compiled code and JavaScript becomes much easier. I am thinking about extending this for use in the bindings generator.
  • Speaking of the bindings generator, it has seen a lot of work and things are finally starting to run with Bullet, at least a 'hello world' of creating a btVector3. There is still some work ahead before it is finished, not sure how much.


Sunday, July 10, 2011

Emscripten 1.4!

Version 1.4 of Emscripten, the open source LLVM to JavaScript compiler, has been released.

Some significant improvements this time, including
  • Support for compiling and loading dynamic libraries, thanks to max99x for writing this very useful (and not easy to write!) feature. You can now compile a module as a shared library, and load it from your main compiled script just like you would load a normal shared library in native code, using dlopen() and so forth. This can potentially be very useful, both in not needing to rewrite code that is already split up into modules, and also in that it lets you load the main module quickly since other stuff is split out into other files, which can be loaded later on demand. I hope to see a demo of this up soon.

  • Automatic bindings generation. Until now, you could compile a C or C++ library and run it on the web, but using it from normal JavaScript was clunky. Thankfully bretthart pointed me to CppHeaderParser, a pure Python C++ header parser, which Emscripten can now use to generate bindings (for more details on the header parser, see here). The result is a set of JavaScript objects that wrap the compiled C++ code, so you can write quite natural JavaScript code to access them, for example, var inst = new CppClass() to create an instance, inst.doSomething() to call a function, etc. A lot of basic stuff already works (see part 2 of test_scriptaclass). I am currently investigating the use of this with Bullet in ammo.js; hopefully I will succeed there and have a more detailed blogpost afterwards.

  • Library stuff, lots of fixes and additions there, thanks to max99x and timdawborn.

Wednesday, June 22, 2011

Emscripten 1.3!

Version 1.3 of Emscripten, the open source LLVM to JavaScript compiler, has been released.

No new demo this time, sorry. However, the Python demo has been updated to improve performance and enable raw_input to work (it prompts for input using window.prompt). Press 'execute' in the demo to see it work.

Main updates:
  • Support for a new usage of typed arrays, TA2. In TA2, a single shared buffer is used, and int8, int32 etc. are all accessed through views into that buffer. The main benefit here is memory usage - this mode takes much less memory than TA1 (the original typed array usage), and in most cases probably less than the non-typed array case.

    In theory, this can also be faster. However, that doesn't appear to be the case in my benchmarks, both because of the need to divide pointers by 2 or 4 constantly (pointers are raw addresses, while indexes into typed arrays take into account the size of the element. int32vec[1] is at address 4!), and because JS engines still do not heavily optimize typed arrays.

    One thing you can do, though, is use dangerous nonportable LLVM optimizations with TA2. TA2 lets you write an int and read the first character, and get the 'right' result. Of course 'right' will depend on the endianness, so this is very dangerous and not recommended. However you can compile two versions, one for each endianness. This can potentially be faster.

  • Some relooper optimizations were done, which gave us a nice speed improvement. I'll probably do a full blogpost on performance issues, but to briefly summarize, we seem to be getting close to the speed of handwritten JS code, which is to say, as fast as we can probably get. In absolute terms, compared to gcc -O3 (the fastest native code), we are around 5X slower (on the latest development versions of SpiderMonkey and V8). But there is a big spread: In raw numeric processing we are often just 2-3X slower, which is about the same as Scala, Haskell, and Mono, but certain other operations are costlier and in some benchmarks we are up to 10X slower.
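
To make the TA2 description above more concrete, here is a minimal sketch in plain JavaScript. It is not Emscripten's actual runtime code, and the names buffer, HEAP8 and HEAP32 are assumptions chosen for illustration. It shows one shared buffer with int8 and int32 views, the pointer-to-index division, and why reading the first byte of an int gives an endianness-dependent result.

```javascript
// One shared buffer, accessed through typed views of different element sizes.
var buffer = new ArrayBuffer(16);
var HEAP8 = new Int8Array(buffer);   // hypothetical name: byte view
var HEAP32 = new Int32Array(buffer); // hypothetical name: 32-bit int view

// Pointers are raw byte addresses, so an int32 access divides the pointer by 4.
var ptr = 4;
HEAP32[ptr >> 2] = 0x01020304; // same slot as HEAP32[1]

// Reading the first byte at that address depends on the platform's endianness.
var firstByte = HEAP8[ptr];
var littleEndian = new Int8Array(new Int32Array([1]).buffer)[0] === 1;
console.log(littleEndian ? "little-endian" : "big-endian", "first byte:", firstByte);
```

On a little-endian machine the first byte is the low byte of the int (4 here), while on a big-endian machine it is the high byte (1), which is why code relying on this needs one build per endianness.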

Other news:
  • Still hoping for people to help out with OpenGL/WebGL stuff. Please step up! I don't know what to do there myself.
  • Next main project for me personally will probably be better tools to integrate compiled code with normal JS code. One option is to use SWIG to generate bindings. We could then compile a C++ library and use it in a natural way on the web, which would be very cool. If you know SWIG, or don't but want to see this happen (like me ;) then please get in touch.
  • I had some discussions with people interested in compiling certain large projects to the web, for example Second Life and Mono. Both have significant technical difficulties (rendering and networking for Second Life, the non-existence of an interpreter and the limitations of mono-llvm in Mono), but if the people interested in each are serious enough to do the work to overcome the respective difficulties, I have promised to do the raw C++ to JS conversion for each of those projects. Hopefully cool things will happen here.

Monday, May 30, 2011

Emscripten 1.2, Doom on the Web

Emscripten, the LLVM to JavaScript compiler, is now at version 1.2. The main updates in this release were to enable this demo of Doom on the Web - a playable version of the classic game Doom, compiled from C to JavaScript and rendering using Canvas.

The demo is known to work on Firefox and Safari. It works, but slowly, on Opera. I can't get it to run properly in Chrome due to a problem with V8. I have no idea if it runs on IE9, since I don't have a Windows machine, but since IE9 has a fast JS engine and supports canvas, it should (please let me know if you try it there). Edit: Here's a screencast of the demo running on Firefox Nightly if you can't run it yourself.

Highlights of Emscripten 1.2:
  • Many improvements to Emscripten's implementation of the SDL API in JavaScript, including support for color palettes (Doom uses a 256-color palette), input events (we translate normal web keyboard events into their SDL forms), and audio (for now, just using the Mozilla Audio Data API - it's the most straightforward API at this point. Patches are welcome for other ones).
  • Many improvements to the CHECK_* and CORRECT_* options, which are very important for generating optimized code using Emscripten. In particular, there is a new AUTO_OPTIMIZE option which will output a summary of which checks ran how many times, and how many of those checks failed, giving you a picture of which lines are important to be optimized, and which can be.
  • Some additional experimental work is ongoing about supporting OpenGL in WebGL. I don't know either OpenGL or WebGL very well, I'm learning as I go, and I'm not sure how feasible this project is. If you can help here, please do!
  • Various bug fixes. Thanks to all the people that submitted bug reports. In addition, compiling Doom uncovered a few small bugs; for example, we were not doing bit shifts on 64-bit integers properly.

Sunday, May 1, 2011

Emscripten 1.1!

Emscripten is an LLVM to JavaScript compiler, allowing you to run code written in C or C++ on the web. I released version 1.1 today, with the following updates:
  • A much improved Bullet demo - check it out! This version is much faster. The main differences are use of memory compression (see below), LLVM optimizations, and CubicVR.js for rendering.
  • QUANTUM_SIZE == 1, a.k.a. memory compression. This is an advanced, and somewhat risky, optimization technique. I see speedups of around 25%, but take note: this must be used carefully. See the docs.
  • Dead function elimination tool: A Python script that scrubs an .ll file to remove unneeded functions. This is useful to reduce the size of the generated code and speed up compilation. Note though that if you want to compile a library, then this tool will remove functions that you probably want left in - it removes everything that cannot be reached by main(). The test runner now uses this by default.
  • Various performance improvements and bug fixes.
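
The idea behind the dead function elimination tool above, keeping only what main() can reach, can be sketched as a reachability walk over a call graph. The actual tool is a Python script that works on .ll files; the JavaScript below is only an illustration, and the callGraph object and the reachableFrom function are hypothetical names invented for it.

```javascript
// Hypothetical call graph: function name -> names of the functions it calls.
var callGraph = {
  main: ["init", "run"],
  init: [],
  run: ["step"],
  step: ["run"],          // cycles are handled by the "seen" set
  unusedHelper: ["step"]  // never called from main
};

// Collect every function reachable from a root using a simple worklist.
function reachableFrom(graph, root) {
  var seen = {};
  var worklist = [root];
  while (worklist.length > 0) {
    var name = worklist.pop();
    if (seen[name] || !graph.hasOwnProperty(name)) continue;
    seen[name] = true;
    graph[name].forEach(function (callee) { worklist.push(callee); });
  }
  return Object.keys(seen).sort();
}

var kept = reachableFrom(callGraph, "main");
var removed = Object.keys(callGraph).filter(function (name) {
  return kept.indexOf(name) === -1;
});
console.log("kept:", kept.join(", "));
console.log("removed:", removed.join(", "));
```

This also shows why the tool is a poor fit for libraries: an exported function such as unusedHelper is dropped simply because main() never calls it.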

Sunday, April 10, 2011

Emscripten Moves to GitHub

After starting on Google Code, and later adding a git mirror, Emscripten has now moved entirely to GitHub. emscripten.org and so forth should now forward to there.

The main reason for the change is the inconvenience in maintaining two clones. hg-git helped greatly, but it remained a constant hassle. Meanwhile GitHub has been getting more popular and more useful. The last straw was GitHub's recent addition of a much nicer issue tracker. So I decided to make the move today, which coincides nicely with the release of Emscripten 1.0: A fresh start on the road ahead to 2.0.

Code will not be updated in the Google Code page anymore, and I pushed a final commit there to warn people they are running old code if they get there by mistake. I moved the important wiki pages to GitHub, which leaves only one thing left behind, the open issues. If you have an open issue on Google Code that you care about, please either open a new issue on GitHub for it, or tell me and I'll do it for you.

Saturday, April 9, 2011

Emscripten 1.0!

It's been almost a year since I started Emscripten (which, if you haven't heard of it, is a tool to compile LLVM to JavaScript), during which it took up much of my spare time. So I am very pleased to announce that today Emscripten has reached the 1.0 milestone. This release comes with a demo of rendering PDFs on the web (warning: that page downloads >12MB, since it includes Poppler and FreeType. It's like downloading an entire desktop app, almost).

Other highlights in this release:
  • Very significant optimization of memory use in the compiler. This was necessary for the PDF demo to build, since it is far larger than previous demos.
  • Full support for the recently released LLVM 2.9.
  • The Emscripten documentation paper is finished. It explains how Emscripten works, so you might be interested in it if you care what Emscripten does under the hood (but if you just want to use Emscripten you don't need to read it).
Overall Emscripten is now in very good shape. It can probably compile most any C/C++ project out there, subject to some limitations (like JS not allowing C-style multithreading). At times some manual intervention is needed, like changing the project's settings so it doesn't generate inline assembly, and of course bugs probably still exist, but recently the code I have compiled has tended to just work (hence the rate of commits has greatly decreased recently).

The speed of the generated code can be quite good. By default Emscripten compiles with very conservative settings, so the code will be slow, but optimizing the code is not that hard to do. Optimized code tends to run around 10x slower than gcc -O3, which is obviously not great, but on the other hand fairly decent and more than good enough for many purposes. And of course, that ratio will improve along with advancements in JavaScript engines, LLVM, and the Closure Compiler.

So, Emscripten 1.0 is in my opinion pretty solid. There are no major outstanding bugs, and no major missing features. (But I do have plans for some major improvements, which are difficult, but should end up with code that runs at least twice as fast.) Now that Emscripten is at 1.0, I am hoping to see it used in more places. I'm starting to propose at Mozilla that we use it in various ways, and also I'd love to see things like GTK or Qt ported to the web - if anyone wants to collaborate on that, let me know.

Saturday, March 19, 2011

Emscripten moving to LLVM 2.9

LLVM 2.9 will be released very soon, and Emscripten has just been updated to support it.

Emscripten has a lot of automatic tests - they take over 2 hours to run on my laptop - so I won't be running tests for LLVM 2.8 anymore (that would double the time the tests take). Until LLVM 2.9 is formally released with binary builds, you can build LLVM from svn source (the instructions on the Emscripten wiki are useful), or use LLVM 2.8 with Emscripten 0.9 (the last release of Emscripten that supports 2.8).

If you do build LLVM 2.9 and put it in a different location than 2.8 was, don't forget to update your ~/.emscripten file so it uses the version you want. Also, if you update LLVM to 2.9 and want to use llvm-gcc, you need to update that to their current svn as well.

There were not a lot of changes for Emscripten to support 2.9, so it is possible 2.8 will still work. But as mentioned above, I am not testing it, so I can't say for sure.

Sunday, March 6, 2011

Puzzles on the Web

Check out this very cool port of Simon Tatham's Portable Puzzle Collection to the web, by Jacques Le Roux, using Emscripten.

Nice quote from there:
This was basically just an experiment to see how hard it would be to port C code to a web application running entirely on the client (turns out not that hard).
:)

Friday, February 11, 2011

Git Mirror for Emscripten

We now have a git mirror on GitHub! Getting the Emscripten code is now as easy as

git clone git://github.com/kripken/emscripten.git

Code will be mirrored there, so you can either use git with that GitHub repo, or hg with the Google Code repo, and the result will be the same. (However for project stuff - issues, wiki, etc. - we will continue to use the Google Code project page.)

Sunday, February 6, 2011

Emscripten 0.8!

The main highlights of this release are:
  • Tests for FreeType and zlib, two important real-world codebases. Aside from all the fixes and improvements necessary to get them to work, the test infrastructure now runs the entire build procedure (using emmaken) for those two tests, giving even more complete test coverage.
  • File emulation. Just enough to let compiled C/C++ code think it is accessing a filesystem. For example, the FreeType test loads a TrueType font from a file (but really it's a virtual filesystem, set up in JavaScript).
  • Additional compilation options for overflows and signedness (CHECK_OVERFLOWS, CORRECT_OVERFLOWS, CHECK_SIGNS). These allow even more C/C++ code to be compiled and run properly, but are switchable, so code that doesn't need them can run fast.
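
As a rough illustration of why overflow options like these matter: JavaScript numbers are doubles, so 32-bit integer arithmetic does not wrap around on its own. The sketch below is not Emscripten's generated code, and the function names are hypothetical; it only contrasts an unchecked add, a check that detects the overflow, and a correction that wraps the result.

```javascript
var INT32_MAX = 2147483647;

// Uncorrected: the sum is a double, so it simply keeps growing past 32 bits.
function addUnchecked(a, b) {
  return a + b;
}

// "Check" style: detect that a signed 32-bit result would have overflowed.
function overflows32(value) {
  return value > 2147483647 || value < -2147483648;
}

// "Correct" style: wrap to a signed 32-bit integer, as typical
// two's-complement hardware does.
function addCorrected(a, b) {
  return (a + b) | 0;
}

console.log(addUnchecked(INT32_MAX, 1));              // 2147483648
console.log(overflows32(addUnchecked(INT32_MAX, 1))); // true
console.log(addCorrected(INT32_MAX, 1));              // -2147483648
```

The corrected version costs an extra operation on every add, which is the reason for making such corrections switchable.
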
I'll post a web demo soon.

Tuesday, January 18, 2011

LIL browser demo

I just saw this demo of LIL (a Little Interpreted Language) running in the browser, compiled from C to JavaScript using Emscripten. Cool stuff!

Saturday, December 18, 2010

Emscripten 0.7!

Main changes in this release:
  • Lots of minor fixes and additions, in order to get CPython working. As a result there is now a web demo of Python, which seems to work quite well aside from being very slow in Chrome.
  • Figuring out what to do with LLVM optimizations. It looks like all of them generate suitable code except for -instcombine, which apparently combines instructions in a CPU-specific way (so, it isn't portable, and confuses Emscripten). All the tests now pass with LLVM optimizations enabled (all but the problematic one just mentioned).
So, not much in the way of new features: As mentioned before, we are already pretty much feature complete at this point.

Friday, November 12, 2010

Now What?

For several months I've known that when I get home at night, I have tasks X and Y to do in Emscripten in order to move it forward. Last night, however, I took out the guitar instead. Suddenly, there is not much to do: basically the goals of the project have been achieved, and Emscripten can compile things like the Bullet physics engine and run them on the web. As far as the core code-generating capabilities are concerned, Emscripten is pretty much complete.

So, what now?
  • There are some additional optimizations and enhancements that can be done, like nativizing structures or emulating multithreaded code.
  • There are various tooling improvements that can be done, like making it easier to glue together web code and compiled code.
  • Emscripten could be used in other ways, for example, it could be combined with something like Rubinius that generates LLVM code from Ruby, allowing running Ruby code on the web.
  • Various code cleanups and refactorings could be done.
I'm not in a rush to do any of these - none is urgent or essential. I guess I'll get around to them eventually, or perhaps someone else will.

Right now, I'm considering doing one or both of the following:
  • Return to my original goal, that of bringing Syntensity to the web. In other words, to compile a version of Syntensity using Emscripten. The time of 3D-environments-on-the-web is almost upon us, and when it is, we need to make sure that the main tools for it are open source and platform agnostic. Sadly, currently the main contenders are not such.
  • Some other side project, got at least two ideas in my head of things I'd like to hack up. They are very experimental and speculative though, so they may end up a waste of time. But if they succeed...

Tuesday, November 9, 2010

Bullet/WebGL Demo


Click on the screenshot for a live demo.

Sunday, October 17, 2010

Emscripten 0.4!

The focus of this release was on making the generated code faster. As the chart shows, we went from being 100X slower than hand-optimized JavaScript code, to around 5X slower. You can see the difference in the raytrace demo, which has been updated to use all the current optimizations.

5X slower is still slower. It will be hard to do much better, though, without either better JS engine support, or much more clever code analysis. Both will hopefully happen over time. Meanwhile, 5X slower is not too terrible, and there are some advantages over hand-written code - we hardly use garbage collection, so no GC pauses. Also, the speed really depends on the code - the comparison in the chart above uses benchmarks for which we have comparable code in both C++ and JavaScript. But the most interesting uses of Emscripten are to convert code for which we don't have a JavaScript equivalent. Also worth mentioning is that it is perfectly possible to hand-optimize the crucial parts of the code that Emscripten generates.

Some technical details about the optimizations implemented in this release:
  • Use typed arrays, if available in the JS engine (thanks to pcwalton and njn for the idea)
  • Optimize after the relooper runs, removing unneeded code flow overhead
  • Nativize many more variables than before (i.e., move them off the emulated stack, and into native JS variables)
  • Optimized stack emulation
  • Inlining of various runtime code fragments
  • Integration with the Closure Compiler: We generate output that it is very good at optimizing (thanks to Anders Riggelsen for the idea)

Also added in this release is support for the brand-new LLVM 2.8. That is now the version being tested against.

Tuesday, October 5, 2010

Emscripten 0.3!

Demo for this release: Raytracing. It isn't very fast, since the focus hasn't been on code speed yet, but it does show that a ray tracer written in C++, using SDL, can be emscriptened and run on the web.

So, I ended up doing more in this release than I had intended, causing it to take longer than planned. But it was for the best. Major changes include:
  • Clang support: All tests now work in both llvm-gcc and Clang. The two produce somewhat different llvm bitcode, to the degree that different methods are needed with Clang, causing it to run 1/2 as fast as llvm-gcc code. That is mainly because llvm-gcc is more explicit with what it does, while Clang uses memcpy and such, with hardcoded C size values (4 bytes for an int, etc.).

    Emscripten therefore now supports optional 'C memory layout' (QUANTUM_SIZE in settings.js). For example, an array of ints of values 1,2,3 with that enabled is [1,0,0,0,2,0,0,0,3,0,0,0] (since each int is 4 bytes), and when it is disabled, [1,2,3]. The latter works fine with llvm-gcc-generated llvm bitcode. Note that things get even more complicated with structures here, which need to be aligned and so forth. Anyhow, after that effort Emscripten should now be able to support anything that C/C++ can throw at it.

  • Faster compilation speed: The original goal of the release. Compilation speed is 2-3 times faster now. Still lots of room for improvement, but it isn't a major nuisance like it was.

  • Proper memory management: A call stack is implemented, and static memory allocation (for global variables, etc.) is also possible. sbrk() is emulated as well, allowing dlmalloc, a popular malloc() implementation, to be emscriptened properly. In particular that lets you use a real malloc() in your emscriptened code.

  • Much better native flow regeneration (the 'relooper'): A major challenge of translating LLVM to JavaScript is to implement native flow structures - if, while, for, etc. LLVM bitcode only provides chunks of code (I call them 'labels', but that's not the right name) and branchings between them. So Emscripten needs to figure out from that low-level data the high-level code flow patterns. Native flow structures are extremely important for good performance of the generated code.

    The first relooper worked on most tests, but was slow and buggy. I wrote a new version almost from scratch, and it now properly processes all the test code. It isn't very fast, though, since I didn't focus on that. For that reason it is off by default, which means that Emscripten will not generate native flow structures (instead it will emulate code flow using a switch in a loop, which is very slow - but trivial to generate).

  • The above-mentioned raytracing demo: For this, initial work was done on supporting SDL - just showing video data so far. The SDL Surface is implemented in JavaScript using a Canvas. I found this very amusing, to write C++ code with SDL, compile and run it natively using gcc, and be able to run that same unmodified code on the web through Emscripten ;)

  • Lots more tests: There are now 37 separate tests, from small LLVM features to high-level tests like the CubeScript engine, dlmalloc, and raytracing; each test is run through both Clang and llvm-gcc, and with relooping&optimization both on and off, for a total of 148 tests. This takes 7 minutes on my slow laptop, which is starting to be significant, but it's extremely important in a project like this.
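
The 'C memory layout' example from the Clang bullet above can be reproduced with a small sketch. The layoutInts function and its quantumSize parameter are hypothetical names that only mirror the idea of QUANTUM_SIZE. Splitting each value into little-endian bytes is also an assumption made here, since the post only gives the resulting arrays for small values.

```javascript
// Lay out an array of 32-bit ints in an emulated memory array.
// quantumSize is a hypothetical parameter mirroring the idea of QUANTUM_SIZE:
// 4 gives a C-style layout (four slots per int), 1 gives one slot per value.
function layoutInts(values, quantumSize) {
  var memory = [];
  values.forEach(function (value) {
    if (quantumSize === 1) {
      memory.push(value);
    } else {
      // Assumption: little-endian byte order, one byte per slot.
      for (var i = 0; i < quantumSize; i++) {
        memory.push((value >>> (8 * i)) & 0xff);
      }
    }
  });
  return memory;
}

console.log(JSON.stringify(layoutInts([1, 2, 3], 4))); // [1,0,0,0,2,0,0,0,3,0,0,0]
console.log(JSON.stringify(layoutInts([1, 2, 3], 1))); // [1,2,3]
```

The compact layout uses a quarter of the slots, but code that copies memory with hardcoded C sizes (as Clang's memcpy calls do) only sees what it expects in the four-slots-per-int layout.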

Next goals include performance of the generated code - lots, lots of low-hanging fruit there - and compiling yet more real-world code.
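
To illustrate the relooper discussion above, here is a hedged sketch of the two styles of code flow for the same tiny function (summing 1 to n). Both functions are hand-written for illustration and are not actual Emscripten output; the block numbering is an assumption.

```javascript
// Emulated code flow: each chunk of code is a case, and branches set the next label.
function sumEmulated(n) {
  var label = 0, i, total;
  while (true) {
    switch (label) {
      case 0: // entry block
        i = 1; total = 0;
        label = 1;
        break;
      case 1: // loop condition
        label = (i <= n) ? 2 : 3;
        break;
      case 2: // loop body
        total += i; i++;
        label = 1;
        break;
      case 3: // exit block
        return total;
    }
  }
}

// Native flow structure: what a relooper aims to recover from the same blocks.
function sumNative(n) {
  var total = 0;
  for (var i = 1; i <= n; i++) {
    total += i;
  }
  return total;
}

console.log(sumEmulated(10), sumNative(10)); // 55 55
```

The emulated version is trivial to generate from labeled blocks, but every branch goes through the switch dispatch, which is where the slowdown comes from.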

Tuesday, September 21, 2010

Emscripten, Now With More Clang

Emscripten can now work with Clang. It turns out that the llvm bitcode that Clang generates is slightly different from that of llvm-gcc, which uncovered various minor bugs and missing elements in Emscripten (for example, the 'phi' command).

All the tests now pass with both llvm-gcc and Clang, and with both optimization and relooping on (however, relooping has been weakened, due to some bugs that were discovered).

The benefit of allowing Clang to be used is that having two sources of LLVM bitcode is better than one - more chances to catch bugs (but even more important would be to add non-C/C++ sources of LLVM bitcode as well!). Another benefit is that Clang is simpler to build, so it will allow more people to play with Emscripten.

The wiki has been updated with full instructions, so you can get Clang and try it out with Emscripten right now.

Saturday, September 11, 2010

CubeScript on the Web

Emscripten 0.2 is out, and here's a silly demo: the CubeScript engine from Sauerbraten, compiled from C++ to JavaScript, and running in a web page. Finally, you can use a script language on the web ;P

A few more details about the demo appear on that page. Other details about the 0.2 release are in the changelog.

After successfully compiling the CubeScript engine (which was mainly to see if Emscripten could do it - fixed a lot of bugs on the way), I think most C/C++ stuff should work (but there are probably a lot of minor corner cases left). The next steps are something like this:
  • Version 0.3: Optimize the compiler for speed. Right now compiling CubeScript, about 2,500 lines of code, takes a minute on my (slow) laptop. The goal is to compile large projects, so this needs to be much faster.
  • Version 0.4: Optimize the generated code for speed. Some radical solutions for making it faster are possible, but might take a lot of time, so maybe post-1.0. But some straightforward optimizations should be done in the near future.
  • Version 0.5: Tools and integration. Make it easy to build multi-file projects, and to connect the generated code to web JavaScript (calling functions both ways).
  • ???
  • 1.0!