Monday, December 5, 2011

Emscripten stuff on other blog

As mentioned previously, I've moved to working fulltime on Emscripten now. So Emscripten-related blogposts will now be on my other blog. (You can also follow me on twitter.)

I'll use this blog for Syntensity-specific stuff, that is, about porting Sauerbraten and/or Syntensity, which is based on it, to the web. I have no progress to report on that recently, since I am completely blocked on OpenGL issues: I can compile the C++ to JS, but I can't convert the OpenGL code to WebGL. I've asked around for help, but so far no luck. Hopefully it will happen, though.

Thursday, November 10, 2011

Emscripten Updates

Lots of stuff has happened with Emscripten; I haven't blogged because I've been too busy. Here are some updates:
  • Emscripten was used to compile the Android H264 codec to JavaScript, in a project called Broadway. (live demo)
  • I gave a talk about Emscripten at SPLASH 2011 in Portland. (slides)
  • I gave a talk about Emscripten at JSConf.EU in Berlin. (slides)
  • Performance of the generated code has been improving due to progress in the relevant projects (JS engines, Emscripten and Closure Compiler). Some numbers appear in the slides linked to above (the upper link is more recent).
  • Bundled headers. This makes it easier to use Emscripten on non-Linux platforms (Linux being the platform that most development is done on) and in a portable way that does not depend on your local system headers.
  • Some library bugfixes that resolved almost all the open issues on speak.js, the Emscripten port of eSpeak to JavaScript which lets you do text-to-speech on the web.
  • j2k.js, a port of OpenJPEG to JavaScript with a nice API, letting you decode JPEG2000 images on the web. This might be helpful with pdf.js.
  • Support for LLVM svn (soon to be 3.0). Note that revision 141881 is known to work; other revisions should work too, but that is not guaranteed.
  • Many other improvements and bugfixes to Emscripten. I should probably formally release a 2.0 version, but I can't seem to decide when.
  • Finally, I am now working fulltime on Emscripten and Emscripten-related things (at Mozilla, where I already worked but on other stuff before). So progress on Emscripten will be faster :)

Sunday, October 9, 2011

llvm-svn branch has been merged

The llvm-svn branch of Emscripten has been merged to master, in preparation for Emscripten 2.0, after all tests have been fixed and all speed regressions resolved. If you are currently using master, the consequences of that are:
  • You should use LLVM's svn (soon to be 3.0). LLVM 2.9 might still work, but it isn't guaranteed. Also, LLVM 3.0 is deprecating llvm-gcc, so Emscripten no longer uses that in its tests (as with 2.9, it might still work, but it might not). clang in 3.0 is much improved and is able to compile much more code than 2.9, so llvm-gcc is less necessary; there is also dragonegg which combines LLVM and GCC in a different manner, but its website says it is not mature yet.
  • Emscripten now uses its own header files, not your system headers. That means that Emscripten should now work on all platforms exactly the same. However, if you were using Emscripten to compile something that relied on your system headers, you might need to change how your project is built (that is, tell it to use those headers and not just Emscripten's bundled ones in system/include). Note that if you do not use the bundled headers, you will probably need to use the -H flag with emscripten.py, which tells it what headers to parse for constants (library.js needs to be aware of constants in your library headers, so that it is synchronized with them).
Please report bugs if you find them. If there are no show-stoppers, Emscripten 2.0 will be released soon.

Saturday, September 24, 2011

Road to Emscripten 2.0

Lots and lots of work has been taking place on emscripten (the LLVM-to-JS compiler). I haven't been breaking things out into smaller releases. Instead, there will be a 2.0 release in the near future (the last release was 1.5). The remaining issues before that are:
  • Fix any remaining regressions in the llvm-svn branch compared to the master branch, and merge llvm-svn to master. The llvm-svn branch uses LLVM's svn, which will soon become 3.0. Some of the code changes in LLVM have hurt our generated code, but most of the issues are now fixed. The latest update is a fix for exception handling which has led to a 5% smaller ammo.js build (compared to 2.9), with no speed decrease :)
  • Bundle headers with emscripten. As more people have begun to use emscripten, we have been seeing more issues with platform-specific problems, almost all due to using different system headers (for example, issues #82 and #85 on github). Bundling working headers will fix that, in a similar way as to how SDKs and NDKs typically bundle a complete build environment. Currently the plan is to use the newlib headers for libc and start from there.
As mentioned in a previous blog post, emscripten 2.0 will require LLVM 3.0, and will no longer officially support the deprecated llvm-gcc compiler (it might still work though). This is a significant change which may affect projects using emscripten, so please let me know if you are aware of any issues there.

Note also that I will be merging llvm-svn to master before LLVM 3.0 goes stable. That means that you will need to build LLVM from source at that time, since there won't be LLVM binaries. Of course, you will still be able to use a previous revision of emscripten that works with LLVM 2.9, with no issues.

Friday, September 9, 2011

LLVM 3.0, llvm-svn Branch

LLVM 3.0 will probably be released in about a month. In preparation for that, I've gotten emscripten to work properly with LLVM svn in the llvm-svn branch.

As in the past, I intend to only support one version of LLVM at a time, since it takes too much effort to do any more - our automatic tests already require several hours, doubling that for another LLVM version is a huge burden.

With LLVM 3.0 it looks like llvm-gcc is pretty much obsoleted. It isn't being developed much, and remains on gcc 4.2 (I am guessing due to Apple's aversion to the GPL3?). There is Dragonegg, which is a plugin for recent GCC versions which uses LLVM as the backend, however as of 2.9 Dragonegg is not considered mature. I am not sure if 3.0 will be sufficiently stable or not.

So, the question is what compilers to use with LLVM 3.0. Clang goes without saying. The good news is that Clang can finally build all of the source code in the emscripten automatic tests, which is very nice (although I did need to file a bug last week for libc++ - which is kind of ironic considering it's an LLVM project ;) but kudos to the LLVM people for the quick fix). As for other compilers, llvm-gcc seems of little importance if Clang can compile the same code, since llvm-gcc is deprecated. Dragonegg is interesting, but I am not sure it makes sense to use it before it is fully ready - we might end up wasting a lot of time on bugs.

So my current plan is to move to a single compiler, Clang, in the emscripten test suite. That is what is currently done in the llvm-svn branch. The main risk here is of code that gcc can compile but Clang cannot. Is anyone aware of any significant cases of that? In particular I am curious about Python, I have not tried to compile it with Clang yet (the emscripten test suite has a prebuilt .ll file - we should fix that).

If no one raises any concerns about this plan, I'll merge the llvm-svn branch into master in the near future.

Thursday, September 8, 2011

The VTable Customization Hack

Recently my main focus in emscripten (the LLVM-to-JavaScript compiler) has been on the bindings generator: A tool to make it easy to use C++ code from within JavaScript. Why is this needed? Well, assume you have some C++ class,

class MyClass {
public:
  MyClass();
  virtual void doSomething();
};

The bindings generator will autogenerate bindings code so that you can do the following from JavaScript:

var inst = new MyClass;
inst.doSomething();

In other words, use that class from JavaScript almost as if it was a native JavaScript class.

Turns out that really doing this is not easy ;) One issue is callbacks from C++ into JavaScript: Imagine that you compiled some C++ library into JavaScript, and at some point the C++ code expects to receive an object that has a virtual function, which it will call. Virtual functions are a common design pattern for getting a callback to your own code. Typically you would create a new subclass, implement that virtual function, create an instance, and pass it to the library. That function will then be called when needed from the library.

Why is this difficult when mixing C++ and JavaScript? The main issue is that in C++ you would be creating those new classes and functions at compile time. But in JavaScript you are doing it at runtime. Creating a new class at runtime is not simple, but it was one option I considered. However compilation speed was too much of a concern. Instead, I went for a vtable customization approach.

The vtable of a class is a list of addresses to its virtual functions. Virtual functions at runtime work as follows: The code goes to the vtable, and to the proper index into it, loads the address, and calls that function. So by replacing the vtable you can change what gets called. However this still turned out to be fairly difficult. The reason is that the bindings code gets you into this situation:

// 1: Original C++ code
void MyClass::doSomething();

// 2: Autogenerated C++ bindings code
void emscripten_bind_MyClass_doSomething(MyClass *self)
{ self->doSomething(); }


// C++/JS barrier

// 3: Autogenerated JS bindings code
MyClass.prototype.doSomething = function() {
  _emscripten_bind_MyClass_doSomething(this.ptr);
};

// 4: Handwritten JS code
myClassInstance.doSomething();

The top layer is the original C++ code in the library you are compiling. Next is the generated C++ bindings code. This does almost nothing, except that it is defined as extern "C", so that there is no C++ name mangling. Below that is the JS bindings code, which also seems fairly trivial here, but generally speaking it handles type conversions, object caching and a few other crucial things. Finally, at the bottom is the handwritten JS code you create yourself.

So, the idea of the vtable customization hack is to receive a concrete object, then copy and modify its vtable, replacing functions as desired. The replacements can be native, normal JS functions, and presto: Your C++ library is calling back into your handwritten JS code. However, how do you modify the vtable, exactly? When your handwritten code wants to modify it, what it specifies is code on the third level, something like this:

customizeVTable(myClassInstance, [{
  original: MyClass.prototype.doSomething,
  replacement: function() { print('hello world!') }
}]);

Here we want to replace doSomething with a custom JS function. But what appears in the vtable is not the third-layer function specified here. It isn't even the second-layer function! It's the first-layer one. How can you get to there, from here..?

A natural idea is to add something to the second layer,

// 2: Autogenerated C++ bindings code
void emscripten_bind_MyClass_doSomething(MyClass *self)
{ self->doSomething(); }
void *emscripten_bind_MyClass_doSomething_addr = &MyClass::doSomething;

- basically, have the address of the function in the bindings code. You can then read it at runtime and use that. But there are a few problems here. The first is that this code won't compile! The right-hand side is a two-part pointer, consisting of a class and a representation of the function in the class. You can't convert that to void* (well, GCC will let you, but it won't work). Even if you do get around the compilation issue, though, you will be left with that representation of the function. I had hoped it was a simple offset into the vtable - but it isn't, at least not in Clang. After some mucking around with trying to figure out what in the world it was, I realized there was a better solution anyhow, because of the other reason that this approach is a bad idea: It forces you to add a lot of bindings code, a little for every single function. That's a lot of overhead, considering you will likely use that information for very few functions!

So instead, I arrived at the following hack:
  • Add a terminating 0 to all vtables at compile time. (This adds some overhead, but there is one vtable per class, and it's just one 32-bit value for each).
  • Copy the object's vtable.
  • Replace all the vtable elements with 'canary functions', that report back to you with their index in the vtable.
  • Call the function you want to replace, through the third-layer function you have available in JavaScript.
  • Since you replaced the entire vtable, you end up calling one of those. The canary function then reports back by setting a value. That value is the index of the function you want to replace in the vtable.
  • Copy the vtable again. This time the only modification is to replace the function at the index that you just found with the replacement function you want to run instead.
  • (There are some additional complications, for example due to how emscripten handles C++ function pointers in JavaScript - pointers to functions are just integers, like all pointers, so there is a lookup table to map them to actual JS functions. Another issue is that the third-layer JS bindings code will try to convert types, and if you pass it the wrong things it will fail, so calling the canaries must be done very carefully. But the description above is the main idea.)
This ends up working properly. You can see the code in tools/bindings_generator.js (search for customizeVTable), and you can see it used in the latest version of ammo.js (the README there has been updated with documentation for it).
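
To make the steps above more concrete, here is a minimal, self-contained sketch of the canary idea in plain JavaScript. It is a toy model and not the actual code in tools/bindings_generator.js: HEAP, FUNCTION_TABLE, addFunction, allocate and callVirtual are hypothetical names made up for this illustration, "memory" is just an array of integers, and the customizeVTable here is a simplified reimplementation that ignores type conversion and the other complications mentioned above.

```javascript
// Toy model of compiled code (all names here are hypothetical): memory is an
// array of integers, and function pointers are indices into a lookup table.
const HEAP = [0];              // address 0 is reserved as the null pointer
const FUNCTION_TABLE = [null]; // function pointer 0 is reserved as null

function addFunction(fn) {
  FUNCTION_TABLE.push(fn);
  return FUNCTION_TABLE.length - 1;
}

function allocate(values) {
  const address = HEAP.length;
  for (const v of values) HEAP.push(v);
  return address;
}

// A virtual call: load the vtable address from the object, load the function
// pointer at the given index, look up the JS function, and call it.
function callVirtual(ptr, index) {
  const vtable = HEAP[ptr];
  return FUNCTION_TABLE[HEAP[vtable + index]](ptr);
}

// A "compiled" class with two virtual functions and a 0-terminated vtable.
const log = [];
const myClassVTable = allocate([
  addFunction(() => log.push('original doOther')),
  addFunction(() => log.push('original doSomething')),
  0, // the terminating 0 added at compile time
]);

function MyClass() {
  this.ptr = allocate([myClassVTable]); // the object holds only its vtable pointer
}
// Third-layer bindings: the vtable index is hidden inside the compiled code.
MyClass.prototype.doOther = function () { callVirtual(this.ptr, 0); };
MyClass.prototype.doSomething = function () { callVirtual(this.ptr, 1); };

function customizeVTable(instance, replacements) {
  const originalVTable = HEAP[instance.ptr];
  // Find the vtable size by scanning for the terminating 0.
  let size = 0;
  while (HEAP[originalVTable + size] !== 0) size++;

  // Build a vtable made only of canaries that report their own index.
  let reported = -1;
  const canaries = [];
  for (let i = 0; i < size; i++) canaries.push(addFunction(() => { reported = i; }));
  const canaryVTable = allocate([...canaries, 0]);

  // Start from a copy of the original entries.
  const entries = [];
  for (let i = 0; i < size; i++) entries.push(HEAP[originalVTable + i]);

  for (const { original, replacement } of replacements) {
    HEAP[instance.ptr] = canaryVTable;   // swap in the canaries
    reported = -1;
    original.call(instance);             // call through the third-layer binding
    HEAP[instance.ptr] = originalVTable; // restore before doing anything else
    if (reported < 0) throw new Error('binding did not make a virtual call');
    entries[reported] = addFunction(replacement);
  }
  // Install the customized copy on this one object only.
  HEAP[instance.ptr] = allocate([...entries, 0]);
}

const inst = new MyClass();
customizeVTable(inst, [{
  original: MyClass.prototype.doSomething,
  replacement: () => log.push('hello world!'),
}]);
inst.doSomething(); // dispatches to the replacement
inst.doOther();     // still dispatches to the original
```

Note that in this sketch only the customized object gets the new vtable; other instances of MyClass keep pointing at the original one.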


Thursday, August 11, 2011

Rewritten Physics Engine Demo

Initial testing of ammo.js (a port of Bullet Physics to JavaScript using Emscripten) found some issues, but they have been quickly resolved. ammo.js should be ready for use now.

Completing that allowed me to rewrite the original Emscripten Bullet demo using ammo.js. That is, the original demo code - creating the scene and so forth - was written in C++, and was compiled alongside Bullet into JavaScript for the original demo. What I did now was to write the scene generating code in JavaScript, where it uses Bullet through ammo.js's autogenerated bindings. Check out the demo here, and read the JavaScript embedded in the HTML file to see a complete example of using ammo.js.

I'm very happy with the result: The JavaScript code in the demo is very nice to work with now, and in addition it outperforms the original demo due to build system improvements that were completed since the original demo was finished.