How to get Bazel and Emscripten to compile C++ to WebAssembly or JavaScript
In my quest to build a reusable WebAssembly template, I found many approaches that look easy to implement but don't really work for anything beyond a simple "hello world". If you're interested in building slightly more complex WebAssembly modules, this article is for you.
Ultimately, I'm looking to compile a nice C/C++ library to the JavaScript domain. I'm not looking to build specific one-off functions; I want support for the entire library (or at least the majority of it). Additionally, I want this to run in the browser as well as NodeJS, and I don't want to deal with instantiating the WASM or managing its memory across these environments either. These requirements mean I can rule out several alternatives…
What's out there?
There are many starting points and many tutorials on how to get started using C/C++ with JavaScript. Here are a few tools you would find in the wild:
- WebAssembly Text Format (WAT)
- Node Addons (N-API)
- LLVM
- Wasmer
- Cheerp
- Emscripten
Some of these tools are not exactly what I want…
WAT: requires a lot of low-level work and only makes sense for simple one-off functions.
N-API: not much better. I'm still writing a lot of bindings, I need to worry about node-gyp, and it may not work in the browser.
LLVM: allows us to compile C/C++ directly to WASM! Unfortunately, this is still a low-level job that requires a lot of extra steps just to get it working.
Wasmer: actually looks great! They're a relatively new player and support lots of integrations. Even their builds are relatively lean! Unfortunately, they still require a lot of glue to the native C/C++ code, which is not really pleasant for larger projects. However, they are working on better integration support and are moving rather fast.
Cheerp: another great tool. They're similar to emscripten, but have a different memory model that allows for automated garbage collection. The performance is quite similar, often beating emscripten in special cases. However, the community is not quite as large and I found myself getting stuck. I'll keep these guys on the radar.
Emscripten: just right. Integration with C++ is made extremely easy by using embind (C++ only), which lets me pass non-primitive types between both domains. They have a larger community presence, and the output format is relatively straightforward to use in the browser or NodeJS.
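To make the embind point concrete, here is a minimal sketch of passing a non-primitive type across the boundary. The Greeting struct, the MakeGreeting and Render functions, and the module name greeting_sketch are hypothetical and not part of the repository; value_object, field, and function are embind calls. The bindings sit behind an __EMSCRIPTEN__ guard so the same file also compiles with a native compiler.

```cpp
#include <string>

#ifdef __EMSCRIPTEN__
#include <emscripten/bind.h>
#endif

// Hypothetical non-primitive type, used only for illustration.
struct Greeting {
  std::string name;
  int repeat;
};

// Builds a Greeting; embind maps value objects to plain JS objects.
Greeting MakeGreeting(const std::string &name, int repeat) {
  return Greeting{name, repeat};
}

// Repeats the greeting text `repeat` times, each followed by a space.
std::string Render(const Greeting &g) {
  std::string out;
  for (int i = 0; i < g.repeat; ++i) {
    out += "Hi " + g.name + "! ";
  }
  return out;
}

#ifdef __EMSCRIPTEN__
// Only compiled when building with emscripten and embind enabled.
EMSCRIPTEN_BINDINGS(greeting_sketch) {
  emscripten::value_object<Greeting>("Greeting")
      .field("name", &Greeting::name)
      .field("repeat", &Greeting::repeat);
  emscripten::function("MakeGreeting", &MakeGreeting);
  emscripten::function("Render", &Render);
}
#endif
```

On the JavaScript side, a value object like this should arrive as an ordinary object with name and repeat properties, which is the kind of glue the other tools leave to you.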
Getting started
I'll showcase a simple "hello world" C++ application that we will convert to WebAssembly.
How do I convert an existing C++ library?
This is the crux of it all. Every toolchain has some initial difficulties setting up, and I'm often left scratching my head on where to even start. No one wants to invoke gcc manually, so we built tools such as configure, make, or cmake to automate the build process. Great! …Except, not.
Sometimes I've needed to hack the existing make/cmake rules to avoid dependencies on shared libraries, ignore some intrinsics checks, etc. This obviously doesn't play nicely with a centralized C++ code base that attempts to build bindings for many languages. So what are our options?
Bazel
While this build system can be quite daunting, it is actually very powerful. Unfortunately, there's just not that much documentation on using it with emscripten. In fact, their docs are broken, more broken, and maybe not even supported.
I argue that it can be done decently well; even the reputable TensorFlow.js team has managed to get it working! So what was so difficult? What makes it so special?
After converting several libraries to WebAssembly, I can tell you that the isolation Bazel offers is quite nice: no horrible breaking changes when a cmake script has been modified, and no more complex logic to determine which target to build. Once defined, it will almost always just work.
First steps
Install Bazel. You will also need yarn to install the dev dependencies.
Fast forward a bit: here is the GitHub repo so you can follow along.
Note: I've taken a lot of inspiration from the TensorFlow.js project on how they managed to get it working. My changes revolve around compiler/linker flags, showing how to output both JS and WASM, and, most importantly, using the latest emscripten release!
git clone --recurse-submodules https://github.com/s0l0ist/bazel-emscripten.git
cd bazel-emscripten
yarn install
I've taken the liberty of including the emsdk as a git submodule so you don't have to manage it yourself. The first step is to get the emsdk cloned. If you've cloned my repo recursively, you can skip this step:
yarn submodule:update
Next, we need to update the release tags and then install the latest version of emscripten:
yarn em:update
yarn em:init
Done!
The layout
Some important files and directories:
- .bazelrc: describes default commands for building a target
- WORKSPACE: defines our external dependencies
Some files inside hello-world/:
- BUILD: empty file so bazel doesn't complain
- deps.bzl: bazel toolchain dependencies (emsdk)
A few directories in hello-world/:
- cpp/: holds the simple C++ sources
- javascript/: holds all JS related material
- javascript/bindings/: holds all emscripten bindings
- javascript/src/: holds all JS wrappers
- javascript/scripts: the handy build scripts to shorten our cli statements
- javascript/toolchain: the heart of the Bazel + Emscripten configuration
The rest is self-explanatory.
The code
I've outlined a very simple library containing Greet and LocalTime classes, each with static methods for this example:
LocalTime class:
//////// cpp/localtime.hpp ////////
#ifndef LIB_LOCAL_TIME_H_
#define LIB_LOCAL_TIME_H_
namespace HelloWorld {
  class LocalTime {
  public:
    /*
     * Prints the current time to stdout
     */
    static void Now();
  };
} // namespace HelloWorld
#endif

//////// cpp/localtime.cpp ////////
#include <ctime>
#include <stdio.h>
#include "localtime.hpp"
namespace HelloWorld {
  void LocalTime::Now() {
    std::time_t result = std::time(nullptr);
    printf("%s", std::asctime(std::localtime(&result)));
  }
} // namespace HelloWorld
Greet class:
//////// cpp/greet.hpp ////////
#ifndef LIB_GREET_H_
#define LIB_GREET_H_
#include <string>
namespace HelloWorld {
  class Greet {
  public:
    /*
     * Greets the name
     */
    static std::string SayHello(const std::string &name);
  };
} // namespace HelloWorld
#endif

//////// cpp/greet.cpp ////////
#include <string>
#include "greet.hpp"
namespace HelloWorld {
  std::string Greet::SayHello(const std::string &name) {
    return "Hello, " + name + "!";
  }
} // namespace HelloWorld
Emscripten bindings
The bindings are quite short for our example. We make use of the powerful embind, which lets us talk to C++ classes.
You may notice that LocalTime::Now outputs directly to stdout. Emscripten is intelligent enough to redirect our output to console.log, so we don't need to do anything else. Greet::SayHello returns a primitive string that we will need to send to console.log manually.
//////// javascript/bindings/hello-world.cpp ////////
#include <emscripten/bind.h>
#include "hello-world/cpp/greet.hpp"
#include "hello-world/cpp/localtime.hpp"
using namespace emscripten;
using namespace HelloWorld;
EMSCRIPTEN_BINDINGS(Hello_World) {
  class_<Greet>("Greet")
    .constructor<>()
    .class_function("SayHello", &Greet::SayHello);
  class_<LocalTime>("LocalTime")
    .constructor<>()
    .class_function("Now", &LocalTime::Now);
}
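If you would rather hand the timestamp back to JavaScript than rely on the stdout redirection, one option is to return a std::string, as Greet::SayHello does. The sketch below is a variant under stated assumptions, not code from the repository: FormatTime and NowString are hypothetical helpers, and the output is formatted in UTC via std::gmtime so it does not depend on the host time zone.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

namespace HelloWorld {

// Hypothetical helper: formats a timestamp as "YYYY-MM-DD HH:MM:SS" in UTC.
// std::gmtime returns a pointer to shared static storage, so this sketch
// is not thread-safe.
inline std::string FormatTime(std::time_t t) {
  char buf[32];
  const std::tm *utc = std::gmtime(&t);
  if (utc == nullptr) {
    return std::string();
  }
  const std::size_t n =
      std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", utc);
  return std::string(buf, n);
}

// Hypothetical counterpart to LocalTime::Now that returns the text instead
// of printing it.
inline std::string NowString() {
  return FormatTime(std::time(nullptr));
}

} // namespace HelloWorld
```

A function like NowString could then be bound with class_function or emscripten::function in the same way as SayHello, leaving the caller to decide whether the result goes to console.log or somewhere else.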
Now that we've defined our bindings, we're ready to build!
Building
You may build the native libraries, but they're quite useless by themselves…
bazel build -c opt //hello-world/cpp/...
I've configured the .bazelrc file to build with two different options: JS or WASM.
JS: specifies flags to emscripten to output a single asm.js file that does not contain any WebAssembly. This is useful for environments that can't work with WebAssembly, such as React-Native, but the output is significantly larger and slower.
WASM: specifies flags to emscripten to output a single JavaScript file containing the WebAssembly as a base64-encoded string. This means we don't need to manage a separate .wasm file in our bundles or figure out how to properly serve this file in the browser. The drawback is a larger file size due to the base64 encoding.
To make it simple, I've created some helper scripts so all you need to do is run the following:
yarn build:js
# or
yarn build:wasm
# or both
yarn build
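As a rough guide to the base64 overhead mentioned for the WASM option: standard base64 emits 4 output characters for every 3 input bytes and pads the last group, so the embedded binary grows by about a third before any compression. A small sketch of that arithmetic (the Base64EncodedSize helper is hypothetical and only illustrates the formula):

```cpp
#include <cstddef>

// Size in characters of standard padded base64 for `raw_bytes` input bytes:
// every group of up to 3 bytes becomes 4 characters.
constexpr std::size_t Base64EncodedSize(std::size_t raw_bytes) {
  return 4 * ((raw_bytes + 2) / 3);
}

// A 300,000-byte payload would embed as 400,000 characters.
static_assert(Base64EncodedSize(300000) == 400000, "4/3 expansion");
// A partial trailing group is still padded out to 4 characters.
static_assert(Base64EncodedSize(1) == 4, "padding on a partial group");
```

Whether that third matters in practice depends on how the bundle is served, since transport compression typically recovers part of it.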
There are some good and bad things about using emscripten here:
Good: It generates glue code for you automatically.
Bad: It generates glue code for you automatically.
Obviously, the glue code adds some bloat, but it keeps me from having to deal with the intricacies of initialization.
Note: In .bazelrc there are a few compiler flags, present for both the JS and WASM builds, that are geared towards production use. Feel free to modify the flags as necessary; I wanted to show what's possible here.
If you do want full control over instantiating the WASM to reduce the bundle size, you may generate a pure WASM build by adding the link flag -s STANDALONE_WASM=1 inside the Starlark file, hello-world/javascript/BUILD.
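One thing to plan for with a standalone build: embind relies on the generated JavaScript glue, so a pure WASM module would most likely expose plain C-style exports instead. A minimal sketch, assuming a hypothetical greet_length function that is not in the repository; EMSCRIPTEN_KEEPALIVE comes from emscripten/emscripten.h and is defined away here for native builds:

```cpp
#include <cstring>

#ifdef __EMSCRIPTEN__
#include <emscripten/emscripten.h>
#else
// Native builds: the export annotation is not needed, so define it away.
#define EMSCRIPTEN_KEEPALIVE
#endif

extern "C" {

// Hypothetical export: length in bytes of the greeting "Hello, <name>!".
// Only primitive types cross the boundary; the caller passes a pointer into
// the module's linear memory.
EMSCRIPTEN_KEEPALIVE int greet_length(const char *name) {
  if (name == nullptr) {
    return 0;
  }
  // "Hello, " is 7 bytes and "!" is 1 byte.
  return static_cast<int>(7 + std::strlen(name) + 1);
}

} // extern "C"
```

With exports like this, copying strings in and out of linear memory becomes the caller's job, which is exactly the work the embind glue was doing for us.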
Bundling
You may have seen the javascript/src/implementation files, which wrap the emscripten output. Do we really need these files? No, you don't. However, I like my APIs to be abstracted from the output of emscripten. This allows for more flexibility when there are potentially breaking changes to the C++ core.
An important thing to note is that the outputs are quite a bit larger than you would expect. A big reason for some people is that their code requires <iostream>, where a lot of code is pulled in for static constructors to initialize the iostream system even if it is not used, but our builds don't have this problem. Then there is the auto-generated glue code to manage initialization and provide helpers for memory allocation, resizing, and the like.
Generate the bundles:
yarn rollup
This gathers the files in hello-world/javascript/bin/* and hello-world/javascript/src/* and produces a few output bundles in hello-world/javascript/dist/.
You will notice two minified bundles, one for js and one for wasm, each with two targets: ES6 module support or UMD (for browser and NodeJS), in hello-world/javascript/dist/<js|wasm>/<es|umd>/*.
Details of the rollup configuration are in rollup.config.js.
Let's run
So we've compiled our C++ to JS and WASM. What's next?
Run the JS bundle in NodeJS:
yarn demo:js
Or run the WASM bundle in NodeJS:
yarn demo:wasm
Or open javascript/html/index_wasm.html to run the WASM bundle in the browser.
Conclusion
By spending a little time with bazel, you can create a nice build system that works for many languages without breaking your other targets.
We can now drive a core C++ application with bindings in several different languages all while simplifying the interoperability between them.
Stay tuned for part 2, where I show a real C++ library converted to JS and WASM!
Hope you enjoyed and thanks for reading!
Credits
- schoppmp for help with optimizing the bazel configuration
- TensorFlow.js for the initial bazel configuration