The Pain of Debugging WebAssembly
If you know anything about WebAssembly (WASM), it’s probably that WASM lets you run code compiled from languages such as C, C++, or Rust in the browser at near-native speed. You might be less familiar with the fact that WASM is interesting not only in the browser but also in other environments that need fast sandboxing. As a result, WASM has gained some popularity in edge computing and, in certain situations, as a lightweight replacement for Docker.
The latter is enabled by a standard called WASI (the WebAssembly System Interface), which provides a platform-independent abstraction for interfacing with the operating system: basic input/output, file system access, getting the current time, and similar tasks. WASI is implemented by runtimes such as Wasmtime and Wasmer.
WASM can do some awesome things. But what’s not awesome? Debugging with WASM.
Many of us are working on complex systems where real bugs are increasingly difficult to reproduce. As developers, we understand this challenge. That’s why we made sure you can use source maps with Sentry to get stack traces that reliably show what your real stack trace looked like, even for a minified JavaScript build. For WASM, we want the same thing: a reliable stack trace with accurate function names, line numbers, and file names in production, which we can then send to a system like Sentry. But this is where we currently run into limitations.
Current Debug Gripes
At its core, WASM is quite different from the systems that native languages normally deal with. It’s a stack machine! Functions are not “addressable”: they do not live in the same address space as the memory a program works with. This “novel” design has advantages, but much of the existing tooling was not built for this reality.
Stack Unwinding
Let’s start with the basics: to get a stack trace, we need to unwind the stack. In native code, this is typically done with libraries like libunwind, which implement the platform’s unwinding scheme. For our needs, we only care about the return address of each frame and nothing more.
There are generally two ways to unwind. The first is to write the stack memory together with the registers into a memory dump (such as a minidump) and unwind after the fact. The second is to capture the registers and unwind inside the running program. The latter is also needed whenever C++ exceptions are thrown or Rust panics unwind, which shows that stack unwinding is not just for building a fancy stack trace: it is also how destructors get run while an exception propagates.
At present, WASM does not support stack unwinding. Though that sounds like a severe limitation, it turns out not to be a huge issue, at least in the browser JavaScript case. Because of WASM’s stack-based nature, WASM function calls are visible in the JavaScript stack trace: if a JavaScript function calls WASM and WASM calls back into JavaScript, we can observe the WASM frames from JavaScript. So, as a fancy workaround, we can create a JavaScript exception object and parse its stack trace.
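Here is a minimal TypeScript sketch of that workaround. The Error.stack string is not standardized, so the only assumption it makes is that WASM frames contain a :wasm-function[ location, as the WebAssembly Web API display conventions describe; the helper names wasmFramesFrom and captureWasmFrames are hypothetical.

```typescript
// Hypothetical helpers (not from any library) for the "create an Error and
// parse its stack" workaround. ASSUMPTION: WASM frames in Error.stack contain
// ":wasm-function[", per the Web API display conventions.

// Keep only the lines of a stack string that describe WASM frames.
function wasmFramesFrom(stack: string): string[] {
  return stack
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.includes(":wasm-function["));
}

// Call this from a JavaScript function that was invoked by WASM (for example
// an imported function), so the WASM callers are still on the stack.
function captureWasmFrames(): string[] {
  // Constructing an Error records the current stack; it is never thrown.
  const stack = new Error("stack capture").stack ?? "";
  return wasmFramesFrom(stack);
}
```

An imported logging or error-reporting function could call captureWasmFrames() and attach the result to its report. Keep in mind that engines truncate long stacks (in V8, the depth is governed by Error.stackTraceLimit), so deep WASM call chains may be cut off.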
DWARF
DWARF is our favorite debug standard, mostly because it’s the only one that is thought through and documented. (The others are source maps and PDB, and PDB is so bizarre that it might give us material for future content.)
While DWARF is great, it doesn’t fully work with WASM yet. Chrome, at least, supports it for debugging WASM, but issues remain.
DWARF works by embedding sections with debug data in an executable or object file. Because WASM is an extensible object format, it’s entirely possible to embed DWARF in it. However, as mentioned earlier, WASM is stack-based and its functions do not live in memory, so what does a DWARF code address point to? It works because there is a DWARF for WASM spec, which says that code addresses are byte offsets into the code section of the WASM file.
Cool. Except that a lot of WASM tools don’t account for byte offsets, and understandably so, because WASM actually has two formats: a binary format and a text format (WAT, sometimes also called WAST). Once DWARF debug information is present, these two formats are no longer interchangeable: byte offsets only exist in the binary, so converting back and forth would require rewriting every offset. Bummer.
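To make the byte-offset convention concrete, here is a minimal TypeScript sketch that walks the section headers of a WASM binary, finds where the code section’s payload begins, and converts an offset measured from the start of the module into a DWARF code address. It assumes, per our reading of the DWARF for WASM convention, that addresses count from the first byte after the code section’s size field; all function names are hypothetical.

```typescript
// Hypothetical helpers (not from any library): locate the code section of a
// WASM binary and convert module offsets to DWARF code addresses.

// Decode an unsigned LEB128 integer starting at `offset`.
// Returns [value, offset of the first byte after the integer].
function readVarU32(bytes: Uint8Array, offset: number): [number, number] {
  let result = 0;
  let shift = 0;
  for (;;) {
    if (offset >= bytes.length) throw new Error("truncated LEB128");
    const byte = bytes[offset++];
    result |= (byte & 0x7f) << shift;
    if ((byte & 0x80) === 0) return [result >>> 0, offset];
    shift += 7;
    if (shift > 28) throw new Error("LEB128 value too large for u32");
  }
}

const WASM_MAGIC = [0x00, 0x61, 0x73, 0x6d]; // "\0asm"
const CODE_SECTION_ID = 10;

// Byte offset (from the start of the module) of the first byte after the
// code section's size field, i.e. where its payload begins.
function codeSectionPayloadStart(bytes: Uint8Array): number {
  if (bytes.length < 8 || !WASM_MAGIC.every((b, i) => bytes[i] === b)) {
    throw new Error("not a WASM binary");
  }
  let offset = 8; // skip the 4-byte magic and the 4-byte version
  while (offset < bytes.length) {
    const id = bytes[offset++];
    const [size, payloadStart] = readVarU32(bytes, offset);
    if (id === CODE_SECTION_ID) return payloadStart;
    offset = payloadStart + size;
  }
  throw new Error("module has no code section");
}

// ASSUMPTION: DWARF code addresses are relative to the code section payload.
function moduleOffsetToDwarfAddress(bytes: Uint8Array, moduleOffset: number): number {
  const start = codeSectionPayloadStart(bytes);
  if (moduleOffset < start) throw new RangeError("offset lies before the code section");
  return moduleOffset - start;
}
```

Because the result depends on the exact byte layout of the binary, any tool that inserts, removes, or reorders bytes ahead of an instruction shifts these addresses, and the debug data has to be rewritten to match.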
DWARF Splitting
So, DWARF kind of works. But consider where the DWARF data lives: it is embedded right in the WASM file. That is a problem for two reasons. The first is that most people don’t want anyone to easily decompile their code and see file names from their build machines or other metadata.
The second is size. Even if you’re an open source lover who doesn’t mind debug information living on the internet, you still wouldn’t want to embed it in the WASM file that users download. At present, DWARF data for WASM is humongous, and even in the most optimistic future, WASM debug data will still be an order of magnitude larger than the main WASM file.
This is where the idea of splitting the DWARF data from the main executable comes into play. On macOS, the split-off debug files are called .dSYM files, but the approach works everywhere, including with WASM. In fact, you can already split a WASM file into code and debug data just fine: the debug data can live in a non-executable, non-functional WASM file that contains only the debug information.
The problem is that once these two files are split, it’s difficult to link them back together. The current DWARF spec proposes embedding a reference to a downloadable debug file, as a custom section in the main WASM file. This is basically how source maps work, but not how native debugging normally works. One of the brilliant ideas popularized by both Apple and Microsoft was to give debug files and executables globally unique debug IDs. Because a code file and its debug file carry the same ID, you can link them together.
From what I can tell, nobody supports DWARF splitting yet. The only spec floating around uses URL references, which offer a worse user experience than debug IDs.
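To make the URL-reference approach concrete, here is a TypeScript sketch of how a tool might follow such a reference: it collects a module’s custom sections and reads a URL out of one of them. The section name external_debug_info and its payload layout (a length-prefixed UTF-8 string) reflect our understanding of the current convention and should be treated as assumptions; the helper names are hypothetical.

```typescript
// Hypothetical helper (not a library API): decode an unsigned LEB128 integer.
// Returns [value, offset of the first byte after the integer].
function leb128U32(bytes: Uint8Array, offset: number): [number, number] {
  let result = 0;
  let shift = 0;
  for (;;) {
    if (offset >= bytes.length) throw new Error("truncated LEB128");
    const byte = bytes[offset++];
    result |= (byte & 0x7f) << shift;
    if ((byte & 0x80) === 0) return [result >>> 0, offset];
    shift += 7;
    if (shift > 28) throw new Error("LEB128 value too large for u32");
  }
}

// Map every custom section (section id 0) in a WASM binary to its payload.
// If a name appears more than once, the last section wins.
function customSections(bytes: Uint8Array): Map<string, Uint8Array> {
  const magic = [0x00, 0x61, 0x73, 0x6d];
  if (bytes.length < 8 || !magic.every((b, i) => bytes[i] === b)) {
    throw new Error("not a WASM binary");
  }
  const decoder = new TextDecoder();
  const sections = new Map<string, Uint8Array>();
  let offset = 8; // skip the 4-byte magic and the 4-byte version
  while (offset < bytes.length) {
    const id = bytes[offset++];
    const [size, payloadStart] = leb128U32(bytes, offset);
    const end = payloadStart + size;
    if (end > bytes.length) throw new Error("section runs past end of file");
    if (id === 0) {
      // A custom section starts with its length-prefixed UTF-8 name.
      const [nameLen, nameStart] = leb128U32(bytes, payloadStart);
      const name = decoder.decode(bytes.subarray(nameStart, nameStart + nameLen));
      sections.set(name, bytes.subarray(nameStart + nameLen, end));
    }
    offset = end;
  }
  return sections;
}

// ASSUMPTION: the reference lives in a custom section named
// "external_debug_info" whose payload is a length-prefixed UTF-8 URL.
function externalDebugInfoUrl(bytes: Uint8Array): string | undefined {
  const payload = customSections(bytes).get("external_debug_info");
  if (payload === undefined) return undefined;
  const [len, start] = leb128U32(payload, 0);
  return new TextDecoder().decode(payload.subarray(start, start + len));
}
```

The weak spot is visible in the code: everything hinges on a URL baked into the binary at build time still resolving when a crash is processed, whereas an ID can be looked up in whatever store currently holds the debug file.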
Debug IDs
As you can see from the last section, you can’t say “DWARF splitting” without talking about debug IDs. They also go by other monikers: Build ID, Debug ID, Code ID — all names that more or less communicate a unique ID for our debug data.
Debug IDs are useful because they let you connect the debug file to the right WASM file. Pairing your file with the wrong debug data is a classic case of good data in, garbage out: two perfectly valid but unrelated files combine into nonsense.
I proposed such an extension for WASM on the WebAssembly tool-conventions issue tracker, but so far it hasn’t gone anywhere.
What makes debug IDs so wonderful is how easy they are to use. Your toolchain simply emits the code and debug files, and you can put them on a symbol server, from which debuggers can download the binaries and debug data. (As a side note, Sentry also loves debug IDs: even if you don’t use a symbol server, you can conveniently upload these files to your organization on Sentry and it will work.)
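As a rough illustration of why this is pleasant, here is a TypeScript sketch of the pairing step a symbol server performs: debug files are indexed by their debug ID, and a code file’s ID either finds its exact match or nothing. The class and method names are hypothetical, and how the ID bytes are read out of the files is deliberately left out.

```typescript
// Hypothetical in-memory stand-in for a symbol server: debug files are
// indexed by debug ID, and a code file finds its debug data by the same ID.

// Normalize raw debug ID bytes to lowercase hex so lookups don't depend on formatting.
function debugIdHex(id: Uint8Array): string {
  return Array.from(id, (b) => b.toString(16).padStart(2, "0")).join("");
}

interface DebugFile {
  debugId: string; // normalized hex debug ID
  path: string;    // where the split-off debug file is stored
}

class DebugFileStore {
  private readonly byId = new Map<string, DebugFile>();

  // Register a debug file under the ID its toolchain emitted.
  add(id: Uint8Array, path: string): void {
    const debugId = debugIdHex(id);
    this.byId.set(debugId, { debugId, path });
  }

  // Look up the debug file for a code file with the given ID. There is no
  // fallback to a "close enough" file: pairing the wrong debug data would
  // produce garbage, so a miss is reported as undefined.
  find(codeFileId: Uint8Array): DebugFile | undefined {
    return this.byId.get(debugIdHex(codeFileId));
  }
}
```

Note that nothing here depends on where the files are hosted: the ID alone links a code file to its debug data, which is exactly what a URL reference cannot guarantee.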
Source Maps
The source map format is completely inadequate even for JavaScript debugging (you can’t get function names, can’t access variables in a debugger, can’t get scope information, etc.), yet it has found its way into the WASM world as well. I think it would be better for us to collectively forget that source maps exist in the WASM world, but at the moment they’re the only thing that is somewhat widely supported. They’re not great: they only show where an instruction points in the text assembly version of WASM.
Stack Trace Information
Let’s imagine all of the above works. We’re still not out of the woods. WASM rarely comes alone. At the very least, it comes with its friend JavaScript, but it’s not uncommon for it to bring other WASM modules along too. Although WASM modules are self-contained, they can export and import functions. When a stack trace involves WASM, the file name encodes the WASM location information and looks something like this: ${url}:wasm-function[${funcIndex}]:${pcOffset}. It tells us the function index and a byte offset (printed in hex; the WebAssembly Web API display conventions define it relative to the start of the module binary). Unfortunately, this doesn’t reliably tell you where to look up the function index or which loaded WASM module a frame belongs to, because two different functions from two different modules can have the same funcIndex.
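Here is a TypeScript sketch of the best one can do with this format today: parse each WASM frame into its parts and group the frames by module URL, so that a funcIndex is only ever looked up in its own module’s debug data. The regular expression assumes the location string appears in the form above, possibly wrapped in engine-specific decoration; the names are hypothetical.

```typescript
// Hypothetical parser (not from any library) for WASM frame locations of the form
//   ${url}:wasm-function[${funcIndex}]:${pcOffset}
interface WasmFrame {
  url: string;       // module URL as printed by the engine
  funcIndex: number; // only meaningful within the module identified by url
  pcOffset: number;  // byte offset, printed in hex with a 0x prefix
}

// The URL may not contain whitespace, "(" or "@", so wrappers such as
// "at fn (...)" (V8) or "fn@..." (SpiderMonkey) are skipped.
const WASM_LOCATION = /([^\s(@]+):wasm-function\[(\d+)\]:(0x[0-9a-fA-F]+)/;

function parseWasmFrame(line: string): WasmFrame | undefined {
  const m = WASM_LOCATION.exec(line);
  if (m === null) return undefined;
  return { url: m[1], funcIndex: Number(m[2]), pcOffset: parseInt(m[3], 16) };
}

// Group frames by module URL so each funcIndex is only resolved against the
// debug data of its own module.
function groupByModule(stack: string): Map<string, WasmFrame[]> {
  const groups = new Map<string, WasmFrame[]>();
  for (const line of stack.split("\n")) {
    const frame = parseWasmFrame(line);
    if (frame === undefined) continue;
    const frames = groups.get(frame.url) ?? [];
    frames.push(frame);
    groups.set(frame.url, frames);
  }
  return groups;
}
```

The grouping stops exactly where the problem starts: two modules served from the same URL, or a changed module behind a cached URL, land in the same bucket, and nothing in the frame says which build produced it.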
Debugging is Still a Niche
This is the meta-gripe. Everybody seems super excited about WebAssembly, but barely anyone is excited about making sure it’s debuggable. There are great folks working on this, of course, but it’s a relatively tight-knit community. If you’d like to participate or follow along, check out any of the following projects: