Reflections on Three Years of wasmCloud

wasmCloud Maintainer

June 25, 2022 · 7 min read

waxosuit

It's hard to believe that wasmCloud has been around for 3 years, and the inspiration and desire even longer.

It was a pleasant, cold winter's day in February of 2016. A friend of mine had sent me an inconspicuous link to an unassuming article. Surely this link would never send me down an inescapable rabbit hole from which I have yet to escape to this day. Surely.

The offending article was about this thing called the Cloud ABI. The short version of the story behind Cloud ABI is that people were frustrated with the lack of security around the native plugin model. They were also frustrated with the lack of performance of secure isolation mechanisms like bulky virtual machines that provided better security than native plugins. Further, the folks behind this movement were also disappointed with the way in which capabilities could be granted to application code. Cloud ABI was meant to be the solution to these things and more.

Fast forward a year and, at least as far as I could tell, the efforts around Cloud ABI had stagnated, making way for this new thing called WebAssembly. WebAssembly's 1.0 specification aimed to deal with the security, isolation, performance, and portability problems that Cloud ABI wanted to tackle. Eventually, WASI would be born of the evolution of the Cloud ABI ideas around granting POSIX-like capabilities to user code. I don't mean to imply that there is direct lineage from Cloud ABI to WebAssembly, but you can see relationship and shared inspiration.

I'd been inspired about the concepts from Cloud ABI since early 2016. I'd continued to follow the evolution of this idea through 2017 (WebAssembly was announced in 2015 and released in March of 2017) and into early 2018, where I dove head first into WebAssembly until I couldn't stand idly by any longer. I decided to write a book about it. I started writing Programming WebAssembly with Rust in an official capacity in June of 2018.

In the process of writing the book, in telling the "hero's journey" from beginner to expert in the WebAssembly ecosystem, I'd developed an itch that needed scratching. The book would come out in March of 2019, but it wasn't enough for me. I knew what WebAssembly would ultimately be capable of, and I had a choice. I could sit back and wait for the ecosystem to catch up to Wasm's potential, or I could roll up my sleeves and start building it myself. For those doing the math, my book came out just 3 months after WebAssembly became a W3C recommendation.

When I started building, the first question I asked, before anything else, was, "Can I embed security information into a WebAssembly module so that it doesn't require consulting a central authority to validate?". This was before "will it blend?" and long before I ever figured out what would come to be called the lattice. As a proof of concept, I found that I could place signed JSON Web Tokens (JWT) into a Wasm module without interfering with its execution. I could sign a module with a list of capability claims; a finite list of what this module was allowed to access. This stepping-off point ultimately led to the GitHub release of the first prototype of waxosuit in June of 2019, so named because we likened the framework to an exosuit into which you could place your WebAssembly modules. We subsequently renamed it to wasmCloud, but we will always have a place in our heart for the waxosuit logo (the header image for this blog).

I guess this is the part where I get to say that, back in my day, our Wasm runtimes had to walk uphill both ways in the snow, and we liked it. The first version of waxosuit was a standalone Rust binary (about 23MB on Linux) that could read a manifest file and use it to start actors and providers and configure them. It had no networking support outside of the providers--hosts were an island unto themselves.

The first version of the runtime was built with Rust's "heavy" threads and communication between them was done via channels. There was a background thread per actor, one per capability provider, and another used for dispatch between the two. Ultimately, this became too much spaghetti to maintain, and we refactored.

This refactor switched from "heavy" threads to using an actor-model crate for Rust called Actix. It made a lot of sense at the time--We were building an actor-based host runtime so an actor model for the internal plumbing felt harmonious. It did clean up the spaghetti from the heavy threads and we were able to see some performance improvements in function calls. However, Actix is based on Rust's async model, and carries all of the baggage that entails. The more we added features to this new runtime, the more difficult it became to develop. Ultimately, the only people who could touch certain parts of the codebase were those who knew the precise magical incantation of syntax to get the futures+Actix code to work.

This violated one of the core philosophies of wasmCloud. Every aspect of the codebase needed to be accessible to everyone. This new code was inscrutable even to those who wrote it. The key smell occurred when we were planning upcoming features and people visibly recoiled when we suggested large enhancements to the core Actix (async) plumbing.

It felt like we were rewriting large portions of Erlang's OTP in Rust. In fact, it felt so much like it that we made a huge decision to rewrite the host runtime in Elixir. We looked at where we were spending our time and it turned out to be the very same horrible 90/10 split that we're trying to eliminate with wasmCloud. We spent 90% of our time working on things that weren't core features like async and concurrency and thread-safe queue and dispatch management, and 10% actual features. The proposition was that if we "simply" built on top of OTP, we could stop rewriting OTP in Rust and instead focus on wasmCloud-specific features.

This was a huge undertaking. There is fantastic support for calling Rust functions from Elixir, so we knew we wouldn't need to completely rewrite all of our crates. We thought it was going to be an enormous task and, of course, it turned out to be twice as long and difficult as originally expected. Even with all of the effort and time we put into it, we still knew it had been worth it. Immediately our runtime was able to do more things with better and more clear concurrency.

The codebase got less intimidating, but it was still daunting because Elixir is new to a lot of people. However, we'd rather walk people through a functional programming forest than have them hack their way through a code jungle armed with nothing but a fork.

It grew easier to add new features and functionality to the host that were tolerant to crashes and ran concurrently while still consuming a small amount of resources. We learned a ton about running Elixir in production environments, but that's likely a topic for a whole series of blog posts on their own.

It's hard to imagine that it's been three years. Since we started, the community has grown, we've steadily increased the number of people who attend our weekly community calls, the slack regularly gets new members, and we routinely see contributions from new folks and are continually surprised by what people are building using wasmCloud.

With all this in our rear view mirror, the best is yet to come. With all that we've experienced and learned, we can clearly see what needs to be done to continually improve the developer experience so that wasmCloud will be the way to build secure, portable, distributed applications that can dynamically scale from monoliths to globally redundant services with the flip of a switch.