WASI WebGPU Demo, Train Release Model, HTTP Reuse & NATS Interface Proposal
The April 22, 2026 wasmCloud community call opens with Colin Murphy's WASI WebGPU demo running an Adobe TrustMark watermarking model as a Wasm component — about 20% faster than the CPU path and the first end-to-end WebGPU-from-a-component demo on wasmCloud. Bailey Hayes then walks through the proposed two-week automated train release model, micro-benchmark results showing a 5x throughput win from Wasmtime's HTTP reuse path, and the plan to push WebAssembly component composition into the workload deploy step. Aditya then opens his proposed first-class NATS WIT interface for discussion, which leads into a deep conversation with Yordis Prieto and Frank Schaffa about specifying capabilities, non-functional behavior, idempotency, and how Protobuf could become a strict subset of WIT for event sourcing.
Key Takeaways
- First end-to-end WASI WebGPU + components demo on wasmCloud — Colin Murphy ran Adobe's TrustMark watermarking model through ONNX Runtime as a WebAssembly component, getting ~20% throughput improvement vs. CPU and proving the architecture works with wash dev today
- Two-week automated train release model proposed — Tuesday releases every two weeks, with hot-fix releases preserved out-of-band; the goal is predictable cadence and never letting release automation rot
- HTTP reuse delivers a 5x throughput improvement — the new Wasmtime proxy_handler_reuse feature keeps the store alive between invocations, eliminating most per-request allocation cost on the hot path
- Component composition is moving into workload deploy — Bailey is composing the stateless components in a workload into a single .wasm ahead of time so Cranelift can optimize across interface boundaries; service components keep their own lifetime for connection pools and state
- Runtime instantiation is the long-term answer — Bailey traced Luke Wagner's queue of component-model features (child handles → runtime instantiation → reentrancy) that will eventually let one component instantiate another lazily, without the host doing the linking
- NATS WIT interface proposal opens for review — Aditya proposed first-class WIT for NATS core, KV, and JetStream rather than forcing NATS through wasmcloud-messaging; the design principle is to be the lingua franca of NATS users, not the lowest common denominator
- Capabilities and non-functional requirements need to be in the spec, not the implementation — Yordis argued that idempotency, batch limits, geo-replication, and failure modes belong in the WIT contract (or in named "worlds" that profile them), not in undocumented provider behavior
- Glibc host binaries are coming for WebGPU and ONNX — the team will start producing glibc-based release binaries so WebGPU/ONNX Runtime can run in Docker host images without special-casing the musl builds
Chapters
- 00:54 — Pre-show and Colin's keyboard demo gods
- 02:39 — WASI WebGPU demo intro and motivation
- 05:33 — WebGPU history, WebGL roots, and browser support
- 08:07 — From p1 module to working component with Wasmtime
- 11:10 — Live demo: TrustMark watermarking with vs. without WebGPU
- 14:48 — Q&A: p3, async, hardware detection, WIT bindings
- 25:43 — Next steps: glibc binaries and Docker host images
- 28:24 — Sebastien Guillemot on WASI GFX and HTML-in-canvas
- 33:54 — Two-week automated train release model proposal
- 37:30 — Roadmap walkthrough and Q2 progress
- 41:58 — Component composition for workload deploy
- 46:55 — Cross-host routing and an in-memory routing plugin
- 52:34 — Runtime instantiation, child handles, and reentrancy
- 56:09 — Benchmarking science: Criterion, Cachegrind, Hetzner bare-metal
- 1:07:15 — Aditya's NATS-native WIT interface proposal
- 1:12:19 — Capabilities, idempotency, and what WIT alone doesn't say
- 1:20:16 — Non-functional requirements and WIT documentation
- 1:23:11 — Protobuf as a subset of WIT and event sourcing
Meeting Notes
Colin Murphy's WASI WebGPU Demo
Colin Murphy opened the call with a long-running side project: getting compute-heavy WebAssembly workloads to use the GPU through WASI WebGPU. His original motivation came from work on fast image resizing in Rust, where SIMD instructions like NEON or AVX-512 give large speedups but can't be relied on inside a portable Wasm component — you don't know what the host has until you're running.
WebGPU, modeled after Vulkan and Metal rather than the older OpenGL-derived WebGL, gives Wasm components a single architecture-independent interface for compute shaders. Colin used Adobe's open-source TrustMark watermarking model running on ONNX Runtime to demonstrate the architecture end-to-end: a single Wasm component that calls into wasi-gfx (Mendy Berger's WASI GFX host bindings) to push tensors to the GPU and run inference on the host side. The watermarking ran roughly 20% faster than the CPU-only path, and Colin emphasized that the more meaningful result was that it worked at all as a component — not just as a CLI or a p1 module like his prior attempts.
Frank Schaffa and Dan Phillips asked about p3 gains, async pre-processing, and how the runtime discovers available hardware. Colin's view: most of the wins from cooperative threads and p3 are in pre/post-processing (image manipulation, tensor reshaping), not in the GPU-bound critical path. Bailey added that wasmCloud is the only runtime today that ships both wasi-http and wasi-webgpu; she pointed at the wgpu Rust crate as a more ergonomic alternative to C++ bindings for guest authors.
The team identified a near-term blocker: wasmCloud's release artifacts ship as musl binaries, but ONNX Runtime and WebGPU work much better against glibc. The fix is to add a glibc-based release pipeline alongside the existing musl one and switch the Docker host images to use those binaries — keeping WebGPU as a feature-flagged build path rather than a separate fork.
Two-Week Train Release Model
Bailey proposed an automated two-week train release cadence — Tuesdays specifically (Mondays are still warming up, "no-ship Fridays," and Tuesday gives a clean work week to deal with anything that goes sideways). Hot-fix releases stay out-of-band so urgent patches aren't blocked. The bigger win is preventing release automation from rotting: by always shipping on schedule, the path stays exercised and predictable for downstream consumers. Today the only manual step is version bumping; the rest of the release pipeline is already automated.
HTTP Reuse, Composition, and the Workload Deploy
Bailey walked through her HTTP micro-benchmark results. With Wasmtime's new proxy_handler_reuse and a long-lived store, the p3 hot path shows roughly a 5x throughput improvement over the previous instantiate-per-request model. Most of the remaining cost between wasmCloud and raw Wasmtime is structural — wasmCloud has to plumb plugin calls and service routing on every request — and the bench is now sensitive enough to detect regressions, like a dropped-lock bug she found while running it.
The harder change is what wasmCloud does with a workload deploy that has multiple components. Today, wasmCloud links components together at the host layer, which works but pushes the host store across async runtime boundaries and creates real headaches for HTTP reuse. The clean answer is WebAssembly component composition ahead of time: pre-compose all the stateless components in a workload deploy into a single .wasm, pay the composition cost once, and let Cranelift optimize across interface boundaries (lifting/lowering elimination, mem-copy for shared string representations).
Service components — the long-lived parts that hold connection pools or other state — keep their own lifetime and link to the composed unit. The trade-off is that everything in the composed unit instantiates together; Bailey's argument is that most users today aren't yet running workloads dense enough for this to matter, and those who are have an easy workaround (split into multiple workload deploys, talk over HTTP).
The ideal long-term answer is runtime instantiation — Luke Wagner's planned component-model feature where one component can lazily instantiate another at runtime with full type safety. Fastly and other CDNs want this to scale CRUD-style sub-handlers to zero. It depends on "child handles" (scoped child callbacks needed for the browser component-model implementation), which itself depends on threading primitives being expressed through canonical built-ins rather than core Wasm. Cooperative threads progress is unblocking that whole queue. Reentrancy is still an open design problem Dan Phillips and Sebastien Guillemot both flagged.
Benchmarking Infrastructure
Bailey demoed the Criterion-based HTTP invocation benchmarks she just landed and explained why she's adding Valgrind-based instruction counts alongside them. Criterion gives wall-time numbers users intuitively understand ("requests per second") but picks up noise from the OS and async runtime. Valgrind/Cachegrind-style instruction counting is harder to interpret but much more deterministic — a regression in instructions retired is hard to dispute. She's looking for dedicated bare-metal hardware (Hetzner is the leading candidate now that Equinix Metal is gone) so benchmarks stay reproducible. Frank offered mini-PCs if that helps. She also surfaced that copy-on-write memory init — a default Wasmtime feature on Linux only — is one of the things making Linux benchmarks materially different from her MacBook runs.
Aditya's NATS WIT Interface Proposal
Aditya brought a new proposal to add a first-class NATS WIT interface to wasmCloud, covering NATS core (pub/sub, KV) plus JetStream. The current situation splits NATS across two generic interfaces: wasmcloud-messaging for pub/sub and wasi-keyvalue for KV (with a NATS backend). The question is whether to expand that pattern or build NATS-native WIT.
Bailey came down firmly on the side of NATS-native: build the interface that NATS users would recognize and want, don't contort it to match wasmcloud-messaging. The portability argument — least common denominator interfaces let you swap implementations — applies to commodity stores, but NATS users are picking NATS for what NATS specifically does. If portability matters for a given consumer, the component model already gives you a clean tool: a virtualization component that adapts NATS-native calls to wasmcloud-messaging. That's a much better separation than forcing every NATS user through a lowest-common-denominator surface.
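To make the "lingua franca, not lowest common denominator" distinction concrete, here is a hedged sketch of what a NATS-native WIT surface could look like. The package, interface, and function names below are invented for illustration — this is not Aditya's actual proposal — but it shows the kind of NATS-specific concepts (request/reply, no-responders, JetStream stream acks) that a generic messaging interface flattens away:

```wit
package example:nats@0.1.0;

interface core {
  record message {
    subject: string,
    payload: list<u8>,
    reply-to: option<string>,
  }

  variant error {
    /// Request timed out waiting for a responder.
    timeout,
    /// A NATS-specific condition generic messaging can't express.
    no-responders,
    other(string),
  }

  publish: func(subject: string, payload: list<u8>) -> result<_, error>;

  /// Core NATS request/reply as a first-class operation.
  request: func(subject: string, payload: list<u8>, timeout-ms: u32)
    -> result<message, error>;
}

interface jetstream {
  use core.{error};

  record stream-ack {
    stream: string,
    sequence: u64,
  }

  /// Publish with a stream acknowledgement, surfacing JetStream
  /// delivery semantics instead of hiding them behind fire-and-forget.
  publish-acked: func(subject: string, payload: list<u8>)
    -> result<stream-ack, error>;
}
```

A portability-minded consumer could still layer a virtualization component on top that adapts these calls back down to wasmcloud-messaging, keeping the adaptation cost out of the interface itself.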
Capabilities, Non-Functional Requirements, and What WIT Doesn't Say
Yordis Prieto raised the critical follow-up: WIT alone doesn't communicate the expected behavior of an interface. Does put provide optimistic concurrency? Is it idempotent? What's the failure mode? NATS supports batch publishing — but with a 1,000-message limit. RabbitMQ and Kafka both implement queues and streams now, but the offset/log semantics still diverge in ways consumers care about. Without explicit semantics in the contract, every provider ends up with custom flags ("pass this for Kafka, pass that for NATS") and the portability story breaks down anyway.
Bailey agreed and proposed two complementary tools:
- WIT documentation — wasi-filesystem is a good example. The doc-comments in the interface have been iterated repeatedly to tighten language around error conditions and platform-specific behavior (Windows being a frequent pain point).
- Named worlds for behavior profiles — letting a producer declare not just "I implement KV" but "I implement KV with these specific guarantees" via a named world that profiles which subset of capabilities and which failure modes apply.
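One way to read the named-worlds idea is that the same interface can be re-exported under differently named worlds, each carrying a documented guarantee profile. A hedged sketch (the package, worlds, and guarantees here are invented for illustration):

```wit
package example:kv-profiles@0.1.0;

interface store {
  /// Semantics documented in the contract, not left to the provider:
  /// last-write-wins, no optimistic concurrency.
  put: func(key: string, value: list<u8>) -> result<_, error>;
  get: func(key: string) -> result<option<list<u8>>, error>;

  variant error {
    /// Store temporarily unreachable; the call is safe to retry.
    unavailable,
    other(string),
  }
}

/// Baseline profile: KV with no additional guarantees.
world kv-basic {
  import store;
}

/// Stricter profile: same interface shape, but implementers of this
/// world additionally promise idempotent puts and geo-replicated reads,
/// with those promises spelled out in the world's doc-comments.
world kv-replicated-idempotent {
  import store;
}
```

The interface shape is identical in both worlds; what changes is the documented contract a provider signs up for by claiming the world's name — which is exactly the information Yordis argued is missing when behavior lives only in the implementation.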
Frank Schaffa underlined the point: handling failure modes — and being explicit about expected guarantees — is what separates a POC from a product. Aditya committed to iterating on the proposal with these comments and bringing back an updated draft.
Protobuf, Event Sourcing, and Community
The conversation drifted (productively) into Yordis's longer-running effort to make Protobuf a strict subset of WIT so that any tool built on Protobuf — including gRPC and the storage layer of his own Trogan AI work — gets a complete WIT interface for free. Bailey pushed back on whether structured annotations might be needed for the more exotic parts of Protobuf (the form-style annotations especially), since WIT has resisted YOLO-annotation extensibility precisely to preserve type-level reasoning. Yordis's counter-position is that ecosystem chaos normalizes over time and the cost of locking things down too early is higher than the cost of letting people experiment.
Frank Schaffa added that even a minimum Protobuf subset as part of the WIT standard would unlock gRPC as a common binary interface across components, hosts, and the network — and you'd get encrypted payloads as a side benefit. The recent integration of WIT map types — driven heavily by Yordis — was held up as proof that this kind of multi-year, hard-won contribution path is possible.
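The "Protobuf as a strict subset of WIT" idea can be illustrated with a small, hypothetical mapping. The message and names below are invented; the interesting wrinkle is that Protobuf field numbers and wire-format metadata have no WIT equivalent, which is precisely where Bailey's question about structured annotations bites:

```wit
// Hypothetical Protobuf source being mapped (field numbers are lost):
//   message OrderPlaced {
//     string order_id = 1;
//     uint64 amount_cents = 2;
//     optional string coupon = 3;
//     repeated string item_skus = 4;
//   }
package example:events@0.1.0;

interface orders {
  /// scalar -> scalar, optional -> option<T>, repeated -> list<T>
  record order-placed {
    order-id: string,
    amount-cents: u64,
    coupon: option<string>,
    item-skus: list<string>,
  }

  record ack {
    ok: bool,
  }

  /// A gRPC-style unary call falls out as a plain WIT function.
  place-order: func(event: order-placed) -> ack;
}
```

Under a mapping like this, any tool that understands Protobuf schemas would get a WIT interface for free — but round-tripping back to the wire format needs the field numbers, which is the kind of metadata WIT's resistance to open-ended annotations currently keeps out.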
WebAssembly News and Updates
This week's call connects directly to several active threads in the broader WebAssembly ecosystem. WASI WebGPU is gaining traction as the way to run GPU-accelerated AI inference inside sandboxed components, and Adobe's TrustMark demo is exactly the kind of real-world workload that exercises the spec end-to-end. Wasmtime's proxy_handler_reuse feature is meaningfully changing what serverless WebAssembly hosts can achieve in throughput. And the long-running component model queue — child handles, runtime instantiation, cooperative threads — continues to land features that wasmCloud and other host runtimes are eagerly waiting to consume.
What is wasmCloud?
wasmCloud is a CNCF project for building and running WebAssembly components anywhere — cloud, edge, or Kubernetes. It uses the WebAssembly component model so business logic written in Rust, Go, TypeScript, Python, or C# can be composed with capability providers for HTTP, messaging, KV, secrets, and now WebGPU through a uniform interface. Components are deployed as declarative workload specs, distributed via OCI artifacts, and observed through built-in OpenTelemetry. The platform runs as a Kubernetes operator or standalone, with a pluggable host architecture that supports both native plugins (for hardware access like WebGPU) and WebAssembly host components (for everything that can fit inside the sandbox).
Topic Deep Dive: WebAssembly Component Model and Composition
The composition work Bailey walked through is a real demonstration of why the WebAssembly component model matters in production. Today wasmCloud links components at the host layer — flexible, but it forces the host store across async runtime boundaries and prevents the runtime from optimizing across interface calls. Composing the components in a workload deploy into a single .wasm ahead of time lets Cranelift see the whole call graph: it can eliminate redundant lifting/lowering when two components share a string representation, do mem-copy where it would otherwise serialize, and apply standard optimizer passes across what used to be opaque interface boundaries. The trade-off — everything in the composed unit instantiates together — is real, but it's also a much better default than the alternative once you account for the 5x HTTP-reuse win that depends on a stable store. As runtime instantiation lands in the component model, the pattern evolves naturally: pre-compose what you want to keep hot, and let the runtime instantiate the rest lazily.
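The before/after shape of this composition step can be sketched in WIT. The package and interface names below are invented for illustration; the point is how the internal seam between two stateless components vanishes from the composed unit's signature:

```wit
package example:workload@0.1.0;

interface store {
  get: func(key: string) -> option<list<u8>>;
}

interface validate {
  check: func(input: string) -> bool;
}

// Before composition: two stateless components joined at an
// internal seam (validate), which the host would otherwise link.
world validator {
  // shared capability from the host
  import store;
  // internal interface; only api-handler ever calls it
  export validate;
}

world api-handler {
  // satisfied by the validator component
  import validate;
  export wasi:http/incoming-handler@0.2.0;
}

// After ahead-of-time composition, the validate seam disappears:
// the single composed .wasm imports only store and exports only the
// HTTP handler, and Cranelift can optimize straight across the
// former interface boundary.
world composed {
  import store;
  export wasi:http/incoming-handler@0.2.0;
}
```

Service components with their own lifetimes (connection pools, caches) would stay outside this composed unit and link to it at the host layer, as described above.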
Who Should Watch This
ML and AI platform engineers evaluating GPU-accelerated WebAssembly should start with Colin's WebGPU demo at 02:39 and the glibc/ONNX next-steps discussion at 25:43. Runtime and platform implementers focused on throughput and component composition will want HTTP reuse and composition at 41:58 and the runtime instantiation walkthrough at 52:34. WIT interface designers and anyone working on event-driven WebAssembly should jump to Aditya's NATS proposal at 1:07:15 and Yordis's capabilities and semantics discussion at 1:12:19.
Up Next
The team will continue work on the glibc/ONNX release pipeline, finalize the two-week train release model, and break the HTTP reuse + composition changes into reviewable PRs. Aditya will iterate on the NATS WIT proposal incorporating Yordis's and Frank's feedback on non-functional requirements and behavior worlds. Bailey plans to revisit the WASI TLS pickup, the secrets work Jeremy is taking on, and progress toward Hetzner bare-metal benchmarking.
Get Involved
wasmCloud is a CNCF project and contributions are welcome. Join the community:
- GitHub — star the repo and check out open issues
- Slack — join the conversation
- Community Meetings — every Wednesday at 1:00 PM ET
- wasmCloud Blog — latest news and releases
Full Transcript
Read the complete transcript with speaker labels and timestamps: