
ETL Pipelines on the Wasm Component Model with wasmCloud

The June 4, 2025 wasmCloud community call is a design review for a data-processing platform built on the Wasm component model. Mike Nikles walks through an architecture diagram for a drag-and-drop ETL pipeline platform built on wasmCloud, where each pipeline step composes trusted in/out components around an untrusted customer-authored business-logic component. Brooks Townsend recognizes the design as wasmCloud's platform harness pattern, and the group works through capability providers, HTTP-server density, and lattice-per-customer versus lattice-per-pipeline isolation. With many maintainers at a Cosmonic off-site doing roadmapping, this is a lighter, feedback-focused call.

Key Takeaways

  • Mike Nikles presented an architecture diagram for a drag-and-drop ETL pipeline platform built on wasmCloud, aimed at running data-processing tasks closer to the IoT machines that generate the signals rather than shipping everything back to the cloud
  • Each pipeline step composes a trusted in-component and out-component around a single untrusted, customer-authored business-logic component — bundled and linked into one deployable Wasm component at build time
  • Brooks Townsend identified the design as wasmCloud's platform harness pattern, the same approach used in Cosmonic's wasm pay demo, where the platform controls the ins and outs and the customer only has to conform to a tight interface
  • Components within a step call each other directly, while step-to-step hops use messaging so each stage can scale independently and asynchronously
  • On HTTP density, Brooks recommended one HTTP server capability provider per lattice to start, and noted a host feature flag that runs the HTTP server in the same process as the host for higher density
  • Running one lattice per customer (rather than one per pipeline) unlocks reusable sub-pipelines: wasmCloud does not let separate lattices communicate by default, so pipelines that share a customer's lattice can call a common sub-pipeline while customers stay isolated from one another
  • Colin Murphy noted that asynchronous directed-graph data processing has always been a strong wasmCloud use case that has received comparatively little attention
  • Many maintainers were at a Cosmonic off-site for planning and roadmapping, so this was a lighter, feedback-focused call; Mike plans to return with a live demo


Meeting Notes

A Drag-and-Drop ETL Platform Built on wasmCloud

Brooks Townsend opened a lighter-than-usual call — most wasmCloud maintainers on the Cosmonic side were together at an off-site doing planning and roadmapping — and handed the floor to first-time presenter Mike Nikles. Mike described spending the previous couple of years building data pipelines for a crypto client (moving data off the blockchain and into downstream systems) before discovering WebAssembly components at the start of 2025 and, through them, wasmCloud. His current project is for an IoT company that wants to run computational tasks closer to the machines generating the signals rather than hauling everything back to a central cloud environment.

Mike came with an architecture diagram and a specific ask: a gut check on whether the design was sound before he sank more time into it. His goal was a platform where customers configure data sources and sinks through a drag-and-drop web UI (or YAML), reducing the work a customer has to do down to just the business logic in the middle.

Pipeline Architecture: In/Out Components Around Customer Logic

At a high level, Mike's design has two parts: a controller component (reachable via web UI or CLI) that starts, installs, and monitors pipelines, and the runtime itself — a source that kicks off step one, processing steps that can fan out, and a sink that lands the transformed data.
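To make the runtime shape concrete, here is a minimal Python sketch, assuming each step is a plain function and that in-process `asyncio.Queue`s stand in for the messaging hops between steps (in wasmCloud those hops would travel over the lattice instead). The step logic, the temperature fields, and the sentinel-based shutdown are all hypothetical illustration, not part of Mike's design. Step one fans out to two independent steps, each of which lands its results in its own sink.

```python
import asyncio
from typing import Callable

# Hypothetical step signature: one decoded record in, one record out.
Step = Callable[[dict], dict]
DONE = None  # sentinel message meaning "upstream has finished"


async def run_step(step: Step, inbox: asyncio.Queue, outboxes: list) -> None:
    """Consume messages, apply the step, and fan each result out to every downstream queue."""
    while (record := await inbox.get()) is not DONE:
        result = step(record)
        for outbox in outboxes:
            await outbox.put(dict(result))  # each downstream step gets its own copy
    for outbox in outboxes:  # propagate shutdown to downstream steps
        await outbox.put(DONE)


async def run_sink(inbox: asyncio.Queue, landed: list) -> None:
    """Land every record that reaches this sink."""
    while (record := await inbox.get()) is not DONE:
        landed.append(record)


async def run_pipeline(source: list) -> dict:
    # Queues stand in for the messaging hops between steps (hypothetical IoT readings).
    q_step1, q_convert, q_alert = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    q_sink_c, q_sink_a = asyncio.Queue(), asyncio.Queue()
    landed = {"celsius": [], "alerts": []}
    workers = [
        # Step one fans out to two independent steps.
        run_step(lambda r: {**r, "machine": r["machine"].upper()}, q_step1, [q_convert, q_alert]),
        run_step(lambda r: {**r, "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)}, q_convert, [q_sink_c]),
        run_step(lambda r: {**r, "alert": r["temp_f"] > 200}, q_alert, [q_sink_a]),
        run_sink(q_sink_c, landed["celsius"]),
        run_sink(q_sink_a, landed["alerts"]),
    ]
    for record in source:  # the source kicks off step one
        await q_step1.put(record)
    await q_step1.put(DONE)
    await asyncio.gather(*workers)
    return landed


readings = [{"machine": "m1", "temp_f": 212}, {"machine": "m2", "temp_f": 68}]
result = asyncio.run(run_pipeline(readings))
```

Because each step only reads from its own inbox and writes to its outboxes, a slow step backs up its own queue rather than its upstream, which is the property that lets stages scale independently.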

Zooming into a single step revealed the heart of the design. Each step is built from three parts: a trusted in-component, which knows how to receive and decompose incoming data (e.g. from Kafka or S3) and handles tracing and monitoring; the customer-provided business-logic component; and a trusted out-component that knows how to call the next step. These are linked and bundled at build time into a single Wasm component, so wasmCloud sees one portable artifact that the platform can deploy wherever it needs it.
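A minimal Python sketch of that composition, assuming JSON payloads and a hypothetical `build_step` helper that stands in for build-time linking (a real build would compose Wasm components against WIT interfaces, not Python functions). The trusted in- and out-components own decoding, encoding, and tracing; the customer supplies only `add_total`, which is likewise a made-up example.

```python
import json
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical shape of customer business logic: decoded record in, record out.
BusinessLogic = Callable[[dict], dict]


@dataclass
class Trace:
    """Stand-in for the tracing and monitoring the trusted components own."""
    events: list = field(default_factory=list)


def in_component(raw: bytes, trace: Trace) -> dict:
    """Trusted: receive and decode the payload (JSON here; Kafka/S3 readers are out of scope)."""
    trace.events.append(("received", len(raw)))
    return json.loads(raw)


def out_component(record: dict, trace: Trace) -> bytes:
    """Trusted: encode the result for the hop to the next step."""
    payload = json.dumps(record, sort_keys=True).encode()
    trace.events.append(("sent", len(payload)))
    return payload


def build_step(logic: BusinessLogic) -> Callable[[bytes, Trace], bytes]:
    """Hypothetical stand-in for build-time linking: in -> customer logic -> out, as one unit."""
    def step(raw: bytes, trace: Trace) -> bytes:
        return out_component(logic(in_component(raw, trace)), trace)
    return step


def add_total(record: dict) -> dict:
    """The only piece a customer writes."""
    return {**record, "total": sum(record["values"])}


step = build_step(add_total)
trace = Trace()
print(step(b'{"values": [1, 2, 3]}', trace))  # b'{"total": 6, "values": [1, 2, 3]}'
```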

The Platform Harness Pattern and Capability Providers

Brooks immediately recognized the shape: it is the platform harness pattern wasmCloud has used several times, most recently in Cosmonic's wasm pay demo. The idea is to wrap an untrusted component — code a customer wrote, or merely supplied — by controlling the ins and outs, exposing a tight interface the customer must conform to while the platform owns everything around it. This composition of trusted and untrusted code through typed interfaces is exactly what the Wasm component model is built for, and it lets the platform bound what customer code can reach.
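One way to picture the "tight interface" in plain Python is a structural protocol that the harness checks before linking and enforces again at the call boundary. This is a sketch under assumptions: `Transform`, `link`, `call_checked`, and the two sample components are hypothetical names, and `typing.Protocol` is only an analogy for a WIT world, where mismatches surface when components are composed or instantiated rather than through runtime checks like these.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Transform(Protocol):
    """Hypothetical platform interface; an analogy for the WIT world customers target."""

    def transform(self, record: dict) -> dict: ...


class InterfaceError(Exception):
    """Raised when a customer component does not honor the interface."""


def link(component: object) -> Transform:
    """Refuse to link anything that does not export transform() (checks presence, not types)."""
    if not isinstance(component, Transform):
        raise InterfaceError(f"{type(component).__name__} does not export transform()")
    return component


def call_checked(component: Transform, record: dict) -> dict:
    """The harness owns the boundary: pass in a copy, validate what comes back."""
    result = component.transform(dict(record))
    if not isinstance(result, dict):
        raise InterfaceError(f"transform() returned {type(result).__name__}, expected dict")
    return result


class Uppercase:
    """Customer-authored component that conforms."""

    def transform(self, record: dict) -> dict:
        return {k: v.upper() if isinstance(v, str) else v for k, v in record.items()}


class Wrong:
    """Customer-authored component that exports the wrong function."""

    def run(self, record: dict) -> dict:
        return record
```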

Mike confirmed the model: the in-components call directly into the customer component, while step-to-step hops use messaging so stages can scale independently and asynchronously. He then raised a practical question about capability providers: should he run one HTTP server per pipeline, or one per customer with path-based routing? Brooks advised starting with one HTTP server per lattice for inbound traffic. For teams that want the highest density and would rather not run a separate provider, he pointed to a host feature flag that runs the HTTP server in the same process as the host, opening the socket and doing path-based routing in-process.
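The per-lattice option relies on path-based routing: one inbound listener dispatches each request to a pipeline by URL prefix. Below is a minimal sketch of just that routing decision, assuming hypothetical `/customer/pipeline` paths and a longest-prefix rule; `PathRouter` is an invented name. In wasmCloud the HTTP server provider (or the in-process host feature Brooks mentioned) owns the socket and the actual route configuration, which this sketch does not model.

```python
from typing import Callable

# Hypothetical handler: request body in, response text out.
Handler = Callable[[bytes], str]


class PathRouter:
    """One inbound listener, many pipelines: dispatch on the request path."""

    def __init__(self) -> None:
        self.routes: dict = {}

    def register(self, prefix: str, handler: Handler) -> None:
        self.routes[prefix.rstrip("/")] = handler

    def dispatch(self, path: str, body: bytes) -> tuple:
        # Longest matching prefix wins, so /acme/orders/eu beats /acme/orders.
        for prefix in sorted(self.routes, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                return 200, self.routes[prefix](body)
        return 404, "no pipeline at " + path


router = PathRouter()
router.register("/acme/orders", lambda body: f"orders pipeline got {len(body)} bytes")
router.register("/acme/orders/eu", lambda body: "EU orders pipeline")
```

Matching on whole path segments (`prefix + "/"`) keeps `/acme/ordersx` from accidentally landing in the `/acme/orders` pipeline.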

Lattice Isolation, Reusable Pipelines, and Next Steps

The conversation turned to lattice boundaries. Because wasmCloud does not let separate lattices communicate by default, the lattice is a natural unit of isolation — a desirable trait for a multi-tenant data platform. Mike worked out, with the group's help, that running one lattice per customer (rather than per pipeline) would let him build reusable sub-pipelines: a common pipeline that multiple others call into, all within the same lattice. Brooks added that since Mike's design controls all inbound and outbound traffic through the in/out components, he could even run everything in one lattice and choose how reusable to make things.
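The isolation argument can be modeled with a toy registry in which pipeline names resolve only inside their own lattice. This is a hypothetical in-process illustration, assuming one `Lattice` object per customer; real lattices are separate NATS-backed networks, and the class, customer, and pipeline names here are invented. Two pipelines reuse a shared `normalize` sub-pipeline in one customer's lattice, while a second customer's lattice cannot reach it.

```python
from typing import Callable

Pipeline = Callable[[dict], dict]


class Lattice:
    """Toy model of a lattice: pipeline names resolve only inside the lattice they were deployed to."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._pipelines: dict = {}

    def deploy(self, name: str, pipeline: Pipeline) -> None:
        self._pipelines[name] = pipeline

    def call(self, name: str, record: dict) -> dict:
        if name not in self._pipelines:
            raise LookupError(f"{name!r} is not reachable from lattice {self.name!r}")
        return self._pipelines[name](record)


# One lattice per customer (hypothetical names); "normalize" is a reusable sub-pipeline.
acme = Lattice("customer-acme")
acme.deploy("normalize", lambda r: {**r, "unit": "C"})
acme.deploy("ingest-a", lambda r: acme.call("normalize", {**r, "src": "a"}))
acme.deploy("ingest-b", lambda r: acme.call("normalize", {**r, "src": "b"}))

# A second customer's lattice has no route to acme's pipelines.
globex = Lattice("customer-globex")
```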

Colin Murphy observed that asynchronous, cyclic/directed-graph data processing has been a compelling wasmCloud use case since the early days, and one that, perhaps because there are so many good use cases, has received comparatively little attention. Mike noted that his pipelines' previous resource consumption had become "annoying," and that a Wasm-native approach looked far more lightweight and capable of higher throughput.

Brooks closed the design review by naming the main hurdle for Mike's users: understanding WIT and authoring the interface. Once that interface exists and bindings are generated, the customer side is simple: do your transform and hand it back. The platform operator also chooses what customer code is allowed to do at all; for example, disabling outbound APIs means an unauthorized call simply panics. Mike plans to keep building and return for a live demo once he has something more impressive to show.
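Brooks' point that the operator decides what customer code may do can be sketched as a set of granted capabilities handed to the customer logic. In this hypothetical Python model, using an ungranted capability raises `CapabilityDenied`, standing in for the panic a component hits; the capability names (`logging`, `outgoing-http`) and all class and function names are assumptions, and a real platform might instead enforce this when satisfying the component's imports rather than with a runtime check.

```python
class CapabilityDenied(RuntimeError):
    """Stands in for the panic a component hits when it uses an import it was not granted."""


class Capabilities:
    """Imports the operator grants to one customer component (capability names are hypothetical)."""

    def __init__(self, granted: set) -> None:
        self._granted = set(granted)
        self.log: list = []

    def _require(self, name: str) -> None:
        if name not in self._granted:
            raise CapabilityDenied(f"capability {name!r} not granted")

    def log_line(self, message: str) -> None:
        self._require("logging")
        self.log.append(message)

    def http_get(self, url: str) -> str:
        self._require("outgoing-http")
        return f"stub response from {url}"  # real outbound HTTP is out of scope here


def customer_transform(record: dict, caps: Capabilities) -> dict:
    """Customer logic: it only sees the capabilities object it is handed."""
    caps.log_line(f"processing {record['id']}")
    if record.get("enrich"):
        caps.http_get("https://example.com/lookup")  # an outbound call the operator may not allow
    return {**record, "done": True}


# Operator policy for this customer: logging allowed, outbound HTTP disabled.
caps = Capabilities({"logging"})
```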

WebAssembly News and Updates

This was a quieter community call by design — many wasmCloud maintainers were at a Cosmonic off-site focused on planning and roadmapping, with the team flagging that more concrete roadmap news would land in the following weeks. The substance of the call was a community-driven design review rather than a release announcement, which is itself a healthy signal: developers like Mike Nikles are arriving at wasmCloud through the Wasm component model and building real data-engineering platforms on top of it. For ongoing ecosystem updates, follow the Bytecode Alliance and the wasmCloud blog.

What is wasmCloud?

wasmCloud is a CNCF project that lets you build applications using WebAssembly components and deploy them anywhere — cloud, edge, or Kubernetes clusters. It uses the WebAssembly component model to let you write business logic in any supported language (Rust, Go, Python, TypeScript, C#, and more) while the platform handles capabilities like HTTP, messaging, and key-value storage through a pluggable provider architecture. wasmCloud's reference host is built on Wasmtime, and a lattice connects hosts over NATS so workloads can be scheduled, isolated, and composed across machines. With built-in OpenTelemetry observability and Kubernetes integration, wasmCloud bridges WebAssembly's portable, sandboxed execution model and production cloud-native infrastructure.

Topic Deep Dive: The Wasm Component Model

Mike's entire platform rests on the Wasm component model. The reason the platform harness pattern works is that the component model composes isolated components through typed interfaces — so a platform can wrap an untrusted, customer-authored component between trusted in/out components and link them into one deployable artifact, without either side trusting the other's internals. The customer writes only the business logic against a WIT interface; the platform controls every capability the component can reach, which is what makes it safe to run arbitrary customer code in a shared environment. That same property is why wasmCloud can schedule the composed component as a single portable workload, route between steps with messaging, and bound a customer to exactly the capabilities the operator grants. As more developers reach for WebAssembly to build data-engineering and platform-engineering products, the component model is the contract that lets "write the interface once, plug in any language's business logic" become a real product pattern. To go deeper, see WASI Preview 3 on wasmCloud.

Who Should Watch This

This call is especially valuable for:

  • Platform and data engineers designing multi-tenant data-processing or ETL systems who need to run untrusted customer logic safely (start with Mike's architecture walkthrough at 8:10)
  • Architects weighing isolation models, such as lattice-per-customer versus lattice-per-pipeline, and HTTP-server density trade-offs (Brooks' guidance at 14:08)
  • Developers new to WebAssembly who want to see the Wasm component model and the platform harness pattern applied to a concrete product (the harness discussion at 11:42)

Up Next

With most maintainers at the Cosmonic off-site this week, the team expected to share more roadmap and planning news on upcoming calls. Watch for Mike Nikles returning with a live demo of the ETL pipeline platform once more of it is built, and for the broader roadmapping outcomes to surface in the weeks ahead.

Get Involved

wasmCloud is a CNCF project and contributions are welcome. Join the community:
