Transcript: ETL Pipelines on the Wasm Component Model with wasmCloud

wasmCloud Weekly Community Call — Wed, Jun 4, 2025 · 29 minutes

Speakers: Brooks Townsend, Mike Nikles, Colin Murphy, Taylor Thomas, Florian


Transcript

Brooks Townsend 04:13

Alrighty. Hello, everybody. Welcome to Wednesday. We always have an exciting agenda, though today's is maybe a smaller one. A lot of the wasmCloud maintainers on the Cosmonic side are at an off-site, hanging out in person and doing lots of planning and thinking, which has been fun. So, the agenda: we have someone who isn't a new community member but is new to the community call, at least. Mike is going to present an architecture diagram, some of his brainstorming and thoughts, and maybe even a demo of what he's been working on: a wasmCloud-based ETL pipeline platform. Mike, that was my whole spiel. I think this may be the first community call you've been on. Would you be interested in starting off with an intro and how you found wasmCloud, and then going into what you're building now? The floor is yours.

Mike Nikles 05:34

I've watched a few of the talks and all the meetings, so I have an idea of what's happening, but this is my first time on stage. My name is Mike. I've been around for longer than I want to admit at this point: half my life in IT. For the last couple of years I've been building data pipelines for a crypto client, basically getting data from the blockchain into somewhere else. I recently left that job, and about 20 minutes into 2025 I came across Wasm components. I'd heard about Wasm modules before but wasn't really excited about them, because of what I saw as their limitations. Then I found Wasm components, started digging in on January 1st, came across wasmCloud, and thought: holy cow, this is cool.

What I'm building right now is for a client, an IoT company. The idea is that instead of running everything in their cloud environment, we run some of the computational tasks closer to the machines that actually send the signals. I came across some diagrams, I think on the Cosmonic website (the KSK or something for Kubernetes), and they looked exactly like what I wanted to build. Today I want to show a diagram of what I have in mind and then show you how much of it I've built. Really, what I'm looking for is somebody to tell me I'm out of my mind and this is never going to work, so I can stop, or somebody to tell me it looks reasonable and I should keep going, so I can decide after the call how to spend my time. Let me figure out how to share my screen. This looks promising.

Brooks Townsend 07:58

Looks good. We can see your browser now. Yeah, we see "ETL pipelines with wasmCloud."

Mike Nikles 08:10

Yeah, okay, let's do that. So, ETL pipelines. I want to start with a high-level diagram of what I have in mind. This is just a pipeline; it's really not that exciting yet. There are two parts. One is some kind of controller component or app that I can access through a web UI or a CLI to install and start pipelines and to get runtime information. The more interesting part is the runtime itself: some source (it could be anything, really) kicks off a component, step one in the pipeline, which does some processing. It then passes the data to another step, maybe fans out into multiple steps, and eventually writes the transformed data into a sink somewhere. Very, very high level. All of these steps report back to the controller, or maybe to a separate monitoring controller; I don't know yet.

[Diagram: high-level ETL pipeline architecture with controller, source, processing steps, and sink]
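The runtime half of that diagram is a small directed acyclic graph: a source feeds step one, steps can fan out, leaf steps write to a sink, and every step reports to a controller. Here is a minimal plain-Python sketch of that control flow only, not of wasmCloud itself; `run_pipeline`, the step names, and the in-memory controller log are hypothetical stand-ins.

```python
from collections import deque
from typing import Callable

Step = Callable[[dict], dict]


def run_pipeline(
    source_records: list[dict],
    steps: dict[str, Step],
    edges: dict[str, list[str]],
    entry: str,
    sink: list[dict],
    controller_log: list[tuple[str, int]],
) -> None:
    """Push each source record through a DAG of steps (hypothetical helper).

    Steps with no outgoing edges write their result to the sink, and every
    step execution is reported to the controller as (step_name, record_id).
    """
    for record in source_records:
        queue = deque([(entry, record)])
        while queue:
            name, data = queue.popleft()
            result = steps[name](data)
            controller_log.append((name, record["id"]))  # report to controller
            next_steps = edges.get(name, [])
            if not next_steps:
                sink.append(result)  # leaf step: write to the sink
            for nxt in next_steps:  # several entries here means a fan-out
                queue.append((nxt, result))


# Example: parse a raw reading, then fan out to two independent steps.
steps = {
    "parse": lambda r: {**r, "value": float(r["raw"])},
    "to_celsius": lambda r: {**r, "celsius": (r["value"] - 32) * 5 / 9},
    "flag_high": lambda r: {**r, "high": r["value"] > 100},
}
edges = {"parse": ["to_celsius", "flag_high"]}

sink: list[dict] = []
log: list[tuple[str, int]] = []
run_pipeline([{"id": 1, "raw": "212"}], steps, edges, "parse", sink, log)
print(log)  # [('parse', 1), ('to_celsius', 1), ('flag_high', 1)]
print(len(sink))  # 2
```

A breadth-first queue keeps the example simple; in the architecture described here, each hop between steps would be a message rather than a local function call.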

But I want to show you a bit more detail of what these components look like. Take step one, where a source interacts with the component. What I want is for customers to select a source, in a web UI or through YAML, and then tell me what they want to do with it. If we zoom in on that step-one component, a component with a source coming in consists of a couple of parts: an "in" component for the specific source (Kafka, S3, whatever), which uses an internal "out" component that knows what to call next. So the step-one component would then call step two. The two are combined at build time into that green component, so in wasmCloud it's just one component that gets deployed.

[Diagram: zoomed-in view of a single pipeline step composed from in and out components]

That data, likely Avro (I'm not fully sure yet, but Avro gives us a schema), then goes to the step-two component. Inside the intermediate steps it gets a bit more interesting. We have an internal "in" component that knows how to receive the data and deserialize it, and it calls a customer Wasm file. That represents a component written by the customer of the platform: this is where the actual business logic lives, where people do whatever they want with the data. The result then goes to an internal "out" component, the same one as before, which calls the next step. Again, all of this gets linked into a single component at build time. For sending the data out at the end, we have the same internal "in" component, which accepts the data and does some tracing and monitoring, and then calls an outbound Kafka or HTTP component that sends the data somewhere else. All of these pieces combine into the multiple steps of the pipeline. I'll stop here for a second. Any initial thoughts, questions, anything I can answer?

[Diagram: pipeline step detail showing the internal in component, the customer business-logic component, and the internal out component]
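The composition Mike describes (platform-owned "in", customer logic in the middle, platform-owned "out" that knows the next step) can be modeled as plain function composition. This is a sketch under stated assumptions: JSON stands in for the Avro encoding he is considering, `compose_step` and the outbox list are hypothetical, and in the real system the linking would happen at build time with a WebAssembly component composition tool rather than in Python.

```python
import json
from typing import Callable

Transform = Callable[[dict], dict]


def in_internal(payload: bytes) -> dict:
    """Platform-owned input side: decode the message (JSON here, Avro in the plan)."""
    return json.loads(payload)


def compose_step(
    customer_transform: Transform,
    next_step: str,
    outbox: list[tuple[str, bytes]],
    trace: list[tuple[str, dict]],
) -> Callable[[bytes], None]:
    """Wrap customer code between the platform's "in" and "out" pieces (hypothetical)."""

    def out_internal(record: dict) -> None:
        # Platform-owned output side: encode and address to the next step.
        outbox.append((next_step, json.dumps(record).encode()))

    def step(payload: bytes) -> None:
        record = in_internal(payload)
        trace.append(("in", record))  # tracing/monitoring hook
        result = customer_transform(record)  # the only customer-written code
        trace.append(("out", result))
        out_internal(result)

    return step


def customer_transform(record: dict) -> dict:
    # Example business logic a customer might supply.
    return {**record, "status": "hot" if record["temp_c"] > 30 else "ok"}


outbox: list[tuple[str, bytes]] = []
trace: list[tuple[str, dict]] = []
step_one = compose_step(customer_transform, "step-two", outbox, trace)
step_one(b'{"device": "pump-7", "temp_c": 41.5}')
print(outbox)  # [('step-two', b'{"device": "pump-7", "temp_c": 41.5, "status": "hot"}')]
```

The customer function never sees raw bytes or the next step's address, which is what keeps the customer-facing interface tight.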

Brooks Townsend 11:42

Yeah. Everybody should feel welcome to chime in here, but immediately: your intermediate component is exactly the pattern we've used for a couple of different things, most recently a demo on the Cosmonic side called wasm pay. We call it the platform harness pattern. You have an untrusted component, one a customer provided (or maybe they just gave you the code and you compiled it), and you compose it by surrounding it with your own ins and outs. That way there's a really tight interface the customer has to conform to, and you control everything going in and out. We love that pattern, and I really love what you've put together here.

Mike Nikles 12:40

My goal is really to limit the amount of work a customer has to do. All the stuff on the left is drag-and-drop and clicking, and the stuff over here is all clicking too. The red box in between is the only thing I want the customer to provide as code. I do have a question I'd love to bring up. Within a step, these two components call each other directly, so the "in" component calls functions in here directly. Between step one and step two, though, I use messaging, so that part is asynchronous and each step can scale independently. That felt like a good choice. My question: I obviously use capability providers, for HTTP for example. What I'm not sure about is whether to run one HTTP capability provider per pipeline or one per customer. If a customer has five pipelines, there would be one HTTP server in the wasmCloud environment, and I'd use path-based routing, like /pipeline-id, to reach the different pipelines. But I'm not sure what the overhead is of running, say, five HTTP servers in one lattice, if I'm pronouncing that correctly. Any thoughts on that?
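Mike's split between direct calls inside a step and asynchronous messaging between steps can be modeled with queues. In this sketch `asyncio.Queue` stands in for the message bus between steps; `step_worker` and the `None` shutdown sentinel are illustrative assumptions, not wasmCloud APIs.

```python
import asyncio
from typing import Callable, Optional


async def step_worker(
    inbox: asyncio.Queue,
    outbox: asyncio.Queue,
    transform: Callable[[dict], dict],
) -> None:
    """One pipeline step: direct calls inside, queues on both edges."""
    while True:
        msg: Optional[dict] = await inbox.get()
        if msg is None:  # shutdown sentinel: forward it downstream and stop
            await outbox.put(None)
            return
        await outbox.put(transform(msg))  # synchronous call within the step


async def run(records: list[dict]) -> list[dict]:
    q1: asyncio.Queue = asyncio.Queue()
    q2: asyncio.Queue = asyncio.Queue()
    q3: asyncio.Queue = asyncio.Queue()
    workers = [
        asyncio.create_task(step_worker(q1, q2, lambda r: {**r, "parsed": True})),
        asyncio.create_task(step_worker(q2, q3, lambda r: {**r, "enriched": True})),
    ]
    for r in records:
        await q1.put(r)  # unbounded queue: the producer never waits on step two
    await q1.put(None)
    results = []
    while (msg := await q3.get()) is not None:
        results.append(msg)
    await asyncio.gather(*workers)
    return results


results = asyncio.run(run([{"id": 1}, {"id": 2}]))
print(results)
# [{'id': 1, 'parsed': True, 'enriched': True}, {'id': 2, 'parsed': True, 'enriched': True}]
```

Each step only touches its own inbox and outbox, so a slow step can fall behind without blocking the producer, which is the decoupling Mike is after when he says the steps should scale individually.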

Brooks Townsend 14:08

Yeah. I wouldn't worry too much about that. A couple of things: a lattice is essentially just a network namespace, so your question about lattices is, I guess, separate from the one about providers. I'd run one HTTP server per lattice, since by default wasmCloud doesn't let you communicate between lattices; that's one method of isolation. If cross-lattice communication is something you want, we should talk about it. It's also worth noting, just for the incoming HTTP side, that we have a feature flag you can turn on for the host that runs the HTTP server in the same process as the host. The host itself opens the HTTP address or socket and does the path-based routing in-process. There are pros and cons, for sure, but if you care about the highest density and not running an extra provider, that's an option too. To start, though, I'd do one per lattice on the inbound side.
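For the single-server option with path-based routing, the routing decision itself is small. This framework-free sketch shows only the dispatch logic, where the first path segment selects a pipeline. `PipelineRouter` and the 202/404 responses are hypothetical; in wasmCloud the actual routing would be configured on the HTTP server provider or on the host's built-in server that Brooks mentions.

```python
from typing import Callable

Handler = Callable[[bytes], str]


class PipelineRouter:
    """One shared inbound HTTP entry point; the first path segment picks the pipeline."""

    def __init__(self) -> None:
        self.routes: dict[str, Handler] = {}

    def register(self, pipeline_id: str, handler: Handler) -> None:
        self.routes[pipeline_id] = handler

    def dispatch(self, path: str, body: bytes) -> tuple[int, str]:
        pipeline_id = path.strip("/").split("/", 1)[0]
        handler = self.routes.get(pipeline_id)
        if handler is None:
            return 404, f"no pipeline {pipeline_id!r}"
        # 202 Accepted: a real pipeline would process the data asynchronously.
        return 202, handler(body)


router = PipelineRouter()
router.register("orders", lambda body: f"orders received {len(body)} bytes")
router.register("sensors", lambda body: f"sensors received {len(body)} bytes")
print(router.dispatch("/orders/ingest", b"abc"))  # (202, 'orders received 3 bytes')
print(router.dispatch("/billing", b""))  # (404, "no pipeline 'billing'")
```

Adding a pipeline is one `register` call rather than one more listening server, which is the density argument for sharing a single inbound server.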

Mike Nikles 15:38

Yeah. If it becomes an issue, that's a good problem to have. Okay, sounds good. The other question I had, which I think I've kind of answered myself: should I run a lattice per customer or per pipeline? I realized that with one per customer, I can do things like reusable pipelines and sub-pipelines. If something keeps happening over and over, I can create a pipeline for it, and multiple other pipelines can call into it, since they're all in the same lattice. So I think I can answer that one myself.

Brooks Townsend 16:20

Yeah, isolation is a key property of the lattice. You can configure things in different lattices to talk to each other, but it requires configuring NATS, so if it's a property you want, you can have it. In this system it looks like you control the ins and outs of each component; the customers aren't initiating any data or any outbound connections. So you could actually run everything in the same lattice, because you control all of the network traffic, and then reuse things or not. It really just depends, like you said, on how reusable you want those pipelines to be.
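Brooks's description of the lattice as an isolation boundary, together with Mike's idea of shared sub-pipelines, can be illustrated with a toy model. The `Lattice` class below is a hypothetical stand-in for the real NATS-backed namespace: pipelines in the same lattice can call each other, and calls from another lattice are rejected, mirroring the default behavior described above (opening that up in a real deployment means configuring NATS).

```python
from typing import Callable

Pipeline = Callable[[dict], dict]


class Lattice:
    """Toy model of a lattice: a namespace whose members can call each other."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pipelines: dict[str, Pipeline] = {}

    def deploy(self, pipeline_id: str, fn: Pipeline) -> None:
        self.pipelines[pipeline_id] = fn

    def call(self, caller: "Lattice", pipeline_id: str, record: dict) -> dict:
        # Default isolation: only callers in the same lattice may invoke a pipeline.
        if caller is not self:
            raise PermissionError(f"{caller.name} cannot call into {self.name}")
        return self.pipelines[pipeline_id](record)


acme = Lattice("acme")  # one lattice per customer
acme.deploy("normalize", lambda r: {**r, "unit": r.get("unit", "c").upper()})
# Two customer pipelines reuse the same "normalize" sub-pipeline.
acme.deploy("ingest-a", lambda r: acme.call(acme, "normalize", {**r, "src": "a"}))
acme.deploy("ingest-b", lambda r: acme.call(acme, "normalize", {**r, "src": "b"}))
print(acme.call(acme, "ingest-a", {"unit": "c"}))  # {'unit': 'C', 'src': 'a'}

globex = Lattice("globex")  # a different customer
try:
    acme.call(globex, "normalize", {})
except PermissionError as err:
    denied = str(err)
print(denied)  # globex cannot call into acme
```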

Mike Nikles 17:04

Yeah, excellent. One scenario I discussed with the client is that these "out" components could also end up containing customer code: if they have something internal they want to write to, that could be customer-provided as well. But this gets me going for the next little while. I just needed a gut check, in case somebody would scream at me, like, "What are you doing?" It doesn't sound like it, so I'll spend another week working on it.

Brooks Townsend 17:42

Yeah, and don't just take my word for it — come off mute too. Earlier, I'm not sure if you had some feedback. Taylor, also check me if there's anything you hate about what I said.

Mike Nikles 17:55

You said it, so I put myself back on mute. That's two people — I'll take that as enough to keep going.

Brooks Townsend 18:08

Three people, nice. All right, you're onto something.

Colin Murphy 18:11

Yeah. From the very beginning of wasmCloud, I thought ETL was a good use case, or async processing in general, like directed graphs. It's funny: there are so many good use cases that this one has maybe gotten a little less attention. But there's definitely a lot that can be done here. This is great.

Mike Nikles 18:42

Okay. Maybe I'll show up again in a couple of weeks with an actual demo. I have most of it built; it's just not at a point where I can demo something impressive enough. So I might be coming by.

Brooks Townsend 18:54

We demo not-impressive stuff in here all the time.

Colin Murphy 18:59

In fact, don't demo impressive things — save it. Deal?

Brooks Townsend 19:06

Deal. No, this is awesome. Let's see — Florian, Mark, just inviting everybody — anybody have any thoughts or questions for Mike? We've got everything up here.

Florian 19:28

No, not really, sorry. But I also think the topic is really interesting, because of all the dynamic parts, which is what got me interested in wasmCloud in the first place. It would be really cool to see a live demo next time.

Mike Nikles 19:45

Next time. Okay, yeah.

Florian 19:47

Next time means it's not a specific amount of time, it's just the next one.

Mike Nikles 19:53

Yeah, I'm joking. All good. Sounds good. Awesome. Thanks so much for giving me the opportunity to get this in front of people and get some feedback. Appreciate it.

Brooks Townsend 20:04

Oh yeah, the pleasure's all ours. Mike, this sounds great. Looking forward to seeing the progress here as you're working through writing these components and doing the composition. I sent over a couple of links to the Cosmonic wasm pay example, if you want to use that for some inspiration — especially some of the WebAssembly composition bits, or some of the patterns. As you're working through it, we'd love to hear what's useful. And you kind of dropped into the wasmCloud Slack and said, "Hey, I have a demo," which I love. Everybody should feel encouraged to do that. But if you have any feedback as you're building the thing — good things, bad things — it all makes the community tools better, so we'll definitely take it.

Mike Nikles 21:00

That was good. Yeah, I'll check the links out and provide feedback.

Brooks Townsend 21:06

Awesome. Well, thank you again, Mike, for coming on. Like I mentioned, a bunch of us are at an off-site doing our own planning and fun things, so that was actually the main agenda I had for today. I didn't do a pre-flight check, but does anybody have anything in the wasmCloud community or the broader WebAssembly ecosystem they wanted to queue up as a topic for today? Alrighty. We did the build-time check and the runtime check, so I think we can end it here and give people a little time back. First wasmCloud Wednesday of the summer is what I was after.

Colin Murphy 22:10

After Memorial Day. You know, artificial summer.

Taylor Thomas 22:13

US summer — if you're in the upside down, then it's currently your winter.

Colin Murphy 22:21

Is there anyone from Australia or New Zealand on this call? Or Sub-Saharan Africa, South Africa, South America?

Brooks Townsend 22:29

Lachlan is not in the room, thankfully.

Taylor Thomas 22:30

I know, I was about to say — where's Lachlan? He knows what it's like living in the upside down.

Brooks Townsend 22:37

Alright, for everybody in the upside down, so sorry about your unofficial winter. Thank you, everybody, for coming on to wasmCloud Wednesday. It was a great one. Looking forward to the agenda next week, and as always, have a wasmCloud day. I got too silly too fast. Thanks again, Mike, that was awesome.

Mike Nikles 23:12

Yeah, no worries. Thanks for giving me the chance. Not hearing that people are screaming at me — that's a good start.

Taylor Thomas 23:19

I mean, let me put it this way, Mike: I had to come in a little bit late, and I was like, what is happening? Someone built something real cool.

Colin Murphy 23:29

I unsuccessfully interviewed at a company near here a couple of years ago. They were all about pseudo-real-time processing with directed acyclic graphs. I was like, oh wow, and I said I was really interested in this thing called WebAssembly. This was a couple of years ago, which tells you how long I've been futilely working on WebAssembly. They said that sounded cool, but I think I was just too into WebAssembly, and they were like, "You're not actually going to be interested in what we do here, are you?" And I said, "I guess not."

Mike Nikles 24:12

Yeah, we'll see. I built all these pipelines over the last few years, and the resource consumption is just getting annoying. Then I figured this looks a lot more lightweight, and potentially doing a lot more throughput too.

Colin Murphy 24:27

People should be able to build components to manipulate data and transform some stuff — should be pretty good for that.

Brooks Townsend 24:41

I feel like one of the biggest barriers you're taking on, Mike, is that you'll have to learn WIT and define the interface. Once you have that interface and you've generated some code, the customer side is really fairly simple: you do your transform, and then you hand it back.

Colin Murphy 25:11

Your Wi-Fi is bad, I think. Or my Wi-Fi is bad.

Mike Nikles 25:16

No, I think it's Brooks.

Taylor Thomas 25:18

You're bad, Brooks — not the Wi-Fi. Moral judgment.

Brooks Townsend 25:27

Whatever, you got the point. You learn WIT, and then the developers writing the different parts just know how to write code against it.

Mike Nikles 25:34

Yeah, all good.
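Brooks's point about WIT is that once the platform defines the interface, the customer only writes the transform. The real contract would be a WIT interface, perhaps something along the lines of `transform: func(input: list<u8>) -> result<list<u8>, string>;`. Since the examples here stay in Python, this sketch models the same idea with `typing.Protocol`; the `Transformer` name, the `transform(record) -> record` shape, and the loader are hypothetical.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Transformer(Protocol):
    """The only surface a customer implements (hypothetical contract)."""

    def transform(self, record: dict) -> dict: ...


class DoubleReading:
    """Example customer code: do the transform, then hand the record back."""

    def transform(self, record: dict) -> dict:
        return {**record, "reading": record["reading"] * 2}


def load_customer_component(obj: object) -> Transformer:
    # isinstance() against a runtime_checkable Protocol only checks that a
    # `transform` attribute exists; it does not verify the signature.
    if not isinstance(obj, Transformer):
        raise TypeError("component does not implement transform()")
    return obj


component = load_customer_component(DoubleReading())
print(component.transform({"reading": 21}))  # {'reading': 42}

try:
    load_customer_component(object())
except TypeError as err:
    rejected = str(err)
print(rejected)  # component does not implement transform()
```

In the component model the check is stronger than this: a component whose exports do not match the WIT world simply fails to compose or instantiate.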

Colin Murphy 25:37

It really does sound awesome — sounds perfect for people. Some customization, but not full customization. Limited customization.

Mike Nikles 25:52

I think what I'll have to do is make sure the web interface in particular is as easy as possible: drag and drop nodes and say, "I want an HTTP server, then I want to send the data to Kafka, and here are the secrets you need to send it to Kafka," and so on. But I've already built that part, so that should be very straightforward.
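What the drag-and-drop editor produces could be as simple as a declarative spec that the platform validates before building anything. The `Node` and `PipelineSpec` shapes, the `REQUIRED_SECRETS` table, and the secret names below are hypothetical; the idea shown is keeping secrets as references the platform resolves, rather than values embedded in the pipeline definition.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str  # e.g. "http-server" or "kafka"
    config: dict[str, str] = field(default_factory=dict)
    secret_refs: list[str] = field(default_factory=list)  # secret names, never values


@dataclass
class PipelineSpec:
    source: Node
    sink: Node


# Hypothetical policy: which secret references each node kind needs.
REQUIRED_SECRETS = {"kafka": {"kafka-password"}}


def validate(spec: PipelineSpec, secret_store: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the spec can be built."""
    errors: list[str] = []
    for node in (spec.source, spec.sink):
        missing = REQUIRED_SECRETS.get(node.kind, set()) - set(node.secret_refs)
        errors += [f"{node.kind}: missing secret reference {s!r}" for s in sorted(missing)]
        errors += [
            f"{node.kind}: secret {s!r} not found in store"
            for s in node.secret_refs
            if s not in secret_store
        ]
    return errors


spec = PipelineSpec(
    source=Node("http-server", {"path": "/orders"}),
    sink=Node("kafka", {"topic": "orders"}, ["kafka-password"]),
)
print(validate(spec, {"kafka-password"}))  # []
print(validate(spec, set()))  # ["kafka: secret 'kafka-password' not found in store"]
```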

Colin Murphy 26:15

Oh yeah, that'd be really cool.

Mike Nikles 26:18

Just really make it that simple. One thing I saw: somebody else replied in Slack to my initial message saying they'd built something like that. It's a no-code, low-code kind of environment, but to me it felt very drag-and-drop and very limited. The reason I came up with the customer code part is so it's open to the customer to do whatever they want, not just whatever features or functions I provide. I really want people to write their own code. If they want to call an API to get something, they can do that as well. That always has an impact on performance, but that's a trade-off they have to make.

Brooks Townsend 27:03

Yeah, and since you're operating the platform, you can choose what people are able to do. Like, if you don't want to enable outbound APIs, you just don't, and it'll panic if you try.
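Brooks's point is that the operator, not the customer, decides what a component can reach. As a rough model, the platform builds the set of imports handed to the customer component and swaps anything not granted for a stub that fails when called. `build_imports`, `CapabilityDenied`, and the capability names are hypothetical; in wasmCloud this is governed by which providers and links the operator configures, and the real failure mode (Brooks says it panics) is up to the host rather than a Python exception.

```python
from typing import Callable


class CapabilityDenied(RuntimeError):
    """Raised when customer code calls a capability the operator did not grant."""


def build_imports(
    granted: set[str], available: dict[str, Callable[..., str]]
) -> dict[str, Callable[..., str]]:
    """Hand the customer component only the capabilities that were granted."""

    def denied(name: str) -> Callable[..., str]:
        def stub(*args: object, **kwargs: object) -> str:
            raise CapabilityDenied(f"capability {name!r} is not enabled for this pipeline")

        return stub

    return {name: (fn if name in granted else denied(name)) for name, fn in available.items()}


available = {
    "log": lambda msg: f"logged: {msg}",
    "http-outgoing": lambda url: f"GET {url}",
}
imports = build_imports({"log"}, available)
print(imports["log"]("hello"))  # logged: hello

try:
    imports["http-outgoing"]("https://example.com")
except CapabilityDenied as err:
    denied_message = str(err)
print(denied_message)  # capability 'http-outgoing' is not enabled for this pipeline
```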

Mike Nikles 27:18

Yeah, good. Good times to come — at least I have something to build that keeps me entertained.

Brooks Townsend 27:26

Well, hey, Mike, thanks again. This is awesome. Just let us know when you're ready to come back and chat some more stuff out.

Mike Nikles 27:33

I definitely will. Yeah, I appreciate it. It's pretty fun.

Colin Murphy 27:39

Bye bye. Taylor — no video, Taylor.

Taylor Thomas 27:46

Oh, here, hold on, I can turn it on for a second. Sorry, I was just shoving my — I know people actually watch this recording. It's not like a sandwich where I can daintily do it; I'm like shoveling pasta into my mouth. It's not an elegant thing. I don't mind doing that in front of people most of the time, especially because everybody here does remote things, but I'm not gonna record that for posterity. Anyway, yes — sorry, I'm a little bit behind, by the way, Brooks, on my reviews, but I'm gonna get back to some reviews tonight, probably.

Brooks Townsend 28:33

Sounds good. Yeah, no worries. I just rebased Massoud's HTTP client PR, so that one we should be able to get in now.

Taylor Thomas 28:45

That was a big one. I think it's all reviewed, so we're good. I need to review some of the other ones.

Brooks Townsend 28:50

Cool. Well, hey, everybody, have a great day. See you next time.

Mike Nikles 28:58

Hey guys, later. Thank you.