Transcript: Building an ETL Pipeline from Wasm Components on wasmCloud

Transcript

wasmCloud Weekly Community Call — Wed, Jun 11, 2025 · 59 minutes

Brooks Townsend 03:32

I muted myself. Hey everybody, welcome to wasmCloud Wednesday for Wednesday, June 11. We've got a sooner-than-expected demo turnaround from last week, which is awesome. On the agenda we've got a demo from Mike, from drag-and-drop to ETL pipeline, and we've got a couple of important things to call out: an issue and a documentation page of the week. Otherwise, I think we'll probably have some good discussion that follows from the demo. Mike, how's it been since seven days ago?

Mike Nikles 04:12

Seven days — thanks for having me again. It feels like an eternity, especially in software development. If you have a few hours where you're in the flow, you can get quite a bit done. So what I want to do today is just follow up, starting with sharing my screen as step one.

Mike Nikles 04:58

If you were here last week, or have watched the recording since then, you're familiar with this diagram. Quick two-second recap: it's an ETL pipeline — or an ETL builder — that I'm working on, built on wasmCloud. You have any kind of source that comes in. In the middle, you have components provided by customers — it's just a Wasm component they upload — and that gets processed, and then you can send that data to any kind of sink. In this last week, I managed to put together a working environment where this actually plays out quite nicely. So I have a wasmCloud environment here: one host, nothing else deployed — that's important to make sure I'm not faking anything. The other thing I built was a quick UI, where you've got some navigation on the left and a canvas where you can build that pipeline that we just saw in the diagram. You have your sources — you can drag them in here — and you can get your processor, plug that in. The only thing I've built so far is a log sink, which just writes data to the wasmCloud log file. Seven days is not quite eternity, turns out, and you can only get so much done.

ETL pipeline builder UI with a canvas for dragging components

Mike Nikles 06:54

You can connect these things together. Once the HTTP request comes in, data flows to that Wasm processor, and then from there it goes to the log output. Doing this top-down gives a better view. You could add multiple processors, you could fan out, you could do all kinds of stuff in the background. What it does is create a config file based on YAML. It's not a wadm config file, it's just something specific to this app. So we have three nodes, or steps, and each has a type: an HTTP in, a processor, and an out. For the processor, again, one week is only so much time, so we have a hard-coded file path. But imagine that people could upload that: they could click on the Wasm processor, it would pop up a sheet from the bottom, and they could configure it. Same for the other nodes. We have the out component. In the diagram I had two out components, so we can make this two, and you can see the depends-on field is an array, so each node lists the nodes it depends on. If we go back here, that looks exactly like the diagram I had initially.
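
Editor's sketch (not shown on the call): a rough idea of what such an app-specific config could look like once loaded, and how the dependency array gives a deploy order. The field names ("id", "type", "path", "depends_on") and the file path are assumptions for illustration, not the app's real schema.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline config, as it might look after parsing the YAML.
# One HTTP source, one Wasm processor, and two log sinks that fan out.
PIPELINE = {
    "nodes": [
        {"id": "http-in", "type": "http-in", "depends_on": []},
        {"id": "processor", "type": "wasm-processor",
         "path": "./processor.wasm",  # stand-in for the hard-coded file path
         "depends_on": ["http-in"]},
        {"id": "log-out-1", "type": "log-out", "depends_on": ["processor"]},
        {"id": "log-out-2", "type": "log-out", "depends_on": ["processor"]},
    ]
}


def execution_order(pipeline):
    """Return node ids so that every node comes after the nodes it depends on."""
    graph = {node["id"]: set(node["depends_on"]) for node in pipeline["nodes"]}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(PIPELINE)
print(order)
```

Keeping the dependencies as a list per node is what makes fan-out (two sinks on one processor) and fan-in expressible without changing the format.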

Connected pipeline nodes — HTTP source, Wasm processor, and log sink

Mike Nikles 08:30

The next thing you can do is deploy the whole thing. Remember from the diagram how these components consist of multiple other Wasm components, and then I use wac to combine them — like the in and the out. When I click Deploy, it gets the HTTP in, it gets the terminal out, and it whacks them together. It then also creates a wadm file based on the config file here — it sees that there's an HTTP in, so it adds an HTTP capability. So I click that button, it spins for a minute while it's doing all the work, and once it has the one file, it deploys it to the environment and visualizes it in the UI, so people can see this pipeline is now active and data is flowing through it. When it's in that state, you can create a new version, you can look at the logs, you can do the whole observability thing — maybe I'll come back with that next week.
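
Editor's sketch (not shown on the call): one way the "config in, wadm manifest out" step could work. The manifest layout follows the OAM-style structure wadm uses as far as I know it, so check it against the wadm docs before relying on it. The node schema, image references, and function name are assumptions, and the messaging links between components that Mike mentions later are left out.

```python
import json

# Placeholder image reference (assumption); a real deployment would point
# at whatever registry hosts the HTTP server provider.
HTTP_PROVIDER_IMAGE = "registry.example/http-server:0.0.0"


def build_manifest(app_name, nodes, component_images):
    """Build a wadm-style application manifest as a plain dict.

    nodes: pipeline steps, each a dict with at least "id" and "type".
    component_images: maps a node id to the OCI reference of its component.
    """
    components = [
        {
            "name": node["id"],
            "type": "component",
            "properties": {"image": component_images[node["id"]]},
            "traits": [{"type": "spreadscaler", "properties": {"instances": 1}}],
        }
        for node in nodes
    ]

    # Add an HTTP capability only when the pipeline has an HTTP source,
    # and link it to each such source.
    http_sources = [n["id"] for n in nodes if n["type"] == "http-in"]
    if http_sources:
        components.append({
            "name": "http-server",
            "type": "capability",
            "properties": {"image": HTTP_PROVIDER_IMAGE},
            "traits": [
                {
                    "type": "link",
                    "properties": {
                        "target": source_id,
                        "namespace": "wasi",
                        "package": "http",
                        "interfaces": ["incoming-handler"],
                    },
                }
                for source_id in http_sources
            ],
        })

    return {
        "apiVersion": "core.oam.dev/v1beta1",
        "kind": "Application",
        "metadata": {"name": app_name, "annotations": {"version": "v0.0.1"}},
        "spec": {"components": components},
    }


nodes = [
    {"id": "http-in", "type": "http-in"},
    {"id": "processor", "type": "wasm-processor"},
    {"id": "log-out", "type": "log-out"},
]
# Hypothetical local registry references.
images = {n["id"]: f"localhost:5000/{n['id']}:0.1.0" for n in nodes}
manifest = build_manifest("etl-demo", nodes, images)
print(json.dumps(manifest, indent=2))
```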

Deploying the composed pipeline as a wadm application to wasmCloud

Mike Nikles 09:50

The other thing I want to show: I had nothing deployed. If I refresh, you can see we have the four components — bug report incoming for the missing name here — and the two providers. We have NATS messaging for component-to-component communication so they can independently scale and do their thing, plus a bunch of links and configs. From a customer's perspective, all they have to do is create a Wasm component, upload it to that processing node, and hit Deploy. It's already using a local OCI registry I have in place, so once we go to production, that'll just use a public one. I could now send a request to the HTTP Wasm file I hard-coded — it makes the string uppercase, and you'd see it in the logs — but I'd have to share a different screen, so I'll skip that. Take my word for it, that actually works. Very high level: last seven days, putting a UI together and converting this into a wadm file so we can deploy. Any questions, any thoughts? Happy to chat.
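
Editor's sketch (not shown on the call): the data flow of the demo pipeline, reduced to plain in-process Python. In the real deployment these steps are separate Wasm components talking over NATS messaging; here ordinary function calls stand in for that, and all names are made up for illustration.

```python
from typing import Callable, List


def uppercase_processor(payload: str) -> str:
    """Stand-in for the hard-coded Wasm processor that uppercases a string."""
    return payload.upper()


def run_pipeline(payload: str,
                 processor: Callable[[str], str],
                 sinks: List[Callable[[str], None]]) -> None:
    """Pass the incoming payload through the processor, then fan out to sinks."""
    result = processor(payload)
    for sink in sinks:
        sink(result)


# A list plays the role of the wasmCloud log that the log sink writes to.
log_lines: List[str] = []


def log_sink(data: str) -> None:
    log_lines.append(data)


# Two sinks, matching the two out components in the diagram.
run_pipeline("hello wasmcloud", uppercase_processor, [log_sink, log_sink])
print(log_lines)
```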

Deployed components and NATS messaging providers in the wadm view

Brooks Townsend 11:27

My god, I have so many questions, and I usually minimize the time you go.

Colin Murphy 11:34

No, I think whacking together is awesome. It might be even better, since there's wRPC, that you shouldn't necessarily have to whack everything together, right? Having whacked many things together in the past, it is nice and easy. But for the dynamic case, I don't know, maybe that's something you think about later. It's not really that big of a point.

Mike Nikles 12:09

Yeah, it's a good point. It crossed my mind. The reason I went with using wac was to reduce the number of components that float around. I think it's just a personality thing — I feel overwhelmed when I see lots of components. It would be nine components in my wadm file and in my Wadm UI. That's really the only reason I combined them together, to have more of a one-to-one mapping between what I see here and what's in wasmCloud. But I definitely see the point of not doing that, because it would speed things up — I wouldn't have to wait for that command to finish.

Colin Murphy 12:51

Also it would just be a feature to be able to pop something in, or change something on the fly, right? Yeah. I mean, for the purposes of this, I understand it — just someday.

Mike Nikles 13:16

Definitely something to keep in mind. I appreciate it.

Brooks Townsend 13:21

Mike, I have many questions, but just to stay in the same vein: I know our wRPC feature makes it nice that you can choose to break those apart later. My question is: is that something that's important to you? Are you thinking, "I might break this out later, and that's why wasmCloud is so cool, because I can do that"? Or are you honestly just thinking, "I can compose them; if they're all going to be running together, why would I want to go wRPC anyway?"

Mike Nikles 14:01

I think that's the initial plan I had — to keep them as components, because these two, like the HTTP in and the internal out, are always together, they're always one unit, and then there's that reuse part. The outgoing here is the same as here. But now that I think more about it, I actually deployed the out component twice — once as part of this and once as part of that. If I didn't whack them together, then I'd deploy it once and use links, right? Is that understanding correct?

Brooks Townsend 14:37

Yeah, that's pretty much it. I kind of wanted to tease that out, because the distributed nature of wRPC is really cool, but it adds fallibility to the operation. It's nice when it's all composed together and can't fail to go over the transport. But I was curious. I feel like it's good documentation for us to have, if we support both, why you would go one way or the other.

Colin Murphy 15:17

Yeah. Also, are you using a P3 kind of preview — 0.3, with the streams?

Mike Nikles 15:27

No streams yet, no.

Colin Murphy 15:29

Oh, okay. Well, that's definitely something once it comes out you should use. It would just work very well with the images you have of streaming things. If you had components that took streams in and out — you're not using input streams or output streams?

Mike Nikles 15:49

No. I wasn't even aware that's available — input streams and outputs.

Colin Murphy 15:53

Input streams and output streams are available, but with 0.3 it'll just be stream, so you can take a stream as an input or take a stream as an output. My C2PA stuff uses input streams and output streams, so I can send you a link to a WIT file that has input streams and output streams as an example.

Mike Nikles 16:25

Nice. That's what I need. I'm like, I'm cutting edge — but not that cutting edge.

Colin Murphy 16:36

It's in there. I think you're cutting edge — you're just trying to go with an ETL thing and have an ETL solution. But I think once you iterate on it, you would want streams, and you'd probably want dynamic in-and-out kind of stuff. That's just the way software works, right?

Mike Nikles 16:59

That's why I'm here too, right? It's awesome. I'll definitely check it out. I'll send a message afterwards.

Brooks Townsend 18:49

Every time I see a browser and I'm thinking about us doing Wasm things, I'm like, "Ooh, what kind of WebAssembly could we throw on the front end to do some of this stuff?" I'll nerd-snipe you with a project I worked on, wit2wadm, which just translates a WIT into a wadm file. So a really similar structure: you just go from your config to wadm. Not saying that's the right thing here, but it would be neat to run some of that client-side.

Mike Nikles 19:51

I agree. The other thing that went through my mind was running that on wasmCloud as well, instead of running a separate HTTP server and all that — it could totally work there too, having a service that exposes the web app and whatnot. The thing is, I had a week to do it, and I was familiar with building standalone services. There we go — but easy enough to migrate, obviously.

Brooks Townsend 20:21

No, Mike, this is awesome for a week. Last week you showed us squares on an Excalidraw, and this week you have a drag-and-drop deploy pipeline. This is awesome. I don't want to bombard you with a billion questions, but I can't wait to get your readout on the experience, and everything you open-source, I can't wait to take a look at.

Mike Nikles 20:50

Good. I mean, that's really all I've got as a foundation. I won't be back next week, I'm sure, but I'll come back another time when I have a bit more to show.

Brooks Townsend 21:01

Next week you'll be announcing your seed-round funding.

Mike Nikles 21:07

Let me know if you're investing.

Colin Murphy 21:10

This is definitely an area ripe for some Wasm — an industry sector that really could use some of this.

Mike Nikles 21:18

One thought I have: I keep thinking about that diagram on the Cosmonic website for Kubernetes. If you have this view here, and I can do a few more boxes showing different edge environments, where you can drag components into the edge environment, it would automatically deploy. I have some thoughts on how this could really take advantage of all the features we have to distribute the load.

Brooks Townsend 21:55

Interesting times. As a Cosmonic employee, I'd be happy to talk to you about what you could do in Kubernetes — would be fun. Sounds like a plan.

Brooks Townsend 22:28

Well, hey, Mike, awesome stuff again. I can't wait to hear about how it went. I'm sorry I talked a lot. Everybody else on the call — do you have questions or comments for Mike?

Brooks Townsend 23:01

Alrighty. Also, if you open-source part of it, let me know — I'll link to it in the community notes, if you want that. So I think we can go ahead and move on to the next things in the agenda. I just had a couple of little things. I want to keep doing the issue and docs page of the week, because it's a great place for us to talk about new things coming across the pipe and point to new documentation you may not know exists. As far as the issue of the week goes, this isn't an issue as much as a call to action. If you've seen our community roadmap planning that we do every quarter — we usually try to do it in the first community meeting of the quarter, so that's going to be coming up in July. This discussion is where we try to source a lot of our feedback from folks who aren't on the community call, who aren't in Slack. It's an open space for people to leave comments about what they'd like to see in wasmCloud next quarter. Trying to scope those to a shorter time frame and then triage them ahead of roadmap planning is really helpful for figuring out what we can do.

Brooks Townsend 25:00

If you've been holding on to some thoughts, or have things you'd really like to see us lean into going forward, please leave comments there. There's no guarantee we'll be able to find the maintainer time to do all the things, but community feedback is what really helps give us direction beyond stewarding the goal of being an awesome WebAssembly platform. Now I'll put my Cosmonic hat on: on the Cosmonic side, we've been working with people who have deployed wasmCloud to prod, who've been working with wasmCloud 1.0 for a while. So I have some things I'd like to throw on here from the Cosmonic perspective — like, it would be great for this mechanism to be more pluggable, or it'd be great to embed the wasmCloud host inside wash, and maybe have a mode where you don't need to enable JetStream when you're just doing local development. I'll try to put the things I feel like wasmCloud should be doing from a project perspective in there, and also call out where my perspective is fed by people we work with at Cosmonic. Massoud, I think your comment's great — I'll read it out loud. You said in the chat, "I'd love to see, on the upstream side, support for HTTP/2." We've gotten this request a couple of times. WASI-HTTP is HTTP/1.1; I don't know if the P3 version of the interface will support HTTP/2 or not.

Massoud 27:26

I recall Bailey saying a while ago, and I also read it on the upstream side, that support for HTTP/2 is going to be a follow-up to P3. Basically, P3 is going to pave the way. The reason I brought it up, and I'm sure everybody who works in the AI space knows, is that all the new protocols are based on HTTP/2, which kind of excludes using WebAssembly directly there.

Brooks Townsend 28:13

Yeah, that is disappointing. It would be really nice to be able to start writing some WASI-based agentic things. Even if it's something we need to prototype on the wasmCloud side, or some experimental interface, or have a provider that does a translation — I'm sure it's more complicated than that — it would be nice to be able to take advantage of that ecosystem.

Massoud 28:47

What I have in mind right now is either using messaging to a provider that communicates, or a provider doing the HTTP/2 side and then using HTTP/1.1 on the back end to engage the services of the components. Just extra plumbing you have to do.

Colin Murphy 29:26

You can have something that takes HTTP/2 ingress, but you just wouldn't be able to write a component that could take the HTTP/2 itself. So you could have an HTTP/2 server — something like Pingora — and have that put something onto the NATS bus, or whatever, to call your components.

Massoud 30:06

Yeah, again, that becomes kind of messaging, right?

Colin Murphy 30:10

I'm just saying, if you had some sort of agentic thing, if you wanted to take some sort of agent request, or call some sort of ML workflow, you wouldn't have all of the HTTP/2 spec available inside the component, but you'd be able to take HTTP/2 requests, or make outgoing ones, that way.

Massoud 30:50

So what I'm doing right now is really doing the bridging to a provider.

Colin Murphy 30:59

Yeah, so you could have a provider that could handle HTTP/2.

Brooks Townsend 31:02

Yeah, that makes sense. And that's what I think you said, Massoud, right? You'll take HTTP and then publish it on messaging, or whatever.

Massoud 31:14

Yeah, that's why I'm also interested to see when the spec is going to pave the way for messaging, because I'd like to implement it for wasmCloud and also use it in my work.

Brooks Townsend 31:47

I think this is great stuff to put on the roadmap. When it becomes WASI, messaging may not be directly in our control. We can certainly do wasmCloud messaging, or — we've talked previously that we could do a NATS messaging interface, something that more directly represents the NATS message bus, or maybe even JetStream.

Massoud 32:24

I think this is in their proposal, because I review it from time to time. The one we have right now is simple and effective, and a lot of people use it, me included, but it doesn't have all the bells and whistles you need to do some fancy stuff. That's why I'm interested in implementing the spec as a provider.

Brooks Townsend 33:03

Interesting. This is all really good brain batter for Q3. I think we kind of go back and forth on where the WASI cloud interfaces and the more generic interfaces fit versus the more specific vendored ones. We're already using standards in a lot of places — like Postgres is a protocol standard — but we make such heavy use of NATS that it would be really neat to have more robust support for that messaging system. Maybe you could back it onto Kafka, and that's fine. But, Massoud, I'm sure you saw this when you were working with the NATS object store and key-value store interfaces — there's no mechanism in there for some of the fancier things.

Massoud 34:20

My first implementation really is going to be for NATS, because it's kind of integrated, and that makes it possible for everybody to immediately leverage messaging in place.

Brooks Townsend 34:49

Yeah. And we've probably beaten this horse to death about NATS trademarking, but just to continue to affirm — it's staying in Apache 2, and I don't see an equivalent replacement project for what we use it for in wasmCloud. So I don't think there's anything to be concerned about there.

Taylor Thomas 35:15

Is it opening a can of worms, Brooks, to talk about flexibility in the platform? It's just stuff you and I have talked about various times. I think many of us who've been around this project for a while, and our maintainers, have been having random conversations about trying to make it as flexible as possible — and flexible at the right levels. At some levels, we can't be everything to everyone, but Wasm lets us do that a little bit more than other technologies would, because interfaces are awesome. There's thinking being done around how to do that. There's certain value that things like NATS add that we don't want to get rid of. Some people, when they come in and see it for the first time, are like, "well, just use a service mesh." Well, it actually replaces three technologies that most people are used to using from the Kubernetes space, and that's where its value really lies. What we want to do is not lock people into it, but also make sure we can make some assumptions on how we set things up. So we're trying to go for flexible, but I think that's as we go into new territory here.

Massoud 36:56

Well, I personally think, Taylor, that NATS is great. Actually, that was one of the primary things that attracted me to the wasmCloud platform, because it gives an easy way of making the system distributed without coming up with a lot of fancy stuff to do it yourself.

Brooks Townsend 37:26

So we've already done some refactors on the wasmCloud host to make abstractions for some of the things we tightly couple to NATS today, like configuration. When you configure a component or link or a provider, the host reaches out to a config store, and now there's a little light abstraction there so that if you're embedding wasmCloud, you don't have to back it with the NATS object store or key-value store. But if you do that, then you're responsible for making sure that config is available for any distributed host, and that's a hard problem we don't want to be responsible for. So leaning into NATS there is great. We try so strongly for wasmCloud not to be dependent on specific technologies like Kubernetes or a service mesh, and, speaking with my Cosmonic hat on again, it would be really nice to feel that same way about NATS. A lot of people use Kubernetes and have their opinions, like all traffic has to go through Istio. So it'd be cool if people who want to run wasmCloud in a Kubernetes-style area and just send stuff over HTTP could do that, without setting up something separate. That's something I've been trying to figure out: the right balance of abstraction.
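
Editor's sketch (not shown on the call): what a "light abstraction" over a config store can look like in general. This is not the wasmCloud host's actual interface (the host is not written in Python, and its real trait is not described in this call); every name here is hypothetical. It only illustrates the trade-off Brooks describes: swap the backend, and you own the job of making the same data visible to every host.

```python
from typing import Dict, Iterable, Optional, Protocol


class ConfigStore(Protocol):
    """Hypothetical interface: anything that can return named config."""

    def get(self, name: str) -> Optional[Dict[str, str]]: ...


class InMemoryConfigStore:
    """Local-only backend. Fine for a single embedded host; with several
    hosts, the embedder would have to keep the copies in sync."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def put(self, name: str, values: Dict[str, str]) -> None:
        self._data[name] = dict(values)

    def get(self, name: str) -> Optional[Dict[str, str]]:
        return self._data.get(name)


def resolve_config(store: ConfigStore, names: Iterable[str]) -> Dict[str, str]:
    """Merge named configs in order; later names override earlier ones."""
    merged: Dict[str, str] = {}
    for name in names:
        values = store.get(name)
        if values is None:
            raise KeyError(f"config {name!r} not found")
        merged.update(values)
    return merged


store = InMemoryConfigStore()
store.put("http-defaults", {"address": "0.0.0.0:8000"})
store.put("http-override", {"address": "127.0.0.1:8080"})
print(resolve_config(store, ["http-defaults", "http-override"]))
```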

Massoud 39:32

Couldn't they do it using a custom provider — say, have a local host running, and then have that interface with the service mesh?

Brooks Townsend 39:46

They could. What people have to do now is the built-in HTTP provider for ingress — soon a built-in client provider — and then you have to have NATS JetStream set up somewhere, even if it's just local, because wasmCloud won't start without it. But you're right, we could totally do it with a custom provider. It's just about whether we could remove the requirement for a NATS deployment with wasmCloud, assuming you solve all the other problems.

Massoud 40:30

Well, personally, I consider the combination of NATS and the ability to write your own providers as providing a lot of flexibility. What I mentioned about HTTP/2 is that, for good reasons, all the new agentic protocols are based on it, and we know WASI doesn't support it — so we have that dependency. We also have some security considerations on the wasmCloud side for MCPs that are OAuth-based, which limits what you can do. That's an area where I don't know how it could be enabled securely without compromising the security guarantees the platform provides. But there's only so much that can be done at the moment with the specs we've got. I don't think flexibility is something the platform lacks — that's my personal opinion.

Brooks Townsend 42:09

What we want is a really flexible, extensible platform. I forget how we got on this thread exactly.

Massoud 42:21

It was about the roadmap. I put my wish list, but I do know where it's coming from, and that's why I'm not putting any comments there — it's not something we could do internally.

Brooks Townsend 42:40

Cool. Well, thanks — that's all awesome stuff. Everybody, please feel free to leave your comments, questions, concerns, feedback, and love letters in the Q3 2025 roadmap. That's probably enough there. I did want to call out one last thing — a new page in our documentation. We've had a couple of different pages around our use of NATS, and the question we get more often is, "Hey, I'm deploying wasmCloud somewhere — what NATS streams and key-value buckets do I need to set up ahead of time? How do they need to be configured? What subjects do I need to allow or deny in and out of my NATS accounts?" This is a fairly complicated area of a production wasmCloud setup.

The consolidated NATS reference page in the wasmCloud docs

Brooks Townsend 46:00

So under the reference guide, NATS, I've tried to collect all the information we have around the docs. I wanted an easy place for folks to see what subjects we use in wasmCloud. Here's the subject structure for our wRPC protocol — so if you're interested in allow-listing RPC for components, you can do *.*.wrpc.> to allow everything. Here's how you could debug all RPC traffic flowing in the wasmCloud system — basically what wash spy does for you, but with a specific ID by filling in this second wildcard. Here's the pattern and subjects we use for the control interface — how you could subscribe to see what messages are going across — and here's what the wadm API uses. Here are the two key-value buckets wasmCloud uses, so if you wanted to pre-provision them, set up replicas, back them by a file, and start backing them up, you don't have to let wasmCloud create them — you can create them and wasmCloud will connect. Same thing with wadm. I also left information about productionizing this: you should definitely back up your wadm manifests bucket, because that's where all the declarative manifests for your applications live. But wadm state is not as critical to back up — we reconstitute it at most every 60 seconds, realistically in less than 30, by combining all the heartbeats of hosts in the lattice. Here's a recommended operator structure for signing keys. There's a ton of information here. So when we say a hub-and-spoke architecture, this is what we're talking about — your setup with NATS servers and clusters.
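
Editor's sketch (not shown on the call): how NATS-style subject wildcards behave, to make the *.*.wrpc.> pattern concrete. In NATS, * matches exactly one token and > matches one or more trailing tokens. The matcher below is a small stand-in written for illustration, not NATS client code, and the example subjects (lattice name, component IDs, trailing tokens) are made up.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a dot-separated subject matches a NATS-style pattern."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least
            # one subject token left to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


# Allow-list all wRPC traffic: any first token, any second token.
all_rpc = "*.*.wrpc.>"
# Narrow to one ID by filling in the second wildcard.
one_component = "*.comp-a.wrpc.>"

print(subject_matches(all_rpc, "default.comp-a.wrpc.0.0.1.handle"))
print(subject_matches(one_component, "default.comp-b.wrpc.0.0.1.handle"))
```

The first call matches and the second does not, since the second token is pinned to a different ID.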

NATS subjects, key-value buckets, and hub-and-spoke topology in the reference

Brooks Townsend 48:00

This is going to be really informative for what we're trying to think about: what can we make an abstraction, and what can we lean further into NATS on. You probably don't need to know all this before you get started, but depending on your learning style, some people really like to come into the docs and read how things are used and how they work. This should serve as a great place to understand how we're using such a powerful technology. Massoud, I loved your suggestion of having a "What's New" section on the main page of the docs to call out new pages. I think that'd be great. Documentation is hard, since you always have to write it and keep it up to date, but that would be really nice, especially as we add things like deployment guides.

Massoud 48:37

I think a CI job could help with that. I've seen other projects automate that out of the changelog, or maybe the document tags — something even basic that says, "these are the new docs or updated docs," and then people can jump to those pages. For somebody who has read through the documentation, coming back is kind of a "go and discover what has changed," and that's time-consuming, because you really need to figure out what's new.
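
Editor's sketch (not shown on the call): a minimal version of the kind of CI step Massoud describes. It assumes the job has the output of `git diff --name-status` between two releases and that docs live under a `docs/` prefix; that prefix, the function names, and the output format are all assumptions.

```python
def parse_name_status(text):
    """Parse `git diff --name-status` output into (status, path) pairs."""
    pairs = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        # Status is a letter such as A (added) or M (modified). Renames look
        # like "R100" and carry two paths, so keep the last path.
        pairs.append((parts[0][0], parts[-1]))
    return pairs


def whats_new(pairs, docs_prefix="docs/"):
    """Render a plain-text "What's New" section for changed doc pages."""
    docs = [(status, path) for status, path in pairs
            if path.startswith(docs_prefix)]
    added = sorted(path for status, path in docs if status == "A")
    updated = sorted(path for status, path in docs if status == "M")
    lines = ["What's New"]
    lines += [f"  new: {path}" for path in added]
    lines += [f"  updated: {path}" for path in updated]
    return "\n".join(lines)


# Made-up diff output for illustration.
sample = "A\tdocs/reference/nats.md\nM\tdocs/intro.md\nM\tsrc/main.rs\n"
print(whats_new(parse_name_status(sample)))
```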

Brooks Townsend 49:34

Yeah. I feel like we've considered using the versions for this a bit more liberally. Right now we have docs for pre-1.0 and post-1.0. But if we used release notes for some automation, we could have a version of the docs for, like, wasmCloud 1.8, and on that page you'd get a little sparkly emoji or whatever for the new documentation that came with that version. I'm just thinking out loud — I hate to say it, but this kind of feels like something that would be valuable to set up in an LLM context. Not to have an LLM write the docs, but if it has the context of the documentation repository and can take in release notes as an input, we could probably do interesting things to point out sections that need to change, or new documentation that needs to be generated — like, "you added this new feature for error handling, I didn't see anything for it in the docs." That might help low-lead developers like myself who forget where everything is. Sounds like a little docs refresh might be in the mix for Q3.

Yordis Prieto 51:42

Yeah, you could take advantage of the releases, but on the website itself, like, start publishing them. I just noticed there's no release section.

Brooks Townsend 51:57

Yeah, we don't publish the release notes to the documentation site. We actually have this problem generally — we have Markdown documents in GitHub that we don't publish. You've got to sync it to a different repo, so we just don't even set that up. But maybe we should. I hate to add more things to the monorepo, but that would be one advantage of having documentation alongside the main project.

Massoud 52:43

I think describing what we want to achieve will help us understand the best implementation approach. There are different ways to slice and dice this. By the way, I'm working on some custom context for wasmCloud — if it pans out, I'm going to publish it, and then we can all use it. I'm still working on that area, so I don't know what's going to come out of it. Maybe nothing, but hopefully something useful for everybody, and then we can see how to put it to use for some of these repetitive tasks that aren't enjoyable to do by hand.

Brooks Townsend 53:45

Yeah. Like searching through the entire documentation to figure out if you need to change an OCI reference is lame. To round that out on my side: I don't have a specific prescription for what I want to do on the docs. I really like the Diátaxis framework for organizing things. I feel like we've leaned into that — maybe we could lean harder, and literally call it "how-tos," "reference," whatever on the sidebar. Generally I think we have quite a lot of documentation, and I'd like to work with Eric, our main technical writer on the Cosmonic side, to curate some of those flows. If really verbose technical content goes into reference, and reference is a large section, that's great. The guides are very pointed. That's really the only opinion I have. I think Yordis knows exactly where it should go.

Yordis Prieto 55:19

No, I just prefer if you actually label the pages — like, it's an explanation, or a how-to, or whatever. A small thing, because sometimes I have to click around to find which one I actually want, because I can't tell based on the hierarchy. The ones under references — okay, I know everything down there is a reference. But the operation guide is, like, okay, is it a how-to? When I go in, it sounds like it's an explanation. Environment variables inside the operational guide feel more like an explanation. Things like that.

Brooks Townsend 56:03

Yeah. "How to Kubernetes deploy" — boom, that kind of stuff. I like the idea. If you notice it, definitely share.

Yordis Prieto 56:36

I helped the Sequence guys introduce theirs, and I love their documentation — I can pass you a link later. They've done a really good job with it.

Brooks Townsend 56:48

Cool. Thank you. Well, interested in NATS reference documentation? We've got more of that now. Other than that, that's pretty much all we have for the call today. We've got maybe five minutes left. Does anybody have anything from the wasmCloud side, the broader WebAssembly ecosystem — maybe one last topic? Speak now.

Yordis Prieto 57:33

My LLM hallucinated, though — you know, you cost me $3.

Brooks Townsend 57:41

My bad on being so confident that the repo never existed. There aren't that many repos in the wasmCloud org.

Yordis Prieto 57:52

Social gaslighting. Okay, fine.

Brooks Townsend 57:56

I'm wrong, okay. Well, I guess me getting gaslit is the last topic of today's meeting. Everything's going great. I think we can probably call it before we just goof off for three whole minutes. Thanks, everybody — thank you for coming to wasmCloud Wednesday and participating again. Mike, awesome demo. As soon as you make progress or have questions, we'll always have you back for another demo. Other than that, I think we can call it. Thanks, everybody. We'll see you next week for wasmCloud Wednesday.
