Thanks to ahmedtadde for the work on this initiative.
Contributors should hold off on adding major features until this refactor is complete. (Bug fixes are welcome!)
There are a few CI issues on the PR at the moment. Maintainers are taking a look at the moment!
DISCUSSION: Upgrades and updates for wasmCloud applications
Today we have a control interface operation to upgrade a component from, say, an OCI reference v1 to an OCI reference v2. The host will verify the component and as soon as it is ready, it'll be hot-swapped in, allowing for a zero-downtime upgrade. We don't do this in wadm or wash dev but the option is available.
This can get complicated when you're adding capabilities, changing configuration, etc. This existing approach doesn't seamlessly update links and configs and all of that—you have to do it manually.
The purpose here is to introduce robust transition from one version (or configuration) to another and back.
For components and providers, it would helpful to report the status at all times.
Step 1 is an operator has more information on what the host is doing.
Step 2 is to change the way we handle lattice-wide state to have lattice state NATS KV operations managed by wadm or an operator in order to more effectively coordinate operations from multiple hosts.
See the diagram on the RFC for a step-by-step walkthrough of how rollouts/rollbacks will work under this plan.
Question: A reinitiation of scalers wasn't included in the diagram - is that intentional?
Answer: No, good call-out, that should be included in the diagram as well.
Question: Are canary deployments or blue-green deployments in scope for this RFC or kept for later?
Answer: Probably mostly kept for later. Changing configs can be hard because providers contend for host resources; after introducing the rollouts and rollbacks, a solution here might be to introduce this style of upgrade while excepting the cases where you might have host resource contention.
Question: Shouldn't this be an add-on? We've had discussions about an extension model.
Answer: This gives the primitives for someone to implement the rolling update at another level.