Our AKS images are updated every 30 days, or whenever we detect that the AKS control plane has been upgraded.
Microsoft abruptly stopped providing new machine images to third parties in 2022, so we developed our own image flow for AKS clusters. We start with the same Ubuntu base image Microsoft used internally, update the image and its packages, install all required components on it, and run it on all CAST AI-managed AKS nodes.
All of our core components are hosted on castai-hub, whereas add-on components are hosted on Docker (for instance, our open-source castai/pod-node-lifecycle, the mutating webhook component, is hosted on Docker).
CAST AI should update the version in the node pool within 10-15 minutes. You can also perform a "Trigger Reconcile" in the CAST AI console UI. Once the node pool is updated, you can rebalance so that all nodes are replaced with new ones.
Once you upgrade the control plane version, CAST AI will synchronize with it within 10-15 minutes. If you prefer, you can also manually reconcile it through the UI. After that, you can perform a rebalance.
CAST AI will always create nodes with the latest available Kubernetes version. Additionally, when replacing, it first creates a new node and then drains the old one, ensuring that multi-replica applications experience minimal or no downtime.
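The create-first, drain-second ordering described above can be sketched as follows. This is only an illustration, not CAST AI's implementation: the function and all four callbacks are hypothetical names, and the readiness wait and the final delete step are assumptions added to make the sequence complete.

```python
from typing import Callable


def replace_node(
    old_node: str,
    create_node: Callable[[], str],
    wait_until_ready: Callable[[str], None],
    drain_node: Callable[[str], None],
    delete_node: Callable[[str], None],
) -> str:
    """Replace old_node with a fresh node, creating before draining.

    All callbacks are hypothetical placeholders for whatever actually
    provisions, drains, and removes nodes.
    """
    # 1. Bring up the replacement first so capacity never drops.
    new_node = create_node()
    # 2. Assumption: wait for the new node to accept pods before draining.
    wait_until_ready(new_node)
    # 3. Only now evict pods from the old node; they can land on the new one.
    drain_node(old_node)
    # 4. Assumption: the drained node is removed afterwards.
    delete_node(old_node)
    return new_node
```

Because the replacement exists before any pod is evicted, an application with several replicas keeps at least some of them running throughout the swap.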
You can also set a Node-TTL in a scheduled rebalance job that will automatically rotate out old nodes to pick up the upgrade without the need for a full rebalance.
This shouldn't be a problem unless you're upgrading across more than two versions between which the APIs have changed significantly.
When creating the required node pools, the versions must align with what Azure currently supports. To clarify this requirement, please refer to the updated documentation provided here: Troubleshooting.
For a scheduled rebalance, you can set the execution time, the minimum node age, and the maximum number of nodes to rebalance. For example, you can express the following: "I want 3 nodes to rebalance if they are older than 7 days every hour between 10 pm and 2 am EST on Saturday and Sunday."
With that schedule, every hour from 10 pm to 2 am the rebalancer checks for up to 3 nodes that are over 7 days old and swaps them with new nodes. Once all such nodes have been swapped and no nodes are over 7 days old, it does nothing.
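To make the example concrete, here is a minimal sketch of the selection logic it describes. It is not the CAST AI rebalancer or its API: the Node type, the function names, and the oldest-first ordering are all hypothetical. It also assumes the window opens at 22:00 on Saturday or Sunday and closes at 02:00 the following morning, and that the timestamps are already in the schedule's time zone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Node:
    name: str
    created_at: datetime


def in_window(now: datetime) -> bool:
    """True from 22:00 up to (not including) 02:00 for windows opening Sat/Sun."""
    saturday, sunday = 5, 6  # datetime.weekday(): Monday is 0
    if now.hour >= 22:
        return now.weekday() in (saturday, sunday)
    if now.hour < 2:
        # After midnight the window belongs to the previous calendar day.
        return (now - timedelta(days=1)).weekday() in (saturday, sunday)
    return False


def pick_nodes(
    nodes: List[Node],
    now: datetime,
    min_age: timedelta = timedelta(days=7),
    max_nodes: int = 3,
) -> List[Node]:
    """Return up to max_nodes nodes older than min_age, or [] outside the window."""
    if not in_window(now):
        return []
    old = [n for n in nodes if now - n.created_at > min_age]
    old.sort(key=lambda n: n.created_at)  # assumption: oldest nodes go first
    return old[:max_nodes]
```

Calling pick_nodes once per hour inside the window replaces at most 3 nodes per run, and it returns an empty list as soon as no node is older than 7 days.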
Currently, AKS images used by CAST AI are upgraded after every AKS control plane upgrade or every 30 days. We don't have a way of updating the OS image without a K8s upgrade at the moment.
The patch schedule is as follows: CAST AI AKS images are re-created after every AKS control plane upgrade or every 30 days.
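As a rough illustration of this rule, the check below returns true when either condition holds. The function name, its parameters, and the idea of detecting a control plane upgrade by comparing version strings are assumptions made for this sketch; they do not describe how CAST AI actually detects upgrades.

```python
from datetime import datetime, timedelta

MAX_IMAGE_AGE = timedelta(days=30)  # the 30-day interval stated above


def image_needs_rebuild(
    image_built_at: datetime,
    image_k8s_version: str,
    control_plane_version: str,
    now: datetime,
) -> bool:
    """Hypothetical check: rebuild after a control plane upgrade or after 30 days."""
    # Assumption: an upgrade shows up as a version mismatch with the image.
    if image_k8s_version != control_plane_version:
        return True
    # Otherwise rebuild once the image reaches the 30-day age limit.
    return now - image_built_at >= MAX_IMAGE_AGE
```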
When a cluster is connected, we take a snapshot of the boot disk from an existing AKS-managed node that is running the official AKS image and use that snapshot to create a custom image. This ensures the custom image we build is up to date with the latest patches. We implemented this because Microsoft stopped sharing its machine images publicly.
CAST AI uses the bootstrap script from EKS worker nodes, and by default it enables IP forwarding (learn more here). If needed, you can also do that through the init script section of the nodeConfig.