Upgrading Kubernetes version

This guide outlines the recommended steps for upgrading your Kubernetes cluster managed by CAST AI. Following these steps ensures a smooth upgrade process and minimizes potential disruptions to your workloads.

Before you begin

Before deciding to upgrade the Kubernetes version in your clusters, review the Kubernetes release notes for any new features or deprecations that may affect your workloads.

Before starting the upgrade process, ensure that:

  • You have access to your cloud provider's console (AWS, Azure, or GCP).
  • You have the necessary permissions to modify your Kubernetes cluster.
  • You have access to the Cast AI console.

Upgrade process

Step 1: Upgrade the control plane

  1. Log in to your cloud provider's console.
  2. Navigate to your Kubernetes cluster management section.
  3. Initiate the control plane upgrade process.

📘

Note

Make sure to upgrade the control plane separately from the node pools. Cast AI will automatically handle node pool upgrades after the control plane upgrade. This approach helps avoid scaling delays and ensures seamless updates.

  4. Wait for the control plane upgrade to complete successfully.
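
One way to confirm from the command line that the control plane upgrade has completed is to compare the server version reported by the cluster with the version you targeted. The sketch below is only an illustration and not part of CAST AI tooling. It assumes you have captured the output of `kubectl version -o json`; the helper names `parse_minor` and `control_plane_upgraded` and the sample payload are hypothetical.

```python
import json
import re


def parse_minor(git_version: str) -> tuple[int, int]:
    """Extract (major, minor) from a string such as 'v1.29.4-eks-036c24b'."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def control_plane_upgraded(version_json: str, target: str) -> bool:
    """Return True if the server version has reached the target major.minor.

    `version_json` is assumed to be the text printed by
    `kubectl version -o json`, which includes serverVersion.gitVersion.
    """
    server = json.loads(version_json)["serverVersion"]["gitVersion"]
    return parse_minor(server) >= parse_minor(target)


# Trimmed, made-up payload; real output contains more fields.
sample = '{"serverVersion": {"gitVersion": "v1.29.4"}}'
print(control_plane_upgraded(sample, "1.29"))  # True
print(control_plane_upgraded(sample, "1.30"))  # False
```

The check compares only the major and minor components, because provider-specific suffixes in the version string vary between AWS, Azure, and GCP.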

📘

Note

For the exact steps to upgrade the control plane in your cloud provider's console, refer to that provider's documentation.

Example of upgrading the Kubernetes version for AKS

Step 2: Reconcile in CAST AI

After the control plane upgrade is finished:

  1. Navigate to the CAST AI console.
  2. Locate your cluster and click the "Trigger reconcile" button.
Triggering reconciliation from the cluster dashboard

📘

Note

Alternatively, you can wait for the auto-reconcile, which occurs every 30 minutes.

The reconciliation process will:

  • Initiate the creation of new images for node pools (this can take 20-30 minutes).
  • Ensure that all new nodes added by CAST AI use the upgraded control plane version.

💡

Tip

For AKS clusters, you can confirm the image update by checking the Azure Compute Gallery for new images with the current date.

Step 3: (Optional) Full node replacement

To replace all existing nodes with new ones using the upgraded version:

  1. Set up a scheduled rebalance with the following parameters:

    • Minimum node age: 0
    • Target savings: 0%

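With a minimum node age of 0 and a target savings of 0%, every node is eligible for replacement regardless of its age or the expected savings, so all old nodes are replaced.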

  2. Generate and run a rebalancing plan.

For detailed instructions on scheduled rebalancing, see our Scheduled rebalancing guide.
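
To see which nodes still run the old version, before or after a rebalance, you can compare each node's kubelet version with the upgraded control plane version. The following is a minimal sketch, assuming the node list was exported with `kubectl get nodes -o json`. The function names and the sample node data are hypothetical and only illustrate the idea.

```python
import json
import re


def minor_of(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as 'v1.28.9'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def nodes_below_version(nodes_json: str, target: str) -> list[str]:
    """Return names of nodes whose kubelet is older than the target major.minor."""
    wanted = minor_of(target)
    outdated = []
    for node in json.loads(nodes_json)["items"]:
        kubelet = node["status"]["nodeInfo"]["kubeletVersion"]
        if minor_of(kubelet) < wanted:
            outdated.append(node["metadata"]["name"])
    return outdated


# Made-up example: one node created before the upgrade, one after.
sample = json.dumps({"items": [
    {"metadata": {"name": "node-old"},
     "status": {"nodeInfo": {"kubeletVersion": "v1.28.9"}}},
    {"metadata": {"name": "node-new"},
     "status": {"nodeInfo": {"kubeletVersion": "v1.29.4"}}},
]})
print(nodes_below_version(sample, "1.29"))  # ['node-old']
```

An empty result after the rebalance suggests that every node has been replaced with one running the upgraded version.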

Troubleshooting

If you encounter issues during the upgrade process, check whether the scenario below applies.

Nodes failing to be added after rebalance

If you initiated a rebalance without reconciling first, and nodes are timing out or failing to be added:

  1. For AKS clusters:

    • Delete the castpool from the Azure cloud console.

  2. Trigger a reconciliation from the CAST AI UI.

  3. After the reconciliation is complete, re-run the rebalance to add nodes successfully.

Next steps

After successfully upgrading your Kubernetes cluster:

  • Verify that all your workloads are running correctly.
  • Update any client applications or tools that interact with your cluster to ensure compatibility with the new Kubernetes version.
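
As a starting point for the workload check, you could list the pods that are not in a healthy phase. This is a rough sketch, assuming the pod list was exported with `kubectl get pods --all-namespaces -o json`. The function name `pods_needing_attention` and the sample data are hypothetical.

```python
import json

# Pod phases treated as healthy for this quick check.
HEALTHY_PHASES = {"Running", "Succeeded"}


def pods_needing_attention(pods_json: str) -> list[str]:
    """Return 'namespace/name (phase)' for pods outside the healthy phases."""
    flagged = []
    for pod in json.loads(pods_json)["items"]:
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            meta = pod["metadata"]
            flagged.append(f'{meta["namespace"]}/{meta["name"]} ({phase})')
    return flagged


# Made-up example with one running pod and one pending pod.
sample = json.dumps({"items": [
    {"metadata": {"namespace": "default", "name": "web-1"},
     "status": {"phase": "Running"}},
    {"metadata": {"namespace": "default", "name": "web-2"},
     "status": {"phase": "Pending"}},
]})
print(pods_needing_attention(sample))  # ['default/web-2 (Pending)']
```

A pod in the Running phase is not necessarily ready to serve traffic, so treat this as a first pass and follow up with your usual readiness and application-level checks.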