Cluster and Node Status Overview

This guide describes cluster status values, which define the current state of the cluster's connection to CAST AI, and node status values, which indicate the health of nodes and their readiness to accept pods.

Overview of cluster status values

The Status of the cluster in the CAST AI console defines the current state of the cluster's connection to CAST AI and indicates whether the platform can perform automated optimization actions on the cluster.

| Status | Explanation | Action |
| --- | --- | --- |
| Connecting | The cluster is being connected to CAST AI in Read only mode, OR the cluster is transitioning from Read only mode to CAST AI managed mode (where the customer will be able to set up automation). | |
| Read only | The cluster is connected to CAST AI in Read only mode; reporting features are enabled. | |
| Connected | The cluster is connected to CAST AI in managed mode; reporting features are enabled and automation can be set up. | |
| Warning | The CAST AI managed cluster has encountered a transient error and is attempting to recover from it automatically. Autoscaling is not working. | |
| Not responding (Read only) | CAST AI has recently lost connectivity to a cluster that was previously connected in Read only mode. If the connection is not restored within 5 minutes, the status changes to Disconnected (Read only). | Check the status of the castai-agent pod in the castai-agent namespace. |
| Not responding | CAST AI has recently lost connectivity to the cluster. Autoscaling is not working. | Check the status of the castai-agent pod in the castai-agent namespace. |
| Failed | CAST AI has encountered an error and can't recover from it automatically. Autoscaling is not working. | Hover over the Status to view error details. Check the status of CAST AI components in the castai-agent namespace. |
| Disconnecting | The cluster is being disconnected from CAST AI. | |
| Disconnected | A cluster that was previously connected to CAST AI is now disconnected. | Hover over the Status to see when the cluster was disconnected. |
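
Several of the actions above ask you to check the castai-agent pod in the castai-agent namespace. The following is a minimal sketch of how that check could be scripted, assuming you have saved the output of `kubectl get pods -n castai-agent -o json` and loaded it as a dictionary. The function name `summarize_agent_pods` and the sample pod names are hypothetical and exist only for illustration; the fields it reads (`status.phase` and the `Ready` condition) are standard Kubernetes pod fields, not anything specific to CAST AI.

```python
def summarize_agent_pods(pod_list: dict) -> list[dict]:
    """Return name, phase, and readiness for each pod in a kubectl JSON pod list."""
    summary = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # A pod is ready when it carries a condition of type "Ready" with status "True".
        ready = any(
            cond.get("type") == "Ready" and cond.get("status") == "True"
            for cond in status.get("conditions", [])
        )
        summary.append(
            {
                "name": pod["metadata"]["name"],
                "phase": status.get("phase", "Unknown"),
                "ready": ready,
            }
        )
    return summary


# Hypothetical sample shaped like `kubectl get pods -n castai-agent -o json` output.
SAMPLE_POD_LIST = {
    "items": [
        {
            "metadata": {"name": "castai-agent-example-1"},
            "status": {
                "phase": "Running",
                "conditions": [{"type": "Ready", "status": "True"}],
            },
        },
        {
            "metadata": {"name": "castai-agent-example-2"},
            "status": {"phase": "Pending", "conditions": []},
        },
    ]
}

for entry in summarize_agent_pods(SAMPLE_POD_LIST):
    print(entry["name"], entry["phase"], "ready" if entry["ready"] else "not ready")
```

A pod that is not in the Running phase, or is Running but not ready, is a reasonable starting point when investigating a Not responding or Failed cluster status.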

Overview of node status values

The status of the node in the console indicates its health and readiness to accept pods.

| Status | Explanation | Action |
| --- | --- | --- |
| Cordoned | Scheduling new pods onto the node is temporarily disabled. A node might have been cordoned by a user or the system in preparation for node deletion. CAST AI also cordons a node and leaves it in the cluster if, during rebalancing (with the Graceful Rebalancing option turned on), pods were not evicted in time. | Inspect the node to understand the reason behind cordoning. If a node was cordoned during rebalancing, adjust the pod disruption budget and uncordon the node. |
| Creating | CAST AI is in the process of creating the node. | |
| Deleted | A short-lived status indicating that the node was deleted. | |
| Deleting | CAST AI is in the process of deleting the node. | |
| Detached | The node is still present in the cloud but has been detached from the Kubernetes cluster. | Inspect the node and delete it manually from the cloud. |
| Draining | The node is being drained; Kubernetes is gracefully evicting existing pods from it. | |
| Interrupted | Several scenarios can trigger this spot node status: (1) an interruption event is received from the cloud provider; (2) a rebalancing recommendation is received from the cloud provider, indicating a possible interruption; (3) CAST AI predicted a node interruption. In all cases, CAST AI manages the interruption and prepares the necessary capacity. | CAST AI is handling the interruption and is preparing replacement capacity. |
| Lost | The node is no longer part of the Kubernetes cluster but has not yet been deleted by CAST AI. | If a node is in this state for a prolonged period, contact CAST AI support to troubleshoot the issue. |
| Not ready | The node is temporarily unable to accept new workloads, either because it is still being prepared by CAST AI as part of the provisioning process or because it is experiencing issues, such as network problems or insufficient resources, that prevent it from properly communicating with the control plane. | If a node is in this state for a prolonged period, contact CAST AI support to troubleshoot the issue. |
| Ready | The node is fully operational and ready to accept pods. | |
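
Three of the statuses above (Cordoned, Not ready, and Ready) correspond to fields that Kubernetes itself exposes on the node object, so they can be cross-checked from the output of `kubectl get nodes -o json`. The sketch below is an approximation under that assumption, not the logic the CAST AI console uses; statuses such as Creating, Detached, Lost, or Interrupted depend on information that these fields do not carry. The function name `kubernetes_node_state` and the sample node names are hypothetical.

```python
def kubernetes_node_state(node: dict) -> str:
    """Classify one node object using only standard Kubernetes fields."""
    # `kubectl cordon` sets spec.unschedulable to true on the node.
    if node.get("spec", {}).get("unschedulable", False):
        return "Cordoned"
    # Otherwise the node is ready only if its "Ready" condition has status "True".
    conditions = node.get("status", {}).get("conditions", [])
    is_ready = any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in conditions
    )
    return "Ready" if is_ready else "Not ready"


# Hypothetical sample shaped like the "items" of `kubectl get nodes -o json`.
SAMPLE_NODES = [
    {
        "metadata": {"name": "node-a"},
        "spec": {"unschedulable": True},
        "status": {"conditions": [{"type": "Ready", "status": "True"}]},
    },
    {
        "metadata": {"name": "node-b"},
        "spec": {},
        "status": {"conditions": [{"type": "Ready", "status": "True"}]},
    },
    {
        "metadata": {"name": "node-c"},
        "spec": {},
        "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]},
    },
]

for node in SAMPLE_NODES:
    print(node["metadata"]["name"], kubernetes_node_state(node))
```

Once the reason for cordoning is understood and resolved, a cordoned node can be returned to scheduling with `kubectl uncordon <node-name>`.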