Cluster and node status overview

This guide includes an overview of cluster status values, defining the current state of the cluster's connection to Cast AI, and an overview of node status values, indicating the health and readiness of nodes to accept pods.

Overview of cluster status values

The cluster's status in the Cast AI console defines the current state of its connection to Cast AI and indicates whether the platform can perform automated optimization actions on it.

| Status | Explanation | Action |
| --- | --- | --- |
| Connecting | The cluster is being connected to Cast AI in Read only mode, OR the cluster is transitioning from Read only mode to Cast AI managed mode (where the customer can set up automation). | |
| Read only | The cluster is connected to Cast AI in Read only mode. Reporting features are enabled. | |
| Connected | The cluster is connected to Cast AI in managed mode. Reporting features are enabled, and automation can be set up. | |
| Warning | The Cast AI-managed cluster has encountered a transient error and is attempting to recover from it automatically. Autoscaling is not working. | |
| Not responding (Read only) | Cast AI has recently lost connectivity to a cluster that was previously connected in Read only mode. If the connection is not restored within 5 minutes, the status changes to Disconnected (Read only). | Check the status of the castai-agent pod in the castai-agent namespace. |
| Not responding | Cast AI has recently lost connectivity to the cluster. Autoscaling is not working. | Check the status of the castai-agent pod in the castai-agent namespace. |
| Failed | Cast AI has encountered an error and can't recover from it automatically. Autoscaling is not working. | Hover over the Status to view error details. Check the status of Cast AI components in the castai-agent namespace. |
| Disconnecting | The cluster is being disconnected from Cast AI. | |
| Disconnected | The cluster was previously connected to Cast AI and is now disconnected. | Hover over the Status to see when the cluster was disconnected. |
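The triage implied by the table above can be sketched in Python. This is a minimal illustration, not part of Cast AI's tooling: the function names `agent_check_command`, `run_agent_check`, and `triage` are hypothetical, and the status strings are assumed to match the console labels exactly. The only external command used is the standard `kubectl get pods` call against the castai-agent namespace.

```python
import subprocess

# Statuses for which the table states that autoscaling is not working.
AUTOSCALING_NOT_WORKING = {"Warning", "Not responding", "Failed"}

# Statuses whose suggested action is to check components in the agent namespace.
CHECK_AGENT = {"Not responding (Read only)", "Not responding", "Failed"}


def agent_check_command(namespace="castai-agent"):
    """Build the kubectl command that lists pods in the agent namespace."""
    return ["kubectl", "get", "pods", "--namespace", namespace]


def run_agent_check(namespace="castai-agent"):
    """Run the check; assumes kubectl is installed and points at the cluster."""
    result = subprocess.run(
        agent_check_command(namespace),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def triage(status):
    """Summarize what the table says about a given cluster status."""
    return {
        "status": status,
        "autoscaling_not_working": status in AUTOSCALING_NOT_WORKING,
        "check_agent_pods": status in CHECK_AGENT,
    }
```

For example, `triage("Failed")` flags both that autoscaling is not working and that the agent pods should be checked, while `triage("Read only")` flags neither.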

Overview of node status values

The node's status in the console indicates its health and readiness to accept pods.

| Status | Explanation | Action |
| --- | --- | --- |
| Cordoned | Scheduling new pods onto the node is temporarily disabled. A user or system might have cordoned the node in preparation for node deletion. Cast AI also cordons a node and leaves it in the cluster if pods were not evicted during rebalancing (with the Graceful Rebalancing option turned on). | Inspect the node to understand the reason behind cordoning. If the node was cordoned during rebalancing, adjust the pod disruption budget and un-cordon the node. |
| Creating | Cast AI is in the process of creating the node. | |
| Deleted | A short-term status indicating that the node was deleted. | |
| Deleting | Cast AI is in the process of deleting the node. | |
| Detached | The node is still present in the cloud but has been detached from the Kubernetes cluster. | Inspect the node and delete it manually from the cloud. |
| Draining | The node is being drained; Kubernetes gracefully evicts existing pods from the node. | |
| Interrupted | Several scenarios can trigger this spot node status: (1) an interruption event is received from the cloud provider; (2) a rebalancing recommendation is received from the cloud provider, indicating a possible interruption; (3) Cast AI predicted a node interruption. | Cast AI is handling the interruption and is preparing replacement capacity. |
| Lost | The node is no longer part of the Kubernetes cluster; however, Cast AI has not yet deleted it. | If a node is in this state for a prolonged period, contact Cast AI support to troubleshoot the issue. |
| Not ready | The node is temporarily unable to accept new workloads, either because Cast AI is still preparing it as part of the provisioning process or because it is experiencing issues, such as network problems or insufficient resources, that prevent it from properly communicating with the control plane. | If a node is in this state for a prolonged period, contact Cast AI support to troubleshoot the issue. |
| Ready | The node is fully operational and ready to accept pods. | |
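The action suggested for a Cordoned node (inspect, review pod disruption budgets, un-cordon) could be sketched as the following sequence of standard kubectl commands. This is an illustrative sketch under assumptions: the helper names `cordon_recovery_commands` and `render` are hypothetical, and `example-node` is a placeholder node name. The sketch only builds the commands; adjusting a pod disruption budget is left to the operator, since the right change depends on the workload.

```python
import shlex


def cordon_recovery_commands(node_name):
    """Return the kubectl commands for inspecting and un-cordoning a node."""
    if not node_name:
        raise ValueError("node_name must be a non-empty string")
    return [
        # Inspect the node to understand why it was cordoned.
        ["kubectl", "describe", "node", node_name],
        # List pod disruption budgets that may have blocked eviction.
        ["kubectl", "get", "poddisruptionbudgets", "--all-namespaces"],
        # Make the node schedulable again once the cause is resolved.
        ["kubectl", "uncordon", node_name],
    ]


def render(commands):
    """Render the commands as shell-ready strings for review before running."""
    return [shlex.join(cmd) for cmd in commands]
```

Calling `render(cordon_recovery_commands("example-node"))` yields the three commands as strings, which can be reviewed and then run one at a time.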