Cluster and Node Status Overview
This guide describes cluster status values, which define the current state of the cluster's connection to CAST AI, and node status values, which indicate the health of nodes and their readiness to accept pods.
Overview of cluster status values
The Status of the cluster in the CAST AI console defines the current state of the cluster's connection to CAST AI and indicates whether the platform can perform automated optimization actions on the cluster.
Status | Explanation | Action |
---|---|---|
Connecting | The cluster is being connected to CAST AI in Read only mode, or the cluster is transitioning from Read only mode to CAST AI managed mode (where the customer will be able to set up automation) | |
Read only | The cluster is connected to CAST AI in Read only mode; reporting features are enabled | |
Connected | The cluster is connected in CAST AI managed mode; reporting features are enabled and automation can be set up | |
Warning | CAST AI managed cluster has encountered a transient error and is currently attempting to recover from it automatically. Autoscaling is not working. | |
Not responding (Read only) | CAST AI has recently lost connectivity to a cluster that was previously connected in Read only mode. If the connection is not restored within 5 minutes, the status changes to Disconnected (Read only) | Check the status of the castai-agent pod in the castai-agent namespace |
Not responding | CAST AI has recently lost connectivity to a cluster. Autoscaling is not working. | Check the status of the castai-agent pod in the castai-agent namespace |
Failed | CAST AI has encountered an error and can't recover from it automatically. Autoscaling is not working. | Hover over the Status to view error details. Check the status of CAST AI components in the castai-agent namespace |
Disconnecting | The cluster is being disconnected from CAST AI | |
Disconnected | A cluster that was previously connected to CAST AI is now disconnected | Hover over the Status to see when the cluster was disconnected |
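
The Action column above can be captured as a small lookup for runbooks or scripts. The following is a minimal sketch, not part of any CAST AI tooling: the names `AUTOSCALING_NOT_WORKING`, `CLUSTER_ACTIONS`, `cluster_action`, and `agent_pods_command` are hypothetical, and the status strings are assumed to match what the console displays. The only external command referenced is the standard `kubectl get pods`, which is built as an argument list and not executed here.

```python
from typing import List, Optional

# Statuses for which the table explicitly states "Autoscaling is not working".
AUTOSCALING_NOT_WORKING = frozenset({"Warning", "Not responding", "Failed"})

# Suggested follow-ups, paraphrased from the Action column of the table.
# Statuses with an empty Action cell are intentionally absent.
CLUSTER_ACTIONS = {
    "Not responding (Read only)": "Check the castai-agent pod in the castai-agent namespace",
    "Not responding": "Check the castai-agent pod in the castai-agent namespace",
    "Failed": "Hover over the Status for error details, then check the "
              "CAST AI components in the castai-agent namespace",
    "Disconnected": "Hover over the Status to see when the cluster was disconnected",
}


def cluster_action(status: str) -> Optional[str]:
    """Return the suggested action for a cluster status, or None if the table lists none."""
    return CLUSTER_ACTIONS.get(status)


def agent_pods_command(namespace: str = "castai-agent") -> List[str]:
    """Build (but do not run) the kubectl command that lists pods in the agent namespace."""
    return ["kubectl", "get", "pods", "--namespace", namespace]


if __name__ == "__main__":
    status = "Not responding"
    print(cluster_action(status))
    print(" ".join(agent_pods_command()))
```

Returning the command as a list keeps the sketch free of side effects; it can be passed to `subprocess.run` on a machine where `kubectl` is configured for the cluster.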
Overview of node status values
The status of the node in the console indicates its health and readiness to accept pods.
Status | Explanation | Action |
---|---|---|
Cordoned | Scheduling new pods onto the node is temporarily disabled. A node might have been cordoned by a user or the system in preparation for node deletion. CAST AI also cordons a node and leaves it in the cluster if, during rebalancing (with the Graceful Rebalancing option turned on), pods were not evicted in time. | Inspect the node to understand the reason for cordoning. If the node was cordoned during rebalancing, adjust the pod disruption budget and uncordon the node. |
Creating | CAST AI is in the process of creating a node | |
Deleted | A short-term status indicating that the node was deleted | |
Deleting | CAST AI is in the process of deleting the node | |
Detached | A node that is still present in the cloud but has been detached from the Kubernetes cluster. | Inspect the node and delete it manually from the cloud. |
Draining | Node is being drained; Kubernetes is gracefully evicting existing pods from a node. | |
Interrupted | Any of the following scenarios can trigger this spot node status: (1) an interruption event is received from the cloud provider; (2) a rebalancing recommendation is received from the cloud provider, indicating a possible interruption; (3) CAST AI predicts a node interruption. In all cases, CAST AI manages the interruption and prepares the necessary capacity. | CAST AI is handling the interruption and preparing replacement capacity |
Lost | A node is no longer part of the Kubernetes cluster; however, it has not yet been deleted by CAST AI. | If a node is in this state for a prolonged period, contact CAST AI support to troubleshoot the issue |
Not ready | A node is temporarily unable to accept new workloads, either because it is still being prepared by CAST AI as part of the provisioning process or because it is experiencing issues, such as network problems or insufficient resources, that prevent it from properly communicating with the control plane. | If a node is in this state for a prolonged period, contact CAST AI support to troubleshoot the issue |
Ready | Node is fully operational and ready to accept pods. | |
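
The node table can be turned into a simple triage rule in the same way. This is a sketch under stated assumptions: `node_follow_up` and `cordoned_node_commands` are hypothetical helpers, and because the table does not define how long "a prolonged period" is, the threshold is a required parameter (`prolonged_after`) that the caller must choose. The commands use the standard `kubectl describe node` and `kubectl uncordon` subcommands and are only built, not executed.

```python
from datetime import timedelta
from typing import List

# Statuses whose Action cell asks the operator to inspect the node.
NEEDS_INSPECTION = frozenset({"Cordoned", "Detached"})

# Statuses that only call for contacting support when they persist.
ESCALATE_IF_PROLONGED = frozenset({"Lost", "Not ready"})


def node_follow_up(status: str, time_in_status: timedelta,
                   prolonged_after: timedelta) -> str:
    """Classify a node status as 'inspect', 'contact-support', or 'none'.

    prolonged_after is an assumption supplied by the caller; the source
    does not say what counts as a prolonged period.
    """
    if status in NEEDS_INSPECTION:
        return "inspect"
    if status in ESCALATE_IF_PROLONGED and time_in_status >= prolonged_after:
        return "contact-support"
    # Creating, Deleting, Deleted, Draining, Interrupted and Ready need no action.
    return "none"


def cordoned_node_commands(node_name: str) -> List[List[str]]:
    """Build kubectl commands for a node that was cordoned during rebalancing.

    The uncordon step should run only after the cause is understood and the
    pod disruption budget has been adjusted, as the Action column describes.
    """
    return [
        ["kubectl", "describe", "node", node_name],
        ["kubectl", "uncordon", node_name],
    ]


if __name__ == "__main__":
    # The 30-minute threshold is purely illustrative.
    print(node_follow_up("Not ready", timedelta(minutes=45), timedelta(minutes=30)))
    for cmd in cordoned_node_commands("example-node"):
        print(" ".join(cmd))
```

Keeping the threshold explicit avoids baking an undocumented value into the rule; teams can set it to match their own alerting policy.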