Notifications
This Cast AI feature informs you, through the UI or a webhook, about key issues affecting your cluster. It also delivers other useful information, such as the daily vulnerability report. This guide lists the notification types you may see in Cast AI, with examples and relevant action points.
When new items are ready for you to view, the bell icon in the top menu shows a count. You can view all items on the Notifications page.
Due to the dynamic nature of Kubernetes clusters, notifications expire automatically after 24 hours.
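If you keep a local copy of notifications received through a webhook, you may want to apply the same 24-hour lifetime on your side. The following is a minimal sketch, assuming the lifetime is counted from a creation timestamp; the `created_at` value and the `is_expired` helper are hypothetical and not part of any Cast AI API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 24-hour lifetime is measured from the creation time.
NOTIFICATION_TTL = timedelta(hours=24)


def is_expired(created_at: datetime, now: datetime) -> bool:
    """Return True once a notification is at least 24 hours old."""
    return now - created_at >= NOTIFICATION_TTL


# Hypothetical creation time, used only for illustration.
created = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(is_expired(created, created + timedelta(hours=23)))  # False
print(is_expired(created, created + timedelta(hours=25)))  # True
```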
Notification severity types
Cast AI assigns each notification one of the following severity types to indicate its importance.
| Severity | Description |
|---|---|
| Critical | Indicates a severe issue that requires immediate attention and may significantly impact cluster operations. |
| Error | Signifies a problem that is causing a malfunction or preventing expected behavior. |
| Warning | Alerts about potential issues or situations that could lead to problems if not addressed. |
| Info | Provides general information about cluster operations, updates, or status changes. |
| Success | Confirms that an operation or process has completed successfully. |
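When forwarding notifications to another system, it can help to treat the severity types as an ordered scale so that you can filter by a minimum level. The sketch below is illustrative only: the numeric ordering, the `severity` and `name` keys, and the `at_least` helper are assumptions, not a documented Cast AI payload format.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumed ordering for illustration, from least to most urgent.
    SUCCESS = 0
    INFO = 1
    WARNING = 2
    ERROR = 3
    CRITICAL = 4


def at_least(notifications, minimum):
    """Keep notifications whose severity is at or above `minimum`."""
    return [
        n for n in notifications
        if Severity[n["severity"].upper()] >= minimum
    ]


# Hypothetical records; the names come from the reference tables in this guide.
sample = [
    {"name": "Cluster reconciled", "severity": "success"},
    {"name": "GPU quota exceeded", "severity": "warning"},
    {"name": "Node deletion failed", "severity": "critical"},
]
urgent = at_least(sample, Severity.WARNING)
print([n["name"] for n in urgent])
# ['GPU quota exceeded', 'Node deletion failed']
```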
Notification categories
Cast AI organizes notifications into the following categories:
- Reporting anomalies - Cost and performance anomaly detection notifications
- Inventory - New cloud provider instance availability notifications
- Security - Runtime anomaly and image security notifications
- Other - General cluster operations, system connectivity, and configuration issues
Complete notifications reference
This section provides a comprehensive list of all notifications that Cast AI can generate, organized by category and severity level.
Critical notifications
Other category
| Notification | Description |
|---|---|
| Cast AI agent is not able to connect to the API | The Cast AI agent cannot communicate with Cast AI API endpoints. Check network connectivity, firewall restrictions, or authentication. |
| Cluster controller not responding | The cluster controller component is unresponsive. This can prevent cluster management operations and autoscaling from functioning properly. |
| Failed to Reconcile Cluster | A severe reconciliation failure occurred. This can happen when Cast AI service accounts are modified or there are significant configuration conflicts. |
| IP Address quota exceeded | Your cloud provider's IP address quota has been exceeded, preventing new nodes from being created with proper network connectivity. |
| Node Configuration Validation Failed | Node configuration is invalid. Check the notification details for specific configuration errors. |
| Node deletion failed | Cast AI was unable to delete a node from the cluster. This may indicate permission issues or cloud provider API problems. |
| Operation failed | A critical cluster operation has failed. Check notification details for which operation failed and why. |
| Spot Instance quota exceeded | Your cloud provider's Spot Instance quota has been exceeded, preventing Cast AI from launching cost-effective Spot Instances for your workloads. Because Spot Fallback is not enabled, the Autoscaler might not be able to add any capacity. |
| The Cast AI agent is unable to connect to the API | Same condition as the "Cast AI agent is not able to connect to the API" notification listed above. |
Error notifications
Other category
| Notification | Description |
|---|---|
| Missing permission when adding a node to a target group | Cast AI lacks the necessary IAM permissions to add nodes to target groups in your load balancer. |
| Missing permission when adding a node to load balancer(s) | Cast AI lacks the necessary IAM permissions to add nodes to load balancers. |
| Missing permission when adding a node to target groups | Cast AI lacks the necessary IAM permissions to add nodes to target groups. |
| Missing permission when adding VMSS IP address to a backend pool | Cast AI lacks the necessary Azure permissions to add Virtual Machine Scale Set IP addresses to backend pools. |
| Missing permission when deleting a node from target groups | Cast AI lacks the necessary IAM permissions to remove nodes from target groups. |
| Missing permission when removing a node from load balancer(s) | Cast AI lacks the necessary IAM permissions to remove nodes from load balancers. |
| SSO Connection problem | There is an issue with your Single Sign-On configuration preventing proper user authentication. |
Warning notifications
Other category
| Notification | Description |
|---|---|
| Network traffic anomaly notifications | Cast AI monitors network traffic patterns and alerts when unusual activity is detected. May include Cloud API, Internet, inter-region, or inter-zone traffic anomalies. |
| Resource overprovisioning anomaly notifications | Cast AI detects when CPU or RAM resources are significantly overprovisioned, helping identify optimization opportunities. |
| Cost anomaly notifications | Cast AI monitors cost metrics and alerts you to unusual spending patterns. May focus on compute costs, CPU provisioning costs, or cost-per-resource metrics. Content varies based on detected patterns. |
| Cannot find valid instance types for the given workloads | Cast AI cannot identify suitable instance types for your workloads. Consider adjusting workload resource requests or instance type preferences. |
| Continuous OOMKilled Events Detected | Pods are being continuously killed due to out-of-memory conditions, indicating insufficient memory allocation or memory leaks. |
| Failed Helm Test of castai-workload-autoscaler | The Helm test for the Cast AI workload autoscaler component has failed, indicating potential deployment or configuration issues. |
| Failed to reconcile cluster | A non-critical reconciliation issue occurred. While not immediately severe, this should be monitored and addressed. |
| GPU quota exceeded | Your cloud provider's GPU quota has been exceeded, preventing allocation of GPU resources for workloads that require them. |
| Outdated cluster-controller | The cluster controller component is running an outdated version and should be updated for optimal functionality and security. |
| Unable to create castpoolarm. ARM VMs will not work | Cast AI cannot create the ARM instance pool, preventing ARM-based virtual machines from being used in your cluster. |
| Unable to update pool | Cast AI was unable to update a node pool configuration, which may prevent scaling operations or configuration changes. |
| Spot Instance quota exceeded | Your cloud provider's Spot Instance quota has been exceeded, preventing Cast AI from launching cost-effective Spot Instances for your workloads. |
Reporting anomalies category
| Notification | Description |
|---|---|
| Cost anomaly notifications | Cast AI monitors cost and efficiency metrics across your cluster. Notifications vary in focus and specificity, targeting different metric combinations. Each notification specifies which metrics triggered detection. |
Info notifications
Inventory category
| Notification | Description |
|---|---|
| New machines available in AWS | New AWS instance types have become available and can now be used by Cast AI for your clusters. |
| New machines available in Azure | New Azure virtual machine sizes have become available and can now be used by Cast AI for your clusters. |
| New machines available in GCP | New Google Cloud Platform machine types have become available and can now be used by Cast AI for your clusters. |
Other category
| Notification | Description |
|---|---|
| Read-Only access activated | Cast AI has been activated in read-only mode, which means it can monitor your cluster but cannot make changes to it. |
| Trial expires soon | Your Cast AI trial period is approaching its expiration date. Consider upgrading to a paid plan to continue using Cast AI features. |
| Trial has expired | Your Cast AI trial period has expired. Upgrade to a paid plan to restore full functionality. |
Reporting anomalies category
| Notification | Description |
|---|---|
| Daily Vulnerability Report | Your daily security vulnerability report is available, containing information about potential security issues in your cluster workloads. |
Success notifications
Other category
| Notification | Description |
|---|---|
| Cluster reconciled | The cluster has been reconciled: Cast AI has synchronized the desired cluster state with the actual state. |
Taking action on notifications
When you receive notifications, consider the following general action steps:
- Critical notifications - Address immediately as they can severely impact cluster operations
- Error notifications - Investigate and resolve permission issues or configuration problems
- Warning notifications - Review for potential cost savings or performance improvements
- Info notifications - Stay informed about new capabilities and system status
- Success notifications - Confirm that operations completed as expected
For specific troubleshooting steps related to individual notifications, consult the relevant Cast AI documentation or contact support if the issue persists.
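The general action steps above can be sketched as a simple routing table, for example in a script that receives notifications and decides what to do with them. The action labels, the `route` helper, and the fallback for unknown severities are all hypothetical choices for illustration, not Cast AI behavior.

```python
# Hypothetical follow-up actions, one per severity type described in this guide.
ACTIONS = {
    "critical": "page on-call",
    "error": "open ticket",
    "warning": "add to review queue",
    "info": "log",
    "success": "log",
}


def route(severity: str) -> str:
    """Map a severity label to a follow-up action.

    Unknown labels fall back to the review queue so nothing is dropped silently.
    """
    return ACTIONS.get(severity.lower(), "add to review queue")


print(route("Critical"))  # page on-call
print(route("Info"))      # log
```

Sending unknown severities to a review queue rather than discarding them is a conservative default; adjust it to match your own escalation process.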
