This error isn't related to CAST AI and is a common issue with Kubernetes workload configuration.
Regarding the eviction: this is expected Kubernetes behavior when requests != limits for memory, and every container defined for this pod is configured that way. Learn more about this here.
It's a Kubernetes best practice to set requests = limits for memory and to set no limits for CPU. The idea is to keep the memory request as close as possible to actual usage for better resilience. Limits are there to protect against bursting.
Our general recommendation is not to set CPU limits as it's a costly functionality and the burstable nature of workloads ensures better hardware utilization.
Add CPU limits only if you have abusive workloads such as miners or stress-testers that may consume all available CPU for long periods.
Note: Without CPU limits, your workloads will lose the Guaranteed QoS class and run as Burstable instead.
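As a rough illustration of the recommendation above, the following Pod manifest sets the memory request equal to the memory limit and defines a CPU request without a CPU limit. The pod name, image, and resource values are placeholders chosen for this sketch, not values from your cluster; size them from your own usage data.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                 # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/example-app:1.0.0   # placeholder image
      resources:
        requests:
          cpu: 250m                 # assumed value; used for scheduling
          memory: 512Mi             # assumed value; keep close to real usage
        limits:
          memory: 512Mi             # equal to the memory request
          # No cpu limit is set, so the container can burst when CPU is free.
```

Because the CPU limit is omitted, Kubernetes assigns this pod the Burstable QoS class rather than Guaranteed.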
With a single replica deployment, it's not possible to automatically ensure that another replica is up and healthy before terminating the existing one.
The recommended solution is always to have at least two replicas running for high availability and fault tolerance to avoid service interruptions and downtime. However, there are measures you can take to mitigate downtime during spot interruptions or maintenance activities without increasing the replica count.
- Implement readiness probes - By implementing a readiness probe in your application's container specification, you can ensure that the replacement pod is ready before terminating the current one. The readiness probe signals to Kubernetes when a pod is ready to serve requests.
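A readiness probe might look like the sketch below. It assumes the application exposes an HTTP health endpoint at `/healthz` on port 8080; the endpoint, port, names, image, and timing values are all hypothetical and should be adapted to your application.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                 # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: registry.example.com/example-app:1.0.0   # placeholder image
          ports:
            - containerPort: 8080   # assumed application port
          readinessProbe:
            httpGet:
              path: /healthz        # assumed health endpoint
              port: 8080
            initialDelaySeconds: 5  # wait before the first check
            periodSeconds: 10       # check every 10 seconds
            failureThreshold: 3     # mark unready after 3 failed checks
```

Until the probe succeeds, the pod is not marked Ready and does not receive traffic through a Service.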
That said, deploying multiple replicas remains the best way to ensure high availability during spot interruptions:
- Deploy multiple replicas - Set up and configure your application to run multiple replicas concurrently. This way, the interruption of one replica does not impact the availability of your application, as other replicas can take over the workload.
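In its simplest form, running multiple replicas means raising `replicas` in the Deployment spec, as in this minimal sketch. The names and image are placeholders, and the application is assumed to tolerate running as more than one instance at a time.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                 # hypothetical name
spec:
  replicas: 2                       # at least two, so one can be interrupted
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: registry.example.com/example-app:1.0.0   # placeholder image
```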