MySQL Support for Index Advisor, Performance Advisor Pilot, and Azure Savings Plans

In March, Cast AI added MySQL support to Database Optimizer's Index Advisor, starting with index recommendations, and introduced Performance Advisor in early access, which maps query execution plans to actionable recommendations alongside Index Advisor.

Workload Autoscaler shipped a reliability suite that automatically detects native VPA conflicts, endless memory upscaling loops, and JVM tuning environment variables, alongside DaemonSet resource scaling.

Azure Savings Plans joined AWS Savings Plans and GCP Flex CUDs as a supported commitment type for both reporting and node autoscaling.

The month also brought broader GPU and TPU coverage — TPU resource requests, MPS support on GKE, and fractional-GPU utilization across reporting endpoints — along with self-service spot interruption simulation, Oracle Cloud price adjustments, and a range of AKS, OMNI, and console improvements.

Major Features and Improvements

MySQL Support in DBO Index Advisor

Database Optimizer's Index Advisor now supports MySQL. The first MySQL recommendation type identifies unused indexes that are safe to drop, helping reduce write overhead and storage on busy MySQL databases.

Performance Advisor Early Access

Performance Advisor entered early access this month. It analyzes query execution plans collected from your databases and maps them to actionable recommendations alongside the existing Index Advisor. Queries are validated before recommendations are surfaced, and validation results are stored so recommendations can be re-validated as conditions change.

Workload Autoscaler Reliability Suite

Workload Autoscaler shipped a set of changes aimed at eliminating common failure modes that customers hit when rolling Workload Autoscaler out across mixed clusters. Native Kubernetes VPA is now automatically detected — workloads already managed by a native VPA are skipped rather than competing with the existing controller, removing a class of conflicts that previously required manual exclusion. Endless memory upscaling loops, where each OOM-kill triggers a higher recommendation that triggers another OOM, are now detected and handled safely rather than pushing memory upward indefinitely.

For JVM workloads, Workload Autoscaler automatically detects JVM tuning environment variables (such as JAVA_TOOL_OPTIONS) and accounts for them when generating recommendations. The disconnect between the policy-level, workload-level, and detected-container JVM scaling toggles has been fixed. DaemonSet resource scaling shipped this month as well, giving customers a controlled way to scale DaemonSet resource requests by a configured factor rather than applying per-workload recommendations.
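As a rough illustration of what detecting JVM tuning variables can involve, the sketch below reads a maximum heap size out of JAVA_TOOL_OPTIONS. The function name and the parsing rules are assumptions made for this example, not Cast AI's implementation; it also assumes that the last -Xmx flag wins when several are present.

```python
import re
from typing import Dict, Optional

# Multipliers for the size suffixes accepted after -Xmx (k, m, g).
_SIZE_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_max_heap_bytes(env: Dict[str, str]) -> Optional[int]:
    """Hypothetical helper: return the -Xmx value in bytes, or None if absent.

    `env` is the container's environment as a plain name -> value mapping.
    """
    options = env.get("JAVA_TOOL_OPTIONS", "")
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)", options)
    if not matches:
        return None
    # Assumption: when -Xmx appears more than once, the last one is used.
    value, unit = matches[-1]
    return int(value) * _SIZE_UNITS.get(unit.lower(), 1)
```

A recommendation engine could use a value like this as a floor, so that a memory recommendation never drops below the heap the JVM has been told to use.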

Cloud Provider Integrations

AWS

AWS Capacity Reservations in Terraform

AWS capacity reservations can now be targeted from the castai_node_template Terraform resource, matching the targeting that was previously available only in the console. Scheduled capacity reservations are now treated as active when the current time falls within the start/end date range, and the autoscaler uses the incrementalRequestedQuantity tag as a fallback when determining scheduled reservation instance counts.
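The two rules described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the date range is treated as inclusive at both ends, and the tag fallback is assumed to apply only when no instance count is reported.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def is_scheduled_reservation_active(
    start: datetime, end: datetime, now: Optional[datetime] = None
) -> bool:
    """Treat a scheduled reservation as active while `now` is in [start, end]."""
    now = now or datetime.now(timezone.utc)
    return start <= now <= end


def reserved_instance_count(reported: Optional[int], tags: Dict[str, str]) -> int:
    """Use the reported count if present, else fall back to the tag value."""
    if reported is not None:
        return reported
    # Assumption: the tag holds an integer encoded as a string.
    return int(tags.get("incrementalRequestedQuantity", "0"))
```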

Azure

Encryption at Host

Cast AI now respects the encryptionAtHost setting when provisioning AKS nodes, so VMs are provisioned with host-level encryption enabled when the node configuration requires it. The setting is exposed in the Node Configuration in the console and supported through the Terraform provider.

Improved Quota Error Handling and Boot Diagnostics

When AKS returns MaxSpotInstanceCountExceeded, QuotaExceeded, or related errors, the autoscaler now categorizes them correctly and stops retrying in tight loops, sparing customers from a flood of identical events. When add-node times out, AKS boot diagnostics are now captured automatically, so the cause of the failed start is visible.

Kubelet Config and Ephemeral Storage in Node Configuration

Advanced Settings on AKS node configuration now expose the Kubelet config block, and ephemeral storage options are surfaced in the Node Configuration section of the console, removing the need to set them via the API.

Oracle Cloud (OCI)

Price Adjustments for OCI

Custom price adjustments are now available for Oracle Cloud. GPU instance pricing uses a per-GPU model rather than the standard CPU/memory split, reflecting how Oracle prices its GPU shapes.

Commitments

Multi-Team and Cross-Organization Commitments

Commitments can now be isolated to specific teams within an organization, and assignments and usage tracking now work across organizations for customers with multi-org Cast AI footprints. Cloud Connect commitment scope was also extended to track external usage for AWS.

Azure Savings Plans

Azure Savings Plans now join AWS Savings Plans and GCP Flex CUDs as a supported commitment type for both reporting and node autoscaling. They behave the same way as their AWS and GCP counterparts — spend-based, region-flexible, and applied automatically to matching nodes after import. Utilization tracking covers usage outside Cast AI-onboarded clusters, so customers using Azure Savings Plans across a broader Azure footprint can see how much of their commitment is being consumed by Cast AI-managed workloads versus the rest of their infrastructure.

Pod Mutations

Pod Mutations: Root Owner Resolution

The pod mutator now resolves a pod's root owner all the way up the controller chain — for example, a Deployment that owns a ReplicaSet that owns the Pod — so mutations and the Evictor both target the correct owning resource. The pod mutations summary view now also shows which workloads each mutation targets.
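Walking the controller chain can be sketched as below. This is an illustrative example rather than the pod mutator's actual code: it assumes objects are plain dictionaries shaped like Kubernetes manifests and that `objects_by_uid` (a hypothetical lookup table) contains every potential owner. The `ownerReferences`, `controller`, and `uid` fields are standard Kubernetes metadata.

```python
from typing import Dict


def resolve_root_owner(obj: dict, objects_by_uid: Dict[str, dict]) -> dict:
    """Follow controller ownerReferences upward and return the topmost owner."""
    seen = set()
    current = obj
    # The `seen` set guards against malformed data with a reference cycle.
    while current["metadata"]["uid"] not in seen:
        seen.add(current["metadata"]["uid"])
        refs = current["metadata"].get("ownerReferences", [])
        controller = next((r for r in refs if r.get("controller")), None)
        parent = objects_by_uid.get(controller["uid"]) if controller else None
        if parent is None:
            # No controller owner, or the owner is unknown: this is the root.
            return current
        current = parent
    return current
```

For a Pod owned by a ReplicaSet that is owned by a Deployment, the function returns the Deployment; a standalone Pod resolves to itself.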

Workload Optimization

JVM Heap Scaling (Early Access)

Workload Autoscaler can now scale JVM workloads based on actual JVM heap usage rather than the memory footprint Kubernetes observes from outside the JVM. This produces tighter memory recommendations for Java applications.

Once enabled, Cast AI automatically identifies applications as JVM workloads, scales memory based on Prometheus-scraped heap metrics, injects -Xmx and -Xms into the pod, and rightsizes the memory request based on actual heap usage.

JVM Heap Scaling requires Prometheus running in the cluster with JVM metrics scraped, and Prometheus connected to Cast AI as a Custom Metrics data source. The feature is in early access.

Native VPA Detection

Workload Autoscaler now detects workloads that are already managed by a native Kubernetes VPA and skips them when Workload Autoscaler is enabled on the cluster, rather than competing with the existing controller. This removes a class of conflict that previously required manual exclusion of VPA-managed workloads.
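One way to picture the detection is to collect the targets of every VerticalPodAutoscaler object and skip any workload that matches. The sketch below assumes VPA objects are available as dictionaries; `spec.targetRef` is the standard VPA field, while the function names are hypothetical and the example ignores details such as the VPA update mode.

```python
from typing import Iterable, Set, Tuple

WorkloadKey = Tuple[str, str, str]  # (namespace, kind, name)


def vpa_managed_workloads(vpas: Iterable[dict]) -> Set[WorkloadKey]:
    """Return the workloads that some native VPA already targets."""
    managed = set()
    for vpa in vpas:
        target = vpa.get("spec", {}).get("targetRef")
        if target:
            namespace = vpa["metadata"]["namespace"]
            managed.add((namespace, target["kind"], target["name"]))
    return managed


def should_skip(workload: dict, managed: Set[WorkloadKey]) -> bool:
    """True when the workload is already handled by a native VPA."""
    meta = workload["metadata"]
    return (meta["namespace"], workload["kind"], meta["name"]) in managed
```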

Endless Memory Upscaling Loop Detection

Workload Autoscaler now detects workloads stuck in an endless memory upscaling loop — where each OOM-kill triggers a higher memory recommendation, which leads to another OOM at the new ceiling — and stops pushing memory upward indefinitely. OOMKill handling was also improved more broadly.
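The release notes do not describe how the loop is detected. As one possible heuristic, purely for illustration, a workload could be flagged once it has been OOM-killed several times in a row at successively higher memory limits. The function name, the event shape, and the threshold of three are all assumptions.

```python
from typing import Iterable, Optional, Tuple


def is_upscaling_loop(
    events: Iterable[Tuple[int, bool]], max_consecutive: int = 3
) -> bool:
    """Detect repeated OOM kills at ever-higher limits.

    `events` is a chronological sequence of (memory_limit_bytes, oom_killed).
    """
    streak = 0
    previous_limit: Optional[int] = None
    for limit, oom_killed in events:
        raised = previous_limit is None or limit > previous_limit
        # Count only OOM kills that happened after the limit was raised.
        streak = streak + 1 if (oom_killed and raised) else 0
        previous_limit = limit
        if streak >= max_consecutive:
            return True
    return False
```

Once a loop is flagged, a controller would stop raising the recommendation and surface the workload for investigation, since a memory leak cannot be fixed by adding memory.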

DaemonSet Resource Scaling

DaemonSet resource scaling was shipped in March, giving customers a controlled way to scale DaemonSet resource requests by a configured factor X relative to node size, rather than applying per-workload recommendations to each DaemonSet.

System Policies: CPU Limits Option

System policies now include an option to control CPU limits, alongside the existing CPU requests and memory controls.

Karpenter Enterprise Suite

Scheduled Rebalancing

Scheduled rebalancing is now available for Cast AI on Karpenter clusters via a dedicated page in the console, with timezone adjustments to make schedules more predictable across regions.

Tracing Rebalancing Plans Back to Their Schedule

When a scheduled rebalance runs on a Karpenter Enterprise Suite cluster, the resulting rebalancing plan now shows which schedule produced it. This makes it possible to trace any plan back to the schedule that triggered it from inside the console, instead of having to correlate timestamps by hand.

Node Autoscaling

Cluster Autoscaler Configuration Overrides

Cluster Autoscaler arguments, including flags such as --scale-down-delay-after-add and --max-node-provision-time that control how aggressively nodes are added and removed, can now be overridden from Cast AI without editing the Cluster Autoscaler pod spec directly. Overrides persist across pod restarts and are surfaced in the cluster settings page, so customers can see exactly which defaults have been changed and to what.
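Conceptually, an override layer merges customer values on top of the defaults and records what changed. The sketch below illustrates that idea only; the function name is hypothetical and the flag values used with it are illustrative, not the defaults Cast AI or Cluster Autoscaler ships.

```python
from typing import Dict, Optional, Tuple


def apply_overrides(
    defaults: Dict[str, str], overrides: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, Tuple[Optional[str], str]]]:
    """Return the effective arguments and a map of flag -> (old, new)."""
    effective = {**defaults, **overrides}
    changed = {
        flag: (defaults.get(flag), value)
        for flag, value in overrides.items()
        if defaults.get(flag) != value
    }
    return effective, changed
```

Keeping the `changed` map separate from the effective arguments is what makes it possible to show which defaults were altered and to what.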

Self-Service Spot Interruption Simulation

Customers can now trigger a spot-interruption simulation on a node themselves, either by annotating the node directly or from the node list in the console, to validate that workloads on that node fail over correctly without involving Cast AI support. A new Spot Interruptions dashboard makes spot activity easier to inspect.

Rebalancing Improvements

The maximum drain timeout for rebalancing can now be configured up to 180 minutes, giving long-draining workloads time to finish during a rebalance. Handling of drain-failed nodes is improved, the rebalancer surfaces simulation errors in the console, and large rebalancing plans persist faster through batched inserts.

Clearer GPU Scheduling Diagnostics

When a GPU pod has an incompatible node selector, the Rebalancer no longer falls through to the misleading "NVIDIA Device Plugin is required" message and instead reports the real reason it couldn't place the pod.

Node List Click-Through and Test Nodes

Clicking a node name in the node list now opens the node details page directly instead of requiring an extra step. Test nodes can be created from a node template.

v5 TPU Provisioning and GCP G4 Instance Family

Cast AI now provisions TPU-capable nodes when workloads request TPU resources directly in their pod specs, extending the existing autoscaler coverage from GPUs to TPUs on GCP. Pricing for TPU instances is now first-class in the pricing inventory, and the GCP G4 instance family is also now available for autoscaling.

AI Enabler

Analytics: Tags, Model Filtering, and TTFT Percentiles

AI Enabler Analytics now supports filtering by model and lets customers tag requests so spend can be attributed across teams or workloads. Time-to-first-token is now reported at several percentiles, giving a clearer view of latency than a single average. A new OTEL report section was added, API key identification in analytics was improved, and API keys with read-only access can now report chat completions, embeddings, and reranking usage.
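To see why percentiles say more than an average, consider computing them from raw time-to-first-token samples. The sketch below uses Python's standard `statistics.quantiles`; the function name and the choice of the 50th, 90th, and 99th percentiles are assumptions for illustration, not the percentiles the product necessarily reports.

```python
import statistics
from typing import Dict, Sequence


def ttft_percentiles(
    samples_ms: Sequence[float], percentiles: Sequence[int] = (50, 90, 99)
) -> Dict[int, float]:
    """Return the requested percentiles of time-to-first-token samples."""
    # n=100 yields 99 cut points; index p - 1 is the p-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in percentiles}
```

A handful of very slow requests barely moves the median but shows up clearly at the 99th percentile, which a single mean would blur into one number.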

AWS Bedrock Sync via Cloud Connect

AWS Bedrock data is now pulled through Cloud Connect using the existing AWS_AI_SERVICES integration scope, surfacing Bedrock metrics alongside other AI Enabler data without a separate integration step.

Batch Processing Bucket Retention

The AI Enabler batch processing GCS buckets are now protected from accidental deletion when a cluster is offboarded.

Custom LLM Provider Name in the Drawer

When editing an external LLM provider — for example, an Anthropic provider configured with a custom name such as anthropic-castai — the provider details drawer now shows the custom name assigned during registration instead of the generic provider type, making it easier to identify a specific provider configuration.

Application Performance Automation (APA)

DBO Recommendations as APA Findings

APA now ingests both Index and Performance recommendations from Database Optimizer and surfaces them as Findings alongside its other discoveries.

Findings: Dismiss, Dedupe, and Status Improvements

Findings can now be dismissed from the APA UI. Duplicate findings are filtered out before they're shown, new statuses and sub-statuses are introduced, and each finding now carries a completion reason. The summaries have been tuned to be more readable as well.

Agent and Platform Improvements

The agent now explains its work as it runs, supports user-provided additional instructions, and has a fallback path for cases where it couldn't find a target workload. A resource topology scan, agent sandboxes, a remediation execution flow, multi-tenant support, and a GitLab PR events integration round out the platform work this month. The vulnerability runbook added a dedicated planner agent and a component that produces findings rather than only opening pull requests.

Service Discovery in APA Global Settings

APA's Global Settings page now includes a Service Discovery section, where customers can configure how the discovery agent finds repositories and services across their environment. This rounds out the global settings introduced in February, which previously covered runbook defaults and models but not service discovery.

Database Optimization

DBO Metrics Filtering

DBO metrics can now be filtered in the console by database, endpoint, or user, making it easier to isolate the source of a performance issue.

Cost Management

Platform Usage Reporting

A new Platform Usage area in the console gives customers a detailed breakdown of their platform usage.

Reliability Metrics

The Workload List in Monitoring now shows reliability metrics next to each workload, in addition to the workload details page. OpsPilot has access to the same reliability metrics and can answer customer questions about them.

Fractional GPU Utilization in Cost Reporting

Cost reporting endpoints now surface utilization for MIG, time-sliced, and fractional GPUs end-to-end, so customers running shared GPU workloads can see how their accelerators are actually being used rather than seeing only whole-GPU allocations.

Billing: Anywhere Removed as a Separate Billable Feature

Cast AI Anywhere is no longer treated as a separate billable feature in the new billing system. Usage is attributed through Workload Autoscaler billing instead, avoiding the double-counting that previously affected organizations using both.

AWS Marketplace Anomaly Detection

A hardcoded AWS Marketplace usage threshold was replaced with per-organization anomaly detection, so unusually high or low usage is flagged against the customer's normal usage rather than a global cutoff.
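The detection method is not specified in these notes. As a hedged illustration of the difference between a global cutoff and a per-organization baseline, the sketch below flags usage that deviates strongly from an organization's own history using a simple z-score. The function name, the z-score approach, and the threshold of 3.0 are assumptions.

```python
import statistics
from typing import Sequence


def is_anomalous(
    history: Sequence[float], current: float, z_threshold: float = 3.0
) -> bool:
    """Flag `current` when it is far from this organization's usual usage."""
    if len(history) < 2:
        return False  # Not enough data to establish a baseline.
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # Perfectly flat history: any change at all is unusual.
        return current != mean
    # abs() catches both unusually high and unusually low usage.
    return abs(current - mean) / stdev > z_threshold
```

With a history around 100 units, a reading of 300 or of 5 would both be flagged, while the same absolute values might be entirely normal for a larger or smaller organization.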

OMNI Edge Provisioning

OMNI shipped a substantial set of platform improvements this month, hardening the edge networking layer and expanding edge node capability.

Edge Node Capability Expansion

Edge nodes now run on ARM in addition to x86, and GPU MIG and time slicing are supported for finer-grained GPU sharing across workloads. Private connectivity (for example, VPC peering) is now supported for both the API server and the WireGuard server, so edge traffic can stay off the public internet where required. Edge clusters get a local-path provisioner installed by default, and the per-edge-location component set now includes kvisor, gpu-metrics-exporter, and spot-handler by default.

Onboarding Hardened Across AWS, GCP, and OCI

Edge location onboarding scripts for all three providers were rebuilt this month. GCP and AWS no longer use static credentials — GCP moved to workload identity, AWS to OIDC — and OCI's onboarding script was rewritten with the same pattern and switched to OIDC authentication. AWS edge nodes can now be assigned a custom Instance Profile (a prerequisite for pulling images from private ECR registries), GCP edge nodes can use a provided GCP service account as their credential provider, and the GCP onboarding flow supports a configurable VPC CIDR. Cluster onboarding can also be expressed in Terraform.

Edge Node Reliability and Onboarding Guardrails

Pods scheduled to an unhealthy edge node no longer stay stuck in Pending — edge nodes are now reported as NotReady when the underlying remote node becomes unhealthy, so the Kubernetes scheduler can route work elsewhere. Edge location onboarding also blocks when an older agent version is running, preventing customers from connecting to an edge location that would silently misbehave in the cluster.

Organization Management

Organization Membership Badge

The organization list now shows a badge when you're not a member of an organization, making the distinction visible before you try to act on it.

Per-Cluster Notifications and Navigation Improvements

Notifications can now be configured per cluster from the console. The bell icon in the header navigates to the notifications view, where a Slack channel multi-select replaces the previous behavior, and several default values have been tightened to reduce setup friction for customers new to the platform.

User Interface Improvements

OpsPilot Header and Delete Confirmation

The OpsPilot chat header was redesigned, the delete-conversation flow now prompts for confirmation before deleting, and a badge appears when a reminder is set.

Event Log View Persistence

The event log now keeps the selected view when you toggle between options, instead of resetting.

Terraform and Agent Updates

We've released an updated version of our Terraform provider. As always, the latest changes are detailed in the changelog on GitHub. The updated provider and modules are now ready for use in your infrastructure-as-code projects in Terraform's registry.

We have released a new version of the Cast AI agent. The complete list of changes is here. To update the agent in your cluster, please follow these steps or use the Component Control dashboard in the Cast AI console.