New release

A recently released feature for which we are actively gathering community feedback.

The commitments feature set is CAST AI's generic approach to utilizing Reserved Instance (AWS, Azure) and Committed Use Discount (GCP) capacity in autoscaling.


Reservations are being replaced by commitments

Until now, CAST AI supported only the utilization of Azure Reserved Instances. The commitments feature set generalizes this capability so that all cloud providers can be supported in a similar manner.

CAST AI will utilize Azure Reservations (RIs) or GCP instances procured using resource-based committed use discounts (CUDs) to scale up a cluster and maximize the utilization of the customer's long-term commitments.

Supported providers

  • Azure: supported
  • GCP: supported
  • AWS: coming soon

How it works

Commitments are uploaded at the organization level, either using a script (GCP Committed Use Discounts) or the CSV file upload functionality (Azure Reservations). Each CUD or Reserved Instance in the user's cloud account corresponds to one commitment in the CAST AI platform. Once uploaded, commitments can be managed in the following ways:

  • Enabled - a commitment must be enabled to be utilized in autoscaling; a commitment that is assigned to a cluster but disabled is not actively utilized.
    • CAST AI can be restricted to using only a specified percentage of an uploaded commitment by setting the allowed utilization percentage in the commitment settings.
  • Assigned to one or multiple clusters - a commitment that is uploaded to CAST AI but is neither assigned to a cluster nor enabled is not used in autoscaling; however, its utilization is still tracked.
  • Deleted - a commitment can be deleted from the CAST AI inventory, after which it is no longer tracked or utilized. The commitment itself in the cloud account is not affected by this action.
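The management states above can be pictured as a small data model. This is a minimal sketch only: the names `Commitment`, `allowed_usage_percent`, `usable_by_autoscaler`, `usable_cpu`, and `delete_commitment` are hypothetical and are not part of any CAST AI API. It only illustrates which combination of settings makes a commitment eligible for autoscaling, and that deletion affects the CAST AI inventory alone.

```python
from dataclasses import dataclass, field


@dataclass
class Commitment:
    """Hypothetical model of one uploaded commitment (one CUD or Reserved Instance)."""
    commitment_id: str
    enabled: bool = False
    # Hypothetical field: IDs of the clusters this commitment is assigned to.
    assigned_clusters: set[str] = field(default_factory=set)
    # Hypothetical field: share of the commitment CAST AI may use, 0-100.
    allowed_usage_percent: float = 100.0


def usable_by_autoscaler(c: Commitment, cluster_id: str) -> bool:
    """Only commitments that are both enabled and assigned to the cluster drive upscaling."""
    return c.enabled and cluster_id in c.assigned_clusters


def usable_cpu(c: Commitment, total_cpu: float) -> float:
    """CPU available to autoscaling after applying the allowed utilization percentage."""
    return total_cpu * c.allowed_usage_percent / 100.0


def delete_commitment(inventory: dict[str, Commitment], commitment_id: str) -> None:
    """Drop the commitment from the CAST AI-side inventory, which stops tracking it.
    Nothing here touches the CUD or Reserved Instance in the cloud account."""
    inventory.pop(commitment_id, None)
```

In this sketch, a commitment that stays in the inventory is always tracked, even when it is disabled or unassigned; only `usable_by_autoscaler` decides whether it influences upscaling.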

When autoscaling and tracking commitment usage, CAST AI supports flex sizing: an instance of any size can be provisioned as long as it belongs to the same instance family that is covered by the commitment.
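Flex sizing can be illustrated with GCP machine type names. This is a minimal sketch, assuming that the family is the series prefix of a `<series>-<type>-<size>` machine type name (for example, `n2` in `n2-standard-8`); the helper names are hypothetical and real family matching in CAST AI may differ, especially for Azure VM sizes.

```python
def machine_series(machine_type: str) -> str:
    """Series prefix of a GCP machine type, e.g. 'n2' for 'n2-standard-8' (assumed naming)."""
    return machine_type.split("-", 1)[0].lower()


def covered_by_flex_sizing(machine_type: str, committed_family: str) -> bool:
    """Any size qualifies, as long as the family matches the commitment's family."""
    return machine_series(machine_type) == committed_family.lower()


def cpu_counted_against(instances: list[tuple[str, int]], committed_family: str) -> int:
    """Sum the vCPUs of (machine_type, vcpus) pairs that the commitment covers."""
    return sum(cpu for machine_type, cpu in instances
               if covered_by_flex_sizing(machine_type, committed_family))
```

For example, against an `n2` commitment, `n2-standard-8` and `n2-highmem-4` nodes both count (12 vCPUs in total), while an `e2-standard-2` node does not.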

Upscaling the cluster using commitments

The following section details how the CAST AI Autoscaler utilizes enabled and assigned commitments:

  • Commitments must belong to the same region as the cluster to be utilized by the Autoscaler.
  • Instance types covered by commitments are given the highest priority when scaling up the cluster.
  • The Autoscaler will try to utilize a commitment up to 100% of its assigned scope. Once a commitment is fully utilized, the Autoscaler continues with its usual flow.
  • Since a single commitment can be assigned to multiple clusters, several clusters may require an upscale at the same time, resulting in temporary over-usage of the commitment (this can be resolved by setting up scheduled rebalancing).
  • Commitment usage is tracked globally across the organization, so the Autoscaler respects global usage when making decisions. If a cluster is not assigned to a commitment but runs an instance type covered by it, the Autoscaler still counts that instance toward the commitment's usage.
  • Depending on the configuration, the Autoscaler could add any instance type to any cluster within the organization. A cluster that is not assigned to any commitment might therefore upscale with an instance type that counts towards the global commitment usage for that instance type.
  • User-provided price data is not currently considered in upscaling decisions. All instances covered by commitments are priced at $0, so the Autoscaler always prioritizes them when other scheduling conditions are met. As a result, other features (e.g., Rebalancer, Cost Monitoring) also show instances covered by commitments as costing $0.
  • Note that instances covered by commitments are treated as on-demand capacity. Therefore, if a workload is explicitly marked to run only on spot capacity, the Autoscaler respects this requirement and schedules the workload on a spot instance, even if commitment capacity is available.
  • The Autoscaler stops utilizing instances covered by commitments once the commitment's expiration date has passed or the maximum purchased CPU count for the commitment has been reached.
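The upscaling rules above can be combined into a simplified selection step. This is a minimal sketch under stated assumptions, not the Autoscaler's actual algorithm: `CommitmentState`, `Candidate`, `price_for`, and `pick_node` are hypothetical names, prices are made-up placeholders, and `used_cpu` is assumed to already hold the organization-wide usage of the commitment. A candidate is priced at $0 only when it is on-demand, a commitment of the same family exists in the cluster's region, the commitment has not expired, and the commitment still has CPU headroom.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CommitmentState:
    """Hypothetical snapshot of one enabled commitment assigned to the cluster."""
    family: str        # instance family covered by the commitment
    region: str        # region the commitment was purchased in
    total_cpu: int     # maximum purchased CPU count
    used_cpu: int      # usage summed over ALL clusters in the organization
    expires_on: date


@dataclass
class Candidate:
    """Hypothetical node candidate the Autoscaler is considering."""
    family: str
    cpu: int
    price: float       # made-up hourly price used only for this example
    spot: bool


def price_for(cand: Candidate, commitments: list[CommitmentState],
              cluster_region: str, today: date) -> float:
    """Return $0 when a live, same-region commitment still has room for this candidate."""
    if not cand.spot:  # commitment capacity counts as on-demand, never as spot
        for c in commitments:
            if (c.family == cand.family
                    and c.region == cluster_region
                    and today <= c.expires_on
                    and c.used_cpu + cand.cpu <= c.total_cpu):
                return 0.0
    return cand.price


def pick_node(candidates: list[Candidate], commitments: list[CommitmentState],
              cluster_region: str, today: date, spot_only: bool = False) -> Candidate:
    """Pick the cheapest candidate; spot-only workloads never consider on-demand nodes."""
    pool = [c for c in candidates if c.spot] if spot_only else candidates
    return min(pool, key=lambda c: price_for(c, commitments, cluster_region, today))
```

With a 32-CPU commitment that already has 24 CPUs in use, an 8-CPU on-demand candidate of the same family is priced at $0 and wins over a spot candidate, a 16-CPU candidate falls back to its regular price, and a spot-only workload still gets the spot node.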


Setting up a cluster to prioritize commitments and run everything else on spot instances

It is possible to set up a cluster to run workloads on instances covered by commitments and to fall back to spot instances when commitment capacity is not available. This behavior can be achieved either by applying spot tolerations to workloads in their configuration files or by setting up the spot mutating webhook to apply spot-only tolerations during scheduling.
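The effect of adding a spot toleration can be sketched as a transformation of a pod manifest, similar in spirit to what a mutating webhook does. This is a minimal sketch, assuming the toleration key `scheduling.cast.ai/spot` from CAST AI's spot instance documentation (verify it for your setup); `add_spot_toleration` is a hypothetical helper, not the webhook's real code. A toleration only permits scheduling on spot nodes and does not require it, so $0 commitment-covered on-demand nodes can still be preferred.

```python
import copy

# Assumed toleration; omitting "effect" makes it tolerate every effect of that taint key.
SPOT_TOLERATION = {"key": "scheduling.cast.ai/spot", "operator": "Exists"}


def add_spot_toleration(pod: dict) -> dict:
    """Return a copy of the pod manifest with the spot toleration added (idempotent)."""
    patched = copy.deepcopy(pod)
    spec = patched.setdefault("spec", {})
    tolerations = spec.setdefault("tolerations", [])
    if not any(t.get("key") == SPOT_TOLERATION["key"] for t in tolerations):
        tolerations.append(dict(SPOT_TOLERATION))
    return patched
```

The function does not add a node selector or node affinity for spot nodes; adding one would mark the workload as spot-only, and the Autoscaler would then skip commitment capacity for it.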

Prioritized utilization

The Prioritized Utilization feature enables the allocation of commitment-covered resources among different clusters based on each cluster's business priority. This feature allows high-priority clusters, such as production environments, to have first access to capacity, ensuring they primarily run on these discounted resources during peak times for maximum stability and cost efficiency. Lower-priority clusters, like non-production environments, can be set up to utilize spot instances when commitment-covered resources are not available. During off-peak hours when higher-priority clusters scale down, lower-priority clusters can access any freed-up capacity.

The feature works by letting users set a priority level for each cluster assigned to a commitment. As clusters scale up or down, the system dynamically adjusts resource allocation based on these priorities, so that commitments are used primarily by the highest-priority clusters. When upscaling, the top-priority cluster's Autoscaler ignores the commitment usage of lower-priority clusters and considers the commitment fully available to itself, counting only its own utilization. When the second-priority cluster upscales, the commitment calculation takes into account the utilization of the top-priority cluster but ignores the utilization of all lower-priority clusters, and so on. This behavior can lead to over-utilization of the assigned commitments, which should be addressed by running the Rebalancer on the lower-priority clusters.
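The priority-aware availability calculation described above can be sketched as follows. This is a minimal illustration, assuming that a lower number means a higher priority and that clusters sharing a priority level count each other's usage; `available_cpu_for` and `overutilization` are hypothetical helpers, and usage is measured in CPUs for simplicity.

```python
def available_cpu_for(cluster: str, priority: dict[str, int],
                      usage: dict[str, int], total_cpu: int) -> int:
    """CPU a cluster's Autoscaler believes is still free in the commitment.

    Only usage from clusters with the same or higher priority (lower number)
    is counted; usage from lower-priority clusters is ignored.
    """
    own = priority[cluster]
    counted = sum(u for name, u in usage.items() if priority[name] <= own)
    return max(0, total_cpu - counted)


def overutilization(usage: dict[str, int], total_cpu: int) -> int:
    """CPU used beyond the commitment across all clusters."""
    return max(0, sum(usage.values()) - total_cpu)
```

For a 32-CPU commitment shared by `prod` (priority 1, using 16), `staging` (priority 2, using 8), and `dev` (priority 3, using 8), `prod` sees 16 CPUs available, `staging` sees 8, and `dev` sees 0. If `prod` then adds 16 more CPUs, total usage reaches 48, an over-utilization of 16 CPUs that rebalancing the lower-priority clusters is meant to resolve.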

Reporting utilization of commitments

Currently, reporting only provides a snapshot of the present situation, without the ability to review historical data.

Key formulas

The following formula is used in commitment management:

  • Commitment utilization = (Provisioned CPU ÷ Total CPU in the commitment) × 0.5 + (Provisioned MEM ÷ Total MEM in the commitment) × 0.5
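The utilization formula weights CPU and memory equally. A minimal sketch of the calculation, assuming memory is expressed in the same unit (GiB here) for both provisioned and total values; the function name is hypothetical.

```python
def commitment_utilization(provisioned_cpu: float, total_cpu: float,
                           provisioned_mem_gib: float, total_mem_gib: float) -> float:
    """Equal-weight average of CPU and memory utilization of a commitment (0.0 to 1.0+)."""
    cpu_share = provisioned_cpu / total_cpu
    mem_share = provisioned_mem_gib / total_mem_gib
    return cpu_share * 0.5 + mem_share * 0.5
```

For example, a commitment of 32 CPUs and 128 GiB with 24 CPUs and 32 GiB provisioned gives 0.75 × 0.5 + 0.25 × 0.5 = 0.5, i.e. 50% utilization, even though three quarters of the CPU is in use.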

Known limitations

The current implementation of commitment management has the following limitations:

  • Commitment utilization is not recalculated continuously. It is recalculated only when a node is added or deleted, or when a cluster is onboarded or removed.
  • If you upload commitments but haven't connected any clusters from that cloud provider to CAST AI, the commitments won't appear in the user interface.