The commitments feature set is Cast AI's generic approach to utilizing Reserved Instance (AWS, Azure) and Committed Use Discount (GCP) capacity in autoscaling.

Cast AI will utilize Azure Reservations, AWS Reserved Instances (RIs), or GCP instances procured through resource-based committed use discounts (CUDs) to scale up a cluster and maximize utilization of the customer's long-term commitments.

Supported providers

Provider	Supported
GCP	+
Azure	+
AWS	+

How it works

Commitments are uploaded at the organizational level using a script (Committed Use Discounts for GCP, RIs for AWS, Azure Reservations). One CUD or Reserved Instance in the user's cloud account equals one commitment in the Cast AI platform. Once uploaded, commitments can be managed in the following ways:

Enabled: a commitment that is assigned to a cluster but is disabled will not be actively utilized in autoscaling.
- Cast AI can be restricted to use only a specified percentage of an uploaded commitment by setting the allowed percentage of utilization in the commitment settings.
Assigned to one or multiple clusters: if a commitment is uploaded to Cast AI but is not assigned to a cluster and is not enabled, it will not be used in autoscaling; however, its utilization is still tracked.
- Cast AI supports assigning multiple commitments to all existing and newly added clusters as long as they match by region.
Deleted: the commitment can be deleted from the Cast AI inventory and will no longer be tracked or utilized. This action does not affect the commitment record in the cloud account.

When autoscaling and tracking commitment usage, Cast AI supports flex sizing, meaning an instance of any size can be provisioned as long as it is within the same instance family covered by the commitment.
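
Flex sizing can be pictured as a family-level match rather than an exact instance-type match. The sketch below is illustrative only: the `instance_family` and `covered_by_commitment` helpers are hypothetical, and the parsing rule assumes AWS-style names such as `m5.xlarge`. It is not Cast AI's actual matching logic.

```python
def instance_family(instance_type: str) -> str:
    """Return the family part of an AWS-style instance type name.

    Assumption for this sketch: the family is everything before the
    first dot, e.g. "m5.xlarge" -> "m5".
    """
    return instance_type.split(".", 1)[0]


def covered_by_commitment(instance_type: str, commitment_family: str) -> bool:
    """An instance of any size qualifies if its family matches the commitment."""
    return instance_family(instance_type) == commitment_family


print(covered_by_commitment("m5.4xlarge", "m5"))  # True: same family, different size
print(covered_by_commitment("c5.large", "m5"))    # False: different family
```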

Upscaling the cluster using commitments

The following section details how the Cast AI Autoscaler utilizes enabled and assigned commitments:

Commitments must belong to the same region as the cluster to be utilized by the Autoscaler.
Instances listed in the commitments list are prioritized when scaling up the cluster.
The Autoscaler will try to utilize a commitment up to 100% of its assigned scope. Once a commitment is fully utilized, the Autoscaler continues with its usual flow.
Since a single commitment can be assigned to multiple clusters, several clusters may require upscale simultaneously, resulting in temporary overuse of a commitment (this can be resolved by setting up scheduled rebalancing).
Commitment usage is global in the organization; hence, the Autoscaler will respect global usage when making decisions. If a cluster is not assigned to a commitment but contains an instance type covered by it, the Autoscaler will count that instance toward commitment usage.
Depending on the configuration, the Autoscaler could add any instance type to any of the clusters within the organization. A cluster not assigned to any commitment might upscale with an instance type that will be counted towards global commitment usage related to that instance type.
User-provided price data is not currently considered in upscaling decisions. All instances covered by commitments are priced at $0. Therefore, the Autoscaler will always prioritize them when other scheduling conditions are met. As a result, other features (e.g., Rebalancer, Cost Monitoring) also show instances covered by commitments as costing $0.
Also, note that instances covered by commitments are treated as 'on-demand' capacity. Therefore, if a workload is specifically marked to run only on spot capacity, the Autoscaler will respect this requirement. Consequently, the workload will still be scheduled to run on a Spot Instance, even if commitment capacity is available.
Autoscaler will no longer utilize instances covered by commitments if the expiration date of the commitment has already passed or the maximum purchased CPU count per commitment has been reached.
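
The conditions above can be summarized as a single capacity check. This is a minimal sketch under stated assumptions, not the Autoscaler's implementation: the `Commitment` class, its field names, and the `usable_cpu` function are hypothetical and exist only to show how region matching, the enabled flag, the allowed percentage, expiration, and the purchased CPU count combine.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Commitment:
    # All field names are hypothetical, chosen for this illustration.
    region: str
    enabled: bool
    expires_on: date
    total_cpu: int              # CPUs purchased in the commitment
    allowed_usage: float = 1.0  # allowed share of the commitment, 0.0 to 1.0


def usable_cpu(c: Commitment, cluster_region: str, used_cpu: int, today: date) -> int:
    """CPUs that could still be placed on this commitment.

    used_cpu is the organization-wide usage, because commitment usage
    is tracked globally rather than per cluster.
    """
    if not c.enabled or c.region != cluster_region:
        return 0
    if today > c.expires_on:  # expiration date has already passed
        return 0
    cap = int(c.total_cpu * c.allowed_usage)
    return max(0, cap - used_cpu)


c = Commitment("us-east-1", True, date(2030, 1, 1), total_cpu=64, allowed_usage=0.5)
print(usable_cpu(c, "us-east-1", used_cpu=20, today=date(2025, 6, 1)))  # 12
print(usable_cpu(c, "eu-west-1", used_cpu=20, today=date(2025, 6, 1)))  # 0
```

When the function returns 0, the Autoscaler in this model would simply fall back to its usual flow.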

📘
Setting up a cluster to prioritize commitments, else running everything on spot instances
It is possible to set up a cluster to run workloads on instances covered by commitments, and if they are not available, to utilize spot instances. This behavior is supported by either applying spot tolerations to workloads in the configuration files or setting up a Pod Mutation.

Prioritized utilization

The Prioritized Utilization feature enables allocating commitment-covered resources among different clusters based on each cluster's business priority. This feature allows high-priority clusters, such as production environments, to have first access to capacity, ensuring they primarily run on these discounted resources during peak times for maximum stability and cost efficiency. Lower-priority clusters, like non-production environments, can be set up to utilize Spot Instances when commitment-covered resources are unavailable. During off-peak hours, when higher-priority clusters scale down, lower-priority clusters can access any freed-up capacity.

The feature allows users to set priority levels for each cluster assigned to the commitment. As clusters upscale or downscale, the system dynamically adjusts resource allocation based on these priorities, ensuring that the highest priority clusters maximize commitments. When upscaling, a top-priority cluster's Autoscaler ignores the commitment count assigned to lower-priority clusters and considers the commitment fully available (only counting its own utilization). During the upscale of the second priority cluster, the commitment calculation considers the utilization of the top priority cluster; however, it ignores the utilization of all lower priority clusters, and so on. This behavior could lead to over-utilization of the assigned commitments and should be addressed by running a Rebalancer on the lower-priority clusters.
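
The priority rule described above can be sketched in a few lines. The function names, the data layout, and the convention that a lower number means higher priority are assumptions made for this illustration; they do not reflect Cast AI's internal API.

```python
def visible_usage(cluster: str, priority: dict[str, int], used_cpu: dict[str, int]) -> int:
    """Commitment usage as seen by `cluster` when it upscales.

    Assumption: a lower number means a higher priority. Usage from
    lower-priority clusters is ignored; the cluster's own usage and
    that of higher-priority clusters is counted.
    """
    own = priority[cluster]
    return sum(cpu for name, cpu in used_cpu.items() if priority[name] <= own)


def available_cpu(cluster: str, priority: dict[str, int],
                  used_cpu: dict[str, int], total_cpu: int) -> int:
    """Commitment CPUs the cluster considers free for its next upscale."""
    return max(0, total_cpu - visible_usage(cluster, priority, used_cpu))


priority = {"prod": 1, "staging": 2, "dev": 3}
used_cpu = {"prod": 40, "staging": 30, "dev": 30}  # 100 CPUs already in use

print(available_cpu("prod", priority, used_cpu, total_cpu=100))     # 60
print(available_cpu("staging", priority, used_cpu, total_cpu=100))  # 30
print(available_cpu("dev", priority, used_cpu, total_cpu=100))      # 0
```

In this example the commitment is already fully used, yet `prod` still sees 60 CPUs as available. If it upscales into them, total usage rises to 160 CPUs against a 100-CPU commitment, which is the over-utilization that rebalancing the lower-priority clusters is meant to correct.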

Reporting utilization of commitments

Currently, reporting only provides a snapshot of the present situation without the ability to review historical data.

Key formulas

The following formula is applied in commitment management:

Commitment utilization = (Provisioned CPU ÷ Total CPU in the commitment) × 0.5 + (Provisioned MEM ÷ Total MEM in the commitment) × 0.5
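
As a worked example, the formula can be written as a small function. The function and parameter names are illustrative, and the numbers are made up for the example; the only requirement is that CPU values share one unit and memory values share one unit.

```python
def commitment_utilization(provisioned_cpu: float, total_cpu: float,
                           provisioned_mem: float, total_mem: float) -> float:
    """Equal-weight average of the CPU and memory utilization ratios."""
    return (provisioned_cpu / total_cpu) * 0.5 + (provisioned_mem / total_mem) * 0.5


# 48 of 64 CPUs (75%) and 128 of 256 GiB (50%) are provisioned.
print(commitment_utilization(48, 64, 128, 256))  # 0.625
```

Because CPU and memory are weighted equally, a commitment whose CPU is fully provisioned but whose memory is only half used reports 75% utilization.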

Known limitations

The current implementation of commitment management has the following limitations:

Commitment utilization calculations are not updated continuously. They are recalculated only when a node is added or deleted and when clusters are onboarded or removed.
If you upload commitments but haven't connected any clusters from that cloud provider to Cast AI, the commitments won't appear in the user interface.
