Commitments

📣

New release

A recently released feature for which we are actively gathering community feedback.

The commitments feature set is CAST AI's generic approach to utilizing Reserved Instance (AWS, Azure) and Committed Use Discount (GCP) capacity in autoscaling.

📘

Reservations are being replaced by commitments

Until now, CAST AI supported only Azure Reserved Instances. The commitments feature set generalizes this capability so that all cloud providers can be supported in a similar manner.

CAST AI will utilize Azure Reservations (RIs) or GCP instances procured using resource-based committed use discounts (CUDs) to scale up a cluster and maximize the utilization of the customer's long-term commitments.

Supported providers

Provider    Supported
GCP         Yes
Azure       Yes
AWS         Coming soon

How it works

Commitments are uploaded at the organization level using a script (Committed Use Discounts for GCP) or a CSV file upload (Azure Reservations). One CUD or Reserved Instance in the user's cloud account equals one commitment in the CAST AI platform. Once uploaded, a commitment can be managed in the following ways:

  • Enabled - a commitment that is assigned to a cluster but is disabled will not be used in autoscaling.
    • CAST AI can be restricted to use only a specified percentage of an uploaded commitment by setting the allowed percentage of utilization in the commitment settings.
  • Assigned to one or multiple clusters - a commitment that is uploaded to CAST AI but is not assigned to a cluster and is not enabled will not be used in autoscaling; however, its utilization is still tracked.
  • Deleted - the commitment can be deleted from CAST AI inventory and will no longer be tracked or utilized. This action does not affect the commitment record in the cloud account.

When autoscaling and tracking commitment usage, CAST AI supports flex sizing, meaning an instance of any size can be provisioned as long as it is within the same instance family that is covered by the commitment.

Upscaling the cluster using commitments

The following section details how the CAST AI Autoscaler utilizes enabled and assigned commitments:

  • Commitments must belong to the same region as the cluster to be utilized by the Autoscaler.
  • Instances listed in the commitments list are given the highest priority when scaling up the cluster.
  • The autoscaler will try to utilize a commitment up to 100% of the assigned scope. When a commitment is fully utilized, the autoscaler will continue in the usual flow.
  • Since a single commitment can be assigned to multiple clusters, several clusters may require upscale at the same time, resulting in temporary overuse of a commitment (this can be resolved by setting up scheduled rebalancing).
  • Commitment usage is tracked globally across the organization, so the Autoscaler respects global usage when making decisions. If a cluster is not assigned to a commitment but contains an instance type covered by it, the Autoscaler counts that instance toward commitment usage.
  • Depending on the configuration, the Autoscaler could add any instance type to any of the clusters within the organization. A cluster that is not assigned to any commitment might upscale with an instance type that will be counted towards global commitment usage related to that instance type.
  • User-provided price data is not currently considered in upscaling decisions. All instances covered by commitments are priced at $0, so the Autoscaler will always prioritize them when other scheduling conditions are met. As a result, other features (e.g., Rebalancer, Cost Monitoring) also show instances covered by commitments as costing $0.
  • Also, note that instances covered by commitments are treated as 'on-demand' capacity. Therefore, if a workload is specifically marked to run only on spot capacity, the Autoscaler will respect this requirement. Consequently, the workload will still be scheduled to run on a spot instance, even if commitment capacity is available.
  • Autoscaler will no longer utilize instances covered by commitments if the expiration date of the commitment has already passed or the maximum purchased CPU count per commitment has been reached.
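
The eligibility rules in the list above can be summarized as a single check. The following Python sketch is illustrative only and is not CAST AI's implementation: the `Commitment` class, its field names, and the `can_use_commitment` function are hypothetical. It also assumes that a scale-up request must fit entirely within the remaining allowed CPU capacity.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Commitment:
    """Hypothetical, simplified model of an uploaded commitment."""
    region: str
    total_cpu: int                      # CPUs purchased in the commitment
    used_cpu: int                       # CPUs already provisioned, organization-wide
    expires_on: date
    enabled: bool = True
    allowed_usage_pct: float = 100.0    # allowed percentage of utilization
    assigned_clusters: set = field(default_factory=set)


def can_use_commitment(c, cluster_id, cluster_region, requested_cpu, today):
    """Return True if this sketch would let the Autoscaler draw on the commitment."""
    # The commitment must be enabled and assigned to the cluster.
    if not c.enabled or cluster_id not in c.assigned_clusters:
        return False
    # The commitment must belong to the same region as the cluster.
    if c.region != cluster_region:
        return False
    # A commitment whose expiration date has passed is no longer used.
    if today > c.expires_on:
        return False
    # Assumption: the request must fit within the allowed share of purchased CPUs.
    cpu_limit = c.total_cpu * c.allowed_usage_pct / 100.0
    return c.used_cpu + requested_cpu <= cpu_limit


commitment = Commitment(
    region="us-east1",
    total_cpu=64,
    used_cpu=48,
    expires_on=date(2030, 1, 1),
    assigned_clusters={"prod"},
)
print(can_use_commitment(commitment, "prod", "us-east1", 8, date(2025, 1, 1)))   # True
print(can_use_commitment(commitment, "prod", "us-east1", 32, date(2025, 1, 1)))  # False
```

When the check fails, the Autoscaler continues in its usual flow, as described above.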

📘

Setting up a cluster to prioritize commitments, else running everything on spot instances

It is possible to set up a cluster to run workloads on instances covered by commitments and, if none are available, to fall back to spot instances. This behavior is supported either by applying spot tolerations to workloads in their configuration files or by setting up a spot mutating webhook that applies spot-only tolerations during scheduling.

Prioritized utilization

The Prioritized Utilization feature enables the allocation of commitment-covered resources among different clusters based on each cluster's business priority. This feature allows high-priority clusters, such as production environments, to have first access to capacity, ensuring they primarily run on these discounted resources during peak times for maximum stability and cost efficiency. Lower-priority clusters, like non-production environments, can be set up to utilize spot instances when commitment-covered resources are not available. During off-peak hours, when higher-priority clusters scale down, lower-priority clusters can access any freed-up capacity.

The feature allows users to set priority levels for each cluster assigned to the commitment. As clusters upscale or downscale, the system dynamically adjusts resource allocation based on these priorities, ensuring that commitments are maximized by the highest priority clusters. When upscaling, a top-priority cluster's Autoscaler ignores the commitment count assigned to lower-priority clusters and considers the commitment as fully available to itself (only counting its own utilization). During the upscale of the second priority cluster, the commitment calculation considers the utilization of the top priority cluster; however, it ignores the utilization of all lower priority clusters, and so on. This behavior could lead to over-utilization of the assigned commitments and should be addressed by running a Rebalancer on the lower-priority clusters.
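
The priority rule described above can be sketched in a few lines of Python. This is a simplified illustration, not CAST AI's implementation: the function names, the use of CPU counts as the only resource, and the convention that rank 1 is the highest priority are assumptions. The sketch also assumes that clusters sharing the same rank count each other's usage.

```python
def visible_usage(cluster, priorities, usage):
    """CPU usage that a cluster's Autoscaler counts against the commitment.

    priorities: cluster name -> priority rank (1 = highest priority)
    usage:      cluster name -> CPUs currently covered by the commitment
    """
    rank = priorities[cluster]
    # Count the cluster's own usage and that of higher-priority clusters;
    # usage by lower-priority clusters is ignored.
    return sum(cpu for name, cpu in usage.items() if priorities[name] <= rank)


def remaining_capacity(cluster, priorities, usage, total_cpu):
    """Commitment CPUs the cluster still considers available to itself."""
    return max(0, total_cpu - visible_usage(cluster, priorities, usage))


priorities = {"prod": 1, "staging": 2, "dev": 3}
usage = {"prod": 40, "staging": 30, "dev": 50}

for name in priorities:
    print(name, remaining_capacity(name, priorities, usage, total_cpu=100))
# prod 60
# staging 30
# dev 0
```

In this example the three clusters together use 120 CPUs against a 100-CPU commitment, yet `prod` still sees 60 CPUs as available. This is the over-utilization case that running a Rebalancer on the lower-priority clusters is meant to resolve.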

Reporting utilization of commitments

Currently, reporting only provides a snapshot of the present situation without the ability to review historical data.

Key formulas

The following formulas are applied when it comes to commitment management:

  • Commitment utilization = (Provisioned CPU ÷ Total CPU in the commitment) × 0.5 + (Provisioned MEM ÷ Total MEM in the commitment) × 0.5
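
As a worked example, the utilization formula gives equal weight to the CPU and memory ratios. The function and parameter names in the sketch below are illustrative, and the memory unit (GiB here) is an assumption; any unit works as long as both memory values use the same one.

```python
def commitment_utilization(provisioned_cpu, total_cpu, provisioned_mem, total_mem):
    """Equal-weight average of the CPU and memory utilization ratios."""
    cpu_ratio = provisioned_cpu / total_cpu
    mem_ratio = provisioned_mem / total_mem
    return cpu_ratio * 0.5 + mem_ratio * 0.5


# 48 of 64 CPUs and 128 of 256 GiB provisioned:
# (48 / 64) * 0.5 + (128 / 256) * 0.5 = 0.375 + 0.25
print(commitment_utilization(48, 64, 128, 256))  # 0.625
```

A commitment whose CPU is fully provisioned but whose memory is unused would therefore report 50% utilization.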

Known limitations

The current implementation of commitment management has the following limitations:

  • Commitment utilization is not recalculated continuously. It is updated when a node is added or deleted and when a cluster is onboarded or removed.
  • If you upload commitments but haven't connected any clusters from that cloud provider to CAST AI, the commitments won't appear in the user interface.