Node configuration
What is Node configuration?
The CAST AI provisioner allows you to set node configuration parameters that the platform will apply to provisioned nodes. Node configuration on its own does not influence workload placement. Its sole purpose is to apply user-provided configuration settings on the node during the provisioning process.
A cluster can have multiple node configurations linked to various node templates. However, you can select only one node configuration as the default, which the CAST AI Autoscaler will use.
Note
You can link node configuration to multiple node templates, but one node template can have just a single node configuration link.
You can manage node configurations in the UI (Autoscaler → Node configuration), via the API, or with Terraform.
Shared configuration options
The following table provides a list of supported cloud-agnostic configuration parameters:
Configuration | Description | Default value |
---|---|---|
Root volume ratio | CPU to storage (GiB) ratio | 1 CPU: 0 GiB |
Initial disk size | The base size of the disk attached to the node | 100 GiB |
Image | Image to be used when building a CAST AI provisioned node. See virtual machine image choice below for cloud-specific behaviors. | The latest available for Kubernetes release, based on an OS chosen by CAST AI |
SSH key | Base64-encoded public key or AWS key ID | "" |
Subnets | Subnet IDs for CAST AI provisioned nodes | All subnets pointing to NAT/Internet Gateways inside the cluster's VPC |
Instance tags | Tags/VM labels to be applied on CAST AI provisioned nodes | "" |
Kubelet configuration | A set of values that will be added or overwritten in the Kubelet configuration | {} |
Init script | A script to be run when building the node | "" |
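As an illustration of the init script setting, the `castai_node_configuration` Terraform resource (shown in full at the end of this page) accepts the script base64-encoded. A minimal sketch, assuming the `init_script` attribute name from the provider schema; the resource name and script contents are placeholders:

```hcl
resource "castai_node_configuration" "with_init_script" {
  cluster_id = castai_eks_cluster.test.id
  name       = "with-init-script"

  # The script runs during node bootstrap; base64encode() handles the encoding.
  init_script = base64encode(<<-EOT
    #!/bin/bash
    echo "custom bootstrap step" >> /var/log/castai-init.log
  EOT
  )
}
```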
EKS-specific subnet rules
Note
In EKS, only subnets that match one of the rules below can be added to a node configuration:
- The subnet is associated with a route table that has a 0.0.0.0/0 route to an Internet Gateway (a public subnet). The subnet must also have `MapPublicIpOnLaunch: true` set.
- The subnet is associated with a route table that has a 0.0.0.0/0 route to a Transit Gateway (a private subnet).
- The subnet is associated with a route table that has a 0.0.0.0/0 route to a NAT Gateway (a private subnet).
Some configuration options are cloud provider-specific. See the table below:
EKS-specific configuration options
Configuration | Description | Default value |
---|---|---|
Security groups | Security group IDs for nodes provisioned in CAST AI | Tagged and CAST AI SG |
Instance profile ARN | Instance profile ARN for CAST AI provisioned nodes | cast-<cluster-name>-eks-<cluster-id> (only the last 8 characters of the cluster ID) |
Dns-cluster-ip | Override the IP address to be used for DNS queries within the cluster | "" |
Container runtime | Container runtime engine selection: docker or containerd | Unspecified |
Docker configuration | A set of values that will be overwritten in the Docker daemon configuration | {} |
Volume type | EBS volume type to be used for provisioned nodes | gp3 |
Volume IOPS | EBS volume IOPS value to be used for provisioned nodes | 3000 |
KMS Key ARN | Customer-managed KMS encryption key to be used when encrypting EBS volumes | Unspecified |
Volume throughput | EBS volume throughput in MiB/s to be used for provisioned nodes | 125 |
Use IMDS v1 | When enabled, both IMDSv1 and IMDSv2 are allowed on the node; when disabled, only IMDSv2 is allowed | True |
Target Groups | A list of target group ARNs with an optional port. New instances are automatically registered with all given load balancer target groups upon creation. | Unspecified |
Image Family | Which OS family will be used when provisioning nodes | Amazon Linux 2 (FAMILY_AL2) |
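The EBS volume options above map onto the `eks` block of the Terraform provider's `castai_node_configuration` resource. A hedged sketch; the attribute names follow the provider schema as I understand it and should be checked against the provider documentation:

```hcl
resource "castai_node_configuration" "ebs_tuned" {
  cluster_id = castai_eks_cluster.test.id
  name       = "ebs-tuned"

  eks {
    # gp3 defaults shown in the table above
    volume_type       = "gp3"
    volume_iops       = 3000
    volume_throughput = 125

    # Optional customer-managed key for EBS encryption
    volume_kms_key_arn = aws_kms_key.ebs.arn
  }
}
```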
Kubelet configuration

Note that `kubeReserved` is not supported in EKS configurations.
KMS key for EBS volume
The key that you provide for the encryption of EBS volumes must include the following policy statement:

```json
{
  "Sid": "Allow access through EBS for all principals in the account that are authorized to use EBS",
  "Effect": "Allow",
  "Principal": {
    "AWS": "*"
  },
  "Action": [
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:Encrypt",
    "kms:DescribeKey",
    "kms:Decrypt",
    "kms:CreateGrant"
  ],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "kms:CallerAccount": "<account_id>",
      "kms:ViaService": "ec2.<region>.amazonaws.com"
    }
  }
}
```
An equivalent key can be created with the terraform-aws-modules/kms module:

```hcl
module "kms" {
  source      = "terraform-aws-modules/kms/aws"
  description = "EBS key"
  key_usage   = "ENCRYPT_DECRYPT"

  # Policy
  key_statements = [
    {
      sid = "Allow access through EBS for all principals in the account that are authorized to use EBS"
      principals = [
        {
          type        = "AWS"
          identifiers = ["*"]
        }
      ]
      actions = [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:CreateGrant",
        "kms:DescribeKey"
      ]
      resources = ["*"]
      conditions = [
        {
          test     = "StringEquals"
          variable = "kms:ViaService"
          values   = ["ec2.${var.cluster_region}.amazonaws.com"]
        },
        {
          test     = "StringEquals"
          variable = "kms:CallerAccount"
          values   = [data.aws_caller_identity.current.account_id]
        }
      ]
    }
  ]

  # Aliases
  aliases = ["mycompany/ebs"]

  tags = {
    Terraform   = "true"
    Environment = "dev"
  }
}
```
Load balancer target groups prerequisites
To use the target groups functionality, the CAST AI IAM role must be extended with additional permissions. An example IAM policy is provided below. The sample policy allows registering against all target groups, but it can be restricted to specific resources by replacing the wildcards (`*`) with appropriate values.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "castai-targetgroup-registration",
      "Effect": "Allow",
      "Action": "elasticloadbalancing:RegisterTargets",
      "Resource": "arn:aws:elasticloadbalancing:<region>:<account>:targetgroup/*/*"
    }
  ]
}
```
If target groups are configured for a node configuration but the permission is missing, nodes will be created but not registered with the target groups. A notification will be sent detailing which target groups failed to be updated.
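In Terraform, target groups are declared inside the `eks` block of `castai_node_configuration`. A sketch, assuming a `target_group` block with `arn` and an optional `port` attribute; verify the exact shape against the provider documentation:

```hcl
resource "castai_node_configuration" "with_target_groups" {
  cluster_id = castai_eks_cluster.test.id
  name       = "with-target-groups"

  eks {
    instance_profile_arn = aws_iam_instance_profile.test.arn

    # New nodes are registered with this target group on creation.
    target_group {
      arn  = aws_lb_target_group.test.arn
      port = 80 # optional; omit to use the target group's default port
    }
  }
}
```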
GKE-specific configuration options
Configuration | Description | Default value |
---|---|---|
Network tags | A string to be added to a tags field in a GCP VM resource | Empty |
Max pods per node | Maximum number of pods to be hosted on a node | 110 |
Boot disk | Boot disk storage type | balanced as per GCP documentation |
Use Local SSD-backed ephemeral storage | Attach local ephemeral storage backed by Local SSD volumes. Check GCP documentation for more details. | False |
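The GKE options above can be expressed in the Terraform provider's `gke` block. A hedged sketch; attribute names are assumptions based on the provider schema and should be confirmed against its documentation:

```hcl
resource "castai_node_configuration" "gke_example" {
  cluster_id = castai_gke_cluster.test.id
  name       = "gke-example"

  gke {
    max_pods_per_node = 110
    network_tags      = ["castai-nodes"] # added to the VM's tags field
    disk_type         = "pd-balanced"    # boot disk storage type
  }
}
```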
AKS-specific configuration options
Configuration | Description | Default value |
---|---|---|
Max pods per node | Maximum number of pods to be hosted on a node | 30 |
OS Disk | The type of managed OS disk | Standard SSD |
Note
Kubelet configuration is not supported in AKS.
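For AKS, the corresponding Terraform settings live in the `aks` block. A minimal, hedged sketch; the `max_pods_per_node` attribute name is an assumption to be checked against the provider documentation:

```hcl
resource "castai_node_configuration" "aks_example" {
  cluster_id = castai_aks_cluster.test.id
  name       = "aks-example"

  aks {
    max_pods_per_node = 30 # matches the default in the table above
  }
}
```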
Virtual machine image choice
When CAST AI provisions a node, it must choose an appropriate VM image. This choice is crucial because the OS and version of the image determine the correct bootstrapping logic and instance type support and are critical to ensuring the node joins the cluster successfully. For advanced use cases, CAST AI offers several options in the node configuration.
EKS
EKS supports a combination of the Image and Image Family fields to control OS choice.
- Image family: Determines the provisioning logic based on OS. If not provided, a default family is used for all operations (currently Amazon Linux 2).
- Image: Used to determine the actual image choice more precisely. The system supports three scenarios for this field:
  - AMI ID (e.g., `ami-1234567890abcdef0`): A single item. Must point to a specific AMI. If the AMI architecture does not match the instance type, provisioning will fail. Use architecture restrictions in the node template to avoid this scenario. The AMI must match the image family (default or provided value), or provisioning will fail.
  - Search string (e.g., `amazon-eks-node-*`): The search matches the `name` filter in `aws describe-images` and can include wildcards. The search can return multiple images, and the system will choose the latest image in the list based on instance type architecture and Kubernetes version (if part of the image's name). If no images match the instance type architecture, or the images are from a different family than the Image family field, provisioning will fail.
  - Empty: A default search is performed based on the Image family. This search looks for public Amazon-owned images and considers instance type architecture and Kubernetes version to choose the proper image.
Sample scenarios and suggested configuration:
Scenario | Suggested setup |
---|---|
Hands-off approach, let CAST AI choose. | Empty Image and Image family. |
I want to use a specific OS family and let CAST AI choose the latest image based on the instance architecture and Kubernetes version. | Select Image family, empty Image field. |
I want to use private or third-party AMI images and let CAST AI choose the image based on instance architecture. | Add a search string in Image that matches the required images. Select the proper image family (if different from the default). For multi-architecture instances, the list must include images for both arm64 and x86. |
I want to use private or third-party AMI images that do not have architecture-agnostic builds but let CAST AI choose the latest release. | Add a search string in Image. Select the proper image family (if different from the default). Add architecture constraints to node templates. |
I want to use a specific golden AMI. | Enter the AMI in the Image field. Select the Image family (if different from the default) that matches the OS. Add architecture constraints to node templates. |
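The search-string behavior is analogous to the name filter of Terraform's `aws_ami` data source. This hypothetical query illustrates the same matching semantics as a wildcard value in the Image field (the name pattern and architecture are placeholders):

```hcl
data "aws_ami" "eks_latest" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amazon-eks-node-1.29-*"] # wildcard search, as in the Image field
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }
}
```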
GKE/AKS
For GKE and AKS, the image field can be used to control the node bootstrapping logic (for Linux).
- The reference must point to a specific image.
- If the image does not match the instance type architecture (for example, ARM64 image for x86 node), node provisioning will fail.
- Changing the value might require a successful reconciliation to recreate CAST AI-owned node pools.
- If an image is not provided, the default behavior is to use the OS image captured when creating the `castpool` node pool.
How to create a node configuration
A default node configuration is created during cluster onboarding in the CAST AI-managed mode.
You can modify this configuration or create a new one. If you want a new node configuration to be applied to all newly provisioned nodes, you must mark it as the default.
Node configurations are versioned, and when the CAST AI provisioner adds a new node, the latest version of the node configuration is applied.
A new configuration can't be applied to an existing node. To upgrade the node configuration on a node or a set of nodes, delete the existing nodes and wait until the Autoscaler replaces them with new ones, or rebalance the cluster (fully or partially).
Kubelet configuration examples
You can find all available Kubelet settings in the Kubernetes documentation under Kubelet Configuration. Please refer to the version matching your cluster.
For example, if you want to add some specific custom taints during node startup, you could do it with the following snippet:
```json
{
  "registerWithTaints": [
    {
      "effect": "NoSchedule",
      "key": "nodes-service-critical",
      "value": "true"
    }
  ]
}
```
The second example configures kubelet image pulling and kube API limits:

```json
{
  "eventBurst": 20,
  "eventRecordQPS": 10,
  "kubeAPIBurst": 20,
  "kubeAPIQPS": 10,
  "registryBurst": 20,
  "registryPullQPS": 10
}
```
Create node configuration with the CAST AI Terraform provider
Use the resource `castai_node_configuration` from the CAST AI Terraform provider.
Reference example:
```hcl
resource "castai_node_configuration" "test" {
  name           = local.name
  cluster_id     = castai_eks_cluster.test.id
  disk_cpu_ratio = 5
  subnets        = aws_subnet.test[*].id

  tags = {
    env = "development"
  }

  eks {
    instance_profile_arn = aws_iam_instance_profile.test.arn
    dns_cluster_ip       = "10.100.0.10"
    security_groups      = [aws_security_group.test.id]
  }
}
```