Configuring Kvisor features

⚠️

Warning

The Cast AI Kubernetes Security feature set is undergoing significant changes. Some features shown in this documentation are being deprecated and others are moving to the cluster view in the console. Screenshots and navigation paths may not reflect the current product. Updated documentation is in progress.

📘

Which upgrade method to use

The Helm commands on this page use the umbrella chart (castai-helm/castai) by default. If you need to use a different method:

  • castctl: To upgrade all Cast AI components at once without managing Helm flags:
    castctl castware upgrade
    This preserves your existing configuration. See the castctl documentation for installation and authentication instructions.
  • Individual charts: If you installed each component as a separate Helm release (e.g., for ArgoCD or custom GitOps), replace the release name and chart reference with the component-specific ones (for Kvisor, typically castai-kvisor and castai-helm/castai-kvisor) and remove the autoscaler.castai-kvisor. value prefix.

Not sure which method you used? Run helm list -n castai-agent. A single release named castai means umbrella chart; separate releases like castai-workload-autoscaler mean individual charts.
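
The check above can also be scripted. The following is a minimal sketch, assuming release names arrive one per line on standard input (for example from helm list -n castai-agent --short). The function name detect_install_method is illustrative and not part of any Cast AI tooling.

```shell
# Classify the install method from release names read on stdin, one per line.
# Prints "umbrella" if a release named exactly "castai" exists,
# "individual" if only castai-* releases exist, and "unknown" otherwise.
detect_install_method() {
  awk '
    $1 == "castai"  { umbrella = 1 }
    $1 ~ /^castai-/ { individual = 1 }
    END {
      if (umbrella)        print "umbrella"
      else if (individual) print "individual"
      else                 print "unknown"
    }
  '
}

# Example with sample input. Against a live cluster you would pipe in:
#   helm list -n castai-agent --short | detect_install_method
printf 'castai\n' | detect_install_method
```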


This guide explains how to configure various features of the Kvisor security agent to enhance your Kubernetes security posture. Kvisor provides several specialized monitoring and scanning capabilities that can be enabled and customized according to your needs.

Configuration overview

Kvisor supports multiple configuration options that can be set via Helm during installation or upgrade. The basic format for enabling or modifying features is:

📘

Configuration-only changes

This command upgrades all components in the umbrella chart, not just Kvisor. To change configuration without upgrading component versions, pin the chart to your current version by adding --version <your-current-chart-version> to the command. Run helm list -n castai-agent to check your current version.

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.[configuration-option]=[value]

All supported configuration values are found in the Kvisor Helm chart values.yaml.
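
To pin the chart as the note above suggests, you can read the version from the CHART column of helm list, which has the form <chart-name>-<version>. The sketch below assumes the umbrella chart is named castai. The helper name chart_version and the sample value castai-0.1.0 are made up for illustration, and the script prints the command rather than running it.

```shell
# Strip the chart-name prefix from a CHART column value such as "castai-0.1.0".
chart_version() {
  printf '%s\n' "${1#castai-}"
}

# Sample value for illustration only. On a live cluster, copy the CHART column
# from: helm list -n castai-agent
current_chart="castai-0.1.0"
version="$(chart_version "$current_chart")"

# Print the pinned upgrade command so it can be reviewed before use.
cat <<EOF
helm upgrade castai castai-helm/castai -n castai-agent \\
  --version ${version} \\
  --reset-then-reuse-values \\
  --set autoscaler.castai-kvisor.[configuration-option]=[value]
EOF
```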

Scanning frequencies and intervals

Cast AI performs different types of security scans at various intervals. You can customize these frequencies to meet your specific requirements:

Image vulnerability scanning

Default behavior:

  • Detection interval: Checks for new images every 30 seconds
  • Concurrent scans: 1 image scanned at a time
  • Trigger: Scans start automatically when new running images are detected

Configure image scan detection interval

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.controller.extraArgs.image-scan-interval=60s

Configure concurrent image scans

To increase scanning performance for environments with many images:

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.controller.extraArgs.image-concurrent-scans=6

Compliance checking

Default behavior:

  • Scan frequency: Every 60 seconds
  • Monitoring: Continuous configuration monitoring

Configure compliance scan interval

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.controller.extraArgs.kube-linter-scan-interval=120s

Summary of scan frequencies

Security Feature             | Default Frequency | Configuration Parameter
Image vulnerability scanning | Every 30 seconds  | image-scan-interval
Compliance checks            | Every 60 seconds  | kube-linter-scan-interval
Attack paths                 | Every 3 hours     | Not configurable
Runtime anomalies            | Real-time         | Event-driven

Image scanning configuration

Use a custom image

You can configure Kvisor to use a custom image from your private registry:

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.image.repository=my-kvisor-repository \
  --set autoscaler.castai-kvisor.image.tag=my-tag \
  --set 'autoscaler.castai-kvisor.imagePullSecrets[0].name=my-pull-secret'

Private image scanning

For detailed instructions on configuring private image scanning, refer to the dedicated Private Image Scanning documentation.

Exclude namespaces from scanning

You may want to exclude specific namespaces from image scanning, particularly for third-party monitoring tools or systems that manage their own security. To configure namespace exclusions:

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set 'autoscaler.castai-kvisor.controller.extraArgs.image-scan-ignored-namespaces=namespace1\,namespace2'

For example, to exclude the Dynatrace namespace from scanning:

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.controller.extraArgs.image-scan-ignored-namespaces=dynatrace

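You can specify multiple namespaces by separating them with commas. Because Helm treats an unescaped comma in a --set value as a separator between assignments, escape each comma with a backslash and wrap the argument in single quotes so the shell passes the backslash through.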
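
Helm splits --set input on unescaped commas, so each comma in the namespace list has to be written as \, to reach Kvisor as a single value. A small helper can build the escaped list. This is a sketch only: join_namespaces is a hypothetical function name, monitoring is an example namespace, and the script prints the helm command rather than running it.

```shell
# Join namespace names with "\," so Helm keeps the list as a single value.
join_namespaces() {
  out=""
  for ns in "$@"; do
    if [ -z "$out" ]; then
      out="$ns"
    else
      out="${out}\\,${ns}"
    fi
  done
  printf '%s\n' "$out"
}

ignored="$(join_namespaces dynatrace monitoring)"

# Print the resulting command so it can be reviewed before use.
printf '%s\n' "helm upgrade castai castai-helm/castai -n castai-agent --reset-then-reuse-values --set 'autoscaler.castai-kvisor.controller.extraArgs.image-scan-ignored-namespaces=${ignored}'"
```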

Network traffic monitoring

Kvisor can collect Kubernetes network flows using eBPF. This feature provides visibility into pod-to-pod and pod-to-external communications, which is valuable for security analysis and network optimization.

To enable network traffic monitoring:

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.castai.apiKey=<your-api-token> \
  --set autoscaler.castai-kvisor.castai.clusterID=<your-cluster-id> \
  --set autoscaler.castai-kvisor.agent.enabled=true \
  --set autoscaler.castai-kvisor.agent.extraArgs.netflow-enabled=true

📘

Note

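If you have the egressd component running, uninstall it before enabling network traffic monitoring in Kvisor. helm uninstall takes a release name rather than a chart reference, so run helm list -n castai-agent to find the name of the egressd release:

helm uninstall <egressd-release-name> -n castai-agent

📘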

Enrich network flows with cloud context

To enrich netflow data with VPC, subnet, and region information from your cloud provider, see Cloud network context.

Resource statistics monitoring

Kvisor can collect resource usage statistics from containers and nodes. These capabilities are controlled by two independent agent flags, storage-stats-enabled and stats-enabled, each enabling a distinct category of metrics.

Kubernetes storage metrics

Kvisor collects storage utilization data for both ephemeral storage and persistent volumes across your cluster nodes. This data surfaces per-node storage usage in the Node list of the Cast AI console.

Storage metrics collection requires both the Kvisor controller and agent to be enabled.

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.castai.apiKey=<your-api-token> \
  --set autoscaler.castai-kvisor.castai.clusterID=<your-cluster-id> \
  --set autoscaler.castai-kvisor.controller.enabled=true \
  --set autoscaler.castai-kvisor.agent.enabled=true \
  --set autoscaler.castai-kvisor.agent.extraArgs.storage-stats-enabled=true

In addition to on-node storage utilization, Kvisor can collect metrics for the cloud volumes in use. Collection runs in the controller component and requires cloud access permissions to fetch volume information. Currently, only AWS and GCP are supported.

AWS

If IMDS (Instance Metadata Service) access is enabled for the Kvisor controller pod, no additional configuration is required. Kvisor will automatically use the node's instance role to authenticate with AWS.

If IMDS access is not available (e.g., it has been restricted at the instance or pod level), you must set up IRSA (IAM Roles for Service Accounts) and annotate the Kvisor controller service account with the appropriate IAM role ARN. The IAM role must have the following permission:

  • ec2:DescribeVolumes
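
As an illustration, a policy document that grants only this permission could look like the one written by the sketch below. The file name is arbitrary and the role ARN is a placeholder. eks.amazonaws.com/role-arn is the standard IRSA annotation key, but how you attach it to the Kvisor controller service account depends on your setup and is not shown here.

```shell
# Write a minimal IAM policy document granting only ec2:DescribeVolumes.
# The file name is an arbitrary choice for this example.
cat > kvisor-describe-volumes-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ec2:DescribeVolumes"],
      "Resource": "*"
    }
  ]
}
EOF

# Print the annotation that IRSA expects on the service account.
# The account ID and role name are placeholders to fill in.
printf '%s\n' 'eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/<role-name>'
```

Attach the policy to the IAM role that the service account assumes, then continue with the Helm command for enabling collection.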

Once authentication is configured, enable cloud volume metrics collection:

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.castai.apiKey=<your-api-token> \
  --set autoscaler.castai-kvisor.castai.clusterID=<your-cluster-id> \
  --set autoscaler.castai-kvisor.controller.enabled=true \
  --set autoscaler.castai-kvisor.agent.enabled=true \
  --set autoscaler.castai-kvisor.agent.extraArgs.storage-stats-enabled=true \
  --set autoscaler.castai-kvisor.controller.extraArgs.cloud-provider=aws \
  --set autoscaler.castai-kvisor.controller.extraArgs.cloud-provider-aws-region=<your-aws-region> \
  --set autoscaler.castai-kvisor.controller.extraArgs.cloud-provider-storage-sync-enabled=true

Replace <your-aws-region> with the AWS region your cluster is running in.

GCP

Cloud volume metrics collection on GCP requires Workload Identity to be configured so that the Kvisor controller service account can authenticate with the GCP API. The bound GCP service account must have the following permissions in your project:

  • compute.disks.list
  • compute.instances.list
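
One way to grant exactly these two permissions is a custom IAM role. The sketch below only prints a gcloud command for review. The role ID kvisorVolumeReader and its title are hypothetical, and binding the role to your GCP service account is a separate step not covered here.

```shell
# Permissions from the list above, joined for gcloud's --permissions flag.
permissions="compute.disks.list,compute.instances.list"
role_id="kvisorVolumeReader"         # hypothetical role ID
project_id="<your-gcp-project-id>"   # placeholder to fill in

# Build the command as a string so it can be reviewed before use.
create_role_cmd="gcloud iam roles create ${role_id} --project=${project_id} --title='Kvisor volume reader' --permissions=${permissions}"
printf '%s\n' "$create_role_cmd"
```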

Once Workload Identity is configured, enable cloud volume metrics collection:

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.castai.apiKey=<your-api-token> \
  --set autoscaler.castai-kvisor.castai.clusterID=<your-cluster-id> \
  --set autoscaler.castai-kvisor.controller.enabled=true \
  --set autoscaler.castai-kvisor.agent.enabled=true \
  --set autoscaler.castai-kvisor.agent.extraArgs.storage-stats-enabled=true \
  --set autoscaler.castai-kvisor.controller.extraArgs.cloud-provider=gcp \
  --set autoscaler.castai-kvisor.controller.extraArgs.cloud-provider-gcp-project-id=<your-gcp-project-id> \
  --set autoscaler.castai-kvisor.controller.extraArgs.cloud-provider-storage-sync-enabled=true

Replace <your-gcp-project-id> with your GCP project ID.

Pressure Stall Information (PSI) metrics

Kvisor can collect Pressure Stall Information (PSI) metrics from containers and nodes, along with CPU, memory, and I/O usage statistics. PSI metrics indicate how often workloads experience delays due to contention for CPU, memory, or I/O resources, helping you identify performance bottlenecks before they affect application behavior.

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.castai.apiKey=<your-api-token> \
  --set autoscaler.castai-kvisor.castai.clusterID=<your-cluster-id> \
  --set autoscaler.castai-kvisor.agent.enabled=true \
  --set autoscaler.castai-kvisor.agent.extraArgs.stats-enabled=true

GPU metrics

Kvisor collects GPU metrics from NVIDIA GPUs in your cluster using DCGM Exporter. The GPU export pipeline and dcgm-exporter run only on nodes with GPUs attached, regardless of cloud provider; Kvisor itself continues to run on all cluster nodes.

To enable GPU metrics collection:

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.castai.apiKey=<your-api-token> \
  --set autoscaler.castai-kvisor.castai.clusterID=<your-cluster-id> \
  --set autoscaler.castai-kvisor.agent.gpu.enabled=true

Migrating from gpu-metrics-exporter

If you are currently using the standalone gpu-metrics-exporter component, follow these steps to migrate:

  1. Upgrade Kvisor and enable GPU metrics:
helm repo update castai-helm
helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.castai.apiKey=<your-api-token> \
  --set autoscaler.castai-kvisor.castai.clusterID=<your-cluster-id> \
  --set autoscaler.castai-kvisor.agent.gpu.enabled=true
  2. Uninstall the old component:
helm uninstall gpu-metrics-exporter -n castai-agent

⚠️

Important

Kvisor detects if gpu-metrics-exporter is present in the cluster and will not report GPU metrics while it remains installed. Uninstall the old component for GPU metrics to start flowing.

Runtime security monitoring

Runtime Security monitoring enables real-time detection of anomalous activities in your cluster. This feature requires installing the Kvisor agent as a DaemonSet on all nodes.

To enable runtime security:

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.castai.apiKey=<your-api-token> \
  --set autoscaler.castai-kvisor.castai.clusterID=<your-cluster-id> \
  --set autoscaler.castai-kvisor.agent.enabled=true \
  --set autoscaler.castai-kvisor.agent.extraArgs.ebpf-events-enabled=true \
  --set autoscaler.castai-kvisor.agent.extraArgs.file-hash-enricher-enabled=true

For detailed information on runtime security, refer to the Runtime Security documentation.

Resource configuration

For large clusters or environments with intensive monitoring requirements, you may need to increase the resources allocated to Kvisor:

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.controller.resources.requests.cpu=100m \
  --set autoscaler.castai-kvisor.controller.resources.requests.memory=2Gi \
  --set autoscaler.castai-kvisor.controller.resources.limits.memory=2Gi

Adjust these values based on your specific cluster size and monitoring needs.

Combine multiple features

You can enable multiple features and configure scan intervals in a single Helm command:

helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  --set autoscaler.castai-kvisor.castai.apiKey=<your-api-token> \
  --set autoscaler.castai-kvisor.castai.clusterID=<your-cluster-id> \
  --set autoscaler.castai-kvisor.controller.enabled=true \
  --set autoscaler.castai-kvisor.agent.enabled=true \
  --set autoscaler.castai-kvisor.agent.extraArgs.ebpf-events-enabled=true \
  --set autoscaler.castai-kvisor.agent.extraArgs.file-hash-enricher-enabled=true \
  --set autoscaler.castai-kvisor.agent.extraArgs.netflow-enabled=true \
  --set autoscaler.castai-kvisor.agent.extraArgs.stats-enabled=true \
  --set autoscaler.castai-kvisor.agent.extraArgs.storage-stats-enabled=true \
  --set autoscaler.castai-kvisor.agent.gpu.enabled=true \
  --set autoscaler.castai-kvisor.controller.extraArgs.image-concurrent-scans=6 \
  --set autoscaler.castai-kvisor.controller.extraArgs.image-scan-interval=30s \
  --set autoscaler.castai-kvisor.controller.extraArgs.kube-linter-scan-interval=60s
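
A long list of --set flags can be easier to review and version-control as a values file, because each dotted --set path corresponds to nested YAML keys. The sketch below writes such a file and prints the command that would apply it. The file name kvisor-values.yaml is an arbitrary choice. The API key and cluster ID are left out to keep credentials out of the file; pass them with --set as before if your release does not already carry them.

```shell
# Values-file equivalent of the feature flags in the combined command above.
cat > kvisor-values.yaml <<'EOF'
autoscaler:
  castai-kvisor:
    controller:
      enabled: true
      extraArgs:
        image-concurrent-scans: 6
        image-scan-interval: 30s
        kube-linter-scan-interval: 60s
    agent:
      enabled: true
      gpu:
        enabled: true
      extraArgs:
        ebpf-events-enabled: true
        file-hash-enricher-enabled: true
        netflow-enabled: true
        stats-enabled: true
        storage-stats-enabled: true
EOF

# Print the upgrade command that would apply the file.
cat <<'EOF'
helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values \
  -f kvisor-values.yaml
EOF
```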

Verify your configuration

To verify that your configuration changes have been applied correctly:

helm get values castai -n castai-agent

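This command displays the user-supplied values for the entire castai release; the Kvisor settings appear under autoscaler.castai-kvisor. Add --all to include the chart defaults as well.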
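
To focus on the Kvisor settings in that output, you can filter it down to the castai-kvisor block. The following is a rough sketch, assuming the default YAML output with two-space indentation and the castai-kvisor key nested directly under autoscaler. The function name kvisor_values is hypothetical, and the sample input stands in for real helm output.

```shell
# Print only the "castai-kvisor:" block from YAML read on stdin.
# Assumes two-space indentation with the block nested one level deep.
kvisor_values() {
  awk '
    /^  castai-kvisor: *$/ { inside = 1; print; next }  # start of the block
    inside && /^    /      { print; next }              # deeper lines belong to it
    inside && /^ *$/       { next }                     # skip blank lines
    inside                 { inside = 0 }               # shallower line ends it
  '
}

# Example with sample input. Against a live cluster you would run:
#   helm get values castai -n castai-agent | kvisor_values
printf '%s\n' \
  'USER-SUPPLIED VALUES:' \
  'autoscaler:' \
  '  castai-kvisor:' \
  '    agent:' \
  '      enabled: true' \
  '  castai-other:' \
  '    example: 1' | kvisor_values
```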

Troubleshooting

If you encounter issues after changing Kvisor's configuration:

Check controller logs

kubectl logs -l app.kubernetes.io/name=castai-kvisor-controller -n castai-agent

Check agent logs

kubectl logs -l app.kubernetes.io/name=castai-kvisor-agent -n castai-agent

Restart the pods

If configuration changes don't seem to be taking effect, you can restart the Kvisor pods:

kubectl rollout restart deployment castai-kvisor-controller -n castai-agent
kubectl rollout restart daemonset castai-kvisor-agent -n castai-agent
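
After a restart, kubectl rollout status can confirm that the new pods became ready. The sketch below reuses the resource names from the restart commands and prints the status commands instead of running them. The 120s timeout and the function name rollout_status_commands are arbitrary choices for illustration.

```shell
# Print a "kubectl rollout status" command for each Kvisor workload.
rollout_status_commands() {
  namespace="castai-agent"
  timeout="120s"   # arbitrary; adjust for cluster size
  for target in deployment/castai-kvisor-controller daemonset/castai-kvisor-agent; do
    printf '%s\n' "kubectl rollout status ${target} -n ${namespace} --timeout=${timeout}"
  done
}

rollout_status_commands
```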

Next steps

After configuring Kvisor features, you can: