Cast AI

Platform permissions & data privacy

This section describes the Kubernetes and cloud permissions that Cast AI components use, the ports that must be open for them to communicate, and the data they collect and store. See the pages listed under What’s Next for details.

Updated 18 days ago


What’s Next
  • Kubernetes permissions
  • Cloud permissions
  • Data collection and storage
  • Communication requirements