Cast AI

Creating and managing caches

Updated 9 days ago