September 2024

Pod Pinner Release, Enhanced Runtime Security, and New Workload Autoscaler Features

Major Features and Improvements

Pod Pinner: General Availability Release

We're excited to announce that Pod Pinner has graduated from beta and is now generally available. This CAST AI in-cluster component enhances the alignment between CAST AI Autoscaler decisions and actual pod placement, leading to improved resource utilization and potential cost savings.

Pod Pinner is enabled by default for eligible clusters. Some customers may not see this feature immediately; if you're interested in using Pod Pinner, please contact our support team for activation.

For detailed installation instructions, configuration options, and best practices, refer to our Pod Pinner documentation.

Custom Lists for Runtime Security

We've enhanced our Runtime Security capabilities with configurable custom lists. This flexible feature lets you create and manage lists of elements like IP addresses, process names, or file hashes that you can reference in your security rules.

Key features:

  • Create and manage custom lists via API
  • Integrate lists into security rules using CEL expressions
  • Support for various entry types, including IPv4, SHA256 hashes, and process names

New API endpoints:

  • Create a list: POST /v1/security/runtime/list
  • Retrieve all lists: GET /v1/security/runtime/list
  • Get a specific list: GET /v1/security/runtime/list/{id}
  • Add items to a list: POST /v1/security/runtime/list/{id}/add
  • Remove items from a list: POST /v1/security/runtime/list/{id}/remove
  • Get entries of a list: GET /v1/security/runtime/list/{id}/entries
  • Delete a list: POST /v1/security/runtime/list/delete
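
To illustrate how these endpoints fit together, here is a minimal Python sketch that creates a list, adds entries to it, and reads them back. The base URL, the X-API-Key header, and the request body fields (name, kind, items) are assumptions for illustration only; see the API reference for the exact schema.

    import requests

    API_URL = "https://api.cast.ai"            # assumed base URL
    HEADERS = {"X-API-Key": "<your-api-key>"}  # assumed auth header

    # Create a custom list (body fields are illustrative, not the exact schema).
    resp = requests.post(
        f"{API_URL}/v1/security/runtime/list",
        headers=HEADERS,
        json={"name": "blocked-ips", "kind": "IPV4"},
    )
    resp.raise_for_status()
    list_id = resp.json().get("id")  # assumes the response contains the new list's ID

    # Add entries to the list.
    requests.post(
        f"{API_URL}/v1/security/runtime/list/{list_id}/add",
        headers=HEADERS,
        json={"items": ["203.0.113.7", "198.51.100.23"]},
    ).raise_for_status()

    # Read the entries back to verify.
    entries = requests.get(
        f"{API_URL}/v1/security/runtime/list/{list_id}/entries",
        headers=HEADERS,
    )
    print(entries.json())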

This addition provides greater flexibility in defining and enforcing security policies in your Kubernetes environment. For usage details and examples, refer to our Runtime Security documentation.

Cloud Provider Integrations

GPU Support for AKS Clusters (Preview)

We've expanded our GPU support to include Azure Kubernetes Service (AKS) clusters, bringing feature parity across major cloud providers. This addition enables autoscaling of GPU-attached nodes in AKS environments.

Key features:

  • Support for NVIDIA GPUs in AKS clusters
  • Autoscaling capabilities for workloads requesting GPU resources
  • Integration with default and custom node templates
  • Updated Terraform modules to support GPU configurations in AKS

This feature is currently in preview and available upon request. If you're interested in using GPU support for your AKS clusters, please contact our support team for enablement.

For more information on GPU support across cloud providers, see our GPU documentation.

Optimization and Cost Management

Cost Comparison Report Overhaul

We've significantly improved our cost comparison reports, providing deeper visibility into your Kubernetes spending over time. This overhaul includes several new features:

  • Memory dimension added alongside CPU metrics for a more comprehensive resource analysis
  • Workload analysis showing Workload Autoscaler impact, helping you understand cost optimizations
  • Growth rate insights for cluster costs vs. size, allowing you to track cost efficiency as your cluster grows

These enhancements will help you gain more detailed insights into your Kubernetes spending patterns and optimization opportunities.

For more information on using these new features, please refer to our Cost Comparison Report documentation.

Workload Startup Ignore Period: Full Feature Availability

The feature to ignore initial resource usage during workload startup is now fully available across our platform. This enhancement is particularly beneficial for applications with high initial resource demands, such as Java applications.

Users can now configure scaling policies to disregard resource metrics for a specified period after startup, ensuring more accurate autoscaling decisions. This feature is accessible via:

  • CAST AI Console
  • Terraform
  • API

For details on implementation and best practices, please refer to our Workload Autoscaler documentation.

Custom Look-back Period for Workload Autoscaler

We've enhanced our Workload Autoscaler with a custom look-back period feature. This allows you to specify the historical timeframe the autoscaler uses when analyzing resource usage and generating scaling recommendations.

Key points:

  • Set custom look-back periods for CPU and memory separately
  • Available through annotations, the API, and Terraform, with UI support coming soon

This feature provides greater flexibility in optimizing your autoscaling policies to match your specific application needs and usage patterns. For more details on configuring custom look-back periods, see our Workload Autoscaler documentation.
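
As a rough sketch of the annotation route, the example below patches a Deployment with per-resource look-back annotations using the Kubernetes Python client. The annotation keys, duration format, and the Deployment name and namespace are hypothetical placeholders; check the Workload Autoscaler documentation for the exact annotation names your agent version supports.

    from kubernetes import client, config  # pip install kubernetes

    config.load_kube_config()
    apps = client.AppsV1Api()

    # Hypothetical annotation keys and duration format -- consult the docs
    # for the exact names before applying this to a real workload.
    patch = {
        "metadata": {
            "annotations": {
                "workloads.cast.ai/cpu-look-back-period": "3d",
                "workloads.cast.ai/memory-look-back-period": "7d",
            }
        }
    }

    apps.patch_namespaced_deployment(name="my-app", namespace="default", body=patch)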

Enhanced Cluster Efficiency Reporting

We've improved the granularity of our cluster efficiency reports by reducing the minimum time step from 1 day to 1 hour. This change affects the /v1/cost-reports/clusters/{clusterId}/efficiency API endpoint.

Key benefits:

  • More accurate representation of cluster efficiency over time
  • Better alignment between efficiency reports and dashboard data
  • Improved visibility for clusters with frequent resource changes

For more details on using the efficiency report API, see our Cost Reports API documentation.
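
As a quick example, the sketch below requests an efficiency report for the last 24 hours with an hourly step. The base URL, auth header, and query parameter names (startTime, endTime, stepSeconds) are assumptions for illustration; the API reference documents the exact parameters.

    import datetime as dt
    import requests

    API_URL = "https://api.cast.ai"            # assumed base URL
    HEADERS = {"X-API-Key": "<your-api-key>"}  # assumed auth header
    CLUSTER_ID = "<cluster-id>"

    now = dt.datetime.now(dt.timezone.utc)
    params = {
        # Parameter names are illustrative; see the API reference for the real ones.
        "startTime": (now - dt.timedelta(days=1)).isoformat(),
        "endTime": now.isoformat(),
        "stepSeconds": 3600,  # the new minimum step: 1 hour
    }

    resp = requests.get(
        f"{API_URL}/v1/cost-reports/clusters/{CLUSTER_ID}/efficiency",
        headers=HEADERS,
        params=params,
    )
    resp.raise_for_status()
    print(resp.json())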

Node Configuration

Improved EKS Node Configuration with Instance Profile ARN Suggestions

We've enhanced the EKS node configuration experience by adding suggestions for Instance Profile ARNs. This feature simplifies the setup of CAST AI-provisioned nodes in your EKS clusters.

Key benefits:

  • Automated suggestions for Instance Profile ARNs
  • Reduced need to switch between CAST AI and AWS consoles

Added Ability to Define MaxPods per Node Using Custom Formula

We've enhanced the EKS node configuration to allow users to select a formula for calculating the maximum number of pods on a specific AWS EC2 node.

Key benefits:

  • Supports customer configurations with various Container Network Interfaces (CNIs)
  • Allows for non-default max pods per node, providing greater flexibility in cluster management
  • Enables more precise control over pod density on nodes

This feature enhances CAST AI's ability to adapt to diverse customer environments and networking setups. For more information on using this feature, please refer to our EKS Node Configuration documentation.

Security and Compliance

SSH Protocol Detection in Runtime Security

We've enhanced our Runtime Security capabilities by implementing SSH protocol detection. This feature helps identify potentially unauthorized or unusual SSH connections to pods, which are generally discouraged in Kubernetes environments.

Key benefits:

  • Improved visibility into SSH usage within your clusters
  • Enhanced security signaling for potentially risky connections

This addition strengthens your ability to monitor and secure your Kubernetes workloads. For more information on Runtime Security features, see our documentation.

Expanded Detection of Hacking Tools

We've enhanced our KSPM capabilities by adding detection rules for a wider range of hacking and penetration testing tools. This update improves our ability to identify potential security threats in your Kubernetes environments.
For more information on our security capabilities, refer to our Runtime Security documentation.

Security Improvement for GPU Metrics Exporter

We've improved the security of the GPU Metrics Exporter by moving sensitive information (API Key, Cluster ID) from ConfigMap to Secrets. This change enhances the protection of your credentials. When updating to the latest version, a job will automatically migrate existing data to the new secure format. For details on updating, see our GPU Metrics Exporter documentation.

API and Metrics Improvements

New Cost Comparison API Endpoint

We've introduced a new API endpoint for retrieving cluster-level cost comparison data between two time periods: GET /v1/cost-reports/organization/cost-comparison.

This endpoint allows you to compare costs and savings between two time periods, providing resource cost breakdowns and savings calculations.

For more details on parameters and response format, see our API documentation.
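
Here is a minimal Python sketch of calling the endpoint for two month-long periods. The base URL, auth header, and the names of the period query parameters are assumptions for illustration; the API documentation lists the exact parameters and response fields.

    import requests

    API_URL = "https://api.cast.ai"            # assumed base URL
    HEADERS = {"X-API-Key": "<your-api-key>"}  # assumed auth header

    # Query parameter names for the two comparison periods are illustrative only.
    params = {
        "firstPeriod.startTime": "2024-08-01T00:00:00Z",
        "firstPeriod.endTime": "2024-08-31T23:59:59Z",
        "secondPeriod.startTime": "2024-09-01T00:00:00Z",
        "secondPeriod.endTime": "2024-09-30T23:59:59Z",
    }

    resp = requests.get(
        f"{API_URL}/v1/cost-reports/organization/cost-comparison",
        headers=HEADERS,
        params=params,
    )
    resp.raise_for_status()
    print(resp.json())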

Enhanced Node Pricing API

We've updated our Node Pricing API to provide a more detailed breakdown of node costs. This improvement offers greater transparency and flexibility in understanding your cluster's pricing structure.

Key updates:

  • Detailed base price breakdown for nodes, including CPU, RAM, and GPU prices
  • All price components are now exposed directly in the API response
  • Available for new pricing data; historical data breakdowns not included at this time

This update allows for more accurate cost analysis and simplifies integration with external tools. To access this enhanced pricing data, use the endpoint: /v1/pricing/clusters/{clusterId}/nodes/{nodeId}.

For more details on using the updated Node Pricing API, refer to our API documentation.
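
As an illustration, the sketch below fetches the pricing breakdown for a single node and prints its per-component prices. The base URL and auth header are assumptions, and the response field names (cpuPrice, ramPrice, gpuPrice) are hypothetical placeholders; the API documentation defines the actual keys.

    import requests

    API_URL = "https://api.cast.ai"            # assumed base URL
    HEADERS = {"X-API-Key": "<your-api-key>"}  # assumed auth header

    resp = requests.get(
        f"{API_URL}/v1/pricing/clusters/<cluster-id>/nodes/<node-id>",
        headers=HEADERS,
    )
    resp.raise_for_status()
    pricing = resp.json()

    # Hypothetical field names for the per-component breakdown (CPU, RAM, GPU).
    for component in ("cpuPrice", "ramPrice", "gpuPrice"):
        print(component, pricing.get(component))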

Expanded Node Cost Metrics

We've expanded our node metrics endpoint (/v1/metrics/nodes) to include detailed cost information. New metrics include hourly costs for CPU and RAM, as well as overprovisioning percentages for these resources.
For more information on using these metrics, refer to our metrics documentation or check out our API Reference.

Updated Image Scanning API

We've updated the /v1/security/insights/images API endpoint to include an image scan status filter. This improvement allows for more efficient querying of scanned images and supports multiple status selections in a single query. For details, see our API documentation.
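
As a small example, the sketch below queries the endpoint with two scan statuses in a single request. The base URL, auth header, filter parameter name, and status values are assumptions for illustration; see the API documentation for the exact filter syntax.

    import requests

    API_URL = "https://api.cast.ai"            # assumed base URL
    HEADERS = {"X-API-Key": "<your-api-key>"}  # assumed auth header

    # The filter parameter name and status values below are illustrative only.
    params = {"scanStatuses": ["SCANNED", "PENDING"]}

    resp = requests.get(
        f"{API_URL}/v1/security/insights/images",
        headers=HEADERS,
        params=params,
    )
    resp.raise_for_status()
    print(resp.json())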

User Interface Improvements

Enhanced Grouping Options in Runtime Security

We've improved the user experience in the Runtime Security section of the CAST AI console by introducing flexible grouping options. This allows users to organize and analyze security data more effectively.

Key features:

  • New Group by dropdown menu in the Runtime Security interface
  • Additional grouping parameters including Anomaly, Cluster, Namespace, Resource, and Workload

Terraform and Agent Updates

We've released an updated version of our Terraform provider. As always, the latest changes are detailed in the changelog. The updated provider and modules are available in the Terraform Registry and ready for use in your infrastructure-as-code projects.

We have also released a new version of the CAST AI agent. The complete list of changes is available in the agent changelog. To update the agent in your cluster, please follow the steps in our documentation.