How to set up notification webhooks

Learn how to configure webhooks that send important notifications to external Ops systems when something happens with your clusters.

Select the organization for which you want to configure a webhook that sends notifications from the CAST AI Console to an external system.

  1. Click on the Notifications Icon -> View All

  2. Click on Webhooks

  3. Click on Add webhooks

  4. Create the Webhook

| Field | Description |
| --- | --- |
| Name | The name of the Webhook configuration |
| Callback Url | The callback URL to send the requests to |
| Severity Triggers | The severity levels that will trigger that notification |
| Template | The template of the request that will be sent to the callback URL |

The Request Template must be valid JSON. The next section explains how to customize the payload.

Request Template Configuration

Users can fully customize the request sent to external systems, which lets webhooks integrate with almost any application. The Request Template is the payload sent in the webhook call. The following notification variables are available:

| Variable | Description | Usage |
| --- | --- | --- |
| NotificationID | The unique UUID of the notification | {{ .NotificationID }} |
| OrganizationID | The organization that owns the notification | {{ .OrganizationID }} |
| Severity | Indicates the severity of the impact on the affected system | {{ .Severity }} |
| Name | Name of the notification | {{ .Name }} |
| Message | A high-level text summary message of the event | {{ .Message }} |
| Details as JSON | Free-form details from the event, rendered as JSON | {{ toJSON .Details }} |
| Details as Escaped String | Escaped string format of the details that can be sent in any string field of the request | {{ toEscapedString .Details }} |
| Timestamp | When the notification was created by CAST AI | {{ toISO8601 .Timestamp }} |
| Cluster | Cluster information; may be empty if the notification isn't cluster-specific | {{ toJSON .Cluster }} |
| Cluster.ID | The unique identifier of the cluster on CAST AI | {{ .Cluster.ID }} |
| Cluster.Name | Name of the cluster on CAST AI | {{ .Cluster.Name }} |
| Cluster.ProviderType | Cloud provider of the cluster (eks, gke, aks, kops) | {{ .Cluster.ProviderType }} |
| Cluster.ProjectNamespaceID | Location where the cloud provider organizes the cluster's resources, e.g., GCP project ID or AWS account ID | {{ .Cluster.ProjectNamespaceID }} |

As shown above, the variables use Go template syntax, and you can place them anywhere in your Request Template.
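Because the Request Template has to render to valid JSON, it can help to sanity-check a template locally before saving it. The sketch below is not CAST AI's renderer. It is a rough Python approximation that replaces each `{{ ... }}` action with a placeholder and then tries to parse the result. It assumes that `toJSON` expands to a JSON value, that `toEscapedString` expands to a quoted JSON string, that variable assignments render nothing, and that every other action expands to plain text. The name `check_template` is hypothetical.

```python
import json
import re

# Matches one Go-template action such as {{ .Name }} or {{ toJSON .Details }}.
ACTION = re.compile(r"\{\{-?\s*(.*?)\s*-?\}\}")


def _placeholder(match: re.Match) -> str:
    expr = match.group(1)
    if ":=" in expr:
        return ""              # variable assignments render nothing
    if expr.startswith("toJSON"):
        return "{}"            # assumed to render as a JSON value
    if expr.startswith("toEscapedString"):
        return '"sample"'      # assumed to render as a quoted JSON string
    return "sample"            # assumed to render as plain text


def check_template(template: str) -> bool:
    """Return True if the template parses as JSON after placeholder substitution."""
    rendered = ACTION.sub(_placeholder, template)
    try:
        json.loads(rendered)
    except json.JSONDecodeError:
        return False
    return True
```

A template such as `{"text": {{ .Name }}}` fails this check because the plain-text variable is not wrapped in quotes, which is a common mistake when mixing variables into JSON.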

Example of Request Template: Slack

To send a notification to Slack, use a simple JSON request with the payload in the body:

{
    "text": "CAST AI - {{ .Name }}",
    "blocks": [
     {
      "type": "section",
      "text": {
       "type": "mrkdwn",
       "text": "{{ .Cluster.Name }}\n{{ .Message }}"
      }
     }
    ]
}

Creating the webhook URL is outside the scope of this how-to. For more information, refer to Slack's documentation on incoming webhooks.
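Before saving the Slack template in CAST AI, it can be useful to confirm that your webhook URL accepts a payload of this shape. The following is a minimal sketch using only the Python standard library. It builds the request without sending it and puts a newline between the cluster name and the message. The function name `build_slack_request` and the URL in the usage note are placeholders, not values from CAST AI or Slack.

```python
import json
import urllib.request


def build_slack_request(
    webhook_url: str, name: str, cluster_name: str, message: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request shaped like the Slack template."""
    payload = {
        "text": f"CAST AI - {name}",
        "blocks": [
            {
                "type": "section",
                "text": {"type": "mrkdwn", "text": f"{cluster_name}\n{message}"},
            }
        ],
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually deliver the test message, pass the result to `urllib.request.urlopen(req, timeout=10)`, where `req` was built with your real webhook URL in place of a placeholder such as `https://hooks.slack.com/services/EXAMPLE`.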

Example of Request Template: PagerDuty

PagerDuty accepts alerts at the endpoint https://events.pagerduty.com/v2/enqueue. The content is a simple JSON request with the payload in the body. Below is an example of a request template that uses the available variables:

{
    "payload": {
        "summary": "{{ .Message }}",
        "timestamp": "{{ toISO8601 .Timestamp }}",
        "severity": "critical",
        "source": "CAST AI",
        "component": "{{ .Cluster.Name}}-{{ .Cluster.ProviderType}}-{{ .Cluster.ProjectNamespaceID }}",
        "group": "{{ .Name }}",
        "class": "kubernetes",
        "custom_details": {
            "details": {{ toJSON .Details }}
        }
    },
    "routing_key": "--routing_key--",
    "dedup_key": "{{ .NotificationID }}",
    "event_action": "trigger",
    "client": "CAST AI",
    "client_url": "https://console.cast.ai/external-clusters/{{ .Cluster.ID}}?org={{ .OrganizationID }}"
}

Note that dedup_key is set to the NotificationID. This value is unique in CAST AI, which ensures you won't produce an alert with the same content more than once.

How to create the routing_key isn't in the scope of this how-to. You can find more information at https://developer.pagerduty.com/docs/

Example of Request Template: OpsGenie

OpsGenie accepts alerts at the endpoint https://api.opsgenie.com/v2/alerts. The content is a simple JSON request with the payload in the body. Below is an example of a request template that uses the available variables:

{
    "message": "{{ .Message }}",
    "alias": "{{ .NotificationID }}",
    "description":{{ toEscapedString .Details }},
    "details": {{ toJSON .Details }},
    "tags": ["tag1"],
    "priority":"P3"
}

Note that alias is set to the NotificationID. This value is unique in CAST AI, which ensures you won't produce an alert with the same content more than once.

For details on all supported fields, see https://docs.opsgenie.com/docs/alert-api#create-alert

NOTE: You must set Content-Type: application/json in the Webhook headers section of the CAST AI UI.

Anomaly Detection Webhook

You can configure a Webhook to receive notifications about newly detected anomalies. To do so, select the category Security and the operation Anomalies, provide a URL for your endpoint, and configure the JSON request template as follows:

{
    "details": {{ toJSON .Details }},
    "cluster": {{ toJSON .Cluster }}
}

The structure of the .Details JSON object is as follows:

{
  "anomaly_id": "<UUID of the detected anomaly>",
  "status": "<Anomaly status. One of open/acked/closed>",
  "rule_metadata": {
    "id": "<ID of the rule>",
    "name": "<Name of the rule>",
    "type": "<Type of the rule>",
    "category": "<Category defined for the rule>",
    "labels": {
      // Labels defined for the rule that detected the anomaly
      "custom": "label"
    }
  },
  "events": [
    {
      "timestamp": "<Event timestamp in RFC3339 format>",
      "type": "<Type of the event>",
      "cluster": {
        "id": "<ID of the cluster the event was recorded in>",
        "name": "<Name of the cluster the event was recorded in>",
        "organization_id": "<ID of the organization the event was recorded in>"
      },
      "resource": {
        "namespace": "<Kubernetes namespace the event was recorded in>",
        "pod": "<Name of the pod the event was recorded in>",
        "container": "<Name of the container the event was recorded in>",
        "workload_id": "<ID of the workload the event was recorded in>",
        "workload_name": "<Name of the workload the event was recorded in>",
        "workload_kind": "<Kind of the workload, e.g., Deployment>",
        "pod_labels": {
          // labels set on the pod
        },
        "pod_annotations": {
          // annotations set on the pod
        }
      },
      
      "process": "<Name of the process the event was recorded for>",
      "host_pid": "<PID on the host of the process the event was recorded for>",
      "payload_digest": "<Unique key used to group related events>",
      
      // only one of the following top level fields will be set
      
      "exec": {
        "path": "<Path to the executed file>",
        "args": ["<Arguments passed to the process>"],
        "sha256": "<SHA256 hash of the executed file>",
        "file_details": { // optional value that will not always be present
          "category": "<Category the file falls in e.g., crypto>",
          "malware_name": "<Name of the malware, if file has been categorized to be related to malware>",
          "malware_version": "<Version of the malware the file was identified to be>"
        }
      },

      "file": {
        "path": "<Path to the file (e.g., in magic write events, path to file)>"
      },
      
      "tcp": {
        "destination": {
          "ip": "<IP address the event connected to>",
          "port": <Port to which the process connected to>
        },
        "ip_details": { // optional value that will not always be present; see ipDetails section below for an example
          "ip_address": "<Same as destination.ip>",
          "ip_version": <version of the IP used>,
          "country_code": "<Country code of the IP address (e.g., US)>",
          "isp": "<Name of the Internet Service Provider that owns the IP address>",
          "domain": "<Correlated domain name to the IP>",
          "hostnames": [
            "<Additional hostnames for this IP>"
          ],
          "is_tor": <Flag indicating if the given IP is a Tor node>,
          "abuse_confidence_score": <Score from 0-100 to indicate how confidently the IP was marked as malicious>
        },
        "network_details": { // optional value that will not always be present
          "category": "<Category the IP falls in e.g., crypto>"
        }
      },

      "dns": {
        "question": "<Domain asked to be resolved>",
        "answers": [
          {
            "type": "<Type of the DNS answer. One of PUBLIC/PRIVATE/CNAME>",
            "ip": "<Answer IP, only set if type is either PUBLIC/PRIVATE>",
            "cname": "<Answer CNAME, only set if the type is CNAME>"
          }
        ],
        "flow_direction": "<Direction of the network flow. One of INGRESS/EGRESS/UNKNOWN>",
        "network_details": { // optional value that will not always be present
          "category": "<Category the IP falls in e.g., crypto>"
        }
      },
      
      "socks5": {
        "flow_direction": "<Direction of the network flow. One of INGRESS/EGRESS/UNKNOWN>",
        "role": "<Role the process takes in the SOCKS5 proxy. One of UNKNOWN/SERVER/CLIENT>",
        "command_or_reply": <Command or reply from the SOCKS5 client/server, see RFC1928 for more details>,
        "address_type": "<Type of the address. One of UNKNOWN/IPv4/DOMAIN_NAME/IPv6>",
        "destination": { // only set for address type IPv4/IPv6
          "ip": "<IP address the event connected to>",
          "port": <Port to which the process connected to>
        },
        "destination_domain": "<If address_type is DOMAIN_NAME, the domain the SOCKS5 proxy should connect to>"
      },
      
      "stdio_via_socket": {
        "destination": {
          "ip": "<IP address the event connected to>",
          "port": <Port to which the process connected to>
        },
        "fd": "<File descriptor identified to be hooked up to a socket (either 0,1,2)>"
      }
    }
    // up to 10 events related to the anomaly
  ]
}
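On the receiving side, an endpoint usually needs to work out which of the mutually exclusive event fields is present. The sketch below is one possible way to do that in Python, assuming the request body has already been parsed and the `details` object extracted as a dictionary. The function name `summarize_anomaly` and the output format are hypothetical. Only the field names come from the structure above.

```python
from typing import Any, Dict, List

# Top-level event fields of which only one is set, per the structure above.
EVENT_KINDS = ("exec", "file", "tcp", "dns", "socks5", "stdio_via_socket")


def summarize_anomaly(details: Dict[str, Any]) -> List[str]:
    """Return one human-readable line per event in an anomaly's details."""
    rule = details.get("rule_metadata", {}).get("name", "unknown rule")
    lines = []
    for event in details.get("events", []):
        # Pick the first known payload field that is present on the event.
        kind = next((k for k in EVENT_KINDS if k in event), "unknown")
        resource = event.get("resource", {})
        where = f'{resource.get("namespace", "?")}/{resource.get("pod", "?")}'
        lines.append(f"{rule}: {kind} event in {where}")
    return lines
```

Using `.get` with defaults keeps the function tolerant of optional or missing fields, which matters because several parts of the structure are documented as not always present.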

Rule Types

The following rule types are currently supported:

| Type | Name | Description |
| --- | --- | --- |
| crypto_mining:binary_executed | Crypto mining command line arguments | Checks for EXEC events and tries to identify if they are related to crypto miners. The check is based on matching the binary file name, as well as the arguments. |
| crypto_mining:dns_lookup | DNS to crypto mining | Checks for DNS events that try to resolve well-known crypto-related domains. |
| crypto_mining:tcp_connect | TCP connection to crypto mining | Checks for TCP connections to crypto-related IPs. |
| network:tcp_public_non_standard_port | Suspicious Internet connection | Checks for TCP connections to public IPs on non-HTTP-related ports (neither 80 nor 443). |
| network:suspicious_destination_ip | Suspicious Destination IP | Checks for network-related events that have a suspicious IP as destination. |
| suspicious_binary:nezha_server | Process related to Nezha server | Checks EXEC events for execution of the Nezha Monitoring Tool. |
| suspicious_binary:vnc_server | Process related to VNC server | Checks EXEC events for execution of VNC servers. |
| general:dropped_binary_executed | Dropped new binary (container drift) | Checks for MAGIC_WRITE events (fires if ELF headers are written to any filesystem). |
| general:oom_killed | Process OOM killed | Checks if a process was OOM killed. |
| ml:suspicious_container_stats | Suspicious container stats | Leverages Machine Learning to detect suspicious resource usage patterns of containers. |
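Each rule type listed above has the form `category:name`, so a receiver can route alerts by category. The sketch below shows one way to do this in Python. It assumes the `type` field of `rule_metadata` carries these values, and the priority mapping is purely hypothetical: it is an example policy, not something CAST AI defines.

```python
from typing import Tuple

# Hypothetical mapping from rule category to an alert priority; adjust to your own policy.
PRIORITY_BY_CATEGORY = {
    "crypto_mining": "P1",
    "suspicious_binary": "P2",
    "network": "P3",
}
DEFAULT_PRIORITY = "P4"


def split_rule_type(rule_type: str) -> Tuple[str, str]:
    """Split 'category:name' into its two parts; name is empty if there is no colon."""
    category, _, name = rule_type.partition(":")
    return category, name


def priority_for(rule_type: str) -> str:
    """Look up the priority for a rule type, falling back to the default."""
    category, _ = split_rule_type(rule_type)
    return PRIORITY_BY_CATEGORY.get(category, DEFAULT_PRIORITY)
```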

ipDetails

Example:

{
  "ip_address": "118.25.6.39",
  "ip_version": 4,
  "country_code": "CN",
  "isp": "Tencent Cloud Computing (Beijing) Co. Ltd",
  "domain": "tencent.com",
  "hostnames": [],
  "is_tor": false
}
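A receiver might use these fields to decide whether a destination deserves extra attention. The following Python sketch treats `is_tor` and `abuse_confidence_score` as optional, as in the example above where the score is absent. The function name `is_suspicious_ip` and the default threshold of 80 are assumptions chosen for illustration, not recommended values.

```python
from typing import Any, Dict


def is_suspicious_ip(ip_details: Dict[str, Any], score_threshold: int = 80) -> bool:
    """Flag an IP that is a Tor node or whose abuse score meets the threshold."""
    if ip_details.get("is_tor", False):
        return True
    score = ip_details.get("abuse_confidence_score")
    # A missing score is treated as "no evidence", not as suspicious.
    return score is not None and score >= score_threshold
```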

Event Types

| Type | Description |
| --- | --- |
| exec | Triggered by any executed process in a pod. |
| dns | Triggered by any DNS lookup in a pod. |
| file_change | Triggered by any write to a file in a pod. |
| tcp_connect | Triggered by any TCP connection. |
| tcp_listen | Triggered by any TCP socket listening in a pod. |
| tcp_connect_error | Triggered by any connection error when trying to open a TCP connection. |
| process_oom_killed | Triggered by any process that got OOM killed. |
| magic_write | Triggered by any write event that writes an ELF binary header. |

Anomaly Detection Webhook for OpsGenie

You can access the .Details object inside the template by creating a local $details variable.

This allows you to access anomaly-related fields.

{{- $details := fromJSON .Details -}}
{
    "message": "Runtime: {{$details.rule_metadata.name}}",
    "alias": "{{$details.anomaly_id}}",
    "description": "View anomaly details in Console UI https://console.cast.ai/organization/security/runtime/anomalies/{{$details.anomaly_id}}",
    "details": {
       "cluster": "{{.Cluster.Name}}"
    },
    "tags": ["sec"],
    "priority":"P3"
}