How to set up notification webhooks
Learn how to configure webhooks that send important notifications to external Ops systems when something happens with your clusters.
1. Select the organization for which you want to send notifications from the CAST AI Console to an external system.
2. Click the Notifications icon -> View All.
3. Click Webhooks.
4. Click Add webhooks.
5. Create the webhook by filling in the following fields:
Field | Description |
---|---|
Name | The name of the Webhook configuration |
Callback Url | The callback URL to send the requests to |
Severity Triggers | The severity levels that trigger the notification |
Template | The template of the request that will be sent to the callback URL |
The Request Template must be valid JSON. The next section explains how to customize the payload in more detail.
Request Template Configuration
Users can fully customize the request sent to external systems, which makes it possible to support almost any application. The Request Template is the payload sent in the webhook call. The following notification variables are available:
Variable | Description | Usage |
---|---|---|
NotificationID | The unique UUID of the notification | {{ .NotificationID }} |
OrganizationID | The organization that owns the notification | {{ .OrganizationID }} |
Severity | Indicates the severity of the impact on the affected system. | {{ .Severity }} |
Name | Name of the notification | {{ .Name }} |
Message | A high-level text summary message of the event. | {{ .Message}} |
Details as JSON | Free-form details from the event can be parsed into JSON. | {{ toJSON .Details }} |
Details as Escaped String | Escaped string format of the details that can be sent in any string field of the request. | {{ toEscapedString .Details }} |
Timestamp | When the Notification was created by CAST AI | {{ toISO8601 .Timestamp }} |
Cluster | Cluster information; may be empty if the notification isn't specific to a cluster | {{ toJSON .Cluster }} |
Cluster.ID | The unique identifier of the cluster on CAST AI | {{ .Cluster.ID }} |
Cluster.Name | Name of the cluster on CAST AI | {{ .Cluster.Name }} |
Cluster.ProviderType | Cloud provider of the cluster (eks, gke, aks, kops) | {{ .Cluster.ProviderType }} |
Cluster.ProjectNamespaceID | Location where the cloud provider organizes the cluster's resources, e.g., GCP project ID or AWS account ID. | {{ .Cluster.ProjectNamespaceID }} |
As shown above, the variables use Go template syntax, and you can place them anywhere in your Request Template.
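To check a Request Template before saving it, one option is to render it locally with sample values and confirm that the result parses as JSON. The sketch below is a rough Python stand-in, not CAST AI's renderer: it only substitutes plain placeholders such as `{{ .Name }}` or `{{ .Cluster.Name }}`, it does not implement helpers like `toJSON`, and the function name `render_preview`, the escaping behavior, and the sample values are all assumptions made for illustration.

```python
import json
import re

# Matches plain Go-template placeholders such as {{ .Name }} or {{ .Cluster.Name}}.
# Helper calls like {{ toJSON .Details }} are intentionally NOT handled here.
PLACEHOLDER = re.compile(r"\{\{\s*\.([A-Za-z0-9_.]+)\s*\}\}")


def render_preview(template: str, values: dict) -> str:
    """Substitute plain placeholders with sample values (hypothetical helper)."""

    def lookup(match: re.Match) -> str:
        current = values
        for part in match.group(1).split("."):
            current = current[part]  # raises KeyError for unknown variables
        # Assumption: escape the value so it is safe inside a JSON string literal.
        return json.dumps(str(current))[1:-1]

    return PLACEHOLDER.sub(lookup, template)


# Made-up sample values, only for the local preview.
sample = {
    "Name": "Node failure",
    "Message": 'Node "a1" is not ready',
    "Cluster": {"Name": "demo-cluster"},
}
template = '{"text": "CAST AI - {{ .Name }}", "body": "{{ .Cluster.Name }}: {{ .Message}}"}'

rendered = render_preview(template, sample)
payload = json.loads(rendered)  # fails loudly if the rendered text is not valid JSON
print(payload["body"])
```

A preview like this catches unbalanced braces, missing commas, and misspelled variable names early; the final check should still be a real notification sent through the configured webhook.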
Example Request Template for Slack
To send a notification to Slack, we need a simple JSON request with the payload in the body:
{
"text": "CAST AI - {{ .Name }}",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "{{ .Cluster.Name }}\n{{ .Message}}"
}
}
]
}
Creating the Slack webhook URL is outside the scope of this how-to. You can find more information in Slack's documentation on incoming webhooks.
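Before wiring the template into CAST AI, it can help to confirm that the Slack webhook URL accepts a payload of this shape. The following is a minimal sketch under stated assumptions: the URL is a placeholder, `build_slack_request` is a hypothetical helper, and the values stand in for what the template variables would produce. It only builds the request; sending is left as a commented-out last step.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, name: str, cluster_name: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request shaped like the Slack template."""
    body = {
        "text": f"CAST AI - {name}",
        "blocks": [
            {
                "type": "section",
                # The line break is written as \n, which Slack's mrkdwn shows as a new line.
                "text": {"type": "mrkdwn", "text": f"{cluster_name}\n{message}"},
            }
        ],
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: replace it with the incoming webhook URL created in Slack.
request = build_slack_request(
    "https://hooks.slack.com/services/T000/B000/XXXX",
    name="Node failure",
    cluster_name="demo-cluster",
    message="Node a1 is not ready",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```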
Example Request Template for PagerDuty
PagerDuty accepts alerts at the endpoint https://events.pagerduty.com/v2/enqueue. The content is a simple JSON request with the payload in the body. Below is an example request template that uses the available variables:
{
"payload": {
"summary": "{{ .Message }}",
"timestamp": "{{ toISO8601 .Timestamp }}",
"severity": "critical",
"source": "CAST AI",
"component": "{{ .Cluster.Name}}-{{ .Cluster.ProviderType}}-{{ .Cluster.ProjectNamespaceID }}",
"group": "{{ .Name }}",
"class": "kubernetes",
"custom_details": {
"details": {{ toJSON .Details }}
}
},
"routing_key": "--routing_key--",
"dedup_key": "{{ .NotificationID }}",
"event_action": "trigger",
"client": "CAST AI",
"client_url": "https://console.cast.ai/external-clusters/{{ .Cluster.ID}}?org={{ .OrganizationID }}"
}
Note that dedup_key is set to the NotificationID. This value is unique in CAST AI, which ensures you won't produce an alert with the same content more than once.
Creating the routing_key is outside the scope of this how-to. You can find more information at https://developer.pagerduty.com/docs/
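A rendered PagerDuty event can be sanity-checked before it is sent. The sketch below assumes the PagerDuty Events API v2 rules for a trigger event: `routing_key` and `event_action` are required, `payload.summary`, `payload.source`, and `payload.severity` are required, and severity is one of `critical`, `error`, `warning`, or `info`. The function name `check_pagerduty_event` and the sample values are hypothetical.

```python
import json

# Severity values accepted by the PagerDuty Events API v2.
ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def check_pagerduty_event(rendered: str) -> list[str]:
    """Return a list of problems found in a rendered trigger event (empty if none)."""
    event = json.loads(rendered)
    problems = []
    for key in ("routing_key", "event_action"):
        if not event.get(key):
            problems.append(f"missing {key}")
    payload = event.get("payload", {})
    for key in ("summary", "source", "severity"):
        if not payload.get(key):
            problems.append(f"missing payload.{key}")
    if payload.get("severity") and payload["severity"] not in ALLOWED_SEVERITIES:
        problems.append(f"unsupported severity: {payload['severity']}")
    return problems


# Made-up rendered output; the routing key is the same placeholder as in the template.
rendered = json.dumps({
    "payload": {"summary": "Node a1 is not ready", "severity": "critical", "source": "CAST AI"},
    "routing_key": "--routing_key--",
    "dedup_key": "3f0c2b9e-0000-4000-8000-000000000001",
    "event_action": "trigger",
})
print(check_pagerduty_event(rendered))
```

This matters mostly if you replace the hard-coded "critical" with a template variable, because the rendered value then has to be one of the four accepted severities.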
Example Request Template for OpsGenie
OpsGenie accepts alerts at the endpoint https://api.opsgenie.com/v2/alerts. The content is a simple JSON request with the payload in the body. Below is an example request template that uses the available variables:
{
"message": "{{ .Message }}",
"alias": "{{ .NotificationID }}",
"description":{{ toEscapedString .Details }},
"details": {{ toJSON .Details }},
"tags": ["tag1"],
"priority":"P3"
}
Note that alias is set to the NotificationID. This value is unique in CAST AI, which ensures you won't produce an alert with the same content more than once.
For details on all supported fields, see https://docs.opsgenie.com/docs/alert-api#create-alert
NOTE: You must set Content-Type: application/json in the Webhook headers section of the CAST AI UI.
Anomaly Detection Webhook
You can configure a webhook to receive notifications about newly detected anomalies. To do so, select the category Security and the operation Anomalies. Provide the URL of your endpoint and configure the JSON request template as follows:
{
"details": {{ toJSON .Details }},
"cluster": {{ toJSON .Cluster }}
}
The structure of the .Details JSON object is as follows:
{
"anomaly_id": "<UUID of the detected anomaly>",
"status": "<Anomaly status. One of open/acked/closed>",
"rule_metadata": {
"id": "<ID of the rule>",
"name": "<Name of the rule>",
"type": "<Type of the rule>",
"category": "<Category defined for the rule>",
"labels": {
// Labels defined for the rule that detected the anomaly
"custom": "label"
}
},
"events": [
{
"timestamp": "<Event timestamp in RFC3339 format>",
"type": "<Type of the event>",
"cluster": {
"id": "<ID of the cluster the event was recorded in>",
"name": "<Name of the cluster the event was recorded in>",
"organization_id": "<ID of the organization the event was recorded in>",
},
"resource": {
"namespace": "<Kubernetes namespace the event was recorded in>",
"pod": "<Name of the pod the event was recorded in>",
"container": "<Name of the container the event was recorded in>",
"workload_id": "<ID of the workload the event was recorded in>",
"workload_name": "<Name of the workload the event was recorded in>",
"workload_kind": "<Kind of the workload, e.g., Deployment>",
"pod_labels": {
// labels set on the pod
},
"pod_annotations": {
// annotations set on the pod
},
},
"process": "<Name of the process the event was recorded for>",
"host_pid": "<PID on the host of the process the event was recorded for>",
"payload_digest": "<Unique key used to group related events>",
// only one of the following top level fields will be set
"exec": {
"path": "<Path to the executed file>",
"args": ["<Arguments passed to the process>"],
"sha256": "<SHA256 hash of the executed file>",
"file_details": { // optional value that will not always be present
"category": "<Category the file falls in e.g., crypto>",
"malware_name": "<Name of the malware, if file has been categorized to be related to malware>",
"malware_version": "<Version of the malware the file was identified to be>"
},
},
"file": {
"path": "<Path to the file (e.g., in magic write events, path to file)>",
},
"tcp": {
"destination": {
"ip": "<IP address the event connected to>",
"port": <Port to which the process connected>
},
"ip_details": { // optional value that will not always be present; see ipDetails section below for an example
"ip_address": "<Same as destination.ip>",
"ip_version": <version of the IP used>,
"country_code": "<Country code of the IP address (e.g., US)>",
"isp": "<Name of the Internet Service Provider that owns the IP address>",
"domain": "<Correlated domain name to the IP>",
"hostnames": [
"<Additional hostnames for this IP>"
],
"is_tor": <Flag indicating if the given IP is a Tor node>,
"abuse_confidence_score": <Score from 0-100 to indicate how confidently the IP was marked as malicious>
},
"network_details": { // optional value that will not always be present
"category": "<Category the IP falls in e.g., crypto>"
}
},
"dns": {
"question": "<Domain asked to be resolved>",
"answers": [
{
"type": "<Type of the DNS answer. One of PUBLIC/PRIVATE/CNAME>",
"ip": "<Answer IP, only set if type is either PUBLIC/PRIVATE>",
"cname": "<Answer CNAME, only set if the type is CNAME>"
}
],
"flow_direction": "<Direction of the network flow. One of INGRESS/EGRESS/UNKNOWN>",
"network_details": { // optional value that will not always be present
"category": "<Category the IP falls in e.g., crypto>"
}
},
"socks5": {
"flow_direction": "<Direction of the network flow. One of INGRESS/EGRESS/UNKNOWN>",
"role": "<Role the process takes in the SOCKS5 proxy. One of UNKNOWN/SERVER/CLIENT>",
"command_or_reply": <Command or reply from the SOCKS5 client/server, see RFC1928 for more details>,
"address_type": "<Type of the address. One of UNKNOWN/IPv4/DOMAIN_NAME/IPv6>",
"destination": { // only set for address type IPv4/IPv6
"ip": "<IP address the event connected to>",
"port": <Port to which the process connected>
},
"destination_domain": "<If addressType is DOMAIN_NAME, the domain the SOCKS5 proxy should connect to>"
},
"stdio_via_socket": {
"destination": {
"ip": "<IP address the event connected to>",
"port": <Port to which the process connected>
},
"fd": "<File descriptor identified to be hooked up to a socket (either 0,1,2)>"
}
},
<Up to 10 events related to the anomaly>
],
}
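On the receiving side, an endpoint needs to find out which of the mutually exclusive payload fields an event carries. The sketch below shows one way to do that in Python, assuming the request template with `details` and `cluster` keys shown earlier in this section. The function name `summarize_anomaly` and all sample values are hypothetical, and the cluster object is left empty because its JSON key names are not documented here.

```python
import json

# Payload fields from the structure above; only one of them is set per event.
PAYLOAD_FIELDS = ("exec", "file", "tcp", "dns", "socks5", "stdio_via_socket")


def summarize_anomaly(body: bytes) -> dict:
    """Summarize a webhook body shaped like {"details": ..., "cluster": ...}."""
    data = json.loads(body)
    details = data["details"]
    events = []
    for event in details.get("events", []):
        # Find which of the mutually exclusive payload fields this event carries.
        kind = next((field for field in PAYLOAD_FIELDS if field in event), None)
        events.append({
            "type": event.get("type"),
            "payload_field": kind,
            "pod": event.get("resource", {}).get("pod"),
        })
    return {
        "anomaly_id": details["anomaly_id"],
        "status": details.get("status"),
        "rule_type": details.get("rule_metadata", {}).get("type"),
        "events": events,
    }


# Made-up request body for illustration only.
sample_body = json.dumps({
    "details": {
        "anomaly_id": "11111111-2222-3333-4444-555555555555",
        "status": "open",
        "rule_metadata": {"type": "crypto_mining:dns_lookup"},
        "events": [
            {"type": "dns", "resource": {"pod": "web-0"}, "dns": {"question": "pool.example.com"}},
        ],
    },
    "cluster": {},
}).encode("utf-8")

print(summarize_anomaly(sample_body))
```

Reading optional fields with `.get()` keeps the handler from failing when an optional object such as `file_details` or `ip_details` is absent.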
Rule Types
The following rule types are currently supported:
Type | Name | Description |
---|---|---|
crypto_mining:binary_executed | Crypto mining command line arguments | Checks for EXEC events and tries to identify if they are related to crypto miners. The check is based on matching the binary file name, as well as the arguments. |
crypto_mining:dns_lookup | DNS to crypto mining | Checks for DNS events that try to resolve well-known crypto-related domains. |
crypto_mining:tcp_connect | TCP connection to crypto mining | Checks for TCP connections to crypto-related IPs. |
network:tcp_public_non_standard_port | Suspicious Internet connection | Checks for TCP connections to public IPs on non-HTTP ports (neither 80 nor 443). |
network:suspicious_destination_ip | Suspicious Destination IP | Checks for network-related events that have a suspicious IP as the destination. |
suspicious_binary:nezha_server | Process related to Nezha server | Checks EXEC events for execution of the Nezha Monitoring Tool. |
suspicious_binary:vnc_server | Process related to VNC server | Checks EXEC events for execution of VNC servers. |
general:dropped_binary_executed | Dropped new binary (container drift) | Checks for MAGIC_WRITE events (fires if ELF headers are written to any filesystem). |
general:oom_killed | Process OOM killed | Checks if a process was OOM killed. |
ml:suspicious_container_stats | Suspicious container stats | Leverages Machine Learning to detect suspicious resource usage patterns of containers. |
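Each rule type in the table has the form category:check, so a receiver can route anomalies by the part before the colon. The following is a small sketch of that idea; the routing table and the team names in it are made up for illustration and would need to match your own setup.

```python
def split_rule_type(rule_type: str) -> tuple[str, str]:
    """Split 'crypto_mining:dns_lookup' into ('crypto_mining', 'dns_lookup')."""
    category, _, check = rule_type.partition(":")
    return category, check


# Hypothetical routing table: which team handles which rule category.
ROUTES = {
    "crypto_mining": "security-oncall",
    "network": "security-oncall",
    "suspicious_binary": "security-oncall",
    "ml": "security-triage",
    "general": "platform-team",
}


def route_for(rule_type: str) -> str:
    """Pick a destination for a rule type, falling back to a default queue."""
    category, _ = split_rule_type(rule_type)
    return ROUTES.get(category, "security-triage")  # default for unknown categories


print(route_for("general:oom_killed"))
```

Falling back to a default queue means that a rule type added later is still delivered somewhere instead of being dropped.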
ipDetails
Example:
{
"ip_address": "118.25.6.39",
"ip_version": 4,
"country_code": "CN",
"isp": "Tencent Cloud Computing (Beijing) Co. Ltd",
"domain": "tencent.com",
"hostnames": [],
"is_tor": false
}
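A receiver might use these fields to decide whether a destination IP deserves a closer look. The sketch below flags an IP when `is_tor` is true or when `abuse_confidence_score` reaches a threshold. The threshold of 50 is an arbitrary example value, not a CAST AI recommendation, and the function name `is_suspicious_ip` is hypothetical.

```python
def is_suspicious_ip(ip_details: dict, score_threshold: int = 50) -> bool:
    """Flag an IP if it is a Tor node or its abuse score meets the threshold.

    score_threshold is an arbitrary example value chosen for this sketch.
    """
    if ip_details.get("is_tor", False):
        return True
    # abuse_confidence_score may be missing, as in the example above.
    score = ip_details.get("abuse_confidence_score")
    return score is not None and score >= score_threshold


example = {
    "ip_address": "118.25.6.39",
    "ip_version": 4,
    "country_code": "CN",
    "isp": "Tencent Cloud Computing (Beijing) Co. Ltd",
    "domain": "tencent.com",
    "hostnames": [],
    "is_tor": False,
}
print(is_suspicious_ip(example))
```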
Event Types
Type | Description |
---|---|
exec | Triggered by any executed processes in a pod. |
dns | Triggered by any DNS lookup in a pod. |
file_change | Triggered by any write to a file in a pod. |
tcp_connect | Triggered by any TCP connection. |
tcp_listen | Triggered by any TCP socket listening in a pod. |
tcp_connect_error | Triggered by any connection errors when trying to open a TCP connection. |
process_oom_killed | Triggered by any process that got OOM killed. |
magic_write | Triggered by any write event that writes an ELF binary header. |
Anomaly Detection Webhook for OpsGenie
You can access the .Details object inside the template by creating a local $details variable. This gives you access to anomaly-related fields.
{{- $details := fromJSON .Details -}}
{
"message": "Runtime: {{$details.rule_metadata.name}}",
"alias": "{{$details.anomaly_id}}",
"description": "View anomaly details in Console UI https://console.cast.ai/organization/security/runtime/anomalies/{{$details.anomaly_id}}",
"details": {
"cluster": "{{.Cluster.Name}}"
},
"tags": ["sec"],
"priority":"P3"
}