Anomaly rules engine

General

The Anomaly Rules Engine is a tool provided by CAST AI as part of its Kubernetes runtime security feature set. It lets you define custom rules that detect and classify events as anomalies based on specific criteria, enabling proactive monitoring and alerting for potential security threats or unusual behavior in your Kubernetes clusters.

Rule types

The Anomaly Rules Engine supports two types of rules:

  • Built-In Rules: These rules are pre-defined by CAST AI and cover common security scenarios. They are readily available for use and can be quickly enabled or disabled as needed.
  • User-Defined CEL Rules: These rules are custom-defined by you using the Common Expression Language (CEL). CEL provides a flexible and expressive way to define complex matching criteria based on event properties and resource attributes.

Each rule consists of two main components:

  • Resource Selectors: Used to filter and select the relevant resources to which the rule should be applied.
  • Event Matching: Defines the conditions and criteria for identifying anomalous events.

Note

Only the resource selectors can be modified for Built-In rules, while the event-matching logic is pre-defined by CAST AI. For User-Defined CEL rules, you have full control over both the resource selectors and event-matching logic.

Resource selectors

Resource selectors allow you to filter events based on specific resource attributes before applying the rule. This helps narrow down the scope of events to be analyzed.

Resource selectors are defined using CEL expressions. The CEL program has access to two variables:

  • cluster of type types.Cluster: Represents the Kubernetes cluster.
  • resource of type types.KubernetesObject: Represents the Kubernetes resource associated with the event.

These variables expose various properties that can be used in the CEL expressions to select the desired resources. Refer to the corresponding sections below for more details on the available properties.

Resource selector examples

Include only specific clusters and namespaces:

cluster.name in ["dev", "testing"] && resource.namespace == "apps"

Exclude pods with a given name prefix:

!(resource.namespace == "castai-agent" && resource.pod.startsWith("castai-imgscan"))
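A resource selector acts as a boolean filter: the rule is applied only to events for which the selector evaluates to true. Because a selector includes whatever it matches, excluding pods means negating the matching condition. The Python sketch below mirrors both selector examples as plain functions; the dictionaries are hypothetical stand-ins for the cluster and resource variables and model only the fields these examples use.

```python
# Hypothetical stand-ins for the `cluster` and `resource` CEL variables.
# Only the fields used by the selector examples are modeled.


def include_dev_apps(cluster: dict, resource: dict) -> bool:
    # cluster.name in ["dev", "testing"] && resource.namespace == "apps"
    return cluster["name"] in ("dev", "testing") and resource["namespace"] == "apps"


def exclude_imgscan_pods(cluster: dict, resource: dict) -> bool:
    # !(resource.namespace == "castai-agent" && resource.pod.startsWith("castai-imgscan"))
    return not (
        resource["namespace"] == "castai-agent"
        and resource["pod"].startswith("castai-imgscan")
    )


if __name__ == "__main__":
    dev = {"name": "dev"}
    prod = {"name": "prod"}
    app_pod = {"namespace": "apps", "pod": "web-7d9f"}
    scanner_pod = {"namespace": "castai-agent", "pod": "castai-imgscan-abc12"}

    print(include_dev_apps(dev, app_pod))          # True: in scope
    print(include_dev_apps(prod, app_pod))         # False: cluster not selected
    print(exclude_imgscan_pods(dev, scanner_pod))  # False: excluded
    print(exclude_imgscan_pods(dev, app_pod))      # True: still in scope
```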

User-defined CEL rules

User-defined CEL rules provide the flexibility to define custom anomaly detection logic based on your specific requirements. You can write CEL expressions to match events and trigger anomalies based on various conditions.

CEL rule examples

Detect out-of-memory (OOM) kills:

event.type == event_process_oom_killed

Detect a process whose name starts with nginx writing an ELF binary (dropping a new executable):

event.type == event_magic_write && event.process.name.startsWith("nginx")

Detect TCP connections to public IP addresses on non-standard ports:

event.type == event_tcp_connect &&
event.tcp.destination.ip.public() && 
!(event.tcp.destination.port in [80, 443])
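The Python sketch below restates the TCP rule as an ordinary predicate to make each condition explicit. It assumes that public() behaves like Python's ipaddress is_global property; the engine's exact definition of a public IP is not documented here, so treat that as an approximation. The event dictionary is a hypothetical stand-in for types.Event.

```python
import ipaddress

# Ports the rule treats as standard: !(event.tcp.destination.port in [80, 443])
STANDARD_PORTS = {80, 443}


def is_public(ip: str) -> bool:
    # Assumption: approximates the engine's public(IP) helper with is_global.
    return ipaddress.ip_address(ip).is_global


def tcp_to_nonstandard_public_port(event: dict) -> bool:
    # event.type == event_tcp_connect
    if event["type"] != "event_tcp_connect":
        return False
    dest = event["tcp"]["destination"]
    # event.tcp.destination.ip.public() && !(port in [80, 443])
    return is_public(dest["ip"]) and dest["port"] not in STANDARD_PORTS


if __name__ == "__main__":
    def connect(ip, port):
        # Hypothetical event shape mirroring types.Event and types.TCPPayload.
        return {"type": "event_tcp_connect", "tcp": {"destination": {"ip": ip, "port": port}}}

    print(tcp_to_nonstandard_public_port(connect("8.8.8.8", 4444)))   # True
    print(tcp_to_nonstandard_public_port(connect("8.8.8.8", 443)))    # False
    print(tcp_to_nonstandard_public_port(connect("10.0.0.5", 4444)))  # False
```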

Detect suspicious exec arguments (any argument containing one of the listed strings):

cel.bind(bad_args, ["tigervnc", "novnc", "--vnc", "rfbport"],
  event.type == event_exec &&
  bad_args.exists(bad_arg,
    event.exec.args.exists(arg, arg.lowerAscii().contains(bad_arg))
  )
)
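The choice between the exists and exists_one macros matters for detection rules: in CEL, exists is true when at least one element satisfies the predicate, while exists_one is true only when exactly one does. With exists_one, a command line containing two suspicious arguments would not be flagged, so exists is the appropriate macro for "any of these arguments". The Python sketch below models both macros with any() and a match count to show the difference; it illustrates CEL's macro semantics and is not engine code.

```python
import string

BAD_ARGS = ["tigervnc", "novnc", "--vnc", "rfbport"]

# lowerAscii() only folds ASCII letters, unlike str.lower().
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def arg_matches(arg: str, bad_arg: str) -> bool:
    # arg.lowerAscii().contains(bad_arg)
    return bad_arg in arg.translate(_ASCII_LOWER)


def rule_with_exists(args) -> bool:
    # bad_args.exists(bad_arg, args.exists(arg, ...)): at least one match
    return any(any(arg_matches(a, b) for a in args) for b in BAD_ARGS)


def rule_with_exists_one(args) -> bool:
    # bad_args.exists_one(bad_arg, args.exists_one(arg, ...)): exactly one match
    def exactly_one_arg(b):
        return sum(arg_matches(a, b) for a in args) == 1

    return sum(exactly_one_arg(b) for b in BAD_ARGS) == 1


if __name__ == "__main__":
    vnc_server = ["Xtigervnc", "-rfbport", "5900"]
    print(rule_with_exists(vnc_server))      # True
    print(rule_with_exists_one(vnc_server))  # False: two bad args match
```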

Detect execution of dropped binaries, allowing specific binaries by hash:

event.type == event_exec && event.exec.is_upper_layer && 
!(hex(event.exec.sha256) in [
  "9f64a747e1b97f131fabb6b447296c9b6f0201e79fb3c5356e6c77e89b6a806a",
])
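The hash comparison works because hex() always returns lowercase text, so allowlisted hashes must be written in lowercase too; an uppercase entry would never match. The Python sketch below mimics the rule with hashlib: it hashes a binary's contents, hex-encodes the raw digest the same way, and suppresses the detection when the result is on the allowlist. The byte strings are placeholders for real binaries, and the event.type check is omitted for brevity.

```python
import hashlib

# Allowlisted digests as lowercase hex, the format produced by CEL's hex(bytes).
# The byte string is a placeholder for a real binary's contents.
ALLOWED_SHA256 = {hashlib.sha256(b"trusted build artifact").hexdigest()}


def flag_dropped_exec(is_upper_layer: bool, binary: bytes) -> bool:
    # event.exec.sha256 is raw bytes; bytes.hex() yields lowercase text like hex().
    digest = hashlib.sha256(binary).digest()
    # event.exec.is_upper_layer && !(hex(event.exec.sha256) in [...])
    return is_upper_layer and digest.hex() not in ALLOWED_SHA256


if __name__ == "__main__":
    print(flag_dropped_exec(True, b"trusted build artifact"))  # False: allowlisted
    print(flag_dropped_exec(True, b"unknown payload"))         # True: flagged
    print(flag_dropped_exec(False, b"unknown payload"))        # False: not upper layer
```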

Detect execution of dropped binaries, allowing hashes from a custom list:

event.type == event_exec && event.exec.is_upper_layer && 
  !customLists("allowed-binaries").contains(match_sha256, event.exec.sha256)

NOTE: The allowed-binaries list must be created beforehand. See the Custom Lists section below for details.

Detect SSH:

event.type == event_ssh

Exposed variables

In a User-defined CEL rule, the program has access to the event through the event variable of type types.Event. This variable represents the current event being evaluated and provides access to its properties.

Refer to the sections below for more information on the available exposed properties.

Helper Functions

The Anomaly Rules Engine provides several helper functions that can be used in CEL expressions to perform common operations:

Function | Description
IP(string) -> IP | Parses the given string into an IP address. Fails with an error if the string is not a valid IP address. Example: IP("10.0.0.1") != IP("2345:425:2CA1:0000:0000:567:5673:23b5")
CIDR(string) -> types.CIDR | Parses the given string, formatted in prefix notation, into a CIDR. Example: CIDR("10.0.0.0/8") != CIDR("2001:1111:2222:3333::/64")
hex(bytes) -> string | Encodes the given bytes as a hexadecimal string. The resulting string is always lowercase. Example: hex(event.exec.sha256)
fromHex(string) -> bytes | Inverse of hex(bytes): decodes a hex-encoded string into bytes. Fails with an error if the string is not valid hex. Example: fromHex('CAFE')
UUID(string) -> bytes | Parses the given string as a UUID and returns the underlying bytes. Example: event.cluster.id == UUID('ecb5cb1b-7e7f-4dad-b504-6d13b69ce62e')
public(IP) -> bool | Checks whether the given IP is a public IP. Example: event.tcp.destination.ip.public()
bitSet(int, int) -> bool | Checks whether the bit specified by the second argument is set in the first argument.
bitMaskSet(int, int) -> bool | Checks whether all bits of the second argument are set in the first argument.
customLists(string, ...) -> customListMatcher | Loads the values of one or more custom lists into a matcher that can check whether a value exists in any of them. See the Custom Lists section for details.
Standard library functions | https://github.com/google/cel-spec/blob/master/doc/langdef.md#list-of-standard-definitions
Other custom string functions | https://github.com/google/cel-go/blob/b66ac6c0896350d105d71bf1960eece62ebb0c3c/ext/strings.go#L41

These functions help you parse and manipulate event and resource data in CEL expressions, covering IP addresses, CIDR ranges, hexadecimal encodings, UUIDs, bit flags, and custom lists.
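For checking values before putting them into a rule, several helpers have close standard-library counterparts in Python. The sketch below approximates hex, fromHex, UUID, CIDR containment (as used by match_cidr), and bitMaskSet; these are stand-ins for illustration, and the engine's implementations may differ in edge cases. bitSet is left out because the table does not specify whether its second argument is a bit index or a mask.

```python
import ipaddress
import uuid


def hex_(data: bytes) -> str:
    # hex(bytes) -> string, always lowercase
    return data.hex()


def from_hex(s: str) -> bytes:
    # fromHex(string) -> bytes; raises ValueError on invalid hex
    return bytes.fromhex(s)


def uuid_bytes(s: str) -> bytes:
    # UUID(string) -> bytes: the 16 raw bytes of the UUID
    return uuid.UUID(s).bytes


def cidr_contains(cidr: str, ip: str) -> bool:
    # CIDR parsing plus containment; mixed IPv4/IPv6 simply yields False
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=False)


def bit_mask_set(value: int, mask: int) -> bool:
    # bitMaskSet(int, int): every bit of `mask` is also set in `value`
    return (value & mask) == mask


if __name__ == "__main__":
    print(hex_(from_hex("CAFE")))                   # cafe
    print(cidr_contains("10.0.0.0/8", "10.1.2.3"))  # True
```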

Common Types

The Anomaly Rules Engine uses various types to represent events, resources, and their associated properties. These types expose properties that can be accessed in CEL expressions to define resource selectors and event-matching logic. The most commonly used types are listed below.

types.Event

The types.Event type represents an event observed in the Kubernetes cluster. It contains information about the event type, timestamp, associated cluster, resource, and other event-specific details.

Property | Type | Description
type | types.EventType | Type of the event. See the types.EventType table below for available types.
timestamp | Timestamp (RFC3339 string) | Timestamp when the event was observed.
cluster | types.Cluster | Cluster in which the event was observed.
resource | types.KubernetesObject | Kubernetes resource the event was observed for.
process | types.Process | Process details.
dns | types.DNSPayload | Additional information about the observed DNS request if the event is of type event_dns.
exec | types.ExecPayload | Additional information about the observed command execution if the event is of type event_exec.
file | types.FilePayload | Additional information about the file if the event is of type event_magic_write.
socks5 | types.SOCKS5Payload | Additional information about the observed SOCKS5 actions if the event is of type event_socks5.
stdio_via_socket | types.StdioViaSocketPayload | Additional information about the potential reverse shell if the event is of type event_stdio_via_socket.
tcp | types.TCPPayload | Additional information about the observed TCP actions if the event is any of the event_tcp_* types.
ssh | types.SSHPayload | Additional information about an observed SSH connection if the event is of type event_ssh.
payload_digest | uint64 | Field used to group related events together.

types.EventType

The types.EventType type represents the different types of events that can be observed by the Anomaly Rules Engine. Each event type corresponds to a specific action or occurrence in the Kubernetes cluster.

Name | Description
event_exec | Triggered when the execution of a binary is observed.
event_dns | Triggered when a DNS-related request is observed.
event_tcp_connect | Triggered when a new TCP connection is observed.
event_tcp_listen | Triggered when a new TCP socket starts listening.
event_process_oom_killed | Triggered when a process is observed to have been killed due to an out-of-memory (OOM) condition.
event_magic_write | Triggered when a write operation on an ELF binary is observed at runtime.
event_stdio_via_socket | Triggered when the binding of any standard input/output (STDIO) file descriptor to a network socket is observed, which might indicate a reverse shell.
event_tty_detected | Triggered when the allocation of a new pseudo-terminal (PTY) device is detected.
event_socks5_detected | Triggered when SOCKS5-related network traffic is observed.
event_ssh | Triggered when an SSH connection is observed.

types.Cluster

The types.Cluster type represents a Kubernetes cluster and contains information about the cluster's identity and organization.

Property | Type | Description
id | UUID | Unique identifier of the cluster.
name | string | Name of the cluster.
organization_id | UUID | Identifier of the organization to which the cluster belongs.

types.KubernetesObject

The types.KubernetesObject type represents a Kubernetes resource associated with an event. It provides details about the container, pod, namespace, and workload related to the event.

Property | Type | Description
container | string | Container the event was observed for.
container_id | string | ID of the container the event was observed for.
namespace | string | Namespace in which the container is running.
pod | string | Pod the event was observed for.
pod_annotations | map[string]string | Annotations set on the Pod the event was observed for.
pod_labels | map[string]string | Labels set on the Pod the event was observed for.
workload_id | UUID | ID of the workload related to the event.
workload_kind | string | Kind of workload the Pod belongs to (e.g., Deployment, StatefulSet).
workload_name | string | Name of the workload to which the Pod belongs.

types.Process

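The types.Process type contains details about the process associated with the event.

Property | Type | Description
name | string | Process name.
pid | int | Process ID as seen inside the container.
host_pid | int | Process ID as seen on the host.
start_time | int | Time when the process was started, in seconds.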

types.DNSPayload

The types.DNSPayload type represents details about an observed DNS query. If the observed event was triggered by a DNS server, flow_direction will be set to flow_egress.

Property | Type | Description
question | string | Domain name to be resolved by the server.
answers | []types.DNSAnswer | List of resolved answers for the given question.
flow_direction | types.FlowDirection | Flow direction of the observed request.
network_details | types.NetworkDetails | Details about the observed DNS request.
remote | types.AddrPort | Address details about the remote end of the observed request.

types.DNSAnswer

The types.DNSAnswer type represents the answer received for a DNS query. It contains information about the type of the answer and the associated data.

Property | Type | Description
type | types.DNSAnswerType | Type of the DNS answer, indicating whether it is a public IP, private IP, or CNAME.
cname | string | Domain name returned if the DNS answer type is dns_cname.
ip | IP | IP address returned if the DNS answer type is either dns_public_ip or dns_private_ip.

types.DNSAnswerType

The types.DNSAnswerType type represents the different types of DNS answers that can be received.

Name | Description
dns_unknown | DNS answer type could not be determined.
dns_public_ip | DNS answer was classified as a public IP address.
dns_private_ip | DNS answer was classified as a private IP address.
dns_cname | DNS answer was classified as a CNAME.

types.ExecPayload

The types.ExecPayload type contains additional information about an observed command execution event.

Property | Type | Description
args | []string | List of arguments of the executed command.
file_details | types.FileDetails | Additional information about the executed file.
path | string | Path to the executed file.
sha256 | types.SHA256Hash | SHA256 hash of the executed file.
is_upper_layer | bool | The binary was executed from the writable upper layer (upperdir). Only applies to overlay filesystems.
is_memfd | bool | The binary was executed from a memfd (memory-backed file).
is_tmpfs | bool | The binary was executed from tmpfs.
is_dropped_binary | bool | The binary was observed being dropped (written at runtime) before execution; this should also have triggered an event_magic_write event.

types.FilePayload

The types.FilePayload type represents additional information about a file related to the event.

Property | Type | Description
path | string | Path to the file related to the event.

types.TCPPayload

The types.TCPPayload type contains additional information about observed TCP actions.

Property | Type | Description
destination | types.AddrPort | Destination of the TCP-related packets.
network_details | types.NetworkDetails | Additional information about the destination based on IP set data.
ip_details | types.IPDetails | Additional information about the destination IP from third-party services (like AbuseIPDB).

types.StdioViaSocketPayload

The types.StdioViaSocketPayload type represents additional information about a potential reverse shell event.

Property | Type | Description
destination | types.AddrPort | Destination of the socket to which the standard input/output (STDIO) file descriptor is bound.
fd | uint32 | File descriptor bound to the socket (0 = STDIN, 1 = STDOUT, 2 = STDERR).

types.SSHPayload

The types.SSHPayload type contains additional information about an observed SSH connection.

Property | Type | Description
flow_direction | types.FlowDirection | Direction of the observed SSH connection.
remote | types.AddrPort | Address details of the remote end of the connection.

types.AddrPort

The types.AddrPort type represents an IP address and port combination.

Property | Type | Description
ip | IP | IP address related to the observed event (depends on the event type).
port | uint16 | Port number related to the observed event (depends on the event type).

types.NetworkDetails

The types.NetworkDetails type contains additional information about a network.

Property | Type | Description
category | types.Category | Category under which the network has been classified.

types.FlowDirection

The types.FlowDirection type represents the direction of network flow.

Name | Description
flow_unknown | Network flow direction is unknown.
flow_ingress | Network flow was classified as incoming.
flow_egress | Network flow was classified as outgoing.

types.Category

The types.Category type represents the categories into which events can be classified.

Name | Description
category_malware | Event was classified as related to malware.
category_crypto | Event was classified as related to cryptocurrency.

types.FileDetails

The types.FileDetails type contains additional details about a file.

Property | Type | Description
category | types.Category | Category into which the file has been classified.
malware_name | string | Name of the identified malware, if the file is related to malware.
malware_version | string | Version of the detected malware, if the file is related to malware.

types.IPDetails

The types.IPDetails type represents additional information about an IP address.

Property | Type | Description
abuse_confidence_score | int | Score from 0 to 100 indicating the confidence that the IP address is malicious.
country_code | string | Country code of the IP address's origin, in ISO 3166-1 alpha-2 format.
domain | string | Domain name related to the IP address.
hostnames | []string | Hostnames associated with the IP address.
ip_address | string | IP address of the event as a string.
ip_version | int | Version of the IP address (4 = IPv4, 6 = IPv6).
is_tor | bool | Whether the IP address is related to the Tor network.
isp | string | Name of the Internet Service Provider (ISP) to which the IP belongs.

types.SOCKS5Payload

The types.SOCKS5Payload type contains additional information about observed SOCKS5 actions.

Property | Type | Description
destination | types.AddrPort | Destination of the SOCKS5 communication. If the address type is socks5_address_domain_name, only the port field is populated.
flow_direction | types.FlowDirection | Direction of the observed SOCKS5 communication.
address_type | types.SOCKS5AddressType | Address type used in the SOCKS5 command. If the command or reply does not contain an address type, this field may be set to socks5_address_unknown.
command_or_reply | uint8 | Command or reply identifier as specified by RFC 1928.
destination_domain | string | Destination domain if address_type is socks5_address_domain_name.
role | types.SOCKS5Role | Role of the observed process in the SOCKS5 communication.

types.SOCKS5AddressType

The types.SOCKS5AddressType type represents the different address types used in SOCKS5 commands or replies.

Name | Description
socks5_address_domain_name | A domain name was observed to be used.
socks5_address_ipv6 | An IPv6 address was observed to be used.
socks5_address_unknown | The address type could not be determined.
socks5_address_ipv4 | An IPv4 address was observed to be used.
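To relate address_type, destination, and destination_domain to the wire format, the Python sketch below parses a SOCKS5 request header as laid out in RFC 1928 (VER, CMD, RSV, ATYP, DST.ADDR, DST.PORT). For a domain-name address the request carries no IP address, which matches the note that only the port of destination is populated in that case. The mapping of ATYP codes to the enum names above is an assumption made for illustration; this is not the engine's parser.

```python
import ipaddress
import struct

# ATYP values from RFC 1928, section 4, mapped to the enum names above (assumed).
ATYP_NAMES = {
    0x01: "socks5_address_ipv4",
    0x03: "socks5_address_domain_name",
    0x04: "socks5_address_ipv6",
}


def parse_socks5_request(data: bytes) -> dict:
    # Request layout: VER | CMD | RSV | ATYP | DST.ADDR | DST.PORT
    ver, cmd, _rsv, atyp = data[:4]
    if ver != 0x05:
        raise ValueError("not a SOCKS5 request")
    result = {
        "command_or_reply": cmd,  # 0x01 CONNECT, 0x02 BIND, 0x03 UDP ASSOCIATE
        "address_type": ATYP_NAMES.get(atyp, "socks5_address_unknown"),
        "destination": None,
        "destination_domain": None,
    }
    rest = data[4:]
    if atyp == 0x01:
        ip, rest = ipaddress.IPv4Address(rest[:4]), rest[4:]
    elif atyp == 0x04:
        ip, rest = ipaddress.IPv6Address(rest[:16]), rest[16:]
    elif atyp == 0x03:
        # Domain names are length-prefixed; no IP address is sent.
        length = rest[0]
        result["destination_domain"] = rest[1:1 + length].decode("ascii")
        ip, rest = None, rest[1 + length:]
    else:
        # Unknown ATYP: the address length is unknown, so stop decoding here.
        return result
    (port,) = struct.unpack("!H", rest[:2])  # DST.PORT in network byte order
    result["destination"] = {"ip": str(ip) if ip is not None else None, "port": port}
    return result


if __name__ == "__main__":
    req = b"\x05\x01\x00\x03" + bytes([11]) + b"example.com" + struct.pack("!H", 443)
    print(parse_socks5_request(req))
```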

types.SOCKS5Role

The types.SOCKS5Role type represents the role of the observed process in SOCKS5 communication.

Name | Description
socks5_role_unknown | Role could not be identified.
socks5_role_client | Event was triggered by a SOCKS5 client.
socks5_role_server | Event was triggered by a SOCKS5 server.

Custom Lists

It is often useful to maintain a list of values that rules can check quickly, for example to test whether a given binary hash is known to be malicious. In the CAST AI security product, this is done with Custom Lists.

Custom Lists can currently be managed only through the corresponding REST API endpoints. To create a custom list, use the /v1/security/runtime/list endpoint. It only requires the name of the list, which is also the name used to reference the list from CEL rules. List names must be unique: the API returns an error if you try to create a list with a name that already exists.

Once a list exists, you can add entries to it, which CEL rules can then check against. To add items to a list, call the /v1/security/runtime/list/{id}/add endpoint. The following entry types are currently supported:

Type | Description
LIST_ENTRY_KIND_SHA256 | A hex-encoded SHA256 hash.
LIST_ENTRY_KIND_IP | An IPv4 or IPv6 address in string format (e.g. 1.2.3.4, fe80::42:bdff:feda:a32e).
LIST_ENTRY_KIND_CIDR | An IP address with a prefix length, in CIDR notation (e.g. 1.0.0.0/8, fe80::42:bdff:feda:a32e/64).
LIST_ENTRY_KIND_STRING | A plain string value.
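Before calling the add endpoint, it can help to check entries locally against the formats in the table. The Python sketch below does this with the standard library. The kind names come from the table, but the specific checks (for example, exactly 64 hex characters for a SHA256 hash, or an explicit /prefix for CIDR entries) are assumptions about what the API accepts, not a description of its server-side validation, and no request payload format is implied.

```python
import ipaddress
import re

_SHA256_HEX = re.compile(r"[0-9a-fA-F]{64}")


def is_valid_entry(kind: str, value: str) -> bool:
    # Hypothetical client-side pre-check; the API's own validation may differ.
    if kind == "LIST_ENTRY_KIND_SHA256":
        # A 32-byte digest encoded as 64 hex characters.
        return _SHA256_HEX.fullmatch(value) is not None
    if kind == "LIST_ENTRY_KIND_IP":
        try:
            ipaddress.ip_address(value)
            return True
        except ValueError:
            return False
    if kind == "LIST_ENTRY_KIND_CIDR":
        if "/" not in value:  # assumption: require explicit prefix notation
            return False
        try:
            # strict=False accepts host bits, as in fe80::42:bdff:feda:a32e/64.
            ipaddress.ip_network(value, strict=False)
            return True
        except ValueError:
            return False
    if kind == "LIST_ENTRY_KIND_STRING":
        return True
    raise ValueError(f"unknown entry kind: {kind}")
```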

To remove items from a list, use the /v1/security/runtime/list/{id}/remove endpoint. All fields of the entry you want to remove must be specified and must match the stored entry.

To delete an entire custom list, use the /v1/security/runtime/list/delete endpoint. The list ID used by all list-related endpoints is returned when the list is created, and can also be retrieved by querying the /v1/security/runtime/list endpoint. NOTE: Lists that are referenced in CEL rules cannot be deleted.

Once a custom list exists, you can reference it from a CEL rule. For example:

event.type == event_exec &&
  customLists("known-malware", "likely-malware").contains(match_sha256, event.exec.sha256)

Custom lists are loaded with the customLists function, which takes one or more list names and returns a custom list matcher. The matcher offers a contains method: its first argument selects the matcher (the kind of value to match), and its second argument is the value to look up. The currently available matchers are:

Matcher | Argument type | Description
match_sha256 | bytes (SHA256 hash) | Checks whether any of the specified lists contains a matching SHA256 hash.
match_ip | IP | Checks whether any of the specified lists contains the given IP.
match_cidr | IP | Checks whether any CIDR in the specified lists contains the given IP.
match_string | string | Checks whether any of the specified lists contains the given string.

You can pass any value to contains, as long as its type matches the argument type required by the matcher.
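The matcher semantics can be modeled in a few lines of Python. In this sketch, a hypothetical in-memory LISTS dictionary stands in for lists stored through the REST API, custom_lists(*names) mirrors customLists(...), and contains(matcher, value) returns true if any of the named lists holds a match. Pairing each matcher with exactly one entry kind (for example, match_ip only checking LIST_ENTRY_KIND_IP entries) is an assumption; how the engine actually loads and indexes lists is not covered here.

```python
import ipaddress

# Hypothetical list contents keyed by list name: (entry kind, stored value).
LISTS = {
    "known-malware": [("LIST_ENTRY_KIND_SHA256", "ab" * 32)],
    "likely-malware": [("LIST_ENTRY_KIND_SHA256", "cd" * 32)],
    "blocked-ips": [
        ("LIST_ENTRY_KIND_IP", "203.0.113.7"),
        ("LIST_ENTRY_KIND_CIDR", "198.51.100.0/24"),
    ],
    "allowed-names": [("LIST_ENTRY_KIND_STRING", "nginx")],
}


class CustomListMatcher:
    def __init__(self, entries):
        self._entries = entries

    def contains(self, matcher: str, value) -> bool:
        for kind, stored in self._entries:
            if matcher == "match_sha256" and kind == "LIST_ENTRY_KIND_SHA256":
                # value is the raw digest, like event.exec.sha256
                if bytes.fromhex(stored) == value:
                    return True
            elif matcher == "match_ip" and kind == "LIST_ENTRY_KIND_IP":
                if ipaddress.ip_address(stored) == value:
                    return True
            elif matcher == "match_cidr" and kind == "LIST_ENTRY_KIND_CIDR":
                if value in ipaddress.ip_network(stored, strict=False):
                    return True
            elif matcher == "match_string" and kind == "LIST_ENTRY_KIND_STRING":
                if stored == value:
                    return True
        return False


def custom_lists(*names: str) -> CustomListMatcher:
    # Entries from all named lists are merged into a single matcher.
    return CustomListMatcher([entry for name in names for entry in LISTS[name]])
```

For example, custom_lists("known-malware", "likely-malware").contains("match_sha256", digest) corresponds to the rule shown above, with digest being the raw bytes of an executed file's hash.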