Epsagon Documentation

Configure Alerts

Epsagon alerts support many use cases: Kubernetes metrics, application performance metrics, and application exceptions.


Exception alerts

Exception alerts are based on exceptions collected by the tracing libraries, so they are available only for traced services and code.

Manage your alert rules

  • To manage your alert rules, go to Alerts.
  • To create a new alert rule, click Create New Alert.
    To set up an alert for exceptions, go to the Alerts page and create a new "simple" alert. For the alert type, choose Exception.

Lambda Events

Lambda event alerts are triggered for each Lambda invocation event of specific types.

  • Select the event types that trigger the alert. Supported event types are: Timeout, Out of Memory, Code Exception (available for traced functions), Function Error (identified by CloudWatch logs), Insight (function is close to the time limit or memory limit).
  • Filter which Lambda functions will trigger the alert by Application, specific Functions or AWS Accounts.
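
As a rough illustration of the two steps above, the sketch below matches an invocation event against an alert rule. It is not Epsagon's implementation or API: the class names, field names, and event type identifiers are hypothetical, and it assumes that the Application, Function, and AWS Account filters are combined with AND and that an empty filter means "match everything".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LambdaEvent:
    # Hypothetical shape of a single Lambda invocation event.
    event_type: str       # e.g. "timeout", "out_of_memory", "insight"
    function_name: str
    application: str
    aws_account: str


@dataclass
class EventAlertRule:
    # Event types that trigger the alert (first step above).
    event_types: set
    # Optional filters (second step above); an empty set means "no filter".
    functions: set = field(default_factory=set)
    applications: set = field(default_factory=set)
    aws_accounts: set = field(default_factory=set)

    def matches(self, event: LambdaEvent) -> bool:
        if event.event_type not in self.event_types:
            return False
        filters = (
            (self.functions, event.function_name),
            (self.applications, event.application),
            (self.aws_accounts, event.aws_account),
        )
        # Assumption: every non-empty filter must contain the event's value.
        return all(not allowed or value in allowed for allowed, value in filters)


rule = EventAlertRule(
    event_types={"timeout", "out_of_memory"},
    applications={"checkout"},
)
event = LambdaEvent("timeout", "charge-card", "checkout", "111111111111")
print(rule.matches(event))
```

Because the alert is evaluated per invocation event, no time window or aggregation is involved here; that is the main difference from the metric-based alerts.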

Lambda Metric

Lambda metric alerts are triggered when a metric computed over Lambda invocations crosses a specific threshold over a period of time. For example, when the average number of invocations ending in a timeout is greater than 10 over a period of 15 minutes.

To create a Lambda metric alert:

  • Select an aggregation function
  • Select the invocation event on which to base the metric
  • Select an operator and the threshold value
  • Select the timeframe in which to check the metric value before triggering the alert
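
The four choices above (aggregation, event, operator and threshold, timeframe) can be pictured with a minimal sketch like the one below. It only illustrates the general idea of a windowed threshold check; the function name, the aggregation and operator keys, and the sample format are assumptions, not Epsagon's actual evaluation logic.

```python
import operator
from statistics import mean

# Hypothetical names for the aggregation functions and operators.
AGGREGATIONS = {"avg": mean, "sum": sum, "min": min, "max": max}
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def metric_alert_fires(samples, now, window_seconds, aggregation, op, threshold):
    """Return True if the aggregated metric in the window crosses the threshold.

    samples: list of (timestamp_seconds, value) pairs, for example the number
    of invocations ending in a timeout during each minute.
    """
    in_window = [value for ts, value in samples if now - window_seconds <= ts <= now]
    if not in_window:
        # Assumption: with no data in the timeframe the alert does not fire.
        return False
    aggregated = AGGREGATIONS[aggregation](in_window)
    return OPERATORS[op](aggregated, threshold)


# 15 one-minute samples with 12 timeouts each, checked over a 15-minute window.
timeouts_per_minute = [(minute * 60, 12) for minute in range(15)]
print(metric_alert_fires(timeouts_per_minute, now=14 * 60, window_seconds=15 * 60,
                         aggregation="avg", op=">", threshold=10))
```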

Kubernetes Alerts

Kubernetes alerts trigger when a Kubernetes metric crosses a threshold. For example, when the average CPU usage of all pods is above 95% for 5 minutes.

To create a Kubernetes alert:

  • Select which Kubernetes entity you want to monitor: Node, Pod, or Container
  • Select a metric to check
  • Select an operator and threshold value
  • Select a timeframe in which to check the metric value before triggering the alert
  • Select which entities to check based on Deployment, Namespace or Cluster.

Trace Metrics Alerts

Trace metric alerts trigger when a condition on trace data is met. For example, when the average duration of putting items in a specific DynamoDB table is longer than 2 seconds.

To create a trace metric alert:

  • Select an aggregation function
  • Select the metric you want to check. Possible metrics are indexed numeric trace tags.
  • Select an operator and threshold value
  • Filter traces to match the use case you want to check. For example, filter traces by a Kafka stream name, or Kubernetes cluster.
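
To make the steps above concrete, here is a small sketch that filters traces by tag values and aggregates a numeric tag. The trace representation (plain dictionaries) and the tag names `table`, `operation`, and `duration_ms` are hypothetical and chosen only for illustration; real tag names depend on what your tracing setup indexes.

```python
from statistics import mean


def trace_metric_value(traces, metric_tag, filters, aggregate=mean):
    """Aggregate a numeric tag over the traces that match every filter.

    traces: list of dicts mapping tag name to tag value.
    filters: dict of tag name to the required value.
    Returns None when no matching trace carries a numeric value for the tag.
    """
    values = [
        trace[metric_tag]
        for trace in traces
        if all(trace.get(tag) == wanted for tag, wanted in filters.items())
        and isinstance(trace.get(metric_tag), (int, float))
    ]
    return aggregate(values) if values else None


traces = [
    {"table": "orders", "operation": "PutItem", "duration_ms": 2500},
    {"table": "orders", "operation": "PutItem", "duration_ms": 1900},
    {"table": "users", "operation": "PutItem", "duration_ms": 9000},
]
avg_ms = trace_metric_value(traces, "duration_ms", {"table": "orders", "operation": "PutItem"})
# The alert condition: average duration longer than 2 seconds (2000 ms).
print(avg_ms is not None and avg_ms > 2000)
```

Filtering before aggregating matters here: without the `table` filter, the slow trace against the other table would dominate the average.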

Epsagon Metrics Alerts

"Epsagon Metrics" alerts let you configure alerts based on metrics collected by your Prometheus instance. See our Kubernetes integration page for further information.

To create an "Epsagon Metrics" alert:

  • Select the metric you want to alert on.
  • Select an aggregation for the metric.
  • Filter the metric by specific filters.

You also need to define how the threshold is evaluated:

  • Frequency - How often the alert rule is checked.
  • Evaluate For - How long the threshold must be met continuously before the alert fires.

Select the conditions for your alert:

  • Reducer - Choose a reducing function for your query results.
  • Query - Choose which query these conditions apply to.
  • From, To - Choose a timeframe for the alert's evaluation.
  • Operator - The comparison operator to apply against the threshold.
  • Threshold - Your selected threshold.
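
The interaction between Frequency, Evaluate For, and the condition fields can be sketched as follows. This is an assumption-laden illustration, not Epsagon's documented behavior: the state names "ok", "pending", and "firing" are borrowed from common alerting conventions, the reducer and operator keys are hypothetical, and the sketch assumes the Evaluate For timer starts at the first check where the condition is met and resets whenever it is not.

```python
import operator
from statistics import mean

# Hypothetical reducer and operator names.
REDUCERS = {"avg": mean, "min": min, "max": max, "last": lambda xs: xs[-1]}
OPERATORS = {">": operator.gt, "<": operator.lt}


def condition_met(series, reducer, op, threshold):
    """Reduce the query results for the From/To timeframe and compare to the threshold."""
    return bool(series) and OPERATORS[op](REDUCERS[reducer](series), threshold)


def alert_states(evaluations, frequency_s, evaluate_for_s):
    """Return the alert state after each check.

    evaluations: one boolean per check (the result of condition_met),
    with consecutive checks spaced frequency_s seconds apart.
    """
    states = []
    breached_for = None  # seconds the condition has held continuously
    for met in evaluations:
        if not met:
            breached_for = None
            states.append("ok")
            continue
        breached_for = 0 if breached_for is None else breached_for + frequency_s
        states.append("firing" if breached_for >= evaluate_for_s else "pending")
    return states


# Checked every 60 seconds; the condition must hold for 120 seconds to fire.
checks = [condition_met(s, "avg", ">", 95) for s in ([96, 98], [97, 99], [99, 99], [40, 50], [96, 97])]
print(alert_states(checks, frequency_s=60, evaluate_for_s=120))
```

Under these assumptions, a short spike that clears before Evaluate For elapses never leaves the "pending" state, which is the usual reason for setting Evaluate For to more than one Frequency interval.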

Updated 2 months ago
