Epsagon Documentation

Epsagon Alerts' Best Practices

Here are a few of the most popular use cases for environment alerts.

Detect Function Slowdowns

Alerts can monitor the durations of a group of Lambda functions. If the duration crosses a defined threshold, users are alerted, which makes it possible to identify slowdowns in your system immediately.

Create a Lambda duration alert:

  • Select an alert type - 'Lambda Metric'
  • Select an aggregation function - 'max' (recommended)
  • Select the invocation metric - 'duration'
  • Select an operator and the threshold value - ' > X'
  • Select the timeframe in which to check the metric value
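
As a rough illustration of what such an alert checks, the sketch below takes the 'max' duration over the invocations inside the timeframe and compares it to the threshold. This is not Epsagon's implementation or API: the `Invocation` type, the `duration_alert_fires` function, and all field names and values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Invocation:
    """Hypothetical record of one Lambda invocation."""
    function_name: str
    timestamp: datetime
    duration_ms: float


def duration_alert_fires(invocations, now, timeframe, threshold_ms):
    """Return True if the max duration within the timeframe exceeds the threshold."""
    window_start = now - timeframe
    recent = [i.duration_ms for i in invocations if window_start <= i.timestamp <= now]
    # No invocations in the window means there is nothing to alert on.
    return bool(recent) and max(recent) > threshold_ms


now = datetime(2024, 1, 1, 12, 0)
invocations = [
    Invocation("checkout", now - timedelta(minutes=2), 180.0),
    Invocation("checkout", now - timedelta(minutes=4), 950.0),
    Invocation("checkout", now - timedelta(minutes=30), 4000.0),  # outside the window
]
fires = duration_alert_fires(invocations, now, timedelta(minutes=5), threshold_ms=900.0)
print(fires)
```

Using 'max' means a single slow invocation inside the timeframe is enough to trigger the alert, which is presumably why it is the recommended aggregation for catching slowdowns early.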

Alert on Expensive Resources

Alerts can also be triggered when a cost threshold is passed. This lets users catch any major spike in the cost of their services.

Create a Lambda cost alert:

  • Select an alert type - 'Lambda Metric'
  • Select an aggregation function - 'max' (recommended)
  • Select the invocation metric - 'cost'
  • Select an operator and the threshold value - ' > X'
  • Select the timeframe in which to check the metric value

Reduce Alert Fatigue

Alerting on every error can result in alert fatigue. When some errors are expected, set a threshold so that an alert fires only once the number of errors exceeds it.

Create a Lambda error alert:

  • Select an alert type - 'Lambda Metric'
  • Select an aggregation function - 'max' (recommended)
  • Select the invocation event on which to base the metric - 'errors'
  • Select an operator and the threshold value - ' > X'
  • Select the timeframe in which to check the metric value
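
To make the trade-off concrete, the following sketch compares one notification per error with a thresholded alert per time window. It is only an illustration under assumed data: the `alerts_per_window` helper and the window indexes are hypothetical and are not part of Epsagon.

```python
from collections import Counter


def alerts_per_window(error_windows, threshold):
    """Return the window indexes whose error count is greater than the threshold."""
    counts = Counter(error_windows)
    return sorted(w for w, n in counts.items() if n > threshold)


# Each entry is the (hypothetical) index of the time window an error fell into.
error_windows = [0, 0, 1, 2, 2, 2, 2, 2, 3]

every_error = len(error_windows)  # one notification per error
thresholded = alerts_per_window(error_windows, threshold=3)
print(every_error, thresholded)
```

With these sample values, alerting on every error would produce nine notifications, while the thresholded alert fires only for the one window in which errors actually spiked.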

Get Alerts on Specific Customers or Events

Alerts can also be based on custom labels, tags, or data within the payload, which gives you great flexibility and many unique dimensions to alert on. For example, users may want to be alerted only when a condition is crossed for specific customers.

Create a trace event alert:

  • Select an alert type - 'Trace'
  • Select a tag
  • Select an aggregation function
  • Select the metric or tag you want to check. Possible metrics are indexed numeric trace tags.
  • Select an operator and threshold value
  • Use the filter to target a specific customer by selecting the exact JSON path to the customer ID. Note that you must first use Index Tags to index that field in the trace.
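
The idea of filtering by a JSON path and then aggregating a numeric tag can be sketched as follows. The payload shape, the `customer.id` path, the `duration_ms` field, and the `get_by_path` helper are all assumptions made for illustration. They do not reflect Epsagon's trace format or filter syntax.

```python
def get_by_path(payload, path):
    """Follow a dotted path such as 'customer.id' through nested dicts; None if absent."""
    current = payload
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical trace payloads.
traces = [
    {"customer": {"id": "acme"}, "duration_ms": 1200},
    {"customer": {"id": "globex"}, "duration_ms": 300},
    {"customer": {"id": "acme"}, "duration_ms": 450},
    {"duration_ms": 5000},  # no customer field, so the filter skips it
]

# Keep only traces for one customer, then aggregate the numeric field.
acme = [t for t in traces if get_by_path(t, "customer.id") == "acme"]
max_duration = max(t["duration_ms"] for t in acme)
print(len(acme), max_duration)
```

Because the filter is applied before the aggregation, the very slow trace without a customer ID does not affect the value that is compared to the threshold.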

Be on Top of Any Production Issues

It is crucial to differentiate between Production issues and any other environment issues. One of the most common mistakes users make is not splitting alerts into different channels based on environment. This could lead to Production and Pre-Production alerts going into the same channel, potentially resulting in alert fatigue.

To avoid that, follow these rules when setting up an alert:

  • Make sure you select the correct application (Production applications)
  • Assign a dedicated channel for each environment (most recommended: Slack or Teams specific channels)
  • Name each alert so that it is clearly identifiable as a production alert
  • Use application-based alerts
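
The rules above amount to a mapping from environment to channel plus a naming convention. The sketch below shows one way to express that. The channel names, the `route_alert` function, and the application name are hypothetical, and nothing here calls Slack, Teams, or Epsagon.

```python
# Hypothetical mapping: one dedicated channel per environment.
CHANNELS = {
    "production": "#alerts-prod",
    "staging": "#alerts-staging",
}


def route_alert(application, environment, message):
    """Return (channel, text) for an alert, with the environment prefixed in the text."""
    try:
        channel = CHANNELS[environment]
    except KeyError:
        raise ValueError(f"no channel configured for environment {environment!r}")
    return channel, f"[{environment.upper()}] {application}: {message}"


routed = route_alert("orders-api", "production", "max duration > 900 ms")
print(routed)
```

Failing loudly on an unknown environment, rather than falling back to a shared default channel, keeps Production and Pre-Production alerts from silently mixing.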

Identify Database or Other Resource Related Issues

Epsagon lets you alert not only on a trace's payload, but also on any resource or operation detected by Epsagon agents. Users can easily create a trace-based alert that fires whenever a database shows increased latency or a high percentage of errors.

Create a trace event alert:

  • Select an alert type - 'Trace'
  • Select a tag
  • Select an aggregation function
  • Select the metric to be monitored - e.g. 'error = true'
  • Select an operator and threshold value
  • Use the filter for specific resources: e.g. aws.dynamodb.table equals to .
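
A "high percentage of errors" for one resource can be computed as in this minimal sketch. The event dictionaries, the `error_percentage` helper, and the table name `orders` are assumptions for illustration only. The `aws.dynamodb.table` key simply mirrors the tag named in the step above.

```python
def error_percentage(events, tag, value):
    """Percentage of events matching tag == value that have error set to True."""
    matching = [e for e in events if e.get(tag) == value]
    if not matching:
        return 0.0  # no traffic for this resource, so nothing to report
    errors = sum(1 for e in matching if e.get("error") is True)
    return 100.0 * errors / len(matching)


# Hypothetical events; "orders" and "users" are made-up table names.
events = [
    {"aws.dynamodb.table": "orders", "error": False},
    {"aws.dynamodb.table": "orders", "error": True},
    {"aws.dynamodb.table": "orders", "error": False},
    {"aws.dynamodb.table": "orders", "error": False},
    {"aws.dynamodb.table": "users", "error": True},
]

rate = error_percentage(events, "aws.dynamodb.table", "orders")
print(rate)
```

Filtering by resource first means that errors on other tables do not inflate the percentage for the table being monitored.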

Updated 3 months ago

