Getting started with Keep — Observability Alerting with ease

Daniel Olabemiwo
May 14, 2023


TL;DR:

Creating and maintaining effective alerts, avoiding alert fatigue, and promoting a strong alerting culture can be difficult tasks. Keep addresses these challenges by treating alerts as code, integrating with observability tools, and using LLMs. (Culled from Keep.)

Observability is the ability to measure a system’s current state based on the data it generates, such as logs, metrics, and traces. It is important for a number of reasons, including identifying problems early, tracking performance over time, and troubleshooting issues. Observability has become more critical in recent years as engineering environments have grown more complex.

It is important to have a good understanding of the system being monitored. This knowledge helps engineers interpret the data collected by monitoring and tracing tools.

Alerting allows engineers to be notified of potential problems before they cause major disruptions. By monitoring system metrics and logs, engineers can identify trends that may indicate an impending failure. When these trends are detected, alerts can be sent to engineers so that they can take corrective action before the problem escalates.

For example, an alert could be triggered if a system’s CPU usage exceeds a certain threshold. This alerts engineers that the system is under heavy load and may need to be scaled to keep performance acceptable.

Where does Keep fit in?

Keep is a developer tool that treats alerts as workflows and integrates with existing observability tools. With Keep, you can manage your alerts just as you manage your tests: stored in the application repository and integrated with your CI/CD.
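To make that idea concrete, here is a rough sketch of how alerts stored in a repository could be exercised from CI. This is a hypothetical GitHub Actions workflow written for illustration, not an official Keep integration; it reuses the same keephq/cli Docker image and run command shown later in this post.

# Hypothetical CI job (illustration only, not an official Keep integration):
# run an alert definition through the Keep CLI on every push, the same way tests run.
name: alerts
on: [push]
jobs:
  run-alerts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run alert with Keep CLI
        # -it is dropped here because CI has no interactive terminal
        run: |
          docker run -v ${PWD}:/app keephq/cli run --alert-url https://raw.githubusercontent.com/keephq/keep/main/examples/alerts/db_disk_space.yml

In a real setup you would also supply the providers.yaml created in the next section and point the command at the alert files in your own repository (check the CLI help for the exact flags).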

Getting started with Keep using Docker

The Keep team already provides some example alerts, and we will run one of them in this post. You can check them out on GitHub.

A provider, in this context, is Keep’s term for a data source.

docker run -v ${PWD}:/app -it keephq/cli config provider --provider-type slack --provider-id slack-demo
Config file: providers.yaml
Slack Webhook Url: https://hooks.slack.com/services/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

Note: At the time you read this blog post, the Slack webhook URL might have changed, so it’s important to check the KeepHQ Slack for an updated webhook URL.

If your Slack webhook URL is correct, you should get a config file success message.

Slack Webhook URL on my CLI

You can check the KeepHQ Slack for how to obtain a demo webhook URL.


docker run -v ${PWD}:/app -it keephq/cli config provider --provider-type slack --provider-id slack-demo
Config file: providers.yaml
Slack Webhook Url: https://hooks.slack.com/services/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
Config file created at providers.yaml

If you’re using Docker Desktop, you can see the keephq/cli image in your containers list.
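Before moving on, it can be useful to peek at the providers.yaml file the CLI just created. The layout below is only my assumption based on the prompts above (the provider id slack-demo and a Slack webhook URL); open the generated file to see the real structure.

# Illustrative sketch of the generated providers.yaml; key names are assumed,
# so check the file the CLI actually wrote for the authoritative layout.
slack-demo:
  authentication:
    webhook_url: https://hooks.slack.com/services/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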

To run your first alert, run the following in your CLI:


docker run -v ${PWD}:/app -it keephq/cli -j run --alert-url https://raw.githubusercontent.com/keephq/keep/main/examples/alerts/db_disk_space.yml

Below is a view of the CLI output after running the command above.

Executing alert on CLI

You have successfully executed Keep’s demo “Paper DB has insufficient disk space” alert.
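If you are curious about what you just ran, the db_disk_space.yml file linked above defines the alert as code. The snippet below is a simplified, illustrative sketch of that style of definition (step names, fields, and the mocked disk-space command are approximations); refer to the file in the Keep repository for the authoritative version.

# Simplified sketch of a Keep alert-as-code definition (field names approximate):
# a step collects disk usage, and an action notifies Slack when it crosses a threshold.
alert:
  id: db-disk-space
  description: Alert when the Paper DB runs low on disk space
  steps:
    - name: db-disk-usage
      provider:
        type: mock
        with:
          command: df -h | awk 'NR==2 { print $5 }'   # mocked disk usage, e.g. "91%"
  actions:
    - name: trigger-slack
      condition:
        - type: threshold
          value: "{{ steps.db-disk-usage.results }}"
          compare_to: 90%
      provider:
        type: slack
        config: "{{ providers.slack-demo }}"
        with:
          message: "Paper DB has insufficient disk space ({{ steps.db-disk-usage.results }} used)"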

You can check the alerts-playground channel in the Keep Slack if you used the Keep webhook URL shown in this article.

Alert on the KeepHQ Slack channel

Congratulations on your first alert!

