Building a new shift-left approach for alerting

Tal Borenstein
April 10, 2023

Keep is an open-source alerting CLI tool that @shaharglazner and I built out of a pain we felt throughout our careers as developers and engineering managers. Alerting (aka monitors/alarms) has always felt like a second-class citizen across monitoring, observability, and infrastructure tools: each offers a very narrow feature set, which in turn leads to poor alerts, alert fatigue (yes, your muted Slack channel), unreliable products, and complete alerting hell.

It's not only that we couldn't create better application and infrastructure alerts; it's also that maintaining them and ensuring they keep working over time is hard.

Organizations today use so many different tools for alerting that managing them all has become an absolute nightmare.

Alerting as a first-class citizen

The best way to describe what we had in mind when we first built Keep is how one of our first users put it:

"Keep is doing to alerting what GitHub Actions did to CI/CD."

There were three main guidelines when we started coding:

  1. Good alerts are not just thresholds or log queries; they should be treated as workflows with multiple "tests" (steps/actions).
  2. The tool should be 100% data agnostic: agnostic to where the data resides, covering not only "traditional" monitoring data sources but also, for example, a database. There's no real reason this shouldn't be abstracted away from developers.
  3. Alerts are maintained in, and live alongside, your code, allowing them to be integrated with all CI/CD processes (imagine a gate that fails your PR when you break an alert); a rough sketch of this idea follows the list.
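
To make these guidelines concrete, here is a minimal, hypothetical Python sketch of an alert treated as a workflow of steps ("tests") and actions that lives in code. This is not Keep's actual syntax or API; the class names, step conditions, and notification action are illustrative stand-ins for whatever data sources and destinations you would actually wire in.

```python
# Hypothetical sketch (not Keep's actual syntax): an alert as a small
# workflow of steps ("tests") and actions, defined in code so it can be
# reviewed and gated in CI like any other change.

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One 'test' in the alert workflow, e.g. a metrics query or a DB check."""
    name: str
    condition: Callable[[], bool]  # returns True when this part of the alert fires


@dataclass
class AlertWorkflow:
    name: str
    steps: List[Step] = field(default_factory=list)
    actions: List[Callable[[], None]] = field(default_factory=list)

    def evaluate(self) -> bool:
        """Run all steps; trigger the actions only if every condition holds."""
        if all(step.condition() for step in self.steps):
            for action in self.actions:
                action()
            return True
        return False


# Stubbed data sources -- in practice these would query a monitoring tool,
# a database, or any other backend; the alert itself stays data agnostic.
def error_rate() -> float:
    return 0.07


def minutes_since_last_deploy() -> int:
    return 12


alert = AlertWorkflow(
    name="error-spike-after-deploy",
    steps=[
        Step("error_rate_high", lambda: error_rate() > 0.05),
        Step("recent_deploy", lambda: minutes_since_last_deploy() < 30),
    ],
    actions=[lambda: print("notify #oncall: error spike after deploy")],
)

if __name__ == "__main__":
    alert.evaluate()
```

Because the alert is just code, a CI job could import it and run evaluate() against fixtures; that, in essence, is the "gate that fails your PR when you break alerts."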

What's Ahead?

We constantly try to improve on our promise:

Try our first mock alert and get it up and running in <5 minutes.

So we're adding plenty more deployment options, providers, and functions, and we're working on further simplifying the syntax.

What do you think about the need for this kind of "abstraction"? What do you think about alerts as post-production tests? How do you manage and control your alerting chaos right now?

We would love to hear your thoughts; feel free to comment here, on our GitHub repo, or in our Slack.

AIOps! Finding Incidents in the Haystack of Alerts

Picture this: a flood of alerts pouring in from various monitoring systems, each clamoring for attention. Amidst this deluge, identifying critical incidents is akin to finding a needle in a haystack.

Tal Borenstein
April 11, 2024

Unifying alerts from various sources

Demonstrating the strength of a unified API in consolidating and managing alerts.

Shahar Glazner
November 26, 2023

Observability vendor lock-in is in the small details

In the world of observability, vendor lock-in slows progress and spikes costs. OpenTelemetry broke some chains but didn't free us entirely. This post shows the bridge between talk and action and how platforms like Keep offer flexibility, interoperability, cost optimization, community-driven support, and an escape from vendor lock-in traps. If you maintain >1 observability/monitoring system, are concerned with vendor lock-in, and need help keeping track of what's going on and where, this post is for you.

Tal Borenstein
October 31, 2023

Extending Grafana with Workflows

We all have that one service that, for some Phantom-de-la-machina reason, gets stuck and requires some manual action, like maybe a reboot or a REST call.

Gil Zellner
September 14, 2023

Getting started with Keep — Observability Alerting with ease

Creating and maintaining effective alerts, avoiding alert fatigue, and promoting a strong alerting culture can be difficult tasks. Keep addresses these challenges by treating alerts as code, integrating with observability tools, and using LLMs.

Daniel Olabemiwo
May 14, 2023

Current problems in the alerting space

In the past month, we have engaged in conversations with over 50 engineers, engineering managers, and SREs to gather feedback on the products we are developing at Keep. Here is a summary of what we have learned.

Shahar Glazner
March 19, 2023