The Monitoring Solution That Requires Monitoring

By Aviv Zohari, groundcover’s Founding Engineer:

An engineer I know recently set up Datadog in his Kubernetes environment to check out its observability capabilities. He used the system every day for about a week, testing the different features for logs, traces and infrastructure monitoring. Then he got busy with a feature that needed to be released earlier than expected, which took his attention away from Datadog.

There was one small thing my friend wasn’t aware of: as it turns out, there is no rate limiting on Datadog’s log collection mechanism. A single line in a YAML configuration file means that all logs are collected, ingested and stored, no matter how high the projected costs climb, and without any prior notification or warning.

The next week, someone from the billing department stormed into the office demanding to know why the Datadog bill for the month was $33,000. The engineer was shocked: the bill was supposed to be $1,700!

You know that GIF of Homer Simpson slowly backing into the bushes until he disappears? That was my friend at that moment. He described it as a mix of guilt, shame and disbelief.

My friend felt his error was enormous, maybe unforgivable. But was it really his fault?

The Datadog complexity

Datadog’s pricing model is not easily digestible. It is influenced by many factors, including data volume, retention period and cluster size. It’s a confusing puzzle of financial variables and contractual caveats that ultimately make some costs impossible to detect ahead of time, let alone predict. Just ask the recent recipient of their $65 million bill.

Avoiding situations like the one my friend experienced requires constant monitoring of the monitoring solution. That slows down both daily work and long-term growth.

Observability affordability

The high cost of observability is a complex issue that is increasingly top-of-mind for many organizations. This is especially true now that IT leaders, and often chief executives themselves, have realized that they must act to better manage their infrastructure budgets, which seem to have spun out of control in the cloud age.

The shift to microservices and distributed architectures has led to an explosion in the amount of data that needs to be observed. With traditional methods, more data translates to higher costs. It also means significant resource consumption, which, you guessed it, leads to even higher costs, as well as to inefficiencies.

Datadog is only one example of a complex observability pricing structure. Most observability tools out there have pricing models that make it impossible to optimize spending, let alone predict it. Applications produce large volumes of log data. This should be an advantage, as it makes it easier for developers to identify and troubleshoot issues. Instead, it has become a major cause for concern, with best-practice guides advising teams to simply monitor less or to shorten their log retention period. In other words, the best remedy they offer for fat bills is a data eating disorder.

Woah, kernelly!

eBPF (extended Berkeley Packet Filter) is a groundbreaking technology that has significantly impacted the Linux kernel. It lets sandboxed programs run inside the kernel, offering a new way to extend the kernel’s capabilities safely and efficiently.

Because eBPF programs run at specific hook points in the kernel, they can gather data without significant overhead, so the application’s resources are not heavily consumed. An eBPF program can watch each packet as it enters or exits the host and then map it onto the process or container running on that host. This provides super-granular visibility into network traffic and detailed insight into what’s happening within the system.
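To make the packet-to-container mapping described above a little more concrete, here is a minimal user-space sketch in Python. It does not load or run any eBPF program. It assumes that a kernel-side probe has already emitted one event per packet tagged with the owning PID, and that a PID-to-container lookup table is available. The event shape, the names `PacketEvent` and `attribute_traffic`, and the container names are all hypothetical illustrations, not groundcover’s implementation or any real agent’s API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketEvent:
    """Hypothetical shape of a per-packet event reported by a kernel probe."""
    pid: int        # process that owns the socket
    direction: str  # "ingress" or "egress"
    length: int     # packet size in bytes


def attribute_traffic(events, pid_to_container):
    """Sum bytes per (container, direction).

    PIDs missing from the (hypothetical) lookup table are attributed
    to "host", e.g. processes not running inside any container.
    """
    totals = defaultdict(int)
    for ev in events:
        container = pid_to_container.get(ev.pid, "host")
        totals[(container, ev.direction)] += ev.length
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical mapping; a real agent would derive this from cgroup data.
    pid_to_container = {101: "checkout", 202: "payments"}
    events = [
        PacketEvent(101, "egress", 1500),
        PacketEvent(101, "ingress", 600),
        PacketEvent(202, "egress", 900),
        PacketEvent(999, "egress", 64),
    ]
    for (container, direction), nbytes in sorted(
        attribute_traffic(events, pid_to_container).items()
    ):
        print(f"{container:10} {direction:8} {nbytes} bytes")
```

The aggregation happens in user space only for readability here; in practice an agent could keep such counters in kernel-side maps and read them periodically, which is one way to keep overhead low.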

In addition, eBPF-powered agents operate separately from the applications they monitor, so they don’t interfere with those applications’ primary functions. This makes application performance monitoring a process with near-zero impact on microservice resources and preserves the application’s performance (shouldn’t this be the #1 goal of any observability platform?).

As technology keeps advancing and infrastructure grows more intricate, achieving top-notch observability remains an ongoing and dynamic challenge. As organizations navigate the complexities of data management, cost efficiency and performance, staying competitive demands a steady commitment to innovation and an open embrace of transformative technologies such as eBPF. That is how observability can become a tool of empowerment rather than a hindrance.

Observability should help engineers feel like heroes. I believe no one should be made to feel the way my friend felt that day. Observability platforms should promise full protection from unexpected overhead caused by the platforms themselves, total immunity against surprise spikes in data volumes and subscription bills, and no more awkward interactions with billing departments.

Aviv Zohari

Aviv is a founding engineer at groundcover, a startup with a mission to reinvent the cloud-native application monitoring domain with eBPF. His focus is gaining a deep understanding of software systems in order to design and build accessible, high-performance platforms.

In his previous life, Aviv was a security researcher figuring out the ins and outs of weird machines. In his free time he is passionate about gaming, CrossFit and playing the piano.
