The Best and Worst Reasons to Adopt OpenTelemetry
It was a rainy day in Seattle at KubeCon + CloudNativeCon North America in December 2018 when I first encountered the term ‘OpenTelemetry.’ At that time, I was an active member of a working group focused on developing W3C Trace Context, a standard now extensively employed for context propagation in distributed systems. It is likely no coincidence that some key individuals in this working group also played a pivotal role in orchestrating the merger between two prominent open source telemetry projects: OpenCensus and OpenTracing. I never would’ve guessed that, in 2023, OpenTelemetry would become the second-largest project in the CNCF, sitting just behind Kubernetes.
As is the case with any popular new technology like OpenTelemetry, the hype can make it difficult to discern signal from noise. During my years working in observability, I’ve had the chance to chat with many organizations that were thinking about using OpenTelemetry or were already actively using it. Through those conversations, I’ve heard many reasons for adopting OpenTelemetry, some good and some bad. In this post, I will share examples of the best and worst reasons to adopt OpenTelemetry to help you cut through the noise.
Poor Reason One: Following the Crowd
It’s never a good idea to adopt something solely because others are doing it. Just because you saw a conference talk praising a new technology as the greatest innovation since sliced bread doesn’t necessarily mean that technology is suitable for your specific use case.
Speaking of which, have you clearly defined your use case? To start, ask yourself the following three questions:
Are you able to determine at any given moment whether your systems are performing as expected, such as meeting their service level objectives (SLOs)?
When something goes wrong, can you easily pinpoint the root cause?
Are you able to drill into previously unidentified problems, also known as “unknown unknowns”?
If you answered “no” to any of these questions, you certainly have observability gaps worth tackling, and OpenTelemetry may help close some of them.
Poor Reason Two: Building Your Own Observability Product
Gathering telemetry data can be a challenge, and with OpenTelemetry now handling essential signals like metrics, traces and logs, you might feel the urge to save your company some cash by building your own system. As a developer myself, I totally get that feeling, but I also know how easy it is to underestimate the effort involved by just focusing on the fun parts when kicking off the project. No joke, I’ve actually seen organizations assign teams of 50 engineers to work on their observability stack, even though the company’s core business is something else entirely.
Keep in mind that data collection is just a small part of what observability tools do these days. The real challenge lies in data ingestion, retention, storage and, ultimately, delivering valuable insights from your data at scale. Chances are, a homegrown solution won’t give you a return on investment that your company would appreciate.
However, whether you build your own system or use an existing product, OpenTelemetry may be a good fit for your observability stack.
Poor Reason Three: ‘Vendor Neutrality’
People I chat with often cite vendor neutrality as a major reason for their interest in OpenTelemetry. While there’s some truth to this, which I’ll touch on shortly, it’s important to take this argument with a grain of salt to avoid over-promising your management.
If you’re a developer, you’ve likely worked with database abstraction libraries and have probably mentioned to someone that they allow you to swap out the database at any time, using MySQL today and Oracle tomorrow, for instance. While that’s the promise, I’ve never seen it happen in my career. There’s more to consider than just the SQL dialect when switching a database backend.
In the same vein, the idea of vendor neutrality suggests that you can easily switch your observability backend. While this might be true in theory, in practice, vendor lock-in often occurs on the backend, where people and processes become closely tied to the tool currently in use.
So, if you believe you’ve achieved vendor neutrality by going all-in with OpenTelemetry, I’d encourage you to think twice and consider building a stronger case. Vendor neutrality should not be your only reason.
With all that said, it’s time to focus on the good reasons to use OpenTelemetry today. Trust me, there are plenty, and they’re definitely worth considering.
Good Reason One: Bridging Gaps With Manual Instrumentation
Nearly every observability tool out there today relies on some degree of auto-instrumentation. This means once you’ve added their respective language SDK to your code, the library will utilize a technique called “monkey patching” to insert observability code that runs automatically.
At the very least, auto-instrumentation will search for recognized libraries and APIs and then add some code to indicate the start and end of well-known function calls. Additionally, auto-instrumentation takes care of capturing the current context from incoming requests and forwarding it to downstream requests.
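To make that concrete, here is a minimal, purely illustrative sketch of how an agent might monkey-patch a function it recognizes. The `fetch_user` function and the `recorded_spans` list are made-up stand-ins, not any vendor’s real API:

```python
import functools
import time

# Stand-in for a well-known library function an agent would recognize.
def fetch_user(user_id):
    return {"id": user_id}

recorded_spans = []  # stand-in for the agent's internal span buffer

def instrument(fn):
    """Wrap fn so every call records a start/end 'span' with its duration."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            recorded_spans.append((fn.__name__, time.perf_counter() - start))
    return wrapper

# The "patch": swap the module-level reference for the wrapped version.
fetch_user = instrument(fetch_user)

user = fetch_user(42)  # runs as before, but now leaves a span behind
```

Real agents do this at import time across many known libraries, but the principle is the same: replace a reference with a wrapper that records timing and context.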
However, auto-instrumentation falls short when it comes to pure userland code, particularly for dynamic languages like JavaScript. It won’t catch your blocking Fibonacci function that’s slowing down all your requests by 50%. In some cases, you might also have to add your own context-propagation logic to RPC calls not captured by the auto-instrumentation.
That’s where manual instrumentation comes in. In the past, vendors provided an API for their SDK, allowing developers to manually add instrumentation code to segments that weren’t auto-instrumented. This often resulted in code being littered with vendor-specific instructions. This brings us back to the topic of vendor neutrality. While adding a dependency to your project is usually no big deal, it becomes cumbersome when 20% of your codebase is made up of vendor-specific observability instrumentation.
OpenTelemetry addresses this issue by offering both auto-instrumentation and a vendor-neutral API for manual instrumentation. This allows observability vendors to access the data while maintaining vendor neutrality within your codebase.
Good Reason Two: Sending Data to Multiple Vendors
I’ve yet to come across a large organization that relies on just one observability tool. Different parts of the organization have unique needs and perspectives when it comes to data, and it’s rare that a tool designed for SREs satisfies every use case developers have, and vice versa.
At the same time, it’s not practical to instrument code specifically for each vendor. Doing so would lead to performance issues and other problems, such as double-tracing. For example, if multiple vendors modify an outbound HTTP request to add context, which one takes precedence?
OpenTelemetry addresses this by offering a unified way to instrument code and multiple integration points where vendors can capture the collected data and send it to their respective backends. As a result, a single application can send data to multiple observability backends with a minimal added performance impact.
Good Reason Three: Gaining Insight Into Third-Party Libraries and Services
In an era where many applications depend on third-party libraries and services, gaining insight into all your application dependencies has become increasingly difficult. It’s also challenging for observability vendors to keep up with every new library and service that’s released.
Traditionally, vendors would try to reverse-engineer these services and apply the aforementioned monkey patching to add in their code. However, since every vendor must do this for each service, it ends up being a lot of redundant work. A key goal of the OpenTelemetry project is to make it easy for infrastructure and library vendors to add instrumentation hooks to their code. This way, their offerings can be monitored without the need for reverse engineering. The observability community is gradually making progress on this goal. One successful example is Istio, which emits OpenTelemetry data that can be easily consumed by observability backends to understand what’s happening inside the service mesh.
Unsurprisingly, cloud vendors like AWS, Microsoft and Google are among the founding members of the OpenTelemetry project. They share a strong interest in defining a format that allows observability vendors to seamlessly consume data from cloud services. OpenTelemetry Protocol, or OTLP, is set to become the standard for emitting observability data from these services that would otherwise be black boxes.
Wrapping It Up
“If all you have is a hammer, every problem looks like a nail.” This statement holds especially true when selecting tools in the tech industry. While there are many good reasons to adopt OpenTelemetry, it’s crucial to approach technology choices pragmatically.
Avoid jumping on the bandwagon for its own sake; instead, carefully evaluate your use case and assess the available tools. OpenTelemetry is undoubtedly here to stay, and if you’re new to the topic, taking a closer look at the project and adding it to your toolbelt makes perfect sense.
And don’t forget: OpenTelemetry is open source. Your contribution is always appreciated.
Check out the latest on OpenTelemetry at Sentry here.