Brainwave: What is observability? Buddy Brewer, New Relic

November 10, 2020

Problem Solvers — Podcast / vCast for November 10, 2020:

Jason English (@bluefug) and Buddy Brewer (@bbrewer), Global VP and GM of Observability Solutions and a web performance expert at New Relic, discuss the state of modern observability, how it has evolved from an IT Ops concern into an important aspect of software engineering itself, and what companies are doing to promote better observability practices.

Guest: Buddy Brewer, GVP Observability Solutions, New Relic (@bbrewer)
Intellyx co-host: Jason English.

Topics covered:

What is observability?
How has the observability space evolved from the days of performance monitoring?
Has open source (OpenTelemetry) changed observability?
Who owns responsibility for observability?

Show links:

Audio podcast: https://anchor.fm/intellyx/episodes/Brainwave-What-is-Observability--Buddy-Brewer--New-Relic-em9lso
Video version of podcast on Intellyx YouTube channel: https://youtu.be/Hp3cXhAJKug
New Relic website – Observability solutions (Free trial): https://newrelic.com/

Full transcript of the podcast:

Jason English: Welcome to another Intellyx Brainwave podcast and v-cast. With me today from New Relic is Buddy Brewer, GVP of strategy for the full-stack observability product. Thanks for joining me, Buddy.

Buddy Brewer: Yeah, thanks for having me, Jason.

Jason English: I’d say observability is one of the hottest topics of the last five years. In fact, it’s gotten so hot that maybe it’s been worn out a little by some vendors, but I think there’s still a lot of confusion about just what observability is.

Buddy Brewer: Yeah. You know, in its strictest definition, the term observability comes from control theory, where it refers to the ability to infer the internal state of a system from its external outputs. So it’s the opposite of the notion of a black box, where things are going on inside the system but I have no idea what.

I just know that the outcome is either positive or negative according to some set of expectations. Observability is the ability to actually understand what’s going on inside the system just by looking at what’s going on externally.
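
For readers who want the control-theory definition made concrete, the standard textbook statement for a linear time-invariant system is sketched below. This formalism is added for illustration and is not discussed in the episode; $x$ is the internal state with $n$ components, $u$ the inputs, and $y$ the measured outputs:

$$
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t) + D\,u(t), \qquad x(t) \in \mathbb{R}^{n}
$$

The system is observable exactly when the observability matrix has full rank:

$$
\mathcal{O} = \begin{bmatrix} C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1} \end{bmatrix}, \qquad \operatorname{rank}\mathcal{O} = n
$$

When the rank condition holds, the initial state, and with it the system’s whole internal trajectory, can be reconstructed from the outputs and inputs over a finite time window. When it fails, some internal states never show up in $y$, which is the black box described above.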

Jason English: Yeah. So if I was an executive and I’ve kind of been hearing this topic over the last few years, what would be a good place for me to start looking within my organization, to find observability or to start encouraging more of it to happen?

Buddy Brewer: Yeah, I think in order to understand how to get started, it’s important to understand how the term observability applies in the domain of technology, right? If we take it out of the realm of control theory and apply it to delivering digital experiences to end users, I think the whole reason a lot of this came about is the complexity we find ourselves in with application delivery in 2020, compared with the way we used to think about how systems work. That complexity has reached a tipping point where, collectively as an industry, we’re applying a different word to it.

That new word recognizes that you really do have to think about application performance and quality in a different way. Think about the way applications used to be delivered 15 years ago, 10 years ago, or maybe even five years ago. It’s not that long ago that everybody was talking about three-tier architectures, and it was pretty simple, right?

You had a database, and you put your data in there. There was probably just one of them, it was probably relational, and it operated on SQL. On top of that was some type of application server with code in it that does stuff. And then there was a front end. Each of those tiers had its own dimensions of complexity.

When you think about the problems you needed to solve in the database tier, I’m just looking at SQL queries. How long are they taking? Maybe I need to do some query optimization.

Jason English: Yeah.

Buddy Brewer: In the application tier, I’m interested in pretty simple stuff, right? Timing method calls. If there’s an error, give me a stack trace.

Let me dig into the stack trace and see what’s going on. And on the front end, not that long ago, it really just boiled down to how long it takes to download all this content over the network. Web browsers were basically dumb terminals, right? They just took a pretty simple HTML document, parsed it, and said, I need these images, maybe I need a style sheet, maybe JavaScript. But they really didn’t do that much.

Compare that to now, where you’ve got application complexity everywhere, starting with the front end: really rich and complicated JavaScript applications that all of a sudden aren’t just about download time over the network.

The browsers, the user agents themselves, are actually executing code, and that can create a whole series of issues. You’ve got browsers that become CPU-bound and errors generated on the front end, and all of that has to be reasoned about.

Then, in the middle tier, it’s not just an application server. There are microservices everywhere making multiple calls, which leads to things like distributed tracing; all sorts of things have to line up and execute correctly to produce that ultimate customer experience. And as we move back from there, the complexity continues to compound.

Think about infrastructure. In the cloud era, a lot of people don’t actually own their infrastructure anymore. You’re running in the cloud with multiple layers of abstraction, Docker containers orchestrated with Kubernetes. It just goes on and on. When you compound all of that together, going back to that academic definition of observability, understanding what’s going on inside the box requires generating a lot of data: you need a lot of instrumentation, external signals, sensors, and things like that to observe in order to reason about what’s going on in the application. And that’s incredibly important because, across our industry and among the customers I work with, everyone always talks about wanting a unified view.

I think the truth is that a unified view is actually pretty simple. In fact, the end user, the digital consumer, always has a unified view. They see it all come together on their phone or in their browser, and they make a snap judgment: do I stay engaged with this brand, on this site, or in this application? Or do I go?

Even before COVID, more and more of people’s interactions with brands were moving online, and now, with COVID in the frame, it’s just extreme. I literally conduct 100% of the meetings in my day job through software, right?

Jason English: Right.

Buddy Brewer: I have no in-person meetings anymore. So, coming back to your question about how you get started: I think it starts with identifying all of those different pieces and making sure that every step in that chain gives off the telemetry you need to reason about what’s going on inside the system.

Because, like I said, the customer already has the unified view. The hard part, when something goes wrong, is unpacking that unified view to understand which part of the software you need to fix, because there are just so many parts in software today.
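
The idea that every step in the chain must emit telemetry, and that microservices lead to distributed tracing, can be illustrated with a minimal sketch. Everything here is hypothetical and built only on the Python standard library: the `Span` class, the `trace-context` header name, and the service functions are assumptions for the example, not New Relic’s or OpenTelemetry’s API. Each hop carries the trace ID forward and records its parent span, so one slow user request can be unpacked into the step that caused it.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Collected spans; a real system would export these to a telemetry backend.
FINISHED_SPANS: List["Span"] = []


@dataclass
class Span:
    """One timed unit of work within a request (hypothetical minimal model)."""
    name: str
    trace_id: str
    parent_id: Optional[str]
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.perf_counter)
    duration_ms: float = 0.0

    def finish(self) -> None:
        """Record elapsed time and hand the span to the collector list."""
        self.duration_ms = (time.perf_counter() - self.start) * 1000
        FINISHED_SPANS.append(self)


def start_span(name: str, headers: Dict[str, str]) -> Span:
    """Continue the caller's trace if headers carry context, else start a new trace."""
    ctx = headers.get("trace-context")  # hypothetical header name
    if ctx:
        trace_id, parent_id = ctx.split(":")
    else:
        trace_id, parent_id = uuid.uuid4().hex, None
    return Span(name=name, trace_id=trace_id, parent_id=parent_id)


def inject(span: Span) -> Dict[str, str]:
    """Build the headers that propagate this span's context to the next hop."""
    return {"trace-context": f"{span.trace_id}:{span.span_id}"}


def query_database(headers: Dict[str, str]) -> None:
    span = start_span("db.query", headers)
    time.sleep(0.02)  # simulated slow SQL query
    span.finish()


def checkout_service(headers: Dict[str, str]) -> None:
    span = start_span("checkout", headers)
    query_database(inject(span))
    span.finish()


def handle_page_load() -> None:
    span = start_span("frontend.page_load", {})  # no incoming context: new trace
    checkout_service(inject(span))
    span.finish()


if __name__ == "__main__":
    handle_page_load()
    for s in FINISHED_SPANS:
        print(f"{s.name:20} parent={s.parent_id} {s.duration_ms:.1f} ms")
```

Because every span shares one trace ID and points at its parent, a backend can rebuild the call tree and see that most of the page-load time was spent in the database span.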

Jason English: I think it’s very refreshing to take this view, where observability has to reside at so many layers, yet fundamentally it’s still a customer experience issue: does it cause people to join or remain as customers and keep using it, or, if they’re employees using that system, to be productive? How do you think it has evolved from the old days, when observability really grew out of performance monitoring, along with several other sorts of tools in that suite? It’s gotten a lot more complex than that. So how has it really evolved since it started?

Buddy Brewer: Yeah, in that sense, observability is sort of a prerequisite to monitoring, right?

The application is so complicated that oftentimes, and this is the observability journey many of our customers at New Relic are on, a customer-impacting problem comes up, and the pieces of telemetry they were looking at aren’t sufficient to understand the root cause of the issue.

That’s a sign that there isn’t enough telemetry across the different facets of the application: the application has grown in complexity faster than the instrumentation needed to understand the system. As you put more and more of that instrumentation in, it unlocks the possibility of watching all of those signals for changes that suggest there’s a problem.

When there’s a problem, you can go and take action. A lot of the work we’re doing today at New Relic is really about helping our customers manage that complexity and get the right amount and kinds of telemetry in place across all of those different facets. That’s actually why we recently repackaged our whole platform. We used to have 12 or 13 different products, and now we’re centering on really three: a data tier; full-stack observability, which brings all of those different pieces together; and applied intelligence on top. The ability to look across everything, from the front end through all of the application code and back into the infrastructure, isn’t just a nice-to-have anymore. It’s a requirement. You really do have to have that full view. You can no longer understand how to manage the quality of your application by looking only at front-end performance or only at the application tier.

So, in the world our customers are living in today, one of the things we’ve tried to do recently at New Relic is simplify and streamline their ability to do not just real user monitoring, synthetic monitoring, infrastructure monitoring, or application performance monitoring, but full-stack monitoring.

Because if you don’t have all of the pieces, customer-facing issues are going to come up that you won’t be able to solve.
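
Once enough telemetry is in place, watching signals for changes that suggest a problem can start as simply as comparing each new value with a trailing baseline. The sketch below is a generic illustration using a mean-plus-k-standard-deviations threshold; the function name, window size, and `k` are assumptions, and this is not a description of how New Relic’s applied intelligence works.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_anomalies(values: Sequence[float], window: int = 10, k: float = 3.0) -> List[int]:
    """Return indices whose value exceeds the trailing window's mean by more than k standard deviations."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    anomalies: List[int] = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # With a perfectly flat baseline (sigma == 0), any increase counts as a change.
        threshold = mu + k * sigma if sigma > 0 else mu
        if values[i] > threshold:
            anomalies.append(i)
    return anomalies
```

Fed a series of response times in milliseconds, this flags a sudden latency spike while ignoring ordinary jitter. A production detector would typically add seasonality handling and a more robust baseline (for example, a median), but the shape of the idea is the same.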

Jason English: I think another interesting development in this space is that, as we move into a containerized world and start adopting Kubernetes and microservices architectures, there’s a lot of open-source involvement in the process, too.

I’m seeing a huge rate of adoption of some of that technology, along with a lot of community involvement. What’s your perspective on that side of things?

Buddy Brewer: It’s so great to see, because we’ve been talking about this problem of complexity and all the different telemetry it requires.

Imagine if every single element of that had to be either hand-built or supplied by a proprietary vendor you brought in. If you wanted to change, you’d have to rip all that stuff out and replace it, and every time you onboarded a new engineer, they’d have to train up on whichever partner you had chosen for observability, whether it’s New Relic or anyone else.

What open-source technology allows our customers to do is simplify, speed up, and remove friction from instrumenting the application in the first place. Just to talk for a minute about New Relic and where we see our biggest growth opportunities: look at where applications are today on the internet at large.

When we’re talking to folks about moving their instrumentation to New Relic, more often than not it isn’t that they’re instrumenting with something else; it’s that these elements of the application aren’t instrumented at all.

That’s because these engineers and organizations simply don’t have the time to take on the burden of figuring all this stuff out. What open source does is democratize it, so that people can put that instrumentation in and get it going. After that, it’s a question of what you want to use to analyze that data.

Jason English: Right. Take the OpenTelemetry project itself: it’s a common language they can lean on. It doesn’t matter which vendor they’re using to interpret that data; at least they have that instrumentation to move with them, wherever their projects are going.

So I guess that’s a good point to wrap up on: who owns responsibility for observability within an organization?

Buddy Brewer: It’s evolving. When you talk about the technology specifically, it’s still the technology part of the business; of course it’s incumbent on them to figure all this out and get the application instrumented. But what’s happening is that the responsibility is tracking with DevOps methodologies and their evolution.

Specifically, when I started my career working on monitoring 20 years ago, you were really only talking to operations people, who had a very operations-specific role. They didn’t build the application; they operated it. What’s happening now is that all of this telemetry we’ve been talking about is becoming the engineering teams’ responsibility: as they build new functionality, it’s incumbent on them to instrument it in a way that allows you to observe it.

New Relic was kind of early to this trend. Back in 2008, we were one of the first quote-unquote “monitoring companies” whose primary constituency was actually engineers. Now you’re seeing more and more of that: responsibility shifting left, if you will, toward much more direct engineering involvement.

You see those roles blending and converging between Dev and Ops. Now, there’s another trend underway. All of this stuff we’ve been talking about addresses a technical problem, but technology and digital experiences increasingly drive the lion’s share of the revenue for these businesses.

So you’re seeing other stakeholders take an interest in elements of observability. Maybe they’re not inspecting what’s going on inside a Docker container, but they want to understand how the behavior of the application actually affects the business metrics they care about, like conversion rates for an e-commerce business, or the number of impressions per session for a media company trying to drive more ad impressions.

Jason English: Right.

Buddy Brewer: And you can’t understand that relationship if you don’t have data coming out of the technology to help you understand it.
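
Connecting application behavior to a business metric usually means joining telemetry with outcomes per session. As a minimal sketch, assuming a hypothetical `Session` record that already pairs each session’s page-load time with whether it converted (neither the record nor the bucketing scheme comes from the episode), conversion rate can be grouped by load-time bucket:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Session:
    """Hypothetical per-session record joining telemetry with a business outcome."""
    page_load_ms: float
    converted: bool


def conversion_by_load_time(
    sessions: Iterable[Session], bucket_ms: int = 1000
) -> Dict[Tuple[int, int], float]:
    """Group sessions into page-load-time buckets and return the conversion rate per bucket."""
    totals: Dict[int, int] = defaultdict(int)
    conversions: Dict[int, int] = defaultdict(int)
    for s in sessions:
        bucket = int(s.page_load_ms // bucket_ms)
        totals[bucket] += 1
        if s.converted:
            conversions[bucket] += 1
    # Key each rate by its (lower, upper) bound in milliseconds, fastest bucket first.
    return {
        (b * bucket_ms, (b + 1) * bucket_ms): conversions[b] / totals[b]
        for b in sorted(totals)
    }
```

A table like this shows whether slower sessions convert less often, which is the kind of relationship non-technical stakeholders want to see. On its own it shows correlation rather than cause, so it is a starting point for deciding how fast the application needs to be, not a final answer.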

Jason English: Right. It’s the only way you could have a marketing outcome that isn’t subjective, right? You need to understand that it’s not just the quality of the offer or what we’re selling; it’s what that experience is and how it affects my bottom line.

That’s a very interesting development, I think.

Buddy Brewer: Yeah, and people have less and less patience every year. As we’re recording this, we’re three weeks out from Thanksgiving, so I’ll use e-commerce as an example. As an e-commerce company, when a consumer lands on your site, they’re not just comparing you to the other online stores they visit.

They’re comparing you to the internet at large. If they didn’t get to your site through a marketing campaign or an email, there’s a good chance they got there through a search engine like Google. So that was their last experience, and now they’re on your site.

So you’re being compared to that. When you think about these stakeholders beyond the technology organization, what they need to understand isn’t just how fast the application is, but how fast it needs to be, as a critical input alongside everything else, like the packaging and pricing of their products, the merchandising, and all of that.

People don’t have a lot of patience in 2020, as we’re sitting here talking. They’re very likely to leave, and their switching costs are low: it’s really easy to go to a competitor and interact with that brand instead. So you have to meet their expectations.

And the only way to understand the role that application speed and quality plays in that is to instrument every element of the application.

Jason English: Well, that’s some great insight, Buddy, answering our questions about observability today on our Problem Solvers podcast. Where can people go for more information about this?

Buddy Brewer: If you want to try out observability with New Relic, you can just go to our website and sign up. We have a perpetual free tier anyone can use, which lets you ingest up to a terabyte a month of all this telemetry data we’ve been talking about, just by signing up for an account at NewRelic.com.

Jason English: Excellent. Well, thanks for joining me, Buddy, and until next time, keep thinking.

©2020 Intellyx LLC. At the time of publishing, New Relic is an Intellyx subscriber. All dialogue in this program represents the expressed opinions of the hosts and guests, and is not necessarily the official position of Intellyx, or any company mentioned or included in this podcast audio or video.