• Log in
  • Sign up

Diagnose high response times in Telco Lite

15 min

Use New Relic to understand why Telco-Warehouse Portal has slower-than-normal response times.

Prerequisites

Before you begin:

When you're ready, start your journey by observing the high response times in your services.

Step 1 of 6

Log in to New Relic One and select APM from the top navigation menu. Here, you see an overview of all eight Telco Lite services, including the service names, response times, and throughputs. Notice that the response time for Telco-Warehouse Portal is 43 seconds, much higher than the response times in other services:

APM overview

On the APM page, select the Telco-Warehouse Portal service name to see a summary of that service. The top graph in the summary view shows Web transactions time:

APM transaction time summary

You see several peaks, reaching almost 50,000 milliseconds—or 50 seconds! From this graph, you can actually gain a little more insight by toggling the colored components beneath the graph. For example, select Node.js:

APM Node.js transaction time

The graph changes to show only what the Node.js component contributes to the overall response time—less than 100 ms, in this example. Now, select Web external and deselect Node.js:

APM web external transaction time

External web traffic is the primary contributor to the high response times. That's a good start, but it doesn't tell the whole story. Next, dive deeper to find out which external service is causing the issue.

Step 2 of 6

So, you know that Web external is the culprit of the high response times, but it's hard to tell why. External web traffic is all the requests made from your service to other services. This means you should look into what external requests that the warehouse portal makes to try to understand exactly what external service is the bottleneck.

From the left-hand navigation, select Monitor > Distributed tracing:

Distributed tracing overview

This view shows you requests to the warehouse portal. Select a request from the table at the bottom of the view to see a trace through that request:

Specific trace

This trace shows that one external request contributes almost all of the total trace duration. Specifically, an external request to the Telco-Fulfillment Service contributes over 99% of the overall response time.

This is good news! You own the fulfillment service, which means you can drill down for even more information. Select the offending row (called a span), and then select Explore this transaction:

Span details

Step 3 of 6

You're now looking at the __main__:inventory_item transaction overview. Because you know that some part of this transaction is slow, you can use this overview to narrow your focus even further.

Similar to how you modified the warehouse portal APM graph, you can look specifically at the components of this transaction to understand where the root cause of the slowness is. Another way to view this information is to scroll down to the Breakdown table in that same view:

Transaction breakdown

Function/__main__:inventory_item, a Python function, contributes over 99% of the overall response time.

At this point, you know that Telco-Warehouse Portal is slow because it makes an external request to Telco-Fulfillment Service, which is slow. You also know that the issue in the fulfillment service is local because over 99% of the request is spent in Python code, not external services. Navigate to the fulfillment service's summary page to look at the service as a whole, instead of this single transaction:

Fulfillment transaction time summary

Step 4 of 6

Scroll down on this view to familiarize yourself with the graphs it shows, such as Throughput and Error rate. At the bottom of the page, you can see a table with the fulfillment hosts. You can't currently drill into a specific host in the new UI (we're working on it), but you can in the old UI. Toggle Show new view to off and select the host link:

Switch to the old UI

Step 5 of 6

Now, you're looking at graphs in the infrastructure view for that service's host. Notice that CPU % has a lot of high spikes. Click and drag on the graph from the start of a spike to the end of it to narrow the timeslice to the period when CPU utilization goes up:

CPU spike

If you compare this graph to the fulfillment service's transaction graph you looked at earlier, you'll see that soon after __main__:inventory_item begins executing, the CPU utilization of the host sharply rises to 100%!

Step 6 of 6

Now, you understand the problem causing slow response times in the warehouse portal, but you don't know the extent of the issue. Using service maps, you can see all your services that depend on the fullfillment service.

Navigate to the service map under APM > Telco-Fulfillment Service:

Service map

This map shows you the fulfillment service's incoming and outgoing dependencies. Not only is Telco-Warehouse Portal dependent on the fullfillment service, but so is Telco-Web Portal!

Select the web portal node to see that the fulfillment service also affects the web portal's response times:

Web portal dependency

Conclusion

At the end of your investigation, you discovered:

  • Telco-Warehouse Portal is slow because it makes an external request to Telco-Fulfillment Service, which is slow
  • The fulfillment service's Python function contributes over 99% of the response time
  • Soon after the Python function starts executing, the host's CPU utilization spikes up to 100%
  • The fulfillment service also affects Telco-Web Portal response times

Now, as the developer behind the fulfillment service, you have enough information to debug the issue causing the CPU spikes. Congratulations!

You can learn more about using New Relic by diagnosing other issues. If this is your last issue, you can tear down all the Telco Lite services.

Edit this pageCreate an issue
Copyright © 2020 New Relic Inc.

This site uses cookies 🍪

We use cookies and other similar technologies ("Cookies") on our websites and services to enhance your experience and to provide you with relevant content. By using our websites and services you are agreeing to the use of cookies. You can read more here. If you consent to our cookies, please click Yes.