How to get custom metrics in Elastic APM? - node.js

I have setup application performance monitoring in my Nodejs Application with Elastic. I can see my metrics coming through for CPU / memory etc but can't seem to get things to work for custom metrics.
I am following their docs on adding a metric for tracking the clients connected to our web socket server but can't see the metrics in the Elastic dashboard anywhere:
I have the following code:
apm.registerMetric('ws.connections', () => socketClients.length);
Under APM -> Services -> Metrics I am searching for ws.connections but no results are showing up.
I have releasedthe code and know everything is working as expected for APM but just seems I can't track down the custom metrics.
Any advice on where to find this or what I can do to try debug it?

Related

How to find/cure source of function app throughput issues

I have an Azure function app triggered by an HttpRequest. The function app reads the request, tosses one copy of it into a storage table for safekeeping and sends another copy to a queue for further processing by another element of the system. I have a client running an ApacheBench test that reports approximately 148 requests per second processed. That rate of processing will not be enough for our expected load.
My understanding of function apps is that it should spawn as many instances as is needed to handle the load sent to it. But this function app might not be scaling out quickly enough as it’s only handling that 148 requests per second. I need it to handle at least 200 requests per second.
I’m not 100% sure the problem is on my end, though. In analyzing the performance of my function app I found a LOT of 429 errors. What I found online, particularly https://learn.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits, suggests that these errors could be due to too many requests being sent from a single IP. Would several ApacheBench 10K and 20K request load tests within a given day cause the 429 error?
However, if that’s not it, if the problem is with my function app, how can I force my function app to spawn more instances more quickly? I assume this is the way to get more throughput per second. But I’m still very new at working with function apps so if there is a different way, I would more than welcome your input.
Maybe the Premium app service plan that’s in public preview would handle more throughput? I’ve thought about switching over to that and running a quick test but am unsure if I’d be able to switch back?
Maybe EventHub is something I need to investigate? Is that something that might increase my apparent throughput by catching more requests and holding on to them until the function app could accept and process them?
Thanks in advance for any assistance you can give.
You dont provide much context of you app but this is few steps how you can improve
If you want more control you need to use App Service plan with always on to avoid cold start, also you will need to configure auto scaling since you are responsible in this plan and auto scale is not enabled by default in app service plan.
Your azure function must be fully async as you have external dependencies so you dont want to block thread while you are calling them.
Look on the limits. Using host.json you can tweek it.
429 error means that function is busy to process your request, so probably when you writing to table you are not using async and blocking thread
Function apps work very well and scale as it says. It could be because request coming from Single IP and Azure could be considering it DDOS. You can do the following
AzureDevOps Load Test
You can load test using one of the azure service . I am very sure they have better criteria of handling IPs. Azure DeveOps Load Test
Provision VM in Azure
The way i normally do is provision the VM (windows 10 pro) in azure and use JMeter to Load test. I have use this method to test and it works fine. You can provision couple of them and subdivide the load.
Use professional Load testing services
If possible you may use services like Loader.io . They use sophisticated algos to run the load test and provision bunch of VMs to run the same test.
Use Application Insights
If not already you must be using application insights to have a better look from server perspective. Go to live stream and see how many instance it would provision to handle the load test . You can easily look into events and error logs that may be arising and investigate. You can deep dive into each associated dependency and investigate the problem.

Azure - App Availability percentage is Zero

our Api app is in UAT on Azure with service plan (Standard 3 large). What should we do if App Availability is Zero. It is getting slow response or timeout issue. When i restart the application it is up to normal. (We are using Parallel Language programming.(Async/Await)
How to find the route cause from it for slowness issue.
Ensure that Always On feature is enabled.
Such problems may be caused by application level issues, such as:
network requests taking a long time
application code or database queries being inefficient
application using high memory/CPU
application crashing due to an exception
You could enable web server diagnostics to fetch more details on the issue.
Detailed Error Logging - Detailed error information for HTTP status codes that indicate a failure (status code 400 or greater). This may contain information that can help determine why the server returned the error code.
Failed Request Tracing - Detailed information on failed requests, including a trace of the IIS components used to process the request and the time taken in each component. This can be useful if you are attempting to improve web app performance or isolate what is causing a specific HTTP error.
Web Server Logging - Information about HTTP transactions using the W3C extended log file format. This is useful when determining overall web app metrics, such as the number of requests handled or how many requests are from a specific IP address.
Also, Azure Application Insights collects telemetry from your application to help analyze its operation and performance. You can use this information to identify problems that may be occurring or to identify improvements to the application that would most impact users. This tutorial takes you through the process of analyzing the performance of both the server components of your application and the perspective of the client: https://learn.microsoft.com/en-us/azure/application-insights/app-insights-tutorial-performance
Ref: https://learn.microsoft.com/en-us/azure/app-service/app-service-web-troubleshoot-performance-degradation

How to paginate logs from a Kubernetes pod?

I have a service that displays logs from pods running in my Kubernetes cluster. I receive them via k8s /pods/{name}/log API. The logs tend to grow big so I'd like to be able to paginate the response to avoid loading them whole every time. Result similar to how Kubernetes dashboard displays logs would be perfect.
This dashboard however seems to solve the problem by running a separate backend service that loads the logs, chops them into pieces and prepares for the frontend to consume.
I'd like to avoid that and use only the API with its query parameters like limitBytes and sinceSeconds but those seem to be insufficient to make proper pagination work.
Does anyone have a good solution for that? Or maybe know if k8s plans to implement pagination in logs API?

Programmatically get the amount of instances running for a Function App

I'm running an Azure Function app on Consumption Plan and I want to monitor the amount of instances currently running. Using REST API endpoint of format
https://management.azure.com/subscriptions/{subscr}/resourceGroups/{rg}
/providers/Microsoft.Web/sites/{appname}/instances?api-version=2015-08-01
I'm able to retrieve the instances. However, the result doesn't match the information that I see in Application Insights / Live Metrics Stream.
For example, right now App Insights shows 4 servers online, while API call returns just one (the GUID of this 1 instance is also among App Insights guids).
Who can I trust? Is there a better way to get instance count (e.g. from App Insights)?
UPDATE: It looks like data from REST API are wrong.
I was sending 10000 messages to the queue, logging each function call with respective instance ID which processed the request.
While messages keep coming in and the backlog grows, instance count from REST API seems to be correct (scaled from 1 to 12). After sending stops, the reported instance count rapidly goes down (eventually back to 1, while processors are still busy).
But based on the speed and the execution logs I can tell that the actual instance count kept growing and ended up at 15 instances at the moment of last message processed.
UPDATE2: It looks like SDK refuses to report more than 20 servers. The metric flats out at 20, while App Insights kept steady growth and is already showing 41.
Who can I trust? Is there a better way to get instance count (e.g. from App Insights)?
Based on my understanding we need to use Rest API endpoint to retrieve the instance, App Insights could be configured for multiple WebApps, so the number of servers online in the App Insights may be for multiple WebApps.
Updated:
Based on my test, the number of the application insight may be not real time.
During my test if the WebApp Function scale out then I could get multiple instances with Rest API, and I also can check the number of servers online in the App Insights.
https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourcegroup}/providers/Microsoft.Web/sites/{functionname}/instances?api-version=2016-08-01
But after I finished the test, I could get the number of the instance with Rest API is 1, based on my understanding, it is right result.
At the same time I check it in the Application Insight the number of the servers online is the max number during my test.
And after a while, the number of server online in the application insight also became 1.
So If we want to get the number of intance for Azure function, my suggestion is that using REST API to do that.
Update2:
According to the DavidEbbo mentioned that the REST API is not always reliable.
Unfortunately, the REST API is not always reliable. Specifically, when a Function App scales across multiple scale units, only the instances from the 'home' scale unit are reflected. You probably will not see this in a smallish test, but likely will if you start scaling out widely (say over 20 instances).

Programmatically Get VM Instance Network & Memory Info

This question is for Continuous Web Jobs.
Main Questions
How can we "VIEW" or programmatically "LOG" the current memory & network status of a VM running a Continuous Web Job?
Background:
Our web job is scraping some API and we keep getting 500 errors. We believe that the VM is firing too many threads for API requests - and then because of network limitations - when the responses come back, too many responses come back at the same time, overloading the VM's network limitations.
Side Questions:
How would you use MS Azure to Web scrape - and make sure you don't overload (in terms of memory + network) the VM it's running on?
(It seems that for background processing, these VMs are built for CPU calculation - not for Web/API scraping)
I'm still using the Monitoring (Classic) APIs currently. I've not found a "non-classic" version of the API, but I've also not spent much time looking. Since a web job runs as part of the Web App you'll need to monitor the web app using the tools provided in the Microsoft.WindowsAzure.Management.Monitoring.Metrics Namespace.
I found the API to be somewhat confusing, but spent sometime working with the PG to get it right. I've provided some sample code on the MSPFE github page at: https://github.com/mspfe/AzureMetricsAPISampleKit. Running the "tests" in this Solution will show you how to use the lib.
You first need to identify the web app by getting a list of them:
var webSpaceList = _webSiteClient.WebSpaces.List();
Then collect the availabile metrics:
foreach (var website in websiteList)
{
MetricDefinitionListResponse wsMetricListResponse = _metricsClient.MetricDefinitions.List(website.WebsiteResourceId, null, null);
website.MetricDefinitionsList = wsMetricListResponse.MetricDefinitionCollection;
website.MetricNamesList = new List<string>();
foreach (var metric in website.MetricDefinitionsList.Value)
{
website.MetricNamesList.Add(metric.Name);
}
MetricValueListResponse wsValueResponse = _metricsClient.MetricValues.List(website.WebsiteResourceId, website.MetricNamesList, "",
_timeGrain, _startDateTime, _endDateTime);
website.MetricValueList = wsValueResponse.MetricValueSetCollection;
}
From there you should have metric definitions and values. Sorry if this code is a little dated... but it should work.
Azure WebJobs run within your Azure App Service's web app (formerly called Websites). So, your capacity is governed by the size (and quantity) of Web App instances, whether free tier or one of the paid tiers. And you'd measure your utilization against the Web App instances.
Your side question, about how to use Azure to web scrape, is not answerable here: It's an opinion-based question with no right answer.

Resources