Programmatically Get VM Instance Network & Memory Info - azure

This question is for Continuous Web Jobs.
Main Questions
How can we "VIEW" or programmatically "LOG" the current memory & network status of a VM running a Continuous Web Job?
Background:
Our web job is scraping an API and we keep getting 500 errors. We believe that the VM is firing too many threads for API requests, and then, because of network limitations, too many responses come back at the same time, overwhelming the VM's network capacity (a simple throttling sketch follows the side questions below).
Side Questions:
How would you use MS Azure to web scrape, and make sure you don't overload (in terms of memory + network) the VM it's running on?
(It seems that for background processing, these VMs are built for CPU-bound calculation, not for web/API scraping.)
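Since the failure mode described in the background above is unbounded concurrency, here is a minimal hedged sketch of one common mitigation: capping in-flight requests with a SemaphoreSlim (the limit of 10 and the class name are arbitrary illustrations):
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledScraper
{
    // At most 10 requests in flight at once, so responses trickle back
    // instead of all arriving at the same time.
    private static readonly SemaphoreSlim Throttle = new SemaphoreSlim(10);
    private static readonly HttpClient Http = new HttpClient();

    public static async Task<string> FetchAsync(string url)
    {
        await Throttle.WaitAsync();   // wait for a free slot
        try
        {
            return await Http.GetStringAsync(url);
        }
        finally
        {
            Throttle.Release();       // free the slot for the next request
        }
    }
}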

I'm still using the Monitoring (Classic) APIs currently. I've not found a "non-classic" version of the API, but I've also not spent much time looking. Since a web job runs as part of the Web App you'll need to monitor the web app using the tools provided in the Microsoft.WindowsAzure.Management.Monitoring.Metrics Namespace.
I found the API to be somewhat confusing, but I spent some time working with the PG (product group) to get it right. I've provided some sample code on the MSPFE GitHub page at: https://github.com/mspfe/AzureMetricsAPISampleKit. Running the "tests" in this Solution will show you how to use the lib.
You first need to identify the web apps by listing the web spaces:
var webSpaceList = _webSiteClient.WebSpaces.List();
Then collect the available metrics for each site (websiteList below is the list of sites gathered from those web spaces):
// For each site: list its metric definitions, extract the metric names,
// then fetch the metric values for the chosen time grain and time window.
foreach (var website in websiteList)
{
    MetricDefinitionListResponse wsMetricListResponse =
        _metricsClient.MetricDefinitions.List(website.WebsiteResourceId, null, null);
    website.MetricDefinitionsList = wsMetricListResponse.MetricDefinitionCollection;

    website.MetricNamesList = new List<string>();
    foreach (var metric in website.MetricDefinitionsList.Value)
    {
        website.MetricNamesList.Add(metric.Name);
    }

    MetricValueListResponse wsValueResponse = _metricsClient.MetricValues.List(
        website.WebsiteResourceId, website.MetricNamesList, "",
        _timeGrain, _startDateTime, _endDateTime);
    website.MetricValueList = wsValueResponse.MetricValueSetCollection;
}
From there you should have metric definitions and values. Sorry if this code is a little dated... but it should work.
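For context, a hedged sketch of how the two clients used above might be constructed; the classic management APIs authenticate with a management certificate, and the exact constructors below are assumptions based on the Microsoft.WindowsAzure.Management.* libraries:
using System.Security.Cryptography.X509Certificates;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.Management.Monitoring.Metrics;
using Microsoft.WindowsAzure.Management.WebSites;

// Management certificate uploaded to the (classic) subscription.
var cert = new X509Certificate2(@"C:\certs\management.pfx", "pfx-password");
var credentials = new CertificateCloudCredentials("your-subscription-id", cert);

// The two clients used in the snippets above.
var _webSiteClient = new WebSiteManagementClient(credentials);
var _metricsClient = new MetricsClient(credentials);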

Azure WebJobs run within your Azure App Service's web app (formerly called Websites). So, your capacity is governed by the size (and quantity) of Web App instances, whether free tier or one of the paid tiers. And you'd measure your utilization against the Web App instances.
Your side question, about how to use Azure to web scrape, is not answerable here: It's an opinion-based question with no right answer.

Related

Azure slow communication between APIs

In some 1-5% of our requests, we are seeing slow communication between APIs (REST API requests). Both APIs are developed by us and hosted on Azure, each app service on its own app service plan in the same region, P1v2 tier.
What we are seeing on application insights is that POST or GET requests on origin API can take a few seconds to execute, while real execution time on destination API is only a few milliseconds.
Examples (first line: the POST request on the origin; second line: execution time on the destination API): slow req 1, slow req 2
Our best guess is that the time difference is lost in communication between components. We don't have an explanation for it since the payload is really small and in most cases, communication takes less than 5 milliseconds.
We dismissed the possible explanation that it could be due to component cold start, since it happens during constant load and no horizontal scaling was performed.
Do you have any idea what might cause it or how to do additional analysis in order to discover it?
If you're running multiple sites on the App Service Plan, enable the "Always On" setting for your web app: All Settings > Application Settings > click on Always On.
See here for details: https://azure.microsoft.com/en-us/documentation/articles/web-sites-configure/
When Always On is off, the site is shut down after 20 minutes of inactivity to free up resources for any additional websites that might be using the same App Service Plan.
The amount of information the request needs to collect, process, and then present requires some time and involves internal calls as well; that is why, given the server load and usage, it sometimes takes around 6 to 7 seconds or even more.
To troubleshoot that latency, try these steps provided by Microsoft.

Ways to overcome the 230s hard coded limit to the SCM endpoint

Background
We have a PHP App Service and MySQL that is deployed using an Azure DevOps Pipeline (YML). The content itself is a PHP site that is packaged up into a single file using Akeeba by an external supplier. The package is a Zip file (which can be deployed as a standard Zip deployment), and inside the Zip file is a huge JPA file. The JPA is essentially the whole web site plus database tables, settings, file renames and a ton of other stuff all rolled into one JPA file. Akeeba essentially unzips the files, copies them to the right places, does all the DB stuff and so on. To kick the process off, we can simply connect to a specific URL (web site + path) and run the PHP, which does all the clever unpackaging via a web GUI. But we want to include this stage in the pipeline instead so that the process is fully automated end to end. Akeeba has a CLI as an alternative to the web GUI deployment, so it should go like this:
Create web app
Deploy the web site ZIP (zipDeploy)
Use the REST API to access Kudu and run the relevant command (php install.php web.jpa) to unpack the jpa and do the MySQL stuff - this normally takes well over 30 minutes (it is a big site and it has a lot of "stuff" to do - but, it does actually work in the end).
The problem is that the SCM REST API has a hard-coded 230s limit as described here: https://blog.headforcloud.com/2016/11/15/azure-app-service-hard-timeout-limit/
So, the unpack stage keeps throwing "Invoke-RestMethod : 500 - The request timed out" exactly on the 230s mark.
We have tried SCM_COMMAND_IDLE_TIMEOUT and WEBJOBS_IDLE_TIMEOUT but, unsurprisingly, they did not make any difference.
$cmd = @{"command" = "php .\site\wwwroot\install.php .\site\wwwroot\web.jpa .\site\wwwroot"}
# $creds is the base64-encoded "user:password" pair for the SCM (Kudu) site
Invoke-RestMethod -Uri $url -Headers @{"Authorization" = "Basic $creds"} -Body (ConvertTo-Json($cmd)) -Method Post -ContentType "application/json" -TimeoutSec 7200
I can think of a few hypothetical ways around it (some quite eccentric):
Find another way to run CLI commands inside the Web App after deployment other than the Kudu REST API. Is there such a thing? I Googled and checked SO but all I found were pointers to the way we do it (or try to do it) now.
Use something like Selenium to click the GUI buttons instead of using the CLI. (I do not know if they would suffer a timeout.)
Instead of running the command via Kudu REST, use the same API to create and deploy a script to the web server, start it and then let the REST API exit whilst the script still runs on the Web App. Essentially, bodge an async call but without the callback and then have the pipeline check in on the site at, say, 5 minute intervals. Clunky.
Extend the 230s limit - but I do not think that Microsoft make this possible.
Make the web site as fast as possible during the deployment in the hope of getting it under the 4-minute mark and then down-scale it. Yuk!
See what the Akeeba JPA unpacking actually does, unpack it pre-deployment and do what the unpackage process does but controlled via the Pipeline. This is potentially a lot of work and would lose the support of the supplier.
Give up on an automated deployment. That would rather defeat much of the purpose of a Devops pipeline.
Try AWS + Terraform instead. That's not an approved infrastructure environment, however.
Given that Microsoft understandably do not want long-running API calls hanging around, I understand why the limit exists. However, I would expect therefore there to be a mechanism to interact with an App Service file system via a CLI in another way. Does anyone know how?
The 4-minute idle timeout is at the TCP level and is implemented on the Azure hardware load balancer. This timeout is not configurable and cannot be changed. One thing I want to mention is that this is an idle timeout at the TCP level, which means the timeout is hit only if the connection is idle and no data transfer is happening. To provide more info: it will be hit if the web application receives the request and keeps processing it for more than 4 minutes without sending any data back.
Resolution
Ideally, in a web application it is not good to keep the underlying HTTP request open, and 4 minutes is a decent amount of time. If you have a requirement for background processing within your web application, then the recommended solution is to use Azure WebJobs and have the Azure web app interact with the Azure WebJob to be notified once the background processing is done (Azure provides many ways to do this, like queue triggers, and you can choose the method that suits you best; a small sketch of a queue-triggered WebJob follows the links below). Azure WebJobs are designed for background processing, and you can do as much background processing as you want within them. I am sharing a few articles that talk about WebJobs in detail:
· http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx
· https://azure.microsoft.com/en-us/documentation/articles/websites-webjobs-resources/
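As a minimal hedged sketch of the queue-trigger pattern mentioned above (classic WebJobs SDK assumed; the queue name and function name are made up):
using System.IO;
using Microsoft.Azure.WebJobs;

public class Functions
{
    // Runs whenever a message lands on the "deploy-jobs" queue. The HTTP request
    // that enqueued the message can return immediately; this function can then
    // run well past the 230s front-end limit.
    public static void ProcessDeployment(
        [QueueTrigger("deploy-jobs")] string message,
        TextWriter log)
    {
        log.WriteLine("Started long-running work for: " + message);
        // ... long-running unpack/processing here ...
    }
}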
============================================================================
It totally depends on the app. Message Queue comes to mind. There are a lot of potential solutions and it will be up to you to decide.
============================================================================
Option #1)
You can change the code to send some sort of "continue" header to the client to keep the session open.
A sample is shown here
This shows the HTTP Headers with the Expect 100-continue header:
https://msdn.microsoft.com/en-us/library/aa287673%28v=vs.71%29.aspx?f=255&MSPPError=-2147217396
This shows how to add a Header to the collection:
https://msdn.microsoft.com/en-us/library/aa287502(v=vs.71).aspx
Option #2) Progress bar
This sample shows how to use a progress bar:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/d84f4c89-ebbf-44d3-bc4e-43525ae1df45/how-to-increase-progressbar-when-i-running-havey-query-in-sql-server-or-oracle-?forum=csharpgeneral
Option #3) A common practice to keep the connection active for a longer period is TCP keep-alive: packets are sent when no activity is detected on the connection. By keeping up ongoing network activity, the idle timeout value is never hit and the connection is maintained for a long period (see the sketch after this list).
Option #4) You can also try hosting your application in an IaaS VM instead of an App Service. This may avoid the ARR timeout issue because the architecture is different, and I believe the timeout is configurable there.
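Returning to Option #3, a minimal hedged C# sketch: for HttpWebRequest-based clients, ServicePointManager can enable TCP keep-alive globally (the timing values are arbitrary illustrations):
using System.Net;

// Somewhere early in application startup: first keep-alive probe after 60s
// of inactivity on the connection, then one probe every 10s.
ServicePointManager.SetTcpKeepAlive(true, keepAliveTime: 60000, keepAliveInterval: 10000);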

Best way to watch multiple IRC channels in Azure

I am attempting to connect my application to multiple IRC channels to read incoming chat messages and send them to my users. New channels may be added or existing channels may be removed at any time during the day and the application must pick up on this in near real-time. I am currently using Microsoft Azure for my infrastructure and am using App Services for client-facing compute and Azure Functions on the App Service plan for background tasks (Not the consumption billing model).
My current implementation is in C#/.NET Core 3.1 and uses a TcpClient over an SslStream to watch each channel. I then use a StreamReader and await reader.ReadLineAsync() to watch for new messages. The problem I am running into is that neither App Services nor Azure Functions seems to be an appropriate place to host a watcher like this.
At first, I tried hosting it in the Azure Function app as this clearly seems like a task for a background worker, however Azure Functions inherently want to be triggered by a specific event, run some code, and then end. In my implementation, the call to await reader.ReadLineAsync() halts processing until a message is received. In other words, the call running the watcher needs to run in perpetuity, which seems to go against the grain of an Azure Function. In my attempt, the Azure Function service eventually crashes, the host unloads, all functions on the service cease and then restart a few minutes later when the host reloads. I am unable to find any way to tell what is causing the crash. This is clearly not the solution I want. If I could find an IrcMessageTrigger Azure Function trigger, this would probably be the best option.
Theoretically, I could host the watcher in my App Service, however when I scale out I would run into a problem due to having multiple servers connecting to each channel at once. New messages would be sent to each server and my users would receive duplicates. I could probably deal with this, but the solution would probably be hacky and I feel like the real solution would be to architect it better in the first place.
Anyone have an idea? I am open to changing the code or using a different Azure service (assuming it isn't too expensive) but I will be sticking with C# and .NET Core on Azure infrastructure for this project.
Below is a part of my watcher code to provide some context.
while (client.Connected)
{
    // This line will halt execution until a message is received.
    var data = await reader.ReadLineAsync();

    // A null line means the stream has ended, so stop rather than spin.
    if (data == null)
    {
        break;
    }

    var dataArray = data.Split(' ');

    // Answer the server's keep-alive checks so the connection stays open.
    if (dataArray[0] == "PING")
    {
        await writer.WriteLineAsync("PONG");
        await writer.FlushAsync();
        continue;
    }

    if (dataArray.Length > 1)
    {
        switch (dataArray[1])
        {
            case "PRIVMSG":
                HandlePrivateMessage(data, dataArray);
                break;
        }
    }
}
Thanks in advance!
Results are preliminary, but it appears that the correct approach is to use Azure WebJobs running continuously to accomplish what I am trying to achieve. I did not consider WebJobs initially because they are older technology than Azure Functions and essentially do the same work at a lower level of abstraction. In this case, however, WebJobs appear to handle a use case that Functions are not intended to support.
To learn more about WebJobs (including continuous WebJobs) and what they are capable of, see the Microsoft documentation
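For illustration, a minimal hedged sketch of the shape of a continuous WebJob; it is essentially a console app that App Service keeps running, and RunWatcherAsync is a hypothetical stand-in for the ReadLineAsync loop in the question:
using System;
using System.Threading.Tasks;

public static class Program
{
    public static async Task Main()
    {
        // A continuous WebJob just runs forever; App Service restarts it if it exits.
        while (true)
        {
            try
            {
                await RunWatcherAsync(); // the TcpClient/SslStream loop from the question
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine($"Watcher failed, restarting in 5s: {ex}");
                await Task.Delay(TimeSpan.FromSeconds(5));
            }
        }
    }

    // Hypothetical stand-in for the watcher shown in the question.
    private static Task RunWatcherAsync() => Task.CompletedTask;
}
Note that by default a continuous WebJob runs on every scaled-out instance, which would reproduce the duplicate-message concern from the question; a settings.job file containing "is_singleton": true restricts a continuous WebJob to a single instance.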

How to find/cure source of function app throughput issues

I have an Azure function app triggered by an HttpRequest. The function app reads the request, tosses one copy of it into a storage table for safekeeping and sends another copy to a queue for further processing by another element of the system. I have a client running an ApacheBench test that reports approximately 148 requests per second processed. That rate of processing will not be enough for our expected load.
My understanding of function apps is that it should spawn as many instances as is needed to handle the load sent to it. But this function app might not be scaling out quickly enough as it’s only handling that 148 requests per second. I need it to handle at least 200 requests per second.
I’m not 100% sure the problem is on my end, though. In analyzing the performance of my function app I found a LOT of 429 errors. What I found online, particularly https://learn.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits, suggests that these errors could be due to too many requests being sent from a single IP. Would several ApacheBench 10K and 20K request load tests within a given day cause the 429 error?
However, if that’s not it, if the problem is with my function app, how can I force my function app to spawn more instances more quickly? I assume this is the way to get more throughput per second. But I’m still very new at working with function apps so if there is a different way, I would more than welcome your input.
Maybe the Premium app service plan that’s in public preview would handle more throughput? I’ve thought about switching over to that and running a quick test but am unsure if I’d be able to switch back?
Maybe EventHub is something I need to investigate? Is that something that might increase my apparent throughput by catching more requests and holding on to them until the function app could accept and process them?
Thanks in advance for any assistance you can give.
You don't provide much context about your app, but here are a few steps you can take to improve things:
If you want more control, use an App Service plan with Always On to avoid cold starts. You will also need to configure autoscaling, since in this plan you are responsible for scaling and autoscale is not enabled by default.
Your Azure Function must be fully async: you have external dependencies, so you don't want to block threads while calling them.
Look at the limits; you can tweak them using host.json.
A 429 error means the function is too busy to process your request, so it is likely that when you write to the table you are not using async calls and are blocking threads.
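As a minimal hedged sketch of what "fully async" looks like here - an HTTP-triggered function that stores the request in a table and forwards a copy to a queue via output bindings (the function, table, and queue names are made up; the in-process Functions SDK is assumed):
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public class RequestRow
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Payload { get; set; }
}

public static class Ingest
{
    [FunctionName("Ingest")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req,
        [Table("requests")] IAsyncCollector<RequestRow> table,
        [Queue("work-items")] IAsyncCollector<string> queue)
    {
        // Await all I/O so no thread-pool thread is blocked.
        string body = await new StreamReader(req.Body).ReadToEndAsync();
        await table.AddAsync(new RequestRow
        {
            PartitionKey = "req",
            RowKey = Guid.NewGuid().ToString(),
            Payload = body
        });
        await queue.AddAsync(body);
        return new OkResult();
    }
}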
Function apps do work very well and scale as advertised. The 429s could be because the requests come from a single IP, which Azure could be treating as a DDoS attempt. You can do the following:
Azure DevOps Load Test
You can load test using this Azure service. I am fairly sure it has better criteria for handling source IPs.
Provision a VM in Azure
The way I normally do it is to provision a VM (Windows 10 Pro) in Azure and use JMeter to load test. I have used this method and it works fine. You can provision a couple of them and subdivide the load.
Use professional load-testing services
If possible you may use services like Loader.io. They use sophisticated algorithms to run the load test and provision a bunch of VMs to run the same test.
Use Application Insights
If you are not already, you should use Application Insights to get a better look from the server's perspective. Go to Live Metrics Stream and see how many instances it provisions to handle the load test. You can easily look into the events and error logs that may be arising and investigate, and deep-dive into each associated dependency to find the problem.

Service Fabric - A web API in the cluster whose only job is to serve data from a reliable collection

I am new to Service Fabric and currently I am struggling to find out how to access data from a reliable collection (that is defined and initialized in a stateful service context) from a Web API (that is also living in the Service Fabric cluster, as a separate application). The problem is very basic and I am sure I am missing something very obvious, so apologies to the community if this sounds lame.
I have a large XML, portions of which I want to expose via Web API endpoints as results from various queries. I searched for similar questions but couldn't find a suitable answer.
Would be happy to see how an experienced SF developer would do such task.
EDIT: I posted the solution I came up with below.
After reading around, observing others' issues, and studying Azure's samples, I have implemented a solution. Posting here the gotchas I had, hoping that it will help other devs who are new to Azure Service Fabric (disclaimer: I am still a newbie in Service Fabric, so comments and suggestions are highly appreciated):
First, pretty simple - I ended up with a stateful service and a Web API stateless service in an Azure Service Fabric application:
DataStoreService - a stateful service that reads the large XMLs and stores them in a reliable dictionary (this happens in the RunAsync method).
The Web API provides an /api/query endpoint that filters the collection of XElements stored in the reliable dictionary and serializes the results back to the requestor.
3 Gotchas
1) How to get your hands on the reliable dictionary data from the stateless service, i.e. how to get an instance of the stateful service from the stateless one:
ServiceUriBuilder builder = new ServiceUriBuilder("DataStoreService");
IDataStoreService DataStoreServiceClient = ServiceProxy.Create<IDataStoreService>(
    builder.ToUri(), new ServicePartitionKey("Your.Partition.Name"));
The code above already gives you the instance, i.e. you need to use a service proxy. For that purpose you need to:
define an interface that your stateful service will implement, and use it when invoking the Create method of ServiceProxy (IDataStoreService in this case; a sketch follows below);
pass the correct partition key to the Create method. This article gives a very good intro to Azure Service Bus partitions.
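As a minimal hedged sketch of such a remoting contract (the method name and signature are made up; IService is the marker interface from Microsoft.ServiceFabric.Services.Remoting):
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Remoting;

// Implemented by the stateful DataStoreService and used by the stateless
// Web API through ServiceProxy.Create<IDataStoreService>(...).
public interface IDataStoreService : IService
{
    // Hypothetical query method: filter the stored XElements and return matches.
    Task<IEnumerable<string>> QueryAsync(string filter);
}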
2) Registering replica listeners - in order to avoid errors saying
The primary or stateless instance for the partition 'a67f7afa-3370-4e6f-ae7c-15188004bfa1' has invalid address, this means that right address from the replica/instance is not registered in the system
you need to register replica listeners as stated in this post:
public DataStoreService(StatefulServiceContext context)
    : base(context)
{
    configurationPackage = Context.CodePackageActivationContext.GetConfigurationPackageObject("Config");
}
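Note that the constructor alone does not register a listener. A minimal hedged sketch of the override that actually registers the remoting listener (classic V1 service remoting assumed):
using System.Collections.Generic;
using Microsoft.ServiceFabric.Services.Communication.Runtime;
using Microsoft.ServiceFabric.Services.Remoting.Runtime;

// Inside DataStoreService: publish the remoting listener so the replica's
// address gets registered and ServiceProxy calls can reach it.
protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners()
{
    return new[]
    {
        new ServiceReplicaListener(context => this.CreateServiceRemotingListener(context))
    };
}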
3) Service Fabric namespacing and referencing services - the ServiceUriBuilder class I took from the service-fabric-dotnet-web-reference-app. Basically you need something to generate a Uri of the form:
new Uri("fabric:/" + this.ApplicationInstance + "/" + this.ServiceInstance);
where ServiceInstance is the name of the service you want an instance of (DataStoreService in this case).
You can use Web API with OWIN to set up a communication listener and expose data from your reliable collections. See Build a web front end for your app for info on how to set that up. Take a look at the WordCount sample in the Getting Started sample apps, which feeds a bunch of random words into a stateful service and keeps a count of the words processed. Hope that helps.
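To tie the pieces above together, a hedged sketch of a stateless OWIN Web API controller that queries the stateful service through the proxy (the route, application/service names, partition name, and QueryAsync method mirror the hypothetical examples above):
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Web.Http;
using Microsoft.ServiceFabric.Services.Client;
using Microsoft.ServiceFabric.Services.Remoting.Client;

public class QueryController : ApiController
{
    // GET /api/query?filter=...
    [HttpGet]
    public async Task<IEnumerable<string>> Get(string filter)
    {
        IDataStoreService proxy = ServiceProxy.Create<IDataStoreService>(
            new Uri("fabric:/MyApp/DataStoreService"),        // hypothetical app/service names
            new ServicePartitionKey("Your.Partition.Name"));
        return await proxy.QueryAsync(filter);
    }
}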
