Ways to overcome the 230s hard coded limit to the SCM endpoint - azure

Background
We have a PHP App Service and MySQL that is deployed using an Azure Devops Pipeline (YML). The content itself is a PHP site that it packaged up into a single file using Akeeba by an external supplier. The package is a Zip file (which can be deployed as a standard Zip deployment) and inside the Zip file is a huge JPA file. The JPA is essentially the whole web site plus database tables, settings, file renames and a ton of other stuff all rolled into one JPA file. Akeeba essentially unzips the files, copies them to the right places, does all the DB stuff and so on. To kick the process off, we can simply connect to a specific URL (web site + path) and run the PHP which does all the clever unpackaging via a web GUI. But, we want to include this stage in the pipeline instead so that the process is fully automated end to end. Akeeba has a CLI as an alternative to the Web GUI deployment, so it should go like this:
Create web app
Deploy the web site ZIP (zipDeploy)
Use the REST API to access Kudu and run the relevant command (php install.php web.jpa) to unpack the jpa and do the MySQL stuff - this normally takes well over 30 minutes (it is a big site and it has a lot of "stuff" to do - but, it does actually work in the end).
The problem is that the SCM REST API has a hard-coded 230s limit as described here: https://blog.headforcloud.com/2016/11/15/azure-app-service-hard-timeout-limit/
So, the unpack stage keeps throwing "Invoke-RestMethod : 500 - The request timed out" exactly on the 230s mark.
We have tried SCM_COMMAND_IDLE_TIMEOUT and WEBJOBS_IDLE_TIMEOUT but, unsurprisingly, they did not make any difference.
$cmd=#{"command"="php .\site\wwwroot\install.php .\site\wwwroot\web.jpa .\site\wwwroot"}
Invoke-RestMethod -Uri $url -Headers #{"Authorization"="Basic $creds"} -Body (ConvertTo-Json($cmd)) -Method Post -ContentType "application/json" -TimeoutSec 7200
I can think of a few hypothetical ways around it (some quite eccentric):
Find another way to run CLI commands inside the Web App after deployment other than the Kudu REST API. Is there such a thing? I Googled and checked SO but all I found were pointers to the way we do it (or try to do it) now.
Use something like Selenium to click the GUI buttons instead of using the CLI. (I do not know if they would suffer a timeout.)
Instead of running the command via Kudu REST, use the same API to create and deploy a script to the web server, start it and then let the REST API exit whilst the script still runs on the Web App. Essentially, bodge an async call but without the callback and then have the pipeline check in on the site at, say, 5 minute intervals. Clunky.
Extend the 230s limit - but I do not think that Microsoft make this possible.
Make the web site as fast as possible during the deployment in the hope of getting it under the 4-minute mark and then down-scale it. Yuk!
See what the Akeeba JPA unpacking actually does, unpack it pre-deployment and do what the unpackage process does but controlled via the Pipeline. This is potentially a lot of work and would lose the support of the supplier.
Give up on an automated deployment. That would rather defeat much of the purpose of a Devops pipeline.
Try AWS + terraform instead. That's not a approved infrastructure environment, however.
Given that Microsoft understandably do not want long-running API calls hanging around, I understand why the limit exists. However, I would expect therefore there to be a mechanism to interact with an App Service file system via a CLI in another way. Does anyone know how?

The 4 minute idle timeout on the TCP level and this is implemented on the Azure hardware load balancer. This timeout is not configurable and this cannot be changed. One thing I want to mention is that this is idle timeout at the TCP level which means that if the connection is idle only and no data transfer happening, only then this timeout is hit. To provide more info, this will hit if the web application got the request and kept processing the request for > 4minutes without sending any data back.
Resolution
Ideally in a web application, it is not good to keep the underlying HTTP request open and 4 minutes is a decent amount of time. If you have a requirement about background processing within your web application, then the recommended solution is to use Azure WebJobs and have the Azure Webapp interact with the Azure Webjob to notify once the background processing is done (there are many ways that Azure provides like queues triggers etc. and you can choose the method that suits you the best). Azure Webjobs are designed for background processing and you can do as much background processing as you want within them. I am sharing a few articles that talk about webjobs in detail
· http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx
· https://azure.microsoft.com/en-us/documentation/articles/websites-webjobs-resources/
============================================================================
It totally depends on the app. Message Queue comes to mind. There are a lot of potential solutions and it will be up to you to decide.
============================================================================
Option #1)
You can change the code to send some sort of header to continue to the client to keep the session open.
A sample is shown here
This shows the HTTP Headers with the Expect 100-continue header:
https://msdn.microsoft.com/en-us/library/aa287673%28v=vs.71%29.aspx?f=255&MSPPError=-2147217396
This shows how to add a Header to the collection:
https://msdn.microsoft.com/en-us/library/aa287502(v=vs.71).aspx
Option #2) Progress bar
This sample shows how to use a progress bar:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/d84f4c89-ebbf-44d3-bc4e-43525ae1df45/how-to-increase-progressbar-when-i-running-havey-query-in-sql-server-or-oracle-?forum=csharpgeneral
Option #3) A common practice to keep the connection active for a longer period is to use TCP Keep-alive. Packets are sent when no activity is detected on the connection. By keeping on-going network activity, the idle timeout value is never hit and the connection is maintained for a long period
Option #4) You can also try the option of hosting your application as an IaaS VM instead of a APP SERVICE. This may avoid the ARR timeout issue because its architecture is different and I believe that the time-out is configurable.

Related

Execute something which takes 5 seconds (like email send) but return with response immediately?

Context
In an ASP.NET Core application I would like to execute an operation which takes say 5 seconds (like sending email). I do know async/await and its purpose in ASP.NET Core, however I do not want to wait the end of the operation, instead I would like to return back to the to the client immediately.
Issue
So it is kinda Fire and Forget either homebrew, either Hangfire's BackgroundJob.Enqueue<IEmailSender>(x => x.Send("hangfire#example.com"));
Suppose I have some more complex method with injected ILogger and other stuff and I would like to Fire and Forget that method. In the method there are error handling and logging.(note: not necessary with Hangfire, the issue is agnostic to how the background worker is implemented). My problem is that method will run completely out of context, probably nothing will work inside, no HttpContext (I mean HttpContextAccessor will give null etc) so no User, no Session etc.
Question
How to correctly solve say this particular email sending problem? No one wants wait with the response 5 seconds, and the same time no one wants to throw and email, and not even logging if the send operation returned with error...
How to correctly solve say this particular email sending problem?
This is a specific instance of the "run a background job from my web app" problem.
there is no universal solution
There is - or at least, a universal pattern; it's just that many developers try to avoid it because it's not easy.
I describe it pretty fully in my blog post series on the basic distributed architecture. I think one important thing to acknowledge is that since your background work (sending an email) is done outside of an HTTP request, it really should be done outside of your web app process. Once you accept that, the rest of the solution falls into place:
You need a durable storage queue for the work. Hangfire uses your database; I tend to prefer cloud queues like Azure Storage Queues.
This means you'll need to copy all the data over that you will need, since it needs to be serialized into that queue. The same restriction applies to Hangfire, it's just not obvious because Hangfire runs in the same web application process.
You need a background process to execute your work queue. I tend to prefer Azure Functions, but another common approach is to run an ASP.NET Core Worker Service as a Win32 service or Linux daemon. Hangfire has its own ad-hoc in-process thread. Running an ASP.NET Core hosted service in-process would also work, though that has some of the same drawbacks as Hangfire since it also runs in the web application process.
Finally, your work queue processor application has its own service injection, and you can code it to create a dependency scope per work queue item if desired.
IMO, this is a normal threshold that's reached as your web application "grows up". It's more complex than a simple web app: now you have a web app, a durable queue, and a background processor. So your deployment becomes more complex, you need to think about things like versioning your worker queue schema so you can upgrade without downtime (something Hangfire can't handle well), etc. And some devs really balk at this because it's more complex when "all" they want to do is send an email without waiting for it, but the fact is that this is the necessary step upwards when a baby web app becomes distributed.

HTTP Web Server: Agent did not complete within configured time limit

I have a web application that builds web-pages using agent (it's written in LS and we use [print html] to output HTML) and from time to time I see an error as below.
02-11-2020 10:00:18 HTTP Web Server: Agent did not complete within configured time limit [/path-to-database.nsf/web?openagent] Anonymous
02-11-2020 10:00:18 HTTP Server: Execution time limit exceeded by Agent '(Web)|Web' in database '/path-to-database.nsf'. Agent signer 'signer name'.
As a result HTTP task stuck so I have to restart it, but that means I have to monitor it all the time.
It does not seems to be related to agent time execution, otherwise I would have this issue constantly.
The activity does not seems to be the issue as well, according to google analytics it's around ~50 active users.
I doubt [Server Tasks\Agent manager] will help, because agent runs under HTTP task.
Does anybody know how to figure out what is the reason of such issue and where I have to dig to fix it.
Update
Domino version 11.0
The agent is triggered by anonymous visitor and does some relatively heavy computation to construct HTML response (loops and lookups are present, but I'm sure all loops ends properly, without infinitive run).
I guess settings for HTTP Agents are under this section (so 2 mins).
Web Agents and Web Services
Run web agents and web services concurrently? Enabled
Web agent and web services timeout: 120 seconds
In general request takes between 300ms-1 second, however there are some heavy pages with 1-5 seconds (but nothing like 10 seconds or more).
I notice the error only when we get more than 50 active users (who activity open new pages and thus trigger the agent).
I guess Richard is right and there must be some condition when agent stuck (maybe related to views update or some background process).
For now I simply restart HTTP to get this issue fixed (for some time).
So my question could be re-phrased to:
What can cause delay of the agent that build web page (taking into account it's related to 50-100 active users).
Thanks a lot :-)

Concurrent webApi calls stuck on prod env only

i'm having hard time figuring out a blocking issue on a client.
the situation is:
A WebSite on Azure WebApp with Vue and Js and some api deployed on a vm on azure.
one (and only one) of the apis is having serious issues lately. Basically we upload images on a DAM using web api 2.0 as a wrapper.
When we upload multiple images together, the client application iterate through an array and perform X calls to the server(form-Data with images around 1mb). those calls arrives pretty much simultaneously on the server.
on dev/qa, everything works fine. i can see from the logs that different thread are created to handle X requests.
on production, this doesn't work anymore. different thread are created but basically the first request freeze everything for a couple of minutes (iis timeout), not even the log file is updated, and return a "task was canceled" error.
From the log I can see that the LogControllerHandler (that overrides the send async method) is hit but the controller is not; this seems not to happen with small images (150 kb).
the iis on prod and QA should be exactly the same, but for some reason, on prod I got this behavior I cannot figure out.
Any pointer would be apreciated.
Thanks

How to find/cure source of function app throughput issues

I have an Azure function app triggered by an HttpRequest. The function app reads the request, tosses one copy of it into a storage table for safekeeping and sends another copy to a queue for further processing by another element of the system. I have a client running an ApacheBench test that reports approximately 148 requests per second processed. That rate of processing will not be enough for our expected load.
My understanding of function apps is that it should spawn as many instances as is needed to handle the load sent to it. But this function app might not be scaling out quickly enough as it’s only handling that 148 requests per second. I need it to handle at least 200 requests per second.
I’m not 100% sure the problem is on my end, though. In analyzing the performance of my function app I found a LOT of 429 errors. What I found online, particularly https://learn.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits, suggests that these errors could be due to too many requests being sent from a single IP. Would several ApacheBench 10K and 20K request load tests within a given day cause the 429 error?
However, if that’s not it, if the problem is with my function app, how can I force my function app to spawn more instances more quickly? I assume this is the way to get more throughput per second. But I’m still very new at working with function apps so if there is a different way, I would more than welcome your input.
Maybe the Premium app service plan that’s in public preview would handle more throughput? I’ve thought about switching over to that and running a quick test but am unsure if I’d be able to switch back?
Maybe EventHub is something I need to investigate? Is that something that might increase my apparent throughput by catching more requests and holding on to them until the function app could accept and process them?
Thanks in advance for any assistance you can give.
You dont provide much context of you app but this is few steps how you can improve
If you want more control you need to use App Service plan with always on to avoid cold start, also you will need to configure auto scaling since you are responsible in this plan and auto scale is not enabled by default in app service plan.
Your azure function must be fully async as you have external dependencies so you dont want to block thread while you are calling them.
Look on the limits. Using host.json you can tweek it.
429 error means that function is busy to process your request, so probably when you writing to table you are not using async and blocking thread
Function apps work very well and scale as it says. It could be because request coming from Single IP and Azure could be considering it DDOS. You can do the following
AzureDevOps Load Test
You can load test using one of the azure service . I am very sure they have better criteria of handling IPs. Azure DeveOps Load Test
Provision VM in Azure
The way i normally do is provision the VM (windows 10 pro) in azure and use JMeter to Load test. I have use this method to test and it works fine. You can provision couple of them and subdivide the load.
Use professional Load testing services
If possible you may use services like Loader.io . They use sophisticated algos to run the load test and provision bunch of VMs to run the same test.
Use Application Insights
If not already you must be using application insights to have a better look from server perspective. Go to live stream and see how many instance it would provision to handle the load test . You can easily look into events and error logs that may be arising and investigate. You can deep dive into each associated dependency and investigate the problem.

Azure: offload memory-intensive task synchronously?

I'm designing an ASP.NET web application which is to be hosted in Azure. I plan to resize images that a user uploads on the server side. I've got a library that does this, but currently this happens in the main web front end VM. I'd like to offload this to a worker VM. I understand the pattern for doing this with a queue, but I don't want to return to the user until the task has completed because I plan to display the resized image upon completion of the processed request. So, how can I offload a task that will be performed synchronously? (meaning I won't return from the call site until the remote task has completed.)
Thanks...
-Ben
You can do something similar to what we do in the WebJobs Samples here: http://aspnet.codeplex.com/SourceControl/latest#Samples/AzureWebJobs/PhluffyShuffy/PhluffyShuffyWeb/Views/Shuffle/Index.cshtml
We have a very similar application: process an image and display it when it is done. For that we are just pooling for the result blob using JavaScript.
Holding threads on a web server is usually not a good idea; this negatively impacts scalability.
Consider having a UI JQuery that periodically checks whether the image is ready to be displayed. All you have to do is check for the presence of the image every second or so. There are a couple of ways to do this; in one implementation the client knows ahead of time the name of the image and just attempts to read it (for example a Blob); in another the UI doesn't know the name, but checks a record in an Azure Table that indicates the status of the image creation.

Resources