HTTP Web Server: Agent did not complete within configured time limit - lotus-notes

I have a web application that builds web pages using an agent (written in LotusScript; we use Print to output HTML), and from time to time I see the error below:
02-11-2020 10:00:18 HTTP Web Server: Agent did not complete within configured time limit [/path-to-database.nsf/web?openagent] Anonymous
02-11-2020 10:00:18 HTTP Server: Execution time limit exceeded by Agent '(Web)|Web' in database '/path-to-database.nsf'. Agent signer 'signer name'.
As a result, the HTTP task gets stuck and I have to restart it, which means I have to monitor it all the time.
It does not seem to be related to the agent's execution time; otherwise I would see this issue constantly.
Load does not seem to be the issue either: according to Google Analytics, there are around 50 active users.
I doubt the Server Tasks\Agent Manager settings will help, because the agent runs under the HTTP task.
Does anybody know how to figure out the cause of this issue and where I should dig to fix it?
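A first place to dig is the log itself: if the timeouts cluster in particular minutes, those minutes can be compared with traffic peaks or scheduled tasks such as view updates. A minimal Python sketch, assuming the log lines have exactly the format shown above (the date is kept as text because its day/month order isn't stated); the sample lines are the two from the question.

```python
import re
from collections import Counter

# Matches the "Agent did not complete" entries in the format shown above.
TIMEOUT_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\d{2}:\d{2}):\d{2} "
    r"HTTP Web Server: Agent did not complete within configured time limit "
    r"\[(?P<url>[^\]]+)\]"
)

def count_timeouts(lines):
    """Count agent-timeout entries per (date, HH:MM) and per requested URL."""
    per_minute, per_url = Counter(), Counter()
    for line in lines:
        m = TIMEOUT_RE.match(line.strip())
        if m:
            per_minute[(m["date"], m["time"])] += 1
            per_url[m["url"]] += 1
    return per_minute, per_url

sample = [
    "02-11-2020 10:00:18 HTTP Web Server: Agent did not complete within configured "
    "time limit [/path-to-database.nsf/web?openagent] Anonymous",
    "02-11-2020 10:00:18 HTTP Server: Execution time limit exceeded by Agent "
    "'(Web)|Web' in database '/path-to-database.nsf'. Agent signer 'signer name'.",
]
per_minute, per_url = count_timeouts(sample)
```

In practice you would feed it the lines of the server's log file; its location depends on the Domino setup and is not assumed here.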
Update
Domino version 11.0
The agent is triggered by an anonymous visitor and does some relatively heavy computation to construct the HTML response (there are loops and lookups, but I'm sure all loops end properly, with no infinite loops).
I guess the settings for HTTP agents are in this section (so the limit is 2 minutes):
Web Agents and Web Services
Run web agents and web services concurrently? Enabled
Web agent and web services timeout: 120 seconds
In general a request takes between 300 ms and 1 second; some heavy pages take 1-5 seconds (but nothing like 10 seconds or more).
I notice the error only when we have more than 50 active users (who actively open new pages and thus trigger the agent).
I guess Richard is right and there must be some condition under which the agent gets stuck (maybe related to view updates or some background process).
For now I simply restart HTTP, which fixes the issue (for a while).
So my question could be rephrased as:
What can delay an agent that builds a web page, given that the problem appears with 50-100 active users?
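Since the error only shows up above roughly 50 active users, it may help to reproduce that concurrency against a test server rather than waiting for it in production. A minimal standard-library Python sketch: it fires a number of GETs through a thread pool and reports latency percentiles and failures. The per-request timeout of 130 s is an assumption chosen to sit just above the 120 s web agent limit from the settings above.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def timed_get(url, timeout=130.0):
    """Return (seconds, HTTP status) for one GET; status is None on any failure."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except Exception:  # timeouts, connection resets, HTTP error codes
        status = None
    return time.perf_counter() - start, status

def load_test(url, requests=200, concurrency=60):
    """Send `requests` GETs with `concurrency` parallel workers and summarise."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed_get(url), range(requests)))
    latencies = sorted(t for t, _ in results)
    failures = sum(1 for _, s in results if s is None)

    def pct(p):
        return latencies[min(len(latencies) - 1, int(p * len(latencies)))]

    return {"p50": pct(0.50), "p95": pct(0.95), "max": latencies[-1], "failures": failures}
```

Usage would be something like `load_test("http://test-server.example/path-to-database.nsf/web?openagent")`, where the host is hypothetical. Raising `concurrency` step by step (20, 50, 100) shows whether latency climbs sharply around the user counts where the timeouts appear.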
Thanks a lot :-)

Related

Azure slow communication between APIs

In some 1-5% of our requests, we are seeing slow communication between APIs (REST API requests). Both APIs are developed by us and hosted on Azure, each app service on its own app service plan in the same region, P1v2 tier.
What we see in Application Insights is that POST or GET requests on the origin API can take a few seconds to execute, while the real execution time on the destination API is only a few milliseconds.
Examples (first line: POST request on the origin; second: execution time on the destination API): slow req 1, slow req 2
Our best guess is that the time difference is lost in communication between components. We don't have an explanation for it since the payload is really small and in most cases, communication takes less than 5 milliseconds.
We have ruled out a component cold start as an explanation, since it happens under constant load and no horizontal scaling was performed.
Do you have any idea what might cause it, or how to do additional analysis to find out?
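One way to check whether the missing seconds are spent on the network rather than in the destination API is to time the phases of a single call from the client side: connection setup versus time to the first response byte. A minimal standard-library Python sketch; the host, port and path are placeholders, and for HTTPS you would use `http.client.HTTPSConnection`, whose `connect()` also covers the TLS handshake.

```python
import http.client
import time

def phase_timings(host, port, path="/", method="GET", body=None, timeout=30):
    """Time connection setup, time-to-first-byte and body download for one request."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    t0 = time.perf_counter()
    conn.connect()                  # name resolution + TCP handshake
    t1 = time.perf_counter()
    conn.request(method, path, body=body)
    resp = conn.getresponse()       # returns once the status line and headers arrive
    t2 = time.perf_counter()
    resp.read()
    t3 = time.perf_counter()
    conn.close()
    return {"connect": t1 - t0, "ttfb": t2 - t1, "download": t3 - t2, "status": resp.status}
```

If `connect` dominates in the slow cases, the time is lost before the request reaches the destination API; if `ttfb` is far larger than the server-side duration Application Insights reports, the gap lies between the two components.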
If you're running multiple sites on the App Service Plan, enable the "Always On" setting for your web app (web app > All Settings > Application Settings > Always On).
See here for details: https://azure.microsoft.com/en-us/documentation/articles/web-sites-configure/
When Always On is off, the site is shut down after 20 minutes of inactivity to free up resources for any additional websites that might be using the same App Service Plan.
The amount of information it needs to collect, process and then present takes time and involves internal calls as well; that is why, depending on server load and usage, it takes around 6 to 7 seconds, sometimes even more.
To troubleshoot that latency, try these steps provided by Microsoft.

Ways to overcome the 230s hard coded limit to the SCM endpoint

Background
We have a PHP App Service and MySQL that are deployed using an Azure DevOps Pipeline (YML). The content itself is a PHP site that is packaged into a single file using Akeeba by an external supplier. The package is a Zip file (which can be deployed as a standard Zip deployment), and inside the Zip file is a huge JPA file. The JPA is essentially the whole web site plus database tables, settings, file renames and a ton of other stuff, all rolled into one JPA file. Akeeba essentially unzips the files, copies them to the right places, does all the DB work and so on. To kick the process off, we can simply connect to a specific URL (web site + path) and run the PHP, which does all the clever unpacking via a web GUI. But we want to include this stage in the pipeline instead, so that the process is fully automated end to end. Akeeba has a CLI as an alternative to the web GUI deployment, so it should go like this:
Create web app
Deploy the web site ZIP (zipDeploy)
Use the REST API to access Kudu and run the relevant command (php install.php web.jpa) to unpack the jpa and do the MySQL stuff - this normally takes well over 30 minutes (it is a big site and it has a lot of "stuff" to do - but, it does actually work in the end).
The problem is that the SCM REST API has a hard-coded 230s limit as described here: https://blog.headforcloud.com/2016/11/15/azure-app-service-hard-timeout-limit/
So, the unpack stage keeps throwing "Invoke-RestMethod : 500 - The request timed out" exactly on the 230s mark.
We have tried SCM_COMMAND_IDLE_TIMEOUT and WEBJOBS_IDLE_TIMEOUT but, unsurprisingly, they did not make any difference.
$cmd = @{"command"="php .\site\wwwroot\install.php .\site\wwwroot\web.jpa .\site\wwwroot"}
Invoke-RestMethod -Uri $url -Headers @{"Authorization"="Basic $creds"} -Body (ConvertTo-Json $cmd) -Method Post -ContentType "application/json" -TimeoutSec 7200
I can think of a few hypothetical ways around it (some quite eccentric):
Find another way to run CLI commands inside the Web App after deployment other than the Kudu REST API. Is there such a thing? I Googled and checked SO but all I found were pointers to the way we do it (or try to do it) now.
Use something like Selenium to click the GUI buttons instead of using the CLI. (I do not know if they would suffer a timeout.)
Instead of running the command via Kudu REST, use the same API to create and deploy a script to the web server, start it and then let the REST API exit whilst the script still runs on the Web App. Essentially, bodge an async call but without the callback and then have the pipeline check in on the site at, say, 5 minute intervals. Clunky.
Extend the 230s limit - but I do not think that Microsoft make this possible.
Make the web site as fast as possible during the deployment in the hope of getting it under the 4-minute mark and then down-scale it. Yuk!
See what the Akeeba JPA unpacking actually does, unpack it pre-deployment and do what the unpacking process does, but controlled via the Pipeline. This is potentially a lot of work and would lose the support of the supplier.
Give up on an automated deployment. That would rather defeat much of the purpose of a DevOps pipeline.
Try AWS + Terraform instead. That's not an approved infrastructure environment, however.
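The third idea above (start the long-running script, let the REST call return, then check in periodically) boils down to a poll-with-deadline loop in the pipeline, where each check is a short request that stays far below the 230 s limit. A minimal Python sketch; `check_done` is a hypothetical callable standing in for whatever signals completion (for example, a marker file the install script writes when it finishes), which is an assumption rather than something Kudu or Akeeba provide out of the box.

```python
import time

def wait_until_done(check_done, interval_s=300, deadline_s=3 * 3600, sleep=time.sleep):
    """Poll check_done() every interval_s seconds until it returns True.

    check_done is hypothetical: any short call that reports completion.
    Returns the number of polls made; raises TimeoutError past the deadline.
    """
    start = time.monotonic()
    polls = 0
    while True:
        polls += 1
        if check_done():
            return polls
        # Give up if the next poll would land after the deadline.
        if time.monotonic() - start + interval_s > deadline_s:
            raise TimeoutError(f"not finished after {polls} polls")
        sleep(interval_s)
```

The 5-minute interval mirrors the one suggested above; the 3-hour deadline is an arbitrary assumption to stop a stuck deployment from hanging the pipeline forever.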
Given that Microsoft understandably do not want long-running API calls hanging around, I understand why the limit exists. However, I would therefore expect there to be another mechanism for interacting with an App Service file system via a CLI. Does anyone know how?
The 4-minute idle timeout is at the TCP level and is implemented on the Azure hardware load balancer. This timeout is not configurable and cannot be changed. Note that it is an idle timeout at the TCP level, which means it is hit only if the connection is idle, with no data being transferred. In other words, it will be hit if the web application receives the request and keeps processing it for more than 4 minutes without sending any data back.
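Because the limit is an idle timeout, a long operation whose response sends some bytes periodically should, by the reasoning above, not trip it. A minimal Python sketch of that idea as a generator a streaming HTTP response could iterate over: it runs the work on a background thread and yields a single space every `interval` seconds until the result is ready. The web framework wiring is left out, and it assumes the client tolerates leading whitespace (as a JSON parser does); whether every proxy in front of the app passes the bytes through unbuffered is not guaranteed.

```python
import json
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def heartbeat_stream(work, interval=30.0):
    """Yield b' ' every `interval` seconds while work() runs, then its JSON result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(work)
        while True:
            try:
                result = future.result(timeout=interval)  # re-raises work() errors
            except FutureTimeout:
                yield b" "  # keeps data flowing so the connection is never idle
                continue
            yield json.dumps(result).encode()
            return
```

The 30-second default is an assumption, picked to stay well under the 4-minute idle window.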
Resolution
Ideally, a web application should not keep the underlying HTTP request open, and 4 minutes is a decent amount of time. If you need background processing within your web application, the recommended solution is to use Azure WebJobs and have the Azure Web App interact with the WebJob, which notifies it once the background processing is done (Azure provides many ways to do this, such as queue triggers, and you can choose the method that suits you best). Azure WebJobs are designed for background processing, and you can do as much background processing as you want within them. Here are a few articles that discuss WebJobs in detail:
· http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx
· https://azure.microsoft.com/en-us/documentation/articles/websites-webjobs-resources/
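The pattern described above (accept the request, hand the work to a background processor, report completion separately) can be sketched in-process with Python's `queue` module and a worker thread. This is only an analogy for the WebJob + queue setup: in Azure the queue would be a storage queue and the worker a WebJob, and the names here (`submit_job`, `job_status`, `worker`) are hypothetical.

```python
import queue
import threading
import uuid

jobs = queue.Queue()
status = {}                  # job id -> "queued" | "running" | "done" | "failed"
status_lock = threading.Lock()

def submit_job(func, *args):
    """What the web request would do: enqueue and return an id right away (e.g. HTTP 202)."""
    job_id = str(uuid.uuid4())
    with status_lock:
        status[job_id] = "queued"
    jobs.put((job_id, func, args))
    return job_id

def job_status(job_id):
    """What a separate, short status request would return."""
    with status_lock:
        return status.get(job_id, "unknown")

def worker():
    """Plays the role of the WebJob: processes each item for as long as it takes."""
    while True:
        job_id, func, args = jobs.get()
        with status_lock:
            status[job_id] = "running"
        try:
            func(*args)
            outcome = "done"
        except Exception:
            outcome = "failed"
        with status_lock:
            status[job_id] = outcome
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
```

No HTTP request in this design stays open while the work runs, so the idle timeout never comes into play.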
============================================================================
It totally depends on the app. A message queue comes to mind. There are a lot of potential solutions, and it will be up to you to decide.
============================================================================
Option #1)
You can change the code to send some sort of "continue" header to the client to keep the session open.
A sample is shown here:
This shows the HTTP headers, including the Expect: 100-continue header:
https://msdn.microsoft.com/en-us/library/aa287673%28v=vs.71%29.aspx?f=255&MSPPError=-2147217396
This shows how to add a Header to the collection:
https://msdn.microsoft.com/en-us/library/aa287502(v=vs.71).aspx
Option #2) Progress bar
This sample shows how to use a progress bar:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/d84f4c89-ebbf-44d3-bc4e-43525ae1df45/how-to-increase-progressbar-when-i-running-havey-query-in-sql-server-or-oracle-?forum=csharpgeneral
Option #3) A common practice for keeping the connection active for a longer period is TCP keep-alive: packets are sent when no activity is detected on the connection. With ongoing network activity, the idle timeout is never hit and the connection is maintained for a long period.
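For option #3, this is roughly what enabling TCP keep-alive looks like on a client socket in Python. `SO_KEEPALIVE` is portable; the `TCP_KEEPIDLE` / `TCP_KEEPINTVL` / `TCP_KEEPCNT` tuning constants are platform-dependent (present on Linux, for example), hence the `hasattr` checks. The 60-second idle value is an assumption chosen to sit well under the 4-minute limit, since operating-system defaults are often much longer than that.

```python
import socket

def enable_keepalive(sock, idle_s=60, interval_s=30, probes=5):
    """Turn on TCP keep-alive so probes flow on an otherwise idle connection."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning; these constants exist only on some platforms.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

This only helps on connections whose sockets you control; it does not change the request the Kudu endpoint itself is processing.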
Option #4) You can also try hosting your application on an IaaS VM instead of an App Service. This may avoid the ARR timeout issue because its architecture is different, and I believe the timeout is configurable there.

dotnet core webapi, docker and azure. Performance issues during first api call

We have a .NET Core 3.0 solution running in a Docker image on Azure. For now, we haven't set it up in a k8s cluster. Our App Service plan is PremiumV2, which basically means that we're running on dedicated hardware and not sharing our resources with anyone else.
We have a simple API call that gets the executing user based on the JWT. It validates the JWT, gets the user's email from the claims and queries Cosmos DB to get more information about the user. When the request is sent from Postman the first time, it takes roughly 320 ms, but subsequent requests take around 50 ms. If we wait, say, about 10 minutes, the request is back at around 300 ms, and again subsequent requests take around 50 ms. So the behavior is reproducible. It's worth mentioning that it's not only this call: every "first" request to our API takes longer than the ones that follow.
Looking at Application Insights, Cosmos DB is apparently not the bottleneck here. We've also configured the App Service to be "Always On".
Any ideas on how we can track down this issue? Has anyone else experienced the same behavior? Are there any settings or configuration we should look at in Azure?
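To pin the pattern down, it can help to script what was done by hand in Postman: send a short burst of requests, stay idle, then repeat, recording the first latency and the typical later latency of each burst. A minimal Python sketch; the URL and any `Authorization: Bearer ...` header are placeholders, and the 600 s default simply mirrors the roughly 10-minute gap described above.

```python
import time
import urllib.request

def burst(url, n=5, headers=None):
    """Latencies (seconds) of n sequential GETs, each on a fresh connection."""
    out = []
    for _ in range(n):
        req = urllib.request.Request(url, headers=headers or {})
        start = time.perf_counter()
        with urllib.request.urlopen(req, timeout=60) as resp:
            resp.read()
        out.append(time.perf_counter() - start)
    return out

def cold_vs_warm(url, idle_s=600, rounds=3, headers=None, sleep=time.sleep):
    """Per round: (first latency, median of the remaining ones), idling idle_s between rounds."""
    results = []
    for i in range(rounds):
        if i:
            sleep(idle_s)
        lat = burst(url, headers=headers)
        rest = sorted(lat[1:])
        results.append((lat[0], rest[len(rest) // 2]))
    return results
```

Because `urllib` opens a new connection for every request, a first request that is still consistently slower in each round is less likely to be explained by client-side connection reuse alone.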

IIS7.5+ : Is it the correct way to describe Application Initialization feature?

Recently we wanted to address the slow loading of IIS on the first request. After some research, I found that IIS 7.5+ has a feature named "Application Initialization", which may be what I need.
However, I want to understand the mechanism before I apply it, and here is my understanding:
With the default IIS settings:
1. The application pool goes idle after 20 minutes
2. The corresponding worker process is killed
3. The first request comes in
4. IIS starts to create a new worker process
5. IIS starts to load the application
6. The client sees the response after the application is loaded
Steps 4 and 5 make the first request less responsive.
With Application Initialization set:
1. The application pool goes idle after 20 minutes
2. The corresponding worker process is killed
3. IIS starts to create a new worker process
4. IIS starts to load the application through a "fake" request
5. The first request comes in
6. The client sees the response after the application is loaded
Now the first request is responsive, because it is not actually the first request to the server: some time before it, a "fake" request kicked off loading of the application.
What I would like to know:
Is my understanding correct?
When Application Initialization is set, the worker process is still killed, but a new one is created right after it; is that the case?
That's pretty much how it works. Without Application Initialization, as you mentioned, once the worker process is killed, it is not restarted until a request is sent to it. Upon the first request, a new worker process (W3WP.exe) is started and it starts to load the application. This cold start of the application is what typically makes the first request less responsive. For example, if it's an ASP.NET application, the first request triggers recompilation of the temporary ASP.NET files, and this can take several seconds in a moderately large enterprise application.
If you look at the setup of Application Initialization, you will see that there are two main parts to it:
1. You need to set the startMode of the application pool associated with the website to AlwaysRunning.
2. You need to set preloadEnabled to true on the application (the path to the website) that runs in that application pool.
Step 1 is what tells IIS to automatically restart the IIS worker process whenever there is a reboot or IISReset. (You can easily see this in action in Task Manager: do only step 1 and run IISReset; you should see the existing W3WP.exe process removed and a new one created.)
Step 2 is what tells IIS to make the initial fake/dummy request that does all the required initialisation of your web application. For example, for an ASP.NET application this essentially triggers the compilation of all the ASP.NET files, so that the next request (the actual first request to the page) does not experience the long delays associated with app initialisation.
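To make the two settings concrete: in applicationHost.config, startMode lives on the application pool's `<add>` element and preloadEnabled on the site's `<application>` element. The Python sketch below reads such a config and reports both values per application, which can be a quick way to check a server. The sample XML is a trimmed, hypothetical fragment (pool and site names invented), and the sketch ignores inherited defaults from `applicationPoolDefaults` / `applicationDefaults`; it assumes OnDemand, the IIS default, when startMode is absent.

```python
import xml.etree.ElementTree as ET

def app_init_report(config_xml):
    """Per site application: its pool's startMode and whether preloadEnabled is set."""
    root = ET.fromstring(config_xml)
    pools = {
        p.get("name"): p.get("startMode", "OnDemand")  # OnDemand is the IIS default
        for p in root.findall(".//applicationPools/add")
    }
    report = []
    for site in root.findall(".//sites/site"):
        for app in site.findall("application"):
            pool = app.get("applicationPool")
            report.append({
                "site": site.get("name"),
                "path": app.get("path"),
                "pool": pool,
                "startMode": pools.get(pool, "OnDemand"),
                "preloadEnabled": app.get("preloadEnabled", "false").lower() == "true",
            })
    return report

# Hypothetical, trimmed-down applicationHost.config fragment.
SAMPLE = """<configuration><system.applicationHost>
  <applicationPools>
    <add name="MyAppPool" startMode="AlwaysRunning" />
    <add name="OtherPool" />
  </applicationPools>
  <sites>
    <site name="MySite" id="1">
      <application path="/" applicationPool="MyAppPool" preloadEnabled="true" />
    </site>
    <site name="OtherSite" id="2">
      <application path="/" applicationPool="OtherPool" />
    </site>
  </sites>
</system.applicationHost></configuration>"""
```

Only an application whose pool reports AlwaysRunning and whose preloadEnabled is True has both parts of Application Initialization in place.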
While the traditional approach of using a script to poll the app and keep it from going idle can do the job, the Application Initialization module makes the job much easier. You can even have IIS issue the dummy request to a custom warmup script that does much more than a simple page load: preloading a cache of several web pages, doing ahead of time any task that might otherwise take longer, etc.
Official documentation:
IIS 7.5
IIS 8.0
Based on my experience, your understanding is correct. I first ran into this capability in a performance testing scenario way back in 2014. I was custom coding the ping portion of this into monitoring jobs :O
"The Application Initialization Module basically allows you to turn on Preloading on the Application Pool and the Site/IIS App, which essentially fires a request through the IIS pipeline as soon as the Application Pool has been launched. This means that effectively your ASP.NET app becomes active immediately, Application_Start is fired making sure your app stays up and running at all times." - Rick Strahl
Detailed official docs are on the MSDN site; from what I can see, not much has changed between IIS 7.5 and 8.0 in the way of configuration.

Slow response times from free web app server every day at same time

Every day at about 3:00PM-4:00PM GMT the response times start to increase (with no increase in memory or CPU).
There is an Azure availability test hitting the server every 10 minutes.
As this is a dev site, there is no traffic to it other than me (at the odd time) and the availability test.
I record the startup time in an internal variable, and it shows that the site is not restarting.
The first request via a browser when this starts happening is very slow (2 minutes, probably some timeout).
After that it runs perfectly. It looks as if the site is shutting down and then starting up on the first request, but the pings keep it alive, so the site is not shutting down (as far as I know).
In the occasional log entry I get, I seem to be getting 502 errors, but I can't confirm this because the FREB logs are usually off at this time.
FREB logs turn off automatically after 1 hour, and as this is the middle of the night for me (NZDT), I don't get a chance to turn them on.
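Since FREB logging switches itself off after an hour, a low-tech alternative for the overnight window is a small client-side probe that appends a timestamped row (UTC time, HTTP status or error name, latency) on each run, so the 502s and the slow first request can be lined up against the 3:00-4:00 PM GMT window. A minimal Python sketch; the URL, the CSV path and the scheduling (cron, Task Scheduler, or a simple loop) are assumptions.

```python
import csv
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone

def probe(url, timeout=150):
    """One request: (UTC ISO timestamp, status code or error name, seconds)."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = str(resp.status)
    except urllib.error.HTTPError as err:   # e.g. 502 from the front end
        status = str(err.code)
    except OSError as err:                  # timeouts, DNS failures, resets
        status = type(err).__name__
    return stamp, status, round(time.perf_counter() - start, 3)

def append_row(csv_path, row):
    """Append one probe result to a CSV file."""
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow(row)
```

The 150-second timeout is an assumption, set above the roughly 2-minute first request described above so that slow responses are recorded rather than cut off.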
See the attached images: the response times increase at the same time every day.
Ignore the requests above 20; that's me accessing the site via the browser.
I always check the azure dashboard BEFORE viewing site in browser
Just got this error (randomly, from a web browser, while repeatedly accessing the same page):
502: The specified CGI application encountered an error and the server terminated the process.
Other possibly relevant info:
When I first noticed this happening, the availability test pinged a /ping endpoint that only returned a 200 and an empty string.
It now points to the site's home page, to see if that changes anything: still the same.
I'm assuming the database is not the issue, as the /ping endpoint doesn't touch the database; it's a straight controller return.
Internal exception handling is catching nothing.
Service: Azure Free Web App (Development)
There are no web jobs or timed events on this site
Azure Dashboard Initial
Current tests:
Uploading as new site to a Basic 1 Small
Restarting dev site 12 hours before issues (usually 20 hours before)
Results:
Restarting the free web app about 12 hours before the issue: same result at the same time, so it's not the app slowly overloading, or it would happen much later.
Basic 1 Small: no problems. Could it be something with the dev server?
Azure Dashboard From Today
Observations:
Same behavior with the /ping endpoint (just returns an empty string, 200 OK) and the main home page endpoint (database lookups [w/ caching] / Razor).
If anyone has any ideas what might be going on - I would very much appreciate it
:-)
Update:
It seems to have stopped (on its own) at about 11/1/2016 1:50:49 AM GMT; my internal timestamp says it restarted, and then the errors started again at the usual time. Note: no one is using the app. The Basic 1 Small server is still going fine.
Sorry I can't add any more images (not enough rep).
By default, web apps are unloaded if they are idle for some period of time, which could cause slow responses while the site starts up again. Also, this article covers troubleshooting HTTP "502 Bad Gateway" and HTTP "503 Service Unavailable" errors in Azure web apps; you could read it. From the article we can see that scaling the web app could mitigate the issue.
