I have a C# MVC5 web app. Everything usually works perfectly: users create invoices every minute, and there are about 10 users making invoices concurrently, in different locations and on different machines.
The issue happens once a week.
In the logs, I can see the POST is called twice at the same time by the same user. I see some network lag on the client side when this happens, but I'm not able to reproduce it, even using the network throttling utility in Chrome DevTools to simulate lag.
Of course, I can add some business validation before persisting the data into the database in order to avoid duplicate data, but that's not the real issue.
I've read on the internet that it could be because HTTP/2 is enabled in IIS and should be disabled, so I did that a couple of weeks ago, but the error is still occurring.
This is not a case of an unintentional double-click on a button; I'm fairly sure of that, because I disable the button once it is clicked and re-enable it only once the server returns a response.
See the logs: the first request takes 9002 ms to complete while the second one takes 444 ms. That's the network lag I've identified so far, because this POST usually takes less than a second to complete.
2021-09-22 16:21:41 167.86.95.177 POST /Sales/Invoices/Save - 443 jnamicela 45.225.105.89 Mozilla/5.0+(Windows+NT+6.3;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/93.0.4577.82+Safari/537.36 https://xpertdynamics.com/Home/Index 200 0 1236 9002
2021-09-22 16:21:41 167.86.95.177 POST /Sales/Invoices/Save - 443 jnamicela 45.225.105.89 Mozilla/5.0+(Windows+NT+6.3;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/93.0.4577.82+Safari/537.36 https://xpertdynamics.com/Home/Index 200 0 0 444
It's solved. It was an issue on the client side: basically, the users have an unstable internet connection. When they click the 'save' button and unexpectedly lose connectivity in the middle of the process, jquery.post goes straight to the fail handler, but the request was already sent to the server successfully; the browser just doesn't know it, because the internet connection was lost. So the user clicks the 'save' button once again.
I just included a validation step before calling jquery.post: check for an internet connection using navigator.onLine, and if that succeeds, check that the user session is still alive. Only if both checks pass is jquery.post called.
I've been monitoring for the past 3 weeks, and the error has never happened again.
I have a web application that builds web pages using an agent (it's written in LotusScript and we use [print html] to output HTML), and from time to time I see the error below.
02-11-2020 10:00:18 HTTP Web Server: Agent did not complete within configured time limit [/path-to-database.nsf/web?openagent] Anonymous
02-11-2020 10:00:18 HTTP Server: Execution time limit exceeded by Agent '(Web)|Web' in database '/path-to-database.nsf'. Agent signer 'signer name'.
As a result, the HTTP task gets stuck and I have to restart it, but that means I have to monitor it all the time.
It does not seem to be related to agent execution time; otherwise I would have this issue constantly.
Activity does not seem to be the issue either; according to Google Analytics there are around 50 active users.
I doubt [Server Tasks\Agent manager] will help, because the agent runs under the HTTP task.
Does anybody know how to figure out the reason for such an issue and where I should dig to fix it?
Update
Domino version 11.0
The agent is triggered by anonymous visitors and does some relatively heavy computation to construct the HTML response (loops and lookups are present, but I'm sure all loops end properly, with no infinite runs).
I guess the settings for HTTP agents are under this section (so 2 minutes):
Web Agents and Web Services
Run web agents and web services concurrently? Enabled
Web agent and web services timeout: 120 seconds
In general a request takes between 300 ms and 1 second; however, there are some heavy pages that take 1-5 seconds (but nothing like 10 seconds or more).
I notice the error only when we get more than 50 active users (who actively open new pages and thus trigger the agent).
I guess Richard is right and there must be some condition under which the agent gets stuck (maybe related to view updates or some background process).
For now I simply restart the HTTP task to fix the issue (for a while).
So my question could be rephrased as:
What can delay an agent that builds a web page (taking into account that it's related to 50-100 active users)?
Thanks a lot :-)
Background:
I have a Python (console) application that includes a socket server. This application receives messages from a third-party client (start and stop messages from a certain Process A) to control a data-recording task. You can think of it as receiving messages via sockets to start and stop recording data from that same Process A for about 5 minutes at a time. The third-party client sends messages for nearly 2 hours and then stops, and at the end the Python application will have produced a group of files per session.
This application runs 24/7 (unattended on a Windows 10 desktop machine) with a logging console open as well. I have noticed that sometimes (I haven't identified a pattern), after it has been running for 4 or 5 days, I access the system remotely using TeamViewer and the console window shows that the last message is from 1-2 days ago. But once I click on the console or press a key in that console, I receive the full batch of messages from the sessions missed during those days; the start and stop messages are thus received "simultaneously", leading to rubbish data files.
The code:
This is the socket-server part of the code. I know I'm setting a buffer of 1024, but in normal operation this buffer should never be full when the data is read.
with conn:
    # Display client information
    logger.info('Connected with ' + addr[0] + ':' + str(addr[1]))
    while self.enable:
        # Now keep talking with the client
        data = conn.recv(1024)  # blocks until data arrives; returns b'' when the peer closes
        if data:
            self.data_cb(data)
        else:
            logger.debug("no data, closing connection.")
            break
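As an aside, TCP is a byte stream, so several delayed messages can be delivered by a single recv() call, which is exactly what would make start and stop messages appear "simultaneously". A minimal sketch of one way to keep them apart, assuming (and this is an assumption, not something the snippet above guarantees) that the client terminates each message with a newline:

# Hypothetical framing helper (not from the original application): split the
# byte stream on a b"\n" delimiter so the callback always sees exactly one
# message, even when several arrive in the same recv() call.
def recv_messages(conn, data_cb, bufsize=1024):
    pending = b""
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:              # peer closed the connection
            break
        pending += chunk
        while b"\n" in pending:
            message, pending = pending.split(b"\n", 1)
            data_cb(message)       # deliver one framed message at a time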
Question:
What is leading to this buffering behaviour?
Could it be...
the 3rd party client?
my Python application?
Something in the Windows networking stack?
Has anyone experienced something like this?
Any idea is really appreciated, as I have no clue why this is happening. Thanks.
Edit - Additional info:
The application is running on a real desktop machine (no virtual machine)
The application has been able to run continuously for almost a month (it has only been stopped for valid external reasons: power outage, version update, etc.).
Last time, I accessed the machine through TeamViewer and noticed that the app hadn't received messages for a day (the app had been running for 4 days at that time), BUT I assumed it was for another reason and planned to go to the site and check (because something similar had happened before). I accessed it the next day, and it was the same. But on the third day, I clicked on the console and tried to review the messages, and instantly the whole batch of messages from the previous 2 days appeared in the log.
The app had been running for 2 weeks, and I did not access the PC through TeamViewer during the last 4 days, in case accessing it could prevent the issue from occurring.
TL;DR
The selection feature of the Command Prompt window somehow prevents the application from printing logging messages and/or reading data from the socket (both happen in the same thread).
Well, I found the cause of this buffering behaviour, but I am not sure whether it is a known thing (it was not to me, so I will post a specific question about that selection feature later).
When I checked the system today, I found that the console messages were frozen from 3 days before, so I clicked on the console window, hit a key, and all the messages from those 3 days were shown at once. That made me suspect the selection feature of the console output.
I started the application as usual and followed these steps:
I selected a part of the content in the application console.
Using another console, I connected from a dummy client using ncat (at this point the expected 'client connected' message didn't show up).
I sent dummy messages (they didn't show up either).
I ended the ncat connection (Ctrl-C).
I clicked on the application console and hit a key.
Voila! All the logging messages (regarding connection and data) appeared, and all the messages that I sent using ncat were received as one big message.
EDIT: I didn't need to create a question; it's a known "feature". There are good questions here, here, and here. The last one shows how to disable this "feature".
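For completeness, here is a minimal sketch (my own, not code from the original application) of how QuickEdit mode can be disabled programmatically via the Win32 console API, so a stray click or selection can no longer suspend console output; the helper name is mine:

import ctypes

# Win32 console mode flags (from wincon.h); Windows only.
ENABLE_QUICK_EDIT_MODE = 0x0040
ENABLE_EXTENDED_FLAGS = 0x0080
STD_INPUT_HANDLE = -10

def disable_quick_edit():
    kernel32 = ctypes.windll.kernel32
    handle = kernel32.GetStdHandle(STD_INPUT_HANDLE)
    mode = ctypes.c_uint32()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return  # no attached console (e.g. running as a service)
    # Extended flags must be set for the QuickEdit bit to take effect.
    new_mode = (mode.value | ENABLE_EXTENDED_FLAGS) & ~ENABLE_QUICK_EDIT_MODE
    kernel32.SetConsoleMode(handle, new_mode)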
I have a production Azure website that I deployed a few days ago. I saw the "Availability" section in Application Insights (server-side AI is configured and client-side AI is enabled right now), and through the new portal (portal.azure.com) I decided to set up an availability "ping" test.
I left the majority of the settings at their defaults: the test type was URL ping test, the URL was set to the root of my app, and the frequency was 5 minutes. I left the test locations at the default of 5 across different US regions.
What I noticed was that all the tests failed, and I saw a lot of requests in my app for GET / with a status of 404. Since it was filling my Application Insights request log with junk and the availability test registered all failures, I deleted the availability test. Annoyingly, after deleting it, I noticed it was still there after refreshing the page, so I deleted it again, and now it shows as "Not configured" and seems to be truly gone.
Before I deleted the availability test, I saw all these GET / requests in my AI logs and looked at their IP addresses; they did indeed seem to come from the different US regions.
After I deleted the test, I assumed they would all stop. Unfortunately, that did not happen: all but 1 of the 5 ping tests stopped, but the one with IP address ::1 still seems to be running. For almost a week that ping test has occurred every 5 minutes, even though I deleted the test.
How do I remove the availability test completely from Application Insights?
The request that you see every 5 minutes is likely caused by the Always On feature, which uses it to keep the site alive. It is not related to the availability test.
You can verify that by temporarily turning off Always On and verifying that those requests stop.
Every day at about 3:00-4:00 PM GMT, the response times start to increase (with no memory or CPU increase).
There is an Azure availability test hitting the server every 10 minutes.
As this is a dev site, there is no traffic to it other than me (at the odd time) and the availability test.
I log the startup time to an internal variable, and this shows that the site is not restarting.
The first request via a browser when this starts happening is very slow (2 minutes, probably some timeout).
After that it runs perfectly. That looks like the site is shutting down and then starting up on the first request, but the pings should be keeping it alive, so the site should not be shutting down (as far as I know).
In the odd log entry I do get, I seem to be getting 502 errors, but I can't confirm this as the FREB logs are usually off at this time.
FREB logs turn off automatically after 1 hour, and as this is the middle of the night for me (NZDT), I don't get a chance to turn them on.
See the attached images: as you can see, the response times just increase at the same time.
Ignore the requests above 20; that's me accessing the site via the browser.
I always check the Azure dashboard BEFORE viewing the site in a browser.
I just got this error (from the web browser, randomly, while repeatedly accessing the same page):
502: The specified CGI application encountered an error and the server terminated the process.
Other relevant info (perhaps):
When I first noticed this happening, the availability test was pinging a /ping endpoint that only returned a 200 and an empty string.
It now points to the site's homepage to see if that changes anything: still the same.
I'm assuming the database is not the issue, as the /ping endpoint doesn't touch the database; it's just a straight controller return.
Internal exception handling is catching nothing.
Service: Azure Free Web App (Development)
There are no web jobs or timed events on this site
Azure Dashboard Initial
Current tests:
Uploading the app as a new site on a Basic 1 Small plan
Restarting the dev site 12 hours before the issue occurs (usually the last restart is 20 hours before)
Results:
Restarting the free web app ~12 hours before the issue: same result at the same time, so it's not the app slowly overloading, or the slowdown would come much later.
Basic 1 Small: no problems. Could it be something with the dev server?
Azure Dashboard From Today
Observations:
Same behavior with the /ping endpoint (just returns an empty string, 200 OK) and the main home page endpoint (database lookups [w/caching] / Razor).
If anyone has any ideas about what might be going on, I would very much appreciate it.
:-)
Update:
It seems to have stopped (on its own) at about 11/1/2016 1:50:49 AM GMT; my internal timestamp says the site restarted, and then the errors started again at the same time as usual. Note: no one is using the app. The Basic 1 Small server is still going fine.
Sorry, I can't add any more images (not enough rep).
By default, web apps are unloaded if they are idle for some period of time, which can cause slow responses while the site spins back up. Also, this article covers troubleshooting HTTP "502 Bad Gateway" and HTTP "503 Service Unavailable" errors in Azure web apps; it is worth reading. From the article, we can see that scaling the web app can mitigate the issue.
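As an illustration of the idle-unload mechanism (my own sketch, not part of the referenced article): the Always On setting, which prevents the unload, is not available on the Free tier, so one hedged workaround is an external keep-alive loop run from another machine. The URL and interval below are placeholders:

import time
import urllib.request

# Hypothetical external keep-alive pinger: request the site often enough
# that the idle timeout never unloads it. URL and interval are assumptions.
SITE_URL = "https://your-app.azurewebsites.net/ping"
INTERVAL_SECONDS = 300  # well under the typical ~20-minute idle timeout

while True:
    try:
        with urllib.request.urlopen(SITE_URL, timeout=30) as resp:
            print(time.strftime("%H:%M:%S"), "ping ->", resp.status)
    except OSError as exc:  # URLError and friends derive from OSError
        print(time.strftime("%H:%M:%S"), "ping failed:", exc)
    time.sleep(INTERVAL_SECONDS)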
We are consuming TM1 cubes with Report Studio through Framework Manager.
Quite often when I am trying to come up with new solutions to my challenges in Report Studio, I get an error when I run the report, and then the server goes down. Then I have to restart the dispatchers (Cognos Administration -> Status -> System -> Right Click on the server -> Test Dispatchers -> Right Click on the server -> Start Dispatchers).
The error message that I get is:
The connection closed before the request is processed. If you are
using WebSphere Application Server, to reduce the frequency of this
error, increase the Persistent Timeout parameter for the Web container
transport chains in the administrative console. Increase the time in
10-15 second intervals until the error no longer or rarely occurs.
We are not using WebSphere, but Tomcat (default with the installation).
-> Increasing the connection timeout interval on WebSphere is thus not applicable
-> The timeout interval in the Tomcat config seems to be 60 seconds (60000 ms)
More importantly: the error message appears immediately (after about 1 second) when I run the report
-> This indicates to me that it happens regardless of any timeout setting
Additional info: the error message almost always comes when I manually and dynamically attempt to build MUNs. However, sometimes (I don't know when or why) it instead shows the MUN that I've created and tells me that it is invalid, which is far better for debugging.
Any suggestions on why this is happening and how to fix it would be greatly appreciated!
Edit 1: http://www.linkedin.com/groups/Product-Cognos-BI-1011-Cognos-3917273.S.143157206
This post states (almost at the bottom) that
When the Cognos BI report ask for a field that does not exist, the TM1
Application disconnects the connection. And the Cognos BI Report will
timeout.
Is this true? If so, why am I sometimes told that my MUN is invalid, whereas other times the connection is closed and the server shuts down? Is it because even Report Studio thinks that my MUN is valid and tries to get it from the TM1 server?
And additionally: Is it possible to change this behavior for the TM1 server?
Edit 2: Or change the BI server behavior so that it does not shut down when the TM1 connection is dropped, but rather shows an error of some kind?
Thanks again!
Edit 3: Okay, so I did some checking with the TM1 top utility (http://pic.dhe.ibm.com/infocenter/ctm1/v9r5m0/index.jsp?topic=%2Fcom.ibm.swg.im.cognos.tm1_op.9.5.1.doc%2Ftm1_op_id6961UsingtheTM1TopUtility_N160F47.html).
When a normal report is run, a new thread is shown in the monitoring list. This thread then disappears when I stop the BI server dispatchers, or automatically after approximately 5 minutes of idle time without any reports being run (according to the TM1 Top log dump).
Likewise, when the error occurs, a new thread is shown in the list. However, it disappears after a second or so (probably because the BI server dispatchers are shut down).
I have therefore concluded that it is safe to assume (?) that the request reaches the TM1 server and that TM1 returns something back (or simply closes the connection, as suggested in the LinkedIn post referenced in my first edit). Hence, it is also safe to assume that this is something that has to be fixed on the BI server side (?).
The question is therefore better phrased as: is it possible to change the BI server behavior so that it does not shut down when the TM1 server returns something invalid or closes the connection, and instead shows some kind of error message?
Thanks for any input!