Background:
I have a Python (console) application which includes a socket server. This application receives messages from a 3rd party client (start and stop messages from certain Process A) to control a recording data task (like start and stop recording). You can think of it as receiving messages via sockets to start and stop recording data from the same Process A for about 5 minutes. The 3rd party client sends messages for nearly 2 hours and then stops, and at the end, the Python application will be producing a group of files per session.
This application is running 24/7 (unattended on a Windows 10 Desktop machine) and there is a logging console open as well, but I have noticed that sometimes (Haven't identified a pattern) after running for 4 or 5 days, I access the system remotely, using TeamViewer, and the console window is showing that the last message is of 1-2 days ago. But once I click on the console or press a key in that console, I receive a full batch of messages from the sessions missed during those last days, thus, start and stop messages are received "simultaneously" leading to rubbish data files.
The code:
This is the socket server part of the code. I know I'm setting a buffer of 1024, but in normal operation, this buffer should not be full to read the data
with conn:
#display client information
logger.info('Connected with ' + addr[0] + ':' + str(addr[1]))
while self.enable:
#now keep talking with the client
data = conn.recv(1024)
if data:
self.data_cb(data)
else:
logger.debug("no data, closing connection." )
break
Question:
What is leading to this buffering behaviour?
Could it be...
the 3rd party client?
my Python application?
Something in Windows network stuff?
Has anyone had experienced something like this?
Any idea is really appreciated as I have no clue why is this happening? Thanks.
Edit - Additional info:
The application is running on a real desktop machine (no virtual machine)
The application has been able to work continuously for almost a month (just stopped for valid external reasons, power outage, version update, etc)
Last time I accessed through Teamviewer and noticed that the app wasn't receiving messages for a day (the app was running for 4 days at that time), BUT I assumed it was for another reason and planned to go to the site and check (Because something similar happened before). I accessed the next day, and it was the same. But on the third day, I click on the console and tried to review the messages and instantly the whole batch of messages from the previous 2 days appeared on the log.
The app has been running for 2 weeks and did not access the PC through TeamViewer during the last 4 days, in case that accessing it could prevent the issue to occur.
TL;DR
The selection feature of Command Prompt window prevents somehow the application from printing logging messages and/or reading data from the socket (both are in the same thread).
Well, I found the cause of this buffering behaviour but I am not sure if it is a known thing or not (It was not for me, so I will post later a specific question about that selection feature).
When I checked the system today I found that the console messages were frozen at 3 days before, so I clicked on the console window, and hit a key and all the messages for 3 days were shown at once. Then, I suspected of the selection feature of the console output.
I started the application as usual and followed these steps:
I selected a part of the content in the application console.
Using another console, I connected from a dummy client using ncat (At this point the expected client connected message didn't show up)
I sent dummy messages (didn't show up either)
I finished ncat connection (CTRL-C)
Clicked on the application console and hit any key
Voila! All the logging messages (regarding connection and data appeared), and all the messages that I sent using ncat were received as one big message.
EDIT: Didn't need to create a question, it's a known "feature". There are good questions here, here and here. The last one shows how to disable this "feature".
Related
My program has a local server that sends requests and receives information from there. everything works through the foreground service, that is, the application is never unloaded from memory(working as a daemon), it always works. The problem is that after some time of inactivity of the smartphone, requests start working not once every 3 seconds, but every 30-80 seconds (approximately). is it possible to somehow speed up the work in this case? After unlocking the smartphone, the speed returns
is it possible to somehow speed up the work in this case?
Yes: https://developer.android.com/reference/android/provider/Settings#ACTION_REQUEST_IGNORE_BATTERY_OPTIMIZATIONS
But you should have a good reason for this.
Read more on Doze Mode: https://developer.android.com/training/monitoring-device-state/doze-standby
You can use a push notification to wake up the device for your use case. Ask your assistant to find device -> send push notification -> app wakes and up does what you want.
I have a C# WebApp MVC5. Everything usually works perfectly, users create invoices every minute, there are 10 users making invoices concurrently in different locations and different machines.
The issue happens once a week.
In the logs, I can see the post is called twice at the same time by the same user, I see some network lag on the client-side when this happens, but I'm not able to reproduce it, even using the network utility of chrome DevTools to simulate network lag.
Of course, I can add some business validation before persisting the data into the database in order to avoid duplicate data, but that's not the real issue.
I've read on the internet it would be because IIS Http2 is enabled and should be disabled, so I've done that a couple of weeks ago, but the error is still occurring.
This is not an issue of an "unintentional double click on a button", I'm pretty sure is not because I make sure to disable the button once it is clicked and enabled back once the server returns a response.
See the logs: the first one takes 9002ms to completes while the second one takes 444ms. That's the network lag I've identified so far because this post usually takes less than a second to completes.
2021-09-22 16:21:41 167.86.95.177 POST /Sales/Invoices/Save - 443 jnamicela 45.225.105.89 Mozilla/5.0+(Windows+NT+6.3;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/93.0.4577.82+Safari/537.36 https://xpertdynamics.com/Home/Index 200 0 1236 9002
2021-09-22 16:21:41 167.86.95.177 POST /Sales/Invoices/Save - 443 jnamicela 45.225.105.89 Mozilla/5.0+(Windows+NT+6.3;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/93.0.4577.82+Safari/537.36 https://xpertdynamics.com/Home/Index 200 0 0 444
It's solved. It was an issue on client-side. Basically they have unstable internet connection. When they click on the 'save' button and in the middle of the process they unspecteclly lose internet connection, the jquery.post will go directly to the post.fail, but the request was successfully sent to the server, it is just the browser that doesn't know it because internet connection was lost. So the user clicks on the 'save' button once again.
I just included a validation step before calling jquery.post. It is: check for internet connection using navigator.onLine, if yes, then check for the user session is still alive. if(true && true) then call jquery.post.
I've been monitoring since 3 weeks ago, and the error never happened again.
We currently have a production node.js application that has been underperforming for a while. Now the application is a live bidding platform, and also runs timed auctions. The actual system running live sales is perfect and works as required. We have noticed that while running our timed sales (where items in a sale have timers and they incrementally finish, and if someone bids within the last set time, it will increment the time up X amount of seconds).
Now the issue I have found is that during the period of a timed sale finishing (which can go on for hours) if items have 60 seconds between each lots and have extensions if users bid in the last 10 seconds. So we were able to connect via the devtools and I have done heap memory exports to see what is going on, but all I can see is that all indications point to stream writeable and the buffers. So my question is what am I doing wrong. See below a screenshot of a heap memory export:
As you can see from the above, there is a lot of memory being used specifically for this it was using 1473MB of physical RAM. We saw this rise very quickly (within 30 mins) and each increment seemed to be more than the last. So when it hit 3.5GB it was incrementing at around 120MB each second, and then as it got higher around 5GB it was incrementing at 500MB per second and got to around 6GB and then the worker crashed (has a max heap size of 8GB), and then we were a process down.
So let me tell you about the platform. It is of course a bidding platform as I said earlier, the platform uses Node (v11.3.0) and is clustered using the built in cluster library. It spawns 4 workers, and has the main process (so 5 altogether). The system accepts bids, checks other bids, calculates who is winning and essentially pushes updates to the connected clients via Redis PUB/SUB then that is broadcasted to that workers connected users.
All data is stored within redis and mysql is used to refresh data into redis as redis has performed 10x faster than mysql was able to.
Now the way this works is on connection a small session is created against the connection, this is then used to authenticate the user (which is a message sent from the client) all message events are sent to a handler which pushes it to the correct command these commands are then all set as async functions and run async.
Now this has no issue on small scale, but we had over 250 connections and was seeing the above behaviour and are unsure where to find a fix. We noticed when opening the top obejct, it was connected to buffer.js and stream_writable.js as well. I can also see all references are connected to system / JSArrayBufferData and all refer back to these, there are lots of objects, and we are unable to fix this issue.
We think one of the following:
We log to file using append mode, which logs lots of information to the console and to a file using fs.writeFile and append mode. We did some research and saw that writing to console can be a cause of this kind of behaviour.
It is the get lots function which outputs all the lots for that page (currently set to 50) every time an item finishes, so if the timer ends it will ask for a full page load for all the items on that page, instead of adding new lots in.
There is something else happening here that we are unaware of, maybe the external library we are using that may not be removing a reference.
I have listed the libraries of interest that we require here:
"bluebird": "^3.5.1", (For promisifying the redis library)
"colors": "^1.2.5", (Used on every console.log (we call logs for everything that happens this can be around 50 every few seconds.)
"nodejs-websocket": "^1.7.1", (Our websocket library)
"redis": "^2.8.0", (Our redis client)
Anyway, if there is anything painstakingly obvious I would love to hear, as everything I have followed online and other stack overflow questions does not relate close enough to the issue we are facing.
tl;dr: Mail-listener2 appears to timeout, and I want to continually listen for emails without requiring my script to restart.
I'm using the mailer-listerner2 package ( https://github.com/chirag04/mail-listener2/ ) in my node.js project. I would like to continually listen for emails arriving into a particular inbox and then parse these emails for further processing.
I have a connection established as well as my parsing all working, however I am seeing that the imap connection appears to timeout, or at least becomes unresponsive to new emails arriving.
As the mail-listener2 package relies on the imap npm package I have taken a look through the code and attempted to reduce the IDLE timer so that it sends a request to the imap (gmail) servers every 10 seconds instead of once every 30 minutes.
This indeed improved the situation however when waking this morning to check the logs I see the following:
<= 'IDLE OK IDLE terminated (Success)'
=> 'IDLE IDLE'
<= '+ idling'
=> DONE
<= 'IDLE OK IDLE terminated (Success)'
=> 'IDLE IDLE'
<= '+ idling'
[connection] Ended
[connection] Closed
The connection ended & closed appear to come from the core imap module. I thought sending an IDLE check would ensure that the disconnect does not happen, but as you can see this is not the case.
I have also tried looking into Noop but it appears to cause some other issues with mails being read twice.
I understand that if my timers are too low e.g. every few seconds this can cause mails to continually be parsed due to the calls blocking the server responses, which may be why I am seeing the Noop issue above.
Without wanting to go off and keep experimenting with this I'd like to know if others have hit this issue and overcome?
For anyone interested - I've pulled together a bunch of the mail-listener2 forks. Some of these had approached the reconnection issue so I refactored this slightly into a single implementation. I've also pulled together a few other bits not relevant to this issue.
https://www.npmjs.com/package/mail-listener-fixed
https://github.com/Hardware-Hacks/mail-listener-fixed/
We are consuming TM1 cubes with Report Studio through Framework Manager.
Quite often when I am trying to come up with new solutions to my challenges in Report Studio, I get an error when I run the report, and then the server goes down. Then I have to restart the dispatchers (Cognos Administration -> Status -> System -> Right Click on the server -> Test Dispatchers -> Right Click on the server -> Start Dispatchers).
The error message that I get is:
The connection closed before the request is processed. If you are
using WebSphere Application Server, to reduce the frequency of this
error, increase the Persistent Timeout parameter for the Web container
transport chains in the administrative console. Increase the time in
10-15 second intervals until the error no longer or rarely occurs.
We are not using WebSphere, but Tomcat (default with the installation).
-> Increasing connection timout interval on WebSphere thus not applicable
-> The timeout interval in the Tomcat config seems to be 60 seconds (60000 ms)
More importantly: The error message shows immediately (after 1 second) when I run the report
-> Indicates to me that this is regardless of any timeout interval setting
Additional info: The error message comes almost always when I manually and dynamically attempt to build MUNs. However, sometimes (dunno when and why) it shows the MUN that I've created and tells me that it is invalid. Which is WAY much better for debugging.
Any suggestions on why this is happening and how to fix it would be greatly appreciated!
Edit 1: http://www.linkedin.com/groups/Product-Cognos-BI-1011-Cognos-3917273.S.143157206
This post states (almost at the bottom) that
When the Cognos BI report ask for a field that does not exist, the TM1
Application disconnects the connection. And the Cognos BI Report will
timeout.
Is this true? If so; why am I sometimes told that my MUN is invalid, whereas other times the connection is closed and the server shut down? Is it because even Report Studio thinks that my MUN is valid and tries to get it from the TM1 Server?
And additionally: Is it possible to change this behavior for the TM1 server?
Edit 2: Or change the BI server behavior so that it does not shut down when the TM1 connection is disconnected, but rather show an error of some kind?
Thanks again!
Edit 3: Okay, so I did some checking with the TM1 top utility (http://pic.dhe.ibm.com/infocenter/ctm1/v9r5m0/index.jsp?topic=%2Fcom.ibm.swg.im.cognos.tm1_op.9.5.1.doc%2Ftm1_op_id6961UsingtheTM1TopUtility_N160F47.html).
When a normal report is run, a new thread is shown in the monitoring list. This thread then disappears when I stop the BI server dispatchers, or automatically after approximately 5 minutes of idle time without any reports being run (according to the TM1 Top log dump).
Likewise, when the error occurs, a new thread is shown in the list. However, it disappears after a short second (probably because the BI server dispatchers are shut down).
I have therefore concluded that it is safe to assume (?) that the request seems to reach the TM1 server, and that TM1 returns something back (or simply closes the connection as suggested in the linkedin-post that I referenced in my first edit) . And hence, that it is safe to assume that this is something that have to be fixed on the BI server side(?).
The question is therefore more likely: Is it possible to change the BI server behavior so that it does not shut down when the TM1 server returns something invalid or closes the connection, and rather show some kind of error message instead?
Thanks for any input!