tl;dr: mail-listener2 appears to time out, and I want to listen for emails continually without having to restart my script.
I'm using the mail-listener2 package ( https://github.com/chirag04/mail-listener2/ ) in my Node.js project. I would like to continually listen for emails arriving in a particular inbox and then parse them for further processing.
I have a connection established and my parsing all working; however, the IMAP connection appears to time out, or at least becomes unresponsive to new emails arriving.
As mail-listener2 relies on the imap npm package, I have taken a look through the code and attempted to reduce the IDLE timer so that it sends a request to the IMAP (Gmail) servers every 10 seconds instead of once every 30 minutes.
This indeed improved the situation; however, when I woke this morning and checked the logs, I saw the following:
<= 'IDLE OK IDLE terminated (Success)'
=> 'IDLE IDLE'
<= '+ idling'
=> DONE
<= 'IDLE OK IDLE terminated (Success)'
=> 'IDLE IDLE'
<= '+ idling'
[connection] Ended
[connection] Closed
The 'Ended' and 'Closed' lines appear to come from the core imap module. I thought sending an IDLE check would ensure the disconnect does not happen, but as you can see this is not the case.
I have also tried looking into NOOP, but it appears to cause other issues, with mails being read twice.
I understand that if my timers are too low (e.g. every few seconds), mails can end up being parsed repeatedly because the calls block the server responses, which may be why I am seeing the NOOP issue above.
Rather than going off and continuing to experiment with this, I'd like to know whether others have hit this issue and overcome it.
For anyone interested: I've pulled together a bunch of the mail-listener2 forks. Some of them had approached the reconnection issue, so I refactored this slightly into a single implementation. I've also pulled in a few other bits not relevant to this issue.
https://www.npmjs.com/package/mail-listener-fixed
https://github.com/Hardware-Hacks/mail-listener-fixed/
Background:
I have a Python (console) application which includes a socket server. This application receives messages from a 3rd-party client (start and stop messages from a certain Process A) to control a data-recording task (like starting and stopping a recording). You can think of it as receiving messages via sockets to start and stop recording data from that same Process A for about 5 minutes at a time. The 3rd-party client sends messages for nearly 2 hours and then stops, and at the end the Python application will have produced a group of files per session.
This application runs 24/7 (unattended on a Windows 10 desktop machine) with a logging console open as well, but I have noticed that sometimes (I haven't identified a pattern), after it has been running for 4 or 5 days, I access the system remotely using TeamViewer and the console window shows that the last message is from 1-2 days ago. But once I click on the console or press a key in it, I receive the full batch of messages from the sessions missed during those days; the start and stop messages arrive "simultaneously", leading to rubbish data files.
The code:
This is the socket-server part of the code. I know I'm setting a buffer of 1024, but in normal operation this buffer should not fill up before the data is read:
with conn:
    # display client information
    logger.info('Connected with ' + addr[0] + ':' + str(addr[1]))

    while self.enable:
        # now keep talking with the client
        data = conn.recv(1024)
        if data:
            self.data_cb(data)
        else:
            logger.debug("no data, closing connection.")
            break
Question:
What is leading to this buffering behaviour?
Could it be...
the 3rd party client?
my Python application?
Something in the Windows network stack?
Has anyone experienced something like this?
Any idea is really appreciated, as I have no clue why this is happening. Thanks.
Edit - Additional info:
The application is running on a real desktop machine (no virtual machine)
The application has previously run continuously for almost a month (it was only stopped for valid external reasons: a power outage, a version update, etc.)
The last time I accessed it through TeamViewer, I noticed that the app hadn't been receiving messages for a day (it had been running for 4 days at that point), BUT I assumed it was for another reason and planned to go to the site and check (because something similar had happened before). I accessed it the next day and it was the same. But on the third day I clicked on the console and tried to review the messages, and instantly the whole batch of messages from the previous 2 days appeared in the log.
The app has now been running for 2 weeks, and I have not accessed the PC through TeamViewer during the last 4 days, in case accessing it was somehow preventing the issue from occurring.
TL;DR
The selection feature of the Command Prompt window somehow prevents the application from printing logging messages and/or reading data from the socket (both happen in the same thread).
Well, I found the cause of this buffering behaviour, but I am not sure whether it is a known thing (it was not to me, so I will post a specific question about that selection feature later).
When I checked the system today, I found the console messages frozen at 3 days earlier, so I clicked on the console window and hit a key, and all the messages from those 3 days were shown at once. That made me suspect the selection feature of the console output.
I started the application as usual and followed these steps:
I selected a part of the content in the application console.
Using another console, I connected a dummy client using ncat (at this point the expected "client connected" message didn't show up).
I sent dummy messages (they didn't show up either).
I ended the ncat connection (Ctrl-C).
I clicked on the application console and hit a key.
Voila! All the logging messages (regarding the connection and data) appeared, and all the messages I had sent using ncat were received as one big message.
EDIT: I didn't need to create a question; it's a known "feature". There are good questions here, here and here. The last one shows how to disable this "feature".
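For anyone who wants to guard against this programmatically rather than through the console's properties dialog, below is a minimal sketch of disabling QuickEdit mode from Python via ctypes. The flag values are the standard Win32 console-mode constants; treat this as an illustration under those assumptions rather than a drop-in fix:

import ctypes

# Win32 console-mode flags (from wincon.h)
ENABLE_QUICK_EDIT_MODE = 0x0040
ENABLE_EXTENDED_FLAGS = 0x0080
STD_INPUT_HANDLE = -10

def disable_quick_edit():
    """Turn off QuickEdit so a stray click can't freeze console output."""
    kernel32 = ctypes.windll.kernel32
    handle = kernel32.GetStdHandle(STD_INPUT_HANDLE)

    mode = ctypes.c_uint32()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return  # not attached to a console (e.g. running as a service)

    # Clearing QUICK_EDIT only takes effect if EXTENDED_FLAGS is also set.
    new_mode = (mode.value & ~ENABLE_QUICK_EDIT_MODE) | ENABLE_EXTENDED_FLAGS
    kernel32.SetConsoleMode(handle, new_mode)

disable_quick_edit()

Calling this once at startup means a click in the console window can no longer pause the output (and, with it, the socket reads sharing that thread).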
I am trying to implement a feature which notifies the user of disconnections from Pusher and indicates when reconnection has occurred. My first experiment is simply to log changing Pusher states to the console:
var pusher = new Pusher('MY_ACCOUNT_STRING');

pusher.connection.bind('state_change', function(states) {
    console.log(states.current);
});
I then refresh the page, get a connection to Pusher, disable my internet connection, wait for Pusher to detect the disconnection, re-enable my internet connection, and wait for Pusher to detect that. Here's a screenshot of Chrome's console output during the process:
Here are my questions:
It took over a minute, possibly even 2-3 minutes, before the disconnection was detected by pusher. Is there a way to decrease that time so pusher detects disconnection within 10 or so seconds?
Why am I seeing those red errors, and what exactly do they mean? Is that normal? I would think that, with the correct setup, the errors would be handled, since a disconnection event is an "expected" exception in the Pusher context.
What is the 1006 error and why am I seeing that?
Thanks for any help!
EDIT:
I've been watching the output for a long-standing connection, and I've also seen the following a number of times; I'd like to know its cause, and how I can capture and handle it:
disconnected login.js:146
connecting login.js:146
Pusher : Error : {"type":"WebSocketError","error":{"type":"PusherError","data":{"code":1007,"message":"Server heartbeat missed"}}} pusher.min.js:12
connected
That's not normal behavior. Have you had the chance to check this on a different machine and network? It looks like a network problem.
Question 1.
When I disable Wi-Fi, it takes Pusher 4 seconds to notice and change the state to disconnected and then to unavailable.
When I re-enable Wi-Fi, I only get the same error as you do, from http://js.pusher.com/2.1.3/sockjs.js
I've got no idea about the implications of doing so, but you could try altering the default timeouts:
var pusher = new Pusher('MY_ACCOUNT_STRING', {
    pong_timeout: 6000,        // default = 30000
    unavailable_timeout: 2000  // default = 10000
});
Question 2.
No idea; I don't think the lib should throw those errors.
Question 3.
The errors come from the WebSocket protocol: https://www.rfc-editor.org/rfc/rfc6455
1006 is a reserved value and MUST NOT be set as a status code in a
Close control frame by an endpoint. It is designated for use in
applications expecting a status code to indicate that the
connection was closed abnormally, e.g., without sending or
receiving a Close control frame.
1007 indicates that an endpoint is terminating the connection
because it has received data within a message that was not
consistent with the type of the message (e.g., non-UTF-8 [RFC3629]
data within a text message).
I get the following error:
Connection timeout. No heartbeat received.
when accessing my Meteor app (http://127.0.0.1:3000). The application has been moved over to a new PC with the same code base; the server runs fine with no errors, and I can access the MongoDB. What would cause the above error?
The problem seems to occur when the collection is larger; however, I have it running on another computer which loads the collections instantaneously. The connection to the sockjs endpoint takes over a minute and grows in size before finally failing:
Meteor's DDP implements SockJS heartbeats for its long-polling connections. This is probably due to DDP's heartbeat timeout, which defaults to 15s. The heartbeat exists to prevent connections being closed by proxies (which can be worse), but if you access a large amount of data and the operation blocks for long enough (in your case, a minute), DDP will time out and then try to reconnect. This can go on forever, and you may never get the process completed.
You can try hypothetically disconnecting and reconnecting within a short amount of time, before DDP closes the connection, dividing the database access into shorter continuous chunks which you pick up on each iteration, and see if the problem persists:
var cursorCount = 0;

while (cursorCount <= data) {
    Meteor.onConnection(dbOp);
    Meteor.setTimeout(this.disconnect, 1500); // adjust the timeout here
    Meteor.reconnect();
    cursorCount++;
}

function dbOp(cursorCount) {
    // database operation here
    // pick up the operation at cursorCount where the last .disconnect() left off
}
However, when disconnected all live-updating will stop as well, but explicitly reconnecting might make up for smaller blocking.
See a discussion of this issue on Google Groups and Meteor Hackpad.
I'm trying to get an HTTP server I'm writing to behave well under heavy load, but I'm getting some weird behavior that I cannot quite understand.
My testing consists of using ab (the Apache benchmark program) over the loopback interface at a concurrency level of 1000 (ab -n 50000 -c 1000 http://localhost:8080/apa), while stracing the server process. strace slows processing down enough for the problem to be readily reproducible, and it lets me debug the server internals after completion to some extent. I also capture the network traffic with tcpdump while the test is running.
What happens is that ab stops running a while into the test, complaining that a connection returned ECONNRESET, which I find a bit weird. I could easily buy a connection timing out, since the server might simply not have the bandwidth to process them all, but shouldn't that reasonably produce ETIMEDOUT, or even ECONNREFUSED if not all connections can be accepted?
I used Wireshark to extract the packets constituting the first connection to return ECONNRESET, and its brief packet list looks like this:
(The entire tcpdump file of this connection is available here.)
As you can see from this dump, the connection is accepted (after a few SYN retransmissions), then the request is retransmitted a few times, and then the server resets the connection. I'm wondering: what could cause this to happen? Normally, Linux's TCP implementation ACKs data before the reading process even chooses to receive it, so long as there is space in the TCP window, so why doesn't it do that here? Are some kind of shared buffers running out? Most importantly, why is the kernel suddenly responding with a RST packet instead of simply waiting and letting the client retransmit further?
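To illustrate the normal behavior I mean, here is a tiny self-contained Python demonstration (not my server, just a sketch): the kernel completes the handshake and ACKs data for a connection sitting in the listen backlog even though user space never accepts it.

import socket
import time

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 8080))
srv.listen(8)                         # note: accept() is never called

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", 8080))      # succeeds: the kernel did the handshake
cli.sendall(b"GET /apa HTTP/1.0\r\n\r\n")  # succeeds: the kernel ACKs the data
time.sleep(1)
# tcpdump on loopback shows the SYN/ACK and the ACK for the request,
# even though no user-space process ever accepted or read the connection.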
For the record, the strace of the process indicates that it never even accepts a connection on the port used in this exchange (port 56946), so this seems to be something Linux does on its own. It is also worth noting that the server works perfectly well as long as ab's concurrency level is low enough (it works perfectly up to about 100, then starts failing intermittently somewhere between 100 and 500), and that its request throughput is fairly constant regardless of the concurrency level (it processes somewhere between 6000 and 7000 requests per second as long as it isn't being straced). I have not found any particular correlation between how often the problem occurs and the backlog argument I pass to listen() (I'm currently using 128, but I've tried up to 1024 without it seeming to make a difference).
In case it matters, I'm running Linux 3.2.0 on this AMD64 box.
The backlog queue filled up: hence the SYN retransmissions.
Then a slot became available: hence the SYN/ACK.
Then the GET was sent, followed by four retransmissions, which I can't account for.
Then the server gave up and reset the connection.
I suspect you have a concurrency or throughput problem in your server which is preventing you from accepting connections rapidly enough. You should have a thread dedicated to doing nothing but calling accept() and either starting another thread to handle the accepted socket or queueing a job for a thread pool to handle it. I would then speculate that Linux resets connections which are sitting in the backlog queue while receiving I/O retries, but that's only a guess.
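To make that concrete (the question doesn't say what the server is written in, so this is only a hypothetical sketch in Python, not your code): the accept loop does nothing but accept() and hand each socket to a pool, so the backlog drains as fast as the kernel can fill it.

import socket
from concurrent.futures import ThreadPoolExecutor

def handle_client(conn, addr):
    # All request processing happens off the accept path.
    with conn:
        data = conn.recv(4096)
        if data:
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 8080))
srv.listen(1024)  # a large backlog only helps if accept() keeps up with it

pool = ThreadPoolExecutor(max_workers=64)
while True:
    conn, addr = srv.accept()           # the only work this loop ever does
    pool.submit(handle_client, conn, addr)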
A while back I asked a question about keeping the control connection of an FTP session alive during a large transfer. Although I thought I had success after implementing a solution for a question I'd already asked, it appears that the ISP is the problem, i.e. they are causing my control connections to die during large transfers.
Interestingly, the old-school FTP client program LeapFTP gets around this issue by simply sending 'NOOP' commands to the server on the control connection during a download. While other popular clients die during transfers (FileZilla, my Python FTP script), LeapFTP runs strong thanks to this workaround.
I've done some research into threading and Queue, but am having trouble coming up with the code to make this happen.
The solution seems simple enough (in my head, at least): initiate a download, and while the download function runs, send a NOOP command every n seconds; stop sending NOOPs once the download completes.
I'm hoping someone can give me a suggestion as to how this might be done. Will it involve threading, Queue, or is there a simpler solution?
The bottom line is that, after a lot of testing, the 'NOOP' command is going to have to be sent during the large downloads (which take place on high-numbered TCP ports).
Thanks!
In order to handle multiple sockets at one time in a single program, you can use the select function instead of threads. This is either simpler or more complicated, depending on your programming experience.
I find threads are usually simpler to write, but when something does go wrong, debugging them is a real pain, while socket multiplexing with select is more complex to write but less difficult to debug.
The basics of using select are that you set up your sockets and call the select function, which tells you which sockets are ready to read or write. Then you check the time: if it's been X seconds since your last NOOP, send one on the control socket. If the transfer socket is ready to read or write, handle it. If the control socket is ready to read, read it and check for the NOOP response, error messages, the control channel being closed, and so on.
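Here is a rough Python sketch of that approach using ftplib, since the asker's script is Python. transfercmd, putcmd and voidresp are real but low-level ftplib methods; the host, file name and interval are placeholders, and error handling is omitted. Replies to the NOOPs are deliberately left unread until the transfer finishes, which works here because every expected reply (the NOOP 200s and the final 226) is a 2xx:

import select
import time
from ftplib import FTP

NOOP_INTERVAL = 30  # seconds between keepalives (placeholder value)

ftp = FTP("ftp.example.com")  # hypothetical host
ftp.login("user", "password")

data_sock = ftp.transfercmd("RETR bigfile.bin")  # open the data connection
last_noop = time.time()
pending_noops = 0

with open("bigfile.bin", "wb") as out:
    while True:
        # Wake up at least once a second so the NOOP timer stays accurate.
        readable, _, _ = select.select([data_sock], [], [], 1.0)
        if readable:
            chunk = data_sock.recv(8192)
            if not chunk:
                break  # server closed the data connection: transfer done
            out.write(chunk)
        if time.time() - last_noop >= NOOP_INTERVAL:
            ftp.putcmd("NOOP")  # write the keepalive; don't block on a reply
            pending_noops += 1
            last_noop = time.time()

data_sock.close()
for _ in range(pending_noops + 1):
    ftp.voidresp()  # drain the NOOP replies plus the transfer-complete 226
ftp.quit()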
Since you don't care (much, anyway) about performance in this case, it's probably easiest to use a separate thread that sits in a loop: it sleeps for N seconds, checks whether it has been cancelled, and if not sends a NOOP and sleeps again.
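A minimal sketch of that thread-based variant, again with ftplib and placeholder host/file names. The keepalive thread only ever writes to the control connection; retrbinary reads the reply stream itself, and since the queued replies are all 2xx it doesn't matter which one each voidresp consumes. Sharing an ftplib session across threads isn't generally safe, but it holds up for this trick:

import threading
from ftplib import FTP

NOOP_INTERVAL = 30  # placeholder value

ftp = FTP("ftp.example.com")  # hypothetical host
ftp.login("user", "password")

stop = threading.Event()
noops_sent = []

def keepalive():
    # Send a NOOP every NOOP_INTERVAL seconds until the download finishes.
    # Write only -- the replies are drained after the transfer completes.
    while not stop.wait(NOOP_INTERVAL):
        ftp.putcmd("NOOP")
        noops_sent.append(1)

t = threading.Thread(target=keepalive, daemon=True)
t.start()

with open("bigfile.bin", "wb") as out:
    # retrbinary consumes one queued reply itself when the transfer ends.
    ftp.retrbinary("RETR bigfile.bin", out.write)

stop.set()
t.join()
for _ in noops_sent:
    ftp.voidresp()  # drain the remaining queued 2xx replies
ftp.quit()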
If you are running on Unix, it would be just as efficient to have the control-connection program open the sockets for a transfer and then spawn a new process to do the transfer. That would leave the control program free to wait for completion, send NOOP commands, or even start new transfers if the FTP server supports it.
That is roughly how the original FTP model was supposed to work, and it is the reason FTP uses a control connection and separate data connections, rather than the HTTP model where control and data are mixed together.