I have a simple script reading messages from a websocket server and I don't fully understand the keep-alive system.
I'm getting two errors with practically the same meaning: "sent 1011 (unexpected error) keepalive ping timeout; no close frame received" and "no close frame received or sent".
I am using the websockets module (link to the docs).
I'd like to know when it is my job to send a ping, when to send a pong, and whether I should change the timeout to a longer period, since I'll be running multiple connections at the same time (to the same server but on different channels).
I have tried running a separate asyncio task that pinged the server every 10 to 20 seconds, and I have also tried replying only after I receive a packet (which in my case can arrive 1 second apart, or the next one may not come until the next day), both with a plain websocket.ping() and with a custom payload (the heartbeat JSON string {"event": "bts:heartbeat"}).
One solution I can see is to just reopen the connection after I get the error, but that feels wrong.
async with websockets.connect(self.ws) as websocket:
    packet = {
        "event": "bts:subscribe",
        "data": ...,
    }
    await websocket.send(json.dumps(packet))
    await websocket.recv()  # subscription reply
    try:
        async for message in websocket:
            tr = json.loads(message)
            await self.send(tr)
            packet = {"event": "bts:heartbeat"}
            await websocket.pong(data=json.dumps(packet))
    except Exception as e:  # websockets.ConnectionClosedError
        await self.send_status(f"Subscription Error: {e}", 0)
Keep-alive packets are sent automatically by the library (see https://websockets.readthedocs.io/en/latest/topics/timeouts.html#keepalive-in-websockets), so there should be no need to do that yourself. There is no need to send pongs manually either; the library answers the server's pings for you.
In your case it seems the server is not answering your client's pings in a timely manner. This FAQ entry and its recommendation to catch ConnectionClosed and reconnect look relevant.
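For example, a minimal reconnect loop along those lines, sketched as a method on the same object as your snippet (so self.ws and self.send are yours; subscribe_forever and the five-second backoff are illustrative choices, not library API):
import asyncio
import json

import websockets

async def subscribe_forever(self):
    packet = {
        "event": "bts:subscribe",
        "data": ...,  # same elided payload as in the question
    }
    while True:
        try:
            async with websockets.connect(self.ws) as websocket:
                await websocket.send(json.dumps(packet))
                await websocket.recv()  # subscription reply
                async for message in websocket:
                    await self.send(json.loads(message))
        except websockets.ConnectionClosed:
            # Keepalive ping timed out: back off briefly, then open a
            # fresh connection and re-subscribe instead of treating it as fatal.
            await asyncio.sleep(5)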
I get the exception "Cannot write to closing transport" raised from aiohttp.http_writer.StreamWriter#_write, but only in a fraction of cases.
The relevant snippet:
session: aiohttp.ClientSession

async with session.get(url, timeout=60) as response:
    txt = await response.text()
    response.close()
    return txt
What is going on? I don't think the server side is closing the socket.
Answer: We should create a new session on each request. There is also no need to close the response explicitly, as the context manager handles that.
async with aiohttp.ClientSession() as session:
    async with session.get(url, timeout=60) as response:
        txt = await response.text()
        return txt
It means that your connection is already closed. It occurs when the client breaks the connection but the server still tries to respond to it.
Remove response.close() from your code.
This happens if a client prematurely disconnects before reading some or all of the response. You may encounter that case frequently if you're either dealing with mobile clients (which may switch between WiFi and mobile networks) or if you have views that take some time, but clients have a lower timeout. Since you can't control how clients talk to your service, it's probably safe to ignore this.
aiohttp 3.6.0 introduced code to silence this exception.
We have an XMPP connection server that connects sockets to GCM XMPP endpoints and starts sending notifications.
One thing we've noticed is that upon sending a semi-large notification (say, to as few as 1000 devices), the sockets keep getting suddenly disconnected, with the following error message:
Client disconnected socket=b913-512-904dc69, code=EPIPE, errno=EPIPE, syscall=write
For example, this is the log of the live server when starting to send a notification to different registration IDs.
info: Sent downstream message msgId=P#c1uq... socketId=512
info: Sent downstream message msgId=P#c3tE... socketId=512
info: Sent downstream message msgId=P#c1TF... socketId=512
info: Sent downstream message msgId=P#c3sy... socketId=512
info: Sent downstream message msgId=P#c41N... socketId=512
...
info: Sent downstream message msgId=P#cJbr... socketId=512
info: Sent downstream message msgId=P#cJXO... socketId=512
info: Client disconnected socket=b913-512-904dc69, code=EPIPE, errno=EPIPE, syscall=write
This keeps happening all the time and everywhere in our system and is making service QA pretty difficult.
Another thing we've noticed is that sometimes when calling socket.send(stanza), the value false is returned even when the socket is definitely connected. This one is even worse, since we then have to re-queue the messages, which is really resource-heavy when sending millions of them. This is explained below.
Additional Information:
From the 1st message to the 84th (where the disconnection happens), less than 100 milliseconds have passed.
We have about 52 sockets open for this JID/PASSWORD (senderId, Api_key in GCM's terms), on 3 different servers. All keep disconnecting now and then when a large notification send task comes along (say, to 10000 recipients).
Sockets successfully reconnect, but they're disconnected for several seconds, and this reduces the efficiency and reliability of our system.
How the connection is set up:
const xmpp = require('node-xmpp-client');
let socket = new xmpp.Client({
    port: 5235,
    host: 'gcm-xmpp.googleapis.com',
    legacySSL: true,
    preferredSaslMechanism: 'PLAIN',
    reconnect: true,
    jid: $JID,
    password: $PASSWORD
});
socket.connection.socket.setTimeout(0);
socket.connection.socket.setKeepAlive(true, 10000);
socket.on('stanza', (stanza) => handleStanza(stanza));
...
Acks are sent for every upstream message received.
But one thing we see is that the following returns false sometimes when sending downstream messages, "even when the socket is connected".
// This returns false many times! even when the socket.connection.connected === true!
socket.send(xmppStanza)
If this happens, we queue the ack message to be retried later but keep sending messages to GCM.
Why does socket.send return false sometimes? (This obviously is not an error like EPIPE; it's just false, meaning flushing the socket was unsuccessful. Maybe the socket becomes unwritable even though it's connected?)
If acks are delayed, will GCM close the connection with the delayed acks or will it just stop sending upstreams?
(AFAIK, it'll just stop sending upstreams, so maybe this has nothing to do with the connections being closed (EPIPEs)?)
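One direction we are considering, assuming node-xmpp-client's send() simply forwards the return value of net.Socket.write(): write() returning false means the data was still buffered but the kernel send buffer is full, so re-sending would duplicate the stanza; the right reaction is to pause our own sending until the underlying socket emits 'drain'. A rough sketch (sendStanza, queue, and writable are our own hypothetical names):
const queue = [];
let writable = true;

function sendStanza(stanza) {
    if (!writable) {
        queue.push(stanza);  // hold new stanzas while the socket is backed up
        return;
    }
    // false here just means "slow down": this stanza itself was buffered.
    writable = socket.send(stanza);
}

socket.connection.socket.on('drain', () => {
    // Socket is writable again: flush as much of the queue as it will take.
    writable = true;
    while (writable && queue.length) {
        writable = socket.send(queue.shift());
    }
});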
I'd be really grateful if anyone could shed some light on this behavior.
Thanks !
I'm testing communication between two NodeJS instances over TCP, using the net module.
Since TCP is a byte stream rather than message-oriented (socket.write() has no message boundaries), I'm wrapping each message in a string like msg "{ json: 'encoded' }"; in order to handle them individually (otherwise, I'd receive packets with a random number of concatenated messages).
I'm running two NodeJS instances (server and client) on a CentOS 6.5 VirtualBox VM with bridged networking and a Core i3-based host machine. The test consists of the client emitting a request to the server and waiting for the response:
Client connects to the server.
Client outputs current timestamp (Date.now()).
Client emits n requests.
Server replies to n requests.
Client increments a counter on every response.
When finished, client outputs the current timestamp.
The code is quite simple:
Server
var net = require('net');
var server = net.createServer(function(socket) {
    socket.setNoDelay(true);
    socket.on('data', function(packet) {
        // Split packet in messages.
        var messages = packet.toString('utf-8').match(/msg "[^"]+";/gm);
        for (var i in messages) {
            // Get message content (msg "{ content: 'json' }";). Actually useless for the test.
            //var message = messages[i].match(/"(.*)"/)[1];
            // Emit response:
            socket.write('msg "PONG";');
        }
    });
});
server.listen(9999);
Client
var net = require('net');
var WSClient = new net.Socket();
WSClient.setNoDelay(true);
WSClient.connect(9999, 'localhost', function() {
    var req = 0;
    var res = 0;
    console.log('Start:', Date.now());
    WSClient.on('data', function(packet) {
        var messages = packet.toString("utf-8").match(/msg "[^"]+";/gm);
        for (var i in messages) {
            // Get message content (msg "{ content: 'json' }";). Actually useless for the test.
            //var message = messages[i].match(/"(.*)"/)[1];
            res++;
            if (res === 1000) console.log('End:', Date.now());
        }
    });
    // Emit requests:
    for (req = 0; req < 1000; req++) WSClient.write('msg "PING";');
});
My results are:
With 1 request: 9 - 24 ms
With 1000 requests: 478 - 512 ms
With 10000 requests: 5021 - 5246 ms
My ICMP pings to localhost come back in 0.1 - 0.6 ms. There is no intense network traffic or CPU usage (the VM is only running SSH, FTP, Apache, Memcached, and Redis).
Is this normal for NodeJS and TCP, or is it just my CentOS VM or my low-performance host? Should I move to another platform, like Java or a native C/C++ server?
I think that a 15 ms delay (average) per request on localhost is not acceptable for my project.
Wrapping the messages in some text and searching for a Regex match isn't enough.
The net.Server and net.Socket interfaces have a raw TCP stream as an underlying data source. The data event will fire whenever the underlying TCP stream has data available.
The problem is, you don't control the TCP stack. The timing of it firing data events has nothing to do with the logic of your code. So you have no guarantee that the data event that drives your listeners has exactly one, less than one, more than one, or any number and some remainder, of messages being sent. In fact, you can pretty much guarantee that the underlying TCP stack WILL break up your data into chunks. And the listener only fires when a chunk is available. Your current code has no shared state between data events.
You only mention latency, but I expect if you check, you will also find that the count of messages received (on both ends) is not what you expect. That's because any partial messages that make it across will be lost completely. If the TCP stream sends half a message at the end of chunk 1, and the remainder in chunk 2, the split message will be totally dropped.
The easy and robust way is to use a messaging protocol like ØMQ. You will need to use it on both endpoints. It takes care of framing the TCP stream into atomic messages.
If for some reason you will be connecting to or receiving traffic from external sources, they will probably use something like a length header. In that case, what you want to do is create a Transform stream that buffers incoming traffic and only emits data once the amount identified in the header has arrived, as sketched below.
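A minimal sketch of that idea, assuming a 4-byte big-endian length prefix (the framing choice and the LengthPrefixed / frame names are illustrative, not anything from the question):
const { Transform } = require('stream');

// Re-chunks a raw TCP stream into one 'data' event per framed message.
class LengthPrefixed extends Transform {
    constructor() {
        super();
        this.buffer = Buffer.alloc(0);
    }
    _transform(chunk, encoding, callback) {
        this.buffer = Buffer.concat([this.buffer, chunk]);
        // Emit every complete message; keep any partial tail for the next chunk.
        while (this.buffer.length >= 4) {
            const len = this.buffer.readUInt32BE(0);
            if (this.buffer.length < 4 + len) break;
            this.push(this.buffer.slice(4, 4 + len));
            this.buffer = this.buffer.slice(4 + len);
        }
        callback();
    }
}

// Sender side: prefix each payload with its byte length.
function frame(payload) {
    const body = Buffer.from(payload, 'utf-8');
    const header = Buffer.alloc(4);
    header.writeUInt32BE(body.length, 0);
    return Buffer.concat([header, body]);
}

// Usage: socket.pipe(new LengthPrefixed()).on('data', handleMessage);
//        socket.write(frame('{"json":"encoded"}'));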
Have you done any network dump? You may be creating network congestion due to the overhead introduced by enabling the 'no delay' socket property. This property sends data down to the TCP stack as soon as possible, and if you have very small chunks of information it will lead to many TCP packets with small payloads, decreasing transmission efficiency and eventually causing TCP to pause transmission due to congestion. If you want to use 'no delay' for your sockets, try increasing your receiving socket buffer so that data is pulled from the TCP stack faster. One low-effort experiment along these lines is sketched below. Let us know if that helped.
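For instance, a quick experiment against the test client above (we haven't benchmarked this, so treat it as a hypothesis to verify): coalesce the pings into a single write() so the TCP stack can send fewer, larger packets, and compare the timings:
// Instead of 1000 tiny writes (each becoming its own packet with noDelay on):
//   for (req = 0; req < 1000; req++) WSClient.write('msg "PING";');
// ...batch them into one buffer and a single write:
var batch = '';
for (var req = 0; req < 1000; req++) batch += 'msg "PING";';
WSClient.write(batch);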
In our redis configuration we have set timeout: 7 seconds
In node_redis we handle the Redis connection's ready and end events as:
client.on("ready", function() {
logger.info("Connection Successfully Established to ", this.host, this.port);
}
client.on("end", function() {
logger.fatal("Connection Terminated to ", this.host, this.port);
}
Sample log
[2012-07-11 08:21:29.545] [FATAL] Production - Connection Terminated on end to 'x.x.x.9' '6399'
[2012-07-11 08:21:29.803] [INFO] Production - Connection Successfully Established to 'x.x.x.9' '6399'
But in some cases (most probably Redis closing the connection without notifying the client) we see the command queue piling up, and requests take far too long to get a response [until the node-redis client is able to sense the close event]. In all such cases the command callback returns the error "Redis connection gone from close event", even after all that waiting. This does not look like the timeout at work, since the usual end event was never triggered.
Issue seems to be similar to this - http://code.google.com/p/redis/issues/detail?id=368
Is this a known thing happening in redis?
Is there a way to specify that the execution of a command [sending it and receiving the reply] should not exceed a threshold, and to reply with an error in that case instead of making the client stall? One workaround we can think of is sketched after these questions.
Or is there any other way of triggering the close event in such cases, like a socket timeout?
Or should we check something on the Redis side? We monitored our Redis log at debug level and found nothing useful related to this issue.
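The workaround mentioned above would race each command's callback against a timer, roughly like this (withTimeout and THRESHOLD_MS are our own names, not node_redis API):
var THRESHOLD_MS = 2000; // arbitrary threshold

// Wraps a command callback so a stalled command fails fast instead of
// sitting in the command queue indefinitely.
function withTimeout(callback) {
    var timer = setTimeout(function() {
        timer = null;
        callback(new Error('Redis command exceeded ' + THRESHOLD_MS + ' ms'));
    }, THRESHOLD_MS);
    return function(err, reply) {
        if (timer === null) return; // the watchdog already fired
        clearTimeout(timer);
        callback(err, reply);
    };
}

// Usage:
client.get('somekey', withTimeout(function(err, reply) {
    // err is set either by node_redis or by the watchdog above
}));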
When we run node-redis in debug mode we can clearly see the client getting stalled, with the requests piling up in the command queue. We logged the why and the queue length inside the flush_on_error function. We have offline queueing disabled.
Sample Log
Redis connection is gone from close event.
offline queue 0
command queue 8
Response time of a failed request: 30388 ms [this varies with the waiting time in the command queue; the first queued request has the maximum response time and the ones following it have less]
Usual response time: 1 ms
PS: We have filed an issue in node_redis too
We had a bunch of connection trouble with Redis as well. It seemed like it would close the connection without telling the client. We suspected it was a timeout problem on the server. This is the solution we use, and we haven't had a problem since July.
var RETRY_EVERY = 1000 * 60 * 3;

var startTimer = function() {
    console.log('Begin the hot tub!');
    setInterval(function() {
        try {
            client.set('hot', new Date());
            // node_redis commands are asynchronous, so read the value back
            // in a callback instead of logging the command's return value.
            client.get('hot', function(err, reply) {
                console.log(err || reply);
            });
        }
        catch (e) {
            console.log(e);
        }
    }, RETRY_EVERY);
}();
Considering it's only one call every 3 minutes, it shouldn't be a problem for performance ;)
With regards to oconnecp's answer, can't you just do:
setInterval(function() { client.ping(); }, 1000 * 60 * 30);
I have the standard code for sending out HTTP requests, using http.globalAgent.
I set maxSockets to 2500.
And then, when I send out multiple requests at once, I get this error:
[{ code: 'ECONNRESET' }]
However, if I send out the requests with a bit of a timeout between each one, then it works.
So, my questions are:
1) What does ECONNRESET really mean? Why does this error happen?
2) How can I send out multiple requests instantly without getting that error?
Original code to send out multiple requests:
// I'm using Seq()
var Seq = require('seq');

Seq()
    .seq(function() {
        this(null, ['p1', 'p2', 'p3', 'p4', 'p5']);
    })
    .flatten(false)
    .parEach(function(data) {
        // send out request
        sendRemoteRequest(data); // a function that uses http.request
    })
    .seq(function(data) {
        console.log("done");
    });
ECONNRESET basically means that the remote server has closed the connection. I assume it only allows a certain number of concurrent connections, and if that limit is reached it just drops the connection, resulting in an ECONNRESET in your program.
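If that is what's happening, the fix is to cap how many requests are in flight at once rather than sleeping between them. The built-in http agent already queues requests beyond its maxSockets limit, so one approach is simply to lower that limit (the value 100 below is a guess for illustration; tune it against the remote server's actual capacity):
var http = require('http');

// With maxSockets at 2500, Node will happily open thousands of concurrent
// connections, and the remote end resets the excess ones (ECONNRESET).
// A lower limit makes the agent queue requests instead of bursting them.
http.globalAgent.maxSockets = 100;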