I wonder how I can "abort" a message after it has not been sent for some time.
The scenario is simple:
1) Client connects to server
2) The server goes down
3) The client sends a message; there's no issue here, as ZeroMQ queues the message locally (so the "send" operation is successful)
4) Assuming I've set RCVTIMEO, I get the timeout
5) After I got the timeout I no longer wish to send the message, but once the server goes up again ZeroMQ will transmit it anyway. How can I prevent that?
The reason I want to prevent this is that once I got the timeout I responded to my customer with a failure message (e.g. "the request could not be processed due to a timeout"), and it would be a real issue if their request were eventually transmitted and processed anyway...
Hope my question is clear... Thx!
Step 1) Set the aClientSOCKET instance's ZMQ_LINGER option to zero, so that no time is spent trying to deliver any still-enqueued messages during the forthcoming socket dismantling (which is fairly in line with modern high-performance / low-latency distributed-messaging design: things go wrong, and quite often, so treat that as a matter of fact rather than lose time fighting the unavoidable...)
and
Step 2) Force the .close() to discard the socket
and
Step 3) If needed, re-instantiate another socket / communication channel, or use another means prepared a priori (such as the Binary Star fault-resilient pattern), to resolve the intended application processing after the { "primary" | "observed" } connection peer's handshaking has timed out.
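A minimal sketch of these steps in Python with pyzmq (assuming a plain REQ/REP pair; the endpoint, payload, and timeout below are illustrative, not taken from the question):

```python
# Minimal sketch (pyzmq, REQ/REP): give up on a request after RCVTIMEO expires,
# then discard the socket with LINGER=0 so the queued request is never re-sent.
import zmq

ctx = zmq.Context.instance()

def send_request(payload: bytes, timeout_ms: int = 2000):
    sock = ctx.socket(zmq.REQ)
    sock.setsockopt(zmq.LINGER, 0)             # Step 1: drop queued messages on close
    sock.setsockopt(zmq.RCVTIMEO, timeout_ms)
    sock.connect("tcp://localhost:5555")       # illustrative endpoint
    try:
        sock.send(payload)
        return sock.recv()                     # raises zmq.Again on timeout
    except zmq.Again:
        return None                            # report the failure to the caller
    finally:
        sock.close()                           # Step 2: close discards the pending message
        # Step 3 (if needed): open a fresh socket for the next attempt

if send_request(b"do-something") is None:
    print("request timed out; it will not be delivered later")
```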
Related
When a client/server application needs to request data from the client, I send out the request message over the established socket and wait for the response, with a 60-second timeout to 'guarantee' that the server app waits long enough, but not 'forever', for a response. Occasionally these timeouts get hit, and the server app will fail. These failures tend to come in bursts.
Is there any way to know when these occur whether they're simply caused by heavy network traffic - and will eventually succeed - or whether they're caused by a harder kind of outage, and will never get a response within a reasonable time? I.e., is 60 seconds long enough to wait for such a data request over an existing socket - and if not, what would a better timeout value be? Would the TCP/IP stack (Amazon linux, in this case) end up retrying the transmission shortly after I've given up on it...?
Would the TCP/IP stack (Amazon linux, in this case) end up retrying the transmission shortly after I've given up on it...?
Giving up by closing the socket will also make the underlying TCP stack stop retransmitting unacknowledged data. This does not mean that the data were not processed by the peer, though, since one cannot tell whether the peer failed to receive the message or whether one merely failed to receive the response.
Is there any way to know when these occur whether they're simply caused by heavy network traffic - and will eventually succeed - or whether they're caused by a harder kind of outage, and will never get a response within a reasonable time?
No. It is up to the application protocol to handle this in a robust way, like detecting retransmission of the same message inside a newly established connection.
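As a hedged illustration of that idea (the request-id field, the cache, and the handler below are invented for this sketch, not part of the original question): tag each request with a sender-generated id, so the receiver can recognise a retransmission arriving over a new connection and answer it without processing it twice.

```python
# Illustrative idempotent-receiver sketch: requests carry a sender-generated id,
# so a retransmission arriving over a new connection is recognised and answered
# from a cache instead of being processed twice.
import uuid

seen_responses = {}                   # request id -> cached response (bounded in real code)

def handle_request(request: dict) -> dict:
    req_id = request["id"]
    if req_id in seen_responses:
        return seen_responses[req_id]                  # duplicate: replay the earlier answer
    response = {"id": req_id, "status": "processed"}   # real work would happen here
    seen_responses[req_id] = response
    return response

# Sender side: attach a fresh id to each logical request and reuse it on retries.
request = {"id": str(uuid.uuid4()), "body": "get-data"}
```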
I.e., is 60 seconds long enough to wait for such a data request over an existing socket - and if not, what would a better timeout value be?
To detect network connectivity problems it is better to rely on TCP keep-alive than to wait a fixed time for a response to arrive. If the response might simply come late because the peer application is not responding fast enough, the acceptable timeout depends on the specific use case.
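For reference, a small sketch of turning on TCP keep-alive from Python on a Linux host (the intervals are illustrative, and the TCP_KEEP* options shown are Linux-specific):

```python
# Sketch: enable TCP keep-alive so a dead peer is detected independently of any
# application-level response timeout. The TCP_KEEP* constants are Linux-specific.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)     # turn keep-alive on
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30)   # seconds idle before first probe
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # failed probes before giving up
sock.connect(("example.com", 80))                              # illustrative endpoint
```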
I'm having issues with Node.js and the "WS" implementation of websocket (https://www.npmjs.com/package/ws). After a surge (plenty of messages in a short window of time), I'm having data that suggests that I've "missed" a message.
I've contacted the owner of the emitter server and he assures me that all messages have been sent on his side.
I've logged every message received on my side (at the start of the on('message', () => {}) handler), and I can't find the message, so my assumption is that it never even reached this point.
So I'm wondering:
Messages are received and processed in FIFO order. While the current message is being processed, new ones are queued in the Node event loop to be handled immediately afterwards. Correct? Can that event loop become "too big", so that it starts dropping new incoming messages? If so, does it drop them quietly, or does the program crash loudly (in other words, how can I tell whether a message has been dropped this way)?
Does the 'ws' module have any known limitation on the maximum number of messages it can receive? Does it have an internal way of dropping messages?
Is there a better alternative to the 'ws' module?
Are there any other ways to explain a "missed" message?
Thanks a lot for your insights,
I use ws in nodejs to handle large message flows from many clients simultaneously in production, and I have never had it lose messages. Each server handles several thousand messages each second from hundreds of different client connections. The way my system works, if ws dropped messages or changed their order, my users would complain loudly.
That makes me guess you are not hitting any limitation of ws.
Early in my programming work I had the not-so-bright idea of putting incoming messages in queue objects in my nodejs code and processing them "later." That led to a hideously confusing message flow through my server. It sometimes looked like I had lost ws messages. I was happy to delete all that code, and dispatch every message completely within its message event handler.
Websocket connections sometimes close abnormally. Because network. You can catch those situations with error and close event handlers. It can take a while for the sender of a message, or the receiver, to detect that a network fault of some kind disrupted its connection. That can lead to disagreement about message count between sender and receiver. It's worth investigating.
I adorn ws's connection objects with message counts ("adorn" -- add an application-specific property to an object) and put those message counts into the log when a connection closes.
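(The answer above talks about Node's ws; as a language-neutral sketch of the same counting idea, here is an illustrative Python version using the third-party websockets package. The port and log format are assumptions, and this is a sketch of the bookkeeping, not the author's production code.)

```python
# Illustrative only: count messages per connection and log the count when the
# connection closes, so sender and receiver tallies can be compared afterwards.
import asyncio
import logging

import websockets

logging.basicConfig(level=logging.INFO)

async def handler(ws):
    received = 0                      # per-connection message counter (the "adorn" idea)
    try:
        async for message in ws:      # messages are delivered in order, one at a time
            received += 1
            # ... dispatch the message completely here, inside the handler ...
    finally:
        logging.info("connection closed after %d messages", received)

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8080):   # illustrative port
        await asyncio.Future()        # run until cancelled

asyncio.run(main())
```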
Server 1 is sending an xml message via IIS to Server 2.
Server 2 receives it, and send back an acknowledgment message to Server 1.
Upon receipt of that message, Server 1 sends the next message in the queue.
However, Server 1 intermittently (4/5 times a week) does not receive the acknowledgment message (we tested the issue and proved that Server 2 is sending the acknowledgment message).
The IIS logs for the time it is occurring tell us there's an error 1236 (sc-win32-status 1236, which means "The network connection was aborted by the local system").
We're at a loss as to what is causing this or how to fix it. Interested to see if anyone has come across an issue like this before...
How did you prove that Server 2 is sending the acknowledgement message: through network tracing on Server 1, or some other means? Logs within the software may not be enough. Barring anything bad going on at the networking level, it is possible that one of the sides is hitting an exception and aborting the connection as a result. The application pools may be auto-recycling due to IIS recycle rules, and although IIS should handle a pool restart properly, maybe something did not occur as expected. When one pool starts while the other is processing its final requests on shutdown, maybe there is some locking going on that does not expect two processes running at the same time.
I'm surely missing something about how the whole MQTT protocol works, as I can't grasp the usage pattern of Last Will Testament messages: what's their purpose?
One example I often see is about informing that a device has gone offline. It doesn't make very much sense to me, since it's obvious that if a device isn't publishing any data it may be offline or there could be some network problems.
So, what are some practical usages of the LWT? What was it invented for?
LWT messages are not really concerned about detecting whether a client has gone offline or not (that task is handled by keepAlive messages).
LWT messages are about what happens after the client has gone offline.
The analogy is that of a real last will:
Before a person dies, she can formulate a testament in which she declares what actions should be taken after she has passed away. An executor will then heed those wishes and execute them on her behalf.
The analogy in the MQTT world is that a client can formulate a testament, in which it declares what message should be sent on its behalf by the broker after it has gone offline.
A fictitious example:
I have a sensor, which sends crucial data, but very infrequently.
It has formulated a last will statement in the form of [topic: 'node/gone-offline', message: ':id'], with :id being a unique id for the sensor. I also have an emergency subscriber for the topic 'node/gone-offline', which will send an SMS to my phone every time a message is published on that channel.
During normal operation, the sensor will keep the connection to the MQTT-broker open by sending periodic keepAlive messages interspersed with the actual sensor readings. If the sensor goes offline, the connection to the broker will time out, due to the lack of keepAlives.
This is where LWT comes in: if no LWT is specified, the broker doesn't care and just closes the connection. In our case, however, the broker will execute the sensor's last will and publish the LWT message 'node/gone-offline: :id'. The message will then be consumed by my emergency subscriber and I will be notified of the sensor's ID via SMS, so that I can check up on what's going on.
In short:
Instead of just closing the connection after a client has gone offline, LWT messages can be leveraged to define a message to be published by the broker on behalf of the client, since the client is offline and cannot publish anymore.
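A minimal sketch of registering such a testament with the Python paho-mqtt client (the broker address, client id, and topics below are illustrative, chosen to match the fictitious example above):

```python
# Sketch: register a Last Will so the broker publishes to 'node/gone-offline'
# on this client's behalf if it disappears without a clean DISCONNECT.
import paho.mqtt.client as mqtt

SENSOR_ID = "sensor-42"                                    # illustrative unique id

# paho-mqtt 1.x constructor shown; 2.x additionally takes a CallbackAPIVersion.
client = mqtt.Client(client_id=SENSOR_ID)
client.will_set("node/gone-offline", payload=SENSOR_ID, qos=1)   # must be set before connect()
client.connect("broker.example.org", 1883, keepalive=60)         # illustrative broker
client.loop_start()                                              # handles keep-alive pings

# Normal operation: crucial but infrequent readings.
client.publish("sensors/%s/reading" % SENSOR_ID, payload="42.0", qos=1)
```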
Just because a device is not publishing does not mean it is offline or that there is a network problem.
Take, for example, a sensor that monitors a value that changes only very infrequently. Good design says the sensor should publish only the changes, to help reduce bandwidth usage, since periodically publishing the same value is wasteful. If the value is published as a retained message, then any new subscriber will always get the current value without having to wait for the value to change and be published again.
In this case the LWT is published when the sensor fails (or there is a network problem), so we know about the problem as soon as the client keep-alive times out.
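To make the retained-value part concrete, a small hedged variation of the previous sketch: readings are published with retain=True so that late subscribers immediately get the last value, while the LWT announces a failure. Names and topics are again illustrative.

```python
# Sketch: an infrequent sensor publishes retained state changes; the LWT covers
# the "went silent because it failed" case. Topics and ids are illustrative.
import paho.mqtt.client as mqtt

client = mqtt.Client(client_id="door-sensor-1")            # 1.x constructor shown
client.will_set("node/gone-offline", payload="door-sensor-1", qos=1)
client.connect("broker.example.org", 1883, keepalive=60)
client.loop_start()

def on_state_change(new_state: str) -> None:
    # Publish only on change, and retained, so a new subscriber immediately
    # receives the latest state instead of waiting for the next change.
    client.publish("sensors/door-1/state", payload=new_state, qos=1, retain=True)
```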
An in-depth article about Last Will and Testament messages is available in the MQTT Essentials blog post series: http://www.hivemq.com/mqtt-essentials-part-9-last-will-and-testament/.
To summarize the blog post:
The Last Will and Testament feature is used in MQTT to notify other clients about an ungracefully disconnected client.
MQTT is often used in scenarios where unreliable networks are very common. It is therefore assumed that some clients will disconnect ungracefully from time to time, because they lost the connection, the battery is empty, or any other imaginable case. It would be good to know whether a connected client has disconnected gracefully (that is, with an MQTT DISCONNECT message) or not, in order to take appropriate action.
I'm working on an application that is divided in a thin client and a server part, communicating over TCP. We frequently let the server make asynchronous calls (notifications) to the client to report state changes. This avoids that the server loses too much time waiting for an acknowledgement of the client. More importantly, it avoids deadlocks.
Such deadlocks can happen as follows. Suppose the server sent the state-changed notification synchronously (please note that this is a somewhat constructed example). When the client handles the notification, it needs to synchronously ask the server for information. However, the server cannot respond, because it is waiting for an answer to its own question.
Now, this deadlock is avoided by sending the notification asynchronously, but this introduces another problem. When asynchronous calls are made more rapidly than they can be processed, the call queue keeps growing. If this situation is maintained long enough, the call queue will get totally full (flooded with messages). My question is: what can be done when that happens?
My problem can be summarized as follows. Do I really have to choose between sending notifications without blocking at the risk of flooding the message queue, or blocking when sending notifications at the risk of introducing a deadlock? Is there some trick to avoid flooding the message queue?
Note: To repeat, the server does not stall when sending notifications. They are sent asynchronously.
Note: In my example I used two communicating processes, but the same problem exists with two communicating threads.
If the server is sending informational messages to the client, which you yourself say are asynchronous, it should not have to wait for a reply from the client. If they are not informational, in other words they require an answer, I would say a server should never send such messages to a client, and their presence indicates a poor design.
If you have a constant congestion problem, there is little you can do other than gracefully fail and notify the client that no new messages can be posted; then it is up to the client to maintain a backlog of messages to be posted.
Introducing a priority queue and using message expiration/filtering could allow you to free up space in the queue, but that really just postpones the problem. If possible, you could also aggregate messages or ignore duplicate messages, but again the problem does not seem to be the queue itself. (Not to mention that the more complex queue logic could eat up valuable resources that would be better used actually processing messages.)
Depending on what the server side does, you could introduce result hashing for long computations, offload some types of messages to a dedicated device, check if the server waits unreasonably long for I/O operations, and a myriad of other techniques. Profile if possible, at least try to find out which message(s) causes congestion.
Oh, and the business solution: Compare cost of estimated development time to the cost of better hardware and conclude that you should just buy a more powerful server (or an additional one).
Depending on how important these messages are, you might want to look into Message Expiration, or perhaps a Message Filter, though it sounds like your architecture may be incorrect.
I would rather fix the logic on the server side. The message queue should not stall waiting for the answer. Rather, have a state machine that can also receive those info queries while it is waiting for the answer from the client.
Of course you can still flood your message queue, but with TCP you can handle it pretty easily.
The best way, I believe, would be to add another state to your client. This I borrowed from the SMPP protocol specs.
Add a congestion state to the client, whereby it always checks the queue length (assuming this is possible). Once a certain threshold is reached, say 1000 unprocessed messages, the client sends the server a message indicating that it is congested, and the server is required to cease all messaging until it receives a notification indicating that the client is no longer congested.
Alternatively, on the server side, if there is a certain number of pending replies, the server could simply stop sending messages until the client has replied to a certain number of them.
These thresholds can be dynamically calculated or fixed, depending.....
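A hedged sketch of that high-water / low-water idea (all names and thresholds below are invented for illustration): the client tracks its backlog, tells the server to pause above an upper threshold, and tells it to resume once the backlog has drained below a lower one.

```python
# Illustrative flow-control sketch: the client reports "congested" above a
# high-water mark and "clear" once the backlog drains below a low-water mark,
# and the server pauses notifications in between.
from collections import deque

HIGH_WATER = 1000        # e.g. 1000 unprocessed messages, as in the answer above
LOW_WATER = 200          # hysteresis so the state does not flap

class NotificationClient:
    def __init__(self, signal_server):
        self.backlog = deque()
        self.congested = False
        self.signal_server = signal_server      # callable used to notify the server

    def on_notification(self, message):
        """Called for every asynchronous notification received from the server."""
        self.backlog.append(message)
        if not self.congested and len(self.backlog) >= HIGH_WATER:
            self.congested = True
            self.signal_server("CONGESTED")      # server must stop sending for now

    def process_one(self):
        """Called by the client's worker loop to handle one queued notification."""
        if self.backlog:
            message = self.backlog.popleft()
            # ... handle the notification ...
            if self.congested and len(self.backlog) <= LOW_WATER:
                self.congested = False
                self.signal_server("CLEAR")      # server may resume sending
```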