at-most-once and exactly-once - rpc

I am studying Distributed Systems and, when it comes to the RPC part, I have heard about these two semantics (at-most-once and exactly-once). I understand that at-most-once is used with databases, for instance, when we don't want duplicate execution.
First question:
How is this achieved? How does the server know that it shouldn't execute the request again? It might be a duplicate, but it might be a legitimate request as well.
The second question is:
What is the difference between the two semantics in the title? I can read :). I know that an at-most-once request might not be executed at all, but what does exactly-once do that guarantees execution?

Here is a pretty good explanation of the different types of messaging semantics for your second question:
At-most-once semantics: The easiest type of semantics to achieve, from an engineering complexity perspective, since it can be done in a fire-and-forget way. There's rarely any need for the components of the system to be stateful. While it's the easiest to achieve, at-most-once is also the least desirable type of messaging semantics. It provides no absolute message delivery guarantees since each message is delivered once (best case scenario) or not at all.
At-least-once semantics: This is an improvement on at-most-once semantics. There might be multiple attempts at delivering a message, so that at least one attempt is successful. In other words, messages may be duplicated, but they can't be lost. While not ideal as a system-wide characteristic, at-least-once semantics are good enough for use cases where duplication of data is of little concern or scenarios where deduplication is possible on the consumer side.
Exactly-once semantics: The ultimate message delivery guarantee and the optimal choice in terms of data integrity. As its name suggests, exactly-once semantics means that each message is delivered precisely once. The message can neither be lost nor delivered twice (or more times). Exactly-once is by far the most dependable message delivery guarantee. It’s also the hardest to achieve.
That's all part of this blog post about Exactly-once message processing (Disclosure: I work for Ably)
Hope this helps 😄
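To make the contrast concrete, here is a minimal sketch of the difference between the first two semantics. It is a toy simulation, not any particular library; the lossy sendAndWaitForAck primitive is invented for illustration. At-most-once fires once and forgets, while at-least-once retries until an acknowledgement arrives, at the price of possible duplicates on the receiving side.

```java
import java.util.concurrent.ThreadLocalRandom;

public class DeliverySemanticsSketch {

    // Pretend network send: the message (or its ack) is lost about half the time.
    static boolean sendAndWaitForAck(String msg) {
        return ThreadLocalRandom.current().nextBoolean();
    }

    // At-most-once: fire and forget. The message arrives once or not at all.
    static void atMostOnce(String msg) {
        sendAndWaitForAck(msg);            // result ignored, never resent
    }

    // At-least-once: keep resending until an ack arrives.
    // A lost ack means the receiver may see the same message more than once.
    static void atLeastOnce(String msg) {
        while (!sendAndWaitForAck(msg)) {
            // no ack: retry (the previous attempt may or may not have been delivered)
        }
    }

    public static void main(String[] args) {
        atMostOnce("state update #1");
        atLeastOnce("state update #2");
    }
}
```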

With at-most-once semantics, the request is sent again in case of failure, but requests are filtered for duplicates on the server.
With exactly-once semantics, the request is sent again and duplicates are filtered as well, but in addition there is a guarantee that the server will restart after a failure and resume processing requests from where it crashed.
But exactly-once is not realizable in general, because of what happens when the client sends a request and the server crashes before the request reaches it: there is then no way of tracking the request.
http://de.wikipedia.org/wiki/Remote_Procedure_Call#Fehlersemantik
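A minimal sketch of that server-side duplicate filtering, assuming every client attaches a unique request id to each call: the server keeps a table from request id to the reply it produced the first time, and replays the stored reply for duplicates instead of executing again. The class and method names here are illustrative, not from any particular framework.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class DedupServer {
    // request id -> reply produced the first time the request was executed
    private final Map<String, String> completed = new ConcurrentHashMap<>();

    /** Executes the request at most once; duplicates get the cached reply. */
    public String handle(String requestId, String payload, Function<String, String> execute) {
        // computeIfAbsent runs the body at most once per id, even under concurrent retries
        return completed.computeIfAbsent(requestId, id -> execute.apply(payload));
    }
}
```

With only this in-memory table you get duplicate-filtered execution of retried requests; turning it into exactly-once additionally requires, as described above, that the table and the effects of the call are stored durably and recovered when the server restarts.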

To correct Hesper's answer:
Exactly-once RPC used to be considered unrealisable, but a 2015 research paper [1] showed that it is possible. Basically, the RIFL approach guarantees exactly-once execution of an RPC by durably storing a completion record for each RPC that has been executed, so a retried request is answered from that record instead of being executed again.
[1]: Lee, Collin, et al. "Implementing linearizability at large scale and low latency." Proceedings of the 25th Symposium on Operating Systems Principles. ACM, 2015

Bump, I'm studying this too and found this, hope it helps (helped me),
At-least-once versus at-most-once?
let's take an example: acquiring a lock
if client and server stay up, client receives lock
if client fails, it may have the lock or not (server needs a plan!)
if server fails, client may have lock or not
at-least-once: client keeps trying
at-most-once: client will receive an exception
what does a client do in the case of an exception?
need to implement some application-specific protocol
ask server, do i have the lock?
server needs to have a plan for remembering state across reboots
e.g., store locks on disk.
at-least-once (if we never give up)
clients keep trying. server may run procedure several times
server must use application state to handle duplicates
if requests are not idempotent
but it is difficult to make all requests idempotent
e.g., the server could store on disk who has the lock and the request id (see the sketch below)
check the table for each request
even if server fails and reboots, we get correct semantics
What is right?
depends where RPC is used.
simple applications:
at-most-once is cool (more like procedure calls)
more sophisticated applications:
need an application-level plan in both cases
not clear at-most-once gives you a leg up
=> Handling machine failures makes RPC different than procedure calls
Quoted from Distributed Systems: Principles and Paradigms, 2nd edition.
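A rough sketch of the "store on disk who has the lock and the request id" idea from the notes above: a lock table persisted to a properties file, so that a rebooted server still recognises a duplicate acquire request. The class, file format and method names are illustrative only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class DurableLockTable {
    private final Path table;
    private final Properties state = new Properties();  // resource -> "client:requestId"

    public DurableLockTable(Path table) throws IOException {
        this.table = table;
        if (Files.exists(table)) {
            try (InputStream in = Files.newInputStream(table)) {
                state.load(in);                          // reload lock state after a reboot
            }
        }
    }

    /** Grants the lock, or re-acknowledges a duplicate of the request that already holds it. */
    public synchronized boolean acquire(String resource, String client, String requestId) throws IOException {
        String existing = state.getProperty(resource);
        if (existing == null) {
            state.setProperty(resource, client + ":" + requestId);
            persist();
            return true;                                  // newly granted
        }
        return existing.equals(client + ":" + requestId); // true only for a duplicate of the granting request
    }

    private void persist() throws IOException {
        try (OutputStream out = Files.newOutputStream(table)) {
            state.store(out, "lock table");
        }
    }
}
```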

For the first question, I believe that each request should have a unique id attached to it. Then, even if the client sends two requests with the exact same command, the server is able to distinguish them, and filter duplicates, via the unique id of the request.
For the second question I think this article helps define the semantics for an rpc call. http://www.cs.unc.edu/~dewan/242/f97/notes/ipc/node27.html

Related

None value for paho_mqtt::create_options::CreateOptionsBuilder persistence

The documentation for the CreateOptionsBuilder persistence method indicates that setting this value to None will improve performance, but leaves you with a less reliable system.
Could someone elaborate on this, please? Under which circumstances should I consider setting this to None?
The Eclipse Paho MQTT Rust Client Library is a "safe wrapper around the Paho C Library". The persistence options are mapped to values accepted by the C library with None becoming MQTTCLIENT_PERSISTENCE_NONE. The docs for the C client provide a more detailed explanation of the options:
persistence_type The type of persistence to be used by the client:
MQTTCLIENT_PERSISTENCE_NONE: Use in-memory persistence. If the device or system on which the client is running fails or is switched off, the current state of any in-flight messages is lost and some messages may not be delivered even at QoS1 and QoS2.
MQTTCLIENT_PERSISTENCE_DEFAULT: Use the default (file system-based) persistence mechanism. Status about in-flight messages is held in persistent storage and provides some protection against message loss in the case of unexpected failure.
MQTTCLIENT_PERSISTENCE_USER: Use an application-specific persistence implementation. Using this type of persistence gives control of the persistence mechanism to the application. The application has to implement the MQTTClient_persistence interface.
The upshot is that calling persistence(None) means that messages will be held in memory rather than being written to disk (assuming QOS1/2). This has the potential to improve performance (writing to disk can be expensive) but, because the info is only stored in memory, messages may be lost if your application shuts down without completing delivery.
A quick example might help (simplifying things a little): let's say you publish a message with QOS=1 and a network issue means that the broker does not receive it. When the connection is re-established (failed delivery will generally mean the connection drops), the client will resend the message (because it has not processed an acknowledgment from the broker). With the default persistence (disk), the message will be retransmitted even if the failure was due to a power outage that affected the server your app was running on (obviously this only happens when power is restored and your app restarts); that message would be lost if you had called persistence(None).
The appropriate setting is going to depend upon your needs, and other options may have an impact (e.g. if Clean Start/CleanSession is true then there is unlikely to be any benefit to persisting to disk).
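For illustration only: the question is about the Rust wrapper, but the same in-memory versus file-based trade-off is exposed by the Eclipse Paho Java client, which makes for a compact sketch. The broker URL, client ids and storage directory below are placeholders.

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;
import org.eclipse.paho.client.mqttv3.persist.MqttDefaultFilePersistence;

public class PersistenceTradeoff {
    public static void main(String[] args) throws Exception {
        // In-memory persistence: the analogue of persistence(None). Fast, but in-flight
        // QoS 1/2 state is lost if the process dies before delivery completes.
        MqttClient inMemory = new MqttClient("tcp://broker.example.com:1883", "client-mem",
                new MemoryPersistence());

        // File-based persistence: in-flight state survives an application restart,
        // so unacknowledged QoS 1/2 messages can still be retransmitted afterwards.
        MqttClient onDisk = new MqttClient("tcp://broker.example.com:1883", "client-file",
                new MqttDefaultFilePersistence("/tmp/paho-store"));

        MqttConnectOptions opts = new MqttConnectOptions();
        opts.setCleanSession(false);              // persistence only pays off with a durable session

        onDisk.connect(opts);
        MqttMessage msg = new MqttMessage("hello".getBytes());
        msg.setQos(1);                            // with QoS 0 persistence is irrelevant anyway
        onDisk.publish("demo/topic", msg);
        onDisk.disconnect();
    }
}
```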
Use None when you don't care whether all messages are received, e.g. when using only QOS 0 messages.

ZeroMQ: Recommended pattern for inproc clients from multiple threads pushing ordered messages to a server?

My requirements
Clients from different threads in the same process
Server in a separate thread in the same process
Clients produces messages to Server
Server consumes messages by printing them out in send order, as measured by the world clock on the source side, independent of threading and any scheduling.
Answers to questions like
zmq: can multiple threads PUSH in a simple PUSH-PULL pattern
Pulling requests from multiple clients with ZMQ
give different opinions. So should I simply ask clients to PUSH to a single inproc PULL server created in another thread or use a router-dealer pattern?
And in one of the comments on the second question, a STREAMER device is mentioned, which seems to exist in pyzmq, but I'm not sure whether it's the right solution or whether it's available in the C API at all.
Q : Recommended pattern for inproc clients from multiple threads pushing ordered messages to a server?
Any answer to a question formulated this way depends on a missing piece of information: the set of preferences that allows one to distinguish between insufficient, sufficient, better and best solutions to the operations described above.
Do you need confirmatory feedback from server to client, given that there is Zero-Warranty of message delivery?
Do you need to handle a static or a dynamic set of clients?
Do you prefer performance to RAM-footprint?
Without any of these "criteria" expressed, a serious man would never "recommend", as any such statement would be just an opinion.
PUSH/PULL may suffice for unconfirmed delivery ( an optimistic blindness use-case, if an out-of-sight out-of-mind design philosophy is acceptable in production )
PAIR/PAIR may suffice for a fast .poll( ZeroWait, ZMQ_POLLIN ) server-side scanner, with server-side POSACK-responses being possible to dispatch to the respective client-threads whose messages were delivered and accepted for server-side processing ( a user-defined message-confirmation handshaking protocol, handling POSACKs / NACK-timeouts / out-of-order escalations etc., goes beyond the scope of this post )
PUB/SUB or XPUB/XSUB may suffice for some more tricky management of topic-based signalling, bidirectional in the X-versions, if that justifies the add-on costs of topic-filtering overheads ( depending on ZeroMQ version, whether distributed over all client-threads, or centralised on the server-thread side )
The decision is yours.
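For the simplest of those options, here is a minimal PUSH/PULL sketch over inproc. It uses the JeroMQ Java binding for brevity (the question mentions the C API; the socket choreography is the same there): each client thread owns its own PUSH socket, the server owns the single PULL socket, and the bind happens before any connect, which inproc transports require.

```java
import java.util.ArrayList;
import java.util.List;
import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class InprocPushPull {
    public static void main(String[] args) throws InterruptedException {
        try (ZContext ctx = new ZContext()) {
            // Single server-side PULL socket; bind before the clients connect (inproc requirement).
            ZMQ.Socket pull = ctx.createSocket(SocketType.PULL);
            pull.bind("inproc://events");

            // Each client thread creates its own PUSH socket: ZeroMQ sockets are not thread-safe,
            // but the ZContext may be shared, which is what makes inproc work.
            List<Thread> clients = new ArrayList<>();
            for (int i = 0; i < 3; i++) {
                final int id = i;
                Thread t = new Thread(() -> {
                    ZMQ.Socket push = ctx.createSocket(SocketType.PUSH);
                    push.connect("inproc://events");
                    for (int n = 0; n < 5; n++) {
                        push.send("client-" + id + " msg-" + n);
                    }
                    ctx.destroySocket(push);
                });
                t.start();
                clients.add(t);
            }

            // The "server" (here simply the main thread) prints messages as they arrive.
            for (int received = 0; received < 15; received++) {
                System.out.println(pull.recvStr());
            }
            for (Thread t : clients) {
                t.join();
            }
        }
    }
}
```

Note that the PULL side fair-queues across the connected PUSH sockets, so per-client ordering is preserved but a global ordering by source-side world clock is not guaranteed; if that is a hard requirement, clients have to timestamp their messages and the server has to reorder them.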

Handling failures in Thrift in general

I read through the official documentation and the official whitepaper, but I couldn't find a satisfying answer to how Thrift handles failures in the following scenario:
Say you have a client sending a method call to a server to insert an entry in some data structure residing on that server (it doesn't really matter what it is). Suppose the server has processed the call and inserted the entry, but the client couldn't receive a response due to a network failure. In such a case, how should the client handle this? Simply retrying the call would possibly result in a duplicate entry being inserted. Does the Thrift library persist the response somewhere so that it can be resent to the client when the client is back online? Or is it the application's responsibility to do so?
Would appreciate it if someone could point out the details of how it works, besides directing to its source code.
The question is an interesting one, but it is by no means limited to Thrift. A better name would be
Handling failures in asynchronous or remote calls in general
because that's, in essence, what it is. Although in the specific case of an RPC-style API like, for example, a Thrift service, the client blocks and it seems to be a synchronous call, it really isn't that way.
The whole problem can be rephrased to the more general question about
Designing robust distributed systems
So what is the main problem, that we have to deal with? We have to assume that every call we do may fail. In particular, it can fail in three ways:
request died
request sent, server processing successful, response died
request sent, server processing failed, response died
In some cases, this is not a big deal, regardless of the exact case we have. If the client just wants to retrieve some values, he can simply re-query and will get some results eventually if he tries often enough.
In other cases, especially when the client modifies data on the server, it may become more problematic. The general recommendation in such cases is to make the service calls idempotent, meaning: regardless of how often I make the same call, the end result is always the same. This can be achieved by various means and more or less depends on the use case.
For example, one method is to send some logical "ticket" values along with each request to filter out doubled or outdated requests on the server. The server keeps track of and/or checks these tickets before processing starts. But again, whether that method suits your needs depends on your use case.
The Command and Query Responsibility Segregation (CQRS) pattern is another approach to dealing with the complexity. It basically breaks the API into setters and getters. I'd recommend looking into that topic, but it is not useful for every scenario. I'd also recommend looking at the Data Consistency Primer article. Last but not least, the CAP theorem is always a good read.
Good service/API design is not simple, and the fact that we have to deal with a distributed, parallel system does not make it easier; quite the opposite.
Let me try to give a straight answer.
... is it the application's responsibility to do so?
Yes.
There are four types of exceptions involved in Thrift RPC: TTransportException, TProtocolException, TApplicationException, and user-defined exceptions.
Based on the book Programmer's Guide to Apache Thrift, the former 2 are local exceptions, while the latter 2 are not.
As the names imply, TTransportException includes exceptions like NOT_OPEN and TIMED_OUT, while TProtocolException includes INVALID_DATA, BAD_VERSION, etc. These exceptions are not propagated from the server to the client and act much like normal language exceptions.
TApplicationExceptions involve problems such as calling a method that isn’t implemented or failing to provide the necessary arguments to a method.
User-defined Exceptions are defined in IDL files and raised by the user code.
For all of these exceptions, no retry operations are done by the Thrift RPC framework itself. Instead, they should be handled properly by the application code.
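To make the "application's responsibility" point concrete, here is a hedged sketch of client-side retry handling. The ThriftCall interface and the retry policy are invented for illustration; the exception types are the real ones from the Thrift Java library.

```java
import org.apache.thrift.TApplicationException;
import org.apache.thrift.TException;
import org.apache.thrift.transport.TTransportException;

public class ThriftRetry {

    /** A single remote call, e.g. () -> client.insert(requestId, value) on a generated client. */
    interface ThriftCall {
        void invoke() throws TException;
    }

    /**
     * Retries a call on transport failures only. This is only safe if the call is idempotent,
     * e.g. because the server filters duplicates by a client-supplied request id (see above).
     */
    static void callWithRetry(ThriftCall call, int maxAttempts) throws TException {
        for (int attempt = 1; ; attempt++) {
            try {
                call.invoke();
                return;                     // success
            } catch (TTransportException e) {
                // Connection dropped or timed out: we do not know whether the server executed
                // the call, so retrying may duplicate it unless the server deduplicates.
                if (attempt >= maxAttempts) {
                    throw e;                // give up; the application has to reconcile later
                }
            } catch (TApplicationException e) {
                // Unknown method, missing result, bad arguments: retrying will not help.
                throw e;
            }
        }
    }
}
```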

Using SSL with Netty at the beginning of a connection, then disabling it

I'm writing a server application and its client counterpart that both use Netty for the network layer. I find myself facing typical safety concerns about sending a password from a client to the server so I decided SSL was the safest way of doing this.
I know of the securechat example and will use it to modify my pipelines accordingly. However, I would also like to disable SSL after the password transmission has been acknowledged, to save a few precious CPU cycles on the server side, which may be busy with many other clients. The ChannelPipeline documentation states that:
"Once attached, the coupling between the channel and the pipeline is permanent; the channel cannot attach another pipeline to it nor detach the current pipeline from it."
The idea is then to not change the pipeline on-the-fly, which is prohibited, but to somehow tell the SslHandler in the pipeline that it should stop encrypting messages at some point. I was thinking of creating a class inheriting from SslHandler, overriding its handleDownstream function to call context.sendDownstream(evt) after some point in the communication.
Question 1: Is this a bad idea, that is, disabling SSL at some point?
To allow one block in the pipeline (say a Decoder) to tell another block (say the SslHandler) that it should change its behaviour from now on, I thought I could create, say, an AtomicBoolean in my ChannelPipelineFactory's getPipeline() and pass it to the constructors of both the Decoder and the SslHandler.
Question 2: Is this a bad idea, that is, sharing state between pipeline blocks? I'm worried I might screw up Netty's multithreading here: do the blocks of a pipeline work on a single message, one at a time? That is, does the first block wait for the completion of the last block before pulling in the next message?
EDIT:
Oh my bad, this is from the ChannelPipeline page I had been visiting many times and quoting in this very question:
"A ChannelHandler can be added or removed at any time because a ChannelPipeline is thread safe. For example, you can insert a SslHandler when sensitive information is about to be exchanged, and remove it after the exchange."
So this answers question 2 about modifying the pipeline's content on-the-fly, and not the pipeline reference itself.
I'm not sure about the efficacy of turning off SSL once established, but I think you have misinterpreted the mutability of the pipeline. Once a given channel is associated with a pipeline, that association is immutable. However, the handlers in the pipeline can be safely modified. That is to say, you can add and remove handlers as your protocol requires. Accordingly, you should be able to remove the SSL handler once it has served its purpose.
You can remove SslHandler from the pipeline with ChannelPipeline.remove(..) then it should turn your connection to plaintext. Please file a bug if it does not work - we actually have not tried that scenario in production :-)
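A minimal sketch of that removal, written against the Netty 4.x API (the question itself uses the 3.x API, but the pipeline is mutable in both). The "AUTH_OK" marker message and the handler name are hypothetical, and the handler assumes a StringDecoder earlier in the pipeline.

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.handler.ssl.SslHandler;

// Hypothetical handler: once the application-defined authentication exchange has completed,
// drop the SslHandler so the rest of the connection runs in plaintext.
public class DropTlsAfterAuthHandler extends SimpleChannelInboundHandler<String> {
    @Override
    protected void channelRead0(ChannelHandlerContext ctx, String msg) {
        if ("AUTH_OK".equals(msg)) {                        // hypothetical auth-complete marker
            if (ctx.pipeline().get(SslHandler.class) != null) {
                ctx.pipeline().remove(SslHandler.class);    // the pipeline is thread-safe, so this is legal
            }
        }
        ctx.fireChannelRead(msg);                           // pass the message further down the pipeline
    }
}
```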
I'm not sure about Netty, but in principle, you could indeed carry on with plain traffic on the same TCP connection. There are a few downsides:
Only the authentication would be secured. A MITM could perform actions other than those intended by the user. (This is similar to using HTTP Digest to some extent: the credentials are protected, but the request/response entities aren't.)
From an implementation point of view, this is tricky to get right. The TLS specification says:
"If the application protocol using TLS provides that any data may be carried over the underlying transport after the TLS connection is closed, the TLS implementation must receive the responding close_notify alert before indicating to the application layer that the TLS connection has ended."
This implies that you're going to have to synchronise your stream somehow, waiting for the close_notify response, before carrying on with your plain traffic.
The SSLEngine programming model is rather complex, and you may find that the Netty API doesn't necessarily handle this situation.
While it may make sense to want to save a few CPU cycles, most of the SSL/TLS overhead is in the handshake, which you'll be doing anyway. The symmetric cryptographic operations used for the actual encryption of the data are much less expensive. (You should try to measure this overhead to see if it really is a problem.)

How to avoid flooding a message queue?

I'm working on an application that is divided into a thin client and a server part, communicating over TCP. We frequently let the server make asynchronous calls (notifications) to the client to report state changes. This avoids the server losing too much time waiting for an acknowledgement from the client. More importantly, it avoids deadlocks.
Such deadlocks can happen as follows. Suppose the server sent the state-changed notification synchronously (please note that this is a somewhat constructed example). When the client handles the notification, the client needs to synchronously ask the server for information. However, the server cannot respond, because it is still waiting for the answer to its own notification.
Now, this deadlock is avoided by sending the notification asynchronously, but this introduces another problem. When asynchronous calls are made more rapidly than they can be processed, the call queue keeps growing. If this situation is maintained long enough, the call queue will get totally full (flooded with messages). My question is: what can be done when that happens?
My problem can be summarized as follows. Do I really have to choose between sending notifications without blocking at the risk of flooding the message queue, or blocking when sending notifications at the risk of introducing a deadlock? Is there some trick to avoid flooding the message queue?
Note: To repeat, the server does not stall when sending notifications. They are sent asynchronously.
Note: In my example I used two communicating processes, but the same problem exists with two communicating threads.
If the server is sending informational messages to the client, which you yourself say are asynchronous, it should not have to wait for a reply from the client. If they are not informational, in other words they require an answer, I would say a server should never send such messages to a client, and their presence indicates a poor design.
If you have a constant congestion problem, there is little you can do other than gracefully fail and notify the client that no new messages can be posted; then it is up to the client to maintain a backlog of messages to be posted.
Introducing a priority queue and using message expiration/filtering could allow you to free up space in the queue, but that really just postpones the problem. If possible, you could also aggregate messages or ignore duplicate messages, but again the problem does not seem to be the queue itself. (Not to mention that the more complex queue logic could eat up valuable resources that would be better used actually processing messages.)
Depending on what the server side does, you could introduce result hashing for long computations, offload some types of messages to a dedicated device, check if the server waits unreasonably long for I/O operations, and a myriad of other techniques. Profile if possible, at least try to find out which message(s) causes congestion.
Oh, and the business solution: Compare cost of estimated development time to the cost of better hardware and conclude that you should just buy a more powerful server (or an additional one).
Depending on how important these messages are you might want to look into Message Expiration, or perhaps a Message Filter, though it sounds like your architecture may be incorrect.
I would rather fix the logic on the server side. The message queue should not stall waiting for the answer. Rather, have a state machine which can also receive those info queries while it is waiting for the answer from the client.
Of course you can still flood your message queue, but with TCP you can handle it pretty easily.
The best way, I believe, would be to add another state to your client. This I borrowed from the SMPP protocol specs.
Add a congestion state to the client, whereby it always checks the queue length (assuming this is possible). Once a certain threshold is attained, say 1000 unprocessed messages, the client sends the server a message indicating that it is congested, and the server is required to cease all messaging until it receives a notification indicating that the client is no longer congested.
Alternatively, on the server side, if there is a certain number of pending replies, the server could simply cease sending messages until the client has replied to a certain number of them.
These thresholds can be dynamically calculated or fixed, depending.....
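A small sketch of that congestion-state idea, assuming the outbound queue is visible to the sending side as a bounded in-process queue and using the 1000-message threshold mentioned above (class and method names are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: the sender stops producing (and flags congestion) once the
// bounded queue is full, instead of letting it grow without limit.
public class CongestionAwareSender {
    private static final int CAPACITY = 1000;                // threshold from the answer above
    private final BlockingQueue<String> outbound = new ArrayBlockingQueue<>(CAPACITY);
    private volatile boolean congested = false;

    /** Returns false (and flags congestion) instead of flooding the queue. */
    public boolean trySend(String notification) {
        boolean accepted = outbound.offer(notification);
        congested = !accepted;                               // signal the peer to back off until drained
        return accepted;
    }

    /** Consumer side: draining well below the threshold clears the congestion flag. */
    public String take() throws InterruptedException {
        String msg = outbound.take();
        if (outbound.size() < CAPACITY / 2) {
            congested = false;                               // hysteresis: resume well below the limit
        }
        return msg;
    }

    public boolean isCongested() {
        return congested;
    }
}
```

A real deployment would additionally send the congested / no-longer-congested notifications to the peer, as described above, rather than only tracking the flag locally.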
