While studying consensus protocols, I notice that the typical ones, such as Paxos, Multi-Paxos, Fast Paxos, EPaxos, etc., all guarantee linearizability.
But there seems to be no literature explicitly saying that linearizability is a necessary property of a consensus protocol. So I am wondering whether there is any consensus protocol that does not provide linearizability.
Can we say linearizability is a necessary property of a consensus protocol?
I don't think they "guarantee" linearizability; it's just that they can be used to implement linearizability if needed, due to the nature of the linear log.
Let me put it this way: say in an interview you were asked to implement a linearizable register. You have many options, and one of them is to use a consensus-based log, where you would make all writes and, more importantly, all reads via that log.
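As a minimal sketch of that option, assuming a hypothetical `ConsensusLog` abstraction (standing in for whatever Paxos/Raft log you have) whose `append()` blocks until the entry is chosen and returns its index:

```python
# Sketch only: a linearizable register built on top of a consensus-based log.
# `ConsensusLog` is a hypothetical stand-in for a real Paxos/Raft log; assume
# append() blocks until the entry is chosen and returns its log index, and
# read_up_to(i) returns all chosen entries up to and including index i.

class LinearizableRegister:
    def __init__(self, log):
        self.log = log  # replicated, totally ordered log (the consensus part)

    def write(self, value):
        # The write is linearized at the point its entry is chosen in the log.
        self.log.append(("write", value))

    def read(self):
        # Reads also go through the log: append a marker, then apply every
        # chosen entry up to that marker. A local read from a single replica
        # could return stale data, which is exactly what this avoids.
        index = self.log.append(("read", None))
        current = None
        for op, value in self.log.read_up_to(index):
            if op == "write":
                current = value
        return current
```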
Linearizability is a property of a system, and a consensus protocol is one way to achieve it. There are plenty of systems that use consensus protocols but don't offer linearizability as one of the system's features.
I'm learning about microservices.
On one hand, the literature recommends using asynchronous event-publishing for microservices that need to collaborate on sagas or take action on events published by other services.
On the other hand, the same literature recommends not using a shared library to define common events because that couples the microservices through that event library.
Am I taking crazy pills? Aren't those microservices coupled by those events anyway if they rely on them? If so, what is the advantage of coding the exact same events with the same definition in two (or even more) different places? Isn't that a total violation of the DRY principle?
I'm starting to smell a code smell that starts with the initials BS. Will someone help me drink the rest of this koolaid? Or did I just see the emperor with his clothes off for a second?
If so, what is the advantage of coding the exact same events with the same definition in two (or even more) different places?
There could be a number of advantages -- the microservices might be implemented using different languages. Or using the same language, but with different in-memory representations of the data to suit their specific needs. Or even the "same" in-memory representations, but different versions, because they are on different deployment schedules.
There's nothing inherently wrong with sharing the labor of preparing a messaging library among the implementations of your services. But that should be an opt-in, rather than being a requirement. In particular, a team always has the option of replacing the library if the shared implementation is getting in the way.
Two services that agree that the messages are going to use UTF-8 encoded JSON documents should not be required to use the same parser -- the choice of parser is an implementation detail. The coupling is to the schema (the agreement about the semantics of the bytes in the message), not to the implementation.
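As a rough sketch of that separation (the event name and fields below are invented for illustration), each side keeps its own plain representation and agrees only on the JSON schema:

```python
import json
from dataclasses import dataclass

# Producer side: its own plain data object for the event, no shared library.
@dataclass
class OrderPlaced:
    order_id: str
    total_cents: int

def publish(event: OrderPlaced) -> bytes:
    # The only shared agreement is the schema of this UTF-8 JSON document.
    return json.dumps({"type": "OrderPlaced",
                       "orderId": event.order_id,
                       "totalCents": event.total_cents}).encode("utf-8")

# Consumer side (a different service, possibly a different language entirely):
# it parses the same schema into whatever shape suits its own needs.
def handle(message: bytes) -> None:
    doc = json.loads(message.decode("utf-8"))
    if doc["type"] == "OrderPlaced":
        print(f"charging order {doc['orderId']} for {doc['totalCents']} cents")
```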
If you treat events as plain data objects, you don't need a library to deal with them - other than generic messaging and serialization/deserialization code.
The whole point of microservices is to have independent development cycles, so as soon as you introduce a common library, you start to build a "distributed monolith": any change in that library causes a redeployment of all microservices.
Without an event-specific library, the only dependency you introduce is knowledge of the particular event structure from another microservice. Well, that is a necessary evil.
If you create a new application which uses a distributed hash table (DHT), you need to bootstrap the p2p network. I had the idea that you could join an existing DHT (e.g. the Bittorrent DHT).
Is this feasible? Of course, we assume the same technology; combining Chord with Kademlia is obviously not feasible.
If yes, would this be considered parasitic or symbiotic? Parasitic meaning that it conflicts with the original use somehow. Symbiotic, if it is good for both applications as they support each other.
In general: Kademlia and Chord are just abstract designs, while implementations provide varying functionality.
If its feature-set is too narrow you won't be able to map your application logic onto it. If it's overly broad for your needs it might be a pain to re-implement if no open source library is available.
For BitTorrent: the BitTorrent DHT provides 20-byte key -> List[IP, Port] lookups as its primary feature, where the IP is determined by the sender's IP and thus cannot be used to store arbitrary data. There are some secondary features, like bloom-filter statistics over those lists, but they're probably even less useful for other applications.
It does not provide general key-value storage, at least not as part of the core specification. There is an extension proposal for that.
Although implementations provide some basic forward compatibility for unknown message types by treating them like node-lookup requests instead of just ignoring them, that is only of limited usefulness if your application supplies a small fraction of the nodes, since you're unlikely to encounter other nodes implementing that functionality during a lookup.
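To make the "20-byte key -> List[IP, Port]" point concrete, here is a sketch of how an application could piggyback on that primitive for peer discovery. The `dht` object and its `get_peers` method are hypothetical stand-ins for whatever mainline-DHT client you use (the method name mirrors the protocol message of that name):

```python
import hashlib

def rendezvous_key(app_identifier: str) -> bytes:
    # Applications can only choose the 20-byte key (normally a torrent's
    # SHA-1 infohash); they cannot choose the values stored under it.
    return hashlib.sha1(app_identifier.encode("utf-8")).digest()

def find_participants(dht, app_identifier: str):
    key = rendezvous_key(app_identifier)
    # The returned values are (ip, port) pairs, where the IP is taken from
    # the announcing node's sender address - so the DHT works as a
    # rendezvous/peer-discovery mechanism, not as general key-value storage.
    return dht.get_peers(key)
```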
If yes, would this be considered parasitic or symbiotic?
That largely depends on whether you are a "good citizen" in the network.
Does your implementation follow the spec, including commonly used extensions?
Does your general use-case stay within an order of magnitude compared to other nodes when it comes to the traffic it causes?
Is the application lifecycle long enough to not lie outside the expected churn rates of the target DHT?
Is there a formal/traditional way to describe data/command exchange protocols? For example, for programming languages there are multiple approaches to describe the syntax and semantics (like: http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form).
The approach I am looking for is rather utilitarian (as opposed to academic). I need something for day-to-day use in describing data exchanges while working on specifications, just to convey the idea clearly to others. So if there is something that is not recognized as a de facto standard but is useful, that is fine too.
I took a look at UML sequence diagrams and at "Formal Methods for Communication Protocol Specification And Verification" by Carl A. Sunshine, 1979. The former is missing a description of the "payloads" (at least from what I understood), while the latter is more an educational paper describing considerations rather than methods (I am still going through it, though).
Thanks in advance
Protocols are about messages exchanged according to a series of interactions.
The best way to specify protocols that I have seen is with Colored Petri Nets (CPNs).
CPNs are based on ("uncolored") Petri Nets (PNs), which define how parallel activities synchronize, e.g., the message responses, by using Places to represent possible states, Tokens-in-places to represent state, and transition (synchronization) gates to indicate where parallel states must coincide to make progress. Petri Nets can model Finite State Machines (an FSA is a PN that always has a "single token", e.g., the "current state") and so are a generalization; in fact, they can "exponentially compress" certain FSAs into very small descriptions and can thus be quite succinct even for complex interaction sequences. But a conventional PN does not address the data being exchanged.
CPNs generalize PNs to add data types. The tokens now have "colors" (funny way to say "data type") and transitions can not only synchronize but can combine tokens to produce other tokens, e.g., compute new values.
A protocol modelled as a CPN thus has message content as data types, and PN states to indicate the synchronization. If you've never used a CPN, it is really worth your trouble to learn what they are, because they are such a pretty generalization of FSAs.
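To make the places/tokens/transitions idea concrete, here is a toy, colored-Petri-net-flavoured sketch of a request/response exchange in plain Python (an illustration only, not a CPN Tools model; all names are invented):

```python
from collections import defaultdict

places = defaultdict(list)           # place name -> list of tokens
places["client_ready"] = [{"req_id": 1, "payload": "PING"}]
places["server_idle"]  = [{"state": "listening"}]

def fire_send_request():
    # Transition: enabled only when both input places hold a token
    # (this is the synchronization a sequence chart cannot express as compactly).
    if places["client_ready"] and places["server_idle"]:
        req = places["client_ready"].pop()
        places["server_idle"].pop()
        # The produced token's "color" (its data) is computed from the inputs.
        places["in_flight"].append({"req_id": req["req_id"],
                                    "message": req["payload"]})

def fire_reply():
    if places["in_flight"]:
        msg = places["in_flight"].pop()
        places["client_done"].append({"req_id": msg["req_id"],
                                      "reply": msg["message"].replace("PING", "PONG")})
        places["server_idle"].append({"state": "listening"})

fire_send_request()
fire_reply()
print(places["client_done"])   # [{'req_id': 1, 'reply': 'PONG'}]
```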
Regarding the OP's "utilitarian" remark, there are very good tools available at CPN Tools, including graphical modelling and code generation.
In telecommunications, the standard for describing interaction between network elements is Z.100 : Specification and Description Language (SDL) and the companion Z.120 : Message Sequence Chart (MSC) recommendations. The suite includes a testing framework.
A more mathematically inclined approach would be to use a state machine model of some kind.
One of the early publications, Design and Validation of Computer Protocols (1991), was written by Gerard Holzmann to describe the SPIN model checker and the PROMELA language.
Almost any other notation like TLA+, Petri-nets, Alloy, CSP, Z, ... can also be used to reason about protocols and the choice often depends on familiarity and tools availability.
If rigour is not essential, then Harel state charts provide a notation familiar to many engineers.
Fundamentally, the problem with sequence charts on their own is that they describe a single trace through the protocol. They cannot easily show the non-determinism required to describe parallel operations, and they struggle to succinctly represent choice. When extended with hierarchical message charts (HMC), they fall back into the state machine space.
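The same point in code: a single sequence chart shows one trace, while even a very small state machine describes all allowed traces, including choice. A sketch (states and messages invented for illustration):

```python
TRANSITIONS = {
    # (current state, message received): next state
    ("IDLE",        "connect_request"): "CONNECTING",
    ("CONNECTING",  "accept"):          "ESTABLISHED",
    ("CONNECTING",  "reject"):          "IDLE",          # the choice / error path
    ("ESTABLISHED", "data"):            "ESTABLISHED",
    ("ESTABLISHED", "close"):           "IDLE",
}

def accepts(trace):
    """Check whether a sequence of messages is a legal run of the protocol."""
    state = "IDLE"
    for message in trace:
        if (state, message) not in TRANSITIONS:
            return False
        state = TRANSITIONS[(state, message)]
    return True

print(accepts(["connect_request", "accept", "data", "close"]))  # True
print(accepts(["connect_request", "data"]))                     # False
```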
If by "utilitarian" you mean "useful", consider Petri Nets. Please see my reply below or consider a PDF version of the reply.
First page of reply: http://www.aespen.ca/AEnswers/lMtbX1428143440-0_Page_1.jpg
Second page of reply: http://www.aespen.ca/AEnswers/lMtbX1428143440-0_Page_2.jpg
For what it's worth, since you mention BNF: I believe I have read that Wirth used EBNF to specify protocols, with prose explaining which parts of the string were to be emitted by the client and which parts by the server. I am unable to find the reference off-hand, but my recollection is that the example I read was clearer than most protocol descriptions I have read elsewhere.
The questions below are intended to be serious rather than frivolous. I lack experience in distributed systems, and while I understand how Basic Paxos works and why leader selection is useful, my understanding is not deep enough to resolve these questions on my own.
In the paper Consensus on Transaction Commit, page 8 (page 11 of the linked PDF), we have the following statement.
Selecting a unique leader is equivalent to solving the consensus problem.
If this statement is true, and the very purpose of Paxos is to achieve consensus, why is Paxos itself not generally used for leader election?
Moreover, the same paper endorses the leader election algorithm described in the Stable Leader Election paper.
If the two problems are equivalent, and the same paper endorses a different leader election algorithm, why isn't the other algorithm used for solving the general consensus problem instead of Paxos?
Paxos is used in leader election. In the Paxos variants that have leaders (e.g. Multi-Paxos, Raft), the leader is the node that has its data chosen by the Paxos instance, or else the leader is elected in its own transition. (Some people use the term "Paxos instance"; I prefer to think of consensus algorithms as choosing the transitions of a distributed finite state machine.)
All correct consensus algorithms can be mapped to Basic Paxos, but each is optimized for different things. These include Multi-Paxos, Raft, ZAB, Vertical Paxos, Cheap Paxos, and Chain Replication. (The latter three, and all consensus algorithms that only need failure_tolerance+1 nodes, also require another consensus system for reconfiguration. But I digress.)
The Stable Leader Election paper is more than just Paxos: it includes a failure detector (from a cursory glance, it's a lease-based leadership model.) Thus, it is more expensive than Basic Paxos.
In the systems I maintain that require leaders, the failure detectors will utilize the consensus protocols to depose/elect leaders, but otherwise they are completely separate protocols.
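A sketch of the "leader election is consensus" direction, assuming a hypothetical `ConsensusInstance` object (standing in for one Paxos instance) whose `propose()` returns the single value chosen by that instance:

```python
def elect_leader(consensus_instance, my_node_id, term):
    # Every candidate proposes itself for this term; consensus guarantees
    # that all nodes learn the *same* chosen value, so they all agree on
    # one leader for the term.
    chosen = consensus_instance.propose({"term": term, "leader": my_node_id})
    return chosen["leader"]
```

Conversely, a system that can reliably select a unique leader can use that leader's proposal as the agreed value, which is the sense in which the paper calls the two problems equivalent. In practice the consensus protocol is paired with a failure detector (leases, heartbeats) to decide when a new election is needed.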
I didn't read the papers you mentioned above, but I learned during my studies that Paxos is in fact mostly used only to elect a leader, since the algorithm would add too much overhead if run for every message. And the reason you should use it for leader election is that it's 100% partition tolerant; all the other algorithms I know of aren't, though there might be others that fulfil this criterion that I don't know of.
I'll read the papers, but what I could get from the Stable Leader Election paper is that it's just a concept: they first introduce what it is, and afterwards present algorithms for how to do it. And when they introduce the algorithms, they reference Paxos again (but that's only from scanning through the paper, nothing more).
How do I determine what I should add to my use case diagrams? One for each button/form? Should things like sort and search be included, or do they fall under "list items", for example? Or is a list of items simply understood?
The Use Case diagram is intended to help define the high-level business tasks that are important, not a list of functions of the system. For example, a system for use in customer service might involve a research task of looking up information to help someone on a support call.
Most of the literature describes Use Cases as a starting point for defining what the system needs to accomplish. The temptation has always been to be as complete as possible; adding ever more details to define the use case down to a functional (code-wise) level. While it is useful to have a comprehensive understanding of the requirements, the Use Case diagram is not intended to provide that level of documentation.
One thing that makes the issue worse is the «include»/«extend» syntax, which I've never seen used in a working project. It isn't that the terms aren't useful; it's the lack of consensus over when to use either term for a given use case. The UML artifacts expect a process that is more focused on the business language than on the implementation language - and by that I do not mean a computer language. The tendency of some has been to approach the diagrams with a legalistic bent and worry about things like when to use «include» or «extend» for related use cases, or how to express error handling as exceptions to a defined list of process tasks.
If you have ever tried to work through the Automated Teller Machine (ATM) example, you'll know what I mean. In the solar system of UML learning, the ATM example is a black hole that will suck you into the details. Avoid using it to understand UML or Object Oriented Analysis and Design. It has many of the problems typical of real-world domains that distract from getting an overall understanding, even though it would make for a good advanced study.
Yes, code will eventually be produced from the UML artifacts, but that does not mean they have to be debated like a treaty in the Senate.
The OMG UML spec says:
Use cases are a means for specifying required usages of a system. Typically, they are used to capture the requirements of a system, that is, what a system is supposed to do. The key concepts associated with use cases are actors, use cases, and the subject. The subject is the system under consideration to which the use cases apply. The users and any other systems that may interact with the subject are represented as actors. Actors always model entities that are outside the system.
The required behavior of the subject is specified by one or more use cases, which are defined according to the needs of actors. Strictly speaking, the term “use case” refers to a use case type. An instance of a use case refers to an occurrence of the emergent behavior that conforms to the corresponding use case type. Such instances are often described by interaction specifications.
An actor specifies a role played by a user or any other system that interacts with the subject. (The term “role” is used informally here and does not necessarily imply the technical definition of that term found elsewhere in this specification.)
Now most people would agree that business- and user-level interactions are the sweet spot, but there is no hard limitation. Think of the actors/roles as being outside of the main system(s) you are focusing on. In one view a system could be an actor, while in another it is the implementer of other use cases.