Why is Paxos leader election not done using Paxos? - protocols

The questions below are intended to be serious rather than frivolous. I lack experience in distributed systems, but I do understand how Basic Paxos works and why leader selection is useful. Unfortunately, my understanding is not deep enough to fathom the questions below.
In the paper Consensus on Transaction Commit, page 8 (page 11 of the linked PDF), we have the following statement.
Selecting a unique leader is equivalent to solving the consensus
problem.
If this statement is true, and the very purpose of Paxos is to achieve consensus, why is Paxos itself not generally used for leader election?
Moreover, the same paper endorses the leader election algorithm described in the Stable Leader Election paper.
If the two problems are equivalent, and the same paper endorses a different leader election algorithm, why isn't the other algorithm used for solving the general consensus problem instead of Paxos?

Paxos is used in leader election. In the Paxos variants that have leaders (e.g. Multi-Paxos, Raft), the leader is either the node whose data is chosen by the Paxos instance, or it is elected in its own transition. (Some people use the term Paxos instance; I prefer to think of consensus algorithms as choosing the transitions in a distributed finite state machine.)
All correct consensus algorithms can be mapped to Basic Paxos, but each is optimized for different things. These include Multi-Paxos, Raft, ZAB, Vertical Paxos, Cheap Paxos, and Chain Replication. (The latter three, and all consensus algorithms which only need failure_tolerance+1 nodes, also require another consensus system for reconfiguration. But I digress.)
The Stable Leader Election paper is more than just Paxos: it includes a failure detector (from a cursory glance, it's a lease-based leadership model.) Thus, it is more expensive than Basic Paxos.
In the systems I maintain that require leaders, the failure detectors will utilize the consensus protocols to depose/elect leaders, but otherwise they are completely separate protocols.
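To make "the leader is the node whose data is chosen" concrete, here is a minimal in-memory sketch of single-decree (Basic) Paxos in which the proposed value is a candidate's node ID. Everything here is an assumption for illustration: the names `Acceptor` and `propose_leader` are hypothetical, calls are synchronous with no message loss, and ballot numbers are assumed to be unique per proposer. It has no failure detector, which is the part the Stable Leader Election paper adds.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest ballot this acceptor has promised
    accepted_ballot: int = -1             # ballot of the value it last accepted
    accepted_value: Optional[str] = None  # the value it last accepted, if any

    def prepare(self, ballot: int) -> Tuple[bool, int, Optional[str]]:
        """Phase 1: promise to ignore lower ballots and report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_value
        return False, -1, None

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept the value unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose_leader(acceptors: List[Acceptor], ballot: int, candidate: str) -> Optional[str]:
    """Try to get `candidate` chosen as leader; return the chosen leader or None."""
    majority = len(acceptors) // 2 + 1

    replies = [a.prepare(ballot) for a in acceptors]
    granted = [(b, v) for ok, b, v in replies if ok]
    if len(granted) < majority:
        return None  # a higher ballot is in progress; retry with a larger one

    # If some acceptor already accepted a value, we must propose that value
    # (the one with the highest ballot) instead of our own candidate.
    _, prior_value = max(granted, key=lambda reply: reply[0])
    value = prior_value if prior_value is not None else candidate

    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None
```

With three acceptors, the first proposer to complete both phases becomes leader, and a later proposer with a higher ballot learns the existing leader rather than replacing it. Deposing a leader would need a new instance (a new "transition"), which is where a failure detector comes in.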

I haven't read the papers you mentioned, but I learned during my studies that Paxos is in fact mostly used only to elect a leader, since running it to order every message would be too much overhead. The reason to use it for leader election is that it is fully partition tolerant; the other algorithms I know of aren't, though there may be others that fulfil this criterion that I don't know of.
I'll read the papers, but from what I could gather, the Stable Leader Election paper presents a concept: it first introduces what stable leader election is and then gives algorithms for it, and when it introduces the algorithms it references Paxos again. (That is only from skimming the paper, nothing more.)

Related

Is there any consensus protocol that does not guarantee linearizability?

While studying consensus protocols, I noticed that the typical ones, such as Paxos, MultiPaxos, Fast Paxos, EPaxos etc., all guarantee linearizability.
But there seems to be no literature explicitly saying that linearizability is a necessary property of a consensus protocol. So I am wondering whether there is any consensus protocol which does not provide linearizability.
Can we say linearizability is a necessary property for a consensus protocol?
I don't think they "guarantee" linearizability; rather, they can be used to implement linearizability if needed, thanks to the linear nature of the log.
Let me put it this way: say you were asked in an interview to implement a linearizable register. You have many options, and one of them is to use a consensus-based log, where you make all writes and, more importantly, all reads go through that log.
Linearizability is a property of a system, and a consensus protocol is one way to achieve it. There are plenty of systems that use consensus protocols but don't offer linearizability as a feature.
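As a rough sketch of that interview answer, the register below sends both writes and reads through a shared log. `ConsensusLog` is a hypothetical stand-in: a real system would agree on each slot with a consensus protocol, whereas here it is just a local list, so this only illustrates the ordering argument and not replication itself.

```python
class ConsensusLog:
    """Stand-in for a replicated log; assume each append is agreed via consensus."""

    def __init__(self):
        self.entries = []

    def append(self, entry):
        self.entries.append(entry)
        return len(self.entries) - 1  # slot index assigned to this entry


class LinearizableRegister:
    """One replica of a register whose state is derived from the shared log."""

    def __init__(self, log):
        self.log = log
        self.applied = 0   # number of log entries applied to local state
        self.value = None  # local copy; may be stale between operations

    def _apply_through(self, index):
        # Replay the log in order up to and including `index`.
        while self.applied <= index:
            op, arg = self.log.entries[self.applied]
            if op == "write":
                self.value = arg
            self.applied += 1

    def write(self, value):
        index = self.log.append(("write", value))
        self._apply_through(index)

    def read(self):
        # The read takes its own slot in the log, so it is ordered after
        # every write that completed before the read started.
        index = self.log.append(("read", None))
        self._apply_through(index)
        return self.value
```

A system that served reads straight from `self.value` without appending to the log would still be using consensus, but could return stale data, which is exactly the "uses consensus but isn't linearizable" case.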

Why is compare-and-swap (CAS) algorithm a good choice for lock-free synchronization?

CAS belongs to the read-modify-write (RMW) family, a set of algorithms that allow you to perform complex transactions atomically.
Specifically, Wikipedia says that
CAS is used to implement synchronization primitives like semaphores and mutexes, as well as more sophisticated lock-free and wait-free algorithms. [...] CAS can implement more of these algorithms than atomic read, write, or fetch-and-add, and assuming a fairly large amount of memory, [...] it can implement all of them.
https://en.wikipedia.org/wiki/Compare-and-swap#Overview
So it seems that a CAS algorithm is the "one size fits all" product of its category. Why is that so? What do other RMW algorithms lack? If CAS is the best tool, what are the other algorithms for?
CAS belongs to a class of objects called "consensus objects", each of which has a consensus number: the maximum number of threads for which a given consensus object can solve the consensus problem.
The consensus problem goes like this: each of n threads proposes some value p, and all n threads must decide on the same value d, which must be one of the proposed values.
CAS is the most "powerful" consensus object because its consensus number is infinite. That is, CAS can be used to solve the consensus problem among a theoretically infinite number of threads. It even does it in a wait-free manner.
This can't be done with atomic registers, test-and-set, fetch-and-add, or stacks, since they all have finite consensus numbers. There are proofs for these consensus numbers, but that is another story...
The significance of all of this is that it can be proven that there exists a wait-free implementation of an object for n threads using a consensus object with a consensus number of at least n. CAS is especially powerful because you can use it to implement wait-free objects for an arbitrary number of threads.
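The consensus protocol built on CAS is short enough to sketch. Python has no hardware CAS exposed, so the `CASRegister` class below emulates one with a lock; that lock is purely an assumption standing in for the atomic instruction, and all class and function names are made up for illustration. The part that matters is `decide`: one CAS attempt followed by one read, regardless of how many threads take part.

```python
import threading


class CASRegister:
    """Emulated compare-and-swap cell; the lock only stands in for hardware atomicity."""

    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

    def load(self):
        with self._lock:
            return self._value


class CASConsensus:
    """Consensus for any number of threads; proposals must not be None."""

    def __init__(self):
        self._cell = CASRegister(None)

    def decide(self, proposal):
        # Only the first CAS succeeds; every thread then reads the winner.
        self._cell.compare_and_swap(None, proposal)
        return self._cell.load()


def run_consensus(n_threads):
    """Let n_threads each propose their own name and collect what they decided."""
    consensus = CASConsensus()
    decisions = [None] * n_threads

    def worker(i):
        decisions[i] = consensus.decide(f"thread-{i}")

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

Whatever the interleaving, every entry of the returned list is the same proposal. With a real CAS instruction each `decide` call finishes in a bounded number of its own steps, which is the wait-free property mentioned above.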
As for why other RMW operations are useful: some problems in multiprocessing don't really involve solving the consensus problem for some arbitrary number of threads. For example, mutual exclusion can be solved using less powerful RMW operations like test-and-set (a simple TAS lock), fetch-and-add (ticket lock) or atomic swap (CLH lock).
More information on consensus for shared memory is available in the Wikipedia article Consensus (computer science), section "In shared-memory systems".
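For instance, a ticket lock needs nothing stronger than fetch-and-add. The sketch below again emulates the atomic instruction with a small internal lock (an assumption, since plain Python does not expose one), and the names `FetchAndAdd`, `TicketLock` and `count_with_lock` are hypothetical.

```python
import threading
import time


class FetchAndAdd:
    """Emulated fetch-and-add counter; the guard stands in for the atomic instruction."""

    def __init__(self):
        self._value = 0
        self._guard = threading.Lock()

    def fetch_and_add(self, delta=1):
        with self._guard:
            old = self._value
            self._value += delta
            return old


class TicketLock:
    """FIFO lock: take a ticket, then wait until that ticket is being served."""

    def __init__(self):
        self._next_ticket = FetchAndAdd()
        self._now_serving = 0  # written only by the current lock holder

    def acquire(self):
        my_ticket = self._next_ticket.fetch_and_add(1)
        while self._now_serving != my_ticket:
            time.sleep(0)  # yield to other threads while spinning

    def release(self):
        self._now_serving += 1


def count_with_lock(n_threads, increments):
    """Increment a shared counter from several threads under the ticket lock."""
    lock = TicketLock()
    counter = [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            counter[0] += 1  # critical section
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Threads enter the critical section in ticket order, so the lock is also fair, something a plain TAS lock does not give you.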
Also, there's a whole chapter on consensus and universal constructions in Herlihy and Shavit's The Art of Multiprocessor Programming (WorldCat) that I highly recommend.

Can applications coexist within the same DHT?

If you create a new application which uses a distributed hash table (DHT), you need to bootstrap the p2p network. I had the idea that you could join an existing DHT (e.g. the Bittorrent DHT).
Is this feasible? Of course, we assume the same technology. Combining Chord with Kademlia is obviously not feasible.
If yes, would this be considered parasitic or symbiotic? Parasitic meaning that it conflicts with the original use somehow. Symbiotic, if it is good for both applications as they support each other.
In general: Kademlia and Chord are just abstract designs, while implementations provide varying functionality.
If an implementation's feature set is too narrow, you won't be able to map your application logic onto it. If it's overly broad for your needs, it might be a pain to re-implement if no open-source library is available.
For BitTorrent: the BitTorrent DHT provides 20-byte key -> List[IP,Port] lookups as its primary feature, where the IP is determined by the sender IP and thus cannot be used to store arbitrary data. There are some secondary features like Bloom filter statistics over those lists, but they're probably even less useful for other applications.
It does not provide general key-value storage, at least not as part of the core specification. There is an extension proposal for that.
Implementations do provide some basic forward-compatibility for unknown message types by treating them like node lookup requests instead of just ignoring them. However, that is only of limited usefulness if your application supplies a small fraction of the nodes, since you're unlikely to encounter other nodes implementing that functionality during a lookup.
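To show why the 20-byte key -> List[IP,Port] model is so restrictive, here is a toy model of the storage side of a node. The method names mirror the DHT's `announce_peer` and `get_peers` queries, but the `PeerStore` class itself is hypothetical and leaves out routing, tokens, and expiry entirely.

```python
from collections import defaultdict


class PeerStore:
    """Toy model of what a BitTorrent DHT node stores per key."""

    def __init__(self):
        self._peers = defaultdict(set)

    def announce_peer(self, info_hash, sender_addr, port):
        if len(info_hash) != 20:
            raise ValueError("key must be exactly 20 bytes")
        # The IP is taken from where the packet came from, not from the payload,
        # so the only data the announcer controls is the port number.
        sender_ip, _source_port = sender_addr
        self._peers[info_hash].add((sender_ip, port))

    def get_peers(self, info_hash):
        return sorted(self._peers.get(info_hash, ()))
```

An application that piggybacks on this can effectively only say "I, at this address, am interested in this 20-byte key", which is fine for peer discovery but not for storing application data.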
If yes, would this be considered parasitic or symbiotic?
That largely depends on whether you are a "good citizen" in the network.
Does your implementation follow the spec, including commonly used extensions?
Does your general use-case stay within an order of magnitude compared to other nodes when it comes to the traffic it causes?
Is the application lifecycle long enough to not lie outside the expected churn rates of the target DHT?

Do state machine and statechart mean the same thing?

I have heard people using these terms.
I wonder if they refer to the same thing or is there a difference between these two?
Wikipedia actually covers this pretty well. http://en.wikipedia.org/wiki/State_diagram
State machines have been around for a long time (decades at least). They consist of states (usually circles) and arrows between the states, where certain actions can trigger a transition along an arrow. Moore and Mealy machines are the two main variants, which indicate whether the output is derived from the transitions or the states themselves.
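A small sketch of the Mealy/Moore distinction, using a made-up turnstile example (the states, events and outputs are all assumptions for illustration). In the Mealy table the output is attached to each arrow; in the Moore table it is attached to each state.

```python
# Mealy style: (state, event) -> (next state, output produced by the transition)
MEALY_TRANSITIONS = {
    ("locked", "coin"): ("unlocked", "unlock"),
    ("locked", "push"): ("locked", "alarm"),
    ("unlocked", "coin"): ("unlocked", "refund"),
    ("unlocked", "push"): ("locked", "lock"),
}

# Moore style: the output depends only on the state the machine is in.
MOORE_OUTPUT = {"locked": "red light", "unlocked": "green light"}


def run_mealy(state, events):
    """Return the final state and the outputs emitted on each transition."""
    outputs = []
    for event in events:
        state, output = MEALY_TRANSITIONS[(state, event)]
        outputs.append(output)
    return state, outputs


def run_moore(state, events):
    """Return the final state and the output of each state visited, start included."""
    outputs = [MOORE_OUTPUT[state]]
    for event in events:
        state, _ = MEALY_TRANSITIONS[(state, event)]  # reuse the same arrows
        outputs.append(MOORE_OUTPUT[state])
    return state, outputs
```

Both are flat machines: every (state, event) pair needs its own entry, which is the growth problem statecharts were designed to address.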
Statecharts were invented by David Harel, and are sometimes called Harel Statecharts. He defined a pretty broad extension to typical state machines, with the goal of making state machines more useful for actual work with complicated systems.
A variant of Statecharts is now built into MATLAB as Stateflow, which is an extension of Simulink. Statecharts are also the basis of the UML "State Machine Diagrams".
Learn more about Stateflow in general at: https://www.mathworks.com/help/stateflow/examples.html
Stateflow was updated in R2012b to make it much easier to create state machines and flow charts.
The major updates include a new graphical editor, state transition tables, MATLAB as the action language and an integrated debugger.
From the seminal 1999 book "Constructing the User Interface with Statecharts" by Ian Horrocks, published by Addison-Wesley:
From the very nature of user interfaces, it is apparent that states and events are a natural medium for describing their behaviour. Finite state machines are a formal mechanism for collecting and co-ordinating such fragments to form a whole. However, it is generally agreed that, because of the large number of states and events organized in an unstructured way, finite state machines are not appropriate for describing complex systems. The feasibility of a state-based approach for specifying a user interface relies on a specification language that results in diagrams that are concise, well structured, modular and hierarchical.
There are many different notations used to represent finite state machines, such as state transition diagrams and state transition matrices. However, such notations do not address the fundamental problems associated with finite state machines. The statechart notation is not just another notation for a finite state machine; statecharts are a major step forward for state-based notations. They provide a much richer and much more powerful specification language than any finite state machine notation. All the serious problems associated with finite state machines are solved by statecharts:
The number of states in a statechart rises in proportion to the complexity of the system being specified. In finite state machines, the number of states tends to increase rapidly with only a modest increase in the complexity of the system being specified.
Statecharts avoid duplicate states and duplicate event arrows. This avoids large, chaotic diagrams that are difficult to understand and difficult to modify.
The states in a statechart have a hierarchical structure, which means the system being modelled can be considered at different levels of abstraction. The modular nature of the states ensures that it is not necessary to understand an entire statechart in order to understand just one part of it. In a nutshell, statecharts are to state transition diagrams what modular decomposition and abstraction are to monolithic code.
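The point about hierarchy and duplicate arrows in the quoted passage can be sketched in a few lines. This is only a toy: the media-player states and the `dispatch` helper are invented for illustration, and real statecharts also have entry/exit actions, initial substates, and parallel regions, none of which are modelled here.

```python
# Each state names its parent; "playing" and "paused" are substates of "on".
PARENT = {"off": None, "on": None, "playing": "on", "paused": "on"}

# (state, event) -> target state. The "power" arrow is drawn once on the
# parent state "on" instead of once per substate.
HANDLERS = {
    ("off", "power"): "paused",
    ("on", "power"): "off",
    ("paused", "play"): "playing",
    ("playing", "pause"): "paused",
}


def dispatch(state, event):
    """Look for a handler on the state, then on its ancestors; ignore unhandled events."""
    current = state
    while current is not None:
        if (current, event) in HANDLERS:
            return HANDLERS[(current, event)]
        current = PARENT[current]
    return state
```

In a flat state machine, adding a third substate such as "rewinding" would require another explicit "power" arrow; here it inherits the one on "on".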

Under what conditions could we justify an attempt to introduce a one size fits all term when it contradicts working experience?

I have just been re-reading "Domain-Driven Design: Tackling Complexity in the Heart of Software" by Eric Evans. I could not help but notice a hint towards creating a language where there is a one-to-one mapping between a noun and an entity. For example, we might call a phone a phone, and no other noun is accepted. However, can this always be achieved with every other entity? Let us take, for example, the language used to denote a bid on a phone. Here, there are several different names that refer to a bid on that phone, all of which mean the same thing, e.g., negotiate bid, negotiate offer, phone bid, etc. Also, there are additional terms used by other customers. Using these terms interchangeably does not cause confusion. Nevertheless, attempts to introduce a single term to be used across all the source code as well as in conversations with all customers can cause confusion.
There is the obverse problem when we talk about similar phones where similar means something different to each customer. Here, we have the same term, which is sought after. However, it has many different meanings.
So, what justification in this instance could be used to attempt to introduce a one size fits all term when it contradicts working experience?
Your argument "begs the question" (in the logic sense of the term).
You ask: "Under what conditions could we justify an attempt to introduce a one size fits all term when it contradicts working experience?" How about under those conditions where it actually doesn't contradict working experience?
You suggest: "attempts to introduce a single term to be used across all the source code as well as in conversations with all customers can cause confusion." Indeed, it can... and it can also avoid confusion.
Source code is a great example of a limited domain where we can expect a minimum level of familiarity and training for all of the users expected to work in that domain (at least, in most commercial settings).
It is quite reasonable for a style-guide to declare the preferred term, and expect everyone to follow it, as consistency in this situation has a big upside. Using your example, in my particular project, I use the term "offer" over "bid" every time, and the code is better for it. I can point to other terms which have not yet been standardised, and can see the extra effort it takes to code for them.
Similarly, it is a widely accepted design goal in User Interface design and in User Documentation to use consistent terms. Using multiple terms for the same item is more difficult for users to follow - particularly non-native speakers. (I disagree with your claim that it does not (ever) cause confusion.) When introducing a new term, it is a good idea to mention other terms that could be used.
(Funnily enough, I worked at an organisation where the User Documentation referred to phones as "Voice Terminals", as the term 'phone' was ambiguous; this was, I suspect, going too far?)
On the other hand, someone selling a product or training users would generally do well to mimic the language of the users to best engage them.
You said,
There is the obverse problem when we talk about similar phones where similar means something different to each customer. Here, we have the same term, which is sought after. However, it has many different meanings.
What about bounded contexts? When the same term means two different things, the two meanings should probably reside in two different contexts.
I quote from Martin Fowler's page on Bounded Context:
As you try to model a larger domain, it gets progressively harder to build a single unified model. Different groups of people will use subtly different vocabularies in different parts of a large organization. The precision of modeling rapidly runs into this, often leading to a lot of confusion. Typically this confusion focuses on the central concepts of the domain. Early in my career I worked with a electricity utility - here the word "meter" meant subtly different things to different parts of the organization: was it the connection between the grid and a location, the grid and a customer, the physical meter itself (which could be replaced if faulty).
His problem description and yours sound similar.
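In code, that might look something like the sketch below, loosely based on the "meter" example from the quote. The context names, fields, and the translation function are all assumptions made up for illustration; the point is only that each context keeps its own definition of the term and the mapping between them is explicit.

```python
from dataclasses import dataclass


class GridContext:
    """In this context a meter is a connection between the grid and a location."""

    @dataclass(frozen=True)
    class Meter:
        connection_id: str
        location: str


class MaintenanceContext:
    """In this context a meter is the physical device, which can be replaced."""

    @dataclass(frozen=True)
    class Meter:
        serial_number: str
        faulty: bool = False


def to_maintenance_meter(meter, installed_devices):
    """Translate at the context boundary: which device is installed on this connection?"""
    serial = installed_devices[meter.connection_id]
    return MaintenanceContext.Meter(serial_number=serial)
```

Within each context the word "meter" has exactly one meaning, so a single preferred term works there, while the translation function documents where the two meanings meet.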

Resources