Does Pulsar have a data repair mechanism like read repair in Cassandra? - apache-pulsar

As we know, in Cassandra there is read repair, which is used to make data consistent after quorum writes.
In Pulsar, let's assume the Write Quorum Size (Qw) is 4 and the Ack Quorum Size (Qa) is 3.
If a client tries to write msg01 and only 2 acks come back, the request fails. But msg01 has already been written successfully to 2 bookies. How does Pulsar repair or roll back msg01?
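For reference, Qw and Qa in the question correspond to BookKeeper's write and ack quorum, which Pulsar exposes as per-namespace persistence policies. A minimal sketch of setting them with the Java admin client (the service URL and namespace below are placeholders, not part of the original question):

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistencePolicies;

public class SetQuorums {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // placeholder broker URL
                .build();

        // ensemble = 4, write quorum (Qw) = 4, ack quorum (Qa) = 3,
        // mark-delete rate = 0 (unthrottled)
        admin.namespaces().setPersistence("public/default",
                new PersistencePolicies(4, 4, 3, 0));

        admin.close();
    }
}
```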

Related

Insert Data using Spark in Cassandra

I am writing 1.2 billion rows of data (two columns) to Cassandra using Spark and the DataStax Spark connector. I have a two-DC setup and will be writing with LOCAL_QUORUM. I have a replication factor of 3 in both DCs. Will there be latency introduced by the other DC? What other things should I keep in mind while inserting data? I have tested on a single DC and the results are satisfactory.
Writes will be sent to the other DC anyway, but because you're using LOCAL_QUORUM, Spark won't wait for confirmation from nodes in that DC, so it shouldn't affect latency. The one thing I would monitor: if the other DC is far away and/or has a slow link, the nodes handling the writes may start to collect hints, and if that happens it may slightly affect performance, because hints need to be written and then replayed after the remote node is back.
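As an illustration, the write consistency level for the DataStax Spark connector is set through connector properties on the Spark config. A rough sketch (the contact point, DC name, and app name are placeholders; the local-DC property is spelled spark.cassandra.connection.local_dc in older connector versions):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class BulkInsertConfig {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("cassandra-bulk-insert")
                // contact point in the local DC (placeholder address)
                .set("spark.cassandra.connection.host", "10.0.0.1")
                // keep the connector pinned to the local data center
                .set("spark.cassandra.connection.localDC", "DC1")
                // writes wait only for a quorum of replicas in the local DC
                .set("spark.cassandra.output.consistency.level", "LOCAL_QUORUM");

        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        // ... write the DataFrame using format "org.apache.spark.sql.cassandra" ...
        spark.stop();
    }
}
```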

Meaning of "Cassandra provides atomicity and isolation when targeting a single partition for batch mutations"

Assume the following:
My batch statements belong to a single partition
Requested consistency of the write is QUORUM
Replication factor is 3
Q. Does that mean it will not be visible to reads (i.e. isolation) at consistency ONE until the write is confirmed on all replicas?
From: https://docs.datastax.com/en/ddaccql/doc/cql/cql_using/useBatch.html
All replicas for the single partition receive the data, and the coordinator waits for acknowledgement.
A write consistency of QUORUM is irrelevant here. Data will not be available for reads at any consistency level until the write is confirmed (batch completes). In other words, there shouldn't be a time period where reads at consistency ONE might fail after a BATCH operation.
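For illustration, a single-partition logged batch executed at QUORUM with the DataStax Java driver could look like the sketch below (keyspace, table, and column names are made up):

```java
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class SinglePartitionBatch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_ks")) {   // hypothetical keyspace

            PreparedStatement ps = session.prepare(
                    "INSERT INTO user_events (user_id, seq, payload) VALUES (?, ?, ?)");

            // Every statement targets the same partition key (user_id = 42),
            // so the batch is applied atomically and in isolation on each replica.
            BatchStatement batch = new BatchStatement();
            batch.add(ps.bind(42L, 1, "login"));
            batch.add(ps.bind(42L, 2, "click"));
            batch.setConsistencyLevel(ConsistencyLevel.QUORUM);

            session.execute(batch);
        }
    }
}
```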

Cassandra: Are SSTables created for replicated data?

Suppose we have a 3-node Cassandra cluster and the replication factor is 2.
Does the replicated data from another primary node also get compacted during a node's compaction process?
For example, if node 1 owns the token range 1-10 and nodes 2 and 3 are replicas for node 1: when we initiate compaction on node 2, will node 2's SSTables contain the replicated data of node 1?
TIA
Data replication happens either in real time, when you write the data, or via hints if a node was offline, or via an explicit repair operation (although repair is a special type of compaction).
The compaction process is independent on every node - different factors can lead to flushes happening at different times, different data sizes, etc. Compacting data on one node won't send data to other nodes, unless it's a validation compaction triggered via repair.
I recommend reading the DSE Architecture guide, which explains how data is replicated, etc.
P.S. In your example, if you have RF=2, then only one other node will be the replica for node 1...

ResponseError: Not enough replicas available for query at consistency SERIAL (2 required but only 1 alive)

I am a newcomer to Cassandra. I've currently hit an issue; my Cassandra setup is as follows:
1 DC, 1 cluster
3 nodes
SimpleStrategy
durable writes: true
Replication factor: 2 when creating the keyspace
Using IF NOT EXISTS to insert data into the table
Seed nodes: 2 of them
Then I brought down one seed node and got the following error:
ResponseError: Not enough replicas available for query at consistency SERIAL (2 required but only 1 alive)
That's normal: SERIAL requires a Paxos transaction with a quorum of replicas. For RF 2, the quorum is 2; in other words, you cannot tolerate any node being down when writing at SERIAL to a keyspace with RF 2.
Rule of thumb: don't use RF 2, it's useless. Your quorum is (2/2)+1 = 2, but for RF 3 the quorum is the same: (3/2)+1 = 2 (integer division). So you should always prefer RF 3. If you change your keyspace to RF 3, your application will be able to write at SERIAL even if one replica is down.
Also see https://www.ecyrd.com/cassandracalculator/
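A rough sketch of that change plus a conditional write with the DataStax Java driver (keyspace and table names are placeholders; after raising the RF, a repair is needed so existing data reaches the new replica):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class SerialWrite {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            // RF 3: a Paxos quorum of 2 now survives one node being down.
            // (Run "nodetool repair" afterwards so existing data is streamed
            // to the new replica.)
            session.execute("ALTER KEYSPACE my_ks WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 3}");

            // IF NOT EXISTS triggers a Paxos round; the serial consistency
            // level applies to that round, the regular consistency level to
            // the commit phase.
            Statement insert = new SimpleStatement(
                    "INSERT INTO my_ks.users (id, name) VALUES (1, 'alice') IF NOT EXISTS")
                    .setSerialConsistencyLevel(ConsistencyLevel.SERIAL)
                    .setConsistencyLevel(ConsistencyLevel.QUORUM);
            session.execute(insert);
        }
    }
}
```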
As per my understanding, consistency SERIAL is equivalent to QUORUM. You have RF=2 in a 3-node cluster, so data in Cassandra is placed based on the partition key hash. When you inserted the data into the cluster, it may have landed on both seed nodes. So when you retrieve the data with one seed node down, you can get this error because the cluster cannot achieve the desired consistency level.
Please refer to the link below for more details.
https://docs.datastax.com/en/ddac/doc/datastax_enterprise/dbInternals/dbIntConfigSerialConsistency.html

There is no rollback in Cassandra, so how does Cassandra remove failed writes?

Suppose I have a 2-node cluster with Replication Factor (RF) = 2.
I fire an insert with consistency 2. Cassandra starts to write to these 2 nodes while the client waits for a response. In between, one node fails and cannot complete the write, while the write on the other node succeeds. The client will not get a success message because the consistency level cannot be met. There is no rollback in Cassandra, so how and when does Cassandra remove the inserted record from that one node or mark it as 'not to be used'?
Related question: Does Cassandra write to a node (which is up) even if the consistency level cannot be met?
It doesn't. Cassandra will try to replicate the data on each write operation, and your application will be notified if the consistency level couldn't be met. But Cassandra doesn't roll back writes.
What you probably want to do in such cases is use a higher CL for your reads as well. E.g., using CL QUORUM will read the data from both nodes and will automatically repair the data in case it's missing on one of the nodes.
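For example, a read at QUORUM with the DataStax Java driver (keyspace and table are placeholders) makes the coordinator consult both replicas and reconcile any mismatch:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class ReadAtQuorum {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_ks")) {   // hypothetical keyspace

            // With RF=2, QUORUM means both replicas are consulted; if one is
            // missing the row from the partially failed write, the coordinator
            // detects the mismatch and repairs the stale replica as part of
            // serving the read.
            Statement read = new SimpleStatement("SELECT * FROM users WHERE id = 1")
                    .setConsistencyLevel(ConsistencyLevel.QUORUM);
            Row row = session.execute(read).one();
            System.out.println(row);
        }
    }
}
```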
