What does Autorecovery do in Apache Pulsar? - apache-pulsar

What is the function of the Autorecovery process in Apache Pulsar? What are the risks if the process is not run?

The Autorecovery process detects under-replicated ledgers on BookKeeper and re-replicates them. A description of its two components (the auditor and the replication worker) can be found at https://bookkeeper.apache.org/docs/4.5.0/admin/autorecovery/#autorecovery
If the Autorecovery process is not running, you need some other way to detect bookie failures and under-replicated ledgers, and then fix them manually. BookKeeper provides a CLI to re-replicate under-replicated ledgers; the manual recovery procedure is documented at https://bookkeeper.apache.org/docs/4.5.0/admin/autorecovery/#manual-recovery
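As a sketch of the manual path (assuming the `bookkeeper` CLI is on the PATH; the bookie address is a placeholder for a failed bookie in your cluster):

```shell
# List ledgers currently marked as under-replicated
bookkeeper shell listunderreplicated

# Re-replicate the ledgers that were stored on a failed bookie
# (192.168.1.10:3181 is a placeholder for the dead bookie's address)
bookkeeper shell recover 192.168.1.10:3181
```

Autorecovery automates exactly this: the auditor watches for failed bookies and marks their ledgers as under-replicated, and the replication workers pick those ledgers up and re-replicate them.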

Related

What is the Correct order to restart a cluster for point-in-time restore?

I have a mixed-workload cluster across multiple datacenters. I have run the sstableloader command for the tables I want to restore, using snapshots which I had backed up. I have added commit log files, which I had backed up to an archive, to a restore directory on all nodes, and I have updated the commitlog_archiving.properties file with these configs.
What is the correct way and order to restart nodes of my cluster?
Do these considerations apply for restarting as well?
As a general rule, we recommend restarting seed nodes in the DC first before other nodes so gossip propagation happens faster particularly for larger clusters (arbitrarily 15+ nodes). It is important to note that a restart is not required if you restored data using sstableloader.
If you are just performing a rolling restart then the order of the DCs does not matter. But it matters if you are starting up a cluster from a cold shutdown meaning all nodes are down and the cluster is completely offline.
When starting from a cold shutdown, it is important to start with the "Analytics DC" (nodes running in Analytics mode, i.e. with Spark enabled) because it makes it easier to elect a Spark master. Assuming that the replication for Analytics keyspaces is configured with the recommended replication factor of 3, you will need to start 2 or 3 nodes, beginning with the seeds, ideally 1 minute apart, because the LeaderManager requires a quorum of nodes to elect a Spark master.
We recommend leaving DCs with nodes running in Search mode (with Solr enabled) last as a matter of convenience so that all the other DCs are operational before the cluster starts accepting Search requests from the application(s). Cheers!
If you've done all of that, I don't think the order matters too much. That said, you should restart your seed nodes first, so that the nodes in the cluster have a common entry point to find their way back in and rejoin correctly.
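A rolling restart along the lines described above can be sketched as follows (a sketch only, assuming systemd-managed Cassandra and passwordless SSH; the host names are placeholders, with the seeds listed first):

```shell
# Restart seeds first, then the rest, one node at a time.
# Wait for each node to report "UN" (Up/Normal) before moving on.
for host in seed1 seed2 node3 node4 node5; do
  ssh "$host" 'nodetool drain && sudo systemctl restart cassandra'
  ip=$(ssh "$host" hostname -i)
  # Poll cluster status until the restarted node is Up/Normal again
  until ssh seed1 nodetool status | grep "$ip" | grep -q '^UN'; do
    sleep 10
  done
done
```

`nodetool drain` flushes memtables and stops the node accepting writes, which makes the subsequent restart cleaner and faster to replay.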

Fire triggers in replica nodes (cassandra)

I am using a Cassandra 4-node cluster with full replication in all nodes.
I have defined a trigger on a table. However, when I update a row in this table, trigger is fired only on the local node.
Is there any way to fire this trigger in all nodes (based on replication)?
Triggers run on the coordinator node before the mutation is passed off to the replicas to be applied. To see the change on each replica, the best way is to use CDC (which is also more reliable than triggers) and follow the changes as they are flushed to the commit log.
With CDC you have to solve other problems:
validate the order of the events, since it is not guaranteed
make a trade-off between a single point of failure and implementing a deduplication tool for the CDC logs. Let me explain:
You either enable CDC logging on one node, and that node becomes your bottleneck and single point of failure; or you enable CDC on all nodes, and then you have to handle data duplication somehow, since every replica records the same change.
You can deploy triggers on every node of your cluster. It won't cause any data duplication and works perfectly fine.
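For reference, enabling CDC (available since Cassandra 3.8) is a two-step configuration change; the directory path below is the conventional default and the table name in the note is a placeholder:

```yaml
# cassandra.yaml — enable CDC on the node
cdc_enabled: true
# Directory where CDC-flagged commit log segments are moved on flush
cdc_raw_directory: /var/lib/cassandra/cdc_raw
```

CDC must then be switched on per table, e.g. `ALTER TABLE my_ks.my_table WITH cdc = true;`, and a consumer has to read and delete segments from the CDC directory, because writes to CDC-enabled tables are rejected once the configured CDC space limit fills up.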

Is it possible to recover a Cassandra node without a snapshot?

Offsite backups for Cassandra seem like a challenging thing. You basically have to make yet another copy of ALL your data, including the copies of data that exist due to the replication factor. Snapshots make backups easy when you don't mind storing it on the same disk that your node already uses. I'm curious - in the event of a catastrophic failure of this disk, is it possible to recover the node using the nodes that the data was replicated to?
Yes, you can restore data on the crashed node using the procedure in the documentation - Replacing a dead node or dead seed node. That link is for Cassandra 3.x; please pick your Cassandra version from the drop-down menu at the top of the page.
But please note that you still need to do backups if your data is valuable. If you are using AWS, you can use this project to back up Cassandra to S3 storage.
If you are looking for offsite or off-host backups, you can also look at opscenter from Datastax or Talena software (my company). Both provide you the ability to backup your database locally or to S3. As you may expect, you also have the ability to restore data in case of hardware failures, user errors or logical corruptions which the replicas will not protect you against.
Yes, it is possible. Just execute "nodetool repair" in a terminal on the node with the missing data. It can take a lot of time. I would also recommend running a repair on each node every month to keep your data fully replicated, because Cassandra does not repair data automatically (for example, after a node goes down).
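The two approaches mentioned in the answers above can be sketched as follows (the IP address is a placeholder; the replace_address flag is read once, at startup of the replacement node):

```shell
# Option 1: replace the dead node with a fresh one.
# On the replacement node, add the JVM flag before first start:
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.5"' >> cassandra-env.sh
sudo systemctl start cassandra   # streams the dead node's data from the replicas

# Option 2: the disk was wiped but the node keeps its identity.
# After it rejoins the ring, rebuild its data from the replicas:
nodetool repair
```

Remember to remove the replace_address flag after the replacement node has finished bootstrapping, so it is not reapplied on the next restart.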

How to update configuration of a Cassandra cluster

I have a 3-node Cassandra cluster and I want to make some adjustments to cassandra.yaml.
My question is, how should I perform this? One node at a time or is there a way to make it happen without shutting down nodes?
Btw, I am using Cassandra 2.2 and this is a production cluster.
There are multiple approaches here:
If you edit the cassandra.yaml file, you need to restart cassandra to re-read the contents of that file. If you restart all nodes at once, your cluster will be unavailable. Restarting one node at a time is almost always safe (provided you have sane replication-factors and consistency-levels). If your cluster is configured to survive a rack or datacenter outage, then you can safely restart more nodes concurrently.
Many settings can be changed without a restart via JMX, though I don't have a documentation link handy. Changing a setting via JMX WON'T change cassandra.yaml, though, so you'll need to update the file as well or your config will revert to what's in the file when the node restarts.
If you're using DSE, OpsCenter's Lifecycle Manager feature makes updating configs a simple point-and-click affair (disclaimer, I'm biased as I'm an LCM dev).
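As an example of the JMX route, several runtime settings have nodetool wrappers, which avoid a restart entirely (the value below is arbitrary):

```shell
# Read and change compaction throughput at runtime via JMX (no restart needed)
nodetool getcompactionthroughput
nodetool setcompactionthroughput 64   # MB/s; reverts on restart

# To persist the change across restarts, also edit
# compaction_throughput_mb_per_sec in cassandra.yaml.
```

This illustrates the caveat above: the JMX change takes effect immediately but lives only in memory, so the yaml file remains the source of truth at startup.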

DataStax: Will back up work if OpsCenter goes down

If I configure backups with OpsCenter will the agents continue to function, will the back up service still run, if OpsCenter goes down?
Or do I need to build redundancy/set up cron jobs to complete snap shots and incremental backups?
Backups will stop if you lose opscenterd. You may want to set up OpsCenter in a failover (HA) configuration if you need guarantees that your backups will happen during OpsCenter downtime:
https://docs.datastax.com/en/opscenter/5.2/opsc/configure/configFailover.html
Note that OpsCenter only provides node-level snapshots and does not give you a cluster-wide, consistent snapshot. This means you may lose data if a Cassandra node goes down during a backup window. Any change in the cluster topology during the backup window may also result in some data loss, so you should be careful to schedule backups appropriately.
If you need your backups to be resilient across Cassandra node failures and topology changes, you may want to check out Datos IO.
There are a number of commercial and open-source solutions appearing in the market. Check out Priam and Talena if you are interested in Cassandra backup. They provide the capabilities you are referring to.
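If you do fall back to cron as suggested in the question, a minimal snapshot schedule might look like this (the keyspace name, schedule, and tags are placeholders; note that `%` must be escaped in crontab entries):

```shell
# /etc/cron.d/cassandra-snapshots — daily snapshot at 02:00, tagged with the date
0 2 * * * cassandra nodetool snapshot -t daily_$(date +\%Y\%m\%d) my_keyspace

# Remove an old snapshot by tag, e.g. from a separate cleanup script:
nodetool clearsnapshot -t daily_20240101   # placeholder tag
```

Snapshots are hard links on the node's own disk, so to be a real backup they still have to be copied off-host, which is exactly the gap the question is asking about.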
