Is there an equivalent of DSE's NodeSync in open-source Apache Cassandra?

Is there any tool in open-source Apache Cassandra that provides a continuous background repair process, equivalent to NodeSync in DataStax Enterprise?
Thanks and regards

The NodeSync service is an enterprise feature that is only available in DataStax Enterprise. It doesn't have an open-source equivalent because it uses a different mechanism from traditional anti-entropy repairs (it does not use nodetool repair).
The closest you can get is the open-source tool Cassandra Reaper, which automates repairs by splitting the repair jobs into small segments (token ranges). Reaper then puts these segments into a schedule so they get repaired periodically. Cheers!
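To make the idea of splitting a repair into token-range segments concrete, here is a minimal sketch in Python. It is only an illustration of the general technique, not Reaper's actual algorithm: the function name `split_token_range` is made up for this example, and it assumes the default Murmur3Partitioner token range and a range that does not wrap around.

```python
from typing import List, Tuple

# Full token range of the Murmur3Partitioner (assumed here; it is the default).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_range(start: int, end: int, segments: int) -> List[Tuple[int, int]]:
    """Split the token range (start, end] into contiguous sub-ranges.

    Hypothetical helper for illustration only; not part of Reaper or Cassandra.
    """
    if start >= end:
        raise ValueError("this sketch does not handle wrap-around ranges")
    width = end - start
    if not 1 <= segments <= width:
        raise ValueError("segments must be between 1 and the range width")
    # Evenly spaced integer boundaries; the last one is pinned to `end`
    # so that rounding never drops tokens at the top of the range.
    bounds = [start + (width * i) // segments for i in range(segments)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    for seg_start, seg_end in split_token_range(MIN_TOKEN, MAX_TOKEN, 4):
        print(f"repair segment: ({seg_start}, {seg_end}]")
```

Each resulting pair could in principle be handed to a subrange repair (for example via the start-token and end-token options of nodetool repair), which is roughly the kind of work a scheduler like Reaper spreads out over time.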

Related

OpsCenter reports "Cannot run anti-entropy repair on tables with NodeSync enabled"

I'm monitoring a DSE cluster and I see the following problem:
As you can see, it says that the Repair is currently failing, and this value keeps going up over time. Can someone explain to me what's happening here? In the OpsCenter logs I can only find this error:
Is this related to the problem?
Checked logs and documentation.
In DSE there are two ways to perform anti-entropy repair:
Traditional Cassandra repair using nodetool repair command
NodeSync, which is often faster and more intelligent (see this blog post for more details)
However, you can't use traditional repair on tables where NodeSync is enabled. So you need to click the settings icon for Repair and disable running it on the keyspaces/tables that have NodeSync enabled.
To add to Alex Ott's excellent response, NodeSync is a new feature in DataStax Enterprise which runs a repair continuously in the background using the same mechanism as read-repairs and replaces the traditional anti-entropy repairs.
The OpsCenter Repair Service will skip repairs on tables which have NodeSync enabled because it isn't possible to run traditional repairs on them as I've explained in this post -- https://community.datastax.com/questions/3879/.
If NodeSync was enabled on a table while a repair on that same table was already scheduled and running, it would explain why you're seeing error messages.
You can stop the errors from being generated by explicitly excluding the keyspace(s) or table(s) from subrange repairs with:
[repair_service]
ignore_keyspaces=ks_name_1,ks_name_2
ignore_tables=ks_name_3.table_name_1,ks_name_3.table_name_2

Cassandra reaper - should I repair also reapers database?

I have installed cassandra-reaper and set up schedules to repair my project's database every Wednesday. I'm wondering whether I also need to schedule a repair for the database that cassandra-reaper itself created.
I think not, because Reaper is just a UI for scheduling and managing repairs on a Cassandra cluster.
It improves the existing nodetool repair process by:
Splitting repair jobs into smaller tunable segments.
Handling back-pressure through monitoring running repairs and pending compaction.
Adding ability to pause or cancel repairs and track progress precisely.
Reaper ships with a REST API, a command line tool and a web UI.

What is the impact of writes to a cassandra cluster that is hosting analytics jobs?

I am considering a heavily analytics-based Spark cluster that also has to consume some writes from a response-time-sensitive UI.
Will the analytics jobs impact or hinder my REST response time?
Will they have any other impact?
Your best bet would be to keep multiple separate data centres. I couldn't find the equivalent article for the newest version of DSE, but the principles from the article below still apply, and they apply to community Cassandra too.
https://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/deploy/deployWkLdSep.html

Cassandra cluster monitoring

How can I collect data from all nodes in a Cassandra cluster from a single node?
Does JMX on a single node provide aggregated values for all nodes in the same cluster?
Yes, for a Cassandra cluster you will be able to do so. As far as I know, there are two well-known ways of monitoring and getting the cluster status.
nodetool utility:
The nodetool utility is a command-line interface for monitoring Cassandra and performing routine database operations. It is included in the Cassandra distribution and is typically run directly from an operational Cassandra node.
DataStax OpsCenter: OpsCenter provides a graphical representation of performance trends in a summary view that is hard to obtain with other monitoring tools. The GUI provides views for different time periods as well as the capability to drill down on single data points. Both real-time and historical performance data for a Cassandra or DataStax Enterprise cluster are available in OpsCenter. OpsCenter metrics are captured and stored within Cassandra.
I think the first way (the nodetool utility) will be more useful for your requirements.
You can find more information at
Cassandra cluster monitoring and nodetool options.
JMX provides information from a single node only. To get information about the entire cluster, we collect data from all nodes into Zabbix. Zabbix lets you create graphs and screens that show JMX values from all nodes in one place. For example, we can see Read Pending Tasks for all nodes in a single graph.
I think that having separate information for each node in one place is a better way to diagnose possible issues than having a single aggregated value.
Regarding metrics, I can recommend the Guide to Cassandra Thread Pools, which describes the different Cassandra metrics and how to monitor them.
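As a rough illustration of the "query each node separately and view the results in one place" approach, here is a small Python sketch. It is not how Zabbix does it; it simply runs nodetool tpstats against each node in turn. The function names and node addresses are hypothetical, and it assumes nodetool is on the PATH and that the nodes accept remote JMX connections.

```python
import subprocess
from typing import Callable, Dict, List


def run_nodetool(host: str, command: str = "tpstats") -> str:
    """Run one nodetool command against one node and return its raw output.

    Assumption: nodetool is installed locally and remote JMX is reachable.
    """
    result = subprocess.run(
        ["nodetool", "-h", host, command],
        capture_output=True, text=True, check=True, timeout=30,
    )
    return result.stdout


def collect_from_nodes(
    hosts: List[str], fetch: Callable[[str], str] = run_nodetool
) -> Dict[str, str]:
    """Query every node in turn and keep the results separate per node."""
    results: Dict[str, str] = {}
    for host in hosts:
        try:
            results[host] = fetch(host)
        except Exception as exc:
            # Record the failure and keep going if one node is unreachable.
            results[host] = f"ERROR: {exc}"
    return results


if __name__ == "__main__":
    # Hypothetical node addresses; replace them with your own.
    for host, output in collect_from_nodes(["10.0.0.1", "10.0.0.2"]).items():
        print(f"=== {host} ===")
        print(output)
```

Passing the fetch function as a parameter keeps the per-node loop independent of how the data is obtained, so the same loop could be pointed at a JMX client or an HTTP metrics endpoint instead of nodetool.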

Cassandra production Monitoring

I am new to Cassandra and am trying to set up monitoring for a production Cassandra cluster.
Apart from monitoring with nodetool commands in crontab, what else is recommended?
Is it general practice to use Ganglia for monitoring?
Can you direct me to a good resource on setting up monitoring in production?
We are using Apache Cassandra, so OpsCenter was not very useful.
The free version of OpsCenter works with OSS Cassandra and most monitoring capabilities are available. You do miss a good amount of cluster management capabilities if you don't have DSE:
http://www.datastax.com/what-we-offer/products-services/datastax-opscenter/compare
