How to calculate objects' size in Cassandra programmatically - cassandra

I am new to Cassandra and was wondering how to calculate Cassandra cache size programmatically.
For example, after inserting several objects into Cassandra, I want to find out, via code, how much space those objects take up in Cassandra's memory table (memtable).
cfstats is a command-line tool, which does not meet my requirement.
Is there anything in the Hector API that can help? Thanks.

The CLI tools actually use JMX to interrogate the Cassandra instance(s). You could use the same approach programmatically, but it would be cumbersome. This page has some details on the monitoring interface:
http://www.datastax.com/docs/1.0/operations/monitoring
There is no other API support for retrieving cache statistics.
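As a rough illustration of the JMX approach described above, here is a minimal Java sketch that reads a single MBean attribute using only the standard javax.management API. The class and method names (JmxAttributeReader, serviceUrl, readAttribute, readRemote) are hypothetical and not part of Cassandra or Hector. Port 7199 is assumed to be the node's JMX port, which is the usual Cassandra default but may be configured differently.

```java
import java.io.IOException;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: reads one attribute of one MBean over JMX.
final class JmxAttributeReader {

    // Builds the standard RMI-based JMX service URL for a host and port.
    static JMXServiceURL serviceUrl(String host, int port) throws IOException {
        return new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    }

    // Reads an attribute over an already open connection.
    static Object readAttribute(MBeanServerConnection conn,
                                String objectName,
                                String attribute)
            throws IOException, JMException {
        return conn.getAttribute(new ObjectName(objectName), attribute);
    }

    // Connects to a remote node, reads the attribute, and closes the connection.
    // Assumes JMX is reachable without authentication or SSL.
    static Object readRemote(String host, int port,
                             String objectName, String attribute)
            throws IOException, JMException {
        try (JMXConnector connector =
                 JMXConnectorFactory.connect(serviceUrl(host, port))) {
            return readAttribute(
                connector.getMBeanServerConnection(), objectName, attribute);
        }
    }
}
```

On Cassandra versions from the 1.0 era, the memtable size of a column family should be available as the MemtableDataSize attribute of the MBean org.apache.cassandra.db:type=ColumnFamilies,keyspace=<keyspace>,columnfamily=<column family>, so a call such as readRemote("localhost", 7199, thatName, "MemtableDataSize") would be the starting point. MBean and attribute names differ between versions, so it is worth confirming them with a JMX browser such as jconsole first.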

Related

How to find if tables are being read using metadata query/logs in Cassandra

We are using DSE Cassandra v4.8.9. I need to find out whether the tables in Cassandra are being read, either with some metadata query or by analyzing the logs.
We could certainly find this out from the application, but that would require a code change and we don't have the cycles for it, so we are looking at ways to get these details from DSE/Cassandra features if they are already available. Please advise.
You can obtain this information from JMX metrics; look at the table metrics. You can read these metrics with a JMX client, or use the nodetool cfstats command (renamed to tablestats in later versions) and look at the local read and write counts...
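To make the JMX route a little more concrete, here is a small Java sketch that builds the object name of a table's read-latency timer and compares two samples of its Count attribute, which is where the local read count comes from. The class and method names are hypothetical. The sketch assumes the metric naming used by Cassandra 2.x (type=ColumnFamily, which should apply to DSE 4.8) and by 3.0 and later (type=Table), and that keyspace and table names contain only characters that are legal unquoted in a JMX object name.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper for checking whether a table is being read on a node.
final class TableReadMetric {

    // Object name of the per-table ReadLatency timer; its "Count" attribute
    // is the number of local reads since the node started.
    // legacyNaming = true  -> type=ColumnFamily (assumed for Cassandra 2.x)
    // legacyNaming = false -> type=Table        (assumed for Cassandra 3.0+)
    static ObjectName readLatencyName(String keyspace, String table,
                                      boolean legacyNaming)
            throws MalformedObjectNameException {
        String type = legacyNaming ? "ColumnFamily" : "Table";
        return new ObjectName("org.apache.cassandra.metrics:type=" + type
            + ",keyspace=" + keyspace
            + ",scope=" + table
            + ",name=ReadLatency");
    }

    // Compares two samples of the read count taken some time apart.
    // The counter restarts from zero when the node restarts, so a smaller
    // later value is treated as "read" only if it is above zero.
    static boolean wasReadBetween(long earlierCount, long laterCount) {
        if (laterCount < earlierCount) {
            return laterCount > 0;
        }
        return laterCount > earlierCount;
    }
}
```

The counts are per node, so a table only counts as unread if the counter stays flat on every node in the cluster over the observation window.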

Time Series Visualization from Cassandra

I have a Cassandra database and a Spark cluster that gets its inputs from Cassandra to do some processing.
In my Cassandra database, I have some tables that are time series. I am looking for a way to visualize those time series easily without multiplying databases.
Grafana is a great tool for that, but unfortunately, there seems to be no way to plug it into Cassandra.
So, for now I am using Zeppelin notebooks on my Cassandra/Spark cluster, but the available features for displaying time series aren't as good as Grafana's.
I also cannot replace Cassandra with InfluxDB, because my Cassandra is not used only for storing time series.
Unfortunately, there is no direct plugin for using Cassandra as a Grafana datasource. Below are the possible ways to integrate Cassandra with Grafana.
There is a pull request that adds Cassandra as a datasource, https://github.com/grafana/grafana/pull/9774, but it has not been merged into the Grafana master branch. You could run a fork of Grafana with this PR and use the plugin.
You can use KairosDB on top of Cassandra (KairosDB can be configured to use Cassandra as its datastore, so there are no multiple databases :) and use the KairosDB plugin. This approach has some drawbacks:
You need to map the Cassandra schema to the KairosDB schema, which is a metrics-based schema.
Although KairosDB uses Cassandra as its datastore, it stores the data in a different schema and table, so the data is duplicated.
If your app writes data to Cassandra directly, you need to write a simple client that pulls the latest data from Cassandra and pushes it to KairosDB.
You can implement a backend for the SimpleJSON plugin for Grafana (https://github.com/grafana/simple-json-datasource). There are lots of examples of SimpleJSON implementations available; write one for Cassandra and open-source it :)
You can push the data to Elasticsearch and use it as a datasource. ES is supported as a datasource by all major visualization tools.
A bit late, but there is now a direct integration, a Cassandra datasource for Grafana:
https://github.com/HadesArchitect/GrafanaCassandraDatasource
I would suggest using Banana for visualization, but this requires Solr to be enabled on the time series table. Banana is a fork of Kibana and also has powerful dashboard configuration capabilities.
https://github.com/lucidworks/banana

Access Cassandra nodetool API programmatically

I need to provide utility functions similar to those available through
nodetool tablestats
I've gone over the source code but didn't find a convenient way to access it from code.
Is there a library available for this?
https://github.com/mariusae/cassandra/blob/master/src/java/org/apache/cassandra/tools/NodeProbe.java
The nodetool utility connects to Cassandra via JMX and fetches all the necessary data from the corresponding beans. You can fetch the same data from your program via JMX as well, but I wouldn't call that the recommended way; it's better to set up a "standard" monitoring solution such as Prometheus, connect it to Cassandra, and fetch the data through it...
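If you do go the JMX route, a first step is usually to discover which beans a node exposes. The following Java sketch lists the MBeans matching an object-name pattern using the standard javax.management API. The class name MBeanLister is hypothetical, and the Cassandra pattern mentioned afterwards is an assumption about the metric naming in Cassandra 3.0 and later.

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper: lists the MBeans whose names match a pattern.
final class MBeanLister {

    // Returns the canonical names of all matching MBeans, sorted.
    // The pattern may use JMX wildcards, e.g. "some.domain:type=Foo,*".
    static Set<String> listNames(MBeanServerConnection conn, String pattern)
            throws IOException, MalformedObjectNameException {
        Set<String> names = new TreeSet<>();
        // A null query expression means "no extra filter beyond the pattern".
        for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
            names.add(name.getCanonicalName());
        }
        return names;
    }
}
```

Against a Cassandra node, a pattern such as org.apache.cassandra.metrics:type=Table,* should return the per-table metric beans that tablestats draws on; the connection itself would come from JMXConnectorFactory.connect(...).getMBeanServerConnection().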

Cassandra cluster monitoring

How can I collect data from all nodes in a Cassandra cluster from a single node?
Does JMX on a single node provide aggregated values for all nodes in the same cluster?
Yes. For a Cassandra cluster you will be able to do so. As far as I know, there are two well-known ways of monitoring a cluster and getting its status.
nodetool utility:
The nodetool utility is a command-line interface for monitoring Cassandra and performing routine database operations. It is included in the Cassandra distribution and is typically run directly from an operational Cassandra node.
DataStax OpsCenter: OpsCenter provides a graphical representation of performance trends in a summary view that is hard to obtain with other monitoring tools. The GUI provides views for different time periods as well as the capability to drill down on single data points. Both real-time and historical performance data for a Cassandra or DataStax Enterprise cluster are available in OpsCenter. OpsCenter metrics are captured and stored within Cassandra.
I think the first way (the nodetool utility) will be more useful for your requirements.
You will find more information at Cassandra cluster monitoring and nodetool options.
JMX provides information from a single node only. To get information about the entire cluster, we collect data from all nodes into Zabbix. Zabbix lets you create graphs and screens that show JMX values from all nodes in one place. For example, we can see the Read Pending Tasks of all nodes in a single graph.
I think that having separate information for each node in one place is a better way to diagnose possible issues than having a common aggregate.
Regarding metrics, I can recommend the Guide to Cassandra Thread Pools, which describes the different Cassandra metrics and how to monitor them.
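The per-node collection described above can be sketched in Java as a loop over the cluster's hosts that keeps one value per node rather than an aggregate. This is only an illustration, not how Zabbix works internally. The class ClusterCollector and its fetcher parameter are hypothetical; the fetcher stands in for whatever reads a value from a single node, for example a JMX attribute read.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: gathers one value per node, keyed by host.
final class ClusterCollector {

    // Calls the fetcher once per host and keeps the results in host order.
    // A host whose fetch fails is recorded with a null value so that one
    // unreachable node does not hide the others.
    static <T> Map<String, T> collect(List<String> hosts,
                                      Function<String, T> fetcher) {
        Map<String, T> byHost = new LinkedHashMap<>();
        for (String host : hosts) {
            try {
                byHost.put(host, fetcher.apply(host));
            } catch (RuntimeException e) {
                byHost.put(host, null);
            }
        }
        return byHost;
    }
}
```

For the pending read tasks mentioned above, the fetcher would typically read the Value attribute of org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=PendingTasks on each node; that bean name is an assumption based on recent Cassandra versions and should be checked against your own nodes.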

Change Capture from DB2 to Cassandra

I am trying to get all inserts, updates, deletes to a normalized DB2 database (hosted on an IBM Mainframe) synced to a Cassandra database. I also need to denormalize these changes before I write them to Cassandra so that the data structure meets my Cassandra model.
I searched on Google, but the tools I found lack either processing support or streaming CDC support.
Is there any tool out there that can help me achieve the above?
It's likely that no stock tool exists. What's the format of the CDC stream coming out? What queries do you need to run? Like any other Cassandra data modeling question, start with the queries you need to run and work backwards to the table structure(s).
