Based on metrics-reporter-config-sample.yaml, some of the metrics are not exported by either the CsvReporter or the ConsoleReporter,
in particular:
org.apache.cassandra.metrics.DroppedMessage.+
org.apache.cassandra.metrics.ReadRepair.+
org.apache.cassandra.metrics.ColumnFamily.system.+
// or any other keyspace metrics
Observed with Cassandra versions DSE 5.x and DDC-3.7.
However, the keyspace metrics can be found in, e.g., JConsole.
(I built and installed a newer metrics-reporter JAR, reporter-config3-3.0.2.jar, but got the same outcome.)
Figured it out myself: the patterns are different from the ones listed in the sample config file.
So, the correct keyspace and table metrics patterns are:
org.apache.cassandra.metrics.keyspace.+
org.apache.cassandra.metrics.Table.+
metrics-reporter-config-sample.yaml needs to be updated for the newer Cassandra versions.
The trick was to export all metrics by changing the white-list to a black-list:
predicate:
  color: "black"
  patterns:
    - ".*JMXONLY$"
and then figure out the right patterns for the white-list, as shown below.
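For example, a white-list predicate using the corrected patterns above might look like this (a minimal sketch; the anchoring style follows the sample file's conventions):

predicate:
  color: "white"
  patterns:
    - "^org.apache.cassandra.metrics.keyspace.+"
    - "^org.apache.cassandra.metrics.Table.+"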
I am using JMX MBeans for monitoring Cassandra metrics. I am looking to enable the equivalent of "CF: Local Reads" in OpsCenter; does anyone know the equivalent JMX MBean for it?
The metrics are listed out here: http://cassandra.apache.org/doc/latest/operating/metrics.html#table-metrics
It's the ReadLatency MBean under the table metrics, so: org.apache.cassandra.metrics:type=Table,keyspace=YOURKEYSPACE,scope=YOURTABLE,name=ReadLatency
or type=ColumnFamily for older versions of C*.
OpsCenter uses the values operation to get the raw histogram, so it may look slightly different than reading the MBeans directly; the decaying histogram is less accurate when monitoring over time, so going by raw values is better. It is described in this presentation.
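If you want to read that MBean programmatically rather than through JConsole, here is a minimal Groovy sketch; the host, port, keyspace, and table names are placeholders, and it assumes JMX is reachable without authentication:

import javax.management.ObjectName
import javax.management.remote.JMXConnectorFactory
import javax.management.remote.JMXServiceURL

// Connect to Cassandra's JMX endpoint (7199 is the default port; adjust host/port for your node)
def url = new JMXServiceURL('service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi')
def connector = JMXConnectorFactory.connect(url)
def mbeans = connector.getMBeanServerConnection()

// YOURKEYSPACE/YOURTABLE are placeholders, as in the MBean name above
def name = new ObjectName('org.apache.cassandra.metrics:type=Table,keyspace=YOURKEYSPACE,scope=YOURTABLE,name=ReadLatency')

// 'Count' is the cumulative number of local reads since startup; other attributes
// (Mean, 99thPercentile, ...) come from the decaying histogram discussed above
println mbeans.getAttribute(name, 'Count')
connector.close()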
I have set up a new Cassandra 3.3 cluster and use jvisualvm to monitor Cassandra read/write latency via MBeans (JMX metrics).
The read/write latency has been flat on all nodes for many weeks, whereas the read/write requests on that cluster fluctuate normally (heavier on some days, lighter on others).
When I use jvisualvm to monitor a Cassandra 2.0 cluster, the read/write latency behaves normally: it moves with the read/write requests.
Why are the read/write latency statistics of Cassandra 3.0+ always flat? I think the result is incorrect. (I have load tested Cassandra v3.3 and v3.7.)
[Updated]
I have found a bug related to this issue:
Cassandra metrics flat: https://issues.apache.org/jira/browse/CASSANDRA-11752
The ticket says this problem was fixed in C* versions 2.2.8, 3.0.9, and 3.8, but after testing version 3.0.9, the latency still shows a flat line.
Any idea?
Thanks.
I have not found any metrics problem when using C* 3.3.
First, try monitoring with JConsole; do you see the same issue there?
Second, which attribute are you looking at, the average value or a percentile? Those values are always counted from node startup, so it is common for a percentile value to stay the same, but that does not usually happen with the average. Try restarting the Cassandra node and checking the value.
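As another cross-check outside of JMX viewers, nodetool exposes the same latency histograms; for example (the keyspace and table names are placeholders, and on 3.x tablehistograms replaces the older cfhistograms):

nodetool proxyhistograms
nodetool tablehistograms mykeyspace mytable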
I have some existing code that I wrote in Groovy for data ingestion into Titan with a Cassandra + Elasticsearch backend. With the release of DataStax Enterprise 5.0, I was looking to see if the existing Titan code could be migrated over.
The primary use of the code was to parse out some fields, transform some of the values (e.g., datetimestamp -> epoch), and check for edge uniqueness when adding new edges (e.g., an 'A likes Apples' relation should appear only once in the graph even though multiple 'A likes Apples' relations may appear in the raw file).
What I have tried so far:
Using the DSE Graph Loader with edge label multiplicity as single (no properties) and vertices multiplicity as single:
data = File.text(filepath).delimiter(',').header('a', 'b', 'c')
load(data).asVertices { }
load(data).asEdges { }
Using this template, vertices are unique (one vertex per vertex label). However, for edge labels defined in the schema as single, an exception is thrown every time the "same" edge is added again. Is it possible to add uniqueness checks within the loading script?
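For reference, a fleshed-out version of that template might look like the following sketch; the labels ('person', 'likes', 'fruit') and key fields are hypothetical, not from my actual schema:

data = File.text(filepath).delimiter(',').header('a', 'b', 'c')

// Hypothetical mapping: field 'a' keys the out-vertex, field 'b' the in-vertex
load(data).asVertices {
    label 'person'
    key 'a'
}
load(data).asEdges {
    label 'likes'
    outV 'a', {
        label 'person'
        key 'a'
    }
    inV 'b', {
        label 'fruit'
        key 'b'
    }
}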
Loading data through the gremlin console
:load filepath
I'm finding that my pre-existing code throws quite a few exceptions when executing the load command. After removing a few Java/Titan classes that would not import (TitanManagement and SimpleDateFormat), I am getting a
org.apache.tinkerpop.gremlin.groovy.plugin.RemoteException
Any tips on getting gremlin-console integration working?
One last question: are there any functions that have been removed with the DataStax acquisition of Titan?
Thanks in advance!
We are looking at a feature enhancement to the Graph Loader to support the duplicate edge check. If your edges are only single cardinality, you can enforce that using the edge's cardinality property, .single().
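As a sketch, declaring the edge label with single cardinality in the DSE Graph schema API looks something like this ('likes', 'person', and 'fruit' are hypothetical names):

// Run in the Gremlin console against your graph
schema.edgeLabel('likes').single().connection('person', 'fruit').create()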
For the second item, are you using the DSE supplied Gremlin Console? Is your console local and your cluster located on another machine? What was the setup of your Titan environment?
For context, DataStax did not purchase Titan. Titan is an open source Graph Database and remains an open source Graph Database. DataStax acquired the Aurelius team, the creators of Titan. The Aurelius team built a new Graph Database that was inspired by Titan and is compliant with TinkerPop. There are feature and implementation detail differences between DSE Graph and Titan which can be found here - http://docs.datastax.com/en/latest-dse/datastax_enterprise/graph/graphTOC.html
One that may interest you is the integration of DSE Search and DSE Graph.
How do I build a test that will tell me which Cassandra nodes are being written to? I want to specify the number of nodes and the replication factor and get back which nodes are affected by each write resulting from an attempted insert. This would tell me how evenly the data would be distributed at runtime. I have test data, so what I really need is a way to call a mock Cassandra, configured the way I would run in production, that returns which nodes are affected.
I don't see a way to do that with the Cassandra stress tool, unless I am completely missing it...
Since you are interested in knowing all the nodes that were impacted by a query, I would recommend looking into tracing.
Here are a few approaches you could take:
Use cassandra-stress and enable tracing with nodetool settraceprobability on each of your C* nodes, set to a low value like 0.01. This will enable tracing on 1% of your queries, and you can observe the results of the trace via the system_traces.events and system_traces.sessions tables (see this article for more information on how to use these tables); an example follows this list. The trace includes information like which node was used as the coordinator, which other nodes were used as replicas for reads/writes, and how long it took to process individual steps. Note that how your application ends up querying data may be slightly different than cassandra-stress, since which nodes are queried is influenced by your Cluster configuration. cassandra-stress uses JavaDriverClient#connect; you will want to compare your configuration with what JavaDriverClient does and understand the differences. You could also modify JavaDriverClient to match your application.
You may also want to write a test against your application that uses Cassandra. The java-driver has an API for enabling tracing and observing the data, which I've documented in a video here; a minimal sketch is included below. Additionally, when you get a ResultSet back, there is a method getExecutionInfo() that provides information such as which hosts were tried, but this only includes nodes that were used as a coordinator, not all the replicas.
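To illustrate the first approach, enabling the trace probability and inspecting the trace tables looks roughly like this (the session id in the last query is a placeholder for a value taken from the sessions table):

nodetool settraceprobability 0.01

-- then, in cqlsh:
SELECT session_id, coordinator, request, started_at FROM system_traces.sessions;
SELECT activity, source, source_elapsed FROM system_traces.events WHERE session_id = <some-session-id>;

And for the second approach, a minimal Groovy sketch against the 3.x-era java-driver; the contact point and the ks.t table are assumptions:

import com.datastax.driver.core.Cluster
import com.datastax.driver.core.SimpleStatement

def cluster = Cluster.builder().addContactPoint('127.0.0.1').build()
def session = cluster.connect()

// Enable tracing on a single statement (ks.t is a placeholder table)
def stmt = new SimpleStatement("INSERT INTO ks.t (id, v) VALUES (1, 'x')")
stmt.enableTracing()
def rs = session.execute(stmt)

// The coordinator and tried hosts (coordinators only, not every replica)
def info = rs.getExecutionInfo()
println "queried: ${info.queriedHost}, tried: ${info.triedHosts}"

// The full query trace lists each replica that participated
info.queryTrace.events.each { println "${it.source}  ${it.description}" }
cluster.close()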
I'm looking for a tool to load CSV into Cassandra. I was hoping to use RazorSQL for this but I've been told that it will be several months out.
What is a good tool?
Thanks
1) If you have all the data to be loaded in place, you can try the sstableloader utility (Cassandra 0.8.x onwards) to bulk load the data. For more details, see: cassandra bulk loader.
2) In its latest versions (Cassandra 1.1.x onwards), Cassandra has introduced BulkOutputFormat for bulk loading data into Cassandra with a Hadoop job.
For more details, see: Bulkloading to Cassandra with Hadoop.
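For reference, a typical sstableloader invocation looks something like this (the host and the SSTable directory, laid out as keyspace/table, are placeholders):

sstableloader -d 127.0.0.1 /path/to/mykeyspace/mytable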
I'm dubious that tool support would help a great deal with this, since a Cassandra schema needs to reflect the queries that you want to run, rather than just being a generic model of your domain.
The built-in bulk loading mechanism for Cassandra is via BinaryMemtables: http://wiki.apache.org/cassandra/BinaryMemtable
However, whether you use this or the more usual Thrift interface, you still probably need to manually design a mapping from your CSV into Cassandra ColumnFamilies, taking into account the queries you need to run. A generic mapping from CSV-> Cassandra may not be appropriate since secondary indexes and denormalisation are commonly needed.
For Cassandra 1.1.3 and higher, there is the CQL COPY command, available for importing (or exporting) data to (or from) a table. According to the documentation, if you are importing roughly less than 2 million rows, this is a good option. It is much easier to use than the sstableloader and less error prone: the sstableloader requires you to create strictly formatted .db files, whereas the CQL COPY command accepts a delimited text file. Documentation here:
http://www.datastax.com/docs/1.1/references/cql/COPY
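For example, a COPY import might look like this in cqlsh (the keyspace, table, columns, and file path are placeholders; HEADER = true assumes the CSV has a header row):

COPY mykeyspace.mytable (id, name, value) FROM '/path/to/data.csv' WITH HEADER = true;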
For larger data sets, you should use the sstableloader: http://www.datastax.com/docs/1.1/references/bulkloader. A working example is described here: http://www.datastax.com/dev/blog/bulk-loading.