Cassandra trigger explanation

I could not find any proper documentation on Cassandra triggers. Can anyone please explain how they actually work, from the client making a write request to the coordinator node firing the trigger?
If documentation exists, a link to it would also help.

Whenever a client sends a write request to the Cassandra cluster, it goes to a coordinator node. The coordinator first runs the trigger, and the original write is then executed, together with any changes the trigger produced, as a logged batch.
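To make the flow described above more concrete, here is a toy model in Python. It is not Cassandra code. It assumes that a trigger is a function that receives the incoming mutation and returns extra mutations, and that the coordinator then applies the original and the extra mutations together as one batch. All names (Mutation, audit_trigger, coordinator_write) are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Mutation:
    table: str
    key: str
    value: str


# A trigger takes the incoming mutation and returns extra mutations.
Trigger = Callable[[Mutation], List[Mutation]]


def audit_trigger(mutation: Mutation) -> List[Mutation]:
    """Hypothetical trigger: record every write in an audit table."""
    return [Mutation("audit", mutation.key, f"wrote {mutation.table}")]


def coordinator_write(
    store: Dict[Tuple[str, str], str],
    mutation: Mutation,
    triggers: List[Trigger],
) -> List[Mutation]:
    # Step 1: run the triggers on the coordinator before anything is written.
    batch = [mutation]
    for trigger in triggers:
        batch.extend(trigger(mutation))

    # Step 2: stage every change first, then apply them in one step.
    # This stands in for the logged batch: the original write and the
    # trigger's writes are applied together.
    staged = {(m.table, m.key): m.value for m in batch}
    store.update(staged)
    return batch
```

Calling coordinator_write({}, Mutation("users", "42", "alice"), [audit_trigger]) leaves two entries in the store: the user row and the audit row produced by the trigger.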
References:
https://www.datastax.com/blog/2013/08/whats-new-cassandra-20-prototype-triggers-support
https://docs.datastax.com/en/dse/6.0/cql/cql/cql_using/useBatch.html

Related

How to add/register and generate metrics for cassandra client program

In our Java application we have a client program that inserts bulk records into Cassandra asynchronously. We are using Guava futures with callbacks to track the success and failure of our insert operations.
Now I want to add and generate metrics to track the number of records executed through our program (method), the number of successes, the number of failures, and the time taken for each insert. I would also like to get this information on an hourly basis.
I am very new to Cassandra and am using metrics for the first time. Can you please help me implement the above requirements? I want to know how to register and generate metrics for the client.
I have gone through https://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/Metrics.html, but it seems to provide statistics about the Cassandra server, whereas I want to register and generate metrics for the client.
Thanks.

Cassandra timeout during write query but entry present in Database

We are using Cassandra 3.0 on our system. For insertion in the db, we are using the Datastax C# driver.
We have a question regarding timeouts and retries during insertion. We hit a case where a timeout was thrown during an insert, yet the entry is present in the database. All our settings are the defaults, both in the cassandra.yaml file and in the driver.
How can we know the actual status of the insert even if there is a timeout? If a timeout was thrown, how could the insert still have gone through? We currently have no concrete answer as to whether the insert simply succeeded or whether some default retry policy was applied, and we need to know exactly which it was.
How do we make sure that the status of that insertion was actually successful/failed with or without the timeout?
A write timeout is not necessarily a failure to write; rather, it is a notification that not enough replicas acknowledged the write within a time period. In general, the write will still eventually be applied on all replicas.
If you do observe a write timeout, it indicates that not enough replicas responded for the configured consistency level within the write_request_timeout_in_ms value configured in cassandra.yaml (2 seconds by default). Keep in mind, however, that the write will still happen.
The coordinating Cassandra node responsible for that write sends write mutations to all replicas and responds to the client as soon as enough have replied or the timeout is reached. Because of this, if you get a WriteTimeoutException you should assume the write happened. If any of the replicas are down, the coordinator maintains a hint for that write, which will be delivered to the replica when it becomes available again.
Cassandra also employs read repair, and operators should run repairs on a recurring schedule to help keep data consistent.
If your operations are idempotent, you can simply retry the write until it succeeds. Or you can attempt to read the data back to make sure the write was processed. However, depending on your application requirements, you may not need to employ these strategies and you can safely assume the write did or will happen.
Please note on the other hand that unavailable errors (i.e. Not enough replicas available at consistency level X) indicate that not enough replicas were available to perform a write and therefore the write is never attempted.
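The retry strategy for idempotent writes described above could look roughly like the following Python sketch. It does not use a real driver: WriteTimeout and Unavailable are hypothetical stand-ins for whatever exceptions your driver raises, and do_write is a hypothetical callable that performs one write attempt. The attempt limit and backoff are assumptions, not driver defaults.

```python
import time
from typing import Callable


class WriteTimeout(Exception):
    """Hypothetical stand-in for the driver's write-timeout error."""


class Unavailable(Exception):
    """Hypothetical stand-in for the driver's unavailable error."""


def write_with_retry(
    do_write: Callable[[], None],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> int:
    """Retry an idempotent write; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            do_write()
            return attempt
        except (WriteTimeout, Unavailable):
            # Timeout: the write may already have been applied, so retrying
            # is only safe because the write is idempotent.
            # Unavailable: the write was not attempted at all.
            if attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)
    raise ValueError("max_attempts must be at least 1")
```

If the write is not idempotent (for example a counter update), a blind retry like this can apply it twice, so reading the data back is the safer check in that case.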

Read/write request Waiting queue and waiting time in Apache Cassandra

I'm new to Apache Cassandra and I'm researching it, in particular the waiting-queue length and waiting time for read/write requests.
Apache Cassandra is based on SEDA (staged event-driven architecture). That means each request, for example a read request, is put in a queue to be processed. Based on this, there should be some metrics for the queue length and waiting time of each request.
According to this post: https://www.pythian.com/blog/guide-to-cassandra-thread-pools/, the queue is tied to the messaging service.
Meanwhile, I did some searching and found that my question is still a ticket for the Cassandra developers: https://issues.apache.org/jira/browse/CASSANDRA-8398. The ticket has been there for a long time.
I also noticed that I can get this information using tpstats. However, that requires running the command in a terminal and printing the output, so it is not precise enough for my case.
I'm wondering whether anyone has hints on where I should start to get the waiting time and queue length metrics for each request.
Thanks!
Steven

Cancel a running query

I have an application where users are running a geospatial query against a mongo database. The query can return many thousands of results (~50k). These results are then streamed to the client over a websocket. However, users can abort a request mid result set and execute a new query. Users will frequently start, abort, and re-start requests on the order of several times per minute. Sometimes they even cancel/restart every couple of seconds.
The question is, when a user aborts a request, how do I cancel the query on the server so it doesn't continue to tie up resources streaming back thousands of unneeded results? I'm currently calling destroy() on the cursor, but it's not clear that this is actually stopping the query from executing on the server.
What's the best practice in this case?
Have you tried this?
db.currentOp()
db.killOp(IDRETURNEDHE)
This is a good example.
The answer is it depends upon a lot of your implementation details.
If your server is in the middle of streaming results (e.g. still hasn't sent or queued everything) when the server receives some sort of other message that the previous results should be cancelled, then it is possible for you to communicate with that other stream and tell it to stop sending. How exactly you would do that depends entirely upon your code and you would have to show us your code for us to know.
Chances are the db query is long since complete and the server is in the process of streaming results to the client. If that's the case, then it isn't the db query you need to cancel, it's the code that streams the response to the client. Since JavaScript in node.js is single threaded, the only time another request would actually get run on the server is while the streaming code is in some async write operation, waiting for it to finish. You would probably have to set a flag uniquely associated with a particular user, and your stream code would then check that flag before sending each chunk of data. If it saw the cancel flag, it could abandon sending the rest of the results.
You could make things more cancellable by explicitly chunking your results (say 500 at a time) and checking for a cancel flag between the sending of each chunk.
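As a rough illustration of the chunk-and-check idea, here is a small sketch. It is written in Python rather than node.js, and every name in it (chunked, stream_results, send, cancelled) is hypothetical. It assumes send delivers one chunk to the client and that some other handler sets the per-user cancel flag when an abort message arrives.

```python
import threading
from typing import Callable, Iterable, Iterator, List


def chunked(results: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items."""
    chunk = []
    for item in results:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def stream_results(
    results: Iterable,
    send: Callable[[List], None],
    cancelled: threading.Event,
    chunk_size: int = 500,
) -> int:
    """Send results chunk by chunk; return how many items were sent."""
    sent = 0
    for chunk in chunked(results, chunk_size):
        # The cancel flag is checked before each chunk is sent.
        if cancelled.is_set():
            break
        send(chunk)
        sent += len(chunk)
    return sent
```

With one flag per user (or per request), an abort message only has to set that flag; the streaming loop stops before sending the next chunk, so at most one chunk of unneeded results goes out after the cancel.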
If, on the other hand, all the data has already been buffered up by the TCP layer on the server, then the only way to stop that from being sent is to tear down the webSocket and force the client to reconnect.

Validating Hector connection for Cassandra

Supposing I've connected to a cluster with
HFactory.getOrCreateCluster(cluster, address)
Is there a way to check later on whether or not I'm still connected? There doesn't seem to be an obvious way of doing that from looking through their javadocs.
One option might be to just fire a simple query to the cluster within a try-catch statement and see whether it returns properly, or it throws an exception.
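The probe-query idea can be sketched as follows. This is a language-neutral illustration in Python, not Hector code: run_probe_query is a hypothetical callable that wraps whatever cheap query your client library offers, and treating every exception as "not connected" is an assumption you may want to narrow.

```python
from typing import Callable


def is_connected(run_probe_query: Callable[[], object]) -> bool:
    """Return True if a trivial probe query completes without raising."""
    try:
        run_probe_query()
        return True
    except Exception:
        # Any failure (timeout, no hosts available, ...) counts as
        # "not connected" in this sketch.
        return False
```

A check like this only tells you the cluster was reachable at the moment of the probe, so the real queries that follow still need their own error handling.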
