We use CDC for our Cassandra database. The data in our tables changes through upsert (update/insert) operations and through TTLs expiring. Upsert operations are written to the CDC logs, but TTL expirations are not. I had hoped that TTL expirations would show up as delete operations in the CDC log.
Can anybody confirm whether Cassandra writes information about TTL expiration to the CDC logs?
Thanks a lot!
In MongoDB and HBase we have a way to track client connections. Is there a way to get the total number of client connections in Cassandra?
On the client side, the DataStax driver reports metrics on connections, task queues, queries and errors (connection errors, read/write timeouts, retries, speculative executions).
They can be accessed via the Cluster.getMetrics() method (Java driver).
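For example, with the DataStax Java driver 3.x the connection-related gauges can be read roughly like this (a minimal sketch; the contact point is a placeholder and the exact set of getters on the Metrics class may vary with your driver version):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Metrics;
import com.datastax.driver.core.Session;

public class ConnectionMetricsExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")   // placeholder contact point
                .build();
        Session session = cluster.connect();

        // Client-side metrics collected by the driver (Dropwizard Metrics under the hood).
        Metrics metrics = cluster.getMetrics();
        System.out.println("Known hosts:        " + metrics.getKnownHosts().getValue());
        System.out.println("Connected to hosts: " + metrics.getConnectedToHosts().getValue());
        System.out.println("Open connections:   " + metrics.getOpenConnections().getValue());
        System.out.println("Requests so far:    " + metrics.getRequestsTimer().getCount());

        session.close();
        cluster.close();
    }
}
```

Keep in mind these are per-application, client-side numbers; a cluster-wide total would require aggregating them across all of your client instances.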
The DataStax C++ driver provides the CassMetrics struct. It contains the minimum and maximum query execution times in microseconds, pending requests, timed-out requests, total connections and available connections, giving a performance snapshot of the session.
Each member of the structure is explained clearly in the DataStax documentation: https://docs.datastax.com/en/developer/cpp-driver/2.3/api/struct.CassMetrics/
Here is an example program showing how to use the structure effectively: https://github.com/datastax/cpp-driver/blob/master/examples/perf/perf.c
I'm interested in finding out which Cassandra replica responded to a read or write request performed at the ONE consistency level. Is there some way I can do this?
Running your queries with TRACING ON will get you that information. If you are using the Java driver, most of the trace information can be fetched via the ExecutionInfo class, which you can get by calling ResultSet.getExecutionInfo(). Otherwise, query the system_traces keyspace as the documentation suggests.
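With the Java driver, this looks roughly like the following (a sketch; the contact point, keyspace and table are placeholders):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ExecutionInfo;
import com.datastax.driver.core.QueryTrace;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class TracingExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // enableTracing() asks Cassandra to record a trace for this statement.
        SimpleStatement stmt = new SimpleStatement(
                "SELECT * FROM my_ks.my_table WHERE id = 42");   // placeholder query
        stmt.enableTracing();

        ResultSet rs = session.execute(stmt);
        ExecutionInfo info = rs.getExecutionInfo();
        System.out.println("Coordinator: " + info.getQueriedHost());

        // getQueryTrace() fetches the trace rows from the system_traces keyspace.
        QueryTrace trace = info.getQueryTrace();
        for (QueryTrace.Event event : trace.getEvents()) {
            System.out.println(event.getSource() + " - " + event.getDescription());
        }

        cluster.close();
    }
}
```

At consistency ONE, the source addresses on the trace events show which replica actually served the request.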
We have a 2-node Cassandra cluster. The replication factor is 1 and the consistency level is 1. We are not using replication because each record we insert is very large.
How does Cassandra react when a node is down while a write destined for that node is being performed? We are using the Hector API from a Java client.
My understanding is that Cassandra will perform the write on the other node, which is still running.
No, with CL.ONE the write will not be performed if the inserted data belongs to the token range of the downed node. The consistency level defines how many replica nodes have to respond for the request to be accepted.
If you want to be able to write even when the replica node is down, you need to use CL.ANY. ANY makes sure that the coordinator stores a hint for the request. Hints are stored in the system.hints table. After the replica comes back up, all hints are processed and replayed to the recovered node.
Edit
You will receive the following error:
com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency ONE (1 required but only 0 alive)
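For illustration with the DataStax Java driver (the question uses Hector, but the behaviour is the same; keyspace, table and values below are placeholders), handling this situation looks roughly like:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.UnavailableException;

public class WriteWithDownedReplicaExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        SimpleStatement insert = new SimpleStatement(
                "INSERT INTO my_ks.my_table (id, payload) VALUES (42, 'big blob')"); // placeholder

        // With RF = 1 and CL.ONE this fails if the single replica owning the token is down.
        insert.setConsistencyLevel(ConsistencyLevel.ONE);
        try {
            session.execute(insert);
        } catch (UnavailableException e) {
            System.err.println("Replica down, write rejected: " + e.getMessage());

            // CL.ANY lets the coordinator accept the write and store a hint instead.
            insert.setConsistencyLevel(ConsistencyLevel.ANY);
            session.execute(insert);
        }

        cluster.close();
    }
}
```

Note that a write accepted at CL.ANY purely as a hint is not readable until the hint has been replayed to the recovered replica.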
While reading the DataStax documentation about Cassandra write consistency, I ran into a question about how Cassandra maintains a consistent state in the following scenario:
Write consistency level = Quorum
replication factor = 3
As per the docs, when a write occurs the coordinator node sends the write request to all replicas in the cluster. If one replica (say node-1) succeeds and the others fail, the coordinator sends an error response back to the client, but node-1 has successfully written the data and that write will not be rolled back.
In this case,
Will read repair (or hinted handoff, or nodetool repair) replicate the inconsistent data from node-1 to node-2 and node-3?
If not, how does Cassandra avoid replicating inconsistent data to the other replicas?
Can you please clarify?
You are completely right: read repair or one of the other repair mechanisms will update node-2 and node-3.
This means that even a failed write will eventually reach the other nodes (as long as it succeeded on at least one replica). Cassandra does not have anything like the rollback that relational databases have.
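From the client side, with the DataStax Java driver, the failure in this scenario typically surfaces as a WriteTimeoutException. A rough sketch (keyspace, table and values are placeholders):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class QuorumWriteExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        SimpleStatement update = new SimpleStatement(
                "UPDATE my_ks.my_table SET value = 'v2' WHERE id = 42");   // placeholder
        update.setConsistencyLevel(ConsistencyLevel.QUORUM);

        try {
            session.execute(update);
        } catch (WriteTimeoutException e) {
            // Fewer than 2 of the 3 replicas acknowledged in time. The replica(s) that did
            // apply the write keep it; Cassandra will not roll it back, and read repair,
            // hinted handoff or nodetool repair will eventually copy it to the others.
            System.err.println("Acknowledged by " + e.getReceivedAcknowledgements()
                    + " of " + e.getRequiredAcknowledgements() + " replicas");
            // Regular writes are idempotent upserts, so simply retrying is a safe reaction.
            session.execute(update);
        }

        cluster.close();
    }
}
```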
I don't see that there is anything wrong: the system does what you tell it. At QUORUM, two failures outweigh one success, so the error returned to the client is a failure and the client should treat the ultimate status as "fail", regardless of what read repair later does with the data.
The coordinator node stores a hint for the failed replica; hints are collected for a limited window (max_hint_window_in_ms, three hours by default) and replayed once the replica comes back up. If the node stays down longer than that, the missing data has to be brought back in sync with a repair.
For a read, the coordinator queries the replicas required by the consistency level and compares their responses. If one of the replicas returns stale data, the coordinator resolves the most recent version and sends a read repair to that node to bring the replicas back in sync.
I'm using Node.js+Express+Mongoose to connect to my MongoDB replica set (3x instances).
I was under the impression that when I used Mongoose's "connectSet" command, thereby connecting to the replica set, my queries would be load-balanced across the replica set.
However, using nodetime, I can see that all queries (including find() queries) are going to the PRIMARY instance in the replica set.
Am I misunderstanding something here? Is there some practice I am missing, or a setting in the replica set? I thought the purpose of a replica set was to balance read-only queries across the SECONDARY MongoDB servers in the set...
Thanks.
I was under the impression that when I used Mongoose's "connectSet" command, thereby connecting to the replica set, my queries would be load-balanced across the replica set.
This impression is incorrect.
By default, MongoDB reads & writes are sent to the Primary member of a Replica Set. The primary purpose of a Replica Set is to provide high availability (HA). When the primary node goes down, the driver will throw an exception on existing connections and then auto-reconnect to whatever node is elected the new primary.
The idea is that the driver will find the new primary with no intervention and no configuration changes.
Is there some practice I am missing, or a setting in the replica set?
If you really want to send queries to a secondary, you can set a flag on the query that states "this query can be sent to a secondary". The implementation varies by driver; Mongoose has its own version of this option.
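For comparison, the same read-preference idea expressed with the plain MongoDB Java driver (not Mongoose; a sketch only, with placeholder hosts, database and collection names) looks like this:

```java
import com.mongodb.ReadPreference;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class SecondaryReadExample {
    public static void main(String[] args) {
        // Placeholder replica-set connection string.
        MongoClient client = MongoClients.create(
                "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0");

        MongoCollection<Document> users = client.getDatabase("app")
                .getCollection("users")
                // Allow reads on this collection to be served by a secondary when one is available.
                .withReadPreference(ReadPreference.secondaryPreferred());

        // This find() may now be answered by a secondary, so it can return slightly stale data.
        for (Document doc : users.find(new Document("active", true))) {
            System.out.println(doc.toJson());
        }

        client.close();
    }
}
```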
Please note that sending queries to Secondary nodes is not the default behaviour and there are many pitfalls here. Most implementations of MongoDB are limited by the single write lock, so load-balancing the reads is not necessary. Spreading the reads is not guaranteed to increase performance and it can easily result in dirty reads.
Before undertaking such load balancing, please be sure that you absolutely need it. Sharding may be a better option.