How to override default_fetch_size for any query to a cassandra cluster or keyspace? - cassandra

I have a Cassandra 2.0.5 cluster set up with 3 nodes. Multiple services use the same cluster, each with its own keyspace. Due to the large size of the blob entries in one table, a query that goes over all the rows causes an OutOfMemory error and crashes the cluster. This is unacceptable to me, as different services share the cluster and one should not affect the others.
Now, there is a way to restrict the number of rows loaded into memory at a time per query, using the fetchSize parameter on a query, which most drivers supporting the 2.0 native protocol expose.
The default_fetch_size value is 5000, and I want to override it with something smaller, like 500, to avoid the OOM error. I cannot assume that all clients will use a small fetchSize when issuing an expensive query. Is there a way to do so? I cannot find any such configuration in cassandra.yaml.
Can I set this at the keyspace level, so that all queries to a particular keyspace have a smaller default_fetch_size?

To set the fetch size for all statements, you can do that in QueryOptions when you build your Session. Something like this:
Session session = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withQueryOptions(new QueryOptions().setFetchSize(100))
        .build().connect();
If you want a different fetch size for each keyspace, maintain a separate Session object per keyspace. Something like this:
Session sessionForKeyspace1 = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withQueryOptions(new QueryOptions().setFetchSize(100))
        .build().connect("keyspace1");
Session sessionForKeyspace2 = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withQueryOptions(new QueryOptions().setFetchSize(200))
        .build().connect("keyspace2");

What I know is that you can only set the fetch size on a particular statement: statement.setFetchSize(100);
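For example, a minimal sketch of the per-statement approach with a recent 2.x/3.x Java driver (the keyspace and table names here are placeholders, not anything from the question):
Session session = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .build().connect("keyspace1");
// Hypothetical wide-row query; the fetch size only limits rows per page, not the total.
Statement stmt = new SimpleStatement("SELECT id, payload FROM blob_table");
stmt.setFetchSize(500);
for (Row row : session.execute(stmt)) {
    // the driver transparently fetches the next page of at most 500 rows as you iterate
}
This keeps any single page to 500 rows, but it only helps if every client remembers to set it, which is exactly the concern raised in the question.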

Related

Data Inconsistency in Cassandra Cluster after migration of data to a new cluster

I see some data inconsistency after moving data to a new cluster.
The old cluster has 9 nodes in total, each holding 2+ TB of data.
The new cluster has the same number of nodes as the old one, and the configuration is the same.
Here is what I've performed in order:
Took a snapshot with nodetool snapshot.
Copied that snapshot to the destination.
Created a new keyspace on the destination cluster.
Used the sstableloader utility to load the data.
Restarted all nodes.
After the transfer completed successfully, I ran a few queries to compare (old vs. new cluster) and found that the new cluster is not consistent, although the data I see is properly distributed on each node (nodetool status).
The same query returns different sets of results for some of the partitions: I get zero rows the first time, 100 rows the second time, then 200 rows, and eventually it becomes consistent for a few partitions and the record count matches the old cluster.
A few partitions have no data in the new cluster, whereas the old cluster has data for those partitions.
I tried running queries in cqlsh with CONSISTENCY ALL but the problem still exists.
Did I miss any important steps before or after the transfer?
Is there any procedure to find out the root cause of this?
I am currently running nodetool repair, but I doubt it will solve this, since I already tried with CONSISTENCY ALL.
I highly appreciate your help!
The fact that the results eventually become consistent indicates that the replicas are out of sync.
You can verify this by reviewing the logs around the time you were loading data, particularly for dropped mutations. You can also check the output of nodetool netstats. If you're seeing blocking read repairs, that's another confirmation that the replicas are out of sync.
If you still have other partitions you can test, enable TRACING ON in cqlsh when you query with CONSISTENCY ALL. You will see whether there are digest mismatches in the trace output, which should also trigger read repairs. Cheers!
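If it is easier to check from the application side, the Java driver can request the same trace per statement. A rough sketch, assuming a 2.1+ driver and using a placeholder query:
Statement stmt = new SimpleStatement("SELECT * FROM my_table WHERE pk = ?", pkValue)
        .setConsistencyLevel(ConsistencyLevel.ALL)
        .enableTracing();
ResultSet rs = session.execute(stmt);
QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
for (QueryTrace.Event event : trace.getEvents()) {
    // look for digest mismatch / read repair events, as described above
    System.out.println(event.getDescription());
}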
[EDIT] Based on your comments below, it sounds like you possibly did not load the snapshots from ALL the nodes in the source cluster with sstableloader. If you've missed loading SSTables to the target cluster, then that would explain why data is missing.

How to set WRITE consistency explicitly with Datastax java driver?

Using the DataStax Java driver to connect to Cassandra, I wish to explicitly set the WRITE consistency, but it seems we can only set a consistency level for queries. Below is the sample code. How do I specify the write consistency at the driver level?
Cluster cluster = Cluster.builder()
        .addContactPoint(host)
        .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.ONE))
        .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
        .withCredentials(userName, password)
        .withLoadBalancingPolicy(
                new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build()))
        .build();
We have completely different requirements for reads and writes (reads have a really tight latency SLA, while it is not that important to us for writes to finish fast).
We decided to split sessions: we created two Cluster objects and, from those, two Sessions, one for reads and one for writes. When we write, we use the writeSession with CL QUORUM, and when we read, we use the readSession, which is tuned for the latency requirements, with CL ONE, speculative executions and a tight socket read timeout.
Long story short, you can define a session specifically for all your writes and set the consistency level on its Cluster object. Just be aware that this means some more connections from the driver to the Cassandra cluster.
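A rough sketch of that split-session setup (the contact point, keyspace name and the timeout/speculative-execution values are placeholders, not the actual configuration described above):
// Write path: durability over latency.
Session writeSession = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.QUORUM))
        .build().connect("my_keyspace");

// Read path: CL ONE, speculative executions and a tight socket read timeout.
Session readSession = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.ONE))
        .withSpeculativeExecutionPolicy(new ConstantSpeculativeExecutionPolicy(100, 2))
        .withSocketOptions(new SocketOptions().setReadTimeoutMillis(2000))
        .build().connect("my_keyspace");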
Consistency can be set at the Cluster level, in which case any query run with session.execute will use that consistency level. You can also set the consistency level on the statement passed to session.execute itself.
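For the per-statement case, something like this (the table and bind values are made up for illustration):
// Overrides whatever default was configured on the Cluster, for this write only.
Statement insert = new SimpleStatement(
        "INSERT INTO my_table (id, value) VALUES (?, ?)", id, value)
        .setConsistencyLevel(ConsistencyLevel.QUORUM);
session.execute(insert);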

existing Cassandra 2.2.x cluster, changing the number of vNodes - will data be lost or not?

If the number of vnodes in an existing Cassandra 2.2.x cluster is changed, will it cause all the data in that cluster to be lost or not?
Is it possible to change the number of vnodes and keep all the data stored in the Cassandra cluster?
The value in the config (cassandra.yaml) is only read on startup. Changing the value here will basically have no effect. You won't lose data.
There used to be a feature called shuffle, but it turned out you really don't want to change the token layout in this way: the streaming associated with shuffle will pretty much kill your cluster.
If you need to do this, the best method is to create a new DC with the desired token configuration and then rebuild its nodes as per the instructions here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
You can then point your app at the new DC and throw away the old.

Spark DataStax Cassandra connector slow to read from a heavy Cassandra table

I am new to Spark and the Spark Cassandra Connector. Our team is trying Spark for the first time and we are using the Spark Cassandra Connector to connect to our Cassandra database.
I wrote a query that uses a heavy table of the database, and I saw that the Spark tasks didn't start until the query had fetched all the records from the table.
It is taking more than 3 hours just to fetch all the records from the database.
To get the data from the DB we use:
CassandraJavaUtil.javaFunctions(sparkContextManager.getJavaSparkContext(SOURCE).sc())
.cassandraTable(keyspaceName, tableName);
Is there a way to tell Spark to start working even if all the data hasn't finished downloading?
Is there an option to tell the spark-cassandra-connector to use more threads for the fetch?
thanks,
kokou.
If you look at the Spark UI, how many partitions is your table scan creating? I just did something like this and found that Spark was creating too many partitions for the scan, and it was taking much longer as a result. The way I decreased the time on my job was by setting the configuration parameter spark.cassandra.input.split.size_in_mb to a value higher than the default. In my case it took a 20-minute job down to about four minutes. There are also a couple more Cassandra-read-specific Spark variables that you can set, found here.
These stackoverflow questions are what I referenced originally, I hope they help you out as well.
Iterate large Cassandra table in small chunks
Set number of tasks on Cassandra table scan
EDIT:
After doing some performance testing while fiddling with some Spark configuration parameters, I found that Spark was creating far too many table partitions when I wasn't giving the Spark executors enough memory. In my case, upping the memory by a gigabyte was enough to render the input split size parameter unnecessary. If you can't give the executors more memory, you may still need to set spark.cassandra.input.split.size_in_mb higher as a workaround.
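For reference, a minimal sketch of setting both knobs from Java (the app name, memory and split size are illustrative values only; keyspaceName and tableName are the same variables used in the question's snippet):
SparkConf conf = new SparkConf()
        .setAppName("cassandra-read-job")
        // more executor memory can reduce the over-partitioning described above
        .set("spark.executor.memory", "4g")
        // larger input splits mean fewer Spark partitions per Cassandra table scan
        .set("spark.cassandra.input.split.size_in_mb", "256");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<CassandraRow> rows = CassandraJavaUtil.javaFunctions(sc)
        .cassandraTable(keyspaceName, tableName);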

How to migrate single-token cluster to a new vnodes cluster without downtime?

We have a Cassandra cluster with a single token per node, 22 nodes in total, and an average load of 500 GB per node. It uses SimpleStrategy for the main keyspace and SimpleSnitch.
We need to migrate all data to the new datacenter and shut down the old one without downtime. The new cluster has 28 nodes. I want to have vnodes on it.
I'm thinking of the following process:
Migrate the old cluster to vnodes
Setup the new cluster with vnodes
Add nodes from the new cluster to the old one and wait until it balances everything
Switch clients to the new cluster
Decommission nodes from the old cluster one by one
But there are a lot of technical details. First of all, should I shuffle the old cluster after the vnodes migration? Then, what is the best way to switch to NetworkTopologyStrategy and to GossipingPropertyFileSnitch? I want to switch to NetworkTopologyStrategy because the new cluster has 2 different racks with separate power/network switches.
should I shuffle the old cluster after vnodes migration?
You don't need to. If you go from one token per node to 256 (the default), each node will split its range into 256 adjacent, equally sized ranges. This doesn't affect where data lives, but it means that when you bootstrap a new node in the new DC, the cluster will remain balanced throughout the process.
what is the best way to switch to NetworkTopologyStrategy and to GossipingPropertyFileSnitch?
The difficulty is that switching the replication strategy is in general not safe, since data would need to be moved around the cluster. NetworkTopologyStrategy (NTS) will place data on different nodes if you tell it the nodes are in different racks. For this reason, you should move to NTS before adding the new nodes.
Here is a method to do this, after you have upgraded the old cluster to vnodes (your step 1 above):
1a. List all existing nodes as being in DC0 in the properties file. List the new nodes as being in DC1 and their correct racks.
1b. Change the replication strategy to NTS with options DC0:3 (or whatever your current replication factor is) and DC1:0.
Then to add the new nodes, follow the process here: http://www.datastax.com/docs/1.2/operations/add_replace_nodes#adding-a-data-center-to-a-cluster. Remember to set the number of tokens to 256 since it will be 1 by default.
In step 5, you should set the replication factor for DC0 to 0, i.e. change the replication options to DC0:0, DC1:3 (see the sketch below for the corresponding statements). Now those nodes aren't being used, so decommission won't stream any data, but you should still decommission them rather than powering them off, so that they are removed from the ring.
Note that one risk is that writes made at a low consistency level to the old nodes could get lost. To guard against this, you could write at CL.LOCAL_QUORUM after you switch to the new DC. There is still a small window where writes could get lost (between steps 3 and 4). If it is important, you can run repair before decommissioning the old nodes to guarantee no losses, or write at a higher consistency level.
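The replication changes in steps 1b and 5 are just keyspace alterations; roughly like this (the keyspace name and the replication factor of 3 are placeholders, and the statements can equally well be run from cqlsh):
// Step 1b: all replicas stay in the old DC while the new DC holds nothing yet.
session.execute("ALTER KEYSPACE my_keyspace WITH replication = "
        + "{'class': 'NetworkTopologyStrategy', 'DC0': 3, 'DC1': 0}");

// Step 5: give the new DC the full replica count and stop replicating to DC0.
session.execute("ALTER KEYSPACE my_keyspace WITH replication = "
        + "{'class': 'NetworkTopologyStrategy', 'DC0': 0, 'DC1': 3}");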
If you are trying to migrate to a new cluster with vnodes, wouldn't you need to change the partitioner? The documentation says that it isn't a good idea to migrate data between different partitioners.
