Cassandra sometimes skips records in SELECT query

My setup is:
cassandra 1.2.19
single datacenter cluster with 4 nodes
NetworkTopologyStrategy with replication factor of 3
consistency level of writes to the db is set to LOCAL_QUORUM
I am trying to iterate over all the records in a given table. I do so with some legacy application code that fetches the data in batches, using consecutive SELECT queries of this type:
SELECT * FROM records WHERE TOKEN(partition_key) > TOKEN(last_partition_key_of_previous_batch) LIMIT 1000;
The problem is that sometimes some records are skipped. I also noticed that the skipped records are old, added to the database months ago.
All of the select queries are executed with consistency level ONE.
Is it possible that this is the cause?
From what I understand about consistency levels, when the read consistency level is ONE, only one node is asked to execute the query.
Is it possible that the node that executes the query sometimes does not contain all the records, and that's why some records are occasionally missing?

Changing the consistency level of the query to QUORUM fixed the issue.
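For reference, here is a minimal sketch of the same token-based batch scan with the read consistency raised to QUORUM, using the DataStax Java driver (3.x). The records table and partition_key column are the placeholder names from the question; the contact point, the keyspace name, and the assumption of a single text partition key are mine:

import com.datastax.driver.core.*;

public class QuorumTokenScan {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {          // hypothetical keyspace
            String lastKey = null;
            while (true) {
                String cql = "SELECT * FROM records"
                        + (lastKey == null ? ""
                           : " WHERE TOKEN(partition_key) > TOKEN('" + lastKey + "')")
                        + " LIMIT 1000";
                // QUORUM: 2 of the 3 replicas must answer, so a single stale or
                // missing replica can no longer hide rows from the scan.
                Statement stmt = new SimpleStatement(cql)
                        .setConsistencyLevel(ConsistencyLevel.QUORUM);
                int fetched = 0;
                for (Row row : session.execute(stmt)) {
                    lastKey = row.getString("partition_key");
                    fetched++;
                    // process row ...
                }
                if (fetched < 1000) break;                                 // last (partial) batch reached
            }
        }
    }
}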

Related

Cassandra query results vary

I have a 4-node, 2-DC Cassandra cluster, and I have run into some behaviour I can't explain.
dc1 - node1, node2, node3.
dc2 - node4.
In my database I have columnFamily1 with replication factor dc1:2, dc2:1. The column family contains 28 columns, and 11 of them are in the primary key. I query node4 in dc2, which contains 100% of the data. The problem is:
'Select * from columnFamily' returns 5103 records.
'Select (any column from primaryKey) from columnFamily' returns 733 records.
If I export the column family to CSV and then import it into the truncated table, the problem disappears. Can anyone explain how this could happen? Is there any solution that doesn't involve truncating the column family?
So, putting aside your comment (the portion about the "node that has 100% of the data"), it looks from your statements like you have some consistency issues. As you're using consistency ONE, you will query one node for the details, and that node could be any one of them. Whether you select "*" or "any column from the primary key" shouldn't matter, but each time you run the query you could get a different node serving the results.

If you're using cqlsh, you could try to log onto each node, one at a time, set your consistency to LOCAL_ONE, run a "select *" or "select count(*)", and see what results you get; I'm guessing they would differ (a scripted equivalent is sketched below). As you have truncated the table and re-loaded it, that could clean up the consistency problems. Another approach would have been to run repair on the column family and then re-try your experiment.

What did you do to export the data? Did you use the cqlsh COPY command? If so, that will query all nodes for the data. Did you count the number of rows in the CSV? If so, did it match any of the counts from the queries themselves?
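If you want to script that per-node check rather than run it by hand in cqlsh, a rough equivalent with the DataStax Java driver (3.x) could look like the following. The host names, keyspace, and port are hypothetical, and pinning the driver to one host with WhiteListPolicy only fixes the coordinator, so it is an approximation of "logging onto each node":

import com.datastax.driver.core.*;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.WhiteListPolicy;

import java.net.InetSocketAddress;
import java.util.Collections;

public class PerNodeCount {
    // Count rows while talking only to the given node, at consistency LOCAL_ONE.
    static long countOnNode(String host, String keyspace, String table) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint(host)
                .withLoadBalancingPolicy(new WhiteListPolicy(new RoundRobinPolicy(),
                        Collections.singletonList(new InetSocketAddress(host, 9042))))
                .build();
             Session session = cluster.connect(keyspace)) {
            Statement count = new SimpleStatement("SELECT count(*) FROM " + table)
                    .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
            return session.execute(count).one().getLong(0);
        }
    }

    public static void main(String[] args) {
        for (String host : new String[]{"node1", "node2", "node3", "node4"}) {   // hypothetical host names
            System.out.println(host + ": " + countOnNode(host, "my_keyspace", "columnFamily1"));
        }
    }
}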

Select All Performance in Cassandra

I'm currently using DB2 and planning to move to Cassandra because, as I understand it, Cassandra has better read performance than an RDBMS.
Maybe this is a stupid question, but I ran an experiment comparing read performance between DB2 and Cassandra.
I tested with 5 million records and the same table schema.
With the query SELECT * FROM customer, DB2 takes 25-30 s and Cassandra takes 40-50 s.
But with a WHERE condition, SELECT * FROM customer WHERE cusId IN (100,200,300,400,500), DB2 takes 2-3 s and Cassandra takes 3-5 ms.
Why is Cassandra faster than DB2 with a WHERE condition? And does this mean I can't use SELECT * FROM customer to prove which database is faster?
FYI.
Cassandra: RF=3 and CL=1, with 3 nodes, each node running on its own computer (Ubuntu VM)
DB2: running on Windows
Table schema:
cusId int PRIMARY KEY, cusName varchar
If you look at the types of problems that Cassandra is good at solving, the reasons why unbound ("Select All") queries suck become quite apparent.
Cassandra was designed to be a distributed database. In many Cassandra storage patterns, the number of nodes is greater than the replication factor (i.e., not all nodes contain all of the data). Therefore, limiting the number of network hops becomes essential to modeling high-performing queries. Cassandra performs very well with specific queries (which utilize the partition/clustering key structure), because it can quickly locate the node primarily responsible for the data.
Unbound queries (a.k.a. multi-key queries) incur extra network time because a coordinator node is required: one node acts as the coordinator, queries all other nodes, collates the data, and returns the result set. Specifying a WHERE clause (with at least a partition key) while using a token-aware load balancing policy performs well for two reasons:
A coordinator node is not required.
The node primarily responsible for the range is queried, returning the result set in a single network hop.
tl;dr;
Querying Cassandra with an unbound query causes it to incur a lot of extra processing and network time that it wouldn't have to spend had the query been constrained with a WHERE clause.
Even for a troublesome query like a no-condition range query, 40-50 s is pretty extreme for C*. Is the coordinator hitting GC pauses while coordinating? Can you include the code used for your test?
When you do a select * against millions of records, the driver won't fetch them all at once; it will grab fetchSize rows at a time. If you're just iterating through the results, the iterator will actually block even if you used executeAsync initially. This means that every 10k (default) records it will issue a new query that you will block on. The serialized nature of this will take time just from a network perspective. http://docs.datastax.com/en/developer/java-driver/3.1/manual/async/#async-paging explains how to do it in a non-blocking way. You can use this to kick off the next page fetch while processing the current one, which would help.
Decreasing the limit or fetch size could also help, since the coordinator may walk token ranges one at a time (parallelism is possible here, but its heuristic is not perfect) until it has read enough. If it has to walk too many nodes to respond, it will be slow; this is why a select * on an empty table can be very slow, as the coordinator may serially walk every replica set. With 256 vnodes this can be very bad.
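A sketch of that prefetching pattern with the DataStax Java driver (3.x), roughly as described in the async-paging page linked above. The customer table is from the question; the contact point, keyspace name, and chosen fetch size are assumptions:

import com.datastax.driver.core.*;

public class PrefetchingScan {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("test")) {               // hypothetical keyspace
            Statement scan = new SimpleStatement("SELECT * FROM customer")
                    .setFetchSize(1000);                                 // smaller pages than the driver default
            ResultSet rs = session.execute(scan);
            long count = 0;
            for (Row row : rs) {
                // When the current page is half consumed, start fetching the next one in the
                // background so iteration doesn't block on a synchronous round trip later.
                if (rs.getAvailableWithoutFetching() == 500 && !rs.isFullyFetched()) {
                    rs.fetchMoreResults();                               // asynchronous; the iterator picks it up
                }
                count++;                                                 // process row ...
            }
            System.out.println("rows: " + count);
        }
    }
}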

Cassandra ReadTimeOuts

I have a problem with Cassandra ReadTimeOuts.
Scenario:
3 GB data loaded to Cassandra,
9 nodes of Cassandra's within 1 DataCenter,
Replication equals 3,
Consistency level equals 1,
Cassandra version 2.2.9
link to cassandra.yaml
https://pastebin.com/x0bF7nLf
Tests:
For Testing I am using jmeter plug-in for Cassandra.
Each request is a SELECT with a condition that the row ID is within a provided list of IDs. The list always contains 100 IDs. Each request should always return 100 rows (all the IDs are in the database).
The IDs are random, so the effect of caching is reduced.
Sample select:
select * from price.item_vat_posting_group where no in ('B7B7A6','B2DD05','A34751','B4BC7D','C0BB53','D07DCB','C03716','BB99DF','A975C2','C2AE27','AF621C','242448','B30CDA','508336','B44D6B','D07422','AC44EA','C6F34D','9B25AC','C4CF12','AC25BD','C3D9C7','AE7DB2','C5E03E','BF7AC1','B499B5','A7787E','645180','A9BEFE','AFFEA4','A88955','D95B50','B0F9FC','C09174','253953','9ED9CA','CAF896','536951','214502','427776','DA14CB','422282','A4B10A','C56BF5','B373E0','D171EF','C70607','B350AB','9D809B','586563','BF6308','A4BF5A','C42716','C3261C','C45B79','C6FE55','D1F0D4','C483B5','A67D59','DC5898','9BACAD','D9C6B0','D17DAE','D8D4F3','A05946','BBEBA8','A87B37','A13E97','BB7099','A3FC26','C461DF','309810','BF6306','D07603','C59F70','C5906C','A515ED','B50056','A8390E','A0CCC7','BF2713','C6EC7D','D7EB9D','A5D5EB','984076','D88F44','257058','D61635','D40CDE','B0A347','B7617F','D6277E','B4286F','C41F99','D84232','DC1636','BFF15D','DD0972','9B3138');
Scenario 1.
While sending requests with 100 threads for 10 minutes, Cassandra gets 5% ReadTimeOuts out of the total number of handled requests.
Average request time is 100 ms.
Processor usage on each Cassandra node is between 40% - 50%.
Scenario 2.
While sending requests with 4 threads for 24 hours, about 10 ReadTimeOuts occur per 100,000 requests.
Processor usage on each Cassandra node is 5%.
In both scenarios the garbage collector runs for less than 300 ms.
Error message:
Cassandra time-out during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)
at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69)
at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:258)
Some statistics:
QUESTION:
Is that typical for Cassandra or am I doing something wrong ?
You are using an IN query. IN queries put a lot of pressure on the coordinator node. When you execute an IN query, you're waiting on that single coordinator node to give you a response; it keeps all those sub-queries and their responses in its heap, and if one of those queries fails, or the coordinator fails, you have to retry the whole thing.
Instead of using an IN query, use executeAsync with a separate query for each no (see the sketch after this answer). Then, if one query fails, the retry requires only one small, fast query.
Or
Change your data model so that you can specify the partition key when using an IN query.
Note: too many executeAsync calls at a time can also put pressure on your cluster. Check this link https://stackoverflow.com/a/30526719/2320144
More : https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/
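A rough sketch of the executeAsync-per-key approach with the DataStax Java driver (3.x), using the price.item_vat_posting_group table and no column from the question. The contact point is an assumption, and only the first few of the 100 ids are shown:

import com.datastax.driver.core.*;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PerKeyAsyncRead {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("price")) {
            PreparedStatement byNo = session.prepare(
                    "SELECT * FROM item_vat_posting_group WHERE no = ?");
            List<String> ids = Arrays.asList("B7B7A6", "B2DD05", "A34751");   // ... the rest of the 100 ids

            // One small single-partition query per id: with a token-aware policy each goes
            // straight to a replica, and a failed one can be retried on its own. Keep the
            // number of in-flight queries bounded (about 100 here is fine), as noted above.
            List<ResultSetFuture> futures = new ArrayList<>();
            for (String id : ids) {
                futures.add(session.executeAsync(byNo.bind(id)));
            }
            for (ResultSetFuture f : futures) {
                for (Row row : f.getUninterruptibly()) {
                    // process row ...
                }
            }
        }
    }
}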
Your query isn't efficient because it scans a lot of partitions.
Each partition is stored on a different node.
You should scan one partition, or at most about 10, using a range condition (a rough sketch follows this answer).
Change your data model; check these links:
https://www.datastax.com/dev/blog/the-most-important-thing-to-know-in-cassandra-data-modeling-the-primary-key
https://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause
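As one illustration of such a remodel (entirely hypothetical, since the real 28-column schema isn't shown): if the ids that are read together shared a synthetic bucket as the partition key, a single-partition query with a range condition on the clustering column could replace the 100-partition IN query. Sketch with the DataStax Java driver:

import com.datastax.driver.core.*;

public class BucketedModelSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("price")) {
            // Hypothetical table: items that are read together share a bucket, so one
            // request is served by a single partition (a single replica set).
            session.execute("CREATE TABLE IF NOT EXISTS item_vat_posting_group_by_bucket ("
                    + "bucket int, no text, vat_group text, "      // vat_group stands in for the real columns
                    + "PRIMARY KEY (bucket, no))");
            // One single-partition query with a range on the clustering column:
            Statement read = new SimpleStatement(
                    "SELECT * FROM item_vat_posting_group_by_bucket "
                    + "WHERE bucket = 42 AND no >= '214502' AND no <= 'DD0972'")
                    .setConsistencyLevel(ConsistencyLevel.ONE);
            for (Row row : session.execute(read)) {
                // process row ...
            }
        }
    }
}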

CQL query with QUORUM consistency returning large result set

We have a CQL query that returns around 8000 rows. We see occasional query timeouts due to QUORUM consistency not being met. After doing some investigation, we suspect this is because one or more rows are constantly changing, so Cassandra can't get QUORUM consistency on some rows within the given 10 seconds (rpc timeout). I wanted to reach out to the Cassandra dev community and ask if anyone has had success using QUORUM consistency on queries that return a large result set while rows are constantly changing, or are we left with using a consistency of ONE only?
Any input is appreciated.
The fact that rows are changing doesn't seem to be the problem here.
Getting a quorum doesn't mean getting up-to-date data from every node in the quorum, just getting answers from a quorum of replicas. If the data doesn't match across the quorum, timestamps decide which data "wins" and gets returned.
Quorum queries on 8000 rows should be no problem with a proper data model (you can extract hundreds of thousands of rows with quorum consistency, if not more).
Try setting a rather small page size (100 records), and split your query into one asynchronous query per partition (see the sketch after this answer).
Also check whether your nodes are overloaded when you get the timeouts.
Give us your table model and query if you want more insight into what to improve, as well as how you're accessing the data (which language, driver, etc.).
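For the page-size part of that advice, a small sketch with the DataStax Java driver (3.x). The events table and day partition column are hypothetical, since the real model isn't shown:

import com.datastax.driver.core.*;

public class QuorumPagedRead {
    // Hypothetical table and partition column; the question doesn't show the schema.
    static void readPartitionAtQuorum(Session session, String day) {
        PreparedStatement ps = session.prepare(
                "SELECT * FROM my_keyspace.events WHERE day = ?");
        Statement read = ps.bind(day)
                .setConsistencyLevel(ConsistencyLevel.QUORUM)
                .setFetchSize(100);   // small pages: each round trip only needs quorum agreement on ~100 rows
        for (Row row : session.execute(read)) {
            // process row ...
        }
        // To cover several partitions, fire one such statement per partition with
        // session.executeAsync(...) and collect the futures.
    }
}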

Cassandra consistency issue?

We have a Cassandra cluster across three datacenters (DC1, DC2 and DC3), with 10 machines in each datacenter. We have a few tables in Cassandra, each containing fewer than 100 records.
What we are seeing: some tables are out of sync between machines in DC3 and machines in DC1 or DC2 when we do a select count(*) on them.
As an example, we ran select count(*) while connected to one Cassandra machine in the DC3 datacenter and then to one Cassandra machine in the DC1 datacenter, and the results were different.
root#machineA:/home/david/apache-cassandra/bin# python cqlsh dc3114.dc3.host.com
Connected to TestCluster at dc3114.dc3.host.com:9160.
[cqlsh 2.3.0 | Cassandra 1.2.9 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh> use testingkeyspace ;
cqlsh:testingkeyspace> select count(*) from test_metadata ;
count
-------
12
cqlsh:testingkeyspace> exit
root#machineA:/home/david/apache-cassandra/bin# python cqlsh dc18b0c.dc1.host.com
Connected to TestCluster at dc18b0c.dc1.host.com:9160.
[cqlsh 2.3.0 | Cassandra 1.2.9 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh> use testingkeyspace ;
cqlsh:testingkeyspace> select count(*) from test_metadata ;
count
-------
16
What could be the reason for this sync issue? Is it ever supposed to happen? Can anyone shed some light on this?
Note that our Java driver code and DataStax C++ driver code use these tables with consistency level ONE.
What's your replication strategy? For cross-datacenter replication, you should be looking at NetworkTopologyStrategy with replication factors specified for each data center. Then, during your queries, you can specify quorum / local quorum, etc. However, think about this for a minute:
You have a distributed cluster with multiple datacenters. If you want EACH_QUORUM, think about what you're asking Cassandra to do: for reads or writes, you ask it to reach a quorum in each data center separately before returning success. Think about the latencies, and about network connections going down. For a read, the node the client contacts becomes the coordinator. It sends the request to the local replicas in its datacenter and to one node in each remote data center; the recipient there coordinates its own local quorum. Once done, it returns its results, and when the initial coordinator has received enough responses, it returns. All is well. Slow, but well. For writes, something similar happens, but if a coordinator doesn't know that a node is down, it still sends it the write. The write completes when the node comes back up, but the client can get a write timeout (note: not a failure; the write will eventually succeed). This can happen more often across multiple data centers.
You're looking to do count(*) queries. This is in general a terrible idea: it needs to hit every partition of the table. Cassandra likes queries that hit a single partition, or at least a small number of partitions (via an IN filter).
Think about what select count(*) does in a distributed system. What does the result even mean? The result can be stale an instant later. There may be another insert in some other data center while you're processing the result of the query.
If you're looking to do aggregations over many or all partitions, consider pairing Cassandra with Spark, rather than trying to do select count(*) across data centers. And to go back to the earlier point, don't assume (or depend on) cross-data-center immediate consistency. Embrace eventual consistency, and design your applications around that.
Hope that helps.
A related point: you can query with different consistency levels from cqlsh. Just run:
CONSISTENCY EACH_QUORUM;
or
CONSISTENCY ALL;
etc.
The setting persists for the duration of your cqlsh session, or until you replace it with another CONSISTENCY statement.
EACH_QUORUM or ALL should guarantee you the same response regardless of your coordinator node, though performance will take a hit. See ashic's point about count(*) in general. If this is a common query, another option is to maintain the count in a separate table (a rough sketch follows).
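One common way to maintain that count in a separate table is a counter that the application bumps on every insert and decrements on every delete. A rough sketch with the DataStax Java driver; the table and column names are hypothetical, and counters bring their own caveats around retried writes:

import com.datastax.driver.core.*;

public class TableRowCounter {
    // Hypothetical counter table, kept in step with test_metadata by the application:
    //   CREATE TABLE testingkeyspace.table_counts (
    //       table_name text PRIMARY KEY,
    //       row_count  counter);
    static void recordInsert(Session session) {
        session.execute("UPDATE testingkeyspace.table_counts "
                + "SET row_count = row_count + 1 WHERE table_name = 'test_metadata'");
    }

    static long currentCount(Session session) {
        // A single-partition read replaces the cluster-wide count(*) scan.
        Row row = session.execute("SELECT row_count FROM testingkeyspace.table_counts "
                + "WHERE table_name = 'test_metadata'").one();
        return row == null ? 0L : row.getLong("row_count");
    }
}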

Resources