I have a 4-node, 2-DC Cassandra cluster, and I have run into behaviour I can't explain.
dc1 - node1, node2, node3.
dc2 - node4.
In my database I have columnFamily1 with replication factor dc1:2, dc2:1. The column family contains 28 columns, 11 of which are in the primary key. I query node4 in dc2, which contains 100% of the data. So, the problem is:
'Select * from columnFamily' returns 5103 records.
'Select (any column from primaryKey) from columnFamily' returns 733 records.
If I export columnFamily to CSV and then import it into the truncated table, the problem disappears. Can anyone explain how this could happen? Is there any solution that doesn't involve truncating columnFamily?
From your statements above, and putting aside your comment about the node that "has 100% of the data", it looks like you have consistency issues. Since you're reading at consistency ONE, only one node is queried for the result, and that node could be any of the replicas. Whether you select "*" or any column from the primary key shouldn't matter, but each time you run the query a different node could answer it.
If you're using cqlsh, you could log onto each node one at a time, set your consistency to LOCAL_ONE, run a "select *" or "select count(*)" and compare the results; I'm guessing they would differ.
Since you truncated the table and re-loaded it, that could have cleaned up the consistency problems. Another approach would have been to run repair on the column family and then re-try your experiment.
What did you do to export the data? Did you use the cqlsh COPY command? If so, it queries all nodes for the data. Did you count the number of rows in the CSV, and if so, did it match any of the counts from the queries themselves?
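For reference, a rough sketch of that per-node cqlsh check (the keyspace name here is a placeholder; columnFamily1 is from your question):

$ cqlsh node1    # repeat on node2, node3, node4
cqlsh> CONSISTENCY LOCAL_ONE;
cqlsh> SELECT count(*) FROM my_keyspace.columnFamily1;

If the counts differ between nodes, the replicas are out of sync and a repair (nodetool repair my_keyspace columnFamily1) should bring them back in line.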
I have a question about a query in ScyllaDB. I want to count the rows in a table with:
SELECT COUNT(*)
FROM tabledata;
The first run returns 5732 rows.
The second run returns 5432 rows.
The result is different every time.
Any suggestions on how to count rows in Scylla?
Consistency level?
(You can find a very funny picture about eventual consistency on the internet.)
If you have RF=3 and you wrote all your rows with LOCAL_QUORUM, then I'd set CONSISTENCY LOCAL_QUORUM and rerun the count.
If you are not sure whether all your writes were properly done, use CL ALL.
Another option is to run a full repair and then rerun the count.
Also, your table might have a TTL; in that case getting a different count every time is expected (if you are writing, the count might grow; if you are only reading, it will shrink).
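In cqlsh, the consistency check above would look something like this (tabledata is the table from your question):

CONSISTENCY LOCAL_QUORUM;
SELECT COUNT(*) FROM tabledata;

-- and if you are not sure all writes reached a quorum:
CONSISTENCY ALL;
SELECT COUNT(*) FROM tabledata;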
For an efficient count, look at https://github.com/scylladb/scylla-code-samples/tree/master/efficient_full_table_scan_example_code - but the same consistency-level considerations apply (and of course that script will report, via a timeout error, when a token range couldn't be queried, which means the node/shard was overloaded with other traffic; by default it doesn't retry, it's a simple script).
The problem you're running into is inherent in any distributed row store (Cassandra or Scylla). In order for that to work, a coordinator node needs to contact all other nodes, query them, and assemble the result set. That causes a lot of contention which may prevent some replicas from reporting properly.
I recommend downloading and using DSBulk for this type of operation. It has a count feature designed for exactly this purpose.
dsbulk count -k ks1 -t table1 -h '10.200.1.3,10.200.1.4'
I am a newbie to Cassandra. I have created a keyspace in Cassandra with the NetworkTopologyStrategy and 2 replicas in one datacenter. Is there a CQL command or some other way to view my data on the two replicas?
Something like SELECT * FROM tablename on replica1 / replica2.
Is there any way I can visually see the data on the two replicas?
Thanks in advance.
Your question isn't really clear ("see the data in 2 replicas"), but if you ever want to validate your data, you can run some commands to see things for yourself.
The first thing you'd want to do is log onto the node you want to investigate. Go to the data directory of the table of interest -> DataDir/keyspace/table. In there you'll see one or more files that look like *Data.db. Those are your sstables. Data in memory is flushed to sstables in certain scenarios. If you're validating, you want to be sure your data has been flushed from memory to disk (as you may not find what you're looking for otherwise). To do that, issue a "nodetool flush" command (you can pass the keyspace and table as parameters if you only want to flush the specific table).
Like I said, after that, everything in memory would be flushed to disk. So you'd be able to see your sstables (again, *Data.db) files. Once you have those sstables, you can run the "sstabledump" command on each sstable to see the data that resides in them, thus validating your data.
If you have only a few rows you want to validate and a lot of nodes, you can find which nodes the rows reside on by running "nodetool getendpoints" with the keyspace, table, and partition key. That will tell you every node that holds the data, so you're not guessing which node the row(s) should be on. Unfortunately, there is no way to know which sstable the rows exist in (and it could be more than one if updates/deletes, etc. occurred), so you'll have to go through each sstable on the specific node(s).
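Put together, the sequence looks roughly like this (the paths, keyspace/table names and key value are placeholders):

# flush memtables for the specific table to disk
nodetool flush my_keyspace my_table

# dump the contents of one sstable as JSON (repeat for each *Data.db file)
sstabledump /var/lib/cassandra/data/my_keyspace/my_table-<table_id>/mc-1-big-Data.db

# list the nodes that hold a given partition key
nodetool getendpoints my_keyspace my_table 'the_partition_key'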
Hope that helps answer your question?
Good luck.
-Jim
You can for a specific partition. If you are sure host1 is a replica (from nodetool getendpoints or from a query trace), and you make your query with CL.ONE explicitly against that host, the coordinator will always pick itself (the local replica) first. So:
Statement q = new SimpleStatement("SELECT * FROM tablename WHERE key = X");
q.setHost("host1")
Where host1 owns X.
For SELECT * FROM tablename it's a bit harder, because you are scanning the entire data set and the coordinator will send out multiple queries, one for each part of the ring. If you run them with CL.ONE it will still only go to one node for each part of that range, so if you set q.enableTracing() you can see which node answered for each range. You have no control over which replica the coordinator picks, so it may take a few queries.
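A rough sketch of inspecting the trace with the Java driver (3.x-style API; session setup omitted):

Statement q = new SimpleStatement("SELECT * FROM tablename");
q.setConsistencyLevel(ConsistencyLevel.ONE);
q.enableTracing();

ResultSet rs = session.execute(q);
QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
for (QueryTrace.Event event : trace.getEvents()) {
    // each event records which node performed which step for this request
    System.out.println(event.getSource() + " : " + event.getDescription());
}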
If you just want to see if there are differences, you can use a preview repair: nodetool repair --preview --full.
My setup is:
Cassandra 1.2.19
single datacenter cluster with 4 nodes
NetworkTopologyStrategy with replication factor of 3
consistency level of writes to the db is set to LOCAL_QUORUM
I am trying to iterate all records in a given table and I do so with some legacy application code which fetches the data in batches with consecutive select queries of this type:
SELECT * FROM records WHERE TOKEN(partition_key) > TOKEN(last_partition_key_of_previous_batch) LIMIT 1000;
The problem is that sometimes some records are skipped. I also noticed that those skipped records are old, added months ago to the database.
All of the select queries are executed with consistency level ONE.
Is it possible that this is the cause?
From what I understand about consistency levels, when the consistency level for reads is ONE, only one node is asked to execute the query.
Is it possible that sometimes the node that executes the query does not contain all the records and that's why sometimes some records are missing?
Changing the consistency level of the query to QUORUM fixed the issue.
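For reference, re-running the same batch query at QUORUM in cqlsh would look like this (the key value is a placeholder):

CONSISTENCY QUORUM;
SELECT * FROM records WHERE TOKEN(partition_key) > TOKEN('<last key of previous batch>') LIMIT 1000;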
I'm currently using DB2 and planning to move to Cassandra because, as I understand it, Cassandra has better read performance than an RDBMS.
Maybe this is a stupid question, but I ran an experiment comparing read performance between DB2 and Cassandra.
I tested with 5 million records and the same table schema.
With the query SELECT * FROM customer, DB2 takes 25-30 s and Cassandra takes 40-50 s.
But with a WHERE condition, SELECT * FROM customer WHERE cusId IN (100,200,300,400,500), DB2 takes 2-3 s and Cassandra takes 3-5 ms.
Why is Cassandra faster than DB2 with a WHERE condition? So I can't prove which database is better with SELECT * FROM customer, right?
FYI.
Cassandra: RF=3 and CL=ONE, with 3 nodes, each node running on its own machine (Ubuntu VM)
DB2: running on Windows
Table schema:
CREATE TABLE customer (cusId int PRIMARY KEY, cusName varchar);
If you look at the types of problems that Cassandra is good at solving, then the reasons behind why unbound ("Select All") queries suck become quite apparent.
Cassandra was designed to be a distributed database. In many Cassandra storage patterns, the number of nodes is greater than the replication factor (i.e., not all nodes contain all of the data). Therefore, limiting the number of network hops is essential to modeling high-performing queries. Cassandra performs very well with specific queries (which utilize the partition/clustering key structure), because it can quickly locate the node primarily responsible for the data.
Unbound queries (a.k.a. multi-key queries) incur extra network time because a coordinator node is required: one node acts as the coordinator, queries all the other nodes, collates the data, and returns the result set. Specifying a WHERE clause (with at least a partition key), while using a "token aware" load balancing policy (sketched below), performs well for two reasons:
A coordinator node is not required.
The node primarily responsible for the range is queried, returning the result set in a single network hop.
tl;dr;
Querying Cassandra with an unbound query causes it to incur a lot of extra processing and network time that it normally wouldn't have to, had the query been specified with a WHERE clause.
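As a sketch, wiring up a token-aware policy with the Java driver (3.x-style API; the contact point and keyspace name are placeholders, customer/cusId are from the question):

import com.datastax.driver.core.*;
import com.datastax.driver.core.policies.*;

Cluster cluster = Cluster.builder()
        .addContactPoint("10.0.0.1")
        .withLoadBalancingPolicy(new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build()))
        .build();
Session session = cluster.connect("shop");

// prepared statements carry the routing key, so the driver can send the
// request straight to a replica that owns the partition for cusId = 100
PreparedStatement ps = session.prepare("SELECT * FROM customer WHERE cusId = ?");
ResultSet rs = session.execute(ps.bind(100));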
Even for a troublesome query like a no-condition range scan, 40-50 s is pretty extreme for C*. Is the coordinator hitting GCs while coordinating? Can you include the code used for your test?
When you run a select * against millions of records, it won't fetch them all at once; it will grab fetchSize rows at a time. If you're just iterating through the results, the iterator will actually block even if you used executeAsync initially. This means that every 10k (default) records it will issue a new query that you block on. The serialized nature of this costs time just from a network perspective. http://docs.datastax.com/en/developer/java-driver/3.1/manual/async/#async-paging explains how to do it in a non-blocking way; you can use that to kick off the next page fetch while processing the current one, which would help.
Decreasing the limit or fetch size could also help, since the coordinator may walk token ranges one at a time (parallelism is possible here but its heuristic is not perfect) until it has read enough. If it has to walk too many nodes to respond it will be slow; this is why even empty tables can be very slow to run a select * against, since it may serially walk every replica set. With 256 vnodes this can be very bad.
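A minimal sketch of adjusting the fetch size with the Java driver (3.x-style API; session setup omitted, customer/cusId are from the question):

Statement stmt = new SimpleStatement("SELECT * FROM customer")
        .setFetchSize(1000)                         // pull smaller pages instead of the default
        .setConsistencyLevel(ConsistencyLevel.ONE);

ResultSet rs = session.execute(stmt);
for (Row row : rs) {
    // crossing a page boundary here blocks while the next page is fetched;
    // the async-paging doc linked above shows how to pre-fetch with fetchMoreResults()
    System.out.println(row.getInt("cusId"));
}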
This is a question regarding the behaviour of Cassandra for a select * query.
It's more for understanding; I know that normally I should not execute such a query.
Assuming I have 4 nodes with RF=2.
Following table (column family):
create table test_storage (
id text,
created_on TIMESTAMP,
location int,
data text,
PRIMARY KEY(id)
);
I inserted 100 entries into the table.
Now I do a select * from test_storage via cqlsh. Doing the query multiple times I get different results, so not all entries. When changing consistency to local_quorum I always get back the complete result. Why is this so?
I assumed, performance aside, that I would also get all entries with consistency ONE, since it must query the whole token range.
Second issue: when I add a secondary index, in this case on location, and do a query like select * from test_storage where location=1, I also get random results with consistency ONE, and always correct results when changing to consistency level LOCAL_QUORUM. Here, too, I don't understand why this happens.
When changing consistency to local_quorum I always get back the complete result. Why is this so?
Welcome to the eventual consistency world. To understand it, read my slides: http://www.slideshare.net/doanduyhai/cassandra-introduction-2016-60292046/31
I assumed, performance aside, that I would also get all entries with consistency ONE, since it must query the whole token range
Yes, Cassandra will query all token ranges because of the unrestricted SELECT *, but for each range it will only request data from one replica out of the 2 (RF=2).
and do a query like select * from test_storage where location=1, I also get random results with consistency ONE
Same answer as above: the native Cassandra secondary index just uses a Cassandra table under the hood to store the reverse index, so the same eventual-consistency rules apply there too.
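To see the effect in cqlsh (table and column names from the question):

CONSISTENCY ONE;
SELECT * FROM test_storage WHERE location = 1;   -- may return only part of the rows

CONSISTENCY LOCAL_QUORUM;
SELECT * FROM test_storage WHERE location = 1;   -- complete result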