What happens if our query contains several partition keys whose tokens end up on different nodes?
Is it possible that the client splits it into multiple queries and runs them synchronously or asynchronously against those nodes?
Sample 1:
//Our query
SELECT * FROM keyspace1.standard1 WHERE key IN (1, 2, 3);
//Does the client split our query into multiple queries depending on the token ranges and run them sync or async?
SELECT * FROM keyspace1.standard1 WHERE key IN (1, 2); //Tokens on node X
SELECT * FROM keyspace1.standard1 WHERE key = 3; //Token on node Y
Sample 2:
//Our Query
SELECT * FROM kspc.standard1;
//Does the client split our query into multiple queries based on the token ranges and run them sync or async?
SELECT * FROM kspc.standard1 WHERE token(key) > [start range node1] and token(key) < [end range node1];
SELECT * FROM kspc.standard1 WHERE token(key) > [start range node2] and token(key) < [end range node2];
and ...
As Manish mentioned, if a query spans several partitions, the token-aware policy can't select a replica and will send the query to an arbitrary node in the cluster (the same behaviour applies to unprepared queries and DDL). In general this is an anti-pattern, as it puts more load onto the nodes, so it should be avoided. But if you really need it, you can force the driver to send the query to one of the nodes that owns a specific partition key. In Java driver 3.x there is a function statement.setRoutingKey; for Java driver 4.x there should be something similar. Other drivers offer similar functionality, but maybe not all of them.
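As a rough illustration only (not the exact code from the answer): a minimal Java driver 3.x sketch that sets a routing key on a multi-partition statement. The keyspace and table come from Sample 1 above; the text key values 'k1'/'k2', the contact point, and the assumption that key is a text column are invented.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class RoutingKeyExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            // The statement touches two partitions, so the token-aware policy
            // cannot pick a replica on its own.
            SimpleStatement stmt = new SimpleStatement(
                "SELECT * FROM keyspace1.standard1 WHERE key IN ('k1', 'k2')");

            // Route the request to a replica that owns partition 'k1'
            // (hypothetical text key, serialized by hand here).
            stmt.setRoutingKey(ByteBuffer.wrap("k1".getBytes(StandardCharsets.UTF_8)));

            for (Row row : session.execute(stmt)) {
                System.out.println(row);
            }
        }
    }
}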
For the second class of queries it's the same: by default the driver can't figure out which node to send the query to, so the routing key would have to be set explicitly. But in general, a full table scan can be tricky, as you need to handle the conditions on the lower and upper bounds correctly, and you can't assume that a token range starts exactly at the lower bound - there can be a range that starts near the upper bound and ends slightly above the lower bound, and this wrap-around is a typical error that I have seen regularly. If you're interested, I have an example of how to perform a full table scan in Java (it uses the same algorithm as the Spark Cassandra Connector and DSBulk) - the main part is the cycle over the available token ranges. But if you're looking into writing a full table scan yourself, think about using parts of DSBulk as an SDK - look at its partitioner module, which was designed specifically for that.
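For what it's worth, a sketch of that cycle over token ranges using the Java driver 3.x metadata API (this is not the linked example, just the same general idea; the kspc.standard1 table is the one from Sample 2, the contact point is made up):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.TokenRange;

public class TokenRangeScan {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            PreparedStatement ps = session.prepare(
                "SELECT * FROM kspc.standard1 WHERE token(key) > ? AND token(key) <= ?");

            for (TokenRange range : cluster.getMetadata().getTokenRanges()) {
                // unwrap() splits the range that wraps around the minimum token -
                // exactly the corner case that is easy to get wrong by hand.
                for (TokenRange subRange : range.unwrap()) {
                    for (Row row : session.execute(ps.bind()
                            .setToken(0, subRange.getStart())
                            .setToken(1, subRange.getEnd()))) {
                        System.out.println(row);
                    }
                }
            }
        }
    }
}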
For Sample 1, just query each partition individually and merge the results on the client side. This will be much faster. The DataStax driver has a token-aware policy, but it only works when a query refers to a single partition.
You can refer to this link.
For Sample 2, it is an anti-pattern query and you cannot expect the client to do all the work for you. If you want to read the complete table, you can use Spark. DataStax provides the spark-cassandra-connector, which offers roughly the functionality you describe. Here you can find a description of the spark-cassandra-connector.
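As a hedged sketch of what such a read could look like through the spark-cassandra-connector's Java API (connector 2.x era; the keyspace/table are the ones from Sample 2, the app name and contact point are made up):

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

import com.datastax.spark.connector.japi.CassandraRow;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class FullTableRead {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("full-table-read")
            .set("spark.cassandra.connection.host", "127.0.0.1");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // The connector splits the table into token-range based partitions
            // and reads them in parallel across the Spark executors.
            JavaRDD<CassandraRow> rows = javaFunctions(sc).cassandraTable("kspc", "standard1");
            System.out.println("row count: " + rows.count());
        }
    }
}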
Related
I currently have a table set up in Cassandra that has either text, decimal or date type columns with a composite partition key of a business_date and an account_number. For queries to this table, I need to be able to support look-ups for a single account, or for a list of accounts, for a given date.
Example:
select x,y,z from my_table where business_date = '2019-04-10' and account_number IN ('AAA', 'BBB', 'CCC')
//Note: Both partition keys are provided for this query
I've been struggling to resolve performance issues related to accessing this data, because I'm noticing latency patterns that I'm having trouble understanding and explaining.
In many scenarios, the same exact query can be run a total of three times in a short period by the client application. For these scenarios, I see that two out of three requests will have really bad response times (800 ms), and one of them will have a really fast one (50 ms). At first I thought this would be due to key or row caches, however, I'm not so sure since I believe that if this were true, the third request out of the three should always be the fastest, which isn't the case.
The second issue I believed I was facing was the data model itself. Although the queries are submitted with all the partition key columns provided, since it's an IN clause the results come from separate partitions that can be distributed across the cluster, so this would be a bad access pattern. However, I see these latency problems even when single-account queries are run. Additionally, I see queries with 15-20 accounts performing really well (under 50 ms), so I'm not sure the data model is actually an issue.
Cluster setup:
Datacenters: 2
Number of nodes per data center: 3
Keyspace replication: local_dc = 2, remote_dc = 2
Java driver setup:
Load-balancing: DCAware with LatencyAware
Protocol: v3
Queries are still set up to use "IN" clauses instead of async individual queries
Read_consistency: LOCAL_ONE
Does anyone have any ideas / clues of what I should be focusing on in terms of really identifying the root cause of this issue?
Using IN on the partition key is always a bad idea, even for composite partition keys. The value of the partition key defines where your data lives in the cluster, and different partition key values will most probably put the data onto different servers. In this case, the coordinator node (the one that received the query) needs to contact the nodes that hold the data, wait for those nodes to deliver their results, and only after that send the results back to you.
If you need to query several partition keys, it will be faster to issue individual queries asynchronously and collect the results on the client side.
Also, please note that the TokenAware policy works best when you use a PreparedStatement - in this case the driver is able to extract the value of the partition key and find out which servers hold the data for it.
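Here is a minimal sketch of that approach with Java driver 3.x, using the table from the question. It assumes business_date is a CQL date and account_number a text column, invents the keyspace name, and uses Guava's Futures (a 3.x driver dependency) to gather the results:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.LocalDate;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.Futures;

public class PerPartitionQueries {
    public static void main(String[] args) throws Exception {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) { // hypothetical keyspace

            // Prepared statement: the driver knows the partition key, so the
            // token-aware policy can send each query straight to a replica.
            PreparedStatement ps = session.prepare(
                "SELECT x, y, z FROM my_table WHERE business_date = ? AND account_number = ?");

            LocalDate date = LocalDate.fromYearMonthDay(2019, 4, 10); // assumes a CQL date column
            List<ResultSetFuture> futures = new ArrayList<>();
            for (String account : Arrays.asList("AAA", "BBB", "CCC")) {
                futures.add(session.executeAsync(ps.bind(date, account)));
            }

            // Collect and merge the per-partition results on the client side.
            List<Row> merged = new ArrayList<>();
            for (ResultSet rs : Futures.allAsList(futures).get()) {
                for (Row row : rs) {
                    merged.add(row);
                }
            }
            System.out.println("rows: " + merged.size());
        }
    }
}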
Cassandra doesn't guarantee any particular order in which statements are executed.
For example, statements like the ones below do not execute in order:
INSERT INTO channel
JSON '{"cuid":"NQAA0WAL6drA"
,"owner":"123"
,"status":"open"
,"post_count":0
,"mem_count":1
,"link":"FWsA609l2Og1AADRYODkzNjE2MTIyOTE="
,"create_at":"1543328307953"}';
BEGIN BATCH
UPDATE channel
SET title = ? , description = ? WHERE cuid = ? ;
INSERT INTO channel_subscriber
JSON '{"cuid":"NQAA0WAL6drA"
,"user_id":"123"
,"status":"subscribed"
,"priority":"owner"
,"mute":false
,"setting":{"create_at":"1543328307956"}}';
APPLY BATCH ;
According to system_traces.sessions, each of them is received by a different node.
Sometimes the started_at time of both queries is equal (in milliseconds), and sometimes the started_at time of the second query is less than that of the first one.
So this ruins the order of the statements and the data.
We use Erlang with the marina driver, the consistency level is QUORUM, and the clocks of all Cassandra nodes and the application server are synchronized.
How can I force Cassandra to execute queries in order?
Because of its distributed nature, queries in Cassandra can be received by different nodes, and depending on the load of a particular node it can happen that a query sent later is executed earlier. In your case you can put the first insert into the batch itself. Or, as implemented in some drivers (for example, the Java driver), use a whitelist policy to send queries to only one node - but that node becomes a bottleneck in this case (and I'm really not sure that your driver has such functionality).
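To make the first suggestion concrete, here is a sketch of one combined logged batch using the Java driver 3.x API (your Erlang marina driver would need its own equivalent); the JSON payloads are trimmed, and the keyspace name and title/description values are invented:

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class SingleBatchWrite {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) { // hypothetical keyspace

            // One logged batch: the channel insert, the update, and the subscriber
            // insert are applied together instead of racing as separate statements.
            BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
            batch.add(new SimpleStatement(
                "INSERT INTO channel JSON '{\"cuid\":\"NQAA0WAL6drA\",\"owner\":\"123\",\"status\":\"open\"}'"));
            batch.add(new SimpleStatement(
                "UPDATE channel SET title = ?, description = ? WHERE cuid = ?",
                "some title", "some description", "NQAA0WAL6drA"));
            batch.add(new SimpleStatement(
                "INSERT INTO channel_subscriber JSON '{\"cuid\":\"NQAA0WAL6drA\",\"user_id\":\"123\",\"status\":\"subscribed\"}'"));
            batch.setConsistencyLevel(ConsistencyLevel.QUORUM);

            session.execute(batch);
        }
    }
}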
Accessing all rows from all nodes in Cassandra would be inefficient. Is there a way to get some access to Index.db, which already has the row keys? Is something of this sort supported out of the box in Cassandra?
There is no way to get all keys with one request without reaching every node in the cluster. There is however paging built-in in most Cassandra drivers. For example in the Java driver: https://docs.datastax.com/en/developer/java-driver/3.3/manual/paging/
This will put less stress on each node, as it only fetches a limited amount of data with each request. Each subsequent request will continue from the last one, meaning you will touch every result for the request you're making.
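A small Java driver 3.x sketch of that paging behaviour (keyspace, table, and column names are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class PagedScan {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            SimpleStatement stmt = new SimpleStatement("SELECT key FROM my_keyspace.my_table");
            stmt.setFetchSize(500); // rows per page requested from the coordinator

            ResultSet rs = session.execute(stmt);
            for (Row row : rs) {
                // The driver transparently requests the next page when the
                // current one is exhausted.
                System.out.println(row.getObject("key"));
            }
        }
    }
}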
Edit: This is probably what you want: How can I get the primary keys of all records in Cassandra?
One possible option could be querying all the token ranges.
For example,
SELECT distinct <partn_col_name> FROM <table_name> where token(<partn_col_name>) >= <from_token_range> and token(<partn_col_name>) < <to_token_range>
With the above query you can get all the partition keys available within a given token range. Adjust the size of the token ranges depending on the execution time.
So, I have a Cassandra CQL statement that looks like this:
SELECT * FROM DATA WHERE APPLICATION_ID = ? AND PARTNER_ID = ? AND LOCATION_ID = ? AND DEVICE_ID = ? AND DATA_SCHEMA = ?
This table is sorted by a timestamp column.
The functionality is fronted by a REST API, and one of the filter parameters they can specify is to get only the most recent row, in which case I append "LIMIT 1" to the end of the CQL statement, since the table is ordered by the timestamp column in descending order. What I would like to do is allow them to specify multiple device ids to get back the latest entries for. So my question is: is there any way to do something like this in Cassandra:
SELECT * FROM DATA WHERE APPLICATION_ID = ? AND PARTNER_ID = ? AND LOCATION_ID = ? AND DEVICE_ID IN ? AND DATA_SCHEMA = ?
and still use something like "LIMIT 1" to only get back the latest row for each device id? Or, will I simply have to execute a separate CQL statement for each device to get the latest row for each of them?
FWIW, the table's composite key looks like this:
PRIMARY KEY ((application_id, partner_id, location_id, device_id, data_schema), activity_timestamp)
) WITH CLUSTERING ORDER BY (activity_timestamp DESC);
IN is not recommended when there are a lot of values in it; under the hood the coordinator makes requests to multiple partitions anyway, which puts pressure on the coordinator node.
Not that you can't do it - it is perfectly legal - but most of the time it's not performant and is not suggested. If you specify a limit, it applies to the whole statement, so you basically can't pick just the first item out of each partition. The simplest option would be to issue multiple queries to the cluster (every element in IN becomes one query) and put LIMIT 1 on every one of them.
To be honest, this was my solution in a lot of projects and it works pretty much fine. The coordinator would go to multiple nodes under the hood anyway, but it would also have to do more work to gather all the results for you, might run into timeouts, etc.
In short, it's far better for the cluster and more performant if the client asks multiple times (using multiple coordinators with smaller requests) than to make a single coordinator do all the work.
This is all in case you can't afford more disk space for your cluster.
Usual Cassandra solution
Data in Cassandra should be modelled for the queries you run (query-first design). So basically you would have one additional table with the same partition key as you have now, but with the clustering column activity_timestamp dropped, i.e.
PRIMARY KEY ((application_id, partner_id, location_id, device_id, data_schema))
The double parentheses (()) are intentional.
Every time you write to your table, you also write the data to the latest_entry table (the one without activity_timestamp). Then you can run the query you need with IN against it; since this table only ever contains the latest entry, you don't need LIMIT 1, because there is only one row per partition key. That would be the usual solution in Cassandra.
If you are afraid of the additional writes, don't worry, they are inexpensive and CPU-bound. With Cassandra it's always "bring on the writes", I guess :)
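A hedged sketch of that dual write with the Java driver 3.x; the data_latest table name, the payload column, the keyspace, and the column types are all invented for illustration:

import java.util.Date;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class DualWrite {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) { // hypothetical keyspace

            PreparedStatement insertHistory = session.prepare(
                "INSERT INTO data (application_id, partner_id, location_id, device_id, data_schema, "
                + "activity_timestamp, payload) VALUES (?, ?, ?, ?, ?, ?, ?)");
            // Same partition key, no clustering column: each write simply overwrites
            // the previous "latest" row for that device.
            PreparedStatement insertLatest = session.prepare(
                "INSERT INTO data_latest (application_id, partner_id, location_id, device_id, data_schema, "
                + "activity_timestamp, payload) VALUES (?, ?, ?, ?, ?, ?, ?)");

            Object[] values = {"app1", "p1", "loc1", "dev1", "schema1", new Date(), "payload..."};
            session.execute(insertHistory.bind(values));
            session.execute(insertLatest.bind(values));
        }
    }
}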
Basically it's up to you:
multiple queries - a bit of refactoring, no additional space cost
new schema - additional inserts when writing, additional space cost
Your table definition is not suitable for such use of the IN clause. It is only supported on the last column of the partition key or on the last clustering column. So you can:
swap the last two columns of the partition key
use one query for each device id
Using Cassandra 2.2.8.
My challenge is as follows. In my database we have a bunch of tables with millions of rows. Unfortunately, due to loose design, the partitions of a few tables have grown to gigabytes in size - this is putting pressure on the system, and issues like JVM out-of-memory errors and node crashes are happening.
We need to redesign the partition keys of a few tables. We have data in these tables that we need to retain and migrate to the new tables.
I'm looking for a solution that enables me to export data from a source table to a target table (i.e. one with redesigned composite partition keys); I hope this will help spread the partitions in a more balanced manner.
I tried to use the COPY [tablename(column1, column2, ...)] command, but it probes a number of nodes, causing pressure on the system/heap and affecting the application. I'm seeking guidance on how best to address this challenge - thank you in advance for any help.
Since you have very big tables and COPY already failed, you must export and import your data manually. To perform such a task you need to use the TOKEN function.
With some small client code you can write queries to perform a full table data extraction with something like:
SELECT * FROM mytable WHERE token(pk) >= MIN_TOKEN AND TOKEN(pk) < MIN_TOKEN + QUERY_INTERVAL;
SELECT * FROM mytable WHERE token(pk) >= MIN_TOKEN + QUERY_INTERVAL AND TOKEN(pk) < MIN_TOKEN + 2*QUERY_INTERVAL;
....
SELECT * FROM mytable WHERE token(pk) >= MAX_TOKEN - QUERY_INTERVAL AND TOKEN(pk) <= MAX_TOKEN;
where MIN_TOKEN and MAX_TOKEN are the constant minimum and maximum token values of your cluster's partitioner, and QUERY_INTERVAL is the width of the token window you want to query. The bigger the QUERY_INTERVAL, the more data you will fetch in a single query (and the more likely you are to trigger a timeout).
Please note that Cassandra never allows a range operator (> >= <= <) in the WHERE clause on partition key column specifiers. The exception is with the use of the TOKEN function.
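A small sketch of how such windowed queries could be generated, assuming the default Murmur3Partitioner (tokens roughly in [-2^63, 2^63 - 1]); mytable/pk are the placeholders from the queries above and the interval size is arbitrary:

import java.math.BigInteger;

public class TokenWindows {
    public static void main(String[] args) {
        // Murmur3Partitioner token bounds; BigInteger avoids overflow when the
        // window end is computed near the maximum token.
        BigInteger minToken = BigInteger.valueOf(Long.MIN_VALUE);
        BigInteger maxToken = BigInteger.valueOf(Long.MAX_VALUE);
        BigInteger interval = BigInteger.valueOf(1_000_000_000_000_000L); // QUERY_INTERVAL

        for (BigInteger start = minToken; start.compareTo(maxToken) < 0; start = start.add(interval)) {
            BigInteger end = start.add(interval).min(maxToken);
            // The very last window is closed with <= so the maximum token is included.
            String op = end.equals(maxToken) ? "<=" : "<";
            System.out.printf("SELECT * FROM mytable WHERE token(pk) >= %s AND token(pk) %s %s;%n",
                    start, op, end);
        }
    }
}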
I also suggest these readings:
Understanding the Token Function in Cassandra
Displaying rows from an unordered partitioner with the TOKEN function
COPY just imports/exports data to/from a file. If you want to redesign your data model, it will probably be better to implement a specialized tool for your task, which will:
read data from the source table in portions (e.g. by token ranges, as #xmas79 described above)
transform each portion to the new model
write the portion to the new table(s)
Here is an example of how to read big tables by token ranges with Java and the DataStax driver.
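As a rough illustration of those three steps with the Java driver 3.x: read one token-range chunk of the source table, reshape each row for the new partition key design, and write it to the target table. All keyspace, table, and column names here, as well as the bucketing scheme, are invented:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.TokenRange;

public class RepartitionTool {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("myks")) {

            PreparedStatement read = session.prepare(
                "SELECT id, value FROM old_table WHERE token(id) > ? AND token(id) <= ?");
            PreparedStatement write = session.prepare(
                "INSERT INTO new_table (id, bucket, value) VALUES (?, ?, ?)");

            for (TokenRange range : cluster.getMetadata().getTokenRanges()) {
                for (TokenRange chunk : range.unwrap()) {
                    // Step 1: read one portion of the source table.
                    for (Row row : session.execute(read.bind()
                            .setToken(0, chunk.getStart())
                            .setToken(1, chunk.getEnd()))) {
                        // Step 2: transform - the new composite partition key adds a
                        // bucket derived from the old id to split oversized partitions.
                        int bucket = Math.floorMod(row.getString("id").hashCode(), 16);
                        // Step 3: write to the redesigned table.
                        session.execute(write.bind(row.getString("id"), bucket, row.getString("value")));
                    }
                }
            }
        }
    }
}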