My data is stored across multiple partitions. I want to send this data to the client, but I want to paginate the response. Say my 1st partition has 100 rows and my 2nd partition has 100 rows; I want to send 10 rows per page along with a PagingState. The client would send the PagingState back to the server and I'll use it to fetch the next 10 records by running the same query. Once I have exhausted the 100 rows of the 1st partition, I'll have to change the query. Is it possible to tell from the PagingState which query was executed, so that I could read the PagingState, find which partition it was for, and use that information to determine what the next partition should be?
It's possible, but not straightforward or safe. The content changes between protocol and Cassandra versions. It's also not trivial to parse, as the latest format uses varints to mark the size of both the partition key and the row marker. Older versions require a cell-level marker to be sent as well, which is still sent for backwards compatibility in some scenarios, so you should really handle both. And with new versions of C* you will need to check whether it changes again.
You can always do paging on the client side, which gives you control over it and knowledge of a state that won't change between versions.
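For reference, here is a minimal sketch of the PagingState round trip with the DataStax Java driver 3.x; the keyspace, table, and column names are made up. A null paging state on a page is the driver's signal that the current partition is exhausted, which is the cue to switch to the next partition's query:

import com.datastax.driver.core.*;

// Minimal sketch: server-side paging over one partition, 10 rows per page.
// my_keyspace.my_table and partition_id are hypothetical names.
public class PagingSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            Statement stmt = new SimpleStatement(
                    "SELECT * FROM my_keyspace.my_table WHERE partition_id = 1");
            stmt.setFetchSize(10); // page size

            ResultSet page = session.execute(stmt); // first page
            PagingState state = page.getExecutionInfo().getPagingState();

            // Serialize the state and return it to the client with the page...
            String token = (state == null) ? null : state.toString();

            // ...then resume from it when the client sends the token back.
            if (token != null) {
                stmt.setPagingState(PagingState.fromString(token));
                ResultSet next = session.execute(stmt);
                if (next.getExecutionInfo().getPagingState() == null) {
                    // Partition exhausted: issue a fresh statement for the
                    // next partition here (this is the signal you asked about).
                }
            }
        }
    }
}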
I currently have a table set up in Cassandra that has text, decimal, and date columns, with a composite partition key of a business_date and an account_number. For queries to this table, I need to be able to support look-ups for a single account, or for a list of accounts, for a given date.
Example:
select x,y,z from my_table where business_date = '2019-04-10' and account_number IN ('AAA', 'BBB', 'CCC')
// Note: both components of the partition key are provided for this query
I've been struggling to resolve performance issues with access to this data, because I'm noticing latency patterns that I'm having trouble understanding and explaining.
In many scenarios, the exact same query can be run a total of three times in a short period by the client application. In these scenarios, I see that two out of the three requests have really bad response times (800 ms), and one of them is really fast (50 ms). At first I thought this was due to key or row caches; however, I'm not so sure, since if that were true, the third request of the three should always be the fastest, which isn't the case.
The second issue I believed I was facing was the data model itself. Although the queries are submitted with the full partition key provided, because of the IN clause the results come from separate partitions that can be distributed across the cluster, so this would be a bad access pattern. However, I see these latency problems even when single-account queries are run. Additionally, I see queries that come with 15 to 20 accounts performing really well (under 50 ms), so I'm not sure the data model is actually the issue.
Cluster setup:
Datacenters: 2
Number of nodes per data center: 3
Keyspace replication: local_dc = 2, remote_dc = 2
Java driver settings:
Load-balancing: DCAware with LatencyAware
Protocol: v3
Queries are still set up to use "IN" clauses instead of async individual queries
Read_consistency: LOCAL_ONE
Does anyone have any ideas or clues about what I should focus on to identify the root cause of this issue?
Using IN on the partition key is always a bad idea, even for composite partition keys. The value of the partition key defines the location of your data in the cluster, and different partition key values will most probably put the data onto different servers. In this case, the coordinating node (the one that received the query) needs to contact the nodes that hold the data, wait for those nodes to deliver their results, and only after that send the results back to you.
If you need to query several partition keys, it will be faster to issue individual queries asynchronously and collect the results on the client side.
Also, please note that the TokenAware policy works best when you use a PreparedStatement: in this case, the driver is able to extract the value of the partition key and find which server holds the data for it.
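To illustrate, a sketch of that async fan-out with the Java driver 3.x and Guava's Futures; the keyspace name and contact point are assumptions, and the query matches the table from the question:

import com.datastax.driver.core.*;
import com.google.common.util.concurrent.Futures;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: replace the IN clause with one async query per account and
// collect the results client-side. Keyspace/contact point are assumptions.
public class AsyncFanOut {
    public static void main(String[] args) throws Exception {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {

            // A prepared statement lets the TokenAware policy route each
            // query directly to a replica owning that partition.
            PreparedStatement ps = session.prepare(
                    "SELECT x, y, z FROM my_table "
                  + "WHERE business_date = ? AND account_number = ?");

            List<ResultSetFuture> futures = new ArrayList<>();
            for (String account : Arrays.asList("AAA", "BBB", "CCC")) {
                futures.add(session.executeAsync(
                        ps.bind(LocalDate.fromYearMonthDay(2019, 4, 10), account)));
            }

            // Wait for all partitions and merge the rows on the client.
            for (ResultSet rs : Futures.allAsList(futures).get()) {
                for (Row row : rs) {
                    System.out.println(row);
                }
            }
        }
    }
}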
In DataStax's documentation, it says:
During a write, Cassandra adds each new row to the database without checking on whether a duplicate record exists. This policy makes it possible that many versions of the same row may exist in the database.
As far as I understand, that means there can be more than one non-compacted SSTable containing different versions of the same row. How does Cassandra handle the duplicated data when it reads from these SSTables?
@quangh: As already stated in the documentation:
This is why Cassandra performs another round of comparisons during a read process. When a client requests data with a particular primary key, Cassandra retrieves many versions of the row from one or more replicas. The version with the most recent timestamp is the only one returned to the client ("last-write-wins").
All write operations have an associated timestamp. In this case, different nodes will have different versions of the same row, but during a read operation Cassandra will pick the row with the latest timestamp. I hope this answers your question.
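To make that concrete, here is a small sketch against a hypothetical table kv(id int PRIMARY KEY, val text) in a hypothetical keyspace demo; it writes two versions of the same row with explicit timestamps and reads back only the newest:

import com.datastax.driver.core.*;

// Sketch with a hypothetical table demo.kv(id int PRIMARY KEY, val text):
// two versions of the same row may sit in different memtables/SSTables
// until compaction, but a read merges them by timestamp.
public class LastWriteWins {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("demo")) {

            session.execute("INSERT INTO kv (id, val) VALUES (1, 'old') USING TIMESTAMP 1000");
            session.execute("INSERT INTO kv (id, val) VALUES (1, 'new') USING TIMESTAMP 2000");

            // The read path compares timestamps and returns only the newest
            // version of each cell: this prints "new @ 2000".
            Row row = session.execute("SELECT val, writetime(val) FROM kv WHERE id = 1").one();
            System.out.println(row.getString("val") + " @ " + row.getLong("writetime(val)"));
        }
    }
}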
Accessing all rows from all nodes in Cassandra would be inefficient. Is there a way to get access to Index.db, which already has the row keys? Is something of this sort supported built into Cassandra?
There is no way to get all keys with one request without reaching every node in the cluster. There is, however, paging built into most Cassandra drivers. For example, in the Java driver: https://docs.datastax.com/en/developer/java-driver/3.3/manual/paging/
This puts less stress on each node, as it only fetches a limited amount of data with each request. Each subsequent request continues from the last, meaning you will touch every result for the request you're making.
Edit: This is probably what you want: How can I get the primary keys of all records in Cassandra?
One possible option could be querying all the token ranges.
For example,
SELECT DISTINCT <partn_col_name> FROM <table_name> WHERE token(<partn_col_name>) >= <from_token_range> AND token(<partn_col_name>) < <to_token_range>
With the above query, you can get all the partition keys available within a given token range. Adjust the token ranges depending on execution time.
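As an illustration, a sketch of that token-range walk with the Java driver 3.x, which exposes the ring's ranges via getMetadata().getTokenRanges(); all keyspace/table/column names here are placeholders:

import com.datastax.driver.core.*;

// Sketch: list every partition key by walking all token ranges in the ring.
// my_keyspace, my_table and partn_col are placeholders.
public class TokenRangeScan {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {

            // Token ranges are (start, end], hence "> ? AND <= ?".
            PreparedStatement ps = session.prepare(
                    "SELECT DISTINCT partn_col FROM my_table "
                  + "WHERE token(partn_col) > ? AND token(partn_col) <= ?");

            for (TokenRange range : cluster.getMetadata().getTokenRanges()) {
                // unwrap() splits the one range that wraps around the ring.
                for (TokenRange sub : range.unwrap()) {
                    BoundStatement bs = ps.bind()
                            .setToken(0, sub.getStart())
                            .setToken(1, sub.getEnd());
                    for (Row row : session.execute(bs)) {
                        System.out.println(row.getObject("partn_col"));
                    }
                }
            }
        }
    }
}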
I am thinking about using Kafka Connect to stream updates from Cassandra to a Kafka topic. The existing connector from StreamReactor seems to use a timestamp or timeuuid column to extract new changes since the last poll. The value of the timestamp is inserted using now() in the insert statement. The connector then saves the maximum timestamp it received on the last poll.
Since Cassandra is eventually consistent, I am wondering what actually happens when doing repeated queries with a time range to get new changes. Is there not a risk of missing rows inserted into Cassandra because they "arrived late" at the queried node, when using WHERE create >= maxTimeFoundSoFar?
Yes, it can happen that newer data turns up in front of your "cursor" after you have already moved on with processing, if you are using consistency level ONE for reading and writing; but even if you use a higher consistency level you can run into problems, depending on your setup. Basically, there are a lot of things that can go wrong.
You can reduce the chances of this happening by using the old Cassandra formula NUM_NODES_RESPONDING_TO_READ + NUM_NODES_RESPONDING_TO_WRITE > REPLICATION_FACTOR (for example, QUORUM reads and QUORUM writes with a replication factor of 3 give 2 + 2 > 3). But since you are using now() from Cassandra, the node clocks might have millisecond offsets between them, so you might still miss data if you have high-frequency writes. I know of some systems where people actually use Raspberry Pis with GPS modules to keep the clock skew really tight :)
You would have to say more about your use case, but in reality, yes, you can totally skip some inserts if you are not careful, and even then there is no 100% guarantee, other than processing the data with some offset that is large enough for the new data to come in and settle.
Basically, you would have to keep a moving time window in the past and slide it along, while making sure that you don't take into account anything newer than, say, the last minute. That way you are making sure the data has "settled".
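A minimal sketch of that moving window, assuming a one-minute settle delay (an arbitrary example value, not a recommendation):

import java.time.Duration;
import java.time.Instant;

// Sketch of the moving window with a settle delay; one minute is just an
// example value, tune it to how late your data can arrive.
public class SettleWindow {
    private static final Duration SETTLE_DELAY = Duration.ofMinutes(1);

    private Instant cursor = Instant.EPOCH; // where the previous poll stopped

    /** Bounds [from, to) for the next poll; never reads anything newer than
     *  now - SETTLE_DELAY, so late arrivals still get picked up later. */
    public Instant[] nextWindow() {
        Instant upperBound = Instant.now().minus(SETTLE_DELAY);
        Instant from = cursor;
        cursor = upperBound; // advance the window for the next poll
        return new Instant[] { from, upperBound };
    }
}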
I had some use cases where we processed sensor data that came in with multiple days of delay. On some projects we simply ignored it; on others the data was for month-level reporting, so we always processed the old data and added it to the reporting database, i.e. we kept a time window reaching 3 days back in history.
It just depends on your use case.
Using a Hector Mutator, I update a row with N sequential operations. Is there a guarantee that the changes happen in the order they were added to the Mutator?
The simplest example: if I delete a row and then immediately recreate it, could it happen that the deletion is applied after the insert?
How does a Cassandra cluster manage this if two sequential requests are sent to different nodes? It is always possible that there is a few milliseconds' difference between the nodes...
Cassandra resolves conflicts using timestamps supplied by the client. In your example the 'recreate' of the row will have a higher timestamp than the row delete so it doesn't matter if somehow they got to the server in the wrong order.
One consequence of client supplied timestamps is that you either need to sync the clocks on your client machines or design your data model so that different clients don't conflict with each other.
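Hector aside, the same idea is easy to show with the DataStax Java driver, which lets the client set the timestamp explicitly; the keyspace and table here are hypothetical:

import com.datastax.driver.core.*;

// Sketch with the DataStax Java driver rather than Hector; demo.kv is a
// hypothetical table (id int PRIMARY KEY, val text).
public class ClientTimestamps {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("demo")) {

            long t = System.currentTimeMillis() * 1000; // microseconds

            // Delete then recreate with strictly increasing client timestamps.
            // Even if the two mutations reach the cluster out of order, the
            // insert's higher timestamp wins on read.
            session.execute(new SimpleStatement("DELETE FROM kv WHERE id = 1")
                    .setDefaultTimestamp(t));
            session.execute(new SimpleStatement("INSERT INTO kv (id, val) VALUES (1, 'recreated')")
                    .setDefaultTimestamp(t + 1));
        }
    }
}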