The DataStax documentation says:
During a write, Cassandra adds each new row to the database without
checking on whether a duplicate record exists. This policy makes it
possible that many versions of the same row may exist in the database.
As far as I understand, that means there may be more than one non-compacted SSTable containing different versions of the same row. How does Cassandra handle this duplicated data when it reads from these SSTables?
@quangh: As already stated in the documentation:
This is why Cassandra performs another round of comparisons during a read process. When a client requests data with a particular primary key, Cassandra retrieves many versions of the row from one or more replicas. The version with the most recent timestamp is the only one returned to the client ("last-write-wins").
Every write operation has an associated timestamp, so different nodes (or different SSTables) can hold different versions of the same row. During a read, Cassandra picks the version with the latest timestamp. I hope this answers your question.
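A minimal Python sketch of the "last-write-wins" rule quoted above, assuming a hypothetical RowVersion record that carries each version's write timestamp. It treats each version as a whole row; Cassandra actually reconciles individual cells, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowVersion:
    """One version of a row as returned by a replica or SSTable (hypothetical model)."""
    key: str
    value: str
    timestamp: int  # write timestamp, e.g. microseconds since the epoch


def resolve_last_write_wins(versions: list[RowVersion]) -> RowVersion:
    """Return the version with the most recent write timestamp."""
    if not versions:
        raise ValueError("no versions to reconcile")
    return max(versions, key=lambda v: v.timestamp)


# Three versions of the same row, returned in arbitrary order.
responses = [
    RowVersion("user:42", "alice@old.example", 1_000),
    RowVersion("user:42", "alice@new.example", 3_000),
    RowVersion("user:42", "alice@mid.example", 2_000),
]
print(resolve_last_write_wins(responses).value)  # alice@new.example
```

Only the newest version reaches the client; older versions stay on disk until compaction removes them.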
We use a very simple key-value data model in Cassandra, and our partition key is spread across 17 SSTables. I would like to understand how reads work in our concrete case.
If I understand correctly, a general Cassandra read has to search for the newest version of each column in the memtable and in the different SSTables until it has retrieved all columns, and then merge them.
Since SSTables are sorted by time and our data model is single-column, ideally our reads should hit only the newest SSTable containing our partition key, since it holds the whole data.
Will our read operations hit the 17 SSTables? or just the newest one containing the searched partition key?
Cassandra will search all of them, because it can't be sure which columns exist where: DML happens at the cell level, so several versions of a cell can exist and must be reconciled. Reads are done at the partition level. However, Cassandra can skip SSTables when it knows the partition key doesn't exist in them. That's why compaction is important for read performance: it removes the unnecessary cells.
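To illustrate cell-level reconciliation, here is a simplified Python sketch, assuming each source (memtable or SSTable) is modelled as a dict from partition key to {column: (value, write timestamp)}. The names read_partition, memtable and sstable_* are hypothetical; tombstones, clustering and TTLs are ignored.

```python
Cell = tuple[str, int]  # (value, write timestamp)
Source = dict[str, dict[str, Cell]]  # partition key -> column -> cell


def read_partition(key: str, sources: list[Source]) -> dict[str, str]:
    """Merge every source that holds the partition, keeping the newest cell per column."""
    merged: dict[str, Cell] = {}
    for source in sources:
        row = source.get(key)
        if row is None:
            continue  # partition not in this source: nothing to merge
        for column, (value, ts) in row.items():
            current = merged.get(column)
            if current is None or ts > current[1]:
                merged[column] = (value, ts)
    return {column: value for column, (value, _) in merged.items()}


# Pieces of partition "k1" written at different times end up in different places.
sstable_1 = {"k1": {"name": ("Ann", 10), "city": ("Oslo", 10)}}
sstable_2 = {"k1": {"city": ("Bergen", 20)}}
memtable = {"k1": {"email": ("ann@example.com", 30)}}

print(read_partition("k1", [memtable, sstable_2, sstable_1]))
```

No single source holds the complete row, which is why the read cannot stop at the newest SSTable in general.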
To add to Jim's answer, Cassandra uses a Bloom filter for this. Essentially, it's a probabilistic structure that can tell you one of two things:
The SSTable might contain the data requested.
OR
The SSTable definitely does not contain the data requested.
This should spare Cassandra from having to read all 17 SSTables. My advice is to run the query with TRACING ON in cqlsh; the trace shows exactly how many SSTables it had to look through.
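A toy Python Bloom filter showing the "might contain / definitely does not contain" behaviour described above. The class and parameters (num_bits, num_hashes) are illustrative assumptions; the SHA-256 hashing is chosen only for simplicity and is not Cassandra's hash function or filter-sizing logic.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str) -> list[int]:
        # Derive k bit positions by salting a SHA-256 hash with the hash index.
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


# One filter per SSTable, built from the partition keys it holds (hypothetical data).
sstable_keys = {"sstable-1": ["a", "b"], "sstable-2": ["c"], "sstable-3": ["a", "d"]}
filters = {}
for name, keys in sstable_keys.items():
    bf = BloomFilter()
    for k in keys:
        bf.add(k)
    filters[name] = bf

# Only SSTables whose filter says "maybe" need to be read for key "a".
candidates = [name for name, bf in filters.items() if bf.might_contain("a")]
print(candidates)
```

A filter never gives a false negative, so every SSTable that really holds the key is always read; a false positive only costs one unnecessary SSTable lookup.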
I am using the DataStax Node.js driver for Cassandra (https://github.com/datastax/nodejs-driver).
I am wondering whether I can get the number of rows updated after executing an UPDATE statement.
Thanks,
An update in Cassandra is treated as just another insert. In other words, Cassandra does no read before write (before an update, in this case). Updates are simply new writes that end up in another immutable SSTable, and Cassandra stitches the versions together at read time: it is during the read that the latest value for each column is chosen.
So in short, there isn't a way in Cassandra to deduce the number of rows updated.
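A toy Python model of the "no read before write" behaviour described above, assuming a hypothetical append-only BlindWriteTable. It is not the Node.js driver's API; it only shows why such a write has no "rows affected" count to report.

```python
class BlindWriteTable:
    """Toy append-only store: every write is recorded without reading first."""

    def __init__(self) -> None:
        self.log: list[tuple[str, str, str, int]] = []  # (key, column, value, ts)
        self._clock = 0

    def update(self, key: str, column: str, value: str) -> None:
        # No read-before-write: we never check whether `key` already exists,
        # so there is no count of affected rows to return.
        self._clock += 1
        self.log.append((key, column, value, self._clock))

    def read(self, key: str) -> dict[str, str]:
        # Reconciliation happens at read time: the newest timestamp per column wins.
        latest: dict[str, tuple[str, int]] = {}
        for k, column, value, ts in self.log:
            if k == key and (column not in latest or ts > latest[column][1]):
                latest[column] = (value, ts)
        return {c: v for c, (v, _) in latest.items()}


table = BlindWriteTable()
table.update("k1", "status", "new")
table.update("k1", "status", "done")  # an "update" is just another write
table.update("k2", "status", "done")  # creates k2 even though it never existed
print(table.read("k1"), table.read("k2"))
```

Because the store never checks what was there before, "updating" a missing key simply creates it, and both writes to k1 remain in the log until something (compaction, in Cassandra) discards the older one.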
The Cassandra documentation says:
While STCS works well to compact a write-intensive workload, it makes reads slower because the merge-by-size process does not group data by rows. This makes it more likely that versions of a particular row may be spread over many SSTables.
1) What does 'group data by rows' mean? Aren't all rows for a partition already grouped?
2) How is it possible for a row to have multiple versions on a single node? Doesn't the upsert behavior ensure that only the latest version of a row is accessible via the memtable and partition indices? Isn't it true that when a row is updated and the memtable flushed, the partition indices are updated to point to the latest version? Then, on compaction, this latest version (because of the row timestamp) is the one that ends up in the compacted SSTable?
Note that I'm talking about a single node here - NOT the issue of replicas being out of sync.
Either this is incorrect or I am misunderstanding what that paragraph says.
Thanks!
OK, I think I found the answer myself - I would be grateful for any confirmation that this is correct.
A row may have many versions because updates/upserts can write only part of a row. Thus, the latest version of a complete row is made up of all the latest updates for all the columns in that row - which can be spread out across multiple SSTables.
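A simplified Python sketch of the situation described above, assuming three hypothetical flushes that each carry only part of row "k1", and a toy compact function that keeps the newest cell per column. It is not Cassandra's compaction code; it only shows how partial updates spread one row over several SSTables and how merging brings it back into one.

```python
Cell = tuple[str, int]  # (value, write timestamp)
SSTable = dict[str, dict[str, Cell]]  # partition key -> column -> cell


def sstables_holding(key: str, sstables: list[SSTable]) -> int:
    """How many SSTables contain some fragment of the given row."""
    return sum(1 for table in sstables if key in table)


def compact(sstables: list[SSTable]) -> SSTable:
    """Merge SSTables into one, keeping only the newest cell for each column."""
    out: SSTable = {}
    for table in sstables:
        for key, row in table.items():
            merged = out.setdefault(key, {})
            for column, cell in row.items():
                if column not in merged or cell[1] > merged[column][1]:
                    merged[column] = cell
    return out


# Three flushes, each carrying only part of row "k1".
flushes = [
    {"k1": {"name": ("Ann", 1), "city": ("Oslo", 1)}},
    {"k1": {"city": ("Bergen", 2)}},
    {"k1": {"email": ("ann@example.com", 3)}},
]
print(sstables_holding("k1", flushes))  # 3
compacted = compact(flushes)
print(sstables_holding("k1", [compacted]))  # 1
print(compacted["k1"])
```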
My misunderstanding seemed to stem from the idea that the partition indices can only point to one location in one SSTable. If I relax this constraint, the statement in the doc makes sense. I must therefore assume that an index in the partition indices for a primary key can hold multiple locations for that key. Can someone confirm that all this is true?
Thanks.
I use WSO2 BAM 2.3.0, where I have defined a stream that holds a large amount of data in a Cassandra data source. Currently my Hive script processes all events in the keyspace, 99% of which are unnecessary, and the data also takes up disk space.
My idea is to clear this data after it becomes unnecessary.
The format of the stream is:
{
  "streamId": "kroki_i_kolejki_zlecen:1.0.0",
  "name": "kroki_i_kolejki_zlecen",
  "version": "1.0.0",
  "nickName": "Kroki i kolejki zlecen",
  "description": "Wyniki i daty zamkniecia zlecen",
  "payloadData": [
    {"name": "casenum", "type": "STRING"},
    {"name": "type_id", "type": "STRING"},
    {"name": "id_zlecenie", "type": "STRING"},
    {"name": "sid", "type": "STRING"},
    {"name": "step_name", "type": "STRING"},
    {"name": "proc_name", "type": "STRING"},
    {"name": "step_desc", "type": "STRING"},
    {"name": "audit_date", "type": "STRING"},
    {"name": "audit_usecs", "type": "STRING"},
    {"name": "user_name", "type": "STRING"}
  ]
}
I want to delete all data with a given payload_id_zlecenie value after I receive an event with a specific payload_type_id.
In a relational database, this would be equivalent to the query:
delete from kroki_i_kolejki_zlecen where payload_id_zlecenie = [argument];
Is this possible?
As far as I know, you cannot delete Cassandra data from Hive. The link Inosh gave as [1] describes how to archive Cassandra records older than a specific age (e.g. records older than 3 months). All archived data is stored in a column family with the suffix "_arch". That feature uses a custom analyzer inside the generated Hive script to delete Cassandra rows. Note also that it takes about 10 days for deleted rows to be removed completely, together with their row keys (this likely corresponds to Cassandra's default gc_grace_seconds of 864000 seconds, i.e. 10 days). Until that happens, you will see empty fields associated with the Cassandra row ID.
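A minimal Python sketch of the age-based archiving idea described above, assuming rows are held in a dict with a datetime "ts" field and that "3 months" is approximated as 90 days. The function name and row layout are hypothetical; this is not WSO2 BAM's archiving implementation, only the split it performs.

```python
from datetime import datetime, timedelta


def archive_old_rows(
    cf_name: str,
    rows: dict[str, dict],
    now: datetime,
    max_age: timedelta = timedelta(days=90),  # assumption: "3 months" ~ 90 days
) -> dict[str, dict[str, dict]]:
    """Split rows into the live column family and a '<cf_name>_arch' column family by age."""
    cutoff = now - max_age
    kept = {key: row for key, row in rows.items() if row["ts"] >= cutoff}
    archived = {key: row for key, row in rows.items() if row["ts"] < cutoff}
    return {cf_name: kept, f"{cf_name}_arch": archived}


rows = {
    "r1": {"ts": datetime(2024, 1, 1), "payload": "old event"},
    "r2": {"ts": datetime(2024, 5, 1), "payload": "recent event"},
}
result = archive_old_rows("events", rows, now=datetime(2024, 5, 15))
print(sorted(result["events"]), sorted(result["events_arch"]))  # ['r2'] ['r1']
```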
The link Inosh gave as [2] is the real solution to your problem. Once incremental processing is enabled, the Hive script processes only the Cassandra rows that were not processed in the previous execution. In other words, Hive aggregates the values processed in each execution and keeps them for later. On the next run, Hive uses that stored value and the timestamp of the last processed record, and processes only the records that arrived after that timestamp. The new aggregated value and the older one are then combined to produce the overall value.
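A minimal Python sketch of the checkpointing scheme described above, assuming records are (timestamp, value) pairs and the aggregate is a simple sum. IncrementalSum is a hypothetical helper, not WSO2 BAM's Incremental Analysis API.

```python
from dataclasses import dataclass


@dataclass
class IncrementalSum:
    """Keeps the aggregate so far plus the timestamp of the last processed record."""
    total: int = 0
    last_ts: int = -1

    def run(self, records: list[tuple[int, int]]) -> int:
        # Process only records newer than the stored checkpoint.
        new = [(ts, v) for ts, v in records if ts > self.last_ts]
        # Combine the previous aggregate with the aggregate of the new records.
        self.total += sum(v for _, v in new)
        if new:
            self.last_ts = max(ts for ts, _ in new)
        return self.total


agg = IncrementalSum()
events = [(1, 10), (2, 5)]
print(agg.run(events))  # 15
events.append((3, 7))
print(agg.run(events))  # 22: only the record with timestamp 3 is processed
```

This works directly for aggregates that can be combined piecewise (sums, counts, min/max); an average would need its sum and count stored separately. Records that arrive late with a timestamp older than the checkpoint would be skipped by this simple scheme.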
[1] - http://docs.wso2.org/display/BAM240/Archive+Cassandra+Data
[2] - http://docs.wso2.org/pages/viewpage.action?pageId=32345660
You can use the Cassandra data archival feature [1] to archive Cassandra data.
Also see Incremental Analysis [2], a new feature released with BAM 2.4.0. With it, received data can be analyzed incrementally, without processing all events in the column families (CFs).
[1] - http://docs.wso2.org/display/BAM240/Archive+Cassandra+Data
[2] - http://docs.wso2.org/pages/viewpage.action?pageId=32345660
Is there some timestamp/counter that can be used to validate that in a read-modify-write cycle, the data in the row did not change between reading and modifying?
In other words, can I read some kind of ID while reading the row, and when I write it back tell Cassandra what that ID was, and the write then fails if the ID changed since then? (Which amounts to saying that some other write took place after I read the data)
Each column in Cassandra is a tuple (a triplet) containing a name, a value, and a timestamp. The timestamp represents the last time the column was modified. If you have hundreds of nodes, whichever node holds the update with the most recent timestamp wins. This is how eventual consistency is achieved.
zznate has a good presentation, "Introduction to Apache Cassandra for Java Developers", that covers this topic (slide 37).
See also: Accessing timestamp of a Cassandra column
In summary, you don't need "some kind of ID" when you can retrieve the timestamp of a given column, which represents the last time it was modified. However, at scale, with hundreds of nodes, how can you be sure that the node you are connecting to has the most up-to-date column? (Refer back to the zznate presentation.)
The point is, you can't, without enabling transactions:
Cassandra - transaction support
Cassandra Transaction with ZooKeeper - Does this work?
how to integrate cassandra with zookeeper to support transactions
And many more: cassandra & transactions
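To illustrate why a client-side timestamp check does not replace a transaction, here is a deterministic Python simulation of two clients interleaving a read, a "has the timestamp changed?" check, and a write. The single-column store and write helper are hypothetical. Even on one node, the check and the write are separate steps, so a second client can slip in between them.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Cell:
    value: int
    timestamp: int


# Hypothetical single-node store with one column; timestamps come from a counter.
store = {"balance": Cell(value=100, timestamp=1)}
_clock = itertools.count(2)


def write(column: str, value: int) -> None:
    """Blind write: stamp the new value with the next timestamp."""
    store[column] = Cell(value, next(_clock))


# Step 1: clients A and B both read the same value and timestamp.
a_value, a_ts = store["balance"].value, store["balance"].timestamp
b_value, b_ts = store["balance"].value, store["balance"].timestamp

# Step 2: each client checks whether the timestamp changed since its read.
a_ok = store["balance"].timestamp == a_ts  # True
b_ok = store["balance"].timestamp == b_ts  # True: A has not written yet

# Step 3: both checks passed, so both write, and B overwrites A's change.
if a_ok:
    write("balance", a_value + 10)
if b_ok:
    write("balance", b_value + 20)

print(store["balance"].value)  # 120, not 130: A's increment was lost
```

Making "check the timestamp, then write" safe requires the check and the write to happen atomically on the server side, which is exactly what the transaction support discussed in the links above is meant to provide.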