When can read-your-own-writes fail? - cassandra

I use read and write consistency level QUORUM (RL/WL=QUORUM) and send two updates. Is it possible, under some circumstances, that the next SELECT reads my first update?
CREATE TABLE aggr(
    id int,
    mysum int,
    PRIMARY KEY(id)
);
INSERT INTO aggr(id, mysum) VALUES(1, 2);
INSERT INTO aggr(id, mysum) VALUES(1, 3);
SELECT mysum FROM aggr WHERE id=1; -- expect mysum=3 here, but is it a must?
As far as I can judge from here, it is even possible to lose part of the second update if the two updates come in with the same timestamp.
If I work around the timestamp problem, can I be sure that I always read what I wrote last?

No, assuming you're using client-side monotonic timestamps (the current default, though it wasn't in the past); with other settings it is possible. I'm assuming here that a single client issues those two writes. If the two inserts come from two different servers, it all depends on their timestamps.
Client-side timestamps are the default for the Java driver 3.x, but if you're using a version of Cassandra before CQL3 (2.0) you need to provide them yourself with USING TIMESTAMP in your query, since the protocol didn't support them. Otherwise the two writes can go to different coordinators, and if those coordinators have clock drift between them, the first insert may be considered "newer" than the second. With client-side timestamps (which should be the default in recent driver versions) that's not the case.
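To see why coordinator clock drift matters, here is a toy model (plain Python, no driver; names are illustrative) of Cassandra's last-write-wins resolution: each cell keeps whichever value carries the highest timestamp, regardless of arrival order.

```python
# Toy model of Cassandra's last-write-wins conflict resolution.
# A cell keeps whichever value carries the highest timestamp.

def resolve(cell, value, ts):
    """Apply a write; the higher timestamp wins (arrival order is irrelevant)."""
    if cell is None or ts > cell[1]:
        return (value, ts)
    return cell  # stale write is ignored

# Server-side timestamps with clock drift: coordinator A's clock is ahead,
# so the FIRST insert gets the higher timestamp and the second is lost.
cell = None
cell = resolve(cell, 2, ts=1_000_005)  # INSERT mysum=2 via coordinator A (fast clock)
cell = resolve(cell, 3, ts=1_000_000)  # INSERT mysum=3 via coordinator B
print(cell[0])  # -> 2: the older update "won"

# Client-side monotonic timestamps: one client hands out strictly
# increasing timestamps, so the second write always wins.
cell = None
cell = resolve(cell, 2, ts=1)
cell = resolve(cell, 3, ts=2)
print(cell[0])  # -> 3
```

This is only a sketch of the resolution rule, not of the replication path, but it shows why the fix is about who assigns the timestamp, not about consistency level.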

If you do your updates synchronously with CL=QUORUM, the second update will always overwrite the first. A lower consistency level on either request would not guarantee this.
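The QUORUM guarantee above follows from simple arithmetic: with replication factor N, a quorum is floor(N/2)+1 replicas, so any write quorum and any read quorum must share at least one replica (W + R > N). A quick check:

```python
# Quorum read/write sets must intersect: W + R > N guarantees the read
# touches at least one replica that saw the latest write.
def quorum(n):
    return n // 2 + 1

for n in (3, 5, 7):
    w = r = quorum(n)
    overlap = w + r - n  # minimum number of replicas in both sets
    print(f"RF={n}: quorum={w}, guaranteed overlap={overlap}")
```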

Related

Cassandra get latest entry for each element contained within IN clause

So, I have a Cassandra CQL statement that looks like this:
SELECT * FROM DATA WHERE APPLICATION_ID = ? AND PARTNER_ID = ? AND LOCATION_ID = ? AND DEVICE_ID = ? AND DATA_SCHEMA = ?
This table is sorted by a timestamp column.
The functionality is fronted by a REST API, and one of the filter parameters callers can specify is to get only the most recent row; in that case I append "LIMIT 1" to the end of the CQL statement, since the table is ordered by the timestamp column in descending order. What I would like to do is allow them to specify multiple device ids and get back the latest entry for each. So, my question is: is there any way to do something like this in Cassandra:
SELECT * FROM DATA WHERE APPLICATION_ID = ? AND PARTNER_ID = ? AND LOCATION_ID = ? AND DEVICE_ID IN ? AND DATA_SCHEMA = ?
and still use something like "LIMIT 1" to get back only the latest row for each device id? Or will I simply have to execute a separate CQL statement per device to get its latest row?
FWIW, the table's composite key looks like this:
PRIMARY KEY ((application_id, partner_id, location_id, device_id, data_schema), activity_timestamp)
) WITH CLUSTERING ORDER BY (activity_timestamp DESC);
IN is not recommended when it carries many parameters: under the hood the coordinator makes requests to multiple partitions anyway, which puts pressure on that one node.
Not that you can't do it. It is perfectly legal, but most of the time it's not performant and is not suggested. Also, LIMIT applies to the whole statement; you can't pick just the first item out of each partition. The simplest option is to issue multiple queries to the cluster (every element in IN becomes one query) and put LIMIT 1 on each of them.
To be honest, this was my solution in a lot of projects and it works pretty much fine. The coordinator would go to multiple nodes under the hood anyway, but it would also have to do extra work to gather all the responses for you, and it might run into timeouts.
In short, it's far better for the cluster, and more performant, if the client asks multiple times (using multiple coordinators with smaller requests) than to make a single coordinator do all the work.
All of this applies if you can't afford more disk space for your cluster.
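The fan-out approach can be sketched like this (plain Python; the session and its execute_async method mimic the DataStax driver's async API, and the fake session here only exists so the sketch runs standalone):

```python
# Sketch of the fan-out approach: one LIMIT 1 query per device id instead of
# a single IN query. Executing it for real would need a driver session.
QUERY = (
    "SELECT * FROM data "
    "WHERE application_id = %s AND partner_id = %s AND location_id = %s "
    "AND device_id = %s AND data_schema = %s LIMIT 1"
)

def fan_out(session, app, partner, loc, device_ids, schema):
    """Issue one async query per device, then collect each device's latest row."""
    futures = [
        session.execute_async(QUERY, (app, partner, loc, dev, schema))
        for dev in device_ids
    ]
    # Resolve futures only after all queries are in flight, so they run concurrently.
    return [f.result() for f in futures]

# --- stand-ins so the sketch is runnable without a cluster ---
class FakeFuture:
    def __init__(self, row):
        self._row = row
    def result(self):
        return self._row

class FakeSession:
    """Returns a canned row per device instead of querying Cassandra."""
    def execute_async(self, query, params):
        return FakeFuture({"device_id": params[3]})

rows = fan_out(FakeSession(), "app1", "p1", "loc1", ["d1", "d2"], "s1")
print([r["device_id"] for r in rows])  # -> ['d1', 'd2']
```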
Usual Cassandra solution
In Cassandra, data should be modeled for the queries you will run (query-first design). So you would have one additional table with the same partition key you have now, but without the clustering column activity_timestamp, i.e.
PRIMARY KEY ((application_id, partner_id, location_id, device_id, data_schema))
The double parentheses (()) are intentional.
Every time you write to your table, you would also write the data to latest_entry (the table without activity_timestamp). Then you can run the IN query you need against this table. Since there is only one entry per partition key, it always holds the latest entry and you don't need LIMIT 1. That is the usual solution in Cassandra.
If you are worried about the additional writes, don't be: they are inexpensive and CPU-bound. With Cassandra it's always "bring on the writes", I guess :)
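A toy model of that dual-write pattern (plain Python, illustrative names): the history table keeps one row per (partition key, timestamp), while latest_entry is keyed by the partition key alone, so each write simply overwrites the previous latest row.

```python
# Dual-write pattern: every insert goes to both tables. latest_entry has no
# clustering column, so it holds exactly one row per partition key: the newest.
history = {}       # (partition_key, activity_timestamp) -> row
latest_entry = {}  # partition_key -> row

def write(partition_key, activity_timestamp, payload):
    row = {"ts": activity_timestamp, **payload}
    history[(partition_key, activity_timestamp)] = row
    latest_entry[partition_key] = row  # upsert: the new write replaces the old

write(("app1", "dev1"), 100, {"value": "a"})
write(("app1", "dev1"), 200, {"value": "b"})
write(("app1", "dev2"), 150, {"value": "c"})

# The IN-style lookup against latest_entry needs no LIMIT 1:
print([latest_entry[k]["value"] for k in [("app1", "dev1"), ("app1", "dev2")]])
# -> ['b', 'c']
```

In real Cassandra the "newest wins" part is resolved by write timestamps rather than arrival order, as discussed in the first question above.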
Basically it's up to you:
multiple queries - a bit of refactoring, no additional space cost
new schema - additional inserts when writing, additional space cost
Your table definition is not suitable for this use of the IN clause. Indeed, it is supported only on the last field of the partition key or the last field of the clustering key. So you can:
swap your two last fields of the primary key
use one query for each device id

Spark: Continuously reading data from Cassandra

I have gone through Reading from Cassandra using Spark Streaming and through tutorial-1 and tutorial-2 links.
Is it fair to say that Cassandra-Spark integration currently does not provide anything out of the box to continuously get the updates from Cassandra and stream them to other systems like HDFS?
By continuously, I mean getting only those rows in a table which have changed (inserted or updated) since the last fetch by Spark. If there are too many such rows, there should be an option to limit the number of rows, and the subsequent Spark fetch should begin from where it left off. An at-least-once guarantee is OK, but exactly-once would be very welcome.
If it's not supported, one way to support it could be to have an auxiliary column updated_time in each Cassandra table that needs to be queried by Spark, and then use that column for the queries. Or an auxiliary table per table that contains the ID and timestamp of the rows being changed. Has anyone tried this before?
I don't think Apache Cassandra has this functionality out of the box. Internally (for some period of time) it stores all operations on data sequentially, but that log is per node and it gets compacted eventually (to save space). Frankly, Cassandra's promise, like that of most other DBs, is to provide the latest view of the data (which by itself can be quite tricky in a distributed environment), not the full history of how the data changed.
So if you still want to have such info in Cassandra (and process it in Spark), you'll have to do some additional work yourself: design dedicated table(s) (or add synthetic columns), take care of partitioning, save offset to keep track of progress, etc.
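That "save an offset to keep track of progress" bookkeeping might look like this toy sketch (plain Python; change_log stands in for an auxiliary table with an updated_time column):

```python
# Incremental fetch: read only rows changed since the saved offset, capped at
# `limit`, and return a new offset so the next fetch resumes where this left off.
change_log = [
    (1, "row-a"), (2, "row-b"), (5, "row-c"), (7, "row-d"),  # (updated_time, row)
]

def fetch_since(offset, limit):
    batch = [(t, r) for (t, r) in change_log if t > offset][:limit]
    new_offset = batch[-1][0] if batch else offset
    return batch, new_offset

batch, off = fetch_since(0, limit=2)
print(batch, off)   # first two changes; offset advances to 2
batch, off = fetch_since(off, limit=2)
print(batch, off)   # next two changes; offset advances to 7
```

Persisting the offset only after the batch is processed gives the at-least-once behavior the question asks about; a crash between processing and saving just replays the last batch.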
Cassandra is OK for time-series data, but in your case I would consider just using a streaming solution (like Kafka) instead of reinventing it.
I agree with what Ralkie stated but wanted to propose one more solution if you're tied to C* with this use case. This solution assumes you have full control over the schema and ingest as well. This is not a streaming solution though it could awkwardly be shoehorned into one.
Have you considered using a composite partition key composed of a time bucket along with murmur_hash_of_one_or_more_clustering_columns % some_int_designed_limit_row_width? That way you could set your time buckets to 1 minute, 5 minutes, 1 hour, etc., depending on how "real-time" you need to analyze/archive your data. The murmur hash based on one or more of the clustering columns is needed to help locate data in the C* cluster (and is a terrible choice if you often look up specific clustering columns).
For example, take an IoT use case where sensors report in every minute and have some sensor reading that can be represented as an integer.
create table if not exists iottable (
    timebucket bigint,
    sensorbucket int,
    sensorid varchar,
    sensorvalue int,
    primary key ((timebucket, sensorbucket), sensorid)
) with caching = 'none'
and compaction = { 'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowedCompaction' };
Note the use of TimeWindowedCompaction. I'm not sure what version of C* you're using, but with the 2.x series I'd stay away from DateTieredCompaction, and I cannot speak to how well it performs in 3.x. At any rate, you should test and benchmark extensively before settling on your schema and compaction strategy.
Also note that this schema could result in hotspotting, as it is vulnerable to sensors that report more often than others. Again, not knowing the use case, it's hard to provide a perfect solution; it's just an example. If you don't care about ever reading a specific sensor (or column) from C*, you don't have to use a clustering column at all, and you can simply use a timeUUID or something random for the murmur-hash bucketing.
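The bucketing in that partition key can be sketched as follows (plain Python; a stdlib hash stands in for murmur3, and the 1-minute bucket and width of 20 are illustrative values, not recommendations):

```python
import hashlib

BUCKET_SECONDS = 60   # 1-minute time buckets; tune to your "real-time" needs
BUCKET_WIDTH = 20     # some_int_designed_limit_row_width

def partition_key(sensorid, epoch_seconds):
    """Compute (timebucket, sensorbucket) for a write arriving at epoch_seconds."""
    timebucket = epoch_seconds // BUCKET_SECONDS * BUCKET_SECONDS
    # md5 used here as a deterministic stand-in for the murmur3 hash
    h = int.from_bytes(hashlib.md5(sensorid.encode()).digest()[:8], "big")
    sensorbucket = h % BUCKET_WIDTH
    return timebucket, sensorbucket

# Same sensor, same minute -> same partition, so writes within a bucket
# land together and a Spark job can sweep one timebucket at a time.
print(partition_key("sensor-42", 1_700_000_123))
print(partition_key("sensor-42", 1_700_000_150))
```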
Regardless of how you decide to partition the data, a schema like this would then allow you to use repartitionByCassandraReplica and joinWithCassandraTable to extract the data written during a given timebucket.

cassandra - how to update decimal column by adding to existing value

I have a cassandra table that looks like the following:
create table position_snapshots_by_security(
    securityCode text,
    portfolioId int,
    lastUpdated date,
    units decimal,
    primary key((securityCode), portfolioId)
);
And I would like to do something like this:
update position_snapshots_by_security
set units = units + 12.3,
lastUpdated = '2017-03-02'
where securityCode = 'SPY'
and portfolioId = '5dfxa2561db9'
But it doesn't work.
Is it possible to do this kind of operation in Cassandra? I am using version 3.10, the latest one.
Thank you!
J
This is not possible in Cassandra (any version) because it would require a read-before-write (anti-)pattern.
You can try counter columns if they suit your needs, or you could try caching/counting at the application level.
Otherwise you need to issue a read at the application level, which kills your cluster performance.
Cassandra doesn't do a read before a write (except when using Lightweight Transactions) so it doesn't support operations like the one you're trying to do which rely on the existing value of a column. With that said, it's still possible to do this in your application code with Cassandra. If you'll have multiple writers possibly updating this value, you'll want to use the aforementioned LWT to make sure the value is accurate and multiple writers don't "step on" each other. Basically, the steps you'll want to follow to do that are:
Read the current value from Cassandra using a SELECT. Make sure you're doing the read with a consistency level of SERIAL or LOCAL_SERIAL if you're using LWTs.
Do the calculation to add to the current value in your application code.
Update the value in Cassandra with an UPDATE statement. If using a LWT you'll want to do UPDATE ... IF value = previous_value_you_read.
If using LWTs, the UPDATE will be rejected if the previous value that you read changed while you were doing the calculation. (And you can retry the whole series of steps again.) Keep in mind that LWTs are expensive operations, particularly if the keys you are reading/updating are heavily contended.
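The read-compare-update loop in those steps can be modeled without a cluster (plain Python; Store stands in for the table, and its compare_and_set mimics UPDATE ... IF value = previous_value):

```python
# Toy model of the LWT read-modify-write loop: read the current value,
# compute the new one, and apply it only if the value hasn't changed meanwhile.
class Store:
    def __init__(self, value):
        self.value = value

    def read(self):
        return self.value  # step 1: SELECT (with SERIAL consistency for LWTs)

    def compare_and_set(self, expected, new):
        """Mimics UPDATE ... IF value = expected (a Cassandra LWT)."""
        if self.value == expected:
            self.value = new
            return True
        return False  # another writer got there first; caller should retry

def add_to_value(store, delta, max_retries=10):
    for _ in range(max_retries):
        current = store.read()                               # step 1
        new = current + delta                                # step 2
        if store.compare_and_set(current, new):              # step 3
            return new
    raise RuntimeError("too much contention, giving up")

s = Store(100)
print(add_to_value(s, 12))  # -> 112
```

The retry loop is exactly the "try the whole series of steps again" from the answer; under heavy contention it is where the LWT cost shows up.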
Hope that helps!

Is it possible to specify the WRITETIME in a Cassandra INSERT command?

I am having a problem where a few INSERT commands are viewed as being sent simultaneously on the Cassandra side, when my code clearly does not send them simultaneously. (The problem happens when there is a little congestion on the network; otherwise, everything works just fine.)
What I am thinking would solve this problem is a way for me to specify the WRITETIME myself. From what I recall, that was possible in Thrift, but maybe not (we could read it, for sure).
So something like this (to simulate the TTL):
INSERT INTO table_name (a, b, c) VALUES (1, 2, 3) USING WRITETIME = 123;
The problem I'm facing is overwriting the same data: once in a while the update is ignored because it ends up with the same or even an older timestamp (probably because it is sent to a different node, the time of each node is slightly different, and since the C++ process uses threads, it can be sent before/after without your control...).
The magic syntax you're looking for is:
INSERT INTO tbl (col1, col2) VALUES (1,2) USING TIMESTAMP 123456789000
Be very cautious using this approach: make sure you use the right units (microseconds, typically).
You can override the meaning of timestamps in some cases. It's a sneaky trick we've used in the past to do clever things like first-write-wins, and we have even stored leaderboard values in the TIMESTAMP field so the highest score would be persisted, but you should REALLY understand the concept before trying these (deletes become nontrivial).
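On the units point: a common pitfall is passing milliseconds, while Cassandra timestamps are microseconds since the Unix epoch. One way to produce a correct value (plain Python):

```python
import time

def writetime_micros():
    """Microseconds since the Unix epoch, the unit USING TIMESTAMP expects."""
    return int(time.time() * 1_000_000)

ts = writetime_micros()
print(ts)
# Sanity check: a current epoch in microseconds has 16 digits;
# a millisecond value would have only 13 and would sort as "ancient".
```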

DateTieredCompaction without Timestamp Col

I think this question: Does DateTieredCompactionStrategy work with composite keys? is essentially the same question, but I would like to confirm..
My table (simplified) looks like:
CREATE TABLE foo (
    name text,
    textId text,
    message text,
    PRIMARY KEY ((name), textId)
) WITH CLUSTERING ORDER BY (textId ASC);
We see here that we have no timestamp column. To confirm: one is in fact not needed for DateTieredCompaction, since this strategy leverages the actual write time, correct?
The textId does in fact have a 'TS' encoded in it, which is close to the actual write timestamp, but can vary a little.
Our use case is: all data is inserted (never updated), with a TTL.
In most cases we expect the records to be deleted prior to the TTL. Does this sound like an appropriate use for DateTieredCompaction? If not, why not?
I've started running some tests, and it does appear to be working so far, but it will require longer runs. In particular, I'm trying to avoid an issue we saw before, where performance plummeted after some amount of time, which I believe was due to heavy compaction and/or repair processes (we were running on a 3-node cluster with LeveledCompaction). We have since moved up to a 5-node cluster, in the hope that repair will only slow down two of the nodes at a time, leaving a full quorum operational at ~100% performance. (In fact, for our use case we may have a dedicated cluster with repair turned off and replication factor 1; we replicate to the other DCs ourselves.)
Thoughts?
ps. We are running cassandra 2.1
