How to interpret cassandra consistency from tracing a query - cassandra

Could you advice even though my query do not have option
IF EXISTS and IF NOT EXISTS.
still in query tracing result shows both consistency_level "QUORUM" which is what we wanted
but it also shows 'serial_consistency_level': 'SERIAL', what is this behavior
session_id | client | command | coordinator | duration | parameters | request | started_at
--------------------------------------+-------------+---------+-------------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------+---------------------------------
278a2000-3dfb-11e9-b459-9775e6c46fc6 | 10.244.*.* | QUERY | 10.244.*.* | 3338 |
`
`{'bound_var_0_stream_id': '''3c17230d-ea24-4ff7-9599-352fef883b31''',
'bound_var_1_property_name': '''Location:rxRSSI''',
'bound_var_2_shard_date': '2019-03-03T00:00:00.000Z',
'bound_var_3_time': '2019-03-03T21:27:30.749Z',
'bound_var_4_source_id': '''fe30653c-467f-401a-9646-67b10378e1c9''',
'bound_var_5_time_lag': '1328',
'bound_var_6_property_class': '''java.lang.Integer''',
'bound_var_7_property_type': '''ByteType''',
'bound_var_8_property_value': '''-44''',
'consistency_level': 'LOCAL_QUORUM',
'page_size': '5000',
'query': 'INSERT INTO "cloudleaf"."stream_48" ("stream_id", "property_name", "shard_date", "time", "source_id", "time_lag", "property_class", "property_type", "property_value")
VALUES (:"stream_id", :"property_name", :"shard_date", :"time", :"source_id", :"time_lag", :"property_class", :"property_type", :"property_value")
USING TTL 432000',
'serial_consistency_level': 'SERIAL'}

IF EXISTS and IF NOT EXISTS trigger a lightweight transaction, which can have one of the two consistency levels SERIAL or LOCAL_SERIAL. Those are defined as follows:
SERIAL:
Achieves linearizable consistency for lightweight transactions by preventing unconditional updates. This consistency level is only for use with lightweight transaction. Equivalent to QUORUM.
LOCAL_SERIAL:
Same as SERIAL but confined to the datacenter. A conditional write must be written to the commit log and memtable on a quorum of replica nodes in the same datacenter. Same as SERIAL but used to maintain consistency locally (within the single datacenter). Equivalent to LOCAL_QUORUM.
see: https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigSerialConsistency.html

Related

Kappa architecture - conceptual question about historical data processing

This is a question about building a pipeline for data-analytics in a kappa architecture. The question is conceptional.
Assume you have a system that emits events, for simplicity let's assume you just have two events CREATED and DELETED which tell that an item get's created or deleted at a given point in time. Those events contain an id and a timestamp. An item will get created and deleted again after a certain time. Assume the application ensures correct order of events and prevents duplicate events and no event is emitted with the exact same timestamp.
The metrics that should be available in data analytics are:
Current amount of items
Amount of items as graph over the last week
Amount of items per day as historical data
Now a proposal for an architecture for such a scenario would be like this:
Emit events to Kafka
Use kafka as short term storage
Use superset to display live data directly on kafka with presto
Use spark to consume kafka events to write aggregations to analytics Postgres db
Schematically it would look like this:
Application
|
| (publish events)
↓
Kafka [topics: item_created, item_deleted]
| ↑
| | (query short-time)
| |
| Presto ←-----------┐
| |
| (read event stream) |
↓ |
Spark |
| |
| (update metrics) |
↓ |
Postgres |
↑ |
| (query) | (query)
| |
└-----Superset-----┘
Now this data-analytics setup should be used to visualise historical and live data. Very important to note is that in this case the application can have already a database with historical data. To make this work when starting up the data analytics first the database is parsed and events are emitted to kafka to transfer the historical data. Live data can come at any time and will also be progressed.
An idea to make the metric work is the following. With the help of presto the events can easily be aggregated through the short term memory of kafka itself.
For historical data the idea could be to create a table Items that with the schema:
--------------------------------------------
| Items |
--------------------------------------------
| timestamp | numberOfItems |
--------------------------------------------
| 2021-11-16 09:00:00.000 | 0 |
| 2021-11-17 09:00:00.000 | 20 |
| 2021-11-18 09:00:00.000 | 5 |
| 2021-11-19 09:00:00.000 | 7 |
| 2021-11-20 09:00:00.000 | 14 |
Now the idea would that the spark program (which would need of course to parse the schema of the topic messages) and this will assess the timestamp check in which time-window the event falls (in this case which day) and update the number by +1 in case of a CREATED or -1 in case of a DELTED event.
The question I have is whether this is a reasonable interpretation of the problem in a kappa architecture. In startup it would mean a lot of read and writes to the analytics database. There will be multiple spark workers to update the analytics database in parallel and the queries must be written such that it's all atomic operations and not like read and then write back because the value might have been altered in the meanwhile by another spark node. What could be done to make this process efficient? How would it be possible to prevent kafka being flooded in the startup process?
Is this an intended use case for spark? What would be a good alternative for this problem?
In terms of data-throughput assume like 1000-10000 of this events per day.
Update:
Apparently spark is not intended to be used like this as it can be seen from this issue.
Apparently spark is not intended to be used like this
You don't need Spark, or at least, not completely.
Kafka Streams can be used to move data between various Kafka topics.
Kafka Connect can be used to insert/upsert into Postgres via JDBC Connector.
Also, you can use Apache Pinot for indexed real-time and batch/historical analytics from Kafka data rather than having Presto just consume and parse the data (or needing a separate Postgres database only for analytical purposes)
assume like 1000-10000 of this events per day
Should be fine. I've worked with systems that did millions of events, but were mostly written to Hadoop or S3 rather than directly into a database, which you could also have Presto query.

How to select max timestamp in a partition using Cassandra

I have a problem modeling my data using Cassandra. I would like to use it as an event store. My events have creation timestamp. Those event belong to a partition which is identified by an id.
Now I'd like to see most recent event for each id and then filter this ids according to the timestamp.
So I have something like this:
ID | CREATION_TIMESTAMP | CONTENT
---+---------------------------------+----------------
1 | 2018-11-09 12:15:45.841000+0000 | {SOME_CONTENT}
1 | 2018-11-09 12:15:55.654656+0000 | {SOME_CONTENT}
2 | 2018-11-09 12:15:35.982354+0000 | {SOME_CONTENT}
2 | 2018-11-09 12:35:25.321655+0000 | {SOME_CONTENT}
2 | 2018-11-09 13:15:15.068498+0000 | {SOME_CONTENT}
I tried grouping by partition id and querying for max of creation_timestamp but that is not allowed and I should specify partition id using EQ or IN. Additional reading led me to believe that this is entirely wrong way of approaching this problem but I don't know whether NoSQL is not suitable tool for the job or I am simply approaching this problem from wrong angle?
You can easily achieve this by having your CREATION_TIMESTAMP as clustering column and ordered DESC. Then you would query by your id and using limit 1 (which will return the most recent event since the data is order DESC in that partition key).
can you please share your table definition .
by looking at your data you can use ID as partition key and CREATION_TIMESTAMP as clustering column.
you can use select MAX(CREATION_TIMESTAMP) from keyspace.table where ID='value';

Frequent Spikes in Cassandra write latency

In Production cluster , the Cluster Write latency frequently spikes from 7ms to 4Sec. Due to this clients face a lot of Read and Write Timeouts. This repeats in every few hours.
Observation:
Cluster Write latency (99th percentile) - 4Sec
Local Write latency (99th percentile) - 10ms
Read & Write consistency - local_one
Total nodes - 7
I tried to enable trace using settraceprobability for few mins and observed that mostly of the time is taken in internode communication
session_id | event_id | activity | source | source_elapsed | thread
--------------------------------------+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+---------------+----------------+------------------------------------------
4267dca2-bb79-11e8-aeca-439c84a4762c | 429c3314-bb79-11e8-aeca-439c84a4762c | Parsing SELECT * FROM table1 WHERE uaid = '506a5f3b' AND messageid >= '01;' | cassandranode3 | 7 | SharedPool-Worker-47
4267dca2-bb79-11e8-aeca-439c84a4762c | 429c5a20-bb79-11e8-aeca-439c84a4762c | Preparing statement | Cassandranode3 | 47 | SharedPool-Worker-47
4267dca2-bb79-11e8-aeca-439c84a4762c | 429c5a21-bb79-11e8-aeca-439c84a4762c | reading data from /Cassandranode1 | Cassandranode3 | 121 | SharedPool-Worker-47
4267dca2-bb79-11e8-aeca-439c84a4762c | 42a38610-bb79-11e8-aeca-439c84a4762c | REQUEST_RESPONSE message received from /cassandranode1 | cassandranode3 | 40614 | MessagingService-Incoming-/Cassandranode1
4267dca2-bb79-11e8-aeca-439c84a4762c | 42a38611-bb79-11e8-aeca-439c84a4762c | Processing response from /Cassandranode1 | Cassandranode3 | 40626 | SharedPool-Worker-5
I tried checking the connectivity between Cassandra nodes but did not see any issues. Cassandra logs are flooded with Read timeout exceptions as this is a pretty busy cluster with 30k reads/sec and 10k writes/sec.
Warning in the system.log:
WARN [SharedPool-Worker-28] 2018-09-19 01:39:16,999 SliceQueryFilter.java:320 - Read 122 live and 266 tombstone cells in system.schema_columns for key: system (see tombstone_warn_threshold). 2147483593 columns were requested, slices=[-]
During the spike the cluster just stalls and simple commands like "use system_traces" command also fails.
cassandra#cqlsh:system_traces> select * from sessions ;
Warning: schema version mismatch detected, which might be caused by DOWN nodes; if this is not the case, check the schema versions of your nodes in system.local and system.peers.
Schema metadata was not refreshed. See log for details.
I validated the schema versions on all nodes and its the same but looks like during the issue time Cassandra is not even able to read the metadata.
Has anyone faced similar issues ? any suggestions ?
(from data from your comments above) The long full gc pauses can definitely cause this. Add -XX:+DisableExplicitGC you are getting full GCs because of calls to system.gc which is most likely from a silly DGC rmi thing that gets called at regular intervals regardless of if needed. With the larger heap that is VERY expensive. It is safe to disable.
Check your gc log header, make sure min heap size is not set. I would recommend setting -XX:G1ReservePercent=20

Sum aggregation for each columns in cassandra

I have a Data model like below,
CREATE TABLE appstat.nodedata (
nodeip text,
timestamp timestamp,
flashmode text,
physicalusage int,
readbw int,
readiops int,
totalcapacity int,
writebw int,
writeiops int,
writelatency int,
PRIMARY KEY (nodeip, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC)
where, nodeip - primary key and timestamp - clustering key (Sorted by descinding oder to get the latest),
Sample data in this table,
SELECT * from nodedata WHERE nodeip = '172.30.56.60' LIMIT 2;
nodeip | timestamp | flashmode | physicalusage | readbw | readiops | totalcapacity | writebw | writeiops | writelatency
--------------+---------------------------------+-----------+---------------+--------+----------+---------------+---------+-----------+--------------
172.30.56.60 | 2017-12-08 06:13:07.161000+0000 | yes | 34 | 57 | 19 | 27 | 8 | 89 | 57
172.30.56.60 | 2017-12-08 06:12:07.161000+0000 | yes | 70 | 6 | 43 | 88 | 79 | 83 | 89
This is properly available and whenever I need to get the statistics I am able to get the data using the partition key like below,
(The above logic seems similar to my previous question : Aggregation in Cassandra across partitions) but expectation is different,
I have value for each column (like readbw, latency etc.,) populated for every one minute in all the 4 nodes.
Now, If I need to get the max value for a column (Example : readbw), It is possible using the following query,
SELECT max(readbw) FROM nodedata WHERE nodeip IN ('172.30.56.60','172.30.56.61','172.30.56.60','172.30.56.63') AND timestamp < 1512652272989 AND timestamp > 1512537899000;
1) First question : Is there a way to perform max aggregation on all nodes of a column (readbw) without using IN query?
2) Second question : Is there a way in Cassandra, whenever I insert the data in Node 1, Node 2, Node 3 and Node 4.
It needs to be aggregated and stored in another table. So that I will collect the aggregated value of each column from the aggregated table.
If any of my point is not clear, please let me know.
Thanks,
Harry
If you are dse Cassandra you can enable spark and write the aggregation queries
Disclaimer. In your question you should define restrictions to query speed. Readers do not know whether you're trying to show this in real time, or is it more for analytical purposes. It's also not clear on how much data you're operating and the answers might depend on that.
Firstly decide whether you want to do aggregation on read or write. This largely depends on your read/write patterns.
1) First question: (aggregation on read)
The short answer is no - it's not possible. If you want to use Cassandra for this, the best approach would be doing aggregation in your application by reading each nodeip with timestamp restriction. That would be slow. But Cassandra aggregations are also potentially slow. This warning exists for a reason:
Warnings :
Aggregation query used without partition key
I found C++ Cassandra driver to be the fastest option, if you're into that.
If your data size allows, I'd look into using other databases. Regular old MySQL or Postgres will do the job just fine, unless you have terabytes of data. There's also influx DB if you want a more exotic one. But I'm getting off-topic here.
2) Second question: (aggregation on write)
That's the approach I've been using for a while. Whenever I need some aggregations, I would do them in memory (redis) and then flush to Cassandra. Remember, Cassandra is super efficient at writing data, don't be afraid to create some extra tables for your aggregations. I can't say exactly how to do this for your data, as it all depends on your requirements. It doesn't seem feasible to provide results for arbitrary timestamp intervals when aggregating on write.
Just don't try to put large sets of data into a single partition. You're better of with traditional SQL databases then.

Cassandra's lightweight transactions & Paxos consensus algorithm

I have a very particular question regarding Paxos algorithm, which is implemented in Cassandra's lightweight transactions:
What happens if two nodes issue the same proposal at the same time? Do them both get '[applied]: true' ?
For example, consider this table:
ids:
+-------------------+---------------+
| id_name (varchar) | next_id (int) |
+-------------------+---------------+
| person_id | 1 |
+-------------------+---------------+
And this query:
UPDATE ids
SET next_id = 2
WHERE id_name = 'person_id'
IF next_id = 1
If I execute this query, I get as a response:
[{[applied]: True}]
If I execute it again, which then it won't be accepted since next_id != 1, I get:
[{[applied]: False, next_id: 2}]
My question is - what happens if I execute this query from two nodes in parallel. Is there a chance that they both get accepted?
(My use case is described in this stackoverflow question)
The effect of Paxos, is that queries get "linearized": 2 queries executed at the same time on the same row on 2 different nodes will result in one of them being executed after the other. And the second will not be applied. Obviously, both queries must use CAS for this to work. More info here and here.
It's not possible that both queries would be executed concurrently. For each query a proposal is created that will be used to reach consensus based on paxos. This will happen based on the timestamp associated with the proposal where an identical timestamp will still make on of the two proposals fail.

Resources