I have a message queue table with messages that are sent from user A to user B but not yet read by user B. After user B reads a message, it gets transferred from the queue table to the history table (delete, then insert). As I understand it, a queue table represents an anti-pattern.
What is the right way to approach this problem? Having just one table with an additional column like ReadStatus? But then we run into problems with partition keys (for the queue, ReceiverId alone is enough as the partition key, while the history partition key should be (SenderId, ReceiverId)).
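For illustration, the current two tables look roughly like this (the keys are the ones described above; the other column names are just placeholders):

-- sketch only: key columns as described above, other columns are placeholders
CREATE TABLE message_queue (
    receiver_id  text,
    sent_at      timestamp,
    sender_id    text,
    message_text text,
    PRIMARY KEY ((receiver_id), sent_at)
);

CREATE TABLE message_history (
    sender_id    text,
    receiver_id  text,
    sent_at      timestamp,
    message_text text,
    PRIMARY KEY ((sender_id, receiver_id), sent_at)
);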
You are right: Cassandra is not a good fit for queue-like datasets.
There's a way of modeling your data so it doesn't suffer from having to iterate over tombstones (deleted or expired rows): group the data into bucket tables, and truncate a table when you're done processing its data.
I learnt the technique from Ryan Svihla's blog post, Understanding deletes. Cheers!
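To give a feel for the idea, here is a minimal sketch under the assumption of day-sized buckets: keep one copy of the queue table per day (same columns as the queue sketched in the question, just a bucketed name) and drop a whole bucket once it has been fully processed.

-- hypothetical bucketed variant of the queue table from the question
CREATE TABLE message_queue_20240101 (
    receiver_id  text,
    sent_at      timestamp,
    sender_id    text,
    message_text text,
    PRIMARY KEY ((receiver_id), sent_at)
);

-- once every message in this bucket has been read and copied into the
-- history table, drop the bucket in one go - TRUNCATE writes no tombstones
TRUNCATE message_queue_20240101;

Whether this fits depends on being able to tell when a whole bucket has been drained; Ryan's post goes through the trade-offs.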
I have a table that stores messages which failed to process, and I retry processing those messages every 5 minutes through a scheduler.
When a message is processed successfully, the respective row is deleted from the table so that the same message does not get processed again.
The query used to fetch rows from the table is SELECT * FROM <table_name>, due to which we face tombstone issues when a large number of rows gets deleted.
The table has a timestamp as the partition key and message_name (TEXT) as the clustering key, with a TTL of 7 days and gc_grace_seconds of 2 days.
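For reference, the table definition is roughly as follows (message_body is a placeholder column name; the keys, TTL and gc_grace_seconds are as described above):

-- approximate definition; message_body is a placeholder
CREATE TABLE failed_messages (
    created_at   timestamp,
    message_name text,
    message_body text,
    PRIMARY KEY ((created_at), message_name)
) WITH default_time_to_live = 604800  -- 7 days
  AND gc_grace_seconds = 172800;      -- 2 days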
As per my requirement, I need to delete records, otherwise duplicate records will get processed. Is there any solution to avoid the tombstone issues?
So I see two problems here.
Cassandra is being used as a queuing mechanism, which is an established anti-pattern.
All partitions are being queried with SELECT * FROM <table_name>, because there isn't a WHERE clause.
So with Cassandra, some data models and use cases will generate tombstones. At that point, there's not a whole lot to be done, except to design the data model so as to not query them.
So my thought here would be to partition the table differently.
CREATE TABLE messages (
    day TEXT,
    message_time TIMESTAMP,
    message_text TEXT,
    PRIMARY KEY ((day), message_time))
WITH CLUSTERING ORDER BY (message_time DESC);
With this model, you can query all messages for a particular day. You can also run a range query on day and message_time. Ex:
SELECT * FROM messages
WHERE day='20210827'
AND message_time > '2021-08-27 04:00';
This will build a result set of all messages since 2021-08-27 04:00. Any tombstones generated outside of the requested time range (in this case, before 04:00) will not be queried.
Note that (based on the delete pattern) you could still have tombstones within the given time range. But the idea here is that the WHERE clause limits the "blast radius," so querying a smaller number of tombstones shouldn't be a problem.
Unfortunately, there isn't a quick fix to your problem.
The challenge is that you're using Cassandra as a queue, and that isn't a good idea because you run exactly into this tombstone hell. I'm sure you've seen the blog post by now that talks about queues and queue-like datasets being an anti-pattern for Cassandra.
It is possible to avoid generating lots of tombstones if you model your data differently, in buckets, with each bucket mapping to a table. When you're done processing all the items in a bucket, TRUNCATE the table. This idea came from Ryan Svihla in his blog post Understanding Deletes, where he goes through the idea of "partitioning tables". Cheers!
I am trying to extract data from a table as part of a migration job.
The schema is as follows:
CREATE TABLE IF NOT EXISTS ${keyspace}.entries (
    username text,
    entry_type int,
    entry_id text,
    PRIMARY KEY ((username, entry_type), entry_id)
);
In order to query the table we need the partition keys, the first part of the primary key.
Hence, if we know the username and the entry_type, we can query the table.
In this case the username can be whatever, but the entry_type is an integer in the range 0-9.
When doing the extraction, we iterate over the table 10 times for every username to make sure we try all values of entry_type.
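In other words, for each known username the extraction effectively runs ten queries like these ('alice' is just an example value):

SELECT * FROM ${keyspace}.entries WHERE username = 'alice' AND entry_type = 0;
SELECT * FROM ${keyspace}.entries WHERE username = 'alice' AND entry_type = 1;
-- ... and so on, up to entry_type = 9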
We can no longer find any entries, as we have depleted our list of usernames. But nodetool tablestats reports that there is still data left in the table, gigabytes even. Hence we assume the table is not empty.
But I cannot find a way to inspect the table to figure out which usernames remain in it. If I could inspect it, I could add the remaining usernames to our extraction job and eventually we could deplete the table. But I cannot simply query the table like this:
SELECT * FROM ${keyspace}.entries LIMIT 1
as Cassandra requires the partition keys to make meaningful queries.
What can I do to figure out what is left in our table?
As per the comment, the migration process includes a DELETE operation on the Cassandra table, but the engine will have a delay before actually removing the affected records from disk; this process is controlled internally with tombstones and the gc_grace_seconds attribute of the table. The reason for this delay is fully explained in this blog entry; the tl;dr is that, if the default value is still in place, Cassandra will wait at least 10 days (864,000 seconds) from the execution of the delete before the actual removal of the data.
For your case, one way to proceed is:
Ensure that all your nodes are "Up" and "Healthy" (UN)
Decrease the gc_grace_seconds attribute of your table; the example below sets it to 1 minute, while the default is 864,000 seconds (10 days):
ALTER TABLE ${keyspace}.entries WITH gc_grace_seconds = 60;
Manually compact the table:
nodetool compact ${keyspace} entries
Once the process is completed, nodetool tablestats should be up to date.
To answer your first question, I would like to shed more light on the gc_grace_seconds property.
In Cassandra, data isn't deleted in the same way it is in an RDBMS. Cassandra is designed for high write throughput and avoids reads-before-writes, so in Cassandra a delete is actually an update, and updates are actually inserts. A "tombstone" marker is written to indicate that the data is now (logically) deleted (also known as a soft delete). Records marked with tombstones must eventually be removed to reclaim the storage space, which is done by a process called compaction. But remember that tombstones are eligible for physical deletion / garbage collection only after a specific number of seconds known as gc_grace_seconds. This is a very good blog post to read for more detail: https://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
So possibly you are looking at the table size before gc_grace_seconds has elapsed, and the data is still there.
Coming to your second issue, where you want to fetch some samples from the table without providing partition keys: you can analyze your table's content using Spark. The Spark Cassandra Connector allows you to create Java applications that use Spark to analyze database data. You can follow these articles / documentation to write a quick, handy Spark application to analyze Cassandra data:
https://www.instaclustr.com/support/documentation/cassandra-add-ons/apache-spark/using-spark-to-sample-data-from-one-cassandra-cluster-and-write-to-another/
https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/spark/sparkJavaApi.html
I would recommend not deleting records while you do the migration. Rather, first complete the migration and after that do a quick validation / verification to ensure all records were migrated successfully (this can easily be done using Spark by comparing dataframes from the old and new tables). After successful verification, truncate the old table, as truncate does not create tombstones and is hence more efficient. Note that a huge number of tombstones is not good for cluster health.
I have a table in Cassandra where I am storing events as they come in; different processing is done on the events at different stages. The events are entered into the table with the event occurrence time. I need to get all the events whose event time is less than a certain time and do some processing on them. As it is a range query, it will invariably use scatter-gather. Can someone suggest the best way to do this? This process is going to happen every 5 seconds, and frequent scatter-gather in Cassandra is not a good idea, as it is an overhead on Cassandra itself and will degrade my overall application performance.
The table is as below:
PAS_REQ_STAGE (PartitionKey = (EndpointID, category); ClusteringKey = (Automation_flag, AlertID))
AlertID
BatchPickTime: Timestamp
Automation_Threshold
ResourceID
ConditionID
category
Automation_time: Timestamp
Automation_flag
FilterValue
The event time I referred to above is the BatchPickTime.
A scheduler wakes up at a regular interval, gets all the records whose BatchPickTime is less than the current scheduler wake-up time, and sweeps them off the table to process them.
Because of this use case I cannot provide any specific partition key for the query, as it has to get all the data whose BatchPickTime has passed, i.e. is less than the current scheduler wake-up time.
Hi and welcome to Stack Overflow.
Please post your schema and maybe some example code with your question - you can edit it :)
The Cassandra way of doing this is to denormalize the data if necessary and build your schema around your queries. In your case I would suggest putting your events into a table together with a time bucket:
CREATE TABLE events (event_source int, bucket timestamp,
    event_time timestamp, event_text text,
    PRIMARY KEY ((event_source, bucket), event_time));
The reason for this is that it is very efficient in Cassandra to select a row by its so-called partition key (in this example (event_source, bucket)), as such a query hits only one node. The remainder of the primary key is made up of the clustering columns, which define the order of the data; here all events inside a bucket (for example, one day) are sorted by event_time.
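For example, a scheduler could then read just the current bucket with a single-partition query (the literal values here are only illustrative):

-- all events of one source in one day-bucket, up to a cut-off time
SELECT * FROM events
WHERE event_source = 42
  AND bucket = '2017-02-01'
  AND event_time <= '2017-02-01 10:05:00';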
Try to model your event table in a way that you do not need to make multiple queries. There is a good and free data modeling course from DataStax available: https://academy.datastax.com/resources/ds220-data-modeling
One note - be careful when using Cassandra as a queue; this may be an anti-pattern, and you might be better off with a message queue such as ActiveMQ or RabbitMQ or similar.
The Cassandra database is not very good at aggregation, which is why I decided to do the aggregation before the write. I am storing some data (e.g. transactions) for each user, which I am aggregating by hour. That means for one user there will be only one row for each hour.
Whenever I receive new data, I read the row for the current hour, aggregate it with the received data, and write it back. I use this data to generate hourly reports.
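To illustrate, each incoming record currently triggers a read-modify-write roughly like this (table and column names are simplified placeholders):

-- 1. read the running aggregate for the user's current hour
SELECT total_amount, tx_count FROM hourly_aggregates
WHERE user_id = 'u1' AND hour_bucket = '2021-06-01 13:00:00';

-- 2. add the new transaction to the values just read (in application code)

-- 3. write the new totals back, overwriting the row
INSERT INTO hourly_aggregates (user_id, hour_bucket, total_amount, tx_count)
VALUES ('u1', '2021-06-01 13:00:00', 1250.75, 101);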
This works fine with low-velocity data, but I observed considerable data loss when the velocity is very high (e.g. 100 records for 1 user in a minute). This is because reads and writes happen very fast, and because of the "delayed write" I am not reading back the updated data.
I think my "aggregate before write" approach itself is wrong. I was thinking about UDFs, but I am not sure how they would impact performance.
What is the best way to store aggregated data in Cassandra?
My idea would be:
Model the data in Cassandra in hour-by-hour buckets.
Store the raw data in Cassandra immediately when it arrives.
At hour X, process all the data of hour X-1 and store the aggregated result in another table.
This would allow you to handle very fast incoming rates, process the data only once, and store the aggregates in another table for fast reads.
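A rough sketch of the two tables, assuming a transaction-amount payload and hourly buckets (all names here are illustrative):

-- raw events, written once when they arrive; one partition per user per hour
CREATE TABLE transactions_raw (
    user_id     text,
    hour_bucket timestamp,
    tx_time     timestamp,
    tx_id       timeuuid,
    amount      decimal,
    PRIMARY KEY ((user_id, hour_bucket), tx_time, tx_id)
);

-- aggregates, written once per user when hour X-1 is processed at hour X
CREATE TABLE transactions_hourly (
    user_id      text,
    hour_bucket  timestamp,
    total_amount decimal,
    tx_count     bigint,
    PRIMARY KEY ((user_id), hour_bucket)
);

Reports then read only transactions_hourly, while the raw table absorbs the write load.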
I use Cassandra to pre-aggregate also. I have different tables for hourly, daily, weekly, and monthly aggregates. I think you are probably getting data loss because you are selecting the data before your last inserts have replicated to other nodes.
Look into the counter data type to get around this.
You may also be able to specify a higher consistency level in either the inserts or selects to ensure you're getting the most recent data.
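For example, a counter-based aggregate removes the read-before-write entirely, since each increment is applied server-side (table and column names here are illustrative):

-- every non-key column in a counter table must be a counter;
-- counters are integers, so amounts are stored in cents
CREATE TABLE hourly_counters (
    user_id      text,
    hour_bucket  timestamp,
    tx_count     counter,
    amount_cents counter,
    PRIMARY KEY ((user_id), hour_bucket)
);

-- each incoming record just increments - no SELECT needed first
UPDATE hourly_counters
SET tx_count = tx_count + 1,
    amount_cents = amount_cents + 1299
WHERE user_id = 'u1' AND hour_bucket = '2021-06-01 13:00:00';

One caveat: counter increments are not idempotent, so a retried write can over-count. The consistency level is set per request in the driver (or via cqlsh's CONSISTENCY command), not in the CQL statement itself.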
Imagine something like a blog posting system, built using Azure Table Storage.
A user posts a message and the database records the user's Region, City and Language along with it.
After that, a user is able to browse all other users' posts and filter them by any combination of Region, City and Language - or by none of them, and see all posts.
I see several solutions:
Put each message in 8 different partitions with combinations of Region-City-Language (pros: lightning fast point queries on read; cons: 8 transactions per message on write).
Put each message in 4 different partitions with combinations of Region-City and an ability to do partition scan to filter by languages (pros: less transactions than (1); cons: partition scan, 4 transactions per message).
Put each message in partitions, based on user's ID (pros: single transaction per message; cons: slow table scan and partition scan after that).
The way I see it:
Fast reads, slow (and perhaps costly) writes.
Balanced reads/writes/cost.
Fast writes, slow (but cheap) reads.
By "cost/cheap" i mean pricing based on transactions (not space).
And by "balanced" i mean just among these variants.
Thought about using index tables, but can't see how they help here.
So the question is, perhaps there is another, better way?
I've decided to go with a variation of (1).
The difference is that I won't be storing ALL combinations of Region-Location-Language. Instead I decided to store only the unique ones:
Table: FiltersByRegion
----------------------
Partition: Region
Row: Location.Language
Prop: Message
Table: FiltersByRegionPlace
---------------------------
Partition: Region.Location
Row: Language
Prop: Message
Table: FiltersByRegionLanguage
------------------------------
Partition: Region.Language
Row: Location
Prop: Message
Table: FiltersByLanguage
------------------------
Partition: Language
Row: Region.Location
Prop: Message
Because I'm storing only unique combinations, there won't be a lot of transactions for every post - only for those that are not already present in the database.
In other words, if there are a lot of posts with the same region-location-language, the filter tables won't be updated and transactions won't be spent. Uniqueness checks could use Redis to speed things up a bit.
Filtering is now only a matter of picking the right table.
It depends on your scenarios and read/write pattern. You might want to consider some aspects:
Design for how the records will be queried. Putting them into a "Region-City-Language" partition with the message ID as entity data may help make your queries fast.
Each message may have a unique message ID, with ID-to-message mappings saved in other tables; then you only need to update one table when a message is updated, and the message ID referenced in the other tables stays unchanged.
Leverage PartitionKey and RowKey in your table design and query entities with both keys. For instance: "Region-City-Language" as the partition key and "User" as the row key.
Consider storing duplicate copies of entities for different query scenarios. For example, if you have heavy user-based and language-based queries, you may consider having two tables with "user" and "language" as keys, respectively.
Please also refer to https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/ for a full guide.