Divide data to increase read performance - Cassandra

We have 8 Cassandra nodes in a 2-DC configuration, around 250k record insertions per second, and a TTL of 1 year. Recently we upgraded to 3.0.14. Our main problem is slow read performance.
To improve this we increased the compaction period from 1 hour to 1 week (a full compaction still needs to run). Now we are thinking of installing new Cassandra nodes and splitting the data into two pieces (6 months each), so that the new nodes are responsible only for querying the first 6 months. Is it possible to move data like this, and if so, what is the procedure to follow? Thanks.
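For illustration, one common alternative to physically moving the old data onto dedicated nodes is to bucket the partition key by time and let a time-window compaction strategy keep each period's data in its own SSTables. A minimal sketch in CQL, with hypothetical keyspace, table, and column names (TWCS is available in 3.0.14):

-- Hypothetical time-bucketed table: the month bucket in the partition key means
-- a query for "the first 6 months" only touches a known set of partitions, and
-- TimeWindowCompactionStrategy groups SSTables by time window instead of
-- relying on periodic full compactions.
CREATE TABLE metrics.events (
    source_id    text,
    month_bucket text,         -- e.g. '2017-06', computed by the writer
    event_time   timestamp,
    payload      blob,
    PRIMARY KEY ((source_id, month_bucket), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC)
  AND default_time_to_live = 31536000   -- 1 year TTL, as in the question
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_unit': 'DAYS',
                    'compaction_window_size': '7'};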

Related

Adding a new node in the topology after the given time interval

I am writing an algorithm for which I want to add new nodes to the topology every 1 minute for 5 minutes. Initially the topology contains 5 nodes, so after 5 minutes it should have 10 nodes in total. How can I implement this in the simulation script? Which behaviour would be best suited for this?

Can we modify the number of tablets/shards per table after we create a universe?

As described in this example, each tserver started with 12 tablets because we set the number of shards to 4.
When we added a new node, the number of tablets per tserver became 9; it seems the total number of tablets, which is 36, will not increase.
My question is:
How many nodes could we add while we have 36 total tablets (in this example)?
And is it possible to increase the shard count in a running universe to be able to add more nodes?
How many nodes could we add while we have 36 total tablets (in this example)?
In this example, you can expand to 12 nodes (each node would end up with 1 leader and 2 followers).
Reasoning: There are 36 total tablets for this table and the replication factor is 3, so there will be 12 tablet leaders and 24 tablet followers. Leaders are responsible for handling writes and reads (unless you're doing follower reads; let's assume that is not the case). If you go to 12 nodes, each node would have at least one leader and be doing some work.
Ideally, you should create enough tablets upfront so that you end up with 4 tablets per node eventually.
And is it possible to increase the shard count in a running universe to be able to add more nodes?
This is currently not possible, but it is being worked on and is getting close to the finish line; it is expected to be released in Q1 2020. If you are interested in this feature, please subscribe to this GitHub issue for updates.
Until that is ready, as a workaround, you can create the table pre-split into a sufficient number of tablets.
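As a sketch of that workaround, assuming the YSQL API and a hypothetical table (the YCQL equivalent is the tablets table property, and 48 here is an arbitrary choice):

-- Pre-split the table into more tablets than the current node count needs,
-- so that adding nodes later still leaves every node with leaders to serve.
CREATE TABLE events (k BIGINT PRIMARY KEY, v TEXT)
    SPLIT INTO 48 TABLETS;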

How to avoid selecting too much data

What we are doing is pretty much like
putting time series data into Cassandra
running a Spark aggregation job every hour and putting the aggregated data back into Cassandra
One of the problems we found is: if the hourly job keeps failing, for example for 1 AM ~ 2 AM, 2 AM ~ 3 AM, and 3 AM ~ 4 AM (or more), then the next run will aggregate the data from 1 AM to 5 AM (the last success time is recorded in Cassandra). The issue comes at that hour, because there are now 4 (or more) hours of data, which is far larger than one hour of data, and selecting that much data from Cassandra into a dataframe results in an OutOfMemory exception.
Well, adding memory to the Spark executors is one way to fix this. However, considering it's an edge case, I'm wondering if there's any mature pattern or architecture to deal with this issue.

DateTieredCompactionStrategy sub-properties in Cassandra

Have some questions on DateTieredCompactionStrategy sub-properties in Cassandra
The blog http://www.datastax.com/dev/blog/datetieredcompactionstrategy, says:
Base_time_seconds: “This is the size of the first window, defaults to 3600 seconds (1 hour). The rest of the windows will be min_threshold (default 4) times the size of the previous window.”
With the default value of 3600, i.e. 1 hour, for base_time_seconds, does it mean the first compaction triggers at the 1st hour, the next at 4, 16, 64 hours, and so on?
max_window_size_seconds: Default 1 day. Does it mean my compaction runs at least once a day?
tombstone_compaction_interval: Default 10 days.
If my sstable is, say, 7 days old but is full of expired data due to a TTL of 1 day and gc_grace_seconds of 1 day, does that mean my sstables are still not removed?
Does tombstone_compaction_interval take priority over the TTL and gc_grace_seconds?
min_threshold: When compaction runs and the number of sstables is < min_threshold, is my compaction not run?
No - DTCS finds the sstables within one of those windows (1h, 4h, ...) and, if it thinks it needs to compact them together (IIRC, for the first window there must be more than min_threshold sstables, for the rest 2 or more), it will.
No. The number of compactions depends only on the number of flushed/streamed sstables. The max window size is just to make sure we don't get huge older windows, which hurt when bootstrapping/streaming etc.
No, with DTCS you should not touch tombstone_compaction_interval - the whole idea is that once the whole sstable has expired, the entire thing gets dropped automatically without compaction.
Correct, but it is per window, so you could have 100 sstables in separate windows with DTCS.
Note that DTCS is deprecated and you should really be using TWCS instead. If you are on Cassandra < 3.0, you can just build the jar file and drop it into the lib directory to use it. https://github.com/jeffjirsa/twcs https://issues.apache.org/jira/browse/CASSANDRA-9666
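For example, switching an existing table from DTCS to TWCS is just a compaction-options change; a sketch with placeholder keyspace/table names and an assumed 1-day window (on versions older than 3.0 using the jar above, the class name carries that jar's package prefix instead):

ALTER TABLE ks.tbl
  WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'DAYS',
                     'compaction_window_size': '1'};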

Azure SQL Database replication - 9 1/2 hour delay + can't remove the replica

Note
This is getting quite long, so I will try to re-edit parts throughout the day.
These databases are no longer active, which means I can play with them to work out what is going wrong.
The only thing left to answer: given two databases running on Azure SQL Database at S3 (100 DTU), should any secondary ever get significantly behind the primary database, even while the DTU is hammered at 100% for over half the day? The reason for the DTU being hammered was mostly IO writes.
The Start: a few problems.
DTU limits were hit on Monday, Tuesday, and to some extent Wednesday, for a significant amount of time (3 PM UTC - 6 AM UTC).
Problem 1 (lag in data on the secondary): This appeared to have caused a data lag on the secondary of about 9 1/2 hours. The servers were effectively being spammed with updates, causing a lot of IO writes - 6-8 million records on one table over a 24-hour period, for example. This problem drove the reason for the post:
Shouldn't these be much more in sync?
The data became out of sync on Monday morning and stayed out of sync until Friday. On Thursday some new databases were spun up to replace these standard SQL databases, so these were left to rot - well, for me to experiment with at least.
The application causing the redundant queries couldn't be fixed for a few days (I doubt they will ever fix it now), so that left changing the instance type. That was attempted on the current instance, but the instance must disconnect from all standard replicas before it can move up to the premium performance tier. This led to the second problem (see below): the replica taking its time to be removed. It began on Wednesday morning and did not complete until Friday.
Problem 2 (can't remove the replica):
(Solved itself after two days)
Disconnecting the secondary database began ~ Wed 8 UTC (when the primary was at about 80 GB), with the secondary about 11 GB behind in size at that point.
Setup
The databases (primary and secondary) are S3, geo-replicated (North + West Europe).
There is an audit log table (which I read from the secondary, normally with an SQL query), but this is currently 9 1/2 hours behind the last entry on the primary database. Running the query again on the secondary a few seconds later, it is slowly catching up, but the progress appears to be relative to the refresh rather than actually playing catch-up.
Both primary and secondary (read-only) databases are S3 (about to be bumped to P2).
The Azure documentation states:
Active Geo-Replication (readable secondaries) is now available for all databases in all service tiers. In April 2017, the non-readable secondary type will be retired and existing non-readable databases will automatically be upgraded to readable secondaries.
How has the secondary got so far behind? Seconds to minutes would be acceptable; hours, not so much. The link above describes it as "slightly":
While at any given point, the secondary database might be slightly behind the primary database, the secondary data is guaranteed to always be transactionally consistent with changes committed to the primary database.
Given the secondary is about to be destroyed and replaced by a higher tier (replicas need to be removed when upgrading from Standard -> Premium), I'm curious whether it will happen again, as well as what the definition of "slightly" might be in this instance.
Notes: The primary did reach maximum DTU for a few hours but this didn't harm performance significantly, which is when the 9-hour difference was noticed.
Stats
Update for TheGameiswar:
I can't query it right now as it started removing itself as a replica (to be able to move the primary up to the P2 level), but that began hours ago at ~8.30 UTC and 5 hours later it is still going. I think it's quite broken now.
Query - nothing special:
SELECT TOP 1000 [ID]
,[DateCreated]
,[SrcID]
,[HardwareID]
,[IP_Port]
,[Action]
,[ResponseTime]
,[ErrorCode]
,[ExtraInfo]
FROM [dbo].[Audit]
order by datecreated desc
I can't compare the tables anymore as it's quite stuck and refusing connections.
The 586 hours (10-14 GB) are inserts into the primary database audit table. It was similar yesterday when I noticed the 9 1/2 hour difference in data.
When the attempt to remove the replica was made (another person started the process), there was about a 10 GB difference in size.
I can't compare the data, but I can show the DB size at an equivalent time:
Primary DB Size (last 24 hours):
Secondary DB Size (last 24 hours):
Primary database size - week view
Secondary database size - week view
As mentioned, it is being removed as a replica, but it is still playing catch-up with the DB size, as you can see in the charts above.
Stop replication errored for serverName: ---------, databaseName: Cloud4
ErrorCode: 400
ErrorMessage: A terminate operation is already in progress for database 'Cloud4'.
Update 2 - Replication - dm_continuous_copy_status
Replication is still removing ... moving on...
select * from sys.dm_continuous_copy_status
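For reference, sys.dm_continuous_copy_status also reports the lag directly, so a sketch of that kind of check (run against the primary) would be:

SELECT partner_server,
       partner_database,
       replication_state_desc,
       last_replication,      -- last transaction replicated to the secondary
       replication_lag_sec    -- how many seconds the secondary is behind
FROM sys.dm_continuous_copy_status;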
sys.dm_exec_requests
Querying from Thursday
Appears to be quite empty. The only record being
Replica removed itself at last.
The replica removed itself after 2 days, at the 80 GB mark that I predicted. It waited to replay the data in the transactions (up to the point at which it was removed as a replica) before it would remove itself.
A week after, on the P2 databases
DTU is holding between 20-40% at busy periods and the database is currently taking ~12 million data entries every 24 hours (a similar amount of reads, but the writes are much heavier on the indexes and the table), roughly 70-100% more inserts than a week ago. This time the replica is not struggling to keep up, which is good, but that is likely because it is not reaching 100% DTU.
Conclusion
Replicas are useful, but not in this case. This one caused degraded performance for several days that could have been averted by a simple increase to the performance tier until the cause of the problem could be fixed. If the replica looks like it is dragging behind and you are on the border of Basic -> Standard or Standard -> Premium, it would be safest to remove the replica as soon as possible and move up a tier.
Now we are on P2. The database is growing by 20 GB a day... and they say they have fixed the problem that sends 15 thousand redundant updates per minute. Thanks to Query Performance Insight for highlighting that, as querying the table is extremely painful on the DTU (even querying the last minute of data in that table is bad on the DTU, with ~15 thousand new records arriving every minute).
62617: insert ...
62618: select ...
62619: select ...
A positive from the above is that the insert statements have gone from 586 hours of combined time on S3 (7.5 million rows inserted per day) to 3 hours on P2 (12.4 million rows per day), an extremely significant decrease in processing time. The table did start empty on Thursday, but it has surpassed the previous table's size in a week, whereas the previous one took a few months to get there.
It's doing well on the new tier. It should be at ~5% if the applications were using the database responsibly, and the secondary is up to date.
Spoke too soon. Now on P2
Someone thought it was a good idea to run an SQL query that repeats itself, deleting a thousand rows at a time, against a table receiving 12 million new rows a day.
Between 10 AM and 12 AM it managed to remove about 5.2 million rows, and now the database is showing signs of being in the same state as last week. I'm curious whether that is what happened before.
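For what it's worth, that job was presumably something along the lines of the classic batched-delete loop sketched below (the retention filter is made up here); each pass deletes 1,000 rows and the loop keeps the DTU busy until nothing is left to delete:

DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    -- hypothetical retention filter; the real predicate is unknown
    DELETE TOP (1000) FROM dbo.Audit
    WHERE DateCreated < DATEADD(DAY, -30, GETUTCDATE());
    SET @rows = @@ROWCOUNT;
END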
