Cassandra TTL not working - cassandra

I am using the DataStax Java driver with Cassandra. I want a row to be automatically deleted 15 minutes after its insertion, but the row still remains.
My code is below:
Insert insertStatement = QueryBuilder.insertInto(keySpace, "device_activity");
insertStatement.using(QueryBuilder.ttl(15* 60));
insertStatement.value("device", UUID.fromString(persistData.getSourceId()));
insertStatement.value("lastupdatedtime", persistData.getLastUpdatedTime());
insertStatement.value("devicename", persistData.getDeviceName());
insertStatement.value("datasourcename", persistData.getDatasourceName());
The table consist of 4 columns : device (uuid), datasourcename(text), devicename(text), lastupdatedtime (timestamp).
If I query the TTL of a field it shows me 4126 seconds, which is wrong (with a 15-minute TTL it should never exceed 900):
SELECT TTL(devicename) FROM device_activity; -- gives me 4126 seconds

The explanation of TTL is provided in the link below:
https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html
"TTL data has a precision of one second, as calculated on the server. Therefore, a very small TTL probably does not make much sense. Moreover, the clocks on the servers should be synchronized; otherwise reduced precision could be observed because the expiration time is computed on the primary host that receives the initial insertion but is then interpreted by other hosts on the cluster."
After reading this, I was able to resolve the issue by setting the correct time on the corresponding node (machine).
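For reference, a minimal sketch (not the original code; it assumes a driver 3.x Session named session, the keySpace variable above, a hypothetical deviceId, device as the partition key, and is trimmed to two columns for brevity) of inserting with the 15-minute TTL and reading the remaining TTL back. Once the clocks are synchronized, the value read back should always be 900 seconds or less:
import com.datastax.driver.core.Row;
import com.datastax.driver.core.querybuilder.Insert;
import com.datastax.driver.core.querybuilder.QueryBuilder;
import java.util.UUID;

UUID deviceId = UUID.randomUUID(); // hypothetical device id
Insert insert = QueryBuilder.insertInto(keySpace, "device_activity")
        .value("device", deviceId)
        .value("devicename", "sensor-1");
insert.using(QueryBuilder.ttl(15 * 60)); // 900 seconds
session.execute(insert);

// Read the remaining TTL back; with synchronized clocks this is always <= 900.
Row row = session.execute(
        "SELECT TTL(devicename) FROM " + keySpace + ".device_activity WHERE device = ?",
        deviceId).one();
System.out.println("Remaining TTL: " + row.getInt(0) + " seconds");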

Cassandra - IN or TOKEN query for querying an entire partition?

I want to query a complete partition of my table.
My compound partition key consists of (id, date, hour_of_timestamp). id and date are strings, hour_of_timestamp is an integer.
I needed to add the hour_of_timestamp field to my partition key because of hotspots while ingesting the data.
Now I'm wondering what's the most efficient way to query a complete partition of my data?
According to this blog, using SELECT * from mytable WHERE id = 'x' AND date = '10-10-2016' AND hour_of_timestamp IN (0,1,...23); is causing a lot of overhead on the coordinator node.
Is it better to use the TOKEN function and query the partition with two tokens? Such as SELECT * from mytable WHERE TOKEN(id,date,hour_of_timestamp) >= TOKEN('x','10-10-2016',0) AND TOKEN(id,date,hour_of_timestamp) <= TOKEN('x','10-10-2016',23);
So my question is:
Should I use the IN or TOKEN query for querying an entire partition of my data? Or should I use 23 queries (one for each value of hour_of_timestamp) and let the driver do the rest?
I am using Cassandra 3.0.8 and the latest Datastax Java Driver to connect to a 6 node cluster.
You say:
Now I'm wondering what's the most efficient way to query a complete partition of my data? According to this blog, using SELECT * from mytable WHERE id = 'x' AND date = '10-10-2016' AND hour_of_timestamp IN (0,1,...23); is causing a lot of overhead on the coordinator node.
but actually you'd query 24 partitions.
What you probably meant is that you had a design where a single partition was what now consists of 24 partitions, because you added the hour to avoid a hotspot during data ingestion. Noting that in both models (the old one with hotspots and this new one) data is still ordered by timestamp, you have three choices:
1. Run one query at a time.
2. Run two queries the first time, and then one at a time, to "prefetch" results.
3. Run 24 queries in parallel.
CASE 1
If you process data sequentially, the first choice is to run the query for hour 0, process the data and, when finished, run the query for hour 1, and so on. This is a straightforward implementation, and I don't think it deserves more than this.
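As a rough illustration only (assuming a DataStax Java driver 3.x Session named session and the table and key values from the question), the sequential version looks like this:
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;

PreparedStatement ps = session.prepare(
        "SELECT * FROM mytable WHERE id = ? AND date = ? AND hour_of_timestamp = ?");
for (int hour = 0; hour < 24; hour++) {
    // One partition at a time: run the query, then process its rows before moving on.
    for (Row row : session.execute(ps.bind("x", "10-10-2016", hour))) {
        // process row ...
    }
}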
CASE 2
If your queries take more time than your data processing, you could "prefetch" some data. So, the first time you could run 2 queries in parallel to get the data for both hour 0 and hour 1, and start processing the data for hour 0. In the meantime, the data for hour 1 arrives, so when you finish processing the data for hour 0 you can prefetch the data for hour 2 and start processing the data for hour 1, and so on. In this way you can speed up data processing. Of course, depending on your timings (data processing and query times) you should optimize the number of "prefetch" queries.
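A minimal sketch of that prefetch pattern (same assumptions as the previous example; a prefetch depth of one query, which you would tune to your timings):
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;

PreparedStatement ps = session.prepare(
        "SELECT * FROM mytable WHERE id = ? AND date = ? AND hour_of_timestamp = ?");
ResultSetFuture next = session.executeAsync(ps.bind("x", "10-10-2016", 0));
for (int hour = 0; hour < 24; hour++) {
    ResultSet current = next.getUninterruptibly(); // wait for the current hour's data
    if (hour + 1 < 24) {
        next = session.executeAsync(ps.bind("x", "10-10-2016", hour + 1)); // prefetch the next hour
    }
    for (Row row : current) {
        // process row for this hour ...
    }
}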
Also note that the Java Driver does pagination for you automatically, and depending on the size of the retrieved partition, you may want to disable that feature to avoid blocking the data processing, or you may want to fetch more data preemptively with something like this:
ResultSet rs = session.execute("your query");
for (Row row : rs) {
    if (rs.getAvailableWithoutFetching() == 100 && !rs.isFullyFetched()) {
        rs.fetchMoreResults(); // this is asynchronous
    }
    // Process the row ...
}
where you can tune the rs.getAvailableWithoutFetching() == 100 threshold to better suit your prefetch requirements.
You may also want to prefetch more than one partition the first time, so that you ensure your processing won't wait on any data fetching part.
CASE 3
If you need to process data from different partitions together, e.g. you need the data for both hour 3 and hour 6, then you could try to group data by "dependency" (e.g. query both hour 3 and hour 6 in parallel).
If you need all of them, then you should run 24 queries in parallel and join them at the application level (you already know why you should avoid IN for multiple partitions). Remember that your data is already ordered within each partition, so your application-level effort would be very small.
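A sketch of the 24-queries-in-parallel approach (same assumptions as the earlier examples), firing the per-hour queries asynchronously and merging the results in the application:
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import java.util.ArrayList;
import java.util.List;

PreparedStatement ps = session.prepare(
        "SELECT * FROM mytable WHERE id = ? AND date = ? AND hour_of_timestamp = ?");
List<ResultSetFuture> futures = new ArrayList<>();
for (int hour = 0; hour < 24; hour++) {
    futures.add(session.executeAsync(ps.bind("x", "10-10-2016", hour))); // one query per partition
}
List<Row> allRows = new ArrayList<>();
for (ResultSetFuture future : futures) {
    for (Row row : future.getUninterruptibly()) { // rows within each partition are already ordered
        allRows.add(row);
    }
}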

Azure SQL Database replication - 9 1/2 hour delay + can't remove the replica

Note
This is getting quite long so I will try to re-edit parts through the day.
These databases are no longer active, which means I can play with them to work out what is going wrong.
The only thing left to answer: given two databases running on Azure SQL Database at S3 (100 DTU), should any secondary ever get significantly behind the primary database, even while the DTU is hammered to 100% for over half the day? The reason for the DTU being hammered is mostly IO writes.
The Start: a few problems.
DTU limits were hit on Monday, Tuesday and to some extent on Wednesday for a significant amount of time. 3PM UTC - 6AM UTC.
Problem 1 (lag in data on the secondary): This appeared to have caused a lag of about 9 1/2 hours in the secondary's data. The servers were effectively being spammed with updates, causing a lot of IO writes: 6-8 million records on one table over the 24-hour period, for example. This problem drove the reason for the post:
Shouldn't these be much more in sync?
The data became out of sync on Monday morning and stayed out of sync until Friday. On Thursday some new databases were started up to replace these standard SQL databases, so they were left to rot. Well, for me to experiment with at least.
The application causing the redundant queries couldn't be fixed for a few days (I'm doubting they will ever fix it now), so that left changing the instance type. That was attempted on the current instance, but the instance must disconnect from all standard replicas before it can be raised to the Premium tier. This led to the second problem (see below): the replica taking its time to be removed. It began on Wednesday morning and did not complete until Friday.
Problem 2 (can't remove the replica):
(Solved itself after two days)
The process of disconnecting the secondary database began ~ Wed 8 UTC (when the primary was at about 80GB); the secondary was about 11GB behind in size at this point.
Setup
The databases (primary and secondary) are S3, geo-replicated (North + West Europe).
There is an audit log table (which I read from the secondary, normally with an SQL query), but it is currently 9 1/2 hours behind the last entry in the primary database. Running the query again on the secondary a few seconds later, it advances slowly, but appears to be moving in step with the refresh rather than playing catch-up.
Both primary and secondary (read-only) databases are S3 (about to be bumped to P2).
The Azure documentation states:
Active Geo-Replication (readable secondaries) is now available for all databases in all service tiers. In April 2017, the non-readable secondary type will be retired and existing non-readable databases will automatically be upgraded to readable secondaries.
How has the secondary got so far behind? Seconds to minutes would be acceptable; hours, not so much. The link above describes it as only slightly behind:
While at any given point, the secondary database might be slightly behind the primary database, the secondary data is guaranteed to always be transactionally consistent with changes committed to the primary database.
Given that the secondary is about to be destroyed and replaced by a higher tier (replicas need to be removed when upgrading from Standard -> Premium), I'm curious to know whether it will happen again, as well as what the definition of "slightly" might be in this instance.
Notes: The primary did reach maximum DTU for a few hours but this didn't harm performance significantly, which is when the 9-hour difference was noticed.
Stats
Update for TheGameiswar:
I can't query it right now as it started removing itself as a replica (to be able to move the primary up to the P2 level), but that began hours ago at ~8.30 UTC and 5 hours later it is still going. I think it's quite broken now.
Query - nothing special:
SELECT TOP 1000 [ID]
      ,[DateCreated]
      ,[SrcID]
      ,[HardwareID]
      ,[IP_Port]
      ,[Action]
      ,[ResponseTime]
      ,[ErrorCode]
      ,[ExtraInfo]
FROM [dbo].[Audit]
ORDER BY [DateCreated] DESC
I can't compare the tables anymore as it's quite stuck and refusing connections.
The 586 hours (10-14GB) is the combined execution time of the inserts into the primary database's audit table. It was similar yesterday when the 9 1/2 hour difference in data was noticed.
When the attempt to remove the replica was made (another person started the process) there was about a 10GB difference in size.
I can't compare the data, but I can show the DB size at the equivalent time:
Primary DB Size (last 24 hours):
Secondary DB Size (last 24 hours):
Primary database size - week view
Secondary database size - week view
As mentioned ... it is being removed as a replica... but is still playing catch up with the DB size if you observe the charts above.
Stop replication errored for serverName: ---------, databaseName: Cloud4
ErrorCode: 400
ErrorMessage: A terminate operation is already in progress for database 'Cloud4'.
Update 2 - Replication - dm_continuous_copy_status
Replication is still removing ... moving on...
select * from sys.dm_continuous_copy_status
sys.dm_exec_requests
Querying from Thursday
Appears to be quite empty. The only record being
Replica removed itself at last.
The replica has removed itself after 2 days, at the 80GB mark that I predicted. It waited to replay the transaction data (up to the point at which it was removed as a replica) before the replica was actually removed.
A Week after on the P2 databases
DTU is holding between 20-40% at busy periods, and the database is currently performing ~12 million data entries every 24 hours (a similar amount of reads, but writing is much heavier on the indexes and the table): 70-100% more inserts than the previous week. This time the replica is not struggling to keep up, which is good, but that is likely because it is not reaching 100% DTU.
Conclusion
The replicas are useful, but not in this case. This one caused degraded performance for several days that could have been averted by a simple increase in the performance tier until the cause of the problem could be fixed. If the replica looks like it is dragging behind and you are on the border of Basic -> Standard or Standard -> Premium, it would be safest to remove the replica as soon as possible and move to the higher tier.
Now we are on P2. The database is increasing at 20GB a day... and they say they have fixed the problem that sends 15 thousand redundant updates per minute. Thanks to Query Performance Insight for highlighting that, as querying the table is extremely painful on the DTU (even querying the last minute of data in that table is bad for the DTU, with ~15 thousand new records every minute).
62617: insert ...
62618: select ...
62619: select ...
A positive from the above is that the insert statements have moved from 586 hours of combined time on S3 (7.5 million rows per day) to 3 hours on P2 (12.4 million rows per day): an extremely significant decrease in processing time. It did start with an empty table on Thursday, but that has surpassed the previous size in a week, whereas the previous one took a few months to get there.
It's doing well on the new tier. It would be at ~5% if the applications were using the database responsibly, and the secondary is up to date.
Spoke too soon. Now on P2
Someone thought it was a good idea to run a repeating SQL query that deletes a thousand rows at a time, against 12 million new rows a day.
From 10AM to 12AM it managed to remove about 5.2 million rows. Now the database is showing signs of being in the same state as last week. I'm curious if that is what happened now.

Cassandra sstables accumulating

I've been testing out Cassandra to store observations.
All "things" belong to one or more reporting groups:
CREATE TABLE observations (
    group_id int,
    actual_time timestamp,   /* 1 second granularity */
    is_something int,        /* 0/1 bool */
    thing_id int,
    data1 text,              /* JSON encoded dict/hash */
    data2 text,              /* JSON encoded dict/hash */
    PRIMARY KEY (group_id, actual_time, thing_id)
)
WITH compaction = {'class': 'DateTieredCompactionStrategy',
                   'tombstone_threshold': '.01'}
AND gc_grace_seconds = 3600;

CREATE INDEX something_index ON observations (is_something);
All inserts are done with a TTL, and should expire 36 hours after "actual_time". Something that is beyond our control is that duplicate observations are sent to us. Some observations are sent in near real time, others delayed by hours.
The "something_index" is an experiment to see if we can slice queries on a boolean property without having to create separate tables, and it seems to work.
"data2" is not currently being written; it is meant to be written by a different process than the one that writes "data1", but will be given the same TTL (based on "actual_time").
Particulars:
Three nodes (EC2 m3.xlarge)
Datastax ami-ada2b6c4 (us-east-1) installed 8/26/2015
Cassandra 2.2.0
Inserts from Python program using "cql" module
(had to enable "thrift" RPC)
Running "nodetool repair -pr" on each node every three hours (staggered).
Inserting between 1 and 4 million rows per hour.
I'm seeing large numbers of data files:
$ ls *Data* | wc -l
42150
$ ls | wc -l
337201
Queries don't return expired entries, but files older than 36 hours are not going away!
The large number of SSTables is probably caused by the frequent repairs you are running. Repair would normally only be run once a day or once a week, so I'm not sure why you are running repair every three hours. If you are worried about short-term downtime missing writes, then you could set the hint window to three hours instead of running repair so frequently.
You might have a look at CASSANDRA-9644. This sounds like it is describing your situation. Also CASSANDRA-10253 might be of interest.
I'm not sure why your TTL isn't working to drop old SSTables. Are you setting the TTL on a whole row insert, or individual column updates? If you run sstable2json on a data file, I think you can see the TTL values.
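To illustrate the distinction being asked about, here is a hedged sketch (plain CQL strings against the observations table, shown via the Java driver for consistency with the other examples even though the original inserts were from Python; the literal timestamp and the TTL of 129600 seconds = 36 hours are placeholders): an INSERT ... USING TTL applies the TTL to every cell written by that insert, while an UPDATE ... USING TTL applies it only to the cells it touches.
// TTL attached to all cells written by this insert:
session.execute(
    "INSERT INTO observations (group_id, actual_time, thing_id, data1) " +
    "VALUES (1, '2015-09-01 00:00:00', 42, '{}') USING TTL 129600");

// TTL attached only to the data2 cell written by this update:
session.execute(
    "UPDATE observations USING TTL 129600 " +
    "SET data2 = '{}' " +
    "WHERE group_id = 1 AND actual_time = '2015-09-01 00:00:00' AND thing_id = 42");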
Full disclosure: I have a love/hate relationship with DTCS. I manage a cluster with hundreds of terabytes of data in DTCS, and one of the things it does absolutely horribly is streaming of any kind. For that reason, I've recommended replacing it ( https://issues.apache.org/jira/browse/CASSANDRA-9666 ).
That said, it should mostly just work. However, there are parameters that come into play, such as timestamp_resolution, that can throw things off if set improperly.
Have you checked the sstable timestamps to ensure they match timestamp_resolution (default: microseconds)?

Max value for TTL in cassandra

What is the maximum value we can assign to TTL ?
In the Java driver for Cassandra, TTL is set as an int. Does that mean it is limited to Integer.MAX_VALUE (2,147,483,647 secs)?
The maximum TTL is actually 20 years. From org.apache.cassandra.db.ExpiringCell:
public static final int MAX_TTL = 20 * 365 * 24 * 60 * 60; // 20 years in seconds
I think this is verified along both the CQL and Thrift query paths.
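For scale, that constant works out to 20 * 365 * 24 * 60 * 60 = 630,720,000 seconds, well below Integer.MAX_VALUE (2,147,483,647), so the int type itself is not the limiting factor.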
I don't know why you need this, but the default TTL is null in Cassandra, which means the data won't be deleted until you force it.
One very powerful feature that Cassandra provides is the ability to expire data that is no longer needed. This expiration is very flexible and works at the level of individual column values. The time to live (or TTL) is a value that Cassandra stores for each column value to indicate how long to keep the value.
The TTL value defaults to null, meaning that data that is written will not expire.
https://www.oreilly.com/library/view/cassandra-the-definitive/9781491933657/ch04.html
The 20-year max TTL is not the whole story anymore. Read this notice from the Cassandra news file:
PLEASE READ: MAXIMUM TTL EXPIRATION DATE NOTICE (CASSANDRA-14092)
The maximum expiration timestamp that can be represented by the storage engine is 2038-01-19T03:14:06+00:00, which means that inserts with TTL that expire after this date are not currently supported. By default, INSERTS with TTL exceeding the maximum supported date are rejected, but it's possible to choose a different expiration overflow policy. See CASSANDRA-14092.txt for more details.
Prior to 3.0.16 (3.0.X) and 3.11.2 (3.11.x) there was no protection against INSERTS with TTL expiring after the maximum supported date, causing the expiration time field to overflow and the records to expire immediately. Clusters in the 2.X and lower series are not subject to this when assertions are enabled. Backed up SSTables can be potentially recovered and recovery instructions can be found on the CASSANDRA-14092.txt file.
If you use or plan to use very large TTLs (10 to 20 years), read CASSANDRA-14092.txt for more information.

RPC timeout in cqlsh - Cassandra

I have 5 nodes in my ring with SimpleTopologyStrategy and replication_factor=3. I inserted 1M rows using the stress tool. When I try to read the row count in cqlsh using
SELECT count(*) FROM Keyspace1.Standard1 limit 1000000;
It fails with error:
Request did not complete within rpc_timeout.
It works for LIMIT 100000, but fails even for 500000.
All my nodes are up. Do I need to increase the rpc_timeout?
Please help.
You get this error because the request is timing out on the server side. One should know that this is a very expensive operation in Cassandra as others have pointed out.
Still, if you really want to do this you should update your /etc/cassandra/cassandra.yaml file and change the range_request_timeout_in_ms parameter. This will be valid for all your range queries.
Example to set a 40 second timeout:
range_request_timeout_in_ms: 40000
You will probably have to adjust the timeout on the client side as well. When using cqlsh as a client, this is accomplished by creating/updating the cqlsh configuration file at ~/.cassandra/cqlshrc and adding the client_timeout parameter to the connection section.
Example to set a 40 second timeout:
[connection]
client_timeout=40
It takes a long time to read 1M rows, so that is probably why it is timing out. You shouldn't use count like this; it is very expensive since it has to read all the data. Use Cassandra counters if you need to count lots of items.
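If you go the counter route, a minimal sketch (hypothetical keyspace and table names, CQL issued through the Java driver) is to maintain the count at write time and read it back with a single-partition lookup instead of a full scan:
// Hypothetical counter table: one counter row per logical table being counted.
session.execute(
    "CREATE TABLE IF NOT EXISTS ks.row_counts (" +
    "  table_name text PRIMARY KEY," +
    "  row_count counter)");

// Increment the counter alongside each insert into the data table.
session.execute(
    "UPDATE ks.row_counts SET row_count = row_count + 1 WHERE table_name = 'standard1'");

// Reading the count is now a cheap single-partition read.
Row r = session.execute(
    "SELECT row_count FROM ks.row_counts WHERE table_name = 'standard1'").one();
System.out.println("Row count: " + r.getLong("row_count"));
Note that counter updates are not idempotent, so retried writes can skew the count slightly; treat it as an approximate figure.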
You should also check your Cassandra logs to confirm there aren't any other issues - sometimes exceptions in Cassandra lead to timeouts on the client.
If you can live with an approximate row count, take a look at this answer to Row count of a column family in Cassandra.
