Cassandra debug log analysis

I have a Cassandra debug.log. It contains many SELECT * queries that no application issues. The applications request specific fields in their SELECT queries, and the logged queries also carry a LIMIT 5000 clause that I am fairly sure no application uses. Are these queries issued by Cassandra internally? The debug log is full of them. The applications use the gocql driver to connect to Cassandra.
<SELECT * FROM table_name WHERE id = 0 LIMIT 5000>, was slow 45 times: avg/min/max 4969/4925/4996 msec - slow timeout 500 msec/cross-node
DEBUG [ScheduledTasks:1] 2021-01-14 18:02:33,271 MonitoringTask.java:152 - 160 operations timed out in the last 5004 msecs:
<SELECT * FROM table_name WHERE id = abcd LIMIT 5000>, total time 7038 msec, timeout 5000 msec/cross-node
<SELECT * FROM table_name WHERE id = efgh LIMIT 5000>, total time 5793 msec, timeout 5000 msec/cross-node
<SELECT * FROM table_name WHERE id = hijk LIMIT 5000>, total time 5289 msec, timeout 5000 msec/cross-node
<SELECT * FROM table_name WHERE id = lmnop LIMIT 5000>, total time 5826 msec, timeout 5000 msec/cross-node
<SELECT * FROM table_name WHERE id = qrst LIMIT 5000>, total time 6006 msec, timeout 5000 msec/cross-node
<SELECT * FROM table_name WHERE id = uvwx LIMIT 5000>, total time 5905 msec, timeout 5000 msec/cross-node
<SELECT * FROM table_name WHERE id = yzabc LIMIT 5000>, total time 5217 msec, timeout 5000 msec/cross-node
.
..
....
.....
... (110 were dropped)

All those queries are coming from your application. They are not done by Cassandra.
Those messages from MonitoringTask are logged by a feature in Cassandra 3.10+ called slow query logging (CASSANDRA-12403). I've previously explained it in this post -- https://community.datastax.com/questions/7835/.
The slow query logging aggregates queries which took longer than slow_query_log_timeout_in_ms (default is 500ms) into groups of 5-second "windows". As part of the aggregation, columns are not enumerated in the logging and are instead replaced with an asterisk (*) so they can be easily grouped.
In addition, drivers have paging enabled. When your application does not set a page size, the driver defaults to a page size of 5000, and that is the LIMIT 5000 which gets logged in the slow query message you posted. Cheers!
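To see which aggregated queries dominate the log, the "was slow N times" entries can be pulled apart with a regular expression. This is only a minimal sketch based on the single entry format shown in the question; the names SLOW_ENTRY and parse_slow_entries are made up for illustration, and other entry formats (for example the "total time ... timeout ..." lines) are not handled.

```python
import re

# Matches the aggregated slow-query format quoted in the question, e.g.
# "<SELECT ...>, was slow 45 times: avg/min/max 4969/4925/4996 msec - ..."
SLOW_ENTRY = re.compile(
    r"<(?P<query>.+?)>, was slow (?P<count>\d+) times: "
    r"avg/min/max (?P<avg>\d+)/(?P<min>\d+)/(?P<max>\d+) msec"
)


def parse_slow_entries(lines):
    """Return one dict per aggregated slow-query entry found in lines."""
    entries = []
    for line in lines:
        match = SLOW_ENTRY.search(line)
        if match is None:
            continue  # not an aggregated slow-query line
        entry = {"query": match.group("query")}
        for key in ("count", "avg", "min", "max"):
            entry[key] = int(match.group(key))
        entries.append(entry)
    return entries


# The entry from the question, split across two string literals for width.
sample = [
    "<SELECT * FROM table_name WHERE id = 0 LIMIT 5000>, was slow 45 times: "
    "avg/min/max 4969/4925/4996 msec - slow timeout 500 msec/cross-node",
]
entries = parse_slow_entries(sample)
```

Sorting the result by count or max then points at the partitions that are slow most often.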

Related

How to limit the number of results by flexible search in hybris

I want to limit the number of results using JobSearchRestriction. I want to limit not by a condition but by a hard-coded number, something like "LIMIT 10".
Is it possible to do that in hybris using JobSearchRestriction?

Try this
SELECT * FROM {Product} LIMIT 10
or
SELECT TOP 10 * FROM {Product}
For Oracle
SELECT * FROM {Product} WHERE rownum <= 10
Through the API
final FlexibleSearchQuery query = new FlexibleSearchQuery("SELECT * FROM {Product}");
query.setCount(10);

If you run a scan on DynamoDB with an AttributesToGet argument are you charged for the data footprint of every item or just the requested attributes?

Suppose you run the following code on a table with 1,000 items that are 400KB in size, and suppose that the attribute name for 'column1' + the actual data are 10 bytes:
import boto3

def get_column_1_items():
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('DynamoTable')
    resp = table.scan(AttributesToGet=['column1'])
    return resp['Items']
Will you be charged for retrieving 1,000 * 400 KB = 400 MB of data, or for retrieving 1,000 * 10 B = 10 KB, by running this scan?

Based on the doc,
Note that AttributesToGet has no effect on provisioned throughput consumption. DynamoDB determines capacity units consumed based on item size, not on the amount of data that is returned to an application.
You will be charged for retrieving 400 MB of data.
Also be aware that a single Scan request can retrieve a maximum of 1 MB of data. So in order to retrieve 400 MB of data, you need multiple requests.
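The "multiple requests" part can be sketched as a loop that follows LastEvaluatedKey and passes it back as ExclusiveStartKey until no key is returned. This is a minimal sketch, not a complete client: scan_all is a made-up helper name, and FakeTable is a hypothetical stand-in so the loop can run without AWS; with boto3 you would pass the real Table object instead.

```python
def scan_all(table, **scan_kwargs):
    """Collect items from every page of a Scan by following LastEvaluatedKey."""
    items = []
    kwargs = dict(scan_kwargs)
    while True:
        resp = table.scan(**kwargs)
        items.extend(resp.get("Items", []))
        last_key = resp.get("LastEvaluatedKey")
        if last_key is None:
            return items  # no more pages
        # Resume the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Hypothetical stand-in for a boto3 Table; serves pre-built pages."""

    def __init__(self, pages):
        self._pages = pages
        self.calls = 0

    def scan(self, **kwargs):
        start = kwargs.get("ExclusiveStartKey", {"page": 0})["page"]
        self.calls += 1
        resp = {"Items": self._pages[start]}
        if start + 1 < len(self._pages):
            resp["LastEvaluatedKey"] = {"page": start + 1}
        return resp


fake = FakeTable([[{"column1": "a"}], [{"column1": "b"}], [{"column1": "c"}]])
all_items = scan_all(fake, AttributesToGet=["column1"])
```

Every page still consumes capacity based on the full size of the items scanned, regardless of the attribute filter.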

cassandra aggregate query timeout

I am new to Cassandra and am running a user-defined aggregate on a 3-node Cassandra cluster on my local machine.
The issue is that when I run this aggregate on a smaller data set, the result is fine and as expected.
But when the data set is too large, the query fails with this error:
OperationTimedOut: errors={'127.0.0.1': 'Client request timeout. See Session.execute_async'}, last_host=127.0.0.1
I found the questions below, which are similar to my issue but are not answered:
How to set a timeout and throttling rate for a large user defined aggregate query
Cassandra CQLSH OperationTimedOut error=Client request timeout. See Session.execute[_async](timeout)
I have modified cassandra.yaml, and the time limits are now:
read_request_timeout_in_ms: 555000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
But this did not help. What is the correct configuration for these timeouts so that the same query runs on a large data set without timing out?
Aggregate code:
CREATE FUNCTION countSessions(datamap map<text,int>, host text)
RETURNS NULL ON NULL INPUT
RETURNS map<text, int>
LANGUAGE java AS
'
    Integer countValue = (Integer) datamap.get(host);
    if (countValue == null) {
        countValue = 1;
    } else {
        countValue++;
    }
    datamap.put(host, countValue);
    return datamap;
';
CREATE OR REPLACE AGGREGATE hostaggregate(text)
SFUNC countSessions
STYPE map<text, int>
INITCOND {};
Thanks and regards,
Vibhav
PS - If anybody chooses to down-vote this question, please do mention the reason for the same in comments.

Azure SQL 100% DTU usage

I'm having some problems with the reliability of Azure SQL servers.
Sometimes a complicated query with subqueries, like the following:
SELECT DISTINCT [DeviceName], name, data.[Addr], [Signal]
FROM (
    SELECT [DeviceName], [Signal],
           MAX([Signal]) OVER (PARTITION BY [Addr]) AS 'MaxSignal',
           [Timestamp], [Addr], [PartitionId], [EventEnqueuedUtcTime]
    FROM [dbo].[mytable]
    WHERE CAST([Timestamp] AS DATETIME) > DATEADD(HOUR, +2, (DATEADD(MINUTE, -10, GETDATE())))
) data
LEFT JOIN mytable ON [dbo].[myreftable].[Addr] = data.[Addr]
WHERE [Signal] = [MaxSignal];
completes almost instantly, as I would expect. At other times, simply running SELECT COUNT(*) FROM mytable
takes upwards of 30 minutes while DTU usage sits at 100%.
Does anyone know a solution to this? Am I doing something completely wrong, or is Azure simply not there yet?

What you pay for is what you get. You will need to look at the top resource consumers in your system. A DTU is nothing but a limit on the CPU, I/O, and memory available to your database.
To troubleshoot DTU problems, I would follow the steps below.
1.) The query below summarizes recent resource usage for all resources (sys.dm_db_resource_stats keeps roughly the last hour of data):
SELECT
(COUNT(end_time) - SUM(CASE WHEN avg_cpu_percent > 80 THEN 1 ELSE 0 END) * 1.0) / COUNT(end_time) AS 'CPU Fit Percent'
,(COUNT(end_time) - SUM(CASE WHEN avg_log_write_percent > 80 THEN 1 ELSE 0 END) * 1.0) / COUNT(end_time) AS 'Log Write Fit Percent'
,(COUNT(end_time) - SUM(CASE WHEN avg_data_io_percent > 80 THEN 1 ELSE 0 END) * 1.0) / COUNT(end_time) AS 'Physical Data Read Fit Percent'
FROM sys.dm_db_resource_stats
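Running the above query shows what fraction of the samples stayed at or below 80% for CPU, log write, and data I/O.
2.) The query below gives an idea of resource usage over time (sys.resource_stats, queried from the master database, keeps about 14 days of history):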
SELECT start_time, end_time,
(SELECT Max(v) FROM (VALUES (avg_cpu_percent), (avg_physical_data_read_percent), (avg_log_write_percent)) AS value(v)) as [avg_DTU_percent]
FROM sys.resource_stats WHERE database_name = '<your db name>' ORDER BY end_time DESC
Once I have enough data to see which metric is the most resource intensive, I can follow the normal troubleshooting approach.
For example, if CPU usage is consistently above 90% over time, I will gather the queries that consume the most CPU and try to tune them.
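The arithmetic behind the two queries can be sketched outside SQL to make it easier to check. This is a minimal sketch with made-up helper names (fit_fraction, dtu_percent) and hypothetical sample values: the first function mirrors the "fit percent" expression, and the second mirrors taking the maximum of the three percentages as the DTU percentage.

```python
def fit_fraction(samples, threshold=80.0):
    """Fraction of samples at or below threshold, like the 'Fit Percent' columns."""
    if not samples:
        raise ValueError("no samples")
    over = sum(1 for value in samples if value > threshold)
    return (len(samples) - over) / len(samples)


def dtu_percent(cpu, data_io, log_write):
    """DTU percentage for one sample: the largest of the three resource percentages."""
    return max(cpu, data_io, log_write)


# Hypothetical avg_cpu_percent samples: two of the four exceed 80%.
cpu_fit = fit_fraction([10.0, 50.0, 85.0, 95.0])
# Hypothetical single sample where data I/O is the bottleneck.
sample_dtu = dtu_percent(cpu=20.0, data_io=90.0, log_write=5.0)
```

A low fit fraction for one resource, with the others high, tells you which resource is driving the DTU figure.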

How to improve cassandra 3.0 read performance and throughput using async queries?

I have a table:
CREATE TABLE my_table (
user_id text,
ad_id text,
date timestamp,
PRIMARY KEY (user_id, ad_id)
);
The lengths of the user_id and ad_id that I use are not longer than 15 characters.
I query the table like this:
Set<String> users = ... // filled somewhere
Session session = ... // built somewhere
BoundStatement boundQuery = ... // built somewhere
// (using query: "SELECT * FROM my_table WHERE user_id=?")
List<Row> rowAds =
    users.stream()
        .map(user -> session.executeAsync(boundQuery.bind(user)))
        .map(ResultSetFuture::getUninterruptibly)
        .map(ResultSet::all)
        .flatMap(List::stream)
        .collect(toList());
The Set of users has approximately 3000 elements, and each user has approximately 300 ads.
This code is executed in 50 threads on the same machine (with different users), using the same Session object.
The algorithm takes between 2 and 3 seconds to complete.
The Cassandra cluster has 3 nodes, with a replication factor of 2. Each node has 6 cores and 12 GB of RAM.
The Cassandra nodes are at 60% of their CPU capacity and 33% of RAM (66% of RAM including page cache).
The querying machine is at 50% of its CPU capacity and 50% of RAM.
How do I improve the read time to less than 1 second?
Thanks!
UPDATE:
After some answers (thank you very much), I realized that I wasn't doing the queries in parallel, so I changed the code to:
List<Row> rowAds =
    users.stream()
        .map(user -> session.executeAsync(boundQuery.bind(user)))
        .collect(toList())
        .stream()
        .map(ResultSetFuture::getUninterruptibly)
        .map(ResultSet::all)
        .flatMap(List::stream)
        .collect(toList());
So now the queries are being done in parallel. This gave me times of approximately 300 milliseconds, a great improvement!
But my question remains: can it be faster?
Again, thanks!

users.stream()
    .map(user -> session.executeAsync(boundQuery.bind(user)))
    .map(ResultSetFuture::getUninterruptibly)
    .map(ResultSet::all)
    .flatMap(List::stream)
    .collect(toList());
A remark: in the 2nd map() you're calling ResultSetFuture::getUninterruptibly. It's a blocking call, and because the stream processes one element at a time, each query is awaited before the next one is sent, so you don't benefit much from asynchronous execution.
Instead, try to transform the list of Futures returned by the driver (hint: ResultSetFuture implements Guava's ListenableFuture interface) into a Future of List.
See: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/Futures.html#successfulAsList(java.lang.Iterable)
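The general pattern, submitting every query before blocking on any result, can be sketched with Python's standard library. This is an analogy using concurrent.futures, not the Java driver or Guava API; fake_query and its rows are hypothetical stand-ins for the per-user query.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_query(user):
    """Hypothetical stand-in for one per-user query; returns that user's rows."""
    return [(user, f"ad{i}") for i in range(2)]


def fetch_all(users, query, max_workers=8):
    """Submit every query first, then wait, so the queries can overlap."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # All queries are in flight before any result is awaited.
        futures = [pool.submit(query, user) for user in users]
        # Only now block on results, in submission order.
        return [row for future in futures for row in future.result()]


rows = fetch_all(["u1", "u2"], fake_query)
```

Blocking inside the same loop that submits the work would serialize the queries again, which is the problem described in the remark above.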
