How can I increase the response window size of DataStax DevCenter from 300 to 1000? - cassandra

Screenshot of DataStax DevCenter
How can I increase response window size (highlighted in image) from 300 to 1000 in DataStax DevCenter?

Notice at the very top of your image, it says "with limit" followed by a text box containing the number "300." Try increasing that to 1000.
Also, how many glusr_ids are you specifying in your IN clause? Judging by the size of the window, it looks like a lot. Multi-key queries are considered an anti-pattern because of all the extra network traffic they create. That might be why it's taking 3384 ms to return just 300 rows.
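If you really do need all of those keys, one alternative to a single large IN clause is to run one query per partition key and merge the results client-side. Here is a minimal sketch assuming the DataStax Java driver 4.x; the keyspace, table, and glusr_id column names are placeholders, so adjust them to your schema:

    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.AsyncResultSet;
    import com.datastax.oss.driver.api.core.cql.PreparedStatement;
    import com.datastax.oss.driver.api.core.cql.Row;

    import java.util.List;
    import java.util.concurrent.CompletionStage;
    import java.util.stream.Collectors;

    public class PerKeyQueries {
        public static void main(String[] args) {
            // Hypothetical partition keys; in practice these come from your application.
            List<Long> glusrIds = List.of(101L, 102L, 103L);

            try (CqlSession session = CqlSession.builder().build()) {
                PreparedStatement ps = session.prepare(
                    "SELECT * FROM my_keyspace.my_table WHERE glusr_id = ?");

                // One asynchronous query per key instead of a single large IN (...).
                List<CompletionStage<AsyncResultSet>> futures = glusrIds.stream()
                    .map(id -> session.executeAsync(ps.bind(id)))
                    .collect(Collectors.toList());

                // Merge the results client-side (first page of each result only, for brevity).
                for (CompletionStage<AsyncResultSet> future : futures) {
                    AsyncResultSet rs = future.toCompletableFuture().join();
                    for (Row row : rs.currentPage()) {
                        System.out.println(row.getFormattedContents());
                    }
                }
            }
        }
    }

Each per-key query is routed directly to the replicas owning that key, instead of forcing one coordinator to fan out across every partition named in the IN list.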

Related

DataStax DSBulk utility giving errors when loading CSV data to Astra

I am migrating data from EC2 Cassandra Nodes to DataStax Astra (Premium Account) using DSBulk utility.
Command used:
dsbulk load -url folder_created_during_unload -header true -k keyspace -t table -b "secure-connect-file.zip" -u username -p password
This command gives an error after a few seconds. On checking the documentation, I found that I can add --executor.maxPerSecond to this command to limit the load rate.
After this, the load command executed without any error. But if I enter a value over 15,000, the load command starts giving the error again.
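For reference, the rate-limited invocation is simply the command above with that flag appended (15,000 being the highest value that worked for me):

    dsbulk load -url folder_created_during_unload -header true -k keyspace -t table -b "secure-connect-file.zip" -u username -p password --executor.maxPerSecond 15000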
Now, if a table has over 100M entries and only 15,000 entries are migrated every second, it would take hours to complete the migration of a single table (100M ÷ 15,000 ≈ 6,700 seconds), and the complete database would take several days to migrate.
I want to understand what is causing this error and if there is a way to load the data at a higher speed.
What's happening here is that DSBulk is running into the rate limit on the database. At the moment, it looks like the only way to increase that rate limit is to submit a ticket to support.
To submit a ticket, look for the "Other Resources" section of the Astra Dashboard's left nav. Click "Get Support" on the bottom.
When the "Help Center" pops up, click "Create Request" in the lower right corner.
On the next page, click the green/cyan "Submit a Ticket" button in the upper right corner. Describe the problem you're having (rate limit) along with what DSBulk outputs when set for more than 15k/sec.
To add to Aaron's response, you are hitting the default limit of 4K operations per second on your Astra DB.
We contacted you directly last week when we detected that you were hitting the limit but haven't heard back. I've reached out to you directly again today to let you know that I've logged a request on your behalf to increase the limit on your DB. Cheers!

Hazelcast management center shows get latency of 0 ms for replicated map

Setup:
3-member embedded cluster deployed as a Spring Boot jar.
Total keys on each member: 900K.
Get operations are attempted via a REST API.
Background:
I am trying to benchmark Hazelcast's replicated map.
The Management Center UI shows around 10k requests/s being executed, but the average get latency is showing as 0 ms.
I believe it is not showing because it might be in microseconds.
Please let me know how to configure the Management Center UI to show latency in micro/nanoseconds.
The Management Center UI shows around 10k requests/s being executed, but the average get latency is showing as 0 ms.
I believe you're talking about the Replicated Map Throughput Statistics table on the replicated map details page. The Avg Get Latency column in that table shows, on average, how much time it took a cluster member to execute the get operations for the time period selected at the top right corner of the table. For example, if you select Last Minute there, you only see the average time the get operations took in the last minute.
I believe it is not showing because it might be in microseconds.
The cluster sends it in milliseconds (newer cluster versions calculate it in nanoseconds, but it is still sent as milliseconds). However, since a replicated map replicates all data to all members and every member holds the whole data set, get latency is typically very low, as there is no network trip.
I guess the way we render very small metric values confused you. In the Management Center UI, we only show two fractional digits. You can see it in action in the screenshots below:
As you can see, since the value is very low, it is shown as 0. I believe we can do a better job rendering these values though (using a smaller time unit for example). I will create an issue for this on our private issue tracker.
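In the meantime, if you need finer-grained numbers than the UI shows, you can measure get latency on the caller side. Below is a minimal sketch assuming the Hazelcast 4.x/5.x Java API and a hypothetical map name; it reports an average get latency in nanoseconds, independent of how Management Center renders it:

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.replicatedmap.ReplicatedMap;

    public class ReplicatedMapGetLatency {
        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            ReplicatedMap<Integer, String> map = hz.getReplicatedMap("benchmark");

            // Populate some entries (hypothetical key space; adjust to your data model).
            int entries = 100_000;
            for (int i = 0; i < entries; i++) {
                map.put(i, "value-" + i);
            }

            // Time a batch of gets and report the average latency in nanoseconds.
            int gets = 100_000;
            long start = System.nanoTime();
            for (int i = 0; i < gets; i++) {
                map.get(i % entries);
            }
            long elapsed = System.nanoTime() - start;
            System.out.printf("avg get latency: %d ns%n", elapsed / gets);

            hz.shutdown();
        }
    }

Since every member holds the full data set, you should see sub-millisecond values here, which is consistent with the 0 ms shown in the UI.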

Nodetool load and own stats

We are running 2 nodes in a cluster - replication factor 1.
After writing a burst of data, we see the following via nodetool status:
Node 1 - load 22G (owns 48.2%)
Node 2 - load 17G (owns 51.8%)
As the payload size per record is exactly equal, what could lead to a node showing a higher load despite lower ownership?
nodetool status uses the Owns column to indicate the effective percentage of the token range owned by each node, while the Load column shows the size of the data (in GB here) that each node stores on disk.
I don't see anything wrong here. Your data is almost evenly distributed across your two nodes, which is exactly what you want for good performance. A modest gap between load and ownership can be normal, for example right after a write burst, when freshly flushed SSTables have not yet been compacted.

Liferay: huge DLFileRank table

I have a Liferay 6.2 server that has been running for years and is starting to take up a lot of database space, despite limited actual content.
Table            Size    Number of rows
----------------------------------------
DLFileRank       5 GB    16 million
DLFileEntry      90 MB   60,000
JournalArticle   2 GB    100,000
The size of the DLFileRank table seems abnormally big to me (if it is totally normal, please let me know).
While the file ranking feature of Liferay is nice to have, we would not really mind resetting it if it halves the size of the database.
Question: Would a DELETE FROM DLFileRank be safe? (Stop Liferay, run that SQL command, maybe set dl.file.rank.enabled=false in portal-ext.properties, start Liferay again.)
Is there any better way to do it?
Bonus if there is a way to keep recent ranking data and throw away only the old data (not a strong requirement).
Wow. According to the documentation here (Ctrl-F "rank"), I would not have expected the number of entries to be so high - did you configure those values differently?
# Set the interval in minutes on how often CheckFileRankMessageListener
# will run to check for and remove file ranks in excess of the maximum
# number of file ranks to maintain per user per file. Default:
dl.file.rank.check.interval=15

# Set this to true to enable file rank for document library files. Default:
dl.file.rank.enabled=true

# Set the maximum number of file ranks to maintain per user per file. Default:
dl.file.rank.max.size=5
And according to the implementation of CheckFileRankMessageListener, it should be enough to just trigger DLFileRankLocalServiceUtil.checkFileRanks() yourself (e.g. through the scripting console). Why you accumulate that large a number of file ranks is beyond me...
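If you want to trigger that cleanup manually, a one-liner in the server administration scripting console (Groovy) along these lines should be enough; note that the fully qualified package name below is my assumption of where that service lives in Liferay 6.2:

    com.liferay.portlet.documentlibrary.service.DLFileRankLocalServiceUtil.checkFileRanks()

That runs the same pruning logic the scheduled listener uses, keeping at most dl.file.rank.max.size ranks per user per file.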
As you might know, I can never be quoted as saying that direct database manipulation is the way to go - in fact, I refuse to think about the problem that way.

Write Request metric

I'm currently using 1-node cluster with DataStax Opscenter 5.2.1 (Cassandra 2.2.3) installed on Windows.
Not too much data is sent to the cluster, and here is the graph (last 20 minutes) of write requests that I can see in OpsCenter. The graph looks normal and expected to me:
write_requests(20min)
However, when I switched the date range to the last 1 hour, it turns out there were many more write requests (according to the cluster(max) line):
write_requests(1h)
I'm confused - could someone clarify what cluster(max) means in my case? Why are these values so big in comparison with cluster(total) or cluster(min)?
The first graph (20 minute) uses an average. The 1h graph will have 3 lines - min per sample, average, and max per sample.
What you're likely seeing is that something (perhaps OpsCenter itself) is doing a flood of writes, about 700/second for a few seconds. On the 20-minute graph it gets averaged out, but with the min/max lines you'll see the outliers.
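To make the averaging effect concrete, here is a small self-contained sketch with hypothetical numbers (not OpsCenter data): a mostly idle write rate with a short 3-second burst of 700 writes/second. The average over the window stays low, while the max exposes the burst:

    import java.util.Arrays;

    public class RollupExample {
        public static void main(String[] args) {
            // One sample per second over a minute: ~10 writes/s,
            // plus a 3-second burst of 700 writes/s (hypothetical values).
            long[] writesPerSecond = new long[60];
            Arrays.fill(writesPerSecond, 10);
            writesPerSecond[30] = 700;
            writesPerSecond[31] = 700;
            writesPerSecond[32] = 700;

            long min = Arrays.stream(writesPerSecond).min().getAsLong();
            long max = Arrays.stream(writesPerSecond).max().getAsLong();
            double avg = Arrays.stream(writesPerSecond).average().getAsDouble();

            // Prints: min=10 avg=44.5 max=700 - the burst only shows up in max.
            System.out.printf("min=%d avg=%.1f max=%d%n", min, avg, max);
        }
    }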
