Cassandra concurrent read and write

I am trying to understand Cassandra concurrent reads and writes. I came across a property called
concurrent_reads (Defaults are 8)
A good rule of thumb is 4 concurrent_reads per processor core. May increase the value for systems with fast I/O storage
So, as per the definition (correct me if I am wrong), 4 threads can access the database concurrently. Let's say I am trying to run the following query:
SELECT max(column1) FROM testtable WHERE duration = 'month';
When I execute just this one query, what role does concurrent_reads play?

That's how many reads can be active at a single time per host. You can see this in the ReadStage row of nodetool tpstats. If Active is pegged at the number of concurrent readers and you have a Pending queue, it may be worth trying to increase it. It's pretty normal for people to set this to ~128 when using decent-sized heaps and SSDs. This is very hardware dependent, so the defaults are conservative.
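The "Active pegged, Pending growing" check described above can be scripted. This is a minimal sketch that assumes the classic `nodetool tpstats` row layout (Pool Name, Active, Pending, Completed, Blocked, All time blocked); the column order can differ between Cassandra versions, and the sample output below is made up for illustration.

```python
def read_stage_saturated(tpstats_output: str, concurrent_reads: int) -> bool:
    """True if ReadStage Active is pegged at concurrent_reads and Pending > 0.

    Assumes the classic `nodetool tpstats` row layout:
    Pool Name, Active, Pending, Completed, Blocked, All time blocked.
    Column order can differ between Cassandra versions.
    """
    for line in tpstats_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "ReadStage":
            active, pending = int(fields[1]), int(fields[2])
            return active >= concurrent_reads and pending > 0
    raise ValueError("ReadStage row not found in tpstats output")


# Hypothetical sample, made up for illustration (not captured from a real node).
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                        32       117       98765432         0                 0
MutationStage                     0         0       12345678         0                 0
"""

print(read_stage_saturated(SAMPLE, concurrent_reads=32))  # True
```

A `True` result only suggests that raising concurrent_reads is worth testing; the answer's point about hardware dependence still applies.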
Keep in mind that the work on this thread is very fast, usually measured in sub-milliseconds. But assume each read takes 1 ms: even with only 4 threads, Little's law gives you a maximum of 4000 (local) reads per second per node (4 / 1 ms = 4 × 1000). With RF=3 and QUORUM consistency, each request does a minimum of 2 reads, so you can divide by 2 to get a theoretical maximum throughput (real life is ickier).
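The Little's law estimate above can be written out as a small calculation. This is a back-of-the-envelope sketch, not a Cassandra API: it assumes a fixed latency per local read and that every QUORUM request costs exactly quorum(RF) local reads.

```python
def quorum(replication_factor: int) -> int:
    # QUORUM = floor(RF / 2) + 1 replicas must answer.
    return replication_factor // 2 + 1


def max_local_reads_per_sec(concurrent_reads: int, read_latency_ms: float) -> float:
    # Little's law: L = lambda * W, so lambda = L / W.
    # L = reads in flight (concurrent_reads), W = time per local read.
    return concurrent_reads * 1000.0 / read_latency_ms


def max_quorum_requests_per_sec(concurrent_reads: int, read_latency_ms: float,
                                replication_factor: int) -> float:
    # Each QUORUM read costs quorum(RF) local reads, so the local-read
    # budget covers proportionally fewer client requests.
    local = max_local_reads_per_sec(concurrent_reads, read_latency_ms)
    return local / quorum(replication_factor)


print(max_local_reads_per_sec(4, 1.0))          # 4000.0
print(max_quorum_requests_per_sec(4, 1.0, 3))   # 2000.0
```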
Aggregation functions (i.e. max) are processed on the coordinator after it fetches the data from the replicas (each replica does a local read and sends a response). They are not directly affected by concurrent_reads, since they are handled in the native transport and request-response stages.
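To make the split between local reads and coordinator-side aggregation concrete, here is a toy model (not Cassandra's actual code path): each replica's ReadStage is modelled as a bounded thread pool of size `CONCURRENT_READS`, and `max()` runs on the coordinator after the responses arrive. The replica names and data are hypothetical, and digest reads and reconciliation are ignored.

```python
from concurrent.futures import ThreadPoolExecutor

CONCURRENT_READS = 4  # toy stand-in for concurrent_reads on each replica

# Hypothetical data: the column1 values each replica holds for duration = 'month'.
REPLICAS = {
    "replica-a": [3, 17, 9],
    "replica-b": [3, 17, 9],
}

# One bounded pool per replica models that replica's ReadStage.
READ_STAGES = {name: ThreadPoolExecutor(max_workers=CONCURRENT_READS) for name in REPLICAS}


def local_read(replica: str) -> list:
    # Runs inside the replica's bounded ReadStage: this is where concurrent_reads applies.
    return list(REPLICAS[replica])


def coordinator_max(replicas: list) -> int:
    # The coordinator fans out the local reads, collects the responses,
    # then computes max() itself, outside any ReadStage.
    futures = [READ_STAGES[r].submit(local_read, r) for r in replicas]
    values = [v for f in futures for v in f.result()]
    return max(values)


print(coordinator_max(["replica-a", "replica-b"]))  # 17
```

A single query only occupies one ReadStage slot per replica it touches; concurrent_reads matters when many such local reads compete at once.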

From Cassandra 2.2 onward, the standard aggregate functions min, max, avg, sum and count are built in. So I don't think concurrent_reads will have any effect on your query.

Related

Best batching size choice

Microsoft's documentation on Cosmos DB says that stored procedures and UDFs are good for batch saves or submits, but it doesn't say anything about batch size or record count.
Batching – Developers can group operations like inserts and submit them in bulk. The network traffic latency cost and the store overhead to create separate transactions are reduced significantly.
Are there any limits? What is the best practice?
For example, let's say I have a million records I'd like to save and each record is 2-4 KB. I don't think it is a good idea to call the SP with 3 GB of data. :)
Should I go for 1000 rows per call (~3 MB), or is that still too big or too small?
P.S. Since a write is promised to complete in less than 15 milliseconds, I would assume that 1000 records should take less than 15 seconds and 5000 records less than 75 seconds, both of which are still acceptable durations.
I would say you should experiment to find the right batch size.
However, remember that sprocs can only run for 5 seconds. See https://learn.microsoft.com/en-us/azure/cosmos-db/programming#bounded-execution for how to handle this from code.
Hope this helps.
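Note that the asker's own estimate (1000 writes × 15 ms = 15 s) already exceeds the 5-second bound, so a bulk-insert sproc has to be able to stop early and report how far it got. Below is a client-side sketch of that resume loop. `execute_sproc` is a hypothetical wrapper around the actual sproc call (not a Cosmos DB SDK function), assumed to return how many records the sproc created before stopping.

```python
from typing import Callable, Sequence


def bulk_insert(records: Sequence[dict],
                execute_sproc: Callable[[Sequence[dict]], int],
                batch_size: int = 1000) -> int:
    """Client-side loop for a bounded-execution bulk-insert stored procedure.

    `execute_sproc` is a hypothetical wrapper around the sproc call: it sends
    one batch and returns how many records the sproc created before it had to
    stop. Whatever was not created is re-submitted in the next call.
    """
    done = 0
    while done < len(records):
        created = execute_sproc(records[done:done + batch_size])
        if created <= 0:
            raise RuntimeError(f"stored procedure made no progress at offset {done}")
        done += created
    return done


calls = []


def fake_sproc(batch):
    # Pretend the 5 s budget only fits 5000 // 15 = 333 writes of 15 ms each.
    calls.append(len(batch))
    return min(len(batch), 5000 // 15)


print(bulk_insert([{"id": str(i)} for i in range(1000)], fake_sproc))  # 1000
print(calls)  # [1000, 667, 334, 1]
```

Because each call only re-sends what was not yet created, this also covers retrying just the remainder after a partial batch.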
There are a few things you need to consider when batching.
When you use a stored procedure to do Batch upsert, it can only work on a single partition.
If each of your records is 4 KB, a single write would consume roughly 4 × 6 = 24 RUs.
A single physical partition can only have a maximum of 10K RU/s, which means that at best you could insert about 416 documents/sec.
This assumes there is no additional indexing cost and no other writes are happening to the same physical partition.
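The RU arithmetic above, as a quick calculation. The 6 RUs per KB figure is the answer's rough rule of thumb (not an official pricing formula), and indexing costs are ignored.

```python
def ru_per_write(doc_kb: float, ru_per_kb: float = 6.0) -> float:
    # The answer's rough rule: ~6 RUs per KB written (indexing not included).
    return doc_kb * ru_per_kb


def max_writes_per_sec(partition_ru_per_sec: float, doc_kb: float) -> int:
    # Whole documents per second that fit in one physical partition's RU/s budget.
    return int(partition_ru_per_sec // ru_per_write(doc_kb))


print(ru_per_write(4))                # 24.0
print(max_writes_per_sec(10_000, 4))  # 416
```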
Batching definitely saves on the network hops you make.
But you should consider the following when you are using batching:
Executing a stored procedure consumes some extra RUs, which come out of the RUs allocated to your partition.
If a stored procedure throws an unhandled error, the whole transaction is rolled back, which means the RUs are used up without adding any data.
So you need good exception handling, and if there is a failure after executing half of the batch, retry only the rest.
Stored procedure code does not necessarily run as efficiently as DocumentDB's internal code.
Also, there is a bounded-execution limit of 5 seconds before the transaction is killed.

Limit max parallelism for a single RDD without decreasing the number of partitions

Is it possible to limit the max number of concurrent tasks at the RDD level without changing the actual number of partitions? The use case is to not overwhelm a database with too many concurrent connections without reducing the number of partitions. Reducing the number of partitions causes each partition to become larger and eventually unmanageable.
I'm re-posting this as an "answer" because I think it may be the least-dirty hack that might get the behavior you want:
Use a mapPartitions(...) call, and at the beginning of the mapping function, do some kind of blocking check on a globally visible state (a REST call, maybe?) that only allows some maximum number of checks to succeed at any given time. Since that will delay the full RDD operation, you may need to increase the timeout on RDD completion to prevent an error.
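A minimal sketch of that gating idea, assuming a hypothetical global limiter exposed as `acquire`/`release` callables (not a Spark API). In a real cluster the limiter must live outside the executors (e.g. the REST service suggested above), since a `threading.Semaphore` is only shared within one process; the demo below uses one locally just to show that at most `max_slots` partitions write at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator


def gated_partition(rows: Iterable, acquire: Callable[[], object],
                    release: Callable[[], object],
                    write: Callable[[list], None]) -> Iterator[int]:
    """mapPartitions-style function that holds one global slot while writing.

    `acquire`/`release` stand in for a hypothetical globally visible limiter
    (e.g. a REST lease service); they are not part of any Spark API.
    """
    acquire()  # blocks until one of the N global slots is free
    try:
        batch = list(rows)
        write(batch)
        yield len(batch)
    finally:
        release()


def run_demo(num_partitions: int = 8, max_slots: int = 2):
    """Simulate partitions as local threads; return (rows written, peak concurrent writers)."""
    slots = threading.Semaphore(max_slots)  # local stand-in for the global limiter
    lock = threading.Lock()
    state = {"in_flight": 0, "peak": 0}

    def write(batch: list) -> None:
        with lock:
            state["in_flight"] += 1
            state["peak"] = max(state["peak"], state["in_flight"])
        time.sleep(0.01)  # pretend to talk to the database
        with lock:
            state["in_flight"] -= 1

    def run_partition(p: int) -> int:
        rows = range(p * 10, p * 10 + 10)
        return sum(gated_partition(rows, slots.acquire, slots.release, write))

    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        total = sum(pool.map(run_partition, range(num_partitions)))
    return total, state["peak"]


print(run_demo())  # peak concurrent writers never exceeds max_slots
```

With PySpark this would be wired up roughly as `rdd.mapPartitions(lambda it: gated_partition(it, lease.acquire, lease.release, write_rows))`, where `lease` and `write_rows` are hypothetical.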
The primary purpose of partitioning in Spark is to provide parallelism, and your requirement is to reduce parallelism! But the requirement is genuine :)
What is the real problem with fewer partitions? Is writing too much data at once causing a problem? If that is the case, you could break down the writing within each partition.
Can you put the data in an intermediate queue and process it in a controlled manner?
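Breaking down the per-partition writing could look like the sketch below: each partition is streamed in fixed-size chunks, with an optional pause between chunks to throttle the load. `write_batch` is a hypothetical database-writing function; the batch size and pause are illustrative defaults, not recommendations.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunked(rows: Iterable, size: int) -> Iterator[List]:
    # Yield lists of at most `size` rows without loading the whole partition.
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def write_partition(rows: Iterable, write_batch: Callable[[List], None],
                    batch_size: int = 500, pause_s: float = 0.0) -> None:
    # `write_batch` is a hypothetical function that writes one batch to the database.
    # The optional pause between batches throttles the load this partition puts on it.
    for i, chunk in enumerate(chunked(rows, batch_size)):
        if i and pause_s:
            time.sleep(pause_s)
        write_batch(chunk)


sizes = []
write_partition(range(1234), lambda batch: sizes.append(len(batch)))
print(sizes)  # [500, 500, 234]
```

In PySpark this would run as `rdd.foreachPartition(lambda it: write_partition(it, write_batch))`, so the number of partitions stays the same while each one writes in smaller pieces.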
One approach might be to enable dynamic allocation, and set the maximum number of executors to your desired maximum parallelism.
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.maxExecutors <maximum>
You can read more about configuring dynamic allocation here:
https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
https://spark.apache.org/docs/latest/configuration.html#scheduling
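To pick a value for maxExecutors, it helps to know how many tasks can run at once: each executor runs at most floor(spark.executor.cores / spark.task.cpus) tasks concurrently. The sketch below does that arithmetic. Note that this caps the whole application, not a single RDD.

```python
def max_concurrent_tasks(max_executors: int, executor_cores: int, task_cpus: int = 1) -> int:
    # Each executor runs at most floor(spark.executor.cores / spark.task.cpus) tasks at once.
    slots_per_executor = executor_cores // task_cpus
    return max_executors * slots_per_executor


# e.g. spark.dynamicAllocation.maxExecutors=4, spark.executor.cores=2, spark.task.cpus=1
print(max_concurrent_tasks(4, 2))  # 8
```

So if each task opens one database connection, this setup would keep the application at roughly 8 concurrent connections.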
If you are trying to control one specific computation, you could experiment with programmatically controlling the number of executors:
https://github.com/jaceklaskowski/mastering-apache-spark-book/blob/master/spark-sparkcontext.adoc#dynamic-allocation

Will Elasticsearch survive this much load or simply die?

We have an Elasticsearch cluster with 3 nodes, and we expect 800-1000 queries per second. If we get a load of around 1000 queries per second, will the Elasticsearch server respond with delays, or will it simply stop working?
All queries are query_string and fuzzy queries (prefix and wildcard queries are not used).
There are a few factors to consider, assuming your network has the necessary throughput:
What's the CPU speed and number of cores for each node?
You should have 2 GHz quad cores at the very least. Also, the nodes should be dedicated to ELK so they aren't busy with other tasks.
How much RAM do your nodes have?
You probably want more than 10 GB at least.
Are your logs filtered and indexed?
Having your logs filtered will greatly reduce the workload generated by the queries. Additionally, filtered logs can make it so that you don't have to query as much with wildcards (which are very expensive).
Hope that helps point in a better direction :)
One immediate suggestion: if you are expecting sustained query rates of 800-1K/sec, you do not want the nodes storing the data (which handle indexing of new records, merging and shard rebalancing) to also have to handle query scatter/gather operations. Consider a client + data node topology where you keep your 3 nodes and add n client nodes (data and master set to false in their configs). The actual value for n will depend on your actual performance; this is something you'll want to determine via experimentation.
Other factors equal or unknown, abundant memory is a good resource to have. Review the Elastic team's guidance on hardware and be sure to link through to the discussion on heap.

Elasticsearch bad indexing time

I am trying to migrate (copy) 35 million documents (a standard amount, not too big) from Couchbase to Elasticsearch.
My Elasticsearch (version 1.3) cluster is composed of 3 A3 (4 cores, 7 GB memory) CentOS servers on Microsoft Azure (each server is equivalent to a large server on Amazon).
I used "timing data flow" indexing to store the documents: each index represents a month and consists of 3 shards and 2 replicas.
When I start the migration script, I see that insertion becomes very slow (about 10 documents per second) and the load average of each server in the cluster jumps above 1.5.
In addition, JVM memory usage climbs to almost 100%, while CPU shows 20% and IOPS peak at 20.
(I used Marvel CNC to get all this data.)
Has anyone faced these kinds of indexing problems in Elasticsearch?
Are there any parameters I should be aware of to increase the Java memory?
Are my cluster specifications good enough to handle 100 index operations per second?
Does the indexing time depend on how big the index is? And should it be that slow?
Thanks, Niv
I am quoting an answer I got on the Google group (link):
A couple of suggestions:
1. Disable replicas before large amounts of inserts (set the replica count to 0), and only enable them again afterwards.
2. Use batching; the actual batch size depends on many factors (doc sizes, network, instance strength).
3. Follow ES's advice on node setup, e.g. allocate 50% of the available memory to the ES Java heap, don't run anything else on that machine, and disable swapping.
4. Your index is already sharded; try spreading it out over 3 different servers instead of having all the shards on one server ("virtual shards"). This will help fan out the indexing load.
5. If you don't specify the document IDs yourself, make sure you use the latest ES; there's a significant improvement in the ID generation mechanism there which could help speed things up.
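The first two suggestions (disable replicas, use batching) can be sketched as request bodies. `DISABLE_REPLICAS` is the body for `PUT /<index>/_settings`, and `bulk_bodies` builds newline-delimited `_bulk` bodies (an action line plus a source line per document, with a trailing newline). The index name, the `doc` type and the `id` field are hypothetical; ES 1.x requires a `_type` in the action metadata, which newer versions reject. Sending the bodies over HTTP is left out.

```python
import json
from typing import Dict, Iterable, Iterator, List, Optional

# Body for PUT /<index>/_settings to drop replicas during the load;
# send the same call with the original count (2 here) once the migration is done.
DISABLE_REPLICAS = {"index": {"number_of_replicas": 0}}


def bulk_bodies(docs: Iterable[Dict], index: str, doc_type: Optional[str] = None,
                batch_size: int = 1000) -> Iterator[str]:
    """Yield newline-delimited _bulk request bodies of at most batch_size docs.

    Each doc becomes an action line plus a source line, and every body ends with
    a newline. ES 1.x needs a _type in the action metadata, hence doc_type.
    """
    lines: List[str] = []
    for n, doc in enumerate(docs, start=1):
        meta = {"_index": index}
        if doc_type:
            meta["_type"] = doc_type
        if "id" in doc:
            meta["_id"] = doc["id"]  # reuse the source key (e.g. the Couchbase key)
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
        if n % batch_size == 0:
            yield "\n".join(lines) + "\n"
            lines = []
    if lines:
        yield "\n".join(lines) + "\n"


docs = ({"id": str(i), "value": i} for i in range(2500))
bodies = list(bulk_bodies(docs, "docs-2014-09", doc_type="doc"))
print(len(bodies))  # 3
```

Each body would be POSTed to `/_bulk`; the right `batch_size` still has to be found by experiment, as the quoted answer says.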
I applied points 1 & 3, and it seems the problems are solved :)
Now I am indexing at a rate of 80 docs per second and the load average is low (0.7 at most).
I have to give credit to Itamar Syn-Hershko, who posted this reply.

Azure Table Storage transaction limitations

I'm running performance tests against ATS, and it's behaving a bit weirdly when using multiple virtual machines against the same table / storage account.
The entire pipeline is non-blocking (await/async) and uses TPL for concurrent and parallel execution.
First of all, it's very strange that with this setup I'm only getting about 1200 insertions per second. This is running on a Large (L) VM, which is 4 cores + 800 Mbps.
I'm inserting 100,000 rows with unique PKs and unique RKs, which should give the best possible distribution.
The following behavior is even more telling:
When I run 1 VM, I get about 1200 insertions per second.
When I run 3 VMs, I get about 730 insertions per second on each.
It's quite humorous to read the blog post where they specify their targets:
https://azure.microsoft.com/en-gb/blog/windows-azures-flat-network-storage-and-2012-scalability-targets/
Single Table Partition– a table partition are all of the entities in a table with the same partition key value, and usually tables have many partitions. The throughput target for a single table partition is:
Up to 2,000 entities per second
Note, this is for a single partition, and not a single table. Therefore, a table with good partitioning, can process up to the 20,000 entities/second, which is the overall account target described above.
What can I do to utilize the 20k per second, and how can I execute more than 1.2k per VM?
--
Update:
I've now also tried using 3 storage accounts, one for each node, and I'm still getting the same performance/throttling behavior, which I can't find a logical reason for.
--
Update 2:
I've optimized the code further and am now able to execute about 1550 per second.
--
Update 3:
I've now also tried US West. The performance is worse there, about 33% lower.
--
Update 4:
I tried executing the code from an XL machine, which has 8 cores instead of 4 and double the memory and bandwidth, and got only a 2% increase in performance, so clearly this problem is not on my side.
A few comments:
You mention that you are using unique PK/RK to get ultimate distribution, but keep in mind that PK balancing is not immediate. When you first create a table, the entire table is served by 1 partition server. So even if you are doing inserts across several different PKs, they will still go to one partition server and be bottlenecked by the scalability target for a single partition. The partition master only starts splitting your partitions among multiple partition servers after it has identified hot partition servers. In your <2 minute test you will not see the benefit of multiple partition servers or PKs. The throughput in the article is targeted at a well-distributed PK scheme with frequently accessed data, which causes the data to be divided amongst multiple partition servers.
The size of your VM is not the issue, as you are not blocked on CPU, memory, or bandwidth. You can achieve full storage performance from a small VM size.
Check out http://research.microsoft.com/en-us/downloads/5c8189b9-53aa-4d6a-a086-013d927e15a7/default.aspx.
I just did a quick test using that tool from a WebRole VM in the same datacenter as my storage account, and from a single instance of the tool on a single VM I achieved ~2800 items per second upload and ~7300 items per second download. This was using 1024-byte entities, 10 threads, and a batch size of 100. I don't know how efficient this tool is or whether it disables Nagle's algorithm, as I was unable to get great results (I got ~1000/second) using a batch size of 1, but at least with a batch size of 100 it shows that you can achieve high items/second. This was done in US West.
Are you using Storage Client Library 1.7 (Microsoft.WindowsAzure.StorageClient.dll) or 2.0 (Microsoft.WindowsAzure.Storage.dll)? The 2.0 library has some performance improvements and should yield better results.
I suspect this may have to do with TCP Nagle.
See this MSDN article and this blog post.
In essence, TCP Nagle is a protocol-level optimization that batches up small requests. Since you are sending lots of small requests, this is likely to hurt your performance.
You can disable TCP Nagle by executing this code when your application starts:
System.Net.ServicePointManager.UseNagleAlgorithm = false;
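For readers outside .NET, the same effect at the raw-socket level is the `TCP_NODELAY` option, which is roughly what `UseNagleAlgorithm = false` arranges for the connections .NET manages. This is an illustrative Python sketch, not part of the Azure storage client.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    # TCP_NODELAY = 1 turns Nagle's algorithm off for this socket, so small
    # writes are sent immediately instead of being coalesced.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


s = make_nodelay_socket()
print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True
s.close()
```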
Are the compute instances and storage account in the same affinity group? Affinity groups ensure that network proximity between the services is optimal and should result in lower latency at the network level.
You can find affinity group configuration under the network tab.
I would tend to believe that the maximum throughput is for an optimized load. For example, I bet you can achieve higher performance using batch requests than with the individual requests you are making now. And of course, if you use GUIDs for your PK, you can't batch in your current test, since all entities in a batch must share the same PK.
So what if you changed your test to batch insert entities in groups of 100 (maximum per batch), still using GUIDs, but for which 100 entities would have the same PK?
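A sketch of how such test data could be grouped into valid batches: entities are bucketed by PartitionKey and split into chunks of at most 100, the entity group transaction limit. Entities are plain dicts here; sending each batch through the storage client's batch operation is not shown.

```python
import uuid
from collections import defaultdict
from typing import Dict, Iterable, List

MAX_BATCH_OPS = 100  # an entity group transaction holds at most 100 operations


def group_into_batches(entities: Iterable[Dict[str, str]]) -> List[List[Dict[str, str]]]:
    """Split entities into valid entity group transactions: every batch shares
    one PartitionKey and holds at most MAX_BATCH_OPS entities."""
    by_pk: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for entity in entities:
        by_pk[entity["PartitionKey"]].append(entity)
    batches = []
    for rows in by_pk.values():
        for i in range(0, len(rows), MAX_BATCH_OPS):
            batches.append(rows[i:i + MAX_BATCH_OPS])
    return batches


# Test data as suggested above: GUID keys, but 100 entities per PartitionKey.
entities = [
    {"PartitionKey": pk, "RowKey": str(uuid.uuid4())}
    for pk in (str(uuid.uuid4()) for _ in range(10))
    for _ in range(100)
]
batches = group_into_batches(entities)
print(len(batches), {len(b) for b in batches})  # 10 {100}
```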
