Scaling Elasticsearch for read-heavy applications - Node.js

We have Node.js (NestJS / Express) based application services on GCP.
We are using Elasticsearch to support full-text search on a blog/news website.
Our requirement is to support a minimum of 2,000 reads per second.
While load testing, we observed that up to a concurrency of 300, Elasticsearch performs well and response times are acceptable.
CPU usage also spikes under this load. But when the load is increased to 500 or 1000, CPU usage drops and response times increase drastically.
What we don't understand is why CPU usage is 80% at a load of 300 but only 30-40% when the load increases. Shouldn't CPU pressure increase with load?
What is the right way to tune Elasticsearch for read-heavy usage? (Our write frequency is just one document every 2-3 hours.)
We have a single index with approximately 2 million documents. The index size is just 6 GB.
The Elasticsearch cluster is deployed on Kubernetes using Helm charts with:
- 1 dedicated master node
- 1 dedicated coordinating node
- 5 dedicated data nodes
Given the small data size, the index has a single primary shard, and the number of replicas is set to 4.
The index refresh interval is set to 30 seconds.
Each data node is allocated 2 GB of RAM, with a 1 GB heap.
Each data node is allocated 1 vCPU.
We tried increasing the search thread pool size to 20 and the queue_size to 10,000, but that didn't help much either.
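For reference, the index-level read settings described above (4 replicas, 30-second refresh) map onto the Elasticsearch settings API roughly as in the sketch below. This only restates the configuration already in place; the index name "articles" and the coordinating-node URL are placeholders, and Node 18+ is assumed for the global fetch.

// Sketch only: re-applies the replica and refresh settings described above.
// "articles" and the host are placeholders, not the real index/service names.
const ES = "http://elasticsearch-coordinating:9200";

async function applyReadHeavySettings(): Promise<void> {
  // number_of_shards is fixed at index creation; replicas and refresh_interval
  // can be changed on a live index via the _settings endpoint.
  const res = await fetch(`${ES}/articles/_settings`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      index: {
        number_of_replicas: 4,   // one full copy per remaining data node
        refresh_interval: "30s", // writes are rare, so a long refresh is fine
      },
    }),
  });
  if (!res.ok) throw new Error(`settings update failed: ${res.status}`);
}

applyReadHeavySettings().catch(console.error);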

Related

Node web app running in Fargate crashes under load with memory and CPU relatively untaxed

We are running a Koa web app in 5 Fargate containers. They are pretty straightforward CRUD/REST APIs with Koa over MongoDB Atlas. We started doing capacity testing and noticed that the Node servers started to slow down significantly with plenty of headroom left on CPU (sitting at 30%), memory (at or below 20%), and Mongo (still returning in < 10 ms).
To further test this, we removed the Mongo operations and just hammered our health-check endpoints. We did see a lot of throughput, but significant degradation occurred at 25% CPU and Node actually crashed at 40% CPU.
Our fargate tasks (containers) are CPU:2048 (2 "virtual CPUs") and Memory 4096 (4 gigs).
We raised our ulimit nofile to 64000 and also set the max-old-space-size to 3.5 GB. This didn't result in a significant difference.
We also don't see significant latency in our load balancer.
My expectation is that CPU or memory would climb much higher before the system began experiencing issues.
Any ideas where a bottleneck might exist?
The main issue here was that we were running containers with 2 CPUs. Since Node effectively uses only 1 CPU, there was always a certain amount of CPU allocation that was never used, and the ancillary overhead never got the container to 100%. So Node would be overwhelmed on its one CPU while the other sat basically idle, which meant our autoscaling alarms never got triggered.
So we adjusted to 1-CPU containers with more horizontal scale-out (i.e., more instances).
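An alternative to dropping to 1-vCPU tasks (not what was done here, just a sketch) is to keep the 2-vCPU tasks and fork one worker per core with Node's built-in cluster module, so the second core does not sit idle:

// Sketch: fork one worker per available CPU with Node's cluster module.
// The bare HTTP handler stands in for the real Koa app.
import cluster from "node:cluster";
import { cpus } from "node:os";
import http from "node:http";

if (cluster.isPrimary) {
  for (let i = 0; i < cpus().length; i++) cluster.fork();
  cluster.on("exit", () => cluster.fork()); // replace workers that die
} else {
  http
    .createServer((_req, res) => res.end("ok"))
    .listen(3000, () => console.log(`worker ${process.pid} listening`));
}

Either way, per-container CPU utilization then reflects the real load again, so CPU-based autoscaling alarms behave as expected.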

Is there a limitation in Solr 7.6.0 that prevents it from using the system's full data transfer capacity?

In order to find out how well Solr 7.6.0 performs on Azure VMs (specifically the replica recovery scenario) I did a performance test.
The objective was to find out how quickly a failed replica node could recover in a shard.
What I did:
- Ran a load test for 48 hours.
- Captured the full recovery time of a replica node by manually restarting it at a point when each shard was hosting 250+ GB of data (close to the RAM of the 32-core VM).
Experiment findings
One of the biggest red flags in this experiment was the time it took for replicas to recover:
It took ~2 hours to recover 350 GB, which works out to 350 / (2*60) = 2.91 GB/minute, or 2.91 * 8 / 60 ≈ 0.39 Gb/s (note the lowercase b: bits, not bytes).
So it seems that Solr 7.6.0 is only able to use about 0.4 Gb/s of bandwidth when transferring data from the leader to a failed replica.
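For completeness, the arithmetic can be reproduced directly from the 350 GB / ~2 hour figures above:

// Recovery throughput implied by the numbers above.
const gigabytesMoved = 350;
const minutesTaken = 2 * 60;
const gbPerMinute = gigabytesMoved / minutesTaken;  // ≈ 2.91 GB/min
const gigabitsPerSecond = (gbPerMinute * 8) / 60;   // ≈ 0.39 Gb/s
console.log(gbPerMinute.toFixed(2), gigabitsPerSecond.toFixed(2));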
To confirm whether these VMs support more bandwidth, I ran iperf3 on them and found that they support 11 Gb/s (Azure says it should be 16 Gb/s, but that is not the deal breaker here).
So the question is: why is Solr only able to utilize 0.4 Gb/s of bandwidth when the VMs can support ~11 Gb/s?

High CPU usage when using parse-server with clusters

I am using the latest version of parse-server (2.3.8) for my slots casino app, and the app communicates with the server frequently, much like a real-time game.
My server configuration is:
- DB server: MongoDB Atlas (M30 - 8 GB RAM, 40 GB storage), MongoDB 3.4
- Parse server: Rackspace (15 GB RAM, 50 GB storage, 8 cores), Node.js 6.10.2
For multi-core support I am using clustering with pm2 (latest version), using 6 of the 8 cores.
Currently there are at most 10-20 concurrent users. Initially, when I start the parse server with clustering -
pm2 start main.js -i 6
the CPU usage is at most 2% - 5%. After one day, CPU usage climbs to 70% - 90% while concurrent users remain at 10-20 or fewer.
Here is the process list from the output of "$ top":
https://i.stack.imgur.com/BY0fV.png
Please help me figure out how to reduce the CPU usage and tell me where I am going wrong.
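For reference, pm2 start main.js -i 6 as used above is equivalent to an ecosystem file along these lines (the app name is an arbitrary placeholder):

// ecosystem.config.js - equivalent of `pm2 start main.js -i 6`
module.exports = {
  apps: [
    {
      name: "parse-server",  // arbitrary label
      script: "main.js",
      instances: 6,          // 6 of the 8 cores, as described above
      exec_mode: "cluster",  // pm2 cluster mode (Node cluster under the hood)
    },
  ],
};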

Apache Solr indexing slowdown over time

We are suffering from a slowdown of Solr's indexing performance over time.
We are trying to measure the indexing throughput of SolrCloud. 8 SolrJ clients send update requests to Solr, and each client has 20 threads (160 threads in total sending requests to Solr). We believe that throughput depends on the disk I/O performance of the Solr machine.
The results are below:
At first, Solr uses its full disk write capacity, but the amount of I/O per unit time decreases over time, and as a result indexing speed also slows down.
Test environment:
- 1 physical machine
- 256 GB RAM
- RAID-0 data volume (11 disks)
- 16-core CPU
- 1 collection
- 1 shard
- replication factor 1
- 60 GB heap size
- Java 8
ZooKeeper is running on the same machine.
CPU, network, and ZooKeeper are not the bottleneck.
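A stripped-down version of the update load described above (many concurrent workers posting JSON documents to the collection's /update handler) might look like the sketch below; the host, collection name, field names, and document shape are placeholders, and Node 18+ is assumed for the global fetch:

// Sketch: sustained concurrent update load against a Solr collection.
const SOLR = "http://solr-host:8983/solr/testcollection";
const CONCURRENCY = 160; // 8 clients x 20 threads in the original test

async function updateWorker(worker: number): Promise<void> {
  for (let i = 0; ; i++) {
    const doc = { id: `w${worker}-${i}`, body_t: `payload ${i}` };
    await fetch(`${SOLR}/update?commitWithin=10000`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify([doc]), // JSON array of documents
    });
  }
}

for (let w = 0; w < CONCURRENCY; w++) updateWorker(w).catch(console.error);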

CPU usage when searching using Solr

We have a SolrCloud setup of 4 shards (one shard per physical machine) holding ~100 million documents. ZooKeeper is on one of those 4 machines. We encounter complex queries combining wildcards and proximity searches, and it sometimes takes more than 15 seconds to get the top 100 documents. Query traffic is very low at the moment (2-3 queries every minute). The 4 servers hosting the cloud have the following specs:
(2 servers -> 64 GB RAM, 24 CPU cores, 2.4 GHz) + (2 servers -> 48 GB RAM, 24 CPU cores, 2.4 GHz).
We are giving each shard 8 GB of JVM memory. Our 510 GB index on SSDs per machine (4 * 510 GB ≈ 2 TB in total) is mapped into the OS disk cache in the remaining RAM on each server, so I suppose RAM is not an issue for us.
Now the interesting thing to note is: when a query is fired at the cloud, only one CPU core is utilized at 100% and the rest are all at 0%. The same behaviour is replicated on all the machines. No other processes are running on these machines.
Shouldn't Solr be doing multi-threading of some kind to utilize the CPU cores? Can I somehow increase CPU consumption per query, since traffic is not a problem? If so, how?
A single request to a Solr shard is largely processed single-threaded (you can set threads for faceting on multiple fields). The rule of thumb is to keep the document count per shard to no more than a few hundred million. You are well below that at 25M/shard, but, as you say, your queries are complex. What you are seeing is simply the effect of single-threaded processing.
The solution to your problem is to use more shards, as all shards are queried in parallel. Since you have a lot of free CPU cores and very little traffic, you might want to try running 10 shards on each machine. It is not a problem for SolrCloud to use 40 shards in total, and the increased merging overhead should be insignificant compared to your heavy queries.
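Concretely, the 40-shard layout suggested above would be created through the Collections API roughly as sketched below; the collection name, config set, and host are placeholders, and the existing data would then need to be reindexed into the new collection (or the existing shards split with SPLITSHARD):

// Sketch: create a 40-shard, replication-factor-1 collection as suggested above.
async function createShardedCollection(): Promise<void> {
  const params = new URLSearchParams({
    action: "CREATE",
    name: "docs_sharded",                    // placeholder collection name
    numShards: "40",                         // 10 shards on each of the 4 machines
    replicationFactor: "1",
    maxShardsPerNode: "10",
    "collection.configName": "docs_config",  // placeholder config set
  });
  const res = await fetch(
    `http://solr-host:8983/solr/admin/collections?${params}`
  );
  console.log(res.status, await res.text());
}

createShardedCollection().catch(console.error);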
