is there a way to pre-start threads on sphinxsearch - multithreading

We have a large Sphinx distributed cluster:
a single frontend distributed index, with 3 backend servers, each with a small local distributed index.
From time to time, during high traffic spikes, we see load averages as high as 80 on the backend servers.
At the same time, I can see that Sphinx has spawned 150-200 threads; the count quickly drops to about 50, then goes back up to 150-200, then back down to 50, and so on.
Is there a way to "prefork" / prestart those threads? e.g. something like Apache's MinSpareServers.
We use dedicated hardware, so it will not be a problem if Sphinx uses more memory when it is idle.
On the backend servers we are using a realtime index, so we cannot switch to the prefork model.
We have full root access, so we can tweak Linux system settings too.

Only in 2.3: workers=thread_pool creates a fixed number of worker threads at startup, which is either 1.5*detected cores or the value of the max_children directive. The threads are kept in a pool. Incoming connections are handled by separate thread(s), which hand the queries to the worker pool. The old workers=threads creates a thread for every query.
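
To illustrate the idea (this is not Sphinx code), here is a minimal Python sketch of a pool whose workers are all started up front, so that a traffic spike finds the threads already running instead of spawning them per query. The sizing rule mirrors the answer: 1.5 times the detected cores, or an explicit cap standing in for max_children. The names pool_size and PrestartedPool are hypothetical.

```python
import os
import queue
import threading
from concurrent.futures import Future


def pool_size(cores=None, max_children=None):
    """Worker count: the explicit cap if given, else 1.5 x detected cores."""
    if max_children:
        return max_children
    cores = cores or os.cpu_count() or 1
    return max(1, int(cores * 1.5))


class PrestartedPool:
    """Fixed pool: every worker thread is created at construction time."""

    def __init__(self, size):
        self.tasks = queue.Queue()
        self.workers = [
            threading.Thread(target=self._run, daemon=True) for _ in range(size)
        ]
        for worker in self.workers:
            worker.start()

    def _run(self):
        while True:
            item = self.tasks.get()
            if item is None:  # shutdown marker
                return
            future, func, args = item
            try:
                future.set_result(func(*args))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, func, *args):
        """Queue a job for the already running workers."""
        future = Future()
        self.tasks.put((future, func, args))
        return future

    def shutdown(self):
        for _ in self.workers:
            self.tasks.put(None)
        for worker in self.workers:
            worker.join()
```

A pool built with PrestartedPool(pool_size()) keeps the thread count constant under load, which is the behaviour the question is asking for; the cost is the memory held by idle workers.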

Related

How to find optimal size of connection pool for single mongo nodejs driver

I am using official mongo nodejs driver with default settings, but was digging deeper into options today and apparently there is an option of maxPoolSize that is set to 100 by default.
My understanding is that a single nodejs process can establish up to 100 connections, thus allowing mongo to handle 100 reads/writes simultaneously, in parallel?
If so, it seems that setting this number higher could only benefit the performance, but I am not sure hence decided to ask here.
Assuming default setup with no indexes, is there a way to determine (based on cpu's and memory of the db) what the optimal connection number for pool should be?
We can also assume that nodejs process itself is not a bottleneck (i.e can be scaled horizontally).
Good question =)
it seems that setting this number higher could only benefit the performance
It does indeed. I mean it seems, and it would be the case for an abstract nodejs process in a vacuum with unlimited resources. Connections are not free, so there are things to consider:
limited connection quota on the server. Atlas in particular, but even self-hosted cluster has only 65k sockets. Remember the driver keeps them open to reuse, and the default timeout per cursor is 30 minutes of inactivity.
single thread clientside. BSON serialisation blocks the event loop and is quite expensive, e.g. see the flamechart in this answer https://stackoverflow.com/a/72264469/1110423 . By blocking the loop, you increase the time the cursors from the previous point remain open, and in the worst case you get performance degradation.
limited RAM. Each connection requires ~1 MB serverside.
Assuming default setup with no indexes
You have at least _id, and you should have more if we are talking about performance
is there a way to determine what the optimal connection number for pool should be?
I'd love to know that too. There are too many factors to consider: not only CPU/RAM, but also data shape, query patterns, etc. This is what dbops are for. A Mongo cluster requires some attention, monitoring and adjustment for optimal operation. In many cases it's more cost efficient to scale up the cluster than to optimise the app.
We can also assume that nodejs process itself is not a bottleneck (i.e can be scaled horizontally).
This is quite a wild assumption. A process cannot scale horizontally. It lives at the OS level: once you have a process descriptor, the process is locked to it until it dies. You can use a node cluster to utilise all CPU cores, and you can even have multiple servers running the same nodejs app and balance the load, but none of them will share connections from the pool. The pool is local to the nodejs process.
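
Because the pool is per process, the connection count the server sees multiplies with every process you add. A rough back-of-the-envelope sketch in Python, using the ~1 MB per connection figure from above; the function names and the example process count are assumptions for illustration only:

```python
def total_connections(processes, max_pool_size=100):
    """Worst case: every process fills its own pool."""
    return processes * max_pool_size


def server_ram_mb(processes, max_pool_size=100, mb_per_connection=1.0):
    """Approximate server-side RAM used by connections alone."""
    return total_connections(processes, max_pool_size) * mb_per_connection


def fits_quota(processes, max_pool_size, connection_quota):
    """True if the worst case stays within the server's connection quota."""
    return total_connections(processes, max_pool_size) <= connection_quota


# Example: 8 node processes with the default pool size of 100.
print(total_connections(8))   # 800
print(server_ram_mb(8))       # 800.0
```

Raising maxPoolSize therefore has to be weighed against how many processes share the same cluster, not just against a single process.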

IIS - Worker threads not increasing beyond certain number even though the CPU usage is less than 40 percent

We are running a web API hosted in IIS 10 on an 8 core machine with 16 GB Memory and running Windows 10, and throwing a load of say 100 to 200 requests per second through JMeter on the server.
Individual transactions take less than 500 milliseconds. When we first apply the load, the IIS thread count grows to around 150-160 (monitored through Resource Monitor and Performance Monitor) and throughput increases to 22-24 transactions per second. Beyond this point, neither throughput nor the number of threads grows, even though CPU usage is below 40 percent and enough physical memory is available at the peak. Resource Monitor does not show any choking at the network or IO level.
The web API is making calls to the Oracle database (3-4 select calls and 2-3 inserts/updates).
We fail to understand what is stopping IIS from growing its thread pool further to process more requests in parallel while all the resources, including processing power, memory and network, are available.
We have placed many performance counters as well, there is no queue build-up (that's probably because jmeter works in synchronous mode)
Also, we have tried to set the min and max thread settings through machine.config as well as through the ThreadPool.SetMinThreads and ThreadPool.SetMaxThreads APIs, but no difference was observed, and it seems those settings are not taking effect.
It is important to mention that we are using synchronous calls/operations (no async and await). Someone has advised us to convert all our blocking IO calls, e.g. database calls, to asynchronous mode to achieve more throughput, but my understanding is that if the thread count cannot grow beyond this level, then making async calls might not help or may even reduce throughput. Since our code base is huge, that would be a very costly activity in terms of time and effort, and we don't want to invest in it until we are sure that it would really help. If someone has anything to share on these two problems, please do.
Below is a screenshot of the performance monitor.

Loading Streaming Data from RabbitMQ to Postgres in Parallel

I'm still somewhat new to Node.js, so I'm not as conversant in how parallelism works with concurrent I/O operations as I'd like to be.
I'm planning a Node.js application to load streaming data from RabbitMQ to Postgres. These loads will happen during system operation, so it is not a bulk load.
I expect throughput requirements to be fairly low to start (maybe 50-100 records per minute). But I'd like to plan the application so it can scale up to higher volumes as the requirements emerge.
I'm trying to think through how parallelism would work. My first impressions of flow and how parallelism would be introduced is:
Message read from the queue
Query to load data into Postgres kicked off, which pushes callback to the Node stack
Event loop free to read another message from the queue, if available, which will launch another query
Repeat
I believe the queries kicked off in this fashion will run in parallel up to the number of connections in my PG connection pool. Is this a good assumption?
With this simple flow, the limit on parallel queries would seem to be the size of the Postgres connection pool. I could make that as big as required for throughput (and that the server and backend database can handle) and that would be the limiting factor on how many messages I could process in parallel. Does that sound right?
I haven't located a great reference on how many parallel I/Os Node will instantiate. Will Node eventually block as my event loop generates too many I/O requests that aren't yet resolved (if not, I assume pg will put my query on the callback stack when I have to wait for a connection)? Are there dials I can turn to affect these limits by setting switches when I launch Node? Am I assuming correctly that libuv and the "pg" lib will in fact run these queries in parallel within one Node.js process? If those assumptions are correct, I'd think I'd hit connection pool size limits before I'd run into libuv parallelism limits (or possibly at the same time if I size my connection pool to the number of cores on the server).
Also, related to the discussion above about Node launching parallel I/O requests, how do I prevent Node from pulling messages off the queue as quick as they come in and queuing up I/O requests? I'd think at some point this could cause problems with memory consumption. This relates back to my question about startup parameters to limit the amount of parallel I/O requests created. I don't understand this too well at this point, so maybe it's not a concern (maybe by default Node won't create more parallel I/O requests than cores, providing a natural limit?).
The other thing I'm wondering is when/how running multiple copies of this program in parallel would help? Does it even matter on one host since the Postgres connection pool seems to be the driver of parallelism here? If that's the case, I'd probably only run one copy per host and only run additional copies on other hosts to spread the load.
As you can see, I'm trying to get some basic assumptions right before I start down this road. Insight and pointers to good reference doc would be appreciated.
I resolved this with a test of the prototype I wrote. A few observations:
If I don't set pre-fetch on the RabbitMQ channel, Node will pull ALL the messages off the queue in seconds. I did a test with 100K messages off the queue and Node pulled all 100K off in seconds, though it took many minutes to actually process the messages.
The behavior mentioned in #1 above is not desirable, because then Node must cache all the messages in memory. In my test, Node took up 2 GB when pulling down all those messages quickly, whereas if I set pre-fetch to match the number of database connections, Node took up only 80 MB and drained the queue slowly, as it finished processing the messages and sent back ACKs.
A single instance of Node running this program kept my CPUs 100% utilized.
So, the morals of the story seem to be:
Node can spawn any number of async I/O handlers (limited by available memory)
In a case like this, you want to limit how many async I/O requests Node spawns to avoid excessive memory usage.
Creating additional child processes for this workload made no difference. The unit of parallelism was the size of the database connection pool. If my workload did more in JavaScript instead of just delegating to Postgres, additional child processes would help. But in this case, it's all I/O (and thankfully I/O that doesn't need the Node threadpool), so the additional child processes do nothing.
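
The same back-pressure idea can be sketched outside of Node. The Python asyncio example below is an illustration only (it does not use RabbitMQ or pg): the consumer does not take the next message until one of a fixed number of slots is free, which plays the role of setting pre-fetch equal to the connection pool size. The names consume, demo and fake_insert are hypothetical.

```python
import asyncio


async def consume(messages, handle, max_in_flight):
    """Process messages concurrently, never more than max_in_flight at once."""
    gate = asyncio.Semaphore(max_in_flight)
    pending = set()

    async def worker(msg):
        try:
            await handle(msg)
        finally:
            gate.release()  # free the slot, like sending back an ACK

    for msg in messages:
        await gate.acquire()  # wait for a free slot before taking a message
        task = asyncio.create_task(worker(msg))
        pending.add(task)
        task.add_done_callback(pending.discard)
    await asyncio.gather(*pending)


async def demo(total=20, limit=4):
    """Run fake inserts and report the peak concurrency and the work done."""
    in_flight = 0
    peak = 0
    processed = []

    async def fake_insert(msg):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stands in for the database round trip
        processed.append(msg)
        in_flight -= 1

    await consume(range(total), fake_insert, limit)
    return peak, processed


if __name__ == "__main__":
    peak, processed = asyncio.run(demo())
    print("peak in flight:", peak, "processed:", len(processed))
```

Memory stays bounded because at most max_in_flight messages are held at any time, regardless of how many are waiting in the queue.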

Strategies for scaling a NodeJS application?

I have an app in NodeJS.
Recently we have been getting a lot more traffic (this is a new experience for me) and so I have been running into the "EMFILE: too many open files" error that is caused when a single process tries to open more files than the filesystem allows.
I have increased this limit, so we are good for now. However I'm not sure how long this solution will last...
I am wondering: What are other commonly used options for scaling a Node Application that is getting increasing amounts of traffic? (specifically with a mind to the open files limit problem.)
The PM2 process manager, which allows clustering, catches my eye (am I correct in understanding that every instance of the application requires its own core, i.e. you can't run 4 instances on a single core?). Are there any other techniques that are regularly used?
Thanks (in advance)
PM2 is a simple solution when you want to run more than one instance of Node. Another common alternative is the cluster module: http://nodejs.org/api/cluster.html Keep in mind that you will need to configure another HTTP server, such as Nginx, to reverse proxy user requests to your Node processes.
You can run any number of Node processes, regardless of the number of cores. But since each Node process is a single thread, and each core can execute a single thread at a time, the optimal configuration is when the number of Node processes matches the number of cores. If the number of Node processes is greater than the number of cores, you will experience reduced performance under load because of the extra context switches your processor has to perform.

How to determine the best number of threads in Tomcat?

How does one determine the best number of maxSpare, minSpare and maxThreads, acceptCount etc in Tomcat? Are there existing best practices?
I do understand this needs to be based on hardware (e.g. per core) and can only be a basis for further performance testing and optimization on specific hardware.
The "how many threads" problem is quite a big and complicated issue, and cannot be answered with a simple rule of thumb.
Considering how many cores you have is useful for multi threaded applications that tend to consume a lot of CPU, like number crunching and the like. This is rarely the case for a web-app, which is usually hogged not by CPU but by other factors.
One common limitation is lag between you and other external systems, most notably your DB. Each time a request arrives, it will probably query the database a number of times, which means streaming some bytes over a JDBC connection, then waiting for those bytes to arrive at the database (even if it's on localhost there is still a small lag), then waiting for the DB to consider our request, then waiting for the database to process it (the database itself will be waiting for the disk to seek to a certain region) etc...
During all this time, the thread is idle, so another thread could easily use those CPU resources to do something useful. It's quite common to see 40% to 80% of time spent waiting on a DB response.
The same also happens on the other side of the connection. While a thread of yours is writing its output to the browser, the speed of the CLIENT connection may keep your thread idle, waiting for the browser to ack that a certain packet has been received. (This was quite an issue some years ago; recent kernels and JVMs use larger buffers to prevent your threads from idling that way. However, a reverse proxy in front of your web application server, even simply an httpd, can be really useful to keep people with bad internet connections from acting as DDOS attacks :) )
Considering these factors, the number of threads should usually be much higher than the number of cores you have. Even on a simple dual or quad core server, you should configure a few dozen threads at least.
So, what is limiting the number of threads you can configure?
First of all, each thread (used to) consume a lot of resources. Each thread has a stack, which consumes RAM. Moreover, each thread will allocate objects on the heap to do its work, again consuming RAM, and the act of switching between threads (context switching) is quite heavy for the JVM/OS kernel.
This makes it hard to run a server with thousands of threads "smoothly".
Given this picture, there are a number of techniques (mostly: try, fail, tune, try again) to determine more or less how many threads your app will need:
1) Try to understand where your threads spend time. There are a number of good tools, but even the jvisualvm profiler can be a great tool, or a tracing aspect that produces summary timing stats. The more time they spend waiting for something external, the more threads you can spawn to use the CPU during idle times.
2) Determine your RAM usage. Given that the JVM will use a certain amount of memory (most notably the permgen space, usually up to a hundred megabytes; again jvisualvm will tell) independently of how many threads you use, try running with one thread, then with ten, then with one hundred, while stressing the app with jmeter or whatever, and see how heap usage grows. That can pose a hard limit.
3) Try to determine a target. Each user request needs a thread to handle it. If your average response time is 200 ms per "get" (it is better not to count the loading of images, CSS and other static resources), then each thread is able to serve 4-5 pages per second. If each user is expected to "click" every 3-4 seconds (it depends: is it a browser game or a site with a lot of long texts?), then one thread will "serve 20 concurrent users", whatever that means. If in the peak hour you have 500 distinct users hitting your site in 1 minute, then you need enough threads to handle that.
4) Crash test the high limit. Use jmeter, configure a server with a lot of threads on a spare virtual machine, and see how response time gets worse when you go over a certain limit. More than the hardware, the thread implementation of the underlying OS is important here, but no matter what, it will hit a point where the CPU spends more time trying to figure out which thread to run than actually running it, and that number is not so incredibly high.
5) Consider how threads will impact other components. Each thread will probably use one (or maybe more than one) connection to the database: is the database able to handle 50/100/500 concurrent connections? Even if you are using a sharded cluster of nosql servers, does the server farm offer enough bandwidth between those machines? What else will run on the same machine as the web-app server? Apache httpd? squid? The database itself? A local caching proxy to the database, like mongos or memcached?
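
The arithmetic in point 3 can be written down as a small Python sketch. It uses the figures from the text (200 ms per request, a click every 4 seconds); treating the 500 users as all active at the same time is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def pages_per_thread_per_second(response_ms):
    """How many requests one thread can finish per second."""
    return 1000 / response_ms


def users_per_thread(response_ms, think_ms):
    """Users one thread can keep up with if each clicks every think_ms."""
    return think_ms / response_ms


def threads_needed(active_users, response_ms, think_ms):
    """Threads required so that every active user's clicks get served."""
    return math.ceil(active_users * response_ms / think_ms)


print(pages_per_thread_per_second(200))   # 5.0
print(users_per_thread(200, 4000))        # 20.0
print(threads_needed(500, 200, 4000))     # 25
```

This only gives a starting target; the measurements from points 1, 2 and 4 decide whether the machine can actually sustain that many threads.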
I've seen systems in production with only 4 threads + 4 spare threads, because the work done by that server was merely to resize images, so it was nearly 100% CPU intensive, and others configured on more or less the same hardware with a couple of hundred threads, because the webapp was doing a lot of SOAP calls to external systems and spending most of its time waiting for answers.
Once you've determined the approximate minimum and maximum number of threads optimal for your webapp, I usually configure it this way:
1) Based on the constraints on RAM, other external resources and experiments on context switching, there is an absolute maximum which must not be reached. So, use maxThreads to limit it to about half or 3/4 of that number.
2) If the application is reasonably fast (for example, it exposes REST web services that usually send a response in a few milliseconds), then you can configure a large acceptCount, up to the same number as maxThreads. If you have a load balancer in front of your web application server, set a small acceptCount: it's better for the load balancer to see unaccepted requests and switch to another server than to put users on hold on an already busy one.
3) Since starting a thread is (still) considered a heavy operation, use minSpareThreads to have a few threads ready when peak hours arrive. This again depends on the kind of load you are expecting. It's even reasonable to have minSpareThreads, maxSpareThreads and maxThreads set up so that an exact number of threads is always ready and never reclaimed, and performance is predictable. If you are running Tomcat on a dedicated machine, you can raise minSpareThreads and maxSpareThreads without any danger of hogging other processes; otherwise tune them down, because threads are resources shared with the rest of the processes running on most OSes.
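
As a rough illustration of the three rules above, here is a Python sketch that turns a measured absolute maximum into candidate values. It is not a Tomcat API: the function name, the default "small" acceptCount of 10 and the spare thread count of 10 for shared machines are assumptions chosen for the example, and the case of a slow application without a load balancer is not covered by the text, so the sketch simply keeps acceptCount small there.

```python
def suggest_settings(absolute_max, fast_responses, behind_load_balancer,
                     dedicated_machine, fraction=0.5,
                     small_accept_count=10, shared_min_spare=10):
    """Derive candidate maxThreads, acceptCount and minSpareThreads values."""
    if not 0.5 <= fraction <= 0.75:
        raise ValueError("fraction should be between half and 3/4")

    # Rule 1: stay well below the absolute maximum found by testing.
    max_threads = int(absolute_max * fraction)

    # Rule 2: large backlog only for fast apps that are not load balanced.
    if fast_responses and not behind_load_balancer:
        accept_count = max_threads
    else:
        accept_count = small_accept_count  # assumed value for "small"

    # Rule 3: on a dedicated machine, keep the whole pool ready at all times.
    if dedicated_machine:
        min_spare = max_threads
    else:
        min_spare = min(shared_min_spare, max_threads)  # assumed value

    return {
        "maxThreads": max_threads,
        "acceptCount": accept_count,
        "minSpareThreads": min_spare,
    }


print(suggest_settings(400, fast_responses=True,
                       behind_load_balancer=False, dedicated_machine=True))
```

The output is only a starting point for the load tests described earlier, not a recommendation for any particular server.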
