Performance issue with REST API - multithreading

I have a REST API built with ASP.NET Core 3.1.
I use it for stock trading. A single API call places orders on stock exchanges for 200-250 users. To do this, it fans out one task per user with Task.WhenAll.
Each per-user task consists of:
An HttpClient call (takes 200-250 ms)
A few lines of C# code (takes <5 ms)
The issue is that API call latency increases as the number of concurrent users increases: it reaches 30 seconds for 200 users, but is about 250 ms for a single user.
I need the maximum latency to stay around 250-300 ms even with 1000 concurrent users.
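A simplified sketch of the fan-out (the class name, URL and payload here are placeholders, not the real code):

using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public class OrderFanOut
{
    // A single shared HttpClient, reused across requests
    private static readonly HttpClient Client = new HttpClient();

    public async Task PlaceOrdersAsync(IEnumerable<string> userIds)
    {
        // One task per user; each is dominated by the outbound HTTP call (~200-250 ms)
        var tasks = userIds.Select(async id =>
        {
            var content = new StringContent("{\"userId\":\"" + id + "\"}");                  // placeholder payload
            var response = await Client.PostAsync("https://broker.example/orders", content); // placeholder URL
            response.EnsureSuccessStatusCode();
            // a few milliseconds of follow-up C# work per user
        });

        await Task.WhenAll(tasks);
    }
}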
I'm using the following AWS EC2 instance:
t3a.xlarge
16GB RAM
4 vCPU
Any help would be highly appreciated. Thanks!

Related

Is it good practice to use multithreading to handle requests in bulk in a microservices architecture?

Requirement:
I have to design a microservice which, to complete an order, performs a search query against a SQL DB multiple times (say 7 calls) along with multiple third-party HTTP calls (say 8 calls), in a sequential and interleaved manner. By sequential I mean that the previous DB or third-party call must complete before the next one starts, because the result of each call is used in the subsequent third-party calls or DB searches.
Resources:
I) CPU: 4 cores (per instance)
II) RAM: 4 GB (per instance)
III) It can be auto-scaled up to a maximum of 4 pods or instances.
IV) Deployment: OpenShift (own cloud architecture)
V) Framework: Spring Boot
My Solution:
I've created a fixed thread pool of 5 threads using Java's ThreadPoolExecutor (the size of the blocking queue is not configured; there are also another 20 fixed-pool threads, apart from these 5, for creating orders of multiple types, so in total 25 threads are running per instance). When multiple requests are sent to this microservice I keep submitting jobs, and the JVM schedules these jobs and completes them.
Problem:
I'm not able to achieve the expected throughput: with the above approach the microservice achieves only 3 to 5 TPS (orders per second), which is very low. Sometimes Tomcat also gets choked and we have to restart the services to bring the system back to a responsive state.
Observation:
I've observed that even when orders are being processed very slowly by the thread pool executor, if I call the orders API through JMeter at the same time, those requests, which land directly on the controller layer, are processed faster than the requests going through the thread pool executor.
My Questions
I) What changes should I make at the architectural level to push throughput up to 50 to 100 TPS?
II) What changes should be made so that if traffic on this service increases in future, the service can either be auto-scaled or the case for more hardware resources can be justified easily?
III) Is this how tech giants (Amazon, PayPal) solve scaling problems like these, i.e. by using multithreading to optimise the performance of their code?
You can assume that the third parties are responding as expected and that query optimisation, with proper indexing, is already done.
Tomcat already has a very robust thread pooling algorithm. Making your own thread pool is likely causing deadlocks and slowing things down. The Java threading model is non-trivial, and you are likely causing more problems than you are solving. This is further evidenced by the fact that you get better performance relying on Tomcat's scheduling when you hit the controller directly.
High-volume services generally solve problems like this by scaling wide and keeping things as stateless as possible. This allows you to allocate many small servers to solve the problem much more efficiently than a single large server.
Debugging multi-threaded executions is not for the faint of heart. I would highly recommend you simplify things as much as possible. The most important rule of threading is to avoid mutable state. Mutable state is the bane of shared execution: moving memory around and forcing reads through to main memory can be very expensive, often costing far more than the savings from threading.
Finally, the way you are describing your application, it's all I/O bound anyway. Why are you bothering with threading when it's likely I/O that's slowing it down?

Can we reduce the thread pool size to less than four in Node.js?

I am planning to use Node.js as an aggregation layer, i.e. it would just perform API requests to different services and send an aggregated response back to the client.
According to this, Node.js relies on the operating system's APIs to perform network I/O.
In my use case there would be no DNS, file, crypto, or zip operations involved.
So, can I reduce the thread pool size from 4 to 1, or maybe to 0, and have no performance issues?
I need this because I am running multiple containers on the same host, so I am trying to keep the thread count of each service as low as possible.

IIS - Worker threads not increasing beyond a certain number even though CPU usage is less than 40 percent

We are running a web API hosted in IIS 10 on an 8-core machine with 16 GB of memory running Windows 10, and throwing a load of around 100 to 200 requests per second at the server through JMeter.
Individual transactions take less than 500 milliseconds. When we first apply the load, IIS threads grow to around the 150-160 mark (monitored through Resource Monitor and Performance Monitor) and throughput increases up to 22-24 transactions per second, but both throughput and the number of threads stop growing beyond that point, even though CPU usage is less than 40 per cent and we have plenty of physical memory available at the peak; Resource Monitor does not show any choking at the network or I/O level.
The web API makes calls to an Oracle database (3-4 select calls and 2-3 inserts/updates).
We fail to understand what is stopping IIS from growing its thread pool further to process more requests in parallel while all the resources, including processing power, memory and network, are available.
We have set up many performance counters as well; there is no queue build-up (that's probably because JMeter works in synchronous mode).
Also, we have tried to set the min and max thread settings through machine.config as well as the ThreadPool.SetMinThreads and ThreadPool.SetMaxThreads APIs, but no difference was observed; those settings do not seem to take effect.
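Roughly what we tried at application startup looks like this (the numbers here are illustrative, not the exact values we used):

using System.Threading;

// Called once at startup (e.g. Application_Start); values shown are illustrative only
ThreadPool.SetMinThreads(workerThreads: 200, completionPortThreads: 200);
ThreadPool.SetMaxThreads(workerThreads: 1000, completionPortThreads: 1000);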
It is important to mention that we are using synchronous calls/operations (no async and await). Someone has advised converting all our blocking I/O calls, e.g. database calls, to asynchronous mode to achieve more throughput, but my understanding is that if threads can't grow beyond this level then making the calls asynchronous might not help, or may even hurt throughput. Since our code base is huge, that would be a very costly change in terms of time and effort, and we don't want to invest in it until we are sure it would really help. If someone has anything to share on these two problems, please do share.
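For context, the kind of conversion being suggested would look roughly like this (a sketch using the generic ADO.NET async APIs; the query is a placeholder, and whether our Oracle provider actually supports these overloads is part of what we would have to verify):

using System;
using System.Data.Common;
using System.Threading.Tasks;

// Current style: the blocking call ties up a thread-pool thread for the whole DB round trip
public int GetOrderCountSync(DbConnection conn)
{
    using (var cmd = conn.CreateCommand())
    {
        cmd.CommandText = "SELECT COUNT(*) FROM ORDERS"; // placeholder query
        return Convert.ToInt32(cmd.ExecuteScalar());
    }
}

// Suggested style: the thread is returned to the pool while the DB call is in flight
public async Task<int> GetOrderCountAsync(DbConnection conn)
{
    using (var cmd = conn.CreateCommand())
    {
        cmd.CommandText = "SELECT COUNT(*) FROM ORDERS"; // placeholder query
        var result = await cmd.ExecuteScalarAsync();
        return Convert.ToInt32(result);
    }
}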
Below is a screenshot of the Performance Monitor.

API with high compute requirements written in Node.js

We are running an API on a 12-core server with a high-performance SSD. The app is run using pm2 cluster mode (--i 0). The problem I am facing is that we use the API to do calculations which, for instance, span 40 years with 12 months in each year.
In some situations I have to run them in parallel. When I run 5 tasks in parallel using async.parallel, one core is pegged at 100% for 10 seconds. From those stats we can assume that if one core is busy at 100% for 10 seconds per request, a server with 12 cores is only going to serve 12 concurrent user requests.
My question is: should we be using Node.js for this purpose, or is there a better or recommended way to handle this kind of situation? Thanks.
This seems unrelated to Node.js; you would have this problem regardless of programming language. Your problem is really a design problem. It seems to me you need some form of caching mechanism, so future requests don't need to take 10 seconds each. You could also do the computation ahead of time and store the results in a database.
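To make the caching idea concrete, here is a minimal sketch (written in C# purely for illustration, since the pattern is language-agnostic and applies equally in Node; the key format and CalculateProjection stand in for your real inputs and calculation):

using System;
using System.Collections.Concurrent;

static class ProjectionCache
{
    // One entry per distinct set of inputs; Lazy ensures the expensive calculation
    // runs at most once per key even under concurrent requests.
    private static readonly ConcurrentDictionary<string, Lazy<double[]>> Cache =
        new ConcurrentDictionary<string, Lazy<double[]>>();

    public static double[] Get(int startYear, int years)
    {
        string key = startYear + ":" + years; // placeholder key format
        return Cache.GetOrAdd(key, _ => new Lazy<double[]>(() => CalculateProjection(startYear, years))).Value;
    }

    private static double[] CalculateProjection(int startYear, int years)
    {
        // Placeholder for the real 40-year x 12-month computation
        var result = new double[years * 12];
        for (int i = 0; i < result.Length; i++)
            result[i] = startYear + i / 12.0;
        return result;
    }
}

The same idea works with precomputation: run the calculation ahead of time on a schedule and store the results in a database, so the request path only does a lookup.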

Transactions per second not increasing with more threads or servers using WCF

I have a Windows service which wakes up every day at a particular time and finds around 100k transactions that it needs to process. It spawns 25 threads, which look at the bucket of transactions to be processed and make calls to a WCF service.
This WCF service does some internal processing and makes a synchronous call to an external service (which we have mocked with an emulator for the sake of volume testing). Using this setup for shorter runs of around 10k transactions, we were able to achieve a TPS of around 10.
I scaled this setup to three load-balanced servers running our WCF services and two other servers running the emulator; we also increased the number of threads in the Windows service to 75. With this new setup we expected an increase in performance, but the TPS is still 10.
I have Performance Monitor running on all five machines. The three load-balanced servers hosting the WCF service constantly show around 25 "Outstanding Calls" in the "ServiceModelService" category for the WCF service, but the two servers running the emulators constantly show only around 9 "Outstanding Calls" for the mocked-out service. This same emulator showed around 20 "Outstanding Calls" when it was running on a single server.
My questions are:
Why is there no increase in TPS in the setup with three load-balanced machines?
Where is the bottleneck in this system?
The target is to get to a TPS of around 30 with the three load-balanced servers running the WCF service.
Note: I have increased the maxconnection limit in the web.config of both the WCF service and the Windows service to 200, which increased the TPS from around 6 to the current value of 10.
Edit: more information. If each of the load-balanced servers has 25 outstanding calls, shouldn't the mocked external service show 3*25 = 75 outstanding calls?
By maxconnection limit I meant:
<system.net>
  <connectionManagement>
    <add address="*" maxconnection="200" />
  </connectionManagement>
</system.net>
You probably can't give us enough information to diagnose the problem accurately. However, what you describe gives me enough to recommend some places to look.
First, you probably shouldn't be spawning 25 (or more) threads in your Windows service. Rather, you can have a single thread looking at the "bucket of transactions," and make asynchronous calls to the WCF service. You can control the number of concurrent transactions using a Semaphore. Something like:
Semaphore transactionSemaphore = new Semaphore(25, 25);
while (transactionCount > 0)
{
    transactionSemaphore.WaitOne();      // wait for a free slot
    var transaction = getTransaction();  // assumed to also decrement transactionCount
    DoAsyncWcfCall(transaction);         // returns immediately; the completion callback releases the semaphore
}
And the async completed event (see the above link about asynchronous calls) releases the semaphore:
void AsyncCompletedEvent(...)
{
    // do after-call processing
    // and then release the semaphore
    transactionSemaphore.Release();
}
When transactionCount gets to 0, you have to wait for all outstanding calls to complete. You do that by repeatedly waiting on the semaphore: 25 times. That is:
for (int i = 0; i < 25; ++i)
{
    transactionSemaphore.WaitOne();
}
If your main thread has gobbled up the semaphore, then you know that there can't be any outstanding calls.
You can extend that concurrent transaction count to 50 or 75, or whatever value you like, provided your hardware can handle it, of course.
The difference here is that asynchronous service calls use I/O completion ports rather than individual threads. Allocating a thread that just sits there and waits is very wasteful. With the I/O completion port, the only time a thread gets involved is when the call completes--in the async completed method. Those threads are allocated automatically by the thread pool.
If the Windows service is constantly showing 25 outstanding calls, then the total of outstanding calls across all of the servers had better not be more than that. If the WCF services are showing more outstanding transactions than the Windows service is, then you have a problem. Come to think of it, if the Windows service is showing more outstanding calls than the load-balanced servers do, you also have a problem. If the two don't match, then somebody is dropping something: either the Windows service thinks it has outstanding calls that the WCF services consider completed, or vice versa.
It's hard to say where the bottleneck is. If you're experiencing high CPU usage on the Windows service machine, that's probably your bottleneck. You say that the WCF services call yet another external service; that external service could be the bottleneck. Depending on how you mocked it, the mock could be the bottleneck. Have you determined how much time that external service takes? That the servers running the mocked service seem to show lower throughput than the WCF servers calling them makes me think there is a performance problem with your mock.
I suppose it's possible that your WCF services aren't properly cleaning up resources, and they are spending an inordinate amount of time in garbage collection. Have you verified that you're using the server garbage collector? I think that's the default for WCF services, but you need to check.
Given the information you've provided, I consider those the most likely possible bottlenecks.
One other thing. It's incredibly wasteful to have a Windows service that does nothing but sit there and once a day "wakes up" to process some transactions. You should make that a console application and schedule a task to run it once per day. You can use the Task Scheduler GUI, or you can schedule it with the schtasks command. See Programs are not cats.
Another benefit of making your program a console app rather than a service is that it's a whole lot easier to debug a console app.
