I'm testing the CPU performance of Azure Functions, so I made a function that computes the prime numbers for a given number. It is triggered by Azure Service Bus.
On my local machine it runs in 350ms.
The function, when running in a consumption plan, takes around 1000ms.
When I batch-send 100 messages to the function, it does scale up to around 16 instances, but each execution slows considerably, to between 3000 and 7000ms.
When trying a Standard App Service plan with 4 cores, performance is better, but not by much; it is still considerably slower than my laptop.
This guy here has a similar issue.
Is this the performance/scaling to be expected from Functions? E.g., are they not a great fit for batch processing of CPU-intensive methods?
Would Azure Batch be a better fit?
I don't know the exact specification of the hardware that Functions run on, but you can assume that each instance of the Consumption plan is a low-profile, single-core VM. If you need to run a CPU-intensive, latency-critical workload, that's probably not a good match.
Your local machine is probably faster than those instances, so that's where the 350ms vs. 1000ms difference comes from.
The decrease to 3000-7000ms is related to the fact that multiple executions of functions are running at the same time on the same instance. They are competing for CPU, slowing each other down. For pure CPU-bound workloads, you might be better off setting "maxConcurrentCalls": 1 in host.json.
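For illustration, a minimal host.json sketch of that setting (the exact shape depends on your Functions runtime version; on v2 and later the Service Bus settings sit under extensions, on v1 they live at the root, so treat this as an example rather than a drop-in file):

```json
{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "messageHandlerOptions": {
        "maxConcurrentCalls": 1
      }
    }
  }
}
```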
Requirement:
I have to design a microservice that performs a search query in a SQL DB multiple times (say 7 calls) along with multiple third-party HTTP calls (say 8 calls), in a sequential and interleaved manner, to complete an order. By sequential I mean that before the next DB or third-party call starts, the previous DB or third-party call must have completed, because its result is used in the subsequent third-party calls or DB searches.
Resources:
I) CPU: 4 cores (per instance)
II) RAM: 4 GB (per instance)
III) It can be auto-scaled up to a maximum of 4 pods or instances.
IV) Deployment: OpenShift (own cloud architecture)
V) Framework: Spring Boot
My Solution:
I've created a fixed thread pool of 5 threads using Java's ThreadPoolExecutor (the size of the blocking queue is not configured; there are also another 20 fixed-pool threads running apart from these 5 for creating orders of other types, so in total 25 threads run per instance). When multiple requests are sent to this microservice, I keep submitting jobs, and the JVM schedules and completes them.
Problem:
I'm not able to achieve the expected throughput: with the above approach the microservice achieves only 3 to 5 TPS (orders per second), which is very low. Sometimes Tomcat also gets choked, and we have to restart the service to bring the system back to a responsive state.
Observation:
I've observed that even when orders are being processed very slowly by the thread pool executor, if I call the orders API through JMeter at the same time, the requests that land directly on the controller layer are processed faster than the requests going through the thread pool executor.
My Questions:
I) What changes should I make at the architectural level to raise throughput to 50-100 TPS?
II) What changes should be made so that if traffic on this service increases in the future, the service can either be auto-scaled or a case for more hardware resources can easily be made?
III) Is this how tech giants (Amazon, PayPal) solve scaling problems like these, i.e. using multithreading to optimise the performance of their code?
You can assume that third parties are responding as expected and query optimisation is already done with proper indexing.
Tomcat already has a very robust thread pooling algorithm. Making your own thread pool is likely causing deadlocks and slowing things down. The java threading model is non-trivial, and you likely are causing more problems than you are solving. This is further evidenced by the fact that you are getting better performance relying on Tomcat's scheduling when you hit the controller directly.
High-volume services generally solve problems like this by scaling wide and keeping things as stateless as possible. This lets you allocate many small servers to solve the problem much more efficiently than a single large server.
Debugging multi-threaded executions is not for the faint of heart. I would highly recommend you simplify things as much as possible. The most important thing about threading is to avoid mutable state. Mutable state is the bane of shared executions; moving memory around and forcing reads through to main memory can be very expensive, often costing far more than any savings from threading.
Finally, the way you are describing your application, it's all I/O bound anyway. Why are you bothering with threading when it's likely I/O that's slowing it down?
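To make the first point concrete, here is a minimal sketch (not the poster's code; all type names are hypothetical placeholders) of letting each order run on the Tomcat worker thread that received the request, rather than funnelling everything through a hand-rolled 5-thread pool:

```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
class OrderController {

    private final OrderService orderService;

    OrderController(OrderService orderService) {
        this.orderService = orderService;
    }

    @PostMapping("/orders")
    ResponseEntity<String> createOrder(@RequestBody String orderJson) {
        // No custom ExecutorService: Tomcat's own worker pool already gives one
        // thread per in-flight request, and the 7 DB + 8 third-party calls have
        // to run one after another anyway, so an extra queue only adds latency.
        return ResponseEntity.ok(orderService.processSequentially(orderJson));
    }
}

// Hypothetical stand-in for the sequential chain of DB queries and HTTP calls.
interface OrderService {
    String processSequentially(String orderJson);
}
```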
I'm trying to find the optimal cloud architecture to host a piece of software on Microsoft Azure.
The scenario is the following:
A (containerised) REST API is exposed to users, through which they can submit POST and GET requests. POST requests trigger a backend that needs a robust configuration to operate properly, and GET requests fetch the result from the backend, if any. This component of the solution is currently hosted on an Azure App Service web app, which does the job perfectly.
The (containerised) backend (triggered by POST requests) performs heavy calculations over a short period (typically 5-10 minutes are allotted for the calculation). This backend needs (at least) 4 cores and 16 GB of RAM, but the more the better.
The current configuration consists of the backend hosted together with the REST API on the App Service, with a plan sized for the backend's requirements. This is clearly not very cost-efficient, as the backend is idle ~90% of the time. On top of that, it doesn't really scale, despite an automatic scaling rule that spawns new instances based on CPU use: if several POST requests arrive at the same time, they may be handled by the same instance and crash it for lack of memory.
Azure Functions doesn't seem to be an option: the serverless (Consumption plan) offering is restricted to 1.5 GB of RAM and doesn't have Docker support.
Azure Container Instances doesn't fit either: first, the maximum number of CPUs is 4 (which is low for the needs here, although acceptable), and second, there are cold starts of approximately 2 minutes (I imagine due to the creation of the container group, the image pull, and so on). Although the process is async from a user perspective, high latency is not acceptable because the result is expected within 5-10 minutes, so cold starts are a problem.
Azure Batch, which at first glance appears to be a perfect fit (beefy configurations available, made for HPC, cost-effective, made for time-limited tasks, ...), seems slow too: it takes a couple of minutes to create a pool, and jobs don't run immediately when submitted.
Do you have any idea what I could use?
Thanks in advance!
Azure Functions
You could look at the Azure Functions Elastic Premium plan. EP3 has 4 cores, 14 GB of RAM, and 250 GB of storage.
Premium plan hosting provides the following benefits to your functions:
Avoid cold starts with perpetually warm instances
Virtual network connectivity.
Unlimited execution duration, with 60 minutes guaranteed.
Premium instance sizes: one core, two core, and four core instances.
More predictable pricing, compared with the Consumption plan.
High-density app allocation for plans with multiple function apps.
https://learn.microsoft.com/en-us/azure/azure-functions/functions-premium-plan?tabs=portal
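As a rough sketch, an Elastic Premium plan and a function app on it could be created with the Azure CLI along these lines (resource names, region, and runtime are placeholders; check the current az reference for exact options):

```sh
# Create an Elastic Premium (EP3) plan; names and region are placeholders.
az functionapp plan create \
  --resource-group my-rg \
  --name my-premium-plan \
  --location westeurope \
  --sku EP3 \
  --is-linux

# Create the function app on that plan (a storage account is required).
az functionapp create \
  --resource-group my-rg \
  --name my-func-app \
  --plan my-premium-plan \
  --storage-account mystorageacct \
  --runtime dotnet
```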
Batch Considerations
When designing an application that uses Batch, you must consider the possibility of Batch not being available in a region. It's possible to encounter a rare situation where there is a problem with the region as a whole, the entire Batch service in the region, or your specific Batch account.
If the application or solution using Batch always needs to be available, then it should be designed to either failover to another region or always have the workload split between two or more regions. Both approaches require at least two Batch accounts, with each account located in a different region.
https://learn.microsoft.com/en-us/azure/batch/high-availability-disaster-recovery
I am currently using express.js as the main backend. However, I have found that fastify generally performs better.
But the downside of fastify is that it has a relatively slow startup time.
I am curious whether it would slow down the autoscaling of Cloud Run.
I've seen that Cloud Run autoscales when usage is over 60%.
In this case, I am thinking the slow startup time could delay responses while autoscaling, which could be a reason not to use fastify. How exactly does this work?
A slow cold start doesn't influence the autoscaler. The service scales according to CPU usage and the number of requests.
If you track the number of created instances in benchmarks, you can see that if you suddenly get 50 concurrent requests, the Cloud Run autoscaler creates 5 to 10 instances. Why? Because if traffic suddenly increases with a lot of concurrent requests, that slope may well continue, and you could soon have 100 or 200 concurrent requests; the Cloud Run service prepares itself to absorb that traffic.
If you ramp up slowly to 50 concurrent requests, you may end up with only 1 or 2 instances.
Anyway, that's just something I noticed in my previous tests. In addition, keep in mind that with sustained traffic, the cold start is a marginal case. You will lose a few milliseconds, rarely, and it doesn't justify a framework change.
My recommendation is to keep whatever is best and most efficient for you (your time costs at least 100 times more than Cloud Run does!).
I don't know whether it is done on purpose or whether Azure simply performs worse than AWS, but whenever I cold-start an Azure Function it takes close to a minute.
The same function cold-starts in less than a second (close to 250 ms) on AWS.
What I see is that Azure stores all the function code in an Azure Storage account and loads it over the network, which creates this latency. This is with the Consumption plan.
If I use an App Service plan for the functions (which feels clunky to even have in a modern application), the cold start can come down to about 3 seconds, but that is still nowhere near what AWS achieves.
What other ways are there to improve performance on Azure so that my functions cold-start quickly?
I'm a member of the Azure Functions team. I can assure you we're not deliberately making JavaScript slow. It just comes with some challenges we're still working through.
As you noted, the 60-second cold start you are seeing is due to the network latency incurred when loading a very large number of small files, which is typical for Node.js applications.
Our current recommendation is for you to take advantage of Azure-Functions-Pack. It uses webpack to dramatically decrease the number of files loaded by your application.
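Typical usage is along these lines (a sketch; see the Azure-Functions-Pack README for the exact, current commands):

```sh
# Install the packing tool globally (sketch; commands may differ by version).
npm install -g azure-functions-pack

# From the function app root, bundle node_modules into a single file with webpack.
funcpack pack ./
```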
We are working on some improvements designed to make the manual process of running Functions Pack unnecessary. We aim to have some of these improvements in production sometime later in 2017.
On Azure I can get 3 Extra Small instances for the price of 1 Small. I'm not worried about my site not scaling.
Are there any other reasons I should not go for 3 Extra Small instances instead of 1 Small?
See: Azure pricing calculator.
An Extra Small instance is limited to approx. 5Mbps bandwidth on the NIC (vs. approx. 100Mbps per core with Small, Medium, Large, and XL), and has less than 1GB of RAM. So, let's say you're running something that's very storage-intensive. You could run into bottlenecks accessing SQL Azure or Windows Azure storage.
With RAM: If you're running 3rd-party apps, such as MongoDB, you'll likely run into memory issues.
From a scalability standpoint, you're right that you can spread the load across 2 or 3 Extra Small instances, and you'll have a good SLA. Just need to make sure your memory and bandwidth are good enough for your performance targets.
For more details on exact specs for each instance size, including NIC bandwidth, see this MSDN article.
Look at the fine print: I/O performance is supposed to be much better with the Small instance than with the Extra Small instance. I am not sure whether this is due to a technology-related bottleneck or a business decision, but that's the way it is.
Also, I'm guessing the OS takes up a bit of RAM on each instance, so with 3 Extra Small instances that overhead is paid three times instead of just once with a Small instance. That reduces the resources actually available for your application.
While 3 Extra Small instances may theoretically equal or even beat one Small instance on paper, remember that Extra Small instances do not have dedicated cores; their raw computing resources are shared with other tenants. I tried these Extra Small instances in an attempt to save money on a tiny-load website, and I must say there were outright outages or periods of horrible performance that I found unacceptable.
In short: I would not use Extra Small instances for any sort of production environment.