Google Cloud Run not scaling as expected - node.js

I'm using Google Cloud Run to run a pretty basic Express / Node.js backend container. I receive a fairly low number of requests per day, and only the occasional concurrent request.
However, I can see on my Cloud Run dashboard that Cloud Run sometimes scales up to 4 instances, and most of the time runs at least 2. I know that my app's load is so low that I'll pretty much never need more than 1 instance, so why is Cloud Run being so wasteful?
My service is configured with a maximum of 40 concurrent requests per container, a minimum of 0 containers, and a maximum of 4 containers.
The container instance count fluctuates substantially. The green line is idle containers and the blue line is active containers.
My CPU usage is also very low.

You know your workload profile and the traffic you expect. The Cloud Run autoscaler does not. Therefore, it over-provisions additional instances in case of a traffic spike.
Of course, YOU know that will never happen, but IT doesn't.
Cloud Run is well designed for average traffic. If you are at one extremity of that standard usage (very low traffic, or very high and very spiky traffic), then yes, the Cloud Run provisioning model doesn't work so well.
However, what's the problem? You pay only while a request is being processed on an instance. If instances are over-provisioned and sitting unused, you don't pay for them. It's a waste of money for Google, not for you. (If it still bothers you, you can lower the service's max instances setting from 4 to 1.)
Your only concern might be the environment and saving resources, and there you would be absolutely right.

Related

Choosing the right EC2 instance for three Node.js applications

I'm running three MEAN stack applications. Each application receives over 10,000 monthly users. Could you please assist me in finding an EC2 instance for my apps?
I've been using a t3.large instance with two vCPUs and eight gigabytes of RAM, but it costs $62 to $64 per month.
I need help deciding which EC2 instance to use for the three Node.js applications.
First, check the CloudWatch metrics for the current instance. Are CPU and memory usage consistent over time? Analysing the metrics will help you decide whether to move to a smaller or a bigger instance (a scripted example follows the links below).
One way to avoid unnecessary costs is to use Auto Scaling groups and load balancers. By finding and applying the proper settings, you can always have the right amount of computing power for your applications.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html
https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-groups.html
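If you'd rather script the check than click through the console, here is a minimal sketch using the AWS SDK for JavaScript v3; the region, look-back window, and instance ID are placeholders to adjust:

```javascript
// npm install @aws-sdk/client-cloudwatch
const {
  CloudWatchClient,
  GetMetricStatisticsCommand,
} = require("@aws-sdk/client-cloudwatch");

const client = new CloudWatchClient({ region: "us-east-1" }); // your region

async function cpuStats(instanceId) {
  const now = new Date();
  const { Datapoints } = await client.send(
    new GetMetricStatisticsCommand({
      Namespace: "AWS/EC2",
      MetricName: "CPUUtilization",
      Dimensions: [{ Name: "InstanceId", Value: instanceId }],
      StartTime: new Date(now.getTime() - 14 * 24 * 3600 * 1000), // last 14 days
      EndTime: now,
      Period: 3600, // one datapoint per hour
      Statistics: ["Average", "Maximum"],
    })
  );
  return Datapoints.sort((a, b) => a.Timestamp - b.Timestamp);
}

cpuStats("i-0123456789abcdef0").then((points) => {
  for (const p of points) {
    console.log(
      p.Timestamp.toISOString(),
      "avg:", p.Average.toFixed(1),
      "max:", p.Maximum.toFixed(1)
    );
  }
});
```

If the average sits well below 50% and the maximum rarely spikes, a smaller (and cheaper) instance size is probably worth testing. Note that memory metrics require the CloudWatch agent to be installed; only CPU, network, and disk come out of the box.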
It depends on your applications: do they need more compute power, more memory, or more storage? Choosing a server is similar to installing an app on a system: check its basic requirements first, then choose the server accordingly.
If you have 10k+ monthly customers, think about using an ALB so that traffic gets distributed evenly. Try caching to serve some content if possible (a small sketch follows below). Use the unlimited burst mode of t3 instances if the CPU keeps hitting 100%. Also, try to optimize the code so that fewer resources are consumed. Once you are comfortable with your EC2 choice, look at purchasing Savings Plans or Reserved Instances to lower the cost.
Also, monitor the servers and traffic using features such as the CloudWatch agent and Internet Monitor.
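To illustrate the caching suggestion, here is a minimal sketch for an Express app. The route, the TTL, and loadProducts() are hypothetical stand-ins; a real multi-instance setup behind an ALB would want a shared cache such as Redis instead of in-process memory:

```javascript
const express = require("express");
const app = express();

// Naive per-process cache: fine on a single instance, not shared across an ASG.
const cache = new Map();
const TTL_MS = 60 * 1000; // serve cached copies for up to one minute

app.get("/api/products", (req, res) => {
  res.set("Cache-Control", "public, max-age=60"); // let browsers/CDNs cache too

  const hit = cache.get(req.originalUrl);
  if (hit && Date.now() - hit.at < TTL_MS) {
    return res.json(hit.body); // served from memory, no recomputation
  }

  const body = loadProducts(); // placeholder for the real (expensive) lookup
  cache.set(req.originalUrl, { at: Date.now(), body });
  res.json(body);
});

// Hypothetical stand-in for the real data access.
function loadProducts() {
  return [{ id: 1, name: "example" }];
}

app.listen(3000);
```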

AWS Aurora Serverless v2 will not scale down to 0.5 ACU even with 0 connections

I'm running a v2 instance, and the AWS documentation states you should only pay for the resources you actually use. I have an instance that is at 0 connections most of the time, but it never scales down below 2 ACUs. See the images below for reference. I have the instance set up to scale between 0.5 and 16 ACUs, but no matter the load, it always stays at a baseline of 2 ACUs.
I had to turn off the AI monitoring on the DB and then restart the instance. The DB then started at the minimum capacity.
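For reference, here is roughly what that fix looks like scripted with the AWS SDK for JavaScript v3, assuming the "AI monitoring" in question is Performance Insights (the region and instance identifier are placeholders):

```javascript
// npm install @aws-sdk/client-rds
const {
  RDSClient,
  ModifyDBInstanceCommand,
  RebootDBInstanceCommand,
} = require("@aws-sdk/client-rds");

const client = new RDSClient({ region: "us-east-1" });

async function disableInsightsAndReboot(dbInstanceId) {
  // Turn off Performance Insights on the instance.
  await client.send(
    new ModifyDBInstanceCommand({
      DBInstanceIdentifier: dbInstanceId,
      EnablePerformanceInsights: false,
      ApplyImmediately: true,
    })
  );
  // Restart so the instance comes back at its minimum configured ACU.
  // In practice you may need to wait for the modification to finish first.
  await client.send(
    new RebootDBInstanceCommand({ DBInstanceIdentifier: dbInstanceId })
  );
}

disableInsightsAndReboot("my-aurora-writer").catch(console.error);
```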
I can confirm this behaviour but as yet can't explain it. We have three databases running, all with the same schema but with different ACU limits set. Our production and staging databases sit at near flatlines close to the maximum capacity allowed, whilst the third behaves as we would expect and only scales up when we actually send it load.
We have tried rebooting the instances, but they immediately scale up and do not appear willing to scale down.
We have full support with AWS, so we will raise a ticket with them and report back here if we get an explanation/solution.

What is the optimal architecture design on Azure for an infrequently used backend that needs a robust configuration?

I'm trying to find the optimal cloud architecture to host a software on Microsoft Azure.
The scenario is the following:
A (containerised) REST API is exposed to the users, through which they can submit POST and GET requests. POST requests trigger a backend that needs a robust configuration to operate properly, and GET requests fetch the result of the backend, if any. This component of the solution is currently hosted on an Azure App Service Web App, which does the job perfectly.
The (containerised) backend, triggered by the POST requests, performs heavy calculations over a short period of time (typically 5-10 minutes are allotted for the calculation). This backend needs at least 4 cores and 16 GB of RAM, but the more the better.
The current configuration consists of the backend hosted together with the REST API on the App Service, with a plan that accommodates the backend's requirements. This is clearly not very cost-efficient, as the backend is idle ~90% of the time. On top of that, it's not really scalable despite an automatic scaling rule that spawns new instances based on CPU use: if several POST requests arrive at the same time, they may all be handled by the same instance and crash it due to a lack of memory.
Azure Functions doesn't seem to be an option: the serverless (Consumption plan) offering is restricted to 1.5 GB of RAM and doesn't have Docker support.
Azure Container Instances doesn't work either: first, the maximum number of CPUs is 4 (which is really few for the needs here, although acceptable), and second, there are cold starts of approximately 2 minutes (I imagine due to the creation of the container group, the image pull, and so on). Even though the process is async from a user's perspective, high latency is not allowed, as the result is expected within 5-10 minutes, so cold starts are a problem.
Azure Batch, which at first glance appears to be a perfect fit (beefy configurations available, made for HPC, cost-effective, made for time-limited tasks, ...), seems to be slow too: it takes a couple of minutes to create a pool, and jobs don't run immediately when submitted.
Do you have any idea what I could use?
Thanks in advance!
Azure Functions
You could look at the Azure Functions Elastic Premium plan; EP3 has 4 cores, 14 GB of RAM, and 250 GB of storage (a sketch of a function on this plan follows the link below).
Premium plan hosting provides the following benefits to your functions:
Avoid cold starts with perpetually warm instances.
Virtual network connectivity.
Unlimited execution duration, with 60 minutes guaranteed.
Premium instance sizes: one-core, two-core, and four-core instances.
More predictable pricing, compared with the Consumption plan.
High-density app allocation for plans with multiple function apps.
https://learn.microsoft.com/en-us/azure/azure-functions/functions-premium-plan?tabs=portal
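As a rough sketch of what the entry point could look like there, here is an HTTP-triggered function using the Node.js v4 programming model; runHeavyCalculation() is a hypothetical stand-in for your actual containerised computation (Premium plans also support custom containers):

```javascript
// npm install @azure/functions
const { app } = require("@azure/functions");

app.http("submitJob", {
  methods: ["POST"],
  authLevel: "function",
  handler: async (request, context) => {
    const payload = await request.json();
    context.log(`Job received: ${JSON.stringify(payload)}`);

    // The 5-10 minute computation fits comfortably within the
    // Premium plan's guaranteed 60-minute execution duration.
    const result = await runHeavyCalculation(payload); // placeholder

    return { jsonBody: result };
  },
});

// Hypothetical stand-in for the real backend work.
async function runHeavyCalculation(payload) {
  return { status: "done", input: payload };
}
```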
Batch Considerations
When designing an application that uses Batch, you must consider the possibility of Batch not being available in a region. It's possible to encounter a rare situation where there is a problem with the region as a whole, the entire Batch service in the region, or your specific Batch account.
If the application or solution using Batch always needs to be available, then it should be designed to either failover to another region or always have the workload split between two or more regions. Both approaches require at least two Batch accounts, with each account located in a different region.
https://learn.microsoft.com/en-us/azure/batch/high-availability-disaster-recovery

Does the startup time of a Cloud Run Docker image delay autoscaling?

I am currently using Express.js as the main backend framework. However, I have found that Fastify is generally faster in performance.
But the downside of Fastify is that it has a relatively slow startup time.
I am curious: would it slow down the autoscaling of Cloud Run?
I've seen that Cloud Run autoscales when the usage is over 60%.
In this case, I am thinking the slow startup time could delay responses while autoscaling, which could be a reason not to use Fastify. How exactly does this work?
A slow cold start doesn't influence the autoscaler. The service scales according to the CPU usage and the number of requests.
If you track the number of created instances with benchmarks, you can see that when you suddenly have 50 concurrent requests, 5 to 10 instances are created by the Cloud Run autoscaler. Why? Because if the traffic suddenly increases with a lot of concurrent requests, it can mean the slope will continue and you could soon have 100 or 200 concurrent requests, so the Cloud Run service prepares itself to absorb that traffic.
If you scale slowly up to 50 concurrent requests, you might have only 1 or 2 instances set up.
Anyway, that's just something I noted in my previous tests. In addition, keep in mind that if you have sustained traffic, the cold start is a marginal case. You will rarely lose a few milliseconds, and it doesn't justify a framework change.
My recommendation is to keep whatever is the best and most efficient for you (your time costs at least 100 times more than Cloud Run does!).
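If the startup time still worries you, a common mitigation is to start listening as early as possible and defer heavy initialisation until after the server is up, so a cold instance can accept requests sooner. A minimal Fastify sketch; initDatabasePool() is a hypothetical placeholder:

```javascript
// npm install fastify
const fastify = require("fastify")();

// Register routes up front; keep anything heavy out of the startup path.
fastify.get("/", async () => ({ ok: true }));

const port = Number(process.env.PORT) || 8080; // Cloud Run injects PORT

fastify
  .listen({ port, host: "0.0.0.0" })
  .then(() => initDatabasePool()) // heavy work after we can already serve
  .catch((err) => {
    console.error(err);
    process.exit(1);
  });

// Hypothetical stand-in for slow initialisation (DB pools, caches, ...).
async function initDatabasePool() {}
```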

Run many low-traffic webapps on a single machine so that a webapp only starts when it is needed?

I'm working on several different webapps written in Node. Each webapp has very little traffic (maybe a few HTTP requests per day), so I run them all on a single machine with HAProxy as a reverse proxy. It seems each webapp consumes almost 100 MB of RAM, which adds up to a lot when you have many webapps. Because each webapp receives so little traffic, I was wondering if there is a way to have all the webapps turned off by default, but set up so that they automatically start if there is an incoming HTTP request (and then turn off again if there hasn't been any HTTP request within some fixed time period).
Yes, there are a dozen different ways to handle this; without more details I'm not sure which is best. One option is the Node VM module: https://nodejs.org/api/vm.html. Another would be some kind of serverless setup; see https://www.serverless.com/. Honestly, 100 MB is a drop in the bucket at today's RAM prices. A quick Google search shows 16 GB of RAM for $32, or, to put it differently, 160 Node apps. I'm guessing you could find better prices on eBay or somewhere similar.
Outside of the learning experience, this would be a total waste of time: your time is worth more than the effort it would take to set this up. Even if you only make minimum wage in the US, it'd take you less than 4 hours to earn back the cost of the RAM. Better yet, go learn Docker/k8s and containerize each of those apps. That said, learning serverless would be a good use of time. (If you still want to try the on-demand approach, a sketch follows below.)
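For what it's worth, the idea in the question can be prototyped in plain Node: a tiny front process that spawns the real app on the first request, proxies traffic to it, and kills it after an idle period. A rough sketch; app.js, the ports, and the timeout are placeholders, and a production version would need to wait for the child to be ready instead of failing the first request:

```javascript
const http = require("http");
const { spawn } = require("child_process");

const APP_CMD = ["node", "app.js"]; // placeholder: the actual webapp
const APP_PORT = 4000;              // where the webapp listens
const IDLE_MS = 5 * 60 * 1000;      // stop the app after 5 idle minutes

let child = null;
let idleTimer = null;

function ensureAppRunning() {
  if (!child) {
    child = spawn(APP_CMD[0], APP_CMD.slice(1), { stdio: "inherit" });
    child.on("exit", () => { child = null; });
  }
  clearTimeout(idleTimer);
  idleTimer = setTimeout(() => child && child.kill(), IDLE_MS);
}

// HAProxy would point at this port instead of at the app directly.
http.createServer((req, res) => {
  ensureAppRunning();
  const proxy = http.request(
    { port: APP_PORT, path: req.url, method: req.method, headers: req.headers },
    (upstream) => {
      res.writeHead(upstream.statusCode, upstream.headers);
      upstream.pipe(res);
    }
  );
  proxy.on("error", () => { res.statusCode = 503; res.end("app starting, retry"); });
  req.pipe(proxy);
}).listen(3000);
```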
