AWS AutoScaling Not Scaling Up

AWS AutoScaling Not Scaling Up - linux

I've setup an AWS AutoScaling group. Have 2 alarms to increase the number of servers if the average load is above 65% and decrease if it's less than 35%. Not sure what the final numbers will be, but this is what I initially used. I ran a yes >& /dev/null command on the linux server and the load very quickly went up to 100% (as reported by linux top command), but no new instances were being launched, because I think the alarms were not triggering. How exactly is the cpu load average computed/retrieved by the Auto Scaler?
I also, as an experiment, killed responding to the AWS ping commands from the server and thus, it was deemed not healthy by the AWS. The server was terminated and a new one was started up. So, I know that launching/terminating of servers is working in the Auto Scaler due to "health" reason.
What else should I look at to diagnose the problem?
Is my way of stressing the server not the "right" way as far as the Auto Scaler is concerned?
Is it using a different benchmark?

[This is a comment not an answer]
You can use set-alarm-state in aws cli to trigger your alarms
aws cloudwatch set-alarm-state --alarm-name "myalarm" --state-value ALARM --state-reason "testing purposes"
This way you can easily test them out. If you still have problems then maybe you can post the output of
aws cloudwatch describe-alarms --alarm-names "myalarm"

NOTE: Your Average load from both the instances should cross 65% only then a new instance is launched. So, in your case the load on both the instances must cross 65%. Only then AutoScaling Group launches a new instance.
You can use tools such as BeesWithMachineGuns, Loadrunner and other Load testing tools to increase load of your server such that it goes above 65%.
Suggestion: Check your server load on Cloudwatch metrics rather than from inside the server( using top). This will give you a clear picture of how AWS is calculating your Instance load.

Related

AWS Sagemaker inference endpoint doesn't scale in with autoscaling

I have an AWS Sagemaker inference endpoint with autoscaling enabled with SageMakerVariantInvocationsPerInstance target metric. When I send a lot of requests to the endpoint the number of instances correctly scales out to the maximum instance count. But after I stop sending the requests the number of instances doesn't scale in to 1, minimum instance count. I waited for many hours. Is there a reason for this behaviour?
Thanks

AutoScaling requires a cloudwatch alarm to trigger to scale in. Sagemaker doesn't push 0 value metrics when there's no activity (it just doesn't push anything). This leads to the alarm being put into insufficient data and not triggering the autoscaling scale in action when your workload suddenly ends.
Workarounds are either:
Have a step scaling policy using the cloudwatch metric math FILL() function for your scale in. This way you can tell CloudWatch "if there's no data, pretend this was the metric value when evaluating the alarm. This is only possible with step scaling since target tracking creates the alarms for you (and AutoScaling will periodically recreate them, so if you make manual changes they'll get deleted)
Have scheduled scaling set the size back down to 1 every evening
Make sure traffic continues at a low level for some times

AWS EBS runs into "504 Gateway Time-out"

I'm new to using AWS EBS and ECS, so please bear with me if I ask questions that might be obvious for others. To the issue:
I've got a single-container Node/Express application that runs on EBS. The local docker container works as expected. On EBS, I can access one endpoint of the API and get the expected output. For the second endpoint, which runs longer (around 10-15 seconds) I get no response and run after 60 seconds into a time out: "504 Gateway Time-out".
I wonder how I would approach debugging this as I can't connect to the container directly? Currently there isn't any debugging functionality in the code included either as I'm not sure what the best node approach for a EBS container is - any recommendations are highly appreciated.
Thank you in advance!

You can see the EC2 instances running on EBS in your AWS, and you can choose to give them IP addresses in your EBS options. That will let you SSH directly into them if you need to.
Otherwise check the keepAliveTimeout field in your server (the value returned by app.listen() of you're using express).
I got a decent number of 504s when my Node server timeout was less than my load balancer timeout.

Your application takes longer than expected (> 60 seconds) to respond, so either nginx or the Load Balancer terminates your request.
See my answer here

How to find/cure source of function app throughput issues

I have an Azure function app triggered by an HttpRequest. The function app reads the request, tosses one copy of it into a storage table for safekeeping and sends another copy to a queue for further processing by another element of the system. I have a client running an ApacheBench test that reports approximately 148 requests per second processed. That rate of processing will not be enough for our expected load.
My understanding of function apps is that it should spawn as many instances as is needed to handle the load sent to it. But this function app might not be scaling out quickly enough as it’s only handling that 148 requests per second. I need it to handle at least 200 requests per second.
I’m not 100% sure the problem is on my end, though. In analyzing the performance of my function app I found a LOT of 429 errors. What I found online, particularly https://learn.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits, suggests that these errors could be due to too many requests being sent from a single IP. Would several ApacheBench 10K and 20K request load tests within a given day cause the 429 error?
However, if that’s not it, if the problem is with my function app, how can I force my function app to spawn more instances more quickly? I assume this is the way to get more throughput per second. But I’m still very new at working with function apps so if there is a different way, I would more than welcome your input.
Maybe the Premium app service plan that’s in public preview would handle more throughput? I’ve thought about switching over to that and running a quick test but am unsure if I’d be able to switch back?
Maybe EventHub is something I need to investigate? Is that something that might increase my apparent throughput by catching more requests and holding on to them until the function app could accept and process them?
Thanks in advance for any assistance you can give.

You dont provide much context of you app but this is few steps how you can improve
If you want more control you need to use App Service plan with always on to avoid cold start, also you will need to configure auto scaling since you are responsible in this plan and auto scale is not enabled by default in app service plan.
Your azure function must be fully async as you have external dependencies so you dont want to block thread while you are calling them.
Look on the limits. Using host.json you can tweek it.
429 error means that function is busy to process your request, so probably when you writing to table you are not using async and blocking thread

Function apps work very well and scale as it says. It could be because request coming from Single IP and Azure could be considering it DDOS. You can do the following
AzureDevOps Load Test
You can load test using one of the azure service . I am very sure they have better criteria of handling IPs. Azure DeveOps Load Test
Provision VM in Azure
The way i normally do is provision the VM (windows 10 pro) in azure and use JMeter to Load test. I have use this method to test and it works fine. You can provision couple of them and subdivide the load.
Use professional Load testing services
If possible you may use services like Loader.io . They use sophisticated algos to run the load test and provision bunch of VMs to run the same test.
Use Application Insights
If not already you must be using application insights to have a better look from server perspective. Go to live stream and see how many instance it would provision to handle the load test . You can easily look into events and error logs that may be arising and investigate. You can deep dive into each associated dependency and investigate the problem.

Run Node JS on a multi-core cluster cloud

Is there a service or framework or any way that would allow me to run Node JS for heavy computations letting me choose the number of cores?
I'll be more specific: let's say I want to run some expensive computation for each of my users and I have 20000 users.
So I want to run the expensive computation for each user on a separate thread/core/computer, so I can finish the computation for all users faster.
But I don't want to deal with low level server configuration, all I'm looking for is something similar to AWS Lambda but for high performance computing, i.e., letting me scale as I please (maybe I want 1000 cores).
I did simulate this with AWS Lambda by having a "master" lambda that receives the data for all 20000 users and then calls a "computation" lambda for each user. Problem is, with AWS Lambda I can't make 20000 requests and wait for their callbacks at the same time (I get a request limit exceeded error).
With some setup I could user Amazon HPC, Google Compute Engine or Azure, but they only go up to 64 cores, so if I need more than that, I'd still have to setup all the machines I need separately and orchestrate the communication between them with something like Open MPI, handling the different low level setups for master and compute instances (accessing via ssh and etc).
So is there any service I can just paste my Node JS code, maybe choose the number of cores and run (not having to care about OS, or how many computers there are in my cluster)?
I'm looking for something that can take that code:
var users = [...];
function expensiveCalculation(user) {
// ...
return ...;
}
users.forEach(function(user) {
Thread.create(function() {
save(user.id, expensiveCalculation(user));
});
});
And run each thread on a separate core so they can run simultaneously (therefore finishing faster).

I think that your problem is that you feel the need to process 20000 inputs at once on the same machine. Have you looked into SQS from Amazon? Maybe you push those 20000 inputs into SQS and then have a cluster of servers pull from that queue and process each one individually.
With this approach you could add as many servers, processes or add as many AWS Lambda invokes as you want. You could even use a combination of the 3 to see what's cheaper or faster. Adding resources will only reduce the amount of time it would take to complete the computations. Then you wouldn't have to wait for 20000 requests or anything to complete. The process could tell you when it completes the computation by sending some notification after it completes.
So basically, you could have a simple application that just grabbed 10 of these inputs at a time and ran your computation on them. After it finishes you could then have this process delete them from SQS and send a notification somewhere (Maybe SNS?) to notify the user or some other system that they are done. Then it would repeat the process.
After that you could scale the process horizontally and you wouldn't need a super computer in order to process this. So you could either get a cluster of EC2 instances that ran several of these applications a piece or have a Lambda function invoked periodically in order to pull items out of SQS and process them.
EDIT:
To get started using an EC2 instance I would look at the docs here. To start with I would pick the smallest, cheapest instance (T2.micro I think), and leave everything at it's default. There's no need to open any port other than the one for SSH.
Once it's setup and you login, the first thing you need to do is run aws configure to setup your profile that way you can access AWS resources from the instance. After that install Node and get your application on there using git or something. Once it's setup though, go to the EC2 console and in your Actions menu there will be an option to create an image from the instance.
Once you create an image, then you can go to Auto Scaling groups and create a launch configuration using that AMI. Then it'll let you specify how many instances you want to run.
I feel like this could also be done more easily using their container service, but honestly I don't know how to use it yet.

How to fail over node.js timer on amazon load balancer?

I have setup 2 instance under aws load balancer. I have deployed node.js web services + mongodb in both instance. load balancer works fine with web services.
But, Problem is I have one timer service (node.js service only). the behavior of this timer is updating my mongodb based on some calculation.
My problem is, I must need to run this timer service (timer.js) at only one aws instance (out of 2) at same time. and expected that if one aws instance goes down then timer service at other instance will come up.
i know elb not providing this kind of facility.Can any one please help me to make it done ?
Condition : At a time only one timer service must be run with amazon load balancer.
Thanks.

You would have to implement this yourself using a locking algorithm using a shared data store that supports atomic operations
Alternatively, consider starting a "timer" server in an Auto Scale Group of Min:1, Max: 1 so Amazon keeps it running. This instance can be a t2.micro which is very cheap. It can either run the job itself, or just make an http request to your load balancer to run the job at the desired internal. If you so that, only one of your servers will run each job

Wouldn't it make more sense to handle this like any other "service" that needs to keep running?
upstart service
running node.js server using upstart causes 'terminated with status 127' on 'ubuntu 10.04'
This guy had a bad path in his file but his upstart script looks okay
monit
Node.js (sudo) and monit

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string