AWS EC2 boots via scheduled Lambda, how to alert of errors? - node.js

My EC2 instance boots daily for 5 minutes before shutting down.
On bootup, a NodeJS script is executed. Usually this script will complete long before the 5 minutes are up, but I'd like to be notified (SMS/email) whenever it doesn't.
What is the correct approach? I could try to send a notification from within my Node.js code if execution hasn't finished after 5 minutes, but the Lambda could shut the instance down before that happens.
I'm quite new to AWS so I apologize if this is rather basic, I haven't had luck on Google with this issue.

Check whether whatever the Node script does while the EC2 instance is up could be replicated with one or more Lambda functions.
Think in terms of serverless and microservices architecture. In theory, any workflow that needs a server can be achieved with AWS Lambda functions and various triggers. In your case I can think of the following:
SES to send out email messages
API Gateway to expose your Lambda function as an HTTP trigger
CloudWatch Events to trigger a Lambda function like a cron job.
I would be surprised to learn that serverless won't work here. Please do share the details so that I can brainstorm more and suggest a solution.
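If the script does need to stay on EC2, a simpler option is a watchdog inside the Node.js script itself: arm a timer slightly under the 5-minute shutdown window and publish to an SNS topic (which can fan out to both SMS and email subscriptions) if the work hasn't finished in time. A minimal sketch, where the publisher function, topic, and threshold are assumptions, not anything from the question:

```javascript
// Hypothetical watchdog for the on-boot script. If done() isn't called
// before thresholdMs elapses, the injected publishAlert fires once.
function armWatchdog({ thresholdMs, publishAlert }) {
  let finished = false;
  const timer = setTimeout(() => {
    if (!finished) {
      // Fire the alert before the Lambda shuts the instance down.
      publishAlert(`Boot script still running after ${thresholdMs} ms`);
    }
  }, thresholdMs);
  return {
    done() {
      finished = true;
      clearTimeout(timer);
    },
  };
}

// In the real script you would inject an SNS publisher, e.g. (AWS SDK v2):
//   const sns = new (require('aws-sdk')).SNS();
//   const publishAlert = (msg) =>
//     sns.publish({ TopicArn: process.env.ALERT_TOPIC_ARN, Message: msg }).promise();
```

Setting the threshold to roughly 4 minutes leaves the publish call time to complete before the scheduled shutdown; SES could be called directly from the same hook instead of SNS.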

Related

Publishing over 6000 messages to SNS from a lambda

I have a script that we wanted to run in AWS as a Lambda, written in Python. It will run once every 24 hours. It describes all the EC2 instances in an environment, and I'm checking some details for compliance. After that, we want to send all the instance details to an SNS topic that other similar compliance scripts are using; those messages are sent to a Lambda and, from there, into Sumo Logic. I'm currently testing with just one environment, but there are over 1,000 instances, and so over 1,000 messages to be published to SNS, one by one. The Lambda times out long before it can send all of these. Once all environments and their instances are checked, it could be close to 6,000 messages to publish to SNS.
I need some advice on how to architect this to work with all environments. I was thinking maybe putting all the records into an S3 bucket from my lambda, then creating another lambda that would read each record from the bucket say, 50 at a time, and push them one by one into the SNS topic. I'm not sure though how that would work either.
Any ideas appreciated!
I would recommend using Step Functions for this problem. Even if you fix the issue for now, sooner or later, with a rising number of instances or additional steps you want to perform, the 900-second maximum runtime of a single Lambda won't be sufficient anymore.
A simple approach with Step Functions could be:
Step 1: Create a list of EC2 instances to "inspect" with this first Lambda. Could be all instances or just instances with a certain tag etc. You can be as creative as you want.
Step 2: Process this list of instances with a parallel step that calls one Lambda per instance id.
Step 3: The triggered Lambda reads the details from the provided EC2 instance and then publishes the result to SNS. There might already be a pre-defined step for the publication to SNS, so you don't need to program it yourself.
With the new Workflow Studio this should be relatively easy to implement.
Note: This might not be faster than a single Lambda, but it will scale better with more EC2 instances to scan. The only bottleneck here might become the Lambda in Step 1. If that Lambda needs more than 15 minutes to "find" all the EC2 instances to "scan", then you need to become a bit creative. But this is solvable.
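For illustration, Steps 1-3 can be expressed in Amazon States Language, with the per-instance fan-out of Step 2 as a Map state and the SNS publication of Step 3 as the built-in service integration. A rough sketch only: all ARNs, the `instanceIds`/`details` payload fields, and the concurrency value are placeholders, not anything from the question:

```json
{
  "Comment": "Sketch: fan out per-instance compliance checks (all ARNs and payload fields are placeholders)",
  "StartAt": "ListInstances",
  "States": {
    "ListInstances": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:111111111111:function:list-instances",
      "Next": "CheckEachInstance"
    },
    "CheckEachInstance": {
      "Type": "Map",
      "ItemsPath": "$.instanceIds",
      "MaxConcurrency": 40,
      "Iterator": {
        "StartAt": "CheckInstance",
        "States": {
          "CheckInstance": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:111111111111:function:check-instance",
            "Next": "PublishToSns"
          },
          "PublishToSns": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
              "TopicArn": "arn:aws:sns:eu-west-1:111111111111:compliance-topic",
              "Message.$": "$.details"
            },
            "End": true
          }
        }
      },
      "End": true
    }
  }
}
```

The `arn:aws:states:::sns:publish` resource is the pre-defined SNS publish step mentioned above, so no publishing code is needed in your own Lambdas.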
I think you could use an SQS queue to solve your problem.
Send the messages from SNS to an SQS queue. Your Lambda can then poll the queue for messages (the default batch size is 10, but you can decrease it through the CLI).
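Whichever fan-out you choose, the publish loop itself can also be tightened: SNS offers a PublishBatch API that accepts up to 10 messages per call, which would cut 6,000 publishes down to 600 requests. A sketch assuming AWS SDK v2, a hypothetical record shape, and an injected SNS client (in Lambda you would pass `new AWS.SNS()`):

```javascript
// Split an array into fixed-size chunks (10 = the PublishBatch limit).
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Publish all records to the topic, 10 per request. Topic ARN and the
// record shape are assumptions; the client is injected for testability.
async function publishAll(sns, topicArn, records) {
  for (const batch of chunk(records, 10)) {
    await sns.publishBatch({
      TopicArn: topicArn,
      PublishBatchRequestEntries: batch.map((record, i) => ({
        Id: String(i),
        Message: JSON.stringify(record),
      })),
    }).promise();
  }
}
```

Batching reduces the request count but not the 15-minute Lambda ceiling, so it complements rather than replaces the Step Functions or SQS approaches above.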

Handling time consuming operations using Nodejs and AWS

The current setup of the project I am working on is based on Nodejs/Express and AWS. AWS Lambda is triggered on a daily basis and is used to call an API endpoint which is expected to fire a varying number of emails via Sendgrid (hundreds to thousands). With a lower number of emails it worked fine but when the number of emails was around 1000 the Lambda timed out and the API crashed.
The limit on Lambda was 1 minute. Raising it to 5 minutes might make this case of 1,000 emails pass, but it might fail again at several thousand. Apart from that, we would like to avoid keeping the server busy for several minutes, which is why it was set to 1 minute initially.
We are now looking for a better solution to this specific situation. What would be a better approach: is it an option to use an SQS queue, or to go serverless and move all the email-sending code into Lambda?
Thanks for any inputs in advance and if more information is required please let me know.
Lambdas are not designed for long-running operations. You can use Elastic Beanstalk workers: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html
Briefly, the Lambda publishes the task to an SQS queue, and an Elastic Beanstalk worker app handles it.
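The hand-off itself is small. A sketch of the enqueueing side, assuming AWS SDK v2, a hypothetical queue URL and job shape, and an injected SQS client (SendMessageBatch takes at most 10 entries per call), with the worker then making the actual Sendgrid calls at its own pace:

```javascript
// Enqueue one SQS message per email instead of sending emails inside the
// request; returns the number of jobs queued. Queue URL and the job body
// are assumptions; the SQS client is injected for testability.
async function enqueueEmailJobs(sqs, queueUrl, recipients) {
  let queued = 0;
  // SQS SendMessageBatch accepts at most 10 entries per call.
  for (let i = 0; i < recipients.length; i += 10) {
    const entries = recipients.slice(i, i + 10).map((to, j) => ({
      Id: String(j),
      MessageBody: JSON.stringify({ to, template: 'daily-digest' }),
    }));
    await sqs.sendMessageBatch({ QueueUrl: queueUrl, Entries: entries }).promise();
    queued += entries.length;
  }
  return queued;
}
```

The API endpoint then returns as soon as the jobs are queued, so neither the Lambda timeout nor the server is tied to how many thousands of emails the worker still has to send.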

Schedule a task to run at some point in the future (architecture)

So we have a Python Flask app making use of Celery and AWS SQS for our async task needs.
One tricky problem that we've been facing recently is creating a task to run in x days, or in 3 hours for example. We've had several needs for something like this.
For now we create events in the database with timestamps that store the time they should be triggered. Then we use Celery Beat to run a scheduled task every second that checks whether there are any events to process (based on the trigger timestamp) and processes them. However, this queries the database every second, which we feel could be improved.
We looked into the eta parameter in Celery (http://docs.celeryproject.org/en/latest/userguide/calling.html), which lets you schedule a task to run in x amount of time. However, it seems to be bad practice to use large etas, and AWS SQS has a visibility timeout of about two hours, so anything beyond that would cause a conflict.
I'm scratching my head right now. On the one hand this works, and pretty decently: things have been separated out with SNS, SQS, etc. to ensure scaling tolerance. On the other hand, it just doesn't feel right to query the database every second for events to process. Surely there's an easier way, or a service provided by Google/AWS, to schedule an event (pub/sub) to occur at some point in the future (x hours, minutes, etc.).
Any ideas?
Have you taken a look at AWS Step Functions, specifically Wait State? You might be able to put together a couple of lambda functions with the first one returning a timestamp or the number of seconds to wait to the Wait State and the last one adding the message to SQS after the Wait returns.
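As a rough Amazon States Language sketch of that idea (all ARNs, the queue URL, and the payload fields are placeholders): a Wait state with `TimestampPath` sleeps until a timestamp the first Lambda computed, and a final step can use the built-in SQS service integration instead of a second Lambda:

```json
{
  "Comment": "Sketch: sleep until a computed timestamp, then enqueue (placeholder ARNs and fields)",
  "StartAt": "ComputeWakeTime",
  "States": {
    "ComputeWakeTime": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:111111111111:function:compute-wake-time",
      "ResultPath": "$.wakeAt",
      "Next": "WaitUntil"
    },
    "WaitUntil": {
      "Type": "Wait",
      "TimestampPath": "$.wakeAt",
      "Next": "EnqueueTask"
    },
    "EnqueueTask": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sqs:sendMessage",
      "Parameters": {
        "QueueUrl": "https://sqs.eu-west-1.amazonaws.com/111111111111/celery-queue",
        "MessageBody.$": "$.payload"
      },
      "End": true
    }
  }
}
```

Unlike the SQS visibility timeout, a Wait state can pause an execution for up to a year, so the "in x days" case is covered.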
Amazon's scheduling solution is the use of CloudWatch to trigger events. Those events can be placing a message in an SQS/SNS endpoint, triggering an ECS task, running a Lambda, etc. A lot of folks use the trick of executing a Lambda that then does something else to trigger something in your system. For example, you could trigger a Lambda that pushes a job onto Redis for a Celery worker to pick up.
When creating a CloudWatch rule, you can specify either a rate (i.e., every 5 minutes) or an arbitrary time in cron syntax.
So my suggestion for your use case would be to create a CloudWatch rule that runs at the time your job needs to kick off (or a minute before, depending on how time-sensitive you are). That rule would then interact with your application to kick off your job. You only pay for the resources when CloudWatch triggers them.
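A sketch of such a one-shot rule, assuming AWS SDK v2 with an injected CloudWatch Events client and a hypothetical rule name. CloudWatch cron fields are minutes, hours, day-of-month, month, day-of-week, and year, all in UTC; pinning the year makes the rule effectively fire once:

```javascript
// Render a JS Date as a CloudWatch cron expression for a single UTC moment.
// Day-of-week must be '?' when day-of-month is specified.
function toCronExpression(date) {
  return `cron(${date.getUTCMinutes()} ${date.getUTCHours()} ` +
         `${date.getUTCDate()} ${date.getUTCMonth() + 1} ? ${date.getUTCFullYear()})`;
}

// Create a rule that fires once at fireAt. The events client is injected
// (e.g. new AWS.CloudWatchEvents()); the rule name is a placeholder.
async function scheduleOneShot(events, ruleName, fireAt) {
  await events.putRule({
    Name: ruleName,
    ScheduleExpression: toCronExpression(fireAt),
  }).promise();
  // A putTargets call pointing the rule at your Lambda/queue would follow,
  // and the rule should be deleted after it fires.
}
```

One caveat: CloudWatch cron resolution is one minute, so this suits "run at 14:32 tomorrow" rather than second-precise scheduling.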
Have you looked into Amazon Simple Notification Service? It sounds like it would serve your needs...
https://aws.amazon.com/sns/
From that page:
Amazon SNS is a fully managed pub/sub messaging service that makes it easy to decouple and scale microservices, distributed systems, and serverless applications. With SNS, you can use topics to decouple message publishers from subscribers, fan-out messages to multiple recipients at once, and eliminate polling in your applications. SNS supports a variety of subscription types, allowing you to push messages directly to Amazon Simple Queue Service (SQS) queues, AWS Lambda functions, and HTTP endpoints. AWS services, such as Amazon EC2, Amazon S3 and Amazon CloudWatch, can publish messages to your SNS topics to trigger event-driven computing and workflows. SNS works with SQS to provide a powerful messaging solution for building cloud applications that are fault tolerant and easy to scale.
You could start the job with apply_async, and then use a countdown, like:
xxx.apply_async(..., countdown=TTT)
It is not guaranteed that the job starts exactly at that time, depending on how busy the queue is, but that does not seem to be an issue in your use case.

AWS Lambda with Node.JS in production

I want to use Node.js in AWS Lambda in production.
The question is how to make this reliable.
For me, reliability means:
Retrying some parts of the code - this exists out of the box in AWS Lambda
Exception notification - I tried Airbrake, but it does not work in AWS Lambda: process.on('uncaughtException') does not work
Knowing when something is down even though exception notification itself has failed - in a usual app I have a healthcheck endpoint for this.
So how can I implement points 2 and 3?
One idea is to use an SQS queue as a Dead Letter Queue, which can be set up for that Lambda (http://docs.aws.amazon.com/lambda/latest/dg/dlq.html). You can then monitor that queue to analyze the inputs that made the function fail and take some action.
For logging, you can use winston (https://github.com/winstonjs/winston). It works fine with AWS Lambdas.
In your Lambda workflow I guess you have some kind of error handler function where everything ends up when anything goes wrong. You can use that error handler to send yourself emails with the event and context inputs and a description of the error.
You can also monitor Lambdas via CloudWatch and create an alarm that sends you the relevant data if something goes wrong.
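For point 2, the most reliable hook is a try/catch around the handler body rather than process.on('uncaughtException'). A minimal sketch with an injected, hypothetical notify function (SNS email topic, Airbrake's HTTP API, etc.); the error is rethrown so Lambda's own retry and DLQ behaviour is preserved:

```javascript
// Wrap a Lambda handler so any failure triggers a notification before the
// error propagates to Lambda's retry/DLQ machinery.
function withErrorNotification(handler, notify) {
  return async (event, context) => {
    try {
      return await handler(event, context);
    } catch (err) {
      // Best-effort notification; never let it mask the original failure.
      try {
        await notify({ message: err.message, event });
      } catch (_) { /* swallow notification errors */ }
      throw err;
    }
  };
}
```

Point 3 then reduces to watching the DLQ depth and the function's CloudWatch Errors/Invocations metrics, which keep working even when in-function notification does not.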

How to optimize AWS Lambda?

I'm currently building web API using AWS Lambda with Serverless Framework.
In my lambda functions, each of them connects to Redis (elasticache) and RDB (Aurora, RDS) or DynamoDB to retrieve data or write new data.
And all my lambda functions are running in my VPC.
Everything works fine except that when a Lambda function is first executed, or executed a while after its last execution, it takes quite a long time (1-3 seconds) to run, or sometimes it even responds with a gateway timeout error (around 30 seconds), even though my Lambda functions are configured with a 60-second timeout.
As stated here, I assume the 1-3 seconds is for initializing a new container. However, I wonder if there is a way to reduce this time, because 1-3 seconds, or a gateway timeout, is not really ideal for production use.
You've got two issues:
The 1-3 second delay. This is expected and well documented when using Lambda. As @Nick mentioned in the comments, the only way to prevent your container from going to sleep is to use it. You can use Lambda Scheduled Events to execute your function as often as every minute using the rate expression rate(1 minute). If you add a parameter to your function that lets you distinguish a real request from one of these ping requests, you can return immediately on the pings, and then you've worked around your problem. It will cost you more, but we're probably talking pennies per month, if anything. Lambda has a generous free tier.
The 30-second delay is unusual. I would definitely check your CloudWatch logs. If you see logs from when your function is working normally but no logs from when you see the 30-second timeout, then I would assume the problem is with API Gateway and not with Lambda. If you do see logs, then maybe they can help you troubleshoot. Another place to check is the AWS status page. I've seen times when Lambda functions time out or respond intermittently, and I pull my hair out only to realize that there's a problem on Amazon's end and they're working on it.
Here's a blog post with additional information on Lambda Container Reuse that, while a little old, still has some good information.
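A minimal sketch of the ping workaround from the first point, assuming the scheduled rule either targets the function directly (scheduled events carry `source: 'aws.events'`) or passes a hypothetical `warmup` flag in its payload:

```javascript
// Detect a keep-warm invocation: either a raw CloudWatch scheduled event
// or a custom marker field in the rule's configured payload (assumption).
function isWarmupEvent(event) {
  return Boolean(event && (event.warmup === true || event.source === 'aws.events'));
}

const handler = async (event) => {
  if (isWarmupEvent(event)) {
    return { warmed: true }; // return immediately; skip Redis/RDS work
  }
  // ... real request handling (ElastiCache, Aurora/DynamoDB access) here ...
  return { statusCode: 200 };
};
```

Returning before any VPC resource is touched keeps the ping nearly free while still keeping the container alive.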