AWS Lambda with Node.js in production - node.js

I want to use Node.js in AWS Lambda in production.
The question is how to make this reliable.
For me, reliability means:
1. Retrying some parts of the code - AWS Lambda provides this out of the box.
2. Exception notification - I tried Airbrake, but it does not work in AWS Lambda: process.on('uncaughtException') does not work there.
3. Knowing when something is down and even exception notification is not working - in a usual app, I have a healthcheck endpoint for this.
So how can I implement points 2 and 3?

One idea is to use an SQS queue as a Dead Letter Queue, which can be set up for that Lambda (http://docs.aws.amazon.com/lambda/latest/dg/dlq.html). You can then monitor that queue to analyze the inputs that made the function fail and take some action.
For logging, you can use winston (https://github.com/winstonjs/winston). It works fine with AWS Lambda.

In your Lambda workflow, I guess you have some kind of error handler function where execution ends up if anything goes wrong. You can use that error handler to send yourself an email with the event and context inputs and a description of the error.
You can also monitor Lambdas via CloudWatch and create an alarm that sends you a notification if something goes wrong.
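The error-handler idea above can be sketched as a wrapper around the handler. This is only a minimal illustration, written in Python so that it can be run locally; the same wrap, report, and re-throw shape carries over to a Node.js handler. The `notify` callable is hypothetical (it could be a thin wrapper around an email or SNS client), and `notify_on_error` is a name made up for this example.

```python
import functools
import json
import traceback


def notify_on_error(notify):
    """Wrap a Lambda-style handler so failures are reported, then re-raised.

    `notify` is a hypothetical callable that receives a JSON string
    describing the failure (for example, something that sends an email).
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            try:
                return handler(event, context)
            except Exception as exc:
                report = {
                    "error": repr(exc),
                    "traceback": traceback.format_exc(),
                    "event": event,
                    # aws_request_id exists on the real Lambda context object;
                    # getattr keeps the sketch usable with context=None.
                    "request_id": getattr(context, "aws_request_id", None),
                }
                try:
                    notify(json.dumps(report, default=str))
                except Exception:
                    pass  # a failing notifier must not hide the original error
                raise  # re-raise so Lambda still sees the failure (retries, DLQ)
        return wrapper
    return decorator
```

Re-raising after the report matters here: if the wrapper swallowed the exception, Lambda would treat the invocation as successful, and neither the built-in retries nor the Dead Letter Queue would come into play.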

Related

Is it possible for multiple AWS Lambdas to service a single HTTP request?

On AWS, is it possible to have one HTTP request execute a Lambda, which in turn triggers a cascade of Lambdas running in serial, where the final Lambda returns the result to the user?
I know one way to achieve this is for the initial Lambda to "stay running" and orchestrate the other Lambdas, but I'd be paying for that orchestration Lambda to effectively do nothing most of the time, i.e. paying for the time it's waiting on the others. If it were non-lambda code, that would be like blocking (and paying for) an entire thread while the other threads do their work.
Unless AWS stops the billing clock while async Lambdas are "sleeping"/waiting on network IO?
Unfortunately, as you've found, the HTTP request can invoke only a single Lambda function, so that function becomes the orchestrator.
This is not ideal, but it is unavoidable if you want several Lambda functions to serve one HTTP request. The initial Lambda can either call the other Lambdas itself, or start a Step Functions state machine that orchestrates the individual steps. You would still need the Lambda to start the state machine and then poll its status before returning the results.
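A rough sketch of that start-then-poll Lambda follows, under some assumptions: `sfn` is expected to behave like `boto3.client("stepfunctions")` and is passed in so the function can be exercised with a stub, and `run_state_machine`, `poll_seconds`, and `max_polls` are names and defaults invented for this example.

```python
import json
import time


def run_state_machine(sfn, state_machine_arn, payload, poll_seconds=1.0, max_polls=25):
    """Start an execution and poll until it leaves the RUNNING state.

    `sfn` is assumed to expose start_execution and describe_execution
    the way the boto3 Step Functions client does.
    """
    started = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),
    )
    execution_arn = started["executionArn"]

    for _ in range(max_polls):
        result = sfn.describe_execution(executionArn=execution_arn)
        status = result["status"]
        if status == "SUCCEEDED":
            return json.loads(result["output"])
        if status != "RUNNING":
            # FAILED, TIMED_OUT or ABORTED
            raise RuntimeError(f"execution ended with status {status}")
        time.sleep(poll_seconds)

    raise TimeoutError("state machine did not finish within the polling budget")
```

The polling Lambda is still billed while it waits, so this does not remove the cost the question worries about; it only moves the sequencing logic out of the Lambda and into the state machine.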

AWS EC2 boots via scheduled Lambda, how to alert of errors?

My EC2 instance boots daily for 5 minutes before shutting down.
On bootup, a NodeJS script is executed. Usually this script will complete long before the 5 minutes are up, but I'd like to be notified (SMS/email) whenever it doesn't.
What is the correct approach? I can try to send a notification within my NodeJS code after 5 minutes if execution wasn't finished, but Lambda could shut down the instance before this occurs.
I'm quite new to AWS so I apologize if this is rather basic, I haven't had luck on Google with this issue.
Can you check whether whatever the Node script does while the EC2 instance is up could be replicated with one or more Lambda functions?
Think about a serverless, microservices architecture. In theory, any workflow that needs servers could be achieved with AWS Lambda functions and various triggers. In your case I can think of the following:
SES to send out email messages
API Gateway to expose your Lambda function as a trigger
CloudWatch Events to trigger a Lambda function like a cron job
I would be surprised if serverless did not work here. Please share the details of your case so that I can brainstorm further and suggest a solution.

Notify Lambda on CloudFront Distribution Creation End

At the moment, we are calling cloudfront.listDistributions() every minute to identify a change in the status of the distribution we are deploying. This causes Lambda to time out, because CloudFront never deploys in less than 30 minutes (whereas Lambda times out after 15 minutes).
I would like to notify a Lambda function after a CloudFront Distribution is successfully created. This would allow us to execute the post-creation actions while saving valuable Lambda exec time.
Creating a Rule on CloudWatch does not offer the option to choose CloudFront. Nevertheless, it seems to accept creating a Custom Event Pattern with the source aws.cloudformation.
Considering options:
Trigger a lambda every 5 minutes to list distributions and compare states with previous states stored in DynamoDB.
Does anybody have an idea for working around this missing AWS feature?
If you want and have the time, there is a trickier, somewhat more complex solution that leverages CloudTrail.
Disclaimer
CloudTrail is not a real-time log system, but API calls are typically reported on the console within 15 minutes (as stated in the CloudTrail FAQs). Because of this, what follows makes sense only for long-running tasks such as creating a CloudFront distribution, bringing up an Aurora DB, and so on.
1. You can create a CloudWatch event-based rule (let's call it CW-r1) on a specific pattern like CreateDistribution or UpdateDistribution.
2. CW-r1 triggers a Lambda (LM-1), which enables another CloudWatch event-based rule (CW-r2).
3. CW-r2, on a schedule, triggers a Lambda (LM-2), which requests the state of the specific distribution via the API. Once the distribution is "Deployed", LM-2 can send a notification via SNS, for example (email, SMS, push notification, whatever SNS supports).
4. Once everything is finished, LM-2 can disable the CW-r2 rule in order to stop processing.
In this way you can have an automatic notification system based on whichever API call you choose.
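As a hedged sketch of the pieces described above: the dictionary below is one plausible event pattern for CW-r1, and the function is a possible body for LM-2. The pattern's field values are assumptions that should be checked against a real CloudTrail record. The three clients are expected to behave like the boto3 `cloudfront`, `sns`, and `events` clients and are passed in so that the logic can be tried with stubs. `check_distribution` and its parameters are illustrative names only.

```python
# Assumed event pattern for CW-r1: CloudFront API calls delivered via CloudTrail.
CW_R1_PATTERN = {
    "source": ["aws.cloudfront"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["cloudfront.amazonaws.com"],
        "eventName": ["CreateDistribution", "UpdateDistribution"],
    },
}


def check_distribution(cloudfront, sns, events, distribution_id, topic_arn, rule_name):
    """Possible body of LM-2: notify and stop polling once deployed.

    Returns True when the distribution is deployed, False otherwise.
    """
    response = cloudfront.get_distribution(Id=distribution_id)
    status = response["Distribution"]["Status"]
    if status != "Deployed":
        return False  # still in progress; CW-r2 will trigger LM-2 again

    sns.publish(
        TopicArn=topic_arn,
        Message=f"Distribution {distribution_id} is deployed",
    )
    events.disable_rule(Name=rule_name)  # stop the scheduled CW-r2 rule
    return True
```

Each scheduled run of LM-2 is short, since it makes one status call and exits, which avoids keeping a single Lambda alive for the whole deployment.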

Microservices with Serverless (Lambda or Function)

I have some concerns about migrating our current microservices system to serverless.
Right now, the services communicate with each other over HTTP APIs.
Serverless functions such as Lambda or Functions can talk to each other through direct function invocations. This could be done by changing all the HTTP code in every service into Lambda invocations.
Another way is to keep using HTTP requests to call another service that runs on Lambda behind API Gateway. This way of calling is not good, because the request goes out to the Internet and comes back in through API Gateway before the neighboring service receives it. That is too long a path and does not make sense to me.
I would be glad if a Lambda app could call another Lambda app with an HTTP request over the local network; I am still researching how to do that.
I would like to hear about your experience migrating microservices that communicate over HTTP to serverless platforms such as Lambda or Functions.
Did you change all your code into specific Lambda function calls?
Do you use HTTP over the Internet and API Gateway to call the neighboring service?
Have you figured out how to make Lambda calls over a local/private network?
Thank you
Am I correct that you're talking about the orchestration of your microservices/functions?
If so have you looked at AWS Step Functions or Durable Functions on Azure?
AWS Step Functions
AWS Step Functions lets you coordinate multiple AWS services into serverless workflows so you can build and update apps quickly. Using Step Functions, you can design and run workflows that stitch together services such as AWS Lambda and Amazon ECS into feature-rich applications. Workflows are made up of a series of steps, with the output of one step acting as input into the next. Application development is simpler and more intuitive using Step Functions, because it translates your workflow into a state machine diagram that is easy to understand, easy to explain to others, and easy to change. You can monitor each step of execution as it happens, which means you can identify and fix problems quickly. Step Functions automatically triggers and tracks each step, and retries when there are errors, so your application executes in order and as expected.
Source: https://aws.amazon.com/step-functions/
Azure Durable Functions
The primary use case for Durable Functions is simplifying complex, stateful coordination problems in serverless applications. The following sections describe some typical application patterns that can benefit from Durable Functions: Function Chaining, Fan-out/Fan-in, Async HTTP APIs, Monitoring.
Source: https://learn.microsoft.com/en-us/azure/azure-functions/durable-functions-overview
You should consider communicating using queues. When one function finishes, it puts the results into the Azure Storage Queue, which is picked up by another function. Therefore there is no direct communication between functions unless it's necessary to trigger the other function.
In other words, it may look like this:
function1 ==> queue1 <== function2 ==> queue2 <== function3 ==> somewhere else, e.g. storage
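The chain above can be illustrated locally with in-memory queues. This is only a toy sketch: `queue.Queue` stands in for a real storage queue, the message fields are made up, and in a real deployment each function would be triggered by the queue rather than called by hand.

```python
import queue

queue1 = queue.Queue()  # stands in for the queue between function1 and function2
queue2 = queue.Queue()  # stands in for the queue between function2 and function3
storage = []            # stands in for the final destination


def function1(order):
    """Produce a message for the next stage; knows nothing about function2."""
    queue1.put({"order": order, "validated": True})


def function2():
    """Consume from queue1, enrich the message, and hand it on via queue2."""
    message = queue1.get_nowait()
    message["total"] = sum(message["order"])
    queue2.put(message)


def function3():
    """Consume from queue2 and persist the result."""
    storage.append(queue2.get_nowait())
```

Calling `function1([2, 3])`, then `function2()`, then `function3()` moves one message through both queues into `storage`. None of the functions references another one directly, which is the decoupling the answer describes.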

Schedule a task to run at some point in the future (architecture)

So we have a Python Flask app that uses Celery and AWS SQS for our async task needs.
One tricky problem that we've been facing recently is creating a task to run in x days, or in 3 hours for example. We've had several needs for something like this.
For now we create events in the database with timestamps that store the time at which they should be triggered. Then we use Celery beat to run a scheduled task every second that checks whether there are any events to process (based on the trigger timestamp) and processes them. However, this queries the database every second for events, which we feel could be improved somehow.
We looked into using the eta parameter in Celery (http://docs.celeryproject.org/en/latest/userguide/calling.html), which lets you schedule a task to run in x amount of time. However, it seems to be bad practice to have large etas, and AWS SQS also has a visibility timeout of about two hours, so anything longer than this would cause a conflict.
I'm scratching my head right now. On the one hand this works, and pretty decently, in that things have been separated out with SNS, SQS etc. to ensure scaling tolerance. However, it just doesn't feel right to query the database every second for events to process. Surely there's an easier way, or a service provided by Google/AWS, to schedule some event (pub/sub) to occur at some time in the future (x hours, minutes etc.)?
Any ideas?
Have you taken a look at AWS Step Functions, specifically the Wait state? You might be able to put together a couple of Lambda functions, with the first one returning a timestamp or the number of seconds to wait to the Wait state, and the last one adding the message to SQS after the wait completes.
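A minimal sketch of such a state machine definition, built as a Python dictionary, might look like the following. It is a simplification rather than exactly what is described above: the execution input is assumed to already carry an ISO 8601 timestamp in a field named `trigger_at` and the payload in `message` (both names are invented here), and the final step uses the Step Functions SQS integration in place of a second Lambda.

```python
import json


def delayed_send_definition(queue_url):
    """Build a state machine that waits until `trigger_at`, then sends to SQS."""
    return {
        "StartAt": "WaitUntilDue",
        "States": {
            "WaitUntilDue": {
                "Type": "Wait",
                # Reads an ISO 8601 timestamp (e.g. "2030-03-05T14:30:00Z")
                # from the execution input; "trigger_at" is a made-up field name.
                "TimestampPath": "$.trigger_at",
                "Next": "SendToQueue",
            },
            "SendToQueue": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sqs:sendMessage",
                "Parameters": {
                    "QueueUrl": queue_url,
                    "MessageBody.$": "$.message",
                },
                "End": True,
            },
        },
    }


def definition_json(queue_url):
    """Serialize the definition to the JSON string a state machine expects."""
    return json.dumps(delayed_send_definition(queue_url), indent=2)
```

With this shape, each delayed task is one execution that sleeps inside the Wait state, so nothing needs to poll the database while the task is pending.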
Amazon's scheduling solution is the use of CloudWatch to trigger events. Those events can be placing a message in an SQS/SNS endpoint, triggering an ECS task, running a Lambda, etc. A lot of folks use the trick of executing a Lambda that then does something else to trigger something in your system. For example, you could trigger a Lambda that pushes a job onto Redis for a Celery worker to pick up.
When creating a CloudWatch rule, you can specify either a "rate" (e.g., every 5 minutes) or an arbitrary time in cron syntax.
So my suggestion for your use case would be to create a CloudWatch rule that runs at the time your job needs to kick off (or a minute before, depending on how time-sensitive you are). That rule would then interact with your application to kick off your job. You'll only pay for the resources when CloudWatch triggers.
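To make the cron option concrete, here is a small helper that turns a point in time into a schedule expression matching a single minute. It assumes the six-field format used by CloudWatch rules (minutes, hours, day of month, month, day of week, year, evaluated in UTC); `one_shot_cron` is a name invented for this sketch.

```python
from datetime import datetime, timezone


def one_shot_cron(when):
    """Return a cron(...) schedule expression that matches one UTC minute."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    utc = when.astimezone(timezone.utc)
    # Field order: minutes hours day-of-month month day-of-week year.
    # Day-of-week is "?" because day-of-month is already specified.
    return f"cron({utc.minute} {utc.hour} {utc.day} {utc.month} ? {utc.year})"
```

For example, `one_shot_cron(datetime(2030, 3, 5, 14, 30, tzinfo=timezone.utc))` yields `cron(30 14 5 3 ? 2030)`. A rule created this way keeps existing after it has fired, so the triggered code would also need to delete or disable it.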
Have you looked into Amazon Simple Notification Service? It sounds like it would serve your needs...
https://aws.amazon.com/sns/
From that page:
Amazon SNS is a fully managed pub/sub messaging service that makes it easy to decouple and scale microservices, distributed systems, and serverless applications. With SNS, you can use topics to decouple message publishers from subscribers, fan-out messages to multiple recipients at once, and eliminate polling in your applications. SNS supports a variety of subscription types, allowing you to push messages directly to Amazon Simple Queue Service (SQS) queues, AWS Lambda functions, and HTTP endpoints. AWS services, such as Amazon EC2, Amazon S3 and Amazon CloudWatch, can publish messages to your SNS topics to trigger event-driven computing and workflows. SNS works with SQS to provide a powerful messaging solution for building cloud applications that are fault tolerant and easy to scale.
You could start the job with apply_async, and then use a countdown, like:
xxx.apply_async(..., countdown=TTT)
It is not guaranteed that the job starts exactly at that time, depending on how busy the queue is, but that does not seem to be an issue in your use case.

Resources