AWS SQS triggering from Lambda to Amazon Lightsail - node.js

I am new to AWS and I am stuck on the following: I want to send a download-file URL to an Amazon Lightsail server (a lightweight version of EC2), and from there the file will be downloaded and stored in S3.
Lambda may generate and transmit a lot of these download URLs, but my Lightsail server cannot handle all those large operations simultaneously, so I thought of using AWS SQS.
I would send the data (download URL, credentials for the file upload to S3) to SQS. SQS would then hold a queue of at most 10 entries and trigger the Lightsail server, sending it the data one item at a time so that the server never becomes a bottleneck. Any idea how that can be achieved?
Thanks in advance!

You described a common scenario where AWS Lambda is not suitable as the final computing service because the amount of time and disk space consumed is unpredictable. Lightsail (EC2) must be used instead.
You can use Lambda to send new messages/jobs to SQS using any AWS SDK. Your Lightsail server must then poll for pending messages in your SQS queue.
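As a rough illustration of the producer side, here is a minimal Python sketch of a Lambda handler that puts one download job on the queue with boto3's `send_message`. The `QUEUE_URL` environment variable, the event fields, and the job layout (`file_url`, `s3_bucket`, `s3_key`) are assumptions made up for this example, not something from the question.

```python
import json
import os


def enqueue_download_job(queue_url, file_url, s3_bucket, s3_key, sqs_client=None):
    """Send one download job to SQS and return the SQS message ID."""
    if sqs_client is None:
        # Imported lazily so the function can also be exercised with a stub client.
        import boto3
        sqs_client = boto3.client("sqs")
    # Hypothetical job layout: where to download from and where to store in S3.
    body = json.dumps({"file_url": file_url, "s3_bucket": s3_bucket, "s3_key": s3_key})
    response = sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)
    return response["MessageId"]


def lambda_handler(event, context):
    # Assumption: the queue URL is configured as an environment variable and
    # the triggering event carries the three fields used below.
    message_id = enqueue_download_job(
        os.environ["QUEUE_URL"],
        event["file_url"],
        event["s3_bucket"],
        event["s3_key"],
    )
    return {"queued": message_id}
```

The same call exists in the Node.js SDK, so the structure carries over if the Lambda is written in JavaScript.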
If you run into limitations related to the length of your queue, the retention period, or any other scale-related problem, you can use a scheduled Lambda function to inspect your queue and scale horizontally to as many Lightsail (EC2) instances as required.
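A scheduled function of that kind could start from something like the Python sketch below. It reads the approximate backlog with `get_queue_attributes` and turns it into a worker count. The sizing rule (10 jobs per worker, at most 5 workers) and the `QUEUE_URL` variable are assumptions for illustration, and actually launching or stopping instances is left out.

```python
import math
import os


def desired_worker_count(backlog, jobs_per_worker, max_workers):
    """Return how many workers the backlog calls for, between 1 and max_workers."""
    needed = math.ceil(backlog / jobs_per_worker)
    return max(1, min(needed, max_workers))


def queue_backlog(sqs_client, queue_url):
    """Read the approximate number of visible messages waiting in the queue."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    # SQS returns attribute values as strings.
    return int(response["Attributes"]["ApproximateNumberOfMessages"])


def lambda_handler(event, context):
    # Imported lazily so the helpers above can be tested without boto3.
    import boto3

    backlog = queue_backlog(boto3.client("sqs"), os.environ["QUEUE_URL"])
    # Assumed sizing rule: one worker per 10 queued jobs, capped at 5 workers.
    workers = desired_worker_count(backlog, jobs_per_worker=10, max_workers=5)
    return {"backlog": backlog, "desired_workers": workers}
```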
Here you can find a useful post. Your solution may look like this:

AWS SQS cannot send (push) data to servers. You will have to poll SQS from your Lightsail server.
So on your Lightsail server you can poll for a batch of, say, 10 messages (based on your processing capability), and when you have finished processing them, poll for the next batch.
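The polling loop described here might look roughly like the following Python sketch, using boto3's `receive_message` and `delete_message`. The `handle_message` callback is a hypothetical placeholder for the download-and-upload work. A message is deleted only after it has been processed successfully, so a failed job becomes visible again after the queue's visibility timeout.

```python
def poll_once(sqs_client, queue_url, handle_message, batch_size=10, wait_seconds=20):
    """Receive one batch, process each message in turn, and delete the successes.

    Returns the number of messages processed successfully.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=batch_size,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=wait_seconds,    # long polling; 20 seconds is the maximum
    )
    processed = 0
    for message in response.get("Messages", []):
        try:
            handle_message(message["Body"])
        except Exception:
            # Leave the message on the queue; it reappears after the
            # visibility timeout and can be retried.
            continue
        sqs_client.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        processed += 1
    return processed


def run_forever(queue_url, handle_message):
    """Poll batch after batch; the next batch is fetched only when this one is done."""
    import boto3  # imported lazily so poll_once can be tested with a stub client

    sqs_client = boto3.client("sqs")
    while True:
        poll_once(sqs_client, queue_url, handle_message)
```

Because the server asks for work only when it is ready, it never holds more than one batch at a time, which is what prevents it from being overloaded.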

Related

What service can be used for concurrent processing of AWS SQS Message request?

We need to process thousands of messages in real time with a standard delay of 15 minutes. We thought of using AWS Lambda, but we found that it has a limit of 1000 concurrent requests; ECS also has some limitations when it comes to concurrency. What other services can be used for such a use case?

Media conversion on AWS

I have an API written in nodeJS (/api/uploadS3) which is a PUT request and accepts a video file and a URL (AWS s3 URL in this case). Once called its task is to upload the file on the s3 URL.
Now, users are uploading files to this node API in different formats (thanks to the different browsers recording videos in different formats) and I want to convert all these videos to mp4 and then store them in s3.
I wanted to know what is the best approach to do this?
I have 2 solutions till now
1. Convert on node server using ffmpeg -
The issue with this is that ffmpeg can only execute a single operation at a time. And since I have only one server, I will have to implement a queue for multiple requests, which can lead to longer waiting times for users who are at the end of the queue. Apart from that, I am worried that my Node server's traffic-handling capability will be affected while a video conversion is in progress.
Can someone help me understand what will be the effect of other requests coming to my server while video conversion is going on? How will it impact the RAM, CPU usage and speed of processing other requests?
2. Using AWS lambda function -
To avoid load on my node server I was thinking of using an AWS Lambda function, where my node API will upload the file to S3 in the format provided by the user. Once done, S3 will trigger a Lambda function which can then take that S3 file, convert it into .mp4 using ffmpeg or AWS MediaConvert, and upload the mp4 file to a new S3 path. However, I don't want the output path to be just any S3 path, but the path that was received by the node API in the first place.
Moreover, I want the user to wait while all this happens as I have to enable other UI features based on the success or error of this upload.
The query here is that, is it possible to do this using just a single API like /api/uploadS3 which --> uploads to s3 --> triggers lambda --> converts file --> uploads the mp4 version --> returns success or error.
Currently, if I upload to s3 the request ends then and there. So is there a way to defer the API response until and unless all the operations have been completed?
Also, how will the lambda function access the path of the output s3 bucket which was passed to the node API?
Any other better approach will be welcomed.
PS - the s3 path received by the node API is different for each user.
Thanks for your post. The output S3 bucket generates File Events when a new file arrives (i.e., is delivered from AWS MediaConvert).
This file event can trigger a second Lambda function which can move the file elsewhere using any of the supported transfer protocols, retrying if necessary; log a status to AWS CloudWatch and/or AWS SNS; and then send a final API response based on the success/completion of the move.
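A second function of that kind could be sketched in Python as below. It reads the bucket and key from the S3 event notification and copies each object with `copy_object`. The `resolve_destination` callback is a hypothetical stand-in for however the per-user output path gets looked up; the thread does not say where that path would be stored.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def move_converted_files(event, s3_client, resolve_destination):
    """Copy each newly arrived object to its final location.

    resolve_destination is a hypothetical callback that maps the source
    (bucket, key) to the destination (bucket, key) for that user.
    """
    results = []
    for bucket, key in extract_s3_objects(event):
        dest_bucket, dest_key = resolve_destination(bucket, key)
        s3_client.copy_object(
            Bucket=dest_bucket,
            Key=dest_key,
            CopySource={"Bucket": bucket, "Key": key},
        )
        results.append((dest_bucket, dest_key))
    return results
```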
AWS has a Step Functions feature that can maintain state across successive Lambda functions, for automating simple workflows. This should work for what you want to accomplish. See https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-creating-lambda-state-machine.html
Note, any one lambda function has a 15 minute maximum runtime, so any one transcoding or file copy operation must complete within 15min. The alternative is to run on EC2.
I hope this helps you out!

How to set intervals between multiple requests AWS Lambda API

I have created an API using an AWS Lambda function (in Python). My React code hits this API whenever an event fires, so a user can call the API as many times as events are fired. The problem is that we are not getting the responses from the Lambda API sequentially. Sometimes the response to our latest request arrives before the response to an earlier request.
So we need to handle the requests in the Lambda function sequentially, maybe by adding a delay between two requests or by implementing throttling. How can I do that?
Did you check the concurrency setting on Lambda? You can throttle the lambda there.
But if you throttle the lambda and the requests being sent are not being received, the application sending the requests might be receiving an error unless you are storing the requests somewhere on AWS for being processed later.
I think putting an SQS queue in front of Lambda might help. You would hit API Gateway, the requests get sent to SQS, and Lambda polls the requests concurrently (you can control the concurrency) and then sends the response back.
You can use an SQS FIFO queue as a trigger on the Lambda function, set the Batch size to 1, and set the Reserved Concurrency on the function to 1. The messages will then be processed in order, and the next message will not be polled until the previous one is complete.
SQS FIFO triggers do not support a Batch Window, which makes Lambda wait and gather messages before invoking the function. That setting is available for standard SQS queues and for stream-based Lambda triggers (Kinesis and DynamoDB Streams).
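The FIFO setup from this answer (batch size 1, reserved concurrency 1) could be configured with boto3 roughly as in the sketch below. The function name, queue ARN, and the group and deduplication IDs are placeholders chosen for the example. Note that FIFO ordering applies within a message group, so the sketch assumes all requests that must stay in order share one `MessageGroupId`.

```python
def configure_sequential_trigger(lambda_client, function_name, fifo_queue_arn):
    """Allow one concurrent execution and deliver one message per invocation."""
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=1,
    )
    mapping = lambda_client.create_event_source_mapping(
        EventSourceArn=fifo_queue_arn,
        FunctionName=function_name,
        BatchSize=1,
    )
    return mapping["UUID"]


def send_in_order(sqs_client, queue_url, body, group_id, dedup_id):
    """Send a message to a FIFO queue; order is kept within one group_id."""
    response = sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=body,
        MessageGroupId=group_id,
        # Required unless content-based deduplication is enabled on the queue.
        MessageDeduplicationId=dedup_id,
    )
    return response["MessageId"]
```

In real use the clients would come from `boto3.client("lambda")` and `boto3.client("sqs")`; they are passed in here so the functions can be tried with stubs.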
If you want a streamlined process, Step Functions lets you manage state using state machines and supports automatic retries based on the outputs of individual states.
As a previous response said, potentially what could help is to put an SQS in front of the Lambda - if order of processing is important, you could also look at setting the SQS queue up as a FIFO queue, which preserves order:
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html
As the other comment said, the other option is to limit concurrency, but even then you're probably best off putting SQS in front as you're then limiting your throughput.

Handling time consuming operations using Nodejs and AWS

The current setup of the project I am working on is based on Nodejs/Express and AWS. AWS Lambda is triggered on a daily basis and is used to call an API endpoint which is expected to fire a varying number of emails via Sendgrid (hundreds to thousands). With a lower number of emails it worked fine but when the number of emails was around 1000 the Lambda timed out and the API crashed.
The limit on Lambda was 1 minute. Raising it up to 5 minutes might make this case of 1000 emails pass but might fail when the number is several thousands. Apart from that we would like to avoid keeping the server busy for several minutes because of which it was set to 1 minute initially.
We are now looking for better solutions to this specific situation. What would be a better approach, is it an option to use SNS Queue, or Serverless with moving all the code that sends the emails to Lambda?
Thanks for any inputs in advance and if more information is required please let me know.
Lambdas are not designed for long running operations. You can use Elastic Beanstalk Workers https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html
Briefly, the Lambda publishes the task to an SQS queue and an Elastic Beanstalk app handles it.
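For the publishing half, a minimal Python sketch could queue the emails in groups of 10, the maximum that `send_message_batch` accepts per call, so that the Lambda finishes quickly and the worker sends the emails at its own pace. The shape of each email dict is an assumption for the example.

```python
import json


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def enqueue_emails(sqs_client, queue_url, emails):
    """Queue one message per email and return the IDs of entries SQS rejected."""
    failed = []
    for batch_number, batch in enumerate(chunked(emails, 10)):
        entries = [
            # Entry IDs only need to be unique within a single batch request.
            {"Id": f"{batch_number}-{index}", "MessageBody": json.dumps(email)}
            for index, email in enumerate(batch)
        ]
        response = sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        failed.extend(entry["Id"] for entry in response.get("Failed", []))
    return failed
```

Returning the failed entry IDs lets the caller decide whether to retry them or log them.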

Schedule a task to run at some point in the future (architecture)

So we have a Python flask app running making use of Celery and AWS SQS for our async task needs.
One tricky problem that we've been facing recently is creating a task to run in x days, or in 3 hours for example. We've had several needs for something like this.
For now we create events in the database with timestamps that store the time that they should be triggered. Then, we make use of celery beat to run a scheduled task every second to check if there are any events to process (based on the trigger timestamp) and then process them. However, this is querying the database every second for events which we feel could be bettered somehow.
We looked into using the eta parameter in celery (http://docs.celeryproject.org/en/latest/userguide/calling.html) that lets you schedule a task to run in x amount of time. However it seems to be bad practice to have large etas and also AWS SQS has a visibility timeout of about two hours and so anything more than this time would cause a conflict.
I'm scratching my head right now. On the one hand this works, and pretty decently, in that things have been separated out with SNS, SQS etc. to ensure scaling-tolerance. However, it just doesn't feel right to query the database every second for events to process. Surely there's an easier way or a service provided by Google/AWS to schedule some event (pub/sub) to occur at some time in the future (x hours, minutes etc.)
Any ideas?
Have you taken a look at AWS Step Functions, specifically Wait State? You might be able to put together a couple of lambda functions with the first one returning a timestamp or the number of seconds to wait to the Wait State and the last one adding the message to SQS after the Wait returns.
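To make the Wait State idea concrete, here is a Python sketch that builds a two-state Amazon States Language definition: wait until a timestamp taken from the execution input, then put a message on SQS. Instead of the Lambda functions suggested above, it uses Step Functions' direct SQS integration (`arn:aws:states:::sqs:sendMessage`) to keep the example short. The input field names `run_at` and `payload` are assumptions for illustration.

```python
import json
from datetime import datetime, timezone


def build_delayed_send_definition(queue_url):
    """Return a state machine definition: wait until $.run_at, then send $.payload."""
    return {
        "Comment": "Wait until the requested time, then enqueue the job",
        "StartAt": "WaitUntilDue",
        "States": {
            "WaitUntilDue": {
                "Type": "Wait",
                "TimestampPath": "$.run_at",  # ISO 8601 timestamp from the input
                "Next": "SendToQueue",
            },
            "SendToQueue": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sqs:sendMessage",
                "Parameters": {
                    "QueueUrl": queue_url,
                    "MessageBody.$": "$.payload",
                },
                "End": True,
            },
        },
    }


def build_execution_input(run_at, payload):
    """Serialize the input for one execution; run_at must be timezone-aware."""
    timestamp = run_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return json.dumps({"run_at": timestamp, "payload": payload})
```

Each scheduled task would then be one execution started with the JSON from `build_execution_input`, for example via boto3's `start_execution` on the Step Functions client, so nothing has to poll the database in the meantime.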
Amazon's scheduling solution is the use of CloudWatch to trigger events. Those events can be placing a message in an SQS/SNS endpoint, triggering an ECS task, running a Lambda, etc. A lot of folks use the trick of executing a Lambda that then does something else to trigger something in your system. For example, you could trigger a Lambda that pushes a job onto Redis for a Celery worker to pick up.
When creating a CloudWatch rule, you can specify either a "Rate" (e.g., every 5 minutes) or an arbitrary time in cron syntax.
So my suggestion for your use case would be to create a CloudWatch rule that runs at the time your job needs to kick off (or a minute before, depending on how time sensitive you are). That rule would then interact with your application to kick off your job. You'll only pay for the resources when CloudWatch triggers.
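A one-off rule of this kind could be created with boto3's `put_rule` and `put_targets`, roughly as in the Python sketch below. The rule name, target ARN, and payload are placeholders. The sketch assumes the target (for example a Lambda function) already permits invocation by the rule, and it does not delete the rule after it has fired.

```python
from datetime import datetime, timezone


def one_time_cron(run_at):
    """Build a cron expression that matches one specific minute, in UTC.

    Field order: minutes hours day-of-month month day-of-week year.
    run_at must be a timezone-aware datetime.
    """
    utc = run_at.astimezone(timezone.utc)
    return f"cron({utc.minute} {utc.hour} {utc.day} {utc.month} ? {utc.year})"


def schedule_job(events_client, rule_name, run_at, target_arn, payload_json):
    """Create a rule for run_at and attach one target that receives payload_json."""
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=one_time_cron(run_at),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        # Input must be a JSON string; it is passed to the target as the event.
        Targets=[{"Id": "1", "Arn": target_arn, "Input": payload_json}],
    )
    return rule["RuleArn"]
```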
Have you looked into Amazon Simple Notification Service? It sounds like it would serve your needs...
https://aws.amazon.com/sns/
From that page:
Amazon SNS is a fully managed pub/sub messaging service that makes it easy to decouple and scale microservices, distributed systems, and serverless applications. With SNS, you can use topics to decouple message publishers from subscribers, fan-out messages to multiple recipients at once, and eliminate polling in your applications. SNS supports a variety of subscription types, allowing you to push messages directly to Amazon Simple Queue Service (SQS) queues, AWS Lambda functions, and HTTP endpoints. AWS services, such as Amazon EC2, Amazon S3 and Amazon CloudWatch, can publish messages to your SNS topics to trigger event-driven computing and workflows. SNS works with SQS to provide a powerful messaging solution for building cloud applications that are fault tolerant and easy to scale.
You could start the job with apply_async, and then use a countdown, like:
xxx.apply_async(..., countdown=TTT)
It is not guaranteed that the job starts exactly at that time, depending on how busy the queue is, but that does not seem to be an issue in your use case.
