I have an AWS Lambda function which synchronises a remote feed of content to an S3 bucket. It needs to run periodically, so it is currently invoked hourly by a CloudWatch cron event.
This works great for a single feed; however, I now need multiple feeds to be synchronised, which can use exactly the same functionality, just with a different source URL and bucket.
Rather than clone the entire Lambda function for each feed, is there some mechanism to pass configuration information into the Lambda invocation to specify what it should be operating on?
The function is written in Node 14.x in case that is significant.
Yes, it is possible.
You can have the cron job trigger Lambda A, which then passes the per-feed information as a payload in asynchronous calls to a second Lambda.
This executes the second Lambda concurrently, so you can run as many as 1,000 concurrent executions (the default account limit).
It's a fan-out pattern, and it can be implemented in your scenario.
Sample code:
    import json
    import boto3

    lambda_client = boto3.client('lambda')
    payload = [{'source_url': '...', 'bucket': '...'}] * 3  # illustrative per-feed configs
    for i in range(3):
        # 'Event' = asynchronous invocation, so each Lambda_B run is concurrent
        lambda_client.invoke(FunctionName='Lambda_B', InvocationType='Event', Payload=json.dumps(payload[i]))
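With InvocationType='Event', each invoke call returns immediately with a 202 status and Lambda runs Lambda_B asynchronously, so the scheduler function finishes quickly regardless of how long each individual feed takes to synchronise.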
For this, you have two options:
In the same function, change the environment variables, publish a version, and attach it to an alias. Each published version saves its own values for the environment variables. The problem with this approach is that if you want to make a change in the code, you have to re-publish the function for each alias again (and change all the environment variables each time), so it is error-prone.
The second option is to pass the config details through the event parameter that Lambda accepts (as JSON) and read it in the Lambda function. You can have separate CloudWatch Events rules which pass different JSON event details.
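A minimal sketch of that second option with boto3 (rule names, ARNs and config fields here are illustrative); in your Node 14.x handler, the same JSON simply arrives as the event argument:

    import json
    import boto3

    events = boto3.client('events')
    feeds = [  # hypothetical per-feed configuration, one rule per feed
        {'rule': 'sync-feed-a', 'source_url': 'https://example.com/a', 'bucket': 'feed-a-bucket'},
        {'rule': 'sync-feed-b', 'source_url': 'https://example.com/b', 'bucket': 'feed-b-bucket'},
    ]
    for feed in feeds:
        events.put_rule(Name=feed['rule'], ScheduleExpression='rate(1 hour)')
        events.put_targets(
            Rule=feed['rule'],
            Targets=[{
                'Id': 'feed-sync-target',
                'Arn': 'arn:aws:lambda:eu-west-1:123456789012:function:feed-sync',
                # Constant JSON delivered as the Lambda's event on every trigger
                'Input': json.dumps({'source_url': feed['source_url'], 'bucket': feed['bucket']}),
            }],
        )

Note the function also needs a resource-based permission allowing events.amazonaws.com to invoke it (lambda add-permission).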
Try using an EventBridge cron job, which has options to add extra configuration to triggers.
https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-create-rule-schedule.html#eb-cron-expressions
Moreover, from the description it seems you want to act when some operation is performed in S3. Why not trigger the Lambda on the S3 events themselves?
https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-events.html
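If you do go the S3-event route, the handler shape is standard; a minimal Python sketch (the per-object logic is up to you):

    import urllib.parse

    def handler(event, context):
        # One record per S3 object event (e.g. s3:ObjectCreated:*)
        for record in event['Records']:
            bucket = record['s3']['bucket']['name']
            key = urllib.parse.unquote_plus(record['s3']['object']['key'])
            # ... act on the object that changed ...
            print(bucket, key)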
I have a lambda function lambda1 that gets triggered by an API call and computes the parameters for another job downstream that will be handled by a different function lambda2.
The resources required to complete the downstream job are not available immediately and will become available at some future time datetime1 which is also calculated by lambda1.
How do I make lambda1 schedule a message in an SNS topic that will be sent out at datetime1 instead of going out immediately? The message sent out at the correct time will then trigger lambda2 which will find all the resources in place and execute correctly.
Is there a better way of doing this instead of SNS?
Both lambda1 and lambda2 are written in Python 3.8
You would be better off using AWS Step Functions. Step Functions are generally used for orchestrating jobs that involve multiple Lambda functions, and they support the Wait state that you need to run a job at a specific time.
Basically, you will create multiple states. One of them will be a Wait state, where you input the wait condition (the timestamp at which it stops waiting); this is what you will send from lambda1. The next state will be a Task state, which will be your lambda2.
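A minimal state machine sketch in Amazon States Language (the ARN and the datetime1 field name are illustrative); lambda1 would start an execution with input like {"datetime1": "2021-06-01T12:00:00Z"} using the Step Functions StartExecution API:

    {
      "StartAt": "WaitUntilReady",
      "States": {
        "WaitUntilReady": {
          "Type": "Wait",
          "TimestampPath": "$.datetime1",
          "Next": "RunLambda2"
        },
        "RunLambda2": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:us-east-1:123456789012:function:lambda2",
          "End": true
        }
      }
    }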
I have a script that we wanted to run in AWS as a Lambda, written in Python. It will run once every 24 hours. It describes all the EC2 instances in an environment, and I'm checking some details for compliance. After that, we want to send all the instance details to an SNS topic that other similar compliance scripts are using, from which they are sent to a Lambda and, from there, into Sumo Logic. I'm currently testing with just one environment, but there are over 1,000 instances, and so over 1,000 messages to be published to SNS, one by one. The Lambda times out long before it can send all of these. Once all environments and their instances are checked, it could be close to 6,000 messages to publish to SNS.
I need some advice on how to architect this to work with all environments. I was thinking maybe putting all the records into an S3 bucket from my lambda, then creating another lambda that would read each record from the bucket say, 50 at a time, and push them one by one into the SNS topic. I'm not sure though how that would work either.
Any ideas appreciated!
I would recommend using Step Functions for this problem. Even if you fix the issue for now, sooner or later, with a rising number of instances or additional steps you want to perform, the 900-second maximum runtime of a single Lambda won't be sufficient anymore.
A simple approach with Step Functions could be:
Step 1: Create a list of EC2 instances to "inspect" with this first Lambda. Could be all instances or just instances with a certain tag etc. You can be as creative as you want.
Step 2: Process this list of instances with a parallel step that calls one Lambda per instance ID (in practice a Map state, since the length of the list is dynamic); see the sketch after this list.
Step 3: The triggered Lambda reads the details of the provided EC2 instance and then publishes the result to SNS. Step Functions has a pre-defined service integration for publishing to SNS, so you may not need to program that part yourself.
With the new Workflow Studio this should be relatively easy to implement.
Note: This might not be faster than a single Lambda, but it will scale better with more EC2 instances to scan. The only bottleneck might be the Lambda in Step 1: if it needs more than 15 minutes to "find" all the EC2 instances to "scan", you will need to get a bit creative. But this is solvable.
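A minimal sketch of the Step 1 Lambda (names are illustrative); the list it returns is what the Map state would iterate over:

    import boto3

    def handler(event, context):
        ec2 = boto3.client('ec2')
        instance_ids = []
        # Paginate so environments with 1,000+ instances are fully covered
        for page in ec2.get_paginator('describe_instances').paginate():
            for reservation in page['Reservations']:
                instance_ids.extend(i['InstanceId'] for i in reservation['Instances'])
        # One Map iteration (and thus one Step 3 Lambda call) per instance id
        return {'instance_ids': instance_ids}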
I think you could use an SQS queue to solve your problem.
From SNS, send the message to an SQS queue. Your Lambda can then poll the SQS queue for messages (the default batch size is 10, but you can change it through the CLI).
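Sketch of wiring the queue to the Lambda with boto3 (the ARN and function name are illustrative); each invocation then receives up to BatchSize messages in event['Records']:

    import boto3

    lambda_client = boto3.client('lambda')
    lambda_client.create_event_source_mapping(
        EventSourceArn='arn:aws:sqs:us-east-1:123456789012:compliance-queue',
        FunctionName='compliance-publisher',
        BatchSize=10,  # messages handed to each invocation; tune up or down
    )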
On AWS, is it possible to have one HTTP request execute a Lambda, which in turn triggers a cascade of Lambdas running in serial, where the final Lambda returns the result to the user?
I know one way to achieve this is for the initial Lambda to "stay running" and orchestrate the other Lambdas, but I'd be paying for that orchestration Lambda to effectively do nothing most of the time, i.e. paying for the time it's waiting on the others. If it were non-lambda code, that would be like blocking (and paying for) an entire thread while the other threads do their work.
Unless AWS stops the billing clock while async Lambdas are "sleeping"/waiting on network IO?
Unfortunately, as you've found, an HTTP request can only invoke a single Lambda function, so that function becomes an orchestrator.
This is not ideal, but it has to be the case if you want to serve an HTTP request with multiple Lambda functions. You can either have that Lambda call the other Lambdas directly, or create a Step Function which orchestrates the individual steps. You would still need the Lambda to start the Step Function execution and then poll its status before returning the results.
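A sketch of that start-then-poll orchestrator (the state machine ARN is illustrative); note this Lambda is billed for the whole time it waits:

    import json
    import time
    import boto3

    sfn = boto3.client('stepfunctions')

    def handler(event, context):
        execution = sfn.start_execution(
            stateMachineArn='arn:aws:states:us-east-1:123456789012:stateMachine:cascade',
            input=json.dumps(event),
        )
        # Poll until the serial cascade of Lambdas finishes
        while True:
            result = sfn.describe_execution(executionArn=execution['executionArn'])
            if result['status'] != 'RUNNING':
                break
            time.sleep(1)
        return json.loads(result.get('output', '{}'))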
Here is my use case:
I have a scheduler lambda and an executor lambda.
In the scheduler lambda, I receive a list of (time, message) tuples indicating that at each time I would like to invoke the executor lambda with the corresponding message as its event.
Here is what I have tried:
In the scheduler lambda, first clear all triggers from the executor lambda, then create an EventBridge scheduled event for each (time, message) tuple. This has a few drawbacks:
It's quite difficult to remove all triggers from a lambda, as the Lambda API doesn't let you do that (I believe I have to do it through the EventBridge API with proper tagging).
Adding and removing ~100 triggers every day seems uneconomical and is not the intended use case of EventBridge.
Running a dedicated EC2 instance to call the lambda function
I'm cheap and I don't want to pay for an instance that will sit idle ~99.9% of the time.
Not serverless
Is there a serverless way of triggering a lambda in a non-periodic fashion?
A bit of a departure, but could you use DynamoDB with a TTL? The scheduler could simply write the message to the table and set the TTL column to expire at the time in the tuple.
You could subscribe the executor lambda to the DynamoDB stream and only respond to REMOVE events; if you use new and old images, you can retrieve the message from the old image (otherwise I believe it's empty when the item is deleted).
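A sketch under those assumptions (the table name, attribute names and NEW_AND_OLD_IMAGES stream view are the assumed setup; note DynamoDB TTL deletion is approximate, not to-the-second):

    import boto3

    table = boto3.resource('dynamodb').Table('scheduled-messages')

    def scheduler_handler(event, context):
        for when, message in event['schedule']:  # the (time, message) tuples
            table.put_item(Item={
                'id': f'{when}#{message[:32]}',
                'message': message,
                'expires_at': int(when),  # the table's TTL attribute, in epoch seconds
            })

    def executor_handler(event, context):
        for record in event['Records']:
            if record['eventName'] == 'REMOVE':  # emitted when TTL deletes the item
                message = record['dynamodb']['OldImage']['message']['S']
                # ... act on the scheduled message ...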
At the moment, we are calling cloudfront.listDistributions() every minute to identify a change in the status of the distribution we are deploying. This causes the Lambda to time out, because CloudFront never deploys in less than 30 minutes (while Lambda times out after 15 minutes).
I would like to notify a Lambda function after a CloudFront Distribution is successfully created. This would allow us to execute the post-creation actions while saving valuable Lambda exec time.
Creating a Rule on CloudWatch does not offer the option to choose CloudFront. Nevertheless, it seems to accept creating a Custom Event Pattern with the source aws.cloudfront.
Options I'm considering:
Trigger a Lambda every 5 minutes to list distributions and compare their states with previous states stored in DynamoDB.
Does anybody have an idea to overcome this missing feature from AWS?
If you want and have time, there's a trickier and a bit more complex solution for doing this, leveraging CloudTrail.
Disclaimer
CloudTrail is not a real-time log system, but it ensures that all API calls are reported on the console within 15 minutes (as stated under the CloudTrail FAQs). Because of this, what follows makes sense only for long-running tasks like creating a CloudFront distribution, spinning up an Aurora DB and so on.
You can create a CloudWatch event based rule (let's call it CW-r1) on a specific pattern like CreateDistribution or UpdateDistribution.
CW-r1 triggers a Lambda (LM-1) which enables another CloudWatch event based rule (CW-r2).
CW-r2, on a scheduled basis, triggers a Lambda (LM-2) which requests the state of the specific distribution via the API. Once the distribution is "Deployed", LM-2 can send a notification via SNS, for example (you can send email, SMS, push notifications, whatever is supported on SNS).
Once everything is finished, LM-2 can disable the CW-r2 rule in order to stop processing information.
In this way you can have an automatic notification system based on whichever API call you desire.
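A minimal sketch of LM-2 (the distribution ID, rule name and topic ARN are illustrative):

    import boto3

    cloudfront = boto3.client('cloudfront')
    events = boto3.client('events')
    sns = boto3.client('sns')

    def handler(event, context):
        dist = cloudfront.get_distribution(Id='E2EXAMPLE12345')
        if dist['Distribution']['Status'] == 'Deployed':
            sns.publish(
                TopicArn='arn:aws:sns:eu-west-1:123456789012:cf-deployed',
                Message='CloudFront distribution E2EXAMPLE12345 is deployed',
            )
            # Stop the scheduled polling now that the work is done
            events.disable_rule(Name='CW-r2')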