Can we invoke a Lambda function with a large payload using the boto3 library? - python-3.x

I want to know how to invoke a Lambda function using the boto3 library with a large payload. As of now I am able to invoke it with a payload of less than 6 MB.
I also want to know the maximum limit for the payload.
Once the above issue is fixed, I have a follow-up question:
How should I pass this payload to the invoke function?
Earlier I was doing it as below:
lambda_payload = open('fileName.txt', 'r').read()
lambda_client.invoke(FunctionName='##FName', InvocationType='RequestResponse', Payload=lambda_payload)
# the copied ARN is in the below format:
# arn:aws:s3:::dev-abc/fileName.txt
Now what should my new payload be?

The invocation payload of a Lambda function can be at most 6 MB when invoked synchronously, or 256 KB when invoked asynchronously. An easy workaround is to upload your payload to S3 and pass the S3 object location as the payload to your Lambda function. Your function can then read or stream the contents of the S3 object.
You could add the S3 URI, the S3 object ARN, or simply the bucket name and object key as string values to the invocation payload. You can then use boto3 inside your Lambda function to read the contents of that file.
If you need a larger payload in order to perform an upload, have a look at pre-signed S3 URLs. These allow you to return a URL that can be used to upload directly to an S3 location.
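For illustration, here is a minimal sketch of that workaround using boto3. The bucket name, key, and handler shown are assumptions, not part of the original question:

import json
import boto3

s3 = boto3.client('s3')
lambda_client = boto3.client('lambda')

# --- caller side: put the large payload in S3 and send only its location ---
s3.upload_file('fileName.txt', 'dev-abc', 'fileName.txt')
response = lambda_client.invoke(
    FunctionName='##FName',
    InvocationType='RequestResponse',
    Payload=json.dumps({'bucket': 'dev-abc', 'key': 'fileName.txt'}),
)

# --- inside the Lambda function: read the object back with boto3 ---
def handler(event, context):
    obj = boto3.client('s3').get_object(Bucket=event['bucket'], Key=event['key'])
    data = obj['Body'].read()
    # ... process data ...
    return {'status': 'ok'}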

Related

Amazon S3 + Lambda (Node.JS) clarification on the s3.upload() method

I am following this tutorial wherein the programmer used this code:
await s3
.upload({ Bucket: bucket, Key: target_filename, Body: file_stream })
.promise();
Now, I understand that the method above would use the initialized variables file_stream, bucket, and target_filename (which he didn't bother typing out in his tutorial).
But the tutorial is hard to follow since (as far as I know) the Key parameter inside upload() is the destination path of the file to be re-uploaded back to S3.
This is confusing because at the file_stream variable, another Key parameter exists inside the method getObject().
So, should the filename inside the getObject() method be the same as the target_filename of the upload() method? And could you initialize the variables mentioned just to make this question clearer? Thank you.
No, the filename inside the getObject() method does not have to be the same as the target_filename in upload(). Let's look at a concrete example. Suppose you have a photo.zip file stored on S3 whose key is a/b/photo.zip, and you want to unzip it and re-upload it to c/d/photo.jpg, assuming photo.zip contains only one file. Then the filename should be a/b/photo.zip, and the target_filename should be c/d/photo.jpg. As you can see, they are clearly different.
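To make it concrete which key goes where, here is a rough sketch of that flow; the question uses the Node SDK, but the same sequence in Python/boto3 (with a placeholder bucket name) looks like this:

import io
import zipfile
import boto3

s3 = boto3.client('s3')

# getObject's Key is the location of the existing archive.
obj = s3.get_object(Bucket='my-bucket', Key='a/b/photo.zip')

# Extract the single file from the zip in memory.
with zipfile.ZipFile(io.BytesIO(obj['Body'].read())) as zf:
    photo_bytes = zf.read(zf.namelist()[0])

# upload's Key (the target_filename) is the new location of the result.
s3.put_object(Bucket='my-bucket', Key='c/d/photo.jpg', Body=photo_bytes)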

AWS Lambda Python - Return BytesIO file?

I'm setting up a function in AWS Lambda using Python 3.7 and it won't let me return a bytes type.
Please note that this is not an issue with API Gateway; I'm invoking the Lambda directly.
The error is : Runtime.MarshalError, ... is not JSON serializable
from io import BytesIO

output = BytesIO()
# Code that puts an Excel file into output...
return {
    'Content-Disposition': 'attachment; filename="export.xlsx"',
    'Content-Type': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    'body': output.getvalue()
}
If I do :
'body' : str(output.getvalue())
It outputs a corrupted file because it adds b'' to the string
If I do :
'body' : base64.b64encode(output.getvalue()).decode()
It also outputs a corrupted file, probably because it changes the binary representation of the file.
Maybe I need to upload to S3? But that doesn't fit my flow; this is a one-time file creation and it would sit in "S3 limbo" until the TTL expires.
It is not possible to return unencoded binary data from a directly invoked AWS Lambda function.
Per the docs:
If the handler returns objects that can't be serialized by json.dumps, the runtime returns an error.
The reason you can do this with API Gateway is that API Gateway performs the conversion of the base64 JSON content your function returns into binary for you. (See documentation here)
I would need to know more about how you are invoking Lambda to be sure, but I suspect you could implement the same base64-decode logic in your direct-invoke client. Alternatively, if you want to keep the client as simple as possible, use S3 with a lifecycle rule to keep the bucket from filling up with temporary files.
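As a minimal sketch of that approach, assuming a Python client that invokes the function directly with boto3 (the function name and output filename are placeholders):

import base64
import json
from io import BytesIO
import boto3

# --- inside the Lambda function: return the file as a base64 string, which is JSON serializable ---
def handler(event, context):
    output = BytesIO()
    # ... code that writes the Excel file into output ...
    return {'body': base64.b64encode(output.getvalue()).decode('utf-8')}

# --- in the direct-invoke client: decode the base64 back into the original bytes ---
lambda_client = boto3.client('lambda')
response = lambda_client.invoke(FunctionName='my-export-function', InvocationType='RequestResponse')
payload = json.loads(response['Payload'].read())
with open('export.xlsx', 'wb') as f:
    f.write(base64.b64decode(payload['body']))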

S3 Video to audio file convert using Node js (Lambda function)

I am trying to convert an S3 video file to an audio file through a Lambda function. Whenever a video file is uploaded into an S3 bucket, I have to generate an audio file and save it back to the S3 bucket by triggering the AWS Lambda function. I can convert the video file to audio locally (convert video to an audio file using FFmpeg). But I am wondering how to do this conversion part in the Lambda function every time a video file is uploaded into the S3 bucket. I have no idea how to do this in an AWS Lambda function. Please share your suggestions.
Sample code:
var ffmpeg = require('fluent-ffmpeg');

/**
 * input - string, path of input file
 * output - string, path of output file
 * callback - function, node-style callback fn (error, result)
 */
function convert(input, output, callback) {
    ffmpeg(input)
        .output(output)
        .on('end', function() {
            console.log('conversion ended');
            callback(null);
        })
        .on('error', function(err) {
            console.log('error: ', err.message);
            callback(err);
        })
        .run();
}

convert('./df.mp4', './output.mp3', function(err) {
    if (!err) {
        console.log('conversion complete');
        //...
    }
});
Thanks,
You just need to set up an event on the S3 bucket (object created / PUT) to trigger the Lambda function; you will get access to the description of the object uploaded to that S3 bucket through the first parameter of the Lambda function.
If you can convert the video file to audio on your local machine using some external libraries, then you need to create a zip file containing your Lambda function (in the root of the zip file) as well as its dependencies.
This is pretty simple in the case of Node. Create a new folder, run npm init, install the needed modules, and create an index.js file where you put your Node code. Zip all the contents of this folder (not the folder itself). When you create the new Lambda function, choose to upload this zip file.
If you are wondering how to programmatically communicate with AWS resources and manipulate them, check out the aws-sdk module, which you can import and use for that purpose.
So basically, inside your Lambda function you need to parse the event argument (the first parameter) to obtain the bucket and key of the uploaded object. Then call s3.getObject to fetch the data, process it with your custom logic, and call s3.putObject to store the newly transformed data at a new S3 location.
Lambda also has access to its own local file system if your code needs to store some data there; you just need to specify an absolute path, such as /tmp/output.mp3. To retrieve the file, you can use the fs module, and then continue with s3.putObject.
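For reference, here is a rough sketch of that handler flow; the question is about Node, but the same sequence expressed with Python and boto3 looks roughly like this (the output key, and the assumption that an ffmpeg binary is bundled with the function or provided via a layer, are placeholders):

import subprocess
import boto3

s3 = boto3.client('s3')

def handler(event, context):
    # Parse the S3 event to find the uploaded object.
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']

    # Download the video into Lambda's local /tmp storage.
    s3.download_file(bucket, key, '/tmp/input.mp4')

    # Convert it; assumes an ffmpeg binary is available to the function.
    subprocess.run(['ffmpeg', '-i', '/tmp/input.mp4', '/tmp/output.mp3'], check=True)

    # Upload the converted audio back to S3 under a new key.
    s3.upload_file('/tmp/output.mp3', bucket, key.rsplit('.', 1)[0] + '.mp3')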

How can aws api gateway listen to 2 lambda functions?

My design is that the API will trigger the first Lambda function; this function then publishes to SNS and returns, and SNS triggers the second Lambda function. Now I want the API to get the response from the second Lambda function.
Here is the flow:
The API gets the request from the user and triggers the first Lambda function. The first Lambda function publishes to SNS and returns. Now the API is at the Lambda function stage and is still waiting for the response from the second Lambda. SNS triggers the second Lambda function; the second Lambda function returns some result and passes it to the API. The API gets the response and sends it back to the user.
I know there is a way using the SDK to invoke the second Lambda function directly and set the event type to make it async. But here I want to use SNS; is that possible?
Need some help/advice. Thanks in advance!
You need something to share lambda_func_2's result with lambda_func_1; the API Gateway request context only returns when you call the callback in func1, and you cannot save or send the request context to another Lambda function.
My solution for this case is to use DynamoDB (or any database) to share F2's result.
F1 sends data to SNS; the data includes a key such as transactionID (a UUID or timestamp). F1 then "waits" until it finds the result in a table (e.g. tbl_f2_result) and executes the callback function with that result. For example, query by transactionID until you receive data, or only try 10 times (with a 2-second wait per attempt, so in the worst case you will wait 20 seconds).
F2 is triggered by SNS, does something with the data (including the transactionID), then inserts the result (success or not, error message, ...) into the result table (tbl_f2_result) keyed by transactionID, and calls its callback to finish.
transactionID is the index key of the table :D
You have to increase F1's timeout; the default is 6 seconds.
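A minimal sketch of F1's publish-and-poll step, assuming Python with boto3 and the hypothetical table, topic, and attribute names used above (tbl_f2_result, transactionID, result):

import time
import uuid
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('tbl_f2_result')
sns = boto3.client('sns')

def f1_handler(event, context):
    transaction_id = str(uuid.uuid4())

    # Publish the work item to SNS; F2 is subscribed to this topic.
    sns.publish(TopicArn=event['topic_arn'], Message=transaction_id)

    # Poll the result table up to 10 times, 2 seconds apart.
    for _ in range(10):
        item = table.get_item(Key={'transactionID': transaction_id}).get('Item')
        if item:
            return {'statusCode': 200, 'body': item['result']}
        time.sleep(2)

    return {'statusCode': 504, 'body': 'timed out waiting for F2'}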
Of course you can. Lambda lets you implement almost any functionality you want, whether it's inserting a record into your DynamoDB table, reading an object from your S3 bucket, calculating the tax amount for the selected item on an e-commerce site, or simply calling an API.
Notice that here you don't need any event to call your API from the Lambda function; you simply call the API directly.
As you are using Node, you can simply make an HTTP request, something like this:
var http = require('http');

var options = {
    host: YOUR_API_URL,
    port: 80,
    path: 'REST_API_END_POINT',
    method: 'YOUR_HTTP_METHOD' // POST/GET/...
};

http.request(options, function(res) {
    // Whatever you want to do with the reply...
}).end();
Below is one approach to your problem, but it requires polling.
API GTW
- integration to --> Lambda1
Lambda1
- create a unique SHA and a folder inside the bucket, say s3://response-bucket/
- trigger SNS through the SDK with a payload containing the SHA
- poll the key under s3://response-bucket/ (with a timeout set); a sketch of this polling step follows the list
- if the result has been placed there, the response is sent back from Lambda1 --> API GTW
- if it times out, an error is returned
- on success, trigger SNS to clean up the response data in the bucket, with the payload being the SHA; the cleanup is done by another Lambda
SNS
- now the payload with the SHA is in SNS
Lambda2
- SNS triggers Lambda2
- pull the unique SHA out of the payload
- the Lambda result is placed in the same s3://response-bucket/
- exit from Lambda2
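A rough sketch of Lambda1's publish-and-poll step under the assumptions above (the topic ARN, the bucket name response-bucket, and the SHA-based key are placeholders):

import hashlib
import json
import time
import boto3

s3 = boto3.client('s3')
sns = boto3.client('sns')

def lambda1_handler(event, context):
    # Create a unique SHA for this request and hand it to Lambda2 via SNS.
    sha = hashlib.sha256(json.dumps(event).encode('utf-8')).hexdigest()
    sns.publish(TopicArn='arn:aws:sns:region:account:request-topic', Message=sha)

    # Poll the response bucket for the object Lambda2 will write, with a timeout.
    deadline = time.time() + 20
    while time.time() < deadline:
        try:
            obj = s3.get_object(Bucket='response-bucket', Key=sha)
            return {'statusCode': 200, 'body': obj['Body'].read().decode('utf-8')}
        except s3.exceptions.NoSuchKey:
            time.sleep(2)

    return {'statusCode': 504, 'body': 'timed out waiting for Lambda2'}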

Updating headers of every file in an Amazon S3 bucket

I have a large number of files in a bucket that have incorrect MIME types, as well as no Expires header set.
How can I change them all?
I'm using Knox:
https://github.com/LearnBoost/knox
I'm trying to iterate over it. How do I get a list of all files in a folder?
When I do this
client.get('/folder').on('response', function(res) {
    console.log(res);
    res.on('data', function(chunk) {
        console.log(chunk);
    });
}).end();
I see something about an XML file; how do I access it?
It looks like the library you have chosen does not have any native support for listing the objects in a bucket. You will need to construct the list requests and parse the XML yourself; documentation for the underlying REST API can be found in the S3 API documentation.
Once you have a list of objects, you can use S3's copy request functionality to update the metadata. Just apply this patch, then pass x-amz-metadata-directive: REPLACE as a header to a copy request that specifies the same key as both source and destination (the source must specify the bucket as well!), plus any other headers you want to set.
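The question targets Knox, but to illustrate the underlying S3 operation, here is a rough sketch of the same copy-in-place technique using boto3 (the bucket name, content type, and Expires value are placeholders):

from datetime import datetime, timedelta
import boto3

s3 = boto3.client('s3')
bucket = 'my-bucket'

# List every object in the bucket (paginated), then copy each object onto itself
# with MetadataDirective='REPLACE' to rewrite its headers.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get('Contents', []):
        key = obj['Key']
        s3.copy_object(
            Bucket=bucket,
            Key=key,
            CopySource={'Bucket': bucket, 'Key': key},
            MetadataDirective='REPLACE',
            ContentType='image/jpeg',                        # placeholder MIME type
            Expires=datetime.utcnow() + timedelta(days=365)  # placeholder Expires header
        )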

Resources