AWS Lambda Python - Return BytesIO file? - python-3.x

I'm setting up a function in AWS Lambda using Python 3.7 and it won't let me return a bytes type.
Please note that this is not an issue with API Gateway; I'm invoking the Lambda directly.
The error is: Runtime.MarshalError, ... is not JSON serializable
output = BytesIO()
# Code that writes an Excel file into output...
return {
    'Content-Disposition': 'attachment; filename="export.xlsx"',
    'Content-Type': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    'body': output.getvalue()
}
If I do:
'body': str(output.getvalue())
It outputs a corrupted file, because str() wraps the bytes in b''.
If I do:
'body': base64.b64encode(output.getvalue()).decode()
It also outputs a corrupted file, probably because it changes the binary representation of the file.
Maybe I need to upload to S3? But that doesn't fit my flow; this is a one-time file creation and it would sit in "S3 limbo" until its TTL expires.

It is not possible to return unencoded binary data from a directly invoked AWS Lambda function.
Per the docs:
If the handler returns objects that can't be serialized by json.dumps, the runtime returns an error.
The reason you can do this with API Gateway is that API Gateway converts the base64-encoded JSON content your function returns into binary for you (see the API Gateway documentation).
I would need to know more about how you are invoking Lambda to be sure, but I suspect you could implement the same base64-decoding logic in your direct-invoke client. Alternatively, if you want to keep the client as simple as possible, use S3 with a lifecycle rule to keep the bucket from filling up with temporary files.
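As a rough sketch of that client-side decode (assuming boto3, a hypothetical function name, and a handler that returns the spreadsheet base64-encoded under the 'body' key, i.e. the base64.b64encode(...).decode() variant above):

import base64
import json

import boto3

lambda_client = boto3.client('lambda')

# Invoke the function directly and read the JSON response payload.
response = lambda_client.invoke(
    FunctionName='export-function',      # hypothetical function name
    InvocationType='RequestResponse',
    Payload=json.dumps({}),
)
result = json.loads(response['Payload'].read())

# The handler returned base64 text; decode it back to bytes on the client.
with open('export.xlsx', 'wb') as f:
    f.write(base64.b64decode(result['body']))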

Related

Can we invoke lambda function with large payload using boto3 library

I want to know how to invoke a Lambda function using the boto3 library with a large payload. As of now I am able to invoke it with a payload of less than 6 MB.
I also want to know the maximum limit for the payload.
Once the above issue is fixed, I have another question:
How should I pass this payload to the invoke function?
Earlier I was doing it as below:
lambda_payload = open('fileName.txt', 'r').read()
lambda_client.invoke(FunctionName='##FName', InvocationType='RequestResponse', Payload=lambda_payload)
# The ARN I copied is in the below format:
# arn:aws:s3:::dev-abc/fileName.txt
Now what should my new payload be?
The invocation payload of a Lambda can only be 6 MB when invoked synchronously, or 256 KB when invoked asynchronously. An easy workaround is to upload your payload to S3 and pass the S3 object location as the payload to your Lambda. Your Lambda can then read or stream the contents of the S3 object.
You could add the S3 URI, the S3 object ARN, or simply the bucket name and object key as string values to the invocation payload. You can then use boto3 inside your Lambda function to read out the contents of that file, as sketched below.
If you need a larger payload in order to execute an upload, have a look at pre-signed S3 URLs. This would allow you to return a URL that can be used to upload directly to an S3 location.
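A minimal sketch of that workaround, assuming boto3 and hypothetical bucket and key names (the ##FName placeholder is kept from the question):

import json

import boto3

s3 = boto3.client('s3')
lambda_client = boto3.client('lambda')

BUCKET = 'dev-abc'                # hypothetical bucket name
KEY = 'payloads/fileName.txt'     # hypothetical object key

# 1. Upload the large payload to S3 instead of sending it inline.
s3.upload_file('fileName.txt', BUCKET, KEY)

# 2. Invoke the Lambda with just the object location (well under the 6 MB limit).
lambda_client.invoke(
    FunctionName='##FName',
    InvocationType='RequestResponse',
    Payload=json.dumps({'bucket': BUCKET, 'key': KEY}),
)

Inside the Lambda, the handler can then fetch the real payload:

def handler(event, context):
    obj = boto3.client('s3').get_object(Bucket=event['bucket'], Key=event['key'])
    payload = obj['Body'].read()
    # ... process payload ...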

How to log raw JSON to Cloudwatch from AWS Lambda in node.js?

I have some node.js based Lambdas that are logging data.
In order to properly query and filter the data, I want to log pure JSON data from my Lambdas.
However, when I do a regular console.log it turns the data into an ordinary string.
console.log({a:1,b:2,x:"xxx"})
Results in this:
2020-04-29T14:46:45.722Z 3f64c499-fbae-4a84-996c-5e5f0cb5302c INFO { a: 1, b: 2, x: 'xxx' }
The logged line above does not seem to be searchable as JSON using the various filter matching options in CloudWatch.
I've tried calling the AWS.CloudWatchLogs API directly, but since I'm using Lambda I cannot maintain a token between invocations of the functions, so I'm not sure that's the way to go.
Has anyone else had success logging raw JSON from a JavaScript Lambda?
The problem is that console.log() does not go directly to stdout/stderr. You can see that using this Lambda:
const process = require('process');

exports.handler = async (event) => {
    console.log("message 1");
    process.stdout.write("message 2\n");
};
If you invoke that, you will see output like this:
START RequestId: 6942bebc-1997-42cd-90c2-d76b44c63728 Version: $LATEST
2020-04-29T17:06:07.303Z 6942bebc-1997-42cd-90c2-d76b44c63728 INFO message 1
message 2
END RequestId: 6942bebc-1997-42cd-90c2-d76b44c63728
So to get the output you want, you could either redefine console.log to go to stderr, or write to stdout/stderr directly.
Or you could use a logging framework that writes to stdout/stderr, which may give you more flexibility on how your messages are written. I don't do Node development, but I've heard that Winston is the standard logging framework.
The trick is to use the correct delimiter before the JSON data.
In your example, there is no delimiter before JSON data:
console.log({a:1,b:2,x:"xxx"})
// {a:1,b:2,x:"xxx"}
The official documentation, AWS Lambda function logging in Node.js, adds a newline \n character before the data. An example for your code snippet:
console.log('\n%j', {a:1,b:2,x:"xxx"})
// \n{a:1,b:2,x:"xxx"}
That does not work either as of July 2022.
The solution is to use the tab character \t as the delimiter between the message and the data.
console.log('\t%j', {a:1,b:2,x:"xxx"})
// \t{a:1,b:2,x:"xxx"}
An example of building a formatted message and adding structured data:
const name = 'world';
console.log('hello %s\t%j', name, {a:1,b:2,x:"xxx"})
// hello world\t{a:1,b:2,x:"xxx"}

Downloading Binary File from OneDrive API Using Node/Axios

I am using the OneDrive API to grab a file with a Node application using the axios library.
I am simply trying to save the file to the local machine (Node is running locally).
I use the OneDrive API to get the download document link, which does not require authentication (with https://graph.microsoft.com/v1.0/me/drives/[location]/items/[id]).
Then I make this call with the download document link:
response = await axios.get(url);
I receive a JSON response, which includes, among other things, the content-type, content-length, content-disposition and a data element which is the contents of the file.
When I display the JSON response to the console, the data portion looks like this:
data: 'PK\u0003\u0004\u0014\u0000\u0006\u0000\b\u0000\u0000\u0000!\u...'
If the document is simply text, I can save it easily using:
fs.writeFileSync([path], response.data);
But if the file is binary, like a docx file, I cannot figure out how to write it properly. Every time I try it seems to have the wrong encoding. I tried different encodings.
How do I save the file properly based on the type of file retrieved?
Have you tried explicitly setting the encoding option of fs.writeFileSync to null, signifying that the data is binary?
fs.writeFileSync([path], response.data, {
    encoding: null
});

Save an image file into a database with node/request/sequelize/mysql

I'm trying to save a remote image file into a database, but I'm having some issues with it since I've never done it before.
I need to download the image and pass it along (with node-request), together with a few other properties, to another Node API that saves it into a MySQL database (using Sequelize). I've managed to get some data to save, but when I download it manually and try to open it, it's not really usable and no image shows up.
I've tried a few things: getting the image with node-request, converting it to a base64 string (read about that somewhere) and passing it along in a JSON payload, but that didn't work. I tried sending it as multipart, but that didn't work either. I haven't worked with streams/buffers/multipart all that much before, and never in Node. I've tried looking into node-request pipes, but I couldn't really figure out how to apply them in this context.
Here's what I currently have (it's part of an ES6 class so there's no 'function' keywords; also, request is promisified):
function getImageData(imageUrl) {
    return request({
        url: imageUrl,
        encoding: null,
        json: false
    });
}

function createEntry(entry) {
    return getImageData(entry.image)
        .then((imageData) => {
            entry.image_src = imageData.toString('base64');
            var requestObject = {
                url: 'http://localhost:3000/api/entry',
                method: 'post',
                json: false,
                formData: entry
            };
            return request(requestObject);
        });
}
I'm almost 100% certain the problem is in this part, because the API just takes what it gets and hands it to Sequelize to put into the table, but I could be wrong. The image field is set as LONGBLOB.
I'm sure it's something simple once I figure it out, but so far I'm stumped.
This is not a direct answer to your question, but it is rarely necessary to actually store an image in the database. What is usually done is storing the image on storage like S3, a CDN like CloudFront, or even just the file system of a static file server, and then storing only the file name or some ID of the image in the actual database.
If there is any chance that you are going to serve those images to some clients then serving them from the database instead of a CDN or file system will be very inefficient. If you're not going to serve those images then there is still very little reason to actually put them in the database. It's not like you're going to query the database for specific contents of the image or sort the results on the particular serialization of an image format that you use.
The simplest thing you can do is save the images with a unique filename (either a random string, UUID or a key from your database) and keep the ID or filename in the database with other data that you need. If you need to serve it efficiently then consider using S3 or some CDN for that.
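As a minimal sketch of that pattern (shown here in Python for illustration; the bucket name and the commented-out database call are hypothetical, and the same idea applies in your Node API):

import uuid

import boto3

s3 = boto3.client('s3')
BUCKET = 'my-image-bucket'  # hypothetical bucket name

def store_image(image_bytes, content_type='image/jpeg'):
    # Generate a unique key, upload the bytes to S3, and return the key.
    key = f'images/{uuid.uuid4()}'
    s3.put_object(Bucket=BUCKET, Key=key, Body=image_bytes, ContentType=content_type)
    return key

# Persist only the key in the entry record, not the image bytes themselves:
# entry['image_key'] = store_image(image_data)
# save_entry(entry)   # hypothetical database call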

updating headers of every file in an amazon s3 bucket

I have a large number of files that have incorrect mimetypes in a bucket, as well as no expires set.
How can I change them all?
I'm using Knox:
https://github.com/LearnBoost/knox
I'm trying to iterate over it. How do I get a list of all files in a folder?
When I do this
client.get('/folder').on('response', function(res) {
    console.log(res);
    res.on('data', function(chunk) {
        console.log(chunk);
    });
}).end();
I see something about an XML file; how do I access it?
It looks like the library you have chosen does not have any native support for listing the contents of a bucket. You will need to construct the list requests and parse the XML yourself; documentation for the underlying REST API can be found in the S3 API documentation.
Once you get a list of objects, you can use the S3 copy request functionality to update metadata. Just apply this patch, then pass x-amz-metadata-directive: REPLACE as a header to a copy request specifying the same key as source and destination (the source must specify the bucket as well!), plus any other headers you want to set.
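Knox aside, here is roughly what the list-then-copy-in-place flow looks like with Python's boto3, just to illustrate the requests involved (the bucket name, prefix, and the headers being set are placeholders; with Knox you would issue the equivalent REST requests yourself):

from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client('s3')
BUCKET = 'my-bucket'  # hypothetical bucket name

paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=BUCKET, Prefix='folder/'):
    for obj in page.get('Contents', []):
        # Copy each object onto itself with REPLACE so the new headers take effect.
        s3.copy_object(
            Bucket=BUCKET,
            Key=obj['Key'],
            CopySource={'Bucket': BUCKET, 'Key': obj['Key']},
            MetadataDirective='REPLACE',
            ContentType='image/jpeg',  # set the correct mimetype per file
            Expires=datetime.now(timezone.utc) + timedelta(days=365),
        )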
