updating headers of every file in an amazon s3 bucket - node.js

I have a large number of files in a bucket that have incorrect MIME types, as well as no Expires header set.
How can I change them all?
I'm using Knox:
https://github.com/LearnBoost/knox
I'm trying to iterate over the bucket's contents. How do I get a list of all files in a folder?
When I do this:
client.get('/folder').on('response', function(res){
  console.log(res);
  res.on('data', function(chunk){
    console.log(chunk);
  });
}).end();
I see something about an XML file. How do I access it?

It looks like the library you have chosen does not have any native support for listing the contents of a bucket. You will need to construct the list requests and parse the XML yourself; documentation for the underlying REST API can be found in the S3 API documentation.
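For illustration, a rough sketch (untested, and assuming the knox client shown above) of a raw list request: a GET on the bucket root with a prefix query parameter, collecting the XML body as it streams in. The regex below is a quick stand-in for a real XML parser:
client.get('/?prefix=folder/').on('response', function(res){
  var xml = '';
  res.on('data', function(chunk){ xml += chunk; });
  res.on('end', function(){
    // Each <Key> element of the ListBucketResult XML names one object.
    var keys = (xml.match(/<Key>[^<]+<\/Key>/g) || []).map(function(tag){
      return tag.slice(5, -6); // strip the <Key> and </Key> tags
    });
    console.log(keys);
  });
}).end();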
Once you have a list of objects, you can use the S3 copy request functionality to update metadata. Apply this patch, then pass x-amz-metadata-directive: REPLACE as a header on a copy request that specifies the same key as source and destination (the source must include the bucket as well!), plus any other headers you want to set.
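For the metadata update itself, a rough sketch of such a self-copy through knox's put; the key, bucket name, and header values are illustrative only:
var req = client.put('/folder/file.jpg', {
  'x-amz-copy-source': '/mybucket/folder/file.jpg', // source includes the bucket
  'x-amz-metadata-directive': 'REPLACE',            // replace, rather than copy, the metadata
  'Content-Type': 'image/jpeg',
  'Expires': new Date(Date.now() + 31536000000).toUTCString(), // ~1 year out
  'Content-Length': 0 // a copy request has no body
});
req.on('response', function(res){
  console.log(res.statusCode); // 200 means the copy (and metadata swap) succeeded
});
req.end();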

Related

Convert AVRO container file to JSON

I have a bunch of AVRO files in an S3 bucket. Each file contains a series of records. Every time a file is uploaded to the bucket, a Lambda is triggered. I want to read the content of the AVRO file (the records) and save it in a friendlier format, for instance pushing all entries into an array so I can work with them.
I am using @aws-sdk/client-s3 from the AWS SDK.
I have tried the following piece of code, but I am not able to get a working result.
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';

let client = new S3Client({});
let command = new GetObjectCommand(parameters);
const { Body } = await client.send(command);
I have tried other solutions too that I found on the Internet and in the AWS docs, but I do not seem to find a way to make this work.
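For what it's worth, a minimal sketch of the step that most often trips people up here, assuming the Lambda runs on Node.js: Body is a readable stream and has to be drained into a Buffer before any AVRO decoding (which is left out here) can happen:
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';

// Drain the GetObject Body stream into a single Buffer.
async function getObjectBuffer(bucket, key) {
  const client = new S3Client({});
  const { Body } = await client.send(
    new GetObjectCommand({ Bucket: bucket, Key: key })
  );
  const chunks = [];
  for await (const chunk of Body) {
    chunks.push(chunk); // each chunk is a Uint8Array/Buffer
  }
  return Buffer.concat(chunks); // the raw AVRO container, ready for a decoder
}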

StreamingResponse FASTAPI returns strange file name

I have an API that outputs a StreamingResponse (https://fastapi.tiangolo.com/advanced/custom-response/?h=fileresponse#streamingresponse) as zip/gz.
When I download the file via Swagger, I get a very strange name, for example:
application_gz export something=1&something=1&something=Example&archive_type=gz blob https __<ip_address>_aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaaa
so basically, it contains the IP address of the server, a UUID, and some names. Is there any way to change this to something I decide, or at least something more elegant?
thanks!
You can use the Content-Disposition HTTP header to give an alternative file name for the resource. Since StreamingResponse is a subclass of Response, you can set this by using the headers parameter:
return StreamingResponse(fp, headers={'Content-Disposition': 'attachment; filename="yourfilename.zip"'})
You can also use inline instead of attachment if you don't want to force a download but let the client display it directly instead (for example for PDF files).

Downloading Binary File from OneDrive API Using Node/Axios

I am using the OneDrive API to grab a file in a Node application using the axios library.
I am simply trying to save the file to the local machine (node is running locally).
I use the OneDrive API to get the document download link, which does not require authentication (via https://graph.microsoft.com/v1.0/me/drives/[location]/items/[id]).
Then I make this call with the download document link:
response = await axios.get(url);
I receive a JSON response, which includes, among other things, the content-type, content-length, content-disposition and a data element which is the contents of the file.
When I display the JSON response to the console, the data portion looks like this:
data: 'PK\u0003\u0004\u0014\u0000\u0006\u0000\b\u0000\u0000\u0000!\u...'
If the document is simply text, I can save it easily using:
fs.writeFileSync([path], response.data);
But if the file is binary, like a docx file, I cannot figure out how to write it properly. Every time I try it seems to have the wrong encoding. I tried different encodings.
How do I save the file properly based on the type of file retrieved?
Have you tried explicitly passing null as the encoding option of fs.writeFileSync, signifying that the data is binary?
fs.writeFileSync([path], response.data, {
  encoding: null
});
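Another option, sketched here as a hedged alternative rather than a confirmed fix: ask axios up front for binary data with responseType: 'arraybuffer', so response.data arrives as raw bytes instead of a decoded string (filePath is a placeholder):
// Request the download as binary so no lossy string decoding happens.
const response = await axios.get(url, { responseType: 'arraybuffer' });
fs.writeFileSync(filePath, Buffer.from(response.data));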

AWS Lambda Python - Return BytesIO file?

I'm setting up a function in AWS Lambda using Python 3.7, and it won't let me return a bytes type.
Please notice that this is not an issue with API Gateway, I'm invoking the lambda directly.
The error is: Runtime.MarshalError, ... is not JSON serializable
output = BytesIO()
# Code that puts an Excel file into output...
return {
    'Content-Disposition': 'attachment; filename="export.xlsx"',
    'Content-Type': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    'body': output.getvalue()
}
If I do:
'body' : str(output.getvalue())
It outputs a corrupted file because it adds b'' to the string
If I do:
'body' : base64.b64encode(output.getvalue()).decode()
It also outputs a corrupted file, probably because it changes the binary representation of the file.
Maybe I need to upload to S3? But that doesn't fit my flow; this is a one-time file creation, and it would just sit in "S3 Limbo" until the TTL.
It is not possible to return unencoded binary data from a directly invoked AWS Lambda function.
Per the docs:
If the handler returns objects that can't be serialized by json.dumps, the runtime returns an error.
The reason you can do this with API Gateway is that API Gateway performs the conversion of the base64-encoded JSON content your function returns into binary for you. (See documentation here.)
I would need to know more about how you are invoking Lambda to be sure, but I suspect you could implement this same base64-decode logic in your direct-invoke client. Alternatively, if you want to keep the client as simple as possible, use S3 with a lifecycle rule to keep the bucket from filling up with temporary files.
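For example, a minimal sketch of a Node.js direct-invoke client that reverses the encoding; the function name export-excel and the output path are made up for illustration:
const { LambdaClient, InvokeCommand } = require('@aws-sdk/client-lambda');
const fs = require('fs');

const lambda = new LambdaClient({});

async function fetchExport() {
  const res = await lambda.send(new InvokeCommand({ FunctionName: 'export-excel' }));
  // res.Payload holds the raw JSON the handler returned, as bytes.
  const payload = JSON.parse(Buffer.from(res.Payload).toString('utf8'));
  // Undo the base64.b64encode(...) applied inside the Lambda handler.
  fs.writeFileSync('export.xlsx', Buffer.from(payload.body, 'base64'));
}

fetchExport();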

Save an image file into a database with node/request/sequelize/mysql

I'm trying to save a remote image file into a database, but I'm having some issues with it since I've never done it before.
I need to download the image and pass it along (with node-request), together with a few other properties, to another Node API that saves it into a MySQL database (using sequelize). I've managed to get some data to save, but when I download it manually and try to open it, it's not really usable and no image shows up.
I've tried a few things: getting the image with node-request, converting it to a base64 string (read about that somewhere) and passing it along in a JSON payload, but that didn't work. Tried sending it as multipart, but that didn't work either. I haven't worked with streams, buffers, or multipart much before, and never in Node. I've tried looking into node-request pipes, but I couldn't really figure out how to apply them to this context.
Here's what I currently have (it's part of an ES6 class, so there are no 'function' keywords; also, request is promisified):
getImageData(imageUrl) {
  return request({
    url: imageUrl,
    encoding: null, // null encoding makes request resolve with a Buffer
    json: false
  });
}

createEntry(entry) {
  return this.getImageData(entry.image)
    .then((imageData) => {
      entry.image_src = imageData.toString('base64');
      var requestObject = {
        url: 'http://localhost:3000/api/entry',
        method: 'post',
        json: false,
        formData: entry
      };
      return request(requestObject);
    });
}
I'm almost 100% certain the problem is in this part, because the API just takes what it gets and hands it to sequelize to put into the table, but I could be wrong. The image field is set as LONGBLOB.
I'm sure it's something simple once I figure it out, but so far I'm stumped.
This is not a direct answer to your question, but it is rarely necessary to actually store an image in a database. What is usually done is storing the image on storage like S3, on a CDN like CloudFront, or even just in the file system of a static file server, and then storing only the file name or some ID of the image in the actual database.
If there is any chance that you are going to serve those images to some clients, then serving them from the database instead of a CDN or file system will be very inefficient. If you're not going to serve those images, there is still very little reason to actually put them in the database. It's not like you're going to query the database for specific contents of the image, or sort the results on the particular serialization of an image format that you use.
The simplest thing you can do is save the images with a unique filename (either a random string, a UUID, or a key from your database) and keep the ID or filename in the database with the other data you need. If you need to serve them efficiently, consider using S3 or a CDN for that.
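To make that concrete, a small sketch of the filename-in-the-database idea; the uploads directory, the .jpg extension, and the Entry model with an image_file column are all illustrative, not taken from your code:
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

function saveImage(imageBuffer) {
  // A collision-resistant random name; the image bytes never touch MySQL.
  const filename = crypto.randomUUID() + '.jpg';
  fs.writeFileSync(path.join('uploads', filename), imageBuffer);
  return filename;
}

// Then persist only the reference alongside the other properties:
// Entry.create({ ...otherProps, image_file: saveImage(imageData) });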
