I have a requirement to:
Download a PDF file from AWS S3 storage (Key1).
Do some modifications.
Upload the modified PDF file back to S3 storage (Key2).
The uploaded file is a new file (Key2), not overwriting the existing file (Key1).
Library used for modifying PDFs: pdf-lib
Downloading, modifying, and uploading the PDF are all done in AWS Lambda. The runtime is Node.js 14.x.
The objects in the S3 bucket are accessed through a CDN, as public access is blocked.
I'm able to download the file, do the modifications, and upload it to S3. But when I open the file using the CDN URL for the object, it shows encoded text (garbage), not the PDF preview of the file.
Downloading the PDF file from S3:
const params = {
  Bucket: bucket_name,
  Key: key
};

// GET FILE AND RETURN PROMISE.
return new Promise((resolve, reject) => {
  s3.getObject(params, (err, data) => {
    if (err) {
      reject(err);
      return;
    }
    try {
      const obj = data.Body; // <<-- getting Uint8Array
      resolve(obj);
    } catch (e) {
      reject(e);
    }
  });
});
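For reference, a shorter equivalent using the SDK's promise() helper (a minimal sketch; it assumes the surrounding function is async):

// Inside an async function:
const data = await s3.getObject({ Bucket: bucket_name, Key: key }).promise();
const fileData = data.Body; // Buffer containing the PDF bytes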
Modifying the PDF file:
const modificationFunction = async (opts) => {
  const { fileData } = opts; // <<---- Uint8Array data from the above snippet.
  const pdfDoc = await PDFDocument.load(fileData);
  // Do some modifications, like drawing lines.
  const modifiedPDFData = await pdfDoc.saveAsBase64({ dataUri: true });
  return modifiedPDFData; // <<--- Base64 data URI of the modified PDF.
};
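For comparison, the same step with pdf-lib's save(), which returns the raw PDF bytes as a Uint8Array instead of a base64 string (a minimal sketch; the function name is illustrative):

const modifyPdfBytes = async (fileData) => {
  const pdfDoc = await PDFDocument.load(fileData);
  // Do some modifications, like drawing lines.
  const modifiedBytes = await pdfDoc.save(); // Uint8Array of the binary PDF
  return Buffer.from(modifiedBytes);         // Buffer is convenient as an S3 upload Body
};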
Uploading the PDF file:
const params = {
  Bucket: bucket_name,
  Key: key,
  Body: data, // <<--- Base64 data of the modification from the above snippet
};

try {
  await s3.upload(params).promise();
  console.log('File uploaded:', `s3://${bucket_name}/${key}`);
} catch (err) {
  console.error('Upload failed:', err);
}
The content of the PDF when viewed using the CDN URL is attached. It is encoded/garbage content.
The same PDF, when downloaded to a laptop manually from the S3 bucket, shows the contents properly, like a normal PDF file.
I referenced many online resources/Stack Overflow threads:
link1
link2 Using the AWS SDK in JavaScript.
I tried both the save() and saveAsBase64() methods of the pdf-lib Node.js library.
I also tried saving the modified file locally, uploading that file manually to AWS S3, and accessing it through the CDN. I was able to view the PDF properly this way, so it seemed there was some issue with how the file was uploaded to S3.
The issue was not with the PDF download, modification, or upload operations. The CDN had a caching policy, because of which the initially generated garbage-content files were still being served on further requests. After clearing the cache and trying again, the files were properly viewable with the CDN URL.
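For completeness, the cached object can also be invalidated programmatically. Below is a minimal sketch assuming the CDN is Amazon CloudFront; the distribution ID and path are placeholders, not values from this setup:

const AWS = require('aws-sdk');
const cloudfront = new AWS.CloudFront();

async function invalidateObject(distributionId, objectKey) {
  await cloudfront.createInvalidation({
    DistributionId: distributionId,
    InvalidationBatch: {
      CallerReference: `invalidate-${Date.now()}`, // must be unique per invalidation request
      Paths: { Quantity: 1, Items: [`/${objectKey}`] }
    }
  }).promise();
}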
Related
I'm using the Node.js aws-sdk package to download files from S3 storage, and when I download a JPEG image and save it as a local file I can't view it. Is this the right way to download a JPEG image?
public async downloadFile(fileName: string, targetPath: string): Promise<void> {
  try {
    const awsObject = await this.s3
      .getObject({
        Bucket: BUCKET,
        Key: fileName,
      })
      .promise();
    fs.writeFileSync(targetPath, awsObject.Body.toString());
  } catch (error) {
    throw new Error(`Failed to download file from aws storage with error ${error}`);
  }
}
This is how I call it:
await awsSdk.downloadFile('fileInS3.jpeg', `test.jpeg`);
When I try to open the saved file I receive an error that says
The file “test.jpeg” could not be opened. It may be damaged or use a file format that Preview doesn’t recognize.
Update
Solved by replacing
fs.writeFileSync(targetPath, awsObject.Body.toString());
With
fs.writeFileSync(targetPath, awsObject.Body as Buffer);
This looks like a problem:
awsObject.Body.toString()
If you're writing an image, converting it to a string is going to break it.
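As an alternative to buffering the whole object in memory, here is a small sketch that streams the object straight to disk, so no string conversion is ever involved (bucket, key, and path are placeholders):

const fs = require('fs');

function downloadToFile(s3, bucket, key, targetPath) {
  return new Promise((resolve, reject) => {
    s3.getObject({ Bucket: bucket, Key: key })
      .createReadStream()                      // raw byte stream, no encoding applied
      .on('error', reject)
      .pipe(fs.createWriteStream(targetPath))
      .on('error', reject)
      .on('close', resolve);                   // resolves once the file is fully written
  });
}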
I was practicing with this tutorial:
https://www.youtube.com/watch?v=NZElg91l_ms&t=1234s
It is working absolutely like a charm for me, but the thing is I am storing images of products in the bucket, and if I upload, say, 4 images, they all get uploaded.
But when I display them I get an access denied error, as I am displaying a list and the repeated requests are maybe being detected as spam.
This is how I am trying to fetch them in my React app:
//rest of the data is from the MySQL database (product name, price)
//100+ products
{ products.map((row) => (
  <div key={row.imgurl}>
    <div className="product-hero">
      <img src={`http://localhost:3909/images/${row.imgurl}`} alt={row.productName} />
    </div>
    <div className="text-center">{row.productName}</div>
  </div>
))}
As it fetches 100+ products from the DB and 100 images from AWS, it fails.
Sorry for such a detailed question, but in short: how can I fetch all product images from my bucket?
Note: I am aware that I can get only one image per call, so how can I get all images one by one in my scenario?
//download code in my app.js
const express = require('express')
const { uploadFile, getFileStream } = require('./s3')

const app = express()

app.get('/images/:key', (req, res) => {
  console.log(req.params)
  const key = req.params.key
  const readStream = getFileStream(key)
  readStream.pipe(res)
})
//s3 file
const fs = require('fs')
const AWS = require('aws-sdk')

const s3 = new AWS.S3()
const bucketName = process.env.AWS_BUCKET_NAME // bucket name comes from configuration; variable name is illustrative

// uploads a file to s3
function uploadFile(file) {
  const fileStream = fs.createReadStream(file.path)

  const uploadParams = {
    Bucket: bucketName,
    Body: fileStream,
    Key: file.filename
  }

  return s3.upload(uploadParams).promise()
}
exports.uploadFile = uploadFile

// downloads a file from s3
function getFileStream(fileKey) {
  const downloadParams = {
    Key: fileKey,
    Bucket: bucketName
  }

  return s3.getObject(downloadParams).createReadStream()
}
exports.getFileStream = getFileStream
It appears that your code is sending image requests to your back-end, which retrieves the objects from Amazon S3 and then serves the images in response to the request.
A much better method would be to have the URLs in the HTML page point directly to the images stored in Amazon S3. This would be highly scalable and will reduce the load on your web server.
This would require the images to be public so that the user's web browser can retrieve the images. The easiest way to do this would be to add a Bucket Policy that grants GetObject access to all users.
Alternatively, if you do not wish to make the bucket public, you can instead generate Amazon S3 pre-signed URLs, which are time-limited URLs that provide temporary access to a private object. Your back-end can calculate the pre-signed URL with a couple of lines of code, and the user's web browser will then be able to retrieve private objects from S3 for display on the page.
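A minimal sketch of generating such a pre-signed URL with the JavaScript SDK (bucket name, key, and expiry are placeholders):

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Returns a time-limited URL the browser can use directly, e.g. in an <img> tag
function getImageUrl(key) {
  return s3.getSignedUrl('getObject', {
    Bucket: bucketName,
    Key: key,
    Expires: 60 * 5 // the URL stays valid for 5 minutes
  });
}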
I did similar S3 image handling while building my blog's image upload functionality, but I did not use getFileStream() to upload my image.
Because nothing should be done until the image file is fully processed, I used fs.readFile(path, callback) instead to read the data.
My way generates Buffer data, but AWS S3 was smart enough to accept it as an image. (I have only added a suffix to my filename; I don't know how to apply image headers...)
This is the relevant part of my code for reference:
const fs = require('fs')
const AWS = require('aws-sdk')
const S3 = new AWS.S3()

fs.readFile(imgPath, (err, data) => {
  if (err) { throw err }

  // Once the file is read, upload it to AWS S3
  const objectParams = {
    Bucket: 'yuyuichiu-personal',
    Key: req.file.filename,
    Body: data
  }

  S3.putObject(objectParams, (err, data) => {
    // store image link and read image with link
  })
})
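Regarding the image headers: S3 stores the bytes as-is, so if the browser should render the object as an image, the ContentType can be set explicitly on upload. A small sketch of the same params, assuming a multer-style req.file with a mimetype field:

const objectParams = {
  Bucket: 'yuyuichiu-personal',
  Key: req.file.filename,
  Body: data,
  ContentType: req.file.mimetype // e.g. 'image/jpeg', so browsers display it inline
}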
I'm quite confused on how to use the Amplify library to actually download an mp3 file stored in my s3 bucket. I am able to list the bucket contents and parse it all out into a tree viewer for users to browse the various files, but once I select a file I can't get it to trigger a download.
I'm confident my amplify configuration is correct since I can see all my expected directories and when I select the file I want to download, I see the response size being correct:
You can see it takes 2+ seconds and appears to be downloading the data/mp3 file, but the user is never prompted to save the file and it's not in my Downloads folder.
Here is a capture of my file metadata setup from my bucket:
And the method I'm calling:
getFile (fileKey) {
Storage.get(fileKey, {download: true})
}
Without the "download : true" configuration, I get the verified URL back in the response. I'd like to avoid making a 2nd request using that URL to download the file, if possible. Anything else I may have missed? Is it better to go back to the standard aws-sdk for S3 operations? Thanks in advance!
I ended up using a combination of this answer:
https://stackoverflow.com/a/36894564
and this snippet:
https://gist.github.com/javilobo8/097c30a233786be52070986d8cdb1743
So the file gets downloaded in the response data (result). I added more metadata tags to the files to get the file name and title. Finally, adding the link to the DOM and executing a click() on it saves the file with the correct name. Full solution below:
getFile (fileKey) {
  Storage.get(fileKey, {download: true}).then(result => {
    console.log(result)
    let mimeType = result.ContentType
    let fileName = result.Metadata.filename
    if (mimeType !== 'audio/mp3') {
      throw new TypeError("Unexpected MIME Type")
    }
    try {
      let blob = new Blob([result.Body], {type: mimeType})
      // downloading the file depends on the browser
      // IE handles it differently than chrome/webkit
      if (window.navigator && window.navigator.msSaveOrOpenBlob) {
        window.navigator.msSaveOrOpenBlob(blob, fileName)
      } else {
        let objectUrl = URL.createObjectURL(blob);
        let link = document.createElement('a')
        link.href = objectUrl
        link.setAttribute('download', fileName)
        document.body.appendChild(link)
        link.click()
        document.body.removeChild(link)
      }
    } catch (exc) {
      console.log("Save Blob method failed with the following exception.");
      console.log(exc);
    }
  })
}
I have an endpoint that takes in form data including a file. This file can be a text file, image, or pdf. I'm using busboy (v0.2.14) to parse the form data. That code looks like this:
let buffers = [];
file.on('data', data => buffers.push(data));
file.on('end', () => {
  result.filename = filename;
  result.contentType = mimetype;
  // Concat the chunks into a Buffer
  result.file = Buffer.concat(buffers);
});
// ...
busboy.write(event.body, event.isBase64Encoded ? 'base64' : 'binary');
busboy.end();
However, when I push the file data up to S3 using the AWS SDK (v2.97.0), all the binary files are corrupted when I go to view them. This does not happen to text files. The S3 upload code looks like this:
static myPutObject(bucketName, fileName, data, contentType, acl) {
  const params = {
    Bucket: bucketName,
    Key: fileName,
    Body: data,
    ACL: acl,
    ContentType: contentType,
    ContentEncoding: 'base64'
  };
  return new AWS.S3().putObject(params).promise();
}
I've tried everything that I can find on Stack Overflow or GitHub with no luck.
If you're using API Gateway in front, it will mangle the incoming binary data unless you specifically enable binary media types.
If you're using SLS (the Serverless Framework) to deploy, then you can just add:
apiGateway:
  binaryMediaTypes:
    - '*/*'
in the provider section
Read here: https://serverless.com/framework/docs/providers/aws/events/apigateway#binary-media-types
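If the API is not deployed with the Serverless Framework, the same setting can be applied with the AWS SDK; a hedged sketch is below (the REST API id is a placeholder, and the '/' inside the media type is escaped as '~1' in the patch path). The API also has to be redeployed for the change to take effect.

const AWS = require('aws-sdk');
const apigateway = new AWS.APIGateway();

async function enableBinaryMediaTypes(restApiId) {
  await apigateway.updateRestApi({
    restApiId,
    patchOperations: [
      { op: 'add', path: '/binaryMediaTypes/*~1*' } // '~1' stands for '/' in the patch path
    ]
  }).promise();
}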
S3 is an "object in" and "object out" store. It does not know whether your content is binary, text, or UTF-16 encoded. It stores all the bytes as it receives them and serves them back when requested.
Here is how we validated whether the problem was with S3 or with our code:
Write the binary file locally
Send the same file to S3
Download it back from S3
Compare the local file hash and the downloaded file hash to check file integrity
That will help you verify the binary file contents.
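A small sketch of that hash comparison in Node.js (the file paths are placeholders):

const fs = require('fs');
const crypto = require('crypto');

function sha256OfFile(path) {
  return crypto.createHash('sha256').update(fs.readFileSync(path)).digest('hex');
}

// Compare the file written locally with the one downloaded back from S3
console.log(sha256OfFile('local.bin') === sha256OfFile('downloaded.bin')
  ? 'Contents match'
  : 'Contents differ');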
Hope it helps.
I am developing a web application using Node.js. I am using an Amazon S3 bucket to store files. What I am doing now is that when I upload a video file (mp4) to the S3 bucket, I get the thumbnail photo of the video file from a Lambda function. For fetching the thumbnail photo of the video file, I am using this package - https://www.npmjs.com/package/ffmpeg. I tested the package locally on my laptop and it is working.
Here is my code, tested on my laptop:
var ffmpeg = require('ffmpeg');

module.exports.createVideoThumbnail = function(req, res)
{
  try {
    var process = new ffmpeg('public/lalaland.mp4');
    process.then(function (video) {
      video.fnExtractFrameToJPG('public', {
        frame_rate: 1,
        number: 5,
        file_name: 'my_frame_%t_%s'
      }, function (error, files) {
        if (!error)
          console.log('Frames: ' + files);
        else
          console.log(error)
      });
    }, function (err) {
      console.log('Error: ' + err);
    });
  } catch (e) {
    console.log(e.code);
    console.log(e.msg);
  }
  res.json({ status: true, message: "Video thumbnail created." });
}
The above code works well; it gives me the thumbnail photos of the video file (mp4). Now I am trying to use that code in an AWS Lambda function. The issue is that the above code uses a video file path as the parameter to fetch the thumbnails. In the Lambda function, I can only fetch the base64-encoded format of the file. I can get the id (S3 path) of the file, but I cannot use it as the parameter (file path) to fetch the thumbnails, as my S3 bucket does not allow public access.
So what I tried was to save the base64-encoded video file locally within the Lambda function project itself and then pass that file path as the parameter for fetching the thumbnails. But the issue was that the AWS Lambda function file system seemed to be read-only, so I could not write any file to it. So what I am trying to do right now is retrieve the thumbnails directly from the base64-encoded video file. How can I do it?
Looks like you are using the wrong file location.
/tmp/* is your writable location for temporary files and is limited to 512 MB.
Check out this tutorial that does the same thing you are trying to do:
https://concrete5.co.jp/blog/creating-video-thumbnails-aws-lambda-your-s3-bucket
Lambda Docs:
https://docs.aws.amazon.com/lambda/latest/dg/limits.html
Ephemeral disk capacity ("/tmp" space) 512 MB
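A minimal sketch of that flow inside the handler, assuming the Lambda is triggered by the S3 upload event (bucket/key handling and file naming are illustrative):

const fs = require('fs');
const AWS = require('aws-sdk');
const ffmpeg = require('ffmpeg');
const s3 = new AWS.S3();

exports.handler = async (event) => {
  const bucket = event.Records[0].s3.bucket.name;
  const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));

  // Download the video into /tmp, the only writable location in Lambda
  const localPath = `/tmp/${key.split('/').pop()}`;
  const obj = await s3.getObject({ Bucket: bucket, Key: key }).promise();
  fs.writeFileSync(localPath, obj.Body);

  // Now the existing ffmpeg code can work with a real file path
  const video = await new ffmpeg(localPath);
  const files = await new Promise((resolve, reject) => {
    video.fnExtractFrameToJPG('/tmp', { frame_rate: 1, number: 5, file_name: 'my_frame_%t_%s' },
      (error, extracted) => (error ? reject(error) : resolve(extracted)));
  });
  console.log('Frames: ' + files);
};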
Hope it helps.