Trouble downloading mp3 files from S3 using Amplify/Node - node.js

I'm quite confused about how to use the Amplify library to actually download an mp3 file stored in my S3 bucket. I can list the bucket contents and parse them into a tree viewer so users can browse the various files, but once I select a file, I can't get it to trigger a download.
I'm confident my Amplify configuration is correct, since I can see all the expected directories, and when I select the file I want to download, the response size is correct.
The request takes 2+ seconds and appears to download the data (the mp3 file), but the user is never prompted to save the file, and it never shows up in my Downloads folder.
Here is a capture of my file metadata setup from my bucket:
And the method I'm calling:
getFile (fileKey) {
Storage.get(fileKey, {download: true})
}
Without the "download: true" configuration, I get the pre-signed URL back in the response. I'd like to avoid making a second request with that URL to download the file, if possible. Is there anything else I may have missed? Is it better to go back to the standard aws-sdk for S3 operations? Thanks in advance!

I ended up using a combination of this answer:
https://stackoverflow.com/a/36894564
and this snippet:
https://gist.github.com/javilobo8/097c30a233786be52070986d8cdb1743
The file is downloaded into the response data (result), and I added more metadata tags to the files to get the file name and title. Finally, adding a link to the DOM and calling click() on it saves the file with the correct name. Full solution below:
getFile (fileKey) {
  Storage.get(fileKey, {download: true}).then(result => {
    console.log(result)
    let mimeType = result.ContentType
    // "filename" is custom metadata added to the S3 object
    let fileName = result.Metadata.filename
    if (mimeType !== 'audio/mp3') {
      throw new TypeError("Unexpected MIME Type")
    }
    let blob = new Blob([result.Body], {type: mimeType})
    // downloading the file depends on the browser:
    // IE handles it differently than Chrome/WebKit
    if (window.navigator && window.navigator.msSaveOrOpenBlob) {
      window.navigator.msSaveOrOpenBlob(blob, fileName)
    } else {
      let objectUrl = URL.createObjectURL(blob)
      let link = document.createElement('a')
      link.href = objectUrl
      link.setAttribute('download', fileName)
      document.body.appendChild(link)
      link.click()
      document.body.removeChild(link)
      // release the Blob URL once the click has been handled
      setTimeout(() => URL.revokeObjectURL(objectUrl), 0)
    }
  }).catch(exc => {
    // also catches the MIME type error and any Storage.get failure
    console.log("Save Blob method failed with the following exception.")
    console.log(exc)
  })
}
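The same idea can be factored into two small helpers that are easier to reuse and to test outside a browser. This is a sketch, not part of Amplify's API: `fileNameFor` and `saveBlob` are hypothetical names, the fallback to the last segment of the S3 key is an assumption for objects without a custom `filename` metadata entry, and the `doc`/`urlApi` parameters default to the browser's `document` and `URL` so that fakes can be passed in.

```javascript
// Hypothetical helper: prefer the custom "filename" metadata stored on the
// S3 object, otherwise fall back to the last segment of the object key.
function fileNameFor(fileKey, metadata) {
  if (metadata && metadata.filename) {
    return metadata.filename;
  }
  const segments = fileKey.split('/');
  return segments[segments.length - 1];
}

// Hypothetical helper: save a Blob by clicking a temporary <a download> link.
// doc and urlApi default to the browser globals but can be replaced by fakes.
function saveBlob(blob, fileName, doc = globalThis.document, urlApi = globalThis.URL) {
  const objectUrl = urlApi.createObjectURL(blob);
  const link = doc.createElement('a');
  link.href = objectUrl;
  link.setAttribute('download', fileName);
  doc.body.appendChild(link);
  link.click();
  doc.body.removeChild(link);
  // Free the object URL after the click has been dispatched
  setTimeout(() => urlApi.revokeObjectURL(objectUrl), 0);
  return objectUrl;
}
```

In the browser this would be used as `Storage.get(fileKey, {download: true}).then(result => saveBlob(new Blob([result.Body], {type: result.ContentType}), fileNameFor(fileKey, result.Metadata)))`.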

Related

How to upload modified PDF file to AWS s3 from AWS Lambda

I have a requirement to:
Download a PDF file from AWS S3 storage. (Key1)
Do some modifications.
Upload the modified PDF file back to S3 storage. (Key2)
The uploaded file is a new file (Key2); it does not overwrite the existing file (Key1).
Library used for modifying PDFs: pdf-lib
All the steps (downloading, modifying, and uploading the PDF) run in AWS Lambda. The runtime is Node.js 14.x.
Public access to the bucket is blocked, so the objects are accessed through a CDN.
I'm able to download the file, modify it, and upload it to S3. But when I open the file using the object's CDN URL, it shows encoded text (garbage) instead of the PDF preview.
Downloading PDF file from S3.
const params = {
  Bucket: bucket_name,
  Key: key
};
// GET FILE AND RETURN PROMISE.
return new Promise((resolve, reject) => {
  s3.getObject(params, (err, data) => {
    if (err) {
      // return here so data.Body is never read when the request failed
      return reject(err);
    }
    const obj = data.Body; // <<-- a Buffer (Uint8Array subclass) in Node.js
    resolve(obj);
  });
});
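Since request objects in the AWS SDK for JavaScript v2 expose a `.promise()` method (the upload step uses it too), the hand-written Promise wrapper can be reduced to a few lines. A minimal sketch, assuming the v2 `AWS.S3` client; `getPdfBytes` is a hypothetical name, and the client is passed in rather than read from a global:

```javascript
// Hypothetical helper: fetch an object's bytes with AWS SDK v2.
// In Node.js, data.Body from getObject is a Buffer (a Uint8Array subclass),
// which pdf-lib's PDFDocument.load() accepts directly.
// S3 errors surface as a rejected promise.
async function getPdfBytes(s3, bucketName, key) {
  const data = await s3.getObject({ Bucket: bucketName, Key: key }).promise();
  return data.Body;
}
```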
Doing Modification on PDF file
const { PDFDocument } = require('pdf-lib');

const modificationFunction = async (opts) => {
  const { fileData } = opts; //<<---- Uint8Array data from the snippet above.
  const pdfDoc = await PDFDocument.load(fileData);
  // Do some modifications, like drawing lines.
  const modifiedPDFData = await pdfDoc.saveAsBase64({ dataUri: true });
  return modifiedPDFData; //<<--- Base64 data URI of the modified PDF.
};
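Note that `saveAsBase64({ dataUri: true })` returns a string of the form `data:application/pdf;base64,...`, so passing it as the upload `Body` stores that text rather than PDF bytes; pdf-lib's `save()` returns the bytes as a `Uint8Array` instead. If a data URI is already in hand, a plain Node.js sketch like the following (the helper name `dataUriToBuffer` is hypothetical) turns it back into a binary Buffer before upload:

```javascript
// Hypothetical helper: decode a base64 data URI (such as the one returned by
// pdf-lib's saveAsBase64({ dataUri: true })) into a binary Buffer.
function dataUriToBuffer(dataUri) {
  const match = /^data:([^;,]+)?(;base64)?,(.*)$/s.exec(dataUri);
  if (!match || !match[2]) {
    throw new TypeError('Expected a base64-encoded data URI');
  }
  return Buffer.from(match[3], 'base64');
}
```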
Uploading PDF file
const params = {
  Bucket: bucket_name,
  Key: key,
  Body: data, //<<--- Base64 data of modification from above snippet
};
try {
  await s3.upload(params).promise();
  console.log('File uploaded:', `s3://${bucket_name}/${key}`);
} catch (err) {
  console.log('Upload failed:', err);
}
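When objects are served through a CDN, it generally helps to store them with an explicit `ContentType`, so that the CDN and the browser are told the object is a PDF. A sketch of the upload parameters, assuming the v2 `s3.upload()` call already used here and bytes produced by pdf-lib's `save()` (which resolves to a `Uint8Array`); `buildPdfUploadParams` is a hypothetical name:

```javascript
// Hypothetical helper: parameters for s3.upload() (AWS SDK v2) that store
// raw PDF bytes together with an explicit Content-Type.
function buildPdfUploadParams(bucketName, key, pdfBytes) {
  return {
    Bucket: bucketName,
    Key: key,
    // Buffer.from() copies the Uint8Array's bytes into a Node.js Buffer
    Body: Buffer.from(pdfBytes),
    ContentType: 'application/pdf',
  };
}
```

Usage would be along the lines of `await s3.upload(buildPdfUploadParams(bucket_name, key, await pdfDoc.save())).promise();`.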
When viewed through the CDN URL, the PDF shows encoded/garbage content.
The same PDF, when downloaded manually from the S3 bucket to a laptop, displays its contents properly, like a normal PDF file.
I referenced many online resources and Stack Overflow threads:
link1
link2 Using the AWS SDK in javascript.
I tried both the save() and saveAsBase64() methods of the pdf-lib Node.js library.
I also saved the modified file locally, uploaded it manually to AWS S3, and accessed it through the CDN; the PDF displayed properly that way. So there seems to be some issue with how the file is uploaded to S3.
The issue was not with the PDF download, modification, or upload operations. The CDN had a caching policy, so the garbage files generated initially kept being served on later requests. After clearing the cache and trying again, the files were viewable through the CDN URL.
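If the CDN is Amazon CloudFront (an assumption; the question only says "CDN"), stale copies can be evicted by creating an invalidation for the object's path instead of clearing the cache by hand. A sketch of the request parameters for the v2 SDK's `new AWS.CloudFront().createInvalidation(params).promise()`; the distribution ID and the helper name `buildInvalidationParams` are placeholders, and the bucket is assumed to be served at the distribution's root:

```javascript
// Hypothetical helper: parameters for AWS SDK v2 CloudFront.createInvalidation().
// Assumes the distribution serves the bucket's keys directly under "/".
function buildInvalidationParams(distributionId, key, callerReference = String(Date.now())) {
  return {
    DistributionId: distributionId,
    InvalidationBatch: {
      // Must be unique per invalidation request; a timestamp is a common choice
      CallerReference: callerReference,
      Paths: {
        Quantity: 1,
        // CloudFront paths start with a single "/"
        Items: ['/' + key.replace(/^\/+/, '')],
      },
    },
  };
}
```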

Getting Failed - Network Error while downloading PDF file from Amazon S3

Goal: Try to download a pdf file from Amazon S3 to my local machine via a NodeJS/VueJS application without creating a file on the server's filesystem.
Server: NodeJs(v 18.9.0) Express (4.17.1)
Middleware function that retrieves the file from S3 and converts the stream into a base64 string and sends that string to the client:
const filename = 'lets_go_to_the_snackbar.pdf';
const s3 = new AWS.S3(some access parameters);
const params = {
Bucket: do_not_kick_this_bucket,
Key: `yellowbrickroad/${filename}`
}
try {
const data = await s3
.getObject(params)
.promise();
const byte_string = Buffer.from(data.Body).toString('base64');
res.send(byte_string);
} catch (err) {
console.log(err);
}
Client: VueJS( v 3.2.33)
Function in component receives byte string via an axios (v 0.26.1) GET call to the server. The code to download is as follows:
getPdfContent: async function (filename) {
const resp = await AxiosService.getPdf(filename) // Get request to server made here.
const uriContent = `data:application/pdf;base64,${resp.data}`
const link = document.createElement('a')
link.href = uriContent
link.download = filename
document.body.appendChild(link) // Also tried omitting this line along with...
link.click()
link.remove() // ...omitting this line
}
Expected Result(s):
Browser opens a window to allow a directory to be selected as the file's destination.
Directory Selected.
File is downloaded.
Ice cream and mooncakes are served.
Actual Results(s):
Browser opens a window to allow a directory to be selected as the file's destination
Directory Selected.
Receive Failed - Network Error message.
Lots of crying...
Browser: Chrome (Version 105.0.5195.125 (Official Build) (x86_64))
I read somewhere that Chrome balks at files larger than 4MB, so I checked the S3 bucket; according to Amazon S3, the file size is a svelte 41.7KB.
After doing some reading, a possible solution was presented that I tried to implement. It involved making a change to the VueJs getPdfContent function as follows:
getPdfContent: async function (filename) {
const resp = await AxiosService.getPdf(filename) // Get request to server made here.
/**** This is the line that was changed ****/
const uriContent = window.URL.createObjectURL(new Blob([resp.data], { type: 'application/pdf' } ))
const link = document.createElement('a')
link.href = uriContent
link.download = filename
document.body.appendChild(link) // Also tried omitting this line along with...
link.click()
link.remove() // ...omitting this line
}
Actual Results(s) for updated code:
Browser opens a window to allow a directory to be selected as the file's destination
Directory Selected.
PDF file downloaded.
Trying to open the file produces the message:
The file “lets_go_to_the_snackbar.pdf” could not be opened.
It may be damaged or use a file format that Preview doesn’t recognize.
I am able to download the file directly from S3 using the AWS S3 console with no problems opening the file.
I've read through similar postings and tried implementing their solutions, but found no joy. I would be highly appreciative if someone can
Give me an idea of where I am going off the path towards reaching the goal
Point me towards the correct path.
Thank you in advance for your help.
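The updated code produces a damaged file because the server still sends a base64 string, and wrapping that string in a `Blob` writes the base64 text itself into the .pdf file. If the base64 response were kept, it would have to be decoded to bytes first. A browser-side sketch using the standard `atob()` (also a global in Node.js 18+, along with `Blob`); `base64ToPdfBlob` is a hypothetical name:

```javascript
// Hypothetical helper: decode a base64 string into raw bytes and wrap them
// in a PDF Blob, instead of wrapping the base64 text itself.
function base64ToPdfBlob(base64) {
  const binary = atob(base64); // one character per byte (0-255)
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Blob([bytes], { type: 'application/pdf' });
}
```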
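After doing some more research, I found the problem was how I was returning the data from the server to the client. I did not need to modify the data received from the S3 service (in the updated code, wrapping the base64 string in a Blob saved the base64 text itself into the .pdf file, which is why it could not be opened).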
Server Code:
let filename = req.params.filename;
const params = {
  Bucket: do_not_kick_this_bucket,
  Key: `yellowbrickroad/${filename}`
}
try {
  const data = await s3
    .getObject(params)
    .promise();
  /* Here I did not modify the information returned */
  // res.send() also ends the response, so no separate res.end() is needed
  res.send(data.Body);
} catch (err) {
  console.log(err);
  res.sendStatus(500); // let the client know the download failed
}
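Buffering the whole object is fine for a 41.7KB file. For larger files, the v2 SDK's request object also offers `createReadStream()`, which can be piped straight into the response so the file never has to sit in server memory. A sketch under those assumptions (AWS SDK v2 client passed in, Express or plain Node.js `res`); `streamPdf` is a hypothetical helper and its error handling is illustrative only:

```javascript
// Hypothetical helper: pipe an S3 object (AWS SDK v2) into an HTTP response.
function streamPdf(s3, params, res, filename) {
  res.setHeader('Content-Type', 'application/pdf');
  res.setHeader('Content-Disposition', `attachment; filename="${filename}"`);
  const body = s3.getObject(params).createReadStream();
  body.on('error', (err) => {
    console.log(err);
    // Once bytes have been sent the status can no longer change
    if (!res.headersSent) {
      res.statusCode = 500;
    }
    // pipe() does not end the destination on source errors, so end it here
    res.end();
  });
  body.pipe(res);
}
```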
On the client side, my VueJS component receives a Blob object as the response.
Client Code:
async getFile (filename) {
let response = await AuthenticationService.downloadFile(filename)
const uriContent = window.URL.createObjectURL(new Blob([response.data]))
const link = document.createElement('a')
link.setAttribute('href', uriContent)
link.setAttribute('download', filename)
document.body.appendChild(link)
link.click()
link.remove()
}
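For the client code to receive a `Blob`, the HTTP request has to ask for one; with axios that means `responseType: 'blob'`, otherwise the binary body is typically decoded as text and the saved PDF ends up damaged. `AuthenticationService.downloadFile` isn't shown, so this is a hypothetical version that takes the axios instance as a parameter; the `/files/...` route is an assumption:

```javascript
// Hypothetical service method; the '/files/:filename' route is assumed.
// responseType: 'blob' makes axios (in the browser) return response.data
// as a Blob instead of decoding the body as text.
function downloadFile(http, filename) {
  return http.get(`/files/${encodeURIComponent(filename)}`, {
    responseType: 'blob',
  });
}
```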
In the end the goal was achieved; a file on S3 can be downloaded directly to a user's local machine without the application storing a file on the server.
I would like to mention Sunpun Sandaruwan's answer which gave me the final clue I needed to reach my goal.

Problem with file naming when downloading the file

This is my first Stack Overflow post, sorry if it's a little unclear :/
Basically, I have an Angular project backed by Firestore. I have a Cloud Function that generates an .xlsx file and uploads it to Firebase Storage.
const path = 'hellothere/excels';
return workBook.xlsx.writeFile(`/tmp/myExcel.xlsx`).then(() => {
return storageFb.upload( `/tmp/myExcel.xlsx`,{
destination: path+'/myExcel.xlsx',
}
)
}).then(() => path);
Here storageFb is my Storage bucket.
It actually works: it uploads my .xlsx file under /hellothere/excels/ with the name myExcel.xlsx. But when I download it (through the admin panel or my Angular client), it is named hellothere_excels_myExcel.xlsx.
Here is my client code:
this.fireStorage.ref('hellothere/excels/myExcel.xlsx').getDownloadURL().subscribe((url) => {
window.open(url, '_blank');
});
return Promise.resolve();
That's all. I know the code is messy, but I'm testing every solution I can find, so I'll clean it up afterward.
So I'm kind of stuck, since I don't know why the file won't download with just the 'myExcel' name.
If anyone has a clue, you'll save my week, haha! Thanks!
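You need to set the Content-Disposition to define the filename. Without it, the browser derives the name from the download URL, in which the object path is URL-encoded (hellothere%2Fexcels%2FmyExcel.xlsx), so the path separators likely end up as underscores in the saved name. Try this: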
const path = 'hellothere/excels';
return workBook.xlsx.writeFile(`/tmp/myExcel.xlsx`).then(() => {
return storageFb.upload( `/tmp/myExcel.xlsx`,{
destination: path+'/myExcel.xlsx',
contentDisposition: 'filename=myExcel.xlsx'
}
)
}).then(() => path);
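A Content-Disposition value conventionally starts with a disposition type (`attachment` to force a download, `inline` to display) followed by the filename parameter; RFC 6266 also allows an RFC 5987 `filename*` parameter for non-ASCII names. A sketch of a helper that builds such a value, assuming it is passed as object metadata (`metadata: { contentDisposition }`) in the `@google-cloud/storage` upload options; both helper names are hypothetical:

```javascript
// Hypothetical helper: build a Content-Disposition value per RFC 6266.
// filename gets an ASCII-only fallback; filename* carries the UTF-8 name,
// percent-encoded as described in RFC 5987.
function contentDispositionFor(fileName, type = 'attachment') {
  const asciiFallback = fileName.replace(/[^\x20-\x7e]/g, '_').replace(/["\\]/g, '_');
  const encoded = encodeURIComponent(fileName).replace(
    /['()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase(),
  );
  return `${type}; filename="${asciiFallback}"; filename*=UTF-8''${encoded}`;
}

// Hypothetical options object for bucket.upload(localPath, options)
function excelUploadOptions(path, fileName) {
  return {
    destination: `${path}/${fileName}`,
    metadata: { contentDisposition: contentDispositionFor(fileName) },
  };
}
```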

Intercept document download with Puppeteer and Extract CSV Data

I would like to download a .csv file from the browser, intercept it, and extract the data to convert it into a JSON object. Most answers say to use response.buffer(); however, my situation is unusual: the buffer is always reported as empty, yet the file downloads.
I have tried reading response.buffer():
downloadPage.on('request', request => {
console.log(request.isNavigationRequest());
console.log(nextRequest);
if (request.isNavigationRequest() && !nextRequest) {
return request.abort();
}
initialRequest = false;
request.continue();
});
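downloadPage.on('response', async (response) => {
  // response.buffer() returns a Promise, so it has to be awaited
  const buffer = await response.buffer();
  console.log(buffer);
  // the body is CSV text, not JSON, so JSON.parse() cannot read it directly
  file_data = buffer.toString('utf8');
})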
await Promise.all([
downloadPage.goto('https://clients.messagelabs.com/Tools/Track-And-Trace/DownloadCsv.ashx?sessionid=' + json_data.request.SessionId).catch(err => console.log(err)),
page.waitForNavigation()
])
Since I am on a corporate network, it won't let me upload my images, but...
When I navigate to the link above with downloadPage.goto(...), it automatically begins a .csv file download. Then the page closes. I think the page closing clears the buffer; however, I can't seem to intercept the response to grab the file data before this happens. Any ideas are appreciated.
Please do not link me to another github that tells me to use the request.buffer(), as I have tried many variations.
Error: Protocol error (Network.getResponseBody): No data found for resource with given identifier
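Once the CSV text is in hand (from the response body or from the downloaded file on disk), converting it to JSON objects is independent of Puppeteer. A minimal sketch, assuming the first row holds the column headers and fields use standard double-quote escaping; `csvToObjects` is a hypothetical helper, not a library API:

```javascript
// Hypothetical helper: parse CSV text into an array of objects keyed by the
// header row. Handles quoted fields, doubled quotes ("") and CRLF line endings.
function csvToObjects(text) {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inQuotes) {
      if (c === '"' && text[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (c === '"') {
        inQuotes = false;
      } else {
        field += c;
      }
    } else if (c === '"') {
      inQuotes = true;
    } else if (c === ',') {
      row.push(field);
      field = '';
    } else if (c === '\n' || c === '\r') {
      if (c === '\r' && text[i + 1] === '\n') i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = '';
    } else {
      field += c;
    }
  }
  // Flush the last row when the text does not end with a newline
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  const [header = [], ...records] = rows;
  return records
    .filter((r) => !(r.length === 1 && r[0] === '')) // skip blank lines
    .map((r) => Object.fromEntries(header.map((name, j) => [name, r[j] ?? ''])));
}
```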

Can we somehow rename the file that is being downloaded using puppeteer?

I am downloading a file through puppeteer into my directory. I need to upload this file to an s3 bucket so I need to pick up the file name. But the problem is, this file name has a time stamp that changes every time so I can't keep a hard coded name. So is there a way around this to get a constant name every time (even if the old file is replaced), or how to rename the file being downloaded?
I thought of using node's fs.rename() function but that would again require the current file name.
I want a constant file name to hard code and then upload into the s3 bucket.
await page._client.send('Page.setDownloadBehavior', {behavior: 'allow', downloadPath: './xml'}); // This sets the directory
await page.keyboard.press('Tab');
await page.keyboard.press('Enter'); // This downloads an XML file.
You have two options:
Monitor the requests/responses to log the name of the file and rename it via Node.js
Use the Chrome DevTools Protocol to modify the response header
Option 1: Monitor the requests / response
This is the most straightforward way to do it. Monitor all responses, and when you see the response that is being downloaded, use its name to rename the file locally via fs.rename.
Code Sample
const path = require('path');
// ...
page.on('response', response => {
const url = response.request().url();
const contentType = response.headers()['content-type'];
if (/* URL and/or contentType matches pattern */) {
const fileName = path.basename(url); // url was already read from the request
// handle and rename file name (after making sure it's downloaded)
}
});
The code listens to all responses and waits for a specific pattern (e.g. contentType === 'application/pdf'). Then it takes the file name from the request URL. Depending on your use case, you might also want to check the Content-Disposition header. After that, you have to wait until the file is downloaded (e.g. the file is present and its size no longer changes), and then you can rename it.
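The "wait until the file is downloaded" step can be done by polling the file size until it stops changing. A sketch with Node's `fs/promises`; `waitForStableFile` and `renameWhenDone` are hypothetical helpers, and the polling interval, number of stable checks, and timeout are assumptions to tune:

```javascript
const fs = require('fs/promises');
const path = require('path');

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: resolve once filePath exists and its size has stayed
// the same for `stableChecks` consecutive polls; reject after timeoutMs.
async function waitForStableFile(filePath, { intervalMs = 250, stableChecks = 3, timeoutMs = 30000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let lastSize = -1;
  let stableCount = 0;
  while (Date.now() < deadline) {
    try {
      const { size } = await fs.stat(filePath);
      stableCount = size === lastSize ? stableCount + 1 : 0;
      lastSize = size;
      if (stableCount >= stableChecks) return size;
    } catch (err) {
      if (err.code !== 'ENOENT') throw err; // not created yet: keep polling
    }
    await sleep(intervalMs);
  }
  throw new Error(`Timed out waiting for ${filePath}`);
}

// Hypothetical helper: wait for the download, then give it a constant name.
async function renameWhenDone(downloadDir, downloadedName, constantName, options) {
  const from = path.join(downloadDir, downloadedName);
  const to = path.join(downloadDir, constantName);
  await waitForStableFile(from, options);
  await fs.rename(from, to);
  return to;
}
```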
Option 2: Use the Chrome DevTools Protocol to modify the response header
I'm 99% sure that this is possible. You need to intercept the response, which Puppeteer itself does not currently support. But since the Chrome DevTools Protocol supports this functionality, you can use it through the low-level protocol.
The idea is to intercept the response and change the Content-Disposition header to your desired file name.
Here is the idea:
Use chrome-remote-interface or a CDP session to enable response interception (Network.setRequestInterception with interceptionStage: 'HeadersReceived')
Listen for Network.requestIntercepted events
Send Network.getResponseBodyForInterception to receive the body of the response
Build a modified response that adds (or changes) the Content-Disposition header to include your filename
Call Network.continueInterceptedRequest with your modified response
Your file should then be saved with your modified file name. Check out this comment on GitHub for a code sample. As I already explained, it is a rather sophisticated approach as long as Puppeteer does not support modifying responses.
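The response-rewriting step could look roughly like this. It is a sketch only: `buildRawResponse` is a hypothetical helper that assumes the `responseStatusCode`/`responseHeaders` fields of the `Network.requestIntercepted` event and the `{ body, base64Encoded }` result of `Network.getResponseBodyForInterception`, and it produces the base64 `rawResponse` string accepted by `Network.continueInterceptedRequest`. Dropping `Content-Encoding` assumes the intercepted body is already decoded.

```javascript
const { STATUS_CODES } = require('http');

// Headers that no longer describe the rebuilt response, or that get replaced.
const DROPPED = ['content-disposition', 'content-length', 'content-encoding', 'transfer-encoding'];

// Hypothetical helper: rebuild an intercepted response with a forced file name.
function buildRawResponse(statusCode, headers, bodyResult, fileName) {
  const body = Buffer.from(bodyResult.body, bodyResult.base64Encoded ? 'base64' : 'utf8');
  const lines = [`HTTP/1.1 ${statusCode} ${STATUS_CODES[statusCode] || ''}`];
  for (const [name, value] of Object.entries(headers)) {
    if (DROPPED.includes(name.toLowerCase())) continue;
    // CDP joins repeated headers with "\n"; emit one header line per value
    for (const v of String(value).split('\n')) {
      lines.push(`${name}: ${v}`);
    }
  }
  lines.push(`Content-Disposition: attachment; filename="${fileName}"`);
  lines.push(`Content-Length: ${body.length}`);
  const head = Buffer.from(lines.join('\r\n') + '\r\n\r\n', 'latin1');
  // The whole response (status line, headers, body) is sent base64-encoded
  return Buffer.concat([head, body]).toString('base64');
}
```

The result would then be passed as `client.send('Network.continueInterceptedRequest', { interceptionId, rawResponse })`.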
You can save the file using GUID as the filename and rename it when the download is completed.
const puppeteer = require('puppeteer');
const fs = require('fs');
const path = require('path');

(async () => {
  const downloadFolder = path.resolve('./DOWNLOAD-FOLDER-HERE');
  // Acts like a dictionary storing the target filename for each download guid
  const guids = {};
  const browser = await puppeteer.launch({
    headless: false
  });
  const client = await browser.target().createCDPSession();
  await client.send('Browser.setDownloadBehavior', {
    behavior: 'allowAndName', // allow downloads and save each file using its guid as the filename
    downloadPath: downloadFolder, // the download folder
    eventsEnabled: true // emit download events (Browser.downloadWillBegin and Browser.downloadProgress)
  });
  client.on('Browser.downloadWillBegin', (event) => {
    // some logic here to determine the filename;
    // the event provides event.suggestedFilename and event.url
    guids[event.guid] = 'FILENAME.pdf';
  });
  client.on('Browser.downloadProgress', (event) => {
    // once the download has completed, locate the file by guid and rename it
    if (event.state === 'completed') {
      fs.renameSync(path.resolve(downloadFolder, event.guid), path.resolve(downloadFolder, guids[event.guid]));
    }
  });
  // ...open a page and trigger the download here
})();
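For the question's case (a timestamped file name replaced by a constant one), the "logic to determine the filename" in `downloadWillBegin` can keep the extension of Chrome's suggested name and swap the rest for a fixed base name. A hypothetical helper; the default base name `report` is a placeholder:

```javascript
const path = require('path');

// Hypothetical helper: map e.g. "export_2023-10-01T12-00-00.xml" to "report.xml",
// keeping only the extension from Chrome's suggestedFilename.
function constantFileName(suggestedFilename, baseName = 'report') {
  return baseName + path.extname(suggestedFilename || '');
}
```

Inside the handler this becomes `guids[event.guid] = constantFileName(event.suggestedFilename);`. Since `fs.renameSync` overwrites an existing destination, each new download replaces the previous file of the same name.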
