Merge PDF from a distant URL, with pdfkit - node.js

I have resources stored as PDFs in an AWS S3 bucket, and I know their URLs.
Given a list of desired PDFs, I would like to fetch them, concatenate them into a single file, re-upload the result to S3, and return the new URL so it can be displayed in the browser.
If possible, I would like to use the pdfkit library, since it is already present in my application.
I have this so far:
public async concat2(filesId: string[]): Promise<boolean> {
  const doc = new PDFDocument({ margin: 40, size: 'A4' });
  filesId.map(id => doc.file(Buffer.from(`https://path-to-s3.com/${id}`)));
  doc.end();
  const promises: Promise<unknown>[] = [
    this.s3
      .upload({
        Key: `test-concat.pdf`,
        Bucket: 'my-s3-bucket',
        Body: doc,
        ContentType: 'application/pdf',
      })
      .promise(),
  ];
  await Promise.all(promises);
  return true;
}
The problem is that I get a blank single-page file, as if the Buffer data from the existing files was null.
I even tried referencing a local file instead of the Buffer:
doc.text('Hello world!');
doc.file('assets/pdf/test.pdf');
The result was a single-page document with "Hello world!", but no other data whatsoever.
What am I missing?
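One thing that can be checked independently of pdfkit is what Buffer.from does with a URL string. The sketch below uses a made-up id and only shows that the buffer holds the characters of the URL, not the bytes of the remote file, so the PDFs would have to be downloaded first (for example with the S3 client). Separately, as far as I understand the pdfkit documentation, doc.file embeds a file as an attachment rather than merging its pages, so treat that part as an assumption to verify.

```javascript
// Hypothetical URL, mirroring the pattern used in the question.
const url = 'https://path-to-s3.com/some-id';

// Buffer.from(string) encodes the characters of the string itself (UTF-8 by default);
// it performs no network request.
const buf = Buffer.from(url);

console.log(buf.length === url.length);    // true for this ASCII-only URL
console.log(buf.toString('utf8') === url); // true: the buffer is just the URL text
```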

Related

Error loading preview on Firebase Storage with images [Uploaded from Firebase Admin SDK]

I am uploading images to firebase storage using the Admin SDK from NodeJS. When I try to preview the file it doesn't load because it is broken.
Its size is correct, but the preview in the dashboard just throws an error, and the image url returns a white/black small square (depends on the browser).
This is my code from NodeJS:
const bufferStream = new stream.PassThrough();
await bufferStream.end(Buffer.from(user.photoURL, 'base64'));
const mimeType = user.photoURL.match(/[^:]\w+\/[\w-+\d.]+(?=;|,)/)[0];
const fileExtension = mimeType.split('/').pop();
const file = storageBucket.file(`avatars/${user.username}.${fileExtension}`);
const uid = v4();
console.log(uid);
bufferStream.pipe(file.createWriteStream({
metadata: {
contentType: mimeType,
metadata: {
firebaseStorageDownloadTokens: uid,
},
},
}))
.on('error', (error) => {
console.log('error', error);
})
.on('finish', () => {
// The file upload is complete.
console.log('COMPLETED, WORKED');
});
I managed to solve it myself in the end, just for future reference:
Given the exact same code I posted in the original question, I just had to remove the text preceding the actual base64.
Original base64: data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA=
Needed base64: /9j/4AAQSkZJRgABAQAAAQA=
Just remove the first part of the string so it looks like this:
bufferStream.end(Buffer.from(user.photoURL.split(';base64,')[1], 'base64'));
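The same idea can be written as a small helper that extracts both the MIME type and the decoded bytes in one step. This is only a sketch: parseDataUrl is a hypothetical name, not part of the Firebase SDK, and it assumes the input is always a base64 data URL like the one shown above.

```javascript
// Split a data URL such as "data:image/jpeg;base64,AAAA" into its MIME type and raw bytes.
// parseDataUrl is a hypothetical helper name, not part of any SDK.
function parseDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) {
    throw new Error('Not a base64 data URL');
  }
  return {
    mimeType: match[1],
    // Only the part after ";base64," is decoded.
    buffer: Buffer.from(match[2], 'base64'),
  };
}

const { mimeType, buffer } = parseDataUrl('data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA=');
console.log(mimeType);              // image/jpeg
console.log(buffer.subarray(0, 2)); // <Buffer ff d8>, the JPEG start-of-image marker
```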

Trying to retrieve an mp3 file stored in AWS S3 and load it into my React client as a Blob...it's not working

I have a react web app that allows users to record mp3 files in the browser. These mp3 files are saved in an AWS S3 bucket and can be retrieved and loaded back into the react app during the user's next session.
Saving the file works just fine, but when I try to retrieve the file with getObject() and try to create an mp3 blob on the client-side, I get a small, unusable blob:
Here's the journey the recorded mp3 file goes on:
1) Saving to S3
In my Express/Node server, I receive the uploaded mp3 file and save to the S3 bucket:
//SAVE THE COMPLETED AUDIO TO S3
router.post("/", [auth, upload.array('audio', 12)], async (req, res) => {
try {
//get file
const audioFile = req.files[0];
//create object key
const userId = req.user;
const projectId = req.cookies.currentProject;
const { sectionId } = req.body;
const key = `${userId}/${projectId}/${sectionId}.mp3`;
const fileStream = fs.createReadStream(audioFile.path)
const uploadParams = {
Bucket: bucketName,
Body: fileStream,
Key: key,
ContentType: "audio/mp3"
}
const result = await s3.upload(uploadParams).promise();
res.send(result.key);
} catch (error) {
console.error(error);
res.status(500).send();
}
});
As far as I know, there are no problems at this stage. The file ends up in my S3 bucket with "type: mp3" and "Content-Type: audio/mp3".
2) Loading file from S3 Bucket
When the react app is loaded up, an HTTP GET Request is made in my Express/Node server to retrieve the mp3 file from the S3 Bucket
//LOAD A FILE FROM S3
router.get("/:sectionId", auth, async(req, res) => {
try {
//create key from user/project/section IDs
const sectionId = req.params.sectionId;
const userId = req.user;
const projectId = req.cookies.currentProject;
const key = `${userId}/${projectId}/${sectionId}.mp3`;
const downloadParams = {
Key: key,
Bucket: bucketName
}
s3.getObject(downloadParams, function (error, data) {
if (error) {
console.error(error);
res.status(500).send();
}
res.send(data);
});
} catch (error) {
console.error(error);
res.status(500).send();
}
});
The "data" returned here is as such:
3) Making a Blob URL on the client
Finally, in the React client, I try to create an 'audio/mp3' blob from the returned array buffer
const loadAudio = async () => {
const res = await api.loadAudio(activeSection.sectionId);
const blob = new Blob([res.data.Body], {type: 'audio/mp3' });
const url = URL.createObjectURL(blob);
globalDispatch({ type: "setFullAudioURL", payload: url });
}
The created blob is severely undersized and appears to be completely unusable. Downloading the file results in a 'Failed - No file' error.
I've been stuck on this for a couple of days now with no luck. I would seriously appreciate any advice you can give!
Thanks
EDIT 1
Just some additional info here: in the upload parameters, I set the Content-Type as audio/mp3 explicitly. This is because when not set, the Content-Type defaults to 'application/octet-stream'. Either way, I encounter the same issue with the same result.
EDIT 2
At the request of a commenter, here is the res.data available on the client-side after the call is complete:
Based on the output of res.data on the client, there are a couple of things that you'd need to do:
Replace uses of res.data.Body with res.data.Body.data (as the actual data array is in the data attribute of res.data.Body)
Pass a Uint8Array to the Blob constructor, as the existing array is of a larger type, which will create an invalid blob
Putting that together, you would end up replacing:
const blob = new Blob([res.data.Body], {type: 'audio/mp3' });
with:
const blob = new Blob([new Uint8Array(res.data.Body.data)], {type: 'audio/mp3' });
Having said all that, the underlying issue is that the NodeJS server is sending the content over as a JSON encoded serialisation of the response from S3, which is likely overkill for what you are doing. Instead, you can send the Buffer across directly, which would involve, on the server side, replacing:
res.send(data);
with:
res.set('Content-Type', 'audio/mp3');
res.send(data.Body);
and on the client side (likely in the loadAudio method) processing the response as a blob instead of JSON. If using the Fetch API then it could be as simple as:
const blob = await fetch(<URL>).then(x => x.blob());
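To see why res.data.Body.data is where the bytes end up, here is a minimal sketch of what JSON serialisation does to a Node Buffer. The four bytes are a stand-in for the real S3 body; the variable names are illustrative only.

```javascript
// Simulate what the client receives when the server calls res.send(data):
// the S3 Body (a Buffer) is JSON-serialised as { type: 'Buffer', data: [...] }.
const body = Buffer.from([0x49, 0x44, 0x33, 0x04]); // stand-in bytes
const received = JSON.parse(JSON.stringify({ Body: body }));

console.log(received.Body.type);                // Buffer
console.log(Array.isArray(received.Body.data)); // true

// Rebuild the raw bytes from the number array before handing them to Blob.
const bytes = new Uint8Array(received.Body.data);
console.log(bytes.length); // 4
```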
Your server-side code seems alright to me. I'm not entirely clear about the client-side approach. Are you loading the blob into an HTML5 audio player?
I have a couple of approaches, assuming you're trying to load this into an audio tag in the UI.
<audio controls src="data:audio/mpeg;base64,blahblahblah or html src" />
Assuming that the file you uploaded to S3 is valid, here are two approaches:
Return the data as a base64 string instead of returning the buffer directly from S3. You can do this on the server side by converting the body before returning it:
const base64MP3 = data.Body.toString('base64');
You can then pass this to the audio player in the src property and it will play the audio. Prefix it with data:audio/mpeg;base64, (including the trailing comma).
Instead of returning the entire MP3 file, have your sectionId route return a presigned S3 URL. Essentially, this is a direct link to the object in S3 that is authorized for a limited time, say 5 minutes.
You should then be able to use this URL directly as the src to stream the audio. Keep in mind that it will expire.
const url = s3.getSignedUrl('getObject', {
Bucket: myBucket,
Key: myKey,
Expires: signedUrlExpireSeconds
});
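For the first approach (base64), the string handed to the audio tag can be assembled as sketched below. The three bytes are a stand-in for data.Body, which in real code would come from the S3 response.

```javascript
// Stand-in for data.Body from s3.getObject; real code would use the S3 response.
const mp3Bytes = Buffer.from([0x49, 0x44, 0x33]);

// Build a data URI the <audio> element can use directly as its src.
// Note the comma separating the header from the base64 payload.
const src = `data:audio/mpeg;base64,${mp3Bytes.toString('base64')}`;
console.log(src); // data:audio/mpeg;base64,SUQz
```

For large recordings the presigned URL is likely the lighter option, since base64 makes the payload roughly a third larger.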
You stated: "The created blob is severely undersized and appears to be completely unusable"
This suggests to me that you have an encoding issue. Once you read the MP3 from the Amazon S3 bucket, you need to encode it properly so it works in a web page.
I worked on a similar multimedia use case that involved MP4 and a Java app. That is, I wanted an MP4 obtained from a bucket to play in the web page, as shown in this example web app.
Once I read the byte stream from the S3 bucket, I had to encode it so it would play in an HTML video tag. Here is a good reference on properly encoding an MP3 file.

createGunzip() on a Google Cloud Storage object

So I'm uploading backup files in JSON-format to a google cloud storage bucket. Server is NodeJS. To save space, I want to compress the files before uploading.
My function to upload a file is:
const bufferStream = new stream.PassThrough()
bufferStream.end(Buffer.from(req.file.buffer, 'utf8'))
const bucket = storage.bucket('backups')
const filename = 'backup.json.gz'
const file = bucket.file(filename)
const writeStream = file.createWriteStream({
metadata: {
contentType: 'application/json',
contentEncoding: 'gzip'
},
validation: "md5"
})
bufferStream.pipe(zlib.createGzip()).pipe(writeStream).on('finish', async () => {
return res.status(200).end()
})
This function works. I have a problem with the decompressing, while downloading. My function here is:
const bucket = storage.bucket('backups')
let backup = ''
const readStream = bucket.file('backup.json.gz').createReadStream()
readStream.pipe(zlib.createGunzip()) // <-- here
readStream.on('data', (data) => {
backup += data
})
readStream.on('end', () => {
res.status(200).send(backup).end()
})
When I use the download function, I get the following error:
Error: incorrect header check
Errno: 3
code: Z_DATA_ERROR
When I just delete the createGunzip() call, it all works! I can even read the content of the file, but for some reason I suspect this might not be the ideal solution. For testing, my files are at most 50 kB, but in production I will probably get files larger than 10 MB.
Does the createGunzip() function expect a buffer? Or is there something else wrong?
Thanks!
According to the documentation, if your objects are gzipped and uploaded properly, the returned object will be automatically decompressed, which is why no gunzipping is needed in your case.
If you want to receive the file as-is, then you should include an Accept-Encoding: gzip header with your request.
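The error from the question can be reproduced locally with Node's zlib alone. The sketch below uses the synchronous zlib calls for brevity and a made-up JSON payload; it shows that gunzipping data which is already plain text fails with the same error code.

```javascript
const zlib = require('zlib');

const json = JSON.stringify({ backup: true });
const gzipped = zlib.gzipSync(json);

// Gunzipping real gzip data works as expected.
console.log(zlib.gunzipSync(gzipped).toString() === json); // true

// Gunzipping data that is already plain JSON fails, which is what happens
// if the download was transparently decompressed before reaching createGunzip().
let errorCode;
try {
  zlib.gunzipSync(Buffer.from(json));
} catch (err) {
  errorCode = err.code;
}
console.log(errorCode); // Z_DATA_ERROR
```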

How to write to an existing file in a S3 bucket based on the pre signed URL?

I've been searching for a way to write to a JSON file in a S3 bucket from the pre signed URL. From my research it appears it can be done but these are not in Node:
http PUT a file to S3 presigned URLs using ruby
PUT file to S3 with presigned URL
Uploading a file to a S3 Presigned URL
Write to a AWS S3 pre-signed url using Ruby
How to create and read .txt file with fs.writeFile to AWS Lambda
I haven't found a Node solution in my searches. I'm using a 3rd-party API and trying to write its callback to a JSON file that is in an S3 bucket. I can generate the pre-signed URL with no issues, but when I try to write dummy text to the pre-signed URL I get:
Error: ENOENT: no such file or directory, open
'https://path-to-file-with-signed-url'
When I try to use writeFile:
fs.writeFile(testURL, `This is a write test: ${Date.now()}`, function(err) {
if(err) return err
console.log("File written to")
})
My understanding of the documentation for the file argument is that I can use a URL. I'm starting to believe this might be a permissions issue, but I'm not having any luck with the documentation.
After implementing node-fetch I still get an error (403 Forbidden) writing to a file in S3 based on the pre signed URL, here is the full code from the module I've written:
const aws = require('aws-sdk')
const config = require('../config.json')
const fetch = require('node-fetch')
const expireStamp = 604800 // 7 days
const existsModule = require('./existsModule')
module.exports = async function(toSignFile) {
let checkJSON = await existsModule(`${toSignFile}.json`)
if (checkJSON == true) {
let testURL = await s3signing(`${toSignFile}.json`)
fetch(testURL, {
method: 'PUT',
body: JSON.stringify(`This is a write test: ${Date.now()}`),
}).then((res) => {
console.log(res)
}).catch((err) => {
console.log(`Fetch issue: ${err}`)
})
}
}
async function s3signing(signFile) {
const s3 = new aws.S3()
aws.config.update({
accessKeyId: config.aws.accessKey,
secretAccessKey: config.aws.secretKey,
region: config.aws.region,
})
params = {
Bucket: config.aws.bucket,
Key: signFile,
Expires: expireStamp
}
try {
// let signedURL = await s3.getSignedUrl('getObject', params)
let signedURL = await s3.getSignedUrl('putObject', params)
console.log('\x1b[36m%s\x1b[0m', `Signed URL: ${signedURL}`)
return signedURL
} catch (err) {
return err
}
}
Reviewing the permissions I have no issues with uploading and write access has been set in the permissions. In Node how can I write to a file in the S3 bucket using that file's pre-signed URL as the path?
fs is the filesystem module. You can't use it as an HTTP client.
You can use the built-in https module, but I think you'll find it easier to use node-fetch.
fetch('your signed URL here', {
method: 'PUT',
body: JSON.stringify(data),
// more options and request headers and such here
}).then((res) => {
// do something
}).catch((e) => {
// do something else
});
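As a side note on why the fs attempt failed: the URL support in fs is limited to file: URLs. The sketch below uses example.com as a placeholder and shows the error code Node reports for an https URL object. A plain https string is treated as a relative file path instead, which is consistent with the ENOENT error in the question.

```javascript
const fs = require('fs');

// A WHATWG URL object is accepted by fs only when its scheme is file:.
let code;
try {
  fs.writeFileSync(new URL('https://example.com/test.json'), 'test');
} catch (err) {
  code = err.code;
}
console.log(code); // ERR_INVALID_URL_SCHEME
```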
Was looking for an elegant way to transfer s3 file to an s3 signed url using PUT. Most examples I found were using the PUT({body : data}). I came across one suggestion to read the data to a readable stream and then pipe it to the PUT. However I still didn't like the notion of loading large files into memory and then assigning them to the put stream. Piping read to write is always better in memory and performance. Since the s3.getObject().createReadStream() returns a request object, which supports pipe, all that we need to do is to pipe it correctly to the PUT request which exposes a write stream.
Get object function
async function GetFileReadStream(key) {
  const params = {
    Bucket: bucket,
    Key: key
  };
  // headObject returns the size up front so the PUT can send a Content-Length header
  const fileSize = await s3.headObject(params)
    .promise()
    .then(res => res.ContentLength);
  // the object body is exposed as a readable stream, nothing is buffered here
  return { stream: s3.getObject(params).createReadStream(), fileSize };
}
Put object function
const request = require('request');
function putStream(presignedUrl, readStream) {
  return new Promise((resolve, reject) => {
    // request.put returns a writable stream, so the S3 read stream can be piped straight into it
    const putRequestWriteStream = request.put({
      url: presignedUrl,
      headers: {
        'Content-Type': 'application/octet-stream',
        'Content-Length': readStream.fileSize
      }
    });
    putRequestWriteStream
      .on('response', (response) => resolve(response.headers['etag']))
      .on('error', reject) // surface failures of the PUT request
      .on('end', () => console.log("put done"));
    readStream.stream.on('error', reject); // surface failures of the S3 download
    readStream.stream.pipe(putRequestWriteStream);
  });
}
This works great with a very small memory foot print. Enjoy.

AWS S3 - Fetch PDF as octet-stream and upload to S3 bucket

I'm fetching a PDF from a 3rd-party API. The response content-type is application/octet-stream. Thereafter, I upload it to S3 but if I go to S3 and download the newly written file, the content is not visible, the pages are blank, viewed in Chromium and Adobe Acrobat. The file is also not zero bytes and has the correct number of pages.
Using the binary encoding gives me a file size closest to the actual file size. But it's still not exact, it's slightly smaller.
The API request (using the request-promise module):
import { get } from 'request-promise';
const payload = await get('someUrl').catch(handleError);
const buffer = Buffer.from(payload, 'binary');
const result = await new S3().upload({
Body: buffer,
Bucket: 'somebucket',
ContentType: 'application/pdf',
ContentEncoding: 'binary',
Key: 'somefile.pdf'
}).promise();
Additionally, downloading the file from Postman also results in a file with blank pages. Does anybody know where I am going wrong here?
As @Micheal - sqlbot mentioned in the comments, the download was the issue: I wasn't getting the entire byte stream from the API.
I fixed it by changing const payload = await get('someUrl').catch(handleError);
to
import * as request from 'request'; // notice I've imported the base request lib
const bufferArray = [];
request.get('someUrl')
  .on('response', (res) => {
    res.on('data', (chunk) => {
      bufferArray.push(Buffer.from(chunk)); // save response chunks in a temp array for now
    });
    res.on('end', () => {
      const dataBuffer = Buffer.concat(bufferArray); // this now contains all my data
      // send to s3
    });
  });
Note: it is not recommended to stream responses with the request-promise library - outlined in the documentation. I used the base request library instead.
https://github.com/request/request-promise#api-in-detail
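One plausible mechanism for the blank pages, assuming the original payload arrived as a UTF-8 decoded string (the request library decodes bodies to strings unless encoding is set to null), is sketched below. The eight bytes are a made-up stand-in for the start of a PDF, not data from the actual API.

```javascript
// Stand-in for the first bytes of a PDF: "%PDF" followed by bytes above 0x7F,
// which are common in PDF headers and compressed streams.
const original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0xe2, 0xe3, 0xcf, 0xd3]);

// What happens if the HTTP client decodes the body as a UTF-8 string...
const asString = original.toString('utf8');

// ...and the string is then turned back into bytes with the 'binary' (latin1) encoding.
const roundTripped = Buffer.from(asString, 'binary');

console.log(roundTripped.equals(original)); // false: the high bytes were replaced during decoding

// Keeping every chunk as a Buffer and concatenating preserves the bytes exactly.
const chunks = [original.subarray(0, 4), original.subarray(4)];
console.log(Buffer.concat(chunks).equals(original)); // true
```

The text parts of a PDF (page tree, object headers) are mostly ASCII and survive this round trip, which would be consistent with a file that has the right page count but no visible content.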

Resources