Stream node requests to the cloud with file metadata - node.js

I'm using Koa to build a web app, and I want to allow users to upload files to it. The files need to be streamed to the cloud, but I would like to avoid saving them locally.
The problem is that I need some file metadata before I pipe the upload stream to the writable stream. I want to have the MIME type and optionally attach other data, like the original file name.
I tried sending the binary data with the request's "content-type" header set to the file's type, but I would prefer the request to have the content type application/octet-stream so I can tell in the back-end how to handle the request.
I read somewhere that the better option would be to use multipart/form-data, but I'm not sure how to structure the request, or how to parse the metadata so I can notify the cloud before I pipe to its write stream.
Here is the code I'm currently using. Basically, it just pipes the request as-is, and I use the request header to determine the type of the file:
module.exports = async ctx => {
  // Generate a random id that will be part of the filename.
  const id = pushid();
  // Get the content type from the header.
  const contentType = ctx.header['content-type'];
  // Get the extension for the file from the content type.
  const ext = contentType.split('/').pop();
  // This is the configuration for the upload stream to the cloud.
  const uploadConfig = {
    // I must specify a content type, or know the file extension.
    contentType
    // There is some other stuff here, but it's not relevant.
  };
  // Create an upload stream for the cloud storage.
  const uploadStream = bucket
    .file(`assets/${id}/original.${ext}`)
    .createWriteStream(uploadConfig);
  // Here is what took me hours to get to work... dev life is hard.
  ctx.req.pipe(uploadStream);
  // Return a promise so Koa doesn't shut down the request before it's finished uploading.
  return new Promise((resolve, reject) =>
    uploadStream.on('finish', resolve).on('error', reject)
  );
};
Please assume I don't know much about upload protocols or managing streams.

OK, so after a lot of searching I found out that there is a streaming parser called busboy. It is pretty easy to use, but before jumping into the code I highly suggest that everyone dealing with multipart/form-data requests read this article.
Here is how I solved it:
const Busboy = require('busboy');
const path = require('path');

module.exports = async ctx => {
  // Init busboy with the headers of the "raw" request.
  const busboy = new Busboy({ headers: ctx.req.headers });
  busboy.on('file', (fieldname, stream, filename, encoding, contentType) => {
    const id = pushid();
    const ext = path.extname(filename);
    const uploadStream = bucket
      .file(`assets/${id}/original${ext}`)
      .createWriteStream({
        contentType,
        resumable: false,
        metadata: {
          cacheControl: 'public, max-age=3600'
        }
      });
    stream.pipe(uploadStream);
  });
  // Pipe the request to busboy.
  ctx.req.pipe(busboy);
  // Return a promise that resolves to whatever you want.
  ctx.body = await new Promise(resolve => {
    busboy.on('finish', () => {
      resolve('done');
    });
  });
};
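For completeness, here is a rough sketch of what the client side of such a multipart/form-data request could look like in a browser. The /upload URL, the originalName field, and the file input lookup are placeholders I'm assuming for illustration; fields appended before the file will generally reach busboy as 'field' events before the 'file' event.
// Rough client-side sketch (browser). The endpoint and field names are
// illustrative placeholders, not part of the original code.
const fileInput = document.querySelector('input[type="file"]');
const file = fileInput.files[0];

const formData = new FormData();
// Metadata fields appended before the file are parsed first by busboy.
formData.append('originalName', file.name);
formData.append('file', file, file.name);

fetch('/upload', {
  method: 'POST',
  body: formData // the browser adds the multipart/form-data boundary header itself
}).then(res => res.text());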

Related

Trying to retrieve an mp3 file stored in AWS S3 and load it into my React client as a Blob...it's not working

I have a react web app that allows users to record mp3 files in the browser. These mp3 files are saved in an AWS S3 bucket and can be retrieved and loaded back into the react app during the user's next session.
Saving the file works just fine, but when I try to retrieve the file with getObject() and try to create an mp3 blob on the client-side, I get a small, unusable blob:
Here's the journey the recorded mp3 file goes on:
1) Saving to S3
In my Express/Node server, I receive the uploaded mp3 file and save to the S3 bucket:
//SAVE THE COMPLETED AUDIO TO S3
router.post("/", [auth, upload.array('audio', 12)], async (req, res) => {
  try {
    // get file
    const audioFile = req.files[0];
    // create object key
    const userId = req.user;
    const projectId = req.cookies.currentProject;
    const { sectionId } = req.body;
    const key = `${userId}/${projectId}/${sectionId}.mp3`;
    const fileStream = fs.createReadStream(audioFile.path);
    const uploadParams = {
      Bucket: bucketName,
      Body: fileStream,
      Key: key,
      ContentType: "audio/mp3"
    };
    const result = await s3.upload(uploadParams).promise();
    res.send(result.key);
  } catch (error) {
    console.error(error);
    res.status(500).send();
  }
});
As far as I know, there are no problems at this stage. The file ends up in my S3 bucket with "type: mp3" and "Content-Type: audio/mp3".
2) Loading file from S3 Bucket
When the react app is loaded up, an HTTP GET request is made to my Express/Node server to retrieve the mp3 file from the S3 bucket:
//LOAD A FILE FROM S3
router.get("/:sectionId", auth, async (req, res) => {
  try {
    // create key from user/project/section IDs
    const sectionId = req.params.sectionId;
    const userId = req.user;
    const projectId = req.cookies.currentProject;
    const key = `${userId}/${projectId}/${sectionId}.mp3`;
    const downloadParams = {
      Key: key,
      Bucket: bucketName
    };
    s3.getObject(downloadParams, function (error, data) {
      if (error) {
        console.error(error);
        return res.status(500).send();
      }
      res.send(data);
    });
  } catch (error) {
    console.error(error);
    res.status(500).send();
  }
});
The "data" returned here is as such:
3) Making a Blob URL on the client
Finally, in the React client, I try to create an 'audio/mp3' blob from the returned array buffer
const loadAudio = async () => {
  const res = await api.loadAudio(activeSection.sectionId);
  const blob = new Blob([res.data.Body], { type: 'audio/mp3' });
  const url = URL.createObjectURL(blob);
  globalDispatch({ type: "setFullAudioURL", payload: url });
};
The created blob is severely undersized and appears to be completely unusable. Downloading the file results in a 'Failed - No file' error.
I've been stuck on this for a couple of days now with no luck. I would seriously appreciate any advice you can give!
Thanks
EDIT 1
Just some additional info here: in the upload parameters, I set the Content-Type as audio/mp3 explicitly. This is because when not set, the Content-Type defaults to 'application/octet-stream'. Either way, I encounter the same issue with the same result.
EDIT 2
At the request of a commenter, here is the res.data available on the client-side after the call is complete:
Based on the output of res.data on the client, there are a couple of things that you'd need to do:
Replace uses of res.data.Body with res.data.Body.data (as the actual data array is in the data attribute of res.data.Body)
Pass a Uint8Array to the Blob constructor, as the existing array is of a larger type, which will create an invalid blob
Putting that together, you would end up replacing:
const blob = new Blob([res.data.Body], {type: 'audio/mp3' });
with:
const blob = new Blob([new Uint8Array(res.data.Body.data)], {type: 'audio/mp3' });
Having said all that, the underlying issue is that the NodeJS server is sending the content over as a JSON encoded serialisation of the response from S3, which is likely overkill for what you are doing. Instead, you can send the Buffer across directly, which would involve, on the server side, replacing:
res.send(data);
with:
res.set('Content-Type', 'audio/mp3');
res.send(data.Body);
and on the client side (likely in the loadAudio method) processing the response as a blob instead of JSON. If using the Fetch API then it could be as simple as:
const blob = await fetch(<URL>).then(x => x.blob());
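Putting that together on the client, a loadAudio along these lines should do the trick. This is only a sketch: the /api/audio/... path is a placeholder, and it assumes the server now responds with the raw bytes and a Content-Type: audio/mp3 header as shown above.
// Sketch of the client side once the server sends raw audio bytes.
const loadAudio = async () => {
  // Placeholder URL; use whatever route actually serves the section audio.
  const response = await fetch(`/api/audio/${activeSection.sectionId}`);
  const blob = await response.blob(); // blob type comes from the Content-Type header
  const url = URL.createObjectURL(blob);
  globalDispatch({ type: "setFullAudioURL", payload: url });
};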
Your server-side code seems alright to me. I'm not super clear about the client-side approach, though. Do you load this blob into the HTML5 audio player?
Here are a few approaches, assuming you're trying to load this into an audio tag in the UI:
<audio controls src="data:audio/mpeg;base64,blahblahblah or html src" />
Assuming that the file you uploaded to S3 is valid, here are two approaches:
Return the data as a base64 string instead of as a buffer directly from S3. You can do this on the server side by returning:
const base64MP3 = data.Body.toString('base64');
You can then pass this to the audio player's src property, prefixed with data:audio/mpeg;base64, and it will play the audio.
Instead of returning the entire MP3 file, have your sectionId route return a pre-signed S3 URL. Essentially, this is a direct link to the object in S3 that is authorized for, say, 5 minutes.
Then you should be able to use this URL directly to stream the audio
and set it as the src. Keep in mind that it will expire.
const url = s3.getSignedUrl('getObject', {
  Bucket: myBucket,
  Key: myKey,
  Expires: signedUrlExpireSeconds
});
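For illustration, the GET route from the question could return such a URL instead of the file itself. This is only a sketch reusing names from the question's code (bucketName, auth, the key layout); the /url suffix on the route is my own assumption.
// Sketch: return a short-lived signed URL instead of the audio bytes.
// bucketName, auth and the key layout come from the question's code.
router.get("/:sectionId/url", auth, (req, res) => {
  const key = `${req.user}/${req.cookies.currentProject}/${req.params.sectionId}.mp3`;
  const url = s3.getSignedUrl('getObject', {
    Bucket: bucketName,
    Key: key,
    Expires: 300 // seconds until the link expires
  });
  res.send({ url });
});
On the client, the returned URL can then be set directly as the audio element's src.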
You stated: "The created blob is severely undersized and appears to be completely unusable"
This suggests to me that you have an encoding issue. Once you read the MP3 from the Amazon S3 bucket, you need to encode it properly so it functions in a web page.
I did a similar multimedia use case that involved MP4 and a Java app. That is, I wanted an MP4 obtained from a bucket to play in a web page, as shown in this example web app.
Once I read the byte stream from the S3 bucket, I had to encode it so it would play in an HTML video tag. Here is a good reference on properly encoding an MP3 file.

creategunzip() on google cloud storage object

So I'm uploading backup files in JSON format to a Google Cloud Storage bucket. The server is Node.js. To save space, I want to compress the files before uploading.
My function to upload a file is:
const bufferStream = new stream.PassThrough()
bufferStream.end(Buffer.from(req.file.buffer, 'utf8'))

const bucket = storage.bucket('backups')
const filename = 'backup.json.gz'
const file = bucket.file(filename)
const writeStream = file.createWriteStream({
  metadata: {
    contentType: 'application/json',
    contentEncoding: 'gzip'
  },
  validation: "md5"
})

bufferStream.pipe(zlib.createGzip()).pipe(writeStream).on('finish', async () => {
  return res.status(200).end()
})
This function works. My problem is with decompressing the file while downloading. That function is:
const bucket = storage.bucket('backups')
let backup = ''
const readStream = bucket.file('backup.json.gz').createReadStream()

readStream.pipe(zlib.createGunzip()) // <-- here

readStream.on('data', (data) => {
  backup += data
})

readStream.on('end', () => {
  res.status(200).send(backup).end()
})
When I use the download function, I get the following error:
Error: incorrect header check
Errno: 3
code: Z_DATA_ERROR
When I just delete the createGunzip() call, it all works! I can even read the content of the file, but I have a feeling this might not be the ideal solution. Right now, for testing, I have files with a maximum size of about 50 kB, but I will probably get files > 10 MB in production.
Does the createGunzip() function expect a buffer? Or is there something else wrong?
Thanks!
According to the documentation, if your objects are gzipped and uploaded properly,
then the returned object will be automatically decompressed on download; that's why no gunzipping is needed in your case.
If you want to receive the file as-is (still compressed), you should include an Accept-Encoding: gzip header with your request.
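So the download handler from the question should work with the createGunzip() line simply removed; the chunks arriving on the read stream are already plain JSON text. Roughly, as a sketch based on the question's code (with an error handler added):
// Download without createGunzip(): the object was uploaded with
// contentEncoding: 'gzip', so the client library hands back decompressed data.
const bucket = storage.bucket('backups');
const readStream = bucket.file('backup.json.gz').createReadStream();
let backup = '';

readStream.on('data', (chunk) => {
  backup += chunk;
});

readStream.on('end', () => {
  res.status(200).send(backup);
});

readStream.on('error', (err) => {
  res.status(500).send(err.message);
});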

How to write to an existing file in a S3 bucket based on the pre signed URL?

I've been searching for a way to write to a JSON file in an S3 bucket using a pre-signed URL. From my research it appears this can be done, but these examples are not in Node:
http PUT a file to S3 presigned URLs using ruby
PUT file to S3 with presigned URL
Uploading a file to a S3 Presigned URL
Write to a AWS S3 pre-signed url using Ruby
How to create and read .txt file with fs.writeFile to AWS Lambda
Not having found a Node solution in my searches, and using a third-party API, I'm trying to write the callback data to a JSON file that is in an S3 bucket. I can generate the pre-signed URL with no issues, but when I try to write dummy text to the pre-signed URL I get:
Error: ENOENT: no such file or directory, open
'https://path-to-file-with-signed-url'
When I try to use writeFile:
fs.writeFile(testURL, `This is a write test: ${Date.now()}`, function(err) {
  if (err) return err
  console.log("File written to")
})
and my understanding of the documentation for the file argument is that I can pass a URL. I'm starting to believe this might be a permissions issue, but I'm not having any luck with the documentation.
After implementing node-fetch I still get an error (403 Forbidden) when writing to a file in S3 via the pre-signed URL. Here is the full code from the module I've written:
const aws = require('aws-sdk')
const config = require('../config.json')
const fetch = require('node-fetch')
const expireStamp = 604800 // 7 days
const existsModule = require('./existsModule')

module.exports = async function(toSignFile) {
  let checkJSON = await existsModule(`${toSignFile}.json`)
  if (checkJSON == true) {
    let testURL = await s3signing(`${toSignFile}.json`)
    fetch(testURL, {
      method: 'PUT',
      body: JSON.stringify(`This is a write test: ${Date.now()}`),
    }).then((res) => {
      console.log(res)
    }).catch((err) => {
      console.log(`Fetch issue: ${err}`)
    })
  }
}

async function s3signing(signFile) {
  const s3 = new aws.S3()
  aws.config.update({
    accessKeyId: config.aws.accessKey,
    secretAccessKey: config.aws.secretKey,
    region: config.aws.region,
  })
  const params = {
    Bucket: config.aws.bucket,
    Key: signFile,
    Expires: expireStamp
  }
  try {
    // let signedURL = await s3.getSignedUrl('getObject', params)
    let signedURL = await s3.getSignedUrl('putObject', params)
    console.log('\x1b[36m%s\x1b[0m', `Signed URL: ${signedURL}`)
    return signedURL
  } catch (err) {
    return err
  }
}
Reviewing the permissions, I have no issues with uploading, and write access has been set in the bucket permissions. In Node, how can I write to a file in the S3 bucket using that file's pre-signed URL as the path?
fs is the filesystem module. You can't use it as an HTTP client.
You can use the built-in https module, but I think you'll find it easier to use node-fetch.
fetch('your signed URL here', {
  method: 'PUT',
  body: JSON.stringify(data),
  // more options and request headers and such here
}).then((res) => {
  // do something
}).catch((e) => {
  // do something else
});
I was looking for an elegant way to transfer an S3 file to an S3 signed URL using PUT. Most examples I found used PUT({ body: data }). I came across one suggestion to read the data into a readable stream and then pipe it to the PUT, but I still didn't like the notion of loading large files into memory before handing them to the PUT stream; piping read to write is always better for memory and performance. Since s3.getObject().createReadStream() returns a request object, which supports pipe, all we need to do is pipe it correctly into the PUT request, which exposes a write stream.
Get object function
async function GetFileReadStream(key) {
  return new Promise(async (resolve, reject) => {
    var params = {
      Bucket: bucket,
      Key: key
    };
    var fileSize = await s3.headObject(params)
      .promise()
      .then(res => res.ContentLength);
    resolve({ stream: s3.getObject(params).createReadStream(), fileSize });
  });
}
Put object function
const request = require('request');

async function putStream(presignedUrl, readStream) {
  return new Promise((resolve, reject) => {
    var putRequestWriteStream = request.put({
      url: presignedUrl,
      headers: {
        'Content-Type': 'application/octet-stream',
        'Content-Length': readStream.fileSize
      }
    });
    putRequestWriteStream.on('response', function(response) {
      var etag = response.headers['etag'];
      resolve(etag);
    })
    .on('end', () => console.log("put done"));
    readStream.stream.pipe(putRequestWriteStream);
  });
}
This works great with a very small memory footprint. Enjoy.
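Usage could then look roughly like this, with sourceKey and presignedUrl standing in for your own object key and destination URL:
// Sketch of wiring the two helpers above together.
async function copyToSignedUrl(sourceKey, presignedUrl) {
  const readStream = await GetFileReadStream(sourceKey);
  const etag = await putStream(presignedUrl, readStream);
  console.log(`Transfer finished, ETag: ${etag}`);
}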

Missing data when using busboy library to upload file inside of a Lambda?

I am trying to use busboy inside a Lambda function to process a POST request that is supposed to upload an image file. I notice that not all of the file content is making it to busboy to be parsed.
I tried changing the call to busboy.write to just use 'base64' since it looks like the file arrives in binary, but that didn't work either.
my client code
const formData = new FormData();
formData.append("file", params.file, params.file.name);
const request = new XMLHttpRequest();
request.open("POST", "https://myapi/uploadphoto");
request.setRequestHeader('Authorization', this.props.idToken);
request.send(formData);
my lambda code
function getFile(event) {
  const busboy = new Busboy({ headers: event.headers });
  const result = {};
  return new Promise((resolve, reject) => {
    busboy.on('file', (fieldname, file, filename, encoding, mimetype) => {
      file.on('data', data => {
        result.content = data;
        console.log("got data... " + data.length + ' bytes');
      });
      file.on('end', () => {
        result.filename = filename;
        result.contentType = mimetype;
        resolve(result);
      });
    });
    busboy.on('error', error => reject(error));
    busboy.write(event.body, event.isBase64Encoded ? 'base64' : 'binary');
    busboy.end();
  });
}
When trying with an example photo, I notice that the "got data" console log is showing me that I am not receiving the whole file. The file I am using is 229707 bytes but the console log says that it received 217351 bytes.
I am wondering if I am using busboy wrong or if this is some quirk of lambda + api gateway. Any ideas or help troubleshooting is much appreciated.
I was struggling with this issue too, but in the end it was a problem with API Gateway.
I was able to solve the problem by adding multipart/form-data as a binary media type inside the settings of API Gateway.
To do it, go to API Gateway > "Your API" > Settings > Add Binary Media Type and add multipart/form-data.
After that, deploy your API again and it should work.
I hope this helps anyone!

Pass ReadStream to a POST FORM from file on google cloud storage

I am trying to send a POST form including (raw) files, and these files are located in a Google Cloud Storage bucket.
This code runs in a Firebase Cloud Function. Instead of downloading the storage file to the cloud function instance and then uploading it via the form (which works), I would like to pass the stream to the form directly.
async function test() {
  const rp = require('request-promise');
  const path = require('path');
  const { Storage } = require('@google-cloud/storage');

  const storage = new Storage();
  const bucketName = 'xxx';
  const bucket = storage.bucket(bucketName);
  const fileAPath = path.join('aaa', 'bbb.jpg');

  let formData = {
    fileA: bucket.file(fileAPath).createReadStream(),
  };

  return rp({
    uri: uri,
    method: 'POST',
    formData: formData,
  });
}
The POST works as intended if we download the file first (to a temp file on the cloud functions instance) and then use fs.createReadStream(fileAPath_tmp)
The POST fails (i.e. the endpoint does not receive the file in the same way, if at all) when using the code above (no temp download) with bucket.file(fileAPath).createReadStream().
Based on the docs for Google Cloud Storage's createReadStream, you need to use the read stream as an event emitter to populate a buffer that you then return to the end user. You should be able to use the .pipe() method to pipe it directly into the outgoing HTTP request, similar to your existing source code.
remoteFile.createReadStream()
  .on('error', function(err) {})
  .on('response', function(response) {
    // Server connected and responded with the specified status and headers.
  })
  .on('end', function() {
    // The file is fully downloaded.
  })
  .pipe(/* destination writable stream */);
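One additional thing worth trying (an assumption on my part, not something the docs above confirm) is giving request explicit file options for the stream, since it cannot infer a filename or content type from a GCS read stream the way it can from fs.createReadStream. A sketch of how the formData portion of test() could look, using request-promise's multipart file options; the image/jpeg content type is guessed from the .jpg extension in the question:
// Sketch only: inside test(), attach the GCS stream with explicit file options
// instead of the bare stream. The rest of the function stays the same.
let formData = {
  fileA: {
    value: bucket.file(fileAPath).createReadStream(),
    options: {
      filename: path.basename(fileAPath),
      contentType: 'image/jpeg' // assumed from the .jpg extension in the question
    }
  }
};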
