Stream a zip file in Node.js

I am searching for a solution to stream my zip file in order to send it to Azure Blob Storage.
Currently this is what I have:
async _uploadStreamToBlob(zipFile, fileName) {
const blobService = await this.__self.blobStorage.createBlobService(this.__self.blobStorageConnectionString);
const containerName = this.__self.blobContainerName;
const sourceFilePath = `${path.resolve(zipFile)}`;
const streamSource = fs.createReadStream(sourceFilePath);
return new Promise((resolve, reject) => {
streamSource.pipe(blobService.createWriteStreamToBlockBlob(containerName, fileName, error => {
if (error) {
reject(error);
} else {
resolve({ message: `Upload of '${fileName}' complete` });
}
}));
});
};
This clearly does not work, as I've tested: the file stream feeds zero bytes into the pipe, resulting in a successful upload of a 0-byte zipFile to blob storage.
How do I stream the zipFile onto the Azure write stream? Or how do I get the bytes off of the zipFile (preserving the contents)?
If there is any other way of achieving this, I am all ears.
Thanks

Use createBlockBlobFromLocalFile directly:
blobService.createBlockBlobFromLocalFile(containerName, fileName, sourceFilePath, (err) => {
// Handle err
});
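For completeness, here is a minimal sketch of wrapping that call in a Promise so it drops into the asker's _uploadStreamToBlob method. It assumes the legacy azure-storage SDK already used in the question; the helper name is just for illustration.
const azure = require('azure-storage');
const path = require('path');

// Sketch: wrap the callback-based createBlockBlobFromLocalFile in a Promise
// (legacy azure-storage SDK assumed, as in the question).
function uploadZipToBlob(connectionString, containerName, zipFile, fileName) {
  const blobService = azure.createBlobService(connectionString);
  const sourceFilePath = path.resolve(zipFile);
  return new Promise((resolve, reject) => {
    blobService.createBlockBlobFromLocalFile(containerName, fileName, sourceFilePath, (error) => {
      if (error) {
        reject(error);
      } else {
        resolve({ message: `Upload of '${fileName}' complete` });
      }
    });
  });
}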

Related

Returning value from "finish" event after piping data from a read stream to a write stream

I am trying to create a read stream, pipe the contents of a Word document XML file to a write stream, and then read from that finished write stream. The problem I am running into is that on the first sequence of reading, then writing, then reading, I get a [Error: ENOENT: no such file or directory, open] error. However, after the file has been created by the first attempt, the code runs smoothly and returns the pageCount value as expected.
I have tried to read from the completed file and then return the pageCount value inside of the 'finish' event, but that just leaves me with an undefined returned value. As such, I am not sure what to do.
Any help would be appreciated for this struggling junior.
Update: the following code worked for me.
console.log("unzipping");
const createWordOutput = new Promise((resolve, reject) => {
console.log("end is firing");
fs.createReadStream(data)
.pipe(unzipper.Parse())
.on("entry", async function (entry) {
const fileName = entry.path;
const type = entry.type;
const size = entry.vars.uncompressedSize;
//docProps has the meta data of the document, the word/document.xml is the actual document
if (fileName === "docProps/app.xml") {
// console.log(fileName);
entry.pipe(fs.createWriteStream("./wordOutput")).on("finish", () => {
console.log("finished writing the file");
console.log("resolving");
return resolve();
});
//once the piping is completed and the XML structure is fully writen a 'finish' event is emitted. This event accepts a callback. Here I put the cb and call readTheFile on the ./output file. This successfully reads the metadata of each file
} else {
entry.autodrain();
}
});
});
await createWordOutput;
const pageCount = await readWordFile("./wordOutput");
if (pageCount === undefined) {
console.log("PAGECOUNT IS UNDEFINED");
}
console.log("logging page count in unzip the file");
console.log(pageCount);
return pageCount;
};
The error is coming from readWordFile, because it runs before the stream is done.
You need to move the reading into the 'finish' handler.
Try this:
console.log("unzipping");
let pageCount = "";
fs.createReadStream(data)
.pipe(unzipper.Parse())
.on("entry", function(entry) {
const fileName = entry.path;
const type = entry.type;
const size = entry.vars.uncompressedSize;
if (fileName === "docProps/app.xml") {
// console.log(fileName);
entry.pipe(fs.createWriteStream("./wordOutput")).on("finish", async() => {
console.log("finished writing the file");
// finished writing, do everything here, and return
pageCount = await readWordFile("./wordOutput");
if (pageCount === undefined) {
console.log("PAGECOUNT IS UNDEFINED");
}
console.log("logging page count in unzip the file");
console.log(pageCount);
return pageCount;
});
} else {
entry.autodrain();
}
});
};
const readWordFile = (data) => {
return new Promise(async(resolve, reject) => {
console.log("this is the data that readWordFile received");
console.log(data);
console.log("reading word file");
const XMLData = await fsp.readFile(data, {
encoding: "utf-8"
});
console.log(XMLData);
const pageCount = XMLData.split("<Pages>")
.join(",")
.split("</Pages>")
.join(",")
.split(",")[1];
console.log("getting page count from read word file");
console.log(pageCount);
resolve(pageCount);
});
};
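As a variation, here is a hedged sketch that resolves the Promise with the page count itself, so the caller can simply await it; a return inside the 'finish' handler is otherwise discarded. It assumes the same fs, unzipper and readWordFile helpers shown above, and the function name is only illustrative.
// Sketch: resolve with the page count instead of returning from the event handler.
const unzipTheFile = (data) =>
  new Promise((resolve, reject) => {
    fs.createReadStream(data)
      .pipe(unzipper.Parse())
      .on("error", reject)
      .on("entry", (entry) => {
        if (entry.path === "docProps/app.xml") {
          entry
            .pipe(fs.createWriteStream("./wordOutput"))
            .on("finish", async () => {
              try {
                resolve(await readWordFile("./wordOutput"));
              } catch (error) {
                reject(error);
              }
            });
        } else {
          entry.autodrain();
        }
      });
  });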

How to readFile() with async / execFile() Node js

Thanks in advance.
I'm creating an Electron-Create-React-App using electron-forge on Windows 10 Pro and am stuck with using async functions with execFile and readFile().
I want to achieve the following:-
main process - Receive a buffer of a screen capture (video) from the renderer process.
Create a temporary file and write the buffer to a .mp4 file.
Crop the video (based on x:y:width:height) using ffmpeg (installed in Electron as a binary).
Output = .mp4 file in temporary directory
Read the cropped .mp4 file using fs.readFile() (as a base64 encoded buffer)
Send the buffer to another renderer screen.
Delete temp file.
Q: I've managed to do most of it but cannot access the cropped .mp4 file in the temp directory.
I've tried the following:-
Electron main process
const fs = require('fs').promises
const path = require('path')
ipcMain.on("capture:buffer", async (video_object) => {
const {x_pos, y_pos, window_width, window_height, buffer} = video_object
try {
const dir = await fs.mkdtemp(await fs.realpath(os.tmpdir()) + path.sep)
const captured_video_file_path = path.join(dir, "screen_capture_video.mp4")
// This works
await fs.writeFile(captured_video_file_path, buffer, (error, stdout, stderr) => {
if (error) {
console.log(error)
}
console.log("Screen Capture File written")
})
// This also works
execFile(`${ffmpeg.path}`,
['-i', `${captured_video_file_path}`, '-vf',
`crop=${window_width}:${window_height}:${x_pos}:${y_pos}`,
`${path.join(dir,'cropped_video.mp4')}`],
(error, stdout, stderr) => {
if (error) {
console.log(error.message)
}
if (stderr) {
console.log(stderr)
}
console.log("Cropped File created")
})
// This code onwards doesn't work
await fs.readFile(path.join(dir, "cropped_video.mp4"), 'base64', (error, data) => {
if (error) {
console.log(error)
}
// To renderer
mainWindow.webContents.send("main:video_buffer", Buffer.from(data))
})
} catch (error) {
console.log(error)
} finally {
fs.rmdir(dir, {recursive: true})
}
})
When trying to read the file I get the following error:
[Error: ENOENT: no such file or directory, open 'C:\Users\XXXX\XXXXX\XXXXX\temp\temp_eYGMCR\cropped_video.mp4']
I've checked that the correct path exists with console.log.
I suspect it is a 'simple' issue with using async / execFile() properly but don't know exactly where I am making a silly mistake.
Any help would be appreciated.
Thanks.
Because at the time fs.readFile is called, execFile may not be done yet.
Untested, but you may want to create a promise and wait for execFile to be completed before proceeding and see whether it works.
await new Promise( resolve => {
execFile(`${ffmpeg.path}`,
['-i', `${captured_video_file_path}`, '-vf',
`crop=${window_width}:${window_height}:${x_pos}:${y_pos}`,
`${path.join(dir,'cropped_video.mp4')}`],
(error, stdout, stderr) => {
if (error) {
console.log(error.message)
}
if (stderr) {
console.log(stderr)
}
console.log("Cropped File created")
resolve() //this tells `await` it's ready to move on
})
})
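As a side note, the same wait can be expressed with Node's built-in util.promisify. This is only a sketch, reusing the variables from the snippet above; the promisified execFile resolves with { stdout, stderr } and rejects if ffmpeg exits with an error.
// Sketch: promisified execFile, to be awaited inside the async ipcMain handler.
const { execFile } = require('child_process')
const util = require('util')
const execFileAsync = util.promisify(execFile)

const { stderr } = await execFileAsync(ffmpeg.path, [
  '-i', captured_video_file_path,
  '-vf', `crop=${window_width}:${window_height}:${x_pos}:${y_pos}`,
  path.join(dir, 'cropped_video.mp4')
])
if (stderr) { console.log(stderr) }
console.log("Cropped File created")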
Thanks for the pointers guys.
Here's the solution I found.
Another big problem with safely creating and removing temporary directories in Electron is that fs.rmdir() doesn't work when using Electron-Forge / Builder, due to an issue with ASAR files.
(ASAR files are used to package Electron apps).
const fsPromises = require('fs').promises
const fs = require('fs')
ipcMain.on("capture:buffer", async (video_object) => {
const {x_pos, y_pos, window_width, window_height, buffer} = video_object
const temp_dir = await fsPromises.mkdtemp(await fsPromises.realpath(os.tmpdir()) + path.sep)
const captured_video_file_path = path.join(temp_dir, "screen_capture_video.mp4")
try {
await fsPromises.writeFile(captured_video_file_path, buffer)
}
catch (error) { console.error(error) }
// note no callback req'd as per jfriends advice
let child_object =
execFile(`${ffmpeg.path}`,
['-i', `${captured_video_file_path}`, '-vf',
`crop=${window_width}:${window_height}:${x_pos}:${y_pos}`,
`${path.join(temp_dir, 'cropped_video.mp4')}`],
(error, stdout, stderr) => {
if (error) {
console.log(error.message)
}
if (stderr) {
console.log(stderr)
}
console.log("Cropped File created")
})
child_object.on("close", async () => {
try {
const video_buffer = await fsPromises.readFile(path.join(temp_dir, "cropped_video.mp4"))
// To renderer
mainWindow.webContents.send("main:video_buffer", video_buffer)
} catch (error) {
console.log(error)
} finally {
process.noAsar = true
fs.rmdir(temp_dir, {recursive: true}, (error) => { if (error) { console.log(error) } })
console.log("Done !!!")
process.noAsar = false
}
})
})

How do I save a file to my nodejs server from web service call

My issue is this:
I have made a call to someone's web service. I get back the file name, the extension and the "bytes". The bytes actually come in as an array, and at position 0 "Bytes[0]" is the following string:
JVBERi0xLjYKJeLjz9MKMSAwIG9iago8PC9EZWNvZGVQYXJtczw8L0sgLTEvQ29sdW1ucyAyNTUwL1Jvd3MgMzMwMD4+L1R5cGUvWE9iamVjdC9CaXRzUGVyQ29tcG9uZW50IDEvU3VidHlwZS9JbWFnZS9XaWR0aCAyNTUwL0NvbG9yU3BhY2UvRGV2aWNlR3JheS9GaWx0ZXIvQ0NJVFRGYXhEZWNvZGUvTGVuZ3RoIDI4Mzc0L0hlaWdodCAzMzAwPj5zdHJlYW0K////////y2IZ+M8+zOPM/HzLhzkT1NAjCCoEY0CMJNAjCR4c8HigRhBAi1iZ0eGth61tHhraTFbraRaYgQ8zMFyGyGM8ZQZDI8MjMI8M6enp9W6enp+sadIMIIEYwy/ggU0wwgwjWzSBUmwWOt/rY63fraTVNu6C7R7pN6+v///20v6I70vdBaPjptK8HUQfX9/17D/TMet+l06T//0v3/S9v+r98V0nH///7Ff+Ed3/v16X9XX/S/KP0vSb//W88ksdW18lzBEJVpPXT0k9b71///...
The string example above has been cut off for readability.
How do I take that string and save it as a readable file?
In this case it's a PDF.
let pdfBytes = '{String shown above in example}'
You can use the Node.js File System Module to save the received buffer.
Assuming the encoding of your data is base64:
const fs = require('fs');
let pdfBytes = 'JVBERi0xLjYKJeLjz9...'
let writeStream = fs.createWriteStream('filename.pdf');
writeStream.write(pdfBytes, 'base64');
writeStream.on('finish', () => {
console.log('saved');
});
writeStream.end();
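If you don't need a stream, a shorter sketch (same base64 assumption) is to decode the string into a Buffer and write it in one call:
const fs = require('fs');

// Decode the base64 string and write the whole file at once.
fs.writeFile('filename.pdf', Buffer.from(pdfBytes, 'base64'), (err) => {
  if (err) {
    console.error(err);
  } else {
    console.log('saved');
  }
});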
I am using the fs file system here to create and save the file. I use a lot of try catch in case anything goes wrong. This example shows how you could pass the data to a function that could then create the file for you.
const util = require('util');
const fs = require('fs');
const fsOpen = util.promisify(fs.open);
const fsWriteFile = util.promisify(fs.writeFile);
const fsClose = util.promisify(fs.close);
function saveNewFile(path, data) {
return new Promise((async (resolve, reject) => {
let fileToCreate;
// Open the file for writing
try {
fileToCreate = await fsOpen(path, 'wx');
} catch (err) {
reject('Could not create new file, it may already exist');
return;
}
// Write the new data to the file
try {
await fsWriteFile(fileToCreate, data);
} catch (err) {
reject('Error writing to new file');
return;
}
// Close the file
try {
await fsClose(fileToCreate);
} catch (err) {
reject('Error closing new file');
return;
}
resolve('File created');
}));
};
// Data we want to use to create the file.
let pdfBytes = 'JVBERi0xLjYKJeLj...'
saveNewFile('./filename.pdf', pdfBytes);
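For comparison, here is a hedged sketch of the same flow using the built-in fs.promises API (Node 10+); the 'wx' flag still makes it fail if the file already exists, and the function name is only illustrative.
const fsp = require('fs').promises;

// Sketch: open exclusively, write the data, and always close the handle.
async function saveNewFileWithPromises(path, data) {
  const fileHandle = await fsp.open(path, 'wx');
  try {
    await fileHandle.writeFile(data);
  } finally {
    await fileHandle.close();
  }
  return 'File created';
}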

How to combine video upload chunks Node.js

I'm trying to upload a large (8.3 GB) video to my Node.js (Express) server by chunking using busboy. How do I receive each chunk (busboy is doing this part) and piece it together as one whole video?
I have been looking into readable and writable streams, but I'm never getting the whole video. I keep overwriting parts of it, resulting in only about 1 GB.
Here's my code:
req.busboy.on('file', (fieldname, file, filename) => {
logger.info(`Upload of '${filename}' started`);
const video = fs.createReadStream(path.join(`${process.cwd()}/uploads`, filename));
const fstream = fs.createWriteStream(path.join(`${process.cwd()}/uploads`, filename));
if (video) {
video.pipe(fstream);
}
file.pipe(fstream);
fstream.on('close', () => {
logger.info(`Upload of '${filename}' finished`);
res.status(200).send(`Upload of '${filename}' finished`);
});
});
After 12+ hours, I got it figured out using pieces from this article that was given to me. I came up with this code:
//busboy is middleware on my index.js
const fs = require('fs-extra');
const streamToBuffer = require('fast-stream-to-buffer');
//API function called first
uploadVideoChunks(req, res) {
req.pipe(req.busboy);
req.busboy.on('file', (fieldname, file, filename, encoding, mimetype) => {
const fileNameBase = filename.replace(/\.[^/.]+$/, '');
//save all the chunks to a temp folder with .tmp extensions
streamToBuffer(file, function (error, buffer) {
const chunkDir = `${process.cwd()}/uploads/${fileNameBase}`;
fs.outputFileSync(path.join(chunkDir, `${Date.now()}-${fileNameBase}.tmp`), buffer);
});
});
req.busboy.on('finish', () => {
res.status(200).send(`Finished uploading chunk`);
});
}
//API function called once all chunks are uploaded
saveToFile(req, res) {
const { filename, profileId, movieId } = req.body;
const uploadDir = `${process.cwd()}/uploads`;
const fileNameBase = filename.replace(/\.[^/.]+$/, '');
const chunkDir = `${uploadDir}/${fileNameBase}`;
let outputFile = fs.createWriteStream(path.join(uploadDir, filename));
fs.readdir(chunkDir, function(error, filenames) {
if (error) {
throw new Error('Cannot get upload chunks!');
}
//loop through the temp dir and write to the stream to create a new file
filenames.forEach(function(tempName) {
const data = fs.readFileSync(`${chunkDir}/${tempName}`);
outputFile.write(data);
//delete the chunk we just handled
fs.removeSync(`${chunkDir}/${tempName}`);
});
outputFile.end();
});
outputFile.on('finish', async function () {
//delete the temp folder once the file is written
fs.removeSync(chunkDir);
});
}
Use streams
multer allows you to easily handle file uploads as part of an Express route. This works great for small files that don't leave a significant memory footprint.
The problem with loading a large file into memory is that you can actually run out of memory and cause your application to crash.
Use a multipart/form-data request; this can be handled by assigning the read stream to that field in your request options.
Streams are extremely valuable for optimizing performance.
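For reference, here is a minimal sketch of the multer approach mentioned above (the route and field names are just examples); multer streams each incoming file to disk rather than buffering it in memory.
const express = require('express');
const multer = require('multer');

const app = express();
// Files are streamed to the uploads directory as they arrive.
const upload = multer({ dest: `${process.cwd()}/uploads` });

app.post('/upload', upload.single('file'), (req, res) => {
  // req.file.path is where multer wrote the uploaded file
  res.status(200).send(`Upload of '${req.file.originalname}' finished`);
});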
Try this code sample; I think it will work for you.
busboy.on("file", function(fieldName, file, filename, encoding, mimetype){
const writeStream = fs.createWriteStream(writePath);
file.pipe(writeStream);
file.on("data", data => {
totalSize += data.length;
cb(totalSize);
});
file.on("end", () => {
console.log("File "+ fieldName +" finished");
});
});
You can also refer to this link to resolve the problem:
https://github.com/mscdex/busboy/issues/143
I think multer is good for this; did you try multer?

Node.js: How to read a stream into a buffer?

I wrote a pretty simple function that downloads an image from a given URL, resizes it and uploads it to S3 (using 'gm' and 'knox'). I have no idea if I'm doing the reading of a stream into a buffer correctly (everything is working, but is it the correct way?).
Also, I want to understand something about the event loop: how do I know that one invocation of the function won't leak anything or change the 'buf' variable of another already-running invocation (or is this scenario impossible because the callbacks are anonymous functions)?
var http = require('http');
var https = require('https');
var s3 = require('./s3');
var gm = require('gm');
module.exports.processImageUrl = function(imageUrl, filename, callback) {
var client = http;
if (imageUrl.substr(0, 5) == 'https') { client = https; }
client.get(imageUrl, function(res) {
if (res.statusCode != 200) {
return callback(new Error('HTTP Response code ' + res.statusCode));
}
gm(res)
.geometry(1024, 768, '>')
.stream('jpg', function(err, stdout, stderr) {
if (!err) {
var buf = new Buffer(0);
stdout.on('data', function(d) {
buf = Buffer.concat([buf, d]);
});
stdout.on('end', function() {
var headers = {
'Content-Length': buf.length
, 'Content-Type': 'Image/jpeg'
, 'x-amz-acl': 'public-read'
};
s3.putBuffer(buf, '/img/d/' + filename + '.jpg', headers, function(err, res) {
if(err) {
return callback(err);
} else {
return callback(null, res.client._httpMessage.url);
}
});
});
} else {
callback(err);
}
});
}).on('error', function(err) {
callback(err);
});
};
Overall I don't see anything that would break in your code.
Two suggestions:
The way you are combining Buffer objects is suboptimal, because it has to copy all the pre-existing data on every 'data' event. It would be better to put the chunks in an array and concat them all at the end.
var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
var buf = Buffer.concat(bufs);
})
For performance, I would look into whether the S3 library you are using supports streams. Ideally you wouldn't need to create one large buffer at all; instead, you would just pass the stdout stream directly to the S3 library.
As for the second part of your question, that isn't possible. When a function is called, it is allocated its own private context, and everything defined inside of that will only be accessible from other items defined inside that function.
Update
Dumping the file to the filesystem would probably mean less memory usage per request, but file IO can be pretty slow so it might not be worth it. I'd say that you shouldn't optimize too much until you can profile and stress-test this function. If the garbage collector is doing its job you may be overoptimizing.
With all that said, there are better ways anyway, so don't use files. Since all you want is the length, you can calculate that without needing to append all of the buffers together, so then you don't need to allocate a new Buffer at all.
var pause_stream = require('pause-stream');
// Your other code.
var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
var contentLength = bufs.reduce(function(sum, buf){
return sum + buf.length;
}, 0);
// Create a stream that will emit your chunks when resumed.
var stream = pause_stream();
stream.pause();
while (bufs.length) stream.write(bufs.shift());
stream.end();
var headers = {
'Content-Length': contentLength,
// ...
};
s3.putStream(stream, ....);
JavaScript snippet
function stream2buffer(stream) {
return new Promise((resolve, reject) => {
const _buf = [];
stream.on("data", (chunk) => _buf.push(chunk));
stream.on("end", () => resolve(Buffer.concat(_buf)));
stream.on("error", (err) => reject(err));
});
}
TypeScript snippet
async function stream2buffer(stream: Stream): Promise<Buffer> {
return new Promise < Buffer > ((resolve, reject) => {
const _buf = Array < any > ();
stream.on("data", chunk => _buf.push(chunk));
stream.on("end", () => resolve(Buffer.concat(_buf)));
stream.on("error", err => reject(`error converting stream - ${err}`));
});
}
You can easily do this using node-fetch if you are pulling from http(s) URIs.
From the readme:
fetch('https://assets-cdn.github.com/images/modules/logos_page/Octocat.png')
.then(res => res.buffer())
.then(buffer => console.log(buffer));
Note: this solely answers "How to read a stream into a buffer?" and ignores the context of the original question.
ES2018 Answer
Since Node 11.14.0, readable streams support async iterators.
const buffers = [];
// node.js readable streams implement the async iterator protocol
for await (const data of readableStream) {
buffers.push(data);
}
const finalBuffer = Buffer.concat(buffers);
Bonus: In the future, this could get better with the stage 2 Array.fromAsync proposal.
// 🛑 DOES NOT WORK (yet!)
const finalBuffer = Buffer.concat(await Array.fromAsync(readableStream));
You can convert your readable stream to a buffer and integrate it into your code in an asynchronous way like this.
async streamToBuffer (stream) {
return new Promise((resolve, reject) => {
const data = [];
stream.on('data', (chunk) => {
data.push(chunk);
});
stream.on('end', () => {
resolve(Buffer.concat(data))
})
stream.on('error', (err) => {
reject(err)
})
})
}
the usage would be as simple as:
// usage
const myStream // your stream
const buffer = await streamToBuffer(myStream) // this is a buffer
I suggest loganfsmyth's method, using an array to hold the data.
var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
var buf = Buffer.concat(bufs);
});
In my current working example, I am working with GridFS and npm's Jimp.
var data = [];
var bucket = new GridFSBucket(getDBReference(), { bucketName: 'images' });
var dwnldStream = bucket.openDownloadStream(info[0]._id); // original size
dwnldStream.on('data', function(chunk) {
data.push(chunk);
});
dwnldStream.on('end', function() {
var buff = Buffer.concat(data);
console.log("buffer: ", buff);
jimp.read(buff)
.then(image => {
console.log("read the image!");
IMAGE_SIZES.forEach((size) => {
resize(image, size);
});
});
});
I did some other research with a string method, but that did not work, perhaps because I was reading from an image file; the array method did work.
const DISCLAIMER = "DONT DO THIS";
var bufs = "";
stdout.on('data', function(d){
bufs += d;
});
stdout.on('end', function(){
var buf = Buffer.from(bufs);
//// do work with the buffer here
});
When I did the string method I got this error from npm jimp:
buffer: <Buffer 00 00 00 00 00>
{ Error: Could not find MIME for Buffer <null>
Basically, I think the type coercion from binary to string didn't work so well.
I suggest having an array of buffers and concatenating them into the resulting buffer only once, at the end. It's easy to do manually, or one could use node-buffers.
I just want to post my solution. Previous answers were pretty helpful for my research. I use length-stream to get the size of the stream, but the problem here is that the callback is fired near the end of the stream, so I also use stream-cache to cache the stream and pipe it to the res object once I know the content-length. In case of an error, it is passed to the callback.
var StreamCache = require('stream-cache');
var lengthStream = require('length-stream');
var _streamFile = function(res , stream , cb){
var cache = new StreamCache();
var lstream = lengthStream(function(length) {
res.header("Content-Length", length);
cache.pipe(res);
});
stream.on('error', function(err){
return cb(err);
});
stream.on('end', function(){
return cb(null , true);
});
return stream.pipe(lstream).pipe(cache);
}
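Usage would look roughly like this (the file name and callback are hypothetical):
// Hypothetical usage: pipe a file to the response with Content-Length set.
_streamFile(res, fs.createReadStream('./some-file.pdf'), function (err) {
  if (err) { return console.error(err); }
  console.log('stream finished');
});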
In TS, [].push(bufferPart) is not type-compatible, so:
getBufferFromStream(stream: Part | null): Promise<Buffer> {
if (!stream) {
throw 'FILE_STREAM_EMPTY';
}
return new Promise(
(r, j) => {
let buffer = Buffer.from([]);
stream.on('data', buf => {
buffer = Buffer.concat([buffer, buf]);
});
stream.on('end', () => r(buffer));
stream.on('error', j);
}
);
}
You can do this by:
async function toBuffer(stream: ReadableStream<Uint8Array>) {
const list = []
const reader = stream.getReader()
while (true) {
const { value, done } = await reader.read()
if (value)
list.push(value)
if (done)
break
}
return Buffer.concat(list)
}
or using the buffer consumer from node:stream/consumers (Node 16.7+):
const { buffer } = require('node:stream/consumers')
const buf = await buffer(stream)
You can check the "content-length" header at res.headers. It will give you the length of the content you will receive (how many bytes of data will be sent).
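For example, with the http/https response object from the original question (a sketch; servers using chunked transfer encoding may omit the header):
client.get(imageUrl, function (res) {
  // May be undefined if the server streams with chunked transfer encoding.
  const contentLength = Number(res.headers['content-length']);
  console.log('expecting', contentLength, 'bytes');
});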
