Only reading additional content in a file

I'm working on a project that requires reading a log file. The log file can get massive, so I only want to read the data added since the last read, not the old data, to avoid potential performance issues.
I tried fs.createReadStream, but it didn't seem to work:
let stream = fs.createReadStream(`/path/`, { autoClose: true, encoding: "utf-8" });
stream.on('ready', () => {
console.log("Loaded log file!");
})
stream.on("data", chunk => {
// Data stream
});
stream.on("end", () => {
stream.pause();
console.log("ended");
// Going thru the data
setTimeout(() => {
stream.resume();
stream.read();
}, 10000);
});
With that code, the "end" event fires only once, even though the stream is set to not close automatically, and no additional data comes through.

After experimenting a bit with the read stream, and thanks to @KyleRifqi, I came up with this solution: store the number of bytes read so far and update it every time the stream ends.
let lastBytes = 0;
let text = "";
let stream = fs.createReadStream(`/path/`, { autoClose: true, encoding: "utf-8", start: lastBytes });
stream.on('ready', () => {
console.log("Loaded log file!");
});
stream.on("data", chunk => {
text += chunk;
});
stream.on("end", () => {
let read = Buffer.byteLength(text, "utf-8");
lastBytes += read;
//Code here
});
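Note that a stream created once with start: lastBytes will not pick up a later offset by itself; a new stream has to be created for every read. A minimal sketch of that idea, assuming the file only ever grows (no rotation or truncation). The names readFrom and follow, the callback, and the interval are illustrative and not part of any library:

```javascript
const fs = require('node:fs');

// Reads everything from byte offset `start` to the current end of the file.
// Resolves with the new text and the offset to use for the next call.
function readFrom(path, start) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    fs.createReadStream(path, { start })
      .on('data', (chunk) => chunks.push(chunk))
      .on('error', reject)
      .on('end', () => {
        const data = Buffer.concat(chunks);
        resolve({ text: data.toString('utf8'), next: start + data.length });
      });
  });
}

// Hypothetical polling loop: calls onText with whatever was appended
// since the previous pass, then sleeps for intervalMs.
async function follow(path, onText, intervalMs = 10000) {
  let offset = 0;
  for (;;) {
    const { text, next } = await readFrom(path, offset);
    if (text) onText(text);
    offset = next;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The offset is counted in bytes of the raw Buffer, so it stays correct even when the text contains multi-byte characters.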

Related

Order of data piped/pumped through nodejs streams

I'm trying to write Node.js code that reads (audio) files and streams the content to a remote service (Dialogflow). I'm having trouble ensuring the order of the chunks sent to the stream.
Most of the time everything seems to be in the right order, but once in a while, the data seems to be sent in an out-of-order fashion.
Pseudo code:
for (var i = 0; i < numFiles; ++i) {
await sendData(fs.createReadStream(filenames[i]), i);
}
...
async function sendData(inputDataStream, chunkIndex) {
await inputDataStream
.pipe(new Transform({
objectMode: true,
transform: (obj, _, next) => {
console.log('Sending chunk ' + chunkIndex);
next(null, <some data>);
}
}), {end: false})
.pipe(outputStream, {end: false});
}
I can see that 'Sending chunk ...' is printed out of order sometimes.
Q: is there a way to avoid this problem?
Another issue is that, while, most of the time, each chunk is sent contiguously, occasionally, some chunks will be split and sent in smaller sub-chunks (even though each file is not large).
[I repeated this experiment many times on the same set of files]
Q: Is there a way I can control the chunk size? (what did I do wrong here?)
Q: Is this because the remote service cannot handle the rate of transmission? If so, how should I properly react to that?
[I have also tried using pump(), but still observed the same behavior]
Thanks in advance.

For Dialogflow, I have used the following pump method, and it is working fine.
await pump(
fs.createReadStream(filename),
new Transform({
objectMode: true,
transform: (obj, _, next) => {
next(null, {inputAudio: obj});
},
}),
detectStream
);
Ref: link
I didn't face any issue with pump as of now.
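One likely cause of the out-of-order output in the question is that .pipe() returns the destination stream rather than a promise, so awaiting it does not wait for the file to be fully sent. A minimal sketch of one way to enforce ordering, assuming the sources are readable streams and the destination is any writable stream. sendInOrder and toMessage are hypothetical names used only for illustration:

```javascript
const { once } = require('node:events');

// Writes every chunk of every source to `destination`, one source at a time.
// `toMessage` can wrap a chunk, e.g. (chunk) => ({ inputAudio: chunk }).
async function sendInOrder(sources, destination, toMessage = (chunk) => chunk) {
  for (const source of sources) {
    for await (const chunk of source) {
      // write() returns false when the destination's buffer is full;
      // wait for 'drain' before sending more.
      if (!destination.write(toMessage(chunk))) {
        await once(destination, 'drain');
      }
    }
  }
  destination.end();
}
```

Because the next source is only opened for reading after the previous loop finishes, chunks from different files cannot interleave.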
I have also come across one more use case, in which a WebSocket connection is used to receive audio from a streaming endpoint, and that audio is then used for intent detection. (I have used this with both Dialogflow ES and CX.)
Example for ES:
function getDialogflowStream() {
let sessionClient = new dialogflow.SessionsClient();
let sessionPath = sessionClient.projectAgentSessionPath(
projectId,
sessionID,
);
// First Request
let initialStreamRequest = {
session: sessionPath,
queryInput: {
audioConfig: {
audioEncoding: encoding,
sampleRateHertz: sampleRateHertz,
languageCode: languageCode,
},
singleUtterance: true,
},
};
const detectStream = sessionClient
.streamingDetectIntent()
.on('error', error => {
console.error(error);
writeFlag = false;
detectStream.end();
})
.on('data', data => {
if (data.recognitionResult) {
console.log(
`Intermediate transcript: ${data.recognitionResult.transcript}`
);
} else {
console.log(
`Query results: ${data.queryResult}`
);
}
});
// Write the initial stream request to config for audio input.
detectStream.write(initialStreamRequest);
return detectStream;
}
const wss = new WebSocket.Server({
port,
handleProtocols: (protocols, req) => {
return 'dialogflow.stream';
}
});
wss.on('connection', (ws, req) => {
console.log(`received connection from ${req.connection.remoteAddress}`);
let dialogflowStreamer = getDialogflowStream();
ws.on('message', (message) => {
if (typeof message === 'string') {
console.log(`received message: ${message}`);
console.log(`UUID: ${calluuid}`);
} else if (message instanceof Buffer) {
// Transform message and write to detect
dialogflowStreamer.write({ inputAudio: message });
}
});
ws.on('close', (code, reason) => {
console.log(`socket closed ${code}:${reason}`);
dialogflowStreamer.end();
sessionID = uuid.v4();
});
});
One more thing: make sure the sample rate and encoding in your input configuration are the same as those of the audio file, because I have faced issues when they differ.

Nodejs: What's the proper way to pipe to a buffer / string [duplicate]

I'm hacking on a Node program that uses smtp-protocol to capture SMTP emails and act on the mail data. The library provides the mail data as a stream, and I don't know how to get that into a string.
I'm currently writing it to stdout with stream.pipe(process.stdout, { end: false }), but as I said, I need the stream data in a string instead, which I can use once the stream has ended.
How do I collect all the data from a Node.js stream into a string?

Another way would be to convert the stream to a promise (refer to the example below) and use then (or await) to assign the resolved value to a variable.
function streamToString (stream) {
const chunks = [];
return new Promise((resolve, reject) => {
stream.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
stream.on('error', (err) => reject(err));
stream.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
})
}
const result = await streamToString(stream)

What do you think about this?
async function streamToString(stream) {
// lets have a ReadableStream as a stream variable
const chunks = [];
for await (const chunk of stream) {
chunks.push(Buffer.from(chunk));
}
return Buffer.concat(chunks).toString("utf-8");
}

None of the above worked for me. I needed to use the Buffer object:
const chunks = [];
readStream.on("data", function (chunk) {
chunks.push(chunk);
});
// Send the buffer or you can put it into a var
readStream.on("end", function () {
res.send(Buffer.concat(chunks));
});

Hope this is more useful than the above answer:
var string = '';
stream.on('data', function (data) {
  string += data.toString();
  console.log('stream data ' + data);
});
stream.on('end', function () {
  console.log('final output ' + string);
});
Note that string concatenation is not the most efficient way to collect the string parts, but it is used for simplicity (and perhaps your code does not care about efficiency).
Also, this code may produce garbled output for non-ASCII text: a multi-byte character can be split across two chunks, and calling toString() on each chunk separately decodes the two halves incorrectly. Perhaps you do not care about that, either.
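To illustrate the non-ASCII caveat, here is a small sketch using Node's built-in StringDecoder, which keeps incomplete multi-byte sequences until the rest arrives. The helper name decodeChunks and the sample data are made up for the example:

```javascript
const { StringDecoder } = require('node:string_decoder');

// Decodes a list of Buffer chunks as UTF-8 without breaking characters
// that straddle a chunk boundary.
function decodeChunks(chunks) {
  const decoder = new StringDecoder('utf8');
  let text = '';
  for (const chunk of chunks) {
    text += decoder.write(chunk);
  }
  return text + decoder.end();
}

// The euro sign is three bytes in UTF-8; split it across two chunks.
const euro = Buffer.from('€', 'utf8');
const chunks = [euro.subarray(0, 1), euro.subarray(1)];

const naive = chunks.map((chunk) => chunk.toString('utf8')).join('');
const safe = decodeChunks(chunks);

console.log(naive === '€'); // false: each half was decoded on its own
console.log(safe === '€');  // true
```

Calling stream.setEncoding('utf8') on a readable stream gives the same protection, since the stream then decodes chunks with a StringDecoder internally.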

(This answer is from years ago, when it was the best answer. There is now a better answer below this. I haven't kept up with node.js, and I cannot delete this answer because it is marked "correct on this question". If you are thinking of down clicking, what do you want me to do?)
The key is to use the data and end events of a Readable Stream. Listen to these events:
stream.on('data', (chunk) => { ... });
stream.on('end', () => { ... });
When you receive the data event, add the new chunk of data to a Buffer created to collect the data.
When you receive the end event, convert the completed Buffer into a string, if necessary. Then do what you need to do with it.

I usually use this simple function to transform a stream into a string:
function streamToString(stream, cb) {
const chunks = [];
stream.on('data', (chunk) => {
chunks.push(chunk.toString());
});
stream.on('end', () => {
cb(chunks.join(''));
});
}
Usage example:
let stream = fs.createReadStream('./myFile.foo');
streamToString(stream, (data) => {
console.log(data); // data is now my string variable
});

And yet another one for strings using promises:
function getStream(stream) {
return new Promise(resolve => {
const chunks = [];
// Buffer.from is required if chunk is a String, see comments
stream.on("data", chunk => chunks.push(Buffer.from(chunk)));
stream.on("end", () => resolve(Buffer.concat(chunks).toString()));
});
}
Usage:
const stream = fs.createReadStream(__filename);
getStream(stream).then(r=>console.log(r));
Remove the .toString() to use this with binary data if required.
Update: @AndreiLED correctly pointed out that this has problems with strings. I couldn't get a stream to return strings with the version of Node I have, but the API notes that this is possible.

From the nodejs documentation you should do this - always remember a string without knowing the encoding is just a bunch of bytes:
var readable = getReadableStreamSomehow();
readable.setEncoding('utf8');
readable.on('data', function(chunk) {
assert.equal(typeof chunk, 'string');
console.log('got %d characters of string data', chunk.length);
})

Easy way with the popular (over 5m weekly downloads) and lightweight get-stream library:
https://www.npmjs.com/package/get-stream
const fs = require('fs');
const getStream = require('get-stream');
(async () => {
const stream = fs.createReadStream('unicorn.txt');
console.log(await getStream(stream)); //output is string
})();

Streams don't have a simple .toString() function (which I understand) nor something like a .toStringAsync(cb) function (which I don't understand).
So I created my own helper function:
var streamToString = function(stream, callback) {
var str = '';
stream.on('data', function(chunk) {
str += chunk;
});
stream.on('end', function() {
callback(str);
});
}
// how to use:
streamToString(myStream, function(myStr) {
console.log(myStr);
});

I had more luck doing it like this:
let string = '';
readstream
.on('data', (buf) => string += buf.toString())
.on('end', () => console.log(string));
I use node v9.11.1 and the readstream is the response from a http.get callback.

The cleanest solution may be to use the "stream-string" package, which converts a stream to a string with a promise.
const streamString = require('stream-string')
streamString(myStream).then(string_variable => {
// myStream was converted to a string, and that string is stored in string_variable
console.log(string_variable)
}).catch(err => {
// myStream emitted an error event (err), so the promise from stream-string was rejected
throw err
})

What about something like a stream reducer?
Here is an example of one, using ES6 classes.
var stream = require('stream')
class StreamReducer extends stream.Writable {
constructor(chunkReducer, initialvalue, cb) {
super();
this.reducer = chunkReducer;
this.accumulator = initialvalue;
this.cb = cb;
}
_write(chunk, enc, next) {
this.accumulator = this.reducer(this.accumulator, chunk);
next();
}
end() {
this.cb(null, this.accumulator)
}
}
// just a test stream
class EmitterStream extends stream.Readable {
constructor(chunks) {
super();
this.chunks = chunks;
}
_read() {
this.chunks.forEach(function (chunk) {
this.push(chunk);
}.bind(this));
this.push(null);
}
}
// just transform the strings into buffer as we would get from fs stream or http request stream
(new EmitterStream(
["hello ", "world !"]
.map(function(str) {
return Buffer.from(str, 'utf8');
})
)).pipe(new StreamReducer(
function (acc, v) {
acc.push(v);
return acc;
},
[],
function(err, chunks) {
console.log(Buffer.concat(chunks).toString('utf8'));
})
);

All the answers listed appear to open the readable stream in flowing mode, which is not the default in Node.js and can have limitations, since it lacks the backpressure support that Node.js provides in paused mode.
Here is an implementation using just Buffers, native streams, and native stream Transforms, with support for object mode:
import {Transform} from 'stream';
let buffer =null;
function objectifyStream() {
return new Transform({
objectMode: true,
transform: function(chunk, encoding, next) {
if (!buffer) {
buffer = Buffer.from([...chunk]);
} else {
buffer = Buffer.from([...buffer, ...chunk]);
}
next(null, buffer);
}
});
}
process.stdin.pipe(objectifyStream()).pipe(process.stdout);

This worked for me and is based on Node v6.7.0 docs:
let output = '';
stream.on('readable', function() {
let read = stream.read();
if (read !== null) {
// New stream data is available
output += read.toString();
} else {
// Stream is now finished when read is null.
// You can callback here e.g.:
callback(null, output);
}
});
stream.on('error', function(err) {
callback(err, null);
})

Using the quite popular stream-buffers package which you probably already have in your project dependencies, this is pretty straightforward:
// imports
const { WritableStreamBuffer } = require('stream-buffers');
const { promisify } = require('util');
const { createReadStream } = require('fs');
const pipeline = promisify(require('stream').pipeline);
// sample stream
let stream = createReadStream('/etc/hosts');
// pipeline the stream into a buffer, and print the contents when done
let buf = new WritableStreamBuffer();
pipeline(stream, buf).then(() => console.log(buf.getContents().toString()));

setEncoding('utf8');
Well done Sebastian J above.
I had the "buffer problem" with a few lines of test code I had, and added the encoding information and it solved it, see below.
Demonstrate the problem
software
// process.stdin.setEncoding('utf8');
process.stdin.on('data', (data) => {
console.log(typeof(data), data);
});
input
hello world
output
object <Buffer 68 65 6c 6c 6f 20 77 6f 72 6c 64 0d 0a>
Demonstrate the solution
software
process.stdin.setEncoding('utf8'); // <- Activate!
process.stdin.on('data', (data) => {
console.log(typeof(data), data);
});
input
hello world
output
string hello world

In my case, the response's content type header was Content-Type: text/plain, so I read the data from the Buffer like this:
let data = [];
stream.on('data', (chunk) => {
console.log(Buffer.from(chunk).toString())
data.push(Buffer.from(chunk).toString())
});

Can I get the Buffer from fs.createWriteStream()?

I have a simple question: can I get the Buffer from fs.createWriteStream()?
In my case I use the archiver npm module, which can create archives from a Buffer. When the module finishes its work with archive.finalize(), I can get the result through archive.pipe(stream). I want to get a Buffer from fs.createWriteStream() that I can pass to Express's res.send() method.
let archive = archiver("zip", {
zlib: { level: 9 } // Sets the compression level.
});
const stream = fs.createWriteStream(
"./src/backend/api/test.zip"
);
archive.pipe(stream);
archive.append(buffer, { name: "test.csv" });
archive.finalize();

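If I understood your problem correctly, this might help you. Note that a write stream does not emit 'data' events, so listen on the archive (the readable side of the pipe) instead:
const stream = fs.createWriteStream(
  "./src/backend/api/test.zip"
);
archive.on('data', (chunk) => {
  // here you get every chunk as a Buffer
});
stream.on('finish', () => {
  // runs once everything has been written to the file
});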
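Since only the readable side of a pipe produces chunks, one way to end up with a single Buffer is to collect them from the archive itself and skip the file entirely. A minimal sketch, assuming the archive behaves like any Node.js readable stream (it is used with .pipe() in the question, so it should). The helper name collect is made up, and the archiver usage in the trailing comment is untested:

```javascript
const { Readable } = require('node:stream');

// Gathers all chunks of a readable stream into one Buffer.
async function collect(readable) {
  const chunks = [];
  for await (const chunk of readable) {
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Stand-in for the archive: any readable stream works the same way.
const source = Readable.from([Buffer.from('PK'), Buffer.from('...zip bytes...')]);
const pending = collect(source);

// With archiver this might look like (assumption, not verified here):
//   const pending = collect(archive);
//   archive.append(buffer, { name: 'test.csv' });
//   archive.finalize();
//   res.send(await pending);
```

Holding the whole archive in memory is fine for small files; for large ones, piping the archive straight into the response avoids the extra copy.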

Nodejs: How to send a readable stream to the browser

If I query the box REST API and get back a readable stream, what is the best way to handle it? How do you send it to the browser?? (DISCLAIMER: I'm new to streams and buffers, so some of this code is pretty theoretical)
Can you pass the readStream in the response and let the browser handle it? Or do you have to stream the chunks into a buffer and then send the buffer??
export function getFileStream(req, res) {
const fileId = req.params.fileId;
console.log('fileId', fileId);
req.sdk.files.getReadStream(fileId, null, (err, stream) => {
if (err) {
console.log('error', err);
return res.status(500).send(err);
}
res.type('application/octet-stream');
console.log('stream', stream);
return res.status(200).send(stream);
});
}
Will ^^ work, or do you need to do something like:
export function downloadFile(req, res) {
const fileId = req.params.fileId;
console.log('fileId', fileId);
req.sdk.files.getReadStream(fileId, null, (err, stream) => {
if (err) {
console.log('error', err);
return res.status(500).send(err);
}
const buffers = [];
console.log('stream', stream);
stream.on('data', (chunk) => {
buffers.push(chunk);
})
.on('end', function(){
const finalBuffer = Buffer.concat(buffers);
return res.status(200).send(finalBuffer);
});
});
}

The first example would work if you changed your theoretical line to:
- return res.status(200).send(stream);
+ res.writeHead(200, {header: here})
+ stream.pipe(res);
That's the nicest thing about node stream. The other case would (in essence) work too, but it would accumulate lots of unnecessary memory.
If you'd like to check a working example, here's one I wrote based on scramjet, express and browserify:
https://github.com/MichalCz/scramjet/blob/master/samples/browser/browser.js
Where your streams go from the server to the browser. With minor mods it'll fit your problem.
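For reference, a minimal sketch of the piping approach using only Node's built-in modules. The helper name sendStream and the default content type are assumptions made for the example; res can be an Express or plain http response, since both are writable streams with writeHead:

```javascript
const { pipeline } = require('node:stream/promises');

// Streams `stream` to the HTTP response without buffering it in memory.
// pipeline() destroys both streams if either one fails, so a broken
// download does not leave the other side hanging.
async function sendStream(stream, res, contentType = 'application/octet-stream') {
  res.writeHead(200, { 'Content-Type': contentType });
  await pipeline(stream, res);
}
```

Inside the callback from the question this would be called as sendStream(stream, res).catch((err) => console.log('error', err)). Once the headers have been sent, a 500 status can no longer be returned, so errors during streaming can only be logged.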

Node.js: How to read a stream into a buffer?

I wrote a pretty simple function that downloads an image from a given URL, resizes it, and uploads it to S3 (using 'gm' and 'knox'). I have no idea if I'm reading the stream into a buffer correctly. (Everything is working, but is it the correct way?)
Also, I want to understand something about the event loop: how do I know that one invocation of the function won't leak anything or change the 'buf' variable of another, already running invocation? (Or is this scenario impossible because the callbacks are anonymous functions?)
var http = require('http');
var https = require('https');
var s3 = require('./s3');
var gm = require('gm');
module.exports.processImageUrl = function(imageUrl, filename, callback) {
var client = http;
if (imageUrl.substr(0, 5) == 'https') { client = https; }
client.get(imageUrl, function(res) {
if (res.statusCode != 200) {
return callback(new Error('HTTP Response code ' + res.statusCode));
}
gm(res)
.geometry(1024, 768, '>')
.stream('jpg', function(err, stdout, stderr) {
if (!err) {
var buf = Buffer.alloc(0);
stdout.on('data', function(d) {
buf = Buffer.concat([buf, d]);
});
stdout.on('end', function() {
var headers = {
'Content-Length': buf.length
, 'Content-Type': 'Image/jpeg'
, 'x-amz-acl': 'public-read'
};
s3.putBuffer(buf, '/img/d/' + filename + '.jpg', headers, function(err, res) {
if(err) {
return callback(err);
} else {
return callback(null, res.client._httpMessage.url);
}
});
});
} else {
callback(err);
}
});
}).on('error', function(err) {
callback(err);
});
};

Overall I don't see anything that would break in your code.
Two suggestions:
The way you are combining Buffer objects is suboptimal because it has to copy all the pre-existing data on every 'data' event. It would be better to put the chunks in an array and concat them all at the end.
var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
var buf = Buffer.concat(bufs);
})
For performance, I would look into if the S3 library you are using supports streams. Ideally you wouldn't need to create one large buffer at all, and instead just pass the stdout stream directly to the S3 library.
As for the second part of your question, that isn't possible. When a function is called, it is allocated its own private context, and everything defined inside of that will only be accessible from other items defined inside that function.
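A small sketch of that scoping rule, with made-up names: every call creates its own array, so two collectors running side by side never see each other's chunks.

```javascript
// Each call to makeCollector() gets a fresh `bufs` array that only the
// functions created in that same call can reach.
function makeCollector() {
  const bufs = [];
  return {
    add: (chunk) => bufs.push(chunk),
    result: () => Buffer.concat(bufs),
  };
}

const first = makeCollector();
const second = makeCollector();

// Interleave writes, as two overlapping requests would.
first.add(Buffer.from('aa'));
second.add(Buffer.from('bb'));
first.add(Buffer.from('AA'));

console.log(first.result().toString());  // aaAA
console.log(second.result().toString()); // bb
```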
Update
Dumping the file to the filesystem would probably mean less memory usage per request, but file IO can be pretty slow so it might not be worth it. I'd say that you shouldn't optimize too much until you can profile and stress-test this function. If the garbage collector is doing its job you may be overoptimizing.
With all that said, there are better ways anyway, so don't use files. Since all you want is the length, you can calculate that without needing to append all of the buffers together, so then you don't need to allocate a new Buffer at all.
var pause_stream = require('pause-stream');
// Your other code.
var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
var contentLength = bufs.reduce(function(sum, buf){
return sum + buf.length;
}, 0);
// Create a stream that will emit your chunks when resumed.
var stream = pause_stream();
stream.pause();
while (bufs.length) stream.write(bufs.shift());
stream.end();
var headers = {
'Content-Length': contentLength,
// ...
};
s3.putStream(stream, ....);
});

JavaScript snippet
function stream2buffer(stream) {
return new Promise((resolve, reject) => {
const _buf = [];
stream.on("data", (chunk) => _buf.push(chunk));
stream.on("end", () => resolve(Buffer.concat(_buf)));
stream.on("error", (err) => reject(err));
});
}
TypeScript snippet
async function stream2buffer(stream: Stream): Promise<Buffer> {
  return new Promise<Buffer>((resolve, reject) => {
    const _buf = Array<any>();
    stream.on("data", chunk => _buf.push(chunk));
    stream.on("end", () => resolve(Buffer.concat(_buf)));
    stream.on("error", err => reject(`error converting stream - ${err}`));
  });
}

You can easily do this using node-fetch if you are pulling from http(s) URIs.
From the readme:
fetch('https://assets-cdn.github.com/images/modules/logos_page/Octocat.png')
.then(res => res.buffer())
.then(buffer => console.log(buffer))

Note: this solely answers "How to read a stream into a buffer?" and ignores the context of the original question.
ES2018 Answer
Since Node 11.14.0, readable streams support async iterators.
const buffers = [];
// node.js readable streams implement the async iterator protocol
for await (const data of readableStream) {
buffers.push(data);
}
const finalBuffer = Buffer.concat(buffers);
Bonus: In the future, this could get better with the stage 2 Array.fromAsync proposal.
// 🛑 DOES NOT WORK (yet!)
const finalBuffer = Buffer.concat(await Array.fromAsync(readableStream));

You can convert your readable stream to a buffer and integrate it in your code in an asynchronous way like this.
async function streamToBuffer (stream) {
return new Promise((resolve, reject) => {
const data = [];
stream.on('data', (chunk) => {
data.push(chunk);
});
stream.on('end', () => {
resolve(Buffer.concat(data))
})
stream.on('error', (err) => {
reject(err)
})
})
}
The usage would be as simple as:
// usage (myStream is your readable stream)
const buffer = await streamToBuffer(myStream) // this is a Buffer

I suggest loganfsmyth's method, using an array to hold the data.
var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
  var buf = Buffer.concat(bufs);
});
In my current working example, I am working with GridFS and npm's Jimp.
var data = [];
var bucket = new GridFSBucket(getDBReference(), { bucketName: 'images' });
var dwnldStream = bucket.openDownloadStream(info[0]._id); // original size
dwnldStream.on('data', function(chunk) {
  data.push(chunk);
});
dwnldStream.on('end', function() {
  var buff = Buffer.concat(data);
  console.log("buffer: ", buff);
  jimp.read(buff)
    .then(image => {
      console.log("read the image!");
      IMAGE_SIZES.forEach((size) => {
        resize(image, size);
      });
    });
});
I also tried a string-based method, but that did not work, perhaps because I was reading from an image file. The array method did work.
const DISCLAIMER = "DONT DO THIS";
var data = "";
stdout.on('data', function(d){
  data += d;
});
stdout.on('end', function(){
  var buf = Buffer.from(data);
  //// do work with the buffer here
});
When I used the string method, I got this error from npm's Jimp:
buffer: <Buffer 00 00 00 00 00>
{ Error: Could not find MIME for Buffer <null>
Basically, I think the type coercion from binary to string didn't work so well.
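The corruption is easy to reproduce without any image library. A short sketch with four made-up bytes (the kind of values found at the start of a JPEG): appending a Buffer to a string decodes it as UTF-8, and bytes that are not valid UTF-8 are replaced, so converting back does not restore them.

```javascript
// Binary data that is not valid UTF-8.
const original = Buffer.from([0xff, 0xd8, 0xff, 0xe0]);

// String method: '' + buffer calls buffer.toString('utf8') implicitly.
const viaString = Buffer.from('' + original);

// Array method: the bytes are never reinterpreted.
const viaArray = Buffer.concat([original.subarray(0, 2), original.subarray(2)]);

console.log(viaString.equals(original)); // false
console.log(viaArray.equals(original));  // true
```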

I suggest keeping an array of buffers and concatenating them into the resulting buffer only once, at the end. It's easy to do manually, or one could use node-buffers.

I just want to post my solution. The previous answers were pretty helpful for my research. I use length-stream to get the size of the stream, but the problem is that the callback fires near the end of the stream, so I also use stream-cache to cache the stream and pipe it to the res object once I know the content length. In case of an error, the callback is called with it.
var StreamCache = require('stream-cache');
var lengthStream = require('length-stream');
var _streamFile = function(res , stream , cb){
var cache = new StreamCache();
var lstream = lengthStream(function(length) {
res.header("Content-Length", length);
cache.pipe(res);
});
stream.on('error', function(err){
return cb(err);
});
stream.on('end', function(){
return cb(null , true);
});
return stream.pipe(lstream).pipe(cache);
}

In TypeScript, [].push(bufferPart) is not compatible, so:
getBufferFromStream(stream: Part | null): Promise<Buffer> {
if (!stream) {
throw 'FILE_STREAM_EMPTY';
}
return new Promise(
(r, j) => {
let buffer = Buffer.from([]);
stream.on('data', buf => {
buffer = Buffer.concat([buffer, buf]);
});
stream.on('end', () => r(buffer));
stream.on('error', j);
}
);
}

You can do this by:
async function toBuffer(stream: ReadableStream<Uint8Array>) {
const list = []
const reader = stream.getReader()
while (true) {
const { value, done } = await reader.read()
if (value)
list.push(value)
if (done)
break
}
return Buffer.concat(list)
}
or using the buffer consumer (it returns a promise):
import { buffer } from 'node:stream/consumers'
const buf = await buffer(stream)
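A runnable sketch of the consumer helpers, assuming a Node.js version that ships node:stream/consumers (16.7 or later, as far as I know). The function name demo and the sample data are made up:

```javascript
const { Readable } = require('node:stream');
const { buffer, text } = require('node:stream/consumers');

// buffer() resolves with one Buffer, text() with one UTF-8 string.
async function demo() {
  const asBuffer = await buffer(
    Readable.from([Buffer.from('hello '), Buffer.from('world')])
  );
  const asText = await text(Readable.from(['hello ', 'world']));
  return { asBuffer, asText };
}

demo().then(({ asBuffer, asText }) => {
  console.log(asBuffer.length); // 11
  console.log(asText);          // hello world
});
```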

You can check the "content-length" header at res.headers. It will give you the length of the content you will receive (how many bytes of data it will send)
