Reading only modified data - node.js

In my project, I am using fs.watchFile to listen for modifications to a text file.
Requirement
Read only the data that was added in the latest modification.
Note: Data is only ever appended to the text file; nothing is deleted.
Sample code
fs.watchFile(config.filePath, function (curr, prev) {
    fs.readFile(config.filePath, function (err, data) {
        if (err) throw err;
        console.log(data);
    });
});
The code above reads the whole text file whenever the file is modified.
Any suggestions would be greatly appreciated.
Working Code
fs.watchFile(config.filePath, function (curr, prev) {
    var filestream = fs.createReadStream(config.filePath, {start: prev.size, end: curr.size, encoding: "utf-8"});
    filestream.on('data', function (data) {
        console.log(data);
    });
});

You can work with the stat object of the file. The curr and prev arguments are both Stats objects and have an attribute called "size".
I assume you are always adding data at the beginning or at the end of the file; otherwise there is no way of knowing where the data was added.
The difference between prev.size and curr.size tells you how many bytes were added. You can then read just that range by giving fs.createReadStream an options object with start and end attributes (fs.readFile does not accept a byte range, which is why the working code above uses a read stream).
For example, if you always append to the end, you can make a call like this:
fs.createReadStream(config.filePath, {start: prev.size});
If you add at the beginning (note that end is inclusive for read streams):
fs.createReadStream(config.filePath, {start: 0, end: curr.size - prev.size - 1});
Hope that helps!
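Putting the answer and the working code together, a minimal sketch of the append-only case might look like this (config.filePath is the same path as in the question; the size guard is an assumption that you only care about appended bytes):
var fs = require('fs');

fs.watchFile(config.filePath, function (curr, prev) {
    // Nothing was appended (e.g. only the mtime changed), so skip the read.
    if (curr.size <= prev.size) return;

    // Read only the bytes added since the previous stat snapshot.
    fs.createReadStream(config.filePath, {
        start: prev.size,       // first new byte
        end: curr.size - 1,     // end is inclusive for read streams
        encoding: 'utf-8'
    }).on('data', function (chunk) {
        console.log(chunk);
    });
});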

Related

convert mongoose stream to array

I have worked with MongoDB but am quite new to the mongoose ORM. I was trying to fetch data from a collection; the explain() output was showing 50ms, but the overall time it was taking to fetch the data via mongoose was 9 seconds. Here is the query:
Node.find({'dataset': datasetRef}, function (err, nodes){
    // handle error and data here
});
Then I applied an index to the field I was querying on. The explain() output now showed 4ms, but the total time to retrieve data via mongoose did not change. Then I searched a bit and found that using lean() can help bring the performance of read queries in mongoose quite close to native MongoDB.
So I changed my query to:
Node.find({'dataset': datasetRef})
    .lean()
    .stream({transform: JSON.stringify})
    .pipe(res)
This solved the performance issues completely. But the end result is a stream of JSON docs like this:
{var11: val11, var12: val12}{var21: val21, var22: val22} ...
How do I parse this to form an array of docs? Or should I not be using a stream at all? In my opinion there is no point in using a stream if I am going to build the array on the backend anyway, since I will then have to wait for all the docs to be read into memory. But I also think that parsing and building the whole array on the front end might be costly.
How can I achieve best performance in this case without clogging the network as well ?
UPDATE
I am trying to solve this problem using a through stream. However, I am not able to insert commas between the JSON objects yet. See the code below:
res.write("[");
var through = require('through');
var tr = through(
    function write(data){
        this.queue(data.replace(/\}\{/g, "},{"));
    }
);
var dbStream = db.node.find({'dataset': dataSetRef})
    .lean()
    .stream({'transform': JSON.stringify});
dbStream.on("end", function(){
    res.write("]");
});
dbStream
    .pipe(tr)
    .pipe(res);
With this, I am able to get the "[" at the beginning and the "]" at the end. However, I am still not able to get the pattern "}{" replaced with "},{". Not sure what I am doing wrong.
UPDATE 2
Now I have figured out why the replace is not working. Since I have specified the transform function as JSON.stringify, the through stream receives one stringified JSON object at a time and therefore never encounters the pattern }{, because it never sees multiple JSON objects in a single chunk.
Now I have modified my code and written a custom transform function which does JSON.stringify and then appends a comma at the end. The only problem I am facing here is that I don't know when the last JSON object in the stream arrives, because I don't want to append the comma in that case. At the moment I append an empty JSON object once the end is encountered, but somehow this does not feel like a convincing idea. Here is the code:
res.write("[");
function transform(data){
    return JSON.stringify(data) + ",";
}
var dbStream = db.node.find({'dataset': dataSetRef})
    .lean()
    .stream({'transform': transform});
dbStream.on("end", function(){
    res.write("{}]");
});
dbStream
    .pipe(res);
The only problem I am facing here is that I don't know when it is the last JSON object in the stream.
But you do know which one is first. Knowing that, instead of appending the comma, you can prepend it to every object except the first one. To do that, set up your transform function inside a closure:
function transformFn(){
    var first = true;
    return function(data) {
        if (first) {
            first = false;
            return JSON.stringify(data);
        }
        return "," + JSON.stringify(data);
    };
}
Now you can just call that function and set it as your actual transform.
var transform = transformFn();
res.write("[");
var dbStream = db.node.find({'dataset': dataSetRef})
    .lean()
    .stream({'transform': transform});
dbStream.on("end", function(){
    res.write("]");
});
dbStream
    .pipe(res);
#cbajorin and #rckd both gave correct answers.
However, repeating this code all the time seems like a pain.
Hence my solution uses an extra Transform stream to achieve the same thing.
import { Transform } from 'stream'

class ArrayTransform extends Transform {
    constructor(options) {
        super(options)
        this._index = 0
    }

    _transform(data, encoding, done) {
        if (!(this._index++)) {
            // first element, add opening bracket
            this.push('[')
        } else {
            // following element, prepend comma
            this.push(',')
        }
        this.push(data)
        done()
    }

    _flush(done) {
        if (!(this._index++)) {
            // empty
            this.push('[]')
        } else {
            // append closing bracket
            this.push(']')
        }
        done()
    }
}
Which in turn can be used as:
const toArray = new ArrayTransform();

Model.find(query).lean().stream({ transform: JSON.stringify })
    .pipe(toArray)
    .pipe(res)
EDIT: added check for empty
I love #cdbajorin's solution, so I created a more readable version of it (ES6):
Products
    .find({})
    .lean()
    .stream({
        transform: (() => {
            let index = 0;
            return (data) => {
                return (!(index++) ? '[' : ',') + JSON.stringify(data);
            };
        })() // invoke immediately so `transform` is the inner per-document function
    })
    .on('end', () => {
        res.write(']');
    })
    .pipe(res);
var mongoose = require('mongoose');
mongoose.connect('mongodb://localhost/shoppingdb');
var Sports = mongoose.model('sports', {});
var result = [];
var prefix_out = "your info";
Sports.find({"goods_category": "parts"}).
    cursor().
    on("data", function(doc){
        // stream ---> string
        var str = JSON.stringify(doc);
        // string ---> JSON
        var json = JSON.parse(str);
        // handle your property
        json.handleYourProperty = prefix_out + json.imageURL;
        result.push(json);
    }).
    on('error', function(err){
        console.log(err);
    }).
    on('close', function(){
        console.log(result);
    });
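If, as in the original question, the goal is to answer an Express request with the collected array, a sketch of that tail end could look like this (res is assumed to be the Express response object from the route handler; Sports and result are the variables above):
Sports.find({"goods_category": "parts"})
    .cursor()
    .on("data", function(doc){
        result.push(doc.toObject());
    })
    .on("error", function(err){
        res.status(500).send(err.message);
    })
    .on("close", function(){
        // The complete array only exists once the cursor is exhausted.
        res.json(result);
    });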

Nodejs saving file: linebreak doesn't work on initial run

I am trying to generate a list of links inside a <fileName.txt> file like this:
fs.writeFile(fileName, linksRemaining, function(err){
But if the file already exists, I want to continue adding links without overwriting the old ones. So I simply check whether it exists, store the existing data in a variable, and add the additional content after a line break, like this:
fs.exists(doneFileName, function(exists) {
    if (exists) {
        fs.readFile(doneFileName, 'utf8', function (err, data) {
            if (!err) {
                linksCurrentDoneList = data;
                linksCurrentDoneList = linksCurrentDoneList + '\n' + linkTarget;
                callback(1);
            } else {
                return console.log("Error: " + err);
            }
        });
....
The above code is in a loop and adds links several times. The issue is that on the first run of my loop the line break '\n' is ignored, but from the 2nd run onwards it works...
Suppose I am running the loop for three links at a time; the result looks like this in Notepad:
http://www.link1.com/
http://www.link2.com/
http://www.link3.com/http://www.link4.com/
http://www.link5.com/
http://www.link6.com/http://www.link7.com/
http://www.link8.com/
http://www.link9.com/http://www.link10.com/
http://www.link11.com/
.....
What I am trying to achieve is quite obvious... a line break for each link. I am completely out of clues as to why this is happening.
In frustration I tried the following:
linksCurrentDoneList = '\n'+linksCurrentDoneList+'\n'+linkTarget+'\n';
But that didn't help; in fact there was still just one line break (same as the example above). Does anyone have any clue what might be going on?
EDIT:
Every link is on a separate line if I open my .txt file in other software like MS Word!
It seems like you want to use the append function to append data to a file.
var fs = require('fs');
fs.appendFile('test.txt', "This is a test.\n", {flag: 'a'}, function(err) {
    if (err) throw err;
    console.log('The data was appended to file!');
});
Note that if test.txt doesn't exist, appendFile() will create it. Here is the output after running it 2 times:
This is a test.
This is a test.
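Applied to the link list from the question, a minimal sketch could look like this (doneFileName and linkTarget are the variables from the question; wrapping the call in a helper is just for illustration):
var fs = require('fs');

// appendFile creates the file if it does not exist yet and always writes at
// the end, so the read-then-rewrite logic from the question is not needed.
function saveLink(doneFileName, linkTarget, callback) {
    fs.appendFile(doneFileName, linkTarget + '\n', 'utf8', function (err) {
        if (err) return callback(err);
        callback(null);
    });
}
Because every write ends with '\n', each link lands on its own line whether or not the file existed before.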

Mongoose schema does not save inside async.forEach loop

First post here, so apologies if I get things wrong...
I'm creating some standalone code to read a folder structure and return all .mp3 files in an array. Once this has been returned, I loop through the array and, for each item, create a Mongoose object and populate its fields before saving the object with .save().
I am looping through the array using async.forEach, and while it does loop through all items in the array, they do not save, and there is no error produced to help me identify what is wrong.
If I move the logic of the loop elsewhere, the MP3s are stored in the MongoDB database; with the example shown, nothing is saved.
var saveMP3s = function(MP3Files, callback) {
    console.log('start loop -> saving MP3s');
    async.forEach(MP3Files, function(mp3file, callback) {
        newTrack = new MP3Track();
        newTrack.title = mp3file.title;
        newTrack.track = mp3file.track;
        newTrack.disk = mp3file.disk;
        newTrack.metadata = mp3file.metadata;
        newTrack.path = mp3file.path;
        console.log('....:> Song Start: ');
        console.log(newTrack.title);
        console.log(newTrack.track);
        console.log(newTrack.disk);
        console.log(newTrack.metadata);
        console.log(newTrack.path);
        console.log('....:> Song End: ');
        newTrack.save(function (err) {
            if (err) {
                console.log(err);
            } else {
                console.log('saving Track: ' + newTrack.title);
                callback();
            }
        });
    }, function(err) {
        if (err) { console.log(err); }
    });
    console.log('end loop -> finished saving MP3s');
};
The trouble I have is that when the code is NOT in the async loop it works and the MP3 is saved in the MongoDB database; inside the async loop nothing is saved and no errors are given as to why.
In an earlier incarnation of the code I did try to create the objects as soon as I had read the metadata of the MP3 files, but for some reason it would not save the last 2 objects in the list (of 12)... So I've rewritten it to scan all the items first and then populate MongoDB via Mongoose from an array, just to split things up. But I'm having no luck finding out why nothing happens and why there are no errors from .save().
Any help with be greatly appreciated.
Regards,
Mark
var saveMP3s = function(MP3Files, callback) {
    console.log('start loop -> saving MP3s');
    async.forEach(MP3Files, function(mp3file, callback) {
        var newTrack = new MP3Track(); // <--- USE VAR HERE
        newTrack.title = mp3file.title;
        newTrack.track = mp3file.track;
        newTrack.disk = mp3file.disk;
        newTrack.metadata = mp3file.metadata;
        newTrack.path = mp3file.path;
        console.log('....:> Song Start: ');
        console.log(newTrack.title);
        console.log(newTrack.track);
        console.log(newTrack.disk);
        console.log(newTrack.metadata);
        console.log(newTrack.path);
        console.log('....:> Song End: ');
        newTrack.save(function (err) {
            if (err) {
                console.log(err);
            } else {
                console.log('saving Track: ' + newTrack.title);
                callback();
            }
        });
    }, function(err) {
        if (err) { console.log(err); }
    });
    console.log('end loop -> finished saving MP3s');
};
I suspect that the missing var keyword is declaring a global variable and you are overwriting it on each iteration of the loop. Meaning, before the first newTrack can complete its async save operation, you've already moved on in the loop and overwritten that variable with the next instance.
Also, in async.forEach you MUST invoke the callback when the operation is completed. You are only calling it if the record saves successfully; you should also call it when an error occurs, passing in the error.
Finally, the callback argument to your saveMP3s function never gets called at all. The call to callback() inside the newTrack.save function only refers to the callback argument passed into the anonymous function by async.forEach.
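Putting those three points together, a corrected version might look roughly like this (same MP3Track model and MP3Files array as above; how the error is surfaced to the caller is an assumption):
var async = require('async');

var saveMP3s = function(MP3Files, callback) {
    async.forEach(MP3Files, function(mp3file, done) {
        // `var` keeps each track local to this iteration instead of leaking a global.
        var newTrack = new MP3Track();
        newTrack.title = mp3file.title;
        newTrack.track = mp3file.track;
        newTrack.disk = mp3file.disk;
        newTrack.metadata = mp3file.metadata;
        newTrack.path = mp3file.path;
        // Mongoose calls back with (err, doc); passing that straight to `done`
        // reports both success and failure to async.forEach.
        newTrack.save(done);
    }, function(err) {
        // Runs once every save has finished (or as soon as one fails),
        // so this is the place to invoke the outer callback.
        callback(err);
    });
};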
SOLVED
Hi,
I ended up solving it by having the async loop higher up in the logic, so the saveMP3s function was called from within the async loop itself.
Thank you all for your thoughts and suggestions.
Enjoy your day.
mark
(not allowed to Accept my answer until tomorrow, so will update the question status then)

Asynchronous file appends

In trying to learn node.js/socket.io I have been messing around with creating a simple file uploader that takes data chunks from a client browser and reassembles them on the server side.
The socket.io event for receiving a chunk looks as follows:
socket.on('sendChunk', function (data) {
    fs.appendFile(path + fileName, data.data, function (err) {
        if (err)
            throw err;
        console.log(data.sequence + ' - The data was appended to file ' + fileName);
    });
});
The issue is that data chunks aren't necessarily appended in the order they were received due to the async calls. Typical console output looks something like this:
1 - The data was appended to file testfile.txt
3 - The data was appended to file testfile.txt
4 - The data was appended to file testfile.txt
2 - The data was appended to file testfile.txt
My question is: what is the proper way to implement this functionality in a non-blocking way while still enforcing sequence? I've looked at libraries like async, but I really want to be able to process each chunk as it comes in rather than building a series and running it once all file chunks are in. I am still trying to wrap my mind around all this event-driven flow, so any pointers are welcome.
Generally you would use a queue for the data waiting to be written, then whenever the previous append finishes, you try to write the next piece. Something like this:
var parts = [];
var inProgress = false;

function appendPart(data){
    parts.push(data);
    writeNextPart();
}

function writeNextPart(){
    if (inProgress || parts.length === 0) return;
    var data = parts.shift();
    inProgress = true;
    fs.appendFile(path + fileName, data.data, function (err) {
        inProgress = false;
        if (err) throw err;
        console.log(data.sequence + ' - The data was appended to file ' + fileName);
        writeNextPart();
    });
}

socket.on('sendChunk', function (data) {
    appendPart(data);
});
You will need to expand this to keep a queue of parts and inProgress based on the fileName. My example assumes those will be constant for simplicity.
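One way to do that expansion is to keep one queue per file name, sketched below (path comes from the question's code, and data.fileName is an assumption about what each chunk carries):
var fs = require('fs');

// One queue and one in-progress flag per file name.
var queues = {};

function appendPart(fileName, data){
    var q = queues[fileName] || (queues[fileName] = { parts: [], inProgress: false });
    q.parts.push(data);
    writeNextPart(fileName);
}

function writeNextPart(fileName){
    var q = queues[fileName];
    if (q.inProgress || q.parts.length === 0) return;
    var data = q.parts.shift();
    q.inProgress = true;
    fs.appendFile(path + fileName, data.data, function (err) {
        q.inProgress = false;
        if (err) throw err;
        console.log(data.sequence + ' - The data was appended to file ' + fileName);
        writeNextPart(fileName);
    });
}

socket.on('sendChunk', function (data) {
    appendPart(data.fileName, data);
});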
Since you need the appends to be in order, or synchronous, you could use fs.appendFileSync instead of fs.appendFile. This is the quickest way to handle it, but it hurts performance.
If you want to handle it asynchronously yourself, use streams, which deal with this problem via EventEmitter. It turns out that the response (as well as the request) objects are streams. Create a writable stream with fs.createWriteStream and write all the pieces to append to the file.
fs.createWriteStream(path, [options])
Returns a new WriteStream object (See Writable Stream).
options is an object with the following defaults:
{ flags: 'w',
  encoding: null,
  mode: 0666 }
In your case you would use flags: 'a'
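A sketch of that stream-based variant for the same chunk handler, assuming the chunks arrive over the socket in the order they should be written (path and fileName are the variables from the question):
var fs = require('fs');

// Open the file once in append mode; the writable stream queues writes
// internally, so chunks land in the order write() was called.
var fileStream = fs.createWriteStream(path + fileName, { flags: 'a' });

socket.on('sendChunk', function (data) {
    fileStream.write(data.data, function () {
        console.log(data.sequence + ' - The data was written to file ' + fileName);
    });
});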

nodejs express fs iterating files into array or object failing

So I'm trying to use the Node.js fs module to iterate over a directory in my Express app, store each filename in an array which I can pass to my Express view and iterate through, but I'm struggling to do so. When I do a console.log within the files.forEach function loop, it's printing the filename just fine, but as soon as I try to do anything such as:
var myfiles = [];
var fs = require('fs');
fs.readdir('./myfiles/', function (err, files) {
    if (err) throw err;
    files.forEach( function (file) {
        myfiles.push(file);
    });
});
console.log(myfiles);
it fails and just logs an empty object. So I'm not sure exactly what is going on; I think it has to do with callback functions, but if someone could walk me through what I'm doing wrong, why it's not working (and how to make it work), it would be much appreciated.
The myfiles array is empty because the callback hasn't been called before you call console.log().
You'll need to do something like:
var fs = require('fs');
fs.readdir('./myfiles/', function(err, files){
    if(err) throw err;
    files.forEach(function(file){
        // do something with each file HERE!
    });
});
// because trying to do something with files here won't work because
// the callback hasn't fired yet.
Remember, everything in Node happens concurrently, in the sense that, unless you're doing your processing inside your callbacks, you cannot guarantee that asynchronous functions have completed yet.
One way around this problem for you would be to use an EventEmitter:
var fs = require('fs'),
    EventEmitter = require('events').EventEmitter,
    filesEE = new EventEmitter(),
    myfiles = [];

// this event will be called when all files have been added to myfiles
filesEE.on('files_ready', function(){
    console.dir(myfiles);
});

// read all files from current directory
fs.readdir('.', function(err, files){
    if(err) throw err;
    files.forEach(function(file){
        myfiles.push(file);
    });
    filesEE.emit('files_ready'); // trigger files_ready event
});
As several have mentioned, you are using an async method, so you have a nondeterministic execution path.
However, there is an easy way around this. Simply use the Sync version of the method:
var myfiles = [];
var fs = require('fs');

var arrayOfFiles = fs.readdirSync('./myfiles/');
// Yes, the following is not super-smart, but you might want to process the files. This is how:
arrayOfFiles.forEach( function (file) {
    myfiles.push(file);
});
console.log(myfiles);
That should work as you want. However, using sync statements is not good, so you should not do it unless it is vitally important for it to be sync.
Read more here: fs.readdirSync
fs.readdir is asynchronous (as with many operations in node.js). This means that the console.log line is going to run before readdir has a chance to call the function passed to it.
You need to either:
Put the console.log line within the callback function given to readdir, i.e:
fs.readdir('./myfiles/', function (err, files) {
    if (err) throw err;
    files.forEach( function (file) {
        myfiles.push(file);
    });
    console.log(myfiles);
});
Or simply perform some action with each file inside the forEach.
I think it has to do with callback functions,
Exactly.
fs.readdir makes an asynchronous request to the file system for that information, and calls the callback at some later time with the results.
So function (err, files) { ... } doesn't run immediately, but console.log(myfiles) does.
At some later point in time, myfiles will contain the desired information.
You should note BTW that files is already an Array, so there is really no point in manually appending each element to some other blank array. If the idea is to put together the results from several calls, then use .concat; if you just want to get the data once, then you can just assign myfiles = files directly.
Overall, you really ought to read up on "Continuation-passing style".
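For example, a minimal sketch of the direct-assignment suggestion above (same myfiles variable as in the question):
var fs = require('fs');
var myfiles = [];

fs.readdir('./myfiles/', function (err, files) {
    if (err) throw err;
    myfiles = files;       // files is already an array, so just keep a reference
    console.log(myfiles);  // use the result here, inside the callback
});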
I faced the same problem, and based on the answers given in this post I've solved it with Promises, which seem to be a perfect fit in this situation:
router.get('/', (req, res) => {
    var viewBag = {}; // It's just my little habit from .NET MVC ;)
    var readFiles = new Promise((resolve, reject) => {
        fs.readdir('./myfiles/', (err, files) => {
            if (err) {
                reject(err);
            } else {
                resolve(files);
            }
        });
    });
    // showcase, in case you need to run more async operations before the route responds
    var anotherPromise = new Promise((resolve, reject) => {
        doAsyncStuff((err, anotherResult) => {
            if (err) {
                reject(err);
            } else {
                resolve(anotherResult);
            }
        });
    });
    Promise.all([readFiles, anotherPromise]).then((values) => {
        viewBag.files = values[0];
        viewBag.otherStuff = values[1];
        console.log(viewBag.files); // logs e.g. [ 'file.txt' ]
        res.render('your_view', viewBag);
    }).catch((errors) => {
        res.render('your_view', { errors: errors }); // use the 'errors' property to render errors in the view or implement a different error handling scheme
    });
});
Note: you don't have to push the found files into a new array, because you already get an array from fs.readdir()'s callback. According to the node docs:
The callback gets two arguments (err, files) where files is an array
of the names of the files in the directory excluding '.' and '..'.
I believe this is a very elegant and handy solution, and most of all it doesn't require you to bring new modules into your script.
