Electron Node fs.writeFile intermittent failures on promise - node.js

The following "write file", "check hash" in intermediate cases ( only sometimes) fails in an Electron 8.2.1 application. It will display that it has calculated d41d8cd98f00b204e9800998ecf8427e which is the hash over an empty string. However when i look in the folder, the file exists. So my assumption is that somehow sometimes that fs.writeFile under higher load still is not ready writing or something the file returns an empty string instead of the contents.
And note that in 99% of the cases the application runs correctly. Only in some cases (and we think very high load) this fails.
I read node - fs.writeFile creates a blank file which comes closer but it does not provide a reason or "why the hell" nor because of this you need to do that.
fs.writeFile(cPath, body, 'utf-8', (err) => {
    if (err) {
        errors.handleErrorLocal(err);
        reject();
        return;
    }
    log.info(cFile + ' C HASH is:' + hash + ' HASH calculated:' + md5File.sync(cPath));
    if (hash && md5File.sync(cPath) !== hash.toLowerCase()) {
        log.warn('C failed to download. Wrong Hash. ', cFile);
    }
    resolve();
});
The answer on node.js readfile woes from panu-logic (the bottom answer) seems to match this experience, but it also gives no reason beyond calling it "magical". In his case he tries to read while the file is being written; logically that should not be the case here, but the assumption is that, for some unknown reason, it is.
I am aware that I could rewrite this to use streams or await (roughly as sketched below), but that requires me to change the code, push it out, wait some time for bug reports, and then retry. I would rather know the reason before doing all of that, and be sure that the fix actually works, rather than stretching this out over weeks or months.
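For reference, this is roughly the await-based rewrite I have in mind. It is only a sketch, using fs.promises and the built-in crypto module instead of md5-file; the function name is mine, and the variable names match the snippet above:

const fsp = require('fs').promises;
const crypto = require('crypto');

// Sketch only: write the file, then read back and hash the bytes actually on
// disk, so an empty or partial read shows up as a mismatch that can be retried.
async function writeAndVerify(cPath, body, hash) {
    await fsp.writeFile(cPath, body, 'utf-8');
    const written = await fsp.readFile(cPath);
    const calculated = crypto.createHash('md5').update(written).digest('hex');
    if (hash && calculated !== hash.toLowerCase()) {
        throw new Error('C failed to download. Wrong hash for ' + cPath);
    }
}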

Related

How to check file is writable (resource is not busy nor locked)

excel4node's write-to-file function catches errors and does not propagate them to the caller. Therefore, my app cannot determine whether the write to file was successful or not.
My current workaround is like below:
let fs = require('fs')
try {
    let filePath = 'blahblah'
    fs.writeFileSync(filePath, '') // Try-catch is for this statement
    excel4nodeWorkbook.write(filePath)
} catch (e) {
    console.log('File save is not successful')
}
It works, but I think it's a sort of hack and not a semantically correct way. I also tested fs.access and fs.accessSync, but they only check permissions, not the state (busy/locked) of the resource.
Is there any suggestion for making this look and behave nicer without modifying the excel4node source code?
I think you are asking the wrong question. If you check at time T, then write at time T + 1ms, what guarantees that the file is still writable?
If the file is not writable for whatever reason, the write will fail, period. There is nothing else you can do. Your code is fine, but you can probably do without the fs.writeFileSync(), which just erases whatever else was in the file before.
You can also write to a randomly-generated file path to make reasonably sure that two processes are not writing to the same file at the same time, but again, that will not prevent every possible write error, so what you really, really want is good error handling.
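If you want to try the randomly-generated path route, here is a minimal sketch (the helper name is mine):

const crypto = require('crypto');
const os = require('os');
const path = require('path');

// Build a path that is extremely unlikely to collide with another process.
function randomTempPath(extension) {
    return path.join(os.tmpdir(), 'workbook-' + crypto.randomBytes(8).toString('hex') + extension);
}

// e.g. excel4nodeWorkbook.write(randomTempPath('.xlsx'), ...)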
In order to handle errors properly you have to provide a callback!
Something along the lines of:
excel4nodeWorkbook.write(filePath, (err) => {
    if (err) console.error(err);
});
Beware, this is asynchronous code, so you need to handle that as well!
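If the rest of your code is promise-based, a small wrapper like the following sketch makes that asynchrony explicit (the wrapper name is mine; the callback shape is the one shown above):

function writeWorkbook(workbook, filePath) {
    // Sketch: promisify the callback so callers can await / .catch() the result.
    return new Promise((resolve, reject) => {
        workbook.write(filePath, (err) => {
            if (err) return reject(err);
            resolve();
        });
    });
}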
You already marked a line in the library's source code. If you look a few lines above it, you can see that it passes any errors to the handler argument. In fact, peeking at the documentation comment above the function, it says:
If callback is given, callback called with (err, fs.Stats) passed
Hence you can simply pass a function as your second argument and check for err, like you've probably already seen elsewhere in the Node environment:
excel4nodeWorkbook.write(filepath, (err) => {
    if (err) {
        console.error(err);
    }
});

PouchDb Replicate of single document causes huge memory usage, then crash

I have a situation where live sync refuses to fetch some documents on its own, so PouchDb.get returns saying the document is not found (despite it being present in the CouchDb it is replicating from).
Reading through the documentation, it suggests doing a manual replicate first, then a sync. So I changed my code to replicate first:
docId = 'testdoc';
return new Promise(function (resolve, reject) {
    var toReplicate = [docId];
    console.log("replicate new ", toReplicate);
    var quickReplicate = self.db.replicate.from(self.url, {
        doc_ids: toReplicate,
        // timeout: 100000, // makes no difference
        checkpoint: false, // attempt to get around bad checkpoints, but I purged all checkpoints and still have the issue
        batch_size: 10, // attempt to get around huge memory usage
        batches_limit: 1
    }).on('denied', function (err) {
        // a document failed to replicate (e.g. due to permissions)
        console.log("replicate denied", err);
        reject(err);
    }).on('complete', function (info) {
        // handle complete
        console.log("replicate complete", info, toReplicate);
        resolve(info);
    }).on('error', function (err) {
        // handle error
        console.log("replicate error", err);
        reject(err);
    }).on('change', function (change) {
        console.log("replicate change", change);
    }).on('pause', function (err) {
        console.log("replicate pause", err);
    });
})
Then get the doc
return self.db.get(docId).catch(function (err) {
    console.error(err);
    throw err;
});
This function is called multiple times (about 8 times on average), each time requesting a single doc. The calls may all run at almost exactly the same time.
To simplify this, I commented out nearly every single place this function was used, one at a time, until I found the exact document causing the problem. I reduced it down to a very simple command that directly replicates the problem document:
db.replicate.from("https://server/db", {
    doc_ids: ['profile.bf778cd1c7b4e5ea9a3eced7049725a1']
}).then(function (result) {
    console.log("Done", result);
});
This never finishes; the browser rapidly uses up memory and crashes.
It is probably related to the database rollback issues in this question: Is it possible to get the latest seq number of PouchDB?
When you attempt to replicate this document, no event is ever fired in the above code. Chrome/Firefox will just sit there, gradually using more RAM and maxing out the CPU, and then the browser crashes with this message in Chrome.
This started happening after we re-created our test system like this:
1: A live Couchdb is replicated to a test system.
2: The test Couchdb is modified and becomes ahead of the live system. Causing replication conflicts.
3: The test CouchDb is deleted, and the replication rerun from start, creating a fresh test system.
Certain documents now have this problem, despite never having been in PouchDb before, and there should be no existing replication checkpoints for PouchDb since the database is a fresh replication of live. Even destroying the PouchDb doesn't help. Even removing the IndexedDB pouch doesn't solve it. I am not sure what else to try.
Edit: I've narrowed down the problem a little. The document has a ton of deleted revisions from conflicts, and replication seems to get stuck looping through them.
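To see how large that revision tree actually is, the source CouchDb can be queried directly. A sketch (the URL is the same placeholder server as above; revs_info and deleted_conflicts are standard CouchDB query options):

const https = require('https');

// Sketch only: fetch the problem document with its full revision history.
https.get(
    'https://server/db/profile.bf778cd1c7b4e5ea9a3eced7049725a1?revs_info=true&deleted_conflicts=true',
    (res) => {
        let body = '';
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => {
            const doc = JSON.parse(body);
            // _revs_info lists every revision the replicator has to walk through.
            console.log((doc._revs_info || []).length, 'revisions in the tree');
        });
    }
);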

node's fs.writeFile does not overwrite previous contents

I have a file which sometimes gets \00 null characters inside it, so I need to repair it.
That's why I'm reading it, removing the invalid characters, and writing it again. BUT fs.writeFile is not overwriting the previous contents. The new contents get appended, which is not what I want.
Is it because my write code is inside the read callback?
fs.readFile('./' + file, function (err, data) {
    if (err) {
        console.error(err);
        return;
    }
    var str = data.toString();
    var repaired = str.slice(0, str.indexOf('\00')) + str.slice(str.lastIndexOf('\00') + 1, str.length);
    //console.log(repaired);
    fs.writeFile('./' + file, repaired, function (err) {
        if (err)
            console.error(err);
    });
});
I've also tried using {flag: 'w'} (which I think fs.writeFile already uses by default).
Thanks to @thefourtheye for pointing me in the proper direction.
As there was no \00 character in the file I was testing with, str.indexOf('\00') returned -1, so the first slice kept (almost) the whole file, and str.slice(str.lastIndexOf('\00') + 1) returned the whole file again. That's why I thought the contents were being appended.
Using the replace function did the job:
var repaired = str.replace(/\00/g,'');
I had the same or a similar problem: when I called fs.writeFile() with different content for the same file, if the new content was shorter than the existing content then it did not overwrite all of the previous file content.
I found an explanation why this may be happening and a suggested remedy at:
https://github.com/nodejs/node-v0.x-archive/issues/4965.
According to that "This is not a bug. (Or at least, not one that Node has ever pretended to address...."
The suggested solution is to "wait for the callback". I assume that means "wait for the write callback to be called before trying to read the file". That makes sense, of course; you should not try to read what may not have been fully written yet.
But if you write to the same file several times, like I did, then waiting for the (first) write callback to complete before reading is not enough. Why? Because another write may be in progress when you do the reading, and thus you can get garbled content.
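A sketch of what I mean by avoiding that overlap (my own helpers, not something from the linked issue): chain every write and read on the previous operation so they never run concurrently on the same file.

const fsp = require('fs').promises;

// One queue per process; every operation waits for the previous one to finish.
let queue = Promise.resolve();

function writeSerialized(file, content) {
    queue = queue.then(() => fsp.writeFile(file, content));
    return queue;
}

function readSerialized(file) {
    queue = queue.then(() => fsp.readFile(file, 'utf8'));
    return queue;
}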

"Error: OK" when using fs.readFile() in Node.js (after some iteration of about a hundred thousand)?

I'm "walking" a hundred thousand JSON files, reading the content and throwing an error if something bad happens:
walk(__dirname + '/lastfm_test', 'json', function (err, files) {
    files.forEach(function (filePath) {
        fs.readFile(filePath, function (err, data) {
            if (err) throw err;
        });
    });
});
The walk function is largely inspired by this question (chjj's answer). After some iterations, the line if (err) throw err gets executed. The error thrown is:
Error: OK, open 'path/to/somejsonfile.json'
Any chance to investigate what's happening here? I'm sure that the walk function is OK: in fact, replacing the fs.readFile() call with console.log(filePath) prints the paths without errors.
Some useful info: Windows 7 x64, node.exe x64 0.10.5. The Last.fm dataset was downloaded from here.
I recommend using the graceful-fs module for this purpose. It will automatically limit the number of open file descriptors. It's written by Isaac Schlueter, the creator of npm and maintainer of Node, so it's pretty solid. The bare fs module lets you shoot yourself in the foot.
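Usage is a drop-in swap for the fs module; a minimal sketch:

// graceful-fs queues open() calls instead of failing when file descriptors run out.
var fs = require('graceful-fs');

fs.readFile(filePath, function (err, data) {
    if (err) throw err;
    // process data ...
});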
The "foreach-loop" is executing readFile very often. NodeJS starts opening the files in a background thread. But no file is processed in the NodeJS main thread until the foreach loop is finished (and all file open requests are scheduled). For this reason no files are processed (and later closed) while opening all files. At some time point many files are opened and all available handles are used, resulting in the useless error message.
Their are multiple soulutions to your problem:
First you could open all files synchronously after each other. But this would slow down the application and would not match the event based programming model of NodeJS. (But is the easiest solution if you don't mind the performance)
Better would be opening only a specific amount of files at a time (e.g. ~1000 files) and after processing one you could open the next one.
Pseude Code:
1. Walk the file system and store all file names in an array.
2. Call fs.readFile for a batch of files from the array.
3. In the readFile callback, after processing, start opening more files from the array if it is not empty.
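A concrete sketch of that pseudo-code (the function name and parameters are mine; call it with the array from step 1 and a limit of, say, 1000):

var fs = require('fs');

// Keep at most `limit` readFile calls in flight; start the next file whenever one finishes.
function readAllLimited(files, limit, onFile, onDone) {
    var next = 0;
    var active = 0;

    if (files.length === 0) return onDone();

    function startNext() {
        if (next >= files.length) {
            if (active === 0) onDone();
            return;
        }
        var filePath = files[next++];
        active++;
        fs.readFile(filePath, function (err, data) {
            active--;
            onFile(err, filePath, data);
            startNext();
        });
    }

    // Prime the pool with up to `limit` parallel reads.
    for (var i = 0; i < limit && i < files.length; i++) startNext();
}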

NodeJS Filesystem sync and performance

I've run into an issue with NodeJS where, due to some middleware, I need to directly return a value that requires knowing the last-modified time of a file. Obviously the correct way would be to do
getFilename: function (filename, next) {
    fs.stat(filename, function (err, stats) {
        // Do error checking, etc...
        next('', filename + '?' + new Date(stats.mtime).getTime());
    });
}
However, due to the middleware I am using, getFilename must return a value, so I am doing:
getFilename: function (filename) {
    var stats = fs.statSync(filename);
    return filename + '?' + new Date(stats.mtime).getTime();
}
I don't completely understand the nature of the NodeJS event loop, so what I was wondering is: does statSync have any special sauce that somehow pumps the event loop (or whatever it is called in Node, the queue of work waiting to be performed) while the file information is loading? Or is it really blocking, meaning this code is going to cause performance nightmares down the road and I should rewrite the middleware I am using to use a callback? If it does have special sauce that lets the event loop continue while it waits on the disk, is that available anywhere else (through some promise library or something)?
Nope, there is no magic here. If you block in the middle of the function, everything is blocked.
If performance becomes an issue, I think your only option is to rewrite that part of the middleware, or get creative with how it is used.
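One way to "get creative", as a sketch (my own idea, not something the middleware provides): stat the files asynchronously ahead of time and have the synchronous getFilename serve a cached mtime, falling back to statSync only on a cache miss.

var fs = require('fs');

// Warm an mtime cache off the hot path so getFilename can stay synchronous.
var mtimeCache = {};

function warmCache(filename) {
    fs.stat(filename, function (err, stats) {
        if (!err) mtimeCache[filename] = new Date(stats.mtime).getTime();
    });
}

function getFilename(filename) {
    // Falls back to the blocking statSync only the first time a file is seen.
    var mtime = mtimeCache[filename] || fs.statSync(filename).mtime.getTime();
    return filename + '?' + mtime;
}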
