I have a function that writes data to a file and then uploads that file to cloud storage. The file isn't finished writing before the upload starts, so I am getting a partial file in cloud storage. I found that fs.writeFileSync(path, data[, options]) could help, but I am not exactly sure how it works.
It is my understanding that node runs asynchronously, and I have several async processes running prior to this portion of code. I understand what synchronous vs asynchronous means, but I am having a little trouble understanding how it plays out in this example. Here are my questions if I replace the code below with fs.writeFileSync(path, data[, options]):
What do the docs mean by "Synchronously append data to a file"
a. Will the next lines of code be halted until the fs.writeFileSync(path, data) is finished?
b. Are previous asynchronous processes halted by this line of code?
If other async processes are not affected, how is writeFileSync different from writeFile?
Is there a callback feature in writeFileSync that I am misunderstanding?
Code for reference
const outCsv = '"x","y","z"';
const filename = "file.csv";

fs.writeFile(filename, outCsv, function (err) {
    if (err) {
        return console.log(err);
    }
    console.log('The file was saved!');
    bucket.upload(filename, (err, file) => {
        if (err) {
            return console.log(err);
        }
        console.log('The file was uploaded!');
    });
});
Will the next lines of code be halted until the fs.writeFileSync(path, data) is finished?
Yes. It is a blocking operation. Note that you're assuming that fs.writeFileSync does finish.
Are previous asynchronous processes halted by this line of code?
Kinda. Since JavaScript is single-threaded, they will also not be running while the file is writing but will queue up at the next tick of the event loop.
If other async processes are not affected, how is writeFileSync different from writeFile?
It blocks any code that comes after it. For an easier example consider the following:
setTimeout(() => console.log('3'), 5);
console.log('1'); // fs.writeFileSync
console.log('2');
vs
setTimeout(() => console.log('3'), 5);
setTimeout(() => console.log('1'), 0); // fs.writeFile
console.log('2');
The first will print 1 2 3 because the call to console.log blocks what comes after. The second will print 2 1 3 because the setTimeout is non-blocking. The code that prints 3 isn't affected either way: 3 will always come last.
Is there a callback feature in writeFileSync that I am misunderstanding?
No. fs.writeFileSync does not take a callback; it simply returns once the write has finished and throws if something goes wrong, so there is no callback to misunderstand.
All of this raises the question of why you would prefer fs.writeFile over the sync alternative. The answer is this:
The sync version blocks.
For however long the write takes, your webserver (for example) isn't handling requests.
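To tie this back to the posted code, here is a minimal sketch of two ways to sequence the write before the upload. The bucket.upload(filename, callback) call is taken from the question; everything else is the standard fs module. Option 1 blocks until the file is on disk; option 2 keeps the event loop free but still only starts the upload once the write has finished.
const fs = require('fs');

const outCsv = '"x","y","z"';
const filename = 'file.csv';

// Option 1: blocking write. Nothing else runs until the data is on disk, so the
// upload can only ever see a complete file. There is no callback; errors are thrown.
try {
    fs.writeFileSync(filename, outCsv);
    console.log('The file was saved!');
    bucket.upload(filename, (err, file) => {
        if (err) return console.log(err);
        console.log('The file was uploaded!');
    });
} catch (err) {
    console.log(err);
}

// Option 2: non-blocking write with the promise API. The await ensures the upload
// only starts after the write completes, without blocking the rest of the server.
async function writeThenUpload() {
    await fs.promises.writeFile(filename, outCsv);
    console.log('The file was saved!');
    bucket.upload(filename, (err, file) => {
        if (err) return console.log(err);
        console.log('The file was uploaded!');
    });
}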
Related
I asked a similar question to this yesterday, but the solution was very easy and did not really address my fundamental problem of not understanding flow control in asynchronous JavaScript. The short version of what I am trying to do is build a MongoDB collection from a directory of JSON files. I had it working, but I modified something and now the flow is such that the program runs to completion, and therefore closes the connection before the asynchronous insertOne() calls are executed. When the insertOne() calls are finally executed, the data is never inserted and I get warnings about an unhandled exception from using a closed connection.
I am new to this, so if what I am doing is not best practice (it isn't), please let me know and I am happy to change things to get it to be reliable. The relevant code basically looks like this:
fs.readdirSync(dataDir).forEach(async function(file){
    // logic to build data object from JSON file
    console.log('Inserting object ' + obj['ID']);
    let result = await connection.insertOne(obj);
    console.log('Object ' + result.insertedId + ' inserted.');
});
The above is wrapped in an async function that I await for. By placing a console.log() message at the end of program flow, followed by a while(true);, I have verified that all the "'Inserting object ' + obj[ID]" messages are printed, but not the following "'Object ' + result.insertedId + ' inserted'" messages when flow reaches the end of the program. If I remove the while(true); I get all the error messages, because I am no longer blocking and obviously by that point the client is closed. In no case is the database actually built.
I understand that there are always learning curves, but it is really frustrating to not be able to do something as simple as flow control. I am just trying to do something as simple as "loop through each file, perform function on each file, close, and exit", which is remedial programming. So, what is the best way to mark a point that flow control will not pass until all attempts to insert data into the Collection are complete (either successfully or unsuccessfully, because ideally I can use a flag to mark if there were any errors)?
I have found a better answer than my original, so I am going to post it for anyone else who needs this in the future, as there does not seem to be too much out there. I will leave my original hack up too, as it is an interesting experiment to run for anyone curious about the asynchronous queue. I will also note that there is a pretty obvious way to do this with Promise.allSettled(), but it seems that this would put all the files into memory at once, which is what I am trying to avoid, so I am not going to write up that solution too.
This method uses the Node fs Promises API, specifically the fsPromises readdir method. I'll show the results running three test files I made in the same directory that have console.log() messages peppered throughout to help understand program flow.
This first file (without-fs-prom.js) uses the ordinary (non-promise) fs API and demonstrates the problem. As you can see, the asynchronous functions (the doFile() calls) do not terminate until the end. This means anything you wanted to run only after all the files are processed would be run before processing finished.
/*
** This version loops through the files and calls an asynchronous
** function with the traditional fs API (not the Promises API).
*/
const fs = require('fs');
async function doFile(file){
    console.log(`Doing ${file}`);
    return true;
}

async function loopFiles(){
    console.log('Enter loopFiles(), about to loop through the files.');
    fs.readdirSync(__dirname).forEach(async function(file){
        console.log(`About to do file ${file}`);
        const ret = await doFile(file);
        console.log(`Did file ${file}, returned ${ret}`);
        return ret;
    });
    console.log('Done looping through the files, returning from loopFiles()');
}
console.log('Calling loopFiles()');
loopFiles();
console.log('Returned from loopFiles()');
/* Result of run:
> require('./without-fs-prom')
Calling loopFiles()
Enter loopFiles(), about to loop through the files.
About to do file with-fs-prom1.js
Doing with-fs-prom1.js
About to do file with-fs-prom2.js
Doing with-fs-prom2.js
About to do file without-fs-prom.js
Doing without-fs-prom.js
Done looping through the files, returning from loopFiles()
Returned from loopFiles()
{}
> Did file with-fs-prom1.js, returned true
Did file with-fs-prom2.js, returned true
Did file without-fs-prom.js, returned true
*/
The problem can be partially fixed using the fsPromises API, as in with-fs-prom1.js below:
/*
** This version loops through the files and calls an asynchronous
** function with the fs/promises API and assures all files are processed
** before termination of the loop.
*/
const fs = require('fs');
async function doFile(file){
    console.log(`Doing ${file}`);
    return true;
}

async function loopFiles(){
    console.log('Enter loopFiles(), read the dir');
    const files = await fs.promises.readdir(__dirname);
    console.log('About to loop through the files.');
    for(const file of files){
        console.log(`About to do file ${file}`);
        const ret = await doFile(file);
        console.log(`Did file ${file}, returned ${ret}`);
    }
    console.log('Done looping through the files, returning from loopFiles()');
}
console.log('Calling loopFiles()');
loopFiles();
console.log('Returned from loopFiles()');
/* Result of run:
> require('./with-fs-prom1')
Calling loopFiles()
Enter loopFiles(), read the dir
Returned from loopFiles()
{}
> About to loop through the files.
About to do file with-fs-prom1.js
Doing with-fs-prom1.js
Did file with-fs-prom1.js, returned true
About to do file with-fs-prom2.js
Doing with-fs-prom2.js
Did file with-fs-prom2.js, returned true
About to do file without-fs-prom.js
Doing without-fs-prom.js
Did file without-fs-prom.js, returned true
Done looping through the files, returning from loopFiles()
*/
In this case, code after the file iteration loop within the asynchronous function itself runs after all files have been processed. You can have code in any function context with the following construction (file with-fs-prom2.js):
/*
** This version loops through the files and calls an asynchronous
** function with the fs/promises API and assures all files are processed
** before termination of the loop. It also demonstrates how that can be
** done from another asynchronous call.
*/
const fs = require('fs');
async function doFile(file){
    console.log(`Doing ${file}`);
    return true;
}

async function loopFiles(){
    console.log('Enter loopFiles(), read the dir');
    const files = await fs.promises.readdir(__dirname);
    console.log('About to loop through the files.');
    for(const file of files){
        console.log(`About to do file ${file}`);
        const ret = await doFile(file);
        console.log(`Did file ${file}, returned ${ret}`);
    }
    console.log('Done looping through the files, return from loopFiles()');
    return;
}

async function run(){
    console.log('Enter run(), calling loopFiles()');
    await loopFiles();
    console.log('Returned from loopFiles(), return from run()');
    return;
}
console.log('Calling run()');
run();
console.log('Returned from run()');
/* Result of run:
> require('./with-fs-prom2')
Calling run()
Enter run(), calling loopFiles()
Enter loopFiles(), read the dir
Returned from run()
{}
> About to loop through the files.
About to do file with-fs-prom1.js
Doing with-fs-prom1.js
Did file with-fs-prom1.js, returned true
About to do file with-fs-prom2.js
Doing with-fs-prom2.js
Did file with-fs-prom2.js, returned true
About to do file without-fs-prom.js
Doing without-fs-prom.js
Did file without-fs-prom.js, returned true
Done looping through the files, return from loopFiles()
Returned from loopFiles(), return from run()
*/
EDIT
This was my first tentative answer. It is a hack of a solution at best. I am leaving it up because it is an interesting experiment for people who want to peer into the asynchronous queue, and there may be some real use case for this somewhere too. I think my newly posted answer is superior in all reasonable cases, though.
Original Answer
I found a bit of an answer. It is a hack, but further searching on the net and the lack of responses indicate that there may be no real good way to reliably control flow with asynchronous code and callbacks. Basically, the modification is along the lines of:
fs.readdirSync(dataDir).forEach(async function(file){
    jobsOutstanding++;
    // logic to build data object from JSON file
    console.log('Inserting object ' + obj['ID']);
    let result = await connection.insertOne(obj);
    console.log('Object ' + result.insertedId + ' inserted.');
    jobsOutstanding--;
});
Where jobsOutstanding is a top level variable to the module with an accessor, numJobsOutstanding().
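For context, that counter and accessor might look something like this at module level (a sketch; the actual module layout isn't shown in the answer):
// Module-level counter for in-flight insertOne() calls (hypothetical layout).
let jobsOutstanding = 0;

function numJobsOutstanding() {
    return jobsOutstanding;
}

module.exports = { numJobsOutstanding };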
I now wrap the close like this (with some logging to watch how the flow works):
async function closeClient(client){
    console.log("Enter closeClient()");
    if(!client || !client.topology || !client.topology.isConnected()){
        console.log("Already closed.");
    }
    else if(dataObject.numJobsOutstanding() == 0){
        await client.close();
        console.log("Closed.");
    }
    else{
        setTimeout(function(){ closeClient(client); }, 100);
    }
}
I got this one to run correctly, and the logging is interesting to visualize the asynchronous queue. I am not going to accept this answer yet to see if anyone out there knows something better.
I can read the files using nodejs file system:
const fs = require('fs');
fs.readFile('./assets/test1.txt', (err, data) => {
    if(err){
        console.log(err)
    }
    console.log(data.toString())
})
console.log('hello shawn!')
Why is console.log('hello shawn!') printed first, and only then console.log(data.toString())?
Is there any other way in the file system module to read the data first and only then run the console.log below it?
It is because .readFile is an asynchronous operation. Its last parameter is a callback function, which is invoked after the operation is done. I recommend reading something about callbacks and the event loop.
You can use the synchronous version of the function, readFileSync, or use util.promisify to convert the callback function to a promise and then use async/await; see the example below.
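For example, a small sketch of both alternatives, reusing the './assets/test1.txt' path from the question:
const fs = require('fs');
const util = require('util');

// Synchronous version: blocks until the file has been read, so the logs
// appear in source order.
const data = fs.readFileSync('./assets/test1.txt');
console.log(data.toString());
console.log('hello shawn!');

// Promisified version: util.promisify turns the callback-style fs.readFile
// into a function that returns a promise, which can then be awaited.
const readFile = util.promisify(fs.readFile);

async function main() {
    const contents = await readFile('./assets/test1.txt');
    console.log(contents.toString());
    console.log('hello shawn!');
}

main().catch(console.error);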
I haven't found anything specific about this; it isn't really a problem, but I would like to understand better what is going on here.
Basically, I'm testing some simple NodeJS code, like this:
// Summary: Open a file, write to the file, delete the file.
let fs = require('fs');

fs.open('mynewfile.txt', 'w', function(err, file){
    if(err) throw err;
    console.log('Created file!')
})

fs.appendFile('mynewfile.txt', 'After creating this file with fs.open, I appended data to it with fs.appendFile', function(err){
    if (err) throw err;
    console.log('Added text to the file.')
})

fs.unlink('mynewfile.txt', function(err){
    if (err) throw err;
    console.log('File deleted!')
})
console.log(__dirname);
I thought this code would be executed in the order it was written, from top to bottom, but when I look at the terminal I'm not sure that was the case, because this is what I get:
$ node FileSystem.js
C:\Users\Simon\OneDrive\Desktop\Portfolio\Learning Projects\NodeJS_Tutorial
Created file!
File deleted!
Added text to the file.
//Expected order would be: Create file, add text to file , delete file , log dirname.
Despite what the terminal might make you think, in the end the code order still seems to have been followed somehow, because when I look at my folder the file was deleted and I have nothing left in the directory.
So I was wondering: why is it that the terminal doesn't log in the same order that the code is written, from top to bottom?
Would this be the result of NodeJS's asynchronous nature, or is it something else?
The code is (in principle) executed from top to bottom, as you say. But fs.open, fs.appendFile, and fs.unlink are asynchronous. I.e., they are started in that particular order, but there is no guarantee whatsoever in which order they finish, and thus you can't guarantee in which order the callbacks are executed. If you run the code multiple times, there is a good chance that you will encounter different execution orders ...
If you need a specific order, you have two different options:
You call the later operation only in the callback of the prior one, i.e. something like below:
fs.open('mynewfile.txt', 'w', function(err, file){
    if(err) throw err;
    console.log('Created file!')
    fs.appendFile('mynewfile.txt', '...', function(err){
        if (err) throw err;
        console.log('Added text to the file.')
        fs.unlink('mynewfile.txt', function(err){
            if (err) throw err;
            console.log('File deleted!')
        })
    })
})
You see, that code gets quite ugly and hard to read with all that increasing nesting ...
You switch to the promise-based approach:
let fs = require('fs').promises;

fs.open("myfile.txt", "w")
    .then(file => {
        return fs.appendFile("myfile.txt", "...");
    })
    .then(res => {
        return fs.unlink("myfile.txt");
    })
    .catch(e => {
        console.log(e);
    })
With the promise-version of the operations, you can also use async/await
async function doit() {
    let file = await fs.open('myfile.txt', 'w');
    await fs.appendFile('myfile.txt', '...');
    await fs.unlink('myfile.txt');
}
For all three possibilities, you probably need to close the file before you can unlink it; a rough sketch of that is below.
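Here is one way the close-before-unlink ordering could look with the promise API (a sketch; fs.promises.open returns a FileHandle, which has its own appendFile and close methods):
const fs = require('fs').promises;

async function createAppendDelete() {
    // Keep the FileHandle so the file can be closed explicitly.
    const file = await fs.open('mynewfile.txt', 'w');
    try {
        await file.appendFile('some content');
        console.log('Added text to the file.');
    } finally {
        await file.close(); // close before unlinking
    }
    await fs.unlink('mynewfile.txt');
    console.log('File deleted!');
}

createAppendDelete().catch(console.error);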
For more details please read about Promises, async/await and the Execution Stack in Javascript
It's a combination of 2 things:
The asynchronous nature of Node.js, as you correctly assume
Being able to unlink an open file
What likely happened is this:
The file was opened and created at the same time (open with flag w)
The file was opened a second time for appending (fs.appendFile)
The file was unlinked
Data was appended to the file (while it was already unlinked) and the file was closed
When data was being appended, the file still existed on disk as an inode, but had zero hard links (references) to it. It still takes up space then, but the OS checks the reference count when closing and frees up the space if the count has fallen to zero.
People sometimes run into a similar situation with daemons such as HTTP servers that employ log rotation: if something goes wrong when switching over logs, the old log file may be unlinked but not closed, so it's never cleaned up and it takes space forever (until you reboot or restart the process).
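If you want to see the unlink-while-open behaviour for yourself, here is a small experiment (a sketch; the file name is made up, and it relies on POSIX semantics, so it won't behave the same way on Windows):
const fs = require('fs');

const fd = fs.openSync('ghost.txt', 'w'); // open (and create) the file
fs.unlinkSync('ghost.txt');               // remove its directory entry right away

console.log(fs.existsSync('ghost.txt'));  // false: no directory entry any more
fs.writeSync(fd, 'still writable');       // still succeeds: the open fd keeps the inode alive
fs.closeSync(fd);                         // last reference gone, the OS frees the space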
Note that the ordering of operations that you're observing is random, and it is possible that they would be re-ordered. Don't rely on it.
You could write this as (untested):
let fs = require('fs').promises;

const main = async () => {
    await fs.open('mynewfile.txt', 'w');
    await fs.appendFile('mynewfile.txt', 'content');
    await fs.unlink('mynewfile.txt');
};

main()
    .then(() => console.log('success'))
    .catch(console.error);
or within another async function:
const someOtherFn = async () => {
    try {
        await main();
    } catch(e) {
        // handle any rejection to your liking
    }
}
(The catch block is not mandatory. You can opt to just let them throw to the top. It's just to showcase how async/await allows you to make asynchronous code appear as if it were synchronous code, without running into callback hell.)
I have some code (somewhat simplified for this discussion) that is something like this
var inputFile = 'inputfile.csv';
var parser = parse({delimiter: ','}, function (err, data) {
    async.eachSeries(data, function (line, callback) {
        SERVER.Request(line[0], line[1]);
        SERVER.on("RequestResponse", function(response) {
            console.log(response);
        });
        callback();
    });
});

SERVER.start()
SERVER.on("ready", function() {
    fs.createReadStream(inputFile).pipe(parser);
});
and what I am trying to do is run a CSV file through a command-line node program that will iterate over each line and then make a request to a server, which responds with a RequestResponse event, and I then log the response. The RequestResponse takes a second or so, and the way I have the code set up now, it just flies through the CSV file and I get an output for each iteration, but it is mostly the output I would expect for the first iteration with a little of the output of the second iteration. I need to know how to make each iteration wait until there has been a RequestResponse event before continuing on to the next iteration. Is this possible?
I have based this code in large part on
NodeJs reading csv file
but to be honest I am a little lost with Node.js and with async.foreach. Any help would be greatly appreciated.
I suggest that you bite the bullet and take your time learning promises and async/await. Then you can just use a regular for loop and await a web response promise.
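A sketch of what that could look like: SERVER, its Request() call, and the "RequestResponse" event come from the question, and wrapping one request/response round trip in a promise is an assumption about how that API behaves.
// Wrap one request/response round trip in a promise (assumes exactly one
// "RequestResponse" event is emitted per SERVER.Request call).
function requestOnce(a, b) {
    return new Promise((resolve) => {
        SERVER.once('RequestResponse', resolve);
        SERVER.Request(a, b);
    });
}

// A regular for...of loop: each iteration awaits the previous response before
// moving on to the next CSV line. `rows` is the parsed data array from the parser.
async function processCsv(rows) {
    for (const line of rows) {
        const response = await requestOnce(line[0], line[1]);
        console.log(response);
    }
}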
The solution is straightforward: you need to call the callback after the server returns, that's it.
async.eachSeries(data, function (line, callback) {
    SERVER.Request(line[0], line[1]);
    SERVER.on("RequestResponse", function(response) {
        console.log(response);
        SERVER.removeAllListeners("RequestResponse");
        callback();
    });
});
What is happening is that eachSeries expects the callback to be called AFTER you are done with the particular call.
I have the node.js code running on a server and would like to know if it is blocking or not. It is kind of similar to this:
function addUserIfNoneExists(name, callback) {
    userAccounts.findOne({name: name}, function(err, obj) {
        if (obj) {
            callback('user exists');
        } else {
            // Add the user 'name' to DB and run the callback when done.
            // This is non-blocking to here.
            user = addUser(name, callback)
            // Do something heavy, doesn't matter when this completes.
            // Is this part blocking?
            doSomeHeavyWork(user);
        }
    });
};
Once addUser completes the doSomeHeavyWork function is run and eventually places something back into the database. It does not matter how long this function takes, but it should not block other events on the server.
With that, is it possible to test if node.js code ends up blocking or not?
Generally, if it reaches out to another service, like a database or a webservice, then it is non-blocking and you'll need to have some sort of callback. However, any function blocks until it returns (even if it returns nothing)...
If the doSomeHeavyWork function is non-blocking, then it's likely that whatever library you're using will allow for some sort of callback. So you could write the function to accept a callback like so:
var doSomeHeavyWork = function(user, callback) {
    // Whatever callTheNonBlockingStuff is, it likely takes a callback which receives an error
    // (in case something bad happened) and possibly a "whatever", which is the result you're
    // hoping to get back.
    callTheNonBlockingStuff(function(error, whatever) {
        if (error) {
            console.log('There was an error!!!!');
            console.log(error);
            return callback(error, null); // Call callback with the error and stop here.
        }
        callback(null, whatever); // Call callback with the object you're hoping to get back.
    });
    return; // This line will most likely run before the callback gets called, which makes this a
            // non-blocking (asynchronous) function. That is why you need the callback.
};
You should avoid, in any part of your Node.js code, synchronous blocks that don't call system or I/O operations and whose computation takes a long time (in computer terms), e.g. iterating over big arrays. Instead, move this type of code to a separate worker or divide it into smaller synchronous pieces using process.nextTick(). You can find an explanation of process.nextTick() here, but read all the comments too. A sketch of the chunking idea follows below.
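As a rough sketch of chunking a long synchronous computation (summing a big array a slice at a time): note that recursively queued process.nextTick() callbacks all run before the event loop proceeds, so this sketch uses setImmediate(), which actually lets pending I/O and timers run between chunks.
function sumInChunks(bigArray, chunkSize, done) {
    let index = 0;
    let total = 0;

    function doChunk() {
        // Do a small synchronous slice of the work.
        const end = Math.min(index + chunkSize, bigArray.length);
        for (; index < end; index++) {
            total += bigArray[index];
        }
        if (index < bigArray.length) {
            setImmediate(doChunk); // yield to the event loop, then continue
        } else {
            done(total);
        }
    }

    doChunk();
}

sumInChunks(Array.from({ length: 1e6 }, (_, i) => i), 10000, (total) => {
    console.log('sum =', total);
});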