Part of my application periodically deletes a large folder and the database rows that relate to the files in that folder.
To do this I was thinking of using this code:
const { rmdir } = require('fs/promises');
... begin transaction
... remove database rows
await rmdir(dir, { recursive: true });
... end transaction
The reason I want to use the Promises API instead of rmdirSync (which blocks the entire server) is that I want the server to keep accepting requests while the operating system is busy deleting the folder. I know I could just use the callback API and end the transaction before the folder is deleted, but that could lead to inconsistencies (the files might still be served by express.static while the database rows are already gone) and it also wouldn't catch any potential errors.
But I read in this documentation:
https://nodejs.org/api/fs.html#fs_callback_example
That the use of the Promise API is not recommended in favour of the callback API.
So here's my questions:
Why is it slower to use the Promise API instead of the Callback API?
How much slower?
Could I just do this instead:
const { unlink } = require('fs');
... begin transaction
... remove database rows
await new Promise((resolve, reject) => {
  unlink('/tmp/hello', (err) => {
    resolve()
  })
});
... end transaction
The promises API does essentially the same thing as your included code, just with proper error handling:
await new Promise((resolve, reject)=>{
unlink('/tmp/hello', (err) => {
if (err) {
return reject(err)
}
resolve()
})
});
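If you'd rather not write that wrapper by hand, Node's built-in util.promisify builds the same kind of promise wrapper around any error-first callback function; a minimal sketch:
const { promisify } = require('util');
const { unlink } = require('fs');

// promisify wraps an error-first callback function so it returns a promise
const unlinkAsync = promisify(unlink);

unlinkAsync('/tmp/hello')
  .then(() => console.log('deleted'))
  .catch((err) => console.error(err));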
The memory and time difference comes from the promises themselves: the callback API goes straight to the underlying C++ bindings, while promises add an extra layer of logic on top.
There is probably no difference until you make a million operations per second.
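Applied to the original folder-plus-rows cleanup, the awaited version could look roughly like this. The transaction calls are placeholders for whatever your database client actually provides, and on current Node versions fs.promises.rm(dir, { recursive: true }) is the replacement for the deprecated recursive rmdir:
const { rm } = require('fs/promises');

async function cleanUp(db, dir) {
  const trx = await db.beginTransaction();   // placeholder: begin transaction
  try {
    await trx.query('DELETE FROM files WHERE folder = ?', [dir]);  // remove database rows
    await rm(dir, { recursive: true, force: true });               // delete the folder
    await trx.commit();                      // end transaction
  } catch (err) {
    await trx.rollback();                    // keep rows and folder consistent on failure
    throw err;
  }
}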
I haven't found anything specific about this; it isn't really a problem, but I would like to understand better what is going on here.
Basically, I'm testing some simple NodeJS code, like this:
//Summary : Open a file , write to the file, delete the file.
let fs = require('fs');
fs.open('mynewfile.txt' , 'w' , function(err,file){
if(err) throw err;
console.log('Created file!')
})
fs.appendFile('mynewfile.txt', 'After creating this file with fs.open, I appended data to it with fs.appendFile', function(err){
if (err) throw err;
console.log('Added text to the file.')
})
fs.unlink('mynewfile.txt', function(err){
if (err) throw err;
console.log('File deleted!')
})
console.log(__dirname);
I thought this code would be executed in the order it was written, from top to bottom, but when I look at the terminal I'm not sure that was the case, because this is what I get:
$ node FileSystem.js
C:\Users\Simon\OneDrive\Desktop\Portfolio\Learning Projects\NodeJS_Tutorial
Created file!
File deleted!
Added text to the file.
//Expected order would be: create file, add text to file, delete file, log dirname.
Despite what the terminal output might make you think, in the end the code order still seems to have been followed somehow, because when I look at my folder the file was deleted and nothing is left in the directory.
So, I was wondering: why doesn't the terminal log in the same order that the code is written, from top to bottom?
Would this be the result of NodeJS's asynchronous nature, or is it something else?
The code is (in principle) executed from top to bottom, as you say. But fs.open, fs.appendFile, and fs.unlink are asynchronous, i.e. they are started in that particular order, but there is no guarantee whatsoever in which order they finish, and thus you can't guarantee in which order the callbacks are executed. If you run the code multiple times, there is a good chance that you will encounter different execution orders.
If you need a specific order, you have two different options
You call the later operation only in the callback of the prior one, i.e. something like this:
fs.open('mynewfile.txt' , 'w' , function(err,file){
if(err) throw err;
console.log('Created file!')
fs.appendFile('mynewfile.txt' , '...' , function(err){
if (err) throw err;
console.log('Added text to the file.')
fs.unlink('mynewfile.txt', function(err){
if (err) throw err;
console.log('File deleted!')
})
})
})
You can see that the code gets quite ugly and hard to read with all that increasing nesting.
You switch to the promise-based approach:
let fs = require('fs').promises;
fs.open("myfile.txt", "w")
  .then(file => {
    return fs.appendFile("myfile.txt", "...");
  })
  .then(res => {
    return fs.unlink("myfile.txt");
  })
  .catch(e => {
    console.log(e);
  })
With the promise-version of the operations, you can also use async/await
async function doit() {
let file = await fs.open('myfile.txt', 'w');
await fs.appendFile('myfile.txt', '...');
await fs.unlink('myfile.txt');
}
For all three possibilities, you probably need to close the file before you can unlink it.
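A minimal sketch of that last point, using the promise-based API (fs.promises.open resolves to a FileHandle that you can append to and close):
let fs = require('fs').promises;

async function createAppendDelete() {
  const file = await fs.open('myfile.txt', 'w'); // FileHandle
  await file.appendFile('...');                  // write through the handle
  await file.close();                            // close before unlinking
  await fs.unlink('myfile.txt');
}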
For more details, please read about Promises, async/await and the execution stack in JavaScript.
It's a combination of 2 things:
The asynchronous nature of Node.js, as you correctly assume
Being able to unlink an open file
What likely happened is this:
The file was opened and created at the same time (open with flag w)
The file was opened a second time for appending (fs.appendFile)
The file was unlinked
Data was appended to the file (while it was already unlinked) and the file was closed
When data was being appended, the file still existed on disk as an inode, but had zero hard links (references) to it. It still takes up space then, but the OS checks the reference count when closing and frees up the space if the count has fallen to zero.
People sometimes run into a similar situation with daemons such as HTTP servers that employ log rotation: if something goes wrong when switching over logs, the old log file may be unlinked but not closed, so it's never cleaned up and it takes space forever (until you reboot or restart the process).
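A small illustration of that behaviour on POSIX systems (Linux/macOS), where unlinking an open file is allowed; this is just a sketch to show the effect:
const fs = require('fs');

const fd = fs.openSync('ghost.txt', 'w');
fs.unlinkSync('ghost.txt');              // zero hard links remain, but the inode survives
fs.writeSync(fd, 'still writable');      // the write goes to the unlinked inode
fs.closeSync(fd);                        // last reference gone, the space is freed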
Note that the ordering of operations you're observing is not guaranteed, and they might well be re-ordered on another run. Don't rely on it.
You could write this as (untested):
let fs = require('fs').promises;

const main = async () => {
  await fs.open('mynewfile.txt', 'w');
  await fs.appendFile('mynewfile.txt', 'content');
  await fs.unlink('mynewfile.txt');
};

main()
  .then(() => console.log('success'))
  .catch(console.error);
or within another async function:
const someOtherFn = async () => {
try{
await main();
} catch(e) {
// handle any rejection to your liking
}
}
(The catch block is not mandatory. You can opt to just let errors propagate to the top. It's just to showcase how async/await allows you to write asynchronous code that reads as if it were synchronous, without running into callback hell.)
fs.rename("${nombreHtml}.html",(err)=>{
console.log(err)
})
fs.appendFileSync("${nombreHtml}.html", htmlSeparado, () => { })
I try to run these two operations, but it doesn't work.
fs.rename is an asynchronous task.
By the time fs.rename has finished executing, fs.appendFileSync has already tried to append data to an HTML file that did not exist yet at that point.
fs.rename ... awaiting callback
fs.append ... failing
fs.rename finished, file now has a new name.
You probably want to either place fs.appendFileSync inside the fs.rename callback, or switch to promises. (example at the bottom)
example that should work:
fs.rename("${nombreHtml}.html",(err)=>{
if (err) console.log(err)
else {
fs.appendFileSync("${nombreHtml}.html", htmlSeparado, () => { })
}
})
By the way, because synchronous functions block the event loop and hence freeze your server for the duration of that call, making it unavailable for any other request, using the filesystem's synchronous functions is generally not recommended, since read/write/append operations can take rather a long time. It is recommended to use the async versions instead, which take a callback or return a promise, as you have done with fs.rename.
fs has a built-in sub-module with promise-based versions of the same functions, which can be accessed via require('fs').promises.
This way you could just do:
const { rename, appendFile } = require('fs').promises;

try {
  // note: rename takes (oldPath, newPath); the destination path
  // isn't shown in the question, so it is omitted here as well
  await rename("${nombreHtml}.html");
  await appendFile("${nombreHtml}.html", htmlSeparado);
} catch (error) {
  console.log(error);
}
I assume you want a template string so that the variables insert themselves into the string:
fs.rename(`${nombreHtml}.html`,(err)=>{
console.log(err)
})
fs.appendFileSync(`${nombreHtml}.html`, htmlSeparado, () => { })
I am new to nodejs/expressjs and mongodb. I am trying to create an API that exposes data to my mobile app that I am trying to build using Ionic framework.
I have a route setup like this
router.get('/api/jobs', (req, res) => {
JobModel.getAllJobsAsync().then((jobs) => res.json(jobs)); // IS THIS THE CORRECT WAY?
});
I have a function in my model that reads data from Mongodb. I am using the Bluebird promise library to convert my model functions to return promises.
const JobModel = Promise.promisifyAll(require('../models/Job'));
My function in the model
static getAllJobs(cb) {
MongoClient.connectAsync(utils.getConnectionString()).then((db) => {
const jobs = db.collection('jobs');
jobs.find().toArray((err, jobs) => {
if(err) {
return cb(err);
}
return cb(null, jobs);
});
});
}
The promisifyAll(myModule) converts this function to return a promise.
What I am not sure about is:
Is this the correct approach for returning data from my model to the route callback function?
Is this efficient?
Is using promisifyAll slow? It loops through all the functions in the module and creates a copy of each one, with an Async suffix, that returns a promise. When does that actually run? This is a more general question about Node require statements; see the next point.
When do all the require statements run? When I start the Node.js server, or when I make a call to the API?
Your basic structure is more-or-less correct, although your use of Promise.promisifyAll seems awkward to me. The basic issue for me (and it's not really a problem - your code looks like it will work) is that you're mixing and matching promise-based and callback-based asynchronous code. As I said, that should still work, but I would prefer to stick to one style as much as possible.
If your model class is your code (and not some library written by someone else), you could easily rewrite it to use promises directly, instead of writing it for callbacks and then using Promise.promisifyAll to wrap it.
Here's how I would approach the getAllJobs method:
static getAllJobs() {
// connect to the Mongo server
return MongoClient.connectAsync(utils.getConnectionString())
// ...then do something with the collection
.then((db) => {
// get the collection of jobs
const jobs = db.collection('jobs');
// I'm not that familiar with Mongo - I'm going to assume that
// the call to `jobs.find().toArray()` is asynchronous and only
// available in the "callback flavored" form.
// returning a new Promise here (in the `then` block) allows you
// to add the results of the asynchronous call to the chain of
// `then` handlers. The promise will be resolved (or rejected)
// when the results of the `job().find().toArray()` method are
// known
return new Promise((resolve, reject) => {
jobs.find().toArray((err, jobs) => {
if (err) {
return reject(err);
}
resolve(jobs);
});
});
});
}
This version of getAllJobs returns a promise which you can chain then and catch handlers to. For example:
JobModel.getAllJobs()
.then((jobs) => {
// this is the object passed into the `resolve` call in the callback
// above. Do something interesting with it, like
res.json(jobs);
})
.catch((err) => {
// this is the error passed into the call to `reject` above
});
Admittedly, this is very similar to the code you have above. The only difference is that I dispensed with the use of Promise.promisifyAll - if you're writing the code yourself & you want to use promises, then do it yourself.
One important note: it's a good idea to include a catch handler. If you don't, your error will be swallowed up and disappear, and you'll be left wondering why your code is not working. Even if you don't think you'll need it, just write a catch handler that dumps it to console.log. You'll be glad you did!
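Tied back to your route, a minimal sketch of what that could look like (the 500 response is just one illustrative way to report the failure):
router.get('/api/jobs', (req, res) => {
  JobModel.getAllJobs()
    .then((jobs) => res.json(jobs))
    .catch((err) => {
      console.log(err);                  // at the very least, log it
      res.status(500).json({ error: 'Unable to load jobs' });
    });
});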
After a lot of googling I have not been able to confirm the correct approach to this problem. The following code runs as expected but I have a grave feeling that I am not approaching this in the correct way, and I am setting myself up for problems.
The following code is initiated by the main app.js file and is passed a location from which to start loading XML files and processing them into a MongoDB.
exports.processProfiles = function(path) {
var deferrer = q.defer();
q(dataService.deleteProfiles()) // simple mongodb call to empty the Profiles collection
.then(function(deleteResult) {
return loadFilenames(path); // method to load all filenames in the given path using fs
})
.then(function(filenames) {
// now we have all the file names lets load and save
filenames.forEach(function(filename) {
// Here is where i think the problem is!
// kick off another promise chain for the dynamically sized array of files to process
q(loadFileContent(path, filename)) // first we load the data in the file
.then(function(inboundFile) {
// then parse XML structure to my new shiny JSON structure
// and ask Mongo to store it for me
return dataService.createProfile(processProfileXML(filename, inboundFile));
})
.done(function(result) {
console.log(result);
})
});
})
.catch(function(err) {
deferrer.reject('Unable to Process Profile records : ' + err);
})
.done(function() {
deferrer.resolve('Profile Processing Completed');
});
return deferrer.promise;
}
Whilst this code works, these are my main concerns, which I have not been able to resolve on my own after a few hours of Google and reading.
1) Is this blocking? The console output makes it difficult to tell whether this is running asynchronously as I want it to. I think it is, but advice on whether I am doing something fundamentally wrong would be great.
2) Is having a nested promise a bad idea? Should I be linking it to the outer promise? I have tried, but could not get anything to compile or run.
I haven't used Q in a really long time, but I think what you'd need to do is let it know you're about to hand back an array of promises that must all be satisfied before moving on.
Additionally, as you're waiting for multiple promises in one section of code, rather than nesting further, throw the 'set' of promises back up so the next step runs once they're all satisfied.
q(dataService.deleteProfiles()) // simple mongodb call to empty the Profiles collection
  .then(function (deleteResult) {
    return loadFilenames(path); // method to load all filenames in the given path using fs
  })
  .then(function (filenames) {
    return q.all(
      filenames.map(function (filename) {
        return q(loadFileContent(path, filename))
          .then(function (inboundFile) {
            /* Do stuff with the file contents here */
          });
      })
    );
  })
  .then(function (resultsOfLoadFileContentsPromises) {
    console.log('I did stuff with all the things');
  })
  .catch(function (err) { /* handle the error */ });
What you have is not 'blocking'. What you're really doing with promises is moving work into new 'block'-like sections; the more of these you have, the more asynchronous your code will appear. If nothing else is running apart from this promise chain, it will still behave as if it were procedural.
But inner promises must still resolve before the parent promises can resolve after them.
Inner promises like the ones you have aren't inherently bad. Personally I break them out into separate files to make them easier to reason about, but I wouldn't call them 'bad' unless there's no need for the inner promise to exist. Where possible (and in your example here), I've adjusted the code so it throws the next set of promises back up for a new section to deal with the data after it has been fetched.
(I'm not great with Q though, this code will probably require a little further tweaking).
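The same shape also works with the Promises built into Node, without Q; this is only a sketch under the assumption that deleteProfiles, loadFilenames, loadFileContent, processProfileXML and createProfile behave as described in the question:
exports.processProfiles = function (path) {
  return Promise.resolve(dataService.deleteProfiles())   // empty the Profiles collection
    .then(() => loadFilenames(path))                      // list the files to import
    .then((filenames) => Promise.all(
      filenames.map((filename) =>
        Promise.resolve(loadFileContent(path, filename))
          .then((inboundFile) =>
            dataService.createProfile(processProfileXML(filename, inboundFile))))
    ))
    .then(() => 'Profile Processing Completed')
    .catch((err) => {
      throw new Error('Unable to Process Profile records : ' + err);
    });
};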
I am working with zombie.js to scrape one site, and I must use the callback style to connect to each URL. The point is that I have an array of URLs and I need to process each one using an async function. This is my first approach:
var urls = ["http...", "http..."];
function process_url(index)
{
if(index == urls.length)
return;
async_function(urls[index],
function() {
...
//parse the url
...
// Process the next url
process_url(index + 1);
}
);
}
process_url(0)
Without using some third-party Node.js library to treat the async function as if it were synchronous or to wait for it (wait.for, synchronize, mocha), this is the way I thought of to solve this problem. I don't know what would happen if the array is too big. Is the function released from memory when the next function is called, or do all the functions stay in memory until the end?
Any ideas?
Your scheme will work. I call it "manually sequencing async operations".
A general purpose version of what you're doing would look like this:
function processItem(data, callback) {
// do your async function here
// for example, let's suppose it was an http request using the request module
request(data, callback);
}
function processArray(array, fn) {
var index = 0;
function next() {
if (index < array.length) {
fn(array[index++], function(err, result) {
// process error here
if (err) return;
// process result here
next();
});
}
}
next();
}
processArray(arr, processItem);
As to your specific questions:
I don't know what would happen if the array is too big. Is the function released from memory when the next function is called, or do all the functions stay in memory until the end?
Memory in Javascript is released when it is no longer referenced by any running code and when the garbage collector gets a chance to run. Since you are running a series of asynchronous operations here, it is likely that the garbage collector gets a chance to run regularly while waiting for the http response from the async operation, so memory can get cleaned up then. Functions are just another type of object in Javascript and they get garbage collected just like anything else. When they are no longer referenced by running code, they are eligible for garbage collection.
In your specific code, because you are re-calling process_url() only in an async callback, there is no stack build-up (as in normal recursion). The prior instance of process_url() has already completed BEFORE the async callback is called and BEFORE you call the next iteration of process_url().
In general, management and coordination of multiple async operations is much, much easier using promises which are built into the current versions of node.js and are part of the ES6 ECMAScript standard. No external libraries are required to use promises in current versions of node.js.
For a list of a number of different techniques for sequencing your asynchronous operations on your array, both using promises and not using promises, see:
How to synchronize a sequence of promises?.
The first step in using promises is to "promisify" your async function so that it returns a promise instead of taking a callback.
function async_function_promise(url) {
return new Promise(function(resolve, reject) {
async_function(url, function(err, result) {
if (err) {
reject(err);
} else {
resolve(result);
}
});
});
}
Now, you have a version of your function that returns promises.
If you want your async operations to proceed one at a time so the next one doesn't start until the previous one has completed, then a usual design pattern for that is to use .reduce() like this:
function process_urls(array) {
return array.reduce(function(p, url) {
return p.then(function(priorResult) {
return async_function_promise(url);
});
}, Promise.resolve());
}
Then, you can call it like this:
var myArray = ["url1", "url2", ...];
process_urls(myArray).then(function(finalResult) {
// all of them are done here
}, function(err) {
// error here
});
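If you also want to keep each individual result, a small variation on the same pattern collects them as it goes (just a sketch along the same lines):
function process_urls_collect(array) {
    var results = [];
    return array.reduce(function(p, url) {
        return p.then(function() {
            return async_function_promise(url).then(function(result) {
                results.push(result);   // remember each result in order
            });
        });
    }, Promise.resolve()).then(function() {
        return results;                 // resolves with an array of all results
    });
}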
There are also Promise libraries that have some helpful features that make this type of coding simpler. I, myself, use the Bluebird promise library. Here's how your code would look using Bluebird:
var Promise = require('bluebird');
var async_function_promise = Promise.promisify(async_function);
function process_urls(array) {
return Promise.map(array, async_function_promise, {concurrency: 1});
}
process_urls(myArray).then(function(allResults) {
// all of them are done here and allResults is an array of the results
}, function(err) {
// error here
});
Note, you can change the concurrency value to whatever you want here. For example, you would probably get faster end-to-end performance if you increased it to something between 2 and 5 (depends upon the server implementation on how this is best optimized).