NodeJS using FS and AsyncJS blocking the UI - node.js

I'm using react, electron, nodejs, asyncjs redux and thunk.
I wrote the following code which is supposed to download a list of files and write it to disk. In my code when the user presses a button i call this actionCreator:
export function downloadList(pack) {
return (dispatch, getState) => {
const { downloadManager } = getState();
async.each(downloadManager.downloadQueue[pack].libs, async (url, callback) => {
const filename = url.split('/').pop().split('#')[0].split('?')[0];
await downloadFile(url, `dl/${filename}`);
callback();
}, (err) => {
if (err) {
console.log('A file failed to process');
} else {
dispatch({
type: DOWNLOAD_COMPLETED,
packName: pack
});
}
});
};
}
async function downloadFile(url, path) {
const file = fs.createWriteStream(path);
const request = https.get(url, (response) => {
response.pipe(file);
file.on('finish', () => {
file.close();
});
}).on('error', (err) => { // Handle errors
fs.unlink(path); // Delete the file async. (But we don't check the result)
});
}
It does what it's supposed to do but while it does that, it blocks the entire UI. I really can't understand why it's happening since if i use an
setTimeout
with 3000ms delay inside the async.each it doesn't block the UI.
Another strange behaviour is that if i use the eachLimit function of asyncJS it just downloads me the limit of files, so if i want to download 100 files but i set eachLimit to 10 parallel, it just downloads the first 10 files and then stops. Can you enlight me about this?
I wanted to use axios to download files since it doesn't need to know if the urls are http or https but i can't find any resource on using axios with stream responsetype

I can answer the first part. Pretty much every existent implementation of JavaScript runs on one thread. This means that the runtime is concurrent, but not parallel, i.e. the runtime does one and exactly one thing at a time. This means that if there is a function call that takes a while, it will block everything else. Therefore, something in the downloadList function is blocking the event loop. However, if you use setTimeout, then the downloadList function will be pushed onto the message queue, which will unblock the event and allow the UI to be rendered. For more information on the event loop check out this video

Related

Possible race condition Node/Express

I am trying to understand a case I am running into with my Node/Express application.
Let's say I have the following which is a function called when a specific route is hit:
async function routeThatDoesStuff(req, res, next) {
doAsyncStuff()
res.status(200).json({ message: 'Completed' })
}
In this case doAsyncStuff() does database operations on resources that are not critical to send back to the user so theoretically we don't need to await it. However, it seems that this operation does not actually complete unless I put the await in front.
My guess is that this has to do with the event loop etc in Node. That potentially because the route is completing before the doAsyncStuff() completes, the function doesn't actually complete because Node terminated it prematurely.
My big question is how Node handles async children function execution when parent functions have already completed?
An example to show that this should work:
const sleep = (milliseconds, cb) => {
return new Promise(resolve => setTimeout(() => {
resolve(cb())
}, milliseconds))
}
async function routeThatDoesStuff(req, res, next) {
sleep(5000, () => console.log('done'))
res.status(200).json({ message: 'Completed' })
}
JSON response is shown immediately, callback is executed after 5 seconds even though the response has been sent.

createWriteStream in node seems to execute 'end' event before 'data'

I am unable to understand how the event loop is processing my snippet. What I am trying to achieve is
to read from a csv
download a resource found in the csv
upload it to s3
write it into a new csv file
const readAndUpload = () => {
fs.createReadStream('filename.csv')
.pipe(csv())
.on('data', ((row: any) => {
const file = fs.createWriteStream("file.jpg");
var url = new URL(row.imageURL)
// choose whether to make an http or https request
let client = (url.protocol=="https:") ? https : http
const request = client.get(row.imageURL, function(response:any) {
// file save
response.pipe(file);
console.log('file saved')
let filePath = "file.jpg";
let params = {
Bucket: 'bucket-name',
Body : fs.createReadStream(filePath),
Key : "filename.jpg"
};
// upload to s3
s3.upload(params, function (err: any, data: any) {
//handle error
if (err) {
console.log("Error", err);
}
//success
if (data) {
console.log("Uploaded in:", data.Location);
row.imageURL = data.Location
writeData.push(row)
// console.log(writeData)
}
});
});
}))
.on('end', () => {
console.log("done reading")
const csvWriter = createCsvWriter({
path: 'out.csv',
header: [
{id: 'id', title: 'some title'}
]
});
csvWriter
.writeRecords(writeData)
.then(() => console.log("The CSV file was written successfully"))
})
}
Going by the log statements that I have added, done reading and The CSV file was written successfully is printed by before file saved. My understanding was that the end event is called after the data event, so I am unsure of where I am going wrong.
Thank you for reading!
I'm not sure if this is part of the problem or not, but you've got an extra set of parens in this part of the code. Change this:
.on('data', ((row: any) => {
.....
})).on('end', () => {
to this:
.on('data', (row: any) => {
.....
}).on('end', () => {
And, if the event handlers are set up properly, your .on('data', ...) event handler gets called before the .on('end', ....) for the same stream. If you put this:
console.log('at start of data event handler');
as the first line in that event handler, you will see it get called first.
But, your data event handler uses multiple asynchronous calls and nothing you have in your code makes the end event wait for all your processing to be done in the data event handler. So, since that processing takes awhile, it's natural that the end event would occur before you're done running all that asynchronous code on the data event.
In addition, if you ever can have more than one data event (which one normally would), you're going to have multiple data events in flight at the same time and since you're using a fixed filename, they will probably be overwriting each other.
The usual way to solve something like this is to to stream.pause() to pause the readstream at the start of the data event processing and then when all your asynchronous stuff is done, you can then stream.resume() to let it start going again.
You will need to get the right stream in order to pause and resume. You could do something like this:
let stream = fs.createReadStream('filename.csv')
.pipe(csv());
stream.on('data', ((row: any) => {
stream.pause();
....
});
Then, way inside your s3.upload() callback, you can call stream.resume(). You will also need much, much better error handling that you have or things will just get stuck if you get an error.
It also looks like you have other concurrency issues too where you call:
response.pipe(file);
And you then attempt to use the file without actually waiting for that .pipe() operation to be done (which is also asynchronous). Overall, this whole logic really needs a major cleanup. I don't understand what exactly you're trying to do in all the different steps to know how to write a totally clean and simpler version.

Nodejs global variable scope issue

I'm quite new to Nodejs. In the following code I am getting json data from an API.
let data_json = ''; // global variable
app.get('/', (req, res) => {
request('http://my-api.com/data-export.json', (error, response, body) => {
data_json = JSON.parse(body);
console.log( data_json ); // data prints successfully
});
console.log(data_json, 'Data Test - outside request code'); // no data is printed
})
data_json is my global variable and I assign the data returned by the request function. Within that function the json data prints just fine. But I try printing the same data outside the request function and nothing prints out.
What mistake am I making?
Instead of waiting for request to resolve (get data from your API), Node.js will execute the code outside, and it will print nothing because there is still nothing at the moment of execution, and only after node gets data from your api (which will take a few milliseconds) will it execute the code inside the request. This is because nodejs is asynchronous and non-blocking language, meaning it will not block or halt the code until your api returns data, it will just keep going and finish later when it gets the response.
It's a good practice to do all of the data manipulation you want inside the callback function, unfortunately you can't rely on on the structure you have.
Here's an example of your code, just commented out the order of operations:
let data_json = ''; // global variable
app.get('/', (req, res) => {
//NodeJS STARTS executing this code
request('http://my-api.com/data-export.json', (error, response, body) => {
//NodeJS executes this code last, after the data is loaded from the server
data_json = JSON.parse(body);
console.log( data_json );
//You should do all of your data_json manipluation here
//Eg saving stuff to the database, processing data, just usual logic ya know
});
//NodeJS executes this code 2nd, before your server responds with data
//Because it doesn't want to block the entire code until it gets a response
console.log(data_json, 'Data Test - outside request code');
})
So let's say you want to make another request with the data from the first request - you will have to do something like this:
request('https://your-api.com/export-data.json', (err, res, body) => {
request('https://your-api.com/2nd-endpoint.json', (err, res, body) => {
//Process data and repeat
})
})
As you can see, that pattern can become very messy very quickly - this is called a callback hell, so to avoid having a lot of nested requests, there is a syntactic sugar to make this code look far more fancy and maintainable, it's called Async/Await pattern. Here's how it works:
let data_json = ''
app.get('/', async (req,res) => {
try{
let response = await request('https://your-api.com/endpoint')
data_json = response.body
} catch(error) {
//Handle error how you see fit
}
console.log(data_json) //It will work
})
This code does the same thing as the one you have, but the difference is that you can make as many await request(...) as you want one after another, and no nesting.
The only difference is that you have to declare that your function is asynchronous async (req, res) => {...} and that all of the let var = await request(...) need to be nested inside try-catch block. This is so you can catch your errors. You can have all of your requests inside catch block if you think that's necessary.
Hopefully this helped a bit :)
The console.log occurs before your request, check out ways to get asynchronous data: callback, promises or async-await. Nodejs APIs are async(most of them) so outer console.log will be executed before request API call completes.
let data_json = ''; // global variable
app.get('/', (req, res) => {
let pr = new Promise(function(resolve, reject) {
request('http://my-api.com/data-export.json', (error, response, body) => {
if (error) {
reject(error)
} else {
data_json = JSON.parse(body);
console.log(data_json); // data prints successfully
resolve(data_json)
}
});
})
pr.then(function(data) {
// data also will have data_json
// handle response here
console.log(data_json); // data prints successfully
}).catch(function(err) {
// handle error here
})
})
If you don't want to create a promise wrapper, you can use request-promise-native (uses native Promises) created by the Request module team.
Learn callbacks, promises and of course async-await.

Puppeteer (Cluster) closing page when I interact with it

In a NodeJS v10.x.x environment, when trying to create a PDF page from some HTML code, I'm getting a closed page issue every time I try to do something with it (setCacheEnabled, setRequestInterception, etc...):
async (page, data) => {
try {
const {options, urlOrHtml} = data;
const finalOptions = { ...config.puppeteerOptions, ...options };
// Set caching flag (if provided)
const cache = finalOptions.cache;
if (cache != undefined) {
delete finalOptions.cache;
await page.setCacheEnabled(cache); //THIS LINE IS CAUSING THE PAGE TO BE CLOSED
}
// Setup timeout option (if provided)
let requestOptions = {};
const timeout = finalOptions.timeout;
if (timeout != undefined) {
delete finalOptions.timeout;
requestOptions.timeout = timeout;
}
requestOptions.waitUntil = 'networkidle0';
if (urlOrHtml.match(/^http/i)) {
await page.setRequestInterception(true); //THIS LINE IS CAUSING ERROR DUE TO THE PAGE BEING ALREADY CLOSED
page.once('request', request => {
if(finalOptions.method === "POST" && finalOptions.payload !== undefined) {
request.continue({method: 'POST', postData: JSON.stringify(finalOptions.payload)});
}
});
// Request is for a URL, so request it
await page.goto(urlOrHtml, requestOptions);
}
return await page.pdf(finalOptions);
} catch (err) {
logger.info(err);
}
};
I read somewhere that this issue could be caused due to some await missing, but that doesn't look like my case.
I'm not using directly puppeteer, but this library that creates a cluster on top of it and handles processes:
https://github.com/thomasdondorf/puppeteer-cluster
You already gave the solution, but as this is a common problem with the library (I'm the author 🙂) I would like to provide some more insights.
How the task function works
When a job is queued and ready to be executed, puppeteer-cluster will create a page and call the task function (given to cluster.task) with the created page object and the queued data. The cluster then waits until the Promise is finished (fulfilled or rejected) and will close the page and execute the next job in the queue.
As an async-function is implicitly creating a Promise, this means as soon as the async-function given to the cluster.task function is finished, the page is closed. There is no magic happening to determine if the page might be used in the future.
Waiting for asynchronous events
Below is a code sample with a common mistake. The user might want to wait for an external event before closing the page as in the (not working) example below:
Non-working (!) code sample:
await cluster.task(async ({ page, data }) => {
await page.goto('...');
setTimeout(() => { // user is waiting for an asynchronous event
await page.evaluate(/* ... */); // Will throw an error as the page is already closed
}, 1000);
});
In this code, the page is already closed before the asynchronous function is executed. To correct way to do this would be to return a Promise instead.
Working code sample:
await cluster.task(async ({ page, data }) => {
await page.goto('...');
// will wait until the Promise resolves
await new Promise(resolve => {
setTimeout(() => { // user is waiting for an asynchronous event
try {
await page.evalute(/* ... */);
resolve();
} catch (err) {
// handle error
}
}, 1000);
});
});
In this code sample, the task function waits until the inner promise is resolved until it resolves the function. This will keep the page open until the asynchronous function calls resolve. In addition, the code uses a try..catch block as the library is not able to catch events thrown inside asynchronous code blocks.
I got it.
I was indeed forgetting an await to the call that was made to the function I posted.
That call was in another file that I use fot the cluster instance creation:
async function createCluster() {
//We will protect our app with a Cluster that handles all the processes running in our headless browser
const cluster = await Cluster.launch({
concurrency: Cluster[config.cluster.concurrencyModel],
maxConcurrency: config.cluster.maxConcurrency
});
// Event handler to be called in case of problems
cluster.on('taskerror', (err, data) => {
console.log(`Error on cluster task... ${data}: ${err.message}`);
});
// Incoming task for the cluster to handle
await cluster.task(async ({ page, data }) => {
main.postController(page, data); // <-- I WAS MISSING A return await HERE
});
return cluster;
}

Wait for an event to happen before sending HTTP response in NodeJS?

I'm looking for a solution to waiting for an event to happen before sending a HTTP response.
Use Case
The idea is I call a function in one of my routes: zwave.connect("/dev/ttyACM5"); This function return immediately.
But there exists 2 events that notice about if it succeed or fail to connect the device:
zwave.on('driver ready', function(){...});
zwave.on('driver failed', function(){...});
In my route, I would like to know if the device succeed or fail to connect before sending the HTTP response.
My "solution"
When an event happen, I save the event in a database:
zwave.on('driver ready', function(){
//In the database, save the fact the event happened, here it's event "CONNECTED"
});
In my route, execute the connect function and wait for the event to
appear in the database:
router.get('/', function(request, response, next) {
zwave.connect("/dev/ttyACM5");
waitForEvent("CONNECTED", 5, null, function(){
response.redirect(/connected);
});
});
// The function use to wait for the event
waitForEvent: function(eventType, nbCallMax, nbCall, callback){
if(nbCall == null) nbCall = 1;
if(nbCallMax == null) nbCallMax = 1;
// Looking for event to happen (return true if event happened, false otherwise
event = findEventInDataBase(eventType);
if(event){
waitForEvent(eventType, nbCallMax, nbCall, callback);
}else{
setTimeout(waitForEvent(eventType, callback, nbCallMax, (nbCall+1)), 1500);
}
}
I don't think it is a good practice because it iterates calls over the database.
So what are your opinions/suggestions about it?
I've gone ahead and added the asynchronous and control-flow tags to your question because at the core of it, that is what you're asking about. (As an aside, if you're not using ES6 you should be able to translate the code below back to ES5.)
TL;DR
There are a lot of ways to handle async control flow in JavaScript (see also: What is the best control flow module for node.js?). You are looking for a structured way to handle it—likely Promises or the Reactive Extensions for JavaScript (a.k.a RxJS).
Example using a Promise
From MDN:
The Promise object is used for asynchronous computations. A Promise represents a value which may be available now, or in the future, or never.
The async computation in your case is the computation of a boolean value describing the success or failure to connect to the device. To do so, you can wrap the call to connect in a Promise object like so:
const p = new Promise((resolve) => {
// This assumes that the events are mutually exclusive
zwave.connect('/dev/ttyACM5');
zwave.on('driver ready', () => resolve(true));
zwave.on('driver failed', () => resolve(false));
});
Once you have a Promise representing the state of the connection, you can attach functions to its "future" value:
// Inside your route file
const p = /* ... */;
router.get('/', function(request, response, next) {
p.then(successful => {
if (successful) {
response.redirect('/connected');
}
else {
response.redirect('/failure');
}
});
});
You can learn more about Promises on MDN, or by reading one of many other resources on the topic (e.g. You're Missing the Point of Promises).
Have you tried this? From the look of it, your zwave probably have already implemented an EventEmmiter, you just need to attach a listener to it
router.get('/', function(request, response, next) {
zwave.connect("/dev/ttyACM5");
zwave.once('driver ready', function(){
response.redirect(/connected);
});
});
There is a npm sync module also. which is used for synchronize the process of executing the query.
When you want to run parallel queries in synchronous way then node restrict to do that because it never wait for response. and sync module is much perfect for that kind of solution.
Sample code
/*require sync module*/
var Sync = require('sync');
app.get('/',function(req,res,next){
story.find().exec(function(err,data){
var sync_function_data = find_user.sync(null, {name: "sanjeev"});
res.send({story:data,user:sync_function_data});
});
});
/*****sync function defined here *******/
function find_user(req_json, callback) {
process.nextTick(function () {
users.find(req_json,function (err,data)
{
if (!err) {
callback(null, data);
} else {
callback(null, err);
}
});
});
}
reference link: https://www.npmjs.com/package/sync

Resources