How to emit events properly? - node.js

I'm learning how to emit events using the Node.js events module, but I'm struggling with a workflow question.
I'd like to listen for new posts created on a specific website using web scraping, as the site doesn't provide any other way.
For now, all I could find was to scrape in a loop every X minutes and emit an event once it notices a difference. But that comes with some inconveniences:
How do I do it in a loop without blocking the whole script?
It means I need to wait before continuing in the loop.
So, how should you listen to events such as a new post? Is looking for differences at regular intervals a good practice?

The scraping is not blocking the whole script: the loading itself is asynchronous; only the parsing part is synchronous.
If the scraping action returns a promise, you can run several promises in parallel with Promise.all. You could also run the scraping with setInterval; that way it will be started at regular intervals even if some of the scrapings are slow or even failing.
If you really don't want to block the event loop at all, you can run a worker for each scraping process.
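A minimal sketch of the polling approach, assuming a hypothetical async function `fetchPostIds` that scrapes the site and resolves to an array of post ids (the class name `PostWatcher` and the event names `post` and `scrapeError` are made up for this example too). It skips a tick when the previous scrape is still running, so slow scrapes don't pile up:

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical watcher: `fetchPostIds` is an assumed async function that
// scrapes the site and resolves to an array of post ids.
class PostWatcher extends EventEmitter {
  constructor(fetchPostIds) {
    super();
    this.fetchPostIds = fetchPostIds;
    this.seen = new Set(); // ids that have already been reported
    this.busy = false;     // true while a scrape is in flight
    this.timer = null;
  }

  // Run one scrape and emit 'post' for every id not seen before.
  async check() {
    if (this.busy) return; // previous scrape still running, skip this tick
    this.busy = true;
    try {
      const ids = await this.fetchPostIds();
      for (const id of ids) {
        if (!this.seen.has(id)) {
          this.seen.add(id);
          this.emit('post', id);
        }
      }
    } catch (err) {
      // A failed scrape is reported, and the next tick simply tries again.
      this.emit('scrapeError', err);
    } finally {
      this.busy = false;
    }
  }

  // Start polling every `intervalMs` milliseconds.
  start(intervalMs) {
    this.timer = setInterval(() => this.check(), intervalMs);
  }

  stop() {
    clearInterval(this.timer);
  }
}
```

Something like watcher.on('post', handler) followed by watcher.start(5 * 60 * 1000) would then poll every five minutes. Note that the very first check reports every post it finds as new, since nothing has been seen yet.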

Related

When event loop starts?

I’ve recently started to figure out what the event loop really is, and it confused me a lot; it seems like I don’t know how Node.js works.
I mean, when the program starts and gets loaded into memory, what’s next?
I can’t see a place inside the event loop where all the synchronous code executes (like for/while loops that compute something). Doesn’t that mean that V8 executes the JavaScript and starts the event loop when needed?
If anybody can help and explain how the Node.js runtime functions at a high level, that would be really great.
I highly recommend reading this Asynchrony: Now & Later
and I'll quote some things that I've once read.
........
The JS engine knows nothing about code being asynchronous. It only executes one piece of code at a time until it finishes, no more, no less.
The JS host environment is the one that implements the event-loop concept: code that doesn't need to run now (but in the future) waits for something to finish processing (imagine a network call or I/O call) and then gets called (it is added to the event queue of the event loop and executed at a later tick).
At program start (I'm not 100% sure, but I think) all the code is added to the event queue (that is how the event loop is implemented) and it's processed first in, first out (FIFO), which means the earlier the code, the sooner it is executed. While it is running, if some code needs to be deferred, like a setTimeout, an I/O operation or an Ajax call (all of which need time), it's up to them to use, for example, a callback (the callback is added to the event queue when the operation completes), and it's the event loop's responsibility to execute these callbacks, in the order in which they arrived, at a future tick.
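A small sketch of that ordering, using only standard timers and promises (the `order` array is just for illustration). The synchronous top-level code runs to completion first; the callbacks queued along the way run only afterwards:

```javascript
// Records the order in which each piece of code actually runs.
const order = [];

order.push('sync start');

// Timer callback: queued by the host environment, run on a later loop turn.
setTimeout(() => order.push('timeout'), 0);

// Promise callback: a microtask, run right after the current script finishes.
Promise.resolve().then(() => order.push('promise'));

order.push('sync end');

// Print the final order once everything above has had a chance to run.
setTimeout(() => console.log(order.join(' -> ')), 10);
// prints "sync start -> sync end -> promise -> timeout"
```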

Can the same line of Node.js code run at the same time?

I'm pretty new to node and am trying to set up an express server. The server will handle various requests and, if any of them fail, call a common failure function that will send an e-mail. If I were doing this in something like Java, I'd likely use something like a synchronized block and a boolean to allow only the first entrance into the code to send the mail.
Is there anything like a synchronized block in Node? I believe node is single threaded and has a few helper threads to handle asynchronous/callback code. Is it at all possible that the same line of code could run at exactly the same time in Node?
Thanks!
Can the same line of Node.js code run at the same time? Is it at all possible that the same line of code could run at exactly the same time in Node?
No, it is not. Your Javascript in node.js is entirely single threaded. An event is pulled from the event queue. That calls a callback associated with that event. That callback runs until it returns. No other events can be processed until that first one returns. When it returns, the interpreter pulls the next event from the event queue and then calls the callback associated with it.
This does not mean that there are not concurrency issues in node.js. There can be, but it is caused not by code running at the same physical time and creating conflicting access to shared variables (like can happen in threaded languages like Java). Concurrency issues can be caused by the asynchronous nature of I/O in node.js. In the asynchronous case, you call an asynchronous function, pass it a callback (or expect a promise in return). Your code then continues on and returns to the interpreter. Some time later an event will occur inside of node.js native code that will add something to the event queue. When the interpreter is free from running other Javascript, it will process that event and then call your callback which will cause more of your code to run.
While all this is "in process", other events are free to run and other parts of your Javascript can run. So, the exposure to concurrency issues comes, not from simultaneous running of two pieces of code, but from one piece of your code running while another piece of your code is waiting for a callback to occur. Both pieces of code are "in process". They are not "running" at the same time, but both operations are waiting for something else to occur in order to complete. If these two operations access variables in ways that can conflict with each other, then you can have a concurrency issue in Javascript. Because none of this is pre-emptive (like threads in Java), it's all very predictable and is much easier to design for.
Is there anything like a synchronized block in Node?
No, there is not. It is simply not needed in your Javascript code. If you wanted to protect something from some other asynchronous operation modifying it while your own asynchronous operation was waiting to complete, you can use simple flags or variables in your code. Because there's no preemption, a simple flag will work just fine.
I believe node is single threaded and has a few helper threads to handle asynchronous/callback code.
Node.js runs your Javascript as single threaded. Internally in its own native code, it does use threads in order to do its work. For example, the asynchronous file system access code internal to node.js uses threads for disk I/O. But these threads are only internal and the result of these threads is not to call Javascript directly, but to insert events in the event queue and all your Javascript is serialized through the event queue. Pull event from event queue, run callback associated with the event. Wait for that callback to return. Pull next event from the event queue, repeat...
The server will handle various requests and if any of them fail call a common failure function that will send e-mail. If I was doing this in something like Java I'd likely use something like a synchronized block and a boolean to allow the first entrance into the code to send the mail.
We'd really have to see what your code looks like to understand what exact problem you're trying to solve. I'd guess that you can just use a simple boolean (regular variable) in node.js, but we'd have to see your code to really understand what you're doing.
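Without seeing the real code, here is only a rough sketch of the "simple boolean" idea. `createFailureNotifier` and `sendMail` are hypothetical names; `sendMail` stands for whatever async function actually sends the e-mail. The important detail is that the flag is set before the first await, so any other request that fails while the mail is still being sent sees it:

```javascript
// Hypothetical helper: `sendMail` is an assumed async function supplied by the app.
function createFailureNotifier(sendMail) {
  let alreadySent = false; // plain variable, no lock needed

  return async function notifyFailure(err) {
    if (alreadySent) return false; // someone else already handled it
    alreadySent = true;            // set BEFORE awaiting anything
    await sendMail(`Request failed: ${err.message}`);
    return true;
  };
}
```

Because the check and the assignment happen in the same synchronous stretch of code, no other callback can run between them.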

Javascript background processing without web worker?

I'm fairly sure of the answer to this, but just to have your opinion:
Is it somehow possible to perform operations in the background with Javascript if web workers are not available?
Is there perhaps a way to "misuse" the asynchronous setTimeout() function or some other mechanisms?
My goal is to read things from the localStorage, do a few transformations and send them via Ajax.
Thanks.
You can't run an operation in the background, but you can split it into small chunks and run each next part with setTimeout. As a result, the browser will have time to render changes and will stay responsive to normal actions while your long process is executed as well.
function iteration() {
    do_some_small_amount_of_work();
    // not_finished: flag that stays true while there is work left
    if (not_finished)
        setTimeout(iteration, 1);
}
There is not really multithreading in Javascript, but you can run code asynchronously using setTimeout(code, 0). This queues the function for execution.
Without using web workers, what you've suggested (using setTimeout) is the only way to do it, and of course it's not really "background" at all at that point. Note that you'll need to keep the processing quite short each time you fire the "background" code, because it's not really background code at all; while your code is running, the page will be fairly unresponsive (the degree to which it's unresponsive will vary by browser, but certainly any JavaScript code on the page will have to wait for your function call to finish).
No, there is no way to do anything in the background in Javascript. It's strictly single threaded.
You can use setTimeout or setInterval to do the work in the foreground, but just a small part of it each time. That way the interface is still reasonably responsive as it handles events between your bursts of work.
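As a rough, concrete version of this chunking idea (the helper name `processInChunks` and its parameters are made up for the example), the work is split over an array of items and control goes back to the event loop between chunks:

```javascript
// Process `items` a few at a time, yielding to the event loop between chunks.
// handleItem: called once per item; onDone: optional, called after the last chunk.
function processInChunks(items, chunkSize, handleItem, onDone) {
  let index = 0;

  function runChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handleItem(items[index]);
    }
    if (index < items.length) {
      setTimeout(runChunk, 0); // let pending events run, then continue
    } else if (onDone) {
      onDone();
    }
  }

  setTimeout(runChunk, 0); // even the first chunk is deferred
}
```

For the localStorage case, the items could be the stored keys, with handleItem doing the transformation and onDone sending the result via Ajax. Each chunk still runs in the foreground, so chunkSize should stay small.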

Node.js background processing

I'm new to node.js, so please forgive what probably is a naive question :) My question is what is the best way to setup a non-UI job written in node? The task I've created is used to crawl some web content based upon an Azure queue (the queue message tells the job which content to crawl). All of the examples I see around node are more UI and request based, using http.createServer and listening on a specific port. While I can make this work, this doesn't seem right, it seems like I just need to create some sort of javascript setInterval loop (or something similar) that keeps looking at my queue. Any suggestions or examples that would push me in the right direction would be greatly appreciated.
Chris
I'm not really clear on what you're trying to do, but node doesn't depend on the http stack at all. If you just want to start node and have it process something, that is pretty straightforward. Your app.js could be as simple as:
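var queueWorker = require('worker');

var startWorker = function() {
    if (queueWorker.hasWork()) {
        // process the queue, then come straight back to check for more
        queueWorker.processQueue(startWorker);
    } else {
        // nothing to do: look again in one second
        setTimeout(startWorker, 1000);
    }
};

startWorker();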
What this is doing is setting up a worker loop where every second it will check to see if there is new work, and if there is start processing it. Once it is done processing the work, go back to the 1 second interval checking for new work.
You would have to create the worker module as the check for hasWork and the processing of said work is application dependent.
If you wanted to get a little more fancy, processQueue could spawn a new node process which is only responsible for actually processing the work. You could then keep track of the number of spawned workers versus CPU limitations and have a relatively simple node app which processes data in multiple processes, and therefore on multiple cores.
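One way the "number of workers versus CPU limitations" bookkeeping could look is sketched below. `createLimiter` is a hypothetical helper; each job is any function returning a promise, and in a real setup it would presumably wrap something like child_process.fork for one queue message. That part is left out here because it depends on the worker script:

```javascript
const os = require('node:os');

// Returns run(job): starts jobs immediately while fewer than `maxActive`
// are in flight, and queues the rest until a slot frees up.
function createLimiter(maxActive = Math.max(1, os.cpus().length)) {
  let active = 0;
  const waiting = [];

  function next() {
    if (active >= maxActive || waiting.length === 0) return;
    active++;
    const { job, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(job)              // start the job (it may return a promise)
      .then(resolve, reject)  // hand the outcome back to the caller
      .finally(() => {
        active--;             // free the slot and start the next job
        next();
      });
  }

  return function run(job) {
    return new Promise((resolve, reject) => {
      waiting.push({ job, resolve, reject });
      next();
    });
  };
}
```

Defaulting the limit to the CPU count is only a starting point; crawling is usually I/O bound, so a different limit may suit it better.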

Do slow function calls inside response.write() block the event loop?

So, after reading a little about non-blocking code, does...
response.write(thisWillTakeALongTime());
...block the process? If so, do we need to pass the response into pretty much every slow function call we make, and have that function handle the response?
Thanks for helping to clarify!
Yes, it will block the event loop. Passing the response object into the slow function won't help: no matter where you call the slow function, you will be blocking the event loop.
As to how to fix it, we will need more information.
What is making your slow function slow? Are you performing large calculations?
Are you using the sync versions of file/database calls?
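To see the blocking for yourself, here is a small sketch. `busyWait` is a made-up stand-in for thisWillTakeALongTime(), and `measureTimerDelay` is a hypothetical helper that reports how late a 0 ms timer fires when synchronous work runs right after it was scheduled:

```javascript
// Stand-in for thisWillTakeALongTime(): burns CPU for `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else can run while this loop is executing
  }
  return 'done';
}

// Resolves with how long a 0 ms timer took to fire when `work` ran synchronously
// right after the timer was scheduled.
function measureTimerDelay(work) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    work(); // the timer callback cannot run until this returns
  });
}
```

Calling measureTimerDelay(() => busyWait(200)) should resolve with roughly 200 or more, which is the same delay every other request would see while response.write(thisWillTakeALongTime()) is evaluating its argument.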
It depends on what you mean by process. The web server has already finished serving the page at this point, and your JS will then execute. However, the request is synchronous, so the JavaScript will keep devoting its time to your function until it returns, even if that takes years. (Hopefully by this point the browser will detect that your script is taking too long and give you the opportunity to kill it.) Even then, you suffer the embarrassment of the user having to kill your JavaScript functionality and not being able to use the page.
So how do you solve the problem? This becomes particularly important when your JS is making the request, because at that point a whole host of things can go wrong. Imagine that your user is on the other side of the earth: the network latency could make your JS painfully slow. When using Ajax, it's preferable to use asynchronous requests, which get around this. I personally recommend using jQuery, as it makes async Ajax calls really easy and the documentation on the site is quite straightforward. The other thing I recommend is keeping the returned output small. It may be better to return JSON output and build the needed HTML from that.
