request advice converting project to node.js - node.js

Our company processes machine data. We get machine reports from a socket connection and write the data to mysql tables. Then, other scripts and threads written in python and ruby pull these records out and process the data, saving them back to the database and notify our clients on their machines health.
I want to convert these processes to node.js But, i have to do it kinda incrementally. Instead of doing so much writing back and forth to the database, I would like to hand the data off from our input process to a node server, that will process everything at once.
I suppose i can modify our input scripts to hand off the raw bytes we get from the machines or I could process them as JSON. I had thought about writing the new processes using a node http server, but that seems like some overhead. I have thought websockets may be a better solution, but I am looking for ideas.. thanks

Related

What are ways processing long running queries with Node?

I need to pull a large amount of data from another server and then save to my database ( mongodb). What is the best way to handle processing long running queries. The another consider is that there will be a good number of users of the app. They will intiate the similar requests to pull the data from the remote server and save it to our server.
I have found a various options like queue-worker, cluster, etc . I am new to NodeJs so I am a bit confused. Which one will be the best or is there any other solution I might consider.

Building Websites only on NodeJs and Express blocking requests over http

I have a question regarding the examples out there when using Nodejs, Express and Jade for templates.
All the examples show how to build some sort of a user administrative interface where you can add user profiles, delete them and manage them.
Those are considered beginner's guides to NodeJs. My question is around the fact that if I have have 10 users concurrently accessing the same interface and doing the same operations, surely NodeJs will block the requests for the other users as they are running on the same port.
So let's say I am pulling out a list of users which may be something like 10000. Yes I can do paging, but that is not the point. While I am getting the list from the server another 4 users want to access the application. They have to wait for my process to end. That is my question - how can one avoid that using NodeJS & Express?
I am on this issue for a couple of months! I currently have something in place that does the following:
Run the main processing of stuff on a port
Run a Socket.io process on a different port
Use a sticky session
The idea is that I do a request (like getting a list of items), and immediately respond with some request reference but without the requested items, thus releasing the port.
In the background "asynchronously" I then do the process of getting the items. Upon which when completed, I do an http request from one node to the socket node port node SENDING the items through.
When that is done I then perform a socket.io emit WITH the data and the initial request reference so that the correct user gets the message.
On the client side I have an event listening for the socket which then completes the ajax request by populating the list.
I have SOME success in doing this! It actually works to a degree! I have an issue online which complicates matters due to ip addresses, and socket.io playing funny.
I also have multiple workers using clustering. I use it in the following manner:
I create a master worker
I spawn workers
I take any connection request and pass it to the relevant worker.
I do that for the main node request as well as for the socket requests. Like I said I use 2 ports!
As you can see I have had a lot of work done on this and I am not getting a proper solution!
My question is this - have I gone all around the world 10 times only to have missed something simple? This sounds way to complicated to achieve a non-blocking nodejs only website.
I asked myself - surely all these tutorials would have not missed on something as important as this! But they did!
I have researched, read, and tested a lot of code - this is my very first time I ask anything on stackoverflow!
Thank you for any assistance.
P.S. One example of the same approach is this: I request a report using jasper, I pass parameters, and with the "delayed ajax response" approach as described above I simply release the port, and in the background a very intensive report is being generated (and this can be very intensive process as a lot of calculations are being performed)..! I really don't see a better approach - any help will be super appreciated!
Thank you for taking the time to read!
I'm sorry to say it, but yes, you have been going around the world 10 times only to have been missing something simple.
It's obvious that your previous knowledge/experience with webservers are from a blocking point of view, and if this was the case, your concerns had been valid.
Node.js is a framework focused around using a single thread to execute code, which means if it does any blocking operations, no one else would be able to get anything done.
There are some operations that can do this in node, like reading/writing to disk. However, most node operations will be asynchronous.
I believe you are familiar with the term, so I won't go into details. What asynchronous operations allows node to do, is to keep this single thread idle as much as possible. By idle I mean open for other work. If your code is fully asynchronous, then handling 4 concurrent users (or even 400) shouldn't be a problem, even for a single thread.
Now, in regards to your initial problem of ports: Once a request is received on a given port, node.js execute whatever code you have written for it, until it encounters an asynchronous operation as soon as that happens, it is available to to pick up more requests on the same port.
The second problem you inquire about, is the database operation. In this case, node-js would send the query to the database (which takes no time at all) and the database does that actual execution of the query. In the meantime, node is free to do whatever it wants, until the database is finished, and lets node know there is a result to fetch.
You can recognize async operations by their structure: my_function(..., ..., callback). Function that uses a callback function, is in most cases asynch.
So bottom line: Don't worry about the problems around blocking IO, as you will hardly encounter any in node. Use a single port if you want (By creating multiple child processes, you can even have multiple node instances on the same port).
Hope this explains it good enough. If you have any further questions, let me know :)

Is there a way to run a node task in a child process?

I have a node server, which needs to:
Serve the web pages
Keep querying an external REST API and save data to database and send data to clients for certain updates from REST API.
Task 1 is just a normal node tasks. But I don't know how to implement the task 2. This task won't expose any interface to outside. It's more like a background task.
Can anybody suggest? Thanks.
To make a second node.js app that runs at the same time as your first one, you can just create another node.js app and then run it from your first one using child_process.spawn(). It can regularly query the external REST API and update the database as needed.
The part about "Send data to clients for certain updates from REST API" is not so clear what you're trying to do.
If you're using socket.io to send data to connected browsers, then the browsers have to be connected to your web server which I presume is your first node.js process. To have the second node.js process cause data to be sent through the socket.io connections in the first node.js process, you need some interprocess way to communicate. You can use stdout and stdin via child_process.spawn(), you can use some feature in your database or any of several other IPC methods.
Because querying a REST API and updating a database are both asynchronous operations, they don't take much of the CPU of a node.js process. As such, you don't really have to do these in another node.js process. You could just have a setInterval() in your main node.js process, query the API every once in a while, update the database when results are received and then you can directly access the socket.io connections to send data to clients without having to use a separate process and some sort of IPC mechanism.
Task 1:
Express is good way to accomplish this task.
You can explore:
http://expressjs.com/
Task 2:
If you are done with Expressjs. Then you can write your logic with in Express Framework.
This task then can be done with node module forever. Its a simple tool that runs your background scripts forever. You can use forever to run scripts continuously (whether it is written in node.js or not)
Have a look:
https://github.com/foreverjs/forever

Pass data between multiple NodeJS servers

I am still pretty new to NodeJS and want to know if I am looking at this in the wrong way.
Background:
I am making an app that runs once a week, generates a report, and then emails that out to a list of recipients. My initial reason for using Node was because I have an existing front end already built using angular and I wanted to be able to reuse code in order to simplify maintenance. My main idea was to have 4+ individual node apps running in parallel on our server.
The first app would use node-cron in order to run every Sunday. This would check the database for all scheduled tasks and retrieve the stored parameters for the reports it is running.
The next app is a simple queue that would store the scheduled tasks and pass them to the worker tasks.
The actual pdf generation would be somewhat CPU intensive, so this would be a cluster of n apps that would retrieve and run individual reports from the queue.
When done making the pdf, they would pass to a final email app that would send the file out.
My main concerns are communication between apps. At the moment I am setting up the 3 lower levels (ie. all but the scheduler) on separate ports with express, and opening http requests to them when needed. Is there a better way to handle this? Would the basic 'net' work better than the 'http' package? Is Express even necessary for something like this, or would I be better off running everything as a basic http/net server? So far the only real use I've made of Express is to specifically listen to a path for put requests and to parse the incoming json. I was led to asking here because in tracking logs so far I see every so often the http request is reset, which doesn't appear to affect the data received on the child process, but I still like to avoid errors in my coding.
I think that his kind of decoupling could leverage some sort of stateful priority queue with features like retry on failure, clustering, ...
I've used Kue.js in the past with great sucess, it's redis backed and has nice documentation and interface http://automattic.github.io/kue/

How to keep hold of data on a Node.js server in case of error?

I'm working with Node.js on a serious project for the first time, building a multiplayer card game. I'm using socket.io for websockets. The Node.js server should run 24/7 once the game is live.
Currently, I have a few issues I need to iron out. For clarity, I will split them into two sub-questions below.
First, every time there's an error the whole server crashes. I will try to iron out all bugs before going live of course, but it would still be nice if we could avoid killing the whole server process when one card game's data is corrupted. For example, if I refer to a non-existing element of an object, of course Node throws an error. That's fine, but is there a way to prevent such occurrences from crashing the server (besides making sure that this type of thing doesn't happen in the first place)?
Second, game states are stored on the server during gameplay. So, if there are 3 games in progress, there will be states[1], states[2] and states[3] on the Node server. When the server crashes, I'd like to somehow keep hold of this data so the server could be restarted immediately and the data restored. I think sessions on the Node server are not an option because they will (presumably) die with the server, no? Storing everything to MySQL upon every game action seems like a huge waste of resources (but correct me if I'm wrong).
This all comes down to: how can I make sure the server can function autonomously through any potential problems without crashing and, if it has to crash, how can I make sure it doesn't take all ongoing games with it?
If bad state data is causing your app to crash, wouldn't reloading it after restart just cause another crash?
You should always validate that incoming data is completely valid before storing it, whether in-memory or in a database.
Whenever accessing a property of an object that may not exist, always check for that object's existence.
if (state[n].obj && state[n].obj.prop == val) ...
As far as storing your data out-of-process, your app sounds like a good candidate for a simple key-value store like Redis. If you store all your state data in Redis, it will survive node restarts, and performance will still be similar to in-proc memory. It also gives you the benefit of being able to scale beyond a single node process to handle heavy load.

Resources