Nodejs response to API request after data processing is done - node.js

In my current project, my nodeJs/express will receive a HTTP request through a route.
Once received, node will then use NightmareJS to perform webscraping and subsequently execute a python script that further process the data.
Lastly, it would then update this data into MongoDB.
Everything takes about 5mins.
What I am trying to achieve is to allow my front-end to somehow receive an acknowledgement that the request was put through. But also receive an update when the above process is completed and the database is updated.
I have looked into using Long polling or socket.io. However, I don't know which one I should use or how. Or should I use rabbitMQ instead? Putting the response that it is complete into the queue while my front-end constantly querying this queue.

Long polling or socket.io are similar, socket.io has Long polling fallback if WS not supported
rabbitMQ is quite different, you cannot use rabbitMQ protocal in browser, so you need a client app, not a web
socket.io is excellent and go well along express, still there are other options, SSE (server send events), firebase. You need to feel them before you choose one, they are not that hard if you follow their official guide
4.some of my opensource might help
https://github.com/postor/sse-notify-suite
https://github.com/postor/node-realtime-db
benefits of each solution
ajax + server cache: simple
long pull: low latency
SSE: low latency, event based
socket.io: low latency, event based, high throug put, double direction, long pull fall back

Related

Alternative to GraphQL long polling on an Express server for a large request?

Objective
I need to show a big table of data in my React web app frontend.
My backend is an Express server with a GraphQL layer and a few "normal" endpoints.
My server gets data from various sources, including an external API, which is the data source for my current task.
My server has a database that I can use freely. I cannot directly access the external API from my front end.
The data all comes from the external API I mentioned. In fact, it comes from multiple similar calls to the same endpoint with many different IDs. Each of those individual calls takes a while to return but doesn't risk timing out.
Current Solution
My naive implementation: I do one GraphQL query in which the resolver does all the API calls to the external service in parallel. It waits on them all to complete using Promise.all(). It then returns a big array containing all the data I need to my server. My server then returns that data to me.
Problem With Current Solution
Unfortunately, this sometimes leaves my frontend hanging for too long and it times out (takes longer than 2 minutes).
Proposed Solution
Is there a better way than manually implementing long polling in GraphQL?
This is my main plan for a solution at the moment:
Frontend sends a request to my server
Server returns a 200 and starts hitting the external API, and sets a flag in the database
Server stores the result of each API call in the database as it completes
Meanwhile, the frontend shows a loading screen and keeps making the same GraphQL query for an entity like MyBigTableData which will tell me how many of the external API calls have returned
When they've all returned, the next time I ask for MyBigTableData, the server will send back all the data.
Question
Is there a better alternative to GraphQL long polling on an Express server for this large request that I have to do?
An alternative that comes to mind is to not use GraphQL and instead use a standard HTTP endpoint, but I'm not sure that really makes much difference.
I also see that HTTP/2 has multiplexing which could be relevant. My server currently runs HTTP/1.1 and upgrading is something of an unknown to me.
I see here that Keep-Alive, which sounds like it could be relevant, is unusable in Safari which is bad as many of my users use Safari to access the frontend.
I can't use WebSockets because of technical restraints. I don't want to set a ridiculously long timeout on my client either (and I'm not sure if it's possible)
I discovered that GraphQL has polling built in https://www.apollographql.com/docs/react/data/queries/#polling
In the end, I made a REST polling system.

How to use socket.io properly with express app

I wonder how do I use socket.io properly with my express app.
I have a REST API written in express/node.js and I want to use socket.io to add real-time feature for my app. Consider that I want to do something I can do just by sending a request to my REST API. What should I do with socket.io? Should I send request to the REST API and send socket.io client the result of the process or handle the whole process within socket.io emitter and then send the result to socket.io client?
Thanks in advance.
Question is not that clear but from what I'm getting from it, is that you want to know what you would use it for that you cant already do with your current API?
The short answer is, well nothing really.. Websockets are just the natural progression of API's and the need for a more 'real-time' interface between systems.
Old methods (and still used and relevant for the right use case) is long polling where you keep checking back to the server for updated items and if so grab them.. This works but it can be expensive in terms of establishing a connection, performing a lookup, then closing a connection.
websockets keep that connection open, allowing both the client and server to communicate real time. So for example, lets say you make an update to your backend data and want users to get that update, using long polling you would rely on each client to ping back to the server, check if there is an update and if so grab it. This can cause lags between updates, some users have updated data while other do not etc.
Now, take the same scenario with websockets, you make an update to the backend data, hit submit, this then emits to your socket server. Socket server takes the call, performs the task ( grabs updated data ) and emits it to the users, each connected user instantly gets that update.
Socket servers are typically used for things like real time chats or polling where packets are smaller but they are also used for web games etc. Depending on the size of your payloads will determine how best to send data back and forth because the larger the payload the more resources / bandwidth it will take on the socket server so its something to consider.

Which one is best for chat app ? Web socket or Send request every 3 seconds

I am making chat app on react-native. I am using socket.io for this but socket.io sometimes not working successful. I would like to change send request to server side every 3 seconds.
I just send request for one chat id
Which one is best? If i use send request in every 3 seconds , will happen any problem from server side
Maybe long polling (not polling, it is different behaviour, with long polling an api call can stay pending until response is available) is an option but WebSocket are far preferable.
Responses are faster, it costs less resources serverside, less bandwidth, you can subscribe to multiple streams and so on.
Here you can valutate some metrics:
Ref: https://blog.feathersjs.com/http-vs-websockets-a-performance-comparison-da2533f13a77
socket.io scales better, and has better performance, than any polling HTTP request mechanism. When working well, it will also have faster response times than 3 seconds - it may not seem long, but actually it may be noticeable to users.
If your chat app is for a low number of users, then a polling mechanism is easier to implement and should work just fine.
If you intend to scale your application to a large number of users, you will need socket.io or a similar subscribe/push mechanism to connected clients.

How are Node.js+Socket.io+MongoDB webapps truly asynchronous?

I have a good old-style LAMP webapp. A week ago I needed to add a push notification mechanism to it.
Therefore, what I did was to add node.js+socket.io on the server and poll the MySQL database every 10 seconds using node.js to check whether there were new items: if so, I would have sent them to the client(s) with socket.io.
I was pretty happy with the result, even if that is not a proper realtime notification (as there is a lag of up to 10 secs).
Now, I am about to build a new webapp which will need push notifications, too. I am wondering whether to go with the same approach as the first one (that I believe is more stable and mature) or to go totally Node.js, without PHP and Apache. As for the database, I have already decided to go for MongoDB.
Finally, my question is: if I go for Node.js+Socket.io+MongoDB will I get a truly near-real-time webapp? I mean, as soon as a new record is inserted into MongoDB, will there be some sort of event triggered that I can catch via node.js, do some checking on it and, if relevant, send the notification to the client? Or will there be anyway some sort of polling on the db server-side and lag, as with my first LAMP webapp?
A related question: can you build a realtime webapp on MySQL without doing any polling as I did with my first app. Or do you need MongoDB (or Redis)?
I hope this question is not too silly - sorry, I am just starting with Node.js and co.
Thanks.
I understand your problem because I switched to node.js from php/apache/mysql too.
Generally node.js is stable, modules and your scripts are the main reasons for errors
Real-time has nothing to do with database, it's all about client and server, you can query as many data as you want in your requests and push it to the other client.
Choosing node.js is very wise but it's harder to implement.
When you insert a new record to your db, the event is the request itself, you will make a push event along with the database query something like:
// Please note this is not real code, just an example of the idea
app.get('/query', function(request, response){
// Query your database
db.query('SELECT * FROM users', function(rows){
// Push notification to dan
socket.emit('database_query_executed', 'to_dan', rows);
// End request
response.end('success');
})
})
Of course you can use MySQL! And any database you want, as I said real-time has nothing to do with databases because the database is in the middle of the process and it's totally optional.
If you want to use node.js for push notifications and php/apache for mysql then you will need to create 2 requests for each server something like:
// this is javascript
ajax('http://node.yoursite.com/push', node_options)
ajax('http://php.yoursite.com/mysql_query', php_options)
or if you want just one request, or you want to use a form, you can call your php and inside php you can create an http or net request to node.js from php, something like:
// this is php
new HttpRequest('http://node.youtsite.com/push', HttpRequest::METH_GET);
Using:
A regular MongoDB Collection as the Store,
A MongoDB Capped Collection with Tailable Cursors as the Queue,
A Node worker with Socket.IO watching the Queue as the Worker,
A Node server to serve the page with the Socket.IO client, and to receive POSTed data (or however else the data gets added) as the Server
It goes like:
The new data gets sent to the Server,
The Server puts the data in the Store,
The Server adds the data's ObjectID to the Queue,
The Queue will send the newly arrived ObjectID to the open Tailable Cursor on the Worker,
The Worker goes and gets the actual data in the ObjectID from the Store,
The Worker emits the data through the socket,
The client receives the data from the socket.
This is 'push' from the initial addition of the data all the way to receipt at the client - no polling, so as real-time as you can get given the processing time at each step.
Re: triggers in MongoDB - please see this answer: https://stackoverflow.com/a/12405093/1651408
There are much more convenient triggers in MySQL, but to call Node.js from them would require a bit of work with MySQL UDFs (user-defined functions), for instance pushing data through a Unix socket. Please note that this is necessary only when other applications (besides your Node.js process) are updating the database, and be sure to choose InnoDB as storage in this case (row- vs. table-level locking).
Can see no big problem with your technology choice of sockets.io, even if client-side web sockets aren't supported, you'll fall back (gracefully, I hope) to polling.
Finally, your question is not silly at all, since push technology is definitely superior to the flood of polling requests - it scales better. EDIT: However, would not describe either technology as real-time.
Another EDIT: for a quite well-known and successful setup of this kind please read this: http://blog.fogcreek.com/the-trello-tech-stack/
Have you discovered Chole? It works separately from your web sever and interfaces with it by using HTTP POSTs. That way you can code your web app any which way you want.
Actually Using Push Technology like Socket.IO helps you to use
the server's resource efficiently and also helps you to leverage old browsers to modern browsers making websocket or websocket-like connection.
10 sec polling is a HTTP request which is expensive especially when a lot of users present.
Unlike polling technology, push technology is relatively cheap. Users' client is opening a dedicated socket(ie. websocket) to listen to the server's push notification.
And usually your client-side JavaScript do some actions when the push notification is received.
Using your LAMP stack and Socket.IO with different port (other than 80) will be good enough to implement what you need.
But using Node.js + MongoDB + Socket.IO actually helps you to manage your server's resource much efficiently.
Because those three have non-blocking nature.
If you understand non-blocking concept correctly and implement your app appropriately,
your identical app, an app with same feature but with different language and different database, would be able to handle a lot more requests than general LAMP stack.
Above picture is a famous chart of comparing Non-blocking vs Thread way to handle concurrency
Apache(Thread) vs Nginx(Non-blocking)
MySQL is a great database. I believe you won't need join and transactions for realtime notification.
MongoDB does not have those two features unless you implement similar features by yourself.
Because of not having those two and some characteristics of its own, MongoDB can store and fetch data much faster than traditional SQL databases.
Switching from MySQL to MongoDB will decrease the time taking to insert and fetch data.
with JS you can open a socket to your server (not old browser), the server will have a ah-hoc program (on an ad-hoc port, so you need the permission to open door and run program on your server) that will send data (almost) realtime from and to the client, and without the HTTP's protocol overhead.old browser will just fall-back to polling mechanism.
I can't see other way to do this (probably there are already "coocked" framework that do this)

RESTful backend and socket.io to sync

Today, i had the idea of the following setup. Create a nodejs server along with express and socket.io. With express, i would create a RESTful API, which is connected to a mongo. BackboneJS or similar would connect the client to that REST API.
Now every time the mongodb(ie the data in it iam interested in) changes, socket.io would fire an event to the client, which would carry a courser to the data which has changed. The client then would trigger the appropriate AJAX requests to the REST to get the new data, where it needs it.
So, the socket.io connection would behave like a synchronize trigger. It would be there for the entire visit and could also manage sessions that way. All the payload would be send over http.
Pros:
REST API for use with other clients than web
Auth could be done entirely over socket.io. Only sending token along with REST requests.
Use the benefits of REST.
Would also play nicely with pub/sub service like Redis'
Cons:
Greater overhead, than using pure socket.io.
What do you think, are there any great disadvantages i did not think of?
I agree with #CharlieKey, you should send the updated data rather than re-requesting.
This is exactly what Tower is doing:
save some data: https://github.com/viatropos/tower/blob/development/src/tower/model/persistence.coffee#L77
insert into mongodb (cursor is a query/persistence abstraction): https://github.com/viatropos/tower/blob/development/src/tower/model/cursor/persistence.coffee#L29
notify sockets: https://github.com/viatropos/tower/blob/development/src/tower/model/cursor/persistence.coffee#L68
emit updated records to client: https://github.com/viatropos/tower/blob/development/src/tower/server/net/connection.coffee#L62
The disadvantage of using sockets as a trigger to re-request with Ajax is that every connected client will have to fetch the data, so if 100 people are on your site there's going to be 100 HTTP requests every time data changes - where you could just reuse the socket connections.
I think that pushing the updated data with the socket.io event would be better than re-requesting the lastest. Even better you could only push the modified pieces of data decreasing the amount of data sent over the line. Overall though a interesting idea.
I'd look into Now.js since it does pretty much exactly what you need.
It creates a namespace which is shared among the client and server. The server can call functions on the client directly and vice versa.
That is if you insist on your current infrastructure decision to use MongoDB and Node.js, otherwise there would be CouchDB which is a full web server and document database with sophisticated replication mechanisms built-in.

Resources