Is there a way to connect to a database via sockets and socket.io? - node.js

I am writing an application whereby some external module/component is updating a SQLite database with new data every few hundred milliseconds or so, and my job is to write an application that queries that data and broadcasts it over sockets every few hundred milliseconds as well.
So currently I'm doing something like this with node, express, and socket.io:
timer = setInterval(function() {
db.all('SELECT * FROM cache', function(err, rows) {
io.emit('data', rows);
});
},
400
);
But I feel like there should be a more direct approach to this, whereby I can maintain a socket connection directly to the database, and listen for changes "live", rather than having to do blind queries (even if the data may not have changed), and emit.
Maybe this is not supported by SQLite (which is fine, I think I have some flexibility in the storage system I'm using), but is what I'm asking at all possible?
Note that I don't have control over the database updating process, so I can't just emit the data I'm about to store in the database. That whole process is a black box C program and I ONLY have access to the database itself.

What you're looking for is commonly called pub/sub (short for publish and subscribe). Clients waiting for data connect to a server and subscribe to the sort of events they want to receive. The data originators also connect to this server and publish events. The RPC with events that Socket.IO gives you are really similar to this. The clients have set up handlers for certain types of events, and the server fires these events with the appropriate data.
The problem is, pub/sub isn't typically implemented in a database. (Redis is an exception.) SQLite certainly has no capability for this. Since you can't modify the original application and only have access to the file database, there is nothing you can do. What you need is to effectively make your server an adapter from polling the database to broadcasting messages.
I do see a problem though with your setup. The first is that you are querying the database every 400 milliseconds. Don't do that. What if your query takes 500 milliseconds? Now you have a second query piling up. What if those two queries are now slow because they are both attempting to run at the same time? Now you have 3, 4, 5, and then 100 queries piling up. Don't schedule your next query to run until one is done. Check out an implementation of throttle for this.
The next problem is that you are blindly sending out all of the results to the client every time. I don't know what your application does, but I'm guessing that there is a chance for overlap from the previous query. Does your database have columns with timestamps? You could modify your query to use them. Or, modify your application to filter them.

Related

Server constantly running a function to update a cache, will it block all other server functions?

About once a minute, I need to cache all orderbooks from various cryptocurrency exchanges. There are hundreds of orderbooks, so this update function will likely never stop running.
My question is: If my server is constantly running this orderbook update function, will it block all other server functionality? Will users ever be able to interact with my server?
Do I need to create a separate service to perform the updating, or can Node somehow prioritize API requests and pause the caching function?
My question is: If my server is constantly running this orderbook
update function, will it block all other server functionality? Will
users ever be able to interact with my server?
If you are writing asynchronously, these actions will go into your eventloop and your node server would pick next event from eventloop while these actions are being performed. If you have too many events like this, your event queue would be long and user would face really slow response or may even get a timeout
Do I need to create a separate service to perform the updating, or can
Node somehow prioritize API requests and pause the caching function?
Node only consumes event from the event queue. There are no priorities.
From the design perspective, you should look for options which can reduce this write load like bulkCreate/edit or if you are using redis for cache, consider redis pipeline
This is a very open ended question much of which depends on your system. In general your server should be able to handle concurrent requests, but there are some things to watch out for.
Performance costs. If the operation to retrieve and store data requires too much computational power, then it will cause strain on all requests processed by the server.
Database connections. The server spends a lot of time waiting for database queries to complete. If you have one database connection for the entire application, and this connection is busy, they will have to wait until the database connection is free. You may want to look into database connection 'pooling'.

nodejs - run a function at a specific time

I'm building a website that some users will enter and after a specific amount of time an algorithm has to run in order to take the input of the users that is stored in the database and create some results for them storing the results also in the database. The problem is that in nodejs i cant figure out where and how should i implement this algorithm in order to run after a specific amount of time and only once(every few minutes or seconds).
The app is builded in nodejs-expressjs.
For example lets say that i start the application and after 3 minutes the algorithm should run and take some data from the database and after the algorithm has created some output stores it in database again.
What are the typical solutions for that (at least one is enough). thank you!
Let say you have a user request that saves url to crawl and get listed products
So one of the simplest ways would be to:
On user requests create in DB "tasks" table
userId | urlToCrawl | dateAdded | isProcessing | ....
Then in node main site you have some setInterval(findAndProcessNewTasks, 60000)
so it will get all tasks that are not currently in work (where isProcessing is false)
every 1 min or whatever interval you need
findAndProcessNewTasks
will query db and run your algorithm for every record that is not processed yet
also it will set isProcessing to true
eventually once algorithm is finished it will remove the record from tasks (or mark some another field like "finished" as true)
Depending on load and number of tasks it may make sense to process your algorithm in another node app
Typically you would have a message bus (Kafka, rabbitmq etc.) with main app just sending events and worker node.js apps doing actual job and inserting products into db
this would make main app lightweight and allow scaling worker apps
From your question it's not clear whether you want to run the algorithm on the web server (perhaps processing input from multiple users) or on the client (processing the input from a particular user).
If the former, then use setTimeout(), or something similar, in your main javascript file that creates the web server listener. Your server can then be handling inputs from users (via the app listener) and in parallel running algorithms that look at the database.
If the latter, then use setTimeout(), or something similar, in the javascript code that is being loaded into the user's browser.
You may actually need some combination of the above: code running on the server to periodically do some processing on a central database, and code running in each user's browser to periodically refresh the user's display with new data pulled down from the server.
You might also want to implement a websocket and json rpc interface between the client and the server. Then, rather than having the client "poll" the server for the results of your algorithm, you can have the client listen for events arriving on the websocket.
Hope that helps!
If I understand you correctly - I would just send the data to the client-side while rendering the page and store it into some hidden tag (like input type="hidden"). Then I would run a script on the server-side with setTimeout to display the data to the client.

Cancel a running query

I have an application where users are running a geospatial query against a mongo database. The query can return many thousands of results (~50k). These results are then streamed to the client over a websocket. However, users can abort a request mid result set and execute a new query. Users will frequently start, abort, and re-start requests on the order of several times per minute. Sometimes they even cancel/restart every couple of seconds.
The question is, when a user aborts a request, how do I cancel the query on the server so it doesn't continue to tie up resources streaming back thousands of unneeded results? I'm currently calling destroy() on the cursor, but it's not clear that this is actually stopping the query from executing on the server.
What's the best practice in this case?
Have you tried this?
db.currentOp()
db.killOp(IDRETURNEDHE)
This is a good example.
The answer is it depends upon a lot of your implementation details.
If your server is in the middle of streaming results (e.g. still hasn't sent or queued everything) when the server receives some sort of other message that the previous results should be cancelled, then it is possible for you to communicate with that other stream and tell it to stop sending. How exactly you would do that depends entirely upon your code and you would have to show us your code for us to know.
Chances are the db query is long since complete and what is going on is the server is in the process of streaming results to the client. So, if that's the case, then it isn't the db you're looking for, it's the code that streams the response to the client. Since node.js JS is single threaded, the only time another request would actually get run on the server would be while the streaming code was in some async write operation, waiting for that to finish. You would probably have to set some flag that was uniquely associated with a particular user and then your stream code would have to check for that flag before each chunk of data was sent. If it saw the cancel flag, it could abandon sending the rest of the results.
You could make things more cancellable by explicitly chunking your results (say 500 at a time) and checking for a cancel flag between the sending of each chunk.
If, on the other hand, all the data has already been buffered up by the TCP layer on the server, then the only way to stop that from being sent is to tear down the webSocket and force the client to reconnect.

How can I "break up" a long running server side function in a Meteor app?

I have, as part of a meteor application, a server side that gets POST messages of information to feed to the web client via inserts/updates to a Collection. So far so good. However, sometimes these updates can be rather large (50K records a go, every 5 seconds). I was having a hard time keeping up to this until I started using batch-insert package and then low-level batch.find.update() and batch.execute() from Mongo.
However, there is still a good amount of processing going on even with 50K records (it does some calculations, analytics, etc). I would LOVE to be able to "thread" that logic so the main event loop can continue along. However, I am not sure there is a real easy way to create "real" threads for this within Meteor. So baring that, I would like to know the best / proper way of at least "batching" the work so that every N (say 1K or so) records I can release the event loop back to process other events (like some client side DDP messages and the like). Then do another 1K records, etc. until however many records as I need are done.
I am THINKING the solution lies within using Fibers/Futures -- which appear to be the Meteor way -- but I am not positive that is correct or the low level ideas like "setTimeout()" and/or "setImmediate()" are more appropriate.
TIA!
Meteor is not a one size fits all tool. I think you should decouple your meteor application from your batch processing. Set up a separate meteor instance, or better yet set up a pure node.js server to handle these requests and batch processes. It would look like this:
Create a node.js instance that connects to the same mongo database using the mongodb plugin (https://www.npmjs.com/package/mongodb).
Use express if you're using node.js to handle the post requests (https://www.npmjs.com/package/express).
Do the batch processing/inserts/updates in this instance.
The updates in mongo will be reflected in meteor very quickly. I had a similar situation and used a node server to do some batch data collection and then pass it into a cassandra database. I then used pig latin to run some batch operations on that data, and then inserted it into mongo. My meteor application would reactively display the new data pretty much instantaneously.
You can call this.unblock() inside a server method to allow the code to run in the background, and immediately return from the method. See example below.
Meteor.methods({
longMethod: function() {
this.unblock();
Meteor._sleepForMs(1000 * 60 * 60);
}
});

Redis and Node.js and Socket.io Questions

I have been just learning redis and node.js There are two questions I have for which I couldn't find any satisfying answer.
My first question is about reusing redis clients within the node.js. I have found this question and answer: How to reuse redis connection in socket.io? , but it didn't satisfy me enough.
Now, if I create the redis client within the connection event, it will be spawned for each connection. So, if I have 20k concurrent users, there will be 20k redis clients.
If I put it outside of the connection event, it will be spawned only once.
The answer is saying that he creates three clients for each function, outside of the connection event.
However, from what I know MySQL that when writing an application which spawns child processes and runs in parallel, you need to create your MySQL client within the function in which you are creating child instances. If you create it outside of it, MySQL will give an error of "MySQL server has gone away" as child processes will try to use the same connection. It should be created for each child processes separately.
So, even if you create three different redis clients for each function, if you have 30k concurrent users who send 2k messages concurrently, you should run into the same problem, right? So, every "user" should have their own redis client within the connection event. Am I right? If not, how node.js or redis handles concurrent requests, differently than MySQL? If it has its own mechanism and creates something like child processes within the redis client, why we need to create three different redis clients then? One should be enough.
I hope the question was clear.
-- UPDATE --
I have found an answer for the following question. http://howtonode.org/control-flow
No need to answer but my first question is still valid.
-- UPDATE --
My second question is this. I am also not that good at JS and Node.js. So, from what I know, if you need to wait for an event, you need to encapsulate the second function within the first function. (I don't know the terminology yet). Let me give an example;
socket.on('startGame', function() {
getUser();
socket.get('game', function (gameErr, gameId) {
socket.get('channel', function (channelErr, channel) {
console.log(user);
client.get('games:' + channel + '::' + gameId + ':owner', function (err, owner) { //games:channel.32:game.14
if(owner === user.uid) {
//do something
}
});
}
});
});
So, if I am learning it correctly, I need to run every function within the function if I need to wait I/O answer. Otherwise, node.js's non-blocking mechanism will allow the first function to run, in this case it will get the result in parallel, but the second function might not have the result if it takes time to get. So, if you are getting a result from redis for example, and you will use the result within the second function, you have to encapsulate it within the redis get function. Otherwise second function will run without getting the result.
So, in this case, if I need to run 7 different functions and the 8. function will need the result of all of them, do I need to write them like this, recursively? Or am I missing something.
I hope this was clear too.
Thanks a lot,
So, every "user" should have their own redis client within the connection event.
Am I right?
Actually, you are not :)
The thing is that node.js is very unlike, for example, PHP. node.js does not spawn child processes on new connections, which is one of the main reasons it can easily handle large amounts of concurrent connections, including long-lived connections (Comet, Websockets, etc.). node.js processes events sequentially using an event queue within one single process. If you want to use several processes to take advantage of multi-core servers or multiple servers, you will have to do it manually (how to do so is beyond the scope of this question, though).
Therefore, it is a perfectly valid strategy to use one single Redis (or MySQL) connection to serve a large quantity of clients. This avoids the overhead of instantiating and terminating a database connection for each client request.
So, every "user" should have their own redis client within the
connection event. Am I right?
You shouldn't make a new Redis client for each connected user, that's not the proper way to do it. Instead just create 2-3 clients max and use them.
For more information checkout this question:
How to reuse redis connection in socket.io?
As for the first question:
The "right answer" might make you think you are good with one Connection.
In reality, whenever you are doing something that is waiting on an IO, a timer, etc, you are actually making node run the waiting method on the queue. Hence, if you use only 1 single connection, you will actually limit the performance of the thread you working on ( a single CPU) to the speed of redis - which is probably a few hundreds of callbacks per second (non-redis waiting callbacks will still go on) - while this is not poor performance, there's no reason to create this kind of limitation. It is recommended to create a few (5-10) connections to avoid this issue in it's entire. This number goes up for slower databases, e.g. MySQL, but is dependant on the type of queries and the code specifics.
Do note, that you should run a few workers on your server, per the number of CPUs you have, for best performance.
In regards to the 2nd Question:
It is a much better practice, to name the functions, one after the other, and use the names in the code rather than defining it as you go. In some situations, it will reduce memory consumption.

Resources