Node js read/write concurrency with mongoose/mongodb

I'm developing an API for sending SMS via HTTP requests. I use Node.js and Mongoose, and I've run into a problem similar to what you'd see in a multi-threaded application.
When a user sends an SMS, I check in the database (using Mongoose) how many messages they have already sent; if that number doesn't exceed a limit, the SMS is sent and the count is incremented in the database (the schema keeps the number of messages sent in the current hour, day, week and month). The read, the increment and several other operations in my code are all done through callbacks.
So the problem (I think) is that when a user sends requests very quickly, the different callbacks read the same SMS count, each one authorizes the user to send, and each one increments and saves the same value, so the final count is wrong.
In a multi-threaded application accessing a shared variable, the solution would be to prevent other threads from reading the variable until the current thread has finished its work.
With Node.js's event system and data access through MongoDB, I just don't know how to solve my problem.
Thank you in advance for the answers.
PS: I don't know the solution, but ideally it should also work with clusters, which allow Node.js to use multiple cores.
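For context, a minimal sketch of the read-check-increment flow described above (the model, field and helper names are made up for illustration), showing where two overlapping requests can read the same count:

const User = require('./models/user'); // hypothetical Mongoose model
const HOURLY_LIMIT = 100;               // hypothetical limit

function sendSms(userId, message, callback) {
  User.findById(userId, function (err, user) {
    if (err) return callback(err);
    // Two concurrent requests can both reach this check with the same
    // value of user.sentThisHour before either of them has saved.
    if (user.sentThisHour >= HOURLY_LIMIT) {
      return callback(new Error('limit reached'));
    }
    deliverSms(user.phone, message, function (err) { // assumed SMS gateway helper
      if (err) return callback(err);
      user.sentThisHour += 1; // both requests increment the same stale value
      user.save(callback);    // last write wins, so the count ends up wrong
    });
  });
}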

I think you should try a cache-based approach.
I'm facing the same situation right now.
My plan is to use a cache to store the record_id that is currently being processed.
When a new request comes in, it first checks the cache. If the record_id is already there, it means that record is being used by another task, so the new request has to wait (or do something else) until that task finishes. When the processing is done, its callback removes the record_id from the cache.
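A minimal sketch of that idea, assuming a plain in-process Set as the cache (note that this only works within a single Node.js process):

// In-process "cache" of record ids that are currently being worked on.
const inProgress = new Set();

function processRecord(recordId, work, done) {
  if (inProgress.has(recordId)) {
    // Another task is using this record: retry shortly instead of proceeding.
    return setTimeout(() => processRecord(recordId, work, done), 50);
  }
  inProgress.add(recordId);
  work(recordId, function (err, result) {
    inProgress.delete(recordId); // release the "lock" in the callback
    done(err, result);
  });
}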

Thanks Cristy, I have solved the main part of my problem using an async queue.
My application works well when I run it the default way, as a single Node.js process.
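A minimal sketch of that kind of queue, assuming the async library with a concurrency of 1 (the helper functions are placeholders):

const async = require('async');

// All quota checks + updates go through this queue one task at a time.
const smsQueue = async.queue(function (task, done) {
  verifyAndIncrementQuota(task.userId, function (err, allowed) { // assumed helper
    if (err || !allowed) return done(err || new Error('limit reached'));
    deliverSms(task.userId, task.message, done);                 // assumed helper
  });
}, 1); // concurrency of 1: no two tasks overlap

// From the request handler:
// smsQueue.push({ userId: userId, message: message }, callback);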
But there is another problem. I intend to run my code on a server with 4 cores, so I want to use the Node.js cluster module. When I do, the code runs as 4 separate processes, each with its own queue, and the error I mentioned earlier comes back: the processes read from and write to the database without waiting for one another to finish the verification + update.
So I would like to know what I should do to keep the application both correct and fast.
Should I stop using the cluster module and give up the benefit of a multi-core server (I don't think that's the best answer)?
Should I store the queue in MongoDB (or perhaps keep it in memory rather than persisting it, to make it faster)?
Is there a way to share the queue between processes when I use cluster?
What is my best choice?

Related

Is it possible to force Node.js/Express to process requests sequentially?

I took over a project where the developers were not fully aware of how Node.js works, so they created code accessing MongoDB with Mongoose which would leave inconsistent data in the database whenever you had any concurrent request reaching the same endpoint / modifying the same data. The project uses the Express web framework.
I already instructed them to implement a fix for this (basically, to use Mongoose transaction support with automatically managed retriable transactions), but due to the size of the project they will take a lot of time to fix it.
I need to put this in production ASAP, so I thought I could try to do it if I'm able to guarantee sequential processing of the incoming requests. I'm completely aware that this is a bad thing to do, but it would be just a temporary solution (with a low count of concurrent users) until a proper fix is in place.
So is there any way to make Node.js process incoming requests sequentially? Basically, I don't want code from different requests to run interleaved; putting it another way, I don't want non-blocking operations (.then()/await) to yield to another task, but rather to block until the asynchronous operation ends, so every request is processed entirely before the next one is attended to.
I have an NPM package that can do this: https://www.npmjs.com/package/async-await-queue
Create a queue limited to 1 concurrent task and enclose the code that calls Mongo in wait()/end().
Alternatively, you can use an async mutex; there are a few NPM packages for that as well.
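A minimal sketch of the queue approach, assuming the wait()/end() API mentioned above (route, handler and variable names are illustrative):

const { Queue } = require('async-await-queue');
const dbQueue = new Queue(1); // at most one request touches Mongo at a time

app.post('/endpoint', async (req, res) => {
  const me = Symbol();        // unique ticket identifying this request in the queue
  await dbQueue.wait(me, 0);  // waits until it is this request's turn (0 = default priority)
  try {
    // ... all the Mongoose reads/writes for this request go here ...
    res.sendStatus(200);
  } finally {
    dbQueue.end(me);          // always release the slot, even on errors
  }
});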

nodejs - run a function at a specific time

I'm building a website where users enter some input, and after a specific amount of time an algorithm has to run: it takes the users' input stored in the database, creates some results for them, and stores those results back in the database. The problem is that in Node.js I can't figure out where and how to implement this algorithm so that it runs after a specific amount of time, and only once (every few minutes or seconds).
The app is built with Node.js and Express.
For example, say I start the application; after 3 minutes the algorithm should run, take some data from the database and, once it has created its output, store it in the database again.
What are the typical solutions for that (at least one is enough)? Thank you!
Let's say a user request saves a URL to crawl in order to get the listed products.
One of the simplest approaches would be:
On each user request, insert a record into a "tasks" table in the DB:
userId | urlToCrawl | dateAdded | isProcessing | ....
Then in the main Node.js app you have something like setInterval(findAndProcessNewTasks, 60000),
so every minute (or whatever interval you need) it picks up all tasks that are not currently being worked on (where isProcessing is false).
findAndProcessNewTasks queries the DB and runs your algorithm for every record that has not been processed yet, setting isProcessing to true while it works.
Eventually, once the algorithm has finished, it removes the record from tasks (or marks another field such as "finished" as true), roughly as in the sketch below.
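A minimal sketch of that polling loop, assuming a Mongoose Task model with the fields above (model and helper names are illustrative):

const Task = require('./models/task'); // hypothetical model: userId, urlToCrawl, isProcessing, finished

async function findAndProcessNewTasks() {
  const tasks = await Task.find({ isProcessing: false, finished: false });
  for (const task of tasks) {
    task.isProcessing = true;  // claim the task before starting work
    await task.save();
    await runAlgorithm(task);  // assumed: your crawling/processing code
    task.finished = true;      // or remove the record instead
    await task.save();
  }
}

setInterval(findAndProcessNewTasks, 60000); // check for new tasks every minute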
Depending on the load and the number of tasks, it may make sense to run your algorithm in another Node.js app.
Typically you would have a message bus (Kafka, RabbitMQ, etc.), with the main app just sending events and worker Node.js apps doing the actual job and inserting products into the DB.
This keeps the main app lightweight and allows you to scale the worker apps.
From your question it's not clear whether you want to run the algorithm on the web server (perhaps processing input from multiple users) or on the client (processing the input from a particular user).
If the former, then use setTimeout(), or something similar, in your main JavaScript file that creates the web server listener. Your server can then be handling inputs from users (via the app listener) and in parallel running algorithms that look at the database.
If the latter, then use setTimeout(), or something similar, in the JavaScript code that is loaded into the user's browser.
You may actually need some combination of the above: code running on the server to periodically do some processing on a central database, and code running in each user's browser to periodically refresh the user's display with new data pulled down from the server.
You might also want to implement a WebSocket and JSON-RPC interface between the client and the server. Then, rather than having the client "poll" the server for the results of your algorithm, the client can listen for events arriving on the WebSocket.
Hope that helps!
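A minimal sketch of that push-instead-of-poll idea, assuming Socket.IO on the server (the event name and helper are made up):

const io = require('socket.io')(httpServer); // assumes an existing HTTP server

setTimeout(async () => {
  const results = await runAlgorithm();  // assumed: reads input from the DB, writes results back
  io.emit('resultsReady', { count: results.length }); // push to connected clients instead of polling
}, 3 * 60 * 1000); // e.g. run once, 3 minutes after startup

// On the client:
// socket.on('resultsReady', function () { /* fetch and render the new results */ });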
If I understand you correctly, I would just send the data to the client side while rendering the page and store it in some hidden tag (like input type="hidden"). Then I would run a script on the server side with setTimeout to display the data to the client.

Nodejs - High Traffic to Clustered Microservices Problems

Sorry for the novel...
I'm working on a Node.js project where I need to decrypt millions of envelopes in multiple files. All the APIs of my application have to run on localhost.
The main API handles client requests to decrypt a batch of files. Each file contains thousands to millions of envelopes that need to be decrypted. Each file is considered a job; the Main API queues these jobs and then runs them concurrently by forking a new process for each job (I only allow 5 concurrent jobs/forks at one time). In each process, a script runs through the file and decrypts it.
This runs relatively quickly, but instead of doing the decryption in the code of each process/script forked by the Main API, I want to hand this work off to another API (call it the Decrypt API) that takes an envelope in the request and sends back the decrypted result in the response.
So I created this API and then used 'forky' to cluster it. Then, from my processes, instead of doing the decryption there, I make multiple parallel requests to the Decrypt API and, once I get the responses back, just place the decrypted results in a file.
At first my problem was that I made a request as soon as I got each envelope, without waiting for the previous request to return before sending the next one. I was basically sending "parallel" requests, if you will, and handling the result in the callback of each request. This led to what I think were too many outstanding requests at one time, because I started getting ECONNRESET errors and some requests were dropped. So my solution was to allow a maximum of x outstanding requests (I used 10) at any one time, to avoid too many concurrent requests, roughly as in the sketch below. This seemed OK, but then I realized that since I fork 5 processes from the Main API, and each of them has this new request-limiting code, they still run concurrently, so I was still hitting the problem of too many requests at once to the Decrypt API. Also, this method of using two different microservices/APIs is slower than just having the Main API's forked processes do the decryption. In the Decrypt API I'm also using the Node 'crypto' library, and some of the functions I use are synchronous, so I suspect that's a problem under high traffic, but I can't avoid those sync methods.
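For illustration, a minimal sketch of that kind of outstanding-request cap (the HTTP helper and URL are placeholders; note that each forked process keeps its own counter, so 5 processes can still have up to 50 requests in flight):

const MAX_OUTSTANDING = 10;
let outstanding = 0;
const pending = [];

function requestDecrypt(envelope, callback) {
  if (outstanding >= MAX_OUTSTANDING) {
    // Too many requests in flight: park this one until a slot frees up.
    return pending.push([envelope, callback]);
  }
  outstanding++;
  httpPost('http://localhost:3001/decrypt', envelope, function (err, decrypted) { // assumed helper
    outstanding--;
    if (pending.length > 0) requestDecrypt(...pending.shift());
    callback(err, decrypted);
  });
}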
So finally, my question is: what can I do to increase the throughput of the Decrypt API under high traffic like I described, and what can I do to avoid these dropped requests?
Forgive me if I sound like a noob, but since these APIs all run on the same machine over localhost, could that be why this method is slower than just doing the decryption in each process?
Thanks!

Redis and Node.js and Socket.io Questions

I have just been learning Redis and Node.js. There are two questions I couldn't find any satisfying answer to.
My first question is about reusing Redis clients within Node.js. I have found this question and answer: How to reuse redis connection in socket.io?, but it didn't satisfy me enough.
Now, if I create the Redis client within the connection event, it will be spawned for each connection. So if I have 20k concurrent users, there will be 20k Redis clients.
If I put it outside of the connection event, it will be spawned only once.
The answer there says to create three clients, one for each function, outside of the connection event.
However, from what I know of MySQL, when writing an application that spawns child processes and runs in parallel, you need to create your MySQL client within the function in which you create the child instances. If you create it outside, MySQL will give a "MySQL server has gone away" error, because the child processes will try to use the same connection. It should be created separately for each child process.
So even if you create three different Redis clients for each function, if you have 30k concurrent users who send 2k messages concurrently, you should run into the same problem, right? So every "user" should have their own Redis client within the connection event. Am I right? If not, how do Node.js and Redis handle concurrent requests differently from MySQL? If they have their own mechanism and create something like child processes within the Redis client, why do we need to create three different Redis clients? One should be enough.
I hope the question was clear.
-- UPDATE --
I have found an answer to the question below, here: http://howtonode.org/control-flow
No need to answer it, but my first question is still valid.
-- UPDATE --
My second question is this. I am also not that good at JS and Node.js, so from what I know, if you need to wait for an event, you have to nest the second function within the first one (I don't know the terminology yet). Let me give an example:
socket.on('startGame', function () {
  getUser();
  socket.get('game', function (gameErr, gameId) {
    socket.get('channel', function (channelErr, channel) {
      console.log(user);
      client.get('games:' + channel + '::' + gameId + ':owner', function (err, owner) { // games:channel.32:game.14
        if (owner === user.uid) {
          // do something
        }
      });
    });
  });
});
So, if I am learning this correctly, I need to nest each function inside the previous one whenever I have to wait for an I/O result. Otherwise, Node.js's non-blocking mechanism lets the first function run and fetch the result in parallel, but the second function might not have the result yet if it takes time to arrive. So if you are getting a result from Redis, for example, and you will use that result in a second function, you have to put the second function inside the Redis get callback; otherwise the second function will run without the result.
In that case, if I need to run 7 different functions and an 8th function needs the results of all of them, do I need to write them nested like this? Or am I missing something?
I hope this was clear too.
Thanks a lot,
So, every "user" should have their own redis client within the connection event.
Am I right?
Actually, you are not :)
The thing is that node.js is very unlike, for example, PHP. node.js does not spawn child processes on new connections, which is one of the main reasons it can easily handle large amounts of concurrent connections, including long-lived connections (Comet, Websockets, etc.). node.js processes events sequentially using an event queue within one single process. If you want to use several processes to take advantage of multi-core servers or multiple servers, you will have to do it manually (how to do so is beyond the scope of this question, though).
Therefore, it is a perfectly valid strategy to use one single Redis (or MySQL) connection to serve a large quantity of clients. This avoids the overhead of instantiating and terminating a database connection for each client request.
So, every "user" should have their own redis client within the
connection event. Am I right?
You shouldn't make a new Redis client for each connected user; that's not the proper way to do it. Instead, just create 2-3 clients at most and use them.
For more information, check out this question:
How to reuse redis connection in socket.io?
As for the first question:
The "right answer" might make you think you are fine with a single connection.
In reality, whenever you are doing something that waits on I/O, a timer, etc., node actually puts the waiting callback on the event queue. Hence, if you use only one single connection, you effectively limit the performance of the thread you are working on (a single CPU) to the speed of Redis, which is probably a few hundred callbacks per second (callbacks not waiting on Redis will still proceed). While this is not poor performance, there is no reason to create this kind of limitation. It is recommended to create a few (5-10) connections to avoid this issue entirely. This number goes up for slower databases, e.g. MySQL, but it depends on the type of queries and the specifics of the code.
Do note that, for best performance, you should run a few workers on your server, according to the number of CPUs you have.
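A minimal sketch of such a small pool of clients, assuming the node_redis package (pool size and helper name are illustrative):

const redis = require('redis');

// A handful of shared clients instead of one per connected user.
const POOL_SIZE = 5;
const pool = [];
for (let i = 0; i < POOL_SIZE; i++) {
  pool.push(redis.createClient());
}

let next = 0;
function getRedisClient() {
  // Simple round-robin over the pool; all callers share these few connections.
  const client = pool[next];
  next = (next + 1) % POOL_SIZE;
  return client;
}

// Usage: getRedisClient().get('some:key', function (err, value) { ... });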
Regarding the second question:
It is much better practice to name the functions, one after the other, and refer to them by name in the code rather than defining them inline as you go, as in the sketch below. In some situations this will also reduce memory consumption.
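A minimal sketch of that style, applied to the snippet from the question (function names are illustrative):

function onStartGame() {
  getUser();
  socket.get('game', onGame);
}

function onGame(gameErr, gameId) {
  socket.get('channel', function (channelErr, channel) {
    onChannel(channelErr, channel, gameId);
  });
}

function onChannel(channelErr, channel, gameId) {
  client.get('games:' + channel + '::' + gameId + ':owner', onOwner);
}

function onOwner(err, owner) {
  if (owner === user.uid) {
    // do something
  }
}

socket.on('startGame', onStartGame);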

nodeJS multi node Web server

I need to create a multi-node web server that allows controlling the number of worker nodes in real time and changing the process UID and GUID.
For example, at start the server launches 5 workers and pushes them into a worker pool.
When the server gets a new request, it searches for a free worker, sets the UID or GUID if needed, and hands it the request to process. If there are no free workers, the server creates a new one, sets the GUID or UID, pushes it into the pool as well, and so on.
Can you suggest how this can be implemented?
I've tried this example http://nodejs.ru/385 but it doesn't allow controlling the number of workers, so I decided there must be another solution, but I can't find one.
If you have examples or links that would help me resolve this issue, please write to me.
I guess you are looking for this: http://learnboost.github.com/cluster/
I don't think cluster will do it for you.
What you want is to use one process per request.
Keep in mind that this can be very inefficient, and Node is designed to avoid exactly this kind of per-request worker processing, but if you really must do it, then you must do it.
On the other hand, Node is very good at handling processes, so you need to keep a process pool, which is easily accomplished with Node's built-in child_process.spawn API.
You will also need a way to communicate with the worker process.
I suggest opening a Unix-domain socket and sending the client connection's file descriptor over it, so you can delegate that connection to the new worker.
You will also need to handle edge cases such as timeouts, etc.
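A minimal sketch of delegating a connection to a pooled worker, using Node's built-in handle passing with child_process.fork instead of a hand-rolled Unix-domain socket (the worker file name is illustrative):

// master.js
const { fork } = require('child_process');
const net = require('net');

const pool = [];
for (let i = 0; i < 5; i++) {
  pool.push(fork('./worker.js')); // hypothetical worker script
}

let next = 0;
net.createServer({ pauseOnConnect: true }, function (socket) {
  const worker = pool[next];          // pick a worker (round-robin here; a "free worker"
  next = (next + 1) % pool.length;    // lookup could be used instead)
  worker.send('connection', socket);  // hand the raw socket handle to the worker
}).listen(8000);

// worker.js
// process.setuid() / process.setgid() could be called here before handling requests
process.on('message', function (msg, socket) {
  if (msg !== 'connection') return;
  socket.resume();                    // the socket was paused by the master
  socket.end('handled by worker ' + process.pid + '\n');
});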
I use this: https://github.com/pgte/fugue
