Concurrent requests overwriting data in Redis - node.js

Scenario: Whenever a request comes in, I need to connect to the Redis instance, open the connection, fetch the count, update the count, and close the connection (this is the flow for every request). When the requests come in sequential order, i.e. 1 user sending 100 requests one after the other, the count in Redis is 100.
Issue: The problem is with concurrent requests, i.e. 10 users sending 100 requests in total (10 requests each) concurrently; then the count is not 100 but around 50.
Example: Assume the count in Redis is 0. If 10 requests arrive at the same time, then 10 connections are opened, and all 10 connections fetch the count as 0 and update it to 1.
Analysis: I found out that, because the requests come in concurrently, multiple connections fetch the same count value and update it, so the count value gets overwritten. Can anyone suggest the best way to avoid this problem if you have already encountered it?
Here we are using Hapijs, Redis 3.0 and ioredis.

I would recommend queueing each task so that each request finishes before the next one starts.
Queue.js is a good library I have used before, but you can check out others if you want.
Here is an example, basically from the docs but adapted slightly for your use case:
var queue = require('queue')

var q = queue({ concurrency: 1 }) // run one task at a time
var results = []
var rateLimited = false

q.push(function (cb) {
  if (!rateLimited) {
    // fetch the count from Redis, update it, then record the outcome here
    results.push('two')
  }
  cb()
})

q.start(function (err) {
  if (err) throw err
  console.log('all done:', results)
})
This is a very loose example as I just wrote it quickly and without seeing your code base, but I hope you get the idea.
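To connect this to the Redis counter in your question, here is a rough sketch of how the per-request work could be pushed onto such a queue with ioredis. The key name request:count and the handleRequest wrapper are illustrative, not taken from your code:

// Sketch only: serializes the read-modify-write on the counter within this Node process.
var Redis = require('ioredis')
var queue = require('queue')

var redis = new Redis() // one shared connection for the process
var q = queue({ concurrency: 1, autostart: true }) // tasks run strictly one at a time

function handleRequest () {
  q.push(function (cb) {
    redis.get('request:count')
      .then(function (value) {
        return redis.set('request:count', Number(value || 0) + 1)
      })
      .then(function () { cb() })
      .catch(cb)
  })
}

Since the counter already lives in Redis, it is also worth knowing that Redis's INCR command is atomic, which closes the read-modify-write gap even across multiple Node processes.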

Related

Batch requests and concurrent processing

I have a service in NodeJS which fetches user details from the DB and sends them to another application via HTTP. There can be millions of user records, so processing them one by one is very slow. I have implemented concurrent processing for this like this:
const userIds = [1, 2, 3, ...];
const users$ = from(this.getUsersFromDB(userIds));
const concurrency = 150;

users$.pipe(
  switchMap((users) =>
    from(users).pipe(
      mergeMap((user) => from(this.publishUser(user)), concurrency),
      toArray()
    )
  )
).subscribe(
  (partialResults: any) => {
    // Do something with partial results.
  },
  (err: any) => {
    // Error
  },
  () => {
    // done.
  }
);
This works perfectly fine for thousands of user records: it processes 150 user records concurrently at a time, which is much faster than publishing users one by one.
But the problem occurs when processing millions of user records: getting those from the database is slow, since the result set size goes up to GBs (and memory usage goes up as well).
I am looking for a solution that gets user records from the DB in batches, while continuing to publish those records concurrently.
I am thinking of a solution like this: maintain a queue (of size N) of user records fetched from the DB; whenever the queue size drops below N, fetch the next N results from the DB and add them to the queue.
Then my current solution would keep taking records from this queue and processing them concurrently with the defined concurrency. But I am not quite able to put this in code. Is there a way to do this using RxJS?
I think your solution is the right one, i.e. using the concurrent parameter of mergeMap.
The point that I do not understand is why you are adding toArray at the end of the pipe.
toArray buffers all the notifications coming from upstream and will emit only when the upstream completes.
This means that, in your case, the subscribe does not process partial results but processes all of the results you have obtained executing publishUser for all users.
On the contrary, if you remove toArray and leave mergeMap with its concurrent parameter, what you will see is a continuous flow of results into the subscribe due to the concurrency of the process.
That is as far as RxJS is concerned. Then you can look at the specific DB you are using to see whether it supports batch reads, in which case you can create buffers of user ids with the bufferCount operator and query the DB with those buffers.
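As a rough sketch of that idea (assuming getUsersFromDB can accept a batch of ids and publishUser publishes a single user; the batch size of 1000 is only an illustrative value), the pipe could look something like this:

import { from } from 'rxjs';
import { bufferCount, mergeMap } from 'rxjs/operators';

const batchSize = 1000;   // ids per DB query (illustrative)
const concurrency = 150;  // concurrent publishUser calls

from(userIds).pipe(
  bufferCount(batchSize),                                  // group the ids into batches
  mergeMap((idBatch) => this.getUsersFromDB(idBatch), 1),  // one DB query in flight at a time
  mergeMap((users) => from(users)),                        // flatten each batch of users
  mergeMap((user) => from(this.publishUser(user)), concurrency)
).subscribe(
  (result: any) => {
    // one notification per published user, no toArray buffering
  },
  (err: any) => {
    // Error
  },
  () => {
    // done.
  }
);

With this shape only one batch of users is held in memory at a time, while publishUser still runs with the desired concurrency.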

Will I hit the maximum writes per second per database if I create documents using Promise.all like this?

I am developing an app, and I want to send a message to all my users' inboxes. The code in my Cloud Functions looks like this:
const query = db.collection(`users`)
  .where("lastActivity", "<=", now)
  .where("lastActivity", ">=", last30Days)

const usersQuerySnapshot = await query.get()
const promises = []

usersQuerySnapshot.docs.forEach(userSnapshot => {
  const user = userSnapshot.data()
  const userID = user.userID
  // set promise to create data in user inbox
  const p1 = db.doc(`users/${userID}/inbox/${notificationID}`).set(notificationData)
  promises.push(p1)
})

return await Promise.all(promises)
There is a limit in Firebase: "Maximum writes per second per database: 10,000 (up to 10 MiB per second)".
Say I send a message to 25K users (i.e. create a document for each of the 25K users): how long will that await Promise.all(promises) take? I am worried the operation will take less than 1 second; I don't know whether this code will hit that limit or not, since I am not sure about its write rate.
If I would hit that limit, how do I spread the writes out over time? Could you please give me a clue? Sorry, I am a newbie.
If you want to throttle the rate at which document writes happen, you should probably not blindly kick off very large batches of writes in a loop. While there is no guarantee how fast they will occur, it's possible that you could exceed the 10K/second/database limit (depending on how good the client's network connection is, and how fast Firestore responds in general). Over a mobile or web client, I doubt that you'll exceed the limit, but on a backend that's in the same region as your Firestore database, who knows - you would have to benchmark it.
Your client code could simply throttle itself with some simple logic that measures its progress.
If you have a lot of documents to write as fast as possible, and you don't want to throttle your client code, consider throttling them as individual items of work using a Cloud Tasks queue. The queue can be configured to manage the rate at which the queue of tasks will be executed. This will drastically increase the amount of work you have to do to implement all these writes, but it should always stay in a safe range.
You could use e.g. p-limit to reduce promise concurrency in the general case, or preferably use batched writes.
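For instance, a minimal sketch with p-limit (the concurrency value of 500 is illustrative, and p-limit v3 or earlier is assumed so that require() works; newer versions are ESM-only):

const pLimit = require('p-limit');

const limit = pLimit(500); // cap the number of in-flight set() calls; tune by benchmarking

const promises = usersQuerySnapshot.docs.map(userSnapshot => {
  const userID = userSnapshot.data().userID;
  return limit(() =>
    db.doc(`users/${userID}/inbox/${notificationID}`).set(notificationData)
  );
});

return await Promise.all(promises);

Note that this caps concurrent writes rather than writes per second; for a hard rate limit, the Cloud Tasks approach above is the more robust option.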

How to know how many requests to make without knowing the amount of data on the server

I have a NodeJS application where I need to fetch data from another server (third-party, I have no control over it). The server requires you to specify a maximum number of entries to return, along with an offset. So, for example, if there are 100 entries on the server, I could request a pageSize of 100 and an offset of 0, or a pageSize of 10 and do 10 requests with offsets 1, 2, 3, etc. wrapped in a Promise.all (doing multiple concurrent smaller requests is faster when timing it).
var pageSize = 100;
var offsets = [...Array(totalItems / pageSize).keys()];
await Promise.all(offsets.map(async i => {
  // make request with pageSize and offset i
}));
The only problem is that the number of entries changes, and there is no property returned by the server indicating the total number of items. I could do something like this and loop until the server comes back empty:
var offset = 0;
var pageSize = 100;
var data = [];
var response = await // make request with pageSize and offset

while (response is not empty) {
  data.push(response);
  offset++;
  // send another request
}
But that isn't as efficient/quick as sending multiple concurrent requests like above.
Is there any good way around this that can deal with the dynamic length of the data on the server?
Without the server giving you some hint about how many items there are, there's not a lot you can do to parallelize multiple requests: you don't really want to send more requests than are needed, and you don't want to artificially request small numbers of items just so you can run more requests in parallel.
You could run some tests and find some practical limits. What is the maximum number of items that the server and your client seem to be OK with you requesting (100? 1000? 10,000? 100,000?)? Just request that many to start with. If the response indicates there are more after that, then send another request of a similar size, as sketched below.
The main idea is to minimize the number of separate requests and maximize the data you can get in a single call. That should be more efficient than more parallel requests, each requesting fewer items, because it's ultimately the same server on the other end and the same data store that has to provide all the data, so the fewest roundtrips in the fewest separate requests is probably best.
But some of this depends on the scale and architecture of the target host, so experiments will be required to see what works best in practice.
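A rough sketch of that approach, where fetchPage(pageSize, offset) stands in for the actual request to the third-party server and maxPageSize is whatever your experiments show the server handles well:

const maxPageSize = 1000; // illustrative; find the real ceiling by testing

async function fetchAll() {
  const data = [];
  let offset = 0;

  while (true) {
    const page = await fetchPage(maxPageSize, offset); // one large request
    data.push(...page);
    if (page.length < maxPageSize) break; // a short page means we reached the end
    offset++; // the API pages by offset index, as in the question
  }

  return data;
}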

Firebase RTDB batched transactions (increment 2 values at the same time)

I am aware that you can do batched, atomic, all-or-nothing updates using update - but can you do the same thing with a transaction?
Currently I am trying to increment the friend count of two users at the same time, when a friend request is accepted.
Here is what I am doing. It works, but if something goes wrong it will lead to bad data inconsistencies, which has already happened a couple of times.
const upOneFriend = firebase
  .database()
  .ref("users")
  .child(friend.uid)
  .child("friendCount");

const upOneCurrentUser = firebase
  .database()
  .ref("users")
  .child(userUid)
  .child("friendCount");

upOneFriend
  .transaction(currentCount => {
    return currentCount + 1;
  })
  .then(() => {
    upOneCurrentUser.transaction(currentCount2 => {
      return currentCount2 + 1;
    });
  })
  .catch(() => {
    console.log("error increment");
  });
Like I said, it works, but I need to do both increments at the same time! I have looked around and have not found anything related to batched transactions for the Realtime Database.
Cheers.
Transactions in Firebase Realtime Database work on a single node. If you need to update multiple nodes in a transaction, you'll need to run the transaction on the first common node above the ones you're looking to update. In your scenario that'd mean you run the transaction across users, which would probably significantly reduce throughput.
An alternative would be to use a multi-location update. But since a multi-location update doesn't auto-protect against concurrent writes, you'd have to include the information to protect against that in the write itself.
For an example of this, see my answers here: Is the way the Firebase database quickstart handles counts secure? and How to update multiple children under child node using a transaction?
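As an illustration of the multi-location update route, assuming a recent SDK that has firebase.database.ServerValue.increment (this performs both increments in one atomic write, although it is not a transaction in the read-and-verify sense described above):

const updates = {};
updates[`users/${friend.uid}/friendCount`] = firebase.database.ServerValue.increment(1);
updates[`users/${userUid}/friendCount`] = firebase.database.ServerValue.increment(1);

firebase
  .database()
  .ref()
  .update(updates) // both counters change together, or not at all
  .catch(() => {
    console.log("error increment");
  });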

How to fix a race condition in node js + redis + mongodb web application

I am building a web application that will process many transactions per second. I am using an Express server with Node.js. On the database side, I am using Redis to store attributes of a user which fluctuate continuously based on stock prices. I am using MongoDB to store semi-permanent attributes like order configuration, user configuration, etc.
I am hitting a race condition when multiple orders placed by a user are processed at the same time, but only one of them should have been eligible, because a check on the Redis attribute that stores the margin would not have allowed both transactions.
The other issue is that my application logic interleaves Redis and MongoDB read and write calls. So how would I go about solving a race condition that spans both DBs?
I am thinking of trying WATCH and MULTI + EXEC on Redis in order to make sure only one transaction happens at a time for a given user.
Or I could set up a queue on Node/Redis which processes orders one by one. I am not sure which is the right approach, or how to go about implementing it.
This is all pseudocode; the application logic is a lot more complex, with multiple conditions.
I feel like my entire application logic is one big critical section (which I think is a bad thing).
// The server receives a request from the client to place an order
getAvailableMargin(user.username).then((margin) => { // REDIS call to fetch the user's margin. This fluctuates a lot, so I store it in REDIS
  if (margin > 0) {
    const o = { // Prepare an order
      user: user.username,
      price: orderPrice,
      symbol: symbol
    };
    const order = new Order(o);
    order.save((err, o) => { // Create a new Order in MongoDB
      if (err) {
        return next(err);
      }
      User.findByIdAndUpdate(user._id, {
        $inc: {
          balance: pl
        }
      }); // Update the balance in MongoDB
      decreaseMargin(user.username); // Decrease the user's margin in REDIS
    });
  }
});
Consider that the margin is 1 and that each new order decreases the margin by 1.
Now if two requests are received simultaneously, the margin in Redis will be 1 for both requests, causing a race condition. Also, two orders will now be open in MongoDB as a result, when in fact the margin should have become 0 at the end of the first order and the second order should have been rejected.
Another issue is that we have now gone ahead and updated the balance for the user in MongoDB twice, once for each order.
The expectation is that one of the orders should not execute, and a retry should happen after checking the new margin in Redis. The balance of the user should also be updated only once.
Basically, would I need to implement a watch on both Redis and MongoDB
and somehow retry a transaction if any of the watched fields/docs change?
Is that even possible? Or is there a much simpler solution that I might be missing?
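For reference, a rough sketch of the WATCH + MULTI/EXEC idea mentioned above, using ioredis. The key name margin:<username>, the reserveMargin wrapper and the retry loop are illustrative, not taken from the original code:

const Redis = require('ioredis');
const redis = new Redis();

async function reserveMargin(username, attempts = 3) {
  const conn = redis.duplicate(); // WATCH state is per-connection, so use a dedicated one
  try {
    for (let i = 0; i < attempts; i++) {
      const key = `margin:${username}`;
      await conn.watch(key); // abort the MULTI if this key changes before EXEC
      const margin = Number(await conn.get(key)) || 0;

      if (margin <= 0) {
        await conn.unwatch();
        return false; // insufficient margin, reject the order
      }

      // exec() resolves to null if the watched key was modified by another
      // request in the meantime, in which case we re-read and retry.
      const result = await conn.multi().decr(key).exec();
      if (result !== null) {
        return true; // margin reserved; now safe to create the order and update MongoDB
      }
    }
    return false; // too many concurrent conflicts; let the caller decide what to do
  } finally {
    conn.disconnect();
  }
}

The order creation and balance update in MongoDB would then only run when reserveMargin resolves to true, so at most one of two simultaneous orders gets through.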

Resources