Redis zrangebyscore and zincrby under high concurrency - node.js

I am facing a concurrency problem with Redis. My API is built on Node.js Fastify and I am using fastify-redis in my API call.
I am using two simple Redis commands: ZRANGEBYSCORE and ZINCRBY.
The problem is that under high concurrency the ZINCRBY executes late, which results in the same value being returned for multiple requests.
How can I prevent this under high concurrency? Is there any way to lock the key that was previously read?
Here is an example of my code:
const numbers = await redis.zrangebyscore(
  `user:${req.query.key}:${state}`, // key
  0,       // min score
  50,      // max score
  "LIMIT",
  0,       // offset
  1        // count
);
if (numbers.length > 0) {
  await redis.zincrby(`user:${req.query.key}:${state}`, 1, numbers[0]);
  res.send(numbers[0]);
}

The issue isn't concurrency per se; it's that you have a sequence of operations that needs to be atomic, and you haven't done anything to ensure that.
Redis has facilities for ensuring atomicity, as described here. In your case, since the value from the first operation is used in the second, you couldn't use a simple MULTI and EXEC. You'd have to WATCH the key and then retry the operation if it aborted.
The simpler, recommended approach, though, is to put your above code into a Lua script, where it runs on the server as a single atomic operation.
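For example, here is a minimal sketch of the Lua approach (assuming the ioredis-style client that fastify-redis exposes). The script reads the first member with a score in [0, 50], increments its score, and returns it, all in one server-side step, so no other request can observe the member between the two commands:
const script = `
  local members = redis.call('ZRANGEBYSCORE', KEYS[1], 0, 50, 'LIMIT', 0, 1)
  if #members == 0 then return nil end
  redis.call('ZINCRBY', KEYS[1], 1, members[1])
  return members[1]
`;

const number = await redis.eval(script, 1, `user:${req.query.key}:${state}`);
if (number !== null) {
  res.send(number);
}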

Related

When should I use worker-threads?

I am currently working on a backend built with NestJS that provides REST endpoints for my frontend. In some endpoints I receive e.g. an array of elements which I need to process.
Concrete example:
I receive an array of 50 elements. For each element I need to make a SQL request, so I loop over the array and run the corresponding SQL.
I always ask myself: at what number of elements should I use, for example, worker threads to avoid blocking the event loop?
Maybe I misunderstood the blocking of the event loop and someone can enlighten me.
I don't think that you'll need worker-threads in this scenario. As long as the SQL queries are executed asynchronously, i.e. the query calls do not block, you will be fine. You can use Promise.all to speed up the processing of the loop, as the queries will be executed concurrently, e.g.
const dbQueryPromises = [];
for (const entry of data) {
  dbQueryPromises.push(dbConnection.query(buildQuery(entry)));
}
await Promise.all(dbQueryPromises);
If, however, your code performs computation-heavy operations inside the loop, then you should consider worker-threads, as long-running operations on the call stack will block the event loop.
Only use them if you need to do CPU-intensive tasks on large amounts of data. They also let you transfer or share memory (e.g. ArrayBuffers) instead of paying the serialization cost you'd incur with child processes. 50 elements is not enough, I believe.
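As a rough illustration, here is a minimal worker_threads sketch (the file names and the squaring loop are placeholders for your actual CPU-bound work). The main thread stays free to serve requests while the worker computes:
// main.js
const { Worker } = require('worker_threads');

function runHeavyTask(data) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./heavy-task.js', { workerData: data });
    worker.on('message', resolve); // resolves with the worker's result
    worker.on('error', reject);
  });
}

// heavy-task.js
const { parentPort, workerData } = require('worker_threads');
let sum = 0; // stand-in for the real CPU-intensive computation
for (const n of workerData) sum += n * n;
parentPort.postMessage(sum);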

Using Redis transaction vs redlock to solve the Lost Update Problem

I'm working with a Redis cluster of 2+ nodes. I'm trying to figure out which tool best fits for handling concurrency: transactions or locking. Transactions are well documented, but I couldn't find a good best-practice example for Redlock. I also wonder why two tools exist and what the use case for each is.
For simplicity, let's assume I want to do a concurrent increment and that Redis had no INCR command.
Option 1. Using Transactions
If I understand correctly, NodeJS pseudocode would look like this:
const transactIncrement = async (key) => {
  await redisClient.watch(key);
  let value = Number(await redisClient.get(key));
  value = value + 1;
  const multi = redisClient.multi();
  multi.set(key, value);
  try {
    await multi.exec(); // aborts if the watched key was modified
  } catch (e) {
    // most probably the error was thrown because the transaction failed
    // TODO: think if it's a good idea to restart in every case,
    // introducing a potential infinite loop
    // whatever, restart
    await transactIncrement(key);
  }
};
Bad things I can see above are:
try-catch block
the possibility of using transactions across multiple keys is limited on a Redis cluster
Option 2. Redlock
Is it true that trying to lock a resource that's already locked does not fail immediately, i.e. that Redlock retries N times before erroring?
If true then here's my pseudocode:
const redlockIncrement = async (key) => {
  const lock = await redlock.lock(key, 1000); // TTL in milliseconds
  // below this line it's guaranteed that other "threads" are put on hold
  // and cannot access the key, right?
  let value = Number(await redisClient.get(key));
  value = value + 1;
  await redisClient.set(key, value);
  await lock.unlock();
};
Summary
If I got things right, then Redlock is definitely the more powerful technique. Please correct me if I'm wrong in the above assumptions. It would also be really great if someone could provide example code solving a similar problem, because I couldn't find any.
Redlock is useful when you have a distributed set of components that you want to coordinate to create an atomic operation.
You wouldn't use it for operations that affect a single Redis node. That's because Redis already has much simpler and more reliable means of ensuring atomicity for commands that use its single-threaded server: transactions or scripting. (You didn't mention Lua scripting, but that's the most powerful way to create custom atomic commands).
Since INCR operates on a single key, and therefore on a single node, the best way to implement that would be with a simple Lua script.
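For instance, a hedged sketch of such a script (ioredis-style eval signature assumed); the whole read-modify-write happens server-side, so no WATCH/retry loop is needed:
const script = `
  local value = tonumber(redis.call('GET', KEYS[1]) or '0')
  value = value + 1
  redis.call('SET', KEYS[1], value)
  return value
`;
const newValue = await redisClient.eval(script, 1, key);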
Now, if you want to run a sequence of commands that spans multiple nodes, neither transactions nor scripting will work. In that case you could use Redlock or a similar distributed lock. However, you would generally try to avoid that in your Redis design. Specifically, you would use hash tags to force certain keys to reside on the same node:
Hash tags are a way to ensure that multiple keys are allocated in the same hash slot. This is used in order to implement multi-key operations in Redis Cluster.
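As a hypothetical example (key names made up), both keys below contain the hash tag {user:42}, so Redis Cluster assigns them to the same hash slot and a transaction across them becomes possible:
await redisClient
  .multi()
  .incr('{user:42}:wins')
  .incr('{user:42}:rounds')
  .exec();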

Nodejs memory cache thread safety

I have a Node service with an in-memory key-value cache. The service also runs a periodic task every day that rebuilds the cache (CPU intensive and time-consuming). While the rebuild-cache task is running, will it block other /get requests? Is there any race condition here?
/get: gets the cached data by key
setInterval(() => { rebuildCache(); }, 3600);

async rebuildCache(filepath: string, key: string): Promise<void> {
  const obj = constructFromFile(filepath);
  // load JSON from filepath; do some CPU-intensive work
  // (build schema, references, etc.)
  cache[key] = obj;
}
Will it block other /get requests?
It depends upon what rebuildCache() does and how it works. If it's a synchronous operation (entirely CPU), then that will block the event loop and will block all request processing.
Is there any race condition here?
It depends upon how the code that uses the cache is written and what rebuildCache() does. If any operation that uses the cache is asynchronous and depends on the cache staying consistent from before the asynchronous operation starts to after it finishes, then rebuildCache() can run in that window of time and you could indeed have a race condition.
The devil is in the details of your actual implementation, both for the code using the cache and for the function that rebuilds it, so we can only offer hypothetical answers without seeing the actual code.
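As a purely hypothetical illustration of such a window (the setTimeout below stands in for any await point inside a request handler):
const cache = { config: { version: 1 } };

async function handleGet(key) {
  const before = cache[key];                   // read 1
  await new Promise((r) => setTimeout(r, 10)); // rebuildCache() may run here
  const after = cache[key];                    // read 2
  if (before !== after) {
    console.log('cache was rebuilt mid-request'); // the race, made visible
  }
  return after;
}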
FYI, this code in your question:
setInterval(rebuildCache())
only calls rebuildCache() once. To actually call it on any interval, you would need something like this:
setInterval(rebuildCache, t);
Note, you pass a function reference (without the parens) and you pass a time for the timer interval.

Concurrency between Meteor.setTimeout and Meteor.methods

In my Meteor application, which implements a turn-based multiplayer game server, the clients receive the game state via publish/subscribe and can call a Meteor method sendTurn to send turn data to the server (they cannot update the game state collection directly).
var endRound = function (gameRound) {
  // check if gameRound has already ended /
  // if round results have already been determined
  // --> yes: do nothing
  // --> no:
  //   determine round results
  //   update collection
  //   create next gameRound
};
Meteor.methods({
  sendTurn: function (turnParams) {
    // find gameRound data
    // validate turnParams against gameRound
    // store turn (update "gameRound" collection object)
    // have all clients sent in turns for this round?
    //   yes --> call "endRound"
    //   no  --> wait for other clients to send turns
  }
});
To implement a time limit, I want to wait for a certain time period (to give clients time to call sendTurn), and then determine the round result - but only if the round result has not already been determined in sendTurn.
How should I implement this time limit on the server?
My naive approach to implement this would be to call Meteor.setTimeout(endRound, <roundTimeLimit>).
Questions:
What about concurrency? I assume I should update collections synchronously (without callbacks) in sendTurn and endRound, but would this be enough to eliminate race conditions? (Reading the 4th comment on the accepted answer to this SO question, about synchronous database operations also yielding, I doubt it.)
In that regard, what does "per request" mean in the Meteor docs in my context (endRound being called by a client method call and/or in a server setTimeout)?
In Meteor, your server code runs in a single thread per request, not in the asynchronous callback style typical of Node.
In a multi-server / clustered environment, (how) would this work?
Great question, and it's trickier than it looks. First off I'd like to point out that I've implemented a solution to this exact problem in the following repos:
https://github.com/ldworkin/meteor-prisoners-dilemma
https://github.com/HarvardEconCS/turkserver-meteor
To summarize, the problem basically has the following properties:
Each client sends in some action on each round (you call this sendTurn)
When all clients have sent in their actions, run endRound
Each round has a timer that, if it expires, automatically runs endRound anyway
endRound must execute exactly once per round regardless of what clients do
Now, consider the properties of Meteor that we have to deal with:
Each client can have exactly one outstanding method to the server at a time (unless this.unblock() is called inside a method). Following methods wait for the first.
All timeout and database operations on the server can yield to other fibers
This means that whenever a method call goes through a yielding operation, values in Node or the database can change. This can lead to the following potential race conditions (these are just the ones I've fixed, but there may be others):
In a 2-player game, for example, two clients call sendTurn at exactly the same time. Both call a yielding operation to store the turn data. Both methods then check whether 2 players have sent in their turns, find the affirmative, and endRound gets run twice.
A player calls sendTurn right as the round times out. In that case, endRound is called both by the timeout and by the player's method, resulting in it running twice again.
Incorrect fixes to the above problems can result in starvation where endRound never gets called.
You can approach this problem in several ways, either synchronizing in Node or in the database.
Since only one Fiber can actually change values in Node at a time, you are guaranteed to avoid race conditions as long as you don't call a yielding operation. So you can cache things like the turn states in memory instead of in the database. However, this requires that the caching is done correctly, and it doesn't carry over to clustered environments.
Move the endRound code outside of the method call itself, using something else to trigger it. This is the approach I've taken which ensures that only the timer or the final player triggers the end of the round, not both (see here for an implementation using observeChanges).
In a clustered environment you will have to synchronize using only the database, probably with conditional update operations and atomic operators. Something like the following:
var currentVal;
while (true) {
  currentVal = Foo.findOne(id).val; // yields
  if (Foo.update({_id: id, val: currentVal}, {$inc: {val: 1}}) > 0) {
    // Operation went as expected
    // (your code here, e.g. endRound)
    break;
  } else {
    // Race condition detected, try again
  }
}
The above approach is primitive and probably results in bad database performance under high load; it also doesn't handle timers, but I'm sure with some thinking you can figure out how to extend it to work better.
You may also want to see this timers code for some other ideas. I'm going to extend it to the full setting that you described once I have some time.
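For the "endRound must execute exactly once" requirement specifically, here is a hedged sketch of the test-and-set idea (the collection and field names are assumptions): both the timer and the final sendTurn call this function, and the conditional update guarantees that only one caller actually runs endRound:
var tryEndRound = function (roundId) {
  var updated = GameRounds.update(
    { _id: roundId, ended: false }, // matches only if nobody has ended it yet
    { $set: { ended: true } }
  );
  if (updated > 0) {
    endRound(roundId); // we won the race; this runs exactly once
  }
  // updated === 0 means another caller already ended the round: do nothing
};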

How is setTimeout implemented in node.js

I was wondering if anybody knows how setTimeout is implemented in node.js. I believe I have read somewhere that it is not part of V8. I quickly tried to find the implementation, but could not locate it in the (big) source. I did, for example, find the timers.js file, which in turn links to timer_wrap.cc, but these files do not completely answer all of my questions.
Does V8 have a setTimeout implementation? I guess from the source the answer is no.
How is setTimeout implemented: JavaScript, native, or a combination of both? From timers.js I assume a combination of both:
var Timer = process.binding('timer_wrap').Timer;
When adding multiple timers (setTimeout), how does node.js know which to execute first? Does it add all the timers to a sorted collection? If it is sorted, then finding the timeout that needs to be executed is O(1) and insertion is O(log n)? But then again, in timers.js I see them use a linked list?
But then again, adding a lot of timers is not a problem at all?
When executing this script:
var x = new Array(1000),
len = x.length;
/**
* Returns a random integer between min and max
* Using Math.round() will give you a non-uniform distribution!
*/
function getRandomInt (min, max) {
return Math.floor(Math.random() * (max - min + 1)) + min;
}
var y = 0;
for (var i = 0; i < len; i++) {
var randomTimeout = getRandomInt(1000, 10000);
console.log(i + ', ' + randomTimeout + ', ' + ++y);
setTimeout(function () {
console.log(arguments);
}, randomTimeout, randomTimeout, y);
}
you get a little CPU usage, but not that much. I am wondering whether I would get better performance if I kept all these callbacks in a single sorted list?
You've done most of the work already. V8 doesn't provide an implementation of setTimeout because it's not part of ECMAScript. The function you use is implemented in timers.js, which creates an instance of a Timeout object that wraps a C++ class.
There is a comment in the source describing how they are managing the timers.
// Because often many sockets will have the same idle timeout we will not
// use one timeout watcher per item. It is too much overhead. Instead
// we'll use a single watcher for all sockets with the same timeout value
// and a linked list. This technique is described in the libev manual:
// http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#Be_smart_about_timeouts
This indicates it's using a doubly linked list, which is technique #4 in the linked article.
If there is not one request, but many thousands (millions...), all
employing some kind of timeout with the same timeout value, then one
can do even better:
When starting the timeout, calculate the timeout value and put the
timeout at the end of the list.
Then use an ev_timer to fire when the timeout at the beginning of the
list is expected to fire (for example, using the technique #3).
When there is some activity, remove the timer from the list,
recalculate the timeout, append it to the end of the list again, and
make sure to update the ev_timer if it was taken from the beginning of
the list.
This way, one can manage an unlimited number of timeouts in O(1) time
for starting, stopping and updating the timers, at the expense of a
major complication, and having to use a constant timeout. The constant
timeout ensures that the list stays sorted.
Node.js is designed around async operations and setTimeout is an important part of that. I wouldn't try to get tricky, just use what they provide. Trust that it's fast enough until you've proven that in your specific case it's a bottleneck. Don't get stuck on premature optimization.
UPDATE
What happens is you've got essentially a dictionary of timeouts at the top level, so all 100 ms timeouts are grouped together. Whenever a new timeout is added, or the oldest timeout triggers, it is appended to the end of its list. This means that the oldest timeout, the one which will trigger soonest, is at the beginning of the list. There is a single timer per list, and it's set based on the time until the first item in the list is due to expire.
If you call setTimeout 1000 times each with the same timeout value, they will be appended to the list in the order you called setTimeout and no sorting is necessary. It's a very efficient setup.
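A simplified model of that setup (this is an illustration, not Node's actual internals): one real timer per distinct duration, and each duration's queue stays sorted by construction because items are only ever appended:
const groups = new Map(); // duration (ms) -> queue of pending timeouts

function addTimeout(cb, ms) {
  if (!groups.has(ms)) {
    groups.set(ms, []);
    armHead(ms); // one real timer per distinct duration
  }
  groups.get(ms).push({ due: Date.now() + ms, cb }); // append keeps the queue sorted
}

function armHead(ms) {
  setTimeout(function fire() {
    const queue = groups.get(ms);
    const item = queue.shift(); // the head is always the soonest to expire
    item.cb();
    if (queue.length > 0) {
      setTimeout(fire, Math.max(0, queue[0].due - Date.now())); // re-arm for the next head
    } else {
      groups.delete(ms);
    }
  }, ms);
}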
No problem with many timers!
When the libuv event loop calls poll, it passes a timeout argument computed from the closest (soonest-expiring) of all timers.
[closest timer of all timers]
https://github.com/joyent/node/blob/master/deps/uv/src/unix/timer.c #120
RB_MIN(uv__timers, &loop->timer_handles)
[pass timeout argument to poll api]
https://github.com/joyent/node/blob/master/deps/uv/src/unix/core.c #276
timeout = 0;
if ((mode & UV_RUN_NOWAIT) == 0)
  timeout = uv_backend_timeout(loop);

uv__io_poll(loop, timeout);
Note: on Windows, the logic is almost the same.
