I'm wondering what would be the way to design a web service like this:
Say I have a server listening for requests: it receives some key, checks whether it's cached (for example, in some DB), and if it's not, it does some processing, generates the answer, stores it in the cache DB, and returns the answer to the client.
This seems to work OK, but what happens if two clients request the same non-existent key? In that case a race condition would occur; it would look like this:
client 1 -> check cache DB -> generate answer -> store in cache -> reply to client
client 2 -> check cache DB -> generate answer -> store in cache -> reply to client
One way to avoid this issue would be to use a UNIQUE constraint in the DB, so that when the second answer is generated and written, some error happens. This is fine, but it seems more like a patch than a real solution. In particular, imagine a case where generating the answer takes a lot of processing; then something else would be preferable.
One option I can think of is using job queues, so that whenever a key is received, it is either appended to an existing job or added to the queue as a new job.
I've been playing with Node.js for a few weeks and I'm surprised that I haven't found examples showing this kind of use case. So I'm wondering whether this is an acceptable solution for cases like this, or whether something better exists.
Here is how you can do that in a single-process setup:
var Emitter = require('events').EventEmitter;

// one pending EventEmitter per in-flight key: every caller for the same key
// subscribes to the same emitter instead of starting its own computation
var requests = Object.create(null);

function getSomething (key, callback) {
  var request = requests[key];
  if (!request) {
    // first caller for this key: create the emitter and start the real work
    // (note: this assumes getSomethingActually calls back asynchronously,
    // so the listeners below are attached before any emit fires)
    request = requests[key] = new Emitter();
    getSomethingActually(key, function (err, result) {
      // done (or failed): forget the in-flight request, then notify everyone
      delete requests[key];
      if (err) return request.emit('error', err);
      request.emit('result', result);
    });
  }
  // every caller, first or not, waits on the shared emitter
  request.once('result', function (result) {
    callback(null, result);
  });
  request.once('error', function (err) {
    callback(err);
  });
}
If you want to scale this, you need some external storage plus an event bus, like Redis.
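As a rough illustration of what that could look like (a sketch only: ioredis is assumed, the key and channel names are invented, and a production version would also need to handle lock expiry, crashed workers, and listener cleanup):

const Redis = require('ioredis');
const redis = new Redis();
const sub = new Redis(); // subscriber connections must be dedicated

async function getSomethingDistributed(key, compute) {
  // only the process that wins this lock computes the answer
  const won = await redis.set('lock:' + key, '1', 'EX', 30, 'NX');
  if (won) {
    const result = await compute(key);
    await redis.set('result:' + key, JSON.stringify(result));
    await redis.publish('done:' + key, JSON.stringify(result));
    return result;
  }
  // someone else is computing: subscribe first, then re-check the result
  // key in case the answer landed before we subscribed
  const resultPromise = new Promise((resolve) => {
    sub.on('message', (channel, message) => {
      if (channel === 'done:' + key) resolve(JSON.parse(message));
    });
  });
  await sub.subscribe('done:' + key);
  const cached = await redis.get('result:' + key);
  if (cached) return JSON.parse(cached);
  return resultPromise; // note: listeners are never removed in this sketch
}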
You should be using job queues (or some other way of offloading jobs) either way. Processing-intensive tasks should always be taken out of your main Node application (via a queue, by spawning a separate process, etc.), or else they will block the event loop, thus blocking all other requests.
That being said, if you choose a queue that supports a unique constraint, such as a Postgres-backed queue, and set a unique constraint on the key, duplicates will never be inserted into the work queue, so they will never be processed twice. You can simply ignore the unique-constraint error in this case.
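For instance, a minimal sketch with node-postgres (the jobs table and its key column are invented for illustration):

const { Pool } = require('pg');
const pool = new Pool();

// assumes: CREATE TABLE jobs (key text PRIMARY KEY);
async function enqueue(key) {
  // ON CONFLICT DO NOTHING turns a duplicate insert into a silent no-op,
  // so the same key can never be queued (and processed) twice
  await pool.query(
    'INSERT INTO jobs (key) VALUES ($1) ON CONFLICT (key) DO NOTHING',
    [key]
  );
}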
Note that it is still possible, though very unlikely, to have a sequence of events like this:
1. A request checks the 'cache' for key x and gets a miss.
2. A worker completes the answer for key x, inserts it into the 'cache', and removes x from the queue.
3. The request, having received a miss for key x, adds it to the queue.
4. A worker pulls key x from the queue and starts the computation.
After this sequence of events, the second worker would get an error inserting the key. In my opinion, this is unlikely enough that adding a unique key constraint and simply ignoring the constraint-violation error on the second worker is a viable option.
Related
I am getting n POST requests (one per webhook trigger) from a webhook. The data is identical in all requests that come from the same trigger: they all have the same 'orderId'. I'm interested in saving only one of these requests, so on each endpoint hit I check whether this specific orderId already exists as a row in my database, and otherwise create it.
// (the surrounding route handler, API-key check, and try block are
// reconstructed here; the original snippet omits them)
app.post('/webhook', async (req, res) => {
  if (apiKeyIsValid) { // hypothetical stand-in for the elided API-key check
    try {
      if (await orderIdExists === null) { // orderIdExists is defined earlier (elided)
        await Order.create({
          userId,
          status: PENDING,
          price,
          // ...
        });
        await sleep(3000); // attempted workaround, see below
      }
      return res.status(HttpStatus.OK).send({ success: true });
    } catch (error) {
      return res.status(HttpStatus.INTERNAL_SERVER_ERROR).send({ success: false });
    }
  } else {
    return res.status(HttpStatus.UNAUTHORIZED).send(responseBuilder(false, responseErrorCodes.INVALID_API_KEY, {}, req.t));
  }
});

function sleep(ms) {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}
The problem is that before Sequelize manages to save the newly created order to the DB, I already get another endpoint hit from the other n POST requests (they all reach the endpoint within a second or less), while orderIdExists still equals null, so it ends up creating more identical orders. One (not so good) solution is to make orderId unique in the DB, which prevents the creation of an order with a duplicate orderId, but the insert is attempted anyway, which leaves gaps in the auto-incremented IDs. Any idea would be greatly appreciated.
P.S. As you can see, I tried adding a 'sleep' function, to no avail.
Your database is failing to complete its save operation before the next request arrives. The problem is similar to the Dogpile Effect or a "cache slam".
This requires some more thinking about how you are framing the problem: in other words the "solution" will be more philosophical and perhaps have less to do with code, so your results on StackOverflow may vary.
The "sleep" solution is no solution at all: there's no guarantee how long the database operation might take or how long you might wait before another duplicate request arrives. As a rule of thumb, any time "sleep" is deployed as a "solution" to problems of concurrency, it usually is the wrong choice.
Let me posit two possible ways of dealing with this:
Option 1: write-only: i.e. don't try to "solve" this by reading from the database before you write to it. Just keep the pipeline leading into the database as dumb as possible and keep writing. E.g. consider a "logging" table that just stores whatever the webhook throws at it -- don't try to read from it, just keep inserting (or upserting). If you get 100 ping-backs about a specific order, so be it: your table would log it all and if you end up with 100 rows for a single orderId, let some other downstream process worry about what to do with all that duplicated data. Presumably, Sequelize is smart enough (and your database supports whatever process locking) to queue up the operations and deal with write repetitions.
An upsert operation here would be helpful if you do want to have a unique constraint on the orderId (this seems sensible, but you may be aware of other considerations in your particular setup).
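For example, with a unique constraint on orderId, something like the following sketch (field names taken from the question) collapses repeated deliveries into a single row:

// upsert = insert the row, or update it if the unique orderId already exists
await Order.upsert({
  orderId,
  userId,
  status: PENDING,
  price,
});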
Option 2: use a queue. This is decidedly more complex, so weigh carefully whether or not your use case justifies the extra work. Instead of writing data immediately to the database, throw the webhook data into a queue (e.g. a first-in-first-out FIFO queue). Ideally, you would choose a queue that supports de-duplication, so that the messages in it are guaranteed to be unique, but that implies state, which usually relies on a database of some sort, which is sort of the problem to begin with.
The most important thing a queue would do for you is it would serialize the messages so you can deal with them one at a time (instead of multiple database operations kicking off concurrently). You can upsert data into the database when you read a message out of the queue. If the webhook keeps firing and more messages enter the queue, that's fine because the queue forces them all to line up single-file and you can handle each insertion one at a time. You'll know that each database operation has completed before it moves on to the next message so you never "slam" the DB. In other words, putting a queue in front of the database will allow it to handle data when the database is ready instead of whenever a webhook comes calling.
The idea of a queue here is similar to what a semaphore accomplishes. Note that your database interface may already implement a kind of queue/pool under the hood, so weigh this option carefully: don't reinvent the wheel.
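As a minimal in-process sketch of the idea (a real deployment would more likely use an external queue service, but the serialization principle is the same):

const queue = [];
let draining = false;

function enqueueWebhook(payload) {
  queue.push(payload);
  if (!draining) drain(); // start the single consumer if it's idle
}

async function drain() {
  draining = true;
  while (queue.length > 0) {
    const payload = queue.shift();
    // one database operation at a time; the next starts only after this one ends
    await Order.upsert(payload);
  }
  draining = false;
}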
Hope those ideas are useful.
You saved my time, #Everett and #april-henig. I found that saving directly into the database led to duplicate records. Storing records in an object and dealing with one record at a time helped me a lot.
Maybe I should share my solution; perhaps some may find it useful in the future.
Create an empty object to save successful requests:
export const queueAllSuccessCallBack = {};
Save the POST request in the object:
if (status === 'success') { // only handle successful requests
  const findKeyTransaction = queueAllSuccessCallBack[client_reference_id];
  if (!findKeyTransaction) { // check if the id was already added, to avoid duplicates
    // save the new request id as key, with whatever data you want as value
    queueAllSuccessCallBack[client_reference_id] = {
      transFound,
      body,
    };
  }
}
Access the object to save into the database:
const keys = Object.keys(queueAllSuccessCallBack);
for (const key of keys) { // for...of (not forEach) so each await finishes before the next starts
  // ...
  // Do extra checks if you want to,
  // or save to the database directly
}
I need Node.js to prevent concurrent operations for the same requests. From what I understand, if Node.js receives multiple requests, this is what happens:
REQUEST1 ---> DATABASE_READ
REQUEST2 ---> DATABASE_READ
DATABASE_READ complete ---> EXPENSIVE_OP() --> REQUEST1_END
DATABASE_READ complete ---> EXPENSIVE_OP() --> REQUEST2_END
This results in two expensive operations running. What I need is something like this:
REQUEST1 ---> DATABASE_READ
DATABASE_READ complete ---> DATABASE_UPDATE
DATABASE_UPDATE complete ---> REQUEST2 ---> DATABASE_READ ––> REQUEST2_END
---> EXPENSIVE_OP() --> REQUEST1_END
This is what it looks like in code. The problem is the window between when the app starts reading the cache value and when it finishes writing to it. During this window, the concurrent requests don't know that there is already one request with the same itemID running.
app.post("/api", async function(req, res) {
const itemID = req.body.itemID
// See if itemID is processing
const processing = await DATABASE_READ(itemID)
// Due to how NodeJS works,
// from this point in time all requests
// to /api?itemID="xxx" will have processing = false
// and will conduct expensive operations
if (processing == true) {
// "Cheap" part
// Tell client to wait until itemID is processed
} else {
// "Expensive" part
DATABASE_UPDATE({[itemID]: true})
// All requests to /api at this point
// are still going here and conducting
// duplicate operations.
// Only after DATABASE_UPDATE finishes,
// all requests go to the "Cheap" part
DO_EXPENSIVE_THINGS();
}
})
Edit
Of course I can do something like this:
const lockedIDs = {}
app.post("/api", function(req, res) {
const itemID = req.body.itemID
const locked = lockedIDs[itemID] ? true : false // sync equivalent to async DATABASE_READ(itemID)
if (locked) {
// Tell client to wait until itemID is processed
// No need to do expensive operations
} else {
lockedIDs[itemID] = true // sync equivalent to async DATABASE_UPDATE({[itemID]: true})
// Do expensive operations
// itemID is now "locked", so subsequent request will not go here
}
})
lockedIDs here behaves like an in-memory synchronous key-value database. That is OK if there is just one server. But what if there are multiple server instances? Then I need separate cache storage, like Redis, and I can access Redis only asynchronously. So this will not work, unfortunately.
Ok, let me take a crack at this.
So, the problem I'm having with this question is that you've abstracted the problem so much that it's really hard to help you optimize. It's not clear what your "long running process" is doing, and what it is doing will affect how to solve the challenge of handling multiple concurrent requests. What's your API doing that you're worried about consuming resources?
From your code, at first I guessed that you're kicking off some kind of long-running job (e.g. file conversion or something), but then some of the edits and comments make me think that it might be just a complex query against the database which requires a lot of calculations to get right and so you want to cache the query results. But I could also see it being something else, like a query against a bunch of third party APIs that you're aggregating or something. Each scenario has some nuance that changes what's optimal.
That said, I'll explain the 'cache' scenario and you can tell me if you're more interested in one of the other solutions.
Basically, you're in the right ballpark for the cache already. If you haven't already, I'd recommend looking at cache-manager, which simplifies your boilerplate a little for these scenarios (and lets you set cache invalidation and even have multi-tier caching). The piece that you're missing is that you essentially should always respond with whatever you have in the cache, and populate the cache outside the scope of any given request. Using your code as a starting point, something like this (leaving off all the try..catches and error checking and such for simplicity):
// A GET is OK here, because no matter what we're firing back a response quickly,
// and semantically this is a query
app.get("/api", async function(req, res) {
const itemID = req.query.itemID
// In this case, I'm assuming you have a cache object that basically gets whatever
// is cached in your cache storage and can set new things there too.
let item = await cache.get(itemID)
// Item isn't in the cache at all, so this is the very first attempt.
if (!item) {
// go ahead and let the client know we'll get to it later. 202 Accepted should
// be fine, but pick your own status code to let them know it's in process.
// Other good options include 503 Service Unavailable with a Retry-After
// header, and 420 Enhance Your Calm (non-standard, but funny)
res.status(202).send({ id: itemID });
// put an empty object in there so we know it's working on it.
await cache.set(itemID, {});
// start the long-running process, which should update the cache when it's done
await populateCache(itemID);
return;
}
// Here we have an item in the cache, but it's not done processing. Maybe you
// could just check to see if it's an empty object or not, but I'm assuming
// that we've setup a boolean flag on the cached object for when it's done.
if (!item.processed) {
// The client should try again later like above. Exit early. You could
// alternatively send the partial item, an empty object, or a message.
return res.status(202).send({ id: itemID });
}
// if we get here, the item is in the cache and done processing.
return res.send(item);
})
Now, I don't know precisely what all your stuff does, but if it's me, populateCache from above is a pretty simple function that just calls whatever service we're using to do the long-running work and then puts it into the cache.
async function populateCache(itemId) {
const item = await service.createThisWorkOfArt(itemId);
await cache.set(itemId, item);
return;
}
Let me know if that's not clear or if your scenario is really different from what I'm guessing.
As mentioned in the comments, this approach will cover most normal issues you might have with your described scenario, but it will still allow two requests to both fire off the long-running process if they come in faster than the write to your cache store (e.g. Redis). I judge the odds of that happening to be pretty low, but if you're really concerned about it, then the next, more paranoid version of this would be to remove the long-running process code from your web API altogether. Instead, your API just records that someone requested the work to happen, and if there's nothing in the cache it responds as above, but the block that actually calls populateCache is gone entirely.
Instead, you would have a separate worker process running that would periodically (how often depends on your business case) check the cache for unprocessed jobs and kick off the work for processing them. By doing it this way, even if you have 1000's of concurrent requests for the same item, you can ensure that you're only processing it one time. The downside of course is that you add whatever the periodicity of the check is to the delay in getting the fully processed data.
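A sketch of that worker (assuming a single worker process, and assuming the cache store can enumerate its keys; cache.keys() is an invention here, and Redis SCAN or a dedicated 'pending' list would serve the same purpose):

// every minute, look for cache entries that were requested but never
// processed, and kick off the long-running job for each of them
setInterval(async () => {
  const ids = await cache.keys(); // assumption: the store can list its keys
  for (const id of ids) {
    const item = await cache.get(id);
    if (item && !item.processed) {
      await populateCache(id); // same helper as above
    }
  }
}, 60 * 1000);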
You could create a local Map object (in memory for synchronous access) that contains any itemID as a key that is being processed. You could make the value for that key be a promise that resolves with whatever the result is from anyone who has previously processed that key. I think of this like a gate keeper. It keeps track of which itemIDs are being processed.
This scheme tells future requests for the same itemID to wait and does not block other requests - I thought that was important rather than just using a global lock on all requests related to itemID processing.
Then, as part of your processing, you first check the local Map object. If that key is in there, then it's currently being processed. You can then just await the promise from the Map object to see when it's done being processed and get any result from prior processing.
If it's not in the Map object, then it's not being processed now and you can immediately put it in Map to mark it as "in process". If you set a promise as the value, then you can resolve that promise with whatever result you get from this processing of the object.
Any other requests that come along will end up just waiting on that promise and you will thus only process this ID once. The first one to start with that ID will process it and all other requests that come along while it's processing will use the same shared result (thus saving the duplication of your heavy computation).
I tried to code up an example, but did not really understand what your pseudo-code was trying to do well enough to offer a code example.
Systems like this have to have perfect error handling so that all possible error paths handle the Map and promise embedded in the Map properly.
Based on your fairly light pseudo-code example, here's a similar pseudo code example that illustrates the above concept:
const itemInProcessCache = new Map();
app.get("/api", async function(req, res) {
const itemID = req.query.itemID
let gate = itemInProcessCache.get(itemID);
if (gate) {
gate.then(val => {
// use cached result here from previous processing
}).catch(err => {
// decide what to do when previous processing had an error
});
} else {
let p = DATABASE_UPDATE({[itemID]: true}).then(result => {
// expensive processing done
// return final value so any others waiting on the gate can just use that value
// decide if you want to clear this item from itemInProcessCache or not
}).catch(err => {
// error on expensive processing
// remove from the gate cache because we didn't get a result
// expensive processing will have to be done by someone else
itemInProcessCache.delete(itemID);
});
// mark this item as being processed
itemInProcessCache.set(itemID, p);
}
});
Note: This relies on the single-threadedness of node.js. No other request can get started until the request handler here returns so that itemInProcessCache.set(itemID, p); gets called before any other requests for this itemID could get started.
Also, I don't know databases very well, but this seems very much like a feature that a good multi-user database might have built in or have supporting features that makes this easier since it's not an uncommon idea to not want to have multiple requests all trying to do the same database work (or worse yet, trouncing each other's work).
I have a question about a situation I am currently in. I have a solution for it, but am not quite sure it 100% solves the issue at hand, as I do not have tests written that could validate it.
I would love your opinion on the matter, and maybe a suggestion of a more elegant solution, or possibly even a way to avoid the issue completely.
Here it is:
I am making a game where you may create or join open rooms/games.
There is a gamelist in the UI and when you click a game you attempt to join that game.
Each game has a bet (amount of credit that you win or lose) that the creator set which anyone joining must match.
On the serverside, before I let the player actually join the room, I must validate that his credit balance is sufficient to match the bet of the game he is joining. This will be via an API call.
Now, if two players join the game at once, let's say the validation of the first player joining takes 3 seconds but the validation of the second only 1 second.
Since rooms are 1 vs 1 I must not let a player join if someone else already did.
I can do this simply by checking if there's already a player in the game:
// game already full
if (game.p2) {
  return socket.emit("join_game_reply", {
    err: "Someone else already joined."
  })
}
But, the issue at hand is, after that check, I must validate the balance.
So we get something like this:
socket.on("join_game", data => {
const game = openGames[data.gameId}
// game already full
if (game.p2) {
return socket.emit("join_game_reply", {
err: "Someone else already joined."
})
}
// check if users balance is sufficient to match bet of room creator
verifyUserBalance(socket.player, game.bet)
.then(sufficient => {
if(sufficient){
// join game
game.p2 = socket.player
}
})
})
The issue here:
What if, at the time playerX clicks join, the game is open and validation starts, but while playerX is validating, playerY joins and finishes validation first, and is therefore set as game.p2? Validation of playerX finishes shortly after, and the server then goes on to set game.p2 to playerX, leaving playerY with a UI state of in-game even though on the server he no longer is.
The solution I have is to literally just do the check again after validation:
socket.on("join_game", data => {
const game = openGames[data.gameId}
// game already full
if (game.p2) {
return socket.emit("join_game_reply", {
err: "Someone else already joined."
})
}
// check if users balance is sufficient to match bet of room creator
verifyUserBalance(socket.player, game.bet)
.then(sufficient => {
if(sufficient){
// join game
if (game.p2) {
return socket.emit("join_game_reply", {
err: "Someone else already joined."
})
game.p2 = socket.player
}
}
})
})
The reason I think this works is that Node.js is single-threaded, and I can therefore make sure that, after validating, I only let a player join if no one else joined in the meantime.
After writing this up I actually feel pretty confident that it will work so please let me in on my mistakes if you see any! Thanks a lot for taking the time!
Your code will work, but I think this is a short-term bootstrap and you will have to change it in the mid-term.
It will work
A. if you have only one server,
B. if your server never crashes, and
C. if you have just one synchronous action (here game.p2 = socket.player).
A. Multiple servers
To scale up your infra, I'm afraid it won't work.
You should not use Node.js variables (such as openGames) to store data, but rather retrieve the data from a cache database (such as Redis). This Redis database will be your single source of truth.
B. If the server crashes
The same kind of problem will happen if your server crashes (for any reason, like a full disk). You will lose all the data stored in Node.js variables.
C. Multiple actions
If you want to add another action (like putting the bet amount in escrow) to your workflow, you will need to catch the failure of this action (and the failure of the room joining) and guarantee an all-or-nothing mechanism (escrow + joining, or nothing).
You can manage this in your code, but it will become quite complex.
Transactions
When dealing with money + actions, I think you should use the transaction features of databases. I would use, for example, Redis transactions.
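As a minimal sketch of the underlying idea (ioredis assumed; a single SET ... NX is shown here instead of a full MULTI/EXEC transaction, since it is already atomic on its own, and the key name is invented):

const Redis = require('ioredis');
const redis = new Redis();

// atomically claim the p2 seat: SET ... NX succeeds only if the key does not
// exist yet, so exactly one concurrent joiner can win, across any number of
// server instances
async function claimSeat(gameId, playerId) {
  const ok = await redis.set(`game:${gameId}:p2`, playerId, 'NX');
  return ok === 'OK'; // null means someone else already joined
}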
You need to make "verify and join" an atomic operation on the server so nobody else can cause a race condition. There are many different ways to approach a solution. The best solution would only impact that particular game, not impacting the processing of joining other games.
Here's one idea:
1. Create a means of "provisionally joining a game". This will essentially reserve your spot in the game while you then check to see if the user verifies for the game. This prevents anyone else from joining a game you were first at and are in the process of verifying.
2. When someone else comes in to provisionally join a game, but the game already has a provisional user, the join function can return a promise that is not yet resolved. If the previous provisional join verifies and finishes, then this promise will reject because the game is already full. If the other provisional join fails to verify, then the first one to request a provisional join will resolve, and it can then go on with the verification process.
3. If this second user verifies correctly and converts to a completed join to the game, it will reject any other waiting promises for other provisional joins to this game. If it fails to verify, then it goes back to step 2 and the next one waiting gets a chance.
In this way, each game essentially has a queue of users waiting to get into the game. The queue hangs around as long as the game isn't full of verified users so whenever anyone doesn't verify, the next one in the queue gets a shot at joining.
For performance and user-experience reasons, you may want to implement a timeout on waiting in the queue and you may want to limit how many users can be in the queue (probably no point in allowing 100 users to be in the queue since it's unlikely they will all fail to verify).
It's important to understand that the verify and join needs to be all implemented on the server because that's the only way you can assure the integrity of the process and control it to avoid race conditions.
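A rough single-process sketch of that queue (the names createGame, provisionallyJoin, joinChain, and verify are all made up for illustration):

// each game keeps a promise chain; join attempts line up on it one at a time
function createGame(bet) {
  return { bet, p2: null, joinChain: Promise.resolve() };
}

function provisionallyJoin(game, player, verify) {
  const attempt = game.joinChain.then(async () => {
    if (game.p2) throw new Error('Game already full');
    // we hold the provisional spot while verification runs
    const sufficient = await verify(player, game.bet);
    if (!sufficient) throw new Error('Insufficient balance');
    game.p2 = player; // verified: the join is final
  });
  // keep the chain usable for the next waiter even if this attempt fails
  game.joinChain = attempt.catch(() => {});
  return attempt; // resolves on success, rejects if full or not verified
}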
In my Meteor application, to implement a turn-based multiplayer game server, the clients receive the game state via publish/subscribe, and can call a Meteor method sendTurn to send turn data to the server (they cannot update the game state collection directly).
var endRound = function (gameRound) {
  // check if gameRound has already ended /
  // if round results have already been determined
  // --> yes: do nothing
  // --> no:
  //   determine round results
  //   update collection
  //   create next gameRound
};
Meteor.methods({
  sendTurn: function (turnParams) {
    // find gameRound data
    // validate turnParams against gameRound
    // store turn (update "gameRound" collection object)
    // have all clients sent in turns for this round?
    //   yes --> call "endRound"
    //   no --> wait for other clients to send turns
  }
});
To implement a time limit, I want to wait for a certain time period (to give clients time to call sendTurn), and then determine the round result - but only if the round result has not already been determined in sendTurn.
How should I implement this time limit on the server?
My naive approach to implement this would be to call Meteor.setTimeout(endRound, <roundTimeLimit>).
Questions:
What about concurrency? I assume I should update collections synchronously (without callbacks) in sendTurn and endRound, but would this be enough to eliminate race conditions? (Reading the 4th comment on the accepted answer to this SO question, about synchronous database operations also yielding, I doubt it.)
In that regard, what does "per request" mean in the Meteor docs in my context (the function endRound called by a client method call and/or in a server setTimeout)?
In Meteor, your server code runs in a single thread per request, not in the asynchronous callback style typical of Node.
In a multi-server / clustered environment, (how) would this work?
Great question, and it's trickier than it looks. First off I'd like to point out that I've implemented a solution to this exact problem in the following repos:
https://github.com/ldworkin/meteor-prisoners-dilemma
https://github.com/HarvardEconCS/turkserver-meteor
To summarize, the problem basically has the following properties:
Each client sends in some action on each round (you call this sendTurn)
When all clients have sent in their actions, run endRound
Each round has a timer that, if it expires, automatically runs endRound anyway
endRound must execute exactly once per round regardless of what clients do
Now, consider the properties of Meteor that we have to deal with:
Each client can have exactly one outstanding method to the server at a time (unless this.unblock() is called inside a method). Following methods wait for the first.
All timeout and database operations on the server can yield to other fibers
This means that whenever a method call goes through a yielding operation, values in Node or the database can change. This can lead to the following potential race conditions (these are just the ones I've fixed, but there may be others):
In a 2-player game, for example, two clients call sendTurn at exactly the same time. Both call a yielding operation to store the turn data. Both methods then check whether 2 players have sent in their turns, find the affirmative, and endRound gets run twice.
A player calls sendTurn right as the round times out. In that case, endRound is called both by the timeout and by the player's method, resulting in it running twice again.
Incorrect fixes to the above problems can result in starvation where endRound never gets called.
You can approach this problem in several ways, either synchronizing in Node or in the database.
Since only one Fiber can actually change values in Node at a time, if you don't call a yielding operation you are guaranteed to avoid race conditions. So you can cache things like turn states in memory instead of in the database. However, this requires that the caching be done correctly, and it doesn't carry over to clustered environments.
Move the endRound code outside of the method call itself, using something else to trigger it. This is the approach I've taken which ensures that only the timer or the final player triggers the end of the round, not both (see here for an implementation using observeChanges).
In a clustered environment you will have to synchronize using only the database, probably with conditional update operations and atomic operators. Something like the following:
var currentVal;
while (true) {
  currentVal = Foo.findOne(id).val; // yields
  if (Foo.update({_id: id, val: currentVal}, {$inc: {val: 1}}) > 0) {
    // operation went as expected
    // (your code here, e.g. endRound)
    break;
  }
  else {
    // race condition detected, try again
  }
}
The above approach is primitive and probably results in bad database performance under high loads; it also doesn't handle timers, but I'm sure with some thinking you can figure out how to extend it to work better.
You may also want to see this timers code for some other ideas. I'm going to extend it to the full setting that you described once I have some time.
I have been writing a lot of Node.js recently, and that has forced me to attack some problems from a different perspective. I was wondering what patterns have developed for the problem of processing chunks of data sequentially (rather than in parallel) in an asynchronous request environment, but I haven't been able to find anything directly relevant.
So to summarize the problem:
I have a list of data stored in an array format that I need to process.
I have to send this data to a service asynchronously, but the service will only accept a few at a time.
The data must be processed sequentially to meet the restrictions on the service, meaning making a number of parallel asynchronous requests is not allowed
Working in this domain, the simplest pattern I've come up with is a recursive one. Something like
function processData(data, start, step, callback) {
  if (start < data.length) {
    // take the next chunk of at most `step` items (slice, not split)
    var chunk = data.slice(start, start + step);
    queryService(chunk, start, step, function (e, d) {
      // assume no errors
      // could possibly do some matching between d and `data` here to
      // update data with anything that the service may have returned
      processData(data, start + step, step, callback);
    });
  } else {
    callback(data);
  }
}
Conceptually, this steps through each chunk, but it feels more complex than it should be. Is there a simpler way of doing this? Does anyone have a pattern they tend to follow when approaching this kind of problem?
My first thought would be to rely on object encapsulation. Create an object that contains all of the information about what needs to be processed, plus all of the relevant data about what has been and is being processed; the callback function then just calls the object's 'next' function, which starts processing the next piece of data and updates the object. Essentially, it works like an asynchronous for-loop.
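A quick sketch of that idea (ChunkProcessor and its method names are invented for illustration; queryService is assumed to have a callback shape similar to the question's, minus the start/step arguments):

// an object that walks the array one chunk at a time; each completion
// callback triggers next(), so requests never run in parallel
function ChunkProcessor(data, step, queryService, done) {
  this.data = data;
  this.step = step;
  this.position = 0;
  this.queryService = queryService;
  this.done = done;
}

ChunkProcessor.prototype.next = function () {
  if (this.position >= this.data.length) {
    return this.done(this.data);
  }
  var chunk = this.data.slice(this.position, this.position + this.step);
  this.position += this.step;
  var self = this;
  this.queryService(chunk, function (err, result) {
    // merge `result` back into self.data here if needed
    self.next();
  });
};

// usage: new ChunkProcessor(items, 5, queryService, onDone).next();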