Preventing Potential Race Condition in Calls to an API? - node.js

There's an API that my node.js server accesses quite a bit. It requires me to change my password every 3 months. Fortunately, there's also an API call for changing the password. :) I have a cron job that runs regularly and changes the password when necessary.
If my app is accessing the API at the exact time the password is being changed, there's a potential race condition and the API call could fail. What are some good patterns for dealing with this?
I could put all the API calls into a queue, and use a cron job to pull the next one off the queue and run it. If the API call fails, it would stay in the queue and get run the next time the cron job runs. But that seems like it might be overkill.
I could use a try/catch handler with the API call, inside a while loop, and just run the while loop until the API call completes successfully. But that's going to block the rest of my app.
I could use a try/catch handler with the API call, inside a setTimeout, and just re-run the setTimeout until the API call completes successfully. This way the API call would only run when the main thread is done with other work and gets around to it. But would this be a mistake if the server is under heavy load?
Is there a better pattern for dealing with this sort of thing?
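For illustration, here is a minimal sketch of the setTimeout-based retry from the third option above, with exponential backoff added so a failing API isn't hammered under load (callApi, maxRetries, and the delay values are all illustrative assumptions, not part of the real app):

async function callWithRetry<T>(
  callApi: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await callApi();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // give up eventually
      const delay = baseDelayMs * 2 ** attempt; // exponential backoff
      // Yield back to the event loop instead of busy-looping.
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}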

The try/catch approaches would lose the pending call in the event of a server crash, so I went with the cron job/queue approach. I'm using a queue maintained as a table in my db, so that if something interrupts the server, nothing will be lost.
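A sketch of what that table-backed worker might look like, assuming a hypothetical db client with query/execute helpers (table and column names are illustrative); a failed call simply stays in the table and is retried on the next cron run:

interface QueueDb {
  query(sql: string): Promise<Array<{ id: number; payload: string }>>;
  execute(sql: string, params: unknown[]): Promise<void>;
}

async function processQueue(
  db: QueueDb,
  callApi: (payload: unknown) => Promise<void>
): Promise<void> {
  // Oldest jobs first; anything that fails stays put for the next run.
  const jobs = await db.query(
    "SELECT id, payload FROM api_queue ORDER BY created_at LIMIT 10"
  );
  for (const job of jobs) {
    try {
      await callApi(JSON.parse(job.payload));
      // Delete the row only after the call has succeeded.
      await db.execute("DELETE FROM api_queue WHERE id = ?", [job.id]);
    } catch {
      break; // stop here; the next cron run retries from this job
    }
  }
}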

Related

Best way to start a background process from GCP HTTP function call?

So, according to the docs here https://cloud.google.com/functions/docs/writing/http
Terminating HTTP functions
If a function creates background tasks (such as threads, futures, Node.js Promise objects, callbacks, or system processes), you must terminate or otherwise resolve these tasks before returning an HTTP response. Any tasks not terminated prior to an HTTP response may not be completed, and may also cause undefined behavior.
So, if you need to launch a long-running background task from within an HTTP function but still return from the function quickly, there is no straightforward way.
I have tried the PubSub approach (calling await topic.publishJSON(pars)), but it looks like publishing to a topic is quite a time-consuming operation, taking 2-3 seconds.
The PubSub-triggered function itself probably runs fine, but this 2-3 second delay makes the approach useless.
P.S.: the approach of starting a Promise from inside the function does actually work, but it seems error-prone since it goes against the docs.
If you need a quick answer, you have two types of solutions:
Async
With Cloud Functions, you invoke (perform an HTTP call to) another function (or Cloud Run or App Engine) without waiting for the answer, and reply to the requester immediately. The call you performed runs in the background and eventually answers a Cloud Function that is no longer listening!
With PubSub, it's similar. Instead of invoking a Cloud Function (or Cloud Run or App Engine), you publish a message to a PubSub topic, then create a subscription that triggers your long-running process (sketched below).
The same idea applies with Cloud Tasks, except there you create a task in a queue.
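As a concrete illustration of the PubSub variant, here is a rough sketch of the HTTP-triggered side (topic name and payload shape are illustrative; publishMessage is the current @google-cloud/pubsub API, the newer equivalent of the question's publishJSON):

import { PubSub } from "@google-cloud/pubsub";
import type { Request, Response } from "express";

const pubsub = new PubSub();
const topic = pubsub.topic("long-running-work"); // illustrative topic name

// HTTP-triggered function: publish the payload and return right away.
export async function enqueueWork(req: Request, res: Response): Promise<void> {
  await topic.publishMessage({ json: req.body });
  // 202 tells the caller the work was accepted, not that it finished.
  res.status(202).send("accepted");
}

// A second, PubSub-triggered function would subscribe to the topic and do
// the actual long-running work.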
Sync
If you use Cloud Run instead of Cloud Functions, you can send partial responses to the requester. That way, you can immediately answer with a partial response that says "OK", continue the process in the request context, and send another partial response whenever you want, or one at the end of the long-running process to tell the user their process has finished.
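A rough sketch of that idea, assuming an Express app on Cloud Run with HTTP response streaming available (the route and doLongRunningWork are illustrative stand-ins):

import express from "express";

const app = express();
app.use(express.json());

app.post("/long-job", async (req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.write("OK\n"); // partial response: the requester sees this immediately
  await doLongRunningWork(req.body); // the long-running step, in the request context
  res.end("done\n"); // final chunk once the work completes
});

// Stand-in for the real work.
async function doLongRunningWork(_payload: unknown): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, 60_000));
}

app.listen(Number(process.env.PORT) || 8080);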

Web Api - Mutex Per User

I have an asp.net core Web Api application.
In my application I have a Web API method where I want to prevent multiple requests from the same user from running simultaneously. I don't mind requests from different users running simultaneously.
I am not sure how to create the lock and where to put it. I thought about creating some kind of dictionary that contains the user id and performing the lock on that item, but I don't think I'm getting it right. Also, what happens if there is more than one server behind a load balancer?
Example:
Let's assume each registered user can perform 10 long tasks each month. I need to check for each user whether he has exceeded his monthly limit. If the user sends many simultaneous requests to the server, he might be allowed to perform more than 10 operations. I understand that I need to put a lock on the method, but I still want to allow other users to perform this action simultaneously.
What you're asking for is fundamentally not how the Internet works. The HTTP and underlying IP protocols are stateless, meaning each request is supposed to run independently of any knowledge of what has occurred previously (or concurrently, as the case may be). If you're worried about excessive load, your best bet is to implement rate limiting/throttling tied to authentication. That way, once a user burns through their allotted requests, they're cut off. This will then have a natural side-effect of making the developers programming against your API more cautious about sending excessive requests.
Just to be a bit more thorough here, the chief problem with the approach you're suggesting is that I know of no way it can be practically implemented. You can use something like SemaphoreSlim to create a lock, but that needs to be static so that the same instance is used for each request. Being static is going to limit your ability to use a dictionary of them, which is what you'll need for this. It can technically be done, I suppose, but you'd have to use a ConcurrentDictionary, and even then there's no guarantee of single-threaded additions. So, concurrent requests for the same user could load concurrent semaphores into it, which defeats the entire point. I suppose you could front-load the dictionary with a semaphore for each user from the start, but that could become a huge waste of resources, depending on your user base. Long and short, it's one of those things where when you're finding a solution this darn difficult, it's a good sign you're likely trying to do something you shouldn't be doing.
EDIT
After reading your example, I think this really just boils down to an issue of trying to handle the work within the request pipeline. When there's some long-running task to be completed or just some heavy work to be done, the first step should always be to pass it off to a background service. This allows you to return a response quickly. Web servers have a limited amount of threads to handle requests with, and you want to service the request and return a response as quickly as possible to keep from exhausting your threadpool.
You can use a library like Hangfire to handle your background work or you can implement an IHostedService as described here to queue work on. Once you have your background service ready, you would then just immediately hand off to it any time you get a request to this endpoint, and return a 202 Accepted response with a URL the client can hit to check the status. That solves your immediate issue of not wanting a ton of requests to this long-running job to bring your API down. The endpoint is now essentially doing nothing more than telling something else to do the work and then returning immediately.
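To make the shape of that hand-off concrete, here is a minimal sketch of the 202-plus-status-URL pattern; it's written in TypeScript/Express for brevity and uses an in-memory job map purely for illustration, but the same flow applies in ASP.NET Core with Hangfire or an IHostedService-backed queue (all route and store names are assumptions):

import express from "express";
import { randomUUID } from "crypto";

const app = express();
app.use(express.json());

// In-memory job store, purely for illustration.
const jobs = new Map<string, { status: "queued" | "running" | "done" }>();

app.post("/long-task", (req, res) => {
  const id = randomUUID();
  jobs.set(id, { status: "queued" });
  // Hand off to "background" work; a real app would use a proper queue.
  setImmediate(async () => {
    jobs.set(id, { status: "running" });
    // ... the actual long-running work goes here ...
    jobs.set(id, { status: "done" });
  });
  // Answer immediately with a status URL the client can poll.
  res.status(202).location(`/long-task/${id}`).json({ id });
});

app.get("/long-task/:id", (req, res) => {
  const job = jobs.get(req.params.id);
  job ? res.json(job) : res.sendStatus(404);
});

app.listen(3000);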
For the actual background work you'd be queuing, you can check the user's allowance, and if they have exceeded 10 requests (your rate limit), fail the job immediately without doing anything. If not, then you can actually start the work.
If you like, you can also add webhook support to notify the client when the job completes. You simply allow the client to set a callback URL that you should notify on completion, and when you've finished the work in the background task, you hit that callback. It's on the client to decide what happens on their end when the callback is hit. They might, for instance, decide to use SignalR to send out a message to their own users/clients.
EDIT #2
I actually got a little intrigued by this. While I still think it's better for you to offload the work to a background process, I was able to create a solution using SemaphoreSlim. Essentially, you just gate every request through the semaphore, where you'll check the current user's remaining requests. This does mean that other users must wait for this check to complete, but then you can release the semaphore and actually do the work. That way, at least, you're not blocking other users during the actual long-running job.
First, add a field to whatever class you're doing this in:
private static readonly SemaphoreSlim _semaphore = new SemaphoreSlim(1, 1);
Then, in the method that's actually being called:
await _semaphore.WaitAsync();
bool allowed;
try
{
    // get remaining requests for user (e.g. a SELECT against the database)
    allowed = remaining > 0;
    if (allowed)
    {
        // decrement remaining requests for user (must happen while the semaphore is held)
    }
}
finally
{
    // always release, even if the check throws, or later requests deadlock
    _semaphore.Release();
}

if (allowed)
{
    // now do the work, outside the semaphore
}
else
{
    // handle user out of requests (return error, etc.)
}
This is essentially a bottleneck. To do the appropriate check and decrement, only one thread can go through the semaphore at a time. That means that if your API gets slammed, requests will queue up and may take a while to complete. However, since this is probably just going to be something like a SELECT query followed by an UPDATE query, it shouldn't take long for the semaphore to be released. You should definitely do some load testing and watch it, though, if you're going to go this route.

Node.js + express.js and thread safety

Assume I have an array of items and each GET call makes a change to this array (it may add, remove, or shift items).
Would that be "thread-safe"? I know that Node.js is single-threaded, yet is there a possibility that two GET requests would be handled "simultaneously"?
As node is single-threaded, only one piece of code is ever being executed at any time. A callback (such as the callback from a remote HTTP GET request) will be added to the end of the event loop's message queue. When there are no more functions on the stack, the program waits for a message to be added to the queue, and runs that message's function (in this case, the request callback function).
If you are making parallel requests to a remote server then you won't get the requests completed in the same order each time unless you run the requests in series. The callback functions will never run at the same time, however - only one function can ever be executed at once.
It would be thread-safe because all operations on arrays are synchronous and blocking. The only non-blocking operations in Node.js are I/O.
Since you don't have any async operations, there is no problem in your situation (unless you also need to do something like access a database).
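A small illustration of both answers' point: synchronous array mutations can never interleave, but anything split across an await can (the handler names and db parameter are illustrative):

const items: number[] = [];

// Safe: a synchronous handler runs start-to-finish before any other
// callback can touch `items`.
function handleGetSync(): void {
  items.push(items.length);
  items.shift();
}

// Not safe: another request's callback can run during the `await`, so the
// length check may be stale by the time the push happens.
async function handleGetAsync(db: { save(v: number[]): Promise<void> }): Promise<void> {
  const lengthBefore = items.length;
  await db.save(items); // other handlers may interleave here
  if (items.length === lengthBefore) {
    items.push(lengthBefore); // may act on outdated state
  }
}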

WCF - spawn a new worker thread and return to caller without waiting for it to finish

I have a WCF web service hosted in IIS. This service has a method, let's call it DoSomething(). DoSomething() is called from a client-side application.
DoSomething performs some work and returns the answer to the user. Now I need to log how often DoSomething is being called. I can add code to the DoSomething function that writes to an SQL database and updates a counter on every call, but this will slow down the DoSomething method, since the user has to wait for this extra database call.
Is it a good option to let the DoSomething method spawn a new thread that updates the counter in the database, and then just return the answer from the DoSomething method to the user without waiting for the thread to finish? Then I won't know if the database update fails, but that is not critical.
Are there any problems with spawning a new background thread and not waiting for it to finish in WCF? Or is there a better way to solve this?
Update: To ask the question a little differently: is it a bad idea to spawn new threads inside a WCF web service method?
The main issue is one of reliability. Is this a call you care about? If the IIS process crashes after you returned the response, but before your thread completes, does it matter? If no, then you can use client side C# tools. If it does matter, then you must use a reliable queuing technology.
If you use the client-side approach, then spawning a new thread just to block on a DB call is never the correct answer. What you want is to make the call async, and for that you use SqlCommand.BeginExecuteNonQuery after ensuring that Asynchronous Processing is enabled on the connection.
If you need reliable processing then you can use a pattern like Asynchronous procedure execution which relies on persisted queues.
As a side note, things like logging or hit counts are a huge performance bottleneck if done the naive way, writing to the database on every single HTTP request. You must batch and flush.
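A minimal sketch of that batch-and-flush idea, shown in TypeScript for brevity (writeCountsToDb and the flush interval are illustrative assumptions; in-memory counts are traded for far fewer database round-trips):

// In-memory counts, flushed to the database on a timer.
const pending = new Map<string, number>();

export function recordHit(method: string): void {
  pending.set(method, (pending.get(method) ?? 0) + 1);
}

// Hypothetical batched write, e.g. one multi-row INSERT/UPDATE
// instead of one round-trip per hit.
async function writeCountsToDb(counts: Map<string, number>): Promise<void> {
  // ... single database round-trip for the whole batch ...
}

async function flush(): Promise<void> {
  if (pending.size === 0) return;
  const batch = new Map(pending);
  pending.clear();
  try {
    await writeCountsToDb(batch);
  } catch {
    // Merge the failed batch back so no counts are lost.
    for (const [k, v] of batch) {
      pending.set(k, (pending.get(k) ?? 0) + v);
    }
  }
}

setInterval(() => void flush(), 10_000); // flush every 10 seconds (example value)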
If you want to track only a single method like DoSomething() in the service, you can create a custom operation behavior and apply it to that method.
The operation behavior will contain the code that logs the info to the database. In that operation behavior you can use .NET 4.0's Task Parallel Library (TPL) to create a task that takes care of the database logging. If you use the TPL you don't need to worry about creating threads directly.
The advantage of using an operation behavior is that if tomorrow you need to track another method, then instead of duplicating the code you just mark that method with the custom operation behavior. If you want to track all the methods, you should use a service behavior instead.
To learn more about operation behaviors, see http://msdn.microsoft.com/en-us/library/system.servicemodel.operationbehaviorattribute.aspx
To learn more about the TPL (Task Parallel Library), see http://msdn.microsoft.com/en-us/library/dd460717.aspx

Why can Google App Engine Tasks spuriously be executed more than once?

Why can Google App Engine tasks be executed more than once?
According to Brett Slatkin's talk from Google I/O 2009, it is possible for a task to spuriously run twice even without server failures!
Does this have something to do with spurious wakeups of threads?
Brett Slatkin gave a similar talk at I/O 2010.
I don't know that he ever gave details of how or when this could happen. His point was that, because of the way Task Queues work, it is possible by design for tasks to be re-enqueued. Because of this, you need to write your tasks so that re-execution does not cause problems.
For example, let's say you have a task that sends an email and then increments a counter in Datastore. If there was a bug in your code OR if Datastore was down, it is possible for the email to be sent successfully but for the write to Datastore to fail. If you didn't handle that failure by catching the exception, your task would return an HTTP status code of 500. Task Queue is designed to re-enqueue a task if it returns a status code >299, so your task would be executed over and over until the write to Datastore succeeded, which means someone would get many duplicate emails.
I think the line about "Possible for a task to spuriously run twice..." was just a way of saying that App Engine isn't guaranteed to protect against this, so you need to make sure you take care of it in your code.
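To make that concrete, here is a minimal sketch of guarding such a task with an idempotency check (TypeScript; alreadyProcessed/markProcessed are assumed to be writes against a dedup record keyed by the task ID, and all names are illustrative):

// Assumed helpers against the datastore; all names are illustrative.
declare function alreadyProcessed(taskId: string): Promise<boolean>;
declare function markProcessed(taskId: string): Promise<void>;
declare function sendEmail(to: string): Promise<void>;
declare function incrementCounter(): Promise<void>;

async function emailTask(taskId: string, to: string): Promise<void> {
  if (await alreadyProcessed(taskId)) {
    return; // a re-delivered task becomes a no-op
  }
  await sendEmail(to);
  await incrementCounter(); // must itself be safe to retry
  await markProcessed(taskId); // if this final write fails the task re-runs,
                               // so the check above narrows the duplicate
                               // window rather than eliminating it entirely
}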
