Strategy to handle race conditions in a web application backend? - multithreading

In interviews I have often been asked about race conditions in web applications such as movie-ticket or travel websites.
The question is something like this:
Say that on a bus or plane ticket website only one seat is left. Two (or, in an extreme scenario, many) users on different computers log into the website at the same time and see that one seat is left. They both go ahead, select that seat and place the order.
Now there are two requests we have to handle. For the first request we book the ticket, but for the second request we have to throw an error and show the end user a message saying the seat is not available.
Say the database schema is something like this:
bus_id, seat_id, is_taken
So for the first request, we set is_taken to 1 for the corresponding bus_id, seat_id. Then for the second request, there won't be any seat_id with is_taken = 0, so we won't book the ticket.
But here, in my opinion, we have imposed a restriction that only one request can be handled at a time; the second request can be handled only after the first request has completed.
However, that is not practical, since we might have a huge website with lots of traffic and the application running on several servers in parallel. We have to process requests in parallel.
Since I don't have much experience with handling race conditions in these sorts of multithreaded web applications, I can't quite figure out the right way to solve this.
What is the right (even if basic) approach or design pattern to tackle these scenarios?

Web applications handle requests concurrently, usually on multiple threads. There are two ways of solving this.
Application level (Not preferred)
I am not sure which programming language you are using to build the application, but most languages used for building websites have something like Java's "synchronized", which prevents two threads from executing the same block of code simultaneously.
This is not preferred because the solution does not scale horizontally. When you increase capacity by running one more instance of your web application, it fails badly: the lock only coordinates threads inside one process, so two instances can still book the same seat.
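To make the application-level option concrete, here is a minimal Python sketch in which threading.Lock plays the role of "synchronized". SeatInventory and the seat id "12A" are hypothetical. The check and the write both happen under the same lock, so exactly one of eight concurrent requests wins, but the dictionary and the lock exist only inside this one process, which is exactly why a second server instance breaks it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SeatInventory:
    """Hypothetical in-memory seat map guarded by a lock (single process only)."""

    def __init__(self, seats):
        self._taken = {seat: False for seat in seats}
        self._lock = threading.Lock()  # plays the role of "synchronized"

    def book(self, seat):
        # Check-then-act is safe only because both steps run under one lock.
        with self._lock:
            if self._taken[seat]:
                return False
            self._taken[seat] = True
            return True


inventory = SeatInventory(["12A"])
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: inventory.book("12A"), range(8)))
print(results.count(True))  # 1
```

Another process running the same code would have its own _taken dictionary and its own lock, so nothing stops it from booking "12A" too.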
Database level
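This is the preferred solution. You obtain a lock on the record in the database before you update it.
SQL provides an option for selecting a record for update:
SELECT * FROM BUS_SEATS WHERE BUS_ID = 1 FOR UPDATE;
The statement above is one way to obtain a lock; the lock is held until the surrounding transaction commits or rolls back, so run the SELECT, the check and the UPDATE in one transaction. Most relational databases provide this kind of feature (SQL Server, for example, uses lock hints such as UPDLOCK instead of FOR UPDATE). With it you can lock the required rows, do the update and keep the database consistent.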
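A minimal sketch of the lock, check, update, commit sequence, assuming SQLite only so that it runs without a database server. SQLite has no SELECT ... FOR UPDATE, so BEGIN IMMEDIATE, which takes SQLite's database-wide write lock, stands in for the row lock here; that is coarser than what FOR UPDATE gives you in PostgreSQL or MySQL. The table follows the question's bus_id, seat_id, is_taken schema, and book_seat_locked is a hypothetical helper.

```python
import sqlite3


def book_seat_locked(conn, bus_id, seat_id):
    """Lock, check, update, commit.

    BEGIN IMMEDIATE stands in for SELECT ... FOR UPDATE (SQLite lacks row
    locks): no other writer can change the seat between our read and write.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT is_taken FROM bus_seats WHERE bus_id = ? AND seat_id = ?",
            (bus_id, seat_id),
        ).fetchone()
        if row is None or row[0] == 1:
            conn.execute("ROLLBACK")  # unknown seat or already taken
            return False
        conn.execute(
            "UPDATE bus_seats SET is_taken = 1 WHERE bus_id = ? AND seat_id = ?",
            (bus_id, seat_id),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


# isolation_level=None: the driver does not open transactions; we do it explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE bus_seats (bus_id INTEGER, seat_id INTEGER, is_taken INTEGER)")
conn.execute("INSERT INTO bus_seats VALUES (1, 7, 0)")
first = book_seat_locked(conn, 1, 7)
second = book_seat_locked(conn, 1, 7)
print(first, second)  # True False
```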

At some point, there has to be some sort of synchronization.
Since you're using a database, which is usually the bottleneck anyway, you might as well let it handle the race condition.
All you have to do is update the row atomically. The requests can still be handled in parallel by the application.
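SQL (T-SQL; @bus_id and @seat_id are the request parameters):
UPDATE bus_seats
SET is_taken = 1
WHERE bus_id = @bus_id AND seat_id = @seat_id AND is_taken = 0;
-- 1 row affected: this request got the seat; 0 rows: another request took it first.
SELECT CASE WHEN @@ROWCOUNT = 1 THEN 1 ELSE 0 END AS success;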
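The same atomic conditional update from application code, sketched in Python with the standard sqlite3 module (an assumption; any driver that reports the affected-row count works the same way). The is_taken = 0 condition sits inside the UPDATE itself, so the check and the write cannot be separated by another request, and cursor.rowcount tells this request whether it won. book_seat is a hypothetical helper name.

```python
import sqlite3


def book_seat(conn, bus_id, seat_id):
    # One statement: the check (is_taken = 0) and the write happen atomically.
    cur = conn.execute(
        "UPDATE bus_seats SET is_taken = 1 "
        "WHERE bus_id = ? AND seat_id = ? AND is_taken = 0",
        (bus_id, seat_id),
    )
    conn.commit()
    # rowcount == 1: this request got the seat; 0: someone else got it first.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bus_seats (bus_id INTEGER, seat_id INTEGER, is_taken INTEGER)")
conn.execute("INSERT INTO bus_seats VALUES (1, 7, 0)")
conn.commit()
first = book_seat(conn, 1, 7)
second = book_seat(conn, 1, 7)
print(first, second)  # True False
```

Requests still run in parallel; only the single row update is serialized by the database, which is the point of this answer.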

Related

Best way to implement background “timer” functionality in Python/Django

I am trying to implement a Django web application (on Python 3.8.5) which allows a user to create “activities” where they define an activity duration and then set the activity status to “In progress”.
The POST action to the View writes the new status, the duration and the start time (end time, based on start time and duration is also possible to add here of course).
The back-end should then keep track of the duration and automatically change the status to “Finished”.
User actions can also change the status to “Finished” before the calculated end time (i.e. the timer no longer needs to be tracked).
I am fairly new to Python, so I need some advice on the smartest way to implement such a concept.
It needs to be efficient and scalable – I’m currently using a Heroku Free account so have limited system resources, but efficiency would also be important for future production implementations of course.
I have looked at the Python threading Timer, and this seems to work on a basic level, but I’ve not been able to determine what kind of constraints this places on the system – e.g. whether the spawned Timer thread might prevent the main thread from finishing and releasing resources (i.e. Heroku Dyno threads), etc.
I have read that persistence might be a problem (if the server goes down), and I haven’t found a way to cancel the timer from another process (the .cancel() method seems to rely on having the original object to cancel, and I’m not sure if this is achievable from another process).
I was also wondering about a more “background” approach, i.e. a single process which is constantly checking the database looking for activity records which have reached their end time and swapping the status.
But what would be the best way of implementing such a server?
Is it practical to read the database every second to find records with an end time of “now”? I need the status to change in real-time when the end time is reached.
Is something like Celery a good option, or is it overkill for a single process like this?
As I said I’m fairly new to these technologies, so I may be missing other obvious solutions – please feel free to enlighten me!
Thanks in advance.
To achieve this you need some kind of task-scheduling functionality. For a quick, simple implementation, the Timer object from the
threading module is a good solution.
A more complete solution is to use Celery. If you are new to it, digging into it is a worthwhile start: Celery works as a queue manager that distributes your work easily across several threads or processes.
You mentioned that you want it to be efficient and scalable, so I expect you will implement similar functionality that requires multiprocessing and scheduling; for that reason, my recommendation is to use Celery.
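A minimal sketch of the threading.Timer approach, with a plain dictionary standing in for the activity row (a hypothetical simplification of the Django model). It also shows the limitations raised in the question: the timer lives only in this process, is lost if the process restarts, and can only be cancelled by code that holds the Timer object.

```python
import threading

status = {"activity-1": "In progress", "activity-2": "In progress"}  # stand-in for DB rows


def finish(activity_id):
    status[activity_id] = "Finished"


# Fire finish() when the activity duration elapses (0.1 s here for illustration).
timer = threading.Timer(0.1, finish, args=("activity-1",))
timer.daemon = True  # don't keep the process alive just for this timer
timer.start()
timer.join()  # only so the example is deterministic; a web view would not wait

# A user finishing early: cancel() needs the original Timer object.
early = threading.Timer(0.1, finish, args=("activity-2",))
early.start()
early.cancel()
early.join()

print(status)  # {'activity-1': 'Finished', 'activity-2': 'In progress'}
```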
You can integrate it into your Django application easily following the documentation Integrate Django with Celery.
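The question's alternative, a single background job that periodically looks for activities whose end time has passed, can be sketched as one "sweep" function. The Activity dataclass below is a hypothetical stand-in for the Django model; a periodic scheduler such as Celery beat, or a simple loop, would call the sweep every few seconds. Because the user finishing early simply sets the status to "Finished", the sweep skips those rows with no timer to cancel.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Activity:  # hypothetical stand-in for the Django model
    id: int
    status: str
    end_time: datetime


def sweep(activities, now):
    """Mark every in-progress activity whose end time has passed as Finished.

    Returns the ids that changed.
    """
    changed = []
    for activity in activities:
        if activity.status == "In progress" and activity.end_time <= now:
            activity.status = "Finished"
            changed.append(activity.id)
    return changed


now = datetime(2021, 1, 1, 12, 0)
acts = [
    Activity(1, "In progress", now - timedelta(minutes=1)),  # overdue
    Activity(2, "In progress", now + timedelta(minutes=5)),  # still running
    Activity(3, "Finished", now - timedelta(hours=1)),       # user finished early
]
changed = sweep(acts, now)
print(changed)  # [1]
```

With the Django ORM, assuming a model with status and end_time fields, the same sweep is a single query: Activity.objects.filter(status="In progress", end_time__lte=timezone.now()).update(status="Finished").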

Conceptual approach of threads in Delphi

Over 2 years ago, Remy Lebeau gave me invaluable tips on threads in Delphi. His answers were very useful to me and I feel like I made great progress thanks to him. This post can be found here.
Today I face a "conceptual problem" about threads. This is not really about code; it is about the approach one should choose for a certain problem. I know we are not supposed to ask for personal opinions; I am merely asking whether, from a technical point of view, one of these approaches must be avoided or whether both are viable.
My application has a list of unique product numbers (named SKUs) in a database. Querying an API with these SKUs, I get back a JSON file containing details about these products. This JSON file is processed, and the results are displayed on screen and saved in the database. So, at one step, a download process is involved, and it is executed in a worker thread.
I see two possible approaches for this whole procedure:
1. When the user clicks the start button, a query is fired, building a list of SKUs based on the user criteria. A TStringList is then built and, for each element of the list, a thread is launched that downloads the JSON, sends the result back to the main thread and terminates.
2. When the user clicks the start button, a query is fired, building a list of SKUs based on the user criteria. Instead of sending SKU numbers one after another to worker threads, the whole list is sent, and the worker thread iterates through the list, sending results back to the main thread for display and saving (via a Synchronize call). So we only have one worker thread working through the whole list before terminating.
I have coded both approaches and they both work, each with downsides that I have experienced.
I am not a professional developer; this is a hobby. Before working my way further down one path or the other for "polishing", I would like to know whether, from a technical point of view and according to your knowledge and experience, one of the approaches I described should be avoided, and why.
Thanks for your time
Mathias
Another thing to consider in this case is the latency of the API that produces the JSON. For example, if it takes 30 msec to go back and forth to the server, and 0.01 msec to create the JSON on the server, then querying a single JSON record per request, even if each request is in a different thread, does not make much sense. In that case, it would make sense to make fewer requests to the server, returning more data per request, and partition the results among different threads.
The other thing is that threads are not a solution to every problem. I would question why you need to give each SKU its own thread. How long does each thread run, and how much processing does it do? In general, creating lots of threads, each of which works for a fraction of a msec, does not make sense. You want threads to stay alive as long as possible, processing as much data as they can for the job. You don't want the computer to spend as much time creating and destroying threads as doing useful work.
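The batching idea can be sketched language-neutrally; it is written in Python here rather than Delphi, and fetch_batch is a hypothetical stand-in for one API call that returns details for many SKUs. A small, fixed pool of long-lived workers each handles whole batches instead of one short-lived thread per SKU. With the answer's example figure of 30 msec per round trip, 230 single-SKU requests spend about 6.9 s waiting on the network if issued one after another, while 5 batched requests spend about 0.15 s.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Split the SKU list into batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch_batch(batch):
    # Hypothetical stand-in for ONE API request returning details for many SKUs.
    return {sku: {"sku": sku, "status": "ok"} for sku in batch}


def fetch_all(skus, batch_size=50, workers=4):
    results = {}
    # A few workers, each processing whole batches, rather than a thread per SKU.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(fetch_batch, chunks(skus, batch_size)):
            results.update(partial)  # merged on the calling (main) thread
    return results


skus = [f"SKU{n:04d}" for n in range(230)]
details = fetch_all(skus)
print(len(details), len(chunks(skus, 50)))  # 230 5
```

Merging on the calling thread plays roughly the role that Synchronize plays in the Delphi version: only one thread touches the shared results.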

DDD - How to modify several ARs (from different bounded contexts) in a single request?

I would like to present a small scenario, still on paper, which seems a bit tedious to accomplish with respect to DDD principles.
Let's say I have an application for hosting-account management. The application is composed of several bounded contexts, such as web accounts management, FTP accounts management and mail accounts management, each represented by its own AR (they can live standalone).
Now, let's imagine I want to provide a UI with an HTML form that has one fieldset per bounded context, for instance to update limits and/or features. How exactly should I proceed to update all the ARs without breaking the single-transaction-per-request principle? Can I create a kind of "outer" AR, say a ClientHostingProperties AR, which holds references to the other ARs and updates them in a single transaction using its own repository? Or should I rather create an AR that emits messages that listeners provided by the bounded contexts react to, in which case I should probably think about ES (event sourcing)?
Thanks.
How should I process exactly to update all AR without breaking single transaction per request principle?
You are probably looking for a process manager.
Basic sketch: persisting the details from the submitted form is a transaction unto itself (you are offered an opportunity to accrue business value; step 1 is to capture that opportunity).
That gives you a way to keep track of whether or not this task is "done": you compare the changes in the task to the state of the system, and fire off commands (to run in isolated transactions) to make changes.
Processes, in my mind, end up looking a lot like state machines. These commands are done, these commands are not done, these commands have failed: now what? Eventually the process reaches a state where there are no additional changes to be made, and this instance of the process is "done".
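A minimal sketch of such a process manager as a state machine, in Python. HostingUpdateProcess and the command names are hypothetical, and each command is assumed to be executed elsewhere, against one aggregate, in its own transaction; the process only records outcomes and decides what is still pending. Retrying failed commands is just one possible answer to "now what?".

```python
from enum import Enum


class CmdState(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


class HostingUpdateProcess:
    """Hypothetical process manager for one submitted form.

    Each command targets one aggregate and runs in its own transaction.
    """

    def __init__(self, commands):
        self.state = {cmd: CmdState.PENDING for cmd in commands}

    def pending(self):
        return [c for c, s in self.state.items() if s is CmdState.PENDING]

    def record(self, cmd, succeeded):
        self.state[cmd] = CmdState.DONE if succeeded else CmdState.FAILED

    def retry_failed(self):
        # One possible answer to "these commands have failed: now what?"
        for cmd, s in list(self.state.items()):
            if s is CmdState.FAILED:
                self.state[cmd] = CmdState.PENDING

    def is_done(self):
        return all(s is CmdState.DONE for s in self.state.values())


proc = HostingUpdateProcess(["UpdateWebLimits", "UpdateFtpLimits", "UpdateMailLimits"])
proc.record("UpdateWebLimits", True)
proc.record("UpdateFtpLimits", False)
proc.record("UpdateMailLimits", True)
print(proc.is_done())  # False
proc.retry_failed()
print(proc.pending())  # ['UpdateFtpLimits']
proc.record("UpdateFtpLimits", True)
print(proc.is_done())  # True
```

Persisting this state (the "capture the opportunity" step) is its own transaction, separate from the commands it dispatches.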
Short answer: You don't.
An aggregate is a transactional boundary, which means that if you update multiple aggregates in one "action", you have to use multiple transactions. An aggregate corresponds to one transaction because this is what allows you to guarantee its consistency.
This means that you have two options:
You can make your aggregate larger. Then you can actually guarantee consistency, but your ability to handle concurrent requests gets worse. So this is usually what you want to avoid.
You can live with the fact that there are two transactions, which means you are eventually consistent. If so, you usually use something such as a process manager or a flow to handle updating multiple aggregates. In its simplest form, a flow is nothing but a simple "if this event happens, run that command" rule. In its more complex form, it has its own state.
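The simplest, stateless form of a flow can be sketched in Python as a lookup from event name to follow-up command. The event and command names are hypothetical, and dispatch is whatever sends a command to its bounded context, where it runs in its own transaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    account_id: str


# "If this event happens, run that command" (hypothetical names).
FLOW_RULES = {
    "WebLimitsUpdated": "UpdateFtpLimits",
    "FtpLimitsUpdated": "UpdateMailLimits",
}


def react(event, dispatch):
    """Dispatch the follow-up command for an event, if a rule exists.

    Each dispatched command is expected to run in its own transaction.
    """
    command = FLOW_RULES.get(event.name)
    if command is not None:
        dispatch(command, event.account_id)
    return command


sent = []
react(Event("WebLimitsUpdated", "acct-42"), lambda cmd, acct: sent.append((cmd, acct)))
react(Event("MailLimitsUpdated", "acct-42"), lambda cmd, acct: sent.append((cmd, acct)))
print(sent)  # [('UpdateFtpLimits', 'acct-42')]
```

Once a flow needs to remember what already happened (for example, to wait for two events), it has its own state and becomes a process manager.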
Hope this helps 😊

How to handle Web application logic and database concurrency?

Let's say I have a table called items. Users of my web app can delete rows from the items table, but I don't want the table to become empty.
So currently I have code like this in my application:
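if (itemsCount() <= 1) {
    // don't delete: this would leave the table empty
} else {
    // check-then-act: another thread may delete between the count and this call
    delete();
}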
But I realize this code is vulnerable to a concurrency problem. For example, if the items table currently has 2 rows and two threads execute this code at almost exactly the same time, the table might become empty.
I think this problem is pretty common for people writing web apps, and people should have solved it already. What solutions are available?
The most common solution is to use a transaction, typically managed by a Transaction Manager. In your case, the transaction must make the count check and the delete atomic, so that two threads cannot both see a count of 2 and each delete a row. A transaction at the usual default isolation levels (READ COMMITTED or REPEATABLE READ) is not enough on its own; you also need either a stricter isolation level such as SERIALIZABLE or an explicit lock on the rows involved.
You didn't mention which language and which kind of environment you are using, but assuming Java and JEE, transaction management is quite easy. Start here.
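A minimal sketch of the transactional version, assuming SQLite and Python's sqlite3 module rather than Java/JEE so that it runs stand-alone. BEGIN IMMEDIATE takes SQLite's write lock before the count, so no other writer can delete between the check and the delete. The items schema and delete_item are hypothetical. On PostgreSQL or MySQL the equivalent needs the stricter isolation level or explicit lock mentioned above.

```python
import sqlite3


def delete_item(conn, item_id):
    """Delete item_id unless that would leave the items table empty."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before checking
    try:
        (count,) = conn.execute("SELECT COUNT(*) FROM items").fetchone()
        if count <= 1:
            conn.execute("ROLLBACK")  # refuse: this is the last row
            return False
        cur = conn.execute("DELETE FROM items WHERE id = ?", (item_id,))
        conn.execute("COMMIT")
        return cur.rowcount == 1
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


# isolation_level=None: the driver does not open transactions implicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(1,), (2,)])
first = delete_item(conn, 1)
second = delete_item(conn, 2)
print(first, second)  # True False
```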

Replacing bad performing workers in pool

I have a set of actors that are somewhat stateless and perform similar tasks.
Each of these workers is unreliable and potentially slow. In my design, I can easily spawn more actors to replace lazy ones.
Currently each actor assesses its own performance. Is there a way to make the supervisor/actor pool do this assessment, to help decide which workers are slow enough to replace? Or is my current strategy "the" right strategy?
I'm new to Akka myself, so I am only trying to help, but my approach would be something along the following lines:
1. Write your own routing logic, modeled on https://github.com/akka/akka/blob/v2.3.5/akka-actor/src/main/scala/akka/routing/SmallestMailbox.scala. Keep in mind that a new instance is created for every pool, so each instance can track how many messages each actor has processed so far. Once you find an actor underperforming, mark it as 'removable' in a separate data structure and stop sending it further messages (it can be removed once it is no longer processing any).
2. Write your own router pool: override createRouterActor https://github.com/akka/akka/blob/v2.3.5/akka-actor/src/main/scala/akka/routing/RouterConfig.scala:236 to provide your own CustomRouterPoolActor.
3. Write your CustomRouterPoolActor along the lines of https://github.com/akka/akka/blob/8485cd2ebb46d2fba851c41c03e34436e498c005/akka-actor/src/main/scala/akka/routing/Resizer.scala (see ResizablePoolActor). This actor will have access to your strategy instance; from it, remove the routees already marked for removal. Look at ResizablePoolCell to see how to remove actors.
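The bookkeeping in step 1 can be sketched independently of Akka; this is a Python illustration, not Akka's API. TrackingRouter, slow_factor and min_samples are hypothetical: the router round-robins over active routees, records how long each routee takes per message, and flags a routee as removable when its average exceeds slow_factor times the pool average, after which it receives no more messages and can be replaced.

```python
from collections import defaultdict


class TrackingRouter:
    """Hypothetical router: round-robin over routees, record per-routee
    processing time, and stop routing to routees flagged as too slow."""

    def __init__(self, routees, slow_factor=2.0, min_samples=5):
        self.routees = list(routees)
        self.slow_factor = slow_factor  # "slow" = this many times the pool average
        self.min_samples = min_samples  # don't judge a routee on too few messages
        self.durations = defaultdict(list)
        self.removable = set()  # routees to stop and replace later
        self._next = 0

    def select(self):
        active = [r for r in self.routees if r not in self.removable]
        if not active:
            raise RuntimeError("no routees left; spawn replacements first")
        routee = active[self._next % len(active)]
        self._next += 1
        return routee

    def record(self, routee, seconds):
        """Call when a routee reports how long a message took."""
        self.durations[routee].append(seconds)
        self._flag_slow_routees()

    def _flag_slow_routees(self):
        averages = {r: sum(d) / len(d) for r, d in self.durations.items()
                    if len(d) >= self.min_samples}
        if len(averages) < 2:
            return  # nothing to compare against yet
        pool_average = sum(averages.values()) / len(averages)
        for r, avg in averages.items():
            if avg > self.slow_factor * pool_average:
                self.removable.add(r)


router = TrackingRouter(["w1", "w2", "w3"])
for routee, seconds in [("w1", 1.0), ("w2", 1.0), ("w3", 10.0)]:
    for _ in range(5):
        router.record(routee, seconds)
picks = [router.select() for _ in range(4)]
print(router.removable, picks)  # {'w3'} ['w1', 'w2', 'w1', 'w2']
```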
The question is: why do some of your workers perform badly? Is there any difference between them (I assume not)? If not, then maybe some payloads simply require more work than others, so what is the point of terminating the workers?
We once had a similar problem and used SmallestMailboxRoutingLogic. It basically tries to distribute the workload based on mailbox sizes.
Anyway, I would rather try to answer the question of why some of the workers are unstable and perform poorly, because that looks like the biggest problem, and you are just trying to cover it up elsewhere.
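The core idea of routing by mailbox size, sketched in Python as a simplification and not Akka's code: each message goes to the routee with the fewest queued messages, so a worker stuck on an expensive payload stops receiving new work until it catches up. Akka's actual SmallestMailboxRoutingLogic is more involved; per its documentation it prefers idle routees with empty mailboxes and treats remote routees, whose mailbox size is unknown, as lowest priority. smallest_mailbox and the queue sizes are hypothetical.

```python
def smallest_mailbox(mailbox_sizes):
    """Pick the routee with the fewest queued messages (ties: first listed)."""
    return min(mailbox_sizes, key=mailbox_sizes.get)


# w1 is busy with heavy payloads and has a backlog; new work flows elsewhere.
queues = {"w1": 3, "w2": 0, "w3": 1}
picks = []
for _ in range(4):
    target = smallest_mailbox(queues)
    queues[target] += 1  # simulate enqueuing one message
    picks.append(target)
print(picks)   # ['w2', 'w2', 'w3', 'w2']
print(queues)  # {'w1': 3, 'w2': 3, 'w3': 2}
```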
