WSO2 ESB - Making a sequence wait/pause - multithreading

I have a use case where I need a sequence to wait for a period of time before it continues. Basically it is a "Thread.sleep(x)", but this would mean the thread is not available to the thread pool for that time. This could have consequences for high-load systems. So I have two questions:
1) What would be the best way to implement this use case?
2) How much of a burden would using Thread.sleep be for WSO2?
Alternative solutions, for example using topics, are also welcome :)
Hope you guys can help!
Answering the questions from the comments:
We are sending requests to an external system and an offline data store (ODS; the DSS component of WSO2). The external system takes precedence, but when it doesn't return within one second we want the ODS to answer the request.
Alternative paths:
- The ODS is offline: in this case the system has to wait longer for the external system;
- The external system returns after some time: although the ODS result has already been sent to the requester, we still want the external system's response to update our ODS.
We are currently investigating clone and aggregator.
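The requirement above (prefer the external system, fall back to the ODS after one second, and still refresh the ODS when the external answer arrives late) can be expressed without a fixed sleep. The sketch below is plain Python, not WSO2 configuration, and only illustrates the control flow; call_external, query_ods and update_ods are hypothetical placeholders. In the ESB, the clone/aggregator approach being investigated would play the same role.

```python
import concurrent.futures as cf


def answer_request(request, call_external, query_ods, update_ods,
                   pool, timeout_s=1.0):
    """Prefer the external system; fall back to the ODS after timeout_s.

    call_external, query_ods and update_ods are hypothetical callables
    standing in for the external system and the DSS data service.
    """
    future = pool.submit(call_external, request)

    def refresh_ods(f):
        # Runs whenever the external system answers, even if the
        # requester was already served from the ODS.
        if f.exception() is None:
            update_ods(request, f.result())

    future.add_done_callback(refresh_ods)

    done, _ = cf.wait([future], timeout=timeout_s)
    if done and future.exception() is None:
        return future.result()          # external system answered in time
    try:
        return query_ods(request)       # ODS answers instead
    except Exception:
        # ODS offline: block until the external system answers (or fails).
        return future.result()
```

The waiting happens on a future rather than in a fixed sleep, so the request thread is released as soon as either source has an answer.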

When you say Thread.sleep(), the first thing that comes to mind is using a Class Mediator. This is an easy way to write custom logic and add a sleep.
The sample for "Writing your own Custom Mediation in Java" will help you learn the steps for writing a Class Mediator.
You need to copy the JAR containing the custom mediator class to repository/components/lib/.
When you call Thread.sleep() inside your mediation logic, the request will hang for the specified time period, and the thread processing it is blocked for that whole time.
This may impact your performance under load, but you should be able to tune the thread pool parameters for your needs.
It all depends on your requirements.
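To make the performance concern concrete, here is a language-neutral illustration in Python (not WSO2 code; the function names are hypothetical): with a fixed-size pool, every sleeping request holds a worker for its full pause, so requests beyond the pool size queue up behind it.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle(request_id, pause_s):
    """Simulated mediation step that blocks its worker, like Thread.sleep()."""
    time.sleep(pause_s)
    return request_id


def serve(n_requests, workers, pause_s):
    """Return the wall-clock seconds needed to serve n_requests on a fixed pool."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(handle, range(n_requests), [pause_s] * n_requests))
    return time.perf_counter() - start
```

For example, serve(4, 2, 0.2) needs at least two rounds of 0.2 s, while serve(4, 4, 0.2) needs roughly one, which is why a sleep-based design either needs a bigger pool or a non-blocking alternative.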

Related

Best way to implement background “timer” functionality in Python/Django

I am trying to implement a Django web application (on Python 3.8.5) which allows a user to create “activities” where they define an activity duration and then set the activity status to “In progress”.
The POST action to the View writes the new status, the duration and the start time (end time, based on start time and duration is also possible to add here of course).
The back-end should then keep track of the duration and automatically change the status to “Finished”.
User actions can also change the status to “Finished” before the calculated end time (i.e. the timer no longer needs to be tracked).
I am fairly new to Python so I need some advice on the smartest way to implement such a concept?
It needs to be efficient and scalable – I’m currently using a Heroku Free account so have limited system resources, but efficiency would also be important for future production implementations of course.
I have looked at the Python threading Timer, and this seems to work on a basic level, but I’ve not been able to determine what kind of constraints this places on the system – e.g. whether the spawned Timer thread might prevent the main thread from finishing and releasing resources (i.e. Heroku Dyno threads), etc.
I have read that persistence might be a problem (if the server goes down), and I haven’t found a way to cancel the timer from another process (the .cancel() method seems to rely on having the original object to cancel, and I’m not sure if this is achievable from another process).
I was also wondering about a more “background” approach, i.e. a single process which is constantly checking the database looking for activity records which have reached their end time and swapping the status.
But what would be the best way of implementing such a server?
Is it practical to read the database every second to find records with an end time of “now”? I need the status to change in real-time when the end time is reached.
Is something like Celery a good option, or is it overkill for a single process like this?
As I said I’m fairly new to these technologies, so I may be missing other obvious solutions – please feel free to enlighten me!
Thanks in advance.
To achieve this you need some kind of task scheduling functionality. For a quick, simple implementation, the Timer class from the threading module is a good solution.
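A minimal sketch of the Timer approach, assuming a single long-running process: timers are kept in a dictionary keyed by a (hypothetical) activity id so that an early "Finished" action can cancel them. Nothing here survives a restart, and cancellation only works inside the process that holds the Timer objects, which are exactly the limitations raised in the question.

```python
import threading

# In-process registry so a pending timer can be cancelled by activity id.
_timers = {}
_lock = threading.Lock()


def start_activity(activity_id, duration_s, on_finish):
    """Call on_finish(activity_id) after duration_s seconds."""
    def fire():
        with _lock:
            _timers.pop(activity_id, None)
        on_finish(activity_id)

    timer = threading.Timer(duration_s, fire)
    timer.daemon = True  # pending timers won't keep the interpreter alive
    with _lock:
        _timers[activity_id] = timer
    timer.start()


def finish_early(activity_id):
    """Cancel a pending timer; return True if one was still registered."""
    with _lock:
        timer = _timers.pop(activity_id, None)
    if timer is None:
        return False
    # cancel() has no effect if fire() is already running, so on_finish
    # should be idempotent (e.g. only update rows still "In progress").
    timer.cancel()
    return True
```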
A more complete solution is to use Celery. If you are new to it, digging into it is a worthwhile start: Celery acts as a task queue manager that distributes your work across several threads or processes.
You mentioned that you want it to be efficient and scalable, so I guess you will eventually want similar functionality that requires multiprocessing and scheduling; for that reason my recommendation is to use Celery.
You can integrate it into your Django application easily following the documentation Integrate Django with Celery.
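The "background" alternative from the question (a single process that periodically finds expired activities) reduces to one conditional UPDATE per sweep, which a Celery periodic task or any other scheduler could run every second. The sketch uses sqlite3 from the standard library so it is self-contained; the activity table and its columns are assumptions. With a Django model (hypothetically named Activity), the equivalent would be Activity.objects.filter(status="In progress", end_time__lte=timezone.now()).update(status="Finished").

```python
from datetime import datetime, timezone


def finish_expired(conn, now=None):
    """Mark every in-progress activity whose end_time has passed as finished.

    Returns the number of rows changed. end_time is assumed to be stored
    as an ISO-8601 UTC string, so string comparison matches time order.
    """
    now = now or datetime.now(timezone.utc)
    with conn:  # one transaction, committed on success
        cur = conn.execute(
            "UPDATE activity SET status = 'Finished' "
            "WHERE status = 'In progress' AND end_time <= ?",
            (now.isoformat(),),
        )
    return cur.rowcount
```

Reading the table once per second with this kind of indexed, single-statement update is usually cheap; the status then flips within one polling interval of the end time.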

Best practices for internal api calls to external apis with buffer

I have several external APIs (ext_api) that do basically the same thing in different ways: add product information.
I would like to build an adapter API (adapter_api) that calls the different external APIs behind the scenes.
My problem is the following: the external APIs are optimised for being called with a batch of product attributes, whereas my API would be optimised on a product-by-product basis.
I would like to keep a buffer of product attributes that grows each time adapter_api is called. When the number of product attributes reaches a certain limit, ext_api would be called and the buffer would be reset, ready to receive more product attributes.
I'm wondering how to achieve that. I was thinking of making a REST API in Python that would store the buffer of product attributes. I would like this REST API to be able to scale on a Kubernetes cluster: it would need low latency, and several instances of this API would write to the buffer of products until one of them reaches the limit and makes the call to the external API.
Here is what I have in mind :
Are there any best practices for the buffer in this use case? Some extra information: my main purpose here is to hide from internal business APIs (not drawn) the complexity of calling many different external APIs, each of which has its own rules and credentials.
Thank you very much for your help.
You didn't tell us your performance evaluation criteria.
You did tell us this:
don't know how to store the buffer : I would like to avoid databases or files.
which makes little sense, since there's a simple answer to this question:
Is there any best practices on this use case ?
Yes. The best practice is to append requests to buffer.txt and send the batch when that file exceeds some threshold. A convenient way to implement the threshold would be to send when os.path.getsize() reports a large enough value.
If requests are of quite different sizes and the batch size really matters to you, then append a single byte to a second file and use the size of that file to indicate how many entries are enqueued.
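A minimal sketch of the file-based buffer described above, assuming a single writer process and newline-free requests; send_batch is a hypothetical callable that posts the batch to ext_api, and the threshold value is arbitrary. Several processes appending to the same file at once would additionally need some form of file locking.

```python
import os

BUFFER_PATH = "buffer.txt"    # file name from the answer
THRESHOLD_BYTES = 64 * 1024   # assumed batch threshold


def enqueue(line, send_batch, path=BUFFER_PATH, threshold=THRESHOLD_BYTES):
    """Append one request; flush the whole file to send_batch when big enough."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")
    if os.path.getsize(path) >= threshold:
        with open(path, encoding="utf-8") as f:
            batch = f.read().splitlines()
        send_batch(batch)  # if this raises, the file is kept for a retry
        os.remove(path)    # start a fresh buffer
```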
requirements
The heart of your question seems to revolve around what was left unsaid:
What is the cost function for sending too many "small" batches to ext_api?
What is the cost function for the consumer of the adapter_api, what does it care about? Low latency return, perhaps?
If ext_api permanently fails (say, a day of downtime), do we have some responsibility for quickly notifying the consumer that its updates are going into a black hole?
And why would using the filesystem be inappropriate? It seems a perfect match for your needs.
Consider using a global in-memory object, such as a list or queue, for the batch you're accumulating. You might want to protect accesses with a lock.
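A minimal sketch of the in-memory variant, with a lock guarding the list. Note that such a buffer lives inside one process, so each Kubernetes pod would accumulate and flush its own batches; send_batch is a hypothetical callable that posts to ext_api.

```python
import threading


class BatchBuffer:
    """Thread-safe in-memory buffer that flushes every `limit` items."""

    def __init__(self, limit, send_batch):
        self._limit = limit
        self._send_batch = send_batch
        self._items = []
        self._lock = threading.Lock()

    def add(self, item):
        batch = None
        with self._lock:
            self._items.append(item)
            if len(self._items) >= self._limit:
                batch, self._items = self._items, []
        if batch is not None:
            self._send_batch(batch)  # network call made outside the lock

    def flush(self):
        """Send whatever is buffered, e.g. on shutdown or from a timer."""
        with self._lock:
            batch, self._items = self._items, []
        if batch:
            self._send_batch(batch)
```

Swapping the list out under the lock and calling ext_api afterwards keeps the critical section short, so concurrent adapter_api requests are not held up by the external call.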
Maybe your client doesn't really want a one-product-at-a-time API. Maybe you'd prefer to have your client accumulate items, sending only when its batch size is big enough.

Correlation ID in multi-threaded and multi-process application

I've joined a legacy project where there's virtually no logging. A few days ago we had a production release that failed massively, and we had no clear idea what was going on. That's why improving logging is now one of our priorities.
I'd like to introduce something like "correlation id", but I'm not sure what approach to take. Googling almost always brings me to the solutions that are suitable for "Microservices talking via REST" architecture, which is not my case.
Architecture is a mix of Spring Framework and NodeJS running on the same Unix box - it looks like this:
Spring receives a Request (first thread is started) and does minor processing.
Processing goes to a thread from ThreadPool (second thread is started).
Mentioned second thread starts a separate process of NodeJS that does some HTML processing.
Process ends, second thread ends, first thread ends.
Options that come to my mind are:
Generate UUID and pass it around as argument.
Generate UUID and store it in ThreadLocal, pass it when necessary when changing threads or when starting a process.
Any other ideas how it can be done correctly?
You are on the right track. Generate a UUID and pass it as a header with the request. For any request that does not have this header, add a filter that checks for it and adds one.
Your filter picks up the header and puts it into a thread local, for example the MDC, from which your logging pattern can read it. Thereafter any logging you do will include the correlation id. When making a call to any other process or request, make sure you pass this id as an argument or header, and the cycle repeats.
The thread doing the task just needs to be aware of this id. It's up to you to decide how to pass it. Try to separate such concerns from your business logic (using aspects or any other way you see fit); the more you can keep this under the hood, the easier it will be for you.
You can refer to this example
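As a rough analog of the steps described in this answer, here is a Python sketch (not Spring code): a context variable plays the role of the thread-local MDC entry, a logging filter copies it onto every record, the context is copied explicitly when work moves to a pool thread (because pool threads do not inherit it, the same pitfall as MDC with a ThreadPool), and the child process receives the id through an environment variable. The header name X-Correlation-ID and the CORRELATION_ID variable are assumptions, and the child is a Python one-liner standing in for the NodeJS process.

```python
import contextvars
import logging
import os
import subprocess
import sys
import uuid

# Per-request correlation id; plays the role of a thread-local MDC entry.
correlation_id = contextvars.ContextVar("correlation_id", default="-")
log = logging.getLogger("app")


class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation id to every record, like MDC."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


def render_html():
    """Runs on a pool thread and starts a child process (stand-in for NodeJS)."""
    log.info("rendering")
    # Child processes get the id explicitly, here via an environment variable.
    env = dict(os.environ, CORRELATION_ID=correlation_id.get())
    child = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['CORRELATION_ID'])"],
        env=env, capture_output=True, text=True, check=True,
    )
    return child.stdout.strip()


def handle_request(headers, pool):
    # Reuse the incoming id or mint a new one (the "filter" step).
    correlation_id.set(headers.get("X-Correlation-ID") or str(uuid.uuid4()))
    log.info("request received")
    # Pool threads do not inherit context variables: copy them explicitly.
    ctx = contextvars.copy_context()
    return pool.submit(ctx.run, render_html).result()
```

Attach CorrelationIdFilter to each handler whose format string uses %(correlation_id)s, so every log line from the request thread, the pool thread and the child process can be tied together.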

Simple Qt threading mechanism with progress?

I want to look for files with given extensions recursively from a given root directory and to display the number of files currently found in my GUI.
Since this kind of processing may be long, the GUI may be blocked.
I could just wait for the end of the processing and get the file count, but I am learning Qt (PyQt), so I see this as a training.
So I have read the Qt docs:
When to Use Alternatives to Threads, and I don't think it's for me.
Then I read:
Choosing an Appropriate Approach, and I think my solution is the first one:
Run a new linear function within another thread, optionally with progress updates during the run
But in this case you have 3 choices:
Qt provides different solutions:
Place the function in a reimplementation of QThread::run() and start the QThread. Emit signals to update progress. OR
Place the function in a reimplementation of QRunnable::run() and add the QRunnable to a QThreadPool. Write to a thread-safe variable to update progress. OR
Run the function using QtConcurrent::run(). Write to a thread-safe variable to update progress.
Could you tell me how to choose the best one?
I have read some "solutions" but I'd like to understand why you should use one methodology instead of another one.
And also since I am looking for files, I may have a directory in which many files would match the search criteria. So it would mean lots of interruptions. Is there something special to keep in mind regarding this?
Thank you!
From what I know (hopefully others can chime in):
QThread offers support for signal interaction. For example, you'd be able to stop your concurrent function with a signal. I'm not sure how you'd do that with the other options, if at all.
Things to keep in mind: widgets all have to live in the main thread, but can communicate with other threads via signals & slots.
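On the concern about "lots of interruptions" when many files match: one common approach is to throttle progress reports so the GUI thread receives a bounded number of updates regardless of how many files are found. The sketch below is framework-neutral; in PyQt the function would run inside the worker (for example QThread.run()) with report bound to a signal's emit, e.g. a hypothetical progress = pyqtSignal(int).

```python
import os
import time


def count_files(root, extensions, report, every_s=0.1):
    """Recursively count files under root whose suffix is in extensions.

    report(count) is called at most about once every every_s seconds,
    plus once at the end with the final count.
    """
    exts = tuple(e.lower() for e in extensions)
    count = 0
    last = time.monotonic()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(exts):
                count += 1
                now = time.monotonic()
                if now - last >= every_s:
                    report(count)  # e.g. self.progress.emit(count) in PyQt
                    last = now
    report(count)  # final value
    return count
```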
Another quick thread on the topic w/ some decent bullet-points.
https://qt-project.org/forums/viewthread/50165/
Best of luck on your project, and welcome to Qt!

Strategy to handle race conditions in a web application backend?

I have often been asked in interviews about race conditions in web applications such as movie ticket or travel websites.
Question is something like this.
Say for a bus or plane ticket website, there is only one seat left. Two (or, in an extreme scenario, many) users on different computers log into the website at the same time and see that one seat is left. They both go ahead, select that seat and place the order.
Now there are two requests we have to handle. For the first request, we book the ticket, but for the second request we have to throw an error of some sort and show the end user a message saying the seat is not available.
Say the database schema is something like this:
bus_id, seat_id, is_taken
So for the first request, we set is_taken to 1 for the corresponding bus_id and seat_id. Then for the second request, there won't be any seat_id with is_taken = 0, so we won't book the ticket.
But here, in my opinion, we have put a restriction that only one request can be handled at a time; the second request can be handled only after the first request has completed.
However that is not practical, since we might have a huge website with loads of traffic and application running on several servers in parallel. We have to process requests in parallel.
Since I don't have much experience with handling race conditions in these sorts of multi-threaded web applications, I can't quite figure, what is the right way about solving this.
What is the right(even if basic) approach/ design patterns to tackle these scenarios?
Web applications are necessarily multithreaded. There are two ways of solving this.
Application level (Not preferred)
I am not sure which programming language you are using to build the application, but most languages used for building websites have something like Java's "synchronized", which prevents two threads from executing the same block of code simultaneously.
This is not preferred because the solution does not scale horizontally. When you increase capacity by running one more instance of your web application, it fails badly: the lock only exists inside a single process, so two instances can still book the same seat.
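For illustration, the application-level approach looks like this in Python (a sketch with toy in-memory data; the seats dictionary and book function are hypothetical). The lock makes the check-then-set atomic for threads of this one process only, which is exactly why it breaks once a second instance is started.

```python
import threading

seats = {("bus-1", 12): False}  # (bus_id, seat_id) -> is_taken; toy data
seats_lock = threading.Lock()


def book(bus_id, seat_id):
    """Return True if this caller got the seat. Safe only within ONE process."""
    with seats_lock:
        if seats.get((bus_id, seat_id)) is False:
            seats[(bus_id, seat_id)] = True
            return True
        return False
```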
Database level
This is the preferred solution. You obtain the lock on the record in the database before you update.
Most SQL databases let you select a row for update, which locks it until the transaction ends:
SELECT * FROM BUS_SEATS WHERE BUS_ID = 1 FOR UPDATE;
The query above is one way to obtain the lock. Most relational databases provide this kind of feature: you lock the required rows, do the update and commit, which keeps the database consistent.
At some point, there has to be some sort of synchronization.
Since you're using a database, which is usually the bottleneck anyway, you might as well let it handle the race condition.
All you have to do is update the row atomically. The requests can still be handled in parallel by the application.
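SQL (the booking succeeded only if exactly one row was affected):
UPDATE bus_seats
SET is_taken = 1
WHERE bus_id = :bus_id AND seat_id = :seat_id AND is_taken = 0;
-- affected rows = 1: this request got the seat
-- affected rows = 0: the seat was already taken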
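A runnable sketch of the atomic conditional update, using sqlite3 only because it ships with Python (SQLite does not support SELECT ... FOR UPDATE, but this single-statement approach works the same way on PostgreSQL or MySQL through their drivers). The table layout follows the question's schema; the function name is hypothetical. The driver's rowcount tells each request whether it was the one that flipped is_taken.

```python
import sqlite3


def book_seat(conn: sqlite3.Connection, bus_id: int, seat_id: int) -> bool:
    """Atomically claim a seat; True only for the request that flipped it."""
    with conn:  # one transaction, committed on success
        cur = conn.execute(
            "UPDATE bus_seats SET is_taken = 1 "
            "WHERE bus_id = ? AND seat_id = ? AND is_taken = 0",
            (bus_id, seat_id),
        )
    return cur.rowcount == 1
```

Two separate connections (standing in for two application servers) can call book_seat for the same seat; the database serializes the two updates, so only the first sees one affected row and the second sees zero.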
