How to run a time consuming task on startup in a web application while deploying it - multithreading

we are facing an issue with initializing our cache at server startup or application deployment. Initializing the cache involves
Querying a database to get the list of items
Making an rmi call for each item
Listening to the data on a JMS queue/topic
Constructing the cache
This initialization process is in startup code. All this is taking lot of time due to which the deployment is taking lot of time or server start time is increasing.
So what I proposed is to create a thread in the startup and run the initialization code in it. I wrote a sample application to demonstrate it.
It involves a ServletContextListener, a filter. In the listener I am creating a new thread in which the HeavyProcess will run. When it finishes an event will be fired which the filter will be listening. On receiving the event the filter will allow incoming http requests. Until then the filter redirects all clients to a default page which shows a message that the application is initializing.
I presented this approach and few concerns were raised.
We should not ideally create a thread because handling the thread will be difficult.
My question is why cant we create a thread like these in web applications.
If this is not good, then what is the best approach?

If you can use managed threads, avoid unmanaged ones. The container has no control over unmanaged threads, and unmanaged threads survive redeployments, if you do not terminate these properly. So you have to register unmanaged threads, and terminate these somehow (which is not easy as well, because you have to handle race-conditions carefully).
So one solution is to use #Startup, and something like this:
#Schedule(second = "*/45", minute = "*", hour = "*")
protected void asyncInit(final Timer timer) {
timer.cancel();
// Do init here
// Set flag that init has been completed
}
I have learned about this method here: Executing task after deployment of Java EE application
So this gives you an async managed thread, and deployment will not be delayed by #PostConstruct. Note the timer.cancel().
Looking at your actual problem: I suggest using a cache which supports "warm starts".
For example, Infinispan supports cache stores so that the cache content survives restarts. If you have a cluster, there are distributed or replicated caching modes as well.
JBoss 7 embeds Infinispan (it's an integrated service in the same JVM), but it can be operated independently as well.
Another candidate is Redis (and any other key/value store with persistence will do as well).

In general, creating unmanaged threads in a Java EE environment is a bad idea. You will loose container managed transactions, user context and many more Java EE concepts within your unmanaged thread. Additionally unmanaged threads may block the conainer on shutdown if your thread handling isn't appropriate.
Which Java EE Version are you using? Perhaps you can use Servlet 3.0's async feature?
Or call a asynchronous EJB for doing the heavy stuff at startup (#PostConstruct). The call will then set a flag when its job is done.

Related

Execute something which takes 5 seconds (like email send) but return with response immediately?

Context
In an ASP.NET Core application I would like to execute an operation which takes say 5 seconds (like sending email). I do know async/await and its purpose in ASP.NET Core, however I do not want to wait the end of the operation, instead I would like to return back to the to the client immediately.
Issue
So it is kinda Fire and Forget either homebrew, either Hangfire's BackgroundJob.Enqueue<IEmailSender>(x => x.Send("hangfire#example.com"));
Suppose I have some more complex method with injected ILogger and other stuff and I would like to Fire and Forget that method. In the method there are error handling and logging.(note: not necessary with Hangfire, the issue is agnostic to how the background worker is implemented). My problem is that method will run completely out of context, probably nothing will work inside, no HttpContext (I mean HttpContextAccessor will give null etc) so no User, no Session etc.
Question
How to correctly solve say this particular email sending problem? No one wants wait with the response 5 seconds, and the same time no one wants to throw and email, and not even logging if the send operation returned with error...
How to correctly solve say this particular email sending problem?
This is a specific instance of the "run a background job from my web app" problem.
there is no universal solution
There is - or at least, a universal pattern; it's just that many developers try to avoid it because it's not easy.
I describe it pretty fully in my blog post series on the basic distributed architecture. I think one important thing to acknowledge is that since your background work (sending an email) is done outside of an HTTP request, it really should be done outside of your web app process. Once you accept that, the rest of the solution falls into place:
You need a durable storage queue for the work. Hangfire uses your database; I tend to prefer cloud queues like Azure Storage Queues.
This means you'll need to copy all the data over that you will need, since it needs to be serialized into that queue. The same restriction applies to Hangfire, it's just not obvious because Hangfire runs in the same web application process.
You need a background process to execute your work queue. I tend to prefer Azure Functions, but another common approach is to run an ASP.NET Core Worker Service as a Win32 service or Linux daemon. Hangfire has its own ad-hoc in-process thread. Running an ASP.NET Core hosted service in-process would also work, though that has some of the same drawbacks as Hangfire since it also runs in the web application process.
Finally, your work queue processor application has its own service injection, and you can code it to create a dependency scope per work queue item if desired.
IMO, this is a normal threshold that's reached as your web application "grows up". It's more complex than a simple web app: now you have a web app, a durable queue, and a background processor. So your deployment becomes more complex, you need to think about things like versioning your worker queue schema so you can upgrade without downtime (something Hangfire can't handle well), etc. And some devs really balk at this because it's more complex when "all" they want to do is send an email without waiting for it, but the fact is that this is the necessary step upwards when a baby web app becomes distributed.

Node/Express: running specific CPU-instensive tasks in the background

I have a site that makes the standard data-bound calls, but then also have a few CPU-intensive tasks which are ran a few times per day, mainly by the admin.
These tasks include grabbing data from the db, running a few time-consuming different algorithms, then reuploading the data. What would be the best method for making these calls and having them run without blocking the event loop?
I definitely want to keep the calculations on the server so web workers wouldn't work here. Would a child process be enough here? Or should I have a separate thread running in the background handling all /api/admin calls?
The basic answer to this scenario in Node.js land is to use the core cluster module - https://nodejs.org/docs/latest/api/cluster.html
It is an acceptable API to :
easily launch worker node.js instances on the same machine (each instance will have its own event loop)
keep a live communication channel for short messages between instances
this way, any work done in the child instance will not block your master event loop.

web app, jsp, and multithreaded

I'm currently building a new web app, in Java EE with Apache Tomcat as a webserver.
I'm using jsps and servlets in my work.
my question is:
I have a very standard business logic. None of it is synchronized, will only a single thread of my web app logic will run at any given time?
Since making all functions "synchronized" will cause a huge overhead, is there any alternative for it?
If i have a static method in my project. that makes a very simple thing, such as:
for (int i=0;i<10000;i++ ) {
counter++;
}
If the method is not thread safe, what is the behavior of such method in a multi user app? Is it unexpected behavior?!
and again in a simple manner:
if i mean to build a multi user web app, should i "synchronize" everything in my project? if not all , than what should i sync?
will only a single thread of my web app logic will run at any given time?
No, the servlet container will create only one instance of each servlet and call that servlet's doService() method from multiple threads. In Tomcat by default you can expect up to 200 threads calling your servlet at the same time.
If the servlet was single-thread, your application would be slow like hell, see SingleThreadModel - deprecated for a reason.
making all function "synchronized" will make a huge overhead, is there any alternative for it?
There is - your code should be thread safe or better - stateless. Note that if your servlet does not have any state (like mutable fields - unusual), it can be safely accessed by multiple threads. The same applies to virtually all objects.
In your sample code with a for loop - this code is thread safe if counter is a local variable. It is unsafe if it is a field. Each thread has a local copy of the local variables while all threads accessing the same object access the same fields concurrently (and need synchronization).
what should i sync?
Mutable, global, shared state. E.g. In your sample code if counter is a field and is modified from multiple threads, incrementing it must be synchronized (consider AtomicInteger).

Destroy a wcf thread

I'm using multithreaded wcf maxConcurrentCalls = 10. By logging calls to my service I see that 10 different threads are executing in my service class and that they are reused in the following calls.
Can I tell WCF to destroy/delete a thread so it will create a new one on the next call?
This is because I have thread-static state that I sometimes want to be cleared (on unexpected exceptions). I am using the thread-static scope to gain performance.
WCF doesn't create new threads. It uses threads from a thread pool to service requests. So when a request begins it draws a thread from this pool to execute the request and after it finishes it returns the thread to the pool. The way that WCF uses threads underneath is an implementation detail that you should not rely on. You should never use Thread Static in ASP.NET/WCF to store state.
In ASP.NET you should use HttpContext.Items and in WCF OperationContext to store some state that would be available through the entire request.
Here's a good blog post you may take a look at which illustrates a nice way to abstract this.

Prevent thread blocking in Tomcat

I have a Java servlet that acts as a facade to other webservices deployed on the same Tomcat instance. My wrapper servlet creates N more threads, each which invokes a webservice, collates the response and sends it back to the client. The webservices are deployed all on the same Tomcat instance as different applications.
I am seeing thread blocking on this facade wrapper service after a few hours of deployment which brings down the Tomcat instance. All blocked threads are endpoints to this facade webservice (like http://domain/appContext/facadeService)
Is there a way to control such thread-blocking, due to starvation of available threads that actually do the processing? What are the best practices to prevent such deadlocks?
The common solution to this problem is to use the Executor framework. You need to express your web service call as Callable and pass it to the executor either as it stands, or as a Collection<Callable> (see the Javadoc for complete list of options).
You have two choices to control the time. First is to use parameters of an appropriate method of the Executor class where you specify the max web service timeout. Another option is to do get the result (which is expressed as Future<T>) and use .get(long, TimeUnit) to specify the maximum amount of time you can wait for a result.

Resources