Implementing concurrency in Java EE Web application - multithreading

We are creating a web app where we need concurrency for a few business cases. The application will be deployed in a Tomcat container. I know that creating user-defined threads in the web container is a bad idea, and I am trying to explore the options I have:
1. Have my multi-threaded library used as a JCA component. We are averse to this approach because of the learning curve involved.
2. Use the WorkManager API. I guess that's not implemented by Tomcat, though, so this option goes out.
3. Use the CommonJ library; I did some research and found that it is recommended for Tomcat. Has anyone used it?
4. Use the ManagedExecutorService; I see that it is available, but I am not sure how to use it, and is it different from the WorkManager API (and the CommonJ library)?
Any help on this is appreciated. By the way, using JMS is out of the question because of the deployment environment. I am inclining towards points 3 and 4, but I do not have much knowledge of them. Could someone please guide me?

Since you're using Tomcat, don't worry about it and do whatever you want. The Servlet section of Java EE makes no mention of threads etc. That's mostly under the EJB section.
Tomcat itself doesn't do much at all in terms of worrying about managing threads, it's a pretty non-invasive container.
It's best to tie your threads to a ServletContextListener so that you can pay attention to the application lifecycle and shut down your threads when your app shuts down, but beyond that, don't overly concern yourself about it and use whatever you're happy with.
Addendum:
The simple truth is Tomcat does not care, and it's not that sophisticated. Tomcat has a thread pool for each of the HTTP listeners and that's about the end of its level of management. Tomcat is not going to take threads from a quiet HTTP listener and dedicate them to a busy one, for example. If Tomcat was truly interested in how you create threads, it would prevent you from doing so -- and it doesn't.
That means that thread management outside of the HTTP context falls squarely on your shoulders as an implementor. Java EE exposes these kinds of facilities, and the interfaces make great reads. But the simple truth is that the theoretical capabilities espoused by the Java EE API docs and the reality of modern implementations are far apart, particularly on low-end systems such as Tomcat.
Not to disparage Tomcat. Tomcat is a great piece of software. But for most of its use cases, the extra management capability simply is not necessary.
Setting up your own thread pool (using the JDK provided facilities) and working with your own thread lifecycle model will likely see you successfully through whatever project you're working on. It's really not a big deal.
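As a concrete sketch of that approach (the listener class name, attribute key, and pool size are arbitrary choices for this example), a ServletContextListener can own the pool and shut it down with the application:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

// Hypothetical listener; register it in web.xml (or with @WebListener on Servlet 3.0+).
public class WorkerPoolListener implements ServletContextListener {

    private ExecutorService pool;

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // Pool size of 10 is an arbitrary example value.
        pool = Executors.newFixedThreadPool(10);
        // Expose the pool to the rest of the web app via the ServletContext.
        sce.getServletContext().setAttribute("workerPool", pool);
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        // Stop accepting new tasks and give running ones a grace period.
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

A servlet can then fetch the pool via getServletContext().getAttribute("workerPool") and submit Runnable or Callable tasks to it.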

There are a couple of options. Regardless of container restrictions that may or may not be in place, spawning individual threads on demand is nearly always a bad idea. It's not that this wouldn't work in a Servlet environment, but the number of threads you could potentially create might get completely out of hand.
The simplest solution to go with is a plain old Java SE thread pool via a normal executor service. Start the pool in a servlet listener and provide access to it via some static variable. Not overly pretty, but it gets the job done. Depending on your exact use case this might actually be the best solution (if your use case is fairly low-level).
Another option is to add OpenEJB to your WAR and then take advantage of the @Asynchronous annotation.
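As a minimal sketch of what that looks like once an EJB container such as OpenEJB is on board (the bean and method names are invented for this example):

```java
import java.util.concurrent.Future;
import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;

// Hypothetical bean: the container runs the method on its own managed thread pool.
@Stateless
public class ReportBean {

    @Asynchronous
    public Future<String> generateReport(long reportId) {
        // ... long-running work here ...
        String result = "report-" + reportId;
        // AsyncResult wraps the value in a completed Future for the container.
        return new AsyncResult<String>(result);
    }
}
```

The caller injects the bean with @EJB and gets the Future back immediately, while the work continues on a container-managed thread.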
Yet another option is to realize that one typically uses Tomcat when the business requirements are extremely simple or low-level. That's pretty much the entire point of using something as bare-bones as Tomcat. As soon as you find yourself needing to add (tons of) libraries, you may have outgrown Tomcat and might be better off using a server that already has the functionality you need (in this case, asynchronous execution). Examples are TomEE, GlassFish, Resin, JBoss AS, Geronimo, etc.

Every servlet (the Java EE base component for HTTP request processing) in your web application is a singleton, and each request runs in its own independent thread, so there is no need to start or stop user-created threads on your own. Your web container, in this case Tomcat, manages all of that.
Besides that, you need to keep in mind some considerations for multi-threaded processing in your code. For example, since servlets are singletons and many threads run through the same instance concurrently, it is a bad idea to keep mutable instance attributes in these components.
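A small sketch of the hazard (servlet and field names invented for this example): two concurrent requests can interleave on the shared field, so per-request state belongs in local variables:

```java
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class GreetingServlet extends HttpServlet {

    // BROKEN: one instance serves all requests, so concurrent
    // requests overwrite each other's value of this field.
    private String userName;

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        userName = req.getParameter("name");           // request A writes...
        // ...request B may write here before A reads below...
        resp.getWriter().println("Hello " + userName); // ...so A may greet B's user

        // SAFE alternative: keep per-request state on the stack.
        String localName = req.getParameter("name");
        resp.getWriter().println("Hello again " + localName);
    }
}
```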

I have used CommonJ many times and it works very well. It can be initialized and destroyed from a ServletContextListener.
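For reference, a rough sketch of submitting work through CommonJ; the JNDI resource name wm/default and the surrounding wiring are assumptions that depend on how your CommonJ implementation (e.g. foo-commonj) is configured in Tomcat:

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;
import commonj.work.Work;
import commonj.work.WorkException;
import commonj.work.WorkManager;

public class CommonJExample {

    public void submit() throws NamingException, WorkException {
        // Look up the WorkManager that the CommonJ implementation
        // registered in JNDI (the resource name is configuration-dependent).
        WorkManager wm = (WorkManager) new InitialContext()
                .lookup("java:comp/env/wm/default");

        wm.schedule(new Work() {
            public void run()         { /* the actual background task */ }
            public boolean isDaemon() { return false; } // short-lived work
            public void release()     { /* asked to stop early; set a flag */ }
        });
    }
}
```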

Related

JBoss AS 7 and Node.js hype

I don't want to compare apples with oranges; that was already done, for example, in:
http://blog.shinetech.com/2011/06/10/nodejs-from-the-enterprise-java-perspective/
http://adamgent.com/post/10440924094/does-java-have-an-answer-to-node-js
Actually, I do not have a concurrency problem with JBoss AS 7, but if I had one, what should I do?
Should I:
SOLUTION I:
horizontal/vertical scale
use HTTPD in front of JBoss AS
use @Asynchronous or messaging systems (such as Akka) for ALL tasks
...
SOLUTION II:
use node.js (rhino.js)
Can anyone provide practical experiences where JBoss AS 7 failed to scale? I have never had such an experience myself.
For example, imagine a web application with 10,000,000 concurrent requests *on a single machine* (with a single JBoss AS instance or a single node.js instance).
What would be the result?
Would node.js work normally while JBoss AS 7 crashes?
In general, Java EE is primarily targeted at high-scale applications, and most of its design decisions reflect that, from persistence over session and cache replication to the view layers.
While Node.js may be good for certain tasks, Java EE 6 is (IMO) good if you want to use it for the whole stack.
There's more coming regarding scalability, like JSR-352. So Java EE is still improving.
Regarding your question ("while JBoss AS 7 crash"): it doesn't generally crash. Usually it's a faulty or poorly written application that crashes. Node.js gives you a certain API. Java EE gives you a multitude of APIs, which are sometimes misused or conceptually misunderstood, resulting in problems like OutOfMemoryError. With a properly designed application, there should be no problem with horizontal scaling.
There are few proven examples of node.js supporting a large and popular application as of yet. Like any new technology of this type, it will get tested, and if it scales it will garner support. Nobody can truly answer your question from a technical standpoint.

using threads in servlets

I am confused about whether we should create our own threads in servlets or not, since the container has its own threading mechanism internally. If yes, how can we make sure the program is thread-safe? How is a thread-safe mechanism implemented in servlets?
You are asking two different questions:
I am confused about whether we should create our own threads in servlets or not, since the container has its own threading mechanism internally.
Normally, you should not start threads in a Java EE application. If you need separate threads, make sure you use a scheduler service that your application knows about, so that it has a chance to shut down the threads when the application is shut down. Quartz is what's used most of the time.
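A minimal sketch with the Quartz 2.x API (the job class, identity, and interval are invented for this example); calling scheduler.shutdown(true) from a ServletContextListener gives you the orderly shutdown mentioned above:

```java
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.SimpleScheduleBuilder;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class QuartzSetup {

    // Hypothetical job class; put the recurring background work here.
    public static class LogScanJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            // scan logs, send notifications, etc.
        }
    }

    public static Scheduler startScheduler() throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start();

        JobDetail job = JobBuilder.newJob(LogScanJob.class)
                .withIdentity("logScan").build();
        Trigger trigger = TriggerBuilder.newTrigger()
                .startNow()
                .withSchedule(SimpleScheduleBuilder.simpleSchedule()
                        .withIntervalInSeconds(60)   // example interval
                        .repeatForever())
                .build();

        scheduler.scheduleJob(job, trigger);
        return scheduler; // call scheduler.shutdown(true) on app shutdown
    }
}
```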
If yes, how can we make sure the program is thread-safe? How is a thread-safe mechanism implemented in servlets?
Servlets are just like any other Java class. Find a tutorial on thread safety or read Java Concurrency in Practice.
From what you write in the comment, I understand that you have a set of threads continuously monitoring log files and sending email if something interesting is found in the log.
First question: why is this a servlet? Is there a web GUI? What is this used for?
For the log-scanning part, I would implement it as a separate process outside the servlet container. For everything this process finds that needs to be sent somewhere, I would add a message to a JMS queue. Then I would create a message-driven bean to receive messages from this queue and send them as email. (This is really an integration problem, transforming messages from JMS to email; you might want to look into something like Mule to solve this.)
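A rough sketch of the receiving side, assuming a queue named jms/LogEventQueue and a container-provided JavaMail session named mail/Default (both names, and the address, are invented for this example):

```java
import javax.annotation.Resource;
import javax.ejb.ActivationConfigProperty;
import javax.ejb.EJBException;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination",
                              propertyValue = "jms/LogEventQueue")
})
public class LogEventMailerBean implements MessageListener {

    @Resource(name = "mail/Default") // hypothetical mail session
    private Session mailSession;

    @Override
    public void onMessage(Message message) {
        try {
            // Transform the JMS message into an email and send it.
            String body = ((TextMessage) message).getText();
            MimeMessage mail = new MimeMessage(mailSession);
            mail.setRecipient(javax.mail.Message.RecipientType.TO,
                    new InternetAddress("ops@example.com")); // example address
            mail.setSubject("Log alert");
            mail.setText(body);
            Transport.send(mail);
        } catch (Exception e) {
            throw new EJBException(e);
        }
    }
}
```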
As for how to integrate this with your servlet, it depends on what your servlet does in addition to scanning logs (I suppose it presents the user with some kind of interface).
With this design, you can choose to rewrite the programs generating the log in the future. Instead of having one program write the log and another program parse it, the first program might as well put the interesting message directly on the JMS queue. In other words, you can change the log-generation part of your architecture in the future without having to rewrite the mail-sending part.
I also had a similar concern.
Only the EJB specification disallows the creation of threads from the application.
It is ok to start a thread from a servlet.
I have done it many times with no problems, but to be honest I am not 100% sure whether:
it is allowed by the container but violates the standard, or
it is allowed by all containers.
But in Tomcat I have never had an issue starting threads from a servlet.
You can make it thread-safe the same way you do in any multithreaded program: use all the synchronization constructs Java offers.

Are Message-Driven Beans (MDB) bound to the same restrictions as other EJB beans?

In a Message-Driven Bean, am I restricted to the same rules as Session Beans (EJB 3 or EJB 3.1), i.e.:
use the java.lang.reflect Java Reflection API to access information unavailable by way of the security rules of the Java runtime environment
read or write nonfinal static fields
use this to refer to the instance in a method parameter or result
access packages (and classes) that are otherwise made unavailable by the rules of Java programming language
define a class in a package
use the java.awt package to create a user interface
create or modify class loaders and security managers
redirect input, output, and error streams
obtain security policy information for a code source
access or modify the security configuration objects
create or manage threads
use thread synchronization primitives to synchronize access with other enterprise bean instances
stop the Java virtual machine
load a native library
listen on, accept connections on, or multicast from a network socket
change socket factories in java.net.Socket or java.net.ServerSocket, or change the stream handler factory of java.net.URL.
directly read or write a file descriptor
create, modify, or delete files in the filesystem
use the subclass and object substitution features of the Java serialization protocol
It is always a good idea not to create threads manually (though an ExecutorService seems fine in some cases).
Actually, MDBs are very often used to address this limitation: instead of creating a separate thread, send a task object (put something like MyJob extends Serializable in an ObjectMessage) into a queue and let it be executed in the MDB thread pool. This approach is much more heavyweight but scales very well, and you don't have to manage any threads manually. In this scenario, JMS is just a fancy way of running jobs asynchronously.
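A minimal sketch of that pattern (the queue name and classes are invented for this example, and are shown in one listing for brevity):

```java
import java.io.Serializable;
import javax.annotation.Resource;
import javax.ejb.ActivationConfigProperty;
import javax.ejb.EJBException;
import javax.ejb.MessageDriven;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.ObjectMessage;
import javax.jms.Queue;
import javax.jms.Session;

// A job is just a serializable piece of work.
class MyJob implements Runnable, Serializable {
    public void run() { /* the actual work */ }
}

// Producer side (must itself be a managed component for @Resource
// injection to happen): wraps the job in an ObjectMessage and enqueues it.
class JobSubmitter {
    @Resource(name = "jms/JobQueue")      // hypothetical queue
    private Queue jobQueue;
    @Resource
    private ConnectionFactory connectionFactory;

    public void submit(MyJob job) throws JMSException {
        Connection connection = connectionFactory.createConnection();
        try {
            Session session =
                connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            session.createProducer(jobQueue)
                   .send(session.createObjectMessage(job));
        } finally {
            connection.close();
        }
    }
}

// Consumer side: the container's MDB pool supplies the threads.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination",
                              propertyValue = "jms/JobQueue")
})
class JobExecutorBean implements MessageListener {
    public void onMessage(Message message) {
        try {
            Runnable job = (Runnable) ((ObjectMessage) message).getObject();
            job.run(); // runs on an MDB pool thread
        } catch (JMSException e) {
            throw new EJBException(e);
        }
    }
}
```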
These EJB restrictions are typically not hard restrictions. In fact, they're not caveats for making your EJBs work properly; they're more like advisories on how to make your EJBs portable across EJB containers.
From time to time, some very fussy EJB container providers (cough... WebSphere... cough) will actually enforce these restrictions through Java security policies, but I would say about half of those restrictions are routinely ignored (I mean, just using log4j in your MDB potentially violates about 30% of them).
Violating the other 70% probably indicates an architectural or design problem.
So, can you call System.exit() in an MDB ? The answer is yes, but only once... :)
It sounds like, in your case, you need some of these restrictions to rein in potentially misbehaving plugins. I don't know if MDBs are going to get you out of that problem. I suppose it depends on how much you trust the third-party developers, but rather than use the invocation-based models in EJB, I would install the components as JMX ModelMBeans. You can use the Java security model to limit what they can do, but I suppose that would defeat the purpose.
Perhaps using some run-time (or load-time) AOP byte-code engineering, you could rewrite all requests for threads to be redirected to a per-component thread factory that you allocate, which limits the threads that can be created. You don't want to stop the components from doing whatever it is that they do; you just don't want them to take down the whole server when they crash, stall, or misbehave.
Interesting problem.

Node.js event vs thread programming on server side

We are planning to start a fairly complex web portal which is expected to attract good local traffic, and I've been told by my boss to consider/analyse node.js for the server side.
I think scalability and multi-core support can be handled with Nginx or Cherokee in front.
1) Is node.js ready for some serious/big business?
2) Does this 'event/asynchronous' paradigm on the server side have the potential to support heavy traffic and data operations, considering the fact that 'everything' is processed in a single thread and all live connections would be lost if it crashed (though it's easy to restart)?
3) What are the advantages of event-based programming compared to the thread-based style, or vice versa?
(I know of the higher cost associated with thread switching, but more can be squeezed from the hardware with the event model.)
The following are interesting but (to some extent) contradictory papers:
1) http://www.usenix.org/events/hotos03/tech/full_papers/vonbehren/vonbehren_html
2) http://pdos.csail.mit.edu/~rtm/papers/dabek:event.pdf
Node.js is developing extremely rapidly, and most of its functionality is sturdy and ready for business. However, there are a lot of places where it's lacking, like database drivers, jQuery and DOM, multiple HTTP headers, etc. There are plenty of modules coming up tackling every aspect, but for a production environment you'll have to be careful to pick ones that are stable.
It's actually much, MUCH more efficient to use a single thread than a thousand (or even fifty) from an operating-system perspective, and benchmarks I've read (sorry, I don't have them on hand; I'll try to find them and link them later) show that it's able to support heavy traffic. I'm not sure about file-system access, though.
Event-based programming:
produces cleaner-looking code than threaded code (in JavaScript, that is).
plays to the JavaScript engine's strengths: the engine is extremely efficient at processing events and handling callbacks, and JavaScript is easily one of the languages seeing the most runtime optimization right now.
is harder to fit when you are thinking in terms of control flow. With events, you can never be sure of the flow. However, you can also come to think of it as more dynamic programming and treat each fired event as independent.
forces you to be more security-conscious when programming, for the above reason. In that sense, it's better than linear systems, where you sometimes take sanitized input for granted.
As for the two papers, both are relatively old. The first benchmarks against SEDA which, as you can see, carries a more recent note about these studies:
http://www.eecs.harvard.edu/~mdw/proj/seda/
It also cites the second paper you linked regarding what they have done, but declines to comment on its relevance to the comparison between event-based systems and thread-based ones :)
Try it yourself and discover the truth.
See What is Node.js? where we cover exactly that:
Node in production is definitely possible, but far from the "turn-key" deployment seemingly promised by the docs. With Node v0.6.x, "cluster" has been integrated into the platform, providing one of the essential building blocks, but my "production.js" script is still ~150 lines of logic to handle stuff like creating the log directory, recycling dead workers, etc. For a "serious" production service, you also need to be prepared to throttle incoming connections and do all the stuff that Apache does for PHP. To be fair, Rails has this exact problem. It is solved via two complementary mechanisms:
1) Putting Rails/Node behind a dedicated web server (written in C and tested to hell and back) like Nginx (or Apache / Lighttpd). The web server can efficiently serve static content, do access logging, rewrite URLs, terminate SSL, enforce access rules, and manage multiple sub-services. For requests that hit the actual Node service, the web server proxies the request through.
2) Using a framework like "Unicorn" that will manage the worker processes, recycle them periodically, etc. I've yet to find a Node serving framework that seems fully baked; it may exist, but I haven't found it yet and still use ~150 lines in my hand-rolled "production.js".

Managing multiple-processes: What are the common strategies?

While multithreading is faster in some cases, sometimes we just want to spawn multiple worker processes to do the work. This has the benefits of not crashing the main app if one of the workers crashes, and of the user not needing to worry much about locking.
COM+'s Application Pooling seems like a good way to achieve this on Windows. The downside is that we need to write a COM+ wrapper for the worker process.
However, when I search for Application Pooling on Google, it seems that most of its uses are related to IIS. Don't other applications (such as scientific/graphics ones) find it useful to spawn multiple worker processes?
So there are several questions:
Why isn't COM+ more popular in areas other than IIS? If I write a non-IIS application and want to use process management on Windows, should I go with COM+ or are there better alternatives out there?
What would be the cross-platform way to do it? Are there libraries that give me a "process pool" (worker processes that intelligently pick up work, can be managed, etc.)?
I can't offer any answers to the COM aspect of your question, but it's worth noting there's another world (besides HPC MPI) where multi-processing (rather than the more common multi-threading approach) is apparently alive, well and thriving: Python.
Why? Python's GIL ("global interpreter lock") cripples most attempts to multithread Python code so badly that multiprocessing is the generally recommended approach to parallelising Python on SMP systems. The standard library includes process pools; there are various other options too.
Python certainly ought to satisfy any multi-platform requirement!
You might want to investigate how the Apache web server manages process pools. From version 2.0 it runs natively on Windows, and one of the multi-processing models it supports is process pools. Part of Apache is also the APR (Apache Portable Runtime), which handles platform-specific issues.
No one can answer why something is not popular; maybe nobody is looking for what you are looking for. After .NET came into the picture, people shifted from COM to the managed environment; before .NET, COM, ATL and related technologies were quite painful to implement, would crash, and were also quite difficult to debug.
That is the reason the managed environment came into existence.
However, from .NET 4 onwards, the parallel libraries give the user much more power for parallel programming, and you can also spawn and control other processes.
For a multiplatform approach, you can look at zvrba's answer.
Yes, other applications--especially science applications--find it useful to spawn multiple processes. Since few super-computers run Microsoft Windows, scientists generally avoid using anything that ties them to a Microsoft platform. Nothing related to COM will help scientists leverage their enormous existing code base written in Fortran.
People who choose to run IIS have generally already drunk the Microsoft Kool-Aid, so they have fewer inhibitions about tying themselves to Microsoft's proprietary platforms, which is why COM-specific terminology gets lots of hits related to IIS.
One of the open standards for doing what you want is the Message Passing Interface. Several implementations exist and some of them run on supercomputers using Fortran. Some of them run on cheaper computers using sexier languages.
See http://en.wikipedia.org/wiki/Message_Passing_Interface
There hasn't been a mob rushing through the doors of COM application pooling, primarily because of two factors:
COM is a pain in the ass to deal with compared to just about anything else
Threading can be a headache, but it's a lot easier and more convenient to manage than inter-process communication
COM application pooling was essentially created for IIS. It has one very specific benefit over normal multithreading: the multiple processes are fully isolated from each other. This is important for data security and for app stability when dealing with third party plugins of questionable stability.
Scientific computing generally doesn't need strong data security isolation between operations, and I would venture to guess that scientific computing doesn't rely much on third party plugins of questionable stability. When doing big math operations, you're either using a sexy numerics library that had better be rock solid to be taken seriously, or you're using your own code, in which case crashes should be fixed and repeat offenders should be spanked.
Oh, and all crashes except stack overflow can be trapped and dealt with within a multithreaded app, especially if it's your own code.
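To illustrate in Java terms (the names are invented for this example): an uncaught-exception handler on the pool's threads logs a worker's death without taking the application down. Note that this applies to tasks started with execute(); submit() captures exceptions in the returned Future instead:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class CrashTolerantPool {

    public static ExecutorService create() {
        ThreadFactory factory = new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "worker");
                // A crash in one worker is logged; the rest of the app keeps running.
                t.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
                    @Override
                    public void uncaughtException(Thread thread, Throwable e) {
                        System.err.println(thread.getName() + " died: " + e);
                    }
                });
                return t;
            }
        };
        return Executors.newFixedThreadPool(4, factory); // example pool size
    }
}
```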
In short, COM app pooling is overkill for just about anything other than IIS.
Google's web browser, Chrome, is multi-process architecture software. It is open source, so you can check out its code and see how it manages processes.
