How does ThreadMonitor work?

I use a WorkManager to do database synchronization from several universities to a core banking system:
the sync starts every 5 minutes and runs until it completes.
But I get this warning:
ThreadMonitor W WSVR0605W: Thread "WorkManager.DefaultWorkManager : 1250" (00001891) has been active for 1009570 milliseconds and may be hung. There is/are 2 thread(s) in total in the server that may be hung.
This error causes the database sync to rollback automatically.
I found some documentation here: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.express.doc/info/exp/ae/ttrb_confighangdet.html
ThreadMonitor monitors active threads, and once a thread has been active for more than the configured alarm threshold (N milliseconds), it emits the warning above. However, I notice that all my sync operations take longer than N to complete.
My question is: does ThreadMonitor just report a warning when an active thread runs for more than N milliseconds (i.e., it considers the thread hung), or does it also kill hung threads?

ThreadMonitor simply monitors threads that have been active beyond a threshold time.
This serves as a warning to the WAS administrators that some thread is taking a long time to process (which might be genuine or might indicate a problem).
ThreadMonitor will not kill the thread.
In many cases a request might genuinely take a long time to process (depending on what it does), so ThreadMonitor restricts itself to identifying potentially hung threads and leaves the actual job of finding out what the thread is doing to the administrator (e.g., by taking thread dumps and locating the specific thread ID).
The threshold time can be configured for your servers if you want a different value from the default.
#Muky,
com.ibm.websphere.threadmonitor.threshold is the property that you need to configure.
Look at this URL: http://pic.dhe.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=%2Fcom.ibm.websphere.express.doc%2Finfo%2Fexp%2Fae%2Fttrb_confighangdet.html for more details.
HTH
Manglu
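For reference, the hang-detection policy is controlled by JVM custom properties (names from the IBM page linked above; the defaults quoted here are from memory, so verify them against the documentation for your WAS version):

```properties
# Seconds between hang-detection checks (default 180)
com.ibm.websphere.threadmonitor.interval=180
# Seconds a thread may be active before it is flagged as potentially hung (default 600)
com.ibm.websphere.threadmonitor.threshold=600
# Number of false alarms after which the threshold is automatically raised (default 100)
com.ibm.websphere.threadmonitor.false.alarm.threshold=100
```

If your sync legitimately runs longer than the threshold, raising it above your worst-case sync time silences the WSVR0605W warnings without changing behaviour, since the monitor only reports and never kills threads.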

Related

What could delay pthread_join() after threads have exited successfully?

My main thread creates 8 worker threads (on a machine with a 4 core, 8 thread CPU), and then waits for them to complete with pthread_join(). The threads all exit successfully, and the pthread_join() successfully completes. However, I log the times that the threads exit and the time that pthread_join() completes for the last thread; the threads all exit essentially simultaneously (not surprising -- they are servicing a queue of work to be done), and the pthread_join() sometimes takes quite a long time to complete -- I have seen times in excess of 15 minutes after the last worker thread has exited!
More information: The worker threads are all set at the highest allowable round-robin scheduling priority (SCHED_RR); I have tried setting the main thread (waiting on the pthread_join()s) to the same thing and have also tried setting it to the highest SCHED_FIFO priority (where so far I have only seen it take as long as 27 seconds to complete; more testing is needed). My test is very CPU and memory intensive and takes about 90 -- 100 minutes to complete; during that time it is generally using all 8 threads at close to 100% capacity, and fairly quickly gets to where it is using about 90% of the 256 GB of RAM. This is running on a Linux (Fedora) OS at run level 3 (so no graphics or Window Manager -- essentially just a terminal -- because at the usual run level 5, a process using that much memory gets killed by the system).
An earlier version that took closer to 4 hours to complete (I have since made some performance improvements...) and in which I did not bother explicitly setting the priority of the main thread once took over an hour and 20 minutes for the pthread_join() to complete. I mention it because I don't really think that the main thread priority should be much of an issue -- there is essentially nothing else happening on the machine, it is not even on the network.
As I mentioned, all the threads complete with EXIT_SUCCESS. And in lighter weight tests, where the processing is over in seconds, I see no such delay. And so I am left suspecting that this is a scheduler issue. I know very little about the scheduler, but informally the impression I have is that here is this thread that has been waiting on a pthread_join() for well over an hour; perhaps the scheduler eventually shuffles it off to a queue of "very unlikely to require any processing time" tasks, and only checks it rarely.
Okay, eventually it completes. But ultimately, to get my work done, I have to run about 1000 of these, and some are likely to take a great deal longer than the 90 minutes or so that the case I have been testing takes. So I have to worry that the pthread_join() in those cases might delay even longer, and with 1000 iterations, those delays are going to add up to real time...
Thanks in advance for any suggestions.
In response to Nate's excellent questions and suggestions:
I have used top to spy on the process when it is in this state; all I can report is that it is using minimal CPU (maybe an occasional 2%, compared to the usual 700 - 800% that top reports for 8 threads running flat out, modulo some contention for locked resources). I am aware that top has all kinds of options I haven't investigated, and will look into how to run it to display information about the state of the main thread. (I see: I can use the -H option, and look in the S column... will do.) It is definitely not a matter of all the memory being swapped out -- my code is very careful to stay below the limit of physical memory, and does some disk I/O of its own to save and restore information that can't fit in memory. As a result little to no virtual memory is in use at any time.
I don't like my theory about the scheduler either... It's just the best I have been able to come up with so far...
As far as how I am determining when things happen: The exiting code does:
time_t now;
time(&now);
printf("Thread exiting, %s", ctime(&now));
pthread_exit((void *)EXIT_SUCCESS); /* pthread_exit takes a void*, so cast the status */
and then the main thread does:
for (int i = 0; i < WORKER_THREADS; i++)
{
pthread_join(threads[i], NULL);
}
time(&now);
printf("Last worker thread has exited, %s", ctime(&now));
I like the idea of printing something each time pthread_join() returns, to see if we're waiting for the first thread to complete, the last thread to complete, or one in the middle, and will make that change.
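The per-join logging change can be sketched like this (in Python's threading module rather than pthreads, purely to show the shape of the instrumentation; the C version is simply a printf after each pthread_join returns):

```python
import threading
import time

NUM_WORKERS = 4
exit_times = []
join_times = []
lock = threading.Lock()

def worker(i):
    # Simulate a unit of work, then record when this thread finishes.
    time.sleep(0.05)
    with lock:
        exit_times.append((i, time.monotonic()))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()

# Log a timestamp after *each* join returns, not just after the last one,
# so a slow join can be attributed to a specific thread.
for i, t in enumerate(threads):
    t.join()
    join_times.append((i, time.monotonic()))
    print(f"joined worker {i} at {join_times[-1][1]:.3f}")

print(f"all {NUM_WORKERS} workers joined")
```

Comparing the per-thread exit timestamps against the per-join timestamps shows whether the delay sits on the first join, the last, or one in the middle.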
A couple of other potentially relevant facts that have occurred to me since my original posting: I am using the GMP (GNU Multiprecision Arithmetic) library, which I can't really imagine matters; and I am also using a 3rd party (open source) library to create "canonical graphs," and that library, in order to be used in a multithreaded environment, does use some thread_local storage. I will have to dig into the particulars; still, it doesn't seem like cleaning that up should take any appreciable amount of time, especially without also using an appreciable amount of CPU.

How to prevent consistent java pause pattern on Linux Mint

I have a Java app running on Linux Mint. Every minute, the program shows a very noticeable slowdown: a pause. The pause is a consistent 3 to 4 seconds. When we run further instances of the same program, they also pause 3 to 4 seconds each minute. Each program stops on a different second of the minute.
latest update:
After the last update (below), increasing the thread pool's thread count made the GUI problem go away. After running for around ~40 hours we observed a thread leak in the Jetty HttpClient blocking GET (Request.send()) call. To explain the mechanics, using the Executor class: a main thread runs every few minutes. It uses an Executor to run an independent thread that calls the host with an HTTP GET command, Jetty's HttpClient request.send().
After about 40 hours of operation, there was a spike in the number of threads running in the HttpClient pool. So for 40 hours the same threads ran fine. The working hypothesis is that around that time, one or more send() calls did not complete or time out and never returned to the calling thread. Essentially, these threads are hung inside the Jetty client.
Watching each regular cycle in jVisualVM we see the normal behaviour each cycle: some HttpClient threads fire up for the host GET, execute, and go away in just a few seconds. Also on the monitor are about 10 threads belonging to the Jetty HttpClient thread pool that have been present for (now) 10 hours.
The expectation is that there was some error in the underlying client or network processing. I am surprised there was no time-out exception or programming exception. There are some clear questions I can ask now.
What can happen inside HttpClient that could just hang a Request.send()?
What is the time-out on the call's return? I would think there would still be absolute time-outs or checks for locking, etc. (no?)
Can the I/O system hang and leave the caller thread hanging, while Java obediently...
fires the manager thread at the scheduled time, then
the next Http Request.send() happens, and
new thread(s) from the pool run up for the next send (as appears to have happened),
while the earlier send() is stuck in limbo?
Can I limit or otherwise put a clean-up on these stuck threads?
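On that last point: a common defensive pattern (sketched here in Python; with a Java ExecutorService the analogous shape is Future.get with a timeout, and Jetty's Request also accepts a per-request timeout) is to run the blocking call on a worker and enforce a hard deadline from the caller, so a hung call surfaces as a timeout instead of silently pinning a pool thread. Note that the hung worker itself cannot be killed; the timeout only frees the caller to log and recover.

```python
import concurrent.futures
import time

def blocking_get(delay):
    # Stand-in for a blocking HTTP GET that may hang.
    time.sleep(delay)
    return "response"

executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)

# Fast call: completes within the deadline.
ok = executor.submit(blocking_get, 0.05).result(timeout=2.0)

# "Hung" call: the caller gets a TimeoutError instead of blocking forever.
hung = executor.submit(blocking_get, 1.0)
try:
    hung.result(timeout=0.2)
    outcome = "completed"
except concurrent.futures.TimeoutError:
    outcome = "timed out"

print(ok, "/", outcome)
executor.shutdown(wait=False)
```

The leaked thread still exists until the underlying call returns; the win is that the caller notices, can log the event, and can stop handing new work to a wedged pool.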
This was happening before we increased the thread pool size. What's happened is that the 'blame' has become more focused on the problem area. Also, we are suspicious of the underlying system, because we also had lock-ups with Apache HttpClient at around the same (non-specific) time of day.
(prior update) ...
The pause behaviour observed is that the JavaFX GUI does not update/refresh; the display clock's (textView) setText() call was logged during the freeze at two updates per second (that's new information). The clock doesn't update on Mint Linux; it continues to update when running on Windows. To forestall me repeating myself on questions about GC, logs, probes, etc., the answer will be the same: we have run extensive diagnostics over weeks now. The issue is unmistakably a mix of: Linux JVM / Linux Mint / threads (per JavaFX). The other piece of new data is that increasing the thread-pool count by +2 appears to remove the freeze; further testing is needed to confirm that and tune the numbers. The question, though, is: what are the parameters that make the difference between the two platforms?
We have run several instances of the program on Windows for days with no pauses. When we run on a Mint Linux platform we see the freeze, it is very consistent.
The program has several threads running on a schedule. One thread opens an HTTP socket to the internet. When we comment out that area, the pause vanishes. However, we don't see that behaviour on Windows. Experiments point to something specific to the Mint networking I/O subsystem, Linux scheduling, the Linux Java 8 JVM, or some interaction between them.
As you may guess, we are tearing our hair out on this one. For example, we turned off logging and the pause remained. We resumed logging and made just one call to the HTTP server: the pause still occurred every 60 seconds, on the same second of the minute. This happens even when we do no other processing. We tried different HTTP libraries, etc. It seems very clear it is in the JVM or Linux.
Does anyone know of a way to resolve this?

.NET 4.0 C#: Pausing/resuming parallel threads from the ThreadPool temporarily?

I set up a multi-threaded environment using the .NET ThreadPool and get a significant performance benefit. This runs in the background of my application.
Now, when a new task is requested by the user, I want it to get maximum CPU resources to maximize performance. Hence I would like to temporarily pause all the threads that I began (via the ThreadPool.QueueUserWorkItem method) and then resume them once the new task, requested by the user in the foreground, is completed.
There could be several solutions to my problem:
a. Starting fewer background threads so that any new user request gets some share of the CPU resources (but I lose the performance gain I had :( )
b. Setting a higher priority for the thread of a new user-requested task (not sure if this works?)
c. Suspending/resuming the ThreadPool threads I began. But suspending/resuming/interrupting threads is highly discouraged; moreover, this could get tricky and error-prone.
Any other ideas?
Note: when the user makes a request, performing the task would normally not take more than 300ms. However, when I start ThreadPool threads in background, it now takes about 3 seconds to complete (10 times worse)! I am OK if it takes 500-800ms though. All background threads complete in about 8 seconds (and I am OK if they take 1-2 seconds more). Hence, I am trying out option ( a ) for now.
Thanks in advance!
Note that thread scheduling is done by the OS and hence cannot be directed from within a program. The only thing that can be done is setting ThreadPriority (and that only on new Threads, not on ThreadPool threads). Check the section "Limitations of Using the Thread Pool".
As your requirement is to suspend all background threads while executing a new task, what you can do is create a class-level flag.
Now you can put checkpoints in the methods to be executed in the background task. At each checkpoint, check the class-level flag; if it is set, call Thread.Sleep, which should (but is not guaranteed to) trigger a thread context switch by the OS thread scheduler.
Putting checkpoints in methods (to be executed by the ThreadPool) is analogous to putting checkpoints for cancellation support in a BackgroundWorker.
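The checkpoint idea can be sketched like this (in Python, with threading.Event standing in for the class-level flag; in .NET a ManualResetEventSlim plays the same role, with workers calling Wait() at their checkpoints):

```python
import threading
import time

resume = threading.Event()
resume.set()  # start un-paused
progress = []

def background_work(units):
    for i in range(units):
        # Checkpoint: blocks here whenever the controller has cleared the event.
        resume.wait()
        progress.append(i)
        time.sleep(0.01)  # simulate a slice of work

t = threading.Thread(target=background_work, args=(20,))
t.start()

time.sleep(0.05)
resume.clear()           # pause the background worker at its next checkpoint
paused_at = len(progress)
time.sleep(0.1)          # the foreground task would run here with the CPU to itself
assert len(progress) <= paused_at + 1  # at most one unit in flight past the checkpoint
resume.set()             # resume background processing
t.join()
print("completed", len(progress), "units")
```

The worker only pauses at points you choose, so it never stops while holding a lock or half-way through an update, which is exactly what makes this safer than Thread.Suspend.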

Detecting low user activity and checking email on background

I'm writing an application that must do some things in the background: check emails and parse them to inject some data into a database, and connect to a web service to check the status of some asynchronous operations.
Right now, my solution is a simple timer that performs these operations on a predefined schedule: email every five minutes, and web service checks every minute (but these are only performed if there is pending activity, so most of the time this does nothing.)
Right now I'm not using a thread for this (I'm early in the development stage.) But my plan is to create a background thread and let it do the work offline.
Couple of questions:
I plan to control everything in the timer(s). Set up a global variable (rudimentary "locking",) start the thread. If the "lock" is already set, ignore it. The thread cleans it up on termination. Should I use a more robust locking / queue mechanism for my threads? (I already have OmniThread installed)
How can I run a thread with low priority? I don't want the application to feel sluggish when the background thread is performing data insertion or networking.
Is there a clean way to verify for user activity and start this thread only when the user is not busy at the keyboard / mouse?
Please bear in mind that I'm not experienced with threads. I wrote an FTP sync application once, so I'm not a complete newbie, but that was a long time ago.
For part 3 of your question, the Windows API has a GetLastInputInfo function, which returns information about the last time the user did something. The documentation even says "This function is useful for input idle detection". I did plan to use this for something myself, but haven't had a chance to test it.
Edit: Delphi implementation link
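For anyone who wants to experiment without the Delphi bindings, the same Win32 call can be reached from Python via ctypes (Windows-only at runtime; seconds_since_last_input is a hypothetical helper name, but the struct layout and the two API calls are from the Win32 headers):

```python
import ctypes
import sys

class LASTINPUTINFO(ctypes.Structure):
    # Mirrors the Win32 LASTINPUTINFO struct: a size field plus the
    # tick count (ms since boot) of the last input event.
    _fields_ = [("cbSize", ctypes.c_uint),
                ("dwTime", ctypes.c_uint)]

def seconds_since_last_input():
    """Return seconds since the last keyboard/mouse input (Windows only)."""
    if sys.platform != "win32":
        raise OSError("GetLastInputInfo is a Win32 API")
    lii = LASTINPUTINFO()
    lii.cbSize = ctypes.sizeof(LASTINPUTINFO)
    if not ctypes.windll.user32.GetLastInputInfo(ctypes.byref(lii)):
        raise ctypes.WinError()
    ticks_now = ctypes.windll.kernel32.GetTickCount()
    return (ticks_now - lii.dwTime) / 1000.0

# Example: only kick off the background thread when the user has been
# idle for, say, 30 seconds:
#   if seconds_since_last_input() > 30: start_background_thread()
```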
I plan to control everything in the timer(s). Set up a global variable (rudimentary "locking",) start the thread. If the "lock" is already set, ignore it. The thread cleans it up on termination. Should I use a more robust locking / queue mechanism for my threads? (I already have OmniThread installed)
I wouldn't bother with the timer at all. Make your thread's loop look like this, and you'll have your delays. You will NOT need a lock, because there's only one thread: it will not start the next job until the previous one is over.
procedure YourThread;
var
  N: Integer;
begin
  while not Terminated do
  begin
    // Figure out if there's a job to do
    // Do the job
    // Sleep for a while (500 x 100 ms = ~50 s), but give the thread a
    // chance to notice it needs to terminate.
    for N := 1 to 500 do
    begin
      if Terminated then
        Break;
      Sleep(100);
    end;
  end;
end;
How can I run a thread with low priority? I don't want the application to feel sluggish when the background thread is performing data insertion or networking.
Don't bother. You can easily use SetThreadPriority but it's not worth the trouble. If your background thread is waiting for I/O (networking), then it will not consume any CPU resource. Even if your background thread works full-speed, your GUI will not feel sluggish because Windows does a good job of splitting available CPU time among all available threads.
Is there a clean way to verify for user activity and start this thread only when the user is not busy at the keyboard / mouse?
Again, why bother checking for user activity? Checking for email is network (ie: I/O) bound, the thread checking for email will mostly be idle.
Can you not just do all this in the background thread, getting rid of all the thread micro-management? Seems to me that you could just loop around a sleep(60000) call in the background thread. Check the web service every time round the loop, check the email every 5 times round. You can set the priority to tpLower, if you want, but this thread is going to be sleeping or blocked on I/O nearly all the time, so I don't think it's even worth the typing.
I would be surprised if such a thread is noticeable at all to the user at the keyboard/mouse, no matter when it runs.
'Set up a global variable (rudimentary "locking",) start the thread' - what is this global variable intended to do? What is there to lock?
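The single-loop design suggested above can be sketched generically (in Python here; the Delphi version has the same shape). The callback names are placeholders:

```python
import time

CHECK_INTERVAL = 60  # seconds; one web-service check per loop iteration

def background_loop(iterations, check_web, check_email,
                    sleep=time.sleep, interval=CHECK_INTERVAL):
    # Check the web service every time round the loop and the
    # email every 5th time round, as suggested above.
    for n in range(1, iterations + 1):
        check_web()
        if n % 5 == 0:
            check_email()
        sleep(interval)

# Demonstration with stub callbacks and no real sleeping:
web_calls, email_calls = [], []
background_loop(10,
                check_web=lambda: web_calls.append(1),
                check_email=lambda: email_calls.append(1),
                sleep=lambda s: None,
                interval=0)
print(len(web_calls), "web checks,", len(email_calls), "email checks")
```

Because the sleep sits inside the single loop, overlapping jobs are impossible by construction, which is why no lock or global flag is needed.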

Question about app with multiple threads in a few CPU-machine

Given a machine with 1 CPU and a lot of RAM. Besides other kinds of applications (web server, etc.), there are 2 other server applications running on that machine doing the exact same kind of processing, although one uses 10 threads and the other uses 1 thread. Assume the processing logic for each request is 100% CPU-bound and typically takes no longer than 2 seconds to finish. The question is: which one's throughput, in terms of transactions processed per minute, might be better, and why?
Note that the above is not a real environment; I just made up the data to make the question clear. My current thinking is that there should be no difference, because the apps are 100% CPU-bound: if the machine can handle 30 requests per minute for the 2nd app, it will also be able to handle 3 requests per minute for each of the 10 threads of the 1st app. But I'm glad to be proven wrong, given that there are other applications running on the machine and one application might not always be given 100% CPU time.
There's always some overhead involved in task switching, so if the threads aren't blocking on anything, fewer threads are generally better. Also, if the threads aren't executing the same part of the code, you'll get some cache flushing each time you switch.
On the other hand, the difference might not be measurable.
Interesting question.
I wrote a sample program that does just this. It has a class that does some processor-intensive work, then returns. I specify the total number of threads I want to run and the total number of times I want the work to run. The program then divides the work equally between all the threads (if there's only one thread, it just gets it all) and starts them all up.
I ran this on a single-proc VM, since I couldn't find a real computer with only 1 processor in it anymore.
Run independently:
1 Thread 5000 Work Units - 50.4365sec
10 Threads 5000 Work Units - 49.7762sec
This seems to show that on a one-proc PC with lots of threads doing processor-intensive work, Windows is smart enough not to rapidly switch them back and forth, and they take about the same amount of time.
Run together (or as close as I could get to pushing enter at the same time):
1 Thread 5000 Work Units - 99.5112sec
10 Threads 5000 Work Units - 56.8777sec
This is the meat of the question. When you run 10 threads + 1 thread, they all seem to be scheduled equally. The 10 threads each took about 1/10th longer (because there was an 11th thread running), while the single thread took almost twice its time (really, it got 1/10th of its work done in the first 56 sec, then did the other 9/10ths in the next 43 sec... which is about right).
The result: Windows' scheduler is fair at the thread level, but not at the process level. If you make a lot of threads, you can leave the other processes that weren't smart enough to make lots of threads high and dry. Or just do it right and use a thread pool :-)
If you're interested in trying it for yourself, you can find my code:
http://teeks99.com/ThreadWorkTest.zip
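A rough replication of that experiment can be sketched in Python (note that CPython's GIL serializes CPU-bound threads, which coincidentally approximates the single-processor machine in the question, so the two timings should come out similar; the work-unit size is arbitrary):

```python
import threading
import time

def burn(units, out, idx):
    # A purely CPU-bound unit of work (no blocking).
    total = 0
    for _ in range(units):
        for i in range(10_000):
            total += i * i
    out[idx] = total

def run(num_threads, total_units):
    # Divide the work equally among the threads and time the whole batch.
    out = [0] * num_threads
    per_thread = total_units // num_threads
    threads = [threading.Thread(target=burn, args=(per_thread, out, i))
               for i in range(num_threads)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start, out

t1, out1 = run(1, 200)
t10, out10 = run(10, 200)
print(f"1 thread: {t1:.2f}s, 10 threads: {t10:.2f}s")
```

As in the answer above, the interesting comparison is running the two configurations at the same time, where per-thread fairness lets the 10-thread process grab roughly 10/11ths of the CPU.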
The scheduling overhead could make the app with 10 threads slower than the one with 1 thread. You won't know for sure unless you create a test.
For some background on multithreading see http://en.wikipedia.org/wiki/Thread_(computer_science)
This might very well depend on the operating system scheduler. For example, back in single-thread days the scheduler knew only about processes, and had measures like "niceness" to figure out how much to allocate.
In multithreaded code, there is probably a way in which one process that has 100 threads doesn't get 99% of the CPU time if there's another process that has a single thread. On the other hand, if you have only two processes and one of them is multithreaded I would suspect that the OS may give it more overall time. However, AFAIK nothing is really guaranteed.
Switching costs between threads in the same process may be cheaper than switching between processes (e.g., due to cache behavior).
One thing you must consider is wait time on the other end of the transaction. Having multiple threads allows you to wait for a response on one while preparing the next transaction on another. At least that's how I understand it, so I think a few threads will turn out better than one.
On the other hand, you must consider the overhead involved in dealing with multiple threads. The details of the application are an important part of the consideration here.
