ThreadMonitor WSVR0605W: Thread "server.startup : 2" (000000c0) has been active for 804,563 milliseconds and may be hung - websphere-7

I am getting the below warning while starting the server (WebSphere):
ThreadMonitor WSVR0605W: Thread "server.startup : 2" (000000c0) has been active for 804,563 milliseconds and may be hung.
I have increased the thread pool size from 10 to 100, but the issue persists.

Related

Bull.js jobs stalling despite timeout being set

I have a Bull queue running lengthy video upload jobs which could take any amount of time from < 1 min up to many minutes.
The jobs stall after the default 30 seconds, so I increased the timeout to several minutes, but this is not respected. If I set the timeout to 10 ms it stalls immediately, so the timeout is being taken into account.
Job {
  opts: {
    attempts: 1,
    timeout: 600000,
    delay: 0,
    timestamp: 1634753060062,
    backoff: undefined
  },
  ...
}
Despite the timeout, I am receiving a stalled event, and the job starts to process again.
EDIT: I thought "stalling" was the same as timing out, but apparently there is a separate interval controlling how often Bull checks for stalled jobs. In other words, the real problem is why jobs are considered "stalled" even though they are busy performing an upload.
The problem seems to be that your job is stalling because the operation you are running blocks the event loop. You could convert your code into a non-blocking form and solve the problem that way.
That said, the stalled-interval check can be configured in the queue settings when initiating the queue (more of a quick fix):
const Bull = require('bull')

const queue = new Bull('queue', {
  redis: { port: 6379, host: 'localhost', db: 0 },
  settings: {
    stalledInterval: 60 * 60 * 1000, // change default from 30 sec to 1 hour; set 0 to disable the stalled check
  },
})
Based on Bull's docs:
timeout: The number of milliseconds after which the job should fail with a timeout error
stalledInterval: How often to check for stalled jobs (use 0 for never checking)
Increasing stalledInterval (or disabling it by setting it to 0) removes the check that verifies the event loop is still running, effectively making the system ignore the stalled state.
Again from the docs:
When a worker is processing a job it will keep the job "locked" so other workers can't process it.
It's important to understand how locking works to prevent your jobs from losing their lock - becoming _stalled_ -
and being restarted as a result. Locking is implemented internally by creating a lock for `lockDuration` on interval
`lockRenewTime` (which is usually half `lockDuration`). If `lockDuration` elapses before the lock can be renewed,
the job will be considered stalled and is automatically restarted; it will be __double processed__. This can happen when:
1. The Node process running your job processor unexpectedly terminates.
2. Your job processor was too CPU-intensive and stalled the Node event loop, and as a result, Bull couldn't renew the job lock (see [#488](https://github.com/OptimalBits/bull/issues/488) for how we might better detect this). You can fix this by breaking your job processor into smaller parts so that no single part can block the Node event loop. Alternatively, you can pass a larger value for the `lockDuration` setting (with the tradeoff being that it will take longer to recognize a real stalled job).
As such, you should always listen for the `stalled` event and log this to your error monitoring system, as this means your jobs are likely getting double-processed.
As a safeguard so problematic jobs won't get restarted indefinitely (e.g. if the job processor always crashes its Node process), jobs will be recovered from a stalled state a maximum of `maxStalledCount` times (default: `1`).
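Following that advice, a minimal sketch of wiring up a stalled listener on the queue created above (the console logging is just a placeholder for your error monitoring):

queue.on('stalled', (job) => {
  // A stalled job is about to be picked up again; log it so potential
  // double processing is visible in your monitoring system.
  console.error(`Job ${job.id} stalled and may be double processed`)
})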

Too many threads issue

I have some questions about thread names in the Play Framework.
I've been developing a REST API service on Play for about 5 months.
The app simply accesses MySQL and sends JSON-formatted data back to clients.
I already understand the pitfalls of blocking I/O, so I created a thread pool for blocking I/O and use it in all the Future blocks that block thread execution.
The definition of the thread pool is as follows.
akka {
  actor-system = "myActorSystem"
  blocking-io-dispatcher {
    type = Dispatcher
    executor = "thread-pool-executor"
    thread-pool-executor {
      fixed-pool-size = 64
    }
    throughput = 10
  }
}
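For reference, a minimal sketch of how such a dispatcher is looked up and used from a Future (the query function is a stub for the blocking JDBC call):

import akka.actor.ActorSystem
import scala.concurrent.{ExecutionContext, Future}

object BlockingIoExample {
  // Stub for the blocking JDBC call.
  def runBlockingQuery(id: Long): String = s"user-$id"

  def fetchUser(system: ActorSystem, id: Long): Future[String] = {
    // Full config path of the pool defined above, nested under "akka".
    implicit val blockingEc: ExecutionContext =
      system.dispatchers.lookup("akka.blocking-io-dispatcher")
    Future(runBlockingQuery(id)) // runs on the 64-thread pool, not the default dispatcher
  }
}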
I checked the log file and confirmed that all non-blocking logic runs on threads named 'application-akka.actor.default-dispatcher-#' (where # is an integer) and that all blocking logic runs on threads named 'application-blocking-io-dispatcher'.
Then I checked all thread names and counts using JConsole.
The number of threads named 'application-akka.actor.default-dispatcher-#' is always under 13, and the number of 'application-blocking-io-dispatcher-#' threads is always under 30.
However, the total thread count of the JVM under which my app runs increases constantly; the total number of threads is more than 10,000.
There are many threads whose names start with 'default-scheduler-' or 'default-akka.actor.default-dispatcher'.
My questions are:
a. What's the difference between 'application-akka.actor.default-dispatcher' and 'default-akka.actor.default-dispatcher-'?
b. Is there any reason the thread count keeps increasing?
I want to solve this issue.
Here's my environment.
OS : Windows 10 Pro. 64bit
CPU : Intel(R) Core i7 @ 3.5GHz
RAM : 64GB
JVM : 1.8.0_162 64bit
PlayFramework : 2.6
RDBMS : MySQL 5.7.21
Any suggestions will be greatly appreciated. Thanks in advance.
Finally I solved the problem. There was a bug where instances of Akka's Materializer were never shut down. After modifying the code, the thread count in the VM stays stable.
Thanks.
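A minimal sketch of the shape of that fix, assuming Akka Streams' ActorMaterializer as used in Play 2.6 (stream details omitted):

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer

object MaterializerLifecycle {
  val system = ActorSystem("myActorSystem")

  def runStream(): Unit = {
    // Each materializer that is never shut down keeps its actors and
    // threads alive, which is how the thread count creeps past 10,000.
    val materializer = ActorMaterializer()(system)
    try {
      // ... run the stream with this materializer ...
    } finally {
      materializer.shutdown() // releases the materializer's resources
    }
  }
}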

Precise Throughput Timer stuck with simple setup

I have a similar issue to Synchronizing timer hangs with simple setup, but with the Precise Throughput Timer, which is supposed to replace the Synchronizing Timer:
Certain cases might be solved via Synchronizing Timer, however Precise Throughput Timer has a native way to issue requests in packs. This behavior is disabled by default, and it is controlled with the "Batched departures" settings:
Number of threads in the batch (threads). Specifies the number of samples in a batch. Note the overall number of samples will still be in line with Target Throughput
Delay between threads in the batch (ms). For instance, if set to 42, and the batch size is 3, then threads will depart at x, x+42ms, x+84ms
I'm setting the number of threads to 10, ramp-up to 1, and loop count to 1.
I'm adding only 1 HTTP Request (response time under 1 second) and, before it, a Test Action with a Precise Throughput Timer as a child, with the following setup:
Threads get stuck after 5 threads succeed:
EDIT 1
Following Dmitri T's solution:
I changed Duration to 100 and added the line to the logging configuration, and got 5 errors:
2018-03-12 15:43:42,330 INFO o.a.j.t.JMeterThread: Stopping Thread: org.apache.jorphan.util.JMeterStopThreadException: The thread is scheduled to stop in -99886 ms and the throughput timer generates a delay of 20004077. JMeter (as of 4.0) does not support interrupting of sleeping threads, thus terminating the thread manually.
EDIT 2
Following Dmitri T's solution, I set "Loop Count" to -1 and it executed 10 threads, but if I change the number of threads in the batch from 2 to 5, it executes only 3 threads and stops:
INFO o.a.j.t.JMeterThread: Stopping Thread: org.apache.jorphan.util.JMeterStopThreadException: The thread is scheduled to stop in -89233 ms and the throughput timer generates a delay of 19999450. JMeter (as of 4.0) does not support interrupting of sleeping threads, thus terminating the thread manually.
Set "Duration (seconds)" in your Thread Group to something non-zero (i.e. to 100)
Depending on what you're trying to achieve you might also want to set "Loop Count" to -1
You can also add the following line to log4j2.xml file:
<Logger name="org.apache.jmeter.timers" level="debug" />
This way you will be able to see what's going on with your timer(s) in jmeter.log file
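For reference, a minimal sketch of where that line sits in log4j2.xml (the root logger and appender name are assumed to match JMeter's stock configuration):

<Configuration>
  <Loggers>
    <!-- debug output for all JMeter timers, including Precise Throughput Timer -->
    <Logger name="org.apache.jmeter.timers" level="debug" />
    <Root level="info">
      <AppenderRef ref="jmeter-log" />
    </Root>
  </Loggers>
</Configuration>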

Mule HTTP connector active threads remain in the thread pool

I use Mule 2.2.1 with the following HTTP inbound receiver configuration:
<http:connector name="abc.connector.http">
  <receiver-threading-profile maxThreadsActive="500"
      maxThreadsIdle="50" threadTTL="60000"
      poolExhaustedAction="WAIT" maxBufferSize="100" />
</http:connector>
On the production server, the JVM frequently crashes. The JVM dump created as "hs_err_pid.log" contains threads like: 0x07990c00 JavaThread "ActiveMQ Session Task" [_thread_blocked, id=69807, stack(0x08770000,0x087b0000)].
There are around 2,100 to 2,300 threads in this crash every time.
My questions are:
Why does it show _thread_blocked?
When there is no load on the server, the thread count does not drop below 2,000. Why is that? I use jstack -l PID to check the number of running threads and prstat | grep PID to monitor the NLWP on Solaris. It gives results like:
17725 application_pprd 3409M 2593M sleep 59 0 0:10:51 0.1% **java/2375**
How can I remove these unused/inactive threads from the pool to avoid the crash?
How can I increase the NLWP limit for the Java process?

Guidance on OnMessageOptions.AutoRenewTimeout

Can someone offer some more guidance on the use of the Azure Service Bus OnMessageOptions.AutoRenewTimeout?
http://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.onmessageoptions.autorenewtimeout.aspx
I haven't found much documentation on this option, and I would like to know if this is the correct way to renew a message lock.
My use case:
1) The message processing queue has a Lock Duration of 5 minutes (the maximum allowed).
2) The message processor uses the OnMessageAsync message pump to read from the queue (with ReceiveMode.PeekLock). The long-running processing may take up to 10 minutes to process the message before manually calling msg.CompleteAsync.
3) I want the message processor to automatically renew its lock up until the time it's expected to complete processing (~10 minutes). If after that period it hasn't been completed, the lock should be automatically released.
Thanks
-- UPDATE
I never did end up getting any more guidance on AutoRenewTimeout. I ended up using a custom MessageLock class that auto renews the Message Lock based on a timer.
See the gist -
https://gist.github.com/Soopster/dd0fbd754a65fc5edfa9
To handle long message processing you should set AutoRenewTimeout = 10 min (in your case). That means the lock will be renewed during those 10 minutes, each time the LockDuration expires.
So if, for example, your LockDuration is 3 minutes and AutoRenewTimeout is 10 minutes, then the lock will be automatically renewed every 3 minutes (after 3, 6, and 9 minutes) and automatically released 12 minutes after the message was consumed.
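For illustration, a minimal sketch of that setup with the old Microsoft.ServiceBus.Messaging SDK (the connection string, queue name, and ProcessAsync are placeholders):

var connectionString = "<your-connection-string>"; // placeholder
var client = QueueClient.CreateFromConnectionString(connectionString, "myqueue"); // hypothetical queue name

var options = new OnMessageOptions
{
    AutoComplete = false,                        // we call CompleteAsync ourselves
    AutoRenewTimeout = TimeSpan.FromMinutes(10)  // keep renewing the PeekLock for up to ~10 minutes
};

client.OnMessageAsync(async message =>
{
    await ProcessAsync(message);    // placeholder for the long-running work
    await message.CompleteAsync();  // must finish before AutoRenewTimeout elapses
}, options);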
To my personal preference, OnMessageOptions.AutoRenewTimeout is a bit too rough a lease renewal option. If one sets it to 10 minutes and, for whatever reason, the message is .Complete()d only after 10 minutes and 5 seconds, the message will show up again in the queue, will be consumed by the next stand-by worker, and the entire processing will execute again. That is wasteful and also keeps the workers from executing other unprocessed requests.
To work around this:
Change your worker process to verify whether the item it just received from the queue has already been processed. Look for a success/failure result stored somewhere. If already processed, call BrokeredMessage.Complete() and move on to wait for the next item to pop up.
Periodically call BrokeredMessage.RenewLock() BEFORE the lock expires (e.g. every 10 seconds), and set OnMessageOptions.AutoRenewTimeout to TimeSpan.Zero. That way, if the worker processing an item crashes, the message will return to the queue sooner and will be picked up by the next stand-by worker.
I have the very same problem with my workers. Even when the message is being processed successfully, due to the long processing time Service Bus removes the lock applied to it and the message becomes available for receiving again. Another available worker takes this message and starts processing it again. Please correct me if I'm wrong, but in your case OnMessageAsync will be called many times with the same message and you will end up with several tasks simultaneously processing it. At the end of the process a MessageLockLost exception will be thrown because the message no longer has a lock applied.
I solved this with the following code.
_requestQueueClient.OnMessage(
    requestMessage =>
    {
        RenewMessageLock(requestMessage);
        // System.Timers.Timer takes its interval in milliseconds
        var messageLockTimer = new System.Timers.Timer(TimeSpan.FromSeconds(290).TotalMilliseconds);
        messageLockTimer.Elapsed += (source, e) =>
        {
            RenewMessageLock(requestMessage);
        };
        messageLockTimer.AutoReset = false; // by default it is true
        messageLockTimer.Start();

        /* ----- handle requestMessage ----- */

        requestMessage.Complete();
        messageLockTimer.Stop();
    });

private void RenewMessageLock(BrokeredMessage requestMessage)
{
    try
    {
        requestMessage.RenewLock();
    }
    catch (Exception exception)
    {
        // ignore: the lock may already be lost or the message completed
    }
}
It has been a few months since your post and maybe you have solved this; if so, could you share your solution?
