640 enterprise library caching threads - how?

640 enterprise library caching threads - how? - multithreading

We have an application that is undergoing performance testing. Today, I decided to take a dump of w3wp & load it in windbg to see what is going on underneath the covers. Imagine my surprise when I ran !threads and saw that there are 640 background threads, almost all of which seem to say the following:
OS Thread Id: 0x1c38 (651)
Child-SP RetAddr Call Site
0000000023a9d290 000007ff002320e2 Microsoft.Practices.EnterpriseLibrary.Caching.ProducerConsumerQueue.WaitUntilInterrupted()
0000000023a9d2d0 000007ff00231f7e Microsoft.Practices.EnterpriseLibrary.Caching.ProducerConsumerQueue.Dequeue()
0000000023a9d330 000007fef727c978 Microsoft.Practices.EnterpriseLibrary.Caching.BackgroundScheduler.QueueReader()
0000000023a9d380 000007fef9001552 System.Threading.ExecutionContext.runTryCode(System.Object)
0000000023a9dc30 000007fef72f95fd System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
0000000023a9dc80 000007fef9001552 System.Threading.ThreadHelper.ThreadStart()
If i had to give a guess, I'm thinkign that one of these threads are getting spawned for each run of our app - we have 2 app servers, 20 concurrent users, and ran the test approximately 30 times...it's in the neighborhood.
Is this 'expected behavior', or perhaps have we implemented something improperly? The test ran hours ago, so i would have expected any timeouts to have occurred already.
Edit: Thank you all for your replies. It has been requested that more detail be shown about the callstack - here is the output of !mk from sosex.dll.
ESP RetAddr
00:U 0000000023a9cb38 00000000775f72ca ntdll!ZwWaitForMultipleObjects+0xa
01:U 0000000023a9cb40 00000000773cbc03 kernel32!WaitForMultipleObjectsEx+0x10b
02:U 0000000023a9cc50 000007fef8f5f595 mscorwks!WaitForMultipleObjectsEx_SO_TOLERANT+0xc1
03:U 0000000023a9ccf0 000007fef8f59f49 mscorwks!Thread::DoAppropriateAptStateWait+0x41
04:U 0000000023a9cd50 000007fef8e55b99 mscorwks!Thread::DoAppropriateWaitWorker+0x191
05:U 0000000023a9ce50 000007fef8e2efe8 mscorwks!Thread::DoAppropriateWait+0x5c
06:U 0000000023a9cec0 000007fef8f0dc7a mscorwks!CLREvent::WaitEx+0xbe
07:U 0000000023a9cf70 000007fef8fba72e mscorwks!Thread::Block+0x1e
08:U 0000000023a9cfa0 000007fef8e1996d mscorwks!SyncBlock::Wait+0x195
09:U 0000000023a9d0c0 000007fef9463d3f mscorwks!ObjectNative::WaitTimeout+0x12f
0a:M 0000000023a9d290 000007ff002321b3 *** ERROR: Module load completed but symbols could not be loaded for Microsoft.Practices.EnterpriseLibrary.Caching.DLL
Microsoft.Practices.EnterpriseLibrary.Caching.ProducerConsumerQueue.WaitUntilInterrupted()(+0x0 IL)(+0x11 Native)
0b:M 0000000023a9d2d0 000007ff002320e2 Microsoft.Practices.EnterpriseLibrary.Caching.ProducerConsumerQueue.Dequeue()(+0xf IL)(+0x18 Native)
0c:M 0000000023a9d330 000007ff00231f7e Microsoft.Practices.EnterpriseLibrary.Caching.BackgroundScheduler.QueueReader()(+0x9 IL)(+0x12 Native)
0d:M 0000000023a9d380 000007fef727c978 System.Threading.ExecutionContext.runTryCode(System.Object)(+0x18 IL)(+0x106 Native)
0e:U 0000000023a9d440 000007fef9001552 mscorwks!CallDescrWorker+0x82
0f:U 0000000023a9d490 000007fef8e9e5e3 mscorwks!CallDescrWorkerWithHandler+0xd3
10:U 0000000023a9d530 000007fef8eac83f mscorwks!MethodDesc::CallDescr+0x24f
11:U 0000000023a9d790 000007fef8f0cbd2 mscorwks!ExecuteCodeWithGuaranteedCleanupHelper+0x12a
12:U 0000000023a9da20 000007fef945e572 mscorwks!ReflectionInvocation::ExecuteCodeWithGuaranteedCleanup+0x172
13:M 0000000023a9dc30 000007fef7261722 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)(+0x60 IL)(+0x51 Native)
14:M 0000000023a9dc80 000007fef72f95fd System.Threading.ThreadHelper.ThreadStart()(+0x8 IL)(+0x2a Native)
15:U 0000000023a9dcd0 000007fef9001552 mscorwks!CallDescrWorker+0x82
16:U 0000000023a9dd20 000007fef8e9e5e3 mscorwks!CallDescrWorkerWithHandler+0xd3
17:U 0000000023a9ddc0 000007fef8eac83f mscorwks!MethodDesc::CallDescr+0x24f
18:U 0000000023a9e010 000007fef8f9ae8d mscorwks!ThreadNative::KickOffThread_Worker+0x191
19:U 0000000023a9e330 000007fef8f59374 mscorwks!TypeHandle::GetParent+0x5c
1a:U 0000000023a9e380 000007fef8e52045 mscorwks!SVR::gc_heap::make_heap_segment+0x155
1b:U 0000000023a9e450 000007fef8f66139 mscorwks!ZapStubPrecode::GetType+0x39
1c:U 0000000023a9e490 000007fef8e1c985 mscorwks!ILCodeStream::GetToken+0x25
1d:U 0000000023a9e4c0 000007fef8f594e1 mscorwks!Thread::DoADCallBack+0x145
1e:U 0000000023a9e630 000007fef8f59399 mscorwks!TypeHandle::GetParent+0x81
1f:U 0000000023a9e680 000007fef8e52045 mscorwks!SVR::gc_heap::make_heap_segment+0x155
20:U 0000000023a9e750 000007fef8f66139 mscorwks!ZapStubPrecode::GetType+0x39
21:U 0000000023a9e790 000007fef8e20e15 mscorwks!ThreadNative::KickOffThread+0x401
22:U 0000000023a9e7f0 000007fef8e20ae7 mscorwks!ThreadNative::KickOffThread+0xd3
23:U 0000000023a9e8d0 000007fef8f814fc mscorwks!Thread::intermediateThreadProc+0x78
24:U 0000000023a9f7a0 00000000773cbe3d kernel32!BaseThreadInitThunk+0xd
25:U 0000000023a9f7d0 00000000775d6a51 ntdll!RtlUserThreadStart+0x1d

Yes, the caching block has some - issues - with regard to the scavenger threads in older versions of Entlib, particularly if things are coming in faster than the scavenging settings let them come out.
This was completely rewritten in Entlib 5, so that now you'll never have more than two threads sitting in the caching block, regardless of the load, and usually it'll only be one.
Unfortunately there's no easy tweak to change the behavior in earlier versions. The best you can do is change the cache settings so that each scavenge will clean out more items at a time so not as many scavenge requests need to get scheduled.

640 threads is very bad for performance. If they are all waiting for something, then I'd say it's a fair bet that you have a deadlock and they will never exit. If they are all running (not waiting)... well, with 600+ threads on a 2 or 4 core processor none of them will get enough time slices to run very far! ;>
If your app is set up with a main thread that waits on the thread handles to find out when the threads exit, and the background threads get caught up in a loop or in a wait state and never exit the thread proc, then the process and all of its threads will never exit.
Check your thread code to make sure that every threadproc has a clear path to exit the threadproc. It's bad form to write an infinite loop in a background thread on the assumption that the thread will be forcibly terminated when the process shuts down.
If the background thread code spins in a loop waiting for an event handle to signal, make sure that you have some way to signal that event so that the thread can perform a normal orderly exit. Otherwise, you need to write the background thread to wait on multiple events and unblock when any one of the events signals. One of those events can be the activity that the background thread is primarily interested in and the other can be a shutdown event.
From the names of things in the stack dump you posted, it would appear that the thread is waiting for something to appear in the ProducerConsumerQueue. Investigate how that queue object is supposed to be shut down, probably on the producer side, and whether shutting down the queue will automatically release all consumers that are waiting on that queue.
My guess is that either the queue is not being shut down correctly or shutting it down does not implicitly release the consumers that are waiting on it. If the latter case, you may need to pump a terminate message through the queue to wake up all the consumers waiting on that queue and tell them to break out of their wait loop and exit.

You have an major issue. Every Thread occupies 1MB of stack and there is significant cost paid for Context Switching every thread in and out. Especially it becomes worst with managed code because every time GC has to run , it would have walk the threads stack to look for roots and when these threads are paged to the disk the cost to read from the disk is expensive,which adds up Perf issue.
Creating threads are Bad unless you know what you are doing? Jeffery Richter has written in detail about this.
To solve the above issue I would look what these threads are blocked on and also put a break-point on Thread Create (example sxe ct within windbg)
And later rearchitect from avoid creating threads , instead use the thread pool.
It would have been nice to some callstacks of these threads.

In Microsoft Enterprise Library 4.1, the BackgroundScheduler class creates a new thread each time an object is instantiated. It will be fixed in version 5.0. I do not know enough of this Microsoft Library to advise you how to avoid that behavior, but you may try the beta version: http://entlib.codeplex.com/wikipage?title=EntLib5%20Beta2

Related

redis-py not closing threads on exit

I am using redis-py 2.10.6 and redis 4.0.11.
My application uses redis for both the db and the pubsub. When I shut down I often get either hanging or a crash. The latter usually complains about a bad file descriptor or an I/O error on a file (I don't use any) which happens while handling a pubsub callback, so I'm guessing the underlying issue is the same: somehow I don't get disconnected properly and the pool used by my redis.Redis object is alive and kicking.
An example of the output of the former kind of error (during _read_from_socket):
redis.exceptions.ConnectionError: Error while reading from socket: (9, 'Bad file descriptor')
Other times the stacktrace clearly shows redis/connection.py -> redis/client.py -> threading.py, which proves that redis isn't killing the threads it uses.
When I star the application I run:
self.redis = redis.Redis(host=XXXX, port=XXXX)
self.pubsub = self.redis.pubsub()
subscriptions = {'chan1': self.cb1, 'chan2': self.cb2} # cb1 and cb2 are functions
self.pubsub.subscribe(**subscriptions)
self.pubsub_thread = self.pubsub.run_in_thread(sleep_time=1)
When I want to exit the application the last instruction I execute in main is a call to a function in my redis using class, whose implementation is:
self.pubsub.close()
self.pubsub_thread.stop()
self.redis.connection_pool.disconnect()
My understanding is that in theory I do not even need to do any of these 'closing' calls, and yet, with or without them, I still can't guarantee a clean shutdown.
My question is, how am I supposed to guarantee a clean shutdown?

I ran into this same issue and it's largely caused by improper handling of the shutdown by the redis library. During the cleanup, the thread continues to process new messages and doesn't account for situations where the socket is no longer available. After scouring the code a bit, I couldn't find a way to prevent additional processing without just waiting.
Since this is run during a shutdown phase and it's a remedy for a 3rd party library, I'm not overly concerned about the sleep, but ideally the library should be updated to prevent further action while shutting down.
self.pubsub_thread.stop()
time.sleep(0.5)
self.pubsub.reset()
This might be worth an issue log or PR on the redis-py library.

PubSubWorkerThread class check for self._running.is_set() inside the loop.
To do a "clean shutdown" you should call self.pubsub_thread._running.clean() to set the thread event to false and it will stop.
Check how it work here:
https://redis.readthedocs.io/en/latest/_modules/redis/client.html?highlight=PubSubWorkerThread#

How to safely use [NSTask waitUntilExit] off the main thread?

I have a multithreaded program that needs to run many executables at once and wait for their results.
I use [nstask waitUntilExit] in an NSOperationQueue that runs it on non-main thread (running NSTask on the main thread is completely out of the question).
My program randomly crashes or runs into assertion failures, and the crash stacks always point to the runloop run by waitUntilExit, which executes various callbacks and handlers, including—IMHO incorrectly—KVO and bindings updating the UI, which causes them to run on non-main thread (It's probably the problem described by Mike Ash)
How can I safely use waitUntilExit?
Is it a problem of waitUntilExit being essentially unusable, or do I need to do something special (apart from explicitly scheduling my callbacks on the main thread) when using KVO and IB bindings to prevent them from being handled on a wrong thread running waitUntilExit?

As Mike Ash points out, you just can't call waitUntilExit on a random runloop. It's convenient, but it doesn't work. You have to include "doesn't work" in your computation of "is this actually convenient?"
You can, however, use terminationHandler in 10.7+. It does not pump the runloop, so shouldn't create this problem. You can recreate waitUntilExit with something along these lines (untested; probably doesn't compile):
dispatch_group group = dispatch_group_create();
dispatch_group_enter(group);
task.terminationHandler = ^{ dispatch_group_leave(group); };
[task launch];
dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
// If not using ARC:
dispatch_release(group);

Hard to say without general context of what are you doing...
In general you can't update interface from the non main threads. So if you observe some KVO notifications of NSTasks in non main thread and update UI then you are wrong.
In that case you can fix situation by simple
-[NSObject performSelectorOnMainThread:];
or similar when you want to update UI.
But as for me more grace solution:
write separated NSOperationQueue with maxConcurentOperationsCount = 1 (so FIFO queue) and write subclass of NSOperation which will execute NSTask and update UI through delegate methods. In that way you will control amount of executing tasks in application. (or you may stop all of them or else)
But high level solution for your problem I think will be writing privileged helper tool. Using this approach you will get 2 main benefits: your NSTask's will be executes in separated process and you will have root privilegies for executing your tasks.
I hope my answer covers your problem.

Does timer need mutex

I use timer in my program:
timer = new Qtimer(); connect(timer, SIGNAL(timeout()), this, SLOT(readData())); timer.start(1000);
And there is also other slots which may be triggered by UI interation:
/*SLOT FUNCTION*/ on_pushbutton_triggered(){..../*write data*/...}.
(the code is written in qt, but I think it's a common question)
So I worry about the potential problem: may readData() reads wrong data while on_pushbutton_triggereed() is writting data?
I am not so familiar with how the timer really work behind the screen: is it in the same thread with my program?
Will readData() and on_pushbutton_triggereed() be called, executed, finished serially and have no mutex problem(that is: I have to use lock() and unlock())? Thank you for reading! I really hope for your hints!

Qt is using an event loop to implement concurrent activity in general and QTimer in particular within a single thread.
The event providers (QTimer in this case) are producing events and publish them to the event loop. Then they are processed according to their priority and order of publishing. This approach doesn't require any synchronization as there is only one section of code executed at the time, so it's safe to access data.
On Unix-like systems ps -eLf command will show information about all processes (PID column in the output) and their threads (LWP column). NLWP column shows how many threads particular process has.

NIO Server : use a worker thread or not?

I'm building a server with NIO, I have two questions.
Do I have to use a worker thread or a thread pool to process the messages received, or let the main thread do all this stuff ( I have performance needs).
I have two kind of sending, sendNow method which ends with selector.selectNow() and simple send method which ends with selector.wakeup().. can I have loss of data those methods?
thanks

If possible try to do it all in one thread. It gets very complicated very quickly otherwise.
I don't know why you think a sendNow() method needs to end with either selectNow() or wakeup(), but neither of them is intrinsically going to cause a data loss.

Is the first thread that gets to run inside a Win32 process the "primary thread"? Need to understand the semantics

I create a process using CreateProcess() with the CREATE_SUSPENDED and then go ahead to create a little patch of code inside the remote process to load a DLL and call a function (exported by that DLL), using VirtualAllocEx() (with ..., MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE), WriteProcessMemory(), then call FlushInstructionCache() on that patch of memory with the code.
After that I call CreateRemoteThread() to invoke that code, creating me a hRemoteThread. I have verified that the remote code works as intended. Note: this code simply returns, it does not call any APIs other than LoadLibrary() and GetProcAddress(), followed by calling the exported stub function that currently simply returns a value that will then get passed on as the exit status of the thread.
Now comes the peculiar observation: remember that the PROCESS_INFORMATION::hThread is still suspended. When I simply ignore hRemoteThread's exit code and also don't wait for it to exit, all goes "fine". The routine that calls CreateRemoteThread() returns and PROCESS_INFORMATION::hThread gets resumed and the (remote) program actually gets to run.
However, if I call WaitForSingleObject(hRemoteThread, INFINITE) or do the following (which has the same effect):
DWORD exitCode = STILL_ACTIVE;
while(STILL_ACTIVE == exitCode)
{
Sleep(500);
if(!GetExitCodeThread(hRemoteThread, &exitCode))
break;
}
followed by CloseHandle() this leads to hRemoteThread finishing before PROCESS_INFORMATION::hThread gets resumed and the process simply "disappears". It is enough to allow hRemoteThread to finish somehow without PROCESS_INFORMATION::hThread to cause the process to die.
This looks suspiciously like a race condition, since under certain circumstances hRemoteThread may still be faster and the process would likely still "disappear", even if I leave the code as is.
Does that imply that the first thread that gets to run within a process becomes automatically the primary thread and that there are special rules for that primary thread?
I was always under the impression that a process finishes when its last thread dies, not when a particular thread dies.
Also note: there is no call to ExitProcess() involved here in any way, because hRemoteThread simply returns and PROCESS_INFORMATION::hThread is still suspended when I wait for hRemoteThread to return.
This happens on Windows XP SP3, 32bit.
Edit: I have just tried Sysinternals Process Monitor to see what's happening and I could verify my observations from before. The injected code does not crash or anything, instead I get to see that if I don't wait for the thread it doesn't exit before I close the program where the code got injected. I'm thinking whether the call to CloseHandle(hRemoteThread) should be postponed or something ...
Edit+1: it's not CloseHandle(). If I leave that out just for a test, the behavior doesn't change when waiting for the thread to finish.

The first thread to run isn't special.
For example, create a console app which creates a suspended thread and terminates the original thread (by calling ExitThread). This process never terminates (on Windows 7 anyway).
Or make the new thread wait for five seconds then exit. As expected, the process will live for five seconds and exit when the secondary thread terminates.
I don't know what's happening with your example. The easiest way to avoid the race is to make the new thread resume the original thread.
Speculating now, I do wonder if what you're doing isn't likely to cause problems anyway. For example, what happens to all the DllMain calls for the implicitly loaded DLLs? Are they unexpectedly happening on the wrong thread, are they being skipped, or are they postponed until after your code has run and the main thread starts?

Odds are good that the thread with the main (or equivalent) function calls ExitProcess (either explicitly or in its runtime library). ExitProcess, well, exits the entire process, including killing all threads. Since the main thread doesn't know about your injected code, it doesn't wait for it to finish.
I don't know that there's a good way to make the main thread wait for yours to complete...

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string