windbg output from !thread? - multithreading

Running windbg on a full memory dump. The !process command generates thread information (see below). Frequently the THREAD line is followed by multiple event-like entries, such as "fffffa800a0c0060 SynchronizationTimer". What do they signify? Are they objects the thread owns, or objects it is waiting on?
THREAD fffffa8005718b50 Cid 16c0.1660 Teb: 00000000fffd8000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Alertable
fffffa800a0c0060 SynchronizationTimer
fffffa800a7c1060 SynchronizationTimer
<etc...>
fffffa8007a9f4e0 SynchronizationEvent
fffffa800ae48b20 SynchronizationTimer
Not impersonating
DeviceMap fffff8a01480f1e0

A thread doesn't really own objects, so they must be the objects the thread is waiting on.
The documentation doesn't say this, but it's mentioned, for example, here: How can I work out what events are being waited for with WinDBG in a kernel debug session
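As a quick check, you can feed one of those addresses to the !object extension command in the kernel debugger; it reports the object's type (Timer, Event, and so on), confirming that the list under the WAIT line is the set of dispatcher objects the thread is blocked on. For example, using an address from the dump above:
kd> !object fffffa800a0c0060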

Related

How can I know who calls System.gc() in a Spark Streaming program?

The GC time is too long in my Spark Streaming program. In the GC log, I found that something called System.gc() in the program. I do not call System.gc() in my code, so the caller must be one of the APIs I use.
I added -XX:+DisableExplicitGC to the JVM options and that fixed the problem. However, I want to know who calls System.gc().
I have tried some approaches:
Use jstack. But the GC is not frequent, so it is difficult to catch a thread dump of the thread that calls the method.
Add a trigger in JProfiler that takes a thread dump when java.lang.System.gc() is invoked. But it doesn't seem to work.
How can I know who calls System.gc() in a Spark Streaming program?
You will not catch System.gc with jstack, because during stop-the-world pauses the JVM does not accept connections from Dynamic Attach tools, including jstack, jmap, jcmd and the like.
It's possible to trace System.gc callers with async-profiler:
Start profiling beforehand:
$ profiler.sh start -e java.lang.System.gc <pid>
After one or more System.gc calls have happened, stop profiling and print the stack traces:
$ profiler.sh stop -o traces <pid>
Example output:
--- Execution profile ---
Total samples : 6
Frame buffer usage : 0.0007%
--- 4 calls (66.67%), 4 samples
[ 0] java.lang.System.gc
[ 1] java.nio.Bits.reserveMemory
[ 2] java.nio.DirectByteBuffer.<init>
[ 3] java.nio.ByteBuffer.allocateDirect
[ 4] Allocate.main
--- 2 calls (33.33%), 2 samples
[ 0] java.lang.System.gc
[ 1] sun.misc.GC$Daemon.run
In the above example, System.gc is called 6 times from two places. Both are typical situations in which the JDK internally forces a garbage collection.
The first one is from java.nio.Bits.reserveMemory. When there is not enough free memory to allocate a new direct ByteBuffer (because of the -XX:MaxDirectMemorySize limit), the JDK forces a full GC to reclaim unreachable direct ByteBuffers.
The second one is from the GC Daemon thread, which is run periodically by the Java RMI runtime. For example, if you use remote JMX, periodic GC is automatically enabled once per hour. This can be tuned with the -Dsun.rmi.dgc.client.gcInterval system property.
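For instance (an illustrative command line; the intervals are in milliseconds, the default being 3600000, i.e. one hour, and MyApp is a placeholder):
$ java -Dsun.rmi.dgc.client.gcInterval=21600000 -Dsun.rmi.dgc.server.gcInterval=21600000 MyApp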

What could cause trace/breakpoint trap (core dumped)? [duplicate]

I know this question has been asked before, but I have read all the threads and I didn't find an answer.
From the moment I execute run to start debugging my project, I get this: Program received signal SIGTRAP, Trace/breakpoint trap. [Switching to Thread 6]. When I do Ctrl+C, gdb tells me: Program received signal SIGINT, Interrupt.
0x00000000 in ?? ()
Usually it tells me which file and which function it was interrupted in, not 0x00000000 in ?? ().
GDB no longer hits breakpoints, and what makes matters crazier is that a colleague and I are sharing the same session (the debugging is done using cygwin with a remote machine) and it works fine for them but not for me.
When I try to get info about the threads using info threads, here's what I get:
[New Thread 20]
[New Thread 21]
[New Thread 22]
Id Target Id Frame
4 Thread 22 (ssp=0xa9004d5c) 0x00000000 in ?? ()
3 Thread 21 (ssp=0xa9002e64) 0x00000010 in ?? ()
2 Thread 20 (ssp=0xa9000ef4) 0x00000000 in ?? ()
The current thread <Thread ID 1> has terminated. See `help thread'
There's no thread 6, and there's no * to indicate which thread gdb is using. I don't even know if that's linked to the problem.
Can anyone please help me?
You are not providing nearly enough info to help you. Details matter, and you are withholding them: the versions of GDB and gdbserver matter, how you invoke GDB and gdbserver matters, and any warnings you receive from GDB matter.
Now, this error message:
Program received signal SIGTRAP, Trace/breakpoint trap. [Switching to Thread 6]
usually means that gdbserver has not attached to one of the threads of your process, and that thread has tried to execute a breakpoint instruction (you do have breakpoints set before this happens, don't you?).
One of the reasons this may happen is when your GDB loads "wrong" libthread_db.so (one that doesn't match the target libc.so.6).
what makes matters crazier is that a colleague and I are sharing the same session (the debugging is done using cygwin with a remote machine) and it works fine for them but not for me.
I am not sure what you mean by "same session", but it's probably not "when he types commands, they work; but when I type the same commands into the same GDB, they don't".
One difference between you and your colleague could be the LD_LIBRARY_PATH environment variable setting. Another could be in ~/.gdbinit or ./.gdbinit.
I suggest running gdb -nx to get rid of the latter, and unsetting LD_LIBRARY_PATH to get rid of the former.
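For example (a minimal sketch of both suggestions; the program name is a placeholder):
$ unset LD_LIBRARY_PATH
$ gdb -nx ./myprogram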
The problem, which for some reason no one seemed to notice, is this:
This is how I was invoking gdb: /usr/local/build/gdbx.y/gdb/gdb. What I should have been invoking is: /usr/local/build/gdbx.y/build/gdb/gdb.
It was a path problem.

OpenGL: glClientWaitSync on separate thread

I am using glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT to map a buffer object. I then pass the returned pointer to a worker thread to compute the new vertices asynchronously. The object is double-buffered so I can render one buffer while the other is written to. Using GL_MAP_UNSYNCHRONIZED_BIT gives me significantly better performance (mainly because glUnmapBuffer returns sooner), but I am getting some visual artifacts (despite the double buffering) - so I assume either the GPU starts rendering while the DMA upload is still in progress, or the worker thread starts writing to the vertices too early.
If I understand glFenceSync, glWaitSync and glClientWaitSync correctly, then I am supposed to address these issues in the following way:
A: avoid having the GPU render the buffer object before the DMA upload has completed:
directly after glUnmapBuffer, call on the main thread
GLsync uploadSync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glFlush();
glWaitSync(uploadSync, 0, GL_TIMEOUT_IGNORED);
B: avoid writing to the buffer from the worker thread before the GPU has finished rendering it:
directly after glDrawElements, call on the main thread
GLsync renderSync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
and on the worker thread, right before starting to write data to the pointer that has previously been returned from glMapBufferRange
glClientWaitSync(renderSync,0,100000000);
...start writing to the mapped pointer
1: Is my approach to the explicit syncing correct?
2: How can I handle the second case? I want to wait in the worker thread (I don't want to make my main thread stall), but I cannot issue glCommands from the worker thread. Is there another way to check if the GLsync has been signalled other than the gl call?
What you could do is create an OpenGL context in the worker thread, and then share it with the main thread. Next:
Run on the main thread:
GLsync renderSync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glFlush();
then
Run on the worker thread:
glClientWaitSync(renderSync,0,100000000);
The glFlush on the main thread is important, since otherwise you could have an infinite wait. See also the OpenGL docs:
4.1.2 Signaling
Footnote 3: The simple flushing behavior defined by SYNC_FLUSH_COMMANDS_BIT will not help when waiting for a fence command issued in another context's command stream to complete. Applications which block on a fence sync object must take additional steps to assure that the context from which the corresponding fence command was issued has flushed that command to the graphics pipeline.
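Putting both pieces together, here is a minimal C sketch of the cross-context pattern (assuming the two contexts are shared and that renderSync is handed from the main thread to the worker through properly synchronized storage):
/* main thread, right after glDrawElements(...): */
GLsync renderSync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glFlush();  /* without this, the worker below could wait forever */

/* worker thread, before writing to the mapped pointer: */
GLenum status;
do {
    status = glClientWaitSync(renderSync, 0, 100000000);  /* timeout in nanoseconds */
} while (status == GL_TIMEOUT_EXPIRED);
if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED) {
    /* the GPU is done rendering; safe to write vertex data now */
}
glDeleteSync(renderSync);  /* fence objects are single-use */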

640 enterprise library caching threads - how?

We have an application that is undergoing performance testing. Today, I decided to take a dump of w3wp & load it in windbg to see what is going on underneath the covers. Imagine my surprise when I ran !threads and saw that there are 640 background threads, almost all of which seem to say the following:
OS Thread Id: 0x1c38 (651)
Child-SP RetAddr Call Site
0000000023a9d290 000007ff002320e2 Microsoft.Practices.EnterpriseLibrary.Caching.ProducerConsumerQueue.WaitUntilInterrupted()
0000000023a9d2d0 000007ff00231f7e Microsoft.Practices.EnterpriseLibrary.Caching.ProducerConsumerQueue.Dequeue()
0000000023a9d330 000007fef727c978 Microsoft.Practices.EnterpriseLibrary.Caching.BackgroundScheduler.QueueReader()
0000000023a9d380 000007fef9001552 System.Threading.ExecutionContext.runTryCode(System.Object)
0000000023a9dc30 000007fef72f95fd System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
0000000023a9dc80 000007fef9001552 System.Threading.ThreadHelper.ThreadStart()
If I had to guess, I'd say that one of these threads is getting spawned for each run of our app - we have 2 app servers, 20 concurrent users, and ran the test approximately 30 times... it's in the neighborhood.
Is this 'expected behavior', or have we perhaps implemented something improperly? The test ran hours ago, so I would have expected any timeouts to have occurred already.
Edit: Thank you all for your replies. It has been requested that more detail be shown about the call stack - here is the output of !mk from sosex.dll.
ESP RetAddr
00:U 0000000023a9cb38 00000000775f72ca ntdll!ZwWaitForMultipleObjects+0xa
01:U 0000000023a9cb40 00000000773cbc03 kernel32!WaitForMultipleObjectsEx+0x10b
02:U 0000000023a9cc50 000007fef8f5f595 mscorwks!WaitForMultipleObjectsEx_SO_TOLERANT+0xc1
03:U 0000000023a9ccf0 000007fef8f59f49 mscorwks!Thread::DoAppropriateAptStateWait+0x41
04:U 0000000023a9cd50 000007fef8e55b99 mscorwks!Thread::DoAppropriateWaitWorker+0x191
05:U 0000000023a9ce50 000007fef8e2efe8 mscorwks!Thread::DoAppropriateWait+0x5c
06:U 0000000023a9cec0 000007fef8f0dc7a mscorwks!CLREvent::WaitEx+0xbe
07:U 0000000023a9cf70 000007fef8fba72e mscorwks!Thread::Block+0x1e
08:U 0000000023a9cfa0 000007fef8e1996d mscorwks!SyncBlock::Wait+0x195
09:U 0000000023a9d0c0 000007fef9463d3f mscorwks!ObjectNative::WaitTimeout+0x12f
0a:M 0000000023a9d290 000007ff002321b3 *** ERROR: Module load completed but symbols could not be loaded for Microsoft.Practices.EnterpriseLibrary.Caching.DLL
Microsoft.Practices.EnterpriseLibrary.Caching.ProducerConsumerQueue.WaitUntilInterrupted()(+0x0 IL)(+0x11 Native)
0b:M 0000000023a9d2d0 000007ff002320e2 Microsoft.Practices.EnterpriseLibrary.Caching.ProducerConsumerQueue.Dequeue()(+0xf IL)(+0x18 Native)
0c:M 0000000023a9d330 000007ff00231f7e Microsoft.Practices.EnterpriseLibrary.Caching.BackgroundScheduler.QueueReader()(+0x9 IL)(+0x12 Native)
0d:M 0000000023a9d380 000007fef727c978 System.Threading.ExecutionContext.runTryCode(System.Object)(+0x18 IL)(+0x106 Native)
0e:U 0000000023a9d440 000007fef9001552 mscorwks!CallDescrWorker+0x82
0f:U 0000000023a9d490 000007fef8e9e5e3 mscorwks!CallDescrWorkerWithHandler+0xd3
10:U 0000000023a9d530 000007fef8eac83f mscorwks!MethodDesc::CallDescr+0x24f
11:U 0000000023a9d790 000007fef8f0cbd2 mscorwks!ExecuteCodeWithGuaranteedCleanupHelper+0x12a
12:U 0000000023a9da20 000007fef945e572 mscorwks!ReflectionInvocation::ExecuteCodeWithGuaranteedCleanup+0x172
13:M 0000000023a9dc30 000007fef7261722 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)(+0x60 IL)(+0x51 Native)
14:M 0000000023a9dc80 000007fef72f95fd System.Threading.ThreadHelper.ThreadStart()(+0x8 IL)(+0x2a Native)
15:U 0000000023a9dcd0 000007fef9001552 mscorwks!CallDescrWorker+0x82
16:U 0000000023a9dd20 000007fef8e9e5e3 mscorwks!CallDescrWorkerWithHandler+0xd3
17:U 0000000023a9ddc0 000007fef8eac83f mscorwks!MethodDesc::CallDescr+0x24f
18:U 0000000023a9e010 000007fef8f9ae8d mscorwks!ThreadNative::KickOffThread_Worker+0x191
19:U 0000000023a9e330 000007fef8f59374 mscorwks!TypeHandle::GetParent+0x5c
1a:U 0000000023a9e380 000007fef8e52045 mscorwks!SVR::gc_heap::make_heap_segment+0x155
1b:U 0000000023a9e450 000007fef8f66139 mscorwks!ZapStubPrecode::GetType+0x39
1c:U 0000000023a9e490 000007fef8e1c985 mscorwks!ILCodeStream::GetToken+0x25
1d:U 0000000023a9e4c0 000007fef8f594e1 mscorwks!Thread::DoADCallBack+0x145
1e:U 0000000023a9e630 000007fef8f59399 mscorwks!TypeHandle::GetParent+0x81
1f:U 0000000023a9e680 000007fef8e52045 mscorwks!SVR::gc_heap::make_heap_segment+0x155
20:U 0000000023a9e750 000007fef8f66139 mscorwks!ZapStubPrecode::GetType+0x39
21:U 0000000023a9e790 000007fef8e20e15 mscorwks!ThreadNative::KickOffThread+0x401
22:U 0000000023a9e7f0 000007fef8e20ae7 mscorwks!ThreadNative::KickOffThread+0xd3
23:U 0000000023a9e8d0 000007fef8f814fc mscorwks!Thread::intermediateThreadProc+0x78
24:U 0000000023a9f7a0 00000000773cbe3d kernel32!BaseThreadInitThunk+0xd
25:U 0000000023a9f7d0 00000000775d6a51 ntdll!RtlUserThreadStart+0x1d
Yes, the caching block has some - issues - with regard to the scavenger threads in older versions of Entlib, particularly if things are coming in faster than the scavenging settings let them come out.
This was completely rewritten in Entlib 5, so that now you'll never have more than two threads sitting in the caching block, regardless of the load, and usually it'll only be one.
Unfortunately there's no easy tweak to change the behavior in earlier versions. The best you can do is change the cache settings so that each scavenge cleans out more items at a time, so that fewer scavenge requests need to be scheduled.
640 threads is very bad for performance. If they are all waiting for something, then I'd say it's a fair bet that you have a deadlock and they will never exit. If they are all running (not waiting)... well, with 600+ threads on a 2 or 4 core processor none of them will get enough time slices to run very far! ;>
If your app is set up with a main thread that waits on the thread handles to find out when the threads exit, and the background threads get caught up in a loop or in a wait state and never exit the thread proc, then the process and all of its threads will never exit.
Check your thread code to make sure that every threadproc has a clear path to exit the threadproc. It's bad form to write an infinite loop in a background thread on the assumption that the thread will be forcibly terminated when the process shuts down.
If the background thread code spins in a loop waiting for an event handle to signal, make sure that you have some way to signal that event so that the thread can perform a normal orderly exit. Otherwise, you need to write the background thread to wait on multiple events and unblock when any one of the events signals. One of those events can be the activity that the background thread is primarily interested in and the other can be a shutdown event.
From the names of things in the stack dump you posted, it would appear that the thread is waiting for something to appear in the ProducerConsumerQueue. Investigate how that queue object is supposed to be shut down, probably on the producer side, and whether shutting down the queue will automatically release all consumers that are waiting on that queue.
My guess is that either the queue is not being shut down correctly, or shutting it down does not implicitly release the consumers that are waiting on it. In the latter case, you may need to pump a terminate message through the queue to wake up all the consumers waiting on it and tell them to break out of their wait loop and exit.
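To illustrate the multiple-event pattern described above, here is a hedged Win32 C sketch (the two event handles and their roles are hypothetical; the real consumers in this case sit inside the Entlib queue, which you do not control directly):
#include <windows.h>

/* events[0] = "work available", events[1] = "shutdown requested" */
DWORD WINAPI WorkerProc(LPVOID param)
{
    HANDLE *events = (HANDLE *)param;
    for (;;) {
        DWORD which = WaitForMultipleObjects(2, events, FALSE, INFINITE);
        if (which == WAIT_OBJECT_0) {
            /* dequeue and process one work item */
        } else {
            /* shutdown signaled (or the wait failed): exit the threadproc cleanly */
            break;
        }
    }
    return 0;
}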
You have a major issue. Every thread occupies 1 MB of stack, and there is a significant cost paid for context-switching every thread in and out. It becomes even worse with managed code, because every time the GC runs it must walk each thread's stack to look for roots, and when those stacks have been paged out to disk, the cost of reading them back in is expensive. All of this adds up to a performance issue.
Creating threads is bad unless you know what you are doing; Jeffrey Richter has written about this in detail.
To solve the above issue, I would look at what these threads are blocked on, and also put a breakpoint on thread creation (for example, sxe ct within windbg).
Then rearchitect to avoid creating threads; use the thread pool instead.
It would have been nice to see some call stacks of these threads.
In Microsoft Enterprise Library 4.1, the BackgroundScheduler class creates a new thread each time an object is instantiated. This will be fixed in version 5.0. I do not know enough about this Microsoft library to advise you on how to avoid that behavior, but you may try the beta version: http://entlib.codeplex.com/wikipage?title=EntLib5%20Beta2

when a process is killed is this information recorded anywhere?

Question:
When a process is killed, is this information recorded anywhere (i.e., in the kernel), such as in syslog (or can it be configured to be recorded via syslog.conf)?
Is the killer's PID, the time and date of the kill, and the reason recorded?
Update: you have all given me some insight, thank you very much.
If your Linux kernel is compiled with the process accounting (CONFIG_BSD_PROCESS_ACCT) option enabled, you can start recording process accounting info using the accton(8) command and use sa(8) to access the recorded info. The recorded information includes the 32-bit exit code, which includes the signal number.
(This stuff is not widely known or used these days, but I still remember it from the days of 4.x BSD on VAXes...)
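If you want to experiment with it, the session looks roughly like this (illustrative; the accounting file path and package names vary by distribution):
$ sudo accton /var/log/account/pacct   # start recording
$ lastcomm                             # per-process records; flags indicate signal deaths
$ sudo sa                              # summarize the accounting file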
Amended:
In short, the OS kernel does not care if the process is killed; whether anything is recorded depends on whether the process itself logs anything. All the kernel cares about at this stage is reclaiming memory. But read on for how to catch it and log it...
As per caf's and Stephen C's comments...
If you are running the BSD process accounting module in the kernel, everything gets logged. Thanks to Stephen C for pointing this out! I had not considered that functionality, as I have it switched off/disabled.
In hindsight, as per caf's comment: the two signals that cannot be caught are SIGKILL and SIGSTOP. There is also the fact that I mentioned atexit, and in the code I described, the return should have been exit(0);... oops. Thanks caf!
Original
The best way to catch a kill is to use a signal handler that handles several signals, not just one: SIGTERM (the default signal sent by the kill command), SIGABRT (abort), SIGQUIT (terminal program quit) and SIGHUP (hangup). Together, those signals cover what the kill command would normally deliver from the command line; as amended above, SIGKILL and SIGSTOP cannot be caught. The signal handler can then log the information to /var/log/messages (environment dependent or Linux distribution dependent). For further reference, see here.
Also, see here for an example of how to use a signal handler using the sigaction function.
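Here is a minimal sketch of such a handler (not the linked example; it installs one handler for the catchable signals mentioned above):
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void handler(int sig)
{
    (void)sig;
    /* only async-signal-safe calls belong here; write(2) is safe, fprintf is not */
    const char msg[] = "caught a termination signal\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(1);
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGTERM, &sa, NULL);  /* default signal sent by kill */
    sigaction(SIGQUIT, &sa, NULL);
    sigaction(SIGHUP,  &sa, NULL);
    for (;;)
        pause();                    /* wait until a signal arrives */
}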
It would also be a good idea to adopt the atexit function; then, when the code exits at runtime, the runtime will execute the registered function last, before returning to the command line. A reference for atexit is here.
When the C function exit is called, the atexit machinery will invoke the registered function pointer, as in the example below. Thanks caf for this!
An example usage of atexit:
#include <stdio.h>
#include <stdlib.h>

void myexitfunc(void)
{
    fprintf(stdout, "Goodbye cruel world...\n");
}

int main(int argc, char **argv)
{
    atexit(myexitfunc); /* Beginning, immediately right after declaration(s) */
    /* Rest of code */
    exit(0);            /* runs myexitfunc before the process terminates */
}
Hope this helps,
Best regards,
Tom.
I don't know of any logging of signals sent to processes, unless the OOM killer is doing it.
If you're writing your own program you can catch the kill signal and write to a logfile before actually dying. This doesn't work with kill -9 though, just the normal kill.
You can see some details over thisaway.
If you use sudo, it will be logged. Other than that, the killed process can log some information (unless it's being terminated with extreme prejudice). You could even hack the kernel to log signals.
As for recording the reason a process was killed, I've yet to see a psychic program.
Kernel hacking is not for the weak of heart, but hella fun. You'd need to patch the signal dispatch routines to log information using printk(9) when kill(2), sigsend(2) or the like is called. Read "The Linux Signals Handling Model" for more information on how signals are handled.
If the process is getting it via kill(2), then unless the process is already logging the only external trace would be a kernel mod. It's pretty simple; just do a printk(), it's like printf(). Find the output in dmesg.
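For illustration only, the kind of one-liner you might add inside the kernel's signal delivery path (somewhere in kernel/signal.c; the variable names here are hypothetical and differ across kernel versions):
printk(KERN_INFO "signal %d sent to pid %d by pid %d\n",
       sig, task_pid_nr(t), task_pid_nr(current));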
If the process is getting it via /bin/kill, then it would be a relatively easy matter to install a wrapper executable that did logging. But this (signal delivery via /bin/kill) is unlikely because kill is also a bash built-in.
By the way, when a process is killed by a signal, the kernel announces this to the parent process via the wait(2) system call. The status value returned by this call encodes the exit status of the child and, if the process was killed, information about the signal that did it. See wait(2) for more information.
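To make the wait(2) encoding concrete, a minimal sketch (the parent kills its own child here purely for demonstration):
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {          /* child: sleep until signalled */
        pause();
        _exit(0);
    }
    kill(pid, SIGTERM);      /* parent terminates the child */
    int status;
    waitpid(pid, &status, 0);
    if (WIFSIGNALED(status))
        printf("child killed by signal %d\n", WTERMSIG(status));
    else if (WIFEXITED(status))
        printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}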
