How to get the parent thread in WinDBG? - multithreading

When I analyze a crash dump file, I often see errors like this:
0:025> kP
Child-SP RetAddr Call Site
00000000`05a4fc78 00000000`77548638 ntdll!DbgBreakPoint(void) [d:\w7rtm\minkernel\ntos\rtl\amd64\debugstb.asm # 51]
00000000`05a4fc80 00000000`774b39cb ntdll!DbgUiRemoteBreakin(
void * Context = 0x00000000`00000000)+0x38 [d:\w7rtm\minkernel\ntdll\dlluistb.c # 310]
00000000`05a4fcb0 00000000`00000000 ntdll!RtlUserThreadStart(
<function> * StartAddress = 0x00000000`00000000,
void * Argument = 0x00000000`00000000)+0x25 [d:\w7rtm\minkernel\ntos\rtl\rtlexec.c # 3179]
It seems that the process crashed while creating a thread. So I want to find out who, or which thread, created the current thread. How can I get that?

You can look at the other threads in the process with ~*k to see if there's anything interesting. Other than that, this info simply isn't there in the dump.
-scott
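To illustrate the suggestion above, a sketch of the commands (output elided): ~*k dumps the stack of every thread in the process, and, if the dump captured thread times, !runaway 4 lists the elapsed time since each thread was created, which can at least hint at creation order.
0:025> ~*k
0:025> !runaway 4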

Related

logging multithreading deadlock in python

I run 10 processes with 10 threads each, and over a 30-second run they constantly write to 10 log files (one per process) using logging.info() and logging.debug().
At some point, usually after about 10 seconds, a deadlock occurs: all of the processes stop processing.
gdb python [pid] with py-bt & info threads shows that they are stuck here:
Id Target Id Frame
* 1 Thread 0x7ff50f020740 (LWP 1622) "python" 0x00007ff50e8276d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x564f17c8aa80)
at ../sysdeps/unix/sysv/linux/futex-internal.h:205
2 Thread 0x7ff509636700 (LWP 1624) "python" 0x00007ff50eb57bb7 in epoll_wait (epfd=8, events=0x7ff5096351d0, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
3 Thread 0x7ff508e35700 (LWP 1625) "python" 0x00007ff50eb57bb7 in epoll_wait (epfd=12, events=0x7ff508e341d0, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
4 Thread 0x7ff503fff700 (LWP 1667) "python" 0x00007ff50e8276d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x564f17c8aa80)
at ../sysdeps/unix/sysv/linux/futex-internal.h:205
...[threads 5-6 like 4]...
7 Thread 0x7ff5027fc700 (LWP 1690) "python" 0x00007ff50eb46187 in __GI___libc_write (fd=2, buf=0x7ff50967bc24, nbytes=85) at ../sysdeps/unix/sysv/linux/write.c:27
...[threads 8-13 like 4]...
Stack of thread 7:
Traceback (most recent call first):
File "/usr/lib/python2.7/logging/__init__.py", line 889, in emit
stream.write(fs % msg)
...[skipped useless lines]...
And this is the code around that line (from logging/__init__.py, I assume):
884                             #the codecs module, but fail when writing to a
885                             #terminal even when the codepage is set to cp1251.
886                             #An extra encoding step seems to be needed.
887                             stream.write((ufs % msg).encode(stream.encoding))
888                     else:
>889                        stream.write(fs % msg)
890                 except UnicodeError:
891                     stream.write(fs % msg.encode("UTF-8"))
892             self.flush()
893         except (KeyboardInterrupt, SystemExit):
894             raise
The stacks of the remaining threads are similar -- all waiting for the GIL:
Traceback (most recent call first):
Waiting for the GIL
File "/usr/lib/python2.7/threading.py", line 174, in acquire
rc = self.__block.acquire(blocking)
File "/usr/lib/python2.7/logging/__init__.py", line 715, in acquire
self.lock.acquire()
...[skipped useless lines]...
The documentation says the logging package is thread-safe and needs no additional locks. So why might logging deadlock? Is it opening too many file descriptors, or hitting some other limit?
This is how I initialize it (in case it matters):
def get_logger(log_level, file_name='', log_name=''):
    if len(log_name) != 0:
        logger = logging.getLogger(log_name)
    else:
        logger = logging.getLogger()
    logger.setLevel(logger_state[log_level])
    formatter = logging.Formatter('%(asctime)s [%(levelname)s][%(name)s:%(funcName)s():%(lineno)s] - %(message)s')
    # file handler
    if len(file_name) != 0:
        fh = logging.FileHandler(file_name)
        fh.setLevel(logging.DEBUG)
        fh.setFormatter(formatter)
        logger.addHandler(fh)
    # console handler
    console_out = logging.StreamHandler()
    console_out.setLevel(logging.DEBUG)
    console_out.setFormatter(formatter)
    logger.addHandler(console_out)
    return logger
The problem was that I was writing output both to the console and to a file, but all of these processes were started with output redirected to a pipe that was never read:
p = Popen(proc_params,
          stdout=PIPE,
          stderr=STDOUT,
          close_fds=ON_POSIX,
          bufsize=1)
So pipes have a limited buffer size, and once that buffer fills up, the child blocks on write and everything deadlocks.
Here is the explanation: https://docs.python.org/2/library/subprocess.html
Note
Do not use stdout=PIPE or stderr=PIPE with this function as that can deadlock based on the child process output volume. Use Popen with the communicate() method when you need pipes.
That note is written for the convenience functions, which I don't use, but the same applies to a plain Popen run if you never read from the pipes afterwards.
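For anyone hitting the same thing, here is a minimal sketch of the fix (proc_params and worker.py are placeholders, not my real command): either drain the pipe, or don't create one at all if you never read the output.

import os
from subprocess import Popen, PIPE, STDOUT

proc_params = ['python', 'worker.py']  # placeholder command

# Option 1: the output matters -- drain the pipe so the child can never
# block on a full pipe buffer, then reap the process.
p = Popen(proc_params, stdout=PIPE, stderr=STDOUT)
out, _ = p.communicate()

# Option 2: the output does not matter -- send it to /dev/null instead of
# creating a pipe that nobody reads.
with open(os.devnull, 'wb') as devnull:
    p = Popen(proc_params, stdout=devnull, stderr=STDOUT)
    p.wait()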

distributed Tensorflow tracking timestamps for synchronization operations

I am new to TensorFlow. Currently, I am trying to evaluate the performance of distributed TensorFlow using the Inception model provided by the TensorFlow team.
What I want is to generate timestamps for some critical operations in a Parameter Server / Worker architecture, so I can measure the bottleneck (the network lag due to parameter transfer/synchronization, or the parameter computation cost) on replicas for one iteration (batch).
I came up with the idea of adding a custom dummy py_func op inside inception_distributed_train.py whose only job is to print timestamps, wired in with some control dependencies. Here are the pieces of code I added:
def timer(s):
    print("-------- thread ID ", threading.current_thread().ident,
          ", ---- Process ID ----- ", getpid(), " ~~~~~~~~~~~~~~~ ", s,
          datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S.%f'))
    return False
dummy1 = tf.py_func(timer, ["got gradients, before dequeues token "], tf.bool)
dummy2 = tf.py_func(timer, ["finished dequeueing the token "], tf.bool)
I modified
apply_gradients_op = opt.apply_gradients(grads, global_step=global_step)
with tf.control_dependencies([apply_gradients_op]):
    train_op = tf.identity(total_loss, name='train_op')
into
with tf.control_dependencies([dummy1]):
    apply_gradients_op = opt.apply_gradients(grads, global_step=global_step)
with tf.control_dependencies([apply_gradients_op]):
    with tf.control_dependencies([dummy2]):
        train_op = tf.identity(total_loss, name='train_op')
hoping that enforcing node dependencies this way would print one timestamp before apply_gradients_op is evaluated and another after it finishes.
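Note that in the snippet above, dummy2 is created outside the control_dependencies([apply_gradients_op]) scope, so the graph only requires that both finish before train_op; it does not order dummy2 after apply_gradients_op, which may explain the nondeterministic prints mentioned below. For reference, a minimal self-contained sketch of this pattern with the ordering enforced -- assuming TF 1.x, where tf.py_func exists; the variable x and the assign_add op are stand-ins for the real training state and apply_gradients_op:

import time
import threading
import tensorflow as tf

def timer(label):
    # Runs as ordinary Python each time the wrapped op executes.
    print(label, threading.current_thread().ident, time.time())
    return False

x = tf.Variable(0, name='x')                        # stand-in for model state
dummy1 = tf.py_func(timer, ['before op'], tf.bool)
with tf.control_dependencies([dummy1]):
    measured_op = tf.assign_add(x, 1)               # stand-in for apply_gradients_op
with tf.control_dependencies([measured_op]):
    # Created *inside* this scope, so it runs strictly after measured_op.
    dummy2 = tf.py_func(timer, ['after op'], tf.bool)
with tf.control_dependencies([dummy2]):
    train_op = tf.identity(x, name='train_op')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)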
I did similar things inside sync_replicas_optimizer.apply_gradients, by adding two dummy print nodes before and after update_op:
dummy1 = py_func(timer, ["---------- before update_op "], tf_bool)
dummy2 = py_func(timer, ["---------- finished update_op "], tf_bool)
# sync_op will be assigned to the same device as the global step.
with ops.device(global_step.device), ops.name_scope(""):
    with ops.control_dependencies([dummy1]):
        update_op = self._opt.apply_gradients(aggregated_grads_and_vars, global_step)

# Clear all the gradients queues in case there are stale gradients.
clear_queue_ops = []
with ops.control_dependencies([update_op]):
    with ops.control_dependencies([dummy2]):
        for queue, dev in self._one_element_queue_list:
            with ops.device(dev):
                stale_grads = queue.dequeue_many(queue.size())
                clear_queue_ops.append(stale_grads)
I understand that apply_gradients_op is the train_op returned by sync_replicas_optimizer.apply_gradients, and that it is the op that dequeues a token (global_step) from the sync_queue managed by the chief worker via chief_queue_runner, so that the replica can exit the current batch and start a new one.
In theory, apply_gradients_op should take some time, since the replica has to wait before it can dequeue the token (global_step) from sync_queue. But the print results I got for one replica show a very short time difference for executing apply_gradients_op (~1/1000 sec), and the print output is sometimes nondeterministic (especially for the chief worker). Here is a snippet of the output on the workers (I am running 2 workers and 1 PS):
(screenshots omitted: chief worker (worker 0) output; worker 1 output)
My questions are:
1) I need to record the time TensorFlow takes to execute an op (such as train_op, apply_gradients_op, compute_gradients_op, etc.)
2) Is this the right direction to go, given my ultimate goal is to record the elapsed time for executing certain operations (such as the difference between the time a replica finishes computing gradients and the time it gets the global_step from sync_token)?
3) If this is not the way it should go, please guide me with some insights about the possible ways I could achieve my ultimate goal.
Thank you so much for reading this long post; I have spent weeks working on it!

Bluez MediaEndpoint1 Timeout Issue When Replying To Dbus Call From Node Addon

Bluez Version: 5.43
Let me get straight to the point:
I have the following error inside the Bluez log file:
Calling SetConfiguration: name = :1.3 path = /MediaEndpoint/A2DPSink
...
Endpoint replied with an error: org.freedesktop.DBus.Error.NoReply
If I change this line of code
#define REQUEST_TIMEOUT (3 * 1000) /* 3 seconds */
inside the ~/bluez-5.43/profiles/audio/media.c file to a greater value, like 5 seconds or so, the bug goes away.
So what is this bug?
Basically, I have Node.js addon code that does the following:
Initialize the endpoint:
void endpoint_init(DBusConnection *connection, const char *endpoint) {
    DBusObjectPathVTable vtable_endpoint;
    vtable_endpoint.message_function = endpoint_handler;
    dbus_connection_register_object_path(connection, endpoint, &vtable_endpoint, NULL);
}
Inside the Bluez log you will see:
bluetoothd[25176]: Endpoint registered: sender=:1.130 path=/MediaEndpoint/A2DPSink
The endpoint_handler function will be notified of a call to the set_configuration or select_configuration function. When a call is received, it is replied to like so:
sender = dbus_message_get_sender(m);
r = dbus_message_new_method_return(m);
printf("!! ----- endpoint_set_configuration, time_right_before_reply_sent: ");
print_time();
assert( dbus_connection_send(conn, r, NULL) );
dbus_connection_flush(conn);
printf("!! ----- endpoint_set_configuration, time_right_after_reply_sent: ");
print_time();
As you can see I am logging some time information.
Now, I also logged time information inside Bluez and recompiled it.
Here is log from Bluez:
bluetoothd[789]: profiles/audio/media.c:media_endpoint_async_call() Calling SetConfiguration: name = :1.3 path = /MediaEndpoint/A2DPSink
bluetoothd[789]: profiles/audio/media.c:endpoint_reply() [GOT HERE -- endpoint_reply -- original_msg --] SetConfiguration: name = :1.3 path = /MediaEndpoint/A2DPSink
bluetoothd[789]: profiles/audio/media.c:print_time() TIME BEFORE -- dbus_pending_call_steal_reply --: 2017-01-25 04:54:01
bluetoothd[789]: profiles/audio/media.c:print_time() TIME AFTER -- dbus_pending_call_steal_reply --: 2017-01-25 04:54:01
bluetoothd[789]: profiles/audio/media.c:endpoint_reply() [GOT HERE -- endpoint_reply -- reply_msg] (null): name = (null) path = (null)
bluetoothd[789]: Endpoint replied with an error: org.freedesktop.DBus.Error.NoReply
Here is log from my node addon:
endpoint_handler: path=/MediaEndpoint/A2DPSink, interface=org.bluez.MediaEndpoint1, member=SetConfiguration
!! ----- endpoint_set_configuration, endpoint_path: /MediaEndpoint/A2DPSink
!! ----- endpoint_set_configuration, time_right_before_reply_sent:
2017-01-25 04:54:03
!! ----- endpoint_set_configuration, time_right_after_reply_sent:
2017-01-25 04:54:03
You can clearly see that the Bluez default timeout of 3 seconds is too short: my reply did not go out until 04:54:03, by which time Bluez had already given up (04:54:01).
But pulseaudio's implementation does not have this problem. Why?
Is it because there are two different event loops, i.e. the node addon uses the libuv event loop while Bluez and pulseaudio use the GLib event loop?
What is going on here? Can anyone please explain?
I would prefer to either identify this as a Bluez bug or understand how to fix it on my node addon end.
Thank You Stackoverflowers :)
P.S.
Bluez ~/bluez-5.43/profiles/audio/media.c has a comment advising that REQUEST_TIMEOUT stay below the AVDTP request timeout (4 seconds), which worries me:
/* Timeout should be less than avdtp request timeout (4 seconds) */
if (g_dbus_send_message_with_reply(btd_get_dbus_connection(),
                                   msg, &request->call,
                                   REQUEST_TIMEOUT) == FALSE) {
    error("D-Bus send failed");
    g_free(request);
    return FALSE;
}
I found that there is some conflict between GLib and libuv running in the same process. The node-dbus addon I am using is a C-level binding, and it instantiates a GLib event loop, while Node.js runs its own libuv event loop; the two don't work well together. That, at least, is what I assume the problem is.
My solution was to rip the media endpoint C code out of Bluez and create my own Node.js NAN bindings to it, without using any GLib event loop.
This is the node-dbus library I am using:
https://github.com/Shouqun/node-dbus

Groovy: Missing Method Exception when calling Java API

I've been using Groovy for a few years, but not in the last few months, so this could just be a newbie question. I'm trying to parse a log file, but when I try to do this:
myFile.eachLine { line ->
    /* 2014 Jul 30 08:55:42:645 GMT -4 BW.TMSJobService-TMSJobService-1
     * User [BW-User] - Job-2584 [Process/Common/LogAuditInfo.process/WriteToLog]: */
    /* 1234567890123456789012345678901 */
    /* 0         1         2         3 */
    LogItem logItem = new LogItem()
    // get the time stamp
    String timestamp = line.substring(0, 31)
    SimpleDateFormat sdf = new SimpleDateFormat('yyyy MMM dd HH:mm:ss:S')
    logItem.date = sdf.parse(timestamp)
}
I get this exception:
Exception in thread "main" groovy.lang.MissingMethodException: No signature of method: java.text.SimpleDateFormat.parse() is applicable for argument types: (java.lang.String, ce.readscript.TmsLogReader$_read_closure1_closure3) values: [2014 Jul 30 08:34:47:079 GMT -4, ce.readscript.TmsLogReader$_read_closure1_closure3#14235ed5]
Possible solutions: parse(java.lang.String), parse(java.lang.String, java.text.ParsePosition), parse(java.lang.String, java.text.ParsePosition), wait(), clone(), clone()
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:55)
at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:46)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
It always fails on the last line in the closure. If I add code after the parse call, then it bombs on that code instead; even a "079".toLong() call gets an error.
I see some similar errors on Stack Overflow, but nothing that solves my problem.
It is trying to invoke SimpleDateFormat.parse(String, Closure), which doesn't exist, so there is probably a typo somewhere; the code as posted works fine under Groovy 2.1.8 and 2.3.4. You can try making it a bit more Groovy, to check whether the real code has a typing error that isn't in your example:
new File("log.log").eachLine { line ->
def item = new LogItem()
def timestamp = line[0..30]
item.date = Date.parse('yyyy MMM dd HH:mm:ss:S', timestamp)
}
I used the time-honored technique of deleting the file and starting over, and I haven't encountered the issue again.

Can jmap -histo trigger full garbage collection?

We know that jmap -histo:live triggers a full gc in order to determine live objects:
Does jmap force garbage collection when the live option is used?
Since jmap -histo considers all objects in the heap (those in both the young and old generations), my thinking is that jmap -histo might also trigger a full gc. However, I could not find solid documentation on whether jmap -histo may trigger a full gc or not.
Can jmap -histo trigger full garbage collection?
jmap -histo will not trigger a full gc, but jmap -histo:live will.
Someone with more JDK experience should verify this, but I'm fairly confident it does trigger full GC, at least in OpenJDK 1.7. Start with jdk/src/share/classes/sun/tools/jmap/JMap.java:
public class JMap {
    ...
    private static String LIVE_HISTO_OPTION = "-histo:live";
    ...
    } else if (option.equals(LIVE_HISTO_OPTION)) {
        histo(pid, true);
    ...

    private static final String LIVE_OBJECTS_OPTION = "-live";
    private static final String ALL_OBJECTS_OPTION = "-all";

    private static void histo(String pid, boolean live) throws IOException {
        VirtualMachine vm = attach(pid);
        InputStream in = ((HotSpotVirtualMachine)vm).
            heapHisto(live ? LIVE_OBJECTS_OPTION : ALL_OBJECTS_OPTION);
        drain(vm, in);
    }
The ternary operator in JMap.histo() makes a call to heapHisto in jdk/src/share/classes/sun/tools/attach/HotSpotVirtualMachine.java with the -live argument:
// Heap histogram (heap inspection in HotSpot)
public InputStream heapHisto(Object ... args) throws IOException {
    return executeCommand("inspectheap", args);
}
And if we look at inspectheap itself, in hotspot/src/share/vm/services/attachListener.cpp:
// Implementation of "inspectheap" command
//
// Input arguments :-
//   arg0: "-live" or "-all"
static jint heap_inspection(AttachOperation* op, outputStream* out) {
  bool live_objects_only = true;   // default is true to retain the behavior before this change is made
  const char* arg0 = op->arg(0);
  if (arg0 != NULL && (strlen(arg0) > 0)) {
    if (strcmp(arg0, "-all") != 0 && strcmp(arg0, "-live") != 0) {
      out->print_cr("Invalid argument to inspectheap operation: %s", arg0);
      return JNI_ERR;
    }
    live_objects_only = strcmp(arg0, "-live") == 0;
  }
  VM_GC_HeapInspection heapop(out, live_objects_only /* request full gc */, true /* need_prologue */);
  VMThread::execute(&heapop);
  return JNI_OK;
}
Note, in particular, the live_objects_only strcmp and the resulting heapop call two lines later. If inspectheap gets the -live argument via any avenue, it requests a full gc.
No, jmap -histo will not trigger a full GC. I print histograms quite regularly and do not see any full GCs in my GC logs.
I do not know how it is implemented in the VM, but you do not need to worry about full GCs.
In my experience: yes, it does. To check this yourself, run:
sudo -u tomcat jstat -gcutil {PID} 1000 1000
Description:
The first 1000 after the PID is the print interval in milliseconds; the second 1000 is the number of samples.
With this command you can monitor the JVM's GC activity; the full-GC count and time appear in the FGC and FGCT columns:
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 18.45 13.12 84.23 47.64 206149 5781.308 83 115.479 5896.786
0.00 21.84 5.64 84.24 47.64 206151 5781.358 83 115.479 5896.837
0.00 32.27 1.66 84.24 47.64 206153 5781.409 83 115.479 5896.888
0.00 13.96 53.54 84.24 47.64 206155 5781.450 83 115.479 5896.929
0.00 21.56 91.77 84.24 47.64 206157 5781.496 83 115.479 5896.974
Now execute the jmap command in another terminal. First run it without the :live parameter, then run it again with :live; you should see a full GC occur when the command runs with :live -- in other words, the FGC count will increment.
The second command might look like this:
sudo -u tomcat /home/path/to/jmap -histo:live {pid} | head -n 40
By the way, my JDK version is JDK 7.
