New Thread spawned by cudaMalloc | Behaviour?

New Thread spawned by cudaMalloc | Behaviour? - multithreading

cudaMalloc seemed to have spawned a thread when it was called, even though it's asynchronous. This was observed during debugging using cuda-gdb.
It also took a while to return.
The same thread exited, although as a different LWP, at the end of the program.
Can someone explain this behaviour ?

The thread is not specifically spawned by cudaMalloc. The user side CUDA driver API library seems to spawn threads at some stage during lazy context setup which have the lifetime of the CUDA context. The exact processes are not publicly documented.
You see this associated with cudaMallocbecause I would guess this is the first API to trigger whatever setup/callbacks need to be done to make the userspace driver support work. You should notice that only the first call spawns a thread. Subsequent calls do not. And the threads stay alive for the lifetime of the CUDA context, after which they are terminated. You can trigger explicit thread destruction by calling cudaDeviceReset at any point in program execution.
Here is a trivial example which demonstrates cudaMemcpyToSymbol triggering the thread spawning from the driver API library, rather than cudaMalloc:
__device__ float someconstant;
int main()
{
cudaSetDevice(0);
const float x = 3.14159f;
cudaMemcpyToSymbol(someconstant, &x, sizeof(float));
for(int i=0; i<10; i++) {
int *x;
cudaMalloc((void **)&x, size_t(1024));
cudaMemset(x, 0, 1024);
cudaFree(x);
}
return int(cudaDeviceReset());
}
In gdb I see this:
(gdb) tbreak main
Temporary breakpoint 1 at 0x40254f: file gdb_threads.cu, line 5.
(gdb) run
Starting program: /home/talonmies/SO/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Temporary breakpoint 1, main () at gdb_threads.cu:5
5 cudaSetDevice(0);
(gdb) next
6 const float x = 3.14159f;
(gdb) next
7 cudaMemcpyToSymbol(someconstant, &x, sizeof(float));
(gdb) next
[New Thread 0x7ffff5eb5700 (LWP 14282)]
[New Thread 0x7fffed3ff700 (LWP 14283)]
8 for(int i=0; i<10; i++) {
(gdb) info threads
Id Target Id Frame
3 Thread 0x7fffed3ff700 (LWP 14283) "a.out" pthread_cond_timedwait##GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
2 Thread 0x7ffff5eb5700 (LWP 14282) "a.out" 0x00007ffff74d812d in poll () at ../sysdeps/unix/syscall-template.S:81
* 1 Thread 0x7ffff7fd1740 (LWP 14259) "a.out" main () at gdb_threads.cu:8
(gdb) thread apply all bt
Thread 3 (Thread 0x7fffed3ff700 (LWP 14283)):
#0 pthread_cond_timedwait##GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007ffff65cad97 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff659582d in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff65ca4d8 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bc182 in start_thread (arg=0x7fffed3ff700) at pthread_create.c:312
#5 0x00007ffff74e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 2 (Thread 0x7ffff5eb5700 (LWP 14282)):
#0 0x00007ffff74d812d in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff65c9953 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff66571ae in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff65ca4d8 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bc182 in start_thread (arg=0x7ffff5eb5700) at pthread_create.c:312
#5 0x00007ffff74e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 1 (Thread 0x7ffff7fd1740 (LWP 14259)):
#0 main () at gdb_threads.cu:8

Related

When does thread get into uncancellable sleep state

I have this piece of code where it works perfects in normal case. however , sometimes thread get into uncancelable sleep state.
It means from the state of the process, I see this thread getting into this https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/nanosleep_nocancel.c.html#__nanosleep_nocancel
struct timespec convertticktotimespec(unsigned long numticks)
{
struct timespec tm;
/* separate the integer and decimal portions */
long nanoseconds =
((numticks / (float)sysconf(_SC_CLK_TCK)) - floor(numticks / (float)sysconf(_SC_CLK_TCK))) *
NANOSEC_MULTIPLIER;
tm.tv_sec = numticks / sysconf(_SC_CLK_TCK);
tm.tv_nsec = nanoseconds;
return tm;
}
void *thread(void *args)
{
struct_S *s = (struct_S *)args;
while(1)
{
s->var = 1;
struct timespec tm = convertticktotimespec(sysClkRateGet() * 13);
if ( 0 !=nanosleep(&tm, NULL) ) {
perror(nanosleep);
}
}
}
stack trace looks like this
Thread 19 (Thread 0x7f225a043700 (LWP 16023)):
#0 0x00007f225b8913ed in __accept_nocancel () at ../sysdeps/unix/syscall-
template.S:84
#1 0x0000000000000000 in ?? ()
Thread 18 (Thread 0x7f225a076700 (LWP 15952)):
#0 0x00007f225b89126d in __close_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1 0x0000000000000000 in ?? ()
Thread 14 (Thread 0x7f225a021700 (LWP 16035)):
#0 0x00007f225b8917dd in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1 0x0000000000000000 in ?? ()
Thread 13 (Thread 0x7f225a032700 (LWP 16034)):
#0 0x00007f225b8917dd in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1 0x0000000000000000 in ?? ()
Thread 3 (Thread 0x7f225bbb3700 (LWP 15950)):
#0 0x00007f225ab1e3f3 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7f225a010700 (LWP 16036)):
#0 0x00007f225b8911ad in __write_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1 0x0000000000000000 in ?? ()
some how this thread getting into uncancelable sleep state on random , although I dont find the clear definition of this anywhere on internet, so I assume thread getting sleep state forever which can not be interrupted .hence this thread goes to inactive state forever.
I have no clue why is this happening having executing or responsible for fewer lines of code or instructions.
fromcode.woboq , I found that this gets called from mutex lock.
https://code.woboq.org/userspace/glibc nptl/pthread_mutex_timedlock.c.html#416, but the thread is not using any mutex.
the only thing that i suspect here is , structure struct_s is allocated in the shared memory. this variable is also accessed and assigned by other thread from an another process. does the thread get into this state , internally depending on priority of the threads ?

QThread dumps core

I'm looking at a program that crashes, leading to a useless (or so it seems) core dump. I didn't write the program but I'm trying to find what may be the cause.
First strange thing is that the core dump is named after QThread instead of my executable itself.
Then inside the backtrace, there's no hint at line numbers of the program itself:
$ gdb acqui ../../appli/core.QThread.31667.1448795278
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./acqui'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fcf4a1ce107 in __GI_raise (sig=sig#entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007fcf4a1ce107 in __GI_raise (sig=sig#entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fcf4a1cf4e8 in __GI_abort () at abort.c:89
#2 0x00007fcf4aab9b3d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007fcf4aab7bb6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007fcf4aab7c01 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007fcf4aab7e69 in __cxa_rethrow () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007fcf4b8707db in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/x86_64-linux-gnu/libQtCore.so.4
#7 0x00007fcf4b764e99 in QThread::exec() () from /usr/lib/x86_64-linux-gnu/libQtCore.so.4
#8 0x00007fcf4b76770f in ?? () from /usr/lib/x86_64-linux-gnu/libQtCore.so.4
#9 0x00007fcf4ad6c0a4 in start_thread (arg=0x7fcf0b7fe700) at pthread_create.c:309
#10 0x00007fcf4a27f04d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) info threads
Id Target Id Frame
16 Thread 0x7fcf297fa700 (LWP 31676) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
15 Thread 0x7fcf28ff9700 (LWP 60474) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
14 Thread 0x7fcf08ff9700 (LWP 60516) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
13 Thread 0x7fcf0bfff700 (LWP 60513) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
12 Thread 0x7fcf3932c700 (LWP 60494) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
11 Thread 0x7fcf29ffb700 (LWP 60444) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
10 Thread 0x7fcf39b2d700 (LWP 31668) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
9 Thread 0x7fcf2affd700 (LWP 31673) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
8 Thread 0x7fcf2bfff700 (LWP 31671) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
7 Thread 0x7fcf38b2b700 (LWP 60432) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
6 Thread 0x7fcf2a7fc700 (LWP 31674) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
5 Thread 0x7fcf4d4f9780 (LWP 31667) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
4 Thread 0x7fcf097fa700 (LWP 60430) pthread_cond_timedwait##GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
3 Thread 0x7fcf09ffb700 (LWP 31682) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
2 Thread 0x7fcf0affd700 (LWP 31680) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
* 1 Thread 0x7fcf0b7fe700 (LWP 31679) 0x00007fcf4a1ce107 in __GI_raise (sig=sig#entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
I'm at a loss as to where to start. Is it a problem in using QThread ? Something else ? How can I enable more (or better) debugging info ? The program itself is compiled with -g -ggdb.

This part:
#4 0x00007fcf4aab7c01 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007fcf4aab7e69 in __cxa_rethrow () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
... means that the code in question is re-throwing an exception, but there is no exception handler for it. So, the runtime calls std::terminate.
This is a programming error, though exactly what to do depends on your libraries and program -- maybe not re-throw, maybe install an outermost exception handler and log a message, etc.

How to understand "/proc/[pid]/stack"?

According to proc manual:
/proc/[pid]/stack (since Linux 2.6.29)
This file provides a symbolic trace of the function calls in
this process's kernel stack. This file is provided only if
the kernel was built with the CONFIG_STACKTRACE configuration
option.
So I write a program to test:
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>
#include <pthread.h>
void *thread_func(void *p_arg)
{
pid_t pid = fork();
if (pid > 0) {
wait(NULL);
return 0;
} else if (pid == 0) {
sleep(1000);
return 0;
}
return NULL;
}
int main(void)
{
pthread_t t1, t2;
pthread_create(&t1, NULL, thread_func, "Thread 1");
pthread_create(&t2, NULL, thread_func, "Thread 2");
sleep(1000);
return 0;
}
After running, use pstack to check the threads of progress:
linux-uibj:~ # pstack 24976
Thread 3 (Thread 0x7fd6e4ed5700 (LWP 24977)):
#0 0x00007fd6e528d3f4 in wait () from /lib64/libpthread.so.0
#1 0x0000000000400744 in thread_func ()
#2 0x00007fd6e52860a4 in start_thread () from /lib64/libpthread.so.0
#3 0x00007fd6e4fbb7fd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fd6e46d4700 (LWP 24978)):
#0 0x00007fd6e528d3f4 in wait () from /lib64/libpthread.so.0
#1 0x0000000000400744 in thread_func ()
#2 0x00007fd6e52860a4 in start_thread () from /lib64/libpthread.so.0
#3 0x00007fd6e4fbb7fd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fd6e569f700 (LWP 24976)):
#0 0x00007fd6e4f8d6cd in nanosleep () from /lib64/libc.so.6
#1 0x00007fd6e4f8d564 in sleep () from /lib64/libc.so.6
#2 0x00000000004007b1 in main ()
At the same time, check /proc/24976/stack:
linux-uibj:~ # cat /proc/24976/stack
[<ffffffff804ba1a7>] system_call_fastpath+0x16/0x1b
[<00007fd6e4f8d6cd>] 0x7fd6e4f8d6cd
[<ffffffffffffffff>] 0xffffffffffffffff
The 24976 process has 3 threads, and they all block on system call(nanosleep and wait), so all 3 threads now work in kernel space, and turn into kernel threads now, right? If this is true, there should be 3 stacks in /proc/[pid]/stack file. But it seems there is only 1 stack in /proc/[pid]/stack file.
How should I understand /proc/[pid]/stack?

How should I understand /proc/[pid]/stack ?
Taken from the man pages for proc:
There are additional helpful pseudo-paths:
[stack]
The initial process's (also known as the main thread's) stack.
Just below this, you can find:
[stack:[tid]] (since Linux 3.4)
A thread's stack (where the [tid] is a thread ID).
It corresponds to the /proc/[pid]/task/[tid]/path.
Which seems to be what you are looking for.

Nan Xiao is right.
Thread kernel mode stack is under /proc/[PID]/task/[TID]/stack.
you are checking /proc/[PID]/stack, that's the main thread stack so you have only 1. Others are under task folder.

That is for sleep locks. You might also look at perf -g to see spin locks including high system time.

libexpect sigalrm nanosleep crash

I'm working with libexpect, but if the read times out (expected return code EXP_TIMEOUT) I instead get a crash as follows.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f1366275bb9 in __GI_raise (sig=sig#entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f1366275bb9 in __GI_raise (sig=sig#entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f1366278fc8 in __GI_abort () at abort.c:89
#2 0x00007f13662b2e14 in __libc_message (do_abort=do_abort#entry=2, fmt=fmt#entry=0x7f13663bf06b "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007f136634a7dc in __GI___fortify_fail (msg=<optimized out>) at fortify_fail.c:37
#4 0x00007f136634a6ed in ____longjmp_chk () at ../sysdeps/unix/sysv/linux/x86_64/____longjmp_chk.S:100
#5 0x00007f136634a649 in __longjmp_chk (env=0x1, val=1) at ../setjmp/longjmp.c:38
#6 0x00007f1366ed2a95 in ?? () from /usr/lib/libexpect.so.5.45
#7 <signal handler called>
#8 0x00007f1367334b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#9 0x000000000044cc13 in main (argc=3, argv=0x7fffca4013b8) at main_thread.c:6750
(gdb)
As you can see, I'm using nanosleep, which is supposed to not interact with signals like usleep and sleep (http://linux.die.net/man/2/nanosleep). As I understand it, libexpect uses SIGALRM to time out, but it's unclear to me how the two threads are interacting. If I had to guess, the expect call is raising a sigalrm, and it's interrupting the nanosleep call, but beyond that I don't know what's going on.
Thread 1:
while (stuff)
{
//dothings
struct timespec time;
time.tv_sec = 0.25;
time.tv_nsec = 250000000;
nanosleep(&time, NULL);
}
Thread 2:
switch(exp_expectl(fd, exp_glob, (char*)user_prompt, OK, exp_end))
{
case OK:
DG_LOG_DEBUG("Recieved user prompt");
break;
case EXP_TIMEOUT:
DG_LOG_DEBUG("Expect timed out");
goto error;
default:
DG_LOG_DEBUG("Expect failed for unknown reasons");
goto error;
}
I have done some reading about signals and sleep, but I've used sleep in multiple threads on many occasions and had no difficulties until now. What am I missing?
edit: misc version info
ubuntu 14.04 3.13.0-44-generic
/usr/lib/libexpect.so.5.45
code is in C
compiler is gcc (-lexpect -ltcl)
include <tcl8.6/expect.h>

Identify crash in multi threaded programs - How to?

I am quiet experienced in debugging with GDB, so my question is not related to debug symbols not being available :). I am dealing with a crash in mutti threaded program. Just before the crash I see following logs in GDB, and backtrace does not give me much info. Info thread again shows I am not running under application space. Any suggestions as to how I can approach this problem.
[New Thread 1755]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1755]
warning: GDB can't find the start of the function at 0x2ac17638.
GDB is unable to find the start of the function at 0x2ac17638
and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or
the frames below it.
This problem is most likely caused by an invalid program counter or
stack pointer.
However, if you think GDB should simply search farther back
from 0x2ac17638 for code which looks like the beginning of a
function, you can increase the range of the search using the `set
heuristic-fence-post' command.
0x2ac17638 in ?? ()
(gdb) bt
#0 0x2ac17638 in ?? ()
(gdb) info thread
[New Thread 1737]
[New Thread 1738]
[New Thread 1739]
[New Thread 1740]
[New Thread 1741]
[New Thread 1742]
[New Thread 1744]
[New Thread 1745]
[New Thread 1746]
[New Thread 1747]
[New Thread 1748]
[New Thread 1749]
[New Thread 1750]
[New Thread 1751]
[New Thread 1752]
[New Thread 1753]
[New Thread 1754]
[New Thread 1756]
20 Thread 1756 0x2aac1068 in ?? ()
19 Thread 1754 0x2abd62b4 in ?? ()
18 Thread 1753 0x2abd62b4 in ?? ()
17 Thread 1752 0x2aabda58 in ?? ()
16 Thread 1751 0x2abd62b4 in ?? ()
15 Thread 1750 0x2aabda58 in ?? ()
14 Thread 1749 0x2aabda58 in ?? ()
13 Thread 1748 0x2aabda58 in ?? ()
12 Thread 1747 0x2aabfb44 in ?? ()
11 Thread 1746 0x2aabfb44 in ?? ()
10 Thread 1745 0x2aabfb44 in ?? ()
9 Thread 1744 0x2aabfb44 in ?? ()
8 Thread 1742 0x2aabfb44 in ?? ()
7 Thread 1741 0x2aac15dc in ?? ()
6 Thread 1740 0x2abd62b4 in ?? ()
5 Thread 1739 0x2abd62b4 in ?? ()
4 Thread 1738 0x2abd62b4 in ?? ()
3 Thread 1737 0x2aabfb44 in ?? ()
* 2 Thread 1755 0x2ac17638 in ?? ()
1 Thread 1736 0x2abd56bc in ?? ()
warning: GDB can't find the start of the function at 0x2ac17638.
P.S. Its a MIPS based embedded linux, the problem is rarely reproducible, but on boot up.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

New Thread spawned by cudaMalloc | Behaviour? - multithreading

cudaMalloc seemed to have spawned a thread when it was called, even though it's asynchronous. This was observed during debugging using cuda-gdb. It also took a while to return. The same thread exited, although as a different LWP, at the end of the program. Can someone explain this behaviour ?

Related

When does thread get into uncancellable sleep state

QThread dumps core

How to understand "/proc/[pid]/stack"?

libexpect sigalrm nanosleep crash

Identify crash in multi threaded programs - How to?

Categories

Resources