I created a simple C++ multi-threaded program using g++ under Cygwin in Windows.
#include <unistd.h>   // sleep
#include <thread>     // std::thread
using namespace std;
void thread_func(void)
{
    sleep(20);
}
int main()
{
    thread th1(thread_func);
    thread th2(thread_func);
    th1.join();
    th2.join();
}
I am guessing the threads are implemented in some Windows library using native Windows threads, and that this is why they don't show up in "ps" or "ps -W" when run in Cygwin. I see only one entry, when I was expecting three.
Is there some other tool I can use, either in Cygwin or among the native Windows tools, to get the complete list of threads?
Thank you,
Ahmed.
ps shows processes, not threads.
You can use strace instead. Assuming your program is thread.cpp:
g++ -ggdb thread.cpp -o thread
strace -o thread.strace thread.exe
You can see when the threads were started by looking for pthread::thread_init_wrapper:
$ grep thread thread.strace
--- Process 7404 thread 7840 created
--- Process 7404 thread 8804 created
--- Process 7404 thread 8920 created
--- Process 7404 thread 8004 created
635 4552571 [ldap_init] t 7404 cygthread::stub: thread 'ldap_init', id 0x1F44, stack_ptr 0xC1CCD0
--- Process 7404 thread 4364 created
400 9133089 [t] t 7404 pthread::thread_init_wrapper: tid 0xFFDFCE00
--- Process 7404 thread 5740 created
340 9133541 [t] t 7404 pthread::thread_init_wrapper: tid 0xFFBFCE00
--- Process 7404 thread 4364 exited with status 0x0
--- Process 7404 thread 5740 exited with status 0x0
--- Process 7404 thread 8804 exited with status 0x0
--- Process 7404 thread 7840 exited with status 0x0
--- Process 7404 thread 1328 exited with status 0x0
--- Process 7404 thread 8004 exited with status 0x0
You can also use gdb but it is a bit more complicated.
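Another low-tech option is to have the program report its own threads. The following is only a minimal sketch, assuming a C++11 compiler; collect_thread_ids is a hypothetical helper that is not part of the original program, and the textual form of std::thread::id is implementation-defined, so it will not necessarily match the ids printed by strace.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not in the original program): starts `count` threads,
// lets each one record its own std::thread::id as text, and returns the ids.
std::vector<std::string> collect_thread_ids(unsigned count)
{
    std::vector<std::string> ids;
    std::mutex ids_mutex;                 // protects `ids` across threads
    std::vector<std::thread> workers;

    for (unsigned i = 0; i < count; ++i) {
        workers.emplace_back([&ids, &ids_mutex] {
            std::ostringstream out;
            out << std::this_thread::get_id();   // implementation-defined text
            std::lock_guard<std::mutex> lock(ids_mutex);
            ids.push_back(out.str());
        });
    }
    for (auto &worker : workers) {
        worker.join();                    // wait until every thread has reported
    }
    assert(ids.size() == count);          // one entry per started thread
    return ids;
}
```

Calling collect_thread_ids(2) from main and printing the result gives one line per worker thread. Ids may be reused once a thread has finished, so the values are only meaningful while the threads are alive.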
My goal is to implement a system call in the Linux kernel that enables/disables a CPU core.
First, I implemented a system call that disables CPU3 in a 4-core system.
The system call code is as follows:
#include <linux/kernel.h>
#include <linux/slab.h>
#include <asm/uaccess.h>
#include <asm/unistd.h>
#include <linux/cpumask.h>
#include <linux/smp.h>
asmlinkage long sys_new_syscall(void)
{
    unsigned int cpu3 = 3; /* index of the CPU to take offline (was uninitialized) */
    set_cpu_online(cpu3, false); /* clears the CPU in the cpumask */
    printk("CPU%u is offline\n", cpu3);
    return 0;
}
The system call was registered correctly in the kernel, and I enabled the 'CPU hotplug' feature during kernel configuration.
However, the build failed at the last stage (install), and I got this error:
gzip: stdout: No space left on device
E: mkinitramfs failure cpio 141 gzip 1
update-initramfs: failed for /boot/initrd.img-4.6.7-rt13-v7+ with 1.
run-parts: /etc/kernel/postinst.d/initramfs-tools exited with return code 1
arch/arm/boot/Makefile:99: recipe for target 'install' failed
make[1]: *** [install] Error 1
arch/arm/Makefile:333: recipe for target 'install' failed
make: *** [install] Error 2
What am I doing wrong?
gzip: stdout: No space left on device
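This issue has nothing to do with your code. Your /boot filesystem is full, so update-initramfs cannot write the new initrd image. Free some space there (for example, by removing old kernels or initrd images) and run the install step again.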
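To confirm how much room is left before retrying, the free space can be queried programmatically. This is a rough sketch assuming a C++17 compiler with std::filesystem; available_bytes is a hypothetical helper name, and "/boot" is simply the path taken from the error message above.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

// Hypothetical helper: returns the number of bytes available to an
// unprivileged process on the filesystem that contains `path`,
// or -1 if the query fails (for example, the path does not exist).
std::intmax_t available_bytes(const std::filesystem::path &path)
{
    std::error_code ec;
    const std::filesystem::space_info info = std::filesystem::space(path, ec);
    if (ec) {
        return -1;                        // report failure instead of throwing
    }
    return static_cast<std::intmax_t>(info.available);
}
```

For example, available_bytes("/boot") returns the free byte count for that filesystem; compare it with the size of an existing initrd.img file there to judge whether another image will fit.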
Note: This error was thrown before the components were executed by Spark.
Logs
Worker Node1:
17/05/18 23:12:52 INFO Worker: Successfully registered with master spark://spark-master-1.com:7077
17/05/18 23:58:41 ERROR Worker: RECEIVED SIGNAL 15: SIGTERM
Master Node:
17/05/18 23:12:52 INFO Master: Registering worker spark-worker-1com:56056 with 2 cores, 14.5 GB RAM
17/05/18 23:14:20 INFO Master: Registering worker spark-worker-2.com:53986 with 2 cores, 14.5 GB RAM
17/05/18 23:59:42 WARN Master: Removing spark-worker-1com-56056 because we got no heartbeat in 60 seconds
17/05/18 23:59:42 INFO Master: Removing spark-worker-2.com:56056
17/05/19 00:00:03 ERROR Master: RECEIVED SIGNAL 15: SIGTERM
Worker Node2:
17/05/18 23:14:20 INFO Worker: Successfully registered with master spark://spark-master-node-2.com:7077
17/05/18 23:59:40 ERROR Worker: RECEIVED SIGNAL 15: SIGTERM
TL;DR I think someone has explicitly run the kill command or sbin/stop-worker.sh.
"RECEIVED SIGNAL 15: SIGTERM" is logged by a signal handler that Spark registers for the TERM, HUP, and INT signals on UNIX-like systems:
/** Register a signal handler to log signals on UNIX-like systems. */
def registerLogger(log: Logger): Unit = synchronized {
  if (!loggerRegistered) {
    Seq("TERM", "HUP", "INT").foreach { sig =>
      SignalUtils.register(sig) {
        log.error("RECEIVED SIGNAL " + sig)
        false
      }
    }
    loggerRegistered = true
  }
}
In your case it means that the process received SIGTERM to stop itself:
The SIGTERM signal is a generic signal used to cause program termination. Unlike SIGKILL, this signal can be blocked, handled, and ignored. It is the normal way to politely ask a program to terminate.
That's what is sent when you run kill (which sends SIGTERM unless told otherwise) or use the ./sbin/stop-master.sh or ./sbin/stop-worker.sh shell scripts, which in turn call sbin/spark-daemon.sh with the stop command to kill the JVM process of a master or a worker:
kill "$TARGET_ID" && rm -f "$pid"
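The same pattern (a handler that merely records that a termination signal arrived) can be sketched outside Spark. This is only an illustration in C++ using the standard <csignal> facilities; g_last_signal, record_signal and install_signal_logger are hypothetical names, and unlike Spark's handler this sketch does not pass the signal on to any default handling afterwards.

```cpp
#include <csignal>

// Written by the handler. volatile std::sig_atomic_t is a type that may be
// safely written from inside a signal handler.
volatile std::sig_atomic_t g_last_signal = 0;

// The handler only records which signal arrived; logging should happen later,
// from normal code, because most I/O is not safe inside a signal handler.
extern "C" void record_signal(int signo)
{
    g_last_signal = signo;
}

// Hypothetical helper: installs the recording handler for SIGTERM and SIGINT.
// Returns false if either registration fails.
bool install_signal_logger()
{
    return std::signal(SIGTERM, record_signal) != SIG_ERR
        && std::signal(SIGINT, record_signal) != SIG_ERR;
}
```

After install_signal_logger() succeeds, a kill sent to the process leaves SIGTERM in g_last_signal, which the main loop can check to log the event and shut down cleanly.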
I'm trying to develop a system call that receives a pid as its argument, kills that process, and prints a message to the kernel log. So far I have the code below, but I get this error when trying to compile the kernel. How do I fix this? And is there a way to find the username of the user who killed the pid in this case?
kill_log/kill_log.c:2:24: fatal error: signal.h: No such file or directory
compilation terminated.
scripts/Makefile.build:289: recipe for target 'kill_log/kill_log.o' failed
make[1]: *** [kill_log/kill_log.o] Error 1
Makefile:968: recipe for target 'kill_log' failed
make: *** [kill_log] Error 2
#include <linux/kernel.h>
#include <signal.h>
asmlinkage long sys_kill_log(pid_t pid)
{
    kill(pid, SIGUSR1);
    printk(KERN_WARNING "The process %d has been killed\n", pid);
    return 0;
}
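Based on your error message, the compiler cannot find signal.h. On Debian-based systems that header is provided by libc6-dev for user-space programs. Note, however, that kernel code is not built against the C library headers, so inside a system call you would need the kernel's own headers (for example <linux/signal.h>) and the kernel's own signalling functions rather than the user-space kill().
As for retrieving the username, you could try getpwuid, which maps a UID to a user name. It is also a user-space library function, so in the kernel you would only have the UID of the calling process to work with.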
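For reference, this is roughly how getpwuid is used in a user-space program. It is a minimal sketch assuming a POSIX system and C++17; username_for_uid is a hypothetical helper, and none of this can be called from inside the kernel.

```cpp
#include <optional>
#include <string>
#include <pwd.h>
#include <sys/types.h>

// Hypothetical user-space helper: looks up the login name for `uid`.
// Returns std::nullopt when the user database has no entry for that UID.
std::optional<std::string> username_for_uid(uid_t uid)
{
    // getpwuid returns a pointer to static storage (not thread-safe),
    // or a null pointer if no matching entry is found.
    const passwd *entry = getpwuid(uid);
    if (entry == nullptr || entry->pw_name == nullptr) {
        return std::nullopt;
    }
    return std::string(entry->pw_name);   // copy before the storage is reused
}
```

A user-space tool could call username_for_uid(getuid()) to name the invoking user; getpwuid_r is the re-entrant variant if several threads need to do lookups.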
I'm trying to use the reverse debugging features of gdb 7.3.1 on a multi-threaded project (using libevent), but I get the following error:
(gdb) reverse-step
Target multi-thread does not support this command.
From this question, I thought perhaps that it was an issue loading libthread_db but, when I run the program, gdb says:
Starting program: /home/robb/slug/slug
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
How can I enable reverse debugging with gdb 7.3.1 on a multi-threaded project? Is it possible?
To do this, you need to activate the instruction-recording target by executing the command
record
at the point from which you want to be able to step forward and backward. Keep in mind that recording slows execution down significantly, especially if you have several threads.
I've just checked that it's working correctly:
(gdb) info threads
Id Target Id Frame
2 Thread 0x7ffff7860700 (LWP 5503) "a.out" hello (arg=0x601030) at test2.c:16
* 1 Thread 0x7ffff7fca700 (LWP 5502) "a.out" main (argc=2, argv=0x7fffffffe2e8) at test2.c:47
...
(gdb) next
49 p[i].id=i;
(gdb) reverse-next
47 for (i=0; i<n; i++)
...
17 printf("Hello from node %d\n", p->id);
(gdb) next
Hello from node 1
18 return (NULL);
(gdb) reverse-next
17 printf("Hello from node %d\n", p->id);
I have a process that works perfectly under 2 accounts on the same machine, but when I copy it to another account and run it there, I get a core dump.
When I run the process with strace, at the end I get:
--- SIGBUS (Bus error) @ 0 (0) ---
+++ killed by SIGBUS (core dumped) +++
When I open the core dump I get:
#0 0x000000360046fed3 in malloc_consolidate () from /lib64/libc.so.6
#1 0x00000036004723fd in _int_malloc () from /lib64/libc.so.6
#2 0x000000360047402a in malloc () from /lib64/libc.so.6
#3 0x00000036004616ba in __fopen_internal () from /lib64/libc.so.6
#4 0x0000000000fe9652 in LogMngr::OpenFile (this=0x2aaaaad17010, iLogIndex=0) at LogMngr.c:801
I can see it has something to do with opening the log file, but why does it fail only in one account while the others are fine?
You can get a SIGBUS from an unaligned memory access. Are you using something like mmap, shared memory regions, or something similar?
A crash inside malloc almost always indicates heap corruption, and heap corruption in general is sneaky like that: it may never show up on machine A, sometimes show up on machine B, and always show up on machine C.
Valgrind will likely point you straight at the problem.