remote debugging gdb multiple process - linux

I am unable to debug a child process in a remote debugging session. I found a similar question, "How to debug the entry-point of fork-exec process in GDB?", and
I am following the same procedure, but for a remote target. Is follow-fork-mode child supported for remote targets?
My sample code follows.
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <sys/types.h>
4 #include <unistd.h>
5
6 int spawn (void)
7 {
8 pid_t child_pid;
9 /* Duplicate this process. */
10 child_pid = fork ();
11 if (child_pid != 0)
12 /* This is the parent process. */
13 return child_pid;
14 else {
15 /* Now execute PROGRAM, searching for it in the path. */
16 while(1)
17 {
18 static int i = 0;
19 if(i==0) /* break here for child */
20 {
21 printf("I am child\n");
22 }
23 else if(i==3)
24 {
25 return 1;
26 }
27 else
28 {
29 i = 0/0;
30 }
31 i++;
32 }
33 }
34 return 0;
35 }
36 int main ()
37 {
38 /* Spawn a child process running the "ls" command. Ignore the
39 returned child process ID. */
40 printf("Hello World..!!\n");
41 spawn ();
42 printf ("Bbye World..!!\n");
43 return 0;
44 }
Running it under gdb, I can set a breakpoint in the child; all is fine here:
sh-3.2# gdb fork
(gdb) set follow-fork-mode child
(gdb) set detach-on-fork off
(gdb) b 19
Breakpoint 1 at 0x80483d0: file fork-exec.c, line 19.
(gdb) c
The program is not being run.
(gdb) start
Breakpoint 2 at 0x8048437: file fork-exec.c, line 40.
Starting program: fork
main () at fork-exec.c:40
40 printf("Hello World..!!\n");
(gdb) c
Continuing.
Hello World..!!
[Switching to process 10649]
Breakpoint 1, spawn () at fork-exec.c:19
19 if(i==0) /* break here for child */
(gdb)
However, if I try to catch the child via gdbserver, the breakpoint is lost:
sh-3.2# gdbserver :1234 fork &
[5] 10686
sh-3.2# Process fork created; pid = 10689
Listening on port 1234
Run as target remote
sh-3.2# gdb fork
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
Remote debugging from host 127.0.0.1
[New Thread 10689]
0x00bd2810 in _start () from /lib/ld-linux.so.2
(gdb) break 19
Breakpoint 1 at 0x80483d0: file fork-exec.c, line 19.
(gdb) c
Continuing.
Hello World..!!
Bbye World..!!
Child exited with retcode = 0
Program exited normally.
Child exited with status 0
GDBserver exiting
What is the procedure for debugging a child process in the embedded world? I know I can attach to the process, but I want to debug the child from its very beginning.

It is called follow-fork. No, it is not supported in gdbserver.

As a (dirty!) workaround, you can add a sleep() call right after the fork(), with a delay long enough for you to get the child PID, attach to it with another instance of gdbserver, and connect to that instance with gdb.
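The sleep() workaround can be sketched roughly as follows. This is only an illustration: fork_with_attach_window() is a hypothetical helper name, and the delay is whatever you need to start a second gdbserver by hand.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, and make the child pause for delay_seconds
   before it does any real work, so that a debugger can be attached.
   Returns 0 in the child, the child's PID in the parent, -1 on error. */
pid_t fork_with_attach_window(unsigned int delay_seconds)
{
    pid_t pid = fork();

    if (pid == 0) {
        /* Child: announce the PID on stderr, then wait to be attached. */
        fprintf(stderr, "child pid %ld, waiting %u s for a debugger\n",
                (long)getpid(), delay_seconds);
        sleep(delay_seconds);
    }
    return pid;
}
```

In spawn() you would call this in place of fork(). While the child sleeps, something like gdbserver --attach :1235 <pid> on the target, followed by target remote in a second gdb, should let you set the breakpoint before the child reaches it (the port number here is just an example).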

It should work with a modern gdb version, according to this bug-report.

Related

Interactively toggle output on and off in Linux?

For controlling output on Linux there are control-s and control-q, which provide a method for temporarily halting terminal output and then resuming it. On VMS there was, in addition, control-O, which would toggle all output on and off. This didn't pause output; it discarded it.
Is there an equivalent keyboard shortcut in Linux?
This comes up most often for me in gdb, when debugging programs which output millions of status lines. It would be very convenient to be able to temporarily send most of that to /dev/null rather than the screen, and then pick up with the output stream further on, having dispensed with a couple of million lines in between.
(Edited: The termios(3) man page mentions VDISCARD, and then says that it isn't going to work in POSIX or Linux. So it looks like this is out of the question for general command-line use on Linux. gdb might still be able to discard output, though, through one of its own commands. Can it?)
Thanks.
On VMS in addition there was control-O ...
This functionality doesn't appear to exist on any UNIX system I've ever dealt with (or maybe I just never knew it existed; it's documented in, e.g., the FreeBSD man page, and is referenced by the Solaris and HP-UX docs as well).
gdb might still be able to discard output though, through one of its own commands. Can it?
I don't believe so: GDB doesn't actually intercept the output from the inferior (being debugged) process, it simply makes it run (between breakpoints) with the inferior output going to wherever it's going.
That said, you could do it yourself:
#include <stdio.h>
int main()
{
int i;
for (i = 0; i < 1000; ++i) {
printf("%d\n", i);
}
}
gcc -g foo.c
gdb -q ./a.out
(gdb) break 6
Breakpoint 1 at 0x40053e: file foo.c, line 6.
(gdb) run 20>/dev/null # run the program, file descriptor 20 goes to /dev/null
Starting program: /tmp/a.out 20>/dev/null
Breakpoint 1, main () at foo.c:6
6 printf("%d\n", i);
(gdb) c
Continuing.
0
Breakpoint 1, main () at foo.c:6
6 printf("%d\n", i);
We've now hit the breakpoint twice, and one line has been printed. Let's suppress output and skip the next 100 breakpoint hits:
(gdb) call dup2(20, 1)
$1 = 1
(gdb) ign 1 100
Will ignore next 100 crossings of breakpoint 1.
(gdb) c
Continuing.
Breakpoint 1, main () at foo.c:6
6 printf("%d\n", i);
(gdb) p i
$2 = 102
No output, as desired. Now let's restore output:
(gdb) call dup2(2, 1)
$3 = 1
(gdb) ign 1 10
Will ignore next 10 crossings of breakpoint 1.
(gdb) c
Continuing.
102
103
104
105
106
107
108
109
110
111
112
Breakpoint 1, main () at foo.c:6
6 printf("%d\n", i);
Output restored!
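If you can modify the program, the same dup2() trick can be built into it. The following is a minimal sketch, not part of the session above; mute_stdout() and unmute_stdout() are hypothetical helper names.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Point stdout at /dev/null. Returns a descriptor that remembers the
   old stdout (pass it to unmute_stdout() later), or -1 on error. */
int mute_stdout(void)
{
    fflush(stdout);                 /* let pending output reach the old target */
    int saved = dup(STDOUT_FILENO);
    int devnull = open("/dev/null", O_WRONLY);

    if (saved < 0 || devnull < 0 || dup2(devnull, STDOUT_FILENO) < 0) {
        if (saved >= 0)
            close(saved);
        if (devnull >= 0)
            close(devnull);
        return -1;
    }
    close(devnull);                 /* fd 1 now refers to /dev/null */
    return saved;
}

/* Undo mute_stdout(): make fd 1 refer to the saved descriptor again. */
int unmute_stdout(int saved)
{
    fflush(stdout);                 /* buffered output is discarded here */
    if (dup2(saved, STDOUT_FILENO) < 0)
        return -1;
    close(saved);
    return 0;
}
```

From gdb you could then use call mute_stdout() and call unmute_stdout($1) instead of reserving file descriptor 20 on the run command line.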

Suppressing the segfault signal

I am analyzing a set of buggy programs that may terminate with a segfault under some tests. The segfault event is logged in /var/log/syslog.
For example, the following snippet prints Segmentation fault, and the event is logged.
#!/bin/bash
./test
My question is how to suppress the segfault so that it does NOT appear in the system log. I tried trap to capture the signal in the following script:
#!/bin/bash
set -bm
trap "echo 'something happened'" {1..64}
./test
It returns:
Segmentation fault
something happened
So it does trap the segfault, but the segfault is still logged:
kernel: [81615.373989] test[319]: segfault at 0 ip 00007f6b9436d614
sp 00007ffe33fb77f8 error 6 in libc-2.19.so[7f6b942e1000+1bb000]
You can try to change ./test to the following line:
. ./test
This will execute ./test in the same shell.
We can suppress the log message system-wide with e. g.
echo 0 >/proc/sys/debug/exception-trace
- see also
Making the Linux kernel shut up about segfaulting user programs
Is there a way to temporarily disable segfault messages in dmesg?
We can suppress the log message for a single process if we run it under ptrace() control, as in a debugger. This program does that:
exe.c
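#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *args[])
{
    pid_t pid;
    int status;

    (void)argc;
    if (*++args == NULL)            /* no program given */
        return 1;

    pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        /* Child: ask to be traced, then run the buggy program. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execvp(*args, args);
        perror(*args);
        return 1;
    }

    /* Parent: act as a minimal debugger for the child. */
    while (wait(&status) > 0) {
        if (!WIFSTOPPED(status))
            return WIFSIGNALED(status) ? 128 + WTERMSIG(status)
                                       : WEXITSTATUS(status);
        /* Deliver every signal to the child except the SIGTRAP
           it receives after the exec. */
        int sig = WSTOPSIG(status);
        if (sig == SIGTRAP)
            sig = 0;
        ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig);
    }
    perror("wait");
    return 1;
}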
It is called with the buggy program as its argument, in your case
exe ./test
- then the exit status of exe normally is the exit status of test, but if test was terminated by signal n (11 for Segmentation fault), it is 128+n.
After I wrote this, I realized that we can also use strace for the purpose, e. g.
strace -enone ./test

How to find all runnable processes

I'm learning about the scheduler and trying to print all runnable processes. So I have written a kernel module that uses the for_each_process macro to iterate over all processes and prints the ones in the "runnable" state. But this seems like a stupid (and inefficient) way of doing it. So I thought about getting a reference to all run queues and using their red-black tree to go over the runnable processes, but I couldn't find a way to do this.
I have found out that there is a list of sched_class structures for each CPU (stop_sched_class->rt_sched_class->fair_sched_class->idle_sched_class) and that each of them has its own run queue. But I couldn't find a way to reach them all.
I have used the module that uses the tasks_timeline to find all runnable processes to print the addresses of the run queues; it seems I have 3 run queues (while having only two processors).
The module:
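#include <linux/module.h> /* Needed by all modules */
#include <linux/kernel.h> /* Needed for KERN_INFO */
#include <linux/rcupdate.h>
#include <linux/sched.h>
MODULE_LICENSE("GPL");
/* struct cfs_rq is private to kernel/sched/sched.h, so only its leading
   fields are redeclared here. They must match the running kernel's layout. */
struct cfs_rq {
    struct load_weight load;
    unsigned int nr_running, h_nr_running;
};
void printList(void){
    int count = 0;
    struct task_struct *tsk;
    rcu_read_lock(); /* for_each_process must run under RCU or tasklist_lock */
    for_each_process(tsk){
        if(tsk->state) /* TASK_RUNNING is 0; skip every other state */
            continue;
        /* se.cfs_rq only exists with CONFIG_FAIR_GROUP_SCHED */
        printk(KERN_INFO "pid: %d rq: %p (%u)\n", tsk->pid, tsk->se.cfs_rq, tsk->se.cfs_rq->nr_running);
        count++;
    }
    rcu_read_unlock();
    printk(KERN_INFO "count is: %d\n", count);
}
int init_module(void)
{
    printList();
    return 0;
}
void cleanup_module(void)
{
    printk(KERN_INFO "Goodbye world proc.\n");
}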
The output:
[ 8215.627038] pid: 9147 ffff88007bbe9200 (3)
[ 8215.627043] pid: 9148 ffff8800369b0200 (2)
[ 8215.627045] pid: 9149 ffff8800369b0200 (2)
[ 8215.627047] pid: 9150 ffff88007bbe9200 (3)
[ 8215.627049] pid: 9151 ffff88007bbe9200 (3)
[ 8215.627051] pid: 9154 ffff8800a46d4600 (1)
[ 8215.627053] count is: 6
[ 8215.653741] Goodbye world proc.
About the computer:
$ uname -a
Linux k 3.13.0-39-generic #66-Ubuntu SMP Tue Oct 28 13:30:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/cpuinfo | grep 'processor' | wc -l
2
So my questions are:
How can I print all runnable processes in a nicer way?
How are running queues made and managed?
Are the running queues somehow linked each other? (How?)
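From user space, run ps -A -l and look for the entries where both the process state (the S column, R) and the process flags (the F column, 1) are as mentioned.
You can try the command below. Note that this grep matches an R or a D anywhere on a line, so the header and some unrelated entries also show up.
Sample output: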
127:~$ ps -A -l | grep -e R -e D
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
1 S 0 1367 2 0 80 0 - 0 - ? 00:00:01 SEPDRV_ABNORMAL
4 R 1000 2634 2569 2 80 0 - 794239 - ? 00:25:06 Web Content
1 D 0 20091 2 0 80 0 - 0 - ? 00:00:00 kworker/3:2
4 R 1000 21077 9361 0 80 0 - 7229 - pts/17 00:00:00 ps
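The same user-space check can be done without ps by reading /proc directly. The sketch below is not from the answer above; it assumes a Linux /proc where /proc/<pid>/stat has the form "pid (comm) state ...", and proc_state() and print_runnable() are hypothetical names. It sees one snapshot only and reports the state of each process's main thread.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the state letter from /proc/<pid>/stat ('R', 'S', 'D', ...),
   or 0 if the entry cannot be read (for example, the process exited). */
char proc_state(long pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* comm is parenthesised and may itself contain ')',
       so search for the last one. */
    char *p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return 0;
    return p[2];
}

/* Print the PID of every process in state R; return how many were found,
   or -1 if /proc cannot be opened. */
int print_runnable(void)
{
    DIR *d = opendir("/proc");
    if (d == NULL)
        return -1;
    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)e->d_name[0]))
            continue;               /* not a PID directory */
        long pid = strtol(e->d_name, NULL, 10);
        if (proc_state(pid) == 'R') {
            printf("pid: %ld\n", pid);
            count++;
        }
    }
    closedir(d);
    return count;
}
```

The calling process is itself running while it reads /proc, so print_runnable() should always report at least one entry, just as ps lists itself in the sample output.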

Issues regarding mutexes on POSIX threads

I'm having some issues with the following code. I just can't seem to find the bug:
1 #include <pthread.h>
2 #include <stdio.h>
3 #include <stdlib.h>
4 #include <unistd.h>
5
6 struct bla
7 {
8 pthread_mutex_t mut;
9 int x;
10 };
11
12
13 pthread_t tid;
14
15 void *thr_fn1 (struct bla *fp);
16
17 int main()
18 {
19 struct bla *fp;
20 fp = (struct bla *) malloc (sizeof(struct bla));
21 fp->x=3;
22 printf ("Initializing mutex_init\n");
23 pthread_mutex_init (&fp->mut,NULL);
24 pthread_create (&tid,NULL,thr_fn1,fp);
25 sleep (2);
26 printf ("Main thread ended sleep. Incrementing.\n");
27 fp->x++;
28 pthread_join (tid,NULL);
29 printf ("x=%d\n",fp->x);
30 pthread_mutex_destroy(&fp->mut);
31 return 0;
32
33 }
34
35 void *thr_fn1 (struct bla *fp)
36 {
37 printf ("Locking new thread!\n");
38 pthread_mutex_lock (&fp->mut);
39 printf ("Sleeping.\n");
40 sleep (5);
41 pthread_mutex_unlock (&fp->mut);
42 printf ("Thread unlocked.\n");
43 return ((void *) 0);
44 }
Why does the value still get incremented at line 27? Shouldn't it be protected by the mutex in the second thread by the lock (line 38)?
Thanks!
There is no automatic association between mutexes and data. If you want a mutex to protect some particular set of data, you are responsible for locking and unlocking the mutex around accesses to that data:
sleep (2);
pthread_mutex_lock(&fp->mut);
printf ("Main thread ended sleep. Incrementing.\n");
fp->x++;
pthread_mutex_unlock(&fp->mut);
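As a self-contained illustration of that rule, here is a minimal sketch in which every access to x, from either thread, is wrapped in the same mutex. The names worker() and run_locked_increments() and the iteration count are made up for the example.

```c
#include <pthread.h>
#include <stddef.h>

struct bla {
    pthread_mutex_t mut;
    int x;
};

/* Increment fp->x 100000 times, taking the mutex around each access. */
static void *worker(void *arg)
{
    struct bla *fp = arg;

    for (int i = 0; i < 100000; ++i) {
        pthread_mutex_lock(&fp->mut);
        fp->x++;
        pthread_mutex_unlock(&fp->mut);
    }
    return NULL;
}

/* Run worker() in a new thread and in the calling thread at the same
   time; return the final value of x, or -1 if the thread cannot start. */
int run_locked_increments(void)
{
    struct bla b;
    pthread_t tid;

    b.x = 0;
    pthread_mutex_init(&b.mut, NULL);
    if (pthread_create(&tid, NULL, worker, &b) != 0) {
        pthread_mutex_destroy(&b.mut);
        return -1;
    }
    worker(&b);                     /* the main thread locks too */
    pthread_join(tid, NULL);
    pthread_mutex_destroy(&b.mut);
    return b.x;
}
```

Because both threads lock the mutex, no increment is lost and the result is 200000. If either side skipped the lock, as the main thread does in the question, the mutex would protect nothing.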

MPI_Comm_size Segmentation fault

Hello, everyone. I get these errors when running a parallel program with MPI and OpenMP on Linux:
[node65:03788] *** Process received signal ***
[node65:03788] Signal: Segmentation fault (11)
[node65:03788] Signal code: Address not mapped (1)
[node65:03788] Failing at address: 0x44000098
[node65:03788] [ 0] /lib64/libpthread.so.0 [0x2b663e446c00]
[node65:03788] [ 1] /public/share/mpi/openmpi-1.4.5//lib/libmpi.so.0(MPI_Comm_size+0x60) [0x2b663d694360]
[node65:03788] [ 2] fdtd_3D_xyzPML_MPI_OpenMP(main+0xaa) [0x42479a]
[node65:03788] [ 3] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b663e56f184]
[node65:03788] [ 4] fdtd_3D_xyzPML_MPI_OpenMP(_ZNSt8ios_base4InitD1Ev+0x39) [0x405d79]
[node65:03788] *** End of error message ***
-----------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 3787 on node node65 exited on signal 11 (Segmentation fault).
-----------------------------------------------------------------------------
After analyzing the core files, I get the following messages:
[Thread debugging using libthread_db enabled]
[New Thread 47310344057648 (LWP 26962)]
[New Thread 1075841344 (LWP 26966)]
[New Thread 1077942592 (LWP 26967)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47310344057648 (LWP 26962)]
0x00002b074afb3360 in PMPI_Comm_size () from /public/share/mpi/openmpi-1.4.5//lib/libmpi.so.0
What causes this? Thanks for your help.
The code (test.cpp) is as follows, so you can try it yourself:
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include "mpi.h"
int main(int argc, char* argv[])
{
int nprocs = 1; //the number of processes
int myrank = 0;
int provide;
MPI_Init_thread(&argc,&argv,MPI_THREAD_FUNNELED,&provide);
if (provide < MPI_THREAD_FUNNELED) /* a higher level than requested is fine */
{
printf ("provided %d < required %d\n", provide, MPI_THREAD_FUNNELED);
MPI_Finalize();
return 0;
}
MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myrank);
int num_threads = 1; //Openmp
omp_set_dynamic(1);
num_threads = 16;
omp_set_num_threads(num_threads);
#pragma omp parallel
{
printf ("%d omp thread from %d mpi process\n", omp_get_thread_num(), myrank);
}
MPI_Finalize();
}
Well, this is probably not much, or even a bit of a lame answer, but I had this problem when mixing up different MPI installations (an OpenMPI and a MVAPICH2 to be precise).
Here are a few things to check
against what version of MPI you linked
ldd <application> | grep -i mpi
libmpi.so.1 => /usr/lib64/mpi/gcc/openmpi/lib64/libmpi.so.1 (0x00007f90c03cc000)
what version of MPI is dynamically loaded
echo $LD_LIBRARY_PATH | tr : "\n" | grep -i mpi
/usr/lib64/mpi/gcc/openmpi/lib64
whether you override this dynamic loading (this variable should be empty, unless you know what you're doing)
echo $LD_PRELOAD
If that's all OK, you need to check that each library you linked to and that relies on MPI was also linked with the same version. If no other library is linked to MPI, nothing should appear.
ldd <application> | sed "s/^\s*\(.*=> \)\?//;s/ (0x[0-9a-fA-F]*)$//" | xargs -L 1 ldd | grep -i mpi
If something suspect does show up, say libmpich.so.3 => /usr/lib64/mpi/gcc/MVAPICH2/1.8.1/lib/libmpich.so.3 for example, remove the -L 1 (so that ldd prints the name of each library before its dependencies) and replace grep with something that lets you browse the output (nothing at all, less, or vim - ...), then search for that suspect line.

Resources