gdbserver can't interrupt "SOME" processes: kill(pid, 2) called by gdbserver didn't send SIGINT to the process. What's happening? - linux

Environment:
Target: x86_64 client; runs the program, which is stripped
Host: x86_64 server; has the code, toolchain, stripped program, and symbol files for debugging
run gdbserver on target:
%gdbserver --multi :1234 /pathtolog/gdb.log
run program on target:
./someprogram &
[1] PID
run gdb on host:
%gdb
(gdb) target extended-remote TARGETIP:1234
(gdb) file someprogram
(gdb) set sysroot pathtorootfs
(gdb) ... // set lib path etc.
(gdb) attach PID
... // load everything as normal
... // stop somewhere
(gdb) c
^C^CThe target is not responding to interrupt requests.
Stop debugging it? (y or n)
I tried to find the root cause:
on the target:
Attach gdb to gdbserver (yes, I can use gdb on the target right now, but the target machine will be shipped without gdb, symbols, etc. to save space).
(gdb) b kill
Breakpoint 1 at 0xf760afb0
(gdb) c
Continuing.
When I press Ctrl+C in the host gdb, gdbserver hits the breakpoint:
Breakpoint 1, 0xf760afb0 in kill () from /lib/libc.so.6
(gdb)
I checked the registers; the stack at %esp looks like this:
(gdb) x /32wx 0xffee8070
0xffee8070: 0xfffffe0c 0x00000002 0x00000001 0x00000000
0xfffffe0c = -PID
0x00000002 = SIGINT
Some programs get the signal when gdbserver continues.
So kill() is good for "SOME" programs, not all.
I also used tcpdump to monitor the data between gdb and gdbserver.
If kill() worked (for a "GOOD" program), gdbserver sends a packet back to gdb.
I also tried sigmonitor and found that gdbserver didn't send any signal to the "BAD" program in this case, but I can call kill(PID, 2) from the gdb session that is debugging gdbserver:
(gdb) call kill(PID,2)
Then dmesg shows this:
[11902.060722] ==========send_signal===========
SIG 2 to 6141[a.out], tgid=6141
...
SIG 19 to 6142[a.out], tgid=6141
[11902.111135] Task Tree of 6142 = {
...
Any ideas?

I found what looks like a matching bug in gdbserver.
The argument gdbserver passes to kill() is -PID, not PID.
gdbserver sends SIGINT not to the process, but to the process group (-signal_pid).
But the attached process is not always a process group leader.
If not, "kill (-signal_pid, SIGINT)" returns an error and fails to interrupt the attached process.
static void linux_request_interrupt (void)
{
/* Send a SIGINT to the process group. This acts just like the user
typed a ^C on the controlling terminal. */
- kill (-signal_pid, SIGINT);
+ kill (signal_pid, SIGINT);
}
This problem is still present in gdb-8.1; I don't know why they don't consider it a problem.
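To illustrate the failure mode, here is a minimal, hypothetical C sketch (not gdbserver code): it forks a child that stays in the parent's process group, so kill(-child_pid, SIGINT) fails with ESRCH, while kill(child_pid, SIGINT) is delivered.

/* Sketch: kill(-pid, sig) targets a process *group*, so it only works
 * if pid is a process-group ID.  A plain fork()ed child inherits the
 * parent's group and is not a group leader. */
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t child = fork();
    if (child == 0) {        /* child: just wait for a signal */
        pause();
        _exit(0);
    }

    if (kill(-child, SIGINT) == -1)   /* no process group with pgid == child */
        printf("kill(-%d, SIGINT) failed: %s\n", (int)child, strerror(errno));

    if (kill(child, SIGINT) == 0)     /* the process itself: delivered */
        printf("kill(%d, SIGINT) delivered\n", (int)child);

    waitpid(child, NULL, 0);
    return 0;
}

Whether a given setup hits this depends on how the program was started: a background job launched from an interactive shell with job control gets its own process group, but a process started from a script or a non-interactive shell typically stays in its parent's group and is not a group leader.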

Related

Why does POSIX demand that system(3) ignores SIGINT and SIGQUIT?

The POSIX spec says
The system() function shall ignore the SIGINT and SIGQUIT signals, and shall block the SIGCHLD signal, while waiting for the command to terminate. If this might cause the application to miss a signal that would have killed it, then the application should examine the return value from system() and take whatever action is appropriate to the application if the command terminated due to receipt of a signal.
This means that a program that starts a long-running sub-process will have SIGINT and SIGQUIT ignored for a long time. Here is a test program compiled on my Ubuntu 18.10 laptop:
$ cat > test_system.c << EOF
#include <stdlib.h>
int main() {
    system("sleep 86400"); // Sleep for 24 hours
}
EOF
$ gcc test_system.c -o test_system
If I start this test program running in the background...
$ ./test_system &
[1] 7489
...Then I can see that SIGINT (2) and SIGQUIT (3) are marked as ignored in the bitmask.
$ ps -H -o pid,pgrp,cmd,ignored
  PID  PGRP CMD               IGNORED
 6956  6956 -bash             0000000000380004
 7489  7489 ./test_system     0000000000000006
 7491  7489 sh -c sleep 86400 0000000000000000
 7492  7489 sleep 86400       0000000000000000
Trying to kill test_system with SIGINT has no effect...
$ kill -SIGINT 7489
...But sending SIGINT to the process group does kill it (this is expected; it means that every process in the process group receives the signal - sleep will exit and system will return).
$ kill -SIGINT -7489
[1]+ Done ./test_system
Questions
What is the purpose of having SIGINT and SIGQUIT ignored, since the process can still be killed via the process group (that's what happens when you press ^C in the terminal)?
Bonus question: Why does POSIX demand that SIGCHLD should be blocked?
Update: If SIGINT and SIGQUIT are ignored to ensure we don't leave children behind, then why is there no handling for SIGTERM - it's the default signal sent by kill!
SIGINT and SIGQUIT are terminal generated signals. By default, they're sent to the foreground process group when you press Ctrl+C or Ctrl+\ respectively.
I believe the idea for ignoring them while running a child via system is that the terminal should be as if it was temporarily owned by the child and Ctrl+C or Ctrl+\ should temporarily only affect the child and its descendants, not the parent.
SIGCHLD is blocked so that the SIGCHLD caused by system's child terminating won't trigger a SIGCHLD handler if you have one, because such a handler might reap the child started by system() before system() itself reaps it.
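For concreteness, here is a rough, simplified sketch of the signal handling a POSIX system() performs around the fork/exec/wait. The name my_system and the trimmed error handling are mine, not the libc implementation:

/* Simplified sketch of system()'s signal handling per POSIX: the parent
 * ignores SIGINT/SIGQUIT and blocks SIGCHLD while the child runs, then
 * restores everything before returning. */
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

int my_system(const char *cmd)          /* illustrative name */
{
    struct sigaction ignore, saved_int, saved_quit;
    sigset_t chld, saved_mask;
    int status = -1;

    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    ignore.sa_flags = 0;

    sigaction(SIGINT,  &ignore, &saved_int);     /* parent ignores ^C ... */
    sigaction(SIGQUIT, &ignore, &saved_quit);    /* ... and ^\            */
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &saved_mask);  /* don't run a SIGCHLD handler */

    pid_t pid = fork();
    if (pid == 0) {
        /* child: restore the original dispositions and mask */
        sigaction(SIGINT,  &saved_int,  NULL);
        sigaction(SIGQUIT, &saved_quit, NULL);
        sigprocmask(SIG_SETMASK, &saved_mask, NULL);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }
    if (pid > 0)
        waitpid(pid, &status, 0);                /* reap before any handler can */

    sigaction(SIGINT,  &saved_int,  NULL);
    sigaction(SIGQUIT, &saved_quit, NULL);
    sigprocmask(SIG_SETMASK, &saved_mask, NULL);
    return status;
}

int main(void)
{
    my_system("sleep 5");   /* try pressing ^C while it runs */
    return 0;
}

Because the parent ignores SIGINT and SIGQUIT while the child keeps the original dispositions, a ^C delivered to the foreground process group terminates only the command; system() then sees the child exit and returns its status to the caller.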

gdb does not echo my input until I press Enter [duplicate]

I have a program running on a remote machine which expects to receive SIGINT from the parent. That program needs to receive that signal to function correctly. Unfortunately, if I run that process remotely over SSH and send SIGINT, the ssh process itself traps and interrupts rather than forwarding the signal.
Here's an example of this behavior using GDB:
Running locally:
$ gdb
GNU gdb 6.3.50-20050815 (Apple version gdb-1344) (Fri Jul 3 01:19:56 UTC 2009)
...
This GDB was configured as "x86_64-apple-darwin".
^C
(gdb) Quit
^C
(gdb) Quit
^C
(gdb) Quit
Running remotely:
$ ssh foo.bar.com gdb
GNU gdb Red Hat Linux (6.3.0.0-1.159.el4rh)
...
This GDB was configured as "i386-redhat-linux-gnu".
(gdb) ^C
Killed by signal 2.
$
Can anybody suggest a way of working around this problem? The local ssh client is OpenSSH_5.2p1.
$ ssh -t foo.bar.com gdb
...
(gdb) ^C
Quit
Try signal SIGINT at the gdb prompt.
It looks like you're pressing Ctrl+C. The problem is that your terminal window is sending SIGINT to the ssh process running locally, not to the process on the remote system.
You'll have to specify a signal manually using the kill command or system call on the remote system,
or more conveniently using killall:
$ killall -INT gdb
Can you run a terminal on the remote machine and use kill -INT to send it the signal?

Signal handling with qemu-user

On my machine I have an aarch64 binary, that is statically compiled. I run it using qemu-aarch64-static with the -g 6566 flag. In another terminal I start up gdb-multiarch and connect as target remote localhost:6566.
I expect the binary to raise a signal for which I have a handler defined in the binary. I set a breakpoint at the handler from inside gdb-multiarch after connecting to the remote. However, when the signal is raised, the breakpoint is not hit in gdb-multiarch. Instead, on the terminal that runs the binary, I get a message along the lines of:
[1] + 8388 suspended (signal) qemu-aarch64-static -g 6566 ./testbinary
Why does this happen? How can I set a breakpoint on the handler and debug it? I've tried SIGCHLD and SIGFPE.
This works for me with a recent QEMU:
$ cat sig.c
#include <stdlib.h>
#include <signal.h>
#include <stdio.h>
void handler(int sig) {
    printf("In signal handler, signal %d\n", sig);
    return;
}
int main(void) {
    printf("hello world\n");
    signal(SIGUSR1, handler);
    raise(SIGUSR1);
    printf("done\n");
    return 0;
}
$ aarch64-linux-gnu-gcc -g -Wall -o sig sig.c -static
$ qemu-aarch64 -g 6566 ./sig
and then in another window:
$ gdb-multiarch
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
[etc]
(gdb) set arch aarch64
The target architecture is assumed to be aarch64
(gdb) file /tmp/sigs/sig
Reading symbols from /tmp/sigs/sig...done.
(gdb) target remote :6566
Remote debugging using :6566
0x0000000000400c98 in _start ()
(gdb) break handler
Breakpoint 1 at 0x400e44: file sig.c, line 6.
(gdb) c
Continuing.
Program received signal SIGUSR1, User defined signal 1.
0x0000000000405c68 in raise ()
(gdb) c
Continuing.
Breakpoint 1, handler (sig=10) at sig.c:6
6 printf("In signal handler, signal %d\n", sig);
(gdb)
As you can see, gdb gets control both immediately when the process receives the signal and again when we hit the breakpoint in the handler function.
Incidentally, (integer) dividing by zero is not a reliable way to provoke a signal. This is undefined behaviour in C, and the implementation is free to do the most convenient thing. On x86 this typically results in a SIGFPE. On ARM you will typically find that the result is zero and execution will continue without a signal. (This is a manifestation of the different behaviour of the underlying hardware instructions for division between the two architectures.)
I was doing some R&D on your question and found the following answer:
"Internally, bad memory accesses result in the Mach exception EXC_BAD_ACCESS being sent to the program. Normally, this is translated into a SIGBUS UNIX signal. However, gdb intercepts Mach exceptions directly, before the signal translation. The solution is to give gdb the command set dont-handle-bad-access 1 before running your program. Then the normal mechanism is used, and breakpoints inside your signal handler are honored."
The link is gdb: set a breakpoint for a SIGBUS handler
It may help you, considering that qemu does not change the functionality of basic operations.

What does "Killed" mean as a response to loading a program in Fedora Linux?

I have an assembler program with a simple structure: a text segment and a bss segment. I have been compiling similar programs for over a decade. It is a Forth compiler and I play tricks with the ELF header.
I'm used to the fact that if I mess up the ELF header, I can't start the program and the loader says "Killed" before it even segfaults.
But now I have a user on a Fedora 6 Linux system who does the following:
as -32 lina.s
ld a.out -N -melf_i386 -o lina
./lina
and gets the message "Killed", with 137 as the result of 'echo $?'.
Clearly this procedure uses only official tools, so the ELF header should at least be valid.
The exact same procedure on other systems, like my Ubuntu or Debian systems, leads to programs that work normally. The objdumps of the resulting programs are the same, at least as far as the mapping of segments is concerned.
Please give me some indication of what is going on here; I have no clue how to tackle this problem.
I'd like to stress that probably no instruction is executed, i.e. gdb refuses to run it. Like so:
(gdb) run
Starting program: /home/gerard/Desktop/lina-5.1/glina32
Warning:
Cannot insert breakpoint -2.
Error accessing memory address 0x8048054: Input/output error.
(gdb)
In rare cases, if an error occurs while a process tries to execute a new program, the Linux kernel will send a SIGKILL signal to that process instead of returning an error. That signal will result in the shell printing "Killed", rather than a more useful error message like "out of memory". Something about the executable you've created triggers an error that the kernel can only recover from by killing the process that tried to execute it.
Normally shells execute a program by making two system calls: fork and execve. The first system call creates a new process, but doesn't load a new executable. Instead the fork system call duplicates the process that invoked it. The second system call loads a new executable but doesn't create a new process. Instead the program running in the process is replaced by a new program from the executable.
In the process of performing the execve system call the kernel needs to discard the previous contents of the process's address space so it can completely replace it with an entirely new one. After this point the execve system call can no longer return an error code to the program that invoked it, as that program no longer exists. If an error that prevents the executable from loading occurs after this point, the kernel has no option but to kill the process.
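As an illustration (a hedged sketch, not the actual shell source), the two-step launch and its two distinct failure points look roughly like this; the "./lina" target name is just taken from the question:

/* Sketch of the shell's two-step launch: fork() duplicates the shell,
 * execve() replaces the child's image.  An early execve() failure
 * returns -1 in the child, which can still report it; a failure past
 * the point of no return leaves the kernel no choice but SIGKILL. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    char *argv[] = { "./lina", NULL };   /* illustrative target */
    char *envp[] = { NULL };

    pid_t pid = fork();                  /* duplicate the shell process */
    if (pid == 0) {
        execve(argv[0], argv, envp);     /* replace the child's image */
        perror("execve");                /* reached only on an early failure */
        _exit(126);
    }

    int status;
    waitpid(pid, &status, 0);            /* the shell inspects this status */
    return 0;
}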
This behaviour is documented in the Linux execve(2) man page:
In most cases where execve() fails, control returns to the original
executable image, and the caller of execve() can then handle the
error. However, in (rare) cases (typically caused by resource
exhaustion), failure may occur past the point of no return: the
original executable image has been torn down, but the new image could
not be completely built. In such cases, the kernel kills the process
with a SIGKILL signal.
The message is printed by bash, according to what signal number terminated the process. "Killed" means the process received SIGKILL:
$ pgrep cat # check that there aren't other cat processes running that you might not want to kill while doing this
$ cat # in another terminal, pkill cat
Terminated
$ cat # in another terminal, pkill -9 cat (or pkill -KILL cat)
Killed
$ cat # pkill -QUIT cat or hit control-\
Quit (core dumped)
$ cat # pkill -STOP cat or hit control-z
[1]+ Stopped cat
$ fg
cat # pkill -BUS cat
Bus error (core dumped)
$ cat # pkill -PWR cat
Power failure
Bash doesn't print anything for SIGINT, because that's what control-C sends.
Run kill -l to list signal abbreviations.
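Here is a small sketch of what a shell-like parent does to produce those messages, using the standard waitpid()/WTERMSIG()/strsignal() calls; bash uses its own signal-name table rather than strsignal(), but the idea is the same:

/* Sketch: map a child's termination signal to a message such as
 * "Killed" (SIGKILL) or "Terminated" (SIGTERM), the way a shell does. */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        pause();                 /* child just waits to be signalled */
        _exit(0);
    }

    kill(pid, SIGKILL);          /* stand-in for `pkill -KILL` from elsewhere */

    int status;
    waitpid(pid, &status, 0);
    if (WIFSIGNALED(status))     /* terminated by a signal, not by exit() */
        printf("%s\n", strsignal(WTERMSIG(status)));   /* prints "Killed" on glibc */
    return 0;
}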
$ strace cat # then pkill -KILL cat
... dynamic library opening and mapping, etc. etc.
read(0, <unfinished ...>) = ?
+++ killed by SIGKILL +++
Killed
I can't reproduce your problem with as -32 hello.s / ld -N -melf_i386 to make an executable that my kernel won't run, or that receives SIGKILL right away.
With gcc -m32 -c / ld -N, or with gcc -m32 -E hello.S > hello.s && as -32, I get a binary that prints Hello World (using sys_write and sys_exit).
// hello.S a simple example I had lying around
// Use gcc -m32 -E hello.S > hello.s to create input for as -32
#include <asm/unistd.h>
#include <syscall.h>
#define STDOUT 1
.data # should really be .rodata
hellostr:
.ascii "hello wolrd\n";
helloend:
.text
.globl _start
_start:
movl $(SYS_write) , %eax //ssize_t write(int fd, const void *buf, size_t count);
movl $(STDOUT) , %ebx
movl $hellostr , %ecx
movl $(helloend-hellostr) , %edx
int $0x80
movl $(SYS_exit), %eax //void _exit(int status);
xorl %ebx, %ebx
int $0x80
ret
You can start by using strace, to see which syscalls, if any, are issued by the executable prior to it being killed.
Looking at the syscalls will often point towards a clue, as to where the problem lies.
The same uninformative message "Killed" appears if you try to run a 64-bit program on a 32-bit Linux. So my interpretation is that it is a message from the shell when it tried to load a program and somehow didn't manage to run it.

How to terminate gdbserver?

I am trying to debug with gdbserver. After I terminate the gdb client on the host, I see that gdbserver is still listening:
Remote side has terminated connection. GDBserver will reopen the connection.
Listening on port 5004
I tried to exit gdbserver with everything I have found anywhere, with no luck: quit, exit, q, monitor exit, Esc, Ctrl+C... nothing kills it. Moreover, when I opened another terminal and looked for the process running gdbserver (with ps, top), I couldn't find it there...
My question is: how do I terminate gdbserver?
Give the command
monitor exit
from your host gdb before terminating the client. If you have already terminated it, just attach with another one.
monitor exit step-by-step
https://stackoverflow.com/a/23647002/895245 mentions it, but this is the full setup you need.
Remote:
# pwd contains cross-compiled ./myexec
gdbserver --multi :1234
Local:
# pwd also contains the same cross-compiled ./myexec
gdb -ex 'target extended-remote 192.168.0.1:1234' \
-ex 'set remote exec-file ./myexec' \
--args ./myexec arg1
(gdb) r
[Inferior 1 (process 1234) exited normally]
(gdb) monitor exit
Tested in Ubuntu 14.04.
gdbserver runs on the target, not the host.
Terminating it is target dependent. For example, if your target is UNIX-ish, you could remote login and use ps and kill from a target shell.
For any type of target, rebooting should kill gdbserver.
(If this isn't enough to answer your question, include more information about the target in the question.)
On Linux:
ps -ef |grep gdbserver
Now find the PID of the gdbserver process, then:
kill -9 <pid>
Here is a script I'm using to start gdbserver via ssh and kill it when necessary with Ctrl+C:
#!/usr/bin/env bash
trap stop_gdb_server INT
function stop_gdb_server {
ssh remote-srv-name "pkill gdbserver"
echo "GDB server killed"
}
ssh remote-srv-name "cd /path/to/project/dir/ && gdbserver localhost:6789 my-executable"
quit [expression]
q
To exit GDB, use the quit command (abbreviated q), or type an end-of-file character (usually C-d). If you do not supply expression, GDB will terminate normally; otherwise it will terminate using the result of expression as the error code.
gdbserver should exit when your target exits. The question is how your target is exiting: does it
do nothing: just fall through
return 0 in main
exit(0) in main
From the debug sessions I've been running, in the first case, gdbserver will not exit. It will just hang around forever and you have to kill it. In the latter two cases, gdbserver will exit.
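So, if you control the target's source and want gdbserver to go away after a run, the simplest approach is to end the program explicitly; a trivial sketch:

/* Trivial sketch: ending main() via exit(0) (or return 0) reports a
 * normal exit to the attached gdbserver, rather than leaving it hanging. */
#include <stdlib.h>

int main(void)
{
    /* ... program work ... */
    exit(0);
}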
