Signal handling with qemu-user - linux

On my machine I have a statically compiled aarch64 binary. I run it using qemu-aarch64-static with the -g 6566 flag. In another terminal I start gdb-multiarch and connect with target remote localhost:6566.
I expect the binary to raise a signal for which it defines a handler. After connecting, I set a breakpoint on the handler from inside gdb-multiarch. However, when the signal is raised, the breakpoint is not hit. Instead, the terminal running the binary shows a message along these lines:
[1] + 8388 suspended (signal) qemu-aarch64-static -g 6566 ./testbinary
Why does this happen? How can I set a breakpoint on the handler and debug it? I've tried SIGCHLD and SIGFPE.

This works for me with a recent QEMU:
$ cat sig.c
#include <stdlib.h>
#include <signal.h>
#include <stdio.h>
void handler(int sig) {
    printf("In signal handler, signal %d\n", sig);
    return;
}
int main(void) {
    printf("hello world\n");
    signal(SIGUSR1, handler);
    raise(SIGUSR1);
    printf("done\n");
    return 0;
}
$ aarch64-linux-gnu-gcc -g -Wall -o sig sig.c -static
$ qemu-aarch64 -g 6566 ./sig
and then in another window:
$ gdb-multiarch
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
[etc]
(gdb) set arch aarch64
The target architecture is assumed to be aarch64
(gdb) file /tmp/sigs/sig
Reading symbols from /tmp/sigs/sig...done.
(gdb) target remote :6566
Remote debugging using :6566
0x0000000000400c98 in _start ()
(gdb) break handler
Breakpoint 1 at 0x400e44: file sig.c, line 6.
(gdb) c
Continuing.
Program received signal SIGUSR1, User defined signal 1.
0x0000000000405c68 in raise ()
(gdb) c
Continuing.
Breakpoint 1, handler (sig=10) at sig.c:6
6 printf("In signal handler, signal %d\n", sig);
(gdb)
As you can see, gdb gets control twice: first as soon as the process receives the signal, and then again when we hit the breakpoint in the handler function.
Incidentally, (integer) dividing by zero is not a reliable way to provoke a signal. This is undefined behaviour in C, and the implementation is free to do the most convenient thing. On x86 this typically results in a SIGFPE. On ARM you will typically find that the result is zero and execution will continue without a signal. (This is a manifestation of the different behaviour of the underlying hardware instructions for division between the two architectures.)
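A more dependable way to exercise a handler while debugging is to raise the signal explicitly, as sig.c above does with SIGUSR1. The following is a minimal sketch of that idea for any catchable signal. The names record_signal and provoke_signal are made up for illustration and are not part of the original program.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Last signal number seen by the handler (0 if none yet). */
static volatile sig_atomic_t last_signal = 0;

/* Handler: only records the signal number, which is async-signal-safe. */
static void record_signal(int sig) {
    last_signal = sig;
}

/*
 * Install record_signal for `sig`, raise the signal in this process,
 * and return the number the handler observed (or -1 on failure).
 */
int provoke_signal(int sig) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0)
        return -1;
    last_signal = 0;
    /* In a single-threaded program raise() returns after the handler ran. */
    if (raise(sig) != 0)
        return -1;
    return (int)last_signal;
}
```

Because no division instruction is involved, provoke_signal(SIGFPE) should reach the handler on both x86 and AArch64, so a breakpoint on record_signal is a reasonable place to stop.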

While researching your question I found the following answer:
"Internally, bad memory accesses result in the Mach exception EXC_BAD_ACCESS being sent to the program. Normally, this is translated into a SIGBUS UNIX signal. However, gdb intercepts Mach exceptions directly, before the signal translation. The solution is to give gdb the command set dont-handle-bad-access 1 before running your program. Then the normal mechanism is used, and breakpoints inside your signal handler are honored."
The source is: gdb: set a breakpoint for a SIGBUS handler
It may help you, considering that QEMU does not change how the underlying operations behave.

Related

gdbserver can't interrupt some processes: kill(pid, 2) called by gdbserver doesn't deliver SIGINT to the process. What's happening?

The environment is:
Target: x86_64 client, runs the stripped program
Host: x86_64 server, has the code, toolchain, stripped program, and symbol files for debugging
Run gdbserver on the target:
%gdbserver --multi :1234 /pathtolog/gdb.log
run program on target:
./someprogram &
[1] PID
run gdb on host:
%gdb
(gdb)target extended-remote TARGETIP:1234
(gdb)file someprogram
(gdb)setrootfs pathtorootfs
(gdb)...//set lib path etc.
(gdb)attach PID
...//load everything as normal
...//stop somewhere
(gdb)c
^C^CThe target is not responding to interrupt requests.
Stop debugging it? (y or n)
I tried to find the root cause.
On the target:
I attached gdb to gdbserver (yes, I can use gdb on the target right now, but the target machine will be released without gdb, symbols, etc. to save space).
(gdb) b kill
Breakpoint 1 at 0xf760afb0
(gdb) c
Continuing.
When I press Ctrl+C in the host gdb, gdbserver hits the breakpoint:
Breakpoint 1, 0xf760afb0 in kill () from /lib/libc.so.6
(gdb)
I checked the registers; the stack at %esp looks like this:
(gdb) x /32wx 0xffee8070
0xffee8070: 0xfffffe0c 0x00000002 0x00000001 0x00000000
0xfffffe0c = -PID
0x00000002 = SIGINT
Some programs receive the signal when gdbserver continues.
So kill() works for some programs, but not all.
I also used tcpdump to monitor the traffic between gdb and gdbserver.
If kill() worked (for a "good" program), gdbserver sends a packet to gdb.
I also tried sigmonitor and found that gdbserver did not send any signal to the "bad" program in this case. But I can call kill(pid, 2) myself from the gdb that is debugging gdbserver:
(gdb) call kill(PID,2)
then dmesg shows like this
[11902.060722] ==========send_signal===========
SIG 2 to 6141[a.out], tgid=6141
...
SIG 19 to 6142[a.out], tgid=6141
[11902.111135] Task Tree of 6142 = {
...
Any ideas?
I found a gdbserver bug that appears to match.
The parameter of kill() called by gdbserver is -PID, not PID.
gdbserver sends SIGINT not to the process but to the process group (-signal_pid).
But the attached process is not always a process group leader.
If it is not, kill (-signal_pid, SIGINT) returns an error and fails to interrupt the attached process.
static void linux_request_interrupt (void)
{
/* Send a SIGINT to the process group. This acts just like the user
typed a ^C on the controlling terminal. */
- kill (-signal_pid, SIGINT);
+ kill (signal_pid, SIGINT);
}
This problem is still present in gdb-8.1; I don't know why it isn't considered a bug.
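The group-leader condition described above can be checked with getpgid(2) before choosing what to pass to kill(2). The sketch below is not gdbserver's code. It only illustrates the decision, and the names pick_kill_target and interrupt_process are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Decide what to pass to kill(2): the negated pid (whole process group)
 * when the process leads its own group, otherwise the plain pid.
 */
pid_t pick_kill_target(pid_t pid, pid_t pgid) {
    return (pgid == pid) ? -pid : pid;
}

/*
 * Send SIGINT to `pid`, addressing only the single process when it is
 * not a process group leader. Returns kill()'s result, or -1 if the
 * process group cannot be queried.
 */
int interrupt_process(pid_t pid) {
    pid_t pgid = getpgid(pid);
    if (pgid == (pid_t)-1)
        return -1;
    return kill(pick_kill_target(pid, pgid), SIGINT);
}
```

With this choice a process started as a background job or attached mid-pipeline still gets the SIGINT, while a group leader keeps the "like ^C on the terminal" behaviour that the original comment describes.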

Why can GDB mask tracee's SIGKILL when attaching to the tracee

The signal(7) man page says that SIGKILL cannot be caught, blocked, or ignored. But I just observed that after attaching to a process with GDB, I can no longer send SIGKILL to that process (similarly, other signals cannot be delivered either). But after I detach and quit GDB, SIGKILL is delivered as usual.
It seems to me that GDB has blocked that signal (on behalf of the tracee) when attaching, and unblocked it when detaching. However, the ptrace(2) man page says:
While being traced, the tracee will stop each time a signal is delivered, even if the signal is being ignored. (An exception is SIGKILL, which has its usual effect.)
So why does it behave this way? What tricks is GDB using?
Here is a trivial example for demonstration:
1. test program
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <errno.h>
#include <string.h>
/* Simple error handling functions */
#define handle_error_en(en, msg) \
    do { errno = en; perror(msg); exit(EXIT_FAILURE); } while (0)
struct sigaction act;
void sighandler(int signum, siginfo_t *info, void *ptr) {
    printf("Received signal: %d\n", signum);
    printf("signal originate from pid[%d]\n", info->si_pid);
}
int
main(int argc, char *argv[])
{
    printf("Pid of the current process: %d\n", getpid());
    memset(&act, 0, sizeof(act));
    act.sa_sigaction = sighandler;
    act.sa_flags = SA_SIGINFO;
    sigaction(SIGQUIT, &act, NULL);
    while(1) {
        ;
    }
    return 0;
}
If you try to kill this program using SIGKILL (i.e., using kill -KILL ${pid}), it will die as expected. If you send it SIGQUIT (i.e., using kill -QUIT ${pid}), those printf statements get executed, as expected. However, if you have attached GDB to it before sending the signal, nothing happens:
$ ##### in shell 1 #####
$ gdb
(gdb) attach ${pid}
(gdb)
/* now that gdb has attached successfully, in another shell: */
$ #### in shell 2 ####
$ kill -QUIT ${pid} # nothing happens
$ kill -KILL ${pid} # again, nothing happens!
/* now detach gdb */
##### in shell 1 ####
(gdb) quit
/* the process will receive SIGKILL */
##### in shell 2 ####
$ Killed # the tracee receives **SIGKILL** eventually...
FYI, I am using a CentOS-6u3 and uname -r result in 2.6.32_1-16-0-0. My GDB version is: GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6) and my GCC version is: gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-19.el6). An old machine...
Any idea will be appreciated ;-)
$ ##### in shell 1 #####
$ gdb
(gdb) attach ${pid}
(gdb)
The issue is that once GDB has attached to ${pid}, the inferior (being debugged) process is no longer running -- it is stopped.
The kernel will not do anything to it until it is either continued (with the (gdb) continue command), or it is no longer being traced ((gdb) detach or quit).
If you issue continue (either before or after kill -QUIT), you'll see this:
(gdb) c
Continuing.
kill -QUIT $pid executed in another shell:
Program received signal SIGQUIT, Quit.
main (argc=1, argv=0x7ffdcc9c1518) at t.c:35
35 }
(gdb) c
Continuing.
Received signal: 3
signal originate from pid[123419]
kill -KILL executed in another window:
Program terminated with signal SIGKILL, Killed.
The program no longer exists.
(gdb)
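The same sequence can be reproduced without gdb. The sketch below is an illustration under assumptions, not what GDB does internally: it assumes Linux with ptrace(2) permitted for a parent tracing its own child, and the function name observe_signal_stop is made up. A signal sent while the tracee is stopped is only reported to the tracer once the tracee has been resumed.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a traced child, send it `sig` while it is stopped, resume it, and
 * return the signal the tracer is told about (or -1 on any failure).
 */
int observe_signal_stop(int sig) {
    int status;
    int seen = -1;
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: ask to be traced by the parent, then stop. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
            _exit(1);
        raise(SIGSTOP);
        for (;;)
            pause();
    }

    /* Parent (tracer): wait until the child has stopped. */
    if (waitpid(child, &status, 0) != child)
        return -1;
    if (!WIFSTOPPED(status))
        return -1; /* child already exited: tracing is not permitted here */

    /* The tracee is stopped; the signal is queued but not acted on yet. */
    kill(child, sig);

    /* Resume without re-injecting SIGSTOP, like "continue" in gdb. */
    ptrace(PTRACE_CONT, child, NULL, NULL);

    /* The kernel stops the tracee again and reports the queued signal. */
    if (waitpid(child, &status, 0) == child && WIFSTOPPED(status))
        seen = WSTOPSIG(status);

    /* Clean up: end the tracee and reap it. */
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return seen;
}
```

Calling observe_signal_stop(SIGQUIT) corresponds to the "Program received signal SIGQUIT" step in the transcript: the tracer sees the signal first and decides whether the tracee gets it.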

What does killed mean, as response to loading a program in Fedora linux?

I have an assembler program with a simple structure, a text segment and a bss segment. Similar programs have been compiled by me over a decade. It is a Forth compiler and I play tricks with the elf header.
I'm used to the fact that if I mess up the ELF header, I can't start the program and the loader says "killed" before it even segfaults.
But now I have a user on Fedora version 6 Linux who does the following:
as -32 lina.s
ld a.out -N -melf_i386 -o lina
./lina
and gets the message "killed", with 137 as the result of 'echo $?'.
Clearly this procedure uses only official tools, so the ELF header should at least be valid.
The exact same procedure on other systems, like my Ubuntu or Debian systems, leads to programs that work normally. The objdumps of the resulting programs are the same, at least as far as the mapping of segments is concerned.
Please give me some indication of what is going on here, I have no clue of how to tackle this problem.
I'd like to stress that probably no instruction is executed, i.e. gdb refuses to run it. Like so
(gdb) run
Starting program: /home/gerard/Desktop/lina-5.1/glina32
Warning:
Cannot insert breakpoint -2.
Error accessing memory address 0x8048054: Input/output error.
(gdb)
In rare cases, if an error occurs while a process tries to execute a new program, the Linux kernel will send a SIGKILL signal to that process instead of returning an error. That signal will result in the shell printing "Killed", rather than a more useful error message like "out of memory". Something about the executable you've created triggers an error that the kernel can only recover from by killing the process that tried to execute it.
Normally shells execute a program by making two system calls: fork and execve. The first system call creates a new process, but doesn't load a new executable. Instead the fork system call duplicates the process that invoked it. The second system call loads a new executable but doesn't create a new process. Instead the program running in the process is replaced by a new program from the executable.
In the process of performing the execve system call the kernel needs to discard the previous contents of the process's address space so it can completely replace it with an entirely new one. After this point the execve system call can no longer return an error code to the program that invoked it, as that program no longer exists. If an error occurs after this point which prevents the executable from loading, the kernel has no other option but to kill the process.
This behaviour is documented in the Linux execve(2) man page:
In most cases where execve() fails, control returns to the original
executable image, and the caller of execve() can then handle the
error. However, in (rare) cases (typically caused by resource
exhaustion), failure may occur past the point of no return: the
original executable image has been torn down, but the new image could
not be completely built. In such cases, the kernel kills the process
with a SIGKILL signal.
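The two outcomes can be told apart from the parent's side. The sketch below is an illustration, not how any particular shell is implemented, and its function names are hypothetical. It follows the common convention of exit code 127 when execv() returns an error and 128 plus the signal number when a signal ended the child, which is where the 137 in the question comes from (128 + 9 for SIGKILL).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map a wait status to the value a shell would typically report in $?. */
int shell_style_status(int wait_status) {
    if (WIFEXITED(wait_status))
        return WEXITSTATUS(wait_status);
    if (WIFSIGNALED(wait_status))
        return 128 + WTERMSIG(wait_status);
    return -1;
}

/* Wait for `pid` and translate its status; -1 if waiting fails. */
static int wait_shell_style(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return shell_style_status(status);
}

/* fork + execv `path` with no arguments; returns the shell-style status. */
int run_program(const char *path) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *const argv[] = { (char *)path, NULL };
        execv(path, argv);
        _exit(127); /* reached only when execv() returned an error */
    }
    return wait_shell_style(pid);
}

/* Fork a child that ends itself with `sig`; returns the shell-style status. */
int status_after_signal(int sig) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(sig);
        _exit(0); /* reached only if the signal did not terminate us */
    }
    return wait_shell_style(pid);
}
```

An ordinary exec failure such as a missing file comes back through the 127 path, whereas a failure past the point of no return looks to the parent exactly like status_after_signal(SIGKILL).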
The message is printed by bash, according to what signal number terminated the process. "Killed" means the process received SIGKILL:
$ pgrep cat # check that there aren't other cat processes running that you might not want to kill while doing this
$ cat # in another terminal, pkill cat
Terminated
$ cat # in another terminal, pkill -9 cat (or pkill -KILL cat)
Killed
$ cat # pkill -QUIT cat or hit control-\
Quit (core dumped)
$ cat # pkill -STOP cat or hit control-z
[1]+ Stopped cat
$ fg
cat # pkill -BUS cat
Bus error (core dumped)
$ cat # pkill -PWR cat
Power failure
Bash doesn't print anything for SIGINT, because that's what control-C sends.
Run kill -l to list signal abbreviations.
$ strace cat # then pkill -KILL cat
... dynamic library opening and mapping, etc. etc.
read(0, <unfinished ...>) = ?
+++ killed by SIGKILL +++
Killed
I can't reproduce your problem with as -32 hello.s / ld -N -melf_i386 to make an executable that my kernel won't run, or that receives SIGKILL right away.
With gcc -m32 -c / ld -N, or with gcc -m32 -E hello.S > hello.s && as -32, I get a binary that prints Hello World (using sys_write and sys_exit).
// hello.S a simple example I had lying around
// Use gcc -m32 -E hello.S > hello.s to create input for as -32
#include <asm/unistd.h>
#include <syscall.h>
#define STDOUT 1
.data # should really be .rodata
hellostr:
.ascii "hello wolrd\n";
helloend:
.text
.globl _start
_start:
movl $(SYS_write) , %eax //ssize_t write(int fd, const void *buf, size_t count);
movl $(STDOUT) , %ebx
movl $hellostr , %ecx
movl $(helloend-hellostr) , %edx
int $0x80
movl $(SYS_exit), %eax //void _exit(int status);
xorl %ebx, %ebx
int $0x80
ret
You can start by using strace, to see which syscalls, if any, are issued by the executable, prior to it killing itself.
Looking at the syscalls will often point towards a clue, as to where the problem lies.
The same uninformative message "killed" appears if you try to run a 64-bit program on a 32-bit Linux. So my interpretation is that it is a message from the shell when it tried to load a program and somehow didn't manage to run it.

reboot within an initrd image

I am looking for a method to restart/reset my Linux system from within an init-bottom script. At the time my script is executed, the system is available under /root and I have access to busybox.
But the "reboot" command which is part of my busybox does not work. Is there any other possibility?
My system boots normally with an initramfs image, and my script eventually triggers an update process. The new systemd that comes with Debian gets confused by this, but after a power reset everything is fine.
I have found this:
echo b >/proc/sysrq-trigger
(this reboots immediately, without syncing or unmounting filesystems)
If you -are- init (the PID of your process/script is 1), then starting the busybox reboot program won't work, since it tries to signal init (which is not started) to reboot.
Instead, as PID 1, you should do what init would do: call the kernel API for rebooting directly. See man reboot(2) for details.
Assuming you are running a c program or something, one would do:
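#include <unistd.h>
#include <sys/reboot.h>
int main(void) {
    sync();              /* reboot(2) does not flush buffers itself */
    reboot(RB_AUTOBOOT); /* 0x1234567, i.e. LINUX_REBOOT_CMD_RESTART */
}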
This is much better than executing the sysrq trigger which will act more like a panic restart than a clean restart.
As a final note, busybox's init actually forks a process to do the reboot for it. This is because the reboot system call also exits the calling program, and the system should never run without an init process (that would also panic the kernel). Hence in this case, you would do something like:
pid_t pid;
pid = vfork();
if (pid == 0) { /* child */
reboot(0x1234567);
_exit(EXIT_SUCCESS);
}
while (1); /* Parent (init) waits */

How to generate a core dump in Linux on a segmentation fault?

I have a process in Linux that's getting a segmentation fault. How can I tell it to generate a core dump when it fails?
This depends on what shell you are using. If you are using bash, then the ulimit command controls several settings relating to program execution, such as whether you should dump core. If you type
ulimit -c unlimited
then that will tell bash that its programs can dump cores of any size. You can specify a size instead of unlimited if you want (bash takes the value in 1024-byte blocks, so 53248 means 52 MiB), but in practice this shouldn't be necessary since the size of core files will probably never be an issue for you.
In tcsh, you'd type
limit coredumpsize unlimited
As explained above the real question being asked here is how to enable core dumps on a system where they are not enabled. That question is answered here.
If you've come here hoping to learn how to generate a core dump for a hung process, the answer is
gcore <pid>
if gcore is not available on your system then
kill -ABRT <pid>
Don't use kill -SEGV as that will often invoke a signal handler making it harder to diagnose the stuck process
To check where the core dumps are generated, run:
sysctl kernel.core_pattern
or:
cat /proc/sys/kernel/core_pattern
where %e is the process name and %t the system time. You can change it in /etc/sysctl.conf and reload it with sysctl -p.
If the core files are not generated (test it by: sleep 10 & and killall -SIGSEGV sleep), check the limits by: ulimit -a.
If your core file size is limited, run:
ulimit -c unlimited
to make it unlimited.
Then test again, if the core dumping is successful, you will see “(core dumped)” after the segmentation fault indication as below:
Segmentation fault: 11 (core dumped)
See also: core dumped - but core file is not in current directory?
Ubuntu
In Ubuntu the core dumps are handled by Apport and can be located in /var/crash/. However, it is disabled by default in stable releases.
For more details, please check: Where do I find the core dump in Ubuntu?.
macOS
For macOS, see: How to generate core dumps in Mac OS X?
What I did at the end was attach gdb to the process before it crashed, and then when it got the segfault I executed the generate-core-file command. That forced generation of a core dump.
Maybe you could do it this way. This program demonstrates how to trap a segmentation fault and shell out to a debugger (this is the original code used under AIX), which prints the stack trace up to the point of the segmentation fault. You will need to change the command string built in dumpstack() to use gdb on Linux.
#include <stdio.h>
#include <signal.h>
#include <stdlib.h>
#include <stdarg.h>
#include <unistd.h> /* getpid() */
static void signal_handler(int);
static void dumpstack(void);
static void cleanup(void);
void init_signals(void);
void panic(const char *, ...);
struct sigaction sigact;
char *progname;
int main(int argc, char **argv) {
    char *s = NULL; /* null pointer, so the store below faults */
    progname = *(argv);
    atexit(cleanup);
    init_signals();
    printf("About to seg fault by assigning zero to *s\n");
    *s = 0;
    return 0;
}
void init_signals(void) {
    sigact.sa_handler = signal_handler;
    sigemptyset(&sigact.sa_mask);
    sigact.sa_flags = 0;
    sigaction(SIGINT, &sigact, (struct sigaction *)NULL);
    sigaddset(&sigact.sa_mask, SIGSEGV);
    sigaction(SIGSEGV, &sigact, (struct sigaction *)NULL);
    sigaddset(&sigact.sa_mask, SIGBUS);
    sigaction(SIGBUS, &sigact, (struct sigaction *)NULL);
    sigaddset(&sigact.sa_mask, SIGQUIT);
    sigaction(SIGQUIT, &sigact, (struct sigaction *)NULL);
    sigaddset(&sigact.sa_mask, SIGHUP);
    sigaction(SIGHUP, &sigact, (struct sigaction *)NULL);
    /* SIGKILL cannot be caught, so no handler is installed for it */
}
static void signal_handler(int sig) {
    if (sig == SIGHUP) panic("FATAL: Program hanged up\n");
    if (sig == SIGSEGV || sig == SIGBUS){
        dumpstack();
        panic("FATAL: %s Fault. Logged StackTrace\n", (sig == SIGSEGV) ? "Segmentation" : ((sig == SIGBUS) ? "Bus" : "Unknown"));
    }
    if (sig == SIGQUIT) panic("QUIT signal ended program\n");
    if (sig == SIGINT) ;
}
void panic(const char *fmt, ...) {
    char buf[128];
    va_list argptr;
    va_start(argptr, fmt);
    vsnprintf(buf, sizeof buf, fmt, argptr); /* bounded, unlike vsprintf */
    va_end(argptr);
    fputs(buf, stderr); /* do not use the built string as a format */
    exit(-1);
}
static void dumpstack(void) {
    /* Got this routine from http://www.whitefang.com/unix/faq_toc.html
    ** Section 6.5. Modified to redirect to file to prevent clutter
    */
    /* This needs to be changed... */
    char dbx[160];
    snprintf(dbx, sizeof dbx, "echo 'where\ndetach' | dbx -a %d > %s.dump", getpid(), progname);
    /* Change the dbx to gdb */
    system(dbx);
    return;
}
void cleanup(void) {
    sigemptyset(&sigact.sa_mask);
    /* Do any cleaning up chores here */
}
You may additionally have to add a parameter to get gdb to dump the core, as described in the linked blog post.
There are more things that may influence the generation of a core dump. I encountered these:
The directory for the dump must be writable. By default this is the current directory of the process, but that may be changed by setting /proc/sys/kernel/core_pattern.
In some conditions, the kernel value in /proc/sys/fs/suid_dumpable may prevent the core from being generated.
There are more situations which may prevent the generation that are described in the man page - try man core.
For Ubuntu 14.04
Check core dump enabled:
ulimit -a
One of the lines should be :
core file size (blocks, -c) unlimited
If not :
Run gedit ~/.bashrc, add ulimit -c unlimited to the end of the file, save, and restart the terminal.
Build your application with debug information:
In the Makefile: -O0 -g
Run the application that creates the core dump (a core dump file named ‘core’ should be created next to the application_name file):
./application_name
Run under gdb:
gdb application_name core
In order to activate the core dump do the following:
1. In /etc/profile comment out the line:
# ulimit -S -c 0 > /dev/null 2>&1
2. In /etc/security/limits.conf comment out the line:
* soft core 0
3. Execute the command limit coredumpsize unlimited and check it with the command limit:
# limit coredumpsize unlimited
# limit
cputime unlimited
filesize unlimited
datasize unlimited
stacksize 10240 kbytes
coredumpsize unlimited
memoryuse unlimited
vmemoryuse unlimited
descriptors 1024
memorylocked 32 kbytes
maxproc 528383
#
To check whether the core file gets written, you can kill the relevant process with kill -s SEGV <PID> (this should not be needed; it is only a check in case no core file gets written):
# kill -s SEGV <PID>
Once the core file has been written, make sure to deactivate the core dump settings again in the relevant files (1./2./3.)!
Ubuntu 19.04
None of the other answers on their own helped me, but the following summary did the job.
Create ~/.config/apport/settings with the following content:
[main]
unpackaged=true
(This tells apport to also write core dumps for custom apps)
check: ulimit -c. If it outputs 0, fix it with
ulimit -c unlimited
Just in case, restart apport:
sudo systemctl restart apport
Crash files are now written in /var/crash/. But you cannot use them with gdb. To use them with gdb, use
apport-unpack <location_of_report> <target_directory>
Further information:
Some answers suggest changing core_pattern. Be aware, that that file might get overwritten by the apport service on restarting.
Simply stopping apport did not do the job
The ulimit -c value might get changed automatically while you're trying other answers of the web. Be sure to check it regularly during setting up your core dump creation.
References:
https://stackoverflow.com/a/47481884/6702598
By default you will get a core file. Check to see that the current directory of the process is writable, or no core file will be created.
Better to turn on core dump programmatically using system call setrlimit.
example:
#include <stdbool.h>
#include <sys/resource.h>
bool enable_core_dump(void) {
    struct rlimit corelim;
    corelim.rlim_cur = RLIM_INFINITY;
    corelim.rlim_max = RLIM_INFINITY;
    /* raising rlim_max above the current hard limit needs privileges */
    return (0 == setrlimit(RLIMIT_CORE, &corelim));
}
It's worth mentioning that if you have a systemd set up, then things are a little bit different. The set up typically would have the core files be piped, by means of core_pattern sysctl value, through systemd-coredump(8). The core file size rlimit would typically be configured as "unlimited" already.
It is then possible to retrieve the core dumps using coredumpctl(1).
The storage of core dumps, etc. is configured by coredump.conf(5). There are examples of how to get the core files in the coredumpctl man page, but in short, it would look like this:
Find the core file:
[vps#phoenix]~$ coredumpctl list test_me | tail -1
Sun 2019-01-20 11:17:33 CET 16163 1224 1224 11 present /home/vps/test_me
Get the core file:
[vps#phoenix]~$ coredumpctl -o test_me.core dump 16163
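Whether cores are written as plain files or handed to a program such as systemd-coredump or apport can be read from core_pattern: per core(5), a pattern whose first character is '|' is piped to the program named after it. A minimal sketch of that check follows; the function names are illustrative only.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the pattern hands cores to a program (leading '|'), else 0. */
int core_pattern_is_piped(const char *pattern) {
    return pattern != NULL && pattern[0] == '|';
}

/*
 * Read /proc/sys/kernel/core_pattern into buf, without the trailing
 * newline. Returns 0 on success, -1 on failure.
 */
int read_core_pattern(char *buf, size_t size) {
    FILE *f;
    if (buf == NULL || size == 0)
        return -1;
    f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

If read_core_pattern succeeds and core_pattern_is_piped reports 1, looking for a core file in the working directory is pointless and a tool like coredumpctl is the place to look instead.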
This is typically sufficient:
ulimit -c unlimited
Note this will not persist between SSH sessions! To add persistence:
echo '* soft core unlimited' >> /etc/security/limits.conf
Now, if you're using Ubuntu, "apport" is probably running. Here's how to check:
sudo systemctl status apport.service
If it is, you'll probably find core dumps in one of these places:
/var/lib/apport/coredump
/var/crash
If you want to change the location of core dumps
Make sure that the directory you're sending core dumps to exists and that you have permission to create files in it!
Here's an example. Note this will not persist across reboots:
sysctl -w kernel.core_pattern=/coredumps/core-%e-%s-%u-%g-%p-%t
mkdir /coredumps
Make sure that the process that's crashing has access to write to this. The easiest way would be an example like this:
chmod 777 /coredumps
Test that core dumps work
> crash.c
gcc -Wl,--defsym=main=0 crash.c
./a.out
==output== Segmentation fault (core dumped)
If it doesn't say "core dumped" above, something isn't working.
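The "(core dumped)" text printed by the shell comes from the child's wait status, so the same check can be made from a program. The following is a rough sketch under the current limits and core_pattern of whatever system it runs on; crash_child and struct crash_result are hypothetical names. It crashes a forked child rather than the calling process.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Result of running a deliberately crashing child. */
struct crash_result {
    int term_signal; /* signal that ended the child, 0 if none */
    int dumped_core; /* nonzero if the kernel reports a core dump */
};

/*
 * Fork a child that terminates with SIGSEGV under the current limits
 * and report how it ended.
 */
struct crash_result crash_child(void) {
    struct crash_result r = { 0, 0 };
    int status;
    pid_t pid = fork();
    if (pid < 0)
        return r;
    if (pid == 0) {
        signal(SIGSEGV, SIG_DFL); /* make sure no handler intercepts it */
        raise(SIGSEGV);
        _exit(0); /* reached only if the signal did not terminate us */
    }
    if (waitpid(pid, &status, 0) == pid && WIFSIGNALED(status)) {
        r.term_signal = WTERMSIG(status);
        r.dumped_core = WCOREDUMP(status) ? 1 : 0;
    }
    return r;
}
```

A dumped_core value of 0 together with term_signal equal to SIGSEGV is the programmatic equivalent of seeing "Segmentation fault" without "(core dumped)".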
