I have the following program:
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int x = 1;

void ouch(int sig) {
    printf("OUCH! dividing by zero!\n");
    x = 0;
}

void fpe(int sig) {
    printf("FPE! I got a signal: %d\n", sig);
    psignal(sig, "psignal");
    x = 1;
}

int main(void) {
    (void) signal(SIGINT, ouch);
    (void) signal(SIGFPE, fpe);
    while (1)
    {
        printf("Hello World: %d\n", 1/x);
        sleep(1);
    }
}
Problem: while this program is running, when I send SIGINT from the terminal, "OUCH! dividing by zero!" is printed, as expected. The next message is
"FPE! I got a signal: 8
psignal: Floating point exception"
and this message repeats forever. What puzzles me is that the fpe signal handler sets x back to 1, so I expect "Hello World" to be printed again.
Below is a transcript of the output I am getting:
Hello World: 1
Hello World: 1
^COUCH! dividing by zero!
FPE! I got a signal: 8
psignal: Floating point exception
FPE! I got a signal: 8
psignal: Floating point exception
FPE! I got a signal: 8
psignal: Floating point exception
^COUCH! dividing by zero!
...
When the signal handler is entered, the program counter (the CPU register pointing at the currently executing instruction) is saved at the point where the divide-by-zero occurred. Returning from the handler restores the PC to exactly the same place, where the signal is triggered again (and again, and again).
The value or volatility of 'x' is irrelevant by this point - the zero has been transferred into a CPU register in readiness to perform the divide.
man 2 signal notes that:
According to POSIX, the behaviour of a process is undefined after it ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by the kill(2) or the raise(3) functions. Integer division by zero has undefined result. On some architectures it will generate a SIGFPE signal. (Also dividing the most negative integer by -1 may generate SIGFPE.) Ignoring this signal might lead to an endless loop.
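Both trap conditions the man page mentions can be checked before dividing. This is a minimal sketch, assuming 32-bit `int` division compiled to x86 `IDIV`; the helper name `idiv32_would_trap` is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the 32-bit signed division n / d would raise SIGFPE on
   x86. IDIV faults for a zero divisor and for a quotient that does not fit
   in 32 bits, which for 32-bit operands only happens for INT32_MIN / -1.
   In C both cases are undefined behaviour, so test before dividing. */
bool idiv32_would_trap(int32_t n, int32_t d)
{
    if (d == 0)
        return true;                    /* divide by zero */
    if (n == INT32_MIN && d == -1)
        return true;                    /* quotient 2^31 overflows int32_t */
    return false;
}
```

Guarding the `printf("Hello World: %d\n", 1/x)` line with such a check avoids the trap entirely, instead of trying to recover from it.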
We can see this in gdb if you compile with the debug flag:
simon@diablo:~$ gcc -g -o sigtest sigtest.c
simon@diablo:~$ gdb sigtest
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
By default gdb won't pass SIGINT to the process - change this so it sees the first signal:
(gdb) handle SIGINT pass
SIGINT is used by the debugger.
Are you sure you want to change it? (y or n) y
Signal Stop Print Pass to program Description
SIGINT Yes Yes Yes Interrupt
Off we go:
(gdb) run
Starting program: /home/simon/sigtest
x = 1
Hello World: 1
Now let's interrupt it:
^C
Program received signal SIGINT, Interrupt.
0xb767e17b in nanosleep () from /lib/libc.so.6
and onwards to the divide:
(gdb) cont
Continuing.
OUCH! dividing by zero!
x = 0
Program received signal SIGFPE, Arithmetic exception.
0x0804853a in main () at sigtest.c:30
30 printf("Hello World: %d\n",1/x);
Check the value of 'x', and continue:
(gdb) print x
$1 = 0
(gdb) cont
Continuing.
FPE! I got a signal: 8
psignal: Floating point exception
Program received signal SIGFPE, Arithmetic exception.
0x0804853a in main () at sigtest.c:30
30 printf("Hello World: %d\n",1/x);
(gdb) print x
$2 = 1
x is clearly now 1 and we still got a divide-by-zero - what's going on? Let's inspect the underlying assembler:
(gdb) disassemble
Dump of assembler code for function main:
0x080484ca : lea 0x4(%esp),%ecx
0x080484ce : and $0xfffffff0,%esp
...
0x08048533 : mov %eax,%ecx
0x08048535 : mov %edx,%eax
0x08048537 : sar $0x1f,%edx
0x0804853a : idiv %ecx <<-- address FPE occurred at
0x0804853c : mov %eax,0x4(%esp)
0x08048540 : movl $0x8048653,(%esp)
0x08048547 : call 0x8048384
0x0804854c : jmp 0x8048503
End of assembler dump.
One Google search later tells us that IDIV divides the signed value in EDX:EAX (EAX sign-extended into EDX by the preceding sar) by the source operand (ECX). You can probably guess the register contents:
(gdb) info registers
eax 0x1 1
ecx 0x0 0
...
You should use volatile int x to ensure that the compiler reloads x from memory each time through the loop. Given that your SIGINT handler works, this probably does not explain your specific problem, but if you try more complicated examples (or crank up the optimization) it will eventually bite you.
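A minimal sketch of the usual pattern for data shared with a handler, assuming the standard C type `volatile sig_atomic_t`, which is the type the C standard allows a signal handler to write. The names `got_sigint`, `on_sigint`, `install_sigint_flag` and `take_sigint_flag` are hypothetical. `raise()` delivers the signal synchronously, which makes the flag easy to check:

```c
#include <signal.h>

/* volatile: re-read from memory on every access, never cached in a register.
   sig_atomic_t: an integer type that is accessed atomically with respect to
   signal delivery. */
static volatile sig_atomic_t got_sigint = 0;

static void on_sigint(int sig)
{
    (void)sig;
    got_sigint = 1;            /* only set a flag; do the real work outside */
}

/* Installs the handler; returns 0 on success, -1 on error. */
int install_sigint_flag(void)
{
    return signal(SIGINT, on_sigint) == SIG_ERR ? -1 : 0;
}

/* Returns the current flag value and clears it. A signal arriving between
   the read and the clear would be lost, which is acceptable for a sketch. */
int take_sigint_flag(void)
{
    int seen = got_sigint;
    got_sigint = 0;
    return seen;
}
```

In the question's loop, `main` would test the flag once per iteration rather than relying on the handler to change `x` behind the compiler's back.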
After handling a signal raised while executing an instruction, the PC may return either to that instruction or to the following one. Which one it does is very CPU + OS specific. In addition, whether integer division by zero raises SIGFPE at all is also CPU + OS dependent.
At the CPU level, after taking an exception it makes the most sense to return to the offending instruction, so that once the OS has had the chance to do whatever it needs to (think of page faults or TLB misses), that instruction runs again. The OS may also have to correct the saved address: for instance, ARM CPUs point two instructions after the offending instruction, a testament to their original 3-stage pipeline, while MIPS CPUs point to the jump when taking an exception from an instruction in a jump delay slot.
At the OS level, there are several ways to handle exceptions:
Do the necessary handling (swap memory in, update page tables, etc...) and rerun the instruction.
Emulate that instruction, advance the PC accordingly and return to the next instruction. This allows emulation of unimplemented instructions (CPUs without an FPU or with an incomplete one, LL/SC on MIPS I CPUs, ...) and of unsupported alignment (after taking an alignment exception, the OS may decide to send a SIGBUS to the process, or to emulate the unsupported access, possibly while logging it).
Send a fatal signal to the process. The process may take the role of the OS here in handling the exception, using CPU + OS dependent methods, such as the siginfo method linked by Simonj.
A non-portable method to deal with SIGFPE is calling longjmp() from the signal handler, as in my answer to a similar question on SIGSEGV.
n1318 has more details than you ever wanted to know about calling longjmp() from a signal handler. Also note that POSIX specifies that longjmp() should work from non-nested signal handlers.
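A minimal sketch of that approach, assuming Linux on x86, where integer division by zero raises SIGFPE (on some architectures, e.g. AArch64, it does not trap at all). It uses `sigsetjmp`/`siglongjmp` rather than plain `setjmp`/`longjmp` so that the signal mask saved by `sigsetjmp` is restored when jumping out; otherwise SIGFPE would stay blocked after the first jump. An `SA_SIGINFO` handler records `si_code`, in the spirit of the siginfo method mentioned above. `checked_divide` and `last_fpe_code` are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf fpe_env;
static volatile sig_atomic_t last_fpe_code = 0;

static void fpe_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    last_fpe_code = info->si_code;  /* FPE_INTDIV for integer divide by zero */
    siglongjmp(fpe_env, 1);         /* leave without re-running the IDIV */
}

/* Divides n by d. Returns 0 and stores the quotient, or -1 if SIGFPE fired. */
int checked_divide(int n, int d, int *quotient)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fpe_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGFPE, &sa, &old) != 0)
        return -1;

    volatile int divisor = d;        /* volatile: force a real run-time divide */
    if (sigsetjmp(fpe_env, 1) != 0) {   /* 1: also save the signal mask */
        /* Reached via siglongjmp from fpe_handler. */
        sigaction(SIGFPE, &old, NULL);
        return -1;
    }
    volatile int q = n / divisor;    /* volatile: divide before restoring handler */
    sigaction(SIGFPE, &old, NULL);   /* restore the previous disposition */
    *quotient = q;
    return 0;
}
```

Jumping out of the handler means the faulting instruction is simply abandoned, which is exactly what returning from the handler cannot do.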
Consider the program:
main.c
#include <stdlib.h>

void my_asm_func(void);
__asm__(
    ".global my_asm_func;"
    "my_asm_func:;"
    "call abort;"
    "ret;"
);

int main(int argc, char **argv) {
    if (argv[1][0] == '0') {
        abort();
    } else if (argv[1][0] == '1') {
        __asm__("call abort");
    } else {
        my_asm_func();
    }
}
Which I compile as:
gcc -ggdb3 -O0 -o main.out main.c
Then I have:
$ ./main.out 0; echo $?
Aborted (core dumped)
134
$ ./main.out 1; echo $?
Aborted (core dumped)
134
$ ./main.out 2; echo $?
Segmentation fault (core dumped)
139
Why do I get the segmentation fault only for the last run, and not an abort signal as expected?
man 7 signal:
SIGABRT 6 Core Abort signal from abort(3)
SIGSEGV 11 Core Invalid memory reference
confirms the signals due to the 128 + SIGNUM rule.
As a sanity check I also tried to make other function calls from assembly as in:
#include <stdlib.h>
void my_asm_func(void);
__asm__(
".global my_asm_func;"
"my_asm_func:;"
"lea puts_message(%rip), %rdi;"
"call puts;"
"ret;"
"puts_message: .asciz \"hello puts\""
);
int main(void) {
my_asm_func();
}
and that did work and print:
hello puts
Tested in Ubuntu 19.04 amd64, GCC 8.3.0, glibc 2.29.
I also tried it in an Ubuntu 18.04 Docker container, and the results were the same, except that the program also prints the following when run:
./main.out: Symbol `abort' causes overflow in R_X86_64_PC32 relocation
./main.out: Symbol `abort' causes overflow in R_X86_64_PC32 relocation
which feels like a good clue.
In this code that defines a function at global scope (with basic assembly):
void my_asm_func(void);
__asm__(
".global my_asm_func;"
"my_asm_func:;"
"call abort;"
"ret;"
);
You violate one of the x86-64 (AMD64) System V ABI rules, which requires 16-byte stack alignment (possibly higher, depending on the parameters) at the point just before a CALL is made.
3.2.2 The Stack Frame
In addition to registers, each function has a frame on the run-time stack. This stack grows downwards from high
addresses. Figure 3.3 shows the stack organization.
The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed
on stack) byte boundary. In other words, the value (%rsp + 8) is
always a multiple of 16 (32) when control is transferred to the
function entry point. The stack pointer, %rsp, always points to the
end of the latest allocated stack frame.
Upon entry to a function the stack will be misaligned by 8 because the 8 byte return address is now on the stack. To align the stack back on a 16 byte boundary subtract 8 from RSP at the beginning of the function and add 8 back to RSP when finished. You can also just push any register like RBP at the beginning and pop it after to get the same effect.
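The arithmetic behind this rule can be traced with a small model. This is only a sketch that simulates `%rsp` values as plain integers (it does not read the real stack pointer); the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* SysV x86-64: %rsp must be a multiple of 16 just before a CALL executes,
   equivalently (%rsp + 8) % 16 == 0 once the callee has been entered. */
bool aligned_for_call(uint64_t rsp) { return rsp % 16 == 0; }

/* CALL pushes an 8-byte return address. */
uint64_t after_call(uint64_t rsp) { return rsp - 8; }

/* PUSH of a 64-bit register (or SUB $8, %rsp) moves %rsp down by 8. */
uint64_t after_push(uint64_t rsp) { return rsp - 8; }
```

Starting from an aligned caller, `after_call` yields a value that fails `aligned_for_call` (so calling abort straight from `my_asm_func` breaks the rule), and one `after_push` restores alignment.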
This version of the code should work:
void my_asm_func(void);
__asm__(
".global my_asm_func;"
"my_asm_func:;"
"push %rbp;"
"call abort;"
"pop %rbp;"
"ret;"
);
Regarding this code that happened to work:
__asm__("call abort");
The compiler likely generated the main function in such a way that the stack was aligned on a 16-byte boundary prior to this call, so it happened to work. You shouldn't rely on this behavior. There are other potential issues with this code that don't manifest as a failure in this case: the stack should be properly aligned before the call; you should be concerned in general about the red zone; and you should specify all the call-clobbered (volatile) registers of the calling convention as clobbers, including RAX/RCX/RDX/RSI/RDI/R8/R9/R10/R11, the FPU registers, and the SIMD registers. Since abort never returns, these issues don't affect your code in this case.
The red-zone is defined in the ABI this way:
The 128-byte area beyond the location pointed to by %rsp is considered to
be reserved and shall not be modified by signal or interrupt handlers. Therefore,
functions may use this area for temporary data that is not needed across function
calls. In particular, leaf functions may use this area for their entire stack frame,
rather than adjusting the stack pointer in the prologue and epilogue. This area is
known as the red zone.
It is generally a bad idea to call a function from inline assembly. An example of calling printf can be found in this other Stack Overflow answer, which shows the complexities of doing a CALL, especially in 64-bit code with a red zone. David Wohlferd's Dont Use Inline Asm is always a good read.
This code happened to work:
void my_asm_func(void);
__asm__(
".global my_asm_func;"
"my_asm_func:;"
"lea puts_message(%rip), %rdi;"
"call puts;"
"ret;"
"puts_message: .asciz \"hello puts\""
);
but you were probably lucky: puts didn't happen to depend on proper alignment, so you got no failure. You should align the stack before calling puts, as described earlier for the my_asm_func that called abort. Ensuring compliance with the ABI is the key to ensuring code will work as expected.
Regarding the relocation errors, they are probably because the version of Ubuntu being used has GCC generate Position Independent Code (PIC) by default. You could fix the issue by making the C library calls through the Procedure Linkage Table, by appending @plt to the function names you CALL. Peter Cordes wrote a related Stack Overflow answer on this topic.
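A minimal sketch combining both fixes, assuming x86-64 Linux with GCC or Clang and glibc. `my_abs_via_plt` is a hypothetical name; it calls the C library's `abs` through the PLT after realigning the stack with `sub $8, %rsp` (the push/pop of RBP shown earlier works equally well). The `int` argument is already in `%edi` on entry and `abs` returns its result in `%eax`, so no register shuffling is needed:

```c
#include <stdlib.h>   /* declares abs(); the asm calls the libc symbol directly */

int my_abs_via_plt(int x);

__asm__(
    ".text\n"
    ".globl my_abs_via_plt\n"
    ".type my_abs_via_plt, @function\n"
    "my_abs_via_plt:\n"
    "    sub $8, %rsp\n"      /* entry: %rsp % 16 == 8; now a multiple of 16 */
    "    call abs@PLT\n"      /* PLT call: no R_X86_64_PC32 overflow under PIE */
    "    add $8, %rsp\n"      /* undo the adjustment before returning */
    "    ret\n"
    ".size my_abs_via_plt, .-my_abs_via_plt\n"
);
```

This still leaves the caveats above about writing calls by hand; it only shows the alignment and PLT parts.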
I've got a program that's SIGSEGV'ing in library code. Nothing is jumping out at me when looking at the statement that's causing the SIGSEGV (see below). But the code uses Intel's AES-NI, and I'm not that familiar with it.
I issued handle all in hopes of catching the trap that's causing the SIGSEGV, but the program still just crashes rather than telling me the trap.
How can I get GDB to display the CPU trap that's causing the SIGSEGV?
Program received signal SIGSEGV, Segmentation fault.
0x00000000004ddf0b in CryptoPP::AESNI_Dec_Block(long long __vector&, long long __vector const*, unsigned int) (block=..., subkeys=0x7fffffffdc60, rounds=0x0)
at rijndael.cpp:1040
1040 block = _mm_aesdec_si128(block, subkeys[i+1]);
(gdb) p block
$1 = (__m128i &) @0x7fffffffcec0: {0x2e37c840668d6030, 0x431362358943e432}
(gdb) x/16b 0x7fffffffcec0
0x7fffffffcec0: 0x30 0x60 0x8d 0x66 0x40 0xc8 0x37 0x2e
0x7fffffffcec8: 0x32 0xe4 0x43 0x89 0x35 0x62 0x13 0x43
How can I get GDB to display the CPU trap that's causing the SIGSEGV
You can't: GDB doesn't get to see the trap, only the OS does.
What you can see is the instruction that caused the trap:
(gdb) x/i $pc
It's likely that the problem is alignment. I don't know what long long __vector is, but if it's not a 16-byte entity, then subkeys[i+1] is not going to be 16-byte aligned, which would be a problem for _mm_aesdec_si128, since it requires 16-byte alignment for both arguments.
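One way to test the alignment hypothesis is to check the pointer at run time. A minimal sketch in portable C11; `is_aligned16` and `alloc_aligned16` are hypothetical helpers, and `aligned_alloc` is used only to obtain a buffer that is known to be 16-byte aligned:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* True if p sits on a 16-byte boundary. */
bool is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Allocates n bytes on a 16-byte boundary. C11 aligned_alloc expects the
   size to be a multiple of the alignment, so round it up. Free with free(). */
void *alloc_aligned16(size_t n)
{
    size_t rounded = (n + 15u) & ~(size_t)15u;
    return aligned_alloc(16, rounded ? rounded : 16);
}
```

Asserting `is_aligned16(&subkeys[i+1])` just before the intrinsic, or evaluating something like `p (unsigned long)&subkeys[i+1] % 16` in the faulting frame in gdb, should quickly confirm or rule out this theory.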
These instructions are quite new (AES-NI). It could also be that the CPU doesn't support the instruction, or that the OS isn't configured to allow it. I know one would normally expect SIGILL in such a case, but x86 can be surprising in the exceptions it generates; in particular, if the OS has disabled use of an instruction that the CPU supports, SIGSEGV is quite common. (In case it's not clear from my tone, I'm just guessing here; I'm only saying that it is an avenue of investigation you might want to look into.)
I'm trying to run through a buffer overflow exercise, here is the code:
#include <stdio.h>

int badfunction() {
    char buffer[8];
    gets(buffer);
    puts(buffer);
}

int cantrun() {
    printf("This function cant run because it is never called");
}

int main() {
    badfunction();
}
This is a simple piece of code. The objective is to overflow the buffer in badfunction() and overwrite the return address so that it points to the memory address of the function cantrun().
Step 1: Find the offset of the return address (in this case it's 12 bytes: 8 for the buffer and 4 for the saved base pointer).
Step 2: Find the memory location of cantrun(); gdb says it's 0x0804849a.
When I run the program with printf "%012x\x9a\x84\x04\x08" | ./vuln, I get the error "illegal instruction". This suggests to me that I have correctly overwritten the saved EIP, but that the memory location of cantrun() is incorrect.
I am using Kali Linux, Kernel 3.14, I have ASLR turned off and I am using execstack to allow an executable stack. Am I doing something wrong?
UPDATE:
As a shot in the dark, I tried to find the correct instruction by moving the address around, and 0x0804849b does the trick. Why is this different from what GDB shows? When running GDB, 0x0804849a is the location of the prologue instruction push ebp and 0x0804849b is the prologue instruction mov ebp,esp.
gdb doesn't do anything to change the locations of functions in the programs it executes. ASLR may matter, but by default gdb turns this off to enable simpler debugging.
It's hard to say why you are seeing the results you are. What does disassembling the function in gdb show?
I am trying to execute the privileged instruction rdmsr in user mode, and I expect to get some kind of privilege error, but I get a segfault instead. I have checked the asm and I am loading 0x186 into ecx, which is supposed to be PERFEVTSEL0, based on the manual, page 1171.
What is the cause of the segfault, and how can I modify the code below to fix it?
I want to resolve this before hacking a kernel module, because I don't want this segfault to blow up my kernel.
Update: I am running on Intel(R) Xeon(R) CPU X3470.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <sched.h>
#include <assert.h>

uint64_t
read_msr(int ecx)
{
    unsigned int a, d;
    /* RDMSR reads the MSR selected by ECX into EDX:EAX */
    __asm __volatile("rdmsr" : "=a"(a), "=d"(d) : "c"(ecx));
    return ((uint64_t)a) | (((uint64_t)d) << 32);
}

int main(int ac, char **av)
{
    uint64_t start, end;
    cpu_set_t cpuset;
    unsigned int c = 0x186;
    int i = 0;

    /* pin to CPU i; keep the call outside assert() so NDEBUG can't drop it */
    CPU_ZERO(&cpuset);
    CPU_SET(i, &cpuset);
    int rc = sched_setaffinity(0, sizeof(cpuset), &cpuset);
    assert(rc == 0);
    /* PRIu64 is the portable format for uint64_t (%lu is not always right) */
    printf("%" PRIu64 "\n", read_msr(c));
    return 0;
}
The question I will try to answer: why does the above code cause SIGSEGV instead of SIGILL, even though the code has no memory error, but an illegal instruction (a privileged instruction called from non-privileged user space)?
I would expect to get a SIGILL with si_code ILL_PRVOPC instead of a segfault, too. Your question is currently 3 years old and today, I stumbled upon the same behavior. I am disappointed too :-(
What is the cause of the segfault
The cause seems to be that the Linux kernel code decides to send SIGSEGV. Here is the responsible function:
http://elixir.free-electrons.com/linux/v4.9/source/arch/x86/kernel/traps.c#L487
Have a look at the last line of the function.
In your follow up question, you got a list of other assembly instructions which get propagated as SIGSEGV to userspace though they are actually general protection faults. I found your question because I triggered the behavior with cli.
and how can I modify the code below to fix it?
As of Linux kernel 4.9, I'm not aware of any reliable way to distinguish between a memory error (what I would expect to be a SIGSEGV) and a privileged instruction error from userspace.
There may be very hacky and unportable ways to distinguish these cases. When a privileged instruction causes a SIGSEGV, the siginfo_t si_code is set to a value that is not directly listed in the SIGSEGV section of man 2 sigaction. The documented values are SEGV_MAPERR, SEGV_ACCERR and SEGV_PKUERR, but I get SI_KERNEL (0x80) on my system. According to the man page, SI_KERNEL is a code "which can be placed in si_code for any signal". In strace, you see SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0}. The responsible kernel code is here.
It would also be possible to grep dmesg for this string.
Please, never ever use those two methods to distinguish between GPF and memory error on a production system.
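Strictly as a debugging aid, the si_code method can be sketched as follows, assuming Linux with glibc (SI_KERNEL is Linux-specific). `classify_segv`, `segv_handler` and `install_segv_classifier` are hypothetical names:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Maps a SIGSEGV si_code to a short description. SI_KERNEL is the code
   observed above for privileged instructions (rdmsr, cli) run in user mode. */
const char *classify_segv(int si_code)
{
    switch (si_code) {
    case SEGV_MAPERR: return "address not mapped";
    case SEGV_ACCERR: return "invalid permissions";
    case SI_KERNEL:   return "kernel-generated (possibly #GP)";
    default:          return "other";
    }
}

static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    const char *msg = classify_segv(info->si_code);
    /* write() and strlen() are async-signal-safe; printf() is not. */
    ssize_t ignored = write(STDERR_FILENO, msg, strlen(msg));
    ignored = write(STDERR_FILENO, "\n", 1);
    (void)ignored;
    /* Do not return: that would re-execute the faulting instruction. */
    _exit(128 + SIGSEGV);
}

/* Installs segv_handler for SIGSEGV; returns 0 on success, -1 on error. */
int install_segv_classifier(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling `install_segv_classifier()` at the start of the rdmsr program should print the SI_KERNEL description before exiting, instead of a bare "Segmentation fault".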
Specific solution for your code: Just don't run rdmsr from user space. But this answer is really unsatisfying if you are looking for a generic way to figure out why a program received a SIGSEGV.
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf( "%d\n" , 1/fork() );
}
When I run this program, the output is: 0.
I know that in the parent fork() returns a positive number (the child's PID), and in the child it returns 0.
So why don't I get any problem dividing 1/0 ?
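Actually, the 1/0 arithmetic exception does occur, but it is not printed on the console: it kills the child process, and the shell only reports the status of the process it started (the parent).
Set the core file size to unlimited and you will see the core file: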
$ ulimit -c unlimited
Then use gdb to see the arithmetic exception:
$ gdb a.out core
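The parent can also observe the child's crash directly. A minimal sketch, assuming x86 Linux (where integer division by zero raises SIGFPE); `child_divide_by_fork_result` is a hypothetical name. It reports the child's fate with `waitpid` and the `WIFSIGNALED`/`WTERMSIG` macros:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that computes 1 / (fork()'s return value in the child),
   i.e. 1 / 0. Returns the signal that killed the child, 0 if the child
   exited normally, or -1 on error. */
int child_divide_by_fork_result(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        volatile int zero = (int)pid;  /* 0 in the child; volatile keeps the divide */
        volatile int r = 1 / zero;     /* raises SIGFPE on x86 Linux */
        (void)r;
        _exit(0);                      /* reached only if the division did not trap */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}
```

On x86 Linux this should return SIGFPE (8), which is the signal the shell never gets to report for the original program.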
The compiler is transforming your code into more elementary steps
(you could pass the -fdump-tree-all option to GCC, or use MELT graphical probe to look into some intermediate GCC representations)
So basically the compiler is transforming your code into something like
int main()
{
int t1 = fork();
int t2 = 1 / t1;
printf("%d\n", t2);
}
So if t1 gets 0 (in the child process), the division producing t2 is undefined behavior, which usually crashes with a division by zero (i.e. a synchronous SIGFPE signal), and the printf is never reached.
On a PowerPC processor, where an integer division by zero does not crash, the behavior (still undefined) would probably be different.
BTW, you should run your program with strace -f to understand what syscalls & signals are involved.