I've got a program that's SIGSEGV'ing in library code. Nothing is jumping out at me when looking at the statement that's causing the SIGSEGV (see below). But the code uses Intel's AES-NI, and I'm not that familiar with it.
I issued handle all in hopes of catching the trap that's causing the SIGSEGV, but the program still just crashes rather than telling me the trap.
How can I get GDB to display the CPU trap that's causing the SIGSEGV?
Program received signal SIGSEGV, Segmentation fault.
0x00000000004ddf0b in CryptoPP::AESNI_Dec_Block(long long __vector&, long long __vector const*, unsigned int) (block=..., subkeys=0x7fffffffdc60, rounds=0x0)
at rijndael.cpp:1040
1040 block = _mm_aesdec_si128(block, subkeys[i+1]);
(gdb) p block
$1 = (__m128i &) @0x7fffffffcec0: {0x2e37c840668d6030, 0x431362358943e432}
(gdb) x/16b 0x7fffffffcec0
0x7fffffffcec0: 0x30 0x60 0x8d 0x66 0x40 0xc8 0x37 0x2e
0x7fffffffcec8: 0x32 0xe4 0x43 0x89 0x35 0x62 0x13 0x43
How can I get GDB to display the CPU trap that's causing the SIGSEGV?
You can't: GDB doesn't get to see the trap, only the OS does.
What you can see is the instruction that caused the trap:
(gdb) x/i $pc
It's likely that the problem is alignment. I don't know what long long __vector is, but if it's not a 16-byte entity, then subkeys[i+1] is not going to be 16-byte aligned, which would be a problem for _mm_aesdec_si128, since it requires 16-byte alignment for both arguments.
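To test the alignment theory, a quick check on the pointers can help. This is only a minimal sketch: `is_aligned16` and `require_aligned16` are hypothetical helper names, not part of Crypto++ or the intrinsics headers.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if p sits on a 16-byte boundary, 0 otherwise. */
int is_aligned16(const void *p) {
    return ((uintptr_t)p & 0xF) == 0;
}

/* Hypothetical debug guard: aborts (unless NDEBUG is defined) when the
   pointer is not 16-byte aligned. Meant to be called right before the
   intrinsic that needs the alignment. */
void require_aligned16(const void *p) {
    assert(is_aligned16(p));
}
```

Note that in the backtrace above both `block` (0x7fffffffcec0) and `subkeys` (0x7fffffffdc60) end in a zero hex digit, so those two base addresses would pass this check; it is the address actually dereferenced at the faulting instruction that matters.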
These instructions are quite new (they belong to the AES-NI extension). It's also possible that the CPU doesn't support the instruction, or that the OS isn't configured to allow it. One would normally expect SIGILL in such a case, but x86 can be surprising in the exceptions it generates; in particular, if the OS has disabled use of an instruction that the CPU supports, SIGSEGV is quite common. (In case it's not clear from my tone, I'm just guessing here; this is only an avenue of investigation you might want to look into.)
In x86, I understand multi-byte objects are stored in memory little endian style.
Now generally speaking, when it comes to CPU instructions, the opcode determines the purpose of the instruction, and data/memory addresses may follow the opcode in its encoded format. My understanding is that the opcode portion of the instruction should be the most significant byte and thus appear at the highest address of any given instruction encoding.
Can someone explain the memory layout on this x86 linux gdb example? I would imagine that the opcode 0xb8 would appear at a higher address due to it being the most significant byte.
(gdb) disassemble _start
Dump of assembler code for function _start:
0x08048080 <+0>: mov eax,0x11223344
(gdb) x/1xb _start+0
0x8048080 <_start>: 0xb8
(gdb) x/1xb _start+1
0x8048081 <_start+1>: 0x44
(gdb) x/1xb _start+2
0x8048082 <_start+2>: 0x33
(gdb) x/1xb _start+3
0x8048083 <_start+3>: 0x22
(gdb) x/1xb _start+4
0x8048084 <_start+4>: 0x11
It appears the instruction mov eax, 0x11223344 is encoded as 0x11 0x22 0x33 0x44 0xb8.
Questions.
1.) How does the CPU know how many bytes the instruction will take up if the first byte it sees is not an opcode?
2.) I'm wondering if perhaps x86 CPU instructions do not even have endianness and are considered some type of string? (probably way off here)
x86 is a variable-length instruction set. You start with a single byte, which has no endianness; it is wherever it is.
Then, depending on the opcode, there may be more bytes, and those might for example be a 32-bit immediate. If that group of bytes is an immediate or an address of some sort, THOSE bytes will be little endian. Say you have the five bytes ABCDE (no endianness, think array or string). The A byte is the opcode, the B byte would then be the lower 8 bits of the immediate, and the E byte the upper 8 bits of the immediate.
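As a small illustration of that layout, here is a sketch in C that decodes the five bytes from the question. The function name `decode_mov_eax_imm32` is made up for this example, and it only handles the single opcode 0xB8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes "mov eax, imm32": opcode byte 0xB8 followed by a 32-bit
   immediate stored least significant byte first.
   Returns the instruction length (5) on success, 0 if the first byte
   is not 0xB8. */
size_t decode_mov_eax_imm32(const uint8_t *code, uint32_t *imm) {
    assert(code != NULL && imm != NULL);
    if (code[0] != 0xB8)
        return 0;
    /* Bytes 1..4 are the immediate, lowest 8 bits first. */
    *imm = (uint32_t)code[1]
         | (uint32_t)code[2] << 8
         | (uint32_t)code[3] << 16
         | (uint32_t)code[4] << 24;
    return 5;
}
```

Feeding it the bytes b8 44 33 22 11 in address order, exactly as GDB shows them at _start+0 through _start+4, yields the immediate 0x11223344.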
Opcode is a hard term to use. In these older 8/16-bit CISC processors like x86, the entire byte was an opcode that you basically looked up in a table to see what it meant (and inside the processor they did use a table to figure out how to execute it). When you look at MIPS, ARM, or other (certainly RISC) instruction sets like those, only a portion of the 32 bits is the "opcode", and in neither of those cases is it the same set of bits from one instruction to another; you have to look at various places in the instruction (yes, there is overlap to make the decoding sane). MIPS is a lot more consistent: there is one blob in one place that you look at, but one of those patterns requires you to look at another blob of bits to fully decode. With ARM you start at a common bit and decode further as you work your way across, and then you may have to grab some random-looking spots to fully decode. The rest of the bits are operands: which register to use, an immediate, or whatever, the kind of thing that in a CISC you needed a lookup table for (it is implied by the opcode but not defined by bits in the opcode).
1) The next byte after the prior instruction will be interpreted as an opcode even if it was not intended to be one (if execution continues to that byte and doesn't branch). I don't remember my x86 table offhand well enough to know whether there are any undefined instructions. If the byte is undefined, it will hit a handler; otherwise the CPU will decode what it finds as machine code, and if these are not properly formed instructions it will likely crash. Sometimes you get lucky and it just messes something up and keeps going, or even luckier and you can't tell that it almost crashed.
2) You are right: for these 8/16-bit CISC or similar instruction sets, the instructions are treated more like strings that you parse through linearly.
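The "parse it linearly like a string" idea can be sketched with a toy length table. This is an illustration only: the table covers a handful of one-byte opcodes as they behave in 32-bit mode without prefixes (NOP, RET, INT imm8, JMP rel8, MOV r32, imm32), and the names `insn_length` and `count_instructions` are invented here. A real x86 decoder has to deal with prefixes, ModRM/SIB bytes and much more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy lookup: total instruction length implied by the first byte.
   Returns 0 for anything this sketch does not know about. */
static size_t insn_length(uint8_t opcode) {
    if (opcode >= 0xB8 && opcode <= 0xBF)
        return 5;               /* mov r32, imm32: opcode + 4 bytes */
    switch (opcode) {
    case 0x90: return 1;        /* nop */
    case 0xC3: return 1;        /* ret */
    case 0xCD: return 2;        /* int imm8 */
    case 0xEB: return 2;        /* jmp rel8 */
    default:   return 0;        /* not in this toy table */
    }
}

/* Walks the byte stream front to back, one instruction at a time.
   Returns the number of instructions, or -1 on an unknown opcode or
   a truncated instruction. */
int count_instructions(const uint8_t *code, size_t len) {
    assert(code != NULL);
    size_t pos = 0;
    int count = 0;
    while (pos < len) {
        size_t n = insn_length(code[pos]);
        if (n == 0 || n > len - pos)
            return -1;
        pos += n;
        count++;
    }
    return count;
}
```

With the bytes b8 44 33 22 11 cd 80 c3 this walks three instructions (mov, int, ret). Starting one byte late, at 0x44, the walk is out of sync with what the programmer intended, which is the situation described in point 1.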
Here is a description of my situation: I have to take care of a bug in our product. The thread is created as joinable; it must do its work and terminate, and nobody will call pthread_join() for it. So the thread is created with the JOINABLE attribute (by default), and before termination it calls the following code:
{ pthread_detach(pthread_self()); pthread_exit(NULL); }
It works like a charm on all 32 bit linux distros I met, but it causes SIGSEGV on 64 bit distros (Ubuntu 13.04 x86_64 and Debian). I didn't try with Slackware. Here is a core:
Core was generated by `IsaVM -s=1 -PrjPath="/home/taf/Linux_Fov_540148/Cmds" -stgMode=1 -PR -Failover'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f5911a7c009 in pthread_detach () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007f5911a7c009 in pthread_detach () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x000000000041310d in _kerCltDownloadThr (StartParams=0x6bfce0 <RESFOV>) at ./dker0clt.c:1258
#2 0x00007f5911a7ae9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f591159f3fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000000000 in ?? ()
I figured out how to fix this bug: I set the PTHREAD_CREATE_DETACHED attribute (with pthread_attr_setdetachstate()) for the thread before it is created, and it works as expected.
But my question - is it a crime to call this code?
{ pthread_detach(pthread_self()); pthread_exit(NULL); }
Does pthread_detach() do something asynchronously after call and that causes pthread_exit() to bring problems? But the crash point is pthread_detach() not pthread_exit()! I don't understand the reason for this crash completely! Why does it work on 32 bits? Is it a race condition somewhere in the pthread implementation?
pthread_join() is never called for this thread.
Thanks in advance for any ideas.
A thread detaching itself does not feel right. It is normally the responsibility of the thread that called pthread_create(), which can create a detached thread if necessary.
It could be that the thread has already been detached, because attempting to detach an already detached thread results in unspecified behaviour.
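For reference, creating the thread detached from the start (so that it never needs to call pthread_detach() on itself) could look roughly like this. It is a minimal sketch: `start_detached` and `demo_worker` are hypothetical names, and error handling is reduced to passing the return code back.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Creates a thread that is detached from the moment it starts.
   Returns 0 on success or the error number reported by pthreads. */
int start_detached(void *(*fn)(void *), void *arg) {
    assert(fn != NULL);
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Hypothetical worker, present only so the sketch can be exercised. */
void *demo_worker(void *arg) {
    (void)arg;
    return NULL;
}
```

A call such as `start_detached(demo_worker, NULL)` is then all the creating thread does; the worker simply returns (or calls pthread_exit()) when finished. Depending on the toolchain this may need to be built with `-pthread`.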
My top wild guesses would be:
The thread gets detached more than once. As a quick check I would try setting a breakpoint on pthread_detach in gdb to see whether the same thread id gets passed to this function more than once. If it is difficult to run your application under gdb, another option is to override pthread_create and pthread_detach and track thread ids to detect a double detach. See http://hackerboss.com/overriding-system-functions-for-fun-and-profit/
Memory corruption. valgrind may help you detect memory corruption if it is possible to run your application under it. Alternatively, try instrumenting your application with run-time error checks by compiling with -fstack-protector-all, -fsanitize=address, or -fsanitize=thread if you use gcc. The clang compiler also has an array of options to detect such errors; see the sanitizers on http://clang.llvm.org/docs/index.html.
I finished my research using the approaches offered by the respectable @MaximYegorushkin. AddressSanitizer showed me one buffer overflow in our product, but it isn't related to my problem (I will definitely fix it later; it is always good to have such a wise tool to hunt bugs). So I decided to override all the necessary pthread_xxx functions with the LD_PRELOAD method. I ran a simple test to be sure my library works as expected:
[HACK] Loading pthread hack.
Starting thread...!
[HACK] pthread_create: thread=7FAC6C86D700
Waiting for 2 seconds...
[HACK] pthread_self: thread=7FAC6C86D700
thread_func: thread id = 7FAC6C86D700
Thread: sin(3.26) = -0.121109
[HACK] pthread_self: thread=7FAC6C86D700
[HACK] pthread_detach: thread=7FAC6C86D700
Terminating...
All lines starting with [HACK] are produced by my threadhack.so library.
Then I ran my project with this library, and it pointed me to exactly where the problem is:
Code executed: { pthread_detach(pthread_self()); pthread_exit(NULL); }
Debug traces:
[HACK] pthread_create: thread=7F403251CB00
.....
[HACK] pthread_self: thread=7F403251CB00
[HACK] pthread_detach: thread=3251CB00
So we see that pthread_self returns a good thread id, but pthread_detach receives it already mangled (cut to 32 bits). How could this be? I generated assembler code for both my simple working test application (as a reference) and for my project:
Reference application:
call pthread_self
movq %rax, %rdi
call pthread_detach
movl $0, %edi
call pthread_exit
So we see here that a movq instruction is used to copy the 64-bit thread id (movq %rax, %rdi). OK, let's check what GCC generated for my project:
movl $0, %eax
call pthread_self
movl %eax, %edi
movl $0, %eax
call pthread_detach
movl $0, %edi
movl $0, %eax
call pthread_exit
Whoa! We have two movl instructions (32-bit): one copies the least significant 32 bits (movl %eax, %edi), and instead of the most significant part it always puts zero (movl $0, %eax)! So this is the reason for the mangled thread id. I have no idea why the code is so different; the compilation flags are the same. I saw this bug in GCC 4.7 and I see it in GCC 4.8 (latest package from Ubuntu 13.10 x86_64).
So at least now I see what is happening. Thanks to @Maxim and brilliant tools. I learned a new thing again.
P.S. I don't know how to submit a bug report to the GCC team. I can't reproduce the problem on a small simple application and I can't hand them my project because it is a proprietary software and I'm NDA-ed to not distribute it.
My guess is that you don't have the prototype for either pthread_detach or pthread_self in the code that invokes pthread_detach(pthread_self()). Without the prototype, the compiler will assume the argument is an int (pthread_detach) or that the function returns an int (pthread_self).
Thinking it through further, though, I suspect pthread_self more strongly: it is either undeclared (and so assumed to return an int) or declared incorrectly as returning an int. The compiler then correctly extends this to a 64-bit integer by adding 32 leading zero bits.
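The effect is easy to model without involving pthreads at all. The sketch below is only a model of what happens to the value, not the code GCC emits; `through_32bit` and `check_against_trace` are names made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Models a 64-bit value that is squeezed through a 32-bit quantity
   (as when a function is assumed to return int) and then zero-extended
   back to 64 bits. */
uint64_t through_32bit(uint64_t value) {
    uint32_t low = (uint32_t)value;   /* upper 32 bits are dropped here */
    return (uint64_t)low;             /* zero-extended back to 64 bits */
}

/* Checks the model against the ids printed in the [HACK] trace. */
void check_against_trace(void) {
    assert(through_32bit(0x7F403251CB00ULL) == 0x3251CB00ULL);
}
```

This reproduces the pair of ids in the trace: pthread_self reports 7F403251CB00 and pthread_detach receives 3251CB00. If a missing prototype is indeed the cause, including <pthread.h> in that source file should fix it, and building with GCC's -Wall (or -Werror=implicit-function-declaration) should flag the implicit declaration.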
1. Problem Background
Recently a core dump occurred on one of our online search servers. The crash happens in memset() due to an attempt to write to an invalid address, which raised SIGSEGV. The following information is from dmesg:
is_searcher_ser[17405]: segfault at 000000002c32a668 rip 0000003da0a7b006 rsp 0000000053abc790 error 6
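As a side note, the trailing "error 6" in that kernel line is the x86 page-fault error code, and its low three bits can be decoded as sketched below. The struct and function names are invented for this example; only the three lowest bits are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Meaning of the three low bits of the x86 page-fault error code. */
struct pf_info {
    int protection;  /* bit 0: 1 = protection violation, 0 = page not present */
    int write;       /* bit 1: 1 = write access, 0 = read access */
    int user;        /* bit 2: 1 = fault in user mode, 0 = kernel mode */
};

/* Fills *out from the numeric error code printed by the kernel. */
void decode_pf_error(unsigned long code, struct pf_info *out) {
    assert(out != NULL);
    out->protection = (code & 1UL) != 0;
    out->write      = (code & 2UL) != 0;
    out->user       = (code & 4UL) != 0;
}
```

For code 6 this gives a user-mode write to a page that is not present, which matches the description of memset() writing to an unmapped address.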
The environment of our on-line servers goes as follows:
OS: RHEL 5.3
Kernel: 2.6.18-131.el5.custom, x86_64 (64-bit)
GCC: 4.1.2 20080704 (Red Hat 4.1.2-44)
Glibc: glibc-2.5-49.6
The following is the relevant code snippet:
CHashMap<…>::CHashMap(…)
{
…
typedef HashEntry *HashEntryPtr;
m_ppEntry = new HashEntryPtr[m_nHashSize]; // m_nHashSize is 389 when core
assert(m_ppEntry != NULL);
memset(m_ppEntry, 0x0, m_nHashSize*sizeof(HashEntryPtr)); // Core in this memset() invocation
…
}
The assembly code of the above code is:
…
0x000000000091fe9e <+110>: callq 0x502638 <_Znam@plt> // new HashEntryPtr[m_nHashSize]
0x000000000091fea3 <+115>: mov 0xc(%rbx),%edx // Get the value of m_nHashSize
0x000000000091fea6 <+118>: mov %rax,%rdi // Put m_ppEntry pointer to %rdi for later memset invocation
0x000000000091fea9 <+121>: mov %rax,0x20(%rbx) // Store the pointer to m_ppEntry member variable(%rbx holds the this pointer)
0x000000000091fead <+125>: xor %esi,%esi // Generate 0
0x000000000091feaf <+127>: shl $0x3,%rdx // m_nHashSize*sizeof(HashEntryPtr)
0x000000000091feb3 <+131>: callq 0x502b38 <memset@plt> // Call the memset() function
…
In the core dump, the assembly of memset@plt is:
(gdb) disassemble 0x502b38
Dump of assembler code for function memset@plt:
0x0000000000502b38 <+0>: jmpq *0x771b92(%rip) # 0xc746d0 <memset@got.plt>
0x0000000000502b3e <+6>: pushq $0x53
0x0000000000502b43 <+11>: jmpq 0x5025f8
End of assembler dump.
(gdb) x/ag 0x0000000000502b3e+0x771b92
0xc746d0 <memset@got.plt>: 0x3da0a7acb0 <memset>
(gdb) disassemble 0x3da0a7acb0
Dump of assembler code for function memset:
0x0000003da0a7acb0 <+0>: cmp $0x1,%rdx
0x0000003da0a7acb4 <+4>: mov %rdi,%rax
…
From the above GDB analysis, we know that the address of memset() has already been resolved in the PLT/GOT. That is to say, the first jmpq *0x771b92(%rip) jumps directly to the first instruction of memset(). Besides, the program had been running online for nearly a day, so the address of memset() should have been resolved much earlier.
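The address arithmetic behind the x/ag command above is the usual RIP-relative rule: the displacement is added to the address of the next instruction. A small sketch, with `rip_relative_target` as a made-up helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Computes the memory address referenced by a RIP-relative operand:
   address of the next instruction plus the signed 32-bit displacement. */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp) {
    assert(insn_len > 0 && insn_len <= 15);  /* x86 instructions are at most 15 bytes */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

For the jmpq at 0x502b38, which is 6 bytes long (the next instruction starts at 0x502b3e), with displacement 0x771b92, this gives 0xc746d0, the GOT slot that GDB labels as belonging to memset.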
2. Weird phenomenon
The crash fired at the instruction => 0x0000003da0a7b006 <+854>: mov %rdx,-0x8(%rdi) in memset(). This is the instruction in memset() that writes the 0 at the very beginning of the buffer passed as its first parameter.
At the time of the crash, in frame 0, the value of $rdi is 0x2c32a670 and $rax is 0x2c32a668. From the assembly analysis and offline tests, $rax should hold the destination buffer of memset(), i.e., its first parameter.
So, in our example, $rax should be the same as the address in m_ppEntry, which is stored into the this object (the this pointer is held in %rbx) before the buffer is zeroed by memset(). However, the value of m_ppEntry is 0x2ab02c32a668.
Checking with the GDB command info files shows that address 0x2c32a668 is indeed invalid (not mapped), while 0x2ab02c32a668 is a valid address.
3. Why is it weird?
The weird part of this crash is this: if the real address of memset() has already been resolved (very, very probably), then there are only a few instructions between storing the pointer value into m_ppEntry and the attempt to memset it. And the value of register $rax (holding the passed buffer address) is not changed at all by these instructions. So how can m_ppEntry not be equal to $rax?
Even weirder: at the time of the crash, the value of $rax (0x2c32a668) is exactly the lower 4 bytes of m_ppEntry (0x2ab02c32a668). If there is indeed some relationship between the two values, is the m_ppEntry parameter passed to memset() being truncated? However, the instructions involved all use %rax rather than %eax. By the way, I cannot reproduce this issue offline.
So,
1) Which address is valid? If 0x2c32a668 is valid, was the heap corrupted in between those few instructions? How can we then explain that the value of m_ppEntry is 0x2ab02c32a668, and why are the low 4 bytes of the two values the same?
2) If 0x2ab02c32a668 is valid, why is the address truncated when passed into the 64-bit memset()? Under which conditions can this error occur? I cannot reproduce it offline. Is this a known bug? I didn't find it through Google.
3) Or is it due to some hardware or power issue that zeroed the upper 4 bytes of the %rdi passed to memset()? (I'm very, very reluctant to believe this.)
Finally, any comment on this crash is appreciated.
Thanks,
Gary Hu
I'm assuming most of the time this code works fine, given your mention of one day's running.
I agree signals are worth inspecting, it does look suspiciously like pointer truncation is happening somewhere else.
The only other thing I can think of is that it could be an issue with the new. Is there any possibility that on occasion you could end up calling an overloaded new operator?
Also, for completeness, what is the declaration of m_ppEntry?
I'm assuming you're using a no throw new otherwise the assert(m_ppEntry != NULL); would be meaningless.
I'm trying to understand this ARM assembly. I get a SIGSTOP signal for this, so something is going wrong here. I'll keep trying, but I need some help.
afd0c750: push {r4, r7}
afd0c754: mov r7, #252 ; 0xfc // What is this? I think it is preparing for the SWI, which needs r7 to hold this value.
afd0c758: svc 0x00000000
afd0c75c: pop {r4, r7}
afd0c760: movs r0, r0
afd0c764: bxpl lr
afd0c768: b 0xafd3896c
SIGSTOP is sent to a process to suspend it for later resumption; it is not an error condition, nor is it directly generated by one.
The most likely scenario is that your process has received SIGSTOP whilst the thread is executing in the kernel - and most likely blocked there. Your backtrace will show the SVC instruction as the last executed on the user-stack as this is a user-space to kernel transition.
It is not clear from your description how the SIGSTOP is manifested. A likely candidate is gdb being the source of it.
The full backtrace would be very useful here.
I have the program below:
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int x = 1;

void ouch(int sig) {
    printf("OUCH! dividing by zero!\n");
    x = 0;
}

void fpe(int sig) {
    printf("FPE! I got a signal: %d\n",sig);
    psignal(sig, "psignal");
    x = 1;
}

int main(void) {
    (void) signal(SIGINT, ouch);
    (void) signal(SIGFPE, fpe);
    while(1)
    {
        printf("Hello World: %d\n",1/x);
        sleep(1);
    }
}
Problem: while this program is running, when I send SIGINT to it from the terminal, "OUCH! dividing by zero!" is printed, as expected.
The next message is
"FPE! I got a signal: 8
psignal: Floating point exception"
and this message repeats forever. My question: in the fpe signal handler I set x back to 1, so I expected Hello World to be displayed in the output again.
Below is a transcript of the output I am getting:
Hello World: 1
Hello World: 1
^COUCH! dividing by zero!
FPE! I got a signal: 8
psignal: Floating point exception
FPE! I got a signal: 8
psignal: Floating point exception
FPE! I got a signal: 8
psignal: Floating point exception
^COUCH! dividing by zero!
.
.
.
.
When the signal handler is entered, the program counter (CPU register pointing at the currently executing instruction) is saved where the divide-by-zero occurred. Ignoring the signal restores the PC to exactly the same place, upon which the signal is triggered again (and again, and again).
The value or volatility of 'x' is irrelevant by this point - the zero has been transferred into a CPU register in readiness to perform the divide.
man 2 signal notes that:
According to POSIX, the behaviour of a process is undefined after it ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by the kill(2) or the raise(3) functions. Integer division by zero has undefined result. On some architectures it will generate a SIGFPE signal. (Also dividing the most negative integer by -1 may generate SIGFPE.) Ignoring this signal might lead to an endless loop.
We can see this in gdb if you compile with the debug flag:
simon@diablo:~$ gcc -g -o sigtest sigtest.c
simon@diablo:~$ gdb sigtest
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
By default gdb won't pass SIGINT to the process - change this so it sees the first signal:
(gdb) handle SIGINT pass
SIGINT is used by the debugger.
Are you sure you want to change it? (y or n) y
Signal Stop Print Pass to program Description
SIGINT Yes Yes Yes Interrupt
Off we go:
(gdb) run
Starting program: /home/simon/sigtest
x = 1
Hello World: 1
Now let's interrupt it:
^C
Program received signal SIGINT, Interrupt.
0xb767e17b in nanosleep () from /lib/libc.so.6
and onwards to the divide:
(gdb) cont
Continuing.
OUCH! dividing by zero!
x = 0
Program received signal SIGFPE, Arithmetic exception.
0x0804853a in main () at sigtest.c:30
30 printf("Hello World: %d\n",1/x);
Check the value of 'x', and continue:
(gdb) print x
$1 = 0
(gdb) cont
Continuing.
FPE! I got a signal: 8
psignal: Floating point exception
Program received signal SIGFPE, Arithmetic exception.
0x0804853a in main () at sigtest.c:30
30 printf("Hello World: %d\n",1/x);
(gdb) print x
$2 = 1
x is clearly now 1 and we still got a divide-by-zero - what's going on? Let's inspect the underlying assembler:
(gdb) disassemble
Dump of assembler code for function main:
0x080484ca : lea 0x4(%esp),%ecx
0x080484ce : and $0xfffffff0,%esp
...
0x08048533 : mov %eax,%ecx
0x08048535 : mov %edx,%eax
0x08048537 : sar $0x1f,%edx
0x0804853a : idiv %ecx <<-- address FPE occurred at
0x0804853c : mov %eax,0x4(%esp)
0x08048540 : movl $0x8048653,(%esp)
0x08048547 : call 0x8048384
0x0804854c : jmp 0x8048503
End of assembler dump.
One Google search later tells us that IDIV divides the value in the EAX register by the source operand (ECX). You can probably guess the register contents:
(gdb) info registers
eax 0x1 1
ecx 0x0 0
...
You should use volatile int x to ensure that the compiler reloads x from memory each time through the loop. Given that your SIGINT handler works, this probably does not explain your specific problem, but if you try more complicated examples (or crank up the optimization) it will eventually bite you.
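A related point: the portable pattern for talking to the main loop from a handler is a `volatile sig_atomic_t` flag, with the handler doing nothing else (printf, as used in the handlers above, is not on the list of async-signal-safe functions). A minimal sketch, where `install_flag_handler` and `take_flag` are names invented for this example:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Flag shared between the handler and normal code. */
static volatile sig_atomic_t got_signal = 0;

/* Handler: only sets the flag, nothing that is not async-signal-safe. */
static void on_signal(int sig) {
    (void)sig;
    got_signal = 1;
}

/* Installs the flag-setting handler for signo. Returns 0 on success. */
int install_flag_handler(int signo) {
    assert(signo > 0);
    struct sigaction sa;
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}

/* Returns 1 if the signal arrived since the last call, and clears the flag. */
int take_flag(void) {
    int was_set = got_signal;
    got_signal = 0;
    return was_set;
}
```

The main loop would call `take_flag()` once per iteration and do its printing there. This suits asynchronous signals such as SIGINT; it does not help with the SIGFPE loop discussed in this thread, where the faulting instruction is simply re-executed.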
After handling a signal raised while executing an instruction, the PC may return either to that instruction or to the following one. Which one it does is very CPU + OS specific. In addition, whether integer division by zero raises SIGFPE is also CPU + OS dependent.
At the CPU level, after taking an exception, it makes most sense to return to the offending instruction after the OS has had the chance to do whatever it needs to (think of page faults/TLB misses), and run that instruction again. (The OS may have to do some address correction. For instance, ARM CPUs point two instructions after the offending instruction as a testament to their original 3-stage pipeline, while MIPS CPUs point to the jump when taking an exception from an instruction in a jump delay slot.)
At the OS level, there are several ways to handle exceptions:
Do the necessary handling (swap memory in, update page tables, etc...) and rerun the instruction.
Emulate that instruction, advance the PC accordingly and return to the next instruction. This allows for emulation of unimplemented instructions (CPUs without FPUs or with incomplete ones, LL/SC on MIPS I CPUs, ...) and of unsupported alignment (after taking an alignment exception, the OS may decide to send a SIGBUS to the process, or to emulate the unsupported access, possibly while logging it).
Send a fatal signal to the process. The process may take the role of the OS here in handling the exception, using CPU + OS dependent methods, such as the siginfo method linked by Simonj.
A non-portable method to deal with SIGFPE is calling longjmp() from the signal handler, as in my answer to a similar question on SIGSEGV.
n1318 has more details on calling longjmp() from a signal handler than you ever wanted to know. Also note that POSIX specifies that longjmp() should work from non-nested signal handlers.
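The jump-out-of-the-handler approach can be sketched as follows. This is a non-authoritative illustration using sigsetjmp()/siglongjmp() so that the signal mask is restored; `run_guarded`, `demo_raise_fpe` and `demo_noop` are hypothetical names. The demo triggers SIGFPE with raise() rather than a real division by zero, because a real one is undefined behaviour in C and does not trap on every architecture.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf fpe_env;

/* Handler: instead of returning to the faulting instruction,
   jump back to the recovery point saved by sigsetjmp(). */
static void on_fpe(int sig) {
    (void)sig;
    siglongjmp(fpe_env, 1);
}

/* Runs fn with a SIGFPE handler installed.
   Returns 1 if SIGFPE was caught, 0 if fn completed normally,
   -1 if the handler could not be installed. Not reentrant. */
int run_guarded(void (*fn)(void)) {
    assert(fn != NULL);
    struct sigaction sa, old;
    sa.sa_handler = on_fpe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGFPE, &sa, &old) != 0)
        return -1;
    int caught;
    if (sigsetjmp(fpe_env, 1) == 0) {  /* 1: save the signal mask too */
        fn();
        caught = 0;
    } else {
        caught = 1;                    /* arrived here via siglongjmp() */
    }
    sigaction(SIGFPE, &old, NULL);     /* restore the previous handler */
    return caught;
}

/* Hypothetical demo functions. */
void demo_raise_fpe(void) { raise(SIGFPE); }
void demo_noop(void) { }
```

Here `run_guarded(demo_raise_fpe)` reports that the signal was caught and execution continues after the guarded call instead of looping on the same instruction, while `run_guarded(demo_noop)` reports a normal return.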