I'm trying to understand this ARM assembly. I get a SIGSTOP signal for it, so something is going wrong here. I'll try, but I need some help:
afd0c750: push {r4, r7}
afd0c754: mov r7, #252 ; 0xfc // What is this? I think it's calling the SWI, and r7 needs to hold this value.
afd0c758: svc 0x00000000
afd0c75c: pop {r4, r7}
afd0c760: movs r0, r0
afd0c764: bxpl lr
afd0c768: b 0xafd3896c
SIGSTOP is sent to a process to suspend it for later resumption. It is not an error condition, nor is it directly generated by one.
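A minimal sketch of that point, assuming a POSIX system such as Linux: a parent stops a child with SIGSTOP, sees it reported as stopped (not terminated) by waitpid() with WUNTRACED, then resumes it with SIGCONT. The function name is hypothetical and exists only for the demonstration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the child was seen stopped by SIGSTOP
   and then continued by SIGCONT, 0 otherwise. */
static int stop_and_resume_child(void)
{
    int status, ok = 0;
    pid_t pid = fork();

    if (pid < 0)
        return 0;
    if (pid == 0) {              /* child: idle until signalled */
        for (;;)
            pause();
    }

    kill(pid, SIGSTOP);          /* suspend; nothing has failed */
    if (waitpid(pid, &status, WUNTRACED) == pid &&
        WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP) {
        kill(pid, SIGCONT);      /* resume where it left off */
        if (waitpid(pid, &status, WCONTINUED) == pid && WIFCONTINUED(status))
            ok = 1;
    }

    kill(pid, SIGKILL);          /* clean up the demo child */
    waitpid(pid, &status, 0);
    return ok;
}
```

A debugger does essentially the same thing through ptrace, which is why gdb is a common source of stops.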
The most likely scenario is that your process has received SIGSTOP while the thread is executing in the kernel, and most likely blocked there. Your backtrace will show the SVC instruction as the last one executed on the user stack, because that is the user-space-to-kernel transition.
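For reference, the stub in the question appears to follow the ARM EABI Linux system-call convention: r7 holds the system-call number (252, which the ARM EABI Linux table assigns to epoll_wait, a call in which a thread commonly sits blocked), svc 0 enters the kernel, and the result comes back in r0. The tail then tests the sign of r0: movs r0, r0 sets the flags, bxpl lr returns if the value is non-negative, and otherwise the code branches to 0xafd3896c, presumably the C library's helper that stores the negated value in errno and returns -1. A rough C rendering of that tail (the function name is hypothetical):

```c
#include <errno.h>

/* Sketch of the stub's tail after svc: the kernel returns either a
   non-negative result or -errno in r0. */
static long stub_return(long r0)
{
    if (r0 >= 0)          /* movs r0, r0 ; bxpl lr */
        return r0;
    errno = (int)-r0;     /* b <errno helper>: set errno ... */
    return -1;            /* ... and report failure */
}
```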
It is not clear from your description how the SIGSTOP manifests itself. A likely candidate is that gdb is the source of it.
The full backtrace would be very useful here.
I've got a program that's SIGSEGV'ing in library code. Nothing is jumping out at me when looking at the statement that's causing the SIGSEGV (see below). But the code uses Intel's AES-NI, and I'm not that familiar with it.
I issued handle all in hopes of catching the trap that's causing the SIGSEGV, but the program still just crashes rather than telling me the trap.
How can I get GDB to display the CPU trap that's causing the SIGSEGV?
Program received signal SIGSEGV, Segmentation fault.
0x00000000004ddf0b in CryptoPP::AESNI_Dec_Block(long long __vector&, long long __vector const*, unsigned int) (block=..., subkeys=0x7fffffffdc60, rounds=0x0)
at rijndael.cpp:1040
1040 block = _mm_aesdec_si128(block, subkeys[i+1]);
(gdb) p block
$1 = (__m128i &) @0x7fffffffcec0: {0x2e37c840668d6030, 0x431362358943e432}
(gdb) x/16b 0x7fffffffcec0
0x7fffffffcec0: 0x30 0x60 0x8d 0x66 0x40 0xc8 0x37 0x2e
0x7fffffffcec8: 0x32 0xe4 0x43 0x89 0x35 0x62 0x13 0x43
How can I get GDB to display the CPU trap that's causing the SIGSEGV
You can't: GDB doesn't get to see the trap, only the OS does.
What you can see is the instruction that caused the trap:
(gdb) x/i $pc
It's likely that the problem is alignment. I don't know what long long __vector is, but if it's not a 16-byte entity, then subkeys[i+1] is not going to be 16-byte aligned, which would be a problem for _mm_aesdec_si128, since it requires 16-byte alignment for both arguments.
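To test the alignment theory, it helps to compute the address of subkeys[i+1] for a given element size and check it against a 16-byte boundary. Legacy-SSE instructions with a 128-bit memory operand (aligned loads, and AESDEC with a memory source) fault on a misaligned address, which Linux reports as SIGSEGV. This is a sketch; the helper names are hypothetical and the element size is the assumption under test, not something read from the Crypto++ sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if addr lies on a 16-byte boundary. */
static int is_aligned16(uintptr_t addr)
{
    return (addr & 15u) == 0;
}

/* Address of subkeys[i+1] given the array base and an assumed
   element size, mirroring the indexing in the failing line. */
static uintptr_t subkey_addr(uintptr_t base, size_t i, size_t elem_size)
{
    return base + (i + 1) * elem_size;
}
```

Both addresses in the gdb output above (block at 0x7fffffffcec0, subkeys at 0x7fffffffdc60) are themselves 16-byte aligned, so with 16-byte elements subkeys[i+1] would be aligned too; x/i $pc remains the way to see which operand actually faulted.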
These instructions are quite new (AES-NI; their VEX-encoded forms come with AVX). It could also be that the CPU doesn't support the instruction, or that the OS isn't configured to allow it. I know one would normally expect SIGILL in such a case, but x86 can be surprising in the exceptions it generates; particularly if the OS has disabled use of an instruction that the CPU supports, SIGSEGV is quite common. (In case it's not clear from my tone, I'm just guessing here; I'm only saying that it is an avenue of investigation you might want to look into.)
From the OSDev page on the A20 line, the code for enabling A20 is given as:
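enable_A20:
cli ; no interrupts while talking to the 8042
call a20wait
mov al,0xAD ; command 0xAD: disable keyboard
out 0x64,al
call a20wait
mov al,0xD0 ; command 0xD0: read controller output port
out 0x64,al
call a20wait2 ; wait for the reply in the output buffer
in al,0x60
push eax ; save the output-port value
call a20wait
mov al,0xD1 ; command 0xD1: next byte written to 0x60 goes to the output port
out 0x64,al
call a20wait
pop eax
or al,2 ; set bit 1 (A20 gate)
out 0x60,al
call a20wait
mov al,0xAE ; command 0xAE: re-enable keyboard
out 0x64,al
call a20wait
sti
ret
a20wait: ; spin until the input buffer is empty (status bit 1 clear)
in al,0x64
test al,2
jnz a20wait
ret
a20wait2: ; spin until the output buffer is full (status bit 0 set)
in al,0x64
test al,1
jz a20wait2
ret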
a20wait waits on the input buffer and a20wait2 on the output buffer.
From what I understood, writing to/reading from 0x64 access the command/status register and not the buffer registers.
Then why are there so many waits on the input/output buffers? Shouldn't there be one on the output buffer before reading the status register, and one on the input buffer after writing the new command byte?
I tried disabling all other wait calls except the two I mentioned in the previous paragraph and it worked fine. But I'm curious as to why they are there. Is there some other reason ?
The A20 gate control signal is provided by another processor. Traditionally this is an 8042 microcontroller, and one of its output-port pins drives the signal. That microcontroller was intended to handle the keyboard interface; it had an unused output pin, so the IBM engineers who designed the AT decided to cut hardware cost and control the A20 gate signal with it.
The interface between the main processor and that microcontroller is a very simplistic one, just two 8-bit ports. I/O address 0x60 is the data port, 0x64 is the command/status port.
The 8042 executes its own program, completely independent of the main processor, so some care is required to talk to it: the handshaking has to be done in software. You can only write something after you have made sure that the 8042 has taken the previous byte and executed it, and you can only read something after you have made sure that the 8042 has written to the data port. This applies to both ports: a command written to 0x64 also goes into the 8042's input buffer, so the input-buffer-full bit must be clear before writing to either 0x60 or 0x64. Spinning on the input and output buffer status bits is thus required to let the 8042 catch up.
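To make the handshake concrete, here is a toy C model of the exchange (not a driver; it touches no real ports). A hypothetical kbc_step() stands in for the 8042 firmware consuming its input buffer, and the CPU-side write and read spin on the same status bits as a20wait and a20wait2. Only commands 0xD0 and 0xD1 are modelled; all names are illustrative.

```c
#include <stdint.h>

#define ST_OBF 0x01u  /* output buffer full: a byte is waiting at 0x60 */
#define ST_IBF 0x02u  /* input buffer full: the 8042 hasn't taken our byte */

struct kbc {
    uint8_t status;
    uint8_t inbuf;         /* last byte the CPU wrote */
    int inbuf_is_cmd;      /* 1 if written to 0x64, 0 if to 0x60 */
    uint8_t outbuf;        /* byte the CPU will read from 0x60 */
    uint8_t output_port;   /* bit 1 drives the A20 gate */
    int want_port_byte;    /* set by command 0xD1 */
};

/* One step of the 8042's own program: consume the input buffer. */
static void kbc_step(struct kbc *k)
{
    if (!(k->status & ST_IBF))
        return;
    k->status &= (uint8_t)~ST_IBF;
    if (k->inbuf_is_cmd) {
        if (k->inbuf == 0xD0) {          /* read output port */
            k->outbuf = k->output_port;
            k->status |= ST_OBF;
        } else if (k->inbuf == 0xD1) {   /* next data byte -> output port */
            k->want_port_byte = 1;
        }                                /* 0xAD/0xAE: not modelled */
    } else if (k->want_port_byte) {
        k->output_port = k->inbuf;
        k->want_port_byte = 0;
    }
}

/* a20wait + out: spin until IBF clears (the model lets the 8042 run
   while we spin), then write to 0x64 (is_cmd) or 0x60. */
static void kbc_write(struct kbc *k, int is_cmd, uint8_t byte)
{
    while (k->status & ST_IBF)
        kbc_step(k);
    k->inbuf = byte;
    k->inbuf_is_cmd = is_cmd;
    k->status |= ST_IBF;
}

/* a20wait2 + in al,0x60: spin until OBF is set, then read. */
static uint8_t kbc_read(struct kbc *k)
{
    while (!(k->status & ST_OBF))
        kbc_step(k);
    k->status &= (uint8_t)~ST_OBF;
    return k->outbuf;
}

/* Same command sequence as enable_A20; returns the final output port. */
static uint8_t a20_demo(uint8_t initial_port)
{
    struct kbc k = {0};
    uint8_t port;

    k.output_port = initial_port;
    kbc_write(&k, 1, 0xAD);                 /* disable keyboard */
    kbc_write(&k, 1, 0xD0);                 /* read output port */
    port = kbc_read(&k);
    kbc_write(&k, 1, 0xD1);                 /* write output port */
    kbc_write(&k, 0, (uint8_t)(port | 2));  /* ...with A20 bit set */
    kbc_write(&k, 1, 0xAE);                 /* enable keyboard */
    while (k.status & ST_IBF)               /* final a20wait */
        kbc_step(&k);
    return k.output_port;
}
```

In this model, removing the loop from kbc_write lets the data byte overwrite the 0xD1 command before the controller has consumed it, so the A20 bit is never set. A fast emulator effectively consumes every byte instantly, which hides the problem.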
Removing that spinning may work in an emulator. It is pretty unlikely to work correctly on real hardware, although you could get lucky. There's no point at all in risking it.
Here is a description of my situation: I have to take care of a bug in our product. The thread is created as joinable; it must do its work and terminate, and nobody will call pthread_join() for it. So the thread is created with the JOINABLE attribute (the default) and, before terminating, it calls the following code:
{ pthread_detach(pthread_self()); pthread_exit(NULL); }
It works like a charm on all the 32-bit Linux distros I've tried, but it causes SIGSEGV on 64-bit distros (Ubuntu 13.04 x86_64 and Debian). I didn't try Slackware. Here is a core:
Core was generated by `IsaVM -s=1 -PrjPath="/home/taf/Linux_Fov_540148/Cmds" -stgMode=1 -PR -Failover'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f5911a7c009 in pthread_detach () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007f5911a7c009 in pthread_detach () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x000000000041310d in _kerCltDownloadThr (StartParams=0x6bfce0 <RESFOV>) at ./dker0clt.c:1258
#2 0x00007f5911a7ae9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f591159f3fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000000000 in ?? ()
I figured out how to fix this bug: I set the PTHREAD_CREATE_DETACHED attribute (with pthread_attr_setdetachstate()) for the thread before it is created, and it works as expected.
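A minimal sketch of that fix with POSIX threads: the attribute object is given the PTHREAD_CREATE_DETACHED state before pthread_create(), so the thread can simply return when its work is done, with no pthread_detach() or pthread_exit() call. The worker, the semaphore used to wait for it, and all names are hypothetical, added only so the example can run.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t worker_done;   /* only used so the demo can wait */
static int worker_result;

/* Stand-in for the real work; a detached thread just returns. */
static void *worker(void *arg)
{
    worker_result = *(const int *)arg * 2;
    sem_post(&worker_done);
    return NULL;
}

/* Hypothetical helper: run worker() in a detached thread, wait for it
   to signal completion, and return its result (-1 on error). */
static int run_detached(int input)
{
    static int arg;
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    arg = input;
    if (sem_init(&worker_done, 0, 0) != 0)
        return -1;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    rc = pthread_create(&tid, &attr, worker, &arg);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;
    sem_wait(&worker_done);   /* the thread is never joined */
    return worker_result;
}
```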
But my question - is it a crime to call this code?
{ pthread_detach(pthread_self()); pthread_exit(NULL); }
Does pthread_detach() do something asynchronously after the call that causes pthread_exit() to misbehave? But the crash point is pthread_detach(), not pthread_exit()! I don't understand the reason for this crash at all. Why does it work on 32 bits? Is it a race condition somewhere in the pthread implementation?
pthread_join() is never called for this thread.
Thanks in advance for any ideas.
A thread detaching itself does not feel right. It is normally the responsibility of the thread that called pthread_create(), which can create a detached thread if necessary.
It could be that the thread has already been detached, because attempting to detach an already detached thread results in undefined behaviour.
My top wild guesses would be:
The thread gets detached more than once. As a quick check, I would set a breakpoint on pthread_detach in gdb to see whether duplicate thread ids get passed to this function. If it is difficult to run your application under gdb, another option is to override pthread_create and pthread_detach and track thread ids to detect a double detach. See http://hackerboss.com/overriding-system-functions-for-fun-and-profit/
Memory corruption. valgrind may help you detect memory corruption if it is possible to run your application under it. Alternatively, try instrumenting your application with run-time error checks by compiling with -fstack-protector-all, -fsanitize=address or -fsanitize=thread if you use gcc. The clang compiler also has an array of options to detect such errors; see the sanitizers on http://clang.llvm.org/docs/index.html.
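A rough sketch of the override approach from the first guess, assuming Linux/glibc and a build as a shared object, e.g. gcc -shared -fPIC threadhack.c -o threadhack.so (add -ldl on older glibc), then LD_PRELOAD=./threadhack.so ./app. The wrapper records each id it sees, warns on repeats, and forwards to the real pthread_detach found via dlsym(RTLD_NEXT, ...). The file name, the [HACK] message and detached_count() are illustrative. Thread ids can be reused after a thread exits, so a repeat is a hint rather than proof.

```c
#define _GNU_SOURCE            /* for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

#define MAX_TRACKED 4096

static pthread_t seen[MAX_TRACKED];
static size_t n_seen;
static pthread_mutex_t seen_lock = PTHREAD_MUTEX_INITIALIZER;

/* Interposed pthread_detach: reports ids detached more than once,
   then forwards to the next definition (the real one in libc). */
int pthread_detach(pthread_t thread)
{
    static int (*real_detach)(pthread_t);
    int repeat = 0;

    pthread_mutex_lock(&seen_lock);
    if (!real_detach)
        *(void **)&real_detach = dlsym(RTLD_NEXT, "pthread_detach");
    for (size_t i = 0; i < n_seen; i++)
        if (pthread_equal(seen[i], thread))
            repeat = 1;
    if (!repeat && n_seen < MAX_TRACKED)
        seen[n_seen++] = thread;
    pthread_mutex_unlock(&seen_lock);

    if (repeat)   /* assumes pthread_t is an integer type, as on glibc */
        fprintf(stderr, "[HACK] pthread_detach called again for %lx\n",
                (unsigned long)thread);
    return real_detach ? real_detach(thread) : ENOSYS;
}

/* Number of distinct ids detached so far. */
size_t detached_count(void)
{
    size_t n;

    pthread_mutex_lock(&seen_lock);
    n = n_seen;
    pthread_mutex_unlock(&seen_lock);
    return n;
}
```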
I finished my research with the approaches offered by the respectable @MaximYegorushkin. AddressSanitizer showed me one buffer overflow in our product, but it isn't related to my problem (I will definitely fix it later; it is always good to have such a wise tool to hunt bugs). So I decided to override all the necessary pthread_xxx functions with the LD_PRELOAD method. I ran a simple test to be sure my library works as expected:
[HACK] Loading pthread hack.
Starting thread...!
[HACK] pthread_create: thread=7FAC6C86D700
Waiting for 2 seconds...
[HACK] pthread_self: thread=7FAC6C86D700
thread_func: thread id = 7FAC6C86D700
Thread: sin(3.26) = -0.121109
[HACK] pthread_self: thread=7FAC6C86D700
[HACK] pthread_detach: thread=7FAC6C86D700
Terminating...
All lines starting with [HACK] are produced by my threadhack.so library.
Then I ran my project with this library, and it pointed me exactly to where the problem is:
Code executed: { pthread_detach(pthread_self()); pthread_exit(NULL); }
Debug traces:
[HACK] pthread_create: thread=7F403251CB00
.....
[HACK] pthread_self: thread=7F403251CB00
[HACK] pthread_detach: thread=3251CB00
So we see that pthread_self returns a good thread id, but pthread_detach receives it already mangled (cut to 32 bits). How could this be? I generated assembler code both for my simple working test application, as a reference, and for my project:
Reference application:
call pthread_self
movq %rax, %rdi
call pthread_detach
movl $0, %edi
call pthread_exit
So we see here that the movq instruction is used to copy the 64-bit thread id (movq %rax, %rdi). OK, let's check what GCC generated for my project:
movl $0, %eax
call pthread_self
movl %eax, %edi
movl $0, %eax
call pthread_detach
movl $0, %edi
movl $0, %eax
call pthread_exit
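Whoa! Here we have a 32-bit movl instead: movl %eax, %edi copies only the least significant 32 bits of the id, and because writing a 32-bit register on x86-64 zeroes the upper half of the full register, the most significant part of %rdi is always zero. (The movl $0, %eax before each call does something else: GCC emits it before calling a function that has no prototype, since such a function could be variadic and %al must then hold the number of vector registers used for arguments.) So this is the reason for the mangled thread id. I have no idea why the code is so different; the compilation flags are the same. I saw this bug in GCC 4.7 and I see it in GCC 4.8 (the latest package for Ubuntu 13.10 x86_64).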
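The values in the traces can be checked with a little arithmetic. Passing the id through a 32-bit register keeps the low 32 bits and zero-fills the rest; a tiny C model of movl %eax, %edi (the function name is hypothetical):

```c
#include <stdint.h>

/* Model of "movl %eax, %edi": keep the low 32 bits of the 64-bit
   value and zero the upper 32 bits of the destination. */
static uint64_t through_32bit_register(uint64_t rax)
{
    return (uint64_t)(uint32_t)rax;
}
```

Applied to the id 7F403251CB00 from the trace, this gives 3251CB00, exactly the value pthread_detach received.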
So at least now I see what is happening. Thanks to @Maxim and brilliant tools. I learned something new again.
P.S. I don't know how to submit a bug report to the GCC team. I can't reproduce the problem in a small, simple application, and I can't hand them my project because it is proprietary software and I'm under an NDA not to distribute it.
My guess is that you don't have the prototype for either pthread_detach or pthread_self in the code that invokes pthread_detach(pthread_self()); Without the prototype, the compiler will assume the argument is int (pthread_detach) or that the function returns an int (pthread_self).
Thinking it through further, though, I suspect more strongly that pthread_self is the culprit: either it is undeclared (so implicitly returning int) or it is declared incorrectly as returning int. The compiler then correctly extends this to a 64-bit integer by adding 32 leading zero bits.
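If that diagnosis is right, the cure is to make sure <pthread.h> is included in the file containing the call, and to build with warnings such as gcc -Wall, which reports implicit function declarations. A minimal sketch of the self-detaching pattern with the header in place; the function names and the semaphore used to wait for the worker are hypothetical.

```c
#include <pthread.h>   /* declares pthread_t pthread_self(void), etc. */
#include <semaphore.h>
#include <stddef.h>

static sem_t finished;
static int detach_rc = -1;

static void *self_detaching_worker(void *unused)
{
    (void)unused;
    /* With the prototypes visible, the full pthread_t is passed. */
    detach_rc = pthread_detach(pthread_self());
    sem_post(&finished);
    pthread_exit(NULL);
}

/* Returns pthread_detach()'s result as seen by the worker (0 = success). */
static int run_self_detaching(void)
{
    pthread_t tid;

    if (sem_init(&finished, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, self_detaching_worker, NULL) != 0)
        return -1;
    sem_wait(&finished);
    return detach_rc;
}
```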
The following code snippet is taken from linux v2.6.11. Something similar is present in v3.8 as well.
mrs r13, cpsr
bic r13, r13, #MODE_MASK
orr r13, r13, #MODE_SVC
msr spsr_cxsf, r13 @ switch to SVC_32 mode
and lr, lr, #15
ldr lr, [pc, lr, lsl #2]
movs pc, lr @ Changes mode and branches
Check out the following link for the actual file:
http://lxr.linux.no/linux+v2.6.11/arch/arm/kernel/entry-armv.S
I think writing into the mode bits of CPSR can change the current ARM mode. But how does writing into SPSR (instead of CPSR) result in switching to SVC_32 mode?
Or is something happening in the last instruction, "movs pc, lr"? Could someone help me understand this?
A mov or sub instruction with the 'S' suffix and the program counter as its destination register means an exception return.
It copies the contents of the SPSR to the CPSR and moves the value of the source register into the program counter (in this case, the link register).
In your example, this effectively sets the mode to SVC mode and returns from the function in one go.
There's more information on this in the ARM reference manual.
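A toy C model of the two steps, covering only the registers the snippet touches. The mode values (mask 0x1F, IRQ 0x12, SVC 0x13) are the architectural CPSR mode encodings; the struct, function names and the handler address (standing in for the value loaded by ldr lr, [pc, lr, lsl #2]) are hypothetical.

```c
#include <stdint.h>

#define MODE_MASK 0x1Fu
#define MODE_IRQ  0x12u
#define MODE_SVC  0x13u

struct arm_state {
    uint32_t cpsr;   /* current mode lives in bits [4:0] */
    uint32_t spsr;   /* SPSR of the current exception mode */
    uint32_t lr;
    uint32_t pc;
};

/* mrs/bic/orr/msr: build "same flags, SVC mode" in the SPSR.
   The processor is still in its current mode afterwards. */
static void prepare_spsr_for_svc(struct arm_state *s)
{
    uint32_t r13 = s->cpsr;
    r13 &= ~MODE_MASK;
    r13 |= MODE_SVC;
    s->spsr = r13;
}

/* movs pc, lr (exception return): CPSR <- SPSR and PC <- LR together. */
static void movs_pc_lr(struct arm_state *s)
{
    s->cpsr = s->spsr;
    s->pc = s->lr;
}

/* Start with the given CPSR, prepare the SPSR, optionally return. */
static struct arm_state run_stub(uint32_t cpsr, uint32_t handler, int do_return)
{
    struct arm_state s = { cpsr, 0, handler, 0 };

    prepare_spsr_for_svc(&s);
    if (do_return)
        movs_pc_lr(&s);
    return s;
}
```

Starting from an IRQ-mode CPSR of 0x92 (IRQ mode, IRQs masked), the mode is still IRQ after the msr; only after the movs does the CPSR become 0x93 (SVC mode, same mask bit) with the PC at the handler.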
I am answering the SPSR Vs CPSR question here.
CPSR is the status register of whatever mode the processor is currently in, and it is accessible in every mode. SPSR is banked: each exception mode (FIQ, IRQ, SVC, abort, undefined) has its own, while user and system modes have none. On exception entry the CPSR is copied into the SPSR of the mode being entered, and an exception return such as movs pc, lr copies that SPSR back into the CPSR. In user mode, writes to the mode and interrupt-mask bits of CPSR are ignored.
CPSR - Current Program Status Register
SPSR - Saved Program Status Register
I have this below program
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
int x = 1;
void ouch(int sig) {
printf("OUCH! dividing by zero!\n");
x = 0;
}
void fpe(int sig) {
printf("FPE! I got a signal: %d\n",sig);
psignal(sig, "psignal");
x = 1;
}
int main(void) {
(void) signal(SIGINT, ouch);
(void) signal(SIGFPE, fpe);
while(1)
{
printf("Hello World: %d\n",1/x);
sleep(1);
}
}
Problem: while executing this program, when I send a SIGINT from the terminal, the message "OUCH! dividing by zero!" is output, as expected.
The next message is
"FPE! I got a signal: 8
psignal: Floating point exception"
and this message goes on and on; it doesn't stop. My doubt is this: in the fpe signal handler I set x to 1, so I expect "Hello World" to be displayed in the output.
Below is a transcript of the output I am getting :
Hello World: 1
Hello World: 1
^COUCH! dividing by zero!
FPE! I got a signal: 8
psignal: Floating point exception
FPE! I got a signal: 8
psignal: Floating point exception
FPE! I got a signal: 8
psignal: Floating point exception
^COUCH! dividing by zero!
...
When the signal handler is entered, the program counter (the CPU register pointing at the currently executing instruction) is saved where the divide-by-zero occurred. Returning from the handler restores the PC to exactly the same place, whereupon the signal is triggered again (and again, and again).
The value or volatility of 'x' is irrelevant by this point - the zero has been transferred into a CPU register in readiness to perform the divide.
man 2 signal notes that:
According to POSIX, the behaviour of a process is undefined after it ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by the kill(2) or the raise(3) functions. Integer division by zero has undefined result. On some architectures it will generate a SIGFPE signal. (Also dividing the most negative integer by -1 may generate SIGFPE.) Ignoring this signal might lead to an endless loop.
We can see this in gdb if we compile with the debug flag:
simon@diablo:~$ gcc -g -o sigtest sigtest.c
simon@diablo:~$ gdb sigtest
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
By default gdb won't pass SIGINT to the process - change this so it sees the first signal:
(gdb) handle SIGINT pass
SIGINT is used by the debugger.
Are you sure you want to change it? (y or n) y
Signal Stop Print Pass to program Description
SIGINT Yes Yes Yes Interrupt
Off we go:
(gdb) run
Starting program: /home/simon/sigtest
x = 1
Hello World: 1
Now let's interrupt it:
^C
Program received signal SIGINT, Interrupt.
0xb767e17b in nanosleep () from /lib/libc.so.6
and onwards to the divide:
(gdb) cont
Continuing.
OUCH! dividing by zero!
x = 0
Program received signal SIGFPE, Arithmetic exception.
0x0804853a in main () at sigtest.c:30
30 printf("Hello World: %d\n",1/x);
Check the value of 'x', and continue:
(gdb) print x
$1 = 0
(gdb) cont
Continuing.
FPE! I got a signal: 8
psignal: Floating point exception
Program received signal SIGFPE, Arithmetic exception.
0x0804853a in main () at sigtest.c:30
30 printf("Hello World: %d\n",1/x);
(gdb) print x
$2 = 1
x is clearly now 1 and we still got a divide-by-zero - what's going on? Let's inspect the underlying assembler:
(gdb) disassemble
Dump of assembler code for function main:
0x080484ca : lea 0x4(%esp),%ecx
0x080484ce : and $0xfffffff0,%esp
...
0x08048533 : mov %eax,%ecx
0x08048535 : mov %edx,%eax
0x08048537 : sar $0x1f,%edx
0x0804853a : idiv %ecx <<-- address FPE occurred at
0x0804853c : mov %eax,0x4(%esp)
0x08048540 : movl $0x8048653,(%esp)
0x08048547 : call 0x8048384
0x0804854c : jmp 0x8048503
End of assembler dump.
One Google search later tells us that IDIV divides the 64-bit value in EDX:EAX (here EAX sign-extended into EDX by the preceding sar) by the source operand (ECX). You can probably guess the register contents:
(gdb) info registers
eax 0x1 1
ecx 0x0 0
...
You should declare x as volatile sig_atomic_t (or at least volatile int) to ensure that the compiler reloads x from memory each time through the loop. Given that your SIGINT handler works, this probably does not explain your specific problem, but if you try more complicated examples (or crank up the optimization) it will eventually bite you.
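A minimal sketch of that pattern: the flag shared with the handler is a volatile sig_atomic_t, and the handler does nothing but set it (printf is not async-signal-safe, so it is better kept out of handlers). The helper that raises SIGINT is hypothetical and exists only to show the flag being set.

```c
#include <signal.h>

/* Shared with the handler: volatile so each read goes back to memory,
   sig_atomic_t so the store happens as one indivisible access. */
static volatile sig_atomic_t interrupted = 0;

static void on_sigint(int sig)
{
    (void)sig;
    interrupted = 1;    /* no printf here: not async-signal-safe */
}

/* Hypothetical check: install the handler, deliver SIGINT to ourselves,
   and report what the main loop would now see. */
static int sigint_sets_flag(void)
{
    if (signal(SIGINT, on_sigint) == SIG_ERR)
        return -1;
    raise(SIGINT);      /* returns only after the handler has run */
    return interrupted;
}
```

The main loop would then check the flag and test x != 0 before dividing, instead of letting the division trap.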
After handling a signal raised while executing an instruction, the PC may return either to that instruction or to the following one. Which one it does is very CPU + OS specific. In addition, whether integer division by zero raises SIGFPE at all is also CPU + OS dependent.
At the CPU level, after taking an exception, it makes most sense to return to the offending instruction, after the OS has had the chance to do whatever it needs to (think of page faults/TLB misses), and run that instruction again. (The OS may have to do some address correction: for instance, ARM CPUs point two instructions after the offending instruction, a legacy of their original 3-stage pipeline, while MIPS CPUs point to the jump when taking an exception from an instruction in a jump delay slot.)
At the OS level, there are several ways to handle exceptions:
Do the necessary handling (swap memory in, update page tables, etc...) and rerun the instruction.
Emulate that instruction, advance the PC accordingly and return to the next instruction. This allows for emulation of unimplemented instructions (CPUs without/with incomplete FPUs, LL/SC on MIPSI CPUs, ...), and unsupported alignment (after taking an alignment exception, the OS may decide sending a SIGBUS to the process, or emulating the unsupported access, possibly while logging it).
Send a fatal signal to the process. The process may take the role of the OS here in handling the exception, using CPU + OS dependent methods, such as the siginfo method linked by Simonj.
A non-portable method to deal with SIGFPE is calling longjmp() from the signal handler, as in my answer to a similar question on SIGSEGV.
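A sketch combining the last two ideas, assuming x86 Linux, where integer division by zero traps (on AArch64, for example, it simply yields 0 and no signal is raised). An SA_SIGINFO handler records si_code, which Linux sets to FPE_INTDIV for an integer divide by zero, and then leaves with siglongjmp() instead of returning to the faulting idiv. sigsetjmp() is called with a nonzero savemask so the signal mask, including the blocked SIGFPE, is restored on the jump. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf fpe_recover;
static volatile sig_atomic_t last_fpe_code;

/* Record why SIGFPE was raised, then jump out instead of returning
   to the faulting instruction. */
static void on_fpe(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    last_fpe_code = info->si_code;   /* e.g. FPE_INTDIV */
    siglongjmp(fpe_recover, 1);
}

/* Hypothetical helper: 1/x, or -1 if the division trapped. */
static int checked_reciprocal(volatile int x)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fpe;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGFPE, &sa, NULL) != 0)
        return -1;

    if (sigsetjmp(fpe_recover, 1) != 0)
        return -1;                   /* arrived here via siglongjmp */
    return 1 / x;
}
```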
n1318 has more details on longjmp() from a signal handler than you ever wanted to know. Also note that POSIX specifies that longjmp() should work from non-nested signal handlers.