GCC Segmentation fault with -O1 and inline assembler - linux

I have found a strange segmentation fault in my code and I would like to hear your opinion on whether it could be a GCC bug or is just my fault!
The function looks like this:
void testMMX( ... ) {
unsigned long a = ...;
unsigned char const* b = ...;
unsigned long c = ...;
__asm__ volatile (
"pusha;"
);
__asm__ volatile ( "mov %0, %%eax;" : : "m"( a ) : "%eax" ); // with "r"( a ) it just works fine!
__asm__ volatile ( "add %0, %%eax;" : : "m"( b ) : "%eax" );
__asm__ volatile ( "mov %0, %%esi;" : : "m"( c ) : "%eax", "%esi" );
__asm__ volatile (
"sub %eax, %esi;"
"dec %esi;"
"movd (%esi), %mm0;"
"popa;"
);
}
If I compile this with -O0 it works fine, but it segfaults with -O1 and -O2. It took me a long time to figure out that the segfault is caused by frame pointer omission. The pusha instruction pushes 8 registers, i.e. 4*8=32 bytes (x86_32), so ESP moves by 32 bytes as well. But gcc does not recognize this. If I add the ESP fix manually
__asm__("add $32, %esp")
or use the "-fno-omit-frame-pointer" flag in gcc I can compile and run it with -O1 and -O2 without any errors!
So my question now is: why does gcc not adjust the ESP with any push/pop inline assembler operations if frame-pointer-omission is enabled? Is this a gcc bug? Is gcc even capable of detecting this? Am I missing something?
It would be very interesting to solve this.
Thanks in advance!

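No - gcc is not capable of detecting this. It doesn't perform any analysis of the instructions that appear in the asm block. With the frame pointer omitted, the "m" operands are addressed relative to ESP, so after pusha they resolve to the wrong stack slots. It is your responsibility to inform the compiler of any side effects. Can you explain what test you are performing?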
Also, you should consider using a single asm block for this code; volatile may prevent reordering of the asm blocks, but you cannot assume this yields consecutive instructions.
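A minimal sketch of the single-block approach, assuming GCC or Clang on x86 or x86-64. The helper name load32 and its parameters are hypothetical and not taken from the question. It lets the compiler pick the registers through operands instead of saving everything with pusha/popa, and it falls back to plain C on other architectures:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the question): loads 32 bits from
   base + offset using one asm statement with declared operands. */
static uint32_t load32(const unsigned char *base, unsigned long offset)
{
    uint32_t value;
#if defined(__i386__) || defined(__x86_64__)
    const unsigned char *p = base;
    __asm__ volatile (
        "add %2, %1\n\t"      /* p += offset */
        "movl (%1), %0"       /* value = 32 bits at address p */
        : "=r" (value), "+r" (p)
        : "r" (offset)
        : "cc", "memory");    /* flags change; memory is read */
#else
    /* Portable fallback for non-x86 targets. */
    memcpy(&value, base + offset, sizeof value);
#endif
    return value;
}
```

Because every register the asm touches is an operand or a clobber, nothing needs to be pushed or popped, and ESP stays where the compiler expects it.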

Related

multiple assembly instruction using asm volatile in c code

I need to modify a random number generator that uses rdrand (only).
It is implemented in C code as follows.
uint64_t _rdrand(void)
{
uint64_t r;
__asm__ volatile("rdrand %0\n\t" : "=r"(r));
return r;
}
Now I need to modify it so that it returns only if the carry flag is set (according to the rdrand documentation). I think it can be implemented with the jc instruction, but I don't know how to use it inside __asm__ volatile. Please help me.
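One possible approach, sketched under assumptions (GCC or Clang, x86-64): rather than jumping with jc, copy the carry flag into a C variable with setc and loop in C. The names cpu_has_rdrand, rdrand64_step and rdrand64_retry are hypothetical. The CPUID check (leaf 1, ECX bit 30) is there so the sketch does not fault on CPUs without RDRAND:

```c
#include <stdint.h>
#if defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical helper: 1 if CPUID reports RDRAND support, else 0. */
static int cpu_has_rdrand(void)
{
#if defined(__x86_64__)
    unsigned int a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d))
        return 0;
    return (c >> 30) & 1;   /* CPUID.01H:ECX bit 30 = RDRAND */
#else
    return 0;
#endif
}

/* One RDRAND attempt: returns 1 and fills *out if CF was set, else 0. */
static int rdrand64_step(uint64_t *out)
{
#if defined(__x86_64__)
    unsigned char ok;
    __asm__ volatile ("rdrand %0\n\t"
                      "setc %1"          /* ok = carry flag */
                      : "=r" (*out), "=qm" (ok)
                      :
                      : "cc");
    return ok;
#else
    (void)out;
    return 0;
#endif
}

/* Retry up to `tries` times; returns 1 on success, 0 otherwise. */
static int rdrand64_retry(uint64_t *out, unsigned int tries)
{
    if (!cpu_has_rdrand())
        return 0;
    while (tries--) {
        if (rdrand64_step(out))
            return 1;
    }
    return 0;
}
```

A caller would check the return value, for example `if (rdrand64_retry(&r, 10)) { /* use r */ }`, and fall back to another source when it is 0.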

How can I select a static library to be linked while ARM cross compiling?

I have an ARM cross compiler in Ubuntu (arm-linux-gnueabi-gcc) and the default architecture is ARMv7. However, I want to compile an ARMv5 binary. I do this by giving the compiler the -march=armv5te option.
So far, so good. Since my ARM system uses BusyBox, I have to compile my binary statically linked. So I give gcc the -static option.
However, I have a problem with libc.a which the linker links to my ARMv5 binary. This file is compiled with the ARMv7 architecture option. So, even if I cross-compile my ARM binary with ARMv5, I can't run it on my BusyBox based ARMv5 box.
How can I solve this problem?
Where can I get the ARMv5 libc.a static library, and how can I link it?
Thank you in advance.
You have two choices:
1. Get the right compiler.
2. Write your own 'C' library.
Get the right compiler.
You are always safest to have a compiler match your system. This applies to x86 Linux and various distributions. You are lucky if different compilers work. It is more difficult when you cross-compile as often the compiler will not be automatically synced. Try to run a program on a 1999 x86 Mandrake Linux compiled on your 2014 Ubuntu system.
As well as instruction compatibility (which you have identified), there are ABI and OS dependencies. Specifically, the armv7 is most likely hardfloat (has a floating point FPU and register call convention) and you need softfloat (emulated FPU). The specific glibc (or uClibc) has specific calls and expectations of the Linux OS. For instance, the way threads work has changed over time.
Write your own
You can always use -fno-builtin and -ffreestanding as well as -static (you will typically also need -nostdlib so that the default start-up files and libc are not linked in). Then you cannot use any libc functions, but you can program them yourself.
There are external sources, like Mark Martinec's snprintf, and building blocks like write() which are easy to implement:
#define _SYS_IOCTL_H 1
#include <linux/unistd.h>
#include <linux/ioctl.h>
static inline int write(int fd, void *buf, int len)
{
int rval;
asm volatile ("mov r0, %1\n\t"
"mov r1, %2\n\t"
"mov r2, %3\n\t"
"mov r7, %4\n\t"
"swi #0\n\t"
"mov %0, r0\n\t"
: "=r" (rval)
: "r" (fd),
"r" (buf),
"r" (len),
"Ir" (__NR_write)
: "r0", "r1", "r2", "r7");
return rval;
}
static inline void exit(int status)
{
asm volatile ("mov r0, %0\n\t"
"mov r7, %1\n\t"
"swi #0\n\t"
: : "r" (status),
"Ir" (__NR_exit)
: "r0", "r7");
}
You also have to add your own start-up machinery, which is normally taken care of by the 'C' library:
/* Called from assembler startup. */
int main (int argc, char*argv[])
{
write(1, "Hello world\n", sizeof("Hello world\n") - 1); /* fd 1 is stdout; skip the trailing NUL */
return 0;
}
/* Wrapper for main return code. */
void __attribute__ ((unused)) estart (int argc, char*argv[])
{
int rval = main(argc,argv);
exit(rval);
}
/* Setup arguments for estart [like main()]. */
void __attribute__ ((naked)) _start (void)
{
asm(" sub lr, lr, lr\n" /* Clear the link register. */
" ldr r0, [sp]\n" /* Get argc... */
" add r1, sp, #4\n" /* ... and argv ... */
" b estart\n" /* Let's go! */
);
}
If this is too daunting, because you need to implement a lot of functionality, then you can try to get the source of various libraries and rebuild them with -fno-builtin, making sure that they do not get linked with the Ubuntu libraries, which are incompatible.
Projects like crosstool-ng can allow you to build a correct compiler (maybe with more advanced code generation) that suits the armv5 system exactly. This may seem like a pain, but the alternatives above aren't easy either.

what's the difference between gcc __sync_bool_compare_and_swap and cmpxchg?

To use CAS, gcc provides some useful functions such as
__sync_bool_compare_and_swap
but we can also use asm code like cmpxchg:
bool ret;
__asm__ __volatile__(
"lock cmpxchg16b %1;\n"
"sete %0;\n"
:"=m"(ret),"+m" (*(volatile pointer_t *) (addr))
:"a" (old_value.ptr), "d" (old_value.tag), "b" (new_value.ptr), "c" (new_value.tag));
return ret;
I have grepped the source code of gcc 4.6.3 and found that __sync_bool_compare_and_swap is implemented using
typedef int (__kernel_cmpxchg_t) (int oldval, int newval, int *ptr);
#define __kernel_cmpxchg (*(__kernel_cmpxchg_t *) 0xffff0fc0)
It seems that 0xffff0fc0 is the address of some kernel helper function,
but in gcc 4.1.2 there is no code like __kernel_cmpxchg_t, and I can't find the implementation of __sync_bool_compare_and_swap.
So what's the difference between __sync_bool_compare_and_swap and cmpxchg?
Is __sync_bool_compare_and_swap implemented with cmpxchg?
And is the kernel helper function __kernel_cmpxchg_t implemented with cmpxchg?
Thanks!
I think the __kernel_cmpxchg is a fallback which Linux makes available on some architectures which don't have native hardware support for CAS. E.g. ARMv5 or something like that.
Usually, GCC expands the __sync_* builtins inline. Unless you're really interested in GCC internals, an easier way to find out what it does is to write a simple C example and look at the asm the compiler generates.
Consider
#include <stdbool.h>
bool my_cmpchg(int *ptr, int oldval, int newval)
{
return __sync_bool_compare_and_swap(ptr, oldval, newval);
}
Compiling this on an x86_64 Linux machine with GCC 4.4 the following asm is generated:
my_cmpchg:
.LFB0:
.cfi_startproc
movl %esi, %eax
lock cmpxchgl %edx, (%rdi)
sete %al
ret
.cfi_endproc
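To show how the builtin is typically used (rather than how it is implemented), here is a small sketch of a CAS retry loop, assuming GCC or Clang. The helper name cas_add is hypothetical:

```c
#include <stdbool.h>

/* Hypothetical helper: atomically adds delta to *ptr with a
   compare-and-swap retry loop and returns the new value. */
static int cas_add(int *ptr, int delta)
{
    int oldval, newval;
    do {
        oldval = *(volatile int *)ptr;   /* snapshot the current value */
        newval = oldval + delta;
        /* The swap succeeds only if *ptr still equals oldval;
           otherwise another thread changed it and we retry. */
    } while (!__sync_bool_compare_and_swap(ptr, oldval, newval));
    return newval;
}
```

On x86 the loop body should compile down to a lock cmpxchg like the one in the listing. On targets without a native CAS instruction the compiler may emit a call to a helper instead.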

Inline assembly in Haskell

Can I somehow use inline assembly in Haskell (similar to what GCC does for C)?
I want to compare my Haskell code to the reference implementation (ASM) and this seems the most straightforward way. I guess I could just call Haskell from C and use GCC inline assembly, but I'm still interested if I can do it the other way around.
(I'm on Linux/x86)
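There are two ways:
1. Call C via the FFI, and use inline assembly on the C side.
2. Write a Cmm fragment that calls C (without the FFI), and uses inline assembly.
Both solutions use inline assembly on the C side. The former is the most idiomatic. Here's an example, from the rdtsc package (note that the excerpt shown uses mftb/mftbu, which are PowerPC time base instructions; on x86 the corresponding routine would use the rdtsc instruction):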
cycles.h:
static __inline__ ticks getticks(void)
{
unsigned int tbl, tbu0, tbu1;
do {
__asm__ __volatile__ ("mftbu %0" : "=r"(tbu0));
__asm__ __volatile__ ("mftb %0" : "=r"(tbl));
__asm__ __volatile__ ("mftbu %0" : "=r"(tbu1));
} while (tbu0 != tbu1);
return (((unsigned long long)tbu0) << 32) | tbl;
}
rdtsc.c:
unsigned long long rdtsc(void)
{
return getticks();
}
rdtsc.h:
unsigned long long rdtsc(void);
rdtsc.hs:
foreign import ccall unsafe "rdtsc.h" rdtsc :: IO Word64
Finally:
A slightly non-obvious solution is to use the LLVM or Harpy packages to call some generated assembly.

Linux assembler error "impossible constraint in ‘asm’"

I'm starting with assembler under Linux. I have saved the following code as testasm.c
and compiled it with: gcc testasm.c -otestasm
The compiler replies: "impossible constraint in ‘asm’".
#include <stdio.h>
int main(void)
{
int foo=10,bar=15;
__asm__ __volatile__ ("addl %%ebx,%%eax"
: "=eax"(foo)
: "eax"(foo), "ebx"(bar)
: "eax"
);
printf("foo = %d", foo);
return 0;
}
How can I resolve this problem?
(I've copied the example from here.)
Debian Lenny, kernel 2.6.26-2-amd64
gcc version 4.3.2 (Debian 4.3.2-1.1)
Resolution:
See the accepted answer - it seems the 'modified' clause is not supported any more.
__asm__ __volatile__ ("addl %%ebx,%%eax" : "=a"(foo) : "a"(foo), "b"(bar));
seems to work. I believe that the syntax for register constraints changed at some point, but it's not terribly well documented. I find it easier to write raw assembly and avoid the hassle.
The constraints are single letters (possibly with extra decorations), and you can specify several alternatives (e.g., an immediate operand or a register is "ir"). So the constraint "eax" means constraint "e" (signed 32-bit integer constant), "a" (register eax), or "x" (any SSE register). That is a bit different from what the OP meant, and an output to an "e" clearly doesn't make any sense. Also, if some operand (in this case an input and an output) must be the same as another, you refer to it by a number constraint. There is no need to say eax will be clobbered, since it is an output. You can refer to the arguments in the inline code by %0, %1, ..., so there is no need to use explicit register names. So the correct version of the code as intended by the OP would be:
#include <stdio.h>
int main(void)
{
int foo=10, bar=15;
__asm__ __volatile__ (
"addl %2, %0"
: "=a" (foo)
: "0" (foo), "b" (bar)
);
printf("foo = %d", foo);
return 0;
}
A better solution would be to allow %2 to be anything, and %0 a register (as x86 allows, but you'd have to check your machine manual):
#include <stdio.h>
int main(void)
{
int foo=10, bar=15;
__asm__ __volatile__ (
"addl %2, %0"
: "=r" (foo)
: "0" (foo), "g" (bar)
);
printf("foo = %d", foo);
return 0;
}
If one wants to use several instructions across multiple lines, then this will also work:
__asm__ __volatile__ (
"addl %%ebx,%%eax; \
addl %%eax, %%eax;"
: "=a"(foo)
: "a"(foo), "b"(bar)
);
A trailing '\' is needed for the compiler to accept a string literal that spans several lines (here, the instructions).
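As an alternative sketch (assuming GCC or Clang on x86 or x86-64; the function name add_twice is hypothetical), the same two instructions can be written with adjacent string literals, which the compiler concatenates, so no '\' is needed. It also uses a single "+r" read-write operand in place of the "=r"/"0" pair:

```c
/* Hypothetical helper: computes (foo + bar) * 2 with two addl
   instructions; falls back to plain C on non-x86 targets. */
static int add_twice(int foo, int bar)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ (
        "addl %1, %0\n\t"   /* foo += bar */
        "addl %0, %0"       /* foo += foo */
        : "+r" (foo)        /* read-write register operand */
        : "g" (bar)         /* register, memory or immediate */
        : "cc");            /* the additions change the flags */
#else
    foo += bar;
    foo += foo;
#endif
    return foo;
}
```

With the values from the question, add_twice(10, 15) gives 50.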
