sys_execve hooking on 3.5 kernel - linux

I am trying to hook the sys_execve syscall in Linux kernel v3.5 on x86_32. I simply change the sys_call_table entry to the address of my hook function:
asmlinkage long (*real_execve)( const char __user*, const char __user* const __user*,
const char __user* const __user* );
...
asmlinkage long hook_execve( const char __user* filename, const char __user* const __user* argv,
                             const char __user* const __user* envp )
{
    printk( "Called execve hook\n" );
    return real_execve( filename, argv, envp );
}
...
real_execve = (void*)sys_call_table[ __NR_execve ];
sys_call_table[ __NR_execve ] = (unsigned long)hook_execve;
I do set the page permissions needed to modify sys_call_table entries, and this scheme works well for other syscalls (chdir, mkdir and so on). But when hooking execve I get a NULL pointer dereference:
Mar 11 14:18:08 mbz-debian kernel: [ 5590.596033] Called execve hook
Mar 11 14:18:08 mbz-debian kernel: [ 5590.596408] BUG: unable to handle kernel NULL pointer dereference at (null)
Mar 11 14:18:08 mbz-debian kernel: [ 5590.596486] IP: [< (null)>] (null)
Mar 11 14:18:08 mbz-debian kernel: [ 5590.596526] *pdpt = 0000000032302001 *pde = 0000000000000000
Mar 11 14:18:08 mbz-debian kernel: [ 5590.596584] Oops: 0010 [#1] SMP
I call sys_execve with three parameters because arch/x86/kernel/entry_32.S contains PTREGSCALL3(execve). However, I have also tried calling it with four parameters (adding struct pt_regs*) and got the same error. Maybe something is totally wrong with this approach to execve? Or did I miss something?
UPDATE #1
I found that sys_call_table[ __NR_execve ] actually contains the address of ptregs_execve (not sys_execve). It is defined as follows in arch/x86/kernel/entry_32.S:
#define PTREGSCALL3(name) \
ENTRY(ptregs_##name) ; \
CFI_STARTPROC; \
leal 4(%esp),%eax; \
pushl_cfi %eax; \
movl PT_EDX(%eax),%ecx; \
movl PT_ECX(%eax),%edx; \
movl PT_EBX(%eax),%eax; \
call sys_##name; \
addl $4,%esp; \
CFI_ADJUST_CFA_OFFSET -4; \
ret; \
CFI_ENDPROC; \
ENDPROC(ptregs_##name)
...
PTREGSCALL3(execve)
So in order to modify sys_execve I need to replace its code without modifying its address? I have read something similar here, is this the way to go?
UPDATE #2
Actually I found the following call sequence: do_execve->do_execve_common->search_binary_handler->security_bprm_check. security_bprm_check is a wrapper around an LSM (Linux Security Module) operation that controls execution of a binary. After that I read and followed this link and got it working. It solves my problem, as I can now see the name of the process to be executed, but I am still unsure whether it is correct. Maybe someone else can add some clarity about all this.

In the past, hooking syscalls in the Linux kernel was an easier task. In newer kernels, however, assembly stubs were added to the syscalls. To solve this problem, I patch the kernel's memory on the fly.
You can view my full solution for hooking sys_execve here:
https://github.com/kfiros/execmon

Related

Why does my RIP value change after overwriting via an overflow?

I've been working on a buffer overflow on a 64-bit Linux machine for the past few days. The code I'm attacking takes in a file. The original homework ran on a 32-bit system, so a lot differs. I thought I'd run with it and try to learn something new along the way. I ran sudo sysctl -w kernel.randomize_va_space=0 and compiled the program below with gcc -o stack -z execstack -fno-stack-protector stack.c && chmod 4755 stack
/* This program has a buffer overflow vulnerability. */
/* Our task is to exploit this vulnerability */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int bof(char *str)
{
    char buffer[12];
    /* The following statement has a buffer overflow problem */
    strcpy(buffer, str);
    return 1;
}
int main(int argc, char **argv)
{
    char str[517];
    FILE *badfile;
    badfile = fopen("badfile", "r");
    fread(str, sizeof(char), 517, badfile);
    bof(str);
    printf("Returned Properly\n");
    return 1;
}
I could get it to crash by making a file with 20 "A"s. I made a small script to help.
#!/usr/bin/bash
python -c 'print("A"*20)' | tr -d "\n" > badfile
So now, if I add 6 "B"s to the mix, after hitting the SIGSEGV in gdb I get:
RIP: 0x424242424242 ('BBBBBB')
0x0000424242424242 in ?? ()
Perfect! Now we can play with the RIP and put our address in for it to jump to! This is where I'm getting a little confused.
I added some C's to the script to try to find a good address to jump to
python -c 'print("A"*20 + "B"*6 + "C"*32)' | tr -d "\n" > badfile
After creating the file and getting the SEGFAULT, my RIP address is no longer our B's:
RIP: 0x55555555518a (<bof+37>: ret)
Stopped reason: SIGSEGV
0x000055555555518a in bof (
str=0x7fffffffe900 'C' <repeats 14 times>)
at stack.c:16
I thought this might be due to using B's, so I went ahead and tried to find an address. I then ran x/100 $rsp in gdb, but it looks completely different than before, without the Cs:
# Before Cs
0x7fffffffe8f0: 0x00007fffffffec08 0x0000000100000000
0x7fffffffe900: 0x4141414141414141 0x4141414141414141
0x7fffffffe910: 0x4242424241414141 0x0000000000004242
# After Cs
0x7fffffffe8e8: "BBBBBB", 'C' <repeats 32 times>
0x7fffffffe90f: "AAAAABBBBBB", 'C' <repeats 32 times>
0x7fffffffe93b: ""
I've been trying to understand why this is happening. I know after this I can supply an address plus code to get a shell like so
python -c 'print("noOPs"*20 + "address" + "shellcode")' | tr -d "\n" > badfile
The only thing that has come to mind is the buffer size? I'm not too sure, though. Any help would be great. Doing this alone without help has made me learn a lot. I'm just dying to create a working exploit!
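One possible explanation for the RIP behaviour described above, offered as an assumption since nothing in the question confirms it: with 48-bit virtual addressing, x86-64 only accepts canonical addresses, meaning bits 63..47 must all be equal. With six B's the overwritten return address is 0x0000424242424242, which is canonical, so ret loads it into RIP and the fault happens at that address. Once C's follow the B's, the two top bytes become 0x4343, the value 0x4343424242424242 is not canonical, and the ret instruction itself faults, which would leave RIP at <bof+37>. A minimal sketch of that check in C (is_canonical_48 is a hypothetical helper name):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is canonical under 48-bit virtual
 * addressing, i.e. bits 63..47 are either all zero or all one. */
bool is_canonical_48(uint64_t addr)
{
    uint64_t upper = addr >> 47;   /* keeps bits 63..47 (17 bits) */
    return upper == 0 || upper == 0x1FFFF;
}
```

Under that assumption, the eight bytes that land on the saved return address must themselves form a canonical address before RIP will take the new value.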

Difference in behavior when hooking a library function via LD_PRELOAD on Ubuntu and CentOS

There is a hook function socketHook.c that intercepts socket() calls:
#include <stdio.h>
int socket(int domain, int type, int protocol)
{
    printf("socket() has been intercepted!\n");
    return 0;
}
gcc -c -fPIC socketHook.c
gcc -shared -o socketHook.so socketHook.o
And a simple program getpwuid.c (1) that just invokes the getpwuid() function:
#include <pwd.h>
int main()
{
    getpwuid(0);
    return 0;
}
gcc getpwuid.c -o getpwuid
getpwuid() internally makes a socket() call.
On CentOS:
$ strace -e trace=socket ./getpwuid
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
socket(AF_UNIX, SOCK_STREAM, 0) = 4
On Ubuntu:
$ strace -e trace=socket ./getpwuid
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 5
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 5
When running (1), socket() is intercepted on CentOS, but not on Ubuntu.
CentOS. printf() from socketHook.c is present:
$ uname -a
Linux centos-stream 4.18.0-301.1.el8.x86_64 #1 SMP Tue Apr 13 16:24:22 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
$ LD_PRELOAD=$(pwd)/socketHook.so ./getpwuid
socket() has been intercepted!
Ubuntu(Xubuntu 20.04). printf() from socketHook.c is NOT present:
$ uname -a
Linux ibse-VirtualBox 5.8.0-50-generic #56~20.04.1-Ubuntu SMP Mon Apr 12 21:46:35 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
$ LD_PRELOAD=$(pwd)/socketHook.so ./getpwuid
$
So my question is:
What does it depend on? I think this is affected by the fact that socket() is not called directly from the executable, but from getpwuid(), which in turn is called, if I understand correctly, from libc.so.
How can I achieve the same behavior on CentOS as on Ubuntu? I don't want to intercept indirect calls from libc.
What does it depend on?
There are two questions to ask:
Which function actually calls the socket system call?
How is that function getting called?
You can see how the socket system call is invoked by running your program under GDB, and using catch syscall socket command. On Ubuntu:
(gdb) catch syscall socket
Catchpoint 1 (syscall 'socket' [41])
(gdb) run
Starting program: /tmp/a.out
Catchpoint 1 (call to syscall socket), 0x00007ffff7ed3477 in socket () at ../sysdeps/unix/syscall-template.S:120
120 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007ffff7ed3477 in socket () at ../sysdeps/unix/syscall-template.S:120
#1 0x00007ffff7f08010 in open_socket (type=type@entry=GETFDPW, key=key@entry=0x7ffff7f612ca "passwd", keylen=keylen@entry=7) at nscd_helper.c:171
#2 0x00007ffff7f084fa in __nscd_get_mapping (type=type@entry=GETFDPW, key=key@entry=0x7ffff7f612ca "passwd", mappedp=mappedp@entry=0x7ffff7f980c8 <map_handle+8>) at nscd_helper.c:269
#3 0x00007ffff7f0894f in __nscd_get_map_ref (type=type@entry=GETFDPW, name=name@entry=0x7ffff7f612ca "passwd", mapptr=mapptr@entry=0x7ffff7f980c0 <map_handle>,
gc_cyclep=gc_cyclep@entry=0x7fffffffda0c) at nscd_helper.c:419
#4 0x00007ffff7f04fb7 in nscd_getpw_r (key=0x7fffffffdaa6 "0", keylen=2, type=type@entry=GETPWBYUID, resultbuf=resultbuf@entry=0x7ffff7f96520 <resbuf>,
buffer=buffer@entry=0x5555555592a0 "", buflen=buflen@entry=1024, result=0x7fffffffdb60) at nscd_getpw_r.c:93
#5 0x00007ffff7f05412 in __nscd_getpwuid_r (uid=uid@entry=0, resultbuf=resultbuf@entry=0x7ffff7f96520 <resbuf>, buffer=buffer@entry=0x5555555592a0 "", buflen=buflen@entry=1024,
result=result@entry=0x7fffffffdb60) at nscd_getpw_r.c:62
#6 0x00007ffff7e9e95d in __getpwuid_r (uid=uid@entry=0, resbuf=resbuf@entry=0x7ffff7f96520 <resbuf>, buffer=0x5555555592a0 "", buflen=buflen@entry=1024,
result=result@entry=0x7fffffffdb60) at ../nss/getXXbyYY_r.c:255
#7 0x00007ffff7e9dfd3 in getpwuid (uid=0) at ../nss/getXXbyYY.c:134
#8 0x0000555555555143 in main () at t.c:5
(gdb) info sym $pc
socket + 7 in section .text of /lib/x86_64-linux-gnu/libc.so.6
(gdb) up
#1 0x00007ffff7f08010 in open_socket (type=type@entry=GETFDPW, key=key@entry=0x7ffff7f612ca "passwd", keylen=keylen@entry=7) at nscd_helper.c:171
171 nscd_helper.c: No such file or directory.
(gdb) x/i $pc-5
0x7ffff7f0800b <open_socket+59>: callq 0x7ffff7ed3470 <socket>
From this we can see that
The function socket is called. Using nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep ' socket' we can confirm that that function is exported from libc.so.6, and thus should be interposable.
The caller does not call socket@plt (i.e. does not use the procedure linkage table), and so LD_PRELOAD will have no effect.
The call from open_socket() to socket() has been non-interposable since 2004, so it's likely that this call isn't intercepted on CentOS either, but some other call is. Probably the 3rd one in your strace output.
Using the above method, you should be able to tell where that call comes from.
I don't want to intercept indirect calls from libc
In that case, LD_PRELOAD may be the wrong mechanism to use.
If you want to only intercept socket() calls from your own code, it's trivial to redirect them to e.g. mysocket() without any need for LD_PRELOAD.
You can do that at source level by adding e.g.
#define socket mysocket
to all your files, or using -Dsocket=mysocket argument at compile time.
Alternatively, using the linker --wrap=socket will do the redirection without recompiling.
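As a rough sketch of the source-level redirection described above: mysocket() below is a hypothetical wrapper (the name comes from the answer, its body is an assumption) that counts calls and forwards to the real socket(). The #define is placed after the wrapper so that the wrapper itself still reaches libc, while calls made inside libc, such as the one behind getpwuid(), are not affected.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int intercepted_calls = 0;   /* how many calls went through the wrapper */

/* Hypothetical wrapper: defined before the macro, so the call inside it
 * still resolves to the real socket() from libc. */
int mysocket(int domain, int type, int protocol)
{
    intercepted_calls++;
    fprintf(stderr, "socket() has been intercepted!\n");
    return socket(domain, type, protocol);
}

/* From here on, every socket() call in this file is compiled as mysocket(). */
#define socket mysocket

/* Example of a call from "your own code": opens and closes one socket.
 * Returns 0 on success, -1 if the socket could not be created. */
int open_and_close_socket(void)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}
```

Calling open_and_close_socket() once should bump intercepted_calls to 1, whereas a getpwuid() call in the same program would leave the counter alone, because libc was not compiled with the macro.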

C program stores function parameters from $rbp+4 in memory? My check failed

I was trying to learn how to use rbp/ebp to visit function parameters and local variables on Ubuntu 16.04, 64-bit. I've got a simple C file:
#include<stdio.h>
int main(int argc,char*argv[])
{
    printf("hello\n");
    return argc;
}
I compiled it with:
gcc -g my.c
Then debug it with argument parameters:
gdb --args my 01 02
Here I know the "argc" should be 3, so I tried to check:
(gdb) b main
Breakpoint 1 at 0x400535: file ret.c, line 5.
(gdb) r
Starting program: /home/a/cpp/my 01 02
Breakpoint 1, main (argc=3, argv=0x7fffffffde98) at ret.c:5
5 printf("hello\n");
(gdb) x $rbp+4
0x7fffffffddb4: 0x00000000
(gdb) x $rbp+8
0x7fffffffddb8: 0xf7a2e830
(gdb) x/1xw $rbp+8
0x7fffffffddb8: 0xf7a2e830
(gdb) x/1xw $rbp+4
0x7fffffffddb4: 0x00000000
(gdb) x/1xw $rbp
0x7fffffffddb0: 0x00400550
I can't find any sign that a dword of "3" is stored at any offset from $rbp. Did I get anything wrong in my understanding or commands?
Thanks!
I was trying to learn how to use rbp/ebp to visit function parameters and local variables
The x86_64 ABI does not use stack to pass parameters; they are passed in registers. Because of that, you wouldn't find them at any offset off $rbp (this is different from ix86 calling convention).
To find the parameters, you'll need to look at the $rdi and $rsi registers:
Breakpoint 1, main (argc=3, argv=0x7fffffffe3a8) at my.c:4
4 printf("hello\n");
(gdb) p/x $rdi
$1 = 0x3 # matches argc
(gdb) p/x $rsi
$2 = 0x7fffffffe3a8 # matches argv
x $rbp+4
You almost certainly wouldn't find anything useful at $rbp+4, because the stack pointer is usually incremented or decremented by 8 in order to store an entire 64-bit value, so $rbp+4 points into the middle of a slot.

How to compile STM32f103 program on ubuntu?

I've some experience with programming stm32 arm cortex m3 micro controllers on Windows using Keil. I now want to move to linux environment and use open source tools to program STM32 cortex m3 devices.
I've researched a bit and found that I can use OpenOCD or Texane's ST Link to flash the chip. I also found out that I'll need a cross compiler to compile the code viz. gcc-arm-none-eabi toolchain.
I want to know what basic source and header files are needed. Which core and system files are required to make a simple blink program?
I'm not intending to use HAL libraries as of now. I'm using stm32f103zet6 mcu (a very generic board). I went to http://regalis.com.pl/en/arm-cortex-stm32-gnulinux/ , but couldn't exactly pinpoint the files.
If there is any tutorial to start stm32 programming on linux environment, please let me know.
Any help is appreciated. Thanks!
Here is a very simple example that is fairly portable across the stm32 family. It doesn't do anything useful; you have to fill in the blanks to blink an led or something (read the schematic and the manuals, enable the clocks to the gpio, follow the instructions to make it a push/pull output and so on, then set the bit or clear the bit, etc).
I have my reasons for how I do it, others have theirs, and we all have various numbers of years or decades of experience behind those opinions. But at the end of the day they are opinions, and many different solutions will work.
On the last so many releases of ubuntu you can simply do this to get a toolchain:
apt-get install gcc-arm-linux-gnueabi binutils-arm-linux-gnueabi
Or you can go here and get a pre-built for your operating system
https://launchpad.net/gcc-arm-embedded
flash.s
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.align
.thumb_func
.globl PUT16
PUT16:
strh r1,[r0]
bx lr
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl dummy
dummy:
bx lr
.end
flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
sram.s
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000
.thumb_func
.globl PUT16
PUT16:
strh r1,[r0]
bx lr
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl dummy
dummy:
bx lr
.end
sram.ld
MEMORY
{
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.data : { *(.data*) } > ram
.bss : { *(.bss*) } > ram
}
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void dummy ( unsigned int );
#define STK_CSR 0xE000E010
#define STK_RVR 0xE000E014
#define STK_CVR 0xE000E018
#define STK_MASK 0x00FFFFFF
int delay ( unsigned int n )
{
    unsigned int ra;
    while(n--)
    {
        while(1)
        {
            ra=GET32(STK_CSR);
            if(ra&(1<<16)) break;
        }
    }
    return(0);
}
int notmain ( void )
{
    unsigned int rx;
    PUT32(STK_CSR,4);
    PUT32(STK_RVR,1000000-1);
    PUT32(STK_CVR,0x00000000);
    PUT32(STK_CSR,5);
    for(rx=0;;rx++)
    {
        dummy(rx);
        delay(50);
        dummy(rx);
        delay(50);
    }
    return(0);
}
Makefile
#ARMGNU ?= arm-none-eabi
ARMGNU ?= arm-linux-gnueabi
AOPS = --warn --fatal-warnings -mcpu=cortex-m0
COPS = -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0
all : notmain.gcc.thumb.flash.bin notmain.gcc.thumb.sram.bin
clean:
	rm -f *.bin
	rm -f *.o
	rm -f *.elf
	rm -f *.list
	rm -f *.bc
	rm -f *.opt.s
	rm -f *.norm.s
	rm -f *.hex
#---------------------------------
flash.o : flash.s
	$(ARMGNU)-as $(AOPS) flash.s -o flash.o
sram.o : sram.s
	$(ARMGNU)-as $(AOPS) sram.s -o sram.o
notmain.gcc.thumb.o : notmain.c
	$(ARMGNU)-gcc $(COPS) -mthumb -c notmain.c -o notmain.gcc.thumb.o
notmain.gcc.thumb.flash.bin : flash.ld flash.o notmain.gcc.thumb.o
	$(ARMGNU)-ld -o notmain.gcc.thumb.flash.elf -T flash.ld flash.o notmain.gcc.thumb.o
	$(ARMGNU)-objdump -D notmain.gcc.thumb.flash.elf > notmain.gcc.thumb.flash.list
	$(ARMGNU)-objcopy notmain.gcc.thumb.flash.elf notmain.gcc.thumb.flash.bin -O binary
notmain.gcc.thumb.sram.bin : sram.ld sram.o notmain.gcc.thumb.o
	$(ARMGNU)-ld -o notmain.gcc.thumb.sram.elf -T sram.ld sram.o notmain.gcc.thumb.o
	$(ARMGNU)-objdump -D notmain.gcc.thumb.sram.elf > notmain.gcc.thumb.sram.list
	$(ARMGNU)-objcopy notmain.gcc.thumb.sram.elf notmain.gcc.thumb.sram.hex -O ihex
	$(ARMGNU)-objcopy notmain.gcc.thumb.sram.elf notmain.gcc.thumb.sram.bin -O binary
You can also try/use this approach if you prefer. I have my reasons not to, TL;DW.
void dummy ( unsigned int );
#define STK_MASK 0x00FFFFFF
#define STK_CSR (*((volatile unsigned int *)0xE000E010))
#define STK_RVR (*((volatile unsigned int *)0xE000E014))
#define STK_CVR (*((volatile unsigned int *)0xE000E018))
int delay ( unsigned int n )
{
    unsigned int ra;
    while(n--)
    {
        while(1)
        {
            ra=STK_CSR;
            if(ra&(1<<16)) break;
        }
    }
    return(0);
}
int notmain ( void )
{
    unsigned int rx;
    STK_CSR=4;
    STK_RVR=1000000-1;
    STK_CVR=0x00000000;
    STK_CSR=5;
    for(rx=0;;rx++)
    {
        dummy(rx);
        delay(50);
        dummy(rx);
        delay(50);
    }
    return(0);
}
The documentation is split between the arm docs, of which ST to some extent publishes a derivative for you (not everyone does that, and you should still go to arm), and the st docs.
There is a uart based bootloader built in (might be usb, etc) that is pretty easy to interface; my host code to download programs is in the hundreds of lines of code and probably took an evening or an afternoon to write. YMMV. If you don't already have one, you can get one of the discovery or nucleo boards; I recommend those anyway. You can use the debug end of one to program other stm32 or even other non-st arm chips (not all, it depends on what openocd supports, etc, but some), you can get them for 30% cheaper than the dedicated stlink usb dongles, and you don't need an extension usb cable, etc. YMMV. You can certainly use an stlink with openocd or texane stlink as you have already mentioned.
Due to the way the cortex-m boots I have provided two examples, one for burning to flash the other for downloading via openocd to ram and running that way, could arguably use the flash one too but you have to tweak the start address when you run. I prefer this method. YMMV.
With this approach you are portable and completely unencumbered by HAL limitations or requirements, build environments, etc. But I recommend you try the various methods: bare metal like this, the HAL types of bare metal with one or more st solutions, and the cmsis approach. Every year or so try again, and see if the one you picked is still the one you like.
This example demonstrates, though, that it does not take a whole lot. I picked the cortex-m0 simply to avoid the armv7m thumb2 extensions; thumb without those extensions is the most portable arm instruction set. So again, the code does mostly nothing, but it does nothing on any stm32 cortex-m with a systick timer.
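To make the timing of delay() in notmain.c a bit more concrete, here is a small host-side calculation. It assumes the systick counter is clocked at 8 MHz, which is not stated above and depends on how the chip's clocks are set up; systick_delay_ms is a hypothetical helper and not part of the example itself.

```c
#include <stdint.h>

/* Hypothetical host-side helper: approximate milliseconds spent in
 * delay(wraps) when STK_RVR holds `reload` and the counter runs at clk_hz.
 * Each full wrap of the counter takes (reload + 1) clock ticks. */
uint64_t systick_delay_ms(uint32_t reload, uint32_t wraps, uint32_t clk_hz)
{
    uint64_t ticks = ((uint64_t)reload + 1u) * wraps;
    return ticks * 1000u / clk_hz;
}
```

With the values used in notmain.c (reload 1000000-1, delay(50)) and the assumed 8 MHz clock, this works out to roughly 6250 ms per delay() call; the first wait can be shorter because the counter is already running when delay() starts polling.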
EDIT
This along with whatever you need to feed the linker would be the minimal non-C code.
.global _start
_start:
.word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
And this is abbreviated; depending on the chip vendor and core there can be dozens to hundreds of vectors for every little interrupt of every little thing. The labels reset and hang in this case would be the names of C functions to handle those vectors (the documentation for the chip and core determines what vector handles what). The first vector is always the initialization value of the stack pointer. The second is always reset, the next few are common, and after that they are generic interrupt pins on the core that the chip vendor wires up, so you have to look at the chip vendor documentation.
The core design is such that registers are preserved for you, so you don't need a little bit of assembly around each handler. Going without any bootstrap, you have to assume that .bss is not zeroed and .data is not initialized, and you can't return from the reset function, which in a real implementation you wouldn't, but in demonstration tests you might (blink an led 10 times, then the program is finished).
Your toolchain may have some other way to do this. Since all toolchains should have an assembler and assemblers can generate tables of words, there is always that option, doesnt really make sense to create yet another tool and language for this but some folks feel the need. Your toolchain may not require the entry point named _start and/or it may have a different entry point name requirement.
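As one example of such an alternative, GCC can build the same table from C as an array of function pointers. This is only a sketch under assumptions: the section name .vectors is made up for illustration, and the linker script would have to place that section at the very start of rom, which the flash.ld shown earlier does not do. reset and hang stand in for the C handlers mentioned above.

```c
#include <stdint.h>

#if defined(__arm__)
/* Assumed section name; the linker script must place it first in flash. */
#define VECTOR_SECTION __attribute__((section(".vectors"), used))
#else
#define VECTOR_SECTION /* host build: plain array, for inspection only */
#endif

void hang(void)
{
    for (;;) { }   /* park here on any unexpected exception */
}

void reset(void)
{
    /* No bootstrap: .bss is not zeroed and .data is not initialized. */
    /* Application code would go here; reset must never return. */
    hang();
}

typedef void (*vector_t)(void);

VECTOR_SECTION const vector_t vectors[16] = {
    (vector_t)(uintptr_t)0x20001000u,   /* initial stack pointer value */
    reset,                              /* reset handler */
    hang, hang, hang, hang, hang, hang, hang,
    hang, hang, hang, hang, hang, hang, hang
};
```

The layout mirrors the assembly table: entry 0 is the stack pointer value, entry 1 is reset, and the remaining fourteen entries all point at hang.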
Even if you use Keil, you should also try the gnu tools: they are easy (easier) to get, and there is significantly more support and experience in the world than for Keil. They may not produce as "good" code as Keil, performance wise or otherwise, but you should always have them in your back pocket, as you will always be able to find help with gnu tools.
http://gnuarmeclipse.github.io/
There you'll find everything, including an IDE (Eclipse), toolchain, debugger, headers.
Look at this package. It is an IDE + toolchain + debugger, and it is available for linux platforms. You can study it and get ideas for doing what you want. I hope most of the linux programs have a command line interface.
In addition, I can suggest that you try the LL api if it is already available for your mcu.

ARM inline asm: exit system call with value read from memory

Problem
I want to execute the exit system call in ARM using inline assembly on a Linux Android device, and I want the exit value to be read from a location in memory.
Example
Without giving this extra argument, a macro for the call looks like:
#define ASM_EXIT() __asm__("mov %r0, #1\n\t" \
"mov %r7, #1\n\t" \
"swi #0")
This works well.
To accept an argument, I adjust it to:
#define ASM_EXIT(var) __asm__("mov %r0, %0\n\t" \
"mov %r7, #1\n\t" \
"swi #0" \
: \
: "r"(var))
and I call it using:
#define GET_STATUS() (*(int*)(some_address)) //gets an integer from an address
ASM_EXIT(GET_STATUS());
Error
invalid 'asm': operand number out of range
I can't explain why I get this error, as I use one input variable in the above snippet (%0/var). Also, I have tried with a regular variable, and still got the same error.
Extended-asm syntax requires writing %% to get a single % in the asm output. e.g. for x86:
asm("inc %eax") // bad: undeclared clobber
asm("inc %%eax" ::: "eax"); // safe but still useless :P
%r7 is treating r7 as an operand number. As commenters have pointed out, just omit the %s, because you don't need them for ARM, even with GNU as.
Unfortunately, there doesn't seem to be a way to request input operands in specific registers on ARM, the way you can for x86. (e.g. "a" constraint means eax specifically).
You can use register int var asm ("r7") to force a var to use a specific register, and then use an "r" constraint and assume it will be in that register. I'm not sure this is always safe, or a good idea, but it appears to work even after inlining. @Jeremy comments that this technique was recommended by the GCC team.
I did get some efficient code generated, which avoids wasting an instruction on a reg-reg move:
See it on the Godbolt Compiler Explorer:
__attribute__((noreturn)) static inline void ASM_EXIT(int status)
{
    register int status_r0 asm ("r0") = status;
    register int callno_r7 asm ("r7") = 1;
    asm volatile("swi #0\n"
        :
        : "r" (status_r0), "r" (callno_r7)
        : "memory" // any side-effects on shared memory need to be done before this, not delayed until after
    );
    // __builtin_unreachable(); // optionally let GCC know the inline asm doesn't "return"
}
#define GET_STATUS() (*(int*)(some_address)) //gets an integer from an address
void foo(void) { ASM_EXIT(12); }
push {r7} # # gcc is still saving r7 before use, even though it sees the "noreturn" and doesn't generate a return
movs r0, #12 # stat_r0,
movs r7, #1 # callno,
swi #0
# yes, it literally ends here, after the inlined noreturn
void bar(int status) { ASM_EXIT(status); }
push {r7} #
movs r7, #1 # callno,
swi #0 # doesn't touch r0: already there as bar()'s first arg.
Since you always want the value read from memory, you could use an "m" constraint and include a ldr in your inline asm. Then you wouldn't need the register int var asm("r0") trick to avoid a wasted mov for that operand.
The mov r7, #1 might not always be needed either, which is why I used the register asm() syntax for it, too. If gcc wants a 1 constant in a register somewhere else in a function, it can do it in r7 so it's already there for the ASM_EXIT.
Any time the first or last instructions of a GNU C inline asm statement are mov instructions, there's probably a way to remove them with better constraints.

Resources