QEMU hosting MTE enabled kernel does not raise fault - linux

I'm trying to compile and test the ARMv8.5 MTE extension in a QEMU environment (running an MTE-enabled kernel).
I'm trying to trigger an MTE fault under QEMU hosting a kernel with MTE enabled. I have a simple C program that should fault because of MTE, but it runs just fine (logs and details below). I cross-compile the code for arm64 on an x86 machine, with the relevant clang MTE-related flags.
compiling on 5.4.0-1040-gcp #43-Ubuntu SMP Fri Mar 19 17:49:48 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
running the executable on Linux lab 5.11.13 #1 SMP PREEMPT Sun Apr 11 11:30:52 UTC 2021 aarch64 GNU/Linux with CONFIG_ARM64_MTE=y
The code:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
int main()
{
printf("Hi %ld\n", (long)getpid());
int *array = (int*) malloc (sizeof(int) * 1);
array[32] = 1;
printf("here is the value: %d", array[32]);
return 0;
}
clang-11 flags:
clang-11 -target aarch64-linux-gnu -march=armv8+memtag -fsanitize=memtag main.c -static
QEMU version:
lab@qemu-mte:~$ qemu-system-aarch64 --version
QEMU emulator version 5.2.92
QEMU flags:
sudo /opt/qemu/build/qemu-system-aarch64 -machine virt,mte=on -cpu max -kernel Image -hda stretch.img -m 2G -display none -serial stdio -append "root=/dev/vda"
executable strace output:
root@lab:/# strace ./test
execve("./test", ["./test"], [/* 11 vars */]) = 0
brk(NULL) = 0x3ada7000
brk(0x3ada7f80) = 0x3ada7f80
uname({sysname="Linux", nodename="lab", ...}) = 0
readlinkat(AT_FDCWD, "/proc/self/exe", "/test", 4096) = 5
brk(0x3adc8f80) = 0x3adc8f80
brk(0x3adc9000) = 0x3adc9000
mprotect(0x489000, 4096, PROT_READ) = 0
getpid() = 235
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(204, 64), ...}) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(1, "Hi 235\n", 7Hi 235
) = 7
write(1, "here is the value: 1", 20here is the value: 1) = 20
exit_group(0) = ?
+++ exited with 0 +++
Am I missing something?
If any additional information is needed, please let me know.
Thanks.

Nothing in that strace log seems to be doing anything to enable MTE for the process, which seems suspicious. Try building and running the example MTE program given in the kernel documentation for MTE. That does MTE entirely "manually", i.e. without help from the compiler. If it works then you can start looking at the clang side of things; if it doesn't work then you need to check the QEMU and kernel.

For heap tagging, assuming you're using glibc for malloc, you probably need to enable the memory tagging tunable; see https://www.gnu.org/software/libc/manual/html_node/Memory-Related-Tunables.html
export GLIBC_TUNABLES=glibc.mem.tagging=1
You might actually need to compile glibc v2.35 with MTE support and then use it (https://www.gnu.org/software/libc/manual/html_node/Configuring-and-compiling.html says that memory tagging support is disabled by default)

Related

Difference in behavior when hooking a library function via LD_PRELOAD on Ubuntu and CentOS

There is a hook function socketHook.c that intercepts socket() calls:
#include <stdio.h>
int socket(int domain, int type, int protocol)
{
printf("socket() has been intercepted!\n");
return 0;
}
gcc -c -fPIC socketHook.c
gcc -shared -o socketHook.so socketHook.o
And a simple program getpwuid.c (1) that just invokes the getpwuid() function:
#include <pwd.h>
int main()
{
getpwuid(0);
return 0;
}
gcc getpwuid.c -o getpwuid
getpwuid() internally makes a socket() call.
On CentOS:
$ strace -e trace=socket ./getpwuid
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
socket(AF_UNIX, SOCK_STREAM, 0) = 4
On Ubuntu:
$ strace -e trace=socket ./getpwuid
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 5
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 5
When running (1), socket() is intercepted on CentOS, but not on Ubuntu.
CentOS. printf() from socketHook.c is present:
$ uname -a
Linux centos-stream 4.18.0-301.1.el8.x86_64 #1 SMP Tue Apr 13 16:24:22 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
$ LD_PRELOAD=$(pwd)/socketHook.so ./getpwuid
socket() has been intercepted!
Ubuntu(Xubuntu 20.04). printf() from socketHook.c is NOT present:
$ uname -a
Linux ibse-VirtualBox 5.8.0-50-generic #56~20.04.1-Ubuntu SMP Mon Apr 12 21:46:35 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
$ LD_PRELOAD=$(pwd)/socketHook.so ./getpwuid
$
So my question is:
What does it depend on? I think this is affected by the fact that socket() is not called directly from the executable, but from getpwuid(), which in turn is called, if I understand correctly, from libc.so
How can I get the same behavior on CentOS as on Ubuntu? I don't want to intercept indirect calls from libc.
What does it depend on?
There are two questions to ask:
Which function actually calls the socket system call?
How is that function getting called?
You can see how the socket system call is invoked by running your program under GDB, and using catch syscall socket command. On Ubuntu:
(gdb) catch syscall socket
Catchpoint 1 (syscall 'socket' [41])
(gdb) run
Starting program: /tmp/a.out
Catchpoint 1 (call to syscall socket), 0x00007ffff7ed3477 in socket () at ../sysdeps/unix/syscall-template.S:120
120 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007ffff7ed3477 in socket () at ../sysdeps/unix/syscall-template.S:120
#1 0x00007ffff7f08010 in open_socket (type=type@entry=GETFDPW, key=key@entry=0x7ffff7f612ca "passwd", keylen=keylen@entry=7) at nscd_helper.c:171
#2 0x00007ffff7f084fa in __nscd_get_mapping (type=type@entry=GETFDPW, key=key@entry=0x7ffff7f612ca "passwd", mappedp=mappedp@entry=0x7ffff7f980c8 <map_handle+8>) at nscd_helper.c:269
#3 0x00007ffff7f0894f in __nscd_get_map_ref (type=type@entry=GETFDPW, name=name@entry=0x7ffff7f612ca "passwd", mapptr=mapptr@entry=0x7ffff7f980c0 <map_handle>,
gc_cyclep=gc_cyclep@entry=0x7fffffffda0c) at nscd_helper.c:419
#4 0x00007ffff7f04fb7 in nscd_getpw_r (key=0x7fffffffdaa6 "0", keylen=2, type=type@entry=GETPWBYUID, resultbuf=resultbuf@entry=0x7ffff7f96520 <resbuf>,
buffer=buffer@entry=0x5555555592a0 "", buflen=buflen@entry=1024, result=0x7fffffffdb60) at nscd_getpw_r.c:93
#5 0x00007ffff7f05412 in __nscd_getpwuid_r (uid=uid@entry=0, resultbuf=resultbuf@entry=0x7ffff7f96520 <resbuf>, buffer=buffer@entry=0x5555555592a0 "", buflen=buflen@entry=1024,
result=result@entry=0x7fffffffdb60) at nscd_getpw_r.c:62
#6 0x00007ffff7e9e95d in __getpwuid_r (uid=uid@entry=0, resbuf=resbuf@entry=0x7ffff7f96520 <resbuf>, buffer=0x5555555592a0 "", buflen=buflen@entry=1024,
result=result@entry=0x7fffffffdb60) at ../nss/getXXbyYY_r.c:255
#7 0x00007ffff7e9dfd3 in getpwuid (uid=0) at ../nss/getXXbyYY.c:134
#8 0x0000555555555143 in main () at t.c:5
(gdb) info sym $pc
socket + 7 in section .text of /lib/x86_64-linux-gnu/libc.so.6
(gdb) up
#1 0x00007ffff7f08010 in open_socket (type=type@entry=GETFDPW, key=key@entry=0x7ffff7f612ca "passwd", keylen=keylen@entry=7) at nscd_helper.c:171
171 nscd_helper.c: No such file or directory.
(gdb) x/i $pc-5
0x7ffff7f0800b <open_socket+59>: callq 0x7ffff7ed3470 <socket>
From this we can see that
The function socket is called. Using nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep ' socket' we can confirm that that function is exported from libc.so.6, and thus should be interposable.
The caller does not call socket@plt (i.e. does not use the procedure linkage table), and so LD_PRELOAD will have no effect.
The call from open_socket() to socket() has been non-interposable since 2004, so it's likely that this call isn't intercepted on CentOS either, but some other call is. Probably the 3rd one in your strace output.
Using the above method, you should be able to tell where that call comes from.
I don't want to intercept indirect calls from libc
In that case, LD_PRELOAD may be the wrong mechanism to use.
If you want to only intercept socket() calls from your own code, it's trivial to redirect them to e.g. mysocket() without any need for LD_PRELOAD.
You can do that at source level by adding e.g.
#define socket mysocket
to all your files, or using -Dsocket=mysocket argument at compile time.
Alternatively, using the linker --wrap=socket will do the redirection without recompiling.

A function declared with __attribute__((constructor)) is invoked more than once with LD_PRELOAD

Define a shared library as follows:
#include <unistd.h>
#include <stdio.h>
static void init(void) __attribute__((constructor));
static void init(void)
{
fprintf(stderr, "pid=%u\n", (unsigned) getpid());
}
Build using GCC 10 on an AMD64 machine:
gcc -shared -fPIC lib.c
Run a trivial process to validate:
$ LD_PRELOAD=`realpath a.out` ls
pid=15771
a.out lib.c
The line pid=15771 is printed by init() as expected.
Now, repeat with a complex process that spawns children and threads:
$ LD_PRELOAD=`realpath a.out` python3
pid=15835
pid=15835
pid=15835
pid=15835
pid=15839
pid=15844
pid=15835
pid=15835
pid=15846
pid=15846
pid=15847
pid=15847
pid=15849
pid=15849
pid=15851
pid=15852
pid=15853
pid=15853
pid=15856
pid=15857
pid=15857
pid=15858
pid=15858
pid=15861
pid=15862
pid=15862
pid=15865
pid=15868
pid=15835
Python 3.8.2 (default, Apr 19 2020, 18:33:14)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Observe that there are repeated entries like pid=15835, which indicates that init() has been executed more than once for some of the processes.
Why?
Your Python installation executes some programs when the Python interpreter is launched. If a process image is replaced using execve, the process ID does not change, but constructors run for the new process image.
A simpler example looks like this:
$ LD_PRELOAD=`realpath a.out` bash -c 'exec /bin/true'
You can see more details by invoking strace:
$ strace -f -E LD_PRELOAD=`realpath a.out` -eexecve bash -c 'exec /bin/true'
execve("/usr/bin/bash", ["bash", "-c", "exec /bin/true"], 0x55b275d8b830 /* 29 vars */) = 0
pid=801315
execve("/bin/true", ["/bin/true"], 0x5564dedbeb80 /* 29 vars */) = 0
pid=801315
+++ exited with 0 +++

Why does the system() call fail?

I'm trying to compile a simple utility (kernel 3.4.67);
all it does is call system(), as follows:
int main(void)
{
int rc;
printf("hello 1\n");
rc = system("echo hello again");
printf("system returned %d\n",rc);
rc = system("ls -l");
printf("system returned %d\n",rc);
return 0;
}
Yet the system() call fails, as the following log shows:
root@w812a_kk:/ # /sdcard/test
hello 1
system returned 32512
system returned 32512
I compile it as follows:
arm-linux-gnueabihf-gcc -s -static -Wall -Wstrict-prototypes test.c -o test
That's really weird because I have used system() in the past on different Linux systems and never had any issue with it.
I even tried another cross compiler, but I get the same failure.
Version of kernel & cross compiler:
# busybox uname -a
Linux localhost 3.4.67 #1 SMP PREEMPT Wed Sep 28 18:18:33 CST 2016 armv7l GNU/Linux
arm-linux-gnueabihf-gcc --version
arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-4.7-2013.03-20130313 - Linaro GCC 2013.03) 4.7.3 20130226 (prerelease)
EDIT:
root@w812a_kk:/ # echo hello again && echo $? && echo $0
hello again
0
tmp-mksh
root@w812a_kk:/ #
But I did find something interesting:
When I call test_expander() within main, it works OK, so I suspect that maybe system() tries to find a binary that is not found?
int test_expander(void)
{
pid_t pid;
char *const parmList[] = {"/system/bin/busybox", "echo", "hello", NULL};
if ((pid = fork()) == -1)
perror("fork error");
else if (pid == 0) {
execv("/system/bin/busybox", parmList);
printf("Return not expected. Must be an execv error.\n");
}
return 0;
}
Thank you for any idea.
Ran
The return value of system(), 32512 decimal, is 7F00 in hex.
This is a wait status rather than a plain exit code: the exit code sits in bits 8-15, so 0x7F00 corresponds to exit code 127 (0x7F), which is what system() reports when /bin/sh cannot be executed.
Update: while writing the answer, you edited the question and pulled in something about /system/bin/busybox.
Probably you simply don't have /bin/sh.
I think I understand what happens
From system man page:
The system() library function uses fork(2) to create a child process
that executes the shell command specified in command using execl(3)
as follows:
execl("/bin/sh", "sh", "-c", command, (char *) 0);
But on my filesystem sh is found only in /system/bin, not in /bin.
So I'd better just use execv instead. (I can't do a static link because it's a read-only filesystem.)
Thanks,
Ran

Why is mmap done during printf calls?

Why does printf() do a sys_mmap() and then copy the contents of the string in chunks (of 1024) to the new address space for sys_write()?
Strace of simple static "hello" program is shown below.
> gcc -o hello -static hello.c
> strace ./hello
execve("./hello", ["./hello"], [/* 71 vars */]) = 0
uname({sys="Linux", node="Kumar", ...}) = 0
brk(0) = 0x1ce8000
brk(0x1ce91c0) = 0x1ce91c0
arch_prctl(ARCH_SET_FS, 0x1ce8880) = 0
readlink("/proc/self/exe", "/home/admin/hello", 4096) = 18
brk(0x1d0a1c0) = 0x1d0a1c0
brk(0x1d0b000) = 0x1d0b000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 28), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7feda2130000
write(1, "Hello", 5Hello) = 5
exit_group(0) = ?
+++ exited with 0 +++
Objdump of rodata
> objdump -s --start-address=0x4935a0 ./hello | head -5
./hello: file format elf64-x86-64
Contents of section .rodata:
4935a0 01000200 48656c6c 6f006c69 62632d73 ....Hello.libc-s
If we hook the address of the sys_write() system call at kernel level, we see that the address passed to it lies in the mmap-ed region. Is that not just a waste of new address space, given that the string already exists in the .rodata section in the first loadable segment of the binary? Has it got something to do with the lack of write permission? Then why not make the compiler put the string in the .data section (which is writable) in the first place?
UPDATE:
Mmap-ed address is indeed for sys_write() which can be verified in an easier way when we make the string bigger (say ~1500 chars). GDB will confirm the data address being printed [Note the second breakpoint]
(gdb) c
Continuing.
Hello World hhhhhhhhhhalhfafeuirafheuhrgiegieguehguergjkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqwwwwwwwwwwwwwwwwwwwwww pppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuiiiiiiiiiiiiiiiiiiiiiiiiiiiiiwqiuwqiuwiquwiqhchasnvjnavjanvjdanvjdanvjdanjfanvjaddijuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuquweuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuunnnnnnnnnnnnnnnnnnnnnnnnnnnzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz,,,,,,,,,,,,,,,,,,,,,,
Breakpoint 1, _IO_new_file_write (f=0x6b8300 <_IO_2_1_stdout_>, data=0x7ffff7ffc000, n=706) at fileops.c:1257
1257 {
Have you tried using a debugger?
$ gdb /tmp/hello
...
(gdb) b __mmap
Breakpoint 1 at 0x4152e0
(gdb) r
Starting program: /tmp/hello
Breakpoint 1, 0x00000000004152e0 in mmap64 ()
(gdb) bt
#0 0x00000000004152e0 in mmap64 ()
#1 0x000000000045d73c in _IO_file_doallocate ()
#2 0x0000000000401fec in _IO_doallocbuf ()
#3 0x000000000042ca10 in _IO_new_file_overflow ()
#4 0x000000000042be9d in _IO_new_file_xsputn ()
#5 0x000000000040111d in puts ()
#6 0x00000000004004de in main () at hello.c:4
(gdb) c
Continuing.
Hello, w
[Inferior 1 (process 4294) exited with code 011]
So it allocates memory for the buffered input/output that FILE* uses. Note that calling printf with only a constant string ending in a newline compiles to a puts call, because GCC optimizes it. And puts(string) is effectively fputs(string, stdout) plus a trailing newline, where stdout is a FILE*.
Using raw write, however, doesn't incur such behaviour:
#include <unistd.h>
int main() {
/* sizeof - 1 so the terminating NUL byte is not written */
write(1, "Hello, w\n", sizeof("Hello, w\n") - 1);
return 0;
}

i cannot allocate 100KB with "fileuser - memlock unlimited" in /etc/security/limits.conf

I'm using Fedora release 17 (Beefy Miracle) in my lab. I'm trying to lock 100KB of resident memory with the mlock C function; the code is as follows.
#include <sys/mman.h>
int main(){
char *p;
mlock(p, 100000);
sleep(100);
}
When I compiled the code with gcc and ran it under strace, I saw the following error:
gcc -o mymlock mymlock.c
strace -e mlock ./mlock
mlock(0x4c668ff4, 100000) = -1 ENOMEM (Cannot allocate memory)
Why do I get this error if I have "fileuser - memlock unlimited" in limits.conf?
My memory usage:
[fileuser@Rossetti ~]$ free -m
total used free shared buffers cached
Mem: 2900 2674 226 0 58 957
-/+ buffers/cache: 1657 1242
Swap: 4927 146 4781
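My C code was wrong: p was never initialized, so mlock() was called on an address range that was not mapped, which is what ENOMEM reports here. It works now.
New code: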
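#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>
int main(){
    size_t len = 4096*1024;
    char *p = malloc(len);  /* mlock needs a mapped address range */
    if (p == NULL) { perror("malloc"); return 1; }
    if (mlock(p, len) != 0) { perror("mlock"); return 1; }
    sleep(100);
    return 0;
}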
