QEMU Semihosting not working when building QEMU from Source - linux

I'm emulating a Cortex-M33 using QEMU on a Linux host. I installed QEMU using sudo apt-get install qemu-system-arm, and semihosting works fine (printf and file I/O).
I'm calling QEMU as follows:
/usr/bin/qemu-system-arm -machine mps2-an505 -cpu cortex-m33 -m 16M -nographic --semihosting-config enable=on,target=native -kernel build/ARMCM33/kernel.elf -S -s
Now however I am building QEMU from source. Build steps:
git clone https://gitlab.com/qemu-project/qemu.git
Navigate to the cloned repo
./configure --target-list=arm-softmmu,arm-linux-user (options taken from here)
make
I can see the executable here: qemu/build/qemu-system-arm, however when I run
<path to repo>qemu/build/qemu-system-arm -machine mps2-an505 -cpu cortex-m33 -m 16M -nographic --semihosting-config enable=on,target=native -kernel build/ARMCM33/kernel.elf -S -s
the semihosting no longer works (the program no longer prints to the console).
I've looked through configure --help but cannot see anything obvious. Is there something I've missed?
Edit:
I think I now have a minimal example (kubuntu jammy)
CMSIS tag v5.6.0
QEMU v7.2.0 (built locally)
File:
microbit.s:
.cpu cortex-m33
.code 16
.equ SYS_WRITE0 , 0x04
.equ angel_SWIreason_ReportException, 0x18
.global _start
_start: mov r0, #SYS_WRITE0
ldr r1,=hello
bkpt 0xab
mov r0, #angel_SWIreason_ReportException
ldr r1,=ADP_Stopped_ApplicationExit
bkpt 0xab
.balign 4
hello: .asciz "Hello, World!\n"
ADP_Stopped_ApplicationExit: .word 0x20026
.end
gcc_arm.ld (taken from CMSIS_5/Device/ARM/ARMCM33/Source/GCC where the memory regions have been updated according to the an505):
/******************************************************************************
* @file gcc_arm.ld
* @brief GNU Linker Script for Cortex-M based device
* @version V2.2.0
* @date 16. December 2020
******************************************************************************/
/*
* Copyright (c) 2009-2020 Arm Limited. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
* Licensed under the Apache License, Version 2.0 (the License); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an AS IS BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
*-------- <<< Use Configuration Wizard in Context Menu >>> -------------------
*/
/*---------------------- Flash Configuration ----------------------------------
<h> Flash Configuration
<o0> Flash Base Address <0x0-0xFFFFFFFF:8>
<o1> Flash Size (in Bytes) <0x0-0xFFFFFFFF:8>
</h>
-----------------------------------------------------------------------------*/
/* See https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwiY67e2meD8AhUKQ0EAHSwmBSIQFnoECCAQAQ&url=https%3A%2F%2Fdocumentation-service.arm.com%2Fstatic%2F5ed11469ca06a95ce53f8ed7%3Ftoken%3D&usg=AOvVaw0o2b4qMG6MiKjhd_STNKqR
*/
__ROM_BASE = 0x10000000;
__ROM_SIZE = 512K;
/*--------------------- Embedded RAM Configuration ----------------------------
<h> RAM Configuration
<o0> RAM Base Address <0x0-0xFFFFFFFF:8>
<o1> RAM Size (in Bytes) <0x0-0xFFFFFFFF:8>
</h>
-----------------------------------------------------------------------------*/
/* See https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwiY67e2meD8AhUKQ0EAHSwmBSIQFnoECCAQAQ&url=https%3A%2F%2Fdocumentation-service.arm.com%2Fstatic%2F5ed11469ca06a95ce53f8ed7%3Ftoken%3D&usg=AOvVaw0o2b4qMG6MiKjhd_STNKqR
*/
__RAM_BASE = 0x38000000;
__RAM_SIZE = 2048K;
/*--------------------- Stack / Heap Configuration ----------------------------
<h> Stack / Heap Configuration
<o0> Stack Size (in Bytes) <0x0-0xFFFFFFFF:8>
<o1> Heap Size (in Bytes) <0x0-0xFFFFFFFF:8>
</h>
-----------------------------------------------------------------------------*/
__STACK_SIZE = 0x00100000;
__HEAP_SIZE = 0x00000100;
/*
*-------------------- <<< end of configuration section >>> -------------------
*/
MEMORY
{
FLASH (rx) : ORIGIN = __ROM_BASE, LENGTH = __ROM_SIZE
RAM (rwx) : ORIGIN = __RAM_BASE, LENGTH = __RAM_SIZE
}
/* Linker script to place sections and symbol values. Should be used together
* with other linker script that defines memory regions FLASH and RAM.
* It references following symbols, which must be defined in code:
* Reset_Handler : Entry of reset handler
*
* It defines following symbols, which code can use without definition:
* __exidx_start
* __exidx_end
* __copy_table_start__
* __copy_table_end__
* __zero_table_start__
* __zero_table_end__
* __etext
* __data_start__
* __preinit_array_start
* __preinit_array_end
* __init_array_start
* __init_array_end
* __fini_array_start
* __fini_array_end
* __data_end__
* __bss_start__
* __bss_end__
* __end__
* end
* __HeapLimit
* __StackLimit
* __StackTop
* __stack
*/
ENTRY(Reset_Handler)
SECTIONS
{
.text :
{
KEEP(*(.vectors))
*(.text*)
KEEP(*(.init))
KEEP(*(.fini))
/* .ctors */
*crtbegin.o(.ctors)
*crtbegin?.o(.ctors)
*(EXCLUDE_FILE(*crtend?.o *crtend.o) .ctors)
*(SORT(.ctors.*))
*(.ctors)
/* .dtors */
*crtbegin.o(.dtors)
*crtbegin?.o(.dtors)
*(EXCLUDE_FILE(*crtend?.o *crtend.o) .dtors)
*(SORT(.dtors.*))
*(.dtors)
*(.rodata*)
KEEP(*(.eh_frame*))
} > FLASH
/*
* SG veneers:
* All SG veneers are placed in the special output section .gnu.sgstubs. Its start address
* must be set, either with the command line option --section-start or in a linker script,
* to indicate where to place these veneers in memory.
*/
/*
.gnu.sgstubs :
{
. = ALIGN(32);
} > FLASH
*/
.ARM.extab :
{
*(.ARM.extab* .gnu.linkonce.armextab.*)
} > FLASH
__exidx_start = .;
.ARM.exidx :
{
*(.ARM.exidx* .gnu.linkonce.armexidx.*)
} > FLASH
__exidx_end = .;
.copy.table :
{
. = ALIGN(4);
__copy_table_start__ = .;
LONG (__etext)
LONG (__data_start__)
LONG (__data_end__ - __data_start__)
/* Add each additional data section here */
/*
LONG (__etext2)
LONG (__data2_start__)
LONG (__data2_end__ - __data2_start__)
*/
__copy_table_end__ = .;
} > FLASH
.zero.table :
{
. = ALIGN(4);
__zero_table_start__ = .;
/* Add each additional bss section here */
/*
LONG (__bss2_start__)
LONG (__bss2_end__ - __bss2_start__)
*/
__zero_table_end__ = .;
} > FLASH
/**
* Location counter can end up 2byte aligned with narrow Thumb code but
* __etext is assumed by startup code to be the LMA of a section in RAM
* which must be 4byte aligned
*/
__etext = ALIGN (4);
.data : AT (__etext)
{
__data_start__ = .;
*(vtable)
*(.data)
*(.data.*)
. = ALIGN(4);
/* preinit data */
PROVIDE_HIDDEN (__preinit_array_start = .);
KEEP(*(.preinit_array))
PROVIDE_HIDDEN (__preinit_array_end = .);
. = ALIGN(4);
/* init data */
PROVIDE_HIDDEN (__init_array_start = .);
KEEP(*(SORT(.init_array.*)))
KEEP(*(.init_array))
PROVIDE_HIDDEN (__init_array_end = .);
. = ALIGN(4);
/* finit data */
PROVIDE_HIDDEN (__fini_array_start = .);
KEEP(*(SORT(.fini_array.*)))
KEEP(*(.fini_array))
PROVIDE_HIDDEN (__fini_array_end = .);
KEEP(*(.jcr*))
. = ALIGN(4);
/* All data end */
__data_end__ = .;
} > RAM
/*
* Secondary data section, optional
*
* Remember to add each additional data section
* to the .copy.table above to ensure proper
* initialization during startup.
*/
/*
__etext2 = ALIGN (4);
.data2 : AT (__etext2)
{
. = ALIGN(4);
__data2_start__ = .;
*(.data2)
*(.data2.*)
. = ALIGN(4);
__data2_end__ = .;
} > RAM2
*/
.bss :
{
. = ALIGN(4);
__bss_start__ = .;
*(.bss)
*(.bss.*)
*(COMMON)
. = ALIGN(4);
__bss_end__ = .;
} > RAM AT > RAM
/*
* Secondary bss section, optional
*
* Remember to add each additional bss section
* to the .zero.table above to ensure proper
* initialization during startup.
*/
/*
.bss2 :
{
. = ALIGN(4);
__bss2_start__ = .;
*(.bss2)
*(.bss2.*)
. = ALIGN(4);
__bss2_end__ = .;
} > RAM2 AT > RAM2
*/
.heap (COPY) :
{
. = ALIGN(8);
__end__ = .;
PROVIDE(end = .);
. = . + __HEAP_SIZE;
. = ALIGN(8);
__HeapLimit = .;
} > RAM
.stack (ORIGIN(RAM) + LENGTH(RAM) - __STACK_SIZE) (COPY) :
{
. = ALIGN(8);
__StackLimit = .;
. = . + __STACK_SIZE;
. = ALIGN(8);
__StackTop = .;
} > RAM
PROVIDE(__stack = __StackTop);
/* Check if data + heap + stack exceeds RAM limit */
ASSERT(__StackLimit >= __HeapLimit, "region RAM overflowed with stack")
}
Building:
gcc-arm-none-eabi-9-2020-q2-update/bin/arm-none-eabi-gcc -O0 -ggdb -mthumb -mcpu=cortex-m33 -nostartfiles -ffreestanding --specs=rdimon.specs -DARMCM33_DSP_FP -I<PATH_TO_CMSIS_5>/CMSIS/Core/Include -I<PATH_TO_CMSIS_5>/Device/ARM/ARMCM33/Include -L. -Wl,-T,gcc_arm.ld -o cortex_m33.elf <PATH_TO_CMSIS_5>/Device/ARM/ARMCM33/Source/GCC/startup_ARMCM33.S <PATH_TO_CMSIS_5>/Device/ARM/ARMCM33/Source/system_ARMCM33.c microbit.s
where I had to add the --specs=rdimon.specs and -DARMCM33_DSP_FP flags to get CMSIS to compile.
Another note is that gcc gives errors if I use -mtune=cortex-m33:
conflicting CPU architectures 17/2
Running the system-installed QEMU:
qemu-system-arm --semihosting-config enable=on,target=native -m 16M -nographic -cpu cortex-m33 -machine mps2-an505 -kernel cortex_m33.elf
Hello, World!
Local build:
Documents/qemu/build/qemu-system-arm --semihosting-config enable=on,target=native -m 16M -nographic -cpu cortex-m33 -machine mps2-an505 -kernel cortex_m33.elf
where there is no output.

It is difficult to say whether you missed something, because we don't have all the source code or the linker script you are using.
I therefore cannot answer your specific question, but here is a procedure that worked for me on Debian bullseye, Ubuntu focal and Ubuntu jammy for building qemu-system-arm 7.2.0.
The first step is to make sure you are using version v7.2.0, that is, revision b67b00e6b4 of the source code. You did not mention the extra git command required to get this revision; it would be step 1.1:
git checkout v7.2.0
Note: switching to 'v7.2.0'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at b67b00e6b4 Update VERSION for v7.2.0
Retrieving/building qemu (using wget):
cd /tmp
wget https://download.qemu.org/qemu-7.2.0.tar.xz
tar Jxf qemu-7.2.0.tar.xz
mkdir qemu
cd qemu
../qemu-7.2.0/configure --target-list=arm-softmmu,arm-linux-user --prefix=/tmp/qemu --extra-cflags=-I/tmp/qemu-7.2.0/packages/include --extra-ldflags=-L/tmp/qemu-7.2.0/packages/lib --enable-slirp
make install
microbit.s, a minimal semihosting program for the microbit machine:
.cpu cortex-m0
.code 16
.equ SYS_WRITE0 , 0x04
.equ angel_SWIreason_ReportException, 0x18
.global _start
_start: mov r0, #SYS_WRITE0
ldr r1,=hello
bkpt 0xab
mov r0, #angel_SWIreason_ReportException
ldr r1,=ADP_Stopped_ApplicationExit
bkpt 0xab
.balign 4
hello: .asciz "Hello, World!\n"
ADP_Stopped_ApplicationExit: .word 0x20026
.end
microbit.ld:
/*
*-------- <<< Use Configuration Wizard in Context Menu >>> -------------------
*/
/*---------------------- Flash Configuration ----------------------------------
<h> Flash Configuration
<o0> Flash Base Address <0x0-0xFFFFFFFF:8>
<o1> Flash Size (in Bytes) <0x0-0xFFFFFFFF:8>
</h>
-----------------------------------------------------------------------------*/
__ROM_BASE = 0x00000000;
__ROM_SIZE = 0x00020000;
/*--------------------- Embedded RAM Configuration ----------------------------
<h> RAM Configuration
<o0> RAM Base Address <0x0-0xFFFFFFFF:8>
<o1> RAM Size (in Bytes) <0x0-0xFFFFFFFF:8>
</h>
-----------------------------------------------------------------------------*/
__RAM_BASE = 0x20000000;
__RAM_SIZE = 0x00004000;
/*--------------------- Stack / Heap Configuration ----------------------------
<h> Stack / Heap Configuration
<o0> Stack Size (in Bytes) <0x0-0xFFFFFFFF:8>
<o1> Heap Size (in Bytes) <0x0-0xFFFFFFFF:8>
</h>
-----------------------------------------------------------------------------*/
__STACK_SIZE = 0x00000400;
__HEAP_SIZE = 0x00000C00;
/*
*-------------------- <<< end of configuration section >>> -------------------
*/
INCLUDE gcc_arm32.ld
gcc_arm32.ld:
/******************************************************************************
* @file gcc_arm32.ld
* @brief GNU Linker Script for Cortex-M based device
* @version V2.0.0
* @date 21. May 2019
******************************************************************************/
/*
* Copyright (c) 2009-2019 Arm Limited. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
* Licensed under the Apache License, Version 2.0 (the License); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an AS IS BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
MEMORY
{
FLASH (rx) : ORIGIN = __ROM_BASE, LENGTH = __ROM_SIZE
RAM (rwx) : ORIGIN = __RAM_BASE, LENGTH = __RAM_SIZE
}
/* Linker script to place sections and symbol values. Should be used together
* with other linker script that defines memory regions FLASH and RAM.
* It references following symbols, which must be defined in code:
* Reset_Handler : Entry of reset handler
*
* It defines following symbols, which code can use without definition:
* __exidx_start
* __exidx_end
* __copy_table_start__
* __copy_table_end__
* __zero_table_start__
* __zero_table_end__
* __etext
* __data_start__
* __preinit_array_start
* __preinit_array_end
* __init_array_start
* __init_array_end
* __fini_array_start
* __fini_array_end
* __data_end__
* __bss_start__
* __bss_end__
* __end__
* end
* __HeapLimit
* __StackLimit
* __StackTop
* __stack
*/
ENTRY(Reset_Handler)
SECTIONS
{
.text :
{
KEEP(*(.vectors))
*(.text*)
KEEP(*(.init))
KEEP(*(.fini))
/* .ctors */
*crtbegin.o(.ctors)
*crtbegin?.o(.ctors)
*(EXCLUDE_FILE(*crtend?.o *crtend.o) .ctors)
*(SORT(.ctors.*))
*(.ctors)
/* .dtors */
*crtbegin.o(.dtors)
*crtbegin?.o(.dtors)
*(EXCLUDE_FILE(*crtend?.o *crtend.o) .dtors)
*(SORT(.dtors.*))
*(.dtors)
*(.rodata*)
KEEP(*(.eh_frame*))
} > FLASH
/*
* SG veneers:
* All SG veneers are placed in the special output section .gnu.sgstubs. Its start address
* must be set, either with the command line option ‘--section-start’ or in a linker script,
* to indicate where to place these veneers in memory.
*/
/*
.gnu.sgstubs :
{
. = ALIGN(32);
} > FLASH
*/
.ARM.extab :
{
*(.ARM.extab* .gnu.linkonce.armextab.*)
} > FLASH
__exidx_start = .;
.ARM.exidx :
{
*(.ARM.exidx* .gnu.linkonce.armexidx.*)
} > FLASH
__exidx_end = .;
.copy.table :
{
. = ALIGN(4);
__copy_table_start__ = .;
LONG (__etext)
LONG (__data_start__)
LONG (__data_end__ - __data_start__)
/* Add each additional data section here */
/*
LONG (__etext2)
LONG (__data2_start__)
LONG (__data2_end__ - __data2_start__)
*/
__copy_table_end__ = .;
} > FLASH
.zero.table :
{
. = ALIGN(4);
__zero_table_start__ = .;
/* Add each additional bss section here */
/*
LONG (__bss2_start__)
LONG (__bss2_end__ - __bss2_start__)
*/
__zero_table_end__ = .;
} > FLASH
/**
* Location counter can end up 2byte aligned with narrow Thumb code but
* __etext is assumed by startup code to be the LMA of a section in RAM
* which must be 4byte aligned
*/
__etext = ALIGN (4);
.data : AT (__etext)
{
__data_start__ = .;
*(vtable)
*(.data)
*(.data.*)
. = ALIGN(4);
/* preinit data */
PROVIDE_HIDDEN (__preinit_array_start = .);
KEEP(*(.preinit_array))
PROVIDE_HIDDEN (__preinit_array_end = .);
. = ALIGN(4);
/* init data */
PROVIDE_HIDDEN (__init_array_start = .);
KEEP(*(SORT(.init_array.*)))
KEEP(*(.init_array))
PROVIDE_HIDDEN (__init_array_end = .);
. = ALIGN(4);
/* finit data */
PROVIDE_HIDDEN (__fini_array_start = .);
KEEP(*(SORT(.fini_array.*)))
KEEP(*(.fini_array))
PROVIDE_HIDDEN (__fini_array_end = .);
KEEP(*(.jcr*))
. = ALIGN(4);
/* All data end */
__data_end__ = .;
} > RAM
/*
* Secondary data section, optional
*
* Remember to add each additional data section
* to the .copy.table above to ensure proper
* initialization during startup.
*/
/*
__etext2 = ALIGN (4);
.data2 : AT (__etext2)
{
. = ALIGN(4);
__data2_start__ = .;
*(.data2)
*(.data2.*)
. = ALIGN(4);
__data2_end__ = .;
} > RAM2
*/
.bss :
{
. = ALIGN(4);
__bss_start__ = .;
*(.bss)
*(.bss.*)
*(COMMON)
. = ALIGN(4);
__bss_end__ = .;
} > RAM AT > RAM
/*
* Secondary bss section, optional
*
* Remember to add each additional bss section
* to the .zero.table above to ensure proper
* initialization during startup.
*/
/*
.bss2 :
{
. = ALIGN(4);
__bss2_start__ = .;
*(.bss2)
*(.bss2.*)
. = ALIGN(4);
__bss2_end__ = .;
} > RAM2 AT > RAM2
*/
.heap (COPY) :
{
. = ALIGN(8);
__end__ = .;
PROVIDE(end = .);
. = . + __HEAP_SIZE;
. = ALIGN(8);
__HeapLimit = .;
} > RAM
.stack (ORIGIN(RAM) + LENGTH(RAM) - __STACK_SIZE) (COPY) :
{
. = ALIGN(8);
__StackLimit = .;
. = . + __STACK_SIZE;
. = ALIGN(8);
__StackTop = .;
} > RAM
PROVIDE(__stack = __StackTop);
/* Check if data + heap + stack exceeds RAM limit */
ASSERT(__StackLimit >= __HeapLimit, "region RAM overflowed with stack")
}
Building microbit.elf using CMSIS 5.6.0 - you can retrieve it here.
rm -f microbit.elf *.o microbit.lst
/opt/arm/9/gcc-arm-none-eabi-9-2020-q2-update/bin/arm-none-eabi-gcc -O0 -ggdb -mthumb -mtune=cortex-m0 -nostdlib -nostartfiles -ffreestanding -I/opt/arm/ARM.CMSIS.5.6.0//CMSIS/Include -I/opt/arm/ARM.CMSIS.5.6.0//Device/ARM/ARMCM0/Include -L. -Wl,-T,microbit.ld -o microbit.elf /opt/arm/ARM.CMSIS.5.6.0//Device/ARM/ARMCM0/Source/startup_ARMCM0.c /opt/arm/ARM.CMSIS.5.6.0//Device/ARM/ARMCM0/Source/system_ARMCM0.c microbit.s
/opt/arm/9/gcc-arm-none-eabi-9-2020-q2-update/bin/arm-none-eabi-objdump -d microbit.elf > microbit.lst
Executing microbit.elf:
/tmp/qemu/bin/qemu-system-arm --semihosting-config enable=on,target=native -m 16M -nographic -cpu cortex-m0 -machine microbit -kernel microbit.elf
Hello, World!
I tested with this project after having implemented some changes.
First, modify the Makefile according to the following diff output:
diff --git a/Makefile b/Makefile
index 505b967..698bf71 100644
--- a/Makefile
+++ b/Makefile
@@ -5,12 +5,13 @@ BINARY_ALL = image_s_ns.elf
MACHINE_NAME := mps2-an505
-CMSIS_PATH ?= ./CMSIS_5
-QEMU_PATH ?= ./qemu/build/arm-softmmu/qemu-system-arm
-TOOLCHAIN_PATH ?= ./gcc-arm-none-eabi-8-2019-q3-update/bin
+CMSIS_PATH ?= /opt/arm/ARM.CMSIS.5.9.0
+QEMU_PATH ?= /opt/qemu-7.2.0/bin/qemu-system-arm
+TOOLCHAIN_PATH ?= /opt/arm/11/gcc-arm-11.2-2022.02-x86_64-arm-none-eabi/bin
CROSS_COMPILE = $(TOOLCHAIN_PATH)/arm-none-eabi-
CC = $(CROSS_COMPILE)gcc
+AS = $(CROSS_COMPILE)as
LD = $(CROSS_COMPILE)ld
GDB = $(CROSS_COMPILE)gdb
OBJ = $(CROSS_COMPILE)objdump
@@ -116,6 +117,7 @@ run: $(BINARY_S) $(BINARY_NS)
-m 16M \
-nographic \
-semihosting \
+ --semihosting-config enable=on,target=native \
-d int,cpu_reset \
-device loader,file=$(BINARY_NS) \
-device loader,file=$(BINARY_S)
@@ -127,6 +129,7 @@ gdbserver: $(BINARY_S) $(BINARY_NS)
-m 16M \
-nographic \
-semihosting \
+ --semihosting-config enable=on,target=native \
-device loader,file=$(BINARY_NS) \
-device loader,file=$(BINARY_S) \
-d int,cpu_reset \
Second, delete non_secure/main_ns.c:
rm non_secure/main_ns.c
Third, create non_secure/main_ns.s with the following content:
.syntax unified
.cpu cortex-m33
.code 16
.equ SYS_WRITE0 , 0x04
.equ angel_SWIreason_ReportException, 0x18
.global main
main: mov r0, #SYS_WRITE0
ldr r1,=hello
bkpt 0xab
done: wfi
b done
.balign 4
hello: .asciz "Hello, World!\n"
.end
Then build and execute:
make run
You should see a lot of messages displayed by the various initialization code, then the message displayed by using the semihosting services:
Taking exception 16 [Semihosting call] on CPU 0
...handling as semihosting call 0x4
Hello, World!
I would say that this does demonstrate that semihosting works with a QEMU 7.2.0 compiled from its source code using the procedure above, and the problem may reside in your code as suggested by Peter Maydell:
it might also be a bug in your program which the older version of QEMU
just didn't happen to trigger.
It may work differently than with an Armv7-M core, though, because of the specific Armv8-M security features (I am not familiar with Armv8-M): it seems that the semihosting call made in non-secure mode is intercepted, then honored in secure mode.
But this should probably be the topic of another question once you have studied the example code, which I have to do myself as well.

Related

How is the address of the text section of a PIE executable determined in Linux?

First I tried to reverse engineer it a bit:
printf '
#include <stdio.h>
int main() {
puts("hello world");
}
' > main.c
gcc -std=c99 -pie -fpie -ggdb3 -o pie main.c
echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
readelf -s ./pie | grep -E 'main$'
gdb -batch -nh \
-ex 'set disable-randomization off' \
-ex 'start' -ex 'info line' \
-ex 'start' -ex 'info line' \
-ex 'set disable-randomization on' \
-ex 'start' -ex 'info line' \
-ex 'start' -ex 'info line' \
./pie \
;
Output:
64: 000000000000063a 23 FUNC GLOBAL DEFAULT 14 main
Temporary breakpoint 1, main () at main.c:4
4 puts("hello world");
Line 4 of "main.c" starts at address 0x5575f5fd263e <main+4> and ends at 0x5575f5fd264f <main+21>.
Temporary breakpoint 2 at 0x5575f5fd263e: file main.c, line 4.
Temporary breakpoint 2, main () at main.c:4
4 puts("hello world");
Line 4 of "main.c" starts at address 0x55e3fbc9363e <main+4> and ends at 0x55e3fbc9364f <main+21>.
Temporary breakpoint 3 at 0x55e3fbc9363e: file main.c, line 4.
Temporary breakpoint 3, main () at main.c:4
4 puts("hello world");
Line 4 of "main.c" starts at address 0x55555555463e <main+4> and ends at 0x55555555464f <main+21>.
Temporary breakpoint 4 at 0x55555555463e: file main.c, line 4.
Temporary breakpoint 4, main () at main.c:4
4 puts("hello world");
Line 4 of "main.c" starts at address 0x55555555463e <main+4> and ends at 0x55555555464f <main+21>.
which indicates that it is 0x555555554000 + random offset + 63e.
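The arithmetic behind that observation can be checked with a small sketch. It only uses the values printed above (the 0x63a symbol value from readelf and the addresses from gdb); the function and constant names are illustrative and do not come from any kernel or glibc source.

```rust
/// Base observed in the gdb runs above when randomization is disabled.
const UNRANDOMIZED_BASE: u64 = 0x5555_5555_4000;
/// Offset of `main+4` inside the ELF file: 0x63a from readelf, plus 4.
const MAIN_PLUS_4_OFFSET: u64 = 0x63a + 4;

/// Recover the address the executable was loaded at from a runtime address.
fn load_base(runtime_addr: u64, file_offset: u64) -> u64 {
    runtime_addr - file_offset
}

/// Distance of a load base above the unrandomized base.
fn random_offset(base: u64) -> u64 {
    base - UNRANDOMIZED_BASE
}

fn main() {
    // The four `main+4` addresses reported by gdb above.
    let observed = [
        0x5575_f5fd_263e_u64,
        0x55e3_fbc9_363e,
        0x5555_5555_463e,
        0x5555_5555_463e,
    ];
    for addr in observed.iter() {
        let base = load_base(*addr, MAIN_PLUS_4_OFFSET);
        println!(
            "addr {:#x}: base {:#x}, random offset {:#x}",
            addr,
            base,
            random_offset(base)
        );
    }
}
```

With randomization disabled the offset is zero, and with it enabled the base stays page-aligned, which is consistent with a fixed base plus a page-granular random offset.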
But then I tried to grep the Linux kernel and glibc source code for 555555554 and there were no hits.
Which part of which code calculates that address?
I came across this while answering: What is the -fPIE option for position-independent executables in gcc and ld?
An Internet search for 0x555555554000 gives some hints: there were problems with ThreadSanitizer https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual
Q: When I run the program, it says: FATAL: ThreadSanitizer can not mmap the shadow memory (something is mapped at 0x555555554000 < 0x7cf000000000). What to do? You need to enable ASLR:
$ echo 2 >/proc/sys/kernel/randomize_va_space
This may be fixed in future kernels, see https://bugzilla.kernel.org/show_bug.cgi?id=66721
...
$ gdb -ex 'set disable-randomization off' --args ./a.out
and https://lwn.net/Articles/730120/ "Stable kernel updates." Posted Aug 7, 2017 20:40 UTC (Mon) by hmh (subscriber) https://marc.info/?t=150213704600001&r=1&w=2
(https://patchwork.kernel.org/patch/9886105/, commit c715b72c1ba4)
Moving the x86_64 and arm64 PIE base from 0x555555554000 to 0x000100000000
broke AddressSanitizer. This is a partial revert of:
commit eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") (https://patchwork.kernel.org/patch/9807325/ https://lkml.org/lkml/2017/6/21/560)
commit 02445990a96e ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB") (https://patchwork.kernel.org/patch/9807319/)
Reverted code was:
+++ b/arch/arm64/include/asm/elf.h
/*
* This is the base location for PIE (ET_DYN with INTERP) loads. On
- * 64-bit, this is raised to 4GB to leave the entire 32-bit address
+ * 64-bit, this is above 4GB to leave the entire 32-bit address
* space open for things that want to use the area for 32-bit pointers.
*/
-#define ELF_ET_DYN_BASE 0x100000000UL
+#define ELF_ET_DYN_BASE (2 * TASK_SIZE_64 / 3)
+++ b/arch/x86/include/asm/elf.h
/*
* This is the base location for PIE (ET_DYN with INTERP) loads. On
- * 64-bit, this is raised to 4GB to leave the entire 32-bit address
+ * 64-bit, this is above 4GB to leave the entire 32-bit address
* space open for things that want to use the area for 32-bit pointers.
*/
#define ELF_ET_DYN_BASE (mmap_is_ia32() ? 0x000400000UL : \
- 0x100000000UL)
+ (TASK_SIZE / 3 * 2))
So 0x555555554000 is related to the ELF_ET_DYN_BASE macro (used in fs/binfmt_elf.c as the non-randomized load_bias for ET_DYN binaries), which for x86_64 and arm64 is 2/3 of TASK_SIZE. When there is no CONFIG_X86_32, x86_64 has a TASK_SIZE of 2^47 minus one page, as defined in arch/x86/include/asm/processor.h:
/*
* User space process size. 47bits minus one guard page. The guard
* page is necessary on Intel CPUs: if a SYSCALL instruction is at
* the highest possible canonical userspace address, then that
* syscall will enter the kernel with a non-canonical return
* address, and SYSRET will explode dangerously. We avoid this
* particular problem by preventing anything from being mapped
* at the maximum canonical address.
*/
#define TASK_SIZE_MAX ((1UL << 47) - PAGE_SIZE)
Older versions:
/*
* User space process size. 47bits minus one guard page.
*/
#define TASK_SIZE_MAX ((1UL << 47) - PAGE_SIZE)
Newer versions also support 5-level paging with a __VIRTUAL_MASK_SHIFT of 56 bits (v4.17/source/arch/x86/include/asm/processor.h), but do not use the wider address space until user space asks for it (commit b569bab78d8d: ".. Not all user space is ready to handle wide addresses").
So 0x555555554000 is the result of the formula (2^47 - 1 page) * (2/3) (or 2^56 for larger systems), rounded down to a page boundary by load_bias = ELF_PAGESTART(load_bias - vaddr); where vaddr is zero:
$ echo 'obase=16; (2^47-4096)/3*2'| bc -q
555555554AAA
$ echo 'obase=16; (2^56-4096)/3*2'| bc -q
AAAAAAAAAAA000
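The same calculation, including the final page rounding, can be sketched in Rust. This is only an illustration of the arithmetic described above, assuming a 4096-byte page; the function names are made up for the example and are not kernel APIs.

```rust
/// Page size assumed for this sketch (4 KiB).
const PAGE_SIZE: u64 = 4096;

/// TASK_SIZE for a given number of user address bits: 2^bits minus one guard page.
fn task_size(user_bits: u32) -> u64 {
    (1u64 << user_bits) - PAGE_SIZE
}

/// ELF_ET_DYN_BASE computed as (TASK_SIZE / 3 * 2), dividing before multiplying.
fn elf_et_dyn_base(size: u64) -> u64 {
    size / 3 * 2
}

/// Round an address down to the start of its page, as ELF_PAGESTART does.
fn page_start(addr: u64) -> u64 {
    addr & !(PAGE_SIZE - 1)
}

fn main() {
    for &bits in &[47u32, 56] {
        let base = elf_et_dyn_base(task_size(bits));
        println!(
            "{} bits: ELF_ET_DYN_BASE = {:#x}, page start = {:#x}",
            bits,
            base,
            page_start(base)
        );
    }
}
```

For 47 bits the unrounded value matches the bc result above, and masking off the low 12 bits gives the 0x555555554000 seen in gdb.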
Some history of 2/3 * TASK_SIZE:
commit 9b1bbf6ea9b2 "use ELF_ET_DYN_BASE only for PIE" has useful comments: "The ELF_ET_DYN_BASE position was originally intended to keep loaders away from ET_EXEC binaries ..."
Avoiding a 32-bit overflow of 2*TASK_SIZE: "[uml-user] [PATCH] x86, UML: fix integer overflow in ELF_ET_DYN_BASE", 2015 and "ARM: 8320/1: fix integer overflow in ELF_ET_DYN_BASE", 2015:
Almost all arches define ELF_ET_DYN_BASE as 2/3 of TASK_SIZE. Though
it seems that some architectures do this in a wrong way. The problem
is that 2*TASK_SIZE may overflow 32-bits so the real ELF_ET_DYN_BASE
becomes wrong. Fix this overflow by dividing TASK_SIZE prior to
multiplying: (TASK_SIZE / 3 * 2)
The same definition appears in 4.x, 3.y, 2.6.z (where is the davej-history git repo? archive and at or.cz), 2.4.z, ...; it was added in 2.1.54 of 06-Sep-1997:
diff --git a/include/asm-i386/elf.h b/include/asm-i386/elf.h
+/* This is the location that an ET_DYN program is loaded if exec'ed. Typical
+ use of this is to invoke "./ld.so someprog" to test out a new version of
+ the loader. We need to make sure that it is out of the way of the program
+ that it will "exec", and that there is sufficient room for the brk. */
+
+#define ELF_ET_DYN_BASE (2 * TASK_SIZE / 3)

How do I check if a file has a certain mode in Rust?

I'd expect this to work:
use std::fs::OpenOptions;
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
const MODE: u32 = 0o700;
fn main() {
let f = OpenOptions::new()
.write(true)
.create_new(true)
.mode(MODE)
.open("myfile")
.unwrap();
let f_mode = f.metadata().unwrap().permissions().mode();
assert_eq!(f_mode, MODE);
}
When run, I get:
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `33216`,
right: `448`', src/main.rs:14:5
If I check the output of ls:
$ ls -al myfile
-rwx------ 1 edd edd 0 Apr 26 14:50 myfile
Clearly there's some other information encoded in the mode field once it gets committed to the file-system.
Is there a good way to check if the file is -rwx------ besides using bitwise operators on the underlying octal representation (masking off the irrelevant parts)?
If you are going to use the low-level primitives of OS-specific permissions, you need to deal with those details:
#define S_IFMT 0170000 /* type of file */
#define S_IFIFO 0010000 /* named pipe (fifo) */
#define S_IFCHR 0020000 /* character special */
#define S_IFDIR 0040000 /* directory */
#define S_IFBLK 0060000 /* block special */
#define S_IFREG 0100000 /* regular */
#define S_IFLNK 0120000 /* symbolic link */
#define S_IFSOCK 0140000 /* socket */
#define S_IFWHT 0160000 /* whiteout */
#define S_ISUID 0004000 /* set user id on execution */
#define S_ISGID 0002000 /* set group id on execution */
#define S_ISVTX 0001000 /* save swapped text even after use */
#define S_IRUSR 0000400 /* read permission, owner */
#define S_IWUSR 0000200 /* write permission, owner */
#define S_IXUSR 0000100 /* execute/search permission, owner */
When you get the mode, you also get information on what kind of file it is. Here, you have S_IFREG | S_IRUSR | S_IWUSR | S_IXUSR.
Doing a bitwise AND is the simplest fix:
assert_eq!(f_mode & 0o777, MODE);
Of course, you can create your own accessor functions in an extension trait and give them meaningful names, or there may be a crate which has already done so.
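As a minimal sketch of the extension-trait idea: the trait ModeExt and its methods rwx_bits and has_rwx are hypothetical names invented for this example, not part of the standard library. Only Permissions, PermissionsExt::mode and PermissionsExt::from_mode come from std.

```rust
use std::fs::Permissions;
use std::os::unix::fs::PermissionsExt;

/// Mask selecting the file-type bits of a Unix mode (S_IFMT).
const S_IFMT: u32 = 0o170000;
/// Mask selecting the nine rwx permission bits.
const RWX_MASK: u32 = 0o777;

/// Hypothetical extension trait; the name and methods are illustrative only.
trait ModeExt {
    /// The rwx bits only, without the file type or setuid/setgid/sticky bits.
    fn rwx_bits(&self) -> u32;

    /// True if the rwx bits are exactly `expected`.
    fn has_rwx(&self, expected: u32) -> bool {
        self.rwx_bits() == expected
    }
}

impl ModeExt for Permissions {
    fn rwx_bits(&self) -> u32 {
        self.mode() & RWX_MASK
    }
}

fn main() {
    // 33216 is the left-hand value from the failed assertion above (0o100700).
    let perms = Permissions::from_mode(33216);
    println!("file type bits: {:o}", perms.mode() & S_IFMT);
    println!("rwx bits:       {:o}", perms.rwx_bits());
    println!("is -rwx------:  {}", perms.has_rwx(0o700));
}
```

In the original program, the assertion could then be written as assert!(f.metadata().unwrap().permissions().has_rwx(MODE)) with the trait in scope.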

How to understand such sample in GNU ld manual about linker script?

I am learning the GNU linker ld script sample about memory region alias.
I see the following ld script snippet:
SECTIONS
{
.text :
{
*(.text)
} > REGION_TEXT
.rodata :
{
*(.rodata)
rodata_end = .;
} > REGION_RODATA <=========== PLACE 1
.data : AT (rodata_end) <=========== PLACE 2
{
data_start = .;
*(.data)
} > REGION_DATA <=========== PLACE 3
data_size = SIZEOF(.data);
data_load_start = LOADADDR(.data);
.bss :
{
*(.bss)
} > REGION_BSS
}
One possible system memory region layout given in the sample is like this (C in that sample):
MEMORY
{
ROM : ORIGIN = 0, LENGTH = 2M /*0M ~ 2M*/
ROM2 : ORIGIN = 0x10000000, LENGTH = 1M /*256M ~ 257M*/
RAM : ORIGIN = 0x20000000, LENGTH = 1M /*512M ~ 513M*/
}
REGION_ALIAS("REGION_TEXT", ROM); /*0M ~ 2M*/
REGION_ALIAS("REGION_RODATA", ROM2); /*256M ~ 257M*/
REGION_ALIAS("REGION_DATA", RAM); /*512M ~ 513M*/
REGION_ALIAS("REGION_BSS", RAM); /*512M ~ 513M*/
So,
PLACE 1 says .rodata MUST go into REGION_RODATA, that is 256M~257M
PLACE 2 says the .data section MUST be placed immediately after the .rodata section. So the .data section MUST start at 257M at most.
But PLACE 3 says the .data section MUST go into the REGION_DATA region. So the .data section MUST start at 512M at least.
So how could it be possible?
The key concepts to understand this example are those of Virtual Memory Address (VMA) and Load Memory Address (LMA).
The GNU Linker official documentation defines those two terms as follows.
Every loadable or allocatable output section has two addresses. The
first is the VMA, or virtual memory address. This is the address the
section will have when the output file is run. The second is the LMA,
or load memory address. This is the address at which the section will
be loaded.
In the example, for all output sections but .data, the VMA and LMA are the same. For section .data, the LMA is specified by AT (rodata_end), while the VMA is the first available address of the REGION_DATA memory region.
With this in mind, we can read again the example and see that it leads to the situation represented below.
ROM (alias REGION_TEXT)
+---------+------------------------------+
| .text | |
+---------+------------------------------+
ROM2 (alias REGION_RODATA)
+-----------+---------+--------+
| .rodata | .data | |
+-----------+---------+--------+
RAM (alias REGION_DATA)
+---------+--------+-----------+
| .data | .bss | |
+---------+--------+-----------+
The .data section appears twice: once in ROM2 and once in RAM. It is stored at its load address (LMA) when the image is loaded; it is then copied to its virtual address (VMA) before the program runs.
By the way, this is why, a few lines later in the documentation you mentioned, we can read that
It is possible to write a common system initialization routine to copy
the .data section from ROM or ROM2 into the RAM if necessary.
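The address arithmetic behind the diagram can be sketched in shell. The region origins come from the MEMORY block above; the two section sizes are hypothetical (the example does not give them), and alignment padding is ignored.

```shell
# Region origins taken from the MEMORY block of the example.
ROM2_ORIGIN=$((0x10000000))
RAM_ORIGIN=$((0x20000000))

# Hypothetical section sizes, chosen only for illustration.
rodata_size=$((0x400))
data_size=$((0x100))

# .rodata is the first section in ROM2, so this is where it ends.
rodata_end=$((ROM2_ORIGIN + rodata_size))

# AT (rodata_end) sets the load address of .data: still inside ROM2.
data_lma=$rodata_end
# Placing .data in REGION_DATA sets its run-time address: the start of RAM.
data_vma=$RAM_ORIGIN
# .bss follows .data in RAM.
bss_vma=$((data_vma + data_size))

printf '.data LMA = 0x%08x\n' "$data_lma"
printf '.data VMA = 0x%08x\n' "$data_vma"
printf '.bss  VMA = 0x%08x\n' "$bss_vma"
```

With these assumed sizes the .data LMA falls in the 256M ~ 257M window and its VMA in the 512M ~ 513M window, so PLACE 2 and PLACE 3 constrain two different addresses and do not conflict.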

Update linker variables after --gc-sections

I wrote a small binary for a Cortex-A9 board and defined a linker script like this:
SECTIONS
{
.text :
{
__text = . ;
*(.vector)
*(.text)
*(.text.*)
}
.rodata :
{
*(.rodata)
*(.rodata.*)
}
.data : {
__data_start = . ;
*(.data)
*(.data.*)
}
. = ALIGN(4);
__bss_start = . ;
.bss :
{
*(.bss)
*(.bss.*)
*(COMMON)
. = ALIGN(4);
}
__bss_end = .;
. = ALIGN(4);
__heap_start = .;
. = . + 0x1000;
. = ALIGN(4);
__heap_end = .;
_end = . ;
PROVIDE (end = .) ;
}
But it seems that after --gc-sections has removed the unused sections, __heap_start still has the value it had before garbage collection (I print it in code, and I checked the ld flags):
arm-linux-gnueabihf-gcc -mcpu=cortex-a7 -msoft-float -nostdlib
-Wl,--gc-sections -Wl,--print-gc-sections -Wl,-Ttext,0x04000000 -T csrvisor.lds -Wl,-Map,binary.map
Does anyone know how to make __heap_start take the correct value after --gc-sections has removed the unused sections?
Check your compiler flags: Do they really contain -ffunction-sections -fdata-sections?
The heap normally (and in your case as well) starts right after the .bss section, so as far as the start of the heap is concerned your linker script looks fine.
Check if the linker really removes unused variables - if it only removes unused text sections, the value for __heap_start won't change.
Code, read-only data, initialized data, etc. normally go into flash. If something is garbage-collected there, it won't affect your heap.
Data (initialized and uninitialized) will (eventually) end up in RAM. If something is garbage-collected there, it will affect your heap. So check whether you really have variables that are removed by the garbage collection.
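One way to run that check is to link once with and once without -Wl,--gc-sections and compare the address of __heap_start in the two symbol tables. The helper below is only a sketch: sym_addr is a hypothetical name, it parses text in the usual "address type name" layout printed by nm, and the sample addresses are made up.

```shell
# Hypothetical helper: read nm-style lines ("address type name") on stdin
# and print the address of the symbol named in $1.
sym_addr() {
    awk -v sym="$1" '$3 == sym { print $1 }'
}

# Made-up nm-style input standing in for a real symbol table.
printf '%s\n' \
    '04000000 T __text' \
    '04001230 B __bss_start' \
    '04001240 B __heap_start' \
    '04002240 B __heap_end' | sym_addr __heap_start
```

With real builds, the input would come from something like arm-linux-gnueabihf-nm run on each output file (the file names are whatever your link step produces). If both builds report the same address, nothing that lives in RAM was discarded.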
As for your linker script
There is no KEEP statement. Normally things like the reset handler, main, etc. must not be removed by the linker's garbage collection.
Your data section does not define the handling of initial values.
Your linker script does not contain region declarations (MEMORY). Check which defaults apply.
Your sections do not have a target region. Again, check which defaults apply in your case.
Examples with target regions:
.rodata :
{
*(.rodata)
*(.rodata.*)
} >rom
.data : {
__data_start = . ;
*(.data)
*(.data.*)
} >ram

How do I get an equivalent of /dev/one in Linux

You can use
dd if=/dev/zero of=file count=1024 bs=1024
to zero fill a file.
Instead, I want to one-fill a file. How do I do that?
There is no /dev/one file, so how can I simulate that effect in a bash shell?
tr '\0' '\377' < /dev/zero | dd bs=64K of=/dev/sdx
This should be much faster. Choose the block size (or add a count) as you need. Writing ones to an SSD until it was full, with a block size of 99M, gave me a write rate of 350M/s.
Try this:
dd if=<(yes $'\01' | tr -d "\n") of=file count=1024 bs=1024
Substitute $'\377' or $'\xFF' if you want all the bits to be ones.
MacOS tr may complain about "Illegal byte sequence". Setting LC_CTYPE=C will prevent that. This version can also be used in Linux:
dd if=<(yes $'\01' | LC_CTYPE=C tr -d "\n") of=file count=1024 bs=1024
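One caveat when dd reads from a pipe, as in the two commands above: dd counts read operations rather than bytes, so a short read can leave the file smaller than count times bs unless GNU dd's iflag=fullblock is given. A sketch that sidesteps this by letting head fix the size before tr translates the bytes; the output name ones.bin and the 1 MiB size are placeholders.

```shell
# Hypothetical output name and size; adjust as needed.
out=ones.bin
size=$((1024 * 1024))   # 1 MiB

# head reads exactly $size zero bytes; tr turns each of them into 0xFF.
head -c "$size" /dev/zero | LC_ALL=C tr '\000' '\377' > "$out"

# Print the resulting size in bytes.
wc -c < "$out"
```

Swap '\377' for '\001' if only the lowest bit of each byte should be set.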
Well, you could do this:
dd if=/dev/zero count=1024 bs=1024 |
tr '\000' '\001' > file
pv /dev/zero |tr \\000 \\377 >targetfile
...where \377 is the octal representation of 255 (a byte with all bits set to one). Why tr only works with octal numbers, I don't know -- but be careful not to subconsciously translate this to 3FF.
The syntax for using tr is error-prone. I recommend verifying that it makes the desired translation:
cat /dev/zero |tr \\000 \\377 |hexdump -C
Note: pv is a nice utility that replaces cat and adds a progress/rate display.
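Since /dev/zero never ends, the check is easier to read on a bounded sample. A small sketch, where fill_hex is a hypothetical helper name and od is used for the hex dump:

```shell
# Hypothetical helper: print the first N translated bytes as a hex string.
# $1 = replacement byte as an octal escape for tr (e.g. '\377'), $2 = byte count
fill_hex() {
    head -c "$2" /dev/zero | LC_ALL=C tr '\000' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

fill_hex '\377' 4   # every bit set: prints ffffffff
fill_hex '\001' 4   # lowest bit only: prints 01010101
```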
I created a device driver, available on my GitHub. Installing it creates a device file /dev/one that, when read, returns only bits set to 1.
The C file, one.c (the only interesting part is device_file_read):
// Character device driver that creates /dev/one, analogous to /dev/zero
#include <linux/init.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/uaccess.h>
#include <linux/vmalloc.h>
MODULE_LICENSE("GPL");
static int device_file_major_number = 0;
static const char device_name[] = "one";
static ssize_t device_file_read(
struct file *file_ptr,
char __user *user_buffer,
size_t count,
loff_t *position) {
printk( KERN_NOTICE "One: Device file is read at offset = %i, read bytes count = %u\n" , (int)*position , (unsigned int)count );
// Allocate kernel buffer
char* ptr = (char*) vmalloc(count);
if (ptr == NULL){ return -ENOMEM; }
// Fill it with 0xFF, byte by byte
// -- Note that a byte is the smallest accessible data unit
memset(ptr, 0xFF, count);
// copy_to_user returns the number of bytes that could not be copied
unsigned long res = copy_to_user(user_buffer, ptr, count);
// Free the kernel buffer so it is not leaked on every read
vfree(ptr);
if (res != 0){ return -EFAULT; }
// Return the number of bytes read
return count;
}
static struct file_operations simple_driver_fops = {
.owner = THIS_MODULE,
.read = device_file_read,
};
int register_device(void) {
int res = 0;
printk( KERN_NOTICE "One: register_device() is called.\n" );
res = register_chrdev( 0, device_name, &simple_driver_fops );
if( res < 0 ) {
printk( KERN_WARNING "One: can\'t register character device with error code = %i\n", res );
return res;
}
device_file_major_number = res;
printk( KERN_NOTICE "One: registered character device with major number = %i and minor numbers 0...255\n", device_file_major_number );
return 0;
}
void unregister_device(void) {
printk( KERN_NOTICE "One: unregister_device() is called\n" );
if(device_file_major_number != 0) {
unregister_chrdev(device_file_major_number, device_name);
}
}
static int my_init(void) {
register_device();
return 0;
}
static void my_exit(void) {
unregister_device();
return;
}
// Declare register and unregister command
module_init(my_init);
module_exit(my_exit);
The Makefile
TARGET_MODULE:=one
BUILDSYSTEM_DIR:=/lib/modules/$(shell uname -r)/build
PWD:=$(shell pwd)
obj-m := $(TARGET_MODULE).o
# See: https://stackoverflow.com/questions/15910064/how-to-compile-a-linux-kernel-module-using-std-gnu99
ccflags-y := -std=gnu99 -Wno-declaration-after-statement
build:
# run kernel build system to make module
$(MAKE) -C $(BUILDSYSTEM_DIR) M=$(PWD) modules
clean:
# run kernel build system to cleanup in current directory
$(MAKE) -C $(BUILDSYSTEM_DIR) M=$(PWD) clean
rm -f MOK.priv MOK*.der
key:
echo "Creating key"
openssl req -new -x509 -newkey rsa:2048 -days 36500 -keyout MOK.priv -outform DER -out MOK.der -nodes -subj "/CN=TinmarinoUnsafe/"
#
echo "\e[31;1mPlease enter a password you will be asked for on reboot:\e[0m"
mokutil --import MOK.der
echo "\e[31;1mNow you must: 1/ reboot, 2/ Select Enroll MOK, 3/ Enter password you previously gave\e[0m"
sign:
cp one.ko one.ko.bck
/usr/src/linux-headers-$(shell uname -r)/scripts/sign-file sha256 MOK.priv MOK.der one.ko
load:
insmod ./$(TARGET_MODULE).ko
unload:
rmmod ./$(TARGET_MODULE).ko
create:
mknod /dev/one c $(shell cat /proc/devices | grep one$ | cut -d ' ' -f1) 0
delete:
rm /dev/one
test:
[ "$(shell xxd -p -l 10 /dev/one)" = "ffffffffffffffffffff" ] \
&& echo "\e[32mSUCCESS\e[0m" \
|| echo "\e[31mFAILED\e[0m"
The installation is long (about 3 min) because of driver signature enforcement. Skip the signing steps if you have disabled it in your UEFI.
git clone https://github.com/tinmarino/dev_one.git DevOne && cd DevOne # Download
make build # Compile
make key # Generate key for signing
sudo make sign # Sign driver module to permit MOK enforcement (security)
sudo reboot now # Reboot and enable Mok
A blue screen (MOK manager) will appear
Choose "Enroll MOK"
Choose "Continue"
Choose "Yes" (when asked "Enroll the key")
Enter the password you gave at make key
Choose "Reboot" (again)
sudo make load # Load
sudo make create # Create /dev/one
make test # Test if all is ok
You can simulate a /dev/one without a special device, with a FIFO + yes:
mkfifo ddfifo
dd if=ddfifo of=<file> iflag=fullblock count=1024 bs=1024 status=progress & yes "" | tr '\n' '\1' > ddfifo
tee may be used to double the throughput:
mkfifo ddfifo
dd if=ddfifo of=<file> iflag=fullblock count=1024 bs=1024 status=progress & yes "" | tr '\n' '\1' | tee ddfifo > ddfifo
If you'd like bytes with all bits set to one, swap '\1' for '\377'.

Resources