RISC-V RV32I soft float lib calls MUL and MULHU instructions in __muldf3

I'm using current riscv-tools to build a firmware image for the PicoRV32 core. The firmware requires floating point, so I'm using -msoft-float. These are the compiler/linker options I am using:
-Os -m32 -march=RV32I -msoft-float -ffreestanding -nostdlib -lgcc
In this configuration, __muldf3 is provided by (according to the linker's -Map output):
/opt/riscv/lib/gcc/riscv64-unknown-elf/4.9.2/soft-float/32/libgcc.a(dp-bit.o)
But this code is not compatible with the RV32I ISA: It uses the MUL and MULHU instructions!
How do I get soft-float for the plain RV32I ISA? Do I need to compile my own version of libgcc.a? Are there instructions somewhere on how to do this?

As you've noticed, the "-march=" flag only affects the current translation unit, not the libraries which get generated at toolchain build time.
Although there exist "disable-atomics"/"disable-float" configuration flags for building the toolchain, there is no multilib option for multiply/divide because they don't affect the ABI; the assumption is that the execution environment can emulate those instructions.
To the last point, the latest Privileged ISA v1.7 is designed such that you can run mul/div code and then trap into machine mode to emulate the mul/div instructions (you can even trap into M-mode while running in M-mode!). You'd have to provide your own mul trap handler in M-mode (probably located in your own crt0 file and linked in at compile time).
I instead recommend you try the "--with-arch" configure flag. A recent patch added support for it, so it's possible to build a gcc that by default won't generate multiply/divide, which keeps those instructions out of libgcc. You can try adding --with-arch=RV32I to the gcc configure line (to do so, you'll have to modify Makefile.in in the riscv-gnu-toolchain).
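For reference, a rough sketch of what that rebuild could look like; the install prefix, the exact way Makefile.in passes the flag through, and the objdump path are assumptions rather than documented steps, so adjust them to your checkout:
$ cd riscv-gnu-toolchain
$ # edit Makefile.in so the gcc configure line also receives --with-arch=RV32I
$ ./configure --prefix=/opt/riscv32i
$ make
$ # then check that the rebuilt libgcc's __muldf3 no longer contains MUL/MULHU:
$ riscv64-unknown-elf-objdump -d /opt/riscv32i/lib/gcc/riscv64-unknown-elf/*/soft-float/32/libgcc.a | grep -wE 'mul|mulhu'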

Related

Linux kernel on ARM Cortex-M: how to build proper executables

I need to build a complete Linux development framework for a Cortex-M MCU, specifically an STM32F7 Cortex-M7. First I have to explain some background, so please bear with me.
I've downloaded and built a gcc toolchain with crosstool-ng 1.24, specifying an armv7e-m architecture with thumb-only instructions and Linux 4.20 as the OS, and that I want the output to be FLAT executables (which I assumed means bFLT).
Then I proceeded to compile the Linux kernel (version 4.20) using configs/stm32_defconfig, and then a statically compiled busybox rootfs, all using my new toolchain.
The kernel booted just fine but threw an error and kernel panic with the following message:
Starting init: /sbin/init exists but couldn't execute it (error -8)
and
request_module: modprobe binfmt-464c cannot be processed, kmod busy with 50 threads
The interesting part is the last message. My busybox executable turned out to be an ELF! Cortex-M has no MMU, and it's impossible to build a Linux kernel with regular ELF support on an MMU-less architecture; that's why a (464c, "LF") binary loader can't be found: there is none.
So at last, my question is:
How could I build bFLT executables to run on MMU-less Linux architectures? My toolchain has elf2flt, but in crosstool-ng I've already specified an MMU-less architecture and FLAT binaries, so I was expecting direct bFLT output, not a "useless" executable. Is that even possible?
Or better: is there anywhere a documented standard procedure to build a complete, working Linux system based on Cortex-M?
Follow-up:
I gave up on building FLAT binaries and tried FDPIC executables. Another dead end. AFAIK:
Linux has long been supporting ELF FDPIC, but the ABI for ARM is pretty new.
It seems that even today, GCC does not have a standard way to enable FDPIC. On some architectures you can use -mfdpic, but not on ARM, and I don't know why. I don't even know whether ARM FDPIC is supported at all by mainline GCC. Information is extremely scarce, if not nonexistent.
It seems crosstool-ng 1.24 is BROKEN at building ARM ELF FDPIC support: the resulting gcc does not have -mfdpic, and -fPIC generates plain ARM executables, not ARM FDPIC.
Any insight will be very appreciated.
You can generate FDPIC ELF files with just a prebuilt arm-linux-gnueabi-gcc compiler.
Specifications of an FDPIC ELF file:
Position-independent executable/code (i.e. -fPIE and -fPIC)
Should be compiled as a shared executable (ET_DYN ELF) to be position independent
use these flags to compile your programs:
arm-linux-gnueabi-gcc -shared -fPIE -fPIC <YOUR PROGRAM.C> -o <OUTPUT FILE>
I've compiled busybox successfully for STM32H7 with this method.
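To sanity-check the output, you can verify it really is a position-independent ET_DYN image with readelf; a quick sketch (hello.c is just a placeholder for your own source file):
$ arm-linux-gnueabi-gcc -shared -fPIE -fPIC hello.c -o hello
$ readelf -h hello | grep 'Type:'   # should report DYN (Shared object file)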
As far as I know, FDPIC ELFs unfortunately have to be compiled with the -shared flag, so they use shared libraries and cannot be built as static ELFs with -static.
For more information take a look at this file:
https://github.com/torvalds/linux/blob/master/fs/binfmt_elf_fdpic.c
Track the crosstool-ng BFLAT issue from here:
https://github.com/crosstool-ng/crosstool-ng/issues/1399

Can gcc produce binary for Arm without cross compilation

Can we configure gcc running on an Intel x64 architecture to produce a binary for an ARM chip by just passing some flags to gcc, without using a cross compiler?
Short: Nope
Compiler:
gcc is not a native cross-compiler: the target architecture has to be specified at the time gcc itself is compiled. (Some exceptions apply; for example, x86 and x86_64 can be supported at the same time.)
clang would be a native cross-compiler, and you can generate code for ARM by passing --target=arm-linux-gnu, but you still can't produce binaries, as you need a linker and a C library too. That means you can run clang --target=arm-linux-gnu -c <your file> and compile C/C++ code (you will likely need to point it at your C/C++ include paths), but you can't build complete binaries.
Rest of the toolchain:
You need a fitting linker and toolchain too; both are specific to the architecture and OS you want to run on.
Possible solutions:
Get a fitting toolchain, or compile your own. For ARM Linux you have, for example, CrossToolchains if you are on Debian; for bare metal you can get a cross compiler from CodeSourcery.
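To illustrate the difference, here is a small sketch (hello.c is a placeholder source file, and arm-linux-gnueabihf-gcc stands for whichever cross toolchain you end up installing): clang alone can compile an object file, but producing a runnable binary needs the full cross toolchain:
$ clang --target=arm-linux-gnueabihf -c hello.c -o hello.o   # compiles only, no linking
$ arm-linux-gnueabihf-gcc hello.c -o hello                   # cross toolchain: compiles and links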
Since you were very vague, it's not possible to give you a clearer answer.

gcc 4.x not supporting x87 FPU math?

I've been trying to compile gcc 4.x from the sources using --with-fpmath=387 but I'm getting this error: "Invalid --with-fpmath=387". I looked in the configs and found that it doesn't support this option (even though docs still mention it as a possible option):
case ${with_fpmath} in
avx)
tm_file="${tm_file} i386/avxmath.h"
;;
sse)
tm_file="${tm_file} i386/ssemath.h"
;;
*)
echo "Invalid --with-fpmath=$with_fpmath" 1>&2
exit 1
Basically, I started this whole thing because I need to supply an executable for an old target platform (in fact, it's an old Celeron without any SSE2 instructions, which are apparently used by libstdc++ by DEFAULT). The executable crashes with an "Illegal instruction" message at the first SSE2 instruction (movq XMM0, ...) coming from the copying routines in libstdc++.
Is there any way to resolve this? I need to be on a fairly recent g++ to be able to port my existing code base.
I was wondering if it's possible to supply these headers/sources from an older build to enable support for regular x87 instructions, so that no SSE instructions are referenced?
UPDATE: Please note I'm talking about compiled libstdc++ having SSE2 instructions in the object code, so the question is not about gcc command line arguments. No matter what I'm supplying to gcc when compiling my code, it will link with libstdc++ that already has built-in SSE2 instructions.
The real answer is not to use ANY --with-fpmath switches when compiling GCC. I got confused by the configure script switch statement, thinking that it only supports sse or avx, while, in fact, the default value (not mentioned in this switch) is "387". So make sure you don't use --with-fpmath when running configure. I recompiled GCC without it and it now works fine.
Thanks.
The argument to tell gcc to produce code for a specific target is -march=CPU where CPU is the particular cpu you want. For an old celeron, you probably want -march=pentium2 or -march=pentium3
To control the fp codegen separately, newer versions of gcc use -mfpmath= -- in your case, you want -mfpmath=387.
All of these and many others are covered in the gcc documentation
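For example, a minimal sketch of compiling your own code for such a CPU (main.cpp is a placeholder file name):
$ g++ -march=pentium3 -mfpmath=387 -O2 main.cpp -o app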
edit
In order to use those flags for building libraries (such as libstdc++) that you'll later link in to programs, you need to configure the build for the library to use the appropriate flags. libstdc++ gets built as part of the g++ build, so you'll need to do a custom build -- you can use configure CXXFLAGS=-mfpmath=387 to set extra flags to use while building things.
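A rough sketch of such a custom build, following the suggestion above (the source path, version, and install prefix are placeholders, and depending on the GCC release you may need CXXFLAGS_FOR_TARGET instead of CXXFLAGS to reach the target libraries):
$ mkdir build && cd build
$ ../gcc-4.x/configure --prefix=/opt/gcc-387 CXXFLAGS='-mfpmath=387 -march=pentium3'
$ make && make install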
Please note the question was about compiled libstdc++ having SSE2 instructions in the object code, so the question was not about gcc command line arguments. No matter what I'm supplying to gcc when compiling my code, it will link with libstdc++ that already has built-in SSE2 instructions.

How do I compile and link a 32-bit Windows executable using mingw-w64

I am using Ubuntu 13.04 and installed mingw-w64 using apt-get install mingw-w64. I can compile and link a working 64-bit version of my program with the following command:
x86_64-w64-mingw32-g++ code.cpp -o app.exe
Which generates a 64-bit app.exe file.
What binary or command line flags do I use to generate a 32-bit version of app.exe?
That depends on which variant of the toolchain you're currently using. Both DWARF and SEH variants (which come starting from GCC 4.8.0) are single-target only. You can see this yourself by inspecting the directory structure of their distributions, i.e. they contain only the libraries with either 64- or 32-bit addressing, but not both. On the other hand, plain old SJLJ distributions are indeed dual-target, and in order to build a 32-bit target, just supply the -m32 flag. If that doesn't work, then just build with i686-w64-mingw32-g++.
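For example, mirroring the 64-bit command from the question (assuming the mingw-w64 package also installed the i686 cross compiler):
$ i686-w64-mingw32-g++ code.cpp -o app.exe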
BONUS
By the way, the three corresponding dynamic-link libraries (DLLs) implementing each GCC exception model are
libgcc_s_dw2-1.dll (DWARF);
libgcc_s_seh-1.dll (SEH);
libgcc_s_sjlj-1.dll (SJLJ).
Hence, to find out which exception model your current MinGW-w64 distribution provides, you can either
inspect the directory and file structure of the MinGW-w64 installation in the hope of locating one of those DLLs (typically in bin); or
build some real or test C++ code involving exception handling to force linkage with one of those DLLs, and then see which one of those DLLs the built target depends on (for example, with Dependency Walker on Windows); or
take the brute-force approach: compile some test code to assembly (instead of machine code) and look for references like ___gxx_personality_v* (DWARF), ___gxx_personality_seh* (SEH), ___gxx_personality_sj* (SJLJ); see Obtaining current GCC exception model.
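A quick sketch of that last approach (test.cpp is any small program that actually uses try/catch):
$ x86_64-w64-mingw32-g++ -S test.cpp -o - | grep -i personality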

Using software floating point on x86 linux

Is it (easily) possible to use software floating point on i386 Linux without incurring the expense of trapping into the kernel on each call? I've tried -msoft-float, but it seems the normal (Ubuntu) C libraries don't have an FP library included:
$ gcc -m32 -msoft-float -lm -o test test.c
/tmp/cc8RXn8F.o: In function `main':
test.c:(.text+0x39): undefined reference to `__muldf3'
collect2: ld returned 1 exit status
It is surprising that gcc doesn't support this natively as the code is clearly available in the source within a directory called soft-fp. It's possible to compile that library manually:
$ svn co svn://gcc.gnu.org/svn/gcc/trunk/libgcc/ libgcc
$ cd libgcc/soft-fp/
$ gcc -c -O2 -msoft-float -m32 -I../config/arm/ -I.. *.c
$ ar -crv libsoft-fp.a *.o
There are a few C files which don't compile due to errors, but the majority do compile. After copying libsoft-fp.a into the directory with our source files, they now compile fine with -msoft-float:
$ gcc -g -m32 -msoft-float test.c -lsoft-fp -L.
A quick inspection using
$ objdump -D --disassembler-options=intel a.out | less
shows that, as expected, no x87 floating-point instructions are used, and the code also runs considerably slower: by a factor of 8 in my example, which uses lots of division.
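For a quicker check that no x87 instructions remain, one could also grep the disassembly for common x87 mnemonics (a rough heuristic, not an exhaustive list):
$ objdump -d a.out | grep -E 'fld|fstp|fadd|fmul|fdiv'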
Note: I would've preferred to compile the soft-float library with
$ gcc -c -O2 -msoft-float -m32 -I../config/i386/ -I.. *.c
but that results in loads of error messages like
adddf3.c: In function '__adddf3':
adddf3.c:46: error: unknown register name 'st(1)' in 'asm'
It seems the i386 version is not well maintained, as st(1) refers to one of the x87 registers, which are obviously not available when using -msoft-float.
Strangely, or luckily, the ARM version compiles on an i386 and seems to work just fine.
Unless you want to bootstrap your entire toolchain by hand, you could start with the uClibc toolchain (the i386 version, I imagine): soft float is (AFAIK) not directly supported for "native" compilation on Debian and derivatives, but it can be used via the "embedded" approach of the uClibc toolchain.
GCC does not support this without some extra libraries. From the 386 documentation:
-msoft-float
Generate output containing library calls for floating point. Warning: the requisite libraries are not part of GCC. Normally the facilities of the machine's usual C compiler are used, but this can't be done directly in cross-compilation. You must make your own arrangements to provide suitable library functions for cross-compilation.
On machines where a function returns floating point results in the 80387 register stack, some floating point opcodes may be emitted even if -msoft-float is used.
Also, you cannot set -mfpmath to "none"; it has to be sse, 387, or both.
However, according to this GNU wiki page, there are fp-soft and ieee. There is also SoftFloat.
(For ARM there is -mfloat-abi=softfp, but it does not seem like something similar is available for 386 SX).
It does not seem like tcc supports software floating point numbers either.
Good luck finding a library that works for you.
G'day,
Unless you're targeting a platform that doesn't have inbuilt FP support, I can't think of a reason why you'd want to emulate FP support.
Doesn't your x386 platform have external FPU support? Pity it's not an x486 with the FPU built in!
In my experience, any soft emulation is bound to be much slower than its hardware equivalent.
That's why I finished up writing a package in Ada to target the onboard 68k FPU instead of using the soft emulation provided by the compiler manufacturer at the time. They finished up bundling it in their compiler, as a matter of fact.
Edit: Just seen your comment below. Hmm, if you don't need a full suite of FP support, is it possible to roll your own for the few math functions you do need? That's how the Ada package I mentioned got started.
HTH
cheers,

Resources