Detecting CPU feature support (e.g. sse2, fma4, etc.) - Linux

I have some code that depends on CPU and OS support for various CPU features.
In particular I need to check for various SIMD instruction set support.
Namely sse2, avx, avx2, fma4, and neon.
(NEON is the ARM SIMD extension. I'm less interested in it, given that fewer end users are on ARM.)
What I am doing right now is:
function cpu_flags()
    if is_linux()
        # Each processor block in /proc/cpuinfo has a "flags\t\t: ..." line
        cpuinfo = readstring(`cat /proc/cpuinfo`);
        cpu_flag_string = match(r"flags\t\t: (.*)", cpuinfo).captures[1]
    elseif is_apple()
        sysinfo = readstring(`sysctl -a`);
        # AVX2 and other newer flags may be listed separately, under machdep.cpu.leaf7_features
        cpu_flag_string = match(r"machdep.cpu.features: (.*)", sysinfo).captures[1]
    else
        #assert is_windows()
        warn("CPU Feature detection does not work on windows.")
        cpu_flag_string = ""
    end
    split(lowercase(cpu_flag_string))
end
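One detail the Julia version gets right by splitting into words: a plain substring search for "avx" would also match "avx2" or "avx512f". A minimal C sketch of the same whole-word check, assuming the input is the part of a /proc/cpuinfo "flags" line after the colon; has_cpu_flag is a hypothetical helper name, not a library function:

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `flag` appears as a whole, whitespace-separated token in
   `flags` (e.g. the value of the "flags" line in /proc/cpuinfo).
   A plain strstr() would wrongly report "avx" as present in "avx2". */
bool has_cpu_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;

    while (*p != '\0') {
        /* Skip separators between tokens. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        /* Find the end of the current token. */
        const char *end = p;
        while (*end != '\0' && *end != ' ' && *end != '\t' && *end != '\n')
            end++;
        /* Match only if the whole token equals the flag. */
        if (len > 0 && (size_t)(end - p) == len && strncmp(p, flag, len) == 0)
            return true;
        p = end;
    }
    return false;
}
```

Reading /proc/cpuinfo itself (fopen and fgets, then strchr for the colon) is left out to keep the sketch short.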
This has two downsides:
It doesn't work on Windows.
I'm just not sure it is correct; is it? Or does it break if, for example, the OS has a feature disabled but the CPU physically supports it?
So my questions are:
How can I make this work on Windows?
Is this correct, or even an OK way to go about getting this information?
This is part of a build script (with BinDeps.jl), so I need a solution that doesn't involve opening a GUI.
And ideally one that doesn't add a third-party dependency.
Extracting the information from GCC somehow would work, since I already require GCC to compile some shared libraries (choosing which libraries to build is what this instruction-set detection is for).

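I'm just not sure it is correct; is it? Or does it break if, for example, the OS has a feature disabled but the CPU physically supports it?
I don't think the OS normally disables vector instructions; I've seen the BIOS disable features (in particular the virtualization extensions), but in that case you won't find them in /proc/cpuinfo either, which is kind of its point :-). One caveat: AVX and later extensions do need OS cooperation, because the OS has to enable saving the wider register state (XSAVE). The CPU reports this through the OSXSAVE bit of CPUID, and as far as I know Linux also drops the avx flags from /proc/cpuinfo when XSAVE is disabled (e.g. with the noxsave boot option).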
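The AVX caveat can be checked directly. This sketch follows the usual recipe for GCC or Clang on x86: CPUID leaf 1 must report both AVX (ECX bit 28) and OSXSAVE (ECX bit 27), and XCR0, read with XGETBV, must have the XMM and YMM state bits (bits 1 and 2) set. avx_usable is a hypothetical name; on other architectures or compilers the sketch simply returns false.

```c
#include <stdbool.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Read extended control register 0 (XCR0) with XGETBV.
   Only valid when CPUID reports OSXSAVE. */
static unsigned long long read_xcr0(void)
{
    unsigned int lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((unsigned long long)hi << 32) | lo;
}
#endif

/* True if the CPU implements AVX *and* the OS has enabled saving the
   XMM and upper YMM register state, i.e. AVX instructions are usable. */
bool avx_usable(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int eax, ebx, ecx, edx;
    const unsigned int osxsave = 1u << 27; /* CPUID.1:ECX bit 27 */
    const unsigned int avx     = 1u << 28; /* CPUID.1:ECX bit 28 */

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    if ((ecx & osxsave) == 0 || (ecx & avx) == 0)
        return false;
    /* XCR0 bit 1 = XMM state, bit 2 = YMM state; both must be enabled. */
    return (read_xcr0() & 0x6) == 0x6;
#else
    return false; /* not x86, or not a GCC-compatible compiler */
#endif
}
```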
Extracting the information from GCC somehow would work, since I already require GCC to compile some shared libraries
If you always have gcc (MinGW on Windows) you can use __builtin_cpu_supports:
#include <stdio.h>
int main(void)
{
    if (__builtin_cpu_supports("mmx")) {
        printf("\nI got MMX !\n");
    } else {
        printf("\nWhat ? MMX ? What is that ?\n");
    }
    return 0;
}
and apparently these built-in functions work under mingw-w64 too.
AFAIK it uses the CPUID instruction to extract the relevant information (so it should reflect quite well the environment your code will run in).
(from https://stackoverflow.com/a/17759098/214671)
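Applied to the feature set from the question, the same builtin can drive the choice of library. A sketch for GCC (Clang on x86 accepts the same builtin); the SIMD_* bit names, detect_simd, best_variant and the variant strings are hypothetical. The builtin only accepts a string literal, which is why each feature gets its own line. As far as I know, libgcc also checks the OS support bits before reporting avx or avx2, which covers the concern from the question.

```c
/* Hypothetical bit assignments for this sketch. */
enum {
    SIMD_SSE2 = 1 << 0,
    SIMD_AVX  = 1 << 1,
    SIMD_AVX2 = 1 << 2,
    SIMD_FMA4 = 1 << 3
};

/* Collect the SIMD extensions the running CPU supports using GCC's
   __builtin_cpu_supports. The argument must be a string literal, so each
   feature is tested explicitly. Returns 0 on non-x86 targets. */
unsigned detect_simd(void)
{
    unsigned mask = 0;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init(); /* only strictly needed before constructors run */
    if (__builtin_cpu_supports("sse2")) mask |= SIMD_SSE2;
    if (__builtin_cpu_supports("avx"))  mask |= SIMD_AVX;
    if (__builtin_cpu_supports("avx2")) mask |= SIMD_AVX2;
    if (__builtin_cpu_supports("fma4")) mask |= SIMD_FMA4;
#endif
    return mask;
}

/* Pick the most capable library variant for a feature mask
   (variant names are hypothetical). */
const char *best_variant(unsigned mask)
{
    if (mask & SIMD_AVX2) return "avx2";
    if (mask & SIMD_AVX)  return "avx";
    if (mask & SIMD_SSE2) return "sse2";
    return "generic";
}
```

A build script could compile a tiny main that prints best_variant(detect_simd()) and read that output, which keeps the detection inside the GCC toolchain the project already requires.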

Related

Can we convert an ELF binary from one CPU architecture to another on Linux? [duplicate]

How can I run x86 binaries (for example a .exe file) on ARM? As I see on Wikipedia, I need to convert the binary data for the emulated platform into binary data suitable for execution on the target platform. But the question is: how can I do that? Do I need to open the file in a hex editor and change it? Or something else?
To successfully do this, you'd have to do two things: one relatively easy, one very hard. You don't want to do either of them by hand in a hex editor.
Convert the machine code from x86 to ARM. This is the easy one, because you should be able to map each x86 opcode to one or more ARM opcodes. There are different ways to do this, some more efficient than others, but it can be done with a pretty straightforward mapping.
Remap function calls (and other jumps). This one is hard, because monkeying with the opcodes is going to change all the offsets for the jump and return points. If you have dynamically linked libraries (.so), and assuming all the libraries are available at exactly the same version in both places (a sketchy assumption at best), you'd also have to remap the loads.
It's essentially a machine->machine compiler and linker.
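The second step can be illustrated with a toy instruction set invented for this sketch (not x86 or ARM), where one source instruction may expand into several target instructions. A first pass records where each instruction lands; a second pass rewrites every relative jump so it still reaches the same instruction. translate and the op names are hypothetical, and real translators must also deal with indirect jumps and computed addresses, which this ignores.

```c
#include <stddef.h>

/* Toy "source" instructions: some work, or a jump by a relative offset
   counted in instructions from the instruction after the jump. */
typedef enum { OP_WORK1, OP_WORK3, OP_JMP } op_t;
typedef struct { op_t op; int rel; } insn_t;

/* Toy "target" sizes: WORK3 becomes 3 target instructions, others 1. */
static int target_size(op_t op) { return op == OP_WORK3 ? 3 : 1; }

/* new_pos must hold n + 1 entries (so a jump to the end is representable);
   out_rel[i] receives the fixed-up offset for jumps, 0 otherwise.
   Returns the total target length, or -1 if a jump leaves the code. */
int translate(const insn_t *src, size_t n, int *new_pos, int *out_rel)
{
    /* Pass 1: lay out the target code. */
    int pos = 0;
    for (size_t i = 0; i < n; i++) {
        new_pos[i] = pos;
        pos += target_size(src[i].op);
    }
    new_pos[n] = pos;

    /* Pass 2: recompute each jump offset in target units. */
    for (size_t i = 0; i < n; i++) {
        out_rel[i] = 0;
        if (src[i].op != OP_JMP)
            continue;
        long dest = (long)i + 1 + src[i].rel;   /* source index jumped to */
        if (dest < 0 || dest > (long)n)
            return -1;
        out_rel[i] = new_pos[dest] - (new_pos[i] + 1);
    }
    return pos;
}
```

For the program WORK3; JMP -2; WORK1; WORK3, the jump back to the first instruction is -2 in source units but -4 in target units, because WORK3 grew to three instructions.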
So, can you do it? Sure.
Is it easy? No.
There may be a commercial tool out there, but I'm not aware of it.
You cannot do this with a plain binary (see note 1); here "binary" means an object with no symbol information, unlike an ELF file. Even with an ELF file, this is difficult to impossible. The issue is telling code from data. If you could resolve that, you could build decompilers and other tools.
Even if you have an ELF file, a compiler will insert constants used by the code into the text segment. You have to look at many opcodes and do a reverse basic-block analysis to figure out where a function starts and ends.
A better mechanism is to emulate the x86 on the ARM. Here, you can use JIT technology to translate code as it is encountered, but you approximately double the code space. Also, the code will execute horribly. The ARM has 16 registers and the x86 is register-starved (it usually has hidden registers). A compiler's big job is to allocate these registers. QEMU is one technology that does this. I am unsure whether it goes in the x86-to-ARM direction, and it will have a tough job, as noted.
Note 1: x86 instructions have variable lengths. In order to recognize a function prologue and epilogue, you would have to scan an image multiple times. I think the problem would be something like O(n!), where n is the number of bytes in the image, and then you might have trouble with inline assembler and library routines coded in assembler. It may be possible, but it is extremely hard.
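The code-versus-data problem from note 1 shows up even with a toy variable-length encoding (invented here; only the idea resembles x86): the first byte decides the instruction length, so the same bytes decode into different instruction streams depending on where decoding starts. decode_starts and insn_length are hypothetical names.

```c
#include <stddef.h>

/* Toy encoding: the first byte of an instruction determines its length. */
static size_t insn_length(unsigned char first)
{
    if (first < 0x80) return 1;
    if (first < 0xC0) return 2;
    return 3;
}

/* Linear-sweep decode of code[start..len), storing the offset of each
   instruction start in starts[]. Returns the number found (at most max).
   A truncated final instruction is still counted. */
size_t decode_starts(const unsigned char *code, size_t len, size_t start,
                     size_t *starts, size_t max)
{
    size_t count = 0;
    for (size_t off = start; off < len && count < max;
         off += insn_length(code[off]))
        starts[count++] = off;
    return count;
}
```

For the bytes C1 05 90 10 20, decoding from offset 0 gives instructions at 0, 3 and 4, while starting at offset 1 gives 1, 2 and 4. Without symbols, nothing in the bytes says which stream is real; here the two happen to resynchronize at offset 4.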
To run an ARM executable on an x86 machine, all you need is qemu-user.
Example:
You have busybox compiled for the AArch64 architecture (ARM64) and you want to run it on an x86_64 Linux system.
Assuming a static build, this runs arm64 code on an x86 system:
$ qemu-aarch64-static ./busybox
And this runs x86 code on an ARM system:
$ qemu-x86_64-static ./busybox
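Which of the two commands applies is recorded in the ELF header itself: the e_machine field (62 for x86-64, 183 for AArch64 per the ELF specification). A minimal sketch, assuming a 64-bit little-endian ELF file already read into memory; qemu_for_elf and the ELF_* constant names are hypothetical, and the returned strings simply mirror the commands above.

```c
#include <stddef.h>
#include <string.h>

/* ELF header constants (values as in the ELF specification / <elf.h>). */
enum {
    ELF_EI_CLASS = 4,  ELF_CLASS64 = 2,      /* e_ident[EI_CLASS] */
    ELF_EI_DATA = 5,   ELF_DATA2LSB = 1,     /* e_ident[EI_DATA]  */
    ELF_EM_X86_64 = 62, ELF_EM_AARCH64 = 183,
    ELF64_EHDR_SIZE = 64
};

/* Return the qemu-user binary that would run this ELF, or NULL. */
const char *qemu_for_elf(const unsigned char *buf, size_t len)
{
    if (len < ELF64_EHDR_SIZE || memcmp(buf, "\177ELF", 4) != 0)
        return NULL;                     /* too short, or not ELF */
    if (buf[ELF_EI_CLASS] != ELF_CLASS64 || buf[ELF_EI_DATA] != ELF_DATA2LSB)
        return NULL;                     /* only 64-bit little-endian here */
    /* e_machine is the 16-bit field at offset 18, stored little-endian. */
    unsigned machine = buf[18] | (unsigned)buf[19] << 8;
    switch (machine) {
    case ELF_EM_X86_64:  return "qemu-x86_64-static";
    case ELF_EM_AARCH64: return "qemu-aarch64-static";
    default:             return NULL;
    }
}
```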
What I am curious about is whether there is a way to embed both in a single program.
Read the x86 binary file as UTF-8 text, then copy from "ELF" to the last character. Then go to the ARM binary and delete the same part you copied from the x86 one. Then paste the copied x86 part at the head. I tried it and it's working.

"Illegal instruction" when running ARM code targeting my CPU

I'm compiling a rather large project for ARM. I'm using an AT91SAM9G25-EK as a devboard running a Debian ARM image. All libraries and executables in the image seem to be compiled for the armv4t instruction set.
My CPU is an ARM926EJ-S, which should run armv5tej code.
I'm using GCC to cross compile for my board. My CXX flags look like the following:
set(CMAKE_CXX_FLAGS "--signed-char --sysroot=${SYSROOT} -mcpu=arm926ej-s -mtune=arm926ej-s -mfloat-abi=softfp" CACHE STRING "" FORCE)
If I try to run this on my board, I get an Illegal Instruction signal (SIGILL) during initialization of one of my dependencies (using armv4t).
If I enable thumb mode (-mthumb -mthumb-interwork) it works, but uses Thumb for all the code, which in my case runs slower (I'm doing some serious number crunching).
In this case, if I specify one function to be compiled for ARM mode (using __attribute__((target("arm")))) it will run fine until that function is called, then exit with SIGILL.
I'm lost. Is it that bad I'm linking against libraries using armv4t? Am I misunderstanding the way ARM modes work? Is it something in the linux kernel?
What softfp means is to use the soft-float calling convention between functions, but still use the hardware FPU within them. Assuming your cross-compiler is configured with a default -mfpu option other than "none" (run arm-whatever-gcc -v and look for --with-fpu= to check), then you've got a problem, because as far as I can see from the Atmel datasheet, the SAM9G25 doesn't have an FPU.
My first instinct would be to slap GDB on there, catch the signal and disassemble the offending instruction to make sure, but the fact that Thumb code works OK is already a giveaway (Thumb before ARMv6T2 doesn't include any coprocessor instructions, and thus can't make use of an FPU).
In short, use -mfloat-abi=soft to ensure the ARM code actually uses software floating-point and avoids poking a non-existent FPU. And if the "serious number crunching" involves a lot of floating-point, perhaps consider getting a different MCU...
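A cheap way to catch this at build time is to check the predefined macros GCC sets for 32-bit ARM: __SOFTFP__ for -mfloat-abi=soft and __ARM_PCS_VFP for -mfloat-abi=hard, with neither set under softfp. This is a sketch based on my understanding of those macros (compare against your toolchain's -dM -E output); float_abi_in_use is a hypothetical name.

```c
/* Describe the floating-point ABI this translation unit is compiled for,
   based on GCC/Clang predefined macros for 32-bit ARM. */
const char *float_abi_in_use(void)
{
#if !defined(__arm__)
    return "not 32-bit ARM";
#elif defined(__SOFTFP__)
    /* -mfloat-abi=soft: no FPU instructions at all. */
    return "soft: software floating point, safe without an FPU";
#elif defined(__ARM_PCS_VFP)
    /* -mfloat-abi=hard: FPU instructions and VFP argument registers. */
    return "hard: FPU instructions and VFP calling convention";
#else
    /* -mfloat-abi=softfp: soft-float calls, FPU instructions inside. */
    return "softfp: FPU instructions may be used inside functions";
#endif
}
```

Calling this (or turning the softfp branch into an #error) in the number-crunching code would flag the FPU-instruction problem before the board ever raises SIGILL.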

Ada + Fixed Strings + STM32 ARM

How do I create a simple function that returns a string on an ARM platform?
procedure Main is
   function tst_func return String is
   begin
      return "string";
   end tst_func;
   str : String := tst_func; -- <-- Doesn't work, runtime error.
   -- AdaCore GPL compiler, cross-development, arm-elf hosted on Windows.
   -- Hardware is the STM32F407 Discovery board.
begin
   ...
The problem is a bug in the runtime system: if your program doesn’t involve any tasking, the environment task’s secondary stack isn’t set up properly, so when your function tries to return a string it thinks the secondary stack has been exhausted and raises Storage_Error.
I have reported this to AdaCore: their recommendation was to include
delay until Ada.Real_Time.Clock;
in your main program (which also needs a with Ada.Real_Time; clause).
The bug will likely be resolved in the next GNAT GPL release.
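Why an unset secondary stack breaks a function as simple as tst_func can be modelled in C. The secondary stack is, in essence, a bump allocator for results whose size is only known at run time (an unconstrained String here). This is an analogue for illustration, not GNAT's implementation; sec_stack, ss_allocate and the C tst_func are hypothetical names, and NULL stands in for Storage_Error.

```c
#include <stddef.h>
#include <string.h>

/* Bump allocator standing in for a secondary stack. */
typedef struct {
    unsigned char *base;  /* NULL models a stack that was never set up */
    size_t size;
    size_t top;
} sec_stack;

/* Allocate n bytes; NULL models GNAT raising Storage_Error. */
static void *ss_allocate(sec_stack *ss, size_t n)
{
    if (ss->base == NULL || n > ss->size - ss->top)
        return NULL;
    void *p = ss->base + ss->top;
    ss->top += n;
    return p;
}

/* Analogue of `function tst_func return String`: the caller can't size
   the result in advance, so it is copied onto the secondary stack. */
const char *tst_func(sec_stack *ss, size_t *len)
{
    static const char value[] = "string";
    char *p = ss_allocate(ss, sizeof value - 1);
    if (p == NULL)
        return NULL;
    memcpy(p, value, sizeof value - 1);
    *len = sizeof value - 1;
    return p;
}
```

With base left unset, every allocation fails regardless of the request size, which matches the symptom: the runtime behaves as if the secondary stack were already exhausted.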
The issue here seems to be that using Ada on small embedded CPUs like the STM32 (ARM Cortex), the Atmel AVR, or the TI MSP430 often involves compromises, because the platform may not be capable of running a full Ada RTS (Runtime System), including things like tasking.
Instead, a minimal RTS may be supplied, with restrictions specified by pragmas, that doesn't support tasking or, in this case, features requiring the secondary stack. Funnily enough, the RTS for the AVR does include the files s-secsta.ads and s-secsta.adb, which implement package System.Secondary_Stack, so the much more powerful STM32 ought to be capable of it. You could look at the RTS sources supplied with the AdaCore GPL package to see whether these files are present.
So - options.
1) Work around, either using fixed length strings, or a table of string constants, or returning an access String (i.e. pointer) to a string allocated on the heap (don't forget to free it!) though heap use is not normally recommended for embedded programming.
2) Find a better RTS. You can compile and link against a different RTS by passing --RTS=... to the compiler. Here is a thread discussing alternative RTS strategies for this CPU.

How can an x64 OS run code compiled for an x86 machine?

Basically, what I wonder is how an x86-64 OS can run code compiled for an x86 machine. I know that when the first x64 systems were introduced, this wasn't a feature of any of them. After that, they somehow managed to do it.
Note that I know x86 assembly language is a subset of x86-64 assembly language, and that the ISA is designed to support backward compatibility. But what confuses me here is the stack calling conventions. These conventions differ a lot depending on the architecture. For example, in x86, to back up the frame pointer, a process pushes it onto the stack (RAM) and pops it when it is done. On the other hand, in x86-64, processes don't need to update the frame pointer at all, since all references are made via the stack pointer. Secondly, while in x86 function arguments are passed on the stack, in x86-64 registers are used for that purpose.
Maybe these differences between the calling conventions of x86 and x86-64 don't affect the way the program stack grows, as long as different conventions are not used at the same time, and this is mostly the case because 32-bit functions are called by other 32-bit functions, and likewise for 64-bit ones. But at some point, a function (probably a system function) will call, with some arguments, a function whose code is compiled for an x86-64 machine. At this point, I am curious how the OS (or some other control unit) manages to make this function work.
Thanks in advance.
Part of the way that the i386/x86-64 architecture is designed is that the CS and other segment registers refer to entries in the GDT. The GDT entries have a few special bits besides the base and limit that describe the operating mode and privilege level of the current running task.
If the CS register refers to a 32-bit code segment, the processor will run in what is essentially i386 compatibility mode. Likewise 64-bit code requires a 64-bit code segment.
So, putting this all together.
When the OS wants to run a 32-bit task, during the task switch into it, it loads a value into CS which refers to a 32-bit code segment. Interrupt handlers also have segment registers associated with them, so when a system call or an interrupt occurs, the handler switches back to the OS's 64-bit code segment (allowing the 64-bit OS code to run correctly), and the OS can then do its work and continue scheduling new tasks.
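The part of the descriptor that matters here is two bits: L (bit 53), which marks a 64-bit code segment, and D (bit 54), which selects a 32-bit default operand size when L is clear. A sketch that decodes them from a raw 8-byte descriptor; code_segment_bits is a hypothetical name.

```c
#include <stdint.h>

/* Decode the execution mode selected by an 8-byte GDT code segment
   descriptor when the CPU is in long mode. Returns 64, 32 or 16,
   0 if the descriptor is not a code segment, and -1 for the reserved
   L=1, D=1 combination. */
int code_segment_bits(uint64_t desc)
{
    int s    = (desc >> 44) & 1; /* 1 = code/data, 0 = system descriptor */
    int exec = (desc >> 43) & 1; /* type bit 3: executable segment */
    int l    = (desc >> 53) & 1; /* L: 64-bit code segment */
    int d    = (desc >> 54) & 1; /* D: 32-bit default operand size */

    if (!s || !exec)
        return 0;
    if (l && d)
        return -1;
    if (l)
        return 64;
    return d ? 32 : 16;
}
```

For example, 0x00affb000000ffff (L=1) decodes as 64 and 0x00cffb000000ffff (D=1) as 32; both are flat ring-3 code descriptors with base 0 and limit 0xfffff, chosen for illustration rather than taken from a particular OS.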
As a follow-up regarding calling conventions: neither i386 nor x86-64 requires the use of frame pointers. The code is free to do as it pleases. In fact, many compilers (gcc, clang, VS) offer the ability to compile 32-bit code without frame pointers. What is important is that the calling convention is implemented consistently. If all the code expects arguments to be passed on the stack, that's fine, but the called code had better agree with that. Likewise, passing via registers is fine too; everyone just has to agree (at least at the library interface level; internal functions can generally do as they please).
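GCC and Clang make the "everyone has to agree" point visible on x86-64: a function can be declared with the Windows x64 convention (ms_abi) inside an otherwise System V program, and calls still work because every caller sees the declaration. A sketch; WIN64_CC, sum4_ms and call_both are hypothetical names, and on other targets the attribute is simply omitted.

```c
/* On x86-64 GCC/Clang, ms_abi passes the first integer args in RCX, RDX,
   R8, R9 instead of the System V RDI, RSI, RDX, RCX. */
#if defined(__x86_64__) && defined(__GNUC__)
#define WIN64_CC __attribute__((ms_abi))
#else
#define WIN64_CC /* other targets: default calling convention */
#endif

WIN64_CC long sum4_ms(long a, long b, long c, long d)
{
    return a + b + c + d;
}

/* A default-convention caller: the compiler emits ms_abi argument
   passing here because the declaration above says so. */
long call_both(void)
{
    return sum4_ms(1, 2, 3, 4);
}
```

call_both() returns 10. Without a shared declaration, caller and callee would disagree about which registers hold the arguments and the callee would read garbage.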
Beyond that, just keep in mind that the difference between the two isn't really an issue because every process gets its own private view of memory. A side consequence though is that 32-bit apps can't load 64-bit dlls, and 64-bit apps can't load 32-bit dlls, because a process either has a 32-bit code segment or a 64-bit code segment. It can't be both.
The processor is put into compatibility mode (a sub-mode of long mode), but that requires everything executing at that time to be 32-bit code. This switching is handled by the OS.
Windows: It uses WoW64. WoW64 is responsible for changing the processor mode; it also provides the compatible DLLs and registry functions.
Linux: Like Windows, Linux switches the processor to compatibility mode whenever it starts executing 32-bit code; you need all the 32-bit glibc libraries installed, and 32-bit code can't be mixed with 64-bit code in one process. More recently, the x32 ABI has been added, which lets programs with 32-bit pointers run in 64-bit mode and use x86-64 features such as the increased number of registers. See this article on the x32 ABI.
PS: I am not very certain about the details, but it should give you a start.
Also, this answer combined with Evan Teran's answer probably gives a rough picture of everything that is happening.
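Which of these ABIs a given build targets is visible at compile time through predefined macros in GCC and Clang (selected with -m32, -mx32 or -m64). A small sketch; x86_abi_name is a hypothetical name.

```c
/* Report the x86 ABI this file is compiled for:
   i386   - 32-bit code (compatibility mode on a 64-bit kernel)
   x32    - 64-bit mode with 32-bit pointers (__ILP32__ on x86-64)
   x86-64 - the usual LP64 ABI */
const char *x86_abi_name(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    return "x32";
#elif defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "i386";
#else
    return "not x86";
#endif
}
```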

Compiling the Linux kernel with a non-GCC compiler

Linux kernel is written for compiling with gcc and uses a lot of small and ugly gcc-hacks.
Which compilers can compile linux kernel except gcc?
One that can is the Intel Compiler. What minimum version of it is needed to compile the kernel?
There was also the Tiny C Compiler, but it could compile only a reduced, specially edited version of the kernel.
Are there other compilers capable of building the kernel?
Outdated information: you need to patch the kernel in order to compile it using the Intel CC:
Download Linux kernel patch for Intel® Compiler
See also Is it possible to compile Linux kernel with something other than gcc for further links and information
One of the most recent sources: http://forums.fedoraforum.org/showthread.php?p=1328718
There is an ongoing process of committing LLVMLinux patches into the vanilla kernel (2013-2014).
LLVMLinux is a project by The Linux Foundation (http://llvm.linuxfoundation.org/) to enable the vanilla kernel to be built with LLVM. A lot of the patches are prepared by Behan Webster, the LLVMLinux project lead.
There is LWN article about the project from May 2013
https://lwn.net/Articles/549203/ "LFCS: The LLVMLinux project"
The current status of the LLVMLinux project is tracked at http://llvm.linuxfoundation.org/index.php/Bugs#Linux_Kernel_Issues
Things (basically GCC-isms) already eliminated from the kernel:
* Explicit Register Variables (non-C99)
* VLAIS (a non-C99-compliant, undocumented GCC feature, "Variable Length Arrays In Structs"), like struct S { int array[N]; } or even struct S { int array[N]; int array_usb_gadget[M]; }, where N and M are non-constant function arguments
* Nested Functions (an Ada feature ported into C by GCC/GNAT developers; not allowed in C99)
* Some gcc/gas magic like special segments, or macros
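The portable replacement for a VLAIS is to compute the layout by hand, which is what GCC does implicitly. A sketch with one char array and one uint32_t array whose lengths h and n are run-time values; vlais_layout and fill_and_sum are hypothetical names, and it assumes C11 for _Alignof and max_align_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round x up to a multiple of align (a power of two). */
static size_t align_up(size_t x, size_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* What a GCC VLAIS such as
 *     struct { char name[h]; uint32_t data[n]; } s;   (not valid C99)
 * computes implicitly: stores the offset of data[] in *data_off and
 * returns the total size. */
size_t vlais_layout(size_t h, size_t n, size_t *data_off)
{
    size_t off = align_up(h, _Alignof(uint32_t));
    *data_off = off;
    return align_up(off + n * sizeof(uint32_t), _Alignof(uint32_t));
}

/* Use the layout with a plain C99 VLA as suitably aligned storage. */
uint32_t fill_and_sum(size_t h, size_t n)
{
    size_t data_off;
    size_t total = vlais_layout(h, n, &data_off);
    max_align_t storage[total / sizeof(max_align_t) + 1];
    char *name = (char *)storage;
    uint32_t *data = (uint32_t *)((char *)storage + data_off);
    uint32_t sum = 0;

    memset(name, 'x', h);
    for (size_t i = 0; i < n; i++) {
        data[i] = (uint32_t)i;
        sum += data[i];
    }
    return sum;
}
```

For h = 5 and n = 3 the data array starts at offset 8 and the total size is 20 bytes, the same layout a VLAIS would get.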
Things to be done:
* Usage of __builtin_constant_p builtin to implement scary magic like BUILD_BUG_ON(!__builtin_constant_p(offset));
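The first half of that idiom, a compile-time assertion, can be sketched in plain C. The negative-array-size trick below is modelled on an older kernel definition of BUILD_BUG_ON, and __builtin_constant_p is the GCC builtin (also accepted by Clang) that reports whether a value is a compile-time constant. MY_BUILD_BUG_ON, IS_CONSTANT, struct packet and payload_offset are hypothetical names.

```c
#include <stddef.h>

/* If cond is a true constant, the array size becomes -1 and compilation
   fails; if it is false, this compiles to nothing. */
#define MY_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* 1 when the compiler can treat x as a compile-time constant. */
#define IS_CONSTANT(x) __builtin_constant_p(x)

struct packet {
    unsigned char type;
    unsigned char flags;
    unsigned short len;
    unsigned int payload;
};

/* Returns the payload offset; refuses to compile if it is misaligned. */
size_t payload_offset(void)
{
    MY_BUILD_BUG_ON(offsetof(struct packet, payload) % 4 != 0);
    return offsetof(struct packet, payload);
}
```

C11's _Static_assert covers the constant case portably; the hard part for other compilers is the kernel's combination with __builtin_constant_p, whose result in GCC can depend on optimization and inlining.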
The good news about LLVMLinux is that after its patches the kernel not only becomes buildable with LLVM+clang, but is also easier to build with other non-GCC compilers, because the project removes much non-C99 code, such as the VLAIS introduced by the USB gadget author, by netfilter hackers, and by crypto subsystem hackers; nested functions are removed as well.
In short, you cannot, because the kernel code was written to take advantage of GCC's compiler semantics, and the relationship between the kernel and the compiled code is a very strong one, i.e. it must be compiled with GCC. Since GCC produces ELF (Executable and Linkable Format) object files, the kernel must be built using that object code format. Unless you can hack it up to work with another compiler, it may well compile but not work: compilers under Windows produce PE code, so there could be unexpected results, meaning the kernel may not boot at all!
