ARMv8 - Running legacy 32 bit Applications on 64 bit OS - linux

Going thru the ARMv8 manual, I have the following questions to help understand the big picture.
Can legacy 32 bit app. (ARMv7 or earlier) run as is on the ARMv8 OS?
If the legacy applications need to be rebuilt for ARMv8 and assuming that I rebuild the application as 32 bit (Aarch32), does this need 32 bit OS underlying support? (It is interesting to know how the addressing mechanism works here.)
Please provide references wherever possible.
PS: I am targeting Linux OS with Aarch64 support (3.7 and later)

Aarch64 platform may run 32bit ARM but this compatibility is optional.
To run AArch32 binaries you need all libraries application would use in 32bit versions. Same as with i686 binaries on x86-64 systems.

There is also a Linux arm64 CONFIG_COMPAT at: https://github.com/torvalds/linux/blob/v4.17/arch/arm64/Kconfig#L1274 which says:
This option enables support for a 32-bit EL0 running under a 64-bit
kernel at EL1. AArch32-specific components such as system calls,
the user helper functions, VFP support and the ptrace interface are
handled appropriately by the kernel.
which will likely be required, and an ARM employee mentioned on this thread: https://community.arm.com/processors/f/discussions/5535/running-armv7-binaries-on-armv8 that userland instructions are basically the same with some exceptions:
For something like a Linux application, then yes. ARMv8-A includes AArch32, which provides backwards compatibility with ARMv7-A. There are some limitations, such as the SWP instruction no longer being supported. But these are types of things that applications are unlikely to be using (and were deprecated in ARMv7).
For baremetal, you have all the usual problems of using a binary from one platform on another. So you are going to need to do some degree of porting in most cases.
I then tried it for myself with this QEMU full system setup but my attempt failed: I compiled a C hello world with the armv7 compiler as:
arm-linux-gcc -static hello_world.c
and put the built file into the aarch64 target, but when I tried to run it it failed with:
a.out: line 1: syntax error: unexpected word (expecting ")")
even though /proc/config.gz says that CONFIG_COMPAT is set.
It seems that the Linux kernel is not identifying it as an ELF file but rather falling back to /bin/sh, I get the same error if I do:
sh /mnt/9p/a.out
is trying to use the shell binfmt instead of ELF.
In particular, I know that the Linux kernel can choose between archs from the binfmt signature because qemu-user does so: https://unix.stackexchange.com/questions/41889/how-can-i-chroot-into-a-filesystem-with-a-different-architechture

Related

Why does the executable built in 64 bit mode on linux show machine type as AMD x86 64?

I encountered this while trying to understand ELF (Executable and Linking Format).
Steps I followed
Wrote a simple application.
main.c containing
int main(int argc, char **argv){ return 0;}
Compiled in linux environment using gcc. (Done on intel laptop)
Simplest command possible
gcc main.c
Now when I run a.out, it runs without any issue. So build is fine.
I used readelf tool to retrieve the ELF information, where in machine field is put as Advanced Micro Devices X86-64.
This part puzzled me.
So I checked the file header of a.out, it was as per ELF-64 specification (Value 64 - EM_X86_64).
Would anyone care to explain, why does the executable, built in 64 bit mode on linux, show machine type as AMD x86 64?
The x86_64 platform was called the AMD64 platform back when AMD introduced it. Initially, it was far from clear that Intel would ever support it.
You notice how long after i386's ceases to exist, a lot of software had the architecture tag i386? It was because i386 CPUs introduced the instruction set that software uses. Similarly, AMD introduced the instruction set your program uses, so it has an architecture tag that reflects the first CPUs that supported its instruction set. (Modern 32-bit code is still often tagged i686 which refers to the Pentium Pro, circa 1995.)
For a while, the IA-64 (Intel Architecture 64-bit) or Itanium chips were Intel's 64-bit offering, and the Pentium-class chips were the IA-32 chips. The IA-64 chip instruction set was sufficiently different from the Pentium code set that people did not pick it up in large numbers. Meanwhile, AMD came out with a 64-bit extension to the Pentium code set - and that got a lot of support. After a while, Intel bowed to the inevitable and made its own chips that were compatible with the AMD x86/64 chips. But it was AMD that specified the architecture, so it gets the credit in the name.
why does the executable ... show machine type as AMD x86 64?
Because the ELF machine code, used by file, was registered by AMD. There is the official list of registered codes: http://www.sco.com/developers/gabi/latest/ch4.eheader.html (the table at second page):
e_machine
This member's value specifies the required architecture for an individual file.
Name Value Meaning
EM_NONE 0 No machine
...
EM_X86_64 62 AMD x86-64 architecture

glibc: elf file OS ABI invalid

downloaded and compiled glibc-2.13. when i try to run a sample C program which does a malloc(). I get following error
elf file OS ABI invalid
Can anybody please pass my any pointer helpful in resolving this issue.Please note that my kernel version is linux-2.6.35.9
It's not your kernel version that's the problem.
The loader on your system does not support the new Linux ABI. Until relatively recently, Linux ELF binaries used the System V ABI. Recently, in support of STT_GNU_IFUNC, the Linux ABI was added. You would have to update your system C library to have a loader that support STT_GNU_IFUNC, and then it will also recognize ELF objects with the Linux ABI type.
See Dave Miller's blog entry on STT_GNU_IFUNC for Sparc (archived) to gain an understanding of what STT_GNU_IFUNC does, if you care.
If you get your hands in the loader from a newer system, you might be able to make it work using that. But you'll have to carry the loader wherever your program go. You can either compile your program to use that loader as explained here, or compile your program and patch it later using patchelf, in a way similar to what I mention here. I was able to run a program that was giving me the OS ABI invalid error on a linux 2.6.18 (older than yours) that had ld-2.5.so, by copying a ld-2.15.so from somewhere else.
NOTE: do NOT overwrite your system ld*.so or ld-linux. ;-/
It is possible your glibc was built with the --enable-multiarch flag that forced using ifunc and new LINUX ABI
From what I can tell is that --enable-multiarch is the default setting and you should disable it by setting --enable-multiarch=no.

compiling linux kernel with non-gcc

Linux kernel is written for compiling with gcc and uses a lot of small and ugly gcc-hacks.
Which compilers can compile linux kernel except gcc?
The one, which can, is the Intel Compiler. What minimal version of it is needed for kernel compiling?
There also was a Tiny C compiler, but it was able to compile only reduced and specially edited version of the kernel.
Is there other compilers capable of building kernel?
An outdatet information: you need to patch the kernel in order to compile using the Intel CC
Download Linux kernel patch for Intel® Compiler
See also Is it possible to compile Linux kernel with something other than gcc for further links and information
On of the most recent sources :http://forums.fedoraforum.org/showthread.php?p=1328718
There is ongoing process of committing LLVMLinux patches into vanilla kernel (2013-2014).
The LLVMLinux is project by The Linux Foundation: http://llvm.linuxfoundation.org/ to enable vanilla kernel to be built with LLVM. Lot of patches are prepared by Behan Webster, who is LLVMLinux project lead.
There is LWN article about the project from May 2013
https://lwn.net/Articles/549203/ "LFCS: The LLVMLinux project"
Current status of LLVMLinux project is tracked at page http://llvm.linuxfoundation.org/index.php/Bugs#Linux_Kernel_Issues
Things (basically gcc-isms) already eliminated from kernel:
* Expicit Registers Variables (non-C99)
* VLAIS (non C99-compliant undocumented GCC feature "Variable length arrays in structs") like struct S { int array[N];} or even struct S { int array[N]; int array_usb_gadget[M]; } where N and M are non-constant function argument
* Nested Functions (Ada feature ported into C by GCC/Gnat developers; not allowed in C99)
* Some gcc/gas magic like special segments, or macro
Things to be done:
* Usage of __builtin_constant_p builtin to implement scary magic like BUILD_BUG_ON(!__builtin_constant_p(offset));
The good news about LLVMLinux are that after its patches kernel not only becomes buildable with LLVM+clang, but also easier to build by other non-GCC compilers, because the project kills much not C99 code like VLAIS, created by usb gadget author, by netfilter hackers, and by crypto subsystem hackers; also nested functions are killed.
In short, you cannot, because the kernel code was written to take advantage of the gcc's compiler semantics...and between the kernel and the compiled code, the relationship is a very strong one, i.e. must be compiled with gcc...Since gcc uses 'ELF' (Embedded Linking Format) object files, the kernel must be built using the object code format. Unless you can hack it up to work with another compiler - it may well compile but may not work, as the compilers under Windows produces PE code, there could be unexpected results, meaning the kernel may not boot at all!

Is there any difference between executable binary files between distributions?

As all Linux distributions use the same kernel, is there any difference between their executable binary files?
If yes, what are the main differences? Or does that mean we can build a universal linux executable file?
All Linux distributions use the same binary format ELF, but there is still some differences:
different cpu arch use different instruction set.
the same cpu arch may use different ABI, ABI defines how to use the register file, how to call/return a routine. Different ABI can not work together.
Even on same arch, same ABI, this still does not mean we can copy one binary file in a distribution to another. Since most binary files are not statically linked, so they depends on the libraries under the distribution, which means different distribution may use different versions or different compilation configuration of libraries.
So if you want your program to run on all distribution, you may have to statically link a version that depends on the kernel's syscall only, even this you can only run a specified arch.
If you really want to run a program on any arch, then you have to compile binaries for all arches, and use a shell script to start up the right one.
All Linux ports (that is, the Linux kernel on different processors) use ELF as the file format for executables and libraries. A specific ELF binary is labeled with a single architecture/OS on which it can run (although some OSes have compatibility to run ELF binaries from other OSes).
Most ports have support for the older a.out format. (Some processors are new enough that there have never existed any a.out executables for them.)
Some ports support other executable file formats as well; for example, the PA-RISC port has support for HP-UX's old SOM executables, and the μcLinux (nonmmu) ports support their own FLAT format.
Linux also has binfmt_misc, which allows userspace to register handlers for arbitrary binary formats. Some distributions take advantage of this to be able to execute Windows, .NET, or Java applications -- it's really still launching an interpreter, but it's completely transparent to the user.
Linux on Alpha has support for loading Intel binaries, which are run via the em86 emulator.
It's possible to register binfmt_misc for executables of other architectures, to be run with qemu-user.
In theory, one could create a new format -- perhaps register a new "architecture" in ELF -- for fat binaries. Then the kernel binfmt loader would have to be taught about this new format, and you wouldn't want to miss the ld-linux.so dynamic linker and the whole build toolchain. There's been little interest in such a feature, and as far as I know, nobody is working on anything like it.
Almost all linux program files use the ELF standard.
Old Unixes also used COFF format. You may still find executables from times of yore in this format. Linux still has support for it (I don't know if it's compiled in current distros, though).
If you want to create a program that runs an all Linux distributions, you can consider using scripting languages (like Python and Perl) or a platform independent programming language like Java.
Programs written in scripting languages are complied at execution time, which means they are always compiled to match the platform they are executed on, and, hence, should always work (given that the libraries are set up properly).
Programs written in Java, on the other hand, are compiled before distributing them, but can be executed on any Linux distribution as long as it has a Java VM installed.
Furthermore, programs written in Java can be run on other operating systems like MS Windows and Mac OS.
The same is true for many programs written in Python and Perl; however, whether a Python or Perl program will work on another operating system depends on what libraries are used by that program and whether these libraries are available on the other operating systems.

64-bit linux, Assembly Language, Issues?

I'm currently in the process of learning assembly language.
I'm using Gas on Linux Mint (32-bit). Using this book:
Programming from the Ground Up.
The machine I'm using has an AMD Turion 64 bit processor, but I'm limited to 2 GB of RAM.
I'm thinking of upgrading my Linux installation to the 64-bit version of Linux Mint, but I'm worried that because the book is targeted at 32-bit x86 architecture that the code examples won't work.
So two questions:
Is there likely to be any problems with the code samples?
Has anyone here noticed any benefits in general in using 64-bit Linux over 32-bit (I've seen some threads on Stack Overflow about this but they are mostly related to Windows Vista vs. Windows XP.)
Your code examples should all still work. 64-bit processors and operating systems can still run 32-bit code in a sort of "compatability mode". Your assembly examples are no different. You may have to provide an extra line of assembly or two (such as .BITS 32) but that's all.
In general, using a 64-bit OS will be faster than using a 32-bit OS. x86_64 has more registers than i386. Since you're working on assembly, you already know what registers are used for... Having more of them means less stuff has to be moved on and off the stack (and other temporary memory) thus your program spends less time managing data and more time working on that data.
Edit: To compile 32-bit code on 64-bit linux using gas, you just use the commandline argument "--32", as noted in the GAS manual
Even if you run Linux 64bit, it is possible to compile and run 32bit binaries on it. I don't know how good Mint's support for that is, I assume you should check.
64bit assembler however is not fully compatible to 32bit, for example you have different (more) registers. There are some specific instructions not available on either platform.
I would say the move to 64bit is not a big deal. You can still write 32bit assembly and then perhaps try to get it also running as 64bit (shouldn't be too hard), as a source of even more programming/learning fun.
Usually 32-bits is plenty so only use 64-bits or more if you really NEED IT.
Best to decide prior to programming if you want to do it as a 32-bit app or
a 64-bit app and then stick to it as mixed mode debugging ca get tricky fast.

Resources