Run Linux binary under OS X

Is it possible to create a Linux compatibility layer inside OS X?
Someone has created xbinary, which is essentially an OS X port of binfmt_misc as a kext: http://www.osxbook.com/software/xbinary/. Extending that idea, I am wondering whether it is possible to create a Linux compatibility layer inside OS X:
xbinary is obviously what makes the kernel accept ELF. When an ELF is encountered ...
A port of ld-linux.so, which is itself a Mach-O binary, is started, loading ELF libraries (Mach-O libraries can also be used, to some extent).
Another kext catches Linux syscalls (int 80h on x86, the syscall instruction on amd64) and translates them into the corresponding OS X syscalls in the kernel, or ld-linux.so replaces each syscall with a small function call into another library that translates them in userland (a sketch of such a translation layer appears below).

I think you would also need to make OS X support the Linux personality mechanism so that the syscalls behave identically to those of Linux (where present).
Other than that, a binary might depend on a vDSO shim present in the higher ranges of the address space, which is usually taken care of by the libc implementation on Linux.
Many other subtleties are possible...
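To make the syscall-translation step concrete, here is a minimal userland sketch, assuming the translation happens in a library rather than a kext. The syscall numbers are real (Linux x86-64 versus the BSD numbers OS X exposes through syscall()), but linux_syscall_shim itself is hypothetical:

    /* Hypothetical shim: map a Linux x86-64 syscall number to the
       corresponding BSD syscall on OS X.  Only two calls are handled;
       a real layer would need the full table plus translation of
       structs, flags, and errno conventions. */
    #include <unistd.h>
    #include <sys/syscall.h>

    long linux_syscall_shim(long linux_nr, long a1, long a2, long a3)
    {
        switch (linux_nr) {
        case 1:   /* Linux __NR_write */
            return syscall(SYS_write, (int)a1, (const void *)a2, (size_t)a3);
        case 60:  /* Linux __NR_exit */
            return syscall(SYS_exit, (int)a1);
        default:
            return -38; /* what Linux returns as -ENOSYS */
        }
    }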

I think it's not possible, because the Mac has a different structure than Linux.

Related

Why does a toolchain name have separate OS and EABI fields?

For example: arm-unknown-linux-gnueabi
Once the OS (i.e., Linux) is fixed, the C library will be fixed (glibc), and hence the calling convention and ABI being followed will be fixed too. Why is a separate fourth field for the ABI required? Can a toolchain use an ABI different from the one used by the underlying OS and libc? In that case, how would libraries compiled by that toolchain run on the OS?
It's more or less a matter of historical reasons, a.k.a. the holy wars about the sacred operating system's name. What you call the "toolchain name" is actually called the Target Triplet, and as its name implies, it has exactly three fields, no more and no less. In your example, the fields would be:
Machine/CPU: arm
Vendor: unknown
Operating System: linux-gnueabi
Take another example I've already faced: i686-elf-gcc, which is used for hobbyist operating system development:
Machine/CPU: i686
Vendor: unknown (implicit)
Operating System: elf, i.e., none (the compiler is a freestanding cross compiler, used for the development of operating system kernels, so the code it outputs expects no underlying OS; the output code is the OS itself!)
The confusion originates from the fact that the fields themselves may (and do) contain the - character, which is also used to separate the fields. In your case, the OS is considered to be linux-gnueabi, otherwise known as the GNU operating system with the Linux kernel, using the Embedded ARM ABI. The Linux kernel has historically been one of the most portable pieces of software in the world, so it's expected to be portable to other ARM ABIs, although I'm only aware of the EABI...
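As a small aside on how the ABI field shows up in practice, GCC exposes it through predefined macros; this sketch (mine, using real macro names) prints what a compiler such as arm-unknown-linux-gnueabi-gcc bakes into the code:

    /* The ABI chosen by the toolchain is visible as predefined macros.
       Built with an arm-unknown-linux-gnueabi toolchain, both branches
       should fire; on other targets the output differs. */
    #include <stdio.h>

    int main(void)
    {
    #ifdef __ARM_EABI__
        printf("built for the ARM EABI\n");
    #endif
    #ifdef __gnu_linux__
        printf("targeting GNU/Linux (glibc)\n");
    #endif
        return 0;
    }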

ARMv8 - Running legacy 32 bit Applications on 64 bit OS

Going through the ARMv8 manual, I have the following questions to help me understand the big picture.
Can a legacy 32-bit application (ARMv7 or earlier) run as-is on an ARMv8 OS?
If legacy applications need to be rebuilt for ARMv8, and assuming I rebuild the application as 32-bit (AArch32), does this require underlying 32-bit OS support? (It would be interesting to know how the addressing mechanism works here.)
Please provide references wherever possible.
PS: I am targeting Linux with AArch64 support (3.7 and later).
An AArch64 platform may run 32-bit ARM code, but this compatibility is optional.
To run AArch32 binaries you also need 32-bit versions of all the libraries the application uses, just as with i686 binaries on x86-64 systems.
There is also the Linux arm64 CONFIG_COMPAT option at https://github.com/torvalds/linux/blob/v4.17/arch/arm64/Kconfig#L1274, which says:
This option enables support for a 32-bit EL0 running under a 64-bit
kernel at EL1. AArch32-specific components such as system calls,
the user helper functions, VFP support and the ptrace interface are
handled appropriately by the kernel.
which will likely be required. Also, an ARM employee mentioned in this thread, https://community.arm.com/processors/f/discussions/5535/running-armv7-binaries-on-armv8, that userland instructions are basically the same, with some exceptions:
For something like a Linux application, then yes. ARMv8-A includes AArch32, which provides backwards compatibility with ARMv7-A. There are some limitations, such as the SWP instruction no longer being supported. But these are types of things that applications are unlikely to be using (and were deprecated in ARMv7).
For baremetal, you have all the usual problems of using a binary from one platform on another. So you are going to need to do some degree of porting in most cases.
I then tried it for myself with this QEMU full-system setup, but my attempt failed: I compiled a C hello world with the ARMv7 compiler as:
arm-linux-gcc -static hello_world.c
and put the built file on the AArch64 target, but when I tried to run it, it failed with:
a.out: line 1: syntax error: unexpected word (expecting ")")
even though /proc/config.gz says that CONFIG_COMPAT is set.
It seems that the Linux kernel is not identifying it as an ELF file but rather falling back to /bin/sh; I get the same error if I do:
sh /mnt/9p/a.out
so it is trying to use the shell binfmt instead of ELF.
In particular, I know that the Linux kernel can choose between archs from the binfmt signature, because qemu-user relies on exactly that: https://unix.stackexchange.com/questions/41889/how-can-i-chroot-into-a-filesystem-with-a-different-architechture
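For what it's worth, you can inspect the same fields the kernel's ELF loader dispatches on with a few lines of C; this is an illustrative probe (my own, not from the thread) of EI_CLASS and e_machine (EM_ARM is 40, EM_AARCH64 is 183):

    /* Print the ELF class and machine of a binary.  The 32-bit and
       64-bit ELF headers lay out e_ident, e_type, and e_machine at the
       same offsets, so reading an Elf32_Ehdr is enough for both. */
    #include <elf.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        Elf32_Ehdr eh;
        FILE *f = argc > 1 ? fopen(argv[1], "rb") : NULL;
        if (!f || fread(&eh, sizeof eh, 1, f) != 1)
            return 1;
        printf("class: %s, e_machine: %u\n",
               eh.e_ident[EI_CLASS] == ELFCLASS32 ? "ELF32" : "ELF64",
               eh.e_machine);
        return 0;
    }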

Linux kernel assembly and logic

My question is somewhat weird but I will do my best to explain.
Looking at the languages the Linux kernel is written in, I found C and assembly, even though I read a text that said the second iteration of Unix was "written completely in C".
I thought that was misleading, but when I pointed out that the kernel has assembly code, I got two questions right from the start:
What assembly files are in the kernel and what's their use?
Assembly is architecture-dependent, so how can Linux be installed on more than one CPU architecture?
And if the Linux kernel is truly written completely in C, then how can it be built with just GCC and no assembly?
I did a complete find / -name "*.s"
and got just one assembly file (asm-offset.s) somewhere under /usr/src/linux-headers-$(uname -r)/.
Somehow I don't think that one file is what makes GCC-compiled code work, so how can Linux work without assembly? Or, if it uses assembly, where is it, and how can it be stable when it depends on the arch?
Thanks in advance
1. Why is assembly used?
Because there are certain things that can be done only in assembly, and because assembly can result in faster code. For example, "you can get access to unusual programming modes of your processor (e.g. 16 bit mode to interface startup, firmware, or legacy code on Intel PCs)".
Read here for more reasons.
2. What assembly files are used?
From: https://www.kernel.org/doc/Documentation/arm/README
"The initial entry into the kernel is via head.S, which uses machine
independent code. The machine is selected by the value of 'r1' on
entry, which must be kept unique."
From https://www.ibm.com/developerworks/library/l-linuxboot/
"When the bzImage (for an i386 image) is invoked, you begin at ./arch/i386/boot/head.S in the start assembly routine (see Figure 3 for the major flow). This routine does some basic hardware setup and invokes the startup_32 routine in ./arch/i386/boot/compressed/head.S. This routine sets up a basic environment (stack, etc.) and clears the Block Started by Symbol (BSS). The kernel is then decompressed through a call to a C function called decompress_kernel (located in ./arch/i386/boot/compressed/misc.c). When the kernel is decompressed into memory, it is called. This is yet another startup_32 function, but this function is in ./arch/i386/kernel/head.S."
Apart from these assembly files, a lot of Linux kernel code makes use of inline assembly.
3. Architecture dependence?
And you are right about it being architecture-dependent; that's why the Linux kernel code is ported to each different architecture.
Linux porting guide
List of supported arch
Things written mainly in assembly in Linux:
Boot code: boots up the machine and sets it up in a state in which it can start executing C code (e.g., on some processors you may need to manually initialize caches and TLBs; on x86 you have to switch to protected mode, ...)
Interrupt/exception/trap entry points and returns: there you need to do very processor-specific things, e.g., saving registers and re-enabling interrupts, and eventually restoring registers and properly returning to user mode. Some exceptions may be handled entirely in assembly.
Instruction emulation: some CPU models may not support certain instructions, may not support unaligned data access, or may not have an FPU. An option is to use emulation when the corresponding exception is raised.
VDSO: the vDSO is a virtual library that the kernel maps into userspace. It allows, e.g., selecting the optimal syscall sequence for the current CPU (on x86, use sysenter/syscall instead of int 0x80 if available) and implementing certain system calls without requiring a context switch (e.g., gettimeofday()).
Atomic operations and locks: maybe in the future some of these could be written using C11 support for atomic operations.
Copying memory from/to user mode: besides using an optimized copy, these check for out-of-bounds access.
Optimized routines: the kernel has optimized versions of some routines, e.g., crypto routines, memset, clear_page, csum_copy (checksum IP data and copy it to another place in one pass), ...
Support for suspend/resume and other ACPI/EFI/firmware things
BPF JIT: newer kernels include a JIT compiler for BPF expressions (used, for example, by tcpdump and seccomp mode 2)
...
To support different architectures, Linux has assembly code (re)written for each architecture it supports (and sometimes there are several implementations of some code for different platforms using the same CPU architecture). Just look at all the subdirectories under arch/; a small inline-assembly example follows.
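To give a flavor of the inline assembly mentioned above, here is a small userland sketch (mine, not actual kernel code) that issues the legacy int 0x80 write syscall on 32-bit x86 Linux using GCC extended asm:

    /* Invoke the legacy i386 syscall interface directly: __NR_write is 4
       on i386, arguments go in ebx/ecx/edx, and the result returns in
       eax.  Works only on 32-bit x86 Linux built with GCC or Clang. */
    static long sys_write_int80(int fd, const void *buf, unsigned long len)
    {
        long ret;
        asm volatile ("int $0x80"
                      : "=a" (ret)
                      : "0" (4), "b" (fd), "c" (buf), "d" (len)
                      : "memory");
        return ret;
    }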
Assembly is needed for a couple of reasons.
There are many instructions that are needed for the operation of an operating system but have no C equivalent, at least on most processors. A good example on Intel x86/64 processors is the iret instruction, which returns from hardware/software interrupts. These interrupts are key to handling hardware events (like a key press) and, on older processors, system calls from programs.
A computer does not start up in a state that is immediately ready for the execution of C code. For an Intel example, when execution reaches the startup routine, the processor may not be in 32-bit mode (or 64-bit mode), and the stack required by C may not be set up either. There are some other features present in some processors (like paging) which need to be turned on from assembly as well.
However, most of the Linux kernel is written in C, which interfaces with platform-specific C/assembly code through standardized interfaces. By separating the parts in this way, most of the logic of the Linux kernel can be shared between platforms. The build system simply compiles the platform-independent and platform-dependent parts together for specific platforms, which results in different executable kernel files for different platforms (and kernel configurations, for that matter).
Assembly code in the kernel is generally used for low-level hardware interaction that can't be done directly from C. It forms a platform-specific foundation that's used by the higher-level parts of the kernel that are written in C.
The kernel source tree contains assembly code for a variety of systems. When you compile a kernel for a particular type of system (such as an x86 PC), only the appropriate assembly code for that platform is included in the build process.
Linux is not the second version of Unix (or Unix in general). It is Unix-compatible, but Unix and Linux have separate histories and, in terms of code base (of their kernels), are completely separate. Linus Torvalds' idea was to write an open-source Unix.
Some of the lower-level things, like some of the architecture-dependent parts of memory management, are done in assembly. The old (but still available) Linux syscall interface for x86, int 0x80, is implemented in assembly. There are probably other places in the kernel that are implemented in assembly, but I don't know of any others.
When you compile the kernel, you select an architecture to target. Depending on the target, the right assembly files for that architecture are included in the build.
The reason you don't find anything is that you're searching the headers, not the sources. Download a tarball from kernel.org and search that (note that the kernel's assembly sources use the .S suffix, which a case-sensitive search for *.s will miss).

Executing Binary Files

I downloaded a binary file (a compiled C program) built using GCC 4.4.4 for x86-64 Red Hat Linux.
Is it normal that when I try to run it on Mac OS X (running Lion, so also x86-64) with GCC 4.2.1, it says cannot execute binary file? It can't even detect it as a binary file.
Why would it do that? I believe the GCC version has nothing to do with it, since the file has already been compiled. It has been compiled for x86-64, which both machines run. Can someone please explain?
There are different binary formats. Linux systems use ELF for executables and libraries, but Mac OS X uses the Mach-O format. Windows uses yet another: the PE format.
It's highly unlikely that a binary compiled for a particular OS will run on another. Either get a binary for Mac, or get the source and compile it yourself.
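As an illustration of "different binary formats", a program along these lines (a hypothetical miniature of what the file utility does) can tell them apart from the first bytes alone:

    /* Classify an executable by its magic number: ELF starts with
       0x7f 'E' 'L' 'F', 64-bit Mach-O with 0xfeedfacf (cf fa ed fe on
       disk, little-endian), and PE files begin with the 'MZ' DOS stub. */
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        unsigned char m[4] = {0};
        FILE *f = argc > 1 ? fopen(argv[1], "rb") : NULL;
        if (!f || fread(m, 1, 4, f) != 4)
            return 1;
        if (!memcmp(m, "\x7f" "ELF", 4))
            puts("ELF (Linux)");
        else if (!memcmp(m, "\xcf\xfa\xed\xfe", 4))
            puts("Mach-O 64-bit (Mac OS X)");
        else if (!memcmp(m, "MZ", 2))
            puts("PE (Windows)");
        else
            puts("unknown format");
        return 0;
    }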
There are many things that will cause issues: the versions of libc and libstdc++, differences in the versions of .so libraries, a different API interface to the OS itself, or even a different binary format (e.g., VMS binaries do not run on AIX).
Although Red Hat Linux and Mac OS X are both 'Unix based', they cannot run each other's binaries. Just like you can't run Windows binaries on Macs and vice versa.
Binaries in this sense are compiled to make operating system calls: when your program has a printf(), that boils down to operating system calls. If the operating system it was compiled for is, say, 64-bit Red Hat Linux, then the binary is going to look for Red Hat Linux names for shared libraries in Red Hat Linux paths, which has absolutely nothing, in any way, shape, or form, to do with a completely different operating system, Mac OS X, and its system calls and shared or static libraries. It's like taking a wheel off a Mini Cooper and trying to bolt it onto a bicycle. Yes, at one point in time it was raw metal and rubber and could have been formed into a bike tire, but once you make that binary, the car tire or the bike tire, that is what it is. Sometimes you find a compatibility layer like Wine, which implements Windows on top of a POSIX system, or a virtual machine like VMware, which lets you run a whole different operating system on top of another by virtualizing a whole computer.
It is also true that you cannot generally expect to take any old C program and have it compile and run on any operating system that, say, has a GCC compiler. Yes, you can learn to write C programs that are portable, but you have to carefully stick to libraries that are supported on all the target platforms. So even taking the source code for your program to the Mac and compiling it is not necessarily going to just work.

Is there any difference in executable binary files between distributions?

As all Linux distributions use the same kernel, is there any difference between their executable binary files?
If yes, what are the main differences? Or does that mean we can build a universal Linux executable file?
All Linux distributions use the same binary format, ELF, but there are still some differences:
Different CPU architectures use different instruction sets.
The same CPU architecture may use different ABIs; an ABI defines how to use the register file and how to call and return from a routine. Different ABIs cannot work together.
Even on the same architecture with the same ABI, this still does not mean we can copy a binary file from one distribution to another. Since most binary files are not statically linked, they depend on the libraries shipped by the distribution, and different distributions may use different versions or different compilation configurations of those libraries.
So if you want your program to run on all distributions, you may have to statically link a version that depends only on the kernel's syscalls (see the sketch below); even then, it will only run on one specific architecture.
If you really want to run a program on any architecture, then you have to compile binaries for all architectures and use a shell script to start the right one.
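Here is a minimal sketch of the "statically linked, syscalls only" idea from the point above; built with something like gcc -static, the result has no .so dependencies and should run on any distribution with the same CPU architecture and a compatible kernel:

    /* The only external dependency is the write() syscall wrapper,
       linked in statically.  Build, for example:
       gcc -static -o hello hello.c */
    #include <unistd.h>

    int main(void)
    {
        static const char msg[] = "hello from any distro\n";
        write(STDOUT_FILENO, msg, sizeof msg - 1);
        return 0;
    }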
All Linux ports (that is, the Linux kernel on different processors) use ELF as the file format for executables and libraries. A specific ELF binary is labeled with a single architecture/OS on which it can run (although some OSes have compatibility to run ELF binaries from other OSes).
Most ports have support for the older a.out format. (Some processors are new enough that there have never existed any a.out executables for them.)
Some ports support other executable file formats as well; for example, the PA-RISC port has support for HP-UX's old SOM executables, and the uClinux (nonmmu) ports support their own FLAT format.
Linux also has binfmt_misc, which allows userspace to register handlers for arbitrary binary formats. Some distributions take advantage of this to be able to execute Windows, .NET, or Java applications -- it's really still launching an interpreter, but it's completely transparent to the user.
Linux on Alpha has support for loading Intel binaries, which are run via the em86 emulator.
It's possible to register binfmt_misc handlers for executables of other architectures, to be run with qemu-user; a sketch of such a registration follows.
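As a concrete illustration (my own, using the canonical qemu rule for 32-bit LSB ARM ELF binaries), one can write a rule to the binfmt_misc register file; this assumes root, a mounted binfmt_misc, and qemu-arm at the given path:

    /* Register qemu-arm for 32-bit LSB ARM ELF binaries.  The rule
       format is :name:type:offset:magic:mask:interpreter:flags, and
       binfmt_misc decodes the \xNN escapes itself, so they are written
       here as literal backslash sequences. */
    #include <stdio.h>

    int main(void)
    {
        const char *rule =
            ":qemu-arm:M::"
            "\\x7f\\x45\\x4c\\x46\\x01\\x01\\x01\\x00\\x00\\x00\\x00\\x00"
            "\\x00\\x00\\x00\\x00\\x02\\x00\\x28\\x00:"
            "\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\x00\\xff\\xff\\xff\\xff"
            "\\xff\\xff\\xff\\xff\\xfe\\xff\\xff\\xff:"
            "/usr/bin/qemu-arm:";
        FILE *f = fopen("/proc/sys/fs/binfmt_misc/register", "w");
        if (!f) { perror("register"); return 1; }
        fputs(rule, f);
        return fclose(f) != 0;
    }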
In theory, one could create a new format -- perhaps register a new "architecture" in ELF -- for fat binaries. Then the kernel's binfmt loader would have to be taught about this new format, and the ld-linux.so dynamic linker and the whole build toolchain would need to support it as well. There's been little interest in such a feature, and as far as I know, nobody is working on anything like it.
Almost all Linux program files use the ELF standard.
Old Unixes also used the COFF format. You may still find executables from times of yore in that format. Linux still has support for it (I don't know whether it's compiled into current distros, though).
If you want to create a program that runs on all Linux distributions, you can consider using scripting languages (like Python and Perl) or a platform-independent programming language like Java.
Programs written in scripting languages are compiled at execution time, which means they are always compiled to match the platform they are executed on and, hence, should always work (given that the libraries are set up properly).
Programs written in Java, on the other hand, are compiled before distributing them, but can be executed on any Linux distribution as long as it has a Java VM installed.
Furthermore, programs written in Java can be run on other operating systems like MS Windows and Mac OS.
The same is true for many programs written in Python and Perl; however, whether a Python or Perl program will work on another operating system depends on what libraries are used by that program and whether these libraries are available on the other operating systems.
