Is assembler portable between Linux distros? - linux

Is a program shipped in assembler format portable between Linux distributions (modulo CPU architecture differences)?
Here's the background to my question: I'm working on a new programming language (named Aklo), whose modus operandi will be the classic compiling to .s and feeding the result to the GNU assembler.
Obviously it would be nice ultimately to have the implementation written in itself, but I had resigned myself to maintaining it in C++ to solve the chicken-and-egg problem: suppose you download the compiler for the first time and it is itself written in Aklo, how do you compile it? As I understand it, different Linux distributions and other UNIX-like systems have different conventions for binary formats.
But it's just occurred to me, a solution might be to ship the .s file (well, one per CPU architecture): it's fair to assume you have or can install the GNU assembler. Of course I'd still need a bootstrap compiler, but that doesn't need to be fast; I can write it in Python.
Is assembler portable in the way that binaries are not? Are there any other stumbling blocks I haven't thought of?
Added in response to one answer:
I had looked wistfully at LLVM, there is certainly a lot of good stuff there and it would make my life easier -- except that it would incur a dependency on the correct version of LLVM being installed. It wouldn't be so bad having that dependency on development machines, but in a world where it's common to ship programs as source, the same dependency would be incurred for every user of every program ever written in Aklo, and I decided that was too high a price to pay.
But if the solution of shipping compiled programs as assembler works... then that solves that problem, and I can use LLVM after all, which would be a big win.
So the question about the portability of assembler is even more important than I had first realized.
Conclusion: from the answers here and on the LLVM mailing list http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-January/028991.html it seems the bad news is that the problem is unsolvable, but the good news is that this means using LLVM makes it no worse, so I'm free to do so and obtain all the advantages thereof.

You might want to check out LLVM before going down this particular path. It might make your life a lot easier, as it provides a low-level virtual machine that makes a lot of the hard stuff just work, and it has been very popular.

At a very high level, the ABI consists of { instruction set, system calls, binary format, libraries }.
Distribution as .s may free you from the binary format. This is still rather pointless, because you are fixed to a particular ISA and still need to use libraries and/or make system calls. Libraries vary from distribution to distribution (although this isn't really that bad, especially if you just use libc) and syscalls vary from OS to OS.
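To make the syscall point concrete, here is a minimal sketch in C; the same idea applies to a generated .s file, where the raw syscall number would be baked in directly. The libc call goes through whatever C library the system provides, while the raw syscall number is part of the Linux kernel ABI specifically.

    /* Minimal sketch: the same logical write via libc and via a raw syscall.
       The libc call is portable across POSIX systems; SYS_write expands to a
       Linux-specific syscall number, which is exactly what a shipped .s file
       would hard-code. */
    #define _DEFAULT_SOURCE
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void) {
        const char msg[] = "hello\n";

        /* resolved through the C library: portable */
        write(STDOUT_FILENO, msg, sizeof msg - 1);

        /* raw kernel entry: tied to the Linux syscall table */
        syscall(SYS_write, STDOUT_FILENO, msg, sizeof msg - 1);

        return 0;
    }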

It's basically 20 years since I last bootstrapped a C compiler. At the level of compilers, the differences between Linux distributions are minimal.
The much more important reason for going with LLVM is cross-platform support; if you're not targeting some intermediate language, your compiler will be extremely difficult to retarget to different processors. And seeing as, on my laptop, I have compilers for x86, x86_64, two kinds of MIPS, PowerPC, ARM and AVR... you see where I'm going? I can compile multiple languages for most of those targets too (only C for AVR).

Related

Difference between arm-none-eabi and arm-linux-gnueabi?

What is the difference between arm-none-eabi and arm-linux-gnueabi? I know the difference in how they are used (one for bare-metal software, the other for software meant to run on Linux), but what is the technical background?
I see there is a difference in the ABI, which as far as I understand is something like an API but at the binary level. It ensures interoperability of different applications.
But I don't really understand in which way having or not having an operating system affects my toolchain. The only thing that came to my mind is that libraries may have to be statically linked (do they?) when compiling bare-metal software, because there is no OS to provide them dynamically.
Most pages I found on this topic just explain how to use the toolchains, not the technical background. I'm a student of mechatronics and new to embedded systems, so my experience in this field is somewhat limited.
Maybe this link to a description will help.
Probably the biggest difference:
"The bare-metal ABI will assume a different C library (newlib for example, or even no C library) to the Linux ABI (which assumes glibc). Therefore, the compiler may make different function calls depending on what it believes is available above and beyond the Standard C library."

How to inspect Haskell bytecode

I am trying to track down a bug (a serious performance regression). Unfortunately, I wasn't able to figure out why even by going back through many different versions of my code.
I suspect it could be some modifications to libraries that I've updated; in the meantime I've also updated from GHC 7.4 to 7.6 (and if anybody knows whether some laziness behaviour has changed, I would greatly appreciate hearing about it!).
I have an older executable of this code that does not have this bug, so I wonder if there are any tools that can tell me which library versions I was linking against before, for example by inspecting the symbols.
GHC creates executables, which are notoriously hard to understand... On my Linux box I can view the assembly code by typing in
objdump -d <executable filename>
but I get back over 100K lines of code from just a simple "Hello, World!" program written in Haskell.
If you happen to have the GHC .hi files, you can get some information about the executable by typing in
ghc --show-iface <hi filename>
This won't give you the assembly code, but you can get some extra information that may prove useful.
As I mentioned in the comment above, on Linux you can use "ldd" to see which C system libraries were linked in at compile time, but that is also probably less than useful.
You can try to use a disassembler, but those are generally written to disassemble to C, not anything higher level and certainly not Haskell. That being said, GHC compiles to C as an intermediary (at least it used to; has that changed?), so you might be able to learn something.
Personally I often find viewing system calls in action much more informative than viewing raw assembly. On my Linux box, I can view all system calls by running strace (use Wireshark for the network-traffic equivalent):
strace <program executable>
This also will generate a lot of data, so it might only be useful if you know of some specific place where direct real world communication (i.e., changes to a file on the hard disk drive) goes wrong.
In all honesty, you are probably better off just debugging the problem from source, although, depending on the actual problem, some of these techniques may help you pinpoint something.
Most of these tools have Mac and Windows equivalents.
Since much has changed in the last 9 years, and apparently this is still the first result a search engine gives on this question (like for me, again), an updated answer is in order:
First of all, yes, while Haskell does not specify a bytecode format, bytecode is also just a kind of machine code, for a virtual machine. So for the rest of the answer I will treat them as the same thing. GHC's Core, the LLVM intermediate representation, or even WASM could be considered equivalent too.
Secondly, if your old binary is statically linked, then of course, no matter what format your program is in, no symbols will be available to check out, because that is what linking does. This is true even with bytecode, and even with plain old static #include in simple languages. So your old binary will be no good, no matter what. And given the optimisations compilers do, a classic decompiler will very likely never be able to figure out which optimised bits originally came from which libraries, especially with stream fusion and similar “magic”.
Third, you can do the things you asked with a modern Haskell program. But you need to have your binaries compiled with -dynamic and -rdynamic, so that not only the C-calling-convention libraries (e.g. .so files) and the Haskell libraries, but also the runtime itself is dynamically loaded. That way you end up with a very small binary, consisting of only your actual code, dynamic linking instructions, and exact data about which libraries and runtime were used to build it. And since the runtime is compiler-dependent, you will know the compiler too. So it would give you everything you need, but only if you compiled it that way. (I recommend such dynamic linking by default in any case, as it saves memory.)
The last factor that one might forget is that even the exact same compiler version might behave vastly differently depending on what it was itself compiled with. (E.g. if somebody put a backdoor in the very first version of GHC, and all GHCs after that were compiled with that first GHC, and nobody ever checked, then that backdoor could still be in the code today, with no trace in any source or libraries whatsoever. Or, for a less extreme case, the version of GHC your old binary was built with might have been compiled with different architecture options, leading to it putting more optimised instructions into the binaries it compiles unless told to cross-compile.)
Finally, of course, you can profile even compiled binaries, by profiling their system calls. This will give you clues about which part of the code acted differently and how. (E.g. if you notice that your new binary floods the system with some slow system calls where the old one just used a single fast one. A classic OpenGL example would be using fast display lists versus slow direct calls to draw triangles. Or using a different sorting algorithm, or having switched to a different kind of data structure that fits your work load badly and thrashes a lot of memory.)

The compilation of the compiler could affect the compiled programs?

Probably my question sounds weird, but my point is: I have to compile a program using GCC. If I compile GCC from source, will I get a slight edge in performance in software compiled with the freshly built GCC? What should I expect?
You won't get any faster programs out of a compiler built with optimizing flags. Since a program is the compiler's output, and optimizations don't change the output of a correct program, the programs stay the same.
You might, however, profit from new available options if your distributor ships an incomplete compiler. Look through the GCC manual for any options you want to enable (like certain target architecture variants), and if you can't enable them in your current compiler build, there might be potential in a custom-built compiler. However, it is unlikely that it's worth it.
Not unless you're building a newer version of gcc, or enabling cloog, graphite, etc.
The performance difference is usually nothing, or negligible.
In very rare, really very rare cases you can see a noticeable difference, but it is not always an improvement; degradation is possible too.

Bare metal cross compilers input

What are the input limitations of a bare-metal cross compiler? For instance, will it not compile programs that use pointers or malloc(), or anything else that would require more than the underlying hardware? And how can one find out about these limitations?
I also wanted to ask: I built a cross compiler targeting MIPS, and I need to create a MIPS executable using it, but I am not able to find the right executable to use. There is one executable I found, mipsel-linux-cpp, which is supposed to compile, assemble and link and then produce a.out, but it is not doing so.
However, ./cc1 does produce MIPS assembly.
There is an install folder which has a gcc executable, but it produces i386 assembly and then an i386 executable. I don't understand how the gcc executable can produce i386 rather than MIPS assembly when I specified MIPS as the target.
Please help; I'm really not able to understand what is happening.
I followed these steps:
1. Installed binutils 2.19
2. Configured GCC for MIPS (g++, core)
I would suggest that you should have started two separate questions.
The GNU toolchain does not have any OS dependencies, but the GNU library does. Most bare-metal cross builds of GCC use the Newlib C library, which provides a set of syscall stubs that you must map to your target yourself. These stubs include the low-level calls necessary to implement stream I/O and heap management. They can be very simple or very complex depending on your needs. If the only I/O support is a UART for stdin/stdout/stderr, then it is simple. You don't have to implement everything, but if you do not implement the I/O stubs, you won't be able to use printf() for example. You must implement the sbrk()/sbrk_r() syscall if you want malloc() to work.
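As a rough illustration, the two stubs people usually start with look something like the sketch below. uart_putc() is an assumed board-specific routine, and _end / _heap_limit are assumed linker-script symbols marking the heap region; the exact stub names and prototypes should be checked against your Newlib version.

    /* Sketch of two Newlib syscall stubs (names/prototypes per common
       Newlib conventions; verify against your own Newlib headers).
       uart_putc(), _end and _heap_limit are assumptions, not standard. */
    #include <errno.h>
    #include <sys/types.h>

    extern void uart_putc(char c);    /* assumed board-support routine      */
    extern char _end;                 /* assumed linker symbol: heap start  */
    extern char _heap_limit;          /* assumed linker symbol: heap end    */

    /* Route printf() and friends to the UART. */
    int _write(int fd, char *buf, int len) {
        (void)fd;                     /* stdout and stderr both go to the UART */
        for (int i = 0; i < len; i++)
            uart_putc(buf[i]);
        return len;
    }

    /* Grow the heap so that malloc() has memory to hand out. */
    caddr_t _sbrk(int incr) {
        static char *brk = &_end;
        if (brk + incr > &_heap_limit) {
            errno = ENOMEM;
            return (caddr_t)-1;
        }
        char *prev = brk;
        brk += incr;
        return (caddr_t)prev;
    }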
The GNU C++ library will work correctly with Newlib as its underlying library. If you use C++, the C runtime start-up (usually crt0.s) must include the static initialiser loop to invoke the constructors of any static objects that your code may include. The run-time start-up must also of course initialise the processor, clocks, SDRAM controller, timers, MMU etc; that is your responsibility, not the compiler's.
I have no experience of MIPS targets, but the principles are the same for all processors. There is a very useful article called "Building Bare Metal ARM with GNU" which you may find helpful; much of it will be relevant, especially the parts regarding implementing the Newlib stubs.
Regarding your other question, if your compiler is called mipsel-linux-cpp, then it is not a 'bare-metal' build but rather a Linux build. Also, this executable does not really "compile, assemble and link"; it is rather a driver that separately calls the pre-processor, compiler, assembler and linker. It has to be configured correctly to invoke the cross-tools rather than the host tools. I generally invoke the linker separately in order to enforce decisions about which standard library to link (-nostdlib), and also because it makes more sense when an application is comprised of multiple execution units. I cannot offer much help other than that here, since I have always used GNU ARM tools built by people with obviously more patience than me, and moreover hosted on Windows, where there is less possibility of the host toolchain being invoked instead (one reason why I have also avoided those toolchains that rely on Cygwin).
EDIT
With more time available, I have rewritten my original answer in an attempt to provide something more useful.
I cannot provide a specific answer for your question. I have never tried to get code running on a MIPS machine. What I do have is plenty of experience getting a variety of "bare metal" boards up and running. All kinds of CPUs and all kinds of compilers and cross compilers. So I have an understanding of the principles that apply in all such situations. I will point out the kind of knowledge you will need to absorb before you can hope to succeed with a job like this, and hopefully I can list some links to resources to get you started on learning that knowledge.
I am worried you don't know that pointers are exactly the kind of thing a bare-metal compiler can handle; they are a basic machine primitive. This tells me you are probably not an expert embedded developer who is just stuck in this particular scenario. Never mind. There isn't anything magic about programming an embedded system, and you can learn what you need to know.
The first step is getting to understand the relationship between C and the machine you wish to run code on. Basically C is a portable assembly language. This means that C is good for manipulating the basic operations of the machine. In this sense the basic operations of the machine are reading and writing memory locations, performing arithmetic and boolean operations on the data read from memory, and making branching and looping decisions based on that data. In particular the C concept of pointers allows you to manipulate data at locations in memory that you specify.
So far so good, but just doing raw computations in memory is not usually enough - you need a way to input and output data from memory. To do that you need to manipulate the hardware peripherals on your board. If the hardware peripherals are memory mapped, then the machine registers used to control the peripherals look exactly like memory locations and C can manipulate them directly. Even in that case though, it is much more likely that doing useful I/O is best handled by extending the C core language with a library of routines provided just for that purpose. These library routines handle all the nasty details (timers, interrupts, non-memory-mapped I/O) involved in manipulating the peripheral hardware on the board, and wrap them up with a convenient C function call interface. The idea is that you can simply write printf("hello world"); and the library call takes care of the details of displaying the string.
An appropriately skilled developer knows how to adapt an existing I/O library to a new board, or how to develop new library routines to provide access to non-standard custom hardware. The classic way to develop these skills is to start with something simple, usually a LED for an output device, and a switch for an input device. Write a program that pulses a LED in a predictable way, or reads a switch and reflects it on a LED. The first time you get this working will be hugely satisfying.
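For a flavour of what that first program tends to look like, here is a minimal sketch in C; the register address and LED bit are invented placeholders, and the real values come from your chip's reference manual.

    /* Classic first bare-metal program: toggle a LED through a
       memory-mapped GPIO register. The address and bit position below are
       hypothetical; substitute the ones from your hardware documentation. */
    #include <stdint.h>

    #define GPIO_OUT (*(volatile uint32_t *)0x40020014u)  /* placeholder register */
    #define LED_BIT  (1u << 5)                            /* placeholder pin      */

    static void delay(volatile uint32_t n) {
        while (n--)
            ;                      /* crude busy-wait; a hardware timer is better */
    }

    int main(void) {
        for (;;) {
            GPIO_OUT ^= LED_BIT;   /* flip the LED */
            delay(500000u);
        }
    }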
Okay I have rambled enough. It is time to provide some more resources for you to study. The good news is that there's never been a better time to learn how things work at the interface between hardware and software. There is a wealth of freely available code and docs. Stackoverflow is a great resource as you know. Good luck! Links follow;
Embedded systems overview
Knowing the C language well is fundamental
Why not get your code working on a simulator before you try real hardware
Another emulated environment
Linux device drivers - an overlapping subject
Another book about bare metal programming

Is it possible to compile Linux kernel with something other than gcc

I wonder if someone has managed to compile the Linux kernel with some compiler other than gcc, or if anyone has ever tried. The question may seem silly or academic, but it arose when I thought about answers to: Are C++ int operations atomic on the mips architecture
It seems that the atomicity of some operations depends not only on the CPU architecture, but also on the compiler used. So I wonder whether, in the Linux world, any compiler other than gcc is even used.
Linux explicitly depends on some gcc extensions, so any other compiler must be compatible with the needed extensions, in that case.
This is not a "no", since it's of course not impossible for a separate compiler vendor/developer to track gcc's extensions, just a data point that might help you search.
At some point tcc would process and run the Linux kernel source, so that would be a yes, I guess.
::Hat tip to ephemient in the comments.::
The LLVM developers are trying to compile it with clang. The meta-bug on compiling the Linux kernel with clang has more details (the dependency tree for that meta-bug shows how little seems to be left).
There have been some efforts (and patches) to compile an early version of the 2.6 kernel with icc.
Yup. I've done this. See [cfe-dev] Clang builds a working Linux Kernel (Boots to RL5 with SMP, networking and X, self hosts).
IBM's compiler was able to do it some Linux versions ago, but I'm not sure about now, nor am I sure of how well IBM optimized the kernel as instructed. All I know is, they got it to build.
As Linux is self-hosting (with its own libc) and has been developed from the start with gcc (and gcc cross compilers), it's sort of silly to use anything else.
I think the biggest obstacle is mainly playing nice with the preprocessor macros and the specific optimizations the kernel asks for (not even getting into a departure from gas), as GNU has basically written the book on the above, and extended it. Beyond that, Linux tunes its optimizations to work with gcc; for instance, don't get caught using 'volatile' in the kernel without a damn good reason. Using inline and actually having the compiler agree is another challenge.
Linus is the first one to call GCC an &*#$ hole, which makes for a better compiler.
This is why we have the great GNU/Linux debate.
Many, many, many years ago, it was actually possible to compile the kernel with g++, and as far as I remember part of the motivation was that C++ had stronger type checking, not necessarily to have g++ produce the object files. But as Neil Butterworth has pointed out, Linus is not particularly fond of C++, and there is zero chance that this will ever be possible again.
EKOPath 4 Compiler: not right now, but probably with some minor patches.
https://github.com/path64/repositories
http://www.pathscale.com/ekopath-compiler-suite
I am currently working on compiling the Linux kernel using Open64 for the MIPS architecture, and some other people are working on making Open64 build for the x86 architecture too. At the moment the kernel can partly run, but there are still runtime failures.
As for the atomicity problem, at least I have not run into it, and I do not think it is really a problem. The reasons are:
The Linux kernel is already a body of source code that successfully builds with GCC, so it is the compiler's problem if it cannot build it, or if the built kernel fails to run.
If a compiler wants to build the Linux kernel successfully, it has to abide by the GNU C extensions, and those extensions make it clear what an atomic operation is, so such a compiler only needs to generate code according to them.
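As an illustration of the compiler-level atomics in question, GCC (and compilers that mimic it, such as Clang) provide builtins like __sync_fetch_and_add; the kernel itself mostly implements its atomics in inline assembly, but a sketch of the builtin looks like this:

    /* GCC's legacy __sync builtin for an atomic read-modify-write.
       Any compiler aiming at GCC compatibility is expected to honour it. */
    static long counter;

    long bump(void) {
        return __sync_fetch_and_add(&counter, 1);   /* atomic increment, returns old value */
    }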
My non-technical guess: The Linux Kernel can't currently (2009) be compiled with any compiler other than the GNU compiler, gcc.
I say this on the basis that I've heard Richard Stallman, with some conviction, say Linux should be called GNU/Linux because the kernel is "only 1 part of the operating system", and I'm guessing he would not be able to say this if the kernel were non-dependent on GNU (e.g. a tonne of embedded devices run a Linux OS without any GNU software).
As I said, just a guess, let me know if I'm wrong...
