How is Rust compiled to machine code?

I've recently been looking at the Rust programming language. How does it work? Rust code seems to be compiled into ELF or PE (etc.) binaries, but I've not been able to find any information on how that's done. Is it compiled to an intermediate format and then compiled the rest of the way with g++, for example? Any help (or links) would be really appreciated.

The code-generation phase of the Rust compiler is mainly done by LLVM. LLVM is a set of tools for building a compiler, most notably used by the C/C++ compiler Clang.
First, the Rust compiler (just like Clang, for example) does all the Rust-specific work like type and borrow checking; in the end, it generates LLVM IR. IR stands for intermediate representation, and it's... comparable to assembly, but a tiny bit more high-level and, most importantly, platform independent. Then the Rust compiler just calls ☏ LLVM and says:
Hey buddy, could you please take this IR and generate machine code for the current platform? That would be fantastic ◕ ◡ ◕
To which LLVM responds:
🌈 Sure, no problem, new friend. Here is your highly optimized machine code for [e.g.] x86_64! ♫♪♫
Afterwards, they invite a few more friends to wrap it all up in a nice little [e.g.] ELF package and beautifully place it in the user's file system.
Information like this can be found in the official FAQ, which contains a lot of other interesting information as well. For more in-depth details on the Rust compiler, you can read the "Rustc Guide". For this question, the chapter "High Level Overview" is pretty interesting.
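If you want to watch this hand-off happen yourself, rustc can dump each intermediate stage to disk. A minimal sketch (the --emit flags are standard rustc options; the output file names simply follow from the source file name):

    // hello.rs - a minimal program for inspecting the compilation pipeline.
    //
    //   rustc --emit=llvm-ir hello.rs   // hello.ll: the LLVM IR handed to LLVM
    //   rustc --emit=asm hello.rs       // hello.s:  platform-specific assembly
    //   rustc --emit=obj hello.rs       // hello.o:  machine code, not yet linked
    //   rustc hello.rs                  // the full trip: a linked ELF/PE executable
    fn main() {
        println!("Hello from the far side of LLVM!");
    }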

Related

Programming inside an ATtiny85 with Rust

Introduction
I created a small project that uploads C++ code to an ATtiny85; for this I used Arduino.
Question
But I would like to know whether it is possible to upload and run Rust code on an ATtiny85 or another ATtiny.
If we can, how do we do it?
Details
I found this GitHub repo for doing this, but it is not explicit about how to export the Rust code to the ATtiny.
The GitHub repo in question: https://github.com/q231950/avr-attiny85-rust
C++ is cross-compiled to AVR machine code on your development host. What you are loading is not C++ code; the C++ is the source code used to generate the machine-executable binary, and that binary is what you load.
You can develop for AVR using any language for which a cross compiler exists. Rust is certainly such a language. This article discusses using Rust on Arduino Uno hardware.
Whether an ATtiny85 with only 8 KB of flash and 512 bytes of SRAM will support a Rust runtime environment and any useful code, I cannot tell; I am not familiar with Rust's runtime requirements. It does not seem like an efficient use of limited resources to me, though, and I would treat it as an academic challenge rather than a practical development approach. I would expect Rust to have a considerably larger run-time footprint than C, or even C++.
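For reference, a freestanding (#![no_std]) Rust program carries essentially no runtime beyond your own code and a panic handler, so the question is less about a runtime and more about code size. Below is a minimal, untested sketch of what such a program looks like; it assumes a nightly toolchain with a custom AVR target spec for the ATtiny85 and -Z build-std=core, and the register addresses are the ATtiny85 data-space addresses from the datasheet:

    // Blink an LED on PB0, with no runtime and no operating system.
    #![no_std]
    #![no_main]

    use core::panic::PanicInfo;
    use core::ptr::{read_volatile, write_volatile};

    const DDRB: *mut u8 = 0x37 as *mut u8; // data direction register, port B
    const PINB: *mut u8 = 0x36 as *mut u8; // writing 1 to a bit toggles that pin

    #[no_mangle]
    pub extern "C" fn main() -> ! {
        unsafe { write_volatile(DDRB, 0b0000_0001) }; // configure PB0 as output
        loop {
            unsafe { write_volatile(PINB, 0b0000_0001) }; // toggle PB0
            for _ in 0..20_000u16 {
                // crude busy-wait; the volatile read keeps the loop
                // from being optimized away
                let _ = unsafe { read_volatile(PINB) };
            }
        }
    }

    #[panic_handler]
    fn panic(_: &PanicInfo) -> ! {
        loop {}
    }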

Confusions arising from a programming language whose compiler is written in itself

By accident I learned that the compiler of Haskell is written in Haskell. It sounds strange to me. How is this possible, that is, how can a language compile itself? Who compiles the compiler, then? What is the ultimate code accepted by the machine?
Consider the first programming language to have a compiler. In what language was that compiler written? Going back even farther in time, how did people program before the era of compilers?
Broadly speaking, I am often confused about the border between software (e.g. programs written by people) and hardware (e.g. something executable on a physical machine).
P.S.: I have basic knowledge about compilers, such as lexical analysis, parsing, and code optimization. However, I know little about hardware (the machine).
It seems that the answer to a related post Implementing a compiler in β€œitself” does not go deeply into the border between software and hardware.
And I would like to see some concrete examples.
EDIT: Some comments mentioned the term "bootstrapping". It seems that there is some minimal core part of a language (like the axioms/basic theorems in mathematics) which must be compiled in a lower-level way (instead of by the language itself). What is that core? Is it basically the same across different languages? Again, I would like to see some concrete examples.
As you can read in A History of Haskell (page 28), the first Haskell compiler was written in Lazy ML in June 1989. It implemented essentially all of Haskell 1.0.
Now that this compiler existed, it could be used to compile the Haskell version of GHC. The first beta of GHC written in Haskell was released on 1 April 1991. The full release came in December 1992.
Because the Lazy ML-based compiler wasn't developed further, today you use a previous version of GHC to compile GHC. So if you want to build GHC 7.8, you use GHC 7.6 to build it. (In practice, it's a bit more complicated, because there are multiple stages, and only the first stage, which doesn't support GHCi or TemplateHaskell, is built with GHC 7.6.)
That means that if you don't have a working haskell compiler today, you have two options:
Try to install an LML compiler and compile the first version of GHC, written in Lazy ML. Then use this compiler to compile the next version, which is written in Haskell. Then again use that compiler to build the next version, and repeat until you have a reasonably recent compiler. It may be possible to skip a few versions, but I don't know how many. As you can imagine, this could take a lot of time.
(Much easier) Download pre-built GHC binaries.
Um... I have not tried this, but another route would be simply compiling to C and using a C compiler to compile the latest GHC... GHC is itself built in stages, so you don't really even need to convert the whole code base to C, just the first stage, which can then compile the rest. Certainly no need to dig up Lazy ML.
Edit: Note that the resulting compiler will not build binaries targeting the new platform; it would simply run on that platform and be a cross-platform compiler for targets that GHC already has backends for. Another note is that I actually intended this as a response to bennofs' answer, not as a standalone answer to the OP.

How do functional language compilers work? [duplicate]

I've heard of the idea of bootstrapping a language, that is, writing a compiler/interpreter for the language in itself. I was wondering how this could be accomplished and looked around a bit, and saw someone say that it could only be done by either
writing an initial compiler in a different language.
hand-coding an initial compiler in assembly, which seems like a special case of the first.
To me, neither of these seems to actually be bootstrapping a language, in the sense that they both require outside support. Is there a way to actually write a compiler in its own language?
Is there a way to actually write a compiler in its own language?
You have to have some existing language to write your new compiler in. If you were writing a new, say, C++ compiler, you would just write it in C++ and compile it with an existing compiler first. On the other hand, if you were creating a compiler for a new language, let's call it Yazzleof, you would need to write the new compiler in another language first. Generally, this would be another programming language, but it doesn't have to be. It can be assembly, or if necessary, machine code.
If you were going to bootstrap a compiler for Yazzleof, you generally wouldn't write a compiler for the full language initially. Instead, you would write a compiler for Yazzle-lite, the smallest possible subset of Yazzleof (well, a pretty small subset at least). Then, in Yazzle-lite, you would write a compiler for the full language. (Obviously this can occur iteratively instead of in one jump.) Because Yazzle-lite is a proper subset of Yazzleof, you now have a compiler which can compile itself.
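To make that concrete, here is a hypothetical "stage 0": a compiler for a tiny arithmetic subset (call it Yazzle-lite), written in an existing host language; Rust stands in for "another language" here, and the whole example is invented for illustration. Once a subset this small can be compiled, the compiler for the full language can be written in the subset itself.

    // Stage 0: compile "1+2+3"-style expressions to x86-64 assembly.
    // The exit code of the compiled program is the expression's value.
    fn compile(src: &str) -> String {
        let mut terms = src.split('+').map(|t| t.trim().parse::<i64>().unwrap());
        let mut asm = String::from(".globl main\nmain:\n");
        asm.push_str(&format!("    movq ${}, %rax\n", terms.next().unwrap()));
        for t in terms {
            asm.push_str(&format!("    addq ${}, %rax\n", t));
        }
        asm.push_str("    ret\n");
        asm
    }

    fn main() {
        // Pipe this through `gcc -o prog -x assembler -`, then `./prog; echo $?`.
        print!("{}", compile("1+2+3"));
    }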
There is a really good writeup about bootstrapping a compiler from the lowest possible level (which on a modern machine is basically a hex editor), titled Bootstrapping a simple compiler from nothing. It can be found at https://web.archive.org/web/20061108010907/http://www.rano.org/bcompiler.html.
The explanation you've read is correct. There's a discussion of this in Compilers: Principles, Techniques, and Tools (the Dragon Book):
Write a compiler C1 for language X in language Y
Use the compiler C1 to write compiler C2 for language X in language X
Now C2 is a fully self-hosting environment.
The way I've heard of is to write an extremely limited compiler in another language, then use that to compile a more complicated version, written in the new language. This second version can then be used to compile itself, and the next version. Each time it is compiled the last version is used.
This is the definition of bootstrapping:
the process of a simple system activating a more complicated system that serves the same purpose.
EDIT: The Wikipedia article on compiler bootstrapping covers the concept better than me.
A super interesting discussion of this is in Unix co-creator Ken Thompson's Turing Award lecture.
He starts off with:
What I am about to describe is one of many "chicken and egg" problems that arise when compilers are written in their own language. In this case, I will use a specific example from the C compiler.
and proceeds to show how he wrote a version of the Unix C compiler that would always allow him to log in without a password, because the C compiler would recognize the login program and add in special code.
The second pattern is aimed at the C compiler. The replacement code is a Stage I self-reproducing program that inserts both Trojan horses into the compiler. This requires a learning phase as in the Stage II example. First we compile the modified source with the normal C compiler to produce a bugged binary. We install this binary as the official C. We can now remove the bugs from the source of the compiler and the new binary will reinsert the bugs whenever it is compiled. Of course, the login command will remain bugged with no trace in source anywhere.
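A toy sketch of the mechanism (every name below is invented; Thompson's real version also recognized the compiler's own source, so the trojan reinserted itself on every recompile even after the source was cleaned):

    // A "compiler" that pattern-matches the program it is compiling and
    // silently rewrites it. Here the trojan makes a hard-coded master
    // password always succeed in anything that looks like a login check.
    fn compile(source: &str) -> String {
        if source.contains("fn check_login") {
            return source.replace(
                "password == expected",
                "password == expected || password == \"trojan\"",
            );
        }
        source.to_string()
    }

    fn main() {
        let login_src = "fn check_login() { if password == expected { admit(); } }";
        // The source is clean; the "object code" is not.
        println!("{}", compile(login_src));
    }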
Check out the podcast Software Engineering Radio, episode 61 (2007-07-06), which discusses GCC compiler internals as well as the GCC bootstrapping process.
Donald E. Knuth actually built WEB by writing the compiler in it, and then hand-compiled it to assembly or machine code.
As I understand it, the first Lisp interpreter was bootstrapped by hand-compiling the constructor functions and the token reader. The rest of the interpreter was then read in from source.
You can check for yourself by reading the original McCarthy paper, Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I.
Every example of bootstrapping a language I can think of (C, PyPy) was done after there was a working compiler. You have to start somewhere, and reimplementing a language in itself requires writing a compiler in another language first.
How else would it work? I don't think it's even conceptually possible to do otherwise.
Another alternative is to create a bytecode machine for your language (or use an existing one if its features aren't very unusual) and write a compiler to bytecode, either in the bytecode itself, or in your desired language using another intermediate, such as a parser toolkit which outputs the AST as XML, then compiling the XML to bytecode using XSLT (or another pattern-matching language and tree-based representation). It doesn't remove the dependency on another language, but it could mean that more of the bootstrapping work ends up in the final system.
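A minimal sketch of that route, with an invented three-instruction bytecode: the virtual machine is the only piece that has to be written outside the new language, and the compiler that targets it can then be written in the new language itself.

    // A tiny stack-based bytecode machine. A compiler for the new language
    // would emit sequences of these ops instead of native machine code.
    #[derive(Clone, Copy)]
    enum Op {
        Push(i64),
        Add,
        Mul,
    }

    fn run(code: &[Op]) -> i64 {
        let mut stack: Vec<i64> = Vec::new();
        for op in code {
            match *op {
                Op::Push(n) => stack.push(n),
                Op::Add => {
                    let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                    stack.push(a + b);
                }
                Op::Mul => {
                    let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                    stack.push(a * b);
                }
            }
        }
        stack.pop().unwrap()
    }

    fn main() {
        // Bytecode for (2 + 3) * 4
        let program = [Op::Push(2), Op::Push(3), Op::Add, Op::Push(4), Op::Mul];
        println!("{}", run(&program)); // prints 20
    }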
It's the computer science version of the chicken-and-egg paradox. I can't think of a way not to write the initial compiler in assembler or some other language. If it could have been done, I should think Lisp could have done it.
Actually, I think Lisp almost qualifies. Check out its Wikipedia entry. According to the article, the Lisp eval function could be implemented on an IBM 704 in machine code, with a complete compiler (written in Lisp itself) coming into being in 1962 at MIT.
Some bootstrapped compilers or systems keep both the source form and the object form in their repository:
OCaml is a language which has both a bytecode interpreter (i.e. a compiler to OCaml bytecode) and a native compiler (to x86-64 or ARM, etc. assembly). Its svn repository contains both the source code (files */*.{ml,mli}) and the bytecode (file boot/ocamlc) form of the compiler. So when you build it, it first uses the bytecode (of a previous version of the compiler) to compile itself. Later, the freshly compiled bytecode is able to compile the native compiler. So the OCaml svn repository contains both *.ml[i] source files and the boot/ocamlc bytecode file.
The Rust compiler downloads a previous version of its own binary (using wget, so you need a working Internet connection) to compile itself.
MELT is a Lisp-like language to customize and extend GCC. It is translated to C++ code by a bootstrapped translator. The generated C++ code of the translator is distributed, so the svn repository contains both *.melt source files and melt/generated/*.cc "object" files of the translator.
J. Pitrat's CAIA artificial intelligence system is entirely self-generating. It is available as a collection of thousands of generated [A-Z]*.c files (also with a generated dx.h header file), together with a collection of thousands of _[0-9]* data files.
Several Scheme compilers are also bootstrapped: Scheme48, Chicken Scheme, ...

Producing executables within Linux (in relation to implementing a compiler)

For my university final-year dissertation, I am going to implement a compiler for a skeletal form of the C programming language, then go about extending it until it resembles something a little more like Java, with array bounds checking, type checking, and so forth.
I am relatively competent at much of the theory that relates to compiler construction, and have experience programming in MIPS assembly language, so I do understand a little of what it is to write extremely low-level code.
My main concern is that I am likely to be able to get all the way to the point where I need to produce the actual machine-code output, but then not understand enough about how machine code is executed from the perspective of the operating system running it.
So, my actual question is basically, "does anyone know the best place to read up on writing assembly to run on an Intel x86-64 processor under Linux?"
The main gap in my knowledge is how the machine code is actually run in practice. Is it run directly on the processor, making "syscall"s (or the x86 equivalent) when it needs services provided by the kernel, or is the assembly language somehow an encapsulated description that tells the kernel how to execute the instructions (in a manner similar to an interpreted language such as Java)?
Any help you can provide would be greatly appreciated.
This document explains how you can implement a foreign function interface to interact with other code: http://www.x86-64.org/documentation/abi.pdf
Firstly, for the machine code start here: http://www.intel.com/products/processor/manuals/
Next, I assume your question about how the machine code is run is really about how the OS loads the exe into memory and calls main()? These links may help
Linkers and loaders:
http://www.linuxjournal.com/article/6463
ELF file format:
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format and
http://www.linuxjournal.com/article/1060
Your machine code will go into the .text section of the executable.
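On the "is it run directly?" part of the question: yes. The CPU executes your machine code natively, and anything that needs the kernel goes through the `syscall` instruction using the x86-64 Linux convention (syscall number in rax, arguments in rdi/rsi/rdx; rcx and r11 are clobbered). A small sketch, in Rust only because its inline assembly makes the convention easy to show:

    // Write to stdout via a raw syscall on x86-64 Linux, bypassing libc
    // for the call itself.
    use std::arch::asm;

    fn main() {
        let msg = b"hello from a raw syscall\n";
        unsafe {
            asm!(
                "syscall",
                inlateout("rax") 1usize => _, // __NR_write; result comes back in rax
                in("rdi") 1usize,             // fd 1 = stdout
                in("rsi") msg.as_ptr(),       // buffer pointer
                in("rdx") msg.len(),          // byte count
                out("rcx") _,                 // the kernel clobbers rcx (return RIP)
                out("r11") _,                 // ...and r11 (saved RFLAGS)
            );
        }
    }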
Finally, best of luck. Your project is similar to my final year project, except I targeted the JVM and compiled a subset of Visual Basic!

To which programming language should I switch my project?

I have a large program written with my own patched version of the GNU Eiffel (SmallEiffel) compiler. While I love the language, I'm running into the problem that the compiler is O(n^2) or worse in the size of the compiled system. So I have to move soon.
ISE Eiffel, the only Eiffel compiler still alive, is not an option for various reasons, mostly because the compiled code runs way too slow.
I'm looking for a language which is:
imperative and OO
has generics/templates
compiles to native code and does not require .NET/Java
statically typed (which means fast)
garbage collected
cross platform
not as ugly and braindead as C++
I couldn't come up with anything other than D, but it looks a little too low-level and unstable. Is there really no language that satisfies these seven points?
OCaml, perhaps?
You could write in Java and compile to native-ish code with GCJ (it will be native code, but you'll need to link against a fair portion of code that makes up all the things Java needs at run-time. Your users will not need to install a JRE.)
Googling 'object oriented native code compiler' brings up Objective Caml before Eiffel.
If you're willing to take your chances on a research compiler, check out the Diesel language and the native-code Vortex compiler (written for Diesel in Diesel). It is a research project, but it is stable, and Craig Chambers is one of the best people in the business.
What about Python?
It is an OO scripting language, runs fast, and has generic templates.
