How exactly is Rust programming language implemented?

How exactly is Rust programming language implemented? - rust

If you check languages percentage in github rust lang compiler repository it says that 97.6% of the rust lang compiler is written in rust. So how does this exactly works?. How you can create a programming language (I think this is related to a compiler, since it's whom read the code, doesn't it?) written in itself.

This is called self-hosting or bootstrapping. The basic idea goes like this:
Write an initial compiler for a small subset of Rust using your Other Programming Language of Choice. You now have compiler C0.
Using the subset of Rust you have a compiler for, rewrite the source for C0 purely in Rust. Compile that program using compiler C0 to form compiler C1.
Add features to Rust by adding code to the compiler you just wrote to properly parse and implement those features. Compile that Rust program with C1 to form compiler C2.
By repeating step (3) as many times as you’d like, you can add progressively more and more features to the Rust language, with the Rust compiler always being written in Rust itself.
There’s a famous talk called Reflections on Trusting Trust that talks about how this process works, as well as how you can use this process to do Nefarious Things.

Related

Can Rust code compile without the standard library?

I'm currently in the progress of learning Rust. I'm mainly using The Rust Programming Language book and this nice reference which relates Rust features/syntax to C++ equivalents.
I'm having a hard time understanding where the core language stops and the standard library starts. I've encountered a lot of operators and/or traits which seems to have a special relationship with the compiler. For example, Rust has a trait (which from what I understand is like an interface) called Deref which let's a type implementing it be de-referenced using the * operator:
fn main() {
let x = 5;
let y = Box::new(x);
assert_eq!(5, x);
assert_eq!(5, *y);
}
Another example is the ? operator, which seems to depend on the Result and Option types.
Can code that uses those operators can be compiled without the standard library? And if not, what parts of the Rust language are depending on the standard library? Is it even possible to compile any Rust code without it?

The Rust standard library is in fact separated into three distinct crates:
core, which is the glue between the language and the standard library. All types, traits and functions required by the language are found in this crate. This includes operator traits (found in core::ops), the Future trait (used by async fn), and compiler intrinsics. The core crate does not have any dependencies, so you can always use it.
alloc, which contains types and traits related to or requiring dynamic memory allocation. This includes dynamically allocated types such as Box<T>, Vec<T> and String.
std, which contains the whole standard library, including things from core and alloc but also things with further requirements, such as file system access, networking, etc.
If your environment does not provide the functionality required by the std crate, you can choose to compile without it. If your environment also does not provide dynamic memory allocation, you can choose to compile without the alloc crate as well. This option is useful for targets such as embedded systems or writing operating systems, where you usually won't have all of the things that the standard library usually requires.
You can use the #![no_std] attribute in the root of your crate to tell the compiler to compile without the standard library (only core). Many libraries also usually support "no-std" compilation (e.g. base64 and futures), where functionality may be restricted but it will work when compiling without the std crate.

DISCLAIMER: This is likely not the answer you're looking for. Consider reading the other answers about no_std, if you're trying to solve a problem. I suggest you only read on, if you're interested in trivia about the inner workings of Rust.
If you really want full control over the environment you use, it is possible to use Rust without the core library using the no_core attribute.
If you decide to do so, you will run into some problems, because the compiler is integrated with some items defined in core.
This integration works by applying the #[lang = "..."] attribute to those items, making them so called "lang items".
If you use no_core, you'll have to define your own lang items for the parts of the language you'll actually use.
For more information I suggest the following blog posts, which go into more detail on the topic of lang items and no_core:
Rust Tidbits: What Is a Lang Item?
Oxidizing the technical interview
So yes, in theory it is possible to run Rust code without any sort of standard library and supplied types, but only if you then supply the required types yourself.
Also this is not stable and will likely never be stabilized and it is generally not a recommended way of using Rust.

When you're not using std, you rely on core, which is a subset of the std library which is always (?) available. This is what's called a no_std environment, which is commonly used for some types of "embedded" programming. You can find more about no_std in the Rust Embedded book, including some guidance on how to get started with no_std programming.

How do I interpret or otherwise evaluate Rust at runtime?

I've been searching, and while this seems to be a much-wanted feature, all search results seem to be at least one year old.
What is the current state of this? Is there a good solution to evaluating arbitrary Rust code at runtime (like Haskell's hint)?
Maybe something can be done with Miri?

Miri (short for MIR Interpreter) is the de-facto interpreter of Rust code. It is what powers the compile-time function evaluation inside of rustc, the Rust compiler, but Miri is more featureful than what is currently used by compiler.
For experimentation purposes, Miri is also available in the Rust playground. It can be used to evaluate a particular run of a program, detecting if certain types of undefined behavior exist.
Miri does not provide a Rust REPL, but it may be part of creating such a tool.

Generic programming vs. Metaprogramming

What exactly is the difference? It seems like the terms can be used somewhat interchangeably, but reading the wikipedia entry for Objective-c, I came across:
In addition to C’s style of procedural
programming, C++ directly supports
certain forms of object-oriented
programming, generic programming, and
metaprogramming.
in reference to C++. So apparently they're different?

Programming: Writing a program that creates, transforms, filters, aggregates and otherwise manipulates data.
Metaprogramming: Writing a program that creates, transforms, filters, aggregates and otherwise manipulates programs.
Generic Programming: Writing a program that creates, transforms, filters, aggregates and otherwise manipulates data, but makes only the minimum assumptions about the structure of the data, thus maximizing reuse across a wide range of datatypes.
As was already mentioned in several other answers, the distinction can be confusing in C++, since both Generic Programming and (static/compile time) Metaprogramming are done with Templates. To confuse you even further, Generic Programming in C++ actually uses Metaprogramming to be efficient, i.e. Template Specialization generates specialized (fast) programs from generic ones.
Also note that, as every Lisp programmer knows, code and data are the same thing, so there really is no such thing as "metaprogramming", it's all just programming. Again, this is a bit hard to see in C++, since you actually use two completely different programming languages for programming (C++, an imperative, procedural, object-oriented language in the C family) and metaprogramming (Templates, a purely functional "accidental" language somewhere in between pure lambda calculus and Haskell, with butt-ugly syntax, since it was never actually intended to be a programming language.)
Many other languages use the same language for both programming and metaprogramming (e.g. Lisp, Template Haskell, Converge, Smalltalk, Newspeak, Ruby, Ioke, Seph).

Metaprogramming, in a broad sense, means writing programs that yield other programs. E.g. like templates in C++ produce actual code only when instantiated. One can interpret a template as a program that takes a type as an input and produces an actual function/class as an output. Preprocessor is another kind of metaprogramming. Another made-up example of metaprogramming:a program that reads an XML and produces some SQL scripts according to the XML. Again, in general, a metaprogram is a program that yields another program, whereas generic programming is about parametrized(usually with other types) types(including functions) .
EDITED after considering the comments to this answer

I would roughly define metaprogramming as "writing programs to write programs" and generic programming as "using language features to write functions, classes, etc. parameterized on the data types of arguments or members".
By this standard, C++ templates are useful for both generic programming (think vector, list, sort...) and metaprogramming (think Boost and e.g. Spirit). Furthermore, I would argue that generic programming in C++ (i.e. compile-time polymorphism) is accomplished by metaprogramming (i.e. code generation from templated code).

Generic programming usually refers to functions that can work with many types. E.g. a sort function, which can sort a collection of comparables instead of one sort function to sort an array of ints and another one to sort a vector of strings.
Metaprogramming refers to inspecting, modifying or creating classes, modules or functions programmatically.

Its best to look at other languages, because in C++, a single feature supports both Generic Programming and Metaprogramming. (Templates are very powerful).
In Scheme / Lisp, you can change the grammar of your code. People probably know Scheme as "that prefix language with lots of parenthesis", but it also has very powerful metaprogramming techniques (Hygenic Macros). In particular, try / catch can be created, and even the grammar can be manipulated to a point (For example, here is a prefix to infix converter if you don't want to write prefix code anymore: http://github.com/marcomaggi/nausicaa). This is accomplished through metaprogramming, code that writes code that writes code. This is useful for experimenting with new paradigms of programming (the AMB operator plays an important role in non-deterministic programming. I hope AMB will become mainstream in the next 5 years or so...)
In Java / C#, you can have generic programming through generics. You can write one generic class that supports the types of many other classes. For instance, in Java, you can use Vector to create a Vector of Integers. Or Vector if you want it specific to your own class.
Where things get strange, is that C++ templates are designed for generic programming. However, because of a few tricks, C++ templates themselves are turing-complete. Using these tricks, it is possible to add new features to the C++ language through meta-programming. Its convoluted, but it works. Here's an example which adds multiple dispatch to C++ through templates. http://www.eptacom.net/pubblicazioni/pub_eng/mdisp.html . The more typical example is Fibonacci at compile time: http://blog.emptycrate.com/node/271

Generic programming is a very simple form of metacoding albeit not usually runtime. It's more like the preprocessor in C and relates more to template programming in most use cases and basic implementations.
You'll find often in typed languages that you'll create a few implementations of something where only the type if different. In languages such as Java this can be especially painful since every class and interface is defining a new type.
You can generate those classes by converting them to a string literal then replacing the class name with a variable to interpolate.
Where generics are used in runtime it's a bit different, in that case it's simply variable programming, programming using variables.
The way to envisage that is simple, take to files, compare them and turn anything different into a variable. Now you have only one file that is reusable. You only have to specify what's different, hence the name variable.
How generics came about it that not everything can be made variable like the variable type you expect or a cast type. Often there would by a lot of file duplication where the only thing that was variable was the variable types. This was a very common source of duplication. Although there are ways around it or to mitigate it they aren't particularly convenient. Generics have come along as a kind of variable variable to allow making the variable type variable. Because the variable type is something normally expressing in the programming language that can now be specified in runtime it is also considered metacoding, albeit a very simple case.
The effect of not having variability where you need it is to unroll your variables, that is you are forced instead of having a variable to make an implementation for every possible would be variable value.
As you can imagine that is quite expensive. This would be very common when using any kind of reusage object storage library. These will accept any object but in most cases people only want to sore one type of objdct. If you put in a Shop object which extends Object then want to retrieve it, the method signature on the storage object will return simply Object but your code will expect a Shop object. This will break compilation with the downgrade of the object unless you cast it back up to shop. This raises another conundrum as without generics there is no way to check for compatibility and ensure the object you are storing is a Shop class.
Java avoids metaprogramming and tries to keep the language simple using OOP principles of polymorphism instead to make flexible code. However there are some pressing and reoccurring problems that through experience have presented themselves and are addressed with the addition of minimal metaprogramming facilities. Java does not want to be a metaprogramming language but sparingly imports concepts from there to solve the most nagging problems.
Programming languages that offer lavage metacoding facilities can be significantly more productive than languages than avoid it barring special cases, reflection, OOP polymorphism, etc. However it often also takes a lot more skill and expertise to generate un=nderstandable, maintaiable and bug free code. There is also often a performance penalty for such languages with C++ being a bit of an oddball because it is compiled to native.

Does "The whole language always available" hold in case of Clojure?

Ninth bullet point in Paul Graham's What Made Lisp Different says,
9. The whole language always available.
There is no real distinction between read-time, compile-time, and runtime. You can compile or run code while reading, read or run code while compiling, and read or compile code at runtime.
Running code at read-time lets users reprogram Lisp's syntax; running code at compile-time is the basis of macros; compiling at runtime is the basis of Lisp's use as an extension language in programs like Emacs; and reading at runtime enables programs to communicate using s-expressions, an idea recently reinvented as XML.
Does this last bullet point hold for Clojure?

You can mix runtime and compile-time freely in Clojure, although Common Lisp is still somewhat more flexible here (due to the presence of compiler macros and symbol macros and a fully supported macrolet; Clojure has an advantage in its cool approach to macro hygiene through automagic symbol resolution in syntax-quote). The reader is currently closed, so the free mixing of runtime, compile-time and read-time is not possible1.
1 Except through unsupported clever hacks.

It does hold,
(eval (read-string "(println \"Hello World!!\")"))
Hello World!!
nil
Just like emacs you can have your program configuration in Clojure, one project that I know Clojure for is static which allows you to have your template as a Clojure vector along with arbitrary code which will be executed at read time.

Non-C++ languages for generative programming?

C++ is probably the most popular language for static metaprogramming and Java doesn't support it.
Are there any other languages besides C++ that support generative programming (programs that create programs)?

The alternative to template style meta-programming is Macro-style that you see in various Lisp implementations. I would suggest downloading Paul Graham's On Lisp and also taking a look at Clojure if you're interested in a Lisp with macros that runs on the JVM.
Macros in Lisp are much more powerful than C/C++ style and constitute a language in their own right -- they are meant for meta-programming.

let me list a few important details about how metaprogramming works in lisp (or scheme, or slate, or pick your favorite "dynamic" language):
when doing metaprogramming in lisp you don't have to deal with two languages. the meta level code is written in the same language as the object level code it generates. metaprogramming is not limited to two levels, and it's easier on the brain, too.
in lisp you have the compiler available at runtime. in fact the compile-time/run-time distinction feels very artificial there and is very much subject to where you place your point of view. in lisp with a mere function call you can compile functions to machine instructions that you can use from then on as first class objects; i.e. they can be unnamed functions that you can keep in a local variable, or a global hashtable, etc...
macros in lisp are very simple: a bunch of functions stuffed in a hashtable and given to the compiler. for each form the compiler is about to compile, it consults that hashtable. if it finds a function then calls it at compile-time with the original form, and in place of the original form it compiles the form this function returns. (modulo some non-important details) so lisp macros are basically plugins for the compiler.
writing a lisp function in lisp that evaluates lisp code is about two pages of code (this is usually called eval). in such a function you have all the power to introduce whatever new rules you want on the meta level. (making it run fast is going to take some effort though... about the same as bootstrapping a new language... :)
random examples of what you can implement as a user library using lisp metaprogramming (these are actual examples of common lisp libraries):
extend the language with delimited continuations (hu.dwim.delico)
implement a js-to-lisp-rpc macro that you can use in javascript (which is generated from lisp). it expands into a mixture of js/lisp code that automatically posts (in the http request) all the referenced local variables, decodes them on the server side, runs the lisp code body on the server, and returns back the return value to the javascript side.
add prolog like backtracking to the language that very seamlessly integrates with "normal" lisp code (see screamer)
an XML templating extension to common lisp (includes an example of reader macros that are plugins for the lisp parser)
a ton of small DSL's, like loop or iterate for easy looping

Template metaprogramming is essentially abuse of the template mechanism. What I mean is that you get basically what you'd expect from a feature that was an unplanned side-effect --- it's a mess, and (although tools are getting better) a real pain in the ass because the language doesn't support you in doing it (I should note that my experience with state-of-the-art on this is out of date, since I essentially gave up on the approach. I've not heard of any great strides made, though)
Messing around with this in about '98 was what drove me to look for better solutions. I could write useful systems that relied on it, but they were hellish. Poking around eventually led me to Common Lisp. Sure, the template mechanism is Turing complete, but then again so is intercal.
Common Lisp does metaprogramming `right'. You have the full power of the language available while you do it, no special syntax, and because the language is very dynamic you can do more with it.
There are other options of course. No other language I've used does metaprogramming better than Lisp does, which is why I use it for research code. There are lots of reasons you might want to try something else though, but it's all going to be tradeoffs. You can look at Haskell/ML/OCaml etc. Lots of functional languages have something approaching the power of Lisp macros. You can find some .NET targeted stuff, but they're all pretty marginal (in terms of user base etc.). None of the big players in industrially used languages have anything like this, really.

Nemerle and Boo are my personal favorites for such things. Nemerle has a very elegant macro syntax, despite its poor documentation. Boo's documentation is excellent but its macros are a little less elegant. Both work incredibly well, however.
Both target .NET, so they can easily interoperate with C# and other .NET languages -- even Java binaries, if you use IKVM.
Edit: To clarify, I mean macros in the Lisp sense of the word, not C's preprocessor macros. These allow definition of new syntax and heavy metaprogramming at compiletime. For instance, Nemerle ships with macros that will validate your SQL queries against your SQL server at compiletime.

Nim is a relatively new programming language that has extensive support for static meta-programming and produces efficient (C++ like) compiled code.
http://nim-lang.org/
It supports compile-time function evaluation, lisp-like AST code transformations through macros, compile-time reflection, generic types that can be parametrized with arbitrary values, and term rewriting that can be used to create user-defined high-level type-aware peephole optimizations. It's even possible to execute external programs during the compilation process that can influence the code generation. As an example, consider talking to a locally running database server in order to verify that the ORM definition in your code (supplied through some DSL) matches the schema of the database.

The "D" programming language is C++-like but has much better metaprogramming support. Here's an example of a ray-tracer written using only compile-time metaprogramming:
Ctrace
Additionally, there is a gcc branch called "Concept GCC" that supports metaprogramming contructs that C++ doesn't (at least not yet).
Concept GCC

Common Lisp supports programs that write programs in several different ways.
1) Program data and program "abstract syntax tree" are uniform (S-expressions!)
2) defmacro
3) Reader macros.
4) MOP
Of these, the real mind-blower is MOP. Read "The Art of the Metaobject Protocol." It will change things for you, I promise!

I recommend Haskell. Here is a paper describing its compile-time metaprogramming capabilities.

Lots of work in Haskell: Domain Specific Languages (DSL's), Executable Specifications, Program Transformation, Partial Application, Staged Computation. Few links to get you started:
http://haskell.readscheme.org/appl.html
http://www.cse.unsw.edu.au/~dons/papers/SCKCB07.html
http://www.haskell.org/haskellwiki/Research_papers/Domain_specific_languages

The ML family of languages were designed specifically for this purpose. One of OCaml's most famous success stories is the FFTW library for high-performance FFTs that is C code generated almost entirely by an OCaml program.
Cheers,
Jon Harrop.

Most people try to find a language that has "ultimate reflection"
for self-inspection and something like "eval" for reifying new code.
Such languages are hard to find (LISP being a prime counterexample)
and they certainly aren't mainstream.
But another approach is to use a set of tools that can inspect,
generate, and manipulate program code. Jackpot is such a tool
focused on Java. http://jackpot.netbeans.org/
Our DMS software reengineering toolkit is
such a tool, that works on C, C++, C#, Java, COBOL, PHP,
Javascript, Ada, Verilog, VHDL and variety of other languages.
(It uses production quality front ends to enable it to read
all these langauges).
Better, it can do this with multiple languages at the same instant.
See http://www.semdesigns.com/Products/DMS/DMSToolkit.html
DMS succeeds because it provides a regular method and support infrastructure for complete access to the program structure as ASTs, and in most cases additional data such a symbol tables, type information, control and data flow analysis, all necessary to do sophisticated program manipulation.

'metaprogramming' is really a bad name for this specific feature, at least when you're discussing more than one language, since this feature is only needed for a narrow slice of languages that are:
static
compiled to machine language
heavily optimised for performance at compile time
extensible with user-defined data types (OOP in C++'s case)
hugely popular
take out any one of these, and 'static metaprogramming', just doesn't make sense. therefore, i would be surprised if any remotely mainstream language had something like that, as understood on C++.
of course, dynamic languages, and several functional languages support totally different concepts that could also be called metaprogramming.

Lisp supports a form of "metaprogramming", although not in the same sense as C++ template metaprogramming. Also, your term "static" could mean different things in this context, but Lisp also supports static typing, if that's what you mean.

The Meta-Language (ML), of course: http://cs.anu.edu.au/student/comp8033/ml.html

It does not matter what language you are using -- any of them is able to do Heterogeneous Generative Metaprogramming. Take any dynamic language such as Python or Clojure, or Haskell if you are a type-fan, and write models in this host language that are able to compile themself into some mainstream language you want or forced to use by your team/employer.
I found object graphs a good model for internal model representation. This graph can mix attributes and ordered subgraphs in a single node, which is native to attribute grammar and AST. So, object graph interpretation can be an effective layer between your host and target languages and can act itself as some sort of no-syntax language defined over data structures.
The closest model is an AST: describe AST trees in Python (host language) targets to C++ syntax (target language):
# from metaL import *
class Object:
def __init__(self, V):
self.val = V
self.slot = {}
self.nest = []
class Module(Object):
def cc(self):
c = '// \ %s\n' % self.head(test=True)
for i in self.nest:
c += i.cc()
c += '// / %s\n' % self.head(test=True)
return c
hello = Module('hello')
# <module:hello> #a04475a2
class Include(Object):
def cc(self):
return '#include <%s.h>\n' % self.val
stdlib = Include('stdlib')
hello // stdlib
# <module:hello> #b6efb657
# 0: <include:stdlib> #f1af3e21
class Fn(Object):
def cc(self):
return '\nvoid %s() {\n}\n\n' % self.val
main = Fn('main')
hello // main
print(hello.cc())
// \ <module:hello>
#include <stdlib.h>
void main() {
}
// / <module:hello>
But you are not limited with the level of abstraction of constructed object graph: you not only can freely add your own types but object graph can interpret itself, can thus can modify itself the same way as lists in a Lisp can do.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string