What is the difference between Syntactic macros and Procedural macros? - rust

What is the difference between procedural macros and syntactic macros? Rust refers to its macro system as procedural, but I've seen language articles refer to a system like the rust macro system as syntactic macros. Syntactic macros would appear to have access to all or part of the AST when parsing them. Which appears to be what Rust has.

Rust macros are syntactic; they work on the AST level.
What might be tripping you up in terminology is Rust has two flavors of macros that differ in how they are written and how they can be used. There are declarative macros (also called "macros by example") that are created by invocations of macro_rules!. And there are procedural macros, which are written as functions that handle TokenStreams as input and output (can be used as attributes, in derives, or like functions).
See also:
Macros in the Rust Book
Procedural Macros and Macros by Example in the Rust Reference

Related

which source file in rust language source refers_to/parse language keywords

I want to change keyword spellings in Rust language.
For example, for -> phur, so where should I look in rustc source code?
Thanks in advance.
There isn't just the one place you'd need to change.
The keywords themselves are defined in src/librustc_span/symbol.rs but many things in the compiler internals rely on those keyword not to change, including:
macro expansion
Any kind of syntactic sugar relying on the quote! family of macros.
pretty printing
the test suite

Is it possible to write something as complex as `print!` in a pure Rust macro?

I am starting out learning Rust macros, but the documentation is somewhat limited. Which is fine — they're an expert feature, I guess. While I can do basic code generation, implementation of traits, and so on, some of the built-in macros seem well beyond that, such as the various print macros, which examine a string literal and use that for code expansion.
I looked at the source for print! and it calls another macro called format_args. Unfortunately this doesn't seem to be built in "pure Rust" the comment just says "compiler built-in."
Is it possible to write something as complex as print! in a pure Rust macro? If so, how would it be done?
I'm actually interested in building a "compile time trie" -- basically recognizing certain fixed strings as "keywords" fixed at compile time. This would be performant (probably) but mostly I'm just interested in code generation.
format_args is implemented in the compiler itself, in the libsyntax_ext crate. The name is registered in the register_builtins function, and the code to process it has its entry point in the expand_format_args function.
Macros that do such detailed syntax processing cannot be defined using the macro_rules! construct. They can be defined with a procedural macro; however, this feature is currently unstable (can only be used with the nightly compiler and is subject to sudden and unannounced changes) and rather sparsely documented.
Rust macros cannot parse string literals, so it's not possible to create a direct Rust equivalent of format_args!.
What you could do is to use a macro to transform the function-call-like syntax into something that represents the variadic argument list in the Rust type system in some way (say, as a heterogeneous single-linked list, or a builder type). This can then be passed to a regular Rust function, along with the format string. But you will not be able to implement compile-time type checking of the format string this way.

What is the difference between macros and functions in Rust?

Quoted from the Rust blog:
One last thing to mention: Rust’s macros are significantly different from C macros, if you’ve used those
What is the difference between macros and function in Rust? How is it different from C?
Keep on reading the documentation, specifically the chapter on macros!
Rust functions vs Rust macros
Macros are executed at compile time. They generally expand into new pieces of code that the compiler will then need to further process.
Rust macros vs C macros
The biggest difference to me is that Rust macros are hygenic. The book has an example that explains what hygiene prevents, and also says:
Each macro expansion happens in a distinct ‘syntax context’, and each variable is tagged with the syntax context where it was introduced.
It uses this example:
For example, this C program prints 13 instead of the expected 25.
#define FIVE_TIMES(x) 5 * x
int main() {
printf("%d\n", FIVE_TIMES(2 + 3));
return 0;
}
Beyond that, Rust macros
Can be distributed with the compiled code
Can be overloaded in argument counts
Can match on syntax patterns like braces or parenthesis or commas
Can require a repeated input pattern
Can be recursive
Operate at the syntax level, not the text level
Quoting from the Rust documentation:
The Difference Between Macros and Functions
Fundamentally, macros are a way of writing code that writes other code, which
is known as metaprogramming. In Appendix C, we discuss the derive
attribute, which generates an implementation of various traits for you. We’ve
also used the println! and vec! macros throughout the book. All of these
macros expand to produce more code than the code you’ve written manually.
Metaprogramming is useful for reducing the amount of code you have to write and
maintain, which is also one of the roles of functions. However, macros have
some additional powers that functions don’t.
A function signature must declare the number and type of parameters the
function has. Macros, on the other hand, can take a variable number of
parameters: we can call println!("hello") with one argument or
println!("hello {}", name) with two arguments. Also, macros are expanded
before the compiler interprets the meaning of the code, so a macro can, for
example, implement a trait on a given type. A function can’t, because it gets
called at runtime and a trait needs to be implemented at compile time.
The downside to implementing a macro instead of a function is that macro
definitions are more complex than function definitions because you’re writing
Rust code that writes Rust code. Due to this indirection, macro definitions are
generally more difficult to read, understand, and maintain than function
definitions.
Another important difference between macros and functions is that you must
define macros or bring them into scope before you call them in a file, as
opposed to functions you can define anywhere and call anywhere.
In macro, you can take variable number of parameters.
In function you have to define number and type of parameters.

Generic programming vs. Metaprogramming

What exactly is the difference? It seems like the terms can be used somewhat interchangeably, but reading the wikipedia entry for Objective-c, I came across:
In addition to C’s style of procedural
programming, C++ directly supports
certain forms of object-oriented
programming, generic programming, and
metaprogramming.
in reference to C++. So apparently they're different?
Programming: Writing a program that creates, transforms, filters, aggregates and otherwise manipulates data.
Metaprogramming: Writing a program that creates, transforms, filters, aggregates and otherwise manipulates programs.
Generic Programming: Writing a program that creates, transforms, filters, aggregates and otherwise manipulates data, but makes only the minimum assumptions about the structure of the data, thus maximizing reuse across a wide range of datatypes.
As was already mentioned in several other answers, the distinction can be confusing in C++, since both Generic Programming and (static/compile time) Metaprogramming are done with Templates. To confuse you even further, Generic Programming in C++ actually uses Metaprogramming to be efficient, i.e. Template Specialization generates specialized (fast) programs from generic ones.
Also note that, as every Lisp programmer knows, code and data are the same thing, so there really is no such thing as "metaprogramming", it's all just programming. Again, this is a bit hard to see in C++, since you actually use two completely different programming languages for programming (C++, an imperative, procedural, object-oriented language in the C family) and metaprogramming (Templates, a purely functional "accidental" language somewhere in between pure lambda calculus and Haskell, with butt-ugly syntax, since it was never actually intended to be a programming language.)
Many other languages use the same language for both programming and metaprogramming (e.g. Lisp, Template Haskell, Converge, Smalltalk, Newspeak, Ruby, Ioke, Seph).
Metaprogramming, in a broad sense, means writing programs that yield other programs. E.g. like templates in C++ produce actual code only when instantiated. One can interpret a template as a program that takes a type as an input and produces an actual function/class as an output. Preprocessor is another kind of metaprogramming. Another made-up example of metaprogramming:a program that reads an XML and produces some SQL scripts according to the XML. Again, in general, a metaprogram is a program that yields another program, whereas generic programming is about parametrized(usually with other types) types(including functions) .
EDITED after considering the comments to this answer
I would roughly define metaprogramming as "writing programs to write programs" and generic programming as "using language features to write functions, classes, etc. parameterized on the data types of arguments or members".
By this standard, C++ templates are useful for both generic programming (think vector, list, sort...) and metaprogramming (think Boost and e.g. Spirit). Furthermore, I would argue that generic programming in C++ (i.e. compile-time polymorphism) is accomplished by metaprogramming (i.e. code generation from templated code).
Generic programming usually refers to functions that can work with many types. E.g. a sort function, which can sort a collection of comparables instead of one sort function to sort an array of ints and another one to sort a vector of strings.
Metaprogramming refers to inspecting, modifying or creating classes, modules or functions programmatically.
Its best to look at other languages, because in C++, a single feature supports both Generic Programming and Metaprogramming. (Templates are very powerful).
In Scheme / Lisp, you can change the grammar of your code. People probably know Scheme as "that prefix language with lots of parenthesis", but it also has very powerful metaprogramming techniques (Hygenic Macros). In particular, try / catch can be created, and even the grammar can be manipulated to a point (For example, here is a prefix to infix converter if you don't want to write prefix code anymore: http://github.com/marcomaggi/nausicaa). This is accomplished through metaprogramming, code that writes code that writes code. This is useful for experimenting with new paradigms of programming (the AMB operator plays an important role in non-deterministic programming. I hope AMB will become mainstream in the next 5 years or so...)
In Java / C#, you can have generic programming through generics. You can write one generic class that supports the types of many other classes. For instance, in Java, you can use Vector to create a Vector of Integers. Or Vector if you want it specific to your own class.
Where things get strange, is that C++ templates are designed for generic programming. However, because of a few tricks, C++ templates themselves are turing-complete. Using these tricks, it is possible to add new features to the C++ language through meta-programming. Its convoluted, but it works. Here's an example which adds multiple dispatch to C++ through templates. http://www.eptacom.net/pubblicazioni/pub_eng/mdisp.html . The more typical example is Fibonacci at compile time: http://blog.emptycrate.com/node/271
Generic programming is a very simple form of metacoding albeit not usually runtime. It's more like the preprocessor in C and relates more to template programming in most use cases and basic implementations.
You'll find often in typed languages that you'll create a few implementations of something where only the type if different. In languages such as Java this can be especially painful since every class and interface is defining a new type.
You can generate those classes by converting them to a string literal then replacing the class name with a variable to interpolate.
Where generics are used in runtime it's a bit different, in that case it's simply variable programming, programming using variables.
The way to envisage that is simple, take to files, compare them and turn anything different into a variable. Now you have only one file that is reusable. You only have to specify what's different, hence the name variable.
How generics came about it that not everything can be made variable like the variable type you expect or a cast type. Often there would by a lot of file duplication where the only thing that was variable was the variable types. This was a very common source of duplication. Although there are ways around it or to mitigate it they aren't particularly convenient. Generics have come along as a kind of variable variable to allow making the variable type variable. Because the variable type is something normally expressing in the programming language that can now be specified in runtime it is also considered metacoding, albeit a very simple case.
The effect of not having variability where you need it is to unroll your variables, that is you are forced instead of having a variable to make an implementation for every possible would be variable value.
As you can imagine that is quite expensive. This would be very common when using any kind of reusage object storage library. These will accept any object but in most cases people only want to sore one type of objdct. If you put in a Shop object which extends Object then want to retrieve it, the method signature on the storage object will return simply Object but your code will expect a Shop object. This will break compilation with the downgrade of the object unless you cast it back up to shop. This raises another conundrum as without generics there is no way to check for compatibility and ensure the object you are storing is a Shop class.
Java avoids metaprogramming and tries to keep the language simple using OOP principles of polymorphism instead to make flexible code. However there are some pressing and reoccurring problems that through experience have presented themselves and are addressed with the addition of minimal metaprogramming facilities. Java does not want to be a metaprogramming language but sparingly imports concepts from there to solve the most nagging problems.
Programming languages that offer lavage metacoding facilities can be significantly more productive than languages than avoid it barring special cases, reflection, OOP polymorphism, etc. However it often also takes a lot more skill and expertise to generate un=nderstandable, maintaiable and bug free code. There is also often a performance penalty for such languages with C++ being a bit of an oddball because it is compiled to native.

why do some languages require function to be declared in code before calling?

Suppose you have this pseudo-code
do_something();
function do_something(){
print "I am saying hello.";
}
Why do some programming languages require the call to do_something() to appear below the function declaration in order for the code to run?
Programming languages use a symbol table to hold the various classes, functions, etc. that are used in the source code. Some languages compile in a single pass, whereby the symbols are pulled out of the symbol table as soon as they are used. Others use two passes, where the first pass is used to populate the table, and then the second is used to find the entries.
Most languages with a static type system are designed to require definition before use, which means there must be some sort of declaration of a function before the call so that the call can be checked (e.g., is the function getting the right number and types of arguments). This sort of design helps both a person and a compiler reading the program: everything you see has already been defined. The ease of reading and the popularity of one-pass compilers may explain the popularity of this design rule.
Unfortunately definition before use does not play well with mutual recursion, and so language designers resorted to an ugly hack whereby you have
Declaration (sometimes called a "forward declaration" from the keyword in Pascal)
Use
Definition
You see the same phenomenon at the type level in C in the form of the "incomplete struct declaration."
Around 1990 some language designers figured out that the one-pass compiler with no abstract-syntax tree should be a thing of the past, and two very nice designs from that era—Modula-3 and Haskell got rid of definition before use: in those languages, any defined function or variable is visible throughout its scope, including parts of the program textually before the definition. In other words, mutual recursion is the default for both types and functions. Good on them, I say—these languages have no ugly and unnecessary forward declarations.
Why [have definition before use]?
Easy to write a one-pass compiler in 1975.
without definition before use, you have to think harder about mutual recursion, especially mutually recursive type definitions.
Some people think it makes it easier for a person to read the code.

Resources