What is the difference between macros and functions in Rust?

Quoted from the Rust blog:
One last thing to mention: Rust’s macros are significantly different from C macros, if you’ve used those
What is the difference between macros and functions in Rust? How are Rust's macros different from C's?

Keep on reading the documentation, specifically the chapter on macros!
Rust functions vs Rust macros
Macros are executed at compile time. They generally expand into new pieces of code that the compiler will then need to further process.
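As a minimal sketch of what that means in practice (the macro name here is invented), the compiler replaces the invocation with the code from the matching rule before type checking and code generation run:

macro_rules! answer {
    () => {
        40 + 2
    };
}

fn main() {
    // At compile time this line becomes `let x = 40 + 2;`.
    let x = answer!();
    println!("{}", x);
}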
Rust macros vs C macros
The biggest difference to me is that Rust macros are hygienic. The book has an example that explains what hygiene prevents, and also says:
Each macro expansion happens in a distinct ‘syntax context’, and each variable is tagged with the syntax context where it was introduced.
It uses this example:
For example, this C program prints 13 instead of the expected 25.
#include <stdio.h>

#define FIVE_TIMES(x) 5 * x

int main() {
    printf("%d\n", FIVE_TIMES(2 + 3));
    return 0;
}
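For comparison, a rough Rust equivalent: because $x is matched as a complete expression rather than substituted as raw text, the grouping of 2 + 3 is preserved and the program prints 25.

macro_rules! five_times {
    ($x:expr) => {
        5 * $x
    };
}

fn main() {
    // Expands to 5 * (2 + 3), not 5 * 2 + 3, so this prints 25.
    println!("{}", five_times!(2 + 3));
}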
Beyond that, Rust macros:
Can be distributed with the compiled code
Can be overloaded in argument counts (see the sketch after this list)
Can match on syntax patterns like braces, parentheses, or commas
Can require a repeated input pattern
Can be recursive
Operate at the syntax level, not the text level
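A minimal sketch of a few of these points, assuming nothing beyond macro_rules!: the macro below accepts any number of comma-separated expressions (a repeated input pattern), has a separate rule for the empty case (overloading on argument count), and expands recursively.

macro_rules! sum {
    () => { 0 };
    ($head:expr $(, $tail:expr)*) => {
        // Recursive expansion: peel off the first argument, recurse on the rest.
        $head + sum!($($tail),*)
    };
}

fn main() {
    assert_eq!(sum!(), 0);
    assert_eq!(sum!(1), 1);
    assert_eq!(sum!(1, 2, 3), 6);
    println!("ok");
}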

Quoting from the Rust documentation:
The Difference Between Macros and Functions
Fundamentally, macros are a way of writing code that writes other code, which
is known as metaprogramming. In Appendix C, we discuss the derive
attribute, which generates an implementation of various traits for you. We’ve
also used the println! and vec! macros throughout the book. All of these
macros expand to produce more code than the code you’ve written manually.
Metaprogramming is useful for reducing the amount of code you have to write and
maintain, which is also one of the roles of functions. However, macros have
some additional powers that functions don’t.
A function signature must declare the number and type of parameters the
function has. Macros, on the other hand, can take a variable number of
parameters: we can call println!("hello") with one argument or
println!("hello {}", name) with two arguments. Also, macros are expanded
before the compiler interprets the meaning of the code, so a macro can, for
example, implement a trait on a given type. A function can’t, because it gets
called at runtime and a trait needs to be implemented at compile time.
The downside to implementing a macro instead of a function is that macro
definitions are more complex than function definitions because you’re writing
Rust code that writes Rust code. Due to this indirection, macro definitions are
generally more difficult to read, understand, and maintain than function
definitions.
Another important difference between macros and functions is that you must
define macros or bring them into scope before you call them in a file, as
opposed to functions you can define anywhere and call anywhere.

In a macro, you can take a variable number of parameters.
In a function, you have to declare the number and type of its parameters.
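The point about implementing traits is easiest to see with a small sketch (the trait and macro here are invented for illustration): because the macro is expanded before the compiler interprets the code, it can emit a trait implementation for whatever type it is handed, which no ordinary function could do.

trait Describe {
    fn describe() -> &'static str;
}

macro_rules! impl_describe {
    ($t:ty, $text:expr) => {
        impl Describe for $t {
            fn describe() -> &'static str {
                $text
            }
        }
    };
}

struct Point;

// Each invocation expands into a full `impl` block at compile time.
impl_describe!(Point, "a point type");
impl_describe!(u32, "an unsigned 32-bit integer");

fn main() {
    println!("{}", Point::describe());
    println!("{}", u32::describe());
}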

Related

What is the difference between Syntactic macros and Procedural macros?

What is the difference between procedural macros and syntactic macros? Rust refers to its macro system as procedural, but I've seen language articles refer to a system like the Rust macro system as syntactic macros. Syntactic macros would appear to have access to all or part of the AST when parsing them, which appears to be what Rust has.
Rust macros are syntactic; they work on the AST level.
What might be tripping you up in terminology is Rust has two flavors of macros that differ in how they are written and how they can be used. There are declarative macros (also called "macros by example") that are created by invocations of macro_rules!. And there are procedural macros, which are written as functions that handle TokenStreams as input and output (can be used as attributes, in derives, or like functions).
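To make the two flavors concrete, a minimal sketch (names invented for illustration): the first is a declarative macro defined with macro_rules!; the second shows only the shape of a function-like procedural macro, which would have to live in its own crate with proc-macro = true, so it is left as a comment here.

// 1. Declarative ("macro by example"):
macro_rules! square {
    ($x:expr) => { $x * $x };
}

// 2. Procedural: a function over token streams, defined in a dedicated
//    proc-macro crate (sketched as a comment only).
// use proc_macro::TokenStream;
//
// #[proc_macro]
// pub fn square_proc(input: TokenStream) -> TokenStream {
//     // parse `input`, build new tokens, and return them
//     input
// }

fn main() {
    assert_eq!(square!(3), 9);
}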
See also:
Macros in the Rust Book
Procedural Macros and Macros by Example in the Rust Reference

Is it possible to write something as complex as `print!` in a pure Rust macro?

I am starting out learning Rust macros, but the documentation is somewhat limited, which is fine; they're an expert feature, I guess. While I can do basic code generation, implementation of traits, and so on, some of the built-in macros seem well beyond that, such as the various print macros, which examine a string literal and use it for code expansion.
I looked at the source for print! and it calls another macro called format_args. Unfortunately, this doesn't seem to be built in "pure Rust"; the comment just says "compiler built-in."
Is it possible to write something as complex as print! in a pure Rust macro? If so, how would it be done?
I'm actually interested in building a "compile-time trie": basically recognizing certain fixed strings as "keywords" fixed at compile time. This would (probably) be performant, but mostly I'm just interested in code generation.
format_args is implemented in the compiler itself, in the libsyntax_ext crate. The name is registered in the register_builtins function, and the code to process it has its entry point in the expand_format_args function.
Macros that do such detailed syntax processing cannot be defined using the macro_rules! construct. They can be defined with a procedural macro; however, this feature is currently unstable (can only be used with the nightly compiler and is subject to sudden and unannounced changes) and rather sparsely documented.
Rust macros cannot parse string literals, so it's not possible to create a direct Rust equivalent of format_args!.
What you could do is use a macro to transform the function-call-like syntax into something that represents the variadic argument list in the Rust type system in some way (say, as a heterogeneous singly linked list, or a builder type). This can then be passed to a regular Rust function, along with the format string. But you will not be able to implement compile-time type checking of the format string this way.
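A rough sketch of that last suggestion (every name below is invented for illustration): a macro_rules! macro reshapes the variadic call into a slice of trait objects and hands it, together with the format string, to an ordinary function. As noted above, nothing here validates the arguments against the format string at compile time.

use std::fmt::Display;

// Ordinary function: receives the "format string" and the collected arguments.
fn print_all(fmt: &str, args: &[&dyn Display]) {
    print!("{}", fmt);
    for arg in args {
        print!(" {}", arg);
    }
    println!();
}

// The macro only reshapes the call site; it does not inspect the string literal.
macro_rules! print_args {
    ($fmt:expr $(, $arg:expr)*) => {
        print_all($fmt, &[$(&$arg),*])
    };
}

fn main() {
    print_args!("values:");
    print_args!("values:", 1, "two", 3.5);
}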

Multiple specialization, iterator patterns in Rust

Learning Rust (yay!) and I'm trying to understand the intended idiomatic programming required for certain iterator patterns, while scoring top performance. Note: not Rust's Iterator trait, just a method I've written accepting a closure and applying it to some data I'm pulling off of disk / out of memory.
I was delighted to see that Rust (+LLVM?) took an iterator I had written for sparse matrix entries, and a closure for doing sparse matrix vector multiplication, written as
iterator.map_edges(|x, y| dst[y] += src[x]);
and inlined the closure's body in the generated code. It went quite fast. :D
If I create two of these iterators, or use the first a second time (not a correctness issue), each instance slows down quite a lot (about 2x in this case), presumably because the optimizer no longer chooses to specialize due to the multiple call sites, and you end up doing a function call for each element.
I'm trying to understand if there are idiomatic patterns that keep the pleasant experience above (I like it, at least) without sacrificing the performance. My options seem to be (none satisfying this constraint):
Accept dodgy performance (2x slower is not fatal, but no prizes either).
Ask the user to supply a batch-oriented closure, so acting on an iterator over a small batch of data. This exposes a bit much of the internals of the iterator (the data are compressed nicely, and the user needs to know how to unwrap them, or the iterator needs to stage an unwrapped batch in memory).
Make map_edges generic in a type implementing a hypothetical EdgeMapClosure trait, and ask the user to implement such a type for each closure they want to inline. Not tested, but I would guess this exposes distinct methods to LLVM, each of which get nicely inlined. Downside is that the user has to write their own closure (packing relevant state up, etc).
Horrible hacks, like making distinct methods map_edges0, map_edges1, ..., or adding a generic parameter the programmer can use to make the methods distinct, but which is otherwise ignored.
Non-solutions include "just use for pair in iterator.iter() { /* */ }"; this is prep work for a data/task-parallel platform, and I would like to be able to capture/move these closures to work threads rather than capturing the main thread's execution. Maybe the pattern I should be using is to write the above, put it in a lambda/closure, and ship it around instead?
In a perfect world, it would be great to have a pattern which causes each occurrence of map_edges in the source file to result in different specialized methods in the binary, without forcing the entire project to be optimized at some scary level. I'm coming out of an unpleasant relationship with managed languages and JITs where generics would be the only way (I know of) to get this to happen, but Rust and LLVM seem magical enough that I thought there might be a good way. How do Rust's iterators handle this to inline their closure bodies? Or don't they (they should!)?
It seems that the problem is resolved by Rust's new approach to closures outlined at
http://smallcultfollowing.com/babysteps/blog/2014/11/26/purging-proc/
In short, option 3 above (make functions generic with respect to a closure type) is now implemented transparently when you make a method generic over the new closure traits (Fn, FnMut, FnOnce); Rust produces the closure's type behind the scenes for you.
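A sketch of what that looks like today (the iterator type and method are invented to mirror the question): making map_edges generic over the closure type means every distinct closure gets its own instantiation of the method, which the optimizer can inline per call site.

struct EdgeIter {
    edges: Vec<(usize, usize)>,
}

impl EdgeIter {
    // Generic over the closure type: each distinct closure produces a
    // distinct instantiation of this method, which the optimizer can inline.
    fn map_edges<F: FnMut(usize, usize)>(&self, mut logic: F) {
        for &(x, y) in &self.edges {
            logic(x, y);
        }
    }
}

fn main() {
    let iter = EdgeIter { edges: vec![(0, 1), (1, 2)] };
    let src = vec![1.0, 2.0, 3.0];
    let mut dst = vec![0.0; 3];
    // Mirrors the closure from the question.
    iter.map_edges(|x, y| dst[y] += src[x]);
    println!("{:?}", dst);
}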

Haskell FFI - C struct array data fields

I'm in the process of working on Haskell bindings for a native library with a pretty complex interface. It has a lot of structs as part of its interface, and I've been working on building interfaces to them with hsc2hs and the bindings-DSL package, which helps automate struct bindings.
One problem I've run into, though, is with structs that contain multidimensional arrays. The bindings-DSL documentation describes macros for binding to a structure like
struct with_array {
    char v[5];
    struct test *array_pointer;
    struct test proper_array[10];
};
with macros like
#starttype struct with_array
#array_field v , CChar
#field array_pointer , Ptr <test>
#array_field proper_array , <test>
#stoptype
But this library has many structs with multidimensional arrays as fields, more like
struct with_multidimensional_array {
    int whatever;
    struct something big_array[10][25][500];
};
The #array_field macro seems to only handle the first dimension of the array. Is it the case that bindings-DSL just doesn't have a macro for handling multidimensional arrays?
I'd really like a macro for binding a (possibly multidimensional) array to a StorableArray with arbitrary indexes. It seems like the necessary information is available to the macros bindings-DSL provides; there's just no macro for this.
Has anyone added macros to bindings-DSL? Has anyone added a macro for this to bindings-DSL? Am I way past what I should be doing with hsc2hs, and there's some other tool that would help me do what I want in a more succinct way?
Well, no one's come up with anything else, so I'll go with the idea in my comment. I'll use the #field macro instead of the #array_field macro, and specify a type that wraps StorableArray to work correctly.
Since I was thinking about this quite a bit, I realized that it was possible to abstract out the wrapper entirely, using the new type-level numbers that GHC 7.6+ support. I put together a package called storable-static-array that takes dimensions on the type level and provides a proper Storable instance for working with native arrays, even multidimensional ones.
One thing that's still missing, that I would like greatly, is to find a way to write a bindings-DSL compatible macro that automatically extracts dimensions and takes care of generating them properly. A short glance at the macros in bindings-DSL, though, convinced me that I don't know nearly enough to manage it myself.
The #array_field macro handles arrays with any dimension. Documentation has been updated to show that explicitly.
The Haskell equivalent record will be a list. When peeking and poking, the length and order of the elements of that list will correspond to the array as it were considered as a one-dimensional array in C. So, a field int example[2][3] would correspond to a list with 6 elements ordered as example[0][0], example[0][1], example[0][2], example[1][0], example[1][1], example[1][2]. When poking, if the list has more than 6 elements, only the first 6 would be used.
This design was chosen for consistency with peekArray and pokeArray from the FFI standard library. Before version 1.0.17 of bindings-DSL there was a bug that caused the size of that list to be underestimated when array fields had more than one dimension.
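The element order being described is ordinary C row-major layout; a small illustrative sketch of flattening a 2-by-3 array in that order (written in Rust purely to show the ordering):

fn main() {
    // C's `int example[2][3]` laid out row-major, as described above.
    let example = [[0, 1, 2], [10, 11, 12]];
    // Flattening in the order example[0][0], example[0][1], ..., example[1][2]:
    let flat: Vec<i32> = example.iter().flatten().copied().collect();
    println!("{:?}", flat); // [0, 1, 2, 10, 11, 12]
}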

why do some languages require function to be declared in code before calling?

Suppose you have this pseudo-code:
do_something();

function do_something(){
    print "I am saying hello.";
}
Why do some programming languages require the call to do_something() to appear below the function declaration in order for the code to run?
Programming languages use a symbol table to hold the various classes, functions, etc. that are used in the source code. Some languages compile in a single pass, whereby the symbols are pulled out of the symbol table as soon as they are used. Others use two passes, where the first pass is used to populate the table, and then the second is used to find the entries.
Most languages with a static type system are designed to require definition before use, which means there must be some sort of declaration of a function before the call so that the call can be checked (e.g., is the function getting the right number and types of arguments). This sort of design helps both a person and a compiler reading the program: everything you see has already been defined. The ease of reading and the popularity of one-pass compilers may explain the popularity of this design rule.
Unfortunately definition before use does not play well with mutual recursion, and so language designers resorted to an ugly hack whereby you have
Declaration (sometimes called a "forward declaration" from the keyword in Pascal)
Use
Definition
You see the same phenomenon at the type level in C in the form of the "incomplete struct declaration."
Around 1990 some language designers figured out that the one-pass compiler with no abstract-syntax tree should be a thing of the past, and two very nice designs from that era, Modula-3 and Haskell, got rid of definition before use: in those languages, any defined function or variable is visible throughout its scope, including parts of the program textually before the definition. In other words, mutual recursion is the default for both types and functions. Good on them, I say; these languages have no ugly and unnecessary forward declarations.
Why [have definition before use]?
It was easy to write a one-pass compiler in 1975.
Without definition before use, you have to think harder about mutual recursion, especially mutually recursive type definitions.
Some people think it makes it easier for a person to read the code.
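Rust treats functions this way too: items are visible throughout their scope, so mutually recursive functions need no forward declarations (a minimal sketch):

// `is_even` calls `is_odd`, which is defined later in the file; no forward
// declaration is needed because item names are visible throughout the scope.
fn is_even(n: u32) -> bool {
    if n == 0 { true } else { is_odd(n - 1) }
}

fn is_odd(n: u32) -> bool {
    if n == 0 { false } else { is_even(n - 1) }
}

fn main() {
    println!("{}", is_even(10)); // true
    println!("{}", is_odd(7));   // true
}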
