Why does Rust have both call by value and call by reference?

Why does Rust have both call by value and call by reference? - rust

Some languages, like Haskell, make no distinction between pass-by-value and pass-by-reference. The compiler can then approximately choose the most efficient calling convention with a heuristic. One example heuristic would be for the Linux x64 ABI: if the size of parameter is greater than 16 bytes, pass a pointer to the stack otherwise pass the value in registers.
What is the advantage of keeping both notions of pass-by-value and pass-by-reference (non-mutable of course) in Rust and forcing the user to choose?
Could it be the case that pass-by-value is syntactic sugar for pass-by-reference + copy if the value is seen to be modified?

Two things:
Rust will transform certain pass-by-value calls into pass-by-reference, based on a similar heuristic.
Pass-by-value indicates ownership transfer, while pass-by-reference indicates borrowing. These are very different, and totally orthogonal from the asm-level concern you're asking about.
In other words, in Rust, these two forms have different semantics. That doesn't preclude also doing the optimization, though.

[Edited: changed exampled to work in release mode]
It isn't syntactic sugar, as one can see by looking at the generated code.
Given these functions:
fn by_value(v: (u64, u64)) -> u64 {
v.0 + v.1
}
fn by_ref(v: &(u64, u64)) -> u64 {
v.0 + v.1
}
then if one was syntactic sugar for another, we'd expect them to generate identical assembly code, or at least identical calling conventions. But actually, we find that by_ref passes v in the rdi and rsi registers, whereas by_value passes a pointer to v in the rdi register and has to follow that pointer to get the value: (see details, use release mode):
by_value:
movq 8(%rdi), %rax
addq (%rdi), %rax
retq
by_ref:
leaq (%rdi,%rsi), %rax
retq

Related

How to break on Err creation

When debugging a Rust program is it possible to break execution when an Err() value is created?
This would serve the same purpose as breaking on exceptions in other languages (C++, Javascript, Java, etc.) in that it shows you the actual source of the error, rather than just the place where you unwrapped it, which is not usually very useful.
I'm using LLDB but interested in answers for any debugger. The Err I am interested in is generated deep in Serde so I cannot really modify any of the code.

I'll try give this one a shot.
I believe you want to accomplish is incompatible with how the (current) "one true Rust implementation" is currently constructed and its take on "enum constructors" without some serious hacks -- and I'll give my best inference about why (as of the time of writing - Thu Sep 22 00:58:49 UTC 2022), and give you some ideas and options.
Breaking it down: finding definitions
"What happens when you "construct" an enum, anyways...?"
As Rust does not have a formal language standard or specification document, its "semantics" are not particularly precisely defined, so there is no "legal" text to really provide the "Word of God" or final authority on this topic.
So instead, let's refer to community materials and some code:
Constructors - The Rustonomicon
There is exactly one way to create an instance of a user-defined type: name it, and initialize all its fields at once:
...
That's it. Every other way you make an instance of a type is just calling a totally vanilla function that does some stuff and eventually bottoms out to The One True Constructor.
Unlike C++, Rust does not come with a slew of built-in kinds of constructor. There are no Copy, Default, Assignment, Move, or whatever constructors. The reasons for this are varied, but it largely boils down to Rust's philosophy of being explicit.
Move constructors are meaningless in Rust because we don't enable types to "care" about their location in memory. Every type must be ready for it to be blindly memcopied to somewhere else in memory. This means pure on-the-stack-but-still-movable intrusive linked lists are simply not happening in Rust (safely).
In comparison to C++'s better-specified semantics for both enum class constructors and std::Variant<T...> (its closest analogue to Rust enum), Rust does not really say anything about "enum constructors" in-specific except that it's just part of "The One True Constructor."
The One True Constructor is not really a well-specified Rust concept. It's not really commonly used in any of its references or books, and it's not a general programming language theory concept (at least, by that exact name -- it's most-likely referring to type constructors, which we'll get to) -- but you can eke out its meaning by reading more and comparison to the programming languages that Rust takes direct inspiration from.
In fact, where C++ might have move, copy, placement new and other types of constructors, Rust simply has a sort of universal "dumb value constructor" for all values (like struct and enum) that does not have special operational semantics besides something like "create the value, wherever it might be stored in memory".
But that's not very precise at all. What if we try to look at the definition of an enum?
Defining an Enum - The Rust Programming Language
...
We attach data to each variant of the enum directly, so there is no need for an extra struct. Here it’s also easier to see another detail of how enums work: the name of each enum variant that we define also becomes a function that constructs an instance of the enum. That is, IpAddr::V4() is a function call that takes a String argument and returns an instance of the IpAddr type. We automatically get this constructor function defined as a result of defining the enum.
Aha! They dropped the words "constructor function" -- so it's pretty much something like a fn(T, ...) -> U or something? So is it some sort of function? Well, as a generally introductory text to Rust, The Rust Programming Language book can be thought as less "technical" and "precise" than The Rust Reference:
Enumerated types - The Rust Reference
An enumerated type is a nominal, heterogeneous disjoint union type, denoted by the name of an enum item. ^1 ...
...
Enum types cannot be denoted structurally as types, but must be denoted by named reference to an enum item.
...
Most of this is pretty standard -- most modern programming languages have "nomimal types" (the type identifier matters for type comparison) -- but the footnote here is the interesting part:
The enum type is analogous to a data constructor declaration in ML, or a pick ADT in Limbo.
This is a good lead! Rust is known for taking a large amount of inspiration from functional programming languages, which are much closer to the mathematical foundations of programming languages.
ML is a whole family of functional programming languages (e.g. OCaml, Standard ML, F#, and sometimes Haskell) and is considered one of the important defining language-families within the functional programming language space.
Limbo is an older concurrent programming language with support for abstract data types, of which enum is one of.
Both are strongly-rooted in the functional programming language space.
Summary: Rust enum in Functional Programming / Programming Language Theory
For brevity, I'll omit quotes and give a summary of the formal programming language theory behind Rust enum's.
Rust enum's are theoretically known as "tagged unions" or "sum types" or "variants".
Functional programming and mathematical type theory place a strong emphasis on modeling computation as basically "changes in typed-value structure" versus "changes in data state".
So, in object-oriented programming where "everything is an [interactable] object" that then send messages or interact with each other...
-- in functional programming, "everything is a pure [non-mutative] value" that is then "transformed" without side effects by "mathematically-pure functions" .
In fact, type theory goes as far as to say "everything is a type" -- they'll do stuff like mock-up the natural numbers by constructing some sort of mathematical recursive type that has properties like the natural numbers.
To construct "[typed] values" as "structures," mathematical type theory defines a fundamental concept called a "type constructor" -- and you can think of type constructors as being just a Rust () and compositions of such.
So functional/mathematical type constructors are not intended to "execute" or have any other behavior. They are simply there to "purely construct the structure of pure data."
Conclusion: "Rust doesn't want you to inject a breakpoint into data"
Per Rust's theoretical roots and inspiring influences, Rust enum type constructors are meant to be functional and only to wrap and create type-tagged data.
In other words, Rust doesn't really want to allow you to "inject" arbitrary logic into type constructors (unlike C++, which has a whole slew of semantics regarding side effects in constructors, such as throwing exceptions, etc.).
They want to make injecting a breakpoint into Err(T) sort of like injecting a breakpoint into 1 or as i32. Err(T) is more of a "data primitive" rather than a "transforming function/computation" like if you were to call foo(123).
In Code: why it's probably hard to inject a breakpoint in Err().
Let's start by looking at the definition of Err(T) itself.
The Definition of std::result::Result::Err()
Here's is where you can find the definition of Err() directly from rust-lang/rust/library/core/src/result.rs # v1.63.0 on GitHub:
// `Result` is a type that represents either success ([`Ok`]) or failure ([`Err`]).
///
/// See the [module documentation](self) for details.
#[derive(Copy, PartialEq, PartialOrd, Eq, Ord, Debug, Hash)]
#[must_use = "this `Result` may be an `Err` variant, which should be handled"]
#[rustc_diagnostic_item = "Result"]
#[stable(feature = "rust1", since = "1.0.0")]
pub enum Result<T, E> {
/// Contains the success value
#[lang = "Ok"]
#[stable(feature = "rust1", since = "1.0.0")]
Ok(#[stable(feature = "rust1", since = "1.0.0")] T),
/// Contains the error value
#[lang = "Err"]
#[stable(feature = "rust1", since = "1.0.0")]
Err(#[stable(feature = "rust1", since = "1.0.0")] E),
}
Err() is just a sub-case of the greater enum std::result::Result<T, E> -- and this means that Err() is not a function, but more of like a "data tagging constructor".
Err(T) in assembly is meant to be optimized out completely
Let's use Godbolt to breakdown usage of std::result::Result::<T, E>::Err(E): https://rust.godbolt.org/z/oocqGj5cd
// Type your code here, or load an example.
pub fn swap_err_ok(r: Result<i32, i32>) -> Result<i32, i32> {
let swapped = match r {
Ok(i) => Err(i),
Err(e) => Ok(e),
};
return swapped;
}
example::swap_err_ok:
sub rsp, 16
mov dword ptr [rsp], edi
mov dword ptr [rsp + 4], esi
mov eax, dword ptr [rsp]
test rax, rax
je .LBB0_2
jmp .LBB0_5
.LBB0_5:
jmp .LBB0_3
ud2
.LBB0_2:
mov eax, dword ptr [rsp + 4]
mov dword ptr [rsp + 12], eax
mov dword ptr [rsp + 8], 1
jmp .LBB0_4
.LBB0_3:
mov eax, dword ptr [rsp + 4]
mov dword ptr [rsp + 12], eax
mov dword ptr [rsp + 8], 0
.LBB0_4:
mov eax, dword ptr [rsp + 8]
mov edx, dword ptr [rsp + 12]
add rsp, 16
ret
Here is the (unoptimized) assembly code that corresponds to the line Ok(i) => Err(i), that constructs the Err:
mov dword ptr [rsp + 12], eax
mov dword ptr [rsp + 8], 1
and Err(e) is basically optimized out if you optimized with -C optlevel=3:
example::swap_err_ok:
mov edx, esi
xor eax, eax
test edi, edi
sete al
ret
Unlike in C++, where C++ leaves room to allow for injection of arbitrary logic in constructors and to even to represent actions like locking a mutex, Rust discourages this in the name of optimization.
Rust is designed to discourage inserting computation in type constructor calls -- and, in fact, if there is no computation associated with a constructor, it should have no operational value or action at the machine-instruction level.
Is there any way this is possible?
If you're still here, you really want a way to do this even though it goes against Rust's philosophy.
"...And besides, how hard can it be? If gcc and MSVC can instrument ALL functions with tracing at the compiler-level, can't rustc do the same?..."
I answered a related StackOverflow question like this in the past: How to build a graph of specific function calls?
In general, you have 2 strategies:
Instrument your application with some sort of logging/tracing framework, and then try to replicate some sort of tracing mixin-like functionality to apply global/local tracing depending on which parts of code you apply the mixins.
Recompile your code with some sort of tracing instrumentation feature enabled for your compiler or runtime, and then use the associated tracing compiler/runtime-specific tools/frameworks to transform/sift through the data.
For 1, this will require you to manually insert more code or something like _penter/_pexit for MSVC manually or create some sort of ScopedLogger that would (hopefully!) log async to some external file/stream/process. This is not necessarily a bad thing, as having a separate process control the trace tracking would probably be better in the case where the traced process crashes. Regardless, you'd probably have to refactor your code since C++ does not have great first-class support for metaprogramming to refactor/instrument code at a module/global level. However, this is not an uncommon pattern anyways for larger applications; for example, AWS X-Ray is an example of a commercial tracing service (though, typically, I believe it fits the use case of tracing network calls and RPC calls rather than in-process function calls).
For 2, you can try something like utrace or something compiler-specific: MSVC has various tools like Performance Explorer, LLVM has XRay, GCC has gprof. You essentially compile in a sort of "debug++" mode or there is some special OS/hardware/compiler magic to automatically insert tracing instructions or markers that help the runtime trace your desired code. These tracing-enabled programs/runtimes typically emit to some sort of unique tracing format that must then be read by a unique tracing format reader.
However, because Err(T) is a [data]type constructor and not really a first-class fn, this means that Err(T) will most likely NOT be instrumented like a usual fn call. Usually compilers with some sort of "instrumentation mode" will only inject "instrumentation code" at function-call boundaries, but not at data-creation points generically.
What about replacing std:: with an instrumented version such that I can instrument std::result::Result<T, E> itself? Can't I just link-in something?
Well, Err(T) simply does not represent any logical computation except the creation of a value, and so, there is no fn or function pointer to really replace or switch-out by replacing the standard library. It's not really part of the surface language-level interface of Rust to do something like this.
So now what?
If you really specifically need this, you would want a custom compiler flag or mode to inject custom instrumentation code every-time you construct an Err(T) data type -- and you would have to rebuild every piece of Rust code where you want it.
Possible Options
Do a text-replace or macro-replacement to turn every usage of /Err(.*)/ in your application code that you want to instrument into your own macro or fn call (to represent computation in the way Rust wants), and inject your own type of instrumentation (probably using either log or tracing crates).
Find or ask for a custom instrumentation flag on rustc that can generate specific assembly/machine-code to instrument per every usage of Err(T).

Yes, it is possible to break execution when an Err() value is created. This can be done by using the debugger to break on the Err() function, and then inspecting the stack trace to find the point at which the Err() value was created.

Rust seems to allocate the same space in memory for an array of booleans as an array of 8 bit integers

Running this code in rust:
fn main() {
println!("{:?}", std::mem::size_of::<[u8; 1024]>());
println!("{:?}", std::mem::size_of::<[bool; 1024]>());
}
1024
1024
This is not what I expected. So I compiled and ran in release mode. But I got the same answer.
Why does the rust compiler seemingly allocate a whole byte for each single boolean? To me it seems to be a simple optimization to only allocate 128 bytes instead. This project implies I'm not the first to think this.
Is this a case of compilers being way harder than the seem? Or is this not optimized because it isn't a realistic scenario? Or am I not understanding something here?

Pointers and references.
There is an assumption that you can always take a reference to an item of a slice, a field of a struct, etc...
There is an assumption in the language that any reference to an instance of a statically sized type can transmuted to a type-erased pointer *mut ().
Those two assumptions together mean that:
due to (2), it is not possible to create a "bit-reference" which would allow sub-byte addressing,
due to (1), it is not possible not to have references.
This essentially means that any type must have a minimum alignment of one byte.
Note that this is not necessarily an issue. Opting in to a 128 bytes representation should be done cautiously, as it implies trading off speed (and convenience) for memory. It's not a pure win.
Prior art (in the name of std::vector<bool> in C++) is widely considered a mistake in hindsight.

Can Rust optimise away the bit-wise copy during move of an object someday?

Consider the snippet
struct Foo {
dummy: [u8; 65536],
}
fn bar(foo: Foo) {
println!("{:p}", &foo)
}
fn main() {
let o = Foo { dummy: [42u8; 65536] };
println!("{:p}", &o);
bar(o);
}
A typical result of the program is
0x7fffc1239890
0x7fffc1229890
where the addresses are different.
Apparently, the large array dummy has been copied, as expected in the compiler's move implementation. Unfortunately, this can have non-trivial performance impact, as dummy is a very large array. This impact can force people to choose passing argument by reference instead, even when the function actually "consumes" the argument conceptually.
Since Foo does not derive Copy, object o is moved. Since Rust forbids the access of moved object, what is preventing bar to "reuse" the original object o, forcing the compiler to generate a potentially expensive bit-wise copy? Is there a fundamental difficulty, or will we see the compiler someday optimise away this bit-wise copy?

Given that in Rust (unlike C or C++) the address of a value is not considered to matter, there is nothing in terms of language that prevents the elision of the copy.
However, today rustc does not optimize anything: all optimizations are delegated to LLVM, and it seems you have hit a limitation of the LLVM optimizer here (it's unclear whether this limitation is due to LLVM being close to C's semantics or is just an omission).
So, there are two avenues of improving code generation for this:
teaching LLVM to perform this optimization (if possible)
teaching rustc to perform this optimization (optimization passes are coming to rustc now that it has MIR)
but for now you might simply want to avoid such large objects from being allocated on the stack, you can Box it for example.

How to correctly Initialize the value of an array to zero?

I had the following code in VC++ 2010:
PWCHAR pszErrorMessage = new WCHAR[100];
This code initializes a char pointer to an array. But the values in the array are garbage. I wanted a way to set the values to zero.
Changing the above to the one below sets all the values in the array to zero. This works for arrays of custom structures too.
PWCHAR pszErrorMessage = new WCHAR[100]();
Is this correct?
Does this have any performance implications?
Has this type of array initialization been part of VC++ 2005?
Which method is been internally called to set the values of the struct in the array to zero?

As noted elsewhere, yes, the parentheses force value initialization, which means arithmetic types will be initialized to zero (and pointers to null pointers, etc.) For types that explicitly define default constructors that initialize the members, this won't make any difference--for them, the default constructor will be invoked whether the parentheses are included or not.
Yes, this can have some (minor) performance implication: initializing the memory can take some time, especially if you're allocating a large amount. It doesn't always though: if you were allocating an object type with a default ctor that initialized its members, then that ctor would be used either way.
This feature was added in the C++03 standard. Offhand, I don't recall whether it was implemented in VC++ 2005 or not. I tried to do a quick scan through the VC++ developers blog, but that post-dates the release of VC++ 2005. It does include some information about VC++ 2005 SP1, which doesn't seem to mention it.
At least when I've looked at the code produced, the code to zero the allocated buffer seemed to be allocated in-line, at least for simple types like char and such. For example:
xor eax, eax
mov rcx, QWORD PTR $T86268[rsp]
rep stosb

Reduce assembly number of instructions

I want to reduce (manually) the number of instructions from a Linux assembly file. This will be basically done by searching predefined reductions in an abstract syntax tree.
For example:
pushl <reg1>
popl <reg1>
Will be deleted because it makes no sense.
Or:
pushl <something1>
popl <something2>
Will become:
movl <something1>, <something2>
I'm looking for other optimizations that involve a fixed number of instructions. I don't want to search dynamic ranges of instructions.
Could you suggest other similar patterns that can be replaced with less instructions?
Later Edit: Found out, thanks to Richard Pennington, that what I want is peephole optimization.
So I rephrase the question as: suggestions for peephole optimization on Linux assembly code.

Compilers already do such optimizations. Besides, it's not that straightforward decision to make such optimizations, because:
push reg1
pop reg1
Still leaves value of reg1 at memory location [sp-nn] (where nn = size of reg1 in bytes). So although sp is past it, the code after can assume [sp-nn] contains the value of reg1.
The same applies to other optimization as well:
push some1
pop some2
And that is usually emitted only when there is no equivalent movl some1, some2 instruction.
If you're trying to optimize a high-level compiler generated code, compilers usually take most of those cases into account. If you're trying to optimize natively written assembly code, then an assembly programmer should write even better code.
I would suggest you to optimize the compiler, rather than optimizing the assembly code, it would provide you a better framework for dealing with intent of the code and register usage etc.

To get more information about what you are trying to do, you might want to look for "peephole optimization".

pushl <something1>
popl <something2>
replaced with
mov <something1>, <something2>
actually increased the size of my program. Weird!
Could you provide some other possible peephole optimizations?

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Why does Rust have both call by value and call by reference? - rust

Related

How to break on Err creation

Rust seems to allocate the same space in memory for an array of booleans as an array of 8 bit integers

Can Rust optimise away the bit-wise copy during move of an object someday?

How to correctly Initialize the value of an array to zero?

Reduce assembly number of instructions

Categories

Resources