When to use as_* vs to_* vs into_* in Rust?

My understanding, based on the standard library examples, is that:
The into_ convention is used when the function takes full ownership of its argument and returns another type, as in into_iter(). Is that understanding correct?
The real confusion is between as_ and to_.
It seems to_, as in to_owned(), takes a reference to a type and produces a new related type (like a type coercion), whereas to_string() takes a reference to a type and produces a new type (as in a type conversion).
But as_, as in as_ptr, also seems like type coercion. I couldn't find any examples of it beyond as_ptr or as_mut.
Can someone explain exactly when to use each naming convention, with a real-life example beyond what's used in the standard library?

There is a Naming section of the Rust API Guidelines that includes recommendations for "conversion" methods and shows a handy table:
Prefix   Cost        Ownership
as_      Free        borrowed -> borrowed
to_      Expensive   borrowed -> borrowed
                     borrowed -> owned (non-Copy types)
                     owned -> owned (Copy types)
into_    Variable    owned -> owned (non-Copy types)
The guidelines continue with examples like str::as_bytes(), str::to_lowercase(), String::into_bytes(), etc. and some other considerations for abstraction and mutability.
A quicker way to think about it:
if it consumes the data, use into_*
if it returns another "view" of the data, use as_*
otherwise use to_*
These rules are pretty much followed by the standard library and most ecosystem crates. As always though, these are more guidelines than actual rules. Conventions are helpful but don't need to have strict adherence.
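As an illustration outside the standard library, here is a minimal sketch of the three prefixes on a user-defined type. The Username newtype and its methods are hypothetical, made up only to show how the cost and ownership columns of the table map onto method signatures.

```rust
// Hypothetical newtype, used only to illustrate the naming convention.
pub struct Username(String);

impl Username {
    pub fn new(name: &str) -> Self {
        Username(name.to_owned())
    }

    // as_: free, borrowed -> borrowed. Just another view of the same data.
    pub fn as_str(&self) -> &str {
        &self.0
    }

    // to_: potentially expensive, borrowed -> owned. Allocates a new String.
    pub fn to_uppercase(&self) -> String {
        self.0.to_uppercase()
    }

    // into_: consumes self, owned -> owned. Hands over the inner String
    // without copying its contents.
    pub fn into_string(self) -> String {
        self.0
    }
}

fn main() {
    let user = Username::new("ferris");
    println!("{}", user.as_str());
    println!("{}", user.to_uppercase());

    let raw: String = user.into_string();
    // `user` was moved by `into_string` and can no longer be used here.
    println!("{raw}");
}
```

Note that only into_string takes self by value; the other two take &self, so the Username remains usable after calling them.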

Related

Rust multiple levels of indirections - Performance impact?

I've been experimenting with everyone's favorite new thing, rust. I'm coming from a C background, so I think this may be what is causing me issue here. I've tried to find a clear definitive answer online but could not. Here's my issue, simplified:
fn main() {
    let v: Vec<&str> = vec!["abc", "def"];
    let z: Vec<&&str> = v.iter().filter(|e| e.starts_with("a")).collect(); // type of `e` is &&&str!
    // do something with z
}
My issue is with the types of z and e (within the closure). I understand the need for the additional layer of reference for the ownership system. I also understand I can work around this using things like copied() or into_iter() (for z's type), or by using |&e|.
However, I am wondering if there's a performance impact here. Even if I used copied(), the type of e in |e| would still be a double reference. If pointers and references map 1:1, this seems like a waste. But references and pointers do not need (in this case at least) to map 1:1. The compiler could create a reference for the ownership rules without actually adding a pointer to dereference.
So here's my question: do multiple references translate 1:1 to pointers? Are there optimizations? In this case (e, for example), am I going through two levels of indirection?

Do multiple references translate 1:1 to pointers?
Yes.
But the compiler can optimize them out, as usual. With iterators, it usually does: once everything is inlined, the compiler can turn the iterator chain into a simple loop, so this is rarely a problem. If it is, you can always use copied(), or even switch to plain loops. This is part of Rust's zero-cost abstraction story.
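For comparison, here is a rough sketch of the three variants mentioned: keeping the extra reference, using copied(), and writing a plain loop. The function names are made up for this example, and whether the generated machine code actually differs depends on the optimizer, so treat it as a starting point for your own measurements rather than a performance claim.

```rust
// Keeps the extra layer of reference: the result borrows the slots of `v`.
fn filter_refs<'a, 'b>(v: &'b [&'a str]) -> Vec<&'b &'a str> {
    // Inside the closure `e` is `&&&str`; auto-deref finds `starts_with`.
    v.iter().filter(|e| e.starts_with('a')).collect()
}

// `copied()` turns each `&&str` item into a `&str`, so the result no longer
// points into `v`. The closure argument is still a reference to the item.
fn filter_copied<'a>(v: &[&'a str]) -> Vec<&'a str> {
    v.iter().copied().filter(|e| e.starts_with('a')).collect()
}

// The same logic as a plain loop, destructuring the reference with `&s`.
fn filter_loop<'a>(v: &[&'a str]) -> Vec<&'a str> {
    let mut out = Vec::new();
    for &s in v {
        if s.starts_with('a') {
            out.push(s);
        }
    }
    out
}

fn main() {
    let v = vec!["abc", "def"];
    println!("{:?}", filter_refs(&v));
    println!("{:?}", filter_copied(&v));
    println!("{:?}", filter_loop(&v));
}
```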

Rust Box vs non-box

Given a rust object, is it possible to wrap it so that multiple references and a mutable reference are allowed but do not cause problems?
For example, a Vec that has multiple references and a single mutable reference.

Yes, but...
The type you're looking for is RefCell, but read on before jumping the gun!
Rust is a single-ownership language. It always will be. It's exactly that feature that makes Rust as thread-safe and memory-safe as it is. You cannot fully circumvent this, short of wrapping your entire program in unsafe and using raw pointers exclusively, and if you're going to do that, just write C since you're no longer getting any benefits out of using Rust.
So, at any given moment in your program, there must either be one thing writing to this memory or several things reading. That's the fundamental law of single-ownership. Keep that in mind; you cannot get around that. What I'm about to say still follows that rule.
Usually, we enforce this with our type signatures. If I take a &T, then I'm just an alias and won't write to it. If I take a &mut T, then nobody else can see what I'm doing till I forfeit that reference. That's usually good enough, and if we can, we want to do it that way, since we get guarantees at compile-time.
But it doesn't always work that way. Sometimes we can't prove that what we're doing is okay. Sometimes I've got two functions holding an, ostensibly, mutable reference, but I know, due to some other guarantees Rust doesn't know about, that only one will be writing to it at a time. Enter RefCell. RefCell<T> contains a single T and pretends to be immutable but lets you borrow the thing inside either mutably or immutably with try_borrow_mut and try_borrow. When we call one of these functions, we get a reference-like value that can read (and write, in the mutable case) to the original data, even though we started with a &RefCell<T> that doesn't look mutable.
But the fundamental law still holds. Note that those try_* functions return a Result, i.e. they might fail. If two functions simultaneously try to get try_borrow_mut references, the second one will fail, and it's your job to deal with that eventuality (even if "deal with that" means panic! in your particular use case). All we've done is move the single-ownership rules from compile-time to runtime. We haven't gotten rid of them; we've just changed who's responsible for enforcing them.
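A small sketch of that runtime check, using the Vec from the question. The try_push helper is hypothetical; try_borrow, try_borrow_mut and BorrowMutError are the real std::cell items.

```rust
use std::cell::{BorrowMutError, RefCell};

// Hypothetical helper: appends to the vector only if nobody else currently
// holds a borrow, and reports the failure to the caller otherwise.
fn try_push(cell: &RefCell<Vec<i32>>, value: i32) -> Result<(), BorrowMutError> {
    let mut guard = cell.try_borrow_mut()?;
    guard.push(value);
    Ok(())
}

fn main() {
    let shared = RefCell::new(vec![1, 2, 3]);

    // Nothing is borrowed yet, so the write succeeds.
    assert!(try_push(&shared, 4).is_ok());

    {
        // Several readers at once are fine...
        let r1 = shared.try_borrow().unwrap();
        let r2 = shared.try_borrow().unwrap();
        assert_eq!(r1.len(), r2.len());

        // ...but a writer is refused while they are alive.
        assert!(try_push(&shared, 5).is_err());
    } // `r1` and `r2` are dropped here, releasing the borrows.

    // With the readers gone, writing works again.
    assert!(try_push(&shared, 5).is_ok());
    assert_eq!(*shared.borrow(), vec![1, 2, 3, 4, 5]);
}
```

Note that everything here goes through a shared &RefCell<Vec<i32>>; no &mut to the cell itself is ever needed.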

When should I use a reference instead of transferring ownership?

From the Rust book's chapter on ownership, non-copyable values can be passed to functions by either transferring ownership or by using a mutable or immutable reference. When you transfer ownership of a value, it can't be used in the original function anymore: you must return it back if you want to. When you pass a reference, you borrow the value and can still use it.
I come from languages where values are immutable by default (Haskell, Idris and the like). As such, I'd probably never think about using references at all. Having the same value in two places looks dangerous (or, at least, awkward) to me. Since references are a feature, there must be a reason to use them.
Are there situations I should force myself to use references? What are those situations and why are they beneficial? Or are they just for convenience and defaulting to passing ownership is fine?

Mutable references in particular look very dangerous.
They are not dangerous, because the Rust compiler will not let you do anything dangerous. If you have a &mut reference to a value then you cannot simultaneously have any other references to it.
In general you should pass references around. This saves copying memory and should be the default thing you do, unless you have a good reason to do otherwise.
Some good reasons to transfer ownership instead:
When the value's type is small, such as bool, u32, etc. It's often faster to move/copy these values and avoid a level of indirection. Usually these values implement Copy, and the compiler may make this optimisation for you automatically, something it's free to do because of a strong type system and immutability by default!
When the value's current owner is going to go out of scope, you may want to move the value somewhere else to keep it alive.
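To make the distinction concrete, here is a minimal sketch with one function per case. All four function names are invented for this example.

```rust
// Shared borrow: the caller keeps ownership and can keep using `words`.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

// Mutable borrow: modifies the caller's data in place. While this call is
// running, no other reference to `words` can exist.
fn shout(words: &mut [String]) {
    for w in words.iter_mut() {
        w.make_ascii_uppercase();
    }
}

// Ownership transfer: the vector is consumed and cannot be used afterwards.
fn join_all(words: Vec<String>) -> String {
    words.concat()
}

// Small Copy type: passing by value is simpler than passing a reference.
fn is_even(n: u32) -> bool {
    n % 2 == 0
}

fn main() {
    let mut words = vec![String::from("ab"), String::from("cde")];

    let n = total_len(&words); // `words` is still usable afterwards
    shout(&mut words);
    let joined = join_all(words); // `words` is moved here

    println!("{n} {joined} {}", is_even(4));
}
```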

Are Rust references (usually) Voldemort types?

Voldemort – he who must not be named – types are types whose names are impossible to write down in the source code. In Rust, closures have such types, because the compiler generates a new internal type for each closure. The only way to accept a closure as function argument is to accept a generic type (usually called F) which is bounded to be an Fn() (or similar) trait.
References in Rust always contain a lifetime parameter, even if this lifetime can usually be omitted. Lifetimes can't be named explicitly, because they represent some complex compiler-internal scope of some kind. The only way to interact with lifetimes is to use a generic parameter (usually called 'a) which stands for any lifetime (maybe bounded by another lifetime). Of course, there is 'static, which can be named, but this is a special case and doesn't conflict with my argument.
So: are Rust references Voldemort types? Or do I misunderstand the term “Voldemort type” or Rust references?

As someone without any particularly strong knowledge in the area:
I think the answer is probably: technically yes, but it's overly reductive. A bit like saying "all types are arrays of integers"; I mean, yes, but you're losing some useful semantic discrimination by doing that.
Voldemort types are usually to hide the implementation type from the user, either because it's only supposed to be a temporary, or you're not supposed to use anything but the interface described by the function. References are technically unnameable in their entirety, but it's not like it ever actually restricts you. I mean, even if you could name the specific lifetime, I don't think you could do anything meaningful with it (except possibly for slightly stricter lifetime checking within a function).

Arguably no. Are the types of references and pointers in all languages considered Voldemort types? They hide something, but no.
We envision lifetimes as being regions of code outside the called function. Also, they're created roughly like that in rustc. Yet, I'd argue function signatures are the type definition of the lifetimes we actually see. And rustc is merely satisfying them. There is nothing more to the named lifetimes than what you see in the function definition.
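For reference, the two mechanisms discussed in this thread look like this side by side. This is only an illustrative sketch; apply_twice and longer are made-up names.

```rust
// The closure's concrete type cannot be written down, so the function is
// generic over any `F` that implements the `Fn(i32) -> i32` trait.
fn apply_twice<F>(f: F, x: i32) -> i32
where
    F: Fn(i32) -> i32,
{
    f(f(x))
}

// The caller's concrete lifetimes are never spelled out either; `'a` stands
// for some region in which both arguments are valid.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() {
        a
    } else {
        b
    }
}

fn main() {
    let offset = 10;
    // This closure has an anonymous, compiler-generated type.
    let add_offset = |n: i32| n + offset;
    println!("{}", apply_twice(add_offset, 1));

    let owned = String::from("reference");
    println!("{}", longer(&owned, "type"));
}
```

In both cases the signature describes everything the function body is allowed to rely on, which is the point the second answer makes about lifetimes.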

What does the GHC source mean by "zonk"?

I'm working on a plugin for GHC, so I'm reading the documentation for some of its implementation.
The verb "to zonk" is all over the place, but I can't track down an explanation of what it means to zonk something or (in broad terms) when one might want to. I can find plenty of notes about complicated circumstances under which it is necessary to zonk or not to zonk something, but without a clue as to what the big picture is I am having a lot of trouble following.

An un-zonked type can have type variables which are mutable references filled in during unification (and this mutability is heavily used by the type checker to increase performance). Zonking traverses a type and replaces all mutable references with the type that they dereference to; thus, the resulting structure is immutable and requires no dereferencing to interpret.
Note that these type variables are meta-variables, i.e. they don't correspond to the type variables introduced by polymorphism; rather, they are unification variables to be replaced by real types. The choice of replacement is decided by the type checking/type inference process, and then the actual replacement is done during zonking.
This notion of zonking extends naturally to other intermediate representations of the typechecker that contain types.
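The idea can be sketched on a toy type representation. This is not GHC's actual code or data structures: it is written in Rust to match the other examples on this page, and the Type enum and zonk function below are assumptions made purely for illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Toy type representation (an assumption for this sketch, not GHC's).
#[derive(Clone, Debug, PartialEq)]
enum Type {
    Int,
    Fun(Box<Type>, Box<Type>),
    // Unification (meta) variable: a mutable cell that unification fills in.
    Meta(Rc<RefCell<Option<Type>>>),
}

// Traverse the type and replace every solved meta variable with the type it
// points to. Unsolved variables are left in place.
fn zonk(ty: &Type) -> Type {
    match ty {
        Type::Int => Type::Int,
        Type::Fun(arg, res) => Type::Fun(Box::new(zonk(arg)), Box::new(zonk(res))),
        Type::Meta(cell) => match &*cell.borrow() {
            Some(solved) => zonk(solved),
            None => ty.clone(),
        },
    }
}

fn main() {
    // A function type whose argument is a not-yet-known meta variable.
    let var = Rc::new(RefCell::new(None));
    let ty = Type::Fun(Box::new(Type::Meta(Rc::clone(&var))), Box::new(Type::Int));

    // "Unification" later decides that the variable stands for Int.
    *var.borrow_mut() = Some(Type::Int);

    // After zonking, the structure contains no mutable cells to chase.
    let zonked = zonk(&ty);
    assert_eq!(zonked, Type::Fun(Box::new(Type::Int), Box::new(Type::Int)));
    println!("{zonked:?}");
}
```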
