When to use zero-variant enum over unit-like struct - rust

If I understand correctly rust unit-like structs can be used like say atoms in Erlang.
But I don't understand what zero-variant enums provide. Could someone explain what the main purpose of zero-variant enums is and in which cases they might be favored over unit-like structs?

One usage for zero-variants enum is to express unreachable code. For example, an infallible TryFrom or FromStr. This commonly occurs when using generics (here's an example: OnceCell has both get_or_init() and get_or_try_init() methods. To save code duplication, the get_or_init() method calls get_or_try_init(). However, without using empty enums, this would incur cost at runtime because of the panic for the impossible Err case if the get_or_try_init() call isn't inlined). This is intended to be replaced by the never type once stabilized. Using empty enums has two advantages over unit structs:
They cannot be constructed by mistake.
They can hint the optimizer that this code is unreachable and allow it to remove it. They can also help the developers avoid panics in the code, because an infallible enum may be converted into the never type by match value {}, and the never type may be coerced into any other type. An example is in the once_cell code above.
Another usage is in generics, when you need only a type and not value, for example in the Strategy pattern at compile time, some people prefer zero variants enums to express that this types are not meant to be instantiated.

Related

Why do I need "use rand::Rng" to call gen() on rand::thread_rng()?

When I'm using Rust's rand crate, if I want to produce a rand number, I would write:
use rand::{self, Rng};
let rand = rand::thread_rng().gen::<usize>();
If I don't use rand::Rng, an error occurs:
no method named gen found for struct rand::prelude::ThreadRng in the current scope
That's quite different from what I'm used to. Usually I treat mods like:
import rand from "path";
rand.generate();
Once I import the mod I don't need to import something else, and I can use every method it exports.
Why must I use rand::Rng to enable the gen method on rand::thread_rng()?
That's quite different from what I used to know.
It feels different because it is indeed different. You are probably used to dynamic dispatch via some kind of virtual method table (as in e.g. C++), or, in case of JS, to dynamic dispatch by looking up either the own properties of the receiver object, or its ancestors via the __proto__-chain. In any case, the object on which you are invoking a method carries around some data that tells it how to get the method that you're invoking. Given the signature of the invoked method, the receiver object itself knows how to get the method with that signature.
That's not the only way, though. For example,
modules / functors in OCaml or SML
Typeclasses in Haskell
implicits / givens in Scala
traits in Rust
work on a rather different principle: the methods are not tied to the receiver, but to the module / typeclass / given / trait instances. In each case, those are entities that are separate from the receiver of the method call. It opens some new possibilities, e.g. it allows you to do some ad-hoc polymorphism (i.e. to define instances of traits after the fact, for types that are not necessarily under your control). At the same time, the compiler typically requires a bit more information from you in order to be able to select the correct instances: it behaves somewhat like a little type-directed search engine, or even a little "theorem prover", and for this to work, you have to tell the compiler where to look for the suitable building blocks for those synthetically generated instances.
If you've never worked before with any language that has a compiler with a subsystem that is "searching for instances" based on type information, this should indeed feel quite foreign. The error messages and the solution approaches do indeed feel rather different, because instead of comparing your implementation against an interface and looking for conflicts, you have to guide this instance-searching mechanism by providing more hints (e.g. by importing more traits etc.).
In your particular case, rand::thread_rng returns a struct ThreadRng. On its own, the struct knows nothing about the gen method, because this method is not tied directly to the struct. Instead, it's defined in the Rng trait. But at the same time, it could be defined in some entirely unrelated trait, and have some completely different meaning. In order to disambiguate the intended meaning, you therefore have to explicitly specify that you want to work with the Rng trait. This is why you have to mention it in the use-clause.
I don't know the specific library you're using, but I can guess at the problem. I would guess that Rng is a trait which defines gen. Traits can be thought of as somewhat like Java's interfaces: they enable ad-hoc polymorphism by allowing you to define different behaviors for the same function on different datatypes.
However, Rust's traits fix one major problem (well, they fix several major problems, but one that's relevant here) with Java's interfaces. In Java, if you define an interface, then anyone writing a class can implement the interface, but you can't implement it for other people. In particular, the built-in types String and int and the like can never implement any new interfaces downstream. In Rust, either the trait writer or the struct/enum writer can implement the trait.
But this poses another issue. Now, if I have a value foo of type Foo and I write foo.bar(), then bar might not be a method defined on Foo; it might be something some trait writer implemented in some other file. We can't go search every Rust file on your computer for possible matching traits, so Rust makes the logical decision to restrict this search to traits that are in scope. If you want to call foo.bar() and bar is a method on trait Bar, then trait Bar has to be in scope when you call it. Otherwise, Rust won't see it.
So, in your case, thread_rng() returns a rand::prelude::ThreadRng. The method gen is not defined on rand::prelude::ThreadRng. Instead, it's defined on a trait called rand::Rng which is *implemented by ThreadRng. That trait has to be in-scope to use the method.

Is it idiomatic to panic in From implementations?

The documentation at https://doc.rust-lang.org/std/convert/trait.From.html states
Note: This trait must not fail. If the conversion can fail, use TryFrom.
Suppose I have a From implementation thus:
impl From<SomeStruct> for http::Uri {
fn from(item: SomeStruct) -> http::Uri {
item.uri.parse::<http::Uri>() // can fail
}
}
Further suppose I am completely certain that item.uri.parse will succeed. Is it idiomatic to panic in this scenario? Say, with:
item.uri.parse::<http::Uri>().unwrap()
In this particular case, it appears there's no way to construct an HTTP URI at compile time: https://docs.rs/http/0.2.5/src/http/uri/mod.rs.html#117. In the real scenario .uri is an associated const, so I can test all used values parse. But it seems to me there could be other scenarios when the author is confident in the infallibility of a piece of code, particularly when that confidence can be encoded in tests, and would therefore prefer the ergonomics of From over TryFrom. The Rust compiler, typically quite strict, doesn't prevent this behaviour, though it seems it perhaps could. This makes me think this is a decision the author has been deliberately allowed to make. So the question is asking: what do people tend to do in this situation?
So in general, traits only enforce that the implementors adhere to the signatures and types as laid out in the trait. At least that's what the compiler enforces.
On top of that, there are certain contracts that traits are expected to adhere to just so that there's no weird surprises by those who work with these traits. These contracts aren't checked by the compiler; that would be quite difficult.
Nothing prevents you from implementing all a trait's methods but in way that's totally unrelated to what the trait is all about, like implementing the Display trait but then in the fmt method not actually bothering to use write! and instead, I don't know, delete the user's home directory.
Now back to your specific case. If your from method will not fail, provably so, then of course you can use .unwrap. The point of the cannot fail contract for the From trait is that those who rely on the From trait want to be able to assume that the conversion will go through every time. If you actually panic in your own implementation of from, it means the conversion sometimes doesn't go through, counter to the ideas and contracts in the From trait.

Can #[inline] be used in both trait method declarations and implementations?

I have a trait with some small methods, which are generally implemented as one-line wrappers around other methods that the implementing structs have. If I want to make sure that the trait method is inlined, should I place #[inline(always)] inside the trait definition, or inside the impl for each struct? I'd prefer to simply put it in the trait definition, but as far as I can tell that doesn't work.
What does inline mean?
When a compiler inlines a call, it copies the body of the function at the call site. Essentially, it's as if the code had been copy/pasted at each call site where it's inlined.
What does #[inline(always)] mean?
This instructs the compiler to perform inlining, always.
Normally, the compiler performs inlining when:
the body of the function is known
the set of heuristics estimate that this is a good trade-off (it might not be, though) which notably depends on the size of the function body
Why can I not specify #[inline(always)] on a trait method?
Because there is no body.
This may sounds trite, I know, however this is nonetheless true.
In Rust, traits may be used in two ways:
as bounds, for generic parameters
as runtime interfaces, aka trait objects
When used as a trait object, there is literally no body: the function to be called is determined at runtime!
Now, there are specific optimizations (devirtualizations) where the compiler attempts to divine or track the actual dynamic type of variables to be able to avoid the dynamic dispatch. I've even seen partial devirtualization in GCC where the compiler computes a likeliness of each type and creates an if ladder for the sufficiently likely one (if A { A::call(x); } else if B { B::call(x); } else { x.call(); }). However those are not guaranteed to succeed, of course.
So, what would be the semantics of #[inline(always)] on a virtual call? Should the compiler just ignore the attribute silently (uh!)?
It seems to me that what you are looking for is a new attribute (require(inline(always))?) to enforce specific constraints on the implementations of trait methods.
As far as I know, this does not exist yet.

Are Rust references (usually) Voldemort types?

Voldemort – he who must not be named – types are types whose names are impossible to write down in the source code. In Rust, closures have such types, because the compiler generates a new internal type for each closure. The only way to accept a closure as function argument is to accept a generic type (usually called F) which is bounded to be an Fn() (or similar) trait.
References in Rust always contain a lifetime parameter, even if this lifetime can usually be omitted. Lifetimes can't be named explicitly, because they represent some complex compiler-internal scope of some kind. The only way to interact with lifetimes is to use a generic parameter (usually called 'a) which stands for any lifetime (maybe bounded by another lifetime). Of course, there is 'static which can be named, but this is a special case and doesn't conflict with my arguing.
So: are Rust references Voldemort types? Or do I misunderstand the term “Voldemort type” or Rust references?
As someone without any particularly strong knowledge in the area:
I think the answer is probably: technically yes, but it's overly reductive. A bit like saying "all types are arrays of integers"; I mean, yes, but you're losing some useful semantic discrimination by doing that.
Voldemort types are usually to hide the implementation type from the user, either because it's only supposed to be a temporary, or you're not supposed to use anything but the interface described by the function. References are technically unnameable in their entirety, but it's not like it ever actually restricts you. I mean, even if you could name the specific lifetime, I don't think you could do anything meaningful with it (except possibly for slightly stricter lifetime checking within a function).
Arguably no. Are the types of references and pointers in all languages considered Voldemort types? They hide something, but no.
We envision lifetimes as being regions of code outside the called function. Also, they're created roughly like that in rustc. Yet, I'd argue function signatures are the type definition of the lifetimes we actually see. And rustc is merely satisfying them. There is nothing more to the named lifetimes than what you see in the function definition.

Multiple specialization, iterator patterns in Rust

Learning Rust (yay!) and I'm trying to understand the intended idiomatic programming required for certain iterator patterns, while scoring top performance. Note: not Rust's Iterator trait, just a method I've written accepting a closure and applying it to some data I'm pulling off of disk / out of memory.
I was delighted to see that Rust (+LLVM?) took an iterator I had written for sparse matrix entries, and a closure for doing sparse matrix vector multiplication, written as
iterator.map_edges({ |x, y| dst[y] += src[x] });
and inlined the closure's body in the generated code. It went quite fast. :D
If I create two of these iterators, or use the first a second time (not a correctness issue) each instance slows down quite a lot (about 2x in this case), presumably because the optimizer no longer chooses to do specialization because of the multiple call sites, and you end up doing a function call for each element.
I'm trying to understand if there are idiomatic patterns that keep the pleasant experience above (I like it, at least) without sacrificing the performance. My options seem to be (none satisfying this constraint):
Accept dodgy performance (2x slower is not fatal, but no prizes either).
Ask the user to supply a batch-oriented closure, so acting on an iterator over a small batch of data. This exposes a bit much of the internals of the iterator (the data are compressed nicely, and the user needs to know how to unwrap them, or the iterator needs to stage an unwrapped batch in memory).
Make map_edges generic in a type implementing a hypothetical EdgeMapClosure trait, and ask the user to implement such a type for each closure they want to inline. Not tested, but I would guess this exposes distinct methods to LLVM, each of which get nicely inlined. Downside is that the user has to write their own closure (packing relevant state up, etc).
Horrible hacks, like make distinct methods map_edges0, map_edges1, ... . Or add a generic parameter the programmer can use to make the methods distinct, but which is otherwise ignored.
Non-solutions include "just use for pair in iterator.iter() { /* */ }"; this is prep work for a data/task-parallel platform, and I would like to be able to capture/move these closures to work threads rather than capturing the main thread's execution. Maybe the pattern I should be using is to write the above, put it in a lambda/closure, and ship it around instead?
In a perfect world, it would be great to have a pattern which causes each occurrence of map_edges in the source file to result in different specialized methods in the binary, without forcing the entire project to be optimized at some scary level. I'm coming out of an unpleasant relationship with managed languages and JITs where generics would be the only way (I know of) to get this to happen, but Rust and LLVM seem magical enough that I thought there might be a good way. How do Rust's iterators handle this to inline their closure bodies? Or don't they (they should!)?
It seems that the problem is resolved by Rust's new approach to closures outlined at
http://smallcultfollowing.com/babysteps/blog/2014/11/26/purging-proc/
In short, Option 3 above (make functions generic with respect to a new closure type) is now transparently implemented when you make an implementation generic using the new closure traits. Rust produces the type behind the scenes for you.

Resources