Is it idiomatic to panic in From implementations?

The documentation at https://doc.rust-lang.org/std/convert/trait.From.html states
Note: This trait must not fail. If the conversion can fail, use TryFrom.
Suppose I have a From implementation thus:
impl From<SomeStruct> for http::Uri {
    fn from(item: SomeStruct) -> http::Uri {
        item.uri.parse::<http::Uri>() // can fail
    }
}
Further suppose I am completely certain that item.uri.parse will succeed. Is it idiomatic to panic in this scenario? Say, with:
item.uri.parse::<http::Uri>().unwrap()
In this particular case, it appears there's no way to construct an HTTP URI at compile time: https://docs.rs/http/0.2.5/src/http/uri/mod.rs.html#117. In the real scenario .uri is an associated const, so I can test all used values parse. But it seems to me there could be other scenarios when the author is confident in the infallibility of a piece of code, particularly when that confidence can be encoded in tests, and would therefore prefer the ergonomics of From over TryFrom. The Rust compiler, typically quite strict, doesn't prevent this behaviour, though it seems it perhaps could. This makes me think this is a decision the author has been deliberately allowed to make. So the question is asking: what do people tend to do in this situation?
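For the associated-const case, a test along these lines is one way to encode that confidence. This is only a sketch: the constant values and module layout are made up, and the http crate is assumed as a dependency.

#[cfg(test)]
mod tests {
    // Hypothetical constants standing in for the real associated consts.
    const URIS: &[&str] = &["https://example.com/a", "https://example.com/b"];

    #[test]
    fn all_uri_constants_parse() {
        for uri in URIS {
            uri.parse::<http::Uri>().expect("URI constant must parse");
        }
    }
}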

So in general, traits only enforce that the implementors adhere to the signatures and types as laid out in the trait. At least that's what the compiler enforces.
On top of that, there are certain contracts that traits are expected to adhere to, simply so that there are no weird surprises for those who work with these traits. These contracts aren't checked by the compiler; that would be quite difficult.
Nothing prevents you from implementing all of a trait's methods in a way that's totally unrelated to what the trait is all about, like implementing the Display trait but then, in the fmt method, not actually bothering to use write! and instead, I don't know, deleting the user's home directory.
Now back to your specific case. If your from method provably cannot fail, then of course you can use .unwrap(). The point of the cannot-fail contract for the From trait is that those who rely on From want to be able to assume the conversion will go through every time. If your own implementation of from actually panics, it means the conversion sometimes doesn't go through, which runs counter to the ideas and contracts behind the From trait.
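A hedged sketch of the two options (SomeStruct follows the question; its field type and the error type are assumptions, and the http crate is assumed as a dependency). Note that the two impls cannot coexist, because a From impl already provides TryFrom through the blanket impl in std:

struct SomeStruct {
    uri: &'static str,
}

// Option 1: keep From, make the assumed invariant loud, and panic with a
// clear message if it is ever violated.
impl From<SomeStruct> for http::Uri {
    fn from(item: SomeStruct) -> http::Uri {
        item.uri
            .parse::<http::Uri>()
            .expect("SomeStruct::uri is validated by tests and must always parse")
    }
}

// Option 2 (instead of Option 1, not alongside it): make the fallibility
// explicit with TryFrom and let the caller decide how to handle the error.
//
// impl TryFrom<SomeStruct> for http::Uri {
//     type Error = http::uri::InvalidUri;
//
//     fn try_from(item: SomeStruct) -> Result<http::Uri, Self::Error> {
//         item.uri.parse::<http::Uri>()
//     }
// }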

Related

When to use zero-variant enum over unit-like struct

If I understand correctly, Rust's unit-like structs can be used like, say, atoms in Erlang.
But I don't understand what zero-variant enums provide. Could someone explain what the main purpose of zero-variant enums is and in which cases they might be favored over unit-like structs?
One use of zero-variant enums is to express unreachable code, for example an infallible TryFrom or FromStr. This commonly occurs when using generics (here's an example: OnceCell has both get_or_init() and get_or_try_init() methods. To save code duplication, get_or_init() calls get_or_try_init(). However, without using empty enums, this would incur a runtime cost from the panic for the impossible Err case if the get_or_try_init() call isn't inlined). This is intended to be replaced by the never type once it is stabilized. Using empty enums has two advantages over unit structs:
They cannot be constructed by mistake.
They can hint to the optimizer that the code is unreachable and allow it to be removed. They can also help developers avoid panics, because an infallible enum can be converted into the never type by match value {}, and the never type can be coerced into any other type. An example is in the once_cell code above.
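A minimal sketch of that pattern (Never and Wrapper are names invented for illustration):

use std::convert::TryFrom;

// The error type has no values, so the conversion can never actually fail.
enum Never {}

struct Wrapper(u32);

impl TryFrom<u32> for Wrapper {
    type Error = Never;

    fn try_from(value: u32) -> Result<Wrapper, Never> {
        Ok(Wrapper(value))
    }
}

fn convert(value: u32) -> Wrapper {
    match Wrapper::try_from(value) {
        Ok(w) => w,
        // This arm can never be reached; matching on the empty enum
        // convinces the compiler of that without introducing a panic.
        Err(never) => match never {},
    }
}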
Another usage is in generics, when you need only a type and not a value, for example in a compile-time Strategy pattern; some people prefer zero-variant enums to express that these types are not meant to be instantiated.
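And a compile-time Strategy sketch (all names invented for illustration):

use std::cmp::Ordering;

// Zero-variant enums used purely as type-level strategy markers;
// they carry no values and are never instantiated.
enum Ascending {}
enum Descending {}

trait Order {
    fn cmp(a: i32, b: i32) -> Ordering;
}

impl Order for Ascending {
    fn cmp(a: i32, b: i32) -> Ordering {
        a.cmp(&b)
    }
}

impl Order for Descending {
    fn cmp(a: i32, b: i32) -> Ordering {
        b.cmp(&a)
    }
}

// The strategy is chosen at compile time through the type parameter alone.
fn sort_with<O: Order>(values: &mut [i32]) {
    values.sort_by(|a, b| O::cmp(*a, *b));
}

fn main() {
    let mut v = [3, 1, 2];
    sort_with::<Descending>(&mut v);
    assert_eq!(v, [3, 2, 1]);
}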

Is it safe and defined behavior to transmute between a T and an UnsafeCell<T>?

A recent question was looking for the ability to construct self-referential structures. In discussing possible answers for the question, one potential answer involved using an UnsafeCell for interior mutability and then "discarding" the mutability through a transmute.
Here's a small example of such an idea in action. I'm not deeply interested in the example itself, but it's just enough complication to require a bigger hammer like transmute as opposed to just using UnsafeCell::new and/or UnsafeCell::into_inner:
use std::{
    cell::UnsafeCell,
    mem,
    rc::{Rc, Weak},
};

// This is our real type.
struct ReallyImmutable {
    value: i32,
    myself: Weak<ReallyImmutable>,
}

fn initialize() -> Rc<ReallyImmutable> {
    // This mirrors ReallyImmutable but we use `UnsafeCell`
    // to perform some initial interior mutation.
    struct NotReallyImmutable {
        value: i32,
        myself: Weak<UnsafeCell<NotReallyImmutable>>,
    }

    let initial = NotReallyImmutable {
        value: 42,
        myself: Weak::new(),
    };

    // Without interior mutability, we couldn't update the `myself` field
    // after we've created the `Rc`.
    let second = Rc::new(UnsafeCell::new(initial));

    // Tie the recursive knot
    let new_myself = Rc::downgrade(&second);

    unsafe {
        // Should be safe as there can be no other accesses to this field
        (&mut *second.get()).myself = new_myself;

        // No one outside of this function needs the interior mutability
        // TODO: Is this call safe?
        mem::transmute(second)
    }
}

fn main() {
    let v = initialize();
    println!("{} -> {:?}", v.value, v.myself.upgrade().map(|v| v.value))
}
This code appears to print out what I'd expect, but that doesn't mean that it's safe or using defined semantics.
Is transmuting from a UnsafeCell<T> to a T memory safe? Does it invoke undefined behavior? What about transmuting in the opposite direction, from a T to an UnsafeCell<T>?
(I am still new to SO and not sure if "well, maybe" qualifies as an answer, but here you go. ;)
Disclaimer: The rules for these kinds of things are not (yet) set in stone. So, there is no definitive answer yet. I'm going to make some guesses based on (a) what kinds of compiler transformations LLVM does/we will eventually want to do, and (b) what kind of models I have in my head that would define the answer to this.
Also, I see two parts to this: the data layout perspective and the aliasing perspective. The layout issue is that NotReallyImmutable could, in principle, have a totally different layout than ReallyImmutable. I don't know much about data layout, but with UnsafeCell becoming repr(transparent) and that being the only difference between the two types, I think the intent is for this to work. You are, however, relying on repr(transparent) being "structural" in the sense that it should allow you to replace things in larger types, which I am not sure has been written down explicitly anywhere. Sounds like a proposal for a follow-up RFC that extends the repr(transparent) guarantees appropriately?
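As a quick sanity check on the layout side (this only demonstrates matching size and alignment for one type, it is not a layout guarantee by itself):

use std::cell::UnsafeCell;
use std::mem::{align_of, size_of};

fn main() {
    // UnsafeCell<T> is repr(transparent), so it matches T here.
    assert_eq!(size_of::<UnsafeCell<i32>>(), size_of::<i32>());
    assert_eq!(align_of::<UnsafeCell<i32>>(), align_of::<i32>());
}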
As far as aliasing is concerned, the issue is breaking the rules around &T. I'd say that, as long as you never have a live &T around anywhere when writing through the &UnsafeCell<T>, you are good -- but I don't think we can guarantee that quite yet. Let's look in more detail.
Compiler perspective
The relevant optimizations here are the ones that exploit &T being read-only. So if you reordered the last two lines (transmute and the assignment), that code would likely be UB as we may want the compiler to be able to "pre-fetch" the value behind the shared reference and re-use that value later (i.e. after inlining this).
But in your code, we would only emit "read-only" annotations (noalias in LLVM) after the transmute comes back, and the data is indeed read-only starting there. So, this should be good.
Memory models
The "most aggressive" of my memory models essentially asserts that all values are always valid, and I think even that model should be fine with your code. &UnsafeCell is a special case in that model where validity just stops, and nothing is said about what lives behind this reference. The moment the transmute returns, we grab the memory it points to and make it all read-only, and even if we did that "recursively" through the Rc (which my model doesn't, but only because I couldn't figure out a good way to make it do so) you'd be fine as you don't mutate any more after the transmute. (As you may have noticed, this is the same restriction as in the compiler perspective. The point of these models is to allow compiler optimizations, after all. ;)
(As a side-note, I really wish miri was in better shape right now. Seems I have to try and get validation to work again in there, because then I could tell you to just run your code in miri and it'd tell you if that version of my model is okay with what you are doing :D )
I am thinking about other models currently that only check things "on access", but haven't worked out the UnsafeCell story for that model yet. What this example shows is that the model may have to contain ways for a "phase transition" of memory first being UnsafeCell, but later having normal sharing with read-only guarantees. Thanks for bringing this up, that will make for some nice examples to think about!
So, I think I can say that (at least from my side) there is the intent to allow this kind of code, and doing so does not seem to prevent any optimizations. Whether we'll actually manage to find a model that everybody can agree with and that still allows this, I cannot predict.
The opposite direction: T -> UnsafeCell<T>
Now, this is more interesting. The problem is that, as I said above, you must not have a &T live when writing through an UnsafeCell<T>. But what does "live" mean here? That's a hard question! In some of my models, this could be as weak as "a reference of that type exists somewhere and the lifetime is still active", i.e., it could have nothing to do with whether the reference is actually used. (That's useful because it lets us do more optimizations, like moving a load out of a loop even if we cannot prove that the loop ever runs -- which would introduce a use of an otherwise unused reference.) And since &T is Copy, you cannot even really get rid of such a reference either. So, if you have x: &T, then after let y: &UnsafeCell<T> = transmute(x), the old x is still around and its lifetime still active, so writing through y could well be UB.
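To make that concrete, a sketch of the kind of code this rules out (shown as a warning, not a recipe):

use std::cell::UnsafeCell;
use std::mem;

fn opposite_direction(x: &i32) {
    let y: &UnsafeCell<i32> = unsafe { mem::transmute(x) };
    // `x` is still live (and `&i32` is Copy, so further copies may exist);
    // writing here while any `&i32` to the same memory is live is very
    // likely undefined behavior.
    unsafe { *y.get() = 0 };
}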
I think you'd have to somehow restrict the aliasing that &T allows, very carefully making sure that nobody still holds such a reference. I'm not going to say "this is impossible" because people keep surprising me (especially in this community ;) but TBH I cannot think of a way to make this work. I'd be curious if you have an example though where you think this is reasonable.

Can #[inline] be used in both trait method declarations and implementations?

I have a trait with some small methods, which are generally implemented as one-line wrappers around other methods that the implementing structs have. If I want to make sure that the trait method is inlined, should I place #[inline(always)] inside the trait definition, or inside the impl for each struct? I'd prefer to simply put it in the trait definition, but as far as I can tell that doesn't work.
What does inline mean?
When a compiler inlines a call, it copies the body of the function at the call site. Essentially, it's as if the code had been copy/pasted at each call site where it's inlined.
What does #[inline(always)] mean?
This instructs the compiler to perform inlining, always.
Normally, the compiler performs inlining when:
the body of the function is known
the set of heuristics estimate that this is a good trade-off (it might not be, though) which notably depends on the size of the function body
Why can I not specify #[inline(always)] on a trait method?
Because there is no body.
This may sound trite, I know; it is nonetheless true.
In Rust, traits may be used in two ways:
as bounds, for generic parameters
as runtime interfaces, aka trait objects
When used as a trait object, there is literally no body: the function to be called is determined at runtime!
Now, there are specific optimizations (devirtualization) where the compiler attempts to divine or track the actual dynamic type of variables to be able to avoid the dynamic dispatch. I've even seen partial devirtualization in GCC, where the compiler computes the likeliness of each type and creates an if ladder for the sufficiently likely ones (if A { A::call(x); } else if B { B::call(x); } else { x.call(); }). However, those are not guaranteed to succeed, of course.
So, what would be the semantics of #[inline(always)] on a virtual call? Should the compiler just ignore the attribute silently (uh!)?
It seems to me that what you are looking for is a new attribute (require(inline(always))?) to enforce specific constraints on the implementations of trait methods.
As far as I know, this does not exist yet.
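For what it's worth, the attribute can be placed on each concrete implementation today, where a body exists and static dispatch can inline it; a sketch with made-up names:

trait Greet {
    // No body here, so no inlining attribute can meaningfully apply.
    fn greet(&self) -> String;
}

struct Dog;

impl Greet for Dog {
    #[inline(always)] // placed on the implementation, where a body exists
    fn greet(&self) -> String {
        "woof".to_string()
    }
}

fn main() {
    // Static dispatch: the compiler knows the concrete type and can inline.
    let d = Dog;
    println!("{}", d.greet());

    // Dynamic dispatch: the callee is only known at runtime, so the
    // attribute cannot force inlining here.
    let g: &dyn Greet = &Dog;
    println!("{}", g.greet());
}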

Why is Default not implemented for Mutex, RWLock, CondVar, Duration?

The Default trait can be #[derive(..)]d only if the contents of the deriving type also implement Default. This means the trait gets easier to use the more it is implemented. However, I notice some types from std are missing implementations, although they have perfectly valid defaults (sometimes depending on generic params).
Mutex<T> and RWLock<T> could implement by new(_) (where T: Default)
CondVar could simply implement by CondVar::new()
Duration could derive (to get a zero duration, which is a sensible default)
Is there a technical reason for those omissions?
Some people have asked a similar question about Debug implementations, see "Missing Debug Implementations — #31869", which can also only be derived under similar conditions as Default.
Unfortunately, the corresponding PR "libcore: add Debug implementations to most missing types #32054" seems to suggest that some types were not Debug simply because no one had written a Debug implementation for them. Some other types are controversial with respect to what the implementation should do, and there is some fear about adding them to the standard library.
It is reasonable to assume that at least some types are not Default for the same non-technical reasons.
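As an aside, current versions of std do provide several of these impls (for example Default for Mutex<T: Default>). Where one is still missing, a local newtype is a common workaround; a sketch with made-up names:

use std::sync::Mutex;

// A local wrapper can supply the missing impl without waiting for std.
struct SharedCounter(Mutex<u64>);

impl Default for SharedCounter {
    fn default() -> Self {
        SharedCounter(Mutex::new(0))
    }
}

#[derive(Default)]
struct AppState {
    // Because SharedCounter: Default, the whole struct can now derive it.
    counter: SharedCounter,
    name: String,
}

fn main() {
    let state = AppState::default();
    assert_eq!(*state.counter.0.lock().unwrap(), 0);
    assert_eq!(state.name, "");
}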

Do Rust lifetimes influence the semantics of the compiled program?

I'm trying to grok lifetimes in Rust and asked myself whether they are "just" a safety measure (and a way to communicate how safety is ensured, or not, in the case of errors) or if there are cases where different choices of lifetimes actually change how the program runs, i.e. whether lifetimes make a semantic difference to the compiled program.
And with "lifetimes" I refer to all the pesky little 'a, 'b, 'static markers we include to make the borrow checker happy. Of course, writing
{
    let foo = File::open("foo.txt")?;
}
foo.write_all(b"bar");
instead of
let foo = File::open("foo.txt")?;
foo.write_all(b"bar");
will close the file descriptor before the write occurs, even if we could access foo afterwards, but that kind of scoping and destructor-calling also happens in C++.
No, lifetimes do not affect the generated machine code in any way. At the end of the day, it's all "just pointers" to the compiled code.
Because we are humans speaking a human language, we tend to lump two different but related concepts together: concrete lifetimes and generic lifetime parameters.
All programming languages have concrete lifetimes. They simply correspond to when a resource will be released. That's what your example shows, and indeed C++ works the same as Rust there. This is often known as Resource Acquisition Is Initialization (RAII). Garbage-collected languages have lifetimes too, but it can be harder to pin down exactly when they end.
What makes Rust neat in this area are the generic lifetime parameters, the things we know as 'a or 'static. These allow the compiler to track the underlying pointers so that the programmer doesn't need to worry if the pointer will remain valid long enough. This works for storing references in structs and passing them to and from functions.
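A small sketch of that last point (the struct is invented for illustration): the lifetime parameter only constrains what the borrow checker accepts, and it compiles away entirely.

// The 'a ties the reference stored in Holder to the data it borrows.
struct Holder<'a> {
    text: &'a str,
}

fn main() {
    let s = String::from("hello");
    let h = Holder { text: &s };
    // The compiler rejects any program where `h` outlives `s`, but the
    // annotation itself produces no code: at runtime `text` is just a pointer.
    println!("{}", h.text);
}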

Resources