Is this mem::transmute::<&str, &'static str>(k) safe? - rust

I'm looking at this code which is a very simple library with just one file and mostly tests, so it's short. There's a struct that I'm trying to understand:
pub struct ChallengeFields(HashMap<UniCase<CowStr>, (String, Quote)>);
struct CowStr(Cow<'static, str>);
There's a line where it does
pub fn get(&self, k: &str) -> Option<&String> {
self.0
.get(&UniCase(CowStr(Cow::Borrowed(unsafe {
mem::transmute::<&str, &'static str>(k)
}))))
.map(|&(ref s, _)| s)
}
I'm annoyed by this unsafe operation. I think CowStr is a Cow with 'static lifetime otherwise it'd be hard or impossible to store strs inside the map. Because of that, when I try to get something inside this map, the str in question has to have 'static lifetime. This is the reason for the transmute, right? If so, why simply not use String, so we can get rid of lifetimes and thus transmute? I don't like unsafe, and reading about transmute it looks very unsafe.
Also, I don't see why Cow is needed at all.

I think CowStr is a Cow with 'static lifetime otherwise it'd be hard or impossible to store strs inside the map.
Well yes and no, you can store &'static str inside a hashmap with no issue, the problem is that you can't store both &'static str and String.
Am I rigth? If so, why simply not use String, so we can get rid of lifetimes and thus transmute?
I assume that is an optimisation: with String you'd have to create an allocation every time you want to insert a challenge in the map, but if the overwhelming majority of challenge names would be Digest and Basic then that's a waste of time (and memory but mostly time), but at the same time you'd have to support String for custom auth schemes.
Now maybe in the grand scheme of things this is not an optimisation which actually matter and it'd be better off not doing that, I couldn't tell you.
I don't like unsafe, and reading about transmute it looks very unsafe.
It's a debatable thing to do, but in this case it's "safe", in the sense that the reference is valid for the entirety of the HashMap::get call and we know that that call doesn't keep the reference alive (it's reliance on an implementation detail which is a bit risky, but the odds that that would change are basically nil as it wouldn't make much sense).
Extending lifetimes is not in-and-of-itself UB (the mem::transmute documentation literally provides an example doing that), but requires care as you must avoid it causing UBs (the most likely being a dangling reference).

Related

Is it safe to transmute::<&'a Arc<T>, &'a Weak<T>>(…)?

Is it safe to transmute a shared reference & to a strong Arc<T> into a shared reference & to a Weak<T>?
To ask another way: is the following safe function sound, or is it a vulnerability waiting to happen?
pub fn as_weak<'a, T>(strong: &'a Arc<T>) -> &'a Weak<T> {
unsafe { transmute::<&'a Arc<T>, &'a Weak<T>>(strong) }
}
Why I want to do this
We have an existing function that returns a &Weak<T>. The internal data structure has changed a bit, and I now have an Arc<T> where I previously had a Weak<T>, but I need to maintain semver compatibility with this function's interface. I'd rather avoid needing to stash an actual Weak<T> copy just for the sake of this function if I don't need to.
Why I hope this is safe
The underlying memory representations of Arc<T> and Weak<T> are the same: a not-null pointer (or pointer-like value for Weak::new()) to an internal ArcInner struct, which contains the strong and weak reference counts and the inner T value.
Arc<T> also contains a PhantomData<T>, but my understanding is that if that changes anything, it would only apply on drop, which isn't relevant for the case here as we're only transmuting a shared reference, not an owned value.
The operations that an Arc<T> will perform on its inner pointer are presumably a superset of those that may be performed by a Weak<T>, since they have the same representation but Arc carries a guarantee that the inner T value is still alive, while Weak does not.
Given these facts, it seems to me like nothing could go wrong. However, I haven't written much unsafe code before, and never for a production case like this. I'm not confident that I fully understand the possible issues. Is this transmutation safe and sound, or are are there other factors that need to be considered?
No, this is not sound.
Neither Arc nor Weak has a #[repr] forcing a particular layout, therefore they are both #[repr(Rust)] by default. According to the Rustonomicon section about repr(Rust):
struct A {
a: i32,
b: u64,
}
struct B {
a: i32,
b: u64,
}
Rust does guarantee that two instances of A have their data laid out in exactly the same way. However Rust does not currently guarantee that an instance of A has the same field ordering or padding as an instance of B.
You cannot therefore assume that Arc<T> and Weak<T> have the same layout.

Is allowing library users to embed arbitrary data in your structures a correct usage of std::mem::transmute?

A library I'm working on stores various data structures in a graph-like manner.
I'd like to let users store metadata ("annotations") in nodes, so they can retrieve them later. Currently, they have to create their own data structure which mirrors the library's, which is very inconvenient.
I'm placing very little constraints on what an annotation can be, because I do not know what the users will want to store in the future.
The rest of this question is about my current attempt at solving this use case, but I'm open to completely different implementations as well.
User annotations are represented with a trait:
pub trait Annotation {
fn some_important_method(&self)
}
This trait contains a few methods (all on &self) which are important for the domain, but these are always trivial to implement for users. The real data of an annotation implementation cannot be retrieved this way.
I can store a list of annotations this way:
pub struct Node {
// ...
annotations: Vec<Box<dyn Annotation>>,
}
I'd like to let the user retrieve whatever implementation they previously added to a list, something like this:
impl Node {
fn annotations_with_type<T>(&self) -> Vec<&T>
where
T: Annotation,
{
// ??
}
}
I originally aimed to convert dyn Annotation to dyn Any, then use downcast_ref, however trait upcasting coercion is unsable.
Another solution would be to require each Annotation implementation to store its TypeId, compare it with annotations_with_type's type parameter's TypeId, and std::mem::transmute the resulting &dyn Annotation to &T… but the documentation of transmute is quite scary and I honestly don't know whether that's one of the allowed cases in which it is safe. I definitely would have done some kind of void * in C.
Of course it's also possible that there's a third (safe) way to go through this. I'm open to suggestions.
What you are describing is commonly solved by TypeMaps, allowing a type to be associated with some data.
If you are open to using a library, you might consider looking into using an existing implementation, such as https://crates.io/crates/typemap_rev, to store data. For example:
struct MyAnnotation;
impl TypeMapKey for MyAnnotation {
type Value = String;
}
let mut map = TypeMap::new();
map.insert::<MyAnnotation>("Some Annotation");
If you are curious. It underlying uses a HashMap<TypeId, Box<(dyn Any + Send + Sync)>> to store the data. To retrieve data, it uses a downcast_ref on the Any type which is stable. This could also be a pattern to implement it yourself if needed.
You don't have to worry whether this is valid - because it doesn't compile (playground):
error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
--> src/main.rs:7:18
|
7 | _ = unsafe { std::mem::transmute::<&dyn Annotation, &i32>(&*v) };
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: source type: `&dyn Annotation` (128 bits)
= note: target type: `&i32` (64 bits)
The error message should be clear, I hope: &dyn Trait is a fat pointer, and has size 2*size_of::<usize>(). &T, on the other hand, is a thin pointer (as long as T: Sized), of size of only one usize, and you cannot transmute between types of different sizes.
You can work around that with transmute_copy(), but it will just make things worse: it will work, but it is unsound and is not guaranteed to work in any way. It may become UB in future Rust versions. This is because the only guaranteed thing (as of now) for &dyn Trait references is:
Pointers to unsized types are sized. The size and alignment is guaranteed to be at least equal to the size and alignment of a pointer.
Nothing guarantees the order of the fields. It can be (data_ptr, vtable_ptr) (as it is now, and thus transmute_copy() works) or (vtable_ptr, data_ptr). Nothing is even guaranteed about the contents. It can not contain a data pointer at all (though I doubt somebody will ever do something like that). transmute_copy() copies the data from the beginning, meaning that for the code to work the data pointer should be there and should be first (which it is). For the code to be sound this needs to be guaranteed (which is not).
So what can we do? Let's check how Any does its magic:
// SAFETY: caller guarantees that T is the correct type
unsafe { &*(self as *const dyn Any as *const T) }
So it uses as for the conversion. Does it work? Certainly. And that means std can do that, because std can do things that are not guaranteed and relying on how things work in practice. But we shouldn't. So, is it guaranteed?
I don't have a firm answer, but I'm pretty sure the answer is no. I have found no authoritative source that guarantees the behavior of casts from unsized to sized pointers.
Edit: #CAD97 pointed on Zulip that the reference promises that *[const|mut] T as *[const|mut V] where V: Sized will be a pointer-to-pointer case, and that can be read as a guarantee this will work.
But I still feel fine with relying on that. Because, unlike the transmute_copy(), people are doing it. In production. And there is no better way in stable. So the chance it will become undefined behavior is very low. It is much more likely to be defined.
Does a guaranteed way even exist? Well, yes and no. Yes, but only using the unstable pointer metadata API:
#![feature(ptr_metadata)]
let v: &dyn Annotation;
let v = v as *const dyn Annotation;
let v: *const T = v.to_raw_parts().0.cast::<T>();
let v: &T = unsafe { &*v };
In conclusion, if you can use nightly features, I would prefer the pointer metadata API just to be extra safe. But in case you can't, I think the cast approach is fine.
Last point, there may be a crate that already does that. Prefer that, if it exists.

Is it safe to temporarily give away ownership of the contents of a mutable borrow in Rust? [duplicate]

This question already has an answer here:
replace a value behind a mutable reference by moving and mapping the original
(1 answer)
Closed 1 year ago.
Is a function that modifies a &mut T in place by a function FnOnce(T) -> T safe to have in rust, or can it lead to undefined behavior? Is it included in the standard library somewhere, or a well-known crate?
If you additionally assume T: Default, that looks like
fn modify<T, F: FnOnce(T) -> T>(x: &mut T, f: F) -> ()
where
T: Default
{
let val = std::mem::take(x);
let val = f(val);
*x = val;
}
(See also
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f015812bac6f527fe663fe4e0b7a3188)
My question is about doing the same but dropping the where T: Default clause (and no T: Clone either). This requires a different implementation, since you can't use std::mem::take.
I'm not sure how to implement the unconstrained version, but it should be possible using unsafe Rust.
I'm learning Rust from a background of linear types and sub-structural logic. Rust's mutable borrow seems very similar to moving a resource in and then back out of a function, but I don't know if it is actually safe to take temporary ownership of the contents of a mutable borrow like this.
It is safe, and there are even crates for that (can't find them now).
HOWEVER.
When writing unsafe code, you have to be very careful. If you don't know exactly what you're doing, it can easily lead to UB.
Here, for example, there is something you maybe haven't thought of: panic safety.
Suppose we implement that trivially:
pub fn modify<T, F: FnOnce(T) -> T>(v: &mut T, f: F) {
let prev = unsafe { std::ptr::read(v) };
let new = f(prev);
unsafe { std::ptr::write(v, new) };
}
Trivially right.
Or is it?
fn main() {
struct MyStruct(pub i32);
impl Drop for MyStruct {
fn drop(&mut self) {
println!("MyStruct({}) dropped", self.0);
}
}
let mut v = MyStruct(123);
std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
modify(&mut v, |_prev| {
// `prev` is dropped here.
panic!("Haha, evil panic!");
})
}))
.unwrap_err();
v.0 = 456; // Writing to an uninitialized memory!
// `v` is dropped here, double drop!
}
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=6f7312a8be70cd43cf5cf7a9816be56a
I used a custom type that its destructor does nothing but to print, but imagine what could happen if this was a Vec that freed the memory and we were writing into freed memory (then, as a bonus, get a double-free).
It is correct, like #Kendas said, that when there are no interruption point it is valid to leave memory in an uninitialized state in Rust. The problem is, that much more places than you wish are actually interruption points. In fact, when writing unsafe code, you have to consider any call to external code (i.e. not yours code neither code that you trust to not do bad things, for example std) to be an interruption point.
Unsafe code is hard. Better stay in the safe land.
Edit: You may wonder what the AssertUnwindSafe is. Maybe you even tried to remove it and noticed it doesn't compiler. Well, UnwindSafe is a protection against this, and AssertUnwindSafe is a way to bypass the protection.
You may ask, what's the point? The point is, this protection is really not accurate. So much not accurate, that bypassing it does not even require unsafe. But it still exists, so we have a lower chance of accidental UB.
It doesn't matter to you as the writer of the API - you should act like this protection doesn't exist, because it is safe to bypass it and easy to do so by mistake. The Rust standard library itself had bugs like that in the past (#86443, #81740, ... - It is not an accident that they're both in the same code - those issues tend to appear in chunks. But there're more).
Well, you are replacing the contents of the borrowed memory location with the default value. This would mean that the memory is indeed correct at every point. So there should not be any undefined behavior.
Basically from the perspective of the mutable reference x, you are mutating it to the default value, then mutating it again to a different new value.
In general, if there is a chance of undefined behavior, you will need to use the unsafe keyword. Or somebody has made a mistake while using the unsafe keyword further down the stack. It is relatively rare for these things to happen in the standard library.
Go ahead and look at the safety remarks in the code if you must: https://doc.rust-lang.org/src/core/mem/mod.rs.html#756

Do I have to create distinct structs for both owned (easy-to-use) and borrowed (more efficient) data structures?

I have a Message<'a> which has &'a str references on a mostly short-lived buffer.
Those references mandate a specific program flow as they are guaranteed to never outlive the lifetime 'a of the buffer.
Now I also want to have an owned version of Message, such that it can be moved around, sent via threads, etc.
Is there an idiomatic way to achieve this? I thought that Cow<'a, str> might help, but unfortunately, Cow does not magically allocate in case &'a str would outlive the buffer's lifetime.
AFAIK, Cow is not special in the sense that no matter if Cow holds an Owned variant, it must still pass the borrow checker on 'a.
Definition of std::borrow::Cow.
pub enum Cow<'a, B> {
Borrowed(&'a B),
Owned(<B as ToOwned>::Owned),
}
Is there an idiomatic way to have an owned variant of Message? For some reason we have &str and String, &[u8] and Vec<u8>, ... does that mean people generally would go for &msg and Message?
I suppose I still have to think about if an owned variant is really, really needed, but my experience shows that having an escape hatch for owned variants generally improves prototyping speed.
Yes, you need to have multiple types, one representing the owned concept and one representing the borrowed concept.
You'll see the same technique throughout the standard library and third-party crates.
See also:
How to abstract over a reference to a value or a value itself?
How to avoid writing duplicate accessor functions for mutable and immutable references in Rust?

Can I push a reference to a vector with a longer lifetime if I pop() it back just after?

Here, I'm not able to add thing to vector because its lifetime is different than 'a:
pub fn foo<'a>(vec: &'a mut Vec<&'a Thing>) {
let thing: Thing = new_thing();
vec.push(&thing);
// do stuff with vec
vec.pop();
}
Notice that I always remove it from the vector, and the vector isn't reordered further, so this operation should be safe. I think it would be hard to convince that to the compiler, but is there any trick to achieve the same?
Definitely not in safe Rust. The compiler has no idea what Vec::push and Vec::pop do. All it knows is what it can tell from the function signature — that you have to push the same type that the Vec is parameterized with.
Doing this in unsafe Rust is probably possible, but unsafe code is tricky to get right. As loganfsmyth mentions, if you somehow push an "invalid" value into the Vec and then a panic happens, that value is still in the vector after the function has exited. Now the destructor of the Vec can access invalid memory, subverting Rust's guarantees. This is A Bad Thing.
There's probably a better solution to your real problem. Possible avenues:
Use Iterator::chain and iter::once to combine the values into one iterator.
Create a wrapper type around a slice and a single value that exposes the operation(s) you need.

Resources