Is it possible to create an Arc<[T]> from a Vec<T>? - rust

To be more specific, why doesn't Arc<T> implement from_raw with a dynamically sized T while Box<T> does?
use std::sync::Arc;
fn main() {
let x = vec![1, 2, 3].into_boxed_slice();
let y = Box::into_raw(x);
let z = unsafe { Arc::from_raw(y) }; // ERROR
}
As pointed out in the comments, Arc::from_raw must be used with a pointer from Arc::into_raw, so the above example doesn't make sense. My original question (Is it possible to create an Arc<[T]> from a Vec<T>) remains: is this possible, and if not, why?

As of Rust 1.21.0, you can do this:
let thing: Arc<[i32]> = vec![1, 2, 3].into();
This was enabled by RFC 1845:
In addition: From<Vec<T>> for Rc<[T]> and From<Box<T: ?Sized>> for Rc<T> will be added.
Identical APIs will also be added for Arc.
Internally, this uses a method called copy_from_slice, so the allocation of the Vec is not reused. For the details why, check out DK.'s answer.
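As a small sketch of how the conversion is typically used, the snippet below turns a Vec into an Arc<[i32]> and shares it with a few threads. The function name shared_sum and the worker count are made up for illustration; only Arc::from, Arc::clone and thread::spawn come from the standard library.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: shares `values` with `workers` threads and
/// returns the sum each thread computed.
fn shared_sum(values: Vec<i32>, workers: usize) -> Vec<i32> {
    // The elements are copied into a new reference-counted allocation;
    // the Vec's own buffer is not reused.
    let shared: Arc<[i32]> = Arc::from(values);

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Cloning the Arc only bumps the reference count.
            let local = Arc::clone(&shared);
            thread::spawn(move || local.iter().sum::<i32>())
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Calling shared_sum(vec![1, 2, 3], 2) should give every worker the same view of the data without cloning the elements per thread.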

No.
First of all, as already noted in comments, you can't toss raw pointers around willy-nilly like that. To quote the documentation of Arc::from_raw:
The raw pointer must have been previously returned by a call to a Arc::into_raw.
You absolutely must read the documentation any time you're using an unsafe method.
Secondly, the conversion you want is impossible. Vec<T> → Box<[T]> works because, internally, Vec<T> is effectively a (Box<[T]>, usize) pair. So, all the method does is give you access to that internal Box<[T]> pointer [1]. Arc<[T]>, however, is not physically compatible with a Box<[T]>, because it has to contain the reference counts. The thing being pointed to by Arc<T> has a different size and layout to the thing being pointed to by Box<T>.
The only way you could get from Vec<T> to Arc<[T]> would be to reallocate the contents of the vector in a reference-counted allocation... which I'm not aware of any way to do. I don't believe there's any particular reason it couldn't be implemented, it just hasn't [2].
All that said, I believe not being able to use dynamically sized types with Arc::into_raw/Arc::from_raw is a bug. It's certainly possible to get Arcs with dynamically sized types... though only by casting from pointers to fixed-sized types.
[1]: Not quite. Vec<T> doesn't actually have a Box<[T]> inside it, but it has something compatible. It also has to shrink the slice to not contain uninitialised elements.
[2]: Rust does not, on the whole, have good support for allocating dynamically sized things in general. It's possible that part of the reason for this hole in particular is that Box<T> also can't allocate arrays directly, which is possibly because Vec<T> exists, because Vec<T> used to be part of the language itself, and why would you add array allocation to Box when Vec already exists? "Why not have ArcVec<T>, then?" Because you'd never be able to construct one due to shared ownership.
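To illustrate the remark about getting an Arc of a dynamically sized type by starting from a fixed-size one, here is a minimal sketch that relies on unsized coercion. The function name arc_slice_from_array is invented for the example.

```rust
use std::sync::Arc;

/// Hypothetical helper: builds an Arc<[i32]> by first allocating an
/// Arc of a fixed-size array and then letting it coerce to a slice.
fn arc_slice_from_array() -> Arc<[i32]> {
    // The allocation is made for the sized type [i32; 3] ...
    let fixed: Arc<[i32; 3]> = Arc::new([1, 2, 3]);
    // ... and the return position coerces Arc<[i32; 3]> to Arc<[i32]>.
    fixed
}
```

The length has to be known at compile time for this route, which is why it does not help when the data starts out in a Vec.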

Arc<[T]> is an Arc containing a pointer to a slice of T's. But [T] is not actually Sized at compile time since the compiler does not know how long it will be (versus &[T] which is just a reference and thus has a known size).
use std::sync::Arc;
fn main() {
let v: Vec<u32> = vec![1, 2, 3];
let b: Box<[u32]> = v.into_boxed_slice();
let y: Arc<[u32]> = Arc::from(b); // Arc::new(*b) does not compile: [u32] is unsized
print!("{:?}", y)
}
However, you can make an Arc<&[T]> without making a boxed slice:
use std::sync::Arc;
fn main() {
let v = vec![1, 2, 3];
let y: Arc<&[u32]> = Arc::new(&v[..]);
print!("{:?}", y)
}
However, this seems like a study in the type system with little practical value. If what you really want is a view of the Vec that you can pass around between threads, an Arc<&[T]> will give you what you want. And if you need it to be on the heap, Arc<Box<&[T]>> works fine too.

Related

Is allowing library users to embed arbitrary data in your structures a correct usage of std::mem::transmute?

A library I'm working on stores various data structures in a graph-like manner.
I'd like to let users store metadata ("annotations") in nodes, so they can retrieve them later. Currently, they have to create their own data structure which mirrors the library's, which is very inconvenient.
I'm placing very little constraints on what an annotation can be, because I do not know what the users will want to store in the future.
The rest of this question is about my current attempt at solving this use case, but I'm open to completely different implementations as well.
User annotations are represented with a trait:
pub trait Annotation {
fn some_important_method(&self);
}
This trait contains a few methods (all on &self) which are important for the domain, but these are always trivial to implement for users. The real data of an annotation implementation cannot be retrieved this way.
I can store a list of annotations this way:
pub struct Node {
// ...
annotations: Vec<Box<dyn Annotation>>,
}
I'd like to let the user retrieve whatever implementation they previously added to a list, something like this:
impl Node {
fn annotations_with_type<T>(&self) -> Vec<&T>
where
T: Annotation,
{
// ??
}
}
I originally aimed to convert dyn Annotation to dyn Any, then use downcast_ref; however, trait upcasting coercion is unstable.
Another solution would be to require each Annotation implementation to store its TypeId, compare it with annotations_with_type's type parameter's TypeId, and std::mem::transmute the resulting &dyn Annotation to &T… but the documentation of transmute is quite scary and I honestly don't know whether that's one of the allowed cases in which it is safe. I definitely would have done some kind of void * in C.
Of course it's also possible that there's a third (safe) way to go through this. I'm open to suggestions.
What you are describing is commonly solved by TypeMaps, allowing a type to be associated with some data.
If you are open to using a library, you might consider looking into using an existing implementation, such as https://crates.io/crates/typemap_rev, to store data. For example:
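use typemap_rev::{TypeMap, TypeMapKey};
struct MyAnnotation;
impl TypeMapKey for MyAnnotation {
type Value = String;
}
let mut map = TypeMap::new();
// The value type is String, so convert the literal before inserting.
map.insert::<MyAnnotation>("Some Annotation".to_string());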
If you are curious, under the hood it uses a HashMap<TypeId, Box<(dyn Any + Send + Sync)>> to store the data. To retrieve data, it calls downcast_ref on the Any type, which is stable. You could also use this pattern to implement it yourself if needed.
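For anyone who wants to roll this by hand, a minimal sketch of the same idea could look like the following. It is not the typemap_rev implementation; the names Key, TinyTypeMap, Author and Revision are invented for the example, and only HashMap, TypeId and Any::downcast_ref come from the standard library.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical key trait: each key type names the value stored under it.
trait Key: 'static {
    type Value: Any + Send + Sync;
}

/// Hypothetical map from key types to boxed values.
#[derive(Default)]
struct TinyTypeMap {
    entries: HashMap<TypeId, Box<dyn Any + Send + Sync>>,
}

impl TinyTypeMap {
    fn insert<K: Key>(&mut self, value: K::Value) {
        // The TypeId of the key type is the hash map key.
        self.entries.insert(TypeId::of::<K>(), Box::new(value));
    }

    fn get<K: Key>(&self) -> Option<&K::Value> {
        // downcast_ref returns None if the stored type does not match.
        self.entries
            .get(&TypeId::of::<K>())
            .and_then(|boxed| boxed.downcast_ref::<K::Value>())
    }
}

// Two example keys with different value types.
struct Author;
impl Key for Author {
    type Value = String;
}

struct Revision;
impl Key for Revision {
    type Value = u32;
}
```

Because lookups go through TypeId and downcast_ref, no unsafe code is needed; the cost is one entry per key type rather than a list of annotations.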
You don't have to worry whether this is valid - because it doesn't compile (playground):
error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
--> src/main.rs:7:18
|
7 | _ = unsafe { std::mem::transmute::<&dyn Annotation, &i32>(&*v) };
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: source type: `&dyn Annotation` (128 bits)
= note: target type: `&i32` (64 bits)
The error message should be clear, I hope: &dyn Trait is a fat pointer, and has size 2*size_of::<usize>(). &T, on the other hand, is a thin pointer (as long as T: Sized), of size of only one usize, and you cannot transmute between types of different sizes.
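The size difference can be checked directly. The sketch below assumes a current compiler, where a trait-object reference is two machine words; the trait is a stand-in and reference_sizes is a made-up helper.

```rust
use std::mem::size_of;

// Stand-in trait; only its existence matters for the size check.
trait Annotation {}

/// Hypothetical helper returning, in bytes:
/// (size of a thin reference, size of a trait-object reference, size of usize).
fn reference_sizes() -> (usize, usize, usize) {
    (
        size_of::<&i32>(),
        size_of::<&dyn Annotation>(),
        size_of::<usize>(),
    )
}
```

On a 64-bit target this matches the 64 and 128 bits reported in the error message.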
You can work around that with transmute_copy(), but it will just make things worse: it will work, but it is unsound and is not guaranteed to work in any way. It may become UB in future Rust versions. This is because the only guaranteed thing (as of now) for &dyn Trait references is:
Pointers to unsized types are sized. The size and alignment is guaranteed to be at least equal to the size and alignment of a pointer.
Nothing guarantees the order of the fields. It can be (data_ptr, vtable_ptr) (as it is now, and thus transmute_copy() works) or (vtable_ptr, data_ptr). Nothing is even guaranteed about the contents. It can not contain a data pointer at all (though I doubt somebody will ever do something like that). transmute_copy() copies the data from the beginning, meaning that for the code to work the data pointer should be there and should be first (which it is). For the code to be sound this needs to be guaranteed (which is not).
So what can we do? Let's check how Any does its magic:
// SAFETY: caller guarantees that T is the correct type
unsafe { &*(self as *const dyn Any as *const T) }
So it uses as for the conversion. Does it work? Certainly. And that means std can do that, because std can do things that are not guaranteed and rely on how things work in practice. But we shouldn't. So, is it guaranteed?
I don't have a firm answer, but I'm pretty sure the answer is no. I have found no authoritative source that guarantees the behavior of casts from unsized to sized pointers.
Edit: @CAD97 pointed out on Zulip that the reference promises that *[const|mut] T as *[const|mut] V where V: Sized will be a pointer-to-pointer cast, and that can be read as a guarantee this will work.
But I still feel fine with relying on that. Because, unlike the transmute_copy(), people are doing it. In production. And there is no better way in stable. So the chance it will become undefined behavior is very low. It is much more likely to be defined.
Does a guaranteed way even exist? Well, yes and no. Yes, but only using the unstable pointer metadata API:
#![feature(ptr_metadata)]
let v: &dyn Annotation;
let v = v as *const dyn Annotation;
let v: *const T = v.to_raw_parts().0.cast::<T>();
let v: &T = unsafe { &*v };
In conclusion, if you can use nightly features, I would prefer the pointer metadata API just to be extra safe. But in case you can't, I think the cast approach is fine.
Last point, there may be a crate that already does that. Prefer that, if it exists.
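If adding one more trivial method to the trait is acceptable, a fully safe route avoids the pointer casts altogether. The sketch below is an assumption about how the library could be shaped, not the asker's actual code: as_any, Node::new, Node::add, Label and Weight are all invented for the example.

```rust
use std::any::Any;

pub trait Annotation: Any {
    fn some_important_method(&self);
    /// Hypothetical extra method; every implementor just returns `self`.
    fn as_any(&self) -> &dyn Any;
}

pub struct Node {
    annotations: Vec<Box<dyn Annotation>>,
}

impl Node {
    pub fn new() -> Node {
        Node { annotations: Vec::new() }
    }

    pub fn add<A: Annotation>(&mut self, annotation: A) {
        self.annotations.push(Box::new(annotation));
    }

    /// Returns every stored annotation whose concrete type is `T`.
    pub fn annotations_with_type<T: Annotation>(&self) -> Vec<&T> {
        self.annotations
            .iter()
            .filter_map(|a| a.as_any().downcast_ref::<T>())
            .collect()
    }
}

// Two example annotation types a user might define.
pub struct Label(pub String);
impl Annotation for Label {
    fn some_important_method(&self) {}
    fn as_any(&self) -> &dyn Any { self }
}

pub struct Weight(pub u32);
impl Annotation for Weight {
    fn some_important_method(&self) {}
    fn as_any(&self) -> &dyn Any { self }
}
```

The downcast is checked by TypeId inside downcast_ref, so a mismatched type is simply skipped instead of being reinterpreted.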

Is it safe to temporarily give away ownership of the contents of a mutable borrow in Rust? [duplicate]

This question already has an answer here:
replace a value behind a mutable reference by moving and mapping the original
Is a function that modifies a &mut T in place by a function FnOnce(T) -> T safe to have in rust, or can it lead to undefined behavior? Is it included in the standard library somewhere, or a well-known crate?
If you additionally assume T: Default, that looks like
fn modify<T, F: FnOnce(T) -> T>(x: &mut T, f: F) -> ()
where
T: Default
{
let val = std::mem::take(x);
let val = f(val);
*x = val;
}
(See also
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f015812bac6f527fe663fe4e0b7a3188)
My question is about doing the same but dropping the where T: Default clause (and no T: Clone either). This requires a different implementation, since you can't use std::mem::take.
I'm not sure how to implement the unconstrained version, but it should be possible using unsafe Rust.
I'm learning Rust from a background of linear types and sub-structural logic. Rust's mutable borrow seems very similar to moving a resource in and then back out of a function, but I don't know if it is actually safe to take temporary ownership of the contents of a mutable borrow like this.
It is safe, and there are even crates for that (can't find them now).
HOWEVER.
When writing unsafe code, you have to be very careful. If you don't know exactly what you're doing, it can easily lead to UB.
Here, for example, there is something you maybe haven't thought of: panic safety.
Suppose we implement that trivially:
pub fn modify<T, F: FnOnce(T) -> T>(v: &mut T, f: F) {
let prev = unsafe { std::ptr::read(v) };
let new = f(prev);
unsafe { std::ptr::write(v, new) };
}
Trivially right.
Or is it?
fn main() {
struct MyStruct(pub i32);
impl Drop for MyStruct {
fn drop(&mut self) {
println!("MyStruct({}) dropped", self.0);
}
}
let mut v = MyStruct(123);
std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
modify(&mut v, |_prev| {
// `prev` is dropped here.
panic!("Haha, evil panic!");
})
}))
.unwrap_err();
v.0 = 456; // Writing to uninitialized memory!
// `v` is dropped here, double drop!
}
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=6f7312a8be70cd43cf5cf7a9816be56a
I used a custom type whose destructor does nothing but print, but imagine what could happen if this were a Vec that freed its memory and we then wrote into the freed memory (and, as a bonus, got a double free).
It is correct, as @Kendas said, that when there are no interruption points it is valid to leave memory in an uninitialized state in Rust. The problem is that many more places than you would wish are actually interruption points. In fact, when writing unsafe code, you have to consider any call to external code (i.e. code that is neither yours nor code that you trust not to do bad things, such as std) to be an interruption point.
Unsafe code is hard. Better stay in the safe land.
Edit: You may wonder what the AssertUnwindSafe is. Maybe you even tried to remove it and noticed the code doesn't compile. Well, UnwindSafe is a protection against this, and AssertUnwindSafe is a way to bypass the protection.
You may ask, what's the point? The point is, this protection is really not accurate. It is so inaccurate that bypassing it does not even require unsafe. But it still exists, so we have a lower chance of accidental UB.
It doesn't matter to you as the writer of the API - you should act like this protection doesn't exist, because it is safe to bypass it and easy to do so by mistake. The Rust standard library itself had bugs like that in the past (#86443, #81740, ... - It is not an accident that they're both in the same code - those issues tend to appear in chunks. But there're more).
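One common way to close the panic hole is to abort the process if the closure unwinds, so nobody can ever observe the moved-out slot. I believe crates such as take_mut and replace_with are built around this idea, but the sketch below is a hand-written illustration, not their code; AbortOnDrop and modify_or_abort are invented names, and it assumes aborting on panic is acceptable.

```rust
use std::{mem, process, ptr};

/// Hypothetical guard: if it is ever dropped, the whole process aborts.
struct AbortOnDrop;

impl Drop for AbortOnDrop {
    fn drop(&mut self) {
        process::abort();
    }
}

/// Replaces `*slot` with `f(old value)`. If `f` panics, the guard is
/// dropped during unwinding and the process aborts before `*slot`
/// (which is logically moved out) can be used or dropped again.
pub fn modify_or_abort<T, F: FnOnce(T) -> T>(slot: &mut T, f: F) {
    let guard = AbortOnDrop;
    unsafe {
        // Move the value out, leaving `*slot` logically uninitialized.
        let old = ptr::read(slot);
        let new = f(old);
        // Put a valid value back without dropping the stale bytes.
        ptr::write(slot, new);
    }
    // Success path: disarm the guard without running its destructor.
    mem::forget(guard);
}
```

For example, modify_or_abort(&mut s, |old| old + "c") appends to a String in place without requiring String: Default.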
Well, you are replacing the contents of the borrowed memory location with the default value. This would mean that the memory is indeed correct at every point. So there should not be any undefined behavior.
Basically from the perspective of the mutable reference x, you are mutating it to the default value, then mutating it again to a different new value.
In general, if there is a chance of undefined behavior, you will need to use the unsafe keyword. Or somebody has made a mistake while using the unsafe keyword further down the stack. It is relatively rare for these things to happen in the standard library.
Go ahead and look at the safety remarks in the code if you must: https://doc.rust-lang.org/src/core/mem/mod.rs.html#756

Rust: can I have a fixed size slice by borrowing the whole fixed size array in a smaller scope in a simple way

I saw the workarounds and they were kind of long. Am I missing a feature of Rust or a simple solution (important: not a workaround)? I feel like I should be able to do this with a simple macro, but the arrayref crate's implementations aren't what I am looking for. Is this a feature that needs to be added to Rust, or is creating a fixed-size slice from a fixed-size array in a smaller scope somehow a bad idea?
Basically what I want to do is this;
fn f(arr:[u8;4]){
arr[0];
}
fn basic(){
let mut arr:[u8;12] = [0;12];
// can't I borrow the whole array but pass a fixed-size slice of it?
f(&mut arr[8..12]); // But the length is known at compile time?
f(&mut arr[8..12] as &[u8;4]); // Why can't I do these things?
}
What I want can be achieved by below code(from other so threads)
use arrayref::mut_array_refs;
fn foo(){
let mut buf:[u8;12] = [0;12];
// Splits buf into a leading 8-byte part and a trailing 4-byte part.
let (_, fixed_slice) = mut_array_refs![
&mut buf,
8,
4
];
write_u32_into(fixed_slice,0);
}
fn write_u32_into(fixed_slice:&mut [u8;4],num:u32){
// won't have to check if fixed_slice.len() == 4 and won't panic
}
But I looked into the crate and even though this never panics there are many unsafe blocks and many lines of code. It is a workaround for the Rust itself. In the first place I wanted something like this to get rid of the overhead of checking the size and the possible runtime panic.
Also, "the overhead is small, so it doesn't matter" isn't a valid answer: technically I should be able to guarantee this at compile time. Even if the overhead is small, that doesn't mean Rust doesn't need this kind of feature, or that I shouldn't look for an ideal way.
Note: Can this be solved with lifetimes?
Edit: Suppose we had a different syntax for fixed slices, such as arr[12;;16], and borrowing through it borrowed the whole arr. I think many functions (write_u32, for example) could then be implemented in a more "rusty" way.
Use let binding with slice_patterns feature. It was stabilized in Rust 1.42.
let v = [1, 2, 3]; // inferred [i32; 3]
let [_, ref subarray @ ..] = v; // subarray is &[i32; 2]
let a = v[0]; // 1
let b = subarray[1]; // 3
Here is a section from the Rust reference about slice patterns.
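Applied to the 12-byte buffer from the question, the same pattern can hand out the last four bytes as a &mut [u8; 4] with no length check at run time. This is a sketch under the assumption that the offset is fixed at 8; write_tail is a made-up name, and the little-endian byte order is an arbitrary choice for the example.

```rust
/// Writes `num` into a 4-byte array; no length check is needed because
/// the length is part of the type.
fn write_u32_into(dst: &mut [u8; 4], num: u32) {
    *dst = num.to_le_bytes();
}

/// Hypothetical helper: writes `num` into bytes 8..12 of `buf`.
fn write_tail(buf: &mut [u8; 12], num: u32) {
    // Eight wildcards skip bytes 0..8; `tail` binds the remaining
    // four bytes as `&mut [u8; 4]`.
    let [_, _, _, _, _, _, _, _, tail @ ..] = buf;
    write_u32_into(tail, num);
}
```

The pattern borrows the whole array for as long as tail is alive, which is the borrowing behaviour the question asks for; the downside is that the offset is spelled out as a run of wildcards.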
Why it doesn't work
At the time of writing, what you want is not available in stable or nightly Rust, because several const-related features are not stabilized yet, namely const generics and const traits. Traits are involved because arr[8..12] is a call to the core::ops::Index<Range<usize>> trait, which returns a reference to a slice, in your case [u8]. This type is unsized and is not equal to [u8; 4], even though the compiler could in principle work out the length. Rust is conservative here and can be overprotective in order to ensure safety.
What can you do then?
You have a few routes you can take to solve this issue, I'll stay in a no_std environment for all this as that seems to be where you're working and will avoid extra crates.
Change the function signature
The current function signature you have takes the four u8s as an owned value. If you only are asking for 4 values you can instead take those values as parameters to the function. This option breaks down when you need larger arrays but at that point, it would be better to take the array as a reference or using the method below.
The most common way, and the best way in my opinion, is to take the array in as a reference to a slice (&[u8] or &mut [u8]). This is not the same as taking a pointer to the value in C, slices in rust also carry the length of themselves so you can safely iterate through them without worrying about buffer overruns or if you read all the data. This does require changing the algorithms below to account for variable-sized input but most of the time there is a just as good option to use.
The safe way
Slices can be converted to arrays using TryInto, but this comes at the cost of a runtime size check, which you seem to want to avoid. This is an option though, and the performance impact may be minimal.
Example:
fn f(arr: [u8;4]){
arr[0];
}
fn basic(){
let mut arr:[u8;12] = [0;12];
f(arr[8..12].try_into().unwrap());
}
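The example above copies four bytes out of the array. If the callee needs to write back into the original buffer, the same checked conversion also exists for references. The sketch below assumes the std impl of TryFrom<&mut [T]> for &mut [T; N]; write_at_offset_8 is an invented name.

```rust
use std::convert::TryFrom;

fn write_u32_into(dst: &mut [u8; 4], num: u32) {
    *dst = num.to_le_bytes();
}

/// Hypothetical helper: writes `num` into bytes 8..12 of `buf` in place.
fn write_at_offset_8(buf: &mut [u8; 12], num: u32) {
    // The slice length is checked once here; unwrap cannot fail
    // because 8..12 always has length 4.
    let tail = <&mut [u8; 4]>::try_from(&mut buf[8..12]).unwrap();
    write_u32_into(tail, num);
}
```

With constant bounds like these the check is a good candidate for being optimized away, though that is up to the compiler and not guaranteed.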
The unsafe way
If you're willing to leave the land of safety there are quite a few things you can do to force the compiler to recognize the data as you want it to, but they can be abused. It's usually better to use rust idioms rather than force other methods in but this is a valid option.
fn basic(){
let mut arr:[u8;12] = [0;12];
f(unsafe {*(arr[8..12].as_ptr() as *const [u8; 4])});
}
TL;DR
I recommend changing your types to utilize slices rather than arrays but if that's not feasible I'd suggest avoiding unsafety, the performance won't be as bad as you think.

Can I push a reference to a vector with a longer lifetime if I pop() it back just after?

Here, I'm not able to add thing to vector because its lifetime is different than 'a:
pub fn foo<'a>(vec: &'a mut Vec<&'a Thing>) {
let thing: Thing = new_thing();
vec.push(&thing);
// do stuff with vec
vec.pop();
}
Notice that I always remove it from the vector, and the vector isn't reordered further, so this operation should be safe. I think it would be hard to convince that to the compiler, but is there any trick to achieve the same?
Definitely not in safe Rust. The compiler has no idea what Vec::push and Vec::pop do. All it knows is what it can tell from the function signature — that you have to push the same type that the Vec is parameterized with.
Doing this in unsafe Rust is probably possible, but unsafe code is tricky to get right. As loganfsmyth mentions, if you somehow push an "invalid" value into the Vec and then a panic happens, that value is still in the vector after the function has exited. Now the destructor of the Vec can access invalid memory, subverting Rust's guarantees. This is A Bad Thing.
There's probably a better solution to your real problem. Possible avenues:
Use Iterator::chain and iter::once to combine the values into one iterator.
Create a wrapper type around a slice and a single value that exposes the operation(s) you need.
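The first suggestion can be sketched as follows. Thing, total and sum_with_extra are placeholders invented for the example, and it assumes the "stuff" done with the vector only needs to iterate over the elements.

```rust
use std::iter;

// Placeholder for the question's Thing type.
struct Thing(u32);

/// Stand-in for "do stuff with vec": anything that only needs to iterate.
fn total<'a>(things: impl Iterator<Item = &'a Thing>) -> u32 {
    things.map(|t| t.0).sum()
}

/// Hypothetical replacement for the push/pop dance: the local value is
/// appended to the iteration, never to the vector itself.
fn sum_with_extra(existing: &[&Thing]) -> u32 {
    let thing = Thing(10);
    // `thing` is only borrowed for the duration of this call.
    total(existing.iter().copied().chain(iter::once(&thing)))
}
```

Nothing with a short lifetime is ever stored in the longer-lived collection, so a panic in the middle cannot leave a dangling reference behind.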

Does a move involve a copy? [duplicate]

Editor's note: this question was asked before Rust 1.0 and some of the assertions in the question are not necessarily true in Rust 1.0. Some answers have been updated to address both versions.
I have this struct
struct Triplet {
one: i32,
two: i32,
three: i32,
}
If I pass this to a function, it is implicitly copied. Now, sometimes I read that some values are not copyable and therefore have to be moved.
Would it be possible to make this struct Triplet non-copyable? For example, would it be possible to implement a trait which would make Triplet non-copyable and therefore "movable"?
I read somewhere that one has to implement the Clone trait to copy things that are not implicitly copyable, but I never read about the other way around, that is having something that is implicitly copyable and making it non-copyable so that it moves instead.
Does that even make any sense?
Preface: This answer was written before opt-in built-in traits—specifically the Copy aspects—were implemented. I've used block quotes to indicate the sections that only applied to the old scheme (the one that applied when the question was asked).
Old: To answer the basic question, you can add a marker field storing a NoCopy value. E.g.
struct Triplet {
one: int,
two: int,
three: int,
_marker: NoCopy
}
You can also do it by having a destructor (via implementing the Drop trait), but using the marker types is preferred if the destructor is doing nothing.
Types now move by default, that is, when you define a new type it doesn't implement Copy unless you explicitly implement it for your type:
struct Triplet {
one: i32,
two: i32,
three: i32
}
impl Copy for Triplet {} // add this for copy, leave it out for move
impl Clone for Triplet { fn clone(&self) -> Triplet { *self } } // Copy requires Clone
The implementation can only exist if every type contained in the new struct or enum is itself Copy. If not, the compiler will print an error message. It can also only exist if the type doesn't have a Drop implementation.
To answer the question you didn't ask... "what's up with moves and copy?":
Firstly I'll define two different "copies":
a byte copy, which is just shallowly copying an object byte-by-byte, not following pointers, e.g. if you have (&usize, u64), it is 16 bytes on a 64-bit computer, and a shallow copy would be taking those 16 bytes and replicating their value in some other 16-byte chunk of memory, without touching the usize at the other end of the &. That is, it's equivalent to calling memcpy.
a semantic copy, duplicating a value to create a new (somewhat) independent instance that can be safely used separately to the old one. E.g. a semantic copy of an Rc<T> involves just increasing the reference count, and a semantic copy of a Vec<T> involves creating a new allocation, and then semantically copying each stored element from the old to the new. These can be deep copies (e.g. Vec<T>) or shallow (e.g. Rc<T> doesn't touch the stored T). Clone is loosely defined as the smallest amount of work required to semantically copy a value of type T from inside a &T to T.
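The two flavours of semantic copy described above can be observed directly. This is only a small sketch; semantic_copies is a made-up function that reports what happened.

```rust
use std::rc::Rc;

/// Hypothetical demo returning (Rc strong count after a clone,
/// whether a cloned Vec got its own allocation).
fn semantic_copies() -> (usize, bool) {
    // Shallow semantic copy: only the reference count changes.
    let shared = Rc::new(vec![1, 2, 3]);
    let alias = Rc::clone(&shared);
    let count = Rc::strong_count(&shared);

    // Deep semantic copy: a new buffer holding copies of the elements.
    let original = vec![1, 2, 3];
    let duplicate = original.clone();
    let distinct = original.as_ptr() != duplicate.as_ptr();

    drop(alias);
    (count, distinct)
}
```

Both calls go through Clone, yet one touches a counter and the other allocates, which is why Clone is only loosely defined.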
Rust is like C, every by-value use of a value is a byte copy:
let x: T = ...;
let y: T = x; // byte copy
fn foo(z: T) -> T {
return z // byte copy
}
foo(y) // byte copy
They are byte copies whether or not T moves or is "implicitly copyable". (To be clear, they aren't necessarily literally byte-by-byte copies at run-time: the compiler is free to optimise the copies out if code's behaviour is preserved.)
However, there's a fundamental problem with byte copies: you end up with duplicated values in memory, which can be very bad if they have destructors, e.g.
{
let v: Vec<u8> = vec![1, 2, 3];
let w: Vec<u8> = v;
} // destructors run here
If w was just a plain byte copy of v then there would be two vectors pointing at the same allocation, both with destructors that free it... causing a double free, which is a problem. NB. This would be perfectly fine, if we did a semantic copy of v into w, since then w would be its own independent Vec<u8> and destructors wouldn't be trampling on each other.
There's a few possible fixes here:
Let the programmer handle it, like C. (there's no destructors in C, so it's not as bad... you just get left with memory leaks instead. :P )
Perform a semantic copy implicitly, so that w has its own allocation, like C++ with its copy constructors.
Regard by-value uses as a transfer of ownership, so that v can no longer be used and doesn't have its destructor run.
The last is what Rust does: a move is just a by-value use where the source is statically invalidated, so the compiler prevents further use of the now-invalid memory.
let v: Vec<u8> = vec![1, 2, 3];
let w: Vec<u8> = v;
println!("{:?}", v); // error: use of moved value
Types that have destructors must move when used by-value (aka when byte copied), since they have management/ownership of some resource (e.g. a memory allocation, or a file handle) and it's very unlikely that a byte copy will correctly duplicate this ownership.
"Well... what's an implicit copy?"
Think about a primitive type like u8: a byte copy is simple, just copy the single byte, and a semantic copy is just as simple, copy the single byte. In particular, a byte copy is a semantic copy... Rust even has a built-in trait Copy that captures which types have identical semantic and byte copies.
Hence, for these Copy types by-value uses are automatically semantic copies too, and so it's perfectly safe to continue using the source.
let v: u8 = 1;
let w: u8 = v;
println!("{}", v); // perfectly fine
Old: The NoCopy marker overrides the compiler's automatic behaviour of assuming that types which can be Copy (i.e. only containing aggregates of primitives and &) are Copy. However this will be changing when opt-in built-in traits is implemented.
As mentioned above, opt-in built-in traits are implemented, so the compiler no longer has automatic behaviour. However, the rules used for the automatic behaviour in the past are the same rules used for checking whether it is legal to implement Copy.
The easiest way is to embed something in your type that is not copyable.
The standard library provides a "marker type" for exactly this use case: NoCopy. For example:
struct Triplet {
one: i32,
two: i32,
three: i32,
nocopy: NoCopy,
}

Resources