I'm trying to reproduce the code suggested in the MaybeUninit docs. It works with a concrete type, but produces a compiler error with a generic type.
Working example (with u32)
use std::mem::{self, MaybeUninit};
fn init_array(t: u32) -> [u32; 1000] {
    // Create an uninitialized array of `MaybeUninit`. The `assume_init` is
    // safe because the type we are claiming to have initialized here is a
    // bunch of `MaybeUninit`s, which do not require initialization.
    let mut data: [MaybeUninit<u32>; 1000] = unsafe { MaybeUninit::uninit().assume_init() };
    // `MaybeUninit::write` overwrites each slot without dropping its old,
    // uninitialized contents. If there is a panic during this loop, we have
    // a memory leak, but there is no memory safety issue.
    for elem in &mut data[..] {
        elem.write(t);
    }
    // Everything is initialized. Transmute the array to the
    // initialized type.
    unsafe { mem::transmute::<_, [u32; 1000]>(data) }
}
fn main() {
    let data = init_array(42);
    assert_eq!(&data[0], &42);
}
Failing example (with generic T)
use std::mem::{self, MaybeUninit};
fn init_array<T: Copy>(t: T) -> [T; 1000] {
    // Create an uninitialized array of `MaybeUninit`. The `assume_init` is
    // safe because the type we are claiming to have initialized here is a
    // bunch of `MaybeUninit`s, which do not require initialization.
    let mut data: [MaybeUninit<T>; 1000] = unsafe { MaybeUninit::uninit().assume_init() };
    // `MaybeUninit::write` overwrites each slot without dropping its old,
    // uninitialized contents. If there is a panic during this loop, we have
    // a memory leak, but there is no memory safety issue.
    for elem in &mut data[..] {
        elem.write(t);
    }
    // Everything is initialized. Transmute the array to the
    // initialized type.
    unsafe { mem::transmute::<_, [T; 1000]>(data) }
}
fn main() {
    let data = init_array(42);
    assert_eq!(&data[0], &42);
}
error:
Compiling playground v0.0.1 (/playground)
error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
--> src/main.rs:20:14
|
20 | unsafe { mem::transmute::<_, [T; 1000]>(data) }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: source type: `[MaybeUninit<T>; 1000]` (size can vary because of T)
= note: target type: `[T; 1000]` (size can vary because of T)
For more information about this error, try `rustc --explain E0512`.
error: could not compile `playground` due to previous error
Questions
Why does the second example fail? (I thought a MaybeUninit<T> could always be transmuted into a T, because they are guaranteed to have the same memory layout.)
Can the example be rewritten to work with generic types?
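This is a known issue (related): mem::transmute checks that the source and target sizes are equal before the generic code is monomorphized, and for a generic T the compiler does not use the fact that MaybeUninit<T> has the same layout as T, so it treats both [MaybeUninit<T>; 1000] and [T; 1000] as dependently sized and rejects the call. You can fix the code using the tips of HadrienG2 by doing a more unsafe unsafe thing: cast a pointer to the array and read through it: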
// Everything is initialized. Transmute the array to the
// initialized type.
let ptr = &mut data as *mut _ as *mut [T; 1000];
let res = unsafe { ptr.read() };
core::mem::forget(data);
res
In the future we expect to be able to use MaybeUninit::array_assume_init(), which is currently unstable.
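Putting the pieces together, a complete generic version might look like the sketch below. It is an illustration under a few assumptions: it uses const generics (Rust 1.51+) so the length is a parameter N instead of the fixed 1000, it reads through a raw-pointer cast instead of mem::transmute as described above, and init_array_safe is a hypothetical name for a fully safe alternative built on std::array::from_fn (Rust 1.63+).

```rust
use std::mem::MaybeUninit;

/// Generic version of the example, with the length as a const parameter `N`.
fn init_array<T: Copy, const N: usize>(t: T) -> [T; N] {
    // SAFETY: an array of `MaybeUninit<T>` does not require initialization.
    let mut data: [MaybeUninit<T>; N] = unsafe { MaybeUninit::uninit().assume_init() };
    for elem in &mut data[..] {
        elem.write(t);
    }
    // SAFETY: every element was written above, and `MaybeUninit<T>` has the
    // same layout as `T`, so the array can be reinterpreted via a pointer cast.
    // No `forget` is needed here: dropping a `MaybeUninit` does nothing.
    unsafe { (&data as *const [MaybeUninit<T>; N] as *const [T; N]).read() }
}

/// Hypothetical safe alternative: build each element with a closure.
fn init_array_safe<T: Copy, const N: usize>(t: T) -> [T; N] {
    std::array::from_fn(|_| t)
}
```

Usage stays the same as in the question, e.g. let data: [u32; 1000] = init_array(42); the length is inferred from the annotation.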
In C, a pointer to a struct can be cast to a pointer to its first member, and vice versa. That is, the address of a struct is defined to be the address of its first member.
#include <stdio.h>
struct Base { int x; };
struct Derived { struct Base base; int y; };
int main() {
    struct Derived d = { {5}, 10 };
    struct Base *base = &d.base; // OK
    printf("%d\n", base->x);
    struct Derived *derived = (struct Derived *)base; // OK
    printf("%d\n", derived->y);
}
This is commonly used to implement C++-style inheritance.
Is the same thing allowed in Rust if the structs are repr(C) (so that their fields aren't reorganized)?
#[derive(Debug)]
#[repr(C)]
struct Base {
    x: usize,
}
#[derive(Debug)]
#[repr(C)]
struct Derived {
    base: Base,
    y: usize,
}
// safety: `base` should be a reference to `Derived::base`, otherwise this is UB
unsafe fn get_derived_from_base(base: &Base) -> &Derived {
    let ptr = base as *const Base as *const Derived;
    &*ptr
}
fn main() {
    let d = Derived {
        base: Base {
            x: 5
        },
        y: 10,
    };
    let base = &d.base;
    println!("{:?}", base);
    let derived = unsafe { get_derived_from_base(base) }; // defined behaviour?
    println!("{:?}", derived);
}
The code works, but will it always work, and is it defined behaviour?
The way you wrote it, currently not; but it is possible to make it work.
A reference to T is only allowed to access that T, nothing more (its provenance covers only T). The expression &d.base gives you a reference that is only valid for the Base field, so using it to access Derived's other fields is undefined behavior. It is not clear that this is what we want, and there is active discussion about it (also this), but that is the current behavior. There is a good tool named Miri that checks your Rust code for some (not all!) kinds of undefined behavior (you can run it in the playground under Tools -> Miri), and indeed it flags your code:
error: Undefined Behavior: trying to reborrow <untagged> for SharedReadOnly permission at alloc1707[0x8], but that tag does not exist in the borrow stack for this location
--> src/main.rs:17:5
|
17 | &*ptr
| ^^^^^
| |
| trying to reborrow <untagged> for SharedReadOnly permission at alloc1707[0x8], but that tag does not exist in the borrow stack for this location
| this error occurs as part of a reborrow at alloc1707[0x0..0x10]
|
= help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
= help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
= note: inside `get_derived_from_base` at src/main.rs:17:5
note: inside `main` at src/main.rs:31:28
--> src/main.rs:31:28
|
31 | let derived = unsafe { get_derived_from_base(base) }; // defined behaviour?
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can make it work by creating a reference to the whole Derived and casting it to a raw pointer to Base. The raw pointer will keep the provenance of the original reference, and thus that will work:
// safety: `base` must be a pointer derived from a reference to a whole `Derived`, otherwise this is UB
unsafe fn get_derived_from_base<'a>(base: *const Base) -> &'a Derived {
    let ptr = base as *const Derived;
    &*ptr
}
fn main() {
    let d = Derived {
        base: Base {
            x: 5
        },
        y: 10,
    };
    let base = &d as *const Derived as *const Base;
    println!("{:?}", unsafe { &*base });
    let derived = unsafe { get_derived_from_base(base) };
    println!("{:?}", derived);
}
Note: references to Base should not be involved at all in the process. If you reborrow base as a &Base along the way and cast that back, you lose the provenance over Derived again. That version will pass Miri on the playground, but it is still undefined behavior per the current rules and will fail Miri with stricter flags (set the environment variable MIRIFLAGS to -Zmiri-tag-raw-pointers before running Miri locally).
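Both halves of this answer can be captured in a small sketch. It assumes Rust 1.77+ for std::mem::offset_of!, and the helper names as_base_ptr and derived_from_base_ptr are hypothetical. The const assertion checks the layout guarantee behind the question (a repr(C) struct's first field sits at offset 0), while the helpers keep the provenance-preserving pattern above: the Base pointer is derived from a pointer to the whole Derived, never from &d.base.

```rust
#[derive(Debug)]
#[repr(C)]
struct Base {
    x: usize,
}

#[derive(Debug)]
#[repr(C)]
struct Derived {
    base: Base,
    y: usize,
}

// Compile-time check: with `#[repr(C)]` the first field is at offset 0,
// so `*const Derived` and `*const Base` point at the same address.
const _: () = assert!(std::mem::offset_of!(Derived, base) == 0);

/// Hypothetical upcast helper: the raw pointer is derived from a reference
/// to the whole `Derived`, so it keeps provenance over all of it.
fn as_base_ptr(d: &Derived) -> *const Base {
    d as *const Derived as *const Base
}

/// Hypothetical downcast helper.
///
/// # Safety
/// `base` must come from `as_base_ptr` (not from `&d.base`), and the
/// `Derived` it points into must stay alive and unmodified for `'a`.
unsafe fn derived_from_base_ptr<'a>(base: *const Base) -> &'a Derived {
    unsafe { &*(base as *const Derived) }
}
```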
I'm running into an ownership issue when trying to pass references to multiple values from a HashMap (stored in a struct) as parameters to a single function call. Here is a PoC of the issue.
use std::collections::HashMap;
struct Resource {
    map: HashMap<String, String>,
}
impl Resource {
    pub fn new() -> Self {
        Resource {
            map: HashMap::new(),
        }
    }
    pub fn load(&mut self, key: String) -> &mut String {
        self.map.get_mut(&key).unwrap()
    }
}
fn main() {
    // Initialize struct containing a HashMap.
    let mut res = Resource {
        map: HashMap::new(),
    };
    res.map.insert("Item1".to_string(), "Value1".to_string());
    res.map.insert("Item2".to_string(), "Value2".to_string());
    // This compiles and runs.
    let mut value1 = res.load("Item1".to_string());
    single_parameter(value1);
    let mut value2 = res.load("Item2".to_string());
    single_parameter(value2);
    // This has ownership issues.
    // multi_parameter(value1, value2);
}
fn single_parameter(value: &String) {
    println!("{}", *value);
}
fn multi_parameter(value1: &mut String, value2: &mut String) {
    println!("{}", *value1);
    println!("{}", *value2);
}
Uncommenting multi_parameter results in the following error:
28 | let mut value1 = res.load("Item1".to_string());
| --- first mutable borrow occurs here
29 | single_parameter(value1);
30 | let mut value2 = res.load("Item2".to_string());
| ^^^ second mutable borrow occurs here
...
34 | multi_parameter(value1, value2);
| ------ first borrow later used here
It would technically be possible for me to break up the function calls (using the single_parameter approach), but it would be more convenient to pass the variables to a single function call.
For additional context, the actual program where I'm encountering this issue is an SDL2 game where I'm attempting to pass multiple textures into a single function call to be drawn, where the texture data may be modified within the function.
This is currently not possible, without resorting to unsafe code or interior mutability at least. There is no way for the compiler to know if two calls to load will yield mutable references to different data as it cannot always infer the value of the key. In theory, mutably borrowing both res.map["Item1"] and res.map["Item2"] would be fine as they would refer to different values in the map, but there is no way for the compiler to know this at compile time.
The easiest way to do this, as already mentioned, is to use a structure that allows interior mutability, like RefCell, which typically enforces the memory safety rules at run-time before returning a borrow of the wrapped value. You can also work around the borrow checker in this case by dealing with mut pointers in unsafe code:
pub fn load_many<'a, const N: usize>(&'a mut self, keys: [&str; N]) -> [&'a mut String; N] {
    // TODO: Assert that keys are distinct, so that we don't return
    // multiple references to the same value
    keys.map(|key| self.load(key.to_string()) as *mut String)
        .map(|ptr| unsafe { &mut *ptr })
}
The TODO is important, as this assertion is the only way to ensure that the safety invariant of only having one mutable reference to any value at any time is upheld.
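Filling in the TODO, one possible version checks pairwise distinctness with assert_ne! before handing out any references. This is a sketch under stated assumptions: load is changed to take &str (the question's version takes String), and repeated keys cause a panic rather than an error value.

```rust
use std::collections::HashMap;

struct Resource {
    map: HashMap<String, String>,
}

impl Resource {
    // Assumption: `load` takes `&str` here; the question's version takes `String`.
    fn load(&mut self, key: &str) -> &mut String {
        self.map.get_mut(key).unwrap()
    }

    fn load_many<'a, const N: usize>(&'a mut self, keys: [&str; N]) -> [&'a mut String; N] {
        // The safety invariant from the TODO: all keys must be distinct,
        // otherwise two of the returned references would alias.
        for i in 0..N {
            for j in (i + 1)..N {
                assert_ne!(keys[i], keys[j], "load_many requires distinct keys");
            }
        }
        let ptrs: [*mut String; N] = keys.map(|key| self.load(key) as *mut String);
        // SAFETY: the keys are distinct, so every pointer refers to a different
        // value, and `&'a mut self` keeps the map from being modified meanwhile.
        ptrs.map(|ptr| unsafe { &mut *ptr })
    }
}
```

With that, let [value1, value2] = res.load_many(["Item1", "Item2"]); gives two mutable references that can be passed to multi_parameter(value1, value2).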
It is, however, almost always better (and easier) to use a known safe interior mutation abstraction like RefCell rather than writing your own unsafe code.
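For comparison, a sketch of the RefCell route. This Resource is a hypothetical variant that stores RefCell<String> values, so load only needs &self and the aliasing check moves to run time: borrowing the same key twice at once panics instead of producing two mutable references to one value.

```rust
use std::cell::{RefCell, RefMut};
use std::collections::HashMap;

// Hypothetical variant of `Resource` whose values live in `RefCell`s.
struct Resource {
    map: HashMap<String, RefCell<String>>,
}

impl Resource {
    // Only needs `&self`, so several values can be borrowed at the same time.
    // Panics if the key is missing or that value is already borrowed.
    fn load(&self, key: &str) -> RefMut<'_, String> {
        self.map[key].borrow_mut()
    }
}

// Same signature as in the question; the body mutates so the effect is visible.
fn multi_parameter(value1: &mut String, value2: &mut String) {
    value1.push('!');
    value2.push('?');
}
```

With let mut value1 = res.load("Item1"); and let mut value2 = res.load("Item2");, the call becomes multi_parameter(&mut value1, &mut value2); deref coercion turns each &mut RefMut<String> into a &mut String.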
I am trying to write a packet parser, where basically one builds up a packet by parsing each Layer in the packet. The packet then holds those 'layers' in a vector.
The code (which does not compile) is something like the following.
I have also added comments for each step. I have experimented with RefCell, but could not get it working. The challenges are enumerated after the code.
The basic pattern is as follows: get the object of a Layer type. (Every Layer type returns a default next object, based on some field in the current layer, as a boxed trait object.)
Edit: I have replaced the pseudocode with real code and added the compilation errors it produces. Perhaps figuring out how to fix these errors would solve the problem.
use std::cell::RefCell;
use std::fmt::Debug;
#[derive(Debug, Default)]
pub struct Packet<'a> {
    data: Option<&'a [u8]>,
    meta: PacketMetadata,
    layers: Vec<Box<dyn Layer<'a>>>,
}
pub trait Layer<'a>: Debug {
    fn from_u8<'b>(&mut self, bytes: &'b [u8]) -> Result<(Option<Box<dyn Layer>>, usize), Error>;
}
#[derive(Debug, Default)]
pub struct PacketMetadata {
    timestamp: Timestamp,
    inface: i8,
    len: u16,
    caplen: u16,
}
impl<'a> Packet<'a> {
    fn from_u8(bytes: &'a [u8], _encap: EncapType) -> Result<Self, Error> {
        let mut p = Packet::default();
        let eth = ethernet::Ethernet::default();
        let mut layer: RefCell<Box<dyn Layer>> = RefCell::new(Box::new(eth));
        let mut res: (Option<Box<dyn Layer>>, usize);
        let mut start = 0;
        loop {
            let mut decode_layer = layer.borrow_mut();
            // process it
            res = decode_layer.from_u8(&bytes[start..])?;
            if res.0.is_none() {
                break;
            }
            // if the layer exists, get it in a layer.
            let boxed = layer.replace(res.0.unwrap());
            start = res.1;
            // append the layer to layers.
            p.layers.push(boxed);
        }
        Ok(p)
    }
}
Compilation Errors
error[E0515]: cannot return value referencing local variable `decode_layer`
--> src/lib.rs:81:9
|
68 | res = decode_layer.from_u8(&bytes[start..])?;
| ------------ `decode_layer` is borrowed here
...
81 | Ok(p)
| ^^^^^ returns a value referencing data owned by the current function
error[E0515]: cannot return value referencing local variable `layer`
--> src/lib.rs:81:9
|
65 | let mut decode_layer = layer.borrow_mut();
| ----- `layer` is borrowed here
...
81 | Ok(p)
| ^^^^^ returns a value referencing data owned by the current function
error: aborting due to 2 previous errors; 3 warnings emitted
It's not clear to me why these errors occur, since I am using the values returned by the calls. (The 3 warnings mentioned above can be ignored; they are unused-variable warnings.)
The challenges -
p.layers.last_mut and p.layers.push are simultaneous mutable borrows, which is not allowed. I could somehow put it behind a RefCell, but it's not clear how.
This code is similar in pattern to syn::token::Tokens; one basic difference is that an enum (TokenTree) is used there. In the above example I cannot use an enum, because the list of protocols to be supported is potentially unbounded.
I cannot use the Layer trait without trait objects because of the loop construct.
The pattern can be thought of as - mutably iterating over a container of Trait objects while updating the container itself.
Perhaps I am missing something very basic.
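The problem with the above code is the lifetime parameter on the Layer trait. The likely mechanism: in from_u8, the elided lifetime in the returned Box<dyn Layer> becomes dyn Layer<'_>, which lifetime elision ties to the &mut self borrow. Every returned layer therefore keeps decode_layer (and through it layer) borrowed for as long as the layers stored in p, which outlives the function, hence the two E0515 errors. If that lifetime parameter is removed, the code compiles with a few modifications, as posted below: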
// Layer Trait definition
pub trait Layer: Debug {
    fn from_u8(&mut self, bytes: &[u8]) -> Result<(Option<Box<dyn Layer>>, usize), Error>;
}
impl<'a> Packet<'a> {
    fn from_u8(bytes: &'a [u8], _encap: EncapType) -> Result<Self, Error> {
        let mut p = Packet::default();
        let eth = ethernet::Ethernet::default();
        let layer: RefCell<Box<dyn Layer>> = RefCell::new(Box::new(eth));
        let mut res: (Option<Box<dyn Layer>>, usize);
        let mut start = 0;
        loop {
            {
                // Do a `borrow_mut` in its own scope, so the borrow is dropped at the end of it.
                let mut decode_layer = layer.borrow_mut();
                res = decode_layer.from_u8(&bytes[start..])?;
            }
            if res.0.is_none() {
                // This is just required to put something into the RefCell; it gets dropped anyway.
                let fake_boxed = Box::new(FakeLayer {});
                let boxed = layer.replace(fake_boxed);
                p.layers.push(boxed);
                break;
            }
            // if the layer exists, get it in a layer.
            let boxed = layer.replace(res.0.unwrap());
            start = res.1;
            // append the layer to layers.
            p.layers.push(boxed);
        }
        Ok(p)
    }
}
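As an aside, the RefCell (and the FakeLayer placeholder) may not be needed at all: holding the current layer in a plain Box<dyn Layer> and moving it into p.layers once it has been decoded avoids the overlapping borrows. The sketch below is self-contained under several assumptions: Ethernet and Payload are hypothetical layers, String stands in for the Error type, the data and meta fields are dropped for brevity, and the returned usize is treated as the number of bytes consumed (hence start += consumed, whereas the original assigns start = res.1).

```rust
use std::fmt::Debug;

// Hypothetical stand-in for the question's trait: errors are plain `String`s
// and the `usize` is assumed to be the number of bytes consumed.
trait Layer: Debug {
    fn from_u8(&mut self, bytes: &[u8]) -> Result<(Option<Box<dyn Layer>>, usize), String>;
}

// Hypothetical layers: a fixed 14-byte header followed by a payload.
#[derive(Debug, Default)]
struct Ethernet {
    header: Vec<u8>,
}

#[derive(Debug, Default)]
struct Payload {
    data: Vec<u8>,
}

impl Layer for Ethernet {
    fn from_u8(&mut self, bytes: &[u8]) -> Result<(Option<Box<dyn Layer>>, usize), String> {
        if bytes.len() < 14 {
            return Err("truncated header".to_string());
        }
        self.header = bytes[..14].to_vec();
        let next: Box<dyn Layer> = Box::new(Payload::default());
        Ok((Some(next), 14))
    }
}

impl Layer for Payload {
    fn from_u8(&mut self, bytes: &[u8]) -> Result<(Option<Box<dyn Layer>>, usize), String> {
        self.data = bytes.to_vec();
        Ok((None, bytes.len()))
    }
}

#[derive(Debug, Default)]
struct Packet {
    layers: Vec<Box<dyn Layer>>,
}

impl Packet {
    fn from_u8(bytes: &[u8]) -> Result<Self, String> {
        let mut p = Packet::default();
        // The layer being decoded is a plain owned box: no RefCell needed.
        let mut current: Box<dyn Layer> = Box::new(Ethernet::default());
        let mut start = 0;
        loop {
            // This borrow of `current` ends as soon as `from_u8` returns.
            let (next, consumed) = current.from_u8(&bytes[start..])?;
            start += consumed;
            // Move the decoded layer into the packet...
            p.layers.push(current);
            // ...then continue with the next layer, if there is one.
            match next {
                Some(layer) => current = layer,
                None => break,
            }
        }
        Ok(p)
    }
}
```

The borrow checker accepts the move-then-reassign of current because every path back to the top of the loop assigns it a new layer first.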
I am writing code to initialize an array of MaybeUninits and drop all initialized elements in case of a panic. Miri complains about undefined behaviour, which I've reduced to the example below.
use std::mem::{transmute, MaybeUninit};
fn main() {
    unsafe {
        let mut item: MaybeUninit<String> = MaybeUninit::uninit();
        let ptr = item.as_mut_ptr();
        item = MaybeUninit::new(String::from("Hello"));
        println!("{}", transmute::<_, &String>(&item));
        ptr.drop_in_place();
    }
}
Error message produced by cargo miri run:
error: Undefined Behavior: trying to reborrow for SharedReadWrite at alloc1336, but parent tag <untagged> does not have an appropriate item in the borrow stack
--> /home/antek/.local/opt/rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:179:1
|
179 | pub unsafe fn drop_in_place<T: ?Sized>(to_drop: *mut T) {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ trying to reborrow for SharedReadWrite at alloc1336, but parent tag <untagged> does not have an appropriate item in the borrow stack
|
As far as I can tell, this is exactly how MaybeUninit is supposed to be used.
Am I using the interface incorrectly and invoking undefined behaviour, or is Miri being overly conservative?
Your issue is actually completely unrelated to MaybeUninit and can be boiled down to:
fn main() {
    let mut item = 0;
    let ptr = &item as *const _;
    item = 1;
    // or do anything with the pointer
    unsafe { &*ptr; }
}
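Which throws the same "undefined behavior" error when run with Miri. From what I could gather by reading their reference, Miri keeps metadata (the Stacked Borrows "borrow stack") that tracks which pointers are allowed to access a location. Assigning to item directly is a write through the original variable, and under those rules that write removes the permission ptr was given, even though the value's memory address does not change. So this is not an accidental Miri bug but what the Stacked Borrows model currently specifies; as Miri's own help text says, those rules are still experimental and may turn out to be stricter than necessary.
In fact, changing ptr to a *mut i32 and then using ptr.write instead of the assignment gets rid of the UB error, because the write then goes through ptr itself and keeps its permission intact. The same fix applies to the original code: write the value through the pointer, or take the pointer only after the assignment.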
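Applied to the question's code, that means every access after taking ptr goes through ptr itself. A sketch (the function name init_read_drop is hypothetical, and it returns a clone only so the result can be checked) that initializes, reads and drops the String this way, which should keep ptr valid under Stacked Borrows:

```rust
use std::mem::MaybeUninit;

// Hypothetical helper: initialize, read and drop a `String` through one pointer.
fn init_read_drop() -> String {
    let mut item: MaybeUninit<String> = MaybeUninit::uninit();
    let ptr = item.as_mut_ptr();
    // Initialize through `ptr` instead of assigning to `item`; a direct
    // assignment would invalidate `ptr` under Stacked Borrows.
    unsafe { ptr.write(String::from("Hello")) };
    // Read through the same pointer.
    let copy = unsafe { (*ptr).clone() };
    // Drop the initialized value exactly once. `item` going out of scope
    // afterwards does nothing, because dropping a `MaybeUninit` is a no-op.
    unsafe { ptr.drop_in_place() };
    copy
}
```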
Given the following code:
trait Function {
    fn filter(&self);
}
#[derive(Debug, Copy, Clone)]
struct Kidney {}
impl Function for Kidney {
    fn filter(&self) {
        println!("filtered");
    }
}
fn main() {
    let k = Kidney {};
    let f: &dyn Function = &k;
    //let k1 = (*f); //--> This gives a "size not satisfied" error
    (*f).filter(); //--> Works; what exactly happens here?
}
I am not sure why it compiles; I was expecting the last statement to fail. I guess I have overlooked some fundamentals while learning Rust, as I don't understand why dereferencing a trait object (which lives behind a pointer) should compile.
Is this issue similar to the following case?
let v = vec![1, 2, 3, 4];
//let s: &[i32] = *v;
println!("{}", (*v)[0]);
*v gives a slice, but a slice is unsized, so again it is not clear to me how this compiles. If I uncomment the second statement I get
| let s:&[i32]= *v;
| ^^
| |
| expected &[i32], found slice
| help: consider borrowing here: `&*v`
|
= note: expected type `&[i32]`
found type `[{integer}]`
Does expected type &[i32] mean "expected a reference of slice"?
Dereferencing a trait object is no problem. In fact, it must be dereferenced at some point, otherwise it would be quite useless.
let k1 = (*f); fails not because of dereferencing but because you try to put the raw trait object on the stack (this is where local variables live). Values on the stack must have a size known at compile time, which is not the case for trait objects because any type could implement the trait.
Here is an example where structs of different sizes implement the trait:
trait Function {
    fn filter(&self);
}
#[derive(Debug, Copy, Clone)]
struct Kidney {}
impl Function for Kidney {
    fn filter(&self) {
        println!("filtered");
    }
}
#[derive(Debug, Copy, Clone)]
struct Liver {
    size: f32
}
impl Function for Liver {
    fn filter(&self) {
        println!("filtered too!");
    }
}
fn main() {
    let k = Kidney {};
    let l = Liver {size: 1.0};
    let f: &dyn Function;
    if true {
        f = &k;
    } else {
        f = &l;
    }
    // Now what is the size of *f - Kidney (0 bytes) or Liver (4 bytes)?
}
(*f).filter(); works because the temporarily dereferenced object is not put on the stack. In fact, this is the same as f.filter(). Rust automatically applies as many dereferences as required to get to an actual object. This is documented in the book.
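The size that cannot be known at compile time is exactly what std::mem::size_of_val reads from the trait object's vtable at run time. A small sketch reusing the example's types, with a few assumptions: filter returns a &'static str instead of printing so the results can be compared, it uses the dyn syntax that the 2021 edition requires for trait objects, and dyn_size and call_both_ways are hypothetical helper names.

```rust
trait Function {
    fn filter(&self) -> &'static str;
}

struct Kidney {}

#[allow(dead_code)] // `size` only exists to give `Liver` a non-zero size
struct Liver {
    size: f32,
}

impl Function for Kidney {
    fn filter(&self) -> &'static str {
        "filtered"
    }
}

impl Function for Liver {
    fn filter(&self) -> &'static str {
        "filtered too!"
    }
}

// `*f` names an unsized place; borrowing it again (`&*f`) is fine, and
// `size_of_val` looks up the concrete size in the vtable at run time.
fn dyn_size(f: &dyn Function) -> usize {
    std::mem::size_of_val(&*f)
}

// `(*f).filter()` auto-refs the place back to `&dyn Function`, so it is the
// same call as `f.filter()`; nothing unsized is moved onto the stack.
fn call_both_ways(f: &dyn Function) -> (&'static str, &'static str) {
    ((*f).filter(), f.filter())
}
```

For the two structs above, dyn_size reports 0 for a Kidney and 4 for a Liver, matching the comment in the example.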
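What happens in the second case is that Vec implements Deref to slices, so it gets all methods implemented for slices for free. *v gives you the dereferenced slice ([i32]), which you try to assign to a variable of type &[i32], that is, a reference to a slice. That is a type error, which is why the compiler suggests borrowing it again with &*v. (*v)[0], on the other hand, only indexes into the slice and copies out a single i32, so no unsized value is ever stored.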
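The Vec case in code: *v is an unsized [i32] place, which can be indexed or borrowed but not moved into a variable. A minimal sketch (first_and_len is a hypothetical name):

```rust
fn first_and_len() -> (i32, usize) {
    let v = vec![1, 2, 3, 4];
    // Indexing the unsized place `*v` copies out a single `i32`.
    let first = (*v)[0];
    // `let s: &[i32] = *v;` is the type error from the question;
    // borrowing the place again is what the compiler suggests.
    let s: &[i32] = &*v;
    (first, s.len())
}
```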
Judging by the MIR produced by the first piece of code, (*f).filter() is equivalent to f.filter(); it appears that the compiler is aware that since filter is a method on &self, dereferencing it doesn't serve any purpose and is omitted altogether.
The second case, however, looks different, because indexing the dereferenced slice introduces explicit bounds-checking code in the MIR. In my opinion the compiler should also be able to tell that the dereference itself doesn't introduce any meaningful change and treat this as regular slice indexing, but there might be some reason behind this.