How can you easily borrow a vector of vectors as a slice of slices?
fn use_slice_of_slices<T>(slice_of_slices: &[&[T]]) {
// Do something...
}
fn main() {
let vec_of_vec = vec![vec![0]; 10];
use_slice_of_slices(&vec_of_vec);
}
I will get the following error:
error[E0308]: mismatched types
--> src/main.rs:7:25
|
7 | use_slice_of_slices(&vec_of_vec);
| ^^^^^^^^^^^ expected slice, found struct `std::vec::Vec`
|
= note: expected type `&[&[_]]`
found type `&std::vec::Vec<std::vec::Vec<{integer}>>`
I could just as easily define use_slice_of_slices as
fn use_slice_of_slices<T>(slice_of_slices: &[Vec<T>]) {
// Do something
}
and the outer vector would be borrowed as a slice and all would work. But what if, just for the sake of argument, I want to borrow it as a slice of slices?
Assuming automatic coercion from &Vec<Vec<T>> to &[&[T]] is not possible, how can I define a function borrow_vec_of_vec as below?
fn borrow_vec_of_vec<'a, T: 'a>(vec_of_vec: Vec<Vec<T>>) -> &'a [&'a [T]] {
// Borrow vec_of_vec...
}
To put it in another way, how could I implement Borrow<[&[T]]> for Vec<Vec<T>>?
You cannot.
By definition, a slice is a view on an existing collection of elements. It cannot conjure up new elements, or new views of existing elements, out of thin air.
This stems from the fact that Rust generic parameters are generally invariant. That is, while a &Vec<T> can be converted to a &[T] after a fashion, the T in those two expressions MUST match.
A possible work-around is to go generic yourself.
use std::fmt::Debug;
fn use_slice_of_slices<U, T>(slice_of_slices: &[U])
where
U: AsRef<[T]>,
T: Debug,
{
for slice in slice_of_slices {
println!("{:?}", slice.as_ref());
}
}
fn main() {
let vec_of_vec = vec![vec![0]; 10];
use_slice_of_slices(&vec_of_vec);
}
Instead of imposing what the type of the element should be, you accept any type U, but place a bound on it so that it must be convertible to &[T] through AsRef.
This has nearly the same effect, since the generic function can then only manipulate each element as a [T] slice. As a bonus, it works with multiple types: any type that implements AsRef<[T]>, such as Vec<T>, arrays [T; N] or &[T].
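The following sketch illustrates that bonus. It uses a hypothetical helper, lengths, which is not from the original answer. The helper has the same AsRef<[T]> bound but returns each inner slice's length instead of printing it. The same function accepts a Vec of Vecs, an array of arrays and an array of slices:

```rust
// Hypothetical helper with the same AsRef<[T]> bound as use_slice_of_slices.
// It only ever sees each element as a &[T].
fn lengths<U, T>(slice_of_slices: &[U]) -> Vec<usize>
where
    U: AsRef<[T]>,
{
    slice_of_slices.iter().map(|s| s.as_ref().len()).collect()
}

fn main() {
    let vec_of_vec: Vec<Vec<i32>> = vec![vec![0], vec![1, 2]];
    let array_of_arrays: [[i32; 2]; 3] = [[1, 2], [3, 4], [5, 6]];
    let a = [1];
    let b = [2, 3, 4];
    let array_of_slices: [&[i32]; 2] = [&a, &b];

    println!("{:?}", lengths(&vec_of_vec)); // U = Vec<i32>
    println!("{:?}", lengths(&array_of_arrays)); // U = [i32; 2]
    println!("{:?}", lengths(&array_of_slices)); // U = &[i32]
}
```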
A deref coercion from &Vec<T> to &[T] is cheap. A Vec<T> is represented by a struct essentially containing a pointer to the heap-allocated data, the capacity of the heap allocation and the current length of the vector. A slice reference &[T] is a fat pointer consisting of a pointer to the data and the length of the slice. The conversion from &Vec<T> to &[T] therefore only requires copying the pointer and the length from the Vec<T> struct into a new fat pointer.
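One way to check that the coercion only copies the pointer and the length is to compare the slice with the Vec it came from. This is a sketch. The printed Vec size is three words on mainstream targets, but Vec's exact layout is not guaranteed by the language, so that line is informational only:

```rust
use std::mem::size_of;

fn main() {
    let v: Vec<u32> = vec![1, 2, 3];

    // Deref coercion: &Vec<u32> -> &[u32]. No elements are copied.
    let s: &[u32] = &v;

    // The slice shares the Vec's data pointer and length.
    assert_eq!(s.as_ptr(), v.as_ptr());
    assert_eq!(s.len(), v.len());

    // Typically: Vec = pointer + capacity + length, &[T] = pointer + length.
    println!("Vec<u32>: {} bytes", size_of::<Vec<u32>>());
    println!("&[u32]:   {} bytes", size_of::<&[u32]>());
}
```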
If we want to convert from &Vec<Vec<T>> to &[&[T]], we need to perform the above conversion for each of the inner vectors. This means we need to store an unknown number of fat pointers, which requires allocating space for them somewhere. When converting a single vector, the compiler reserves space for the single resulting fat pointer on the stack. For an unknown, potentially large, number of fat pointers this is not possible, and the conversion is no longer cheap. This is why the conversion isn't implicit, and you need to write explicit code for it.
So whenever you can, you should instead change your function signature as suggested in Matthieu's answer. If you don't control the function signature, your only choice is to write the explicit conversion code, allocating a new vector:
fn vecs_to_slices<T>(vecs: &[Vec<T>]) -> Vec<&[T]> {
vecs.iter().map(Vec::as_slice).collect()
}
Applied to the functions in the original post, this can be used like this:
use_slice_of_slices(&vecs_to_slices(&vec_of_vec));
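Put together, a complete program might look like the following sketch. The body of use_slice_of_slices is a hypothetical stand-in for "Do something...". It returns the total number of elements so the result can be checked:

```rust
// The question's parameter type; the body (counting all elements) is hypothetical.
fn use_slice_of_slices<T>(slice_of_slices: &[&[T]]) -> usize {
    slice_of_slices.iter().map(|s| s.len()).sum()
}

// The conversion from the answer: allocates one new Vec that holds a fat
// pointer (data pointer + length) for every inner vector.
fn vecs_to_slices<T>(vecs: &[Vec<T>]) -> Vec<&[T]> {
    vecs.iter().map(Vec::as_slice).collect()
}

fn main() {
    let vec_of_vec = vec![vec![0]; 10];
    // The temporary Vec<&[i32]> deref-coerces to &[&[i32]].
    let total = use_slice_of_slices(&vecs_to_slices(&vec_of_vec));
    println!("{} elements", total);
}
```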
I've read the term "fat pointer" in several contexts already, but I'm not sure what exactly it means and when it is used in Rust. The pointer seems to be twice as large as a normal pointer, but I don't understand why. It also seems to have something to do with trait objects.
The term "fat pointer" is used to refer to references and raw pointers to dynamically sized types (DSTs) – slices or trait objects. A fat pointer contains a pointer plus some information that makes the DST "complete" (e.g. the length).
Most commonly used types in Rust are not DSTs but have a fixed size known at compile time. These types implement the Sized trait. Even types that manage a heap buffer of dynamic size (like Vec<T>) are Sized, as the compiler knows the exact number of bytes a Vec<T> instance will take up on the stack. There are currently four different kinds of DSTs in Rust.
Slices ([T] and str)
The type [T] (for any T) is dynamically sized (so is the special "string slice" type str). That's why you usually only see it as &[T] or &mut [T], i.e. behind a reference. This reference is a so-called "fat pointer". Let's check:
use std::mem::size_of;
fn main() {
dbg!(size_of::<&u32>());
dbg!(size_of::<&[u32; 2]>());
dbg!(size_of::<&[u32]>());
}
This prints (with some cleanup):
size_of::<&u32>() = 8
size_of::<&[u32; 2]>() = 8
size_of::<&[u32]>() = 16
So we see that a reference to a normal type like u32 is 8 bytes (on a 64-bit platform), as is a reference to an array [u32; 2]. Those two types are not DSTs. But as [u32] is a DST, the reference to it is twice as large. In the case of slices, the additional data that "completes" the DST is simply the length. So one could say the representation of &[u32] is something like this:
struct SliceRef {
ptr: *const u32,
len: usize,
}
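As a sketch of that pointer-plus-length representation, the hypothetical helper rebuild below takes a slice reference apart with as_ptr and len. It then reassembles the slice with std::slice::from_raw_parts. Getting the same slice back shows that these two values are all the fat pointer carries:

```rust
/// Hypothetical helper: splits a slice reference into (pointer, length)
/// and builds an identical slice reference from those two parts.
fn rebuild(slice: &[u32]) -> &[u32] {
    let ptr: *const u32 = slice.as_ptr();
    let len: usize = slice.len();
    // SAFETY: ptr and len come from a valid slice, and the returned
    // reference has the same lifetime as the input borrow.
    unsafe { std::slice::from_raw_parts(ptr, len) }
}

fn main() {
    let data: [u32; 4] = [10, 20, 30, 40];
    let rebuilt = rebuild(&data[1..3]);
    println!("{:?}", rebuilt);
}
```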
Trait objects (dyn Trait)
When using traits as trait objects (i.e. type erased, dynamically dispatched), these trait objects are DSTs. Example:
trait Animal {
fn speak(&self);
}
struct Cat;
impl Animal for Cat {
fn speak(&self) {
println!("meow");
}
}
fn main() {
use std::mem::size_of;
dbg!(size_of::<&Cat>());
dbg!(size_of::<&dyn Animal>());
}
This prints (with some cleanup):
size_of::<&Cat>() = 8
size_of::<&dyn Animal>() = 16
Again, &Cat is only 8 bytes large because Cat is a normal type. But dyn Animal is a trait object and therefore dynamically sized. As such, &dyn Animal is 16 bytes large.
In the case of trait objects, the additional data that completes the DST is a pointer to the vtable (the vptr). I cannot fully explain the concept of vtables and vptrs here, but they are used to call the correct method implementation in this virtual dispatch context. The vtable is a static piece of data that mainly contains a function pointer for each method, along with the concrete type's size, alignment and drop function. With that, a reference to a trait object is basically represented as:
struct TraitObjectRef {
data_ptr: *const (),
vptr: *const (),
}
(This is different from C++, where the vptr for abstract classes is stored within the object. Both approaches have advantages and disadvantages.)
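The following sketch shows what happens to the two halves of the fat pointer. Dog and its name field are hypothetical, and speak returns a String so the result can be checked. Casting a &dyn Animal to a thin pointer keeps the data pointer, which still points at the original value, while the vtable pointer is dropped:

```rust
use std::mem::size_of;

trait Animal {
    fn speak(&self) -> String;
}

struct Dog {
    name: String,
}

impl Animal for Dog {
    fn speak(&self) -> String {
        format!("{} says woof", self.name)
    }
}

fn main() {
    let dog = Dog { name: String::from("Rex") };
    let animal: &dyn Animal = &dog;

    // Fat raw pointer (data pointer + vptr), then a thin cast that drops the vptr.
    let fat: *const _ = animal;
    let thin = fat as *const Dog;
    assert_eq!(thin, &dog as *const Dog);

    assert_eq!(size_of::<&dyn Animal>(), 2 * size_of::<usize>());
    // This call is dispatched through the vtable.
    println!("{}", animal.speak());
}
```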
Custom DSTs
It's actually possible to create your own DSTs by having a struct where the last field is a DST. This is rather rare, though. One prominent example is std::path::Path.
A reference or pointer to the custom DST is also a fat pointer. The additional data depends on the kind of DST inside the struct.
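Here is a minimal sketch of such a custom DST, using a hypothetical Packet type. Making the struct generic over T: ?Sized lets you build a sized Packet<[u8; 3]> first. A reference to it can then be coerced into a &Packet<[u8]>, whose extra data is the payload length. The last assertion checks that &Path is two words as well:

```rust
use std::mem::size_of;
use std::path::Path;

// Hypothetical custom DST: a fixed header followed by an unsized last field.
struct Packet<T: ?Sized> {
    id: u32,
    payload: T,
}

fn main() {
    let sized: Packet<[u8; 3]> = Packet { id: 7, payload: [1, 2, 3] };

    // Unsizing coercion: &Packet<[u8; 3]> -> &Packet<[u8]>.
    let dst: &Packet<[u8]> = &sized;

    // The fat pointer's extra data is the length of the [u8] tail.
    assert_eq!(dst.payload.len(), 3);
    assert_eq!(size_of::<&Packet<[u8]>>(), 2 * size_of::<usize>());

    // std::path::Path is a DST too, so &Path is also a fat pointer.
    assert_eq!(size_of::<&Path>(), 2 * size_of::<usize>());
    println!("id {} with {} payload bytes", dst.id, dst.payload.len());
}
```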
Exception: Extern types
In RFC 1861, the extern type feature was introduced. Extern types are also DSTs, but pointers to them are not fat pointers. Or more exactly, as the RFC puts it:
In Rust, pointers to DSTs carry metadata about the object being pointed to. For strings and slices this is the length of the buffer, for trait objects this is the object's vtable. For extern types the metadata is simply (). This means that a pointer to an extern type has the same size as a usize (ie. it is not a "fat pointer").
But if you are not interacting with a C interface, you probably won't ever have to deal with these extern types.
Above, we've seen the sizes for immutable references. Fat pointers work the same for mutable references, immutable raw pointers and mutable raw pointers:
size_of::<&[u32]>() = 16
size_of::<&mut [u32]>() = 16
size_of::<*const [u32]>() = 16
size_of::<*mut [u32]>() = 16
I'm a bit confused about how pointers work in Rust. There's ref, Box, &, *, and I'm not sure how they work together.
Here's how I understand it currently:
Box isn't really a pointer - it's a way to allocate data on the heap, and pass around unsized types (traits especially) in function arguments.
ref is used in pattern matching to borrow something that you match on, instead of taking it. For example,
let thing: Option<i32> = Some(4);
match thing {
None => println!("none!"),
Some(ref x) => println!("{}", x), // x is a borrowed thing
}
println!("{}", x + 1); // wouldn't work without the ref since the block would have taken ownership of the data
& is used to make a borrow (borrowed pointer). If I have a function fn foo(&self) then I'm taking a reference to myself that will expire after the function terminates, leaving the caller's data alone. I can also pass data that I want to retain ownership of by doing bar(&mydata).
* is used to make a raw pointer: for example, let y: i32 = 4; let x = &y as *const i32. I understand pointers in C/C++ but I'm not sure how this works with Rust's type system, and how they can be safely used. I'm also not sure what the use cases are for this type of pointer. Additionally, the * symbol can be used to dereference things (what things, and why?).
Could someone explain the 4th type of pointer to me, and verify that my understanding of the other types is correct? I'd also appreciate anyone pointing out any common use cases that I haven't mentioned.
First of all, all of the items you listed are really different things, even if they are related to pointers. Box is a library-defined smart pointer type; ref is a syntax for pattern matching; & is a reference operator, doubling as a sigil in reference types; * is a dereference operator, doubling as a sigil in raw pointer types. See below for more explanation.
There are four basic pointer types in Rust, which can be divided into two groups, references and raw pointers:
&T - immutable (shared) reference
&mut T - mutable (exclusive) reference
*const T - immutable raw pointer
*mut T - mutable raw pointer
The difference between the last two is very thin, because either can be cast to the other without any restrictions, so the const/mut distinction there serves mostly as a lint. Raw pointers can be created freely to anything, and they can even be created out of thin air from integers, for example.
Naturally, this is not so for references: reference types and their interaction define one of the key features of Rust, borrowing. References have a lot of restrictions on how and when they can be created, how they can be used and how they interact with each other. In return, they can be used without unsafe blocks. What borrowing is exactly and how it works is out of scope of this answer, though.
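A small sketch of both points uses a hypothetical helper, bump_through_casts. A pointer is cast from *mut to *const and back to *mut, then written through; this is legal here only because the pointer was derived from a &mut. A raw pointer is also created from a plain integer, which is safe as long as it is never dereferenced:

```rust
/// Hypothetical helper: increments *x through a pointer that was cast
/// from *mut to *const and back to *mut.
fn bump_through_casts(x: &mut u32) {
    let p_mut: *mut u32 = x;
    let p_const = p_mut as *const u32;
    let p_mut_again = p_const as *mut u32;
    // SAFETY: the pointer originally came from a &mut, and nothing else
    // accesses *x while we write through it.
    unsafe { *p_mut_again += 1 };
}

fn main() {
    let mut x: u32 = 5;
    bump_through_casts(&mut x);
    assert_eq!(x, 6);

    // Creating a raw pointer from an arbitrary integer is safe;
    // dereferencing it would need unsafe (and here would be undefined behavior).
    let from_int = 0x1000 as *const u32;
    assert_eq!(from_int as usize, 0x1000);
}
```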
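Both references and raw pointers can be created using the & operator, and both can be dereferenced using the * operator, though dereferencing a raw pointer requires an unsafe block. Note that x must be declared mut to be borrowed mutably, and that a shared reference cannot be used while a mutable one is alive, so the shared and mutable cases are shown one after the other:
let mut x: u32 = 12;
let ref1: &u32 = &x;
let raw1: *const u32 = &x;
let a = *ref1;
let b = unsafe { *raw1 };
let ref2: &mut u32 = &mut x;
*ref2 += 1;
let raw2: *mut u32 = &mut x;
unsafe { *raw2 += 1; }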
The dereference operator is often omitted, because another operator, the "dot" operator (i.e., .), automatically references or dereferences its left argument. So, for example, if we have these definitions:
struct X { n: u32 }
impl X {
fn method(&self) -> u32 { self.n }
}
then, even though method() takes self by reference, self.n automatically dereferences it, so you don't have to type (*self).n. A similar thing happens when method() is called:
let x = X { n: 12 };
let n = x.method();
Here, the compiler automatically references x in x.method(), so you won't have to write (&x).method().
The next to last piece of code also demonstrated the special &self syntax. It is just shorthand for self: &Self, or, more specifically, self: &X in this example. &mut self works the same way and stands for self: &mut Self.
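The following sketch repeats the X example and shows several spellings of the same call. The last one goes through a Box<&X> and relies on the dot operator dereferencing more than one level:

```rust
struct X {
    n: u32,
}

impl X {
    fn method(&self) -> u32 {
        // self.n auto-derefs; (*self).n is the explicit spelling.
        assert_eq!(self.n, (*self).n);
        self.n
    }
}

fn main() {
    let x = X { n: 12 };
    let boxed_ref: Box<&X> = Box::new(&x);

    // All of these call the same method with a &X receiver.
    let a = x.method(); // auto-ref, same as (&x).method()
    let b = (&x).method(); // explicit reference
    let c = X::method(&x); // fully qualified call
    let d = boxed_ref.method(); // auto-deref through Box and &

    assert_eq!((a, b, c, d), (12, 12, 12, 12));
}
```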
So, references are the main pointer kind in Rust and should be used almost always. Raw pointers, which don't have restrictions of references, should be used in low-level code implementing high-level abstractions (collections, smart pointers, etc.) and in FFI (interacting with C libraries).
Rust also has dynamically sized (or unsized) types. These types do not have a definite, statically known size and therefore can only be used through a pointer/reference. However, a bare pointer is not enough: additional information is needed, for example the length for slices or a pointer to a virtual method table for trait objects. This information is "embedded" in pointers to unsized types, making these pointers "fat".
A fat pointer is basically a structure which contains the actual pointer to the piece of data and some additional information (length for slices, pointer to vtable for trait objects). What's important here is that Rust handles these details transparently for the user: if you pass &[u32] or *mut dyn SomeTrait values around, the corresponding internal information is automatically passed along.
Box<T> is one of the smart pointers in the Rust standard library. It provides a way to allocate enough memory on the heap to store a value of the corresponding type, and then it serves as a handle, a pointer to that memory. Box<T> owns the data it points to; when it is dropped, the corresponding piece of memory on the heap is deallocated.
A very useful way to think of boxes is to consider them as regular values, but with a fixed size. That is, for a sized T, Box<T> is equivalent to just T, except that it always takes as many bytes as a pointer on your machine. A box of a DST, such as Box<[T]> or Box<dyn Trait>, is a fat pointer, just like a reference to it. We say that (owned) boxes provide value semantics. Internally, they are implemented using raw pointers, like almost any other high-level abstraction.
Boxes (in fact, this is true for almost all of the other smart pointers, like Rc) can also be borrowed: you can get a &T out of Box<T>. This can happen automatically with the . operator or you can do it explicitly by dereferencing and referencing it again:
let x: Box<u32> = Box::new(12);
let y: &u32 = &*x;
In this regard, boxes are similar to built-in pointers: you can use the dereference operator to reach their contents. This is possible because the dereference operator in Rust is overloadable (through the Deref and DerefMut traits), and it is overloaded for most (if not all) of the smart pointer types. This allows easy borrowing of these pointers' contents.
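The following sketch shows what "overloadable" means in practice. It uses a hypothetical MyBox<T>, a thin wrapper around Box loosely modelled on the MyBox example in the Rust book, that implements the Deref trait. Once it does, both *b and &*b work, and the dot operator sees through it:

```rust
use std::ops::Deref;

// Hypothetical minimal smart pointer (not a library type) wrapping a Box.
struct MyBox<T>(Box<T>);

impl<T> MyBox<T> {
    fn new(value: T) -> Self {
        MyBox(Box::new(value))
    }
}

impl<T> Deref for MyBox<T> {
    type Target = T;

    // With this impl, *b is treated as *Deref::deref(&b).
    fn deref(&self) -> &T {
        &self.0
    }
}

fn main() {
    let b = MyBox::new(12u32);
    let y: &u32 = &*b; // reborrow the contents, like &*x for a Box
    assert_eq!(*y, 12);
    assert_eq!(b.pow(2), 144); // the dot operator auto-derefs too
}
```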
And, finally, ref is just a syntax in patterns to obtain a variable of the reference type instead of a value. For example:
let mut x: u32 = 12;
let y = x; // y: u32, a copy of x
let ref z = x; // z: &u32, points to x
let ref mut zz = x; // zz: &mut u32, points to x
While the above example can be rewritten with reference operators:
let z = &x;
let zz = &mut x;
(which would also make it more idiomatic), there are cases when refs are indispensable, for example, when taking references into enum variants:
let x: Option<Vec<u32>> = Some(vec![1, 2, 3]);
match x {
Some(ref v) => println!("{:?}", v),
None => println!("none"),
}
In the above example, x is only borrowed inside the whole match statement, which allows using x after this match. If we write it as such:
match x {
Some(v) => println!("{:?}", v),
None => println!("none"),
}
then x will be consumed by this match and will become unusable after it.
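Since Rust 1.26, matching on a reference (match &x) gives the same borrowing behaviour without writing ref. When a reference is matched against a non-reference pattern, the bindings default to references. The sketch below uses two hypothetical helpers that both return the Option afterwards, which compiles only because neither match moved it:

```rust
// Hypothetical helper: `ref` makes v a &Vec<u32>, so x is only borrowed.
fn len_with_ref(x: Option<Vec<u32>>) -> (usize, Option<Vec<u32>>) {
    let len = match x {
        Some(ref v) => v.len(),
        None => 0,
    };
    // x was not moved by the match, so it can still be returned.
    (len, x)
}

// Hypothetical helper: matching on &x makes v a &Vec<u32> automatically.
fn len_with_binding_modes(x: Option<Vec<u32>>) -> (usize, Option<Vec<u32>>) {
    let len = match &x {
        Some(v) => v.len(),
        None => 0,
    };
    (len, x)
}

fn main() {
    let (n, x) = len_with_ref(Some(vec![1, 2, 3]));
    let (m, x) = len_with_binding_modes(x);
    println!("{} {} {:?}", n, m, x);
}
```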
Box is logically a newtype around a raw pointer (*const T). However, it allocates and deallocates its data during construction and destruction, so it does not have to borrow data from some other source.
The same thing is true of other pointer types, like Rc - a reference counted pointer. These are structs containing private raw pointers which they allocate into and deallocate from.
A raw pointer has exactly the same layout as the corresponding Rust reference, so it is not compatible with a C pointer in several cases. Importantly, *const str and *const [T] are fat pointers, which means they contain extra information about the value's length.
However, raw pointers make absolutely no guarantees as to their validity. For example, I can safely do
123 as *const String
This pointer is invalid, since the memory location 123 does not point to a valid String. Thus, when dereferencing one, an unsafe block is required.
Further, whereas borrows are required to respect certain laws - namely that you cannot have multiple borrows if one is mutable - raw pointers do not have to respect this. There are other, weaker, laws that must be obeyed, but you're less likely to run afoul of these.
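As a sketch of that difference, the hypothetical helper below keeps two *mut pointers to the same value and writes through both. The borrow checker would reject the same thing with two &mut references (E0499). Here it is sound because p2 is a copy of p1 and the two writes happen one after the other:

```rust
/// Hypothetical helper: writes to *x through two aliasing *mut pointers.
fn add_twice_via_aliases(x: &mut u32) {
    let p1: *mut u32 = x;
    // Copying a raw pointer creates an alias; no borrow is tracked.
    let p2: *mut u32 = p1;
    // SAFETY: both pointers come from the same &mut, stay valid for the
    // whole function, and the writes do not overlap in time.
    unsafe {
        *p1 += 1;
        *p2 += 1;
    }
}

fn main() {
    let mut x: u32 = 0;
    add_twice_via_aliases(&mut x);
    assert_eq!(x, 2);
}
```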
There is no logical difference between *mut and *const apart from variance (*const T is covariant in T, *mut T is invariant), although one may need to be cast to the other to do certain operations; the distinction mostly documents intent.
References and raw pointers are the same thing at the implementation level. The difference from the programmer's perspective is that references are safe (in Rust terms), but raw pointers are not.
The borrow checker guarantees that references are always valid (lifetime management), that you can have only one mutable reference at a time, etc.
These constraints can be too strict for many use cases, so raw pointers (which do not have any constraints, like in C/C++) are useful for implementing low-level data structures and low-level code in general. However, you can only dereference raw pointers (or call many of their methods, such as read and add) inside an unsafe block.
The containers in the standard library are implemented using raw pointers, and so are Box and Rc.
Box and Rc are Rust's counterparts to C++ smart pointers (roughly std::unique_ptr and std::shared_ptr), that is, wrappers around raw pointers.
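A short sketch of the shared-ownership side follows. Cloning an Rc copies the pointer and increments a reference count instead of cloning the value, and Rc::ptr_eq confirms that both handles point at the same allocation:

```rust
use std::rc::Rc;

fn main() {
    let a: Rc<String> = Rc::new(String::from("shared"));

    // Cloning the Rc copies the pointer and bumps the count;
    // the String itself is not cloned.
    let b = Rc::clone(&a);
    assert_eq!(Rc::strong_count(&a), 2);
    assert!(Rc::ptr_eq(&a, &b));

    // The String is freed only when the last Rc is dropped.
    drop(b);
    assert_eq!(Rc::strong_count(&a), 1);
    println!("{}", a);
}
```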
I would like to add my two cents.
A. Table
Reference/Pointer | Data location | Mutable | Shared ownership | Safe | impl Copy
&T       | stack | ❌ | ✔️ | ✔️ | ✔️
&mut T   | stack | ✔️ | ❌ | ✔️ | ❌
*const T | stack | ❌ | ✔️ | ❌ | ✔️
*mut T   | stack | ✔️ | ✔️ | ❌ | ✔️
Box<T>   | heap  | ✔️ | ❌ | ✔️ | ❌
Rc<T>    | heap  | ❌ | ✔️ | ✔️ | ❌
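The compiler can check the impl Copy column with a generic function that has a T: Copy bound. The name assert_copy is hypothetical. The lines for the ❌ rows are commented out because they fail to compile:

```rust
// Compiles only if T implements Copy.
fn assert_copy<T: Copy>() {}

fn main() {
    // The ✔️ rows of the impl Copy column:
    assert_copy::<&u32>();
    assert_copy::<*const u32>();
    assert_copy::<*mut u32>();

    // The ❌ rows; uncommenting any of these is a compile error (E0277):
    // assert_copy::<&mut u32>();
    // assert_copy::<Box<u32>>();
    // assert_copy::<std::rc::Rc<u32>>();
}
```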
B. Comments on table
&T
Mutable (❌): Error (E0594): cannot assign to *some_ref, which is behind a & reference; some_ref is a & reference, so the data it refers to cannot be written.
Shared (✔️)
Safe (✔️)
impl Copy (✔️)
&mut T
Mutable (✔️)
Shared (❌): Only one mutable reference to a value can exist at a time. Error (E0499): cannot borrow x as mutable more than once at a time; second mutable borrow occurs here.
Safe (✔️)
impl Copy (❌): Error: move occurs because some_ref has type &mut u32, which does not implement the Copy trait.
*const T
Mutable (❌): Error (E0594): cannot assign to *some_raw_pointer, which is behind a *const pointer; some_raw_pointer is a *const pointer, so the data it refers to cannot be written.
Shared (✔️)
Safe (❌): Error (E0133): dereference of raw pointer is unsafe and requires unsafe function or block; raw pointers may be null, dangling or unaligned; they can violate aliasing rules and cause data races: all of these are undefined behavior.
impl Copy (✔️): Please check the official documentation.
*mut T
Mutable (✔️)
Shared (✔️)
Safe (❌): Error (E0133): dereference of raw pointer is unsafe and requires unsafe function or block; raw pointers may be null, dangling or unaligned; they can violate aliasing rules and cause data races: all of these are undefined behavior.
impl Copy (✔️): Please check the Official Documentation.
Box<T>
Mutable (✔️)
Shared (❌): A Box has exactly one owner. To see this, declare a box in an inner scope, take a reference to it and try to use the reference after that scope ends. The box is dropped at the end of its scope, so the reference cannot outlive it. Please refer to this SO answer for more details. Error (E0597): some_box does not live long enough; borrowed value does not live long enough.
Safe (✔️)
impl Copy (❌): Please check the Official Documentation. Actually there is a reason:
You can't implement Copy for Box, that would allow creation of multiple boxes referencing the same thing.
Rc<T>
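Mutable (❌): It's a bit more complicated. Rc<T> does not implement DerefMut, so you cannot assign through it directly. Error (E0594): cannot assign to data in an Rc; trait DerefMut is required to modify through a dereference, but it is not implemented for Rc<u32>. Rc::get_mut does return a &mut T, but only while no other Rc or Weak points to the same value. For shared mutation, Rc is usually combined with RefCell.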
Shared (✔️): Actually it's multiple ownership.
Safe (✔️)
impl Copy (❌): Please check the Official Documentation.
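The following sketch demonstrates the Rc mutability rules described above. Rc::get_mut succeeds only while the Rc is the sole owner. Wrapping the value in a RefCell is the usual way to let every owner mutate it, with the borrow rules checked at run time instead:

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let mut a = Rc::new(1u32);

    // With a single owner, Rc::get_mut hands out a &mut.
    if let Some(v) = Rc::get_mut(&mut a) {
        *v += 1;
    }
    assert_eq!(*a, 2);

    // Once the value is shared, get_mut returns None.
    let b = Rc::clone(&a);
    assert!(Rc::get_mut(&mut a).is_none());
    drop(b);

    // Interior mutability: every owner can mutate through RefCell.
    let shared = Rc::new(RefCell::new(vec![1]));
    let other = Rc::clone(&shared);
    other.borrow_mut().push(2);
    assert_eq!(*shared.borrow(), vec![1, 2]);
}
```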
C. Related Notes
1. Copy trait vs move:
According to the official documentation:
It’s important to note that in these two examples, the only difference is whether you are allowed to access x after the assignment. Under the hood, both a copy and a move can result in bits being copied in memory, although this is sometimes optimized away.
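So be aware that both a copy and a move may copy bits. The difference is that a move transfers ownership and leaves the source unusable, while the source of a Copy type stays usable.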
2. Mutable References do not implement Copy
Some types can’t be copied safely. For example, copying &mut T would create an aliased mutable reference. Copying String would duplicate responsibility for managing the String’s buffer, leading to a double free.
It's good anyway to read the full Copy documentation page.
3. Dereferencing Pointers and Unsafe
The term unsafe here means that you won't be able to dereference the pointer unless with an unsafe function or block. Otherwise, you'll get the following error:
dereference of raw pointer is unsafe and requires unsafe function or block (E0133); raw pointers may be null, dangling or unaligned; they can violate aliasing rules and cause data races: all of these are undefined behavior.
4. ref in a pattern is equivalent to borrowing with &: let ref z = x; is the same as let z = &x;
Box is a smart pointer, which is a data type; it is not just a simple pointer to an address in memory. A Box is the owner of the value it points to.
fn main(){
// this will point to a value 0.1 which will be stored on the HEAP
// the var heap_value is just the address and it will be stored in the stack
// Box pointer is the owner of the value
let heap_value=Box::new(0.1);
// "x" is a primitive type, it will have a fixed size and therefore will be stored on the stack.
let x=0.1;
// * dereference which means just get the stored value
println!("they are equal or not {}",x==*heap_value); // true
}
Dereference a tuple:
fn main(){
let coord=Box::new((25,50));
// x takes ownership of the box (the Box is moved, not copied)
let x=coord;
// * reaches the tuple stored on the heap; since (i32, i32) is Copy,
// it is copied out and x remains usable
let extracted_tuple=*x;
}
type of "x" pointer is: Box<(i32, i32)>
type of "extracted_tuple" is (i32, i32)
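Keep in mind that a reference is a fixed-size value, so a local reference variable like stack_ref below lives on the stack (a reference can still end up on the heap when it is stored inside, say, a Vec or a Box):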
fn main(){
let stack_var=10;
// stack_ref is a reference to stack_var; both are on the stack,
// and stack_ref holds the address of stack_var
let stack_ref=&stack_var;
// this will create a box pointer. heap memory will be allocated
// copy of stack_var will be stored on the heap, heap_var points to that memory
let heap_var=Box::new(stack_var);
println!("heap var is {}",heap_var);
}
As you said, ref is used in pattern matching to borrow the thing you match on. Instead of using the ref keyword, you can also match on &thing, and the bindings become references automatically:
let thing: Option<i32> = Some(4);
match &thing {
None => println!("none!"),
Some(x) => println!("{}", x), // x is a borrowed thing
}
println!("{:?}", thing); // thing was only borrowed by the match, so it is still usable
I am looking at the code of from_raw_parts_mut:
pub unsafe fn from_raw_parts_mut<'a, T>(p: *mut T, len: usize) -> &'a mut [T] {
mem::transmute(Repr { data: p, len: len })
}
It uses transmute to reinterpret a Repr to a &mut [T]. As far as I understand, Repr is a 128 bit struct. How does this transmute of differently sized types work?
mem::transmute() only works between types of the same size (this is checked at compile time), so a &mut [T] slice reference must be the same size as Repr.
Looking at Repr:
#[repr(C)]
struct Repr<T> {
pub data: *const T,
pub len: usize,
}
It has a pointer to some data and a length. This is exactly what a slice is - a pointer to an array of items (which might be an actual array, or owned by a Vec<T>, etc.) with a length to say how many items are valid.
The object which is passed around as a slice is (under the covers) exactly what the Repr looks like, even though the data it refers to can be anything from 0 to as many T as will fit into memory.
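In the sketch below, the Repr definition is copied from above and the sizes are compared directly. A slice can also be built from a pointer and a length without transmute, by using std::ptr::slice_from_raw_parts_mut, which does not rely on any particular layout for slice references:

```rust
use std::mem::size_of;
use std::ptr;

// Same shape as the Repr struct quoted above (never constructed here).
#[allow(dead_code)]
#[repr(C)]
struct Repr<T> {
    data: *const T,
    len: usize,
}

fn main() {
    // mem::transmute requires equal sizes, and these are equal.
    assert_eq!(size_of::<Repr<u32>>(), size_of::<&mut [u32]>());

    let mut buf = [1u32, 2, 3, 4];
    let len = buf.len();
    let p: *mut u32 = buf.as_mut_ptr();

    // Build a &mut [u32] from (pointer, length) without transmute.
    // SAFETY: p and len describe buf, which is not otherwise accessed
    // while whole is in use.
    let whole: &mut [u32] = unsafe { &mut *ptr::slice_from_raw_parts_mut(p, len) };
    whole[0] = 10;

    assert_eq!(buf, [10, 2, 3, 4]);
}
```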
In Rust, some references are not implemented as just a pointer, as they are in some other languages. Some types are "fat pointers". This might not be obvious at first, especially if you are familiar with references/pointers in other languages! Some examples are:
Slices &[T] and &mut [T], which as described above, are actually a pointer and length. The length is needed for bounds checks. For example, you can pass a slice corresponding to part of an array or Vec to a function.
Trait objects like &dyn Trait or Box<dyn Trait> (written &Trait and Box<Trait> in older code), where Trait is a trait rather than a concrete type, are actually a pointer to the concrete value and a pointer to a vtable: the information needed to call trait methods on the object, given that its concrete type is not known.