What is Option<Box<T>> in Rust?

I know Box is a smart pointer that owns a heap allocation, so I can turn a primitive stack-allocated array into a 'dynamic' heap-allocated array using Box.
But in terms of creating a data structure, say a tree, why would I need to Box the pointers?

Unlike pointers in C, Box<T> can't be null. It always holds a non-null pointer to T. Option<Box<T>> represents the concept of a nullable pointer to a heap allocated object.
If you were to store a plain Box<T> inside a T then each object would contain a non-null pointer to another distinct object. Without nullability, and since Box<T> is necessarily acyclic, you'd end up with an infinitely large data structure.

If a struct, enum or union were to directly or indirectly contain itself, without using some kind of pointer, the size would have to be infinite, which isn't possible. By using Option<Box<T>>, you only allocate more space when you actually need it.
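To make the tree case from the question concrete, here is a minimal sketch of a binary tree whose children are Option<Box<Node>>. The Node type and its leaf and sum helpers are made up for illustration; they are not from the question.

```rust
use std::mem::size_of;

/// A binary tree node (hypothetical example type). Each child is either
/// absent (`None`) or a heap-allocated subtree (`Some(Box<Node>)`).
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

impl Node {
    /// A node with no children; nothing is allocated for the missing subtrees.
    fn leaf(value: i32) -> Node {
        Node { value, left: None, right: None }
    }

    /// Recursively add up every value in the tree.
    fn sum(&self) -> i32 {
        let left = self.left.as_ref().map_or(0, |n| n.sum());
        let right = self.right.as_ref().map_or(0, |n| n.sum());
        self.value + left + right
    }
}

fn main() {
    let tree = Node {
        value: 1,
        left: Some(Box::new(Node::leaf(2))),
        right: None,
    };
    println!("sum = {}", tree.sum());

    // `None` is stored as the null pointer, so the Option costs no extra space.
    assert_eq!(size_of::<Option<Box<Node>>>(), size_of::<Box<Node>>());
}
```

Because a Box is never null, the compiler can use the null bit pattern for None, which is why the two sizes compared in main are equal.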

Box is used to give recursive types a known size. For example:
enum List {
    // Cons refers to List itself
    // since this is a recursive type its size cannot be determined at compile time
    Cons(i32, List),
    Nil,
}
Rust cannot work out how much space a List needs, because a List would contain another List inline, and so on without end. Instead we write this:
#[derive(Debug)]
enum List {
    // Cons is similar to a linked list
    Cons(i32, Box<List>),
    Nil,
}
Now the compiler knows that a Cons holds only an i32 and a Box pointer, both of which have a fixed size. When we define a list variable using the List enum (with use List::{Cons, Nil}; in scope):
let list_var = Cons(1, Box::new(Cons(10, Box::new(Cons(100, Box::new(Nil))))));
the problem is that every tail has to be wrapped in a Box, even when it is just Nil. Nil has no associated value, but wrapping it in a Box still allocates heap space, which is unnecessary. To overcome this we write:
enum List {
    // we avoid allocating heap space for Nil
    // Option's None will replace Nil
    Cons(i32, Option<Box<List>>),
    // Nil, we remove this
}
After that, this is how I initialize list_var (again with use List::Cons; in scope):
// instead of Box::new(Nil) I am using None
let list_var = Cons(1, Some(Box::new(Cons(10, Some(Box::new(Cons(100, None)))))));
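Putting the pieces together, the following is a small runnable sketch of the Option<Box<List>> version. The sum function is a made-up helper, added only to show how the optional tail is followed.

```rust
#[derive(Debug)]
enum List {
    // The tail is either another heap-allocated node or None (end of list).
    Cons(i32, Option<Box<List>>),
}

use List::Cons;

/// Add up all values by walking the optional tail pointers (illustrative helper).
fn sum(list: &List) -> i32 {
    let mut total = 0;
    let mut current = Some(list);
    while let Some(Cons(value, next)) = current {
        total += value;
        // Turn &Option<Box<List>> into Option<&List> to step to the next node.
        current = next.as_deref();
    }
    total
}

fn main() {
    let list_var = Cons(1, Some(Box::new(Cons(10, Some(Box::new(Cons(100, None)))))));
    println!("{:?} sums to {}", list_var, sum(&list_var));
}
```

One consequence of dropping Nil is that a List value always holds at least one element; an empty list would have to be modelled outside the enum, for example as an Option<Box<List>> that is None.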

Related

An Rc equivalent that supports pointing to a component of the owned data

This answer asserts that an equivalent of C++ shared_ptr in rust is std::rc::Rc. This is true on the surface but there is an important operation missing: Specifically, shared_ptr can be used to point to an unrelated, unmanaged pointer, via its aliasing constructor (8), which is mostly used to point to a subpart of the original allocation. The shared_ptr still keeps the object alive, but the original type is completely erased. This can be very useful to forget generic parameters (and other tricks).
Obviously, in Rust, managing a completely unrelated pointer would be incredibly unsafe. Yet there is precedent for such an operation: Ref::map allows one to obtain a Ref to a component of the original Ref, while "forgetting" the type of the former.
Implementation-wise, it is clear that the current Rc<T> cannot possibly implement this behaviour. When the refcount hits 0 it has to deallocate, and the layout must be exactly the same as when it allocated, so it must use the same T. But that's only because it doesn't store the Layout that was originally used for the allocation.
So my question: Is there a library or other overlooked type that supports the allocation management of Rc, while also allowing the equivalent of Ref::map? I should mention that the type should also support unsize coercion.
struct Foo<T> {
    zet: T,
    bar: u16,
}
let foo: MagicRc<Foo<usize>> = MagicRc::new(Foo { zet: 6969, bar: 42 });
// Now I want to erase the generic parameter. Imagine a queue of
// these MagicRc<u16> that point to the `bar` field of different
// Foo<T> for arbitrary T. It is very important that Foo<..> does
// not appear in this type.
let foobar: MagicRc<u16> = MagicRc::map(&foo, |f| &f.bar);
// I can drop foo, but foobar should still be kept alive
drop(foo);
assert_eq!(*foobar, 42);
// unsizing to a trait object should work
let foodbg: MagicRc<dyn Debug> = foobar;
Addressing comments:
OwningRef and the playground link (DerefFn) do not erase the owner type.
To address Cerberus's concern: such a type would store the Layout (a perfectly normal type) of the original allocation somewhere as part of the managed object, and use it to free the value without needing access to the original type of the allocation.
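One way to approximate the requested behaviour with only the standard library is to pair a type-erased owner with a raw pointer into it. The sketch below is an assumption-laden illustration, not an existing library type: ProjectedRc and project are hypothetical names, and Rc<dyn Any> plays the role of the "stored Layout" because its vtable knows how to drop and free the original value. It needs two words instead of one, and it does not attempt unsize coercion of the handle itself, which to my knowledge would require the unstable CoerceUnsized trait.

```rust
use std::any::Any;
use std::ops::Deref;
use std::rc::Rc;

/// Hypothetical stand-in for the question's `MagicRc`: keeps some allocation
/// alive while exposing only a `U` that lives inside it.
struct ProjectedRc<U: ?Sized + 'static> {
    // Type-erased handle; holding it keeps the original allocation alive.
    _owner: Rc<dyn Any>,
    // Points into the allocation owned by `_owner`.
    ptr: *const U,
}

impl<U: ?Sized + 'static> ProjectedRc<U> {
    /// Project an `Rc<T>` onto a component of `T`, forgetting `T` in the result type.
    fn project<T, F>(owner: &Rc<T>, f: F) -> Self
    where
        T: 'static,
        // The higher-ranked signature only lets `f` return a reference that
        // lives at least as long as its borrow of the owned value.
        F: for<'a> FnOnce(&'a T) -> &'a U,
    {
        let ptr: *const U = f(&**owner);
        let typed: Rc<T> = Rc::clone(owner);
        let erased: Rc<dyn Any> = typed; // erase the concrete owner type
        ProjectedRc { _owner: erased, ptr }
    }
}

impl<U: ?Sized + 'static> Deref for ProjectedRc<U> {
    type Target = U;

    fn deref(&self) -> &U {
        // SAFETY: `ptr` came from a shared reference into the value owned by
        // `_owner`. That value never moves, and it cannot be mutably borrowed
        // or freed while this handle holds a strong count.
        unsafe { &*self.ptr }
    }
}

struct Foo<T> {
    zet: T,
    bar: u16,
}

fn main() {
    let foo = Rc::new(Foo { zet: 6969usize, bar: 42 });
    // `Foo<..>` does not appear in the type of `foobar`.
    let foobar: ProjectedRc<u16> = ProjectedRc::project(&foo, |f| &f.bar);
    drop(foo);
    assert_eq!(*foobar, 42);
}
```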

Why does size_of::<&T>() == size_of::<usize>() depend on whether T is Sized?

The docs for size_of say the following:
If T is Sized, all of those types [pointer types such as &T] have the same size as usize.
Why that qualifier? Would a pointer not have a definitive size regardless of any property of the type being pointed to?
Unsized object references have extra data in addition to the pointer to the object. In the case of slice references (&[T]), they contain a size in order to indicate how long this slice is. And in the case of references to traits (trait objects), they contain a pointer to a vtable in order to enable dynamic dispatch.
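The difference is easy to observe with size_of. This is a small sketch; the words helper is made up for illustration and simply reports the size of &T in units of usize.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Size of a reference to `T`, measured in `usize`-sized words (illustrative helper).
fn words<T: ?Sized>() -> usize {
    size_of::<&T>() / size_of::<usize>()
}

fn main() {
    // Sized target: a single pointer.
    println!("&u8        : {} word(s)", words::<u8>());
    // Slice and str: pointer plus length.
    println!("&[u8]      : {} word(s)", words::<[u8]>());
    println!("&str       : {} word(s)", words::<str>());
    // Trait object: pointer plus vtable pointer.
    println!("&dyn Debug : {} word(s)", words::<dyn Debug>());
}
```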

Array as a struct field

I would like to create a non-binary tree structure in Rust. Here is an attempt:
struct TreeNode<T> {
    tag : T,
    father : Weak<TreeNode<T>>,
    childrenlist : [Rc<TreeNode<T>>]
}
Unfortunately, this does not compile.
main.rs:4:1: 8:2 error: the trait `core::marker::Sized` is not implemented for the type `[alloc::rc::Rc<TreeNode<T>>]` [E0277]
main.rs:4 struct TreeNode<T> {
main.rs:5 tag : T,
main.rs:6 father : Weak<TreeNode<T>>,
main.rs:7 childrenlist : [Rc<TreeNode<T>>]
main.rs:8 }
main.rs:4:1: 8:2 note: `[alloc::rc::Rc<TreeNode<T>>]` does not have a constant size known at compile-time
main.rs:4 struct TreeNode<T> {
main.rs:5 tag : T,
main.rs:6 father : Weak<TreeNode<T>>,
main.rs:7 childrenlist : [Rc<TreeNode<T>>]
main.rs:8 }
error: aborting due to previous error
The code compiles if we replace an array with a Vec. However, the structure is immutable and I do not need an overallocated Vec.
I heard it could be possible to have a struct field with size unknown at compile time, provided it is unique. How can we do it?
Rust doesn't have the concept of a variable-length (stack) array, which you seem to be trying to use here.
Rust has a couple different array-ish types.
Vec<T> ("vector"): Dynamically sized; dynamically allocated on the heap. This is probably what you want to use. Initialize it with Vec::with_capacity(foo) to avoid overallocation (this creates an empty vector with the given capacity).
[T; n] ("array"): Statically sized; stored inline, so a local array lives on the stack. You need to know the size at compile time, so this won't work for you (unless I've misanalysed your situation).
[T] ("slice"): Unsized; usually used from &[T]. This is a view into a contiguous set of Ts in memory somewhere. You can get it by taking a reference to an array, or a vector (called "taking a slice of an array/vector"), or even taking a view into a subset of the array/vector. Being unsized, [T] can't be used directly as a variable (it can be used as the last member of an unsized struct), but you can view it from behind a pointer. Pointers referring to [T] are fat; i.e. they have an extra field for the length. &[T] would be useful if you want to store a reference to an existing array, but I don't think that's what you want to do here.
If you don't know the size of the list in advance, you have two choices:
&[T] which is just a reference to some piece of memory that you don't own
Vec<T> which is your own storage.
The correct thing here is to use a Vec. Why? Because you want the children list (array of Rc) to be actually owned by the TreeNode. If you used a &[T], it means that someone else would be keeping the list, not the TreeNode. With some lifetime trickery, you could write some valid code but you would have to go very far to please the compiler because the borrowed reference would have to be valid at least as long as the TreeNode.
Finally, a sentence in your question shows a misunderstanding:
However, the structure is immutable and I do not need an overallocated Vec.
You confuse mutability and ownership. Sure you can have an immutable Vec. It seems like you want to avoid allocating memory from the heap, but that's not possible, precisely because you don't know the size of the children list. Now if you're concerned with overallocating, you can fine-tune the vector storage with methods like with_capacity() and shrink_to_fit().
A final note: if you actually know the size of the list because it is fixed at compile time, you just need to use a [T; n] where n is compile-time known. But that's not the same as [T].
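If the worry is only the spare capacity of a Vec, one option not mentioned above is to store the children as a boxed slice, Box<[T]>, which is owned like a Vec but has a fixed length and no capacity field. The following is a minimal sketch under that assumption; node_with_leaves is a hypothetical helper, and Rc::new_cyclic is used only so that the children can point back at their parent.

```rust
use std::rc::{Rc, Weak};

struct TreeNode<T> {
    tag: T,
    father: Weak<TreeNode<T>>,
    // Owned, fixed-length slice: pointer + length, no spare capacity.
    children: Box<[Rc<TreeNode<T>>]>,
}

/// Build a root with one leaf child per entry in `child_tags` (illustrative helper).
fn node_with_leaves<T: Clone>(tag: T, child_tags: &[T]) -> Rc<TreeNode<T>> {
    Rc::new_cyclic(|me| {
        // `me` is a Weak handle to the root that is still being constructed.
        let children: Vec<Rc<TreeNode<T>>> = child_tags
            .iter()
            .map(|t| {
                Rc::new(TreeNode {
                    tag: t.clone(),
                    father: me.clone(),
                    children: Vec::new().into_boxed_slice(),
                })
            })
            .collect();
        TreeNode {
            tag,
            father: Weak::new(), // the root has no parent
            // Discards any excess capacity and yields an exact-size allocation.
            children: children.into_boxed_slice(),
        }
    })
}

fn main() {
    let root = node_with_leaves("root", &["a", "b"]);
    println!("{} has {} children", root.tag, root.children.len());

    let parent = root.children[0].father.upgrade().expect("parent is alive");
    println!("parent of {} is {}", root.children[0].tag, parent.tag);
}
```

The children still live on the heap, as they must when the count is only known at run time; the boxed slice just makes the "immutable, exact size" intent explicit in the type.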

Is &[T] literally an alias of Slice in Rust?

&[T] is confusing me.
I naively assumed that like &T, &[T] was a pointer, which is to say, a numeric pointer address.
However, I've seen some code like this, that I was rather surprised to see work fine (simplified for demonstration purposes; but you see code like this in many 'as_slice()' implementations):
extern crate core;
extern crate collections;
use self::collections::str::raw::from_utf8;
use self::core::raw::Slice;
use std::mem::transmute;
fn main() {
    let val = "Hello World";
    {
        let value:&str;
        {
            let bytes = val.as_bytes();
            let mut slice = Slice { data: &bytes[0] as *const u8, len: bytes.len() };
            unsafe {
                let array:&[u8] = transmute(slice);
                value = from_utf8(array);
            }
            // slice.len = 0;
        }
        println!("{}", value);
    }
}
So.
I initially thought that this was invalid code.
That is, the instance of Slice created inside the block scope is returned to outside the block scope (by transmute), and although the code runs, the println! is actually accessing data that is no longer valid through unsafe pointers. Bad!
...but that doesn't seem to be the case.
Consider uncommenting the line // slice.len = 0;
This code still runs fine (prints 'Hello World') when this happens.
So the line...
value = from_utf8(array);
If it was an invalid pointer to the 'slice' variable, the len at the println() statement would be 0, but it is not. So effectively a copy not just of a pointer value, but a full copy of the Slice structure.
Is that right?
Does that mean that in general it's valid to return a &[T] as long as the actual inner data pointer is valid, regardless of the scope of the original &[T] that is being returned, because a &[T] assignment is a copy operation?
(This seems, to me, to be extremely counter intuitive... so perhaps I am misunderstanding; if I'm right, having two &[T] that point to the same data cannot be valid, because they won't sync lengths if you modify one...)
A slice &[T], as you have noticed, is "equivalent" to a structure std::raw::Slice. In fact, Slice is an internal representation of &[T] value, and yes, it is a pointer and a length of data behind that pointer. Sometimes such structure is called "fat pointer", that is, a pointer and an additional piece of information.
When you pass &[T] value around, you indeed are just copying its contents - the pointer and the length.
If it was an invalid pointer to the 'slice' variable, the len at the println() statement would be 0, but it is not. So effectively a copy not just of a pointer value, but a full copy of the Slice structure.
Is that right?
So, yes, exactly.
Does that mean that in general it's valid to return a &[T] as long as the actual inner data pointer is valid, regardless of the scope of the original &[T] that is being returned, because a &[T] assignment is a copy operation?
And this is also true. That's the whole idea of borrowed references, including slices: they are statically checked so that they can only be used while their referent is alive. When DST finally lands, slices and regular references will be even more unified.
(This seems, to me, to be extremely counter intuitive... so perhaps I am misunderstanding; if I'm right, having two &[T] that point to the same data cannot be valid, because they won't sync lengths if you modify one...)
And this is actually an absolutely valid concern; it is one of the problems with aliasing. However, Rust is designed exactly to prevent such bugs. There are two things which render aliasing of slices valid.
First, slices can't change length; there are no methods defined on &[T] that would let you change its length in place. You can create a derived slice from a slice, but it will be an entirely new object.
But even if slices can't change length, if the data could be mutated through them, they still could bring disaster if aliased. For example, if values in slices are enum instances, mutating a value in such an aliased slice could make a pointer to internals of enum value contained in this slice invalid. So, second, Rust aliasable slices (&[T]) are immutable. You can't change values contained in them and you can't take mutable references into them.
These two features (and the compiler's lifetime checks) make aliasing of slices absolutely safe. However, sometimes you do need to modify the data in a slice, and then you need a mutable slice, written &mut [T]. You can change your data through such a slice, but these slices are not aliasable: you can't have two overlapping mutable slices into the same structure (an array, for example) at the same time, so you can't do anything dangerous.
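The restriction is about overlapping mutable access. As a minimal sketch in current Rust, split_at_mut hands out two mutable slices into the same array precisely because they are guaranteed not to overlap, while shared slices may overlap freely. The add_halves function is a made-up example.

```rust
/// Add each element of the first half onto the matching element of the
/// second half, in place (illustrative helper).
fn add_halves(data: &mut [i32]) {
    let mid = data.len() / 2;
    // Two mutable slices into one buffer: allowed because they are disjoint.
    let (left, right) = data.split_at_mut(mid);
    for (r, l) in right.iter_mut().zip(left.iter()) {
        *r += *l;
    }
}

fn main() {
    let mut data = [1, 2, 3, 4];
    add_halves(&mut data);
    println!("{:?}", data);

    // Shared slices may overlap; neither can be used to modify the data.
    let a = &data[..3];
    let b = &data[1..];
    assert_eq!(a[1], b[0]);
}
```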
Note, however, that using transmute() to transform a slice into a Slice or vice versa is an unsafe operation. &[T] is guaranteed statically to be correct if you create it using right methods, like calling as_slice() on a Vec. However, creating it manually using Slice struct and then transmuting it into &[T] is error-prone and can easily segfault your program, for example, when you assign it more length than is actually allocated.
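The code in the question predates Rust 1.0, and as far as I know the raw Slice struct is no longer part of the standard library. In current Rust the same experiment can be sketched with std::slice::from_raw_parts, which builds a slice from a pointer and a length without transmute. The rebuild function below is a hypothetical helper written for this illustration.

```rust
use std::slice;
use std::str;

/// Rebuild a byte slice from its two raw parts: data pointer and length.
fn rebuild(bytes: &[u8]) -> &[u8] {
    // SAFETY: pointer and length come from a live slice, and the elided
    // lifetime ties the result to the same borrow as the input.
    unsafe { slice::from_raw_parts(bytes.as_ptr(), bytes.len()) }
}

fn main() {
    let val = "Hello World";
    let value: &str;
    {
        let bytes = val.as_bytes();
        let copy = bytes; // copies the pointer and the length, not the data
        let shorter = &copy[..5]; // a new slice value; `bytes` is untouched
        assert_eq!(bytes.len(), 11);
        assert_eq!(shorter, &b"Hello"[..]);
        value = str::from_utf8(rebuild(bytes)).expect("valid UTF-8");
    }
    // The slice variables are out of scope, but the string data they pointed
    // to is still alive, so `value` remains valid.
    println!("{}", value);
}
```

Taking a shorter sub-slice leaves the original slice's length unchanged, which is the behaviour the question observed when it tried to zero the length of its local Slice value.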

Are There Any Hidden Costs to Passing Around a Struct With a Single Reference?

I was recently reading this article on structs and classes in D, and at one point the author comments that
...this is a perfect candidate for a struct. The reason is that it contains only one member, a pointer to an ALLEGRO_CONFIG. This means I can pass it around by value without care, as it's only the size of a pointer.
This got me thinking; is that really the case? I can think of a few situations in which believing you're passing a struct around "for free" could have some hidden gotchas.
Consider the following code:
struct S
{
    int* pointer;
}
void doStuff(S ptrStruct)
{
    // Some code here
}
int n = 123;
auto s = S(&n);
doStuff(s);
When s is passed to doStuff(), is a single pointer (wrapped in a struct) really all that's being passed to the function? Off the top of my head, it seems that any pointers to member functions would also be passed, as well as the struct's type information.
This wouldn't be an issue with classes, of course, since they're always reference types, but a struct's pass by value semantics suggests to me that any extra "hidden" data such as described above would be passed to the function along with the struct's pointer to int. This could lead to a programmer thinking that they're passing around an (assuming a 64-bit machine) 8-byte pointer, when they're actually passing around an 8-byte pointer, plus several other 8-byte pointers to functions, plus however many bytes an object's typeinfo is. The unwary programmer is then allocating far more data on the stack than was intended.
Am I chasing shadows here, or is this a valid concern when passing a struct with a single reference, and thinking that you're getting a struct that is a pseudo reference type? Is there some mechanism in D that prevents this from being the case?
I think this question can be generalized to wrapping native types. E.g. you could make a SafeInt type which wraps and acts like an int, but throws on any integer overflow conditions.
There are two issues here:
1. Compilers may not optimize your code as well as with a native type.
For example, if you're wrapping an int, you'll likely implement overloaded arithmetic operators. A sufficiently smart compiler will inline those methods, and the resulting code will be no different from that for a plain int. In your example, a dumb compiler might compile a dereference in some clumsy way (e.g. get the address of the struct's start, add the offset of the pointer field (which is 0), then dereference that).
Additionally, when calling a function, the compiler may decide to pass the struct in some other way (due to e.g. poor optimization, or an ABI restriction). This could happen e.g. if the compiler doesn't pay attention to the struct's size, and treats all structs in the same way.
2. struct types in D may indeed have a hidden member if you declare them inside a function.
For example, the following code works:
import std.stdio;
void main()
{
    string str = "I am on the stack of main()";
    struct S
    {
        string toString() const { return str; }
    }
    S s;
    writeln(s);
}
It works because S saves a hidden pointer to main()'s stack frame. You can force a struct to not have any hidden pointers by prefixing static to the declaration (e.g. static struct S).
There is no hidden data being passed. A struct consists exactly of what's declared in it (and any padding bytes if necessary), nothing else. There is no need to pass type information and member function information along because it's all static. Since a struct cannot inherit from another struct, there is no polymorphism.

Resources