I thought that I had understood fat pointers in Rust, but I have a case where I can't understand why they seem to percolate outwards from an inner type. Presumably my mental model is off, but I'm struggling to come up with a satisfactory explanation for this code:
use std::cell::RefCell;
use std::fmt::Debug;
use std::mem::size_of;
use std::rc::Rc;

fn main() {
    println!("{}", size_of::<Rc<RefCell<Vec<u8>>>>());
    println!("{}", size_of::<Rc<RefCell<dyn Debug>>>());
    println!("{}", size_of::<Box<Rc<RefCell<dyn Debug>>>>());
}
which, on a 64-bit machine, prints 8, 16, 8. Playground link.
Since Rc makes a Box internally (using into_raw_non_null), I expected this to print 8, 8, 8. Is there a reason why, at least from size_of's perspective, the fat pointer seems to percolate outwards from Debug, even past Rc's Box? Is it because it's stored as a raw pointer perhaps?
Ultimately, Rc<RefCell<dyn Debug>> is a pointer to an unsized type (the trait object is buried inside it), and pointers to unsized types are fat pointers. The types nested inside it and wrapped around it are not fat pointers.
There are no fat pointers anywhere in the Vec<u8> case. A Vec<T> is effectively a (*mut T, usize, usize), a RefCell<T> is a (T, usize), and an Rc<T> is a (*mut T).
size_of | is
---------------------+---
Vec<u8> | 24
RefCell<Vec<u8>> | 32
Rc<RefCell<Vec<u8>>> | 8
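For instance, a minimal sketch that checks the table above (the concrete sizes assume a 64-bit target; the exact layouts of Vec and RefCell are not guaranteed):

use std::cell::RefCell;
use std::mem::size_of;
use std::rc::Rc;

fn main() {
    // All of these are plain sized types; no fat pointers are involved.
    assert_eq!(size_of::<Vec<u8>>(), 24); // (*mut u8, usize, usize)
    assert_eq!(size_of::<RefCell<Vec<u8>>>(), 32); // borrow flag + the Vec
    assert_eq!(size_of::<Rc<RefCell<Vec<u8>>>>(), 8); // one thin pointer
}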
Your second and third cases do involve a fat pointer, because Rc<RefCell<dyn Debug>> points to an unsized type. Putting that fat pointer behind another pointer (the Box) creates a thin pointer, since the Rc itself is an ordinary sized value.
size_of | is
----------------------------+---
Rc<RefCell<dyn Debug>> | 16
Box<Rc<RefCell<dyn Debug>>> | 8
Notably, it's impossible to have a bare RefCell<dyn Debug> by value; even asking for its size fails:
error[E0277]: the size for values of type `dyn std::fmt::Debug` cannot be known at compilation time
--> src/main.rs:4:20
|
4 | println!("{}", mem::size_of::<RefCell<dyn Debug>>());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
|
= help: within `std::cell::RefCell<dyn std::fmt::Debug>`, the trait `std::marker::Sized` is not implemented for `dyn std::fmt::Debug`
= note: to learn more, visit <https://doc.rust-lang.org/book/second-edition/ch19-04-advanced-types.html#dynamically-sized-types-and-the-sized-trait>
= note: required because it appears within the type `std::cell::RefCell<dyn std::fmt::Debug>`
= note: required by `std::mem::size_of`
The trait object requires some indirection; when you add some, you've finally constructed some type of fat pointer.
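For instance, a minimal sketch of how such a fat Rc comes into being through unsized coercion:

use std::cell::RefCell;
use std::fmt::Debug;
use std::rc::Rc;

fn main() {
    // The value starts out with a concrete, sized pointee...
    let concrete: Rc<RefCell<Vec<u8>>> = Rc::new(RefCell::new(vec![1, 2, 3]));
    // ...and coercing the pointee to dyn Debug widens the Rc into a fat pointer.
    let object: Rc<RefCell<dyn Debug>> = concrete;
    println!("{:?}", object); // RefCell { value: [1, 2, 3] }
}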
You can use the unstable option -Z print-type-sizes to explore the layout of structs:
type: `std::rc::RcBox<std::cell::RefCell<dyn std::fmt::Debug>>`: 24 bytes, alignment: 8 bytes
field `.strong`: 8 bytes
field `.weak`: 8 bytes
field `.value`: 8 bytes
type: `core::nonzero::NonZero<*const std::rc::RcBox<std::cell::RefCell<dyn std::fmt::Debug>>>`: 16 bytes, alignment: 8 bytes
field `.0`: 16 bytes
type: `std::ptr::NonNull<std::rc::RcBox<std::cell::RefCell<dyn std::fmt::Debug>>>`: 16 bytes, alignment: 8 bytes
field `.pointer`: 16 bytes
type: `std::rc::Rc<std::cell::RefCell<dyn std::fmt::Debug>>`: 16 bytes, alignment: 8 bytes
field `.ptr`: 16 bytes
field `.phantom`: 0 bytes, offset: 0 bytes, alignment: 1 bytes
type: `std::cell::RefCell<dyn std::fmt::Debug>`: 8 bytes, alignment: 8 bytes
field `.borrow`: 8 bytes
field `.value`: 0 bytes
I'm not 100% sure how to parse this output, as I'd expect RefCell<dyn Debug> to be an unsized type (as shown by the error above). I assume the meaning of "0 bytes" is overloaded.
Problem Statement
I am trying to call fn some_api_function(), which takes &[T] as a parameter. To generate that parameter, I tried to call flat_map on a Vec of Vecs (itself buried inside a RefCell), but I have trouble converting the resulting Vec<&T> to &[T]. I'd prefer to avoid copying or cloning the entire dataset for performance reasons, as some_api only needs a read-only borrow.
Code to illustrate:
use std::cell::RefCell;

pub struct EnvVar {}

pub struct Arena {
    services: RefCell<Vec<Service>>,
}

pub struct Service {
    env_vars: Vec<EnvVar>,
}

pub fn some_api(env_vars: &[EnvVar]) {}

fn main() {
    let arena = Arena {
        services: RefCell::new(vec![Service {
            env_vars: vec![EnvVar {}],
        }]),
    };
    let env_vars: Vec<&EnvVar> = arena
        .services
        .borrow()
        .iter()
        .flat_map(|compose_service| compose_service.env_vars.as_ref())
        .collect();
    some_api(&env_vars);
}
(Playground)
Errors:
Compiling playground v0.0.1 (/playground)
error[E0308]: mismatched types
--> src/main.rs:27:14
|
27 | some_api(&env_vars);
| ^^^^^^^^^ expected slice, found struct `Vec`
|
= note: expected reference `&[EnvVar]`
found reference `&Vec<&EnvVar>`
For more information about this error, try `rustc --explain E0308`.
error: could not compile `playground` due to previous error
It's not possible, and here's why:
Imagine that each T takes up 500 bytes. Meanwhile each &T, being a 64-bit address, takes only 8 bytes. What you get is:
A &[T] is a reference to a slice of contiguous Ts, each 500 bytes. If the slice had 20 Ts that contiguous block of memory would be 20 × 500 bytes = 10 KB large.
A Vec<&T> containing 20 elements would have a bunch of addresses side-by-side and would only need a block of 20 × 8 bytes = 160 bytes.
There is no cheap way to turn a 160-byte block of &Ts into a 10 KB block of Ts. Their memory layouts are not compatible. Your only options are:
Clone the objects.
Change the caller to build a Vec<T> instead of Vec<&T>.
Change the function to accept &[&T].
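A minimal sketch of the third option (the changed some_api signature is an assumption; a real API may not be modifiable):

use std::cell::RefCell;

pub struct EnvVar {}

pub struct Service {
    env_vars: Vec<EnvVar>,
}

// Hypothetical variant of some_api that reads through a slice of references.
pub fn some_api(env_vars: &[&EnvVar]) {
    let _ = env_vars.len();
}

fn main() {
    let services = RefCell::new(vec![Service { env_vars: vec![EnvVar {}] }]);
    // Binding the Ref keeps the borrow alive while the references are in use.
    let borrowed = services.borrow();
    let env_vars: Vec<&EnvVar> = borrowed
        .iter()
        .flat_map(|service| service.env_vars.iter())
        .collect();
    some_api(&env_vars); // &Vec<&EnvVar> coerces to &[&EnvVar]
}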
I read a book published by Apress named Beginning Rust: Get Started with Rust, 2021 Edition. In one of the code examples, the author does not clearly explain how the code works. Here is the code snippet:
/* In a 64-bit system, it prints:
   16 16 16; 8 8 8
   In a 32-bit system, it prints:
   8 8 8; 4 4 4
*/
fn main() {
    use std::mem::*;
    let a: &str = "";
    let b: &str = "0123456789";
    let c: &str = "abcdè";
    print!("{} {} {}; ",
        size_of_val(&a),
        size_of_val(&b),
        size_of_val(&c));
    print!("{} {} {}",
        size_of_val(&&a),
        size_of_val(&&b),
        size_of_val(&&c));
}
My question is: how does this work? size_of_val takes a reference, and the variables are already declared as &str, yet in the first print! statement the author puts another ampersand before each variable. Moreover, when we pass a variable without the extra ampersand, such as size_of_val(a), we get 0 for a, 10 for b, and 6 for c; but with the ampersand, such as size_of_val(&a), we get 16 16 16 (or 8 8 8), as the comment above main describes. Finally, in the second print! statement the author puts double ampersands to get the size of the reference itself. How does that work? I thought this would generate an error, since size_of_val accepts only a single reference.
The size_of_val() function is declared as follows:
pub fn size_of_val<T>(val: &T) -> usize
where
T: ?Sized,
That means: given any type T (the ?Sized constraint means "really any type, even unsized ones"), we take a reference for T and give back a usize.
Let's take a as an example (b and c are the same).
When we evaluate size_of_val(a), the compiler knows that a has type &str, and thus it infers the generic parameter to be str (without a reference). The full call is size_of_val::<str>(a /* &str */), which matches the signature: we pass a &str for T == str.
What is the size of a str? A str is a contiguous sequence of bytes, encoding the string as UTF-8. a contains "", the empty string, which is of course zero bytes long, so size_of_val() returns 0. b holds 10 ASCII characters, each one byte long in UTF-8, so together they're 10 bytes. c contains 4 ASCII chars (abcd), so four bytes, plus one Unicode character (è) that is two bytes wide, encoded as \xC3\xA8 (195 and 168 in decimal), for a total length of six bytes.
What happens when we evaluate size_of_val(&a)? &a is a &&str because a is a &str, so the compiler infers T to be &str. The size of a &str is constant and always double the size of a pointer: &str, being a pointer to str, must include both the data address and the length. On 64-bit platforms this is 16 (8 * 2); on 32-bit ones it is 8 (4 * 2). This is called a fat pointer, that is, a pointer that carries additional metadata besides just the address (note that a fat pointer is not guaranteed to be exactly twice the pointer size, so don't rely on that, but in practice it is).
When we evaluate size_of_val(&&a), the type of &&a is &&&str, so T is inferred to be &&str. While &str (a pointer to str) is a fat pointer, meaning it is doubled in size, a pointer to a fat pointer is a normal thin pointer (the opposite of a fat pointer: a pointer that only carries the address, without any additional metadata), meaning it is one machine word size. So 8 bytes for 64 bit or 4 bytes for 32 bit platforms.
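A minimal sketch that spells out the inferred type parameter with turbofish syntax (the concrete sizes assume a 64-bit target):

use std::mem::size_of_val;

fn main() {
    let a: &str = "0123456789";
    assert_eq!(size_of_val::<str>(a), 10);    // the string data itself
    assert_eq!(size_of_val::<&str>(&a), 16);  // a fat pointer: address + length
    assert_eq!(size_of_val::<&&str>(&&a), 8); // a thin pointer to the fat one
}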
Recently, when I was learning about type layout in Rust, I saw that structs support the #[repr(C)] attribute, so I wanted to see the difference between the default (Rust) representation and the C-like representation. Here comes the code:
use type_layout::TypeLayout;

#[derive(TypeLayout)]
struct ACG1 {
    time1: u16, // 2
    time2: u16, // 2
    upper: u32, // 4
    lower: u16, // 2
}

#[derive(TypeLayout)]
#[repr(C)]
struct ACG2 {
    time1: u16, // 2
    time2: u16, // 2
    upper: u32, // 4
    lower: u16, // 2
}

fn main() {
    println!("ACG1: {}", ACG1::type_layout());
    println!("ACG2: {}", ACG2::type_layout());
}
And I get the following output:
I understand the rules for padding the #[repr(C)] structure and the size of the structure as a whole, but what confused me is the Rust representation struct ACG1. I can't find any clear documentation on Rust padding rules, and I think the padding size should also be included in the overall size of the structure, but why is the size of ACG1 only 12 bytes?
BTW, this is the crate I used to assist in printing the layout of the structure: type-layout 0.2.0
This crate does not seem to account for field reordering. It appears the compiler reordered the struct to have upper first:
struct ACG1 {
upper: u32,
time1: u16,
time2: u16,
lower: u16,
}
It's somewhat hard to see, but the derive macro implementation computes "padding" from the difference between fields in declared order. In that sense, there are four bytes of "padding" between the beginning of the struct and the first field (time1), and four bytes of "padding" between the third field (upper) and the fourth field (lower).
There is an issue filed that it doesn't work for non-#[repr(C)] structs, so I would not recommend using this crate for this purpose.
As far as Rust's rules go, the reference says "There are no guarantees of data layout made by [the default] representation." So in theory, the compiler can do whatever it wants and reorder fields based on access patterns. But in practice, I don't think it’s that elaborate and organizing by field size is a simple way to minimize padding.
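One can also observe the reordering directly with std::mem::offset_of! (stable since Rust 1.77); a minimal sketch:

use std::mem::offset_of;

struct ACG1 {
    time1: u16,
    time2: u16,
    upper: u32,
    lower: u16,
}

fn main() {
    // On current compilers, upper is typically moved to offset 0 to avoid
    // padding; none of this is guaranteed for the default representation.
    println!("upper at {}", offset_of!(ACG1, upper)); // 0
    println!("time1 at {}", offset_of!(ACG1, time1)); // 4
    println!("time2 at {}", offset_of!(ACG1, time2)); // 6
    println!("lower at {}", offset_of!(ACG1, lower)); // 8
}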
As others have said, it seems to be an issue in the crate.
Better ask the compiler:
cargo clean
cargo rustc -- -Zprint-type-sizes
This will give you:
...
print-type-size type: `ACG1`: 12 bytes, alignment: 4 bytes
print-type-size field `.upper`: 4 bytes
print-type-size field `.time1`: 2 bytes
print-type-size field `.time2`: 2 bytes
print-type-size field `.lower`: 2 bytes
print-type-size end padding: 2 bytes
print-type-size type: `ACG2`: 12 bytes, alignment: 4 bytes
print-type-size field `.time1`: 2 bytes
print-type-size field `.time2`: 2 bytes
print-type-size field `.upper`: 4 bytes
print-type-size field `.lower`: 2 bytes
print-type-size end padding: 2 bytes
Consider:
fn main() {
    // Prints 8, 8, 16
    println!(
        "{}, {}, {}",
        std::mem::size_of::<Box<i8>>(),
        std::mem::size_of::<Box<&[i8]>>(),
        std::mem::size_of::<Box<[i8]>>(),
    );
}
Why do owned slices take 16 bytes, but referenced slices take only 8?
Box<T> is basically *const T (Actually it's a newtype around Unique<T>, which itself is a NonNull<T> with PhantomData<T> (for dropck), but let's stick to *const T for simplicity).
A pointer in Rust normally has the same size as size_of::<usize>() except when T is a dynamically sized type (DST). Currently, a Box<DST> is 2 * size_of::<usize>() in size (the exact representation is not stable at the time of writing). A pointer to a DST is called FatPtr.
Currently, there are two kinds of DSTs: slices and trait objects. A FatPtr to a slice is defined like this:
#[repr(C)]
struct FatPtr<T> {
    data: *const T,
    len: usize,
}
Note: For a trait pointer, len is replaced by a pointer to the vtable.
With this information, your question can be answered:
Box<i8>: i8 is a sized type => basically the same as *const i8 => 8 bytes in size (with 64 bit pointer width)
Box<[i8]>: [i8] is a DST => basically the same as FatPtr<i8> => 16 bytes in size (with 64 bit pointer width)
Box<&[i8]>: &[i8] is not a DST. It's basically the same as *const FatPtr<i8> => 8 bytes in size (with 64 bit pointer width)
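A minimal sketch that checks these claims without hard-coding a pointer width:

use std::mem::size_of;

fn main() {
    // Sized pointee => thin pointer.
    assert_eq!(size_of::<Box<i8>>(), size_of::<usize>());
    // &[i8] is itself a sized value (it *is* the fat pointer),
    // so a Box around it is thin again.
    assert_eq!(size_of::<Box<&[i8]>>(), size_of::<usize>());
    // Unsized pointee => fat pointer: data address plus length.
    assert_eq!(size_of::<Box<[i8]>>(), 2 * size_of::<usize>());
}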
The size of a reference depends on the "sizedness" of the referenced type:
A reference to a sized type is a single pointer to the memory address.
A reference to an unsized type is a pointer to the memory and the size of the pointed datum. That's what is called a fat pointer:
#[repr(C)]
struct FatPtr<T> {
    data: *const T,
    len: usize,
}
A Box is a special kind of pointer that points to the heap, but it is still a pointer.
Knowing that, you understand that:
Box<i8> is 8 bytes because i8 is sized,
Box<&[i8]> is 8 bytes because a reference is sized,
Box<[i8]> is 16 bytes because a slice is unsized.
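For completeness, a brief sketch of how such an owned slice usually comes about (the 16 bytes assume a 64-bit target):

fn main() {
    // An owned slice is typically produced from a Vec; the resulting
    // Box<[i8]> stores the data address plus the length, hence 16 bytes.
    let boxed: Box<[i8]> = vec![1, 2, 3].into_boxed_slice();
    assert_eq!(boxed.len(), 3);
    assert_eq!(std::mem::size_of_val(&boxed), 16);
}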
The first conversion using 'as' compiles, but the second one using the 'From' trait does not:
fn main() {
    let a: u64 = 5;
    let b = a as usize;
    let b = usize::from(a);
}
Using Rust 1.34.0, I get the following error:
error[E0277]: the trait bound `usize: std::convert::From<u64>` is not satisfied
--> src/main.rs:4:13
|
4 | let b = usize::from(a);
| ^^^^^^^^^^^ the trait `std::convert::From<u64>` is not implemented for `usize`
|
= help: the following implementations were found:
<usize as std::convert::From<bool>>
<usize as std::convert::From<std::num::NonZeroUsize>>
<usize as std::convert::From<u16>>
<usize as std::convert::From<u8>>
= note: required by `std::convert::From::from`
When I replace u64 with u8, there is no more error. From the error message, I understand that the From trait is implemented for u8 but not for the wider integer types.
If there is a good reason for that, then why doesn't the conversion using 'as' also fail to compile?
as casts are fundamentally different from From conversions. From conversions are "simple and safe" whereas as casts are purely "safe". When considering numeric types, From conversions exist only when the output is guaranteed to be the same, i.e. there is no loss of information (no truncation or flooring or loss of precision). as casts, however, do not have this limitation.
Quoting the docs,
The size of [usize] is "how many bytes it takes to reference any location in memory. For example, on a 32 bit target, this is 4 bytes and on a 64 bit target, this is 8 bytes."
Since the size depends on the target architecture and cannot be determined before compilation, there is no guarantee that a From conversion between a numeric type and usize is possible. An as cast, however, will always operate by the rules listed here.
For instance, on a 32-bit system, usize is equivalent to u32. Since a usize is smaller than a u64, there can be loss of information (truncation) when converting a u64 into a usize and hence a From conversion cannot exist. However, the size of a usize is always guaranteed to be 8 bits or greater and a u8 to usize From conversion will always exist.
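A minimal sketch contrasting the two behaviors (using u32 instead of usize so the result is target-independent):

fn main() {
    let big: u64 = u64::MAX;
    // `as` compiles and silently truncates to the low 32 bits...
    let truncated = big as u32;
    assert_eq!(truncated, u32::MAX);
    // ...whereas From conversions exist only where no data can be lost.
    let widened = u64::from(u32::MAX); // u32 -> u64 is always lossless
    assert_eq!(widened, 4_294_967_295);
}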
As already mentioned, converting from a 64-bit value to a usize might cause truncation; you might lose data when a usize is 16 or 32 bits.
Fallible conversions are covered by the TryFrom trait, available since Rust 1.34:
use std::convert::TryFrom;

fn main() {
    let a: u64 = 5;
    let b = a as usize;
    let b = usize::try_from(a);
}
See also:
How do I convert a usize to a u32 using TryFrom?