Problem Statement
I am trying to call fn some_api_function(), which takes &[T] as a parameter. To build that parameter, I call flat_map on a Vec of Vecs (which is itself buried inside a RefCell), but I have trouble converting the resulting Vec<&T> to &[T]. I'd prefer to avoid copying or cloning the entire dataset for performance reasons, since some_api only needs a read-only borrow.
Code to illustrate:
use std::cell::RefCell;

pub struct EnvVar {}

pub struct Arena {
    services: RefCell<Vec<Service>>,
}

pub struct Service {
    env_vars: Vec<EnvVar>,
}

pub fn some_api(env_vars: &[EnvVar]) {}

fn main() {
    let arena = Arena {
        services: RefCell::new(vec![Service {
            env_vars: vec![EnvVar {}],
        }]),
    };
    let env_vars: Vec<&EnvVar> = arena
        .services
        .borrow()
        .iter()
        .flat_map(|compose_service| compose_service.env_vars.as_ref())
        .collect();
    some_api(&env_vars);
}
(Playground)
Errors:
Compiling playground v0.0.1 (/playground)
error[E0308]: mismatched types
--> src/main.rs:27:14
|
27 | some_api(&env_vars);
| ^^^^^^^^^ expected slice, found struct `Vec`
|
= note: expected reference `&[EnvVar]`
found reference `&Vec<&EnvVar>`
For more information about this error, try `rustc --explain E0308`.
error: could not compile `playground` due to previous error
It's not possible, and here's why:
Imagine that each T takes up 500 bytes. Meanwhile each &T, being a 64-bit address, takes only 8 bytes. What you get is:
A &[T] is a reference to a slice of contiguous Ts, each 500 bytes. If the slice had 20 Ts, that contiguous block of memory would be 20 × 500 bytes = 10 KB.
A Vec<&T> containing 20 elements would have a bunch of addresses side-by-side and would only need a block of 20 × 8 bytes = 160 bytes.
There is no cheap way to turn a 160-byte block of &Ts into a 10 KB block of Ts. Their memory layouts are not compatible. Your only options are:
Clone the objects.
Change the caller to build a Vec<T> instead of Vec<&T>.
Change the function to accept &[&T] (a sketch of this applied to the question's code follows below).
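For the code in the question, the third option is usually the least invasive. Here is a minimal sketch of it (assuming you control the signature of some_api); note that the guard returned by borrow() must be kept alive in a named binding so the collected references remain valid:

use std::cell::RefCell;

pub struct EnvVar {}

pub struct Arena {
    services: RefCell<Vec<Service>>,
}

pub struct Service {
    env_vars: Vec<EnvVar>,
}

// Option 3: accept a slice of references.
pub fn some_api(env_vars: &[&EnvVar]) {}

fn main() {
    let arena = Arena {
        services: RefCell::new(vec![Service {
            env_vars: vec![EnvVar {}],
        }]),
    };
    // Keep the Ref guard in a binding; the collected references
    // borrow from it, so it must outlive them.
    let services = arena.services.borrow();
    let env_vars: Vec<&EnvVar> = services
        .iter()
        .flat_map(|service| service.env_vars.iter())
        .collect();
    some_api(&env_vars); // &Vec<&EnvVar> coerces to &[&EnvVar]
}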
Related
Let's suppose there's an array of parameters that need to be used in an SQL query. Each parameter must be a &dyn ToSql, which is already implemented for &str.
The need arises to use the object both as &dyn ToSql and as &str, as in the example below, where it needs to implement Display in order to be printed out.
let params = ["a", "b"];
// this works but allocates
// let tx_params = ¶ms
// .iter()
// .map(|p| p as &(dyn ToSql + Sync))
// .collect::<Vec<_>>();
// this is ideal, doesn't allocate on the heap, but doesn't work
params.map(|p| p as &(dyn ToSql + Sync));
// this has to compile, so can't just crate `params` as [&(dyn ToSql + Sync)] initially
println!("Could not insert {}", params);
Error:
Compiling playground v0.0.1 (/playground)
error[E0277]: the trait bound `str: ToSql` is not satisfied
--> src/main.rs:14:20
|
14 | params.map(|p| p as &(dyn ToSql + Sync));
| ^ the trait `ToSql` is not implemented for `str`
|
= help: the following implementations were found:
<&'a str as ToSql>
= note: required for the cast to the object type `dyn ToSql + Sync`
error[E0277]: the size for values of type `str` cannot be known at compilation time
--> src/main.rs:14:20
|
14 | params.map(|p| p as &(dyn ToSql + Sync));
| ^ doesn't have a size known at compile-time
|
= help: the trait `Sized` is not implemented for `str`
= note: required for the cast to the object type `dyn ToSql + Sync`
For more information about this error, try `rustc --explain E0277`.
error: could not compile `playground` due to 2 previous errors
The trait ToSql isn't implemented for str, but it is for &str. However, the borrow checker won't let us borrow p here, even though we're not doing anything with the data except casting it to a new type.
Playground
I agree with @Caesar's take on this; however, you actually can do that without heap allocations.
You can use <[T; N]>::each_ref() for that (this method converts &[T; N] to [&T; N]):
params.each_ref().map(|p| p as &(dyn ToSql + Sync));
Playground.
Unfortunately, each_ref() was unstable at the time of writing (it was later stabilized, in Rust 1.77), but you can write it yourself in stable Rust with unsafe code:
use std::iter;
use std::mem::{self, MaybeUninit};

fn array_each_ref<T, const N: usize>(arr: &[T; N]) -> [&T; N] {
    let mut result = [MaybeUninit::uninit(); N];
    for (result_item, arr_item) in iter::zip(&mut result, arr) {
        *result_item = MaybeUninit::new(arr_item);
    }
    // SAFETY: `MaybeUninit<T>` is guaranteed to have the same layout as `T`;
    // we initialized all items above (this can be replaced with
    // `MaybeUninit::array_assume_init()` once stabilized).
    unsafe { mem::transmute_copy(&result) }
}
Playground.
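For completeness, here is how array_each_ref slots into the question's setup. This is a self-contained sketch: the ToSql trait below is a stand-in for the one from the postgres crate, only so the example compiles on its own:

// Stand-in for postgres's ToSql, just to make this sketch self-contained.
trait ToSql {}
impl<'a> ToSql for &'a str {}

fn main() {
    let params = ["a", "b"];
    // [&str; 2] -> [&&str; 2] -> [&(dyn ToSql + Sync); 2], no heap allocation.
    let tx_params: [&(dyn ToSql + Sync); 2] =
        array_each_ref(&params).map(|p| p as &(dyn ToSql + Sync));
    let _ = tx_params;
    println!("Could not insert {:?}", params); // params is still usable
}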
I fought this a month ago and my general recommendation is: Don't bother. The actual query is so much heavier than an allocation.
The situation is a bit confusing, because you need a &ToSql, but ToSql is implemented for &str, so you need two arrays: one [&str], and one [&ToSql] whose elements reference the &strs - so the contents of [&ToSql] are double references. I don't see an easy way of achieving that without allocating. (let params: [&&str; 2] = params.iter().collect::<Vec<_>>().try_into().unwrap(); works, and the allocation will likely be optimized out. Nightly or unsafe ways exist; see @ChayimFriedman's answer.)
In this case, you can work around it either by initially declaring:
let params = [&"a", &"b"];
or by using an iterator instead of an array:
let iter = params.iter().map(|p| p as &(dyn ToSql + Sync));
client.query_raw("select * from foo where id in ?", iter);
In my case, I wasn't able to do anything like this because I was using execute, not query, and execute_raw exists only on tokio-postgres, but not on postgres. So beware of these kinds of pitfalls.
I'm new to Rust.
I'm reading SHA1-as-hex-strings from a file - a lot of them, approx. 30 million.
In the text file, they are sorted ascending numerically.
I want to be able to search the list, as fast as possible.
I (think I) want to read them into a (sorted) Vec<primitive_types::U256> for fast searching.
So, I've tried:
log("Loading haystack.");
// total_lines read earlier
let mut the_stack = Vec::<primitive_types::U256>::with_capacity(total_lines);
if let Ok(hay) = read_lines(haystack) { // Get BufRead
for line in hay { // Iterate over lines
if let Ok(hash) = line {
the_stack.push(U256::from(hash));
}
}
}
log(format!("Read {} hashes.", the_stack.len()));
The error is:
$ cargo build
Compiling nsrl v0.1.0 (/my_app)
error[E0277]: the trait bound `primitive_types::U256: std::convert::From<std::string::String>` is not satisfied
--> src/main.rs:55:24
|
55 | the_stack.push(U256::from(hash));
| ^^^^^^^^^^ the trait `std::convert::From<std::string::String>` is not implemented for `primitive_types::U256`
|
= help: the following implementations were found:
<primitive_types::U256 as std::convert::From<&'a [u8; 32]>>
<primitive_types::U256 as std::convert::From<&'a [u8]>>
<primitive_types::U256 as std::convert::From<&'a primitive_types::U256>>
<primitive_types::U256 as std::convert::From<&'static str>>
and 14 others
= note: required by `std::convert::From::from`
This code works if instead of the variable hash I have a string literal, e.g. "123abc".
I think I should be able to use the implementation std::convert::From<&'static str>, but I don't understand how I'm meant to keep hash in scope?
I feel like what I'm trying to achieve is a pretty normal use case:
Iterate over the lines in a file.
Add the line to a vector.
What am I missing?
You almost want something like:
U256::from_str(&hash)?
There is a conversion from &str in the FromStr trait called from_str. It returns a Result<T, E> value, because parsing a string may fail.
I think I should be able to use the implementation std::convert::From<&'static str>, but I don't understand how I'm meant to keep hash in scope?
You can't keep the hash in scope with 'static lifetime. It looks like this is a convenience method to allow you to use string constants in your program, but it is really nothing more than U256::from_str(&hash).unwrap().
However…
If you want a SHA-1, the best type is probably [u8; 20] or maybe [u32; 5].
You want a base 16 decoder, something like base16::decode_slice. Here’s how that might look in action:
/// Error if the hash cannot be parsed.
struct InvalidHash;

/// Type for SHA-1 hashes.
type SHA1 = [u8; 20];

fn read_hash(s: &str) -> Result<SHA1, InvalidHash> {
    let mut hash = [0; 20];
    match base16::decode_slice(s, &mut hash[..]) {
        Ok(20) => Ok(hash),
        _ => Err(InvalidHash),
    }
}
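And a sketch of how this might plug into the question's loading loop (load_and_search is a hypothetical helper; it assumes the iterator yields the file's lines in the ascending order described in the question):

fn load_and_search(
    lines: impl Iterator<Item = std::io::Result<String>>,
    total_lines: usize,
    needle: &SHA1,
) -> bool {
    let mut the_stack: Vec<SHA1> = Vec::with_capacity(total_lines);
    for line in lines.flatten() {
        if let Ok(hash) = read_hash(&line) {
            the_stack.push(hash);
        }
    }
    // Fixed-width hex sorted as text matches the numeric order of the
    // decoded big-endian bytes, so the Vec is already sorted and
    // binary_search can be used directly.
    the_stack.binary_search(needle).is_ok()
}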
Given a value of type Vec<&'static str>, I can freely convert that to Vec<&'r str>, as 'r is a subregion of 'static. That seems to work for most types, e.g. Vec, pairs etc. However, it doesn't work for types like Cell or RefCell. Concretely, down_vec compiles, but down_cell doesn't:
use std::cell::Cell;

fn down_vec<'p, 'r>(x: &'p Vec<&'static str>) -> &'p Vec<&'r str> {
    x
}

fn down_cell<'p, 'r>(x: &'p Cell<&'static str>) -> &'p Cell<&'r str> {
    x
}
Giving the error:
error[E0308]: mismatched types
--> src/lib.rs:9:5
|
9 | x
| ^ lifetime mismatch
|
= note: expected reference `&'p std::cell::Cell<&'r str>`
found reference `&'p std::cell::Cell<&'static str>`
note: the lifetime `'r` as defined on the function body at 8:18...
--> src/lib.rs:8:18
|
8 | fn down_cell<'p, 'r>(x: &'p Cell<&'static str>) -> &'p Cell<&'r str> {
| ^^
= note: ...does not necessarily outlive the static lifetime
Why does this not work for Cell? How does the compiler track that it doesn't work? Is there an alternative that can make it work?
Cell and RefCell are different because they allow mutation of the internal value through a shared reference.
To see why this is important, we can write a function that uses down_cell to leak a reference to freed memory:
fn oops() -> &'static str {
    let cell = Cell::new("this string doesn't matter");
    let local = String::from("this string is local to oops");
    let broken = down_cell(&cell); // use our broken function to rescope the Cell
    broken.set(&local);            // use the rescoped Cell to mutate `cell`
    cell.into_inner()              // return a reference to `local`
} // uh-oh! `local` is dropped here
oops contains no unsafe blocks, but it compiles, so in order to prevent accessing freed memory the compiler must reject down_cell.
At the type level, this is because Cell<T> and RefCell<T> contain an UnsafeCell<T>, which makes them invariant in T, while Box<T> and Vec<T> are covariant in T.
The reason Vec, Box and other container-like structures can be covariant is that those containers require &mut access to mutate their contents, and &mut T is itself invariant in T. You couldn't write a function like oops using down_vec -- the compiler wouldn't allow it.
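To see the contrast, here is a small sketch of down_vec being used safely. Reading through the shortened type is harmless; writing a short-lived reference back into v would require &mut Vec<&'r str>, which the compiler never hands out here:

fn demo<'r>(local: &'r str) {
    let v: Vec<&'static str> = vec!["lives forever"];
    // Covariance: shared, read-only access at a shorter lifetime is fine.
    let shorter: &Vec<&'r str> = down_vec(&v);
    println!("{} {}", shorter[0], local);
    // shorter.push(local); // rejected: push needs &mut, `shorter` is shared
}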
References
The Subtyping and Variance chapter of the Rustonomicon
How does the Rust compiler know `Cell` has internal mutability?
Why does linking lifetimes matter only with mutable references?
I thought that I had understood fat pointers in Rust, but I have a case where I can't understand why they seem to percolate outwards from an inner type. Presumably my mental model is off, but I'm struggling to come up with a satisfactory explanation for this code:
use std::cell::RefCell;
use std::fmt::Debug;
use std::mem::size_of;
use std::rc::Rc;

fn main() {
    println!("{}", size_of::<Rc<RefCell<Vec<u8>>>>());
    println!("{}", size_of::<Rc<RefCell<Debug>>>());
    println!("{}", size_of::<Box<Rc<RefCell<Debug>>>>());
}
which, on a 64-bit machine, prints 8, 16, 8. Playground link.
Since Rc makes a Box internally (using into_raw_non_null), I expected this to print 8, 8, 8. Is there a reason why, at least from size_of's perspective, the fat pointer seems to percolate outwards from Debug, even past Rc's Box? Is it because it's stored as a raw pointer perhaps?
Ultimately, Rc<RefCell<dyn Debug>> is a pointer to a trait object, and pointers to trait objects are fat. The types inside and outside of it are not fat pointers.
There are no fat pointers in the Vec<u8> case whatsoever. A Vec<T> is a (*mut T, usize, usize), a RefCell<T> is a (T, usize), and an Rc<T> is a (*mut T).
size_of              | is
---------------------+---
Vec<u8>              | 24
RefCell<Vec<u8>>     | 32
Rc<RefCell<Vec<u8>>> |  8
Your second and third cases do involve a fat pointer for a trait object: Rc<RefCell<dyn Debug>>. Putting a trait object behind another pointer (the Rc) creates a thin pointer to the concrete type: *mut RefCell<dyn Debug>.
size_of                     | is
----------------------------+---
Rc<RefCell<dyn Debug>>      | 16
Box<Rc<RefCell<dyn Debug>>> |  8
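A quick way to check the numbers in both tables (a sketch assuming a typical 64-bit target; the exact size of RefCell's borrow flag is an implementation detail):

use std::cell::RefCell;
use std::fmt::Debug;
use std::mem::size_of;
use std::rc::Rc;

fn main() {
    // Thin-pointer cases: no trait object anywhere.
    assert_eq!(size_of::<Vec<u8>>(), 24);
    assert_eq!(size_of::<RefCell<Vec<u8>>>(), 32); // Vec + borrow flag
    assert_eq!(size_of::<Rc<RefCell<Vec<u8>>>>(), 8);

    // Fat pointer to the unsized RefCell<dyn Debug>...
    assert_eq!(size_of::<Rc<RefCell<dyn Debug>>>(), 16);
    // ...but a pointer to that fat pointer is thin again.
    assert_eq!(size_of::<Box<Rc<RefCell<dyn Debug>>>>(), 8);
}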
Notably, it's impossible to create a RefCell<dyn Debug>:
error[E0277]: the size for values of type `dyn std::fmt::Debug` cannot be known at compilation time
--> src/main.rs:4:20
|
4 | println!("{}", mem::size_of::<RefCell<dyn Debug>>());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
|
= help: within `std::cell::RefCell<dyn std::fmt::Debug>`, the trait `std::marker::Sized` is not implemented for `dyn std::fmt::Debug`
= note: to learn more, visit <https://doc.rust-lang.org/book/second-edition/ch19-04-advanced-types.html#dynamically-sized-types-and-the-sized-trait>
= note: required because it appears within the type `std::cell::RefCell<dyn std::fmt::Debug>`
= note: required by `std::mem::size_of`
The trait object requires some indirection; when you add some, you've finally constructed some type of fat pointer.
You can use the unstable option -Z print-type-sizes to explore the layout of structs:
type: `std::rc::RcBox<std::cell::RefCell<dyn std::fmt::Debug>>`: 24 bytes, alignment: 8 bytes
    field `.strong`: 8 bytes
    field `.weak`: 8 bytes
    field `.value`: 8 bytes

type: `core::nonzero::NonZero<*const std::rc::RcBox<std::cell::RefCell<dyn std::fmt::Debug>>>`: 16 bytes, alignment: 8 bytes
    field `.0`: 16 bytes

type: `std::ptr::NonNull<std::rc::RcBox<std::cell::RefCell<dyn std::fmt::Debug>>>`: 16 bytes, alignment: 8 bytes
    field `.pointer`: 16 bytes

type: `std::rc::Rc<std::cell::RefCell<dyn std::fmt::Debug>>`: 16 bytes, alignment: 8 bytes
    field `.ptr`: 16 bytes
    field `.phantom`: 0 bytes, offset: 0 bytes, alignment: 1 bytes

type: `std::cell::RefCell<dyn std::fmt::Debug>`: 8 bytes, alignment: 8 bytes
    field `.borrow`: 8 bytes
    field `.value`: 0 bytes
I'm not 100% sure how to parse this output, as I expect RefCell<dyn Debug> to be an unsized type (as shown by the error above). I assume the meaning of "0 bytes" is overloaded.
How can you easily borrow a vector of vectors as a slice of slices?
fn use_slice_of_slices<T>(slice_of_slices: &[&[T]]) {
    // Do something...
}

fn main() {
    let vec_of_vec = vec![vec![0]; 10];
    use_slice_of_slices(&vec_of_vec);
}
I will get the following error:
error[E0308]: mismatched types
--> src/main.rs:7:25
|
7 | use_slice_of_slices(&vec_of_vec);
| ^^^^^^^^^^^ expected slice, found struct `std::vec::Vec`
|
= note: expected type `&[&[_]]`
found type `&std::vec::Vec<std::vec::Vec<{integer}>>`
I could just as easily define use_slice_of_slices as
fn use_slice_of_slices<T>(slice_of_slices: &[Vec<T>]) {
    // Do something
}
and the outer vector would be borrowed as a slice and all would work. But what if, just for the sake of argument, I want to borrow it as a slice of slices?
Assuming automatic coercing from &Vec<Vec<T>> to &[&[T]] is not possible, then how can I define a function borrow_vec_of_vec as below?
fn borrow_vec_of_vec<'a, T: 'a>(vec_of_vec: Vec<Vec<T>>) -> &'a [&'a [T]] {
    // Borrow vec_of_vec...
}
To put it in another way, how could I implement Borrow<[&[T]]> for Vec<Vec<T>>?
You cannot.
By definition, a slice is a view on an existing collection of elements. It cannot conjure up new elements, or new views of existing elements, out of thin air.
This stems from the fact that Rust generic parameters are generally invariant: while a &Vec<T> can be converted to a &[T] after a fashion, the T in those two expressions MUST match.
A possible work-around is to go generic yourself.
use std::fmt::Debug;

fn use_slice_of_slices<U, T>(slice_of_slices: &[U])
where
    U: AsRef<[T]>,
    T: Debug,
{
    for slice in slice_of_slices {
        println!("{:?}", slice.as_ref());
    }
}

fn main() {
    let vec_of_vec = vec![vec![0]; 10];
    use_slice_of_slices(&vec_of_vec);
}
Instead of imposing what the type of the element should be, you accept any type... but place a bound requiring that it can be viewed as a [T].
This has nearly the same effect, since the generic function can then only manipulate the element as a [T] slice. As a bonus, it works with multiple types: any type which can be viewed as a [T].
A deref coercion from Vec<T> to &[T] is cheap. A Vec<T> is represented by a struct essentially containing a pointer to the heap-allocated data, the capacity of the heap allocation and the current length of the vector. A slice &[T] is a fat pointer consisting of a pointer to the data and the length of the slice. The conversion from Vec<T> to &[T] essentially requires to copy the pointer and the length from the Vec<T> struct to a new fat pointer.
If we want to convert from Vec<Vec<T>> to &[&[T]], we need to perform the above conversion for each of the inner vectors. This means we need to store an unknown number of fat pointers somewhere, which requires allocating space for them. When converting a single vector, the compiler reserves space for the single resulting fat pointer on the stack; for an unknown, potentially large number of fat pointers this is not possible, so the conversion isn't cheap anymore. This is why the conversion isn't easily possible, and why you need to write explicit code for it.
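The layout claim itself is easy to verify (a minimal sketch; both sizes are in units of usize, so it holds on any target):

use std::mem::size_of;

fn main() {
    // Vec<T> is (pointer, capacity, length); &[T] is (pointer, length).
    assert_eq!(size_of::<Vec<u8>>(), 3 * size_of::<usize>());
    assert_eq!(size_of::<&[u8]>(), 2 * size_of::<usize>());
}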
So whenever you can, you should instead change your function signature as suggested in Matthieu's answer. If you don't control the function signature, your only choice is to write the explicit conversion code, allocating a new vector:
fn vecs_to_slices<T>(vecs: &[Vec<T>]) -> Vec<&[T]> {
    vecs.iter().map(Vec::as_slice).collect()
}
Applied to the functions in the original post, this can be used like this:
use_slice_of_slices(&vecs_to_slices(&vec_of_vec));