I'm trying to use a vector as a buffer like this, I'm wondering if there is a way to take a slice from the vector without growing it, so that code like this work:
fn example(r: Read) {
let buffer: Vec<u8> = Vec::with_capacity(1024);
let slice: &mut[u8] = &mut buffer;
r.read(slice); // doesnt work since the slice has length zero
}
If one was to just take a slice of the capacity, you would have a slice of uninitialized data. This is unsafe.
You can do this with Vec's set_len. However
This is unsafe. Reading the data is memory safe, but a vector of another type, or misuse of set_len, may not be. Overflow checking and proper cleanup is important.
This could well be a significant security flaw.
If you are using non-primitive types, you need to consider panic safety.
The standard library has a policy against allowing reads to uninitialized memory, even if memory-safe.
The basic way of doing this is
unsafe {
buffer.set_len(buffer.capacity());
let new_len = try!(r.read(slice));
buffer.set_len(cmp::min(buffer.len(), new_len));
}
The cmp::min is needed if you don't totally trust read's implementation, since incorrect output can result in too-large set length.
I want to add three details to #Veedrac answer.
1 - This is the relevant code that creates the slices from Vec<T>, and in both cases the slice goes from the start of the vector up to self.len:
fn deref(&self) -> &[T]
fn deref_mut(&mut self) -> &mut [T]
2 - There is an optimization in the TcpStream to avoid zeroing the memory, much like #Veedrac's alternative, but it is only used for read_to_end and not in read. And the BufReader that does zeroing starts with small allocations for performance reasons
3 - The slice has no information about the vector size, so the vector the length must be updated anyways, otherwise the buffer will be larger than the data and the zeros will be used.
Related
I'm learning Rust and I'm trying to solve an advent of code challenge (day 9 2015).
I created a situation where I end up with a variable that has the type Vec<&&str> (note the double '&', it's not a typo). I'm now wondering if this type is different than Vec<&str>. I can't figure out if a reference to a reference to something would ever make sense. I know I can avoid this situation by using String for the from and to variables. I'm asking if Vec<&&str> == Vec<&str> and if I should try and avoid Vec<&&str>.
Here is the code that triggered this question:
use itertools::Itertools
use std::collections::{HashSet};
fn main() {
let contents = fs::read_to_string("input.txt").unwrap();
let mut vertices: HashSet<&str> = HashSet::new();
for line in contents.lines() {
let data: Vec<&str> = line.split(" ").collect();
let from = data[0];
let to = data[2];
vertices.insert(from);
vertices.insert(to);
}
// `Vec<&&str>` originates from here
let permutations_iter = vertices.iter().permutations(vertices.len());
for perm in permutations_iter {
let length_trip = compute_length_of_trip(&perm);
}
}
fn compute_length_of_trip(trip: &Vec<&&str>) -> u32 {
...
}
Are Vec<&str> and Vec<&&str> different types?
I'm now wondering if this type is different than Vec<&str>.
Yes, a Vec<&&str> is a type different from Vec<&str> - you can't pass a Vec<&&str> where a Vec<&str> is expected and vice versa. Vec<&str> stores string slice references, which you can think of as pointers to data inside some strings. Vec<&&str> stores references to such string slice references, i.e. pointers to pointers to data. With the latter, accessing the string data requires an additional indirection.
However, Rust's auto-dereferencing makes it possible to use a Vec<&&str> much like you'd use a Vec<&str> - for example, v[0].len() will work just fine on either, v[some_idx].chars() will iterate over chars with either, and so on. The only difference is that Vec<&&str> stores the data more indirectly and therefore requires a bit more work on each access, which can lead to slightly less efficient code.
Note that you can always convert a Vec<&&str> to Vec<&str> - but since doing so requires allocating a new vector, if you decide you don't want Vec<&&str>, it's better not to create it in the first place.
Can I avoid Vec<&&str> and how?
Since a &str is Copy, you can avoid the creation of Vec<&&str> by adding a .copied() when you iterate over vertices, i.e. change vertices.iter() to vertices.iter().copied(). If you don't need vertices sticking around, you can also use vertices.into_iter(), which will give out &str, as well as free vertices vector as soon as the iteration is done.
The reason why the additional reference arises and the ways to avoid it have been covered on StackOverflow before.
Should I avoid Vec<&&str>?
There is nothing inherently wrong with Vec<&&str> that would require one to avoid it. In most code you'll never notice the difference in efficiency between Vec<&&str> and Vec<&str>. Having said that, there are some reasons to avoid it beyond performance in microbenchmarks. The additional indirection in Vec<&&str> requires the exact &strs it was created from (and not just the strings that own the data) to stick around and outlive the new collection. This is not relevant in your case, but would become noticeable if you wanted to return the permutations to the caller that owns the strings. Also, there is value in the simpler type that doesn't accumulate a reference on each transformation. Just imagine needing to transform the Vec<&&str> further into a new vector - you wouldn't want to deal with Vec<&&&str>, and so on for every new transformation.
Regarding performance, less indirection is usually better since it avoids an extra memory access and increases data locality. However, one should also note that a Vec<&str> takes up 16 bytes per element (on 64-bit architectures) because a slice reference is represented by a "fat pointer", i.e. a pointer/length pair. A Vec<&&str> (as well as Vec<&&&str> etc.) on the other hand takes up only 8 bytes per element, because a reference to a fat reference is represented by a regular "thin" pointer. So if your vector measures millions of elements, a Vec<&&str> might be more efficient than Vec<&str> simply because it occupies less memory. As always, if in doubt, measure.
The reason you have &&str is that the data &str is owned by vertices and when you create an interator over that data you are simply getting a reference to that data, hence the &&str.
There's really nothing to avoid here. It simply shows your iterator references the data that is inside the HashSet.
I saw the workarounds and they where kinda long. Am I missing a feature of Rust or a simple solution (Important: not workaround). I feel like I should be able to do this with maybe a simple macro but arrayref crate implementations aren't what I am looking for. Is this a feature that needs to be added to Rust or creating fixed size slicing from fixed sized array in a smaller scope is something bad.
Basically what I want to do is this;
fn f(arr:[u8;4]){
arr[0];
}
fn basic(){
let mut arr:[u8;12] = [0;12];
// can't I borrow the whole array but but a fixed slice to it?
f(&mut arr[8..12]); // But this is know on compile time?
f(&mut arr[8..12] as &[u8;4]); // Why can't I do these things?
}
What I want can be achieved by below code(from other so threads)
use array_ref;
fn foo(){
let buf:[u8;12] = [0;12];
let (_, fixed_slice) = mut_array_refs![
&mut buf,
8,
4
];
write_u32_into(fixed_slice,0);
}
fn write_u32_into(fixed_slice:&mut [u8;12],num:u32){
// won't have to check if fixed_slice.len() == 12 and won't panic
}
But I looked into the crate and even though this never panics there are many unsafe blocks and many lines of code. It is a workaround for the Rust itself. In the first place I wanted something like this to get rid of the overhead of checking the size and the possible runtime panic.
Also this is a little overhead it doesn't matter isn't a valid answer because technically I should be able to guarantee this in compile time even if the overhead is small this doesn't mean rust doesn't need to have this type of feature or I should not be looking for an ideal way.
Note: Can this be solved with lifetimes?
Edit: If we where able to have a different syntax for fixed slices such as arr[12;;16] and when I borrowed them this way it would borrow it would borrow the whole arr. I think this way many functions for example (write_u32) would be implemented in a more "rusty" way.
Use let binding with slice_patterns feature. It was stabilized in Rust 1.42.
let v = [1, 2, 3]; // inferred [i32; 3]
let [_, ref subarray # ..] = v; // subarray is &[i32; 2]
let a = v[0]; // 1
let b = subarray[1]; // 3
Here is a section from the Rust reference about slice patterns.
Why it doesn't work
What you want is not available as a feature in rust stable or nightly because multiple things related to const are not stabilized yet, namely const generics and const traits. The reason traits are involved is because the arr[8..12] is a call to the core::ops::Index::<Range<usize>> trait that returns a reference to a slice, in your case [u8]. This type is unsized and not equal to [u8; 4] even if the compiler could figure out that it is, rust is inherently safe and can be overprotective sometimes to ensure safety.
What can you do then?
You have a few routes you can take to solve this issue, I'll stay in a no_std environment for all this as that seems to be where you're working and will avoid extra crates.
Change the function signature
The current function signature you have takes the four u8s as an owned value. If you only are asking for 4 values you can instead take those values as parameters to the function. This option breaks down when you need larger arrays but at that point, it would be better to take the array as a reference or using the method below.
The most common way, and the best way in my opinion, is to take the array in as a reference to a slice (&[u8] or &mut [u8]). This is not the same as taking a pointer to the value in C, slices in rust also carry the length of themselves so you can safely iterate through them without worrying about buffer overruns or if you read all the data. This does require changing the algorithms below to account for variable-sized input but most of the time there is a just as good option to use.
The safe way
Slice can be converted to arrays using TryInto, but this comes at the cost of runtime size checking which you seem to want to avoid. This is an option though and may result in a minimal performance impact.
Example:
fn f(arr: [u8;4]){
arr[0];
}
fn basic(){
let mut arr:[u8;12] = [0;12];
f(arr[8..12].try_into().unwrap());
}
The unsafe way
If you're willing to leave the land of safety there are quite a few things you can do to force the compiler to recognize the data as you want it to, but they can be abused. It's usually better to use rust idioms rather than force other methods in but this is a valid option.
fn basic(){
let mut arr:[u8;12] = [0;12];
f(unsafe {*(arr[8..12].as_ptr() as *const [u8; 4])});
}
TL;DR
I recommend changing your types to utilize slices rather than arrays but if that's not feasible I'd suggest avoiding unsafety, the performance won't be as bad as you think.
Most patterns in Rust are captured by traits (Iterator, From, Borrow, etc.).
How come a pattern as pervasive as len/is_empty has no associated trait in the standard library? Would that cause problems which I do not foresee? Was it deemed useless? Or is it only that nobody thought of it (which seems unlikely)?
Was it deemed useless?
I would guess that's the reason.
What could you do with the knowledge that something is empty or has length 15? Pretty much nothing, unless you also have a way to access the elements of the collection for example. The trait that unifies collections is Iterator. In particular an iterator can tell you how many elements its underlying collection has, but it also does a lot more.
Also note that should you need an Empty trait, you can create one and implement it for all standard collections, unlike interfaces in most languages. This is the power of traits. This also means that the standard library doesn't need to provide small utility traits for every single use case, they can be provided by libraries!
Just adding a late but perhaps useful answer here. Depending on what exactly you need, using the slice type might be a good option, rather than specifying a trait. Slices have len(), is_empty(), and other useful methods (full docs here). Consider the following:
use core::fmt::Display;
fn printme<T: Display>(x: &[T]) {
println!("length: {}, empty: ", x.len());
for c in x {
print!("{}, ", c);
}
println!("\nDone!");
}
fn main() {
let s = "This is some string";
// Vector
let vv: Vec<char> = s.chars().collect();
printme(&vv);
// Array
let x = [1, 2, 3, 4];
printme(&x);
// Empty
let y:Vec<u8> = Vec::new();
printme(&y);
}
printme can accept either a vector or an array. Most other things that it accepts will need some massaging.
I think maybe the reason for there being no Length trait is that most functions will either a) work through an iterator without needing to know its length (with Iterator), or b) require len because they do some sort of random element access, in which case a slice would be the best bet. In the first case, knowing length may be helpful to pre-allocate memory of some size, but size_hint takes care of this when used for anything like Vec::with_capacity, or ExactSizeIterator for anything that needs specific allocations. Most other cases would probably need to be collected to a vector at some point within the function, which has its len.
Playground link to my example here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=9a034c2e8b75775449afa110c05858e7
To be more specific, why doesn't Arc<T> implement from_raw with a dynamically sized T while Box<T> does?
use std::sync::Arc;
fn main() {
let x = vec![1, 2, 3].into_boxed_slice();
let y = Box::into_raw(x);
let z = unsafe { Arc::from_raw(y) }; // ERROR
}
(play)
As pointed out in the comments, Arc::from_raw must be used with a pointer from Arc::into_raw, so the above example doesn't make sense. My original question (Is it possible to create an Arc<[T]> from a Vec<T>) remains: is this possible, and if not, why?
As of Rust 1.21.0, you can do this:
let thing: Arc<[i32]> = vec![1, 2, 3].into();
This was enabled by RFC 1845:
In addition: From<Vec<T>> for Rc<[T]> and From<Box<T: ?Sized>> for Rc<T> will be added.
Identical APIs will also be added for Arc.
Internally, this uses a method called copy_from_slice, so the allocation of the Vec is not reused. For the details why, check out DK.'s answer.
No.
First of all, as already noted in comments, you can't toss raw pointers around willy-nilly like that. To quote the documentation of Arc::from_raw:
The raw pointer must have been previously returned by a call to a Arc::into_raw.
You absolutely must read the documentation any time you're using an unsafe method.
Secondly, the conversion you want is impossible. Vec<T> → Box<[T]> works because, internally, Vec<T> is effectively a (Box<[T]>, usize) pair. So, all the method does is give you access to that internal Box<[T]> pointer [1]. Arc<[T]>, however, is not physically compatible with a Box<[T]>, because it has to contain the reference counts. The thing being pointed to by Arc<T> has a different size and layout to the thing being pointed to by Box<T>.
The only way you could get from Vec<T> to Arc<[T]> would be to reallocate the contents of the vector in a reference-counted allocation... which I'm not aware of any way to do. I don't believe there's any particular reason it couldn't be implemented, it just hasn't [2].
All that said, I believe not being able to use dynamically sized types with Arc::into_raw/Arc::from_raw is a bug. It's certainly possible to get Arcs with dynamically sized types... though only by casting from pointers to fixed-sized types.
[1]: Not quite. Vec<T> doesn't actually have a Box<[T]> inside it, but it has something compatible. It also has to shrink the slice to not contain uninitialised elements.
[2]: Rust does not, on the whole, have good support for allocating dynamically sized things in general. It's possible that part of the reason for this hole in particular is that Box<T> also can't allocate arrays directly, which is possibly because Vec<T> exists, because Vec<T> used to be part of the language itself, and why would you add array allocation to Box when Vec already exists? "Why not have ArcVec<T>, then?" Because you'd never be able to construct one due to shared ownership.
Arc<[T]> is an Arc containing a pointer to a slice of T's. But [T] is not actually Sized at compile time since the compiler does not know how long it will be (versus &[T] which is just a reference and thus has a known size).
use std::sync::Arc;
fn main() {
let v: Vec<u32> = vec![1, 2, 3];
let b: Box<[u32]> = v.into_boxed_slice();
let y: Arc<[u32]> = Arc::new(*b);
print!("{:?}", y)
}
Play Link
However, you can make an Arc<&[T]> without making a boxed slice:
use std::sync::Arc;
fn main() {
let v = vec![1, 2, 3];
let y: Arc<&[u32]> = Arc::new(&v[..]);
print!("{:?}", y)
}
Shared Ref Play Link
However, this seems like a study in the type system with little practical value. If what you really want is a view of the Vec that you can pass around between threads, an Arc<&[T]> will give you what you want. And if you need it to be on the heap, Arc<Box<&[T]>> works fine too.
&[T] is confusing me.
I naively assumed that like &T, &[T] was a pointer, which is to say, a numeric pointer address.
However, I've seen some code like this, that I was rather surprised to see work fine (simplified for demonstration purposes; but you see code like this in many 'as_slice()' implementations):
extern crate core;
extern crate collections;
use self::collections::str::raw::from_utf8;
use self::core::raw::Slice;
use std::mem::transmute;
fn main() {
let val = "Hello World";
{
let value:&str;
{
let bytes = val.as_bytes();
let mut slice = Slice { data: &bytes[0] as *const u8, len: bytes.len() };
unsafe {
let array:&[u8] = transmute(slice);
value = from_utf8(array);
}
// slice.len = 0;
}
println!("{}", value);
}
}
So.
I initially thought that this was invalid code.
That is, the instance of Slice created inside the block scope is returned to outside the block scope (by transmute), and although the code runs, the println! is actually accessing data that is no longer valid through unsafe pointers. Bad!
...but that doesn't seem to be the case.
Consider commenting the line // slice.len = 0;
This code still runs fine (prints 'Hello World') when this happens.
So the line...
value = from_utf8(array);
If it was an invalid pointer to the 'slice' variable, the len at the println() statement would be 0, but it is not. So effectively a copy not just of a pointer value, but a full copy of the Slice structure.
Is that right?
Does that mean that in general its valid to return a &[T] as long as the actual inner data pointer is valid, regardless of the scope of the original &[T] that is being returned, because a &[T] assignment is a copy operation?
(This seems, to me, to be extremely counter intuitive... so perhaps I am misunderstanding; if I'm right, having two &[T] that point to the same data cannot be valid, because they won't sync lengths if you modify one...)
A slice &[T], as you have noticed, is "equivalent" to a structure std::raw::Slice. In fact, Slice is an internal representation of &[T] value, and yes, it is a pointer and a length of data behind that pointer. Sometimes such structure is called "fat pointer", that is, a pointer and an additional piece of information.
When you pass &[T] value around, you indeed are just copying its contents - the pointer and the length.
If it was an invalid pointer to the 'slice' variable, the len at the println() statement would be 0, but it is not. So effectively a copy not just of a pointer value, but a full copy of the Slice structure.
Is that right?
So, yes, exactly.
Does that mean that in general its valid to return a &[T] as long as the actual inner data pointer is valid, regardless of the scope of the original &[T] that is being returned, because a &[T] assignment is a copy operation?
And this is also true. That's the whole idea of borrowed references, including slices - borrowed references are statically checked to be used as long as their referent is alive. When DST finally lands, slices and regular references will be even more unified.
(This seems, to me, to be extremely counter intuitive... so perhaps I am misunderstanding; if I'm right, having two &[T] that point to the same data cannot be valid, because they won't sync lengths if you modify one...)
And this is actually an absolutely valid concern; it is one of the problems with aliasing. However, Rust is designed exactly to prevent such bugs. There are two things which render aliasing of slices valid.
First, slices can't change length; there are no methods defined on &[T] which would allow you changing its length in place. You can create a derived slice from a slice, but it will be a new object whatsoever.
But even if slices can't change length, if the data could be mutated through them, they still could bring disaster if aliased. For example, if values in slices are enum instances, mutating a value in such an aliased slice could make a pointer to internals of enum value contained in this slice invalid. So, second, Rust aliasable slices (&[T]) are immutable. You can't change values contained in them and you can't take mutable references into them.
These two features (and compiler checks for lifetimes) make aliasing of slices absolutely safe. However, sometimes you do need to modify the data in a slice. And then you need mutable slice, called &mut [T]. You can change your data through such slice; but these slices are not aliasable. You can't create two mutable slices into the same structure (an array, for example), so you can't do anything dangerous.
Note, however, that using transmute() to transform a slice into a Slice or vice versa is an unsafe operation. &[T] is guaranteed statically to be correct if you create it using right methods, like calling as_slice() on a Vec. However, creating it manually using Slice struct and then transmuting it into &[T] is error-prone and can easily segfault your program, for example, when you assign it more length than is actually allocated.