Creating a slice and appending to it in Rust

In Go, there are make and append functions. The first lets you create a slice of the specified type, length and capacity, whereas the second lets you append an element to the specified slice. It works more or less like in this toy example:
func main() {
// Creates a slice of type int, which has length 0 (so it is empty), and has capacity 5.
s := make([]int, 0, 5)
// Appends the integer 0 to the slice.
s = append(s, 0)
// Appends the integer 1 to the slice.
s = append(s, 1)
// Appends the integers 2, 3, and 4 to the slice.
s = append(s, 2, 3, 4)
}
Does Rust offer similar features for working with slices?

No.
Go and Rust slices are different:
in Go, a slice is a proxy to another container that allows both observing and mutating the container;
in Rust, a slice is a view into another container, so it only allows observing the container (though a mutable slice may allow mutating the individual elements).
As a result, you cannot use a Rust slice to insert, append or remove elements from the underlying container. Instead, you need either:
to use a mutable reference to the container itself, or
to design a trait and use a mutable reference to said trait.
Note: unlike Java, Rust's std does not provide trait abstractions for its collections, but you may still create some yourself if you think it is worth it for a particular problem.
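For comparison with the Go snippet, here is a minimal sketch of the first option, working on the owning container (a Vec) directly. The function names build and double_in_place are made up for illustration; the second one shows what a mutable slice can still do, namely change elements in place without growing or shrinking the container.

```rust
// Builds the same sequence as the Go example, using an owned Vec<i32>.
fn build() -> Vec<i32> {
    // Empty vector (length 0) with room for at least 5 elements.
    let mut v: Vec<i32> = Vec::with_capacity(5);
    // Appends the integers 0 and 1, one at a time.
    v.push(0);
    v.push(1);
    // Appends the integers 2, 3, and 4 in one call.
    v.extend_from_slice(&[2, 3, 4]);
    v
}

// A mutable slice can modify elements in place, but it cannot
// insert, append or remove elements.
fn double_in_place(s: &mut [i32]) {
    for x in s.iter_mut() {
        *x *= 2;
    }
}

fn main() {
    let mut v = build();
    assert_eq!(v, [0, 1, 2, 3, 4]);
    assert!(v.capacity() >= 5);

    // Borrow elements 1 and 2 as a mutable slice and double them.
    double_in_place(&mut v[1..3]);
    assert_eq!(v, [0, 2, 4, 3, 4]);
}
```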

Can I use an iterator in global state in Rust?

I want to use an iterator as global state in Rust. Simplified example:
static nums = (0..).filter(|&n|n%2==0);
Is this possible?
You can do it, but you'll have to fight the language along the way.
First, true Rust statics created with the static declaration need to be compile-time constants. So something like static FOO: usize = 10 will compile, but static BAR: String = "foo".to_string() won't, because BAR requires a run-time allocation. While your iterator doesn't require a run-time allocation (though using it will make your life simpler, as you'll see later), its type is complex enough that it doesn't support compile-time initialization.
Second, Rust statics require specifying the full type up-front. This is a problem for arbitrary iterators, which one would like to create by combining iterator adapters and closures. While in this particular case, as mcarton points out, one could specify the type as Filter<RangeFrom<i32>, fn(&i32) -> bool>, it'd be closely tied to the current implementation. You'd have to change the type as soon as you switch to a different combinator. To avoid the hassle it's better to hide the iterator behind a dyn Iterator reference, i.e. type-erase it by putting it in a Box. Erasing the type involves dynamic dispatch, but so would specifying the filter function through a function pointer.
Third, Rust statics are read-only, and Iterator::next() takes &mut self, as it updates the state of the iteration. Statics must be read-only because Rust is multi-threaded, and writing to a static without proof that there are no readers or other writers would allow a data race in safe code. So to advance your global iterator, you must wrap it in a Mutex, which provides both thread safety and interior mutability.
After the long introduction, let's take a look at the fairly short implementation:
use lazy_static::lazy_static;
use std::sync::Mutex;
lazy_static! {
static ref NUMS: Mutex<Box<dyn Iterator<Item = u32> + Send + Sync>> =
Mutex::new(Box::new((0..).filter(|&n| n % 2 == 0)));
}
lazy_static is used to implement the create-on-first-use idiom to work around the non-const initial value. The first time NUMS is accessed, it will create the iterator.
As explained above, the iterator itself is boxed and wrapped in a Mutex. Since global variables are assumed to be accessed from multiple threads, our boxed iterator implements Send and Sync in addition to Iterator. (Strictly speaking, Send alone would suffice here, because Mutex<T> is Sync whenever T is Send.)
The result is used as follows:
fn main() {
assert_eq!(NUMS.lock().unwrap().next(), Some(0)); // take single value
assert_eq!(
// take multiple values
Vec::from_iter(NUMS.lock().unwrap().by_ref().take(5)),
vec![2, 4, 6, 8, 10]
);
}
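If you would rather not pull in an external crate, the same create-on-first-use idea can probably be expressed with std::sync::OnceLock, assuming a toolchain of Rust 1.70 or newer. This is only a sketch: the accessor function nums and the type alias EvenIter are names invented for this example.

```rust
use std::sync::{Mutex, OnceLock};

// Type-erased iterator; `Send` is enough for `Mutex<EvenIter>` to be `Sync`.
type EvenIter = Box<dyn Iterator<Item = u32> + Send>;

// Hypothetical accessor: returns the global iterator, creating it on first use.
fn nums() -> &'static Mutex<EvenIter> {
    static NUMS: OnceLock<Mutex<EvenIter>> = OnceLock::new();
    NUMS.get_or_init(|| {
        let it: EvenIter = Box::new((0u32..).filter(|n| n % 2 == 0));
        Mutex::new(it)
    })
}

fn main() {
    // Take a single value.
    assert_eq!(nums().lock().unwrap().next(), Some(0));
    // Take several values; the lock is held until the end of the statement.
    let next_three: Vec<u32> = nums().lock().unwrap().by_ref().take(3).collect();
    assert_eq!(next_three, [2, 4, 6]);
}
```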
No. For multiple reasons:
Iterator types tend to be complicated. This is usually not a problem because iterator types rarely need to be named, but statics must be explicitly typed. In this case the type is still relatively simple: core::iter::Filter<core::ops::RangeFrom<i32>, fn(&i32) -> bool>.
Iterator's main method, next, takes a &mut self parameter. Statics can't be mutable by default, as this would not be safe.
Iterators can only be iterated once. Therefore it makes little sense to have a global iterator in the first place.
The value to initialize a static must be a constant expression. Your initializer is not a constant expression.

What does it mean that all values of a type must use the same amount of memory in Rust?

Forgive me if there is an obvious answer to the question I'm asking, but I just don't quite understand it.
The Dynamically Sized Types and the Sized Trait section in chapter 19.3, Advanced Types, of The Rust Programming Language mentions:
Rust needs to know how much memory to allocate for any value of a particular type, and all values of a type must use the same amount of memory. If Rust allowed us to write this code, these two str values would need to take up the same amount of space. But they have different lengths: s1 needs 12 bytes of storage and s2 needs 15. This is why it’s not possible to create a variable holding a dynamically sized type.
When it says "and all values of a type must use the same amount of memory", it is meant to refer to dynamically sized types, not types such as vectors or arrays, right? v1 and v2 are also unlikely to occupy the same amount of memory.
let v1 = vec![1, 2, 3];
let v2 = vec![1, 2, 3, 4, 5, 6];
It's correct and considers vectors as well. A Vec<T> is roughly just a pointer to a position on the heap, a capacity, and a length. It could be defined, more or less, as
pub struct Vec<T>(*mut T, usize, usize);
And every value of that structure clearly has the same size. When Rust says that every value of a type has to have the same size, it only refers to the size of the structure itself, not to the recursive size of all things it points to. For any given T, Box<T> has a fixed size (one pointer for a sized T, a pointer plus metadata for a dynamically sized T), which is why Box can hold even things that are dynamically sized, such as trait objects. Likewise, String is basically just a pointer, a capacity, and a length.
Likewise, if we define
pub enum MyEnum {
A(i32),
B(i32, i32),
}
Then MyEnum::A is no smaller than MyEnum::B, for similar reasons, despite the latter having more data than the former.
Every type that can be stored and accessed without the indirection of a reference or Box must have the Sized trait implemented. This means that every instance of the type will have the same size. A str is a DST, as the data it holds can be of a variable length, and thus, you can only access strs as references, or from String, which holds the str data on the heap, through a pointer.
Every Vec also takes the same space, which is 24 bytes on a 64-bit machine.
For example:
let vec = vec![1, 2, 3, 4];
println!("{}", std::mem::size_of_val(&vec)); // Prints '24'.
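The points made in both answers can be checked with std::mem::size_of. The sketch below expresses sizes in machine words so it does not assume a 64-bit target; the three-word layout of Vec and String reflects current implementations rather than a documented guarantee. MyEnum is the enum from the earlier answer.

```rust
use std::mem::{size_of, size_of_val};

#[allow(dead_code)]
enum MyEnum {
    A(i32),
    B(i32, i32),
}

fn main() {
    let word = size_of::<usize>();

    // Vec<T> and String are pointer + capacity + length,
    // no matter how many elements live on the heap.
    assert_eq!(size_of::<Vec<i32>>(), 3 * word);
    assert_eq!(size_of::<String>(), 3 * word);
    assert_eq!(size_of_val(&vec![1, 2, 3]), size_of_val(&vec![1, 2, 3, 4, 5, 6]));

    // Box of a sized type is one pointer; Box or reference to a
    // dynamically sized type carries extra metadata (here, a length).
    assert_eq!(size_of::<Box<i32>>(), word);
    assert_eq!(size_of::<Box<str>>(), 2 * word);
    assert_eq!(size_of::<&str>(), 2 * word);

    // Both enum variants occupy the same space as values of MyEnum.
    assert_eq!(size_of_val(&MyEnum::A(1)), size_of_val(&MyEnum::B(1, 2)));
}
```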

Why can I start a slice past the end of a vector in Rust?

Given v = vec![1,2,3,4], why does v[4..] return an empty vector, but v[5..] panics, while both v[4] and v[5] panic? I suspect this has to do with the implementation of slicing without specifying either the start- or endpoint, but I couldn't find any information on this online.
This is simply because std::ops::RangeFrom is defined to be "bounded inclusively below".
A quick recap of all the plumbing: v[4..] desugars to std::ops::Index using 4.. (which parses as a std::ops::RangeFrom) as the parameter. std::ops::RangeFrom implements std::slice::SliceIndex and Vec has an implementation for std::ops::Index for any parameter that implements std::slice::SliceIndex. So what you are looking at is a RangeFrom being used to std::ops::Index the Vec.
std::ops::RangeFrom is defined to always be inclusive on the lower bound. For example [0..] will include the first element of the thing being indexed. If the Vec is empty, then [0..] will be the empty slice. Notice: if a start index equal to the length were rejected, there would be no way to slice an empty Vec at all without causing a panic, which would be cumbersome.
A simple way to think about it is "where the fence-post is put".
A v[0..] in a vec![0, 1, 2, 3] is
| 0 1 2 3 |
  ^
  |- You are slicing from here. This includes the
     entire `Vec` (even if it was empty)
In v[4..] it is
| 0 1 2 3 |
          ^
          |- You are slicing from here to the end of the Vec.
             Which results in, well, nothing.
while a v[5..] would be
| 0 1 2 3 |
            ^
            |- Slicing from here to infinity is definitely
               outside the `Vec` and, also, the
               caller's fault, so panic!
and a v[3..] is
| 0 1 2 3 |
        ^
        |- slicing from here to the end results in `&[3]`
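The same boundary cases can be probed without panicking by using the slice get method, which returns None where indexing would panic. A minimal sketch, using the vector from the question; the helper name tail is made up for illustration.

```rust
// Hypothetical helper: the non-panicking counterpart of `&v[start..]`.
fn tail(v: &[i32], start: usize) -> Option<&[i32]> {
    v.get(start..)
}

fn main() {
    let v = vec![1, 2, 3, 4];

    // Starting inside the vector yields the remaining elements.
    assert_eq!(v[3..], [4]);
    // Starting exactly at the length is allowed and yields an empty slice.
    assert!(v[4..].is_empty());
    assert_eq!(tail(&v, 4).map(|s| s.len()), Some(0));
    // Starting past the length is out of range: `get` reports it as None,
    // where `&v[5..]` would panic.
    assert!(tail(&v, 5).is_none());
    // A single index must refer to an existing element, so 4 is already out of range.
    assert!(v.get(4).is_none());
}
```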
While the other answer explains how to understand and remember the indexing behavior implemented in the Rust standard library, the real reason why it is the way it is has nothing to do with technical limitations. It comes down to a design decision made by the authors of the Rust standard library.
Given v = vec![1,2,3,4], why does v[4..] return an empty vector, but v[5..] panics [..] ?
Because it was decided so. The code below that handles slice indexing (full source) will panic if the start index is larger than the slice's length.
fn index(self, slice: &[T]) -> &[T] {
if self.start > slice.len() {
slice_start_index_len_fail(self.start, slice.len());
}
// SAFETY: `self` is checked to be valid and in bounds above.
unsafe { &*self.get_unchecked(slice) }
}
fn slice_start_index_len_fail(index: usize, len: usize) -> ! {
panic!("range start index {} out of range for slice of length {}", index, len);
}
How could it be implemented differently? I personally like how Python does it.
v = [1, 2, 3, 4]
a = v[4] # -> Raises an exception - Similar to Rust's behavior (panic)
b = v[5] # -> Same, raises an exception - Also similar to Rust's
# (Equivalent to Rust's v[4..])
w = v[4:] # -> Returns an empty list - Similar to Rust's
x = v[5:] # -> Also returns an empty list - Different from Rust's, which panics
Python's approach is not necessarily better than Rust's; there is a trade-off. Python's approach is more convenient (there's no need to check that the start index is not greater than the length), but if there's a bug, it's harder to find because the code doesn't fail early.
Although Rust could technically follow Python's approach, its designers decided to fail early by panicking so that bugs are found sooner, at the cost of some inconvenience (programmers need to ensure that the start index is not greater than the length).
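If you do want Python-like leniency in a particular spot, you can opt in by clamping the start index yourself. This is just a sketch; tail_clamped is a made-up helper, not something the standard library provides.

```rust
// Hypothetical helper that mimics Python's `v[start:]`:
// a start index past the end is clamped to the length instead of panicking.
fn tail_clamped<T>(v: &[T], start: usize) -> &[T] {
    &v[start.min(v.len())..]
}

fn main() {
    let v = vec![1, 2, 3, 4];
    assert_eq!(tail_clamped(&v, 3), [4]);
    assert!(tail_clamped(&v, 4).is_empty());
    // Would panic with plain `&v[5..]`; here it is an empty slice.
    assert!(tail_clamped(&v, 5).is_empty());
}
```

The trade-off described above still applies: a clamped helper hides an out-of-range start index that may well be a bug.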

Why is "&&" being used in closure arguments?

I have two questions regarding this example:
let a = [1, 2, 3];
assert_eq!(a.iter().find(|&&x| x == 2), Some(&2));
assert_eq!(a.iter().find(|&&x| x == 5), None);
Why is &&x used in the closure arguments rather than just x? I understand that & is passing a reference to an object, but what does using it twice mean?
I don't understand what the documentation says:
Because find() takes a reference, and many iterators iterate over references, this leads to a possibly confusing situation where the argument is a double reference. You can see this effect in the examples below, with &&x.
Why is Some(&2) used rather than Some(2)?
a is of type [i32; 3]; an array of three i32s.
[i32; 3] does not define an iter method itself, but method-call resolution coerces the array to the slice type [i32] (borrowed as &[i32]).
&[i32] implements an iter method which produces an iterator.
This iterator implements Iterator<Item=&i32>.
It uses &i32 rather than i32 because the iterator has to work on arrays of any type, and not all types can be safely copied. So rather than restrict itself to copyable types, it iterates over the elements by reference rather than by value.
find is a method defined for all Iterators. It lets you look at each element and return the one that matches the predicate. Problem: if the iterator produces non-copyable values, then passing the value into the predicate would make it impossible to return it from find. The value cannot be re-generated, since iterators are not (in general) rewindable or restartable. Thus, find has to pass the element to the predicate by-reference rather than by-value.
So, if you have an iterator that implements Iterator<Item=T>, then Iterator::find requires a predicate that takes a &T and returns a bool. [i32]::iter produces an iterator that implements Iterator<Item=&i32>. Thus, Iterator::find called on an array iterator requires a predicate that takes a &&i32. That is, it passes the predicate a pointer to a pointer to the element in question.
So if you were to write a.iter().find(|x| ..), the type of x would be &&i32. This cannot be directly compared to the literal i32 value 2. There are several ways of fixing this. One is to explicitly dereference x: a.iter().find(|x| **x == 2). The other is to use pattern matching to destructure the double reference: a.iter().find(|&&x| x == 2). These two approaches are, in this case, doing exactly the same thing. [1]
As for why Some(&2) is used: because a.iter() is an iterator over &i32, not an iterator of i32. If you look at the documentation for Iterator::find, you'll see that for Iterator<Item=T>, it returns an Option<T>. Hence, in this case, it returns an Option<&i32>, so that's what you need to compare it against.
[1]: The differences only matter when you're talking about non-Copy types. For example, |&&x| .. wouldn't work on a &&String, because you'd have to be able to move the String out from behind the reference, and that's not allowed. However, |x| **x .. would work, because that is just reaching inside the reference without moving anything.
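The two fixes, and the non-Copy case from the footnote, can be put side by side. This is an illustrative sketch; the function names are invented for the example.

```rust
// Explicit double dereference: `x` has type `&&i32`.
fn find_two(a: &[i32]) -> Option<&i32> {
    a.iter().find(|x| **x == 2)
}

// Pattern matching: `&&x` peels off both references, so `x` is an `i32`.
// This copies the value, which is fine because i32 is Copy.
fn find_two_pattern(a: &[i32]) -> Option<&i32> {
    a.iter().find(|&&x| x == 2)
}

// With a non-Copy element type, `|&&name|` would try to move the String
// out from behind the reference and fail to compile. Dereferencing in
// the body only looks at the value, so it works.
fn find_name<'a>(names: &'a [String], wanted: &str) -> Option<&'a String> {
    names.iter().find(|name| **name == wanted)
}

fn main() {
    let a = [1, 2, 3];
    assert_eq!(find_two(&a), Some(&2));
    assert_eq!(find_two_pattern(&a), Some(&2));
    assert_eq!(a.iter().find(|&&x| x == 5), None);

    let names = vec!["ann".to_string(), "bob".to_string()];
    assert_eq!(find_name(&names, "bob"), Some(&names[1]));
    assert_eq!(find_name(&names, "eve"), None);
}
```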
1) I thought the book's explanation was good, but maybe my example with .cloned() below will be useful. Since .iter() iterates over references, and find passes its predicate a reference to each item, the closure argument ends up being a reference to a reference.
2) .iter() is iterating over references; therefore, you find a reference.
You could use .cloned() to see what it would look like if the iterator yielded values instead of references:
assert_eq!(a.iter().cloned().find(|&x| x == 2), Some(2));

Reverse order of a reference to immutable array slice

I would like to reverse the order of a slice:
&[1, 2, 3] -> &[3, 2, 1]
This is my code:
fn iterate_over_file(data: &[u8], ...) -> ... {
...
for cur_data in data.chunks(chunk_size) {
let reversed_cur_data = cur_data.reverse() // this returns ()
...
...
}
This data parameter comes from a file I read in using FileBuffer, and I'd like to keep it as a referenced slice (and not turn it into an owned Vec, since it's a heavy computation to make).
How could I reverse the order of cur_data with the minimal amount of operations and memory allocation? Its length is fixed for a given run of my program (called chunk_size here), but it changes between runs. reverse() returns (), which makes sense as it works in place, and I only have a shared slice reference. .iter().rev() creates an iterator, but then I'd have to call .next() on it several times to get the slice back, which is neither elegant nor efficient, as I have at least tens of millions of cur_data lines per file.
Not only does reverse return (), it also requires a mutable slice, which you do not have. The optimal solution depends on exactly what you need to do with the data. If you only need to iterate over the data, cur_data.iter().rev() is exactly the right and the most efficient choice.
If you need the reversed data inside a slice for further processing, such as to send the reversed chunk to a function that expects a slice, you can collect the data into a vector. To avoid a new allocation for each chunk, you can reuse the same vector across all loop iterations:
let mut reversed = Vec::with_capacity(chunk_size);
for cur_data in data.chunks(chunk_size) {
// Empty the vector at the beginning of each iteration.
// Vec explicitly guarantees that this will *not* deallocate,
// it will only reset its internal length. An allocation will
// thus only happen prior to the loop.
reversed.clear();
reversed.extend(cur_data.iter().rev());
// &reversed is now the reversed_cur_data you need
...
}
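For reference, here is a self-contained sketch of the buffer-reuse approach. The function for_each_reversed_chunk and its callback parameter are assumptions made for this example, standing in for whatever processing the elided parts of the question do with each reversed chunk.

```rust
// Hypothetical helper: calls `f` with each chunk of `data` in reversed
// order, reusing one buffer so that allocation happens only once.
fn for_each_reversed_chunk<F>(data: &[u8], chunk_size: usize, mut f: F)
where
    F: FnMut(&[u8]),
{
    // The element type is spelled out so that `extend` copies the bytes
    // instead of collecting references.
    let mut reversed: Vec<u8> = Vec::with_capacity(chunk_size);
    for chunk in data.chunks(chunk_size) {
        // Resets the length but keeps the allocated capacity.
        reversed.clear();
        reversed.extend(chunk.iter().rev());
        f(&reversed);
    }
}

fn main() {
    let data = [1u8, 2, 3, 4, 5];
    let mut out = Vec::new();
    for_each_reversed_chunk(&data, 2, |chunk| out.push(chunk.to_vec()));
    // The last chunk is shorter because 5 is not a multiple of 2.
    assert_eq!(out, vec![vec![2, 1], vec![4, 3], vec![5]]);
}
```

Note that chunk_size must be non-zero, since chunks panics when given a chunk size of 0.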
