Convert vectors to arrays and back [duplicate] - rust

This question already has an answer here:
Is there a good way to convert a Vec<T> to an array?
Closed 7 years ago.
I am attempting to figure out the most Rust-like way of converting from a vector to an array and back. These macros work and can even be made generic with some unsafe blocks, but it all feels very un-Rust-like.
I would appreciate any input, so don't pull any punches: I think this code is far from nice or optimal. I have only been playing with Rust for a few weeks, chasing releases and docs, so I really appreciate the help.
macro_rules! convert_u8vec_to_array {
($container:ident, $size:expr) => {{
if $container.len() != $size {
None
} else {
use std::mem;
let mut arr : [_; $size] = unsafe { mem::uninitialized() };
for element in $container.into_iter().enumerate() {
let old_val = mem::replace(&mut arr[element.0],element.1);
unsafe { mem::forget(old_val) };
}
Some(arr)
}
}};
}
fn array_to_vec(arr: &[u8]) -> Vec<u8> {
let mut vector = Vec::new();
for i in arr.iter() {
vector.push(*i);
}
vector
}
fn vector_as_u8_4_array(vector: Vec<u8>) -> [u8;4] {
let mut arr = [0u8;4];
for i in (0..4) {
arr[i] = vector[i];
}
arr
}

The code seems fine to me, although there's a very important safety point to note: there can be no panics while arr isn't fully initialised. Running destructors on uninitialised memory could easily lead to undefined behaviour; in particular, this means that into_iter and the next method of the resulting iterator must never panic (I believe it is impossible for the enumerate and mem::* parts of the iterator to panic given the constraints of the code). Note also that mem::uninitialized has since been deprecated in favour of std::mem::MaybeUninit.
That said, one can express the replace/forget idiom with a single function: std::ptr::write.
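for (idx, element) in $container.into_iter().enumerate() {
// ptr::write is unsafe: it overwrites the slot without dropping the old (uninitialised) value
unsafe { std::ptr::write(&mut arr[idx], element) };
}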
Although, I would write it as:
for (place, element) in arr.iter_mut().zip($container.into_iter()) {
// same as above: write without dropping the uninitialised contents of `place`
unsafe { std::ptr::write(place, element) };
}
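On current Rust the macro and the unsafe write loop can be replaced by safe standard-library conversions. This sketch assumes const generics and the `TryFrom<Vec<T>> for [T; N]` impl (stable since Rust 1.48) plus `From<[T; N]> for Vec<T>`; `vec_into_array` and `array_into_vec` are illustrative names, not std APIs.

```rust
use std::convert::TryInto;

/// Moves the Vec's elements into [T; N], or returns None when the
/// length isn't exactly N. No unsafe code and no uninitialised memory.
fn vec_into_array<T, const N: usize>(v: Vec<T>) -> Option<[T; N]> {
    // On failure try_into hands the original Vec back as the error;
    // here it is simply dropped.
    v.try_into().ok()
}

/// Moves the array's elements into a new Vec, for any element type.
fn array_into_vec<T, const N: usize>(arr: [T; N]) -> Vec<T> {
    Vec::from(arr)
}
```

Because the length check and the element moves both happen inside the standard library, there is no window in which a panic could observe a half-initialised array.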
Similarly, one can apply some iterator goodness to the u8 specialised versions:
fn array_to_vec(arr: &[u8]) -> Vec<u8> {
arr.iter().cloned().collect()
}
fn vector_as_u8_4_array(vector: Vec<u8>) -> [u8;4] {
let mut arr = [0u8;4];
for (place, element) in arr.iter_mut().zip(vector.iter()) {
*place = *element;
}
arr
}
Although the first is probably better written as arr.to_vec(), and the second as
let mut arr = [0u8; 4];
std::slice::bytes::copy_memory(&vector, &mut arr);
arr
That function was unstable at the time, and hence only usable on nightly; it has since been removed from the standard library.
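A sketch of the two u8 helpers using only stable std calls (`to_vec`, `slice::get`, `copy_from_slice`). Taking `&[u8]` instead of `Vec<u8>` and returning `Option` on short input are choices made for this sketch, not part of the original answer.

```rust
/// Borrowed slice -> owned Vec: one allocation plus a bulk copy.
fn array_to_vec(arr: &[u8]) -> Vec<u8> {
    arr.to_vec()
}

/// Copies the first four bytes into an array. A &Vec<u8> coerces to &[u8].
/// Returns None instead of panicking when fewer than 4 bytes are available.
fn vector_as_u8_4_array(vector: &[u8]) -> Option<[u8; 4]> {
    let src = vector.get(..4)?; // None if the input is too short
    let mut arr = [0u8; 4];
    arr.copy_from_slice(src); // would panic only on a length mismatch, ruled out above
    Some(arr)
}
```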

Related

How to pass &mut str and change the original mut str without a return?

I'm learning Rust from the Book and I was tackling the exercises at the end of chapter 8, but I'm hitting a wall with the one about converting words into Pig Latin. I wanted to see specifically if I could pass a &mut String to a function that takes a &mut str (to also accept slices) and modify the referenced string inside it so the changes are reflected back outside without the need of a return, like in C with a char **.
I'm not quite sure if I'm just messing up the syntax or if it's more complicated than it sounds due to Rust's strict rules, which I have yet to fully grasp. For the lifetime errors inside to_pig_latin() I remember reading something that explained how to properly handle the situation but right now I can't find it, so if you could also point it out for me it would be very appreciated.
Also what do you think of the way I handled the chars and indexing inside strings?
use std::io::{self, Write};
fn main() {
let v = vec![
String::from("kaka"),
String::from("Apple"),
String::from("everett"),
String::from("Robin"),
];
for s in &v {
// cannot borrow `s` as mutable, as it is not declared as mutable
// cannot borrow data in a `&` reference as mutable
to_pig_latin(&mut s);
}
for (i, s) in v.iter().enumerate() {
print!("{}", s);
if i < v.len() - 1 {
print!(", ");
}
}
io::stdout().flush().unwrap();
}
fn to_pig_latin(mut s: &mut str) {
let first = s.chars().nth(0).unwrap();
let mut pig;
if "aeiouAEIOU".contains(first) {
pig = format!("{}-{}", s, "hay");
s = &mut pig[..]; // `pig` does not live long enough
} else {
let mut word = String::new();
for (i, c) in s.char_indices() {
if i != 0 {
word.push(c);
}
}
pig = format!("{}-{}{}", word, first.to_lowercase(), "ay");
s = &mut pig[..]; // `pig` does not live long enough
}
}
Edit: here's the fixed code with the suggestions from below.
fn main() {
// added mut
let mut v = vec![
String::from("kaka"),
String::from("Apple"),
String::from("everett"),
String::from("Robin"),
];
// added mut
for mut s in &mut v {
to_pig_latin(&mut s);
}
for (i, s) in v.iter().enumerate() {
print!("{}", s);
if i < v.len() - 1 {
print!(", ");
}
}
println!();
}
// converted into &mut String
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
s.push_str("-hay");
} else {
// added code to make the new first letter uppercase
let second = s.chars().nth(1).unwrap();
*s = format!(
"{}{}-{}ay",
second.to_uppercase(),
// the slice starts at the third char; adding both chars' byte lengths
// keeps the index on a char boundary even when the two chars differ in size
&s[first.len_utf8() + second.len_utf8()..],
first.to_lowercase()
);
}
}
What you're trying to do can't work: with a mutable reference you can update the referent in place, but that is extremely limited here:
an &mut str can't change its length or anything of that sort
an &mut str is still just a reference, so the memory has to live somewhere. Here you're creating new Strings inside your function and then trying to use them as the new backing buffers for the reference, which, as the compiler tells you, doesn't work: each String is deallocated at the end of the function
What you could do is take an &mut String, which lets you modify the owned string itself in place and is much more flexible. This in fact corresponds exactly to your request: an &mut str corresponds to a char*, a pointer to a place in memory.
A String is also a pointer, so an &mut String is a double-pointer to a zone in memory.
So something like this:
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
*s = format!("{}-{}", s, "hay");
} else {
let mut word = String::new();
for (i, c) in s.char_indices() {
if i != 0 {
word.push(c);
}
}
*s = format!("{}-{}{}", word, first.to_lowercase(), "ay");
}
}
You can also likely avoid some of the full string allocations by using finer-grained methods, e.g.:
use std::fmt::Write; // needed for write! on a String
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
s.push_str("-hay")
} else {
// drop the first char, whose bytes are 0..first.len_utf8()
s.replace_range(..first.len_utf8(), "");
write!(s, "-{}ay", first.to_lowercase()).unwrap();
}
}
although the replace_range + write! is not very readable and not super likely to be much of a gain, so that might as well be a format!, something along the lines of:
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
s.push_str("-hay")
} else {
*s = format!("{}-{}ay", &s[first.len_utf8()..], first.to_lowercase());
}
}
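A variant sketch (not from the answer) that keeps the `&mut String` signature but avoids the `unwrap()` panic on an empty string and never rebuilds the whole string: `String::remove(0)` drops the first char in place, and the suffix is appended piece by piece.

```rust
/// In-place Pig Latin conversion through a &mut String.
/// Empty strings are left untouched instead of panicking.
fn to_pig_latin(s: &mut String) {
    let first = match s.chars().next() {
        Some(c) => c,
        None => return, // nothing to convert
    };
    if "aeiouAEIOU".contains(first) {
        // Vowel: only the suffix is appended.
        s.push_str("-hay");
    } else {
        // Consonant: byte index 0 is always a char boundary, so remove(0)
        // is safe for multi-byte chars too; it shifts the rest left.
        s.remove(0);
        s.push('-');
        s.extend(first.to_lowercase()); // to_lowercase yields one or more chars
        s.push_str("ay");
    }
}
```

Calling it over a `Vec<String>` needs no extra `mut` binding: `for s in &mut v { to_pig_latin(s); }` already yields `&mut String`.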

Adding an append method to a singly linked list

I was looking through the singly linked list example on rustbyexample.com and I noticed the implementation had no append method, so I decided to try and implement it:
fn append(self, elem: u32) -> List {
let mut node = &self;
loop {
match *node {
Cons(_, ref tail) => {
node = tail;
},
Nil => {
node.prepend(elem);
break;
},
}
}
return self;
}
The above is one of many different attempts, but I cannot seem to find a way to iterate down to the tail and modify it, then somehow return the head, without upsetting the borrow checker in some way.
I am trying to figure out a solution that doesn't involve copying data or doing additional bookkeeping outside the append method.
As described in Cannot obtain a mutable reference when iterating a recursive structure: cannot borrow as mutable more than once at a time, you need to transfer ownership of the mutable reference when performing iteration. This is needed to ensure you never have two mutable references to the same thing.
We use code similar to that Q&A to get a mutable reference to the last item (back), which will always be the Nil variant. We then call back() and replace that Nil item with a Cons. Finally, we wrap all of that in a by-value function, because that's what the API wants.
No extra allocation, no risk of running out of stack frames.
use List::*;
#[derive(Debug)]
enum List {
Cons(u32, Box<List>),
Nil,
}
impl List {
fn back(&mut self) -> &mut List {
let mut node = self;
loop {
match {node} {
&mut Cons(_, ref mut next) => node = next,
other => return other,
}
}
}
fn append_ref(&mut self, elem: u32) {
*self.back() = Cons(elem, Box::new(Nil));
}
fn append(mut self, elem: u32) -> Self {
self.append_ref(elem);
self
}
}
fn main() {
let n = Nil;
let n = n.append(1);
println!("{:?}", n);
let n = n.append(2);
println!("{:?}", n);
let n = n.append(3);
println!("{:?}", n);
}
Now that non-lexical lifetimes are enabled by default, this function can be written more directly:
fn back(&mut self) -> &mut List {
let mut node = self;
while let Cons(_, next) = node {
node = next;
}
node
}
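The NLL form of back can be dropped into a complete program like this. The enum and append mirror the code above; to_vec is a hypothetical helper added only so the result can be inspected.

```rust
#[derive(Debug, PartialEq)]
enum List {
    Cons(u32, Box<List>),
    Nil,
}

use List::{Cons, Nil};

impl List {
    /// Walks to the terminating Nil and returns a mutable reference to it.
    fn back(&mut self) -> &mut List {
        let mut node = self;
        while let Cons(_, next) = node {
            node = next; // &mut Box<List> coerces to &mut List
        }
        node
    }

    /// By-value append: overwrite the final Nil, then hand the list back.
    fn append(mut self, elem: u32) -> Self {
        *self.back() = Cons(elem, Box::new(Nil));
        self
    }

    /// Hypothetical helper: collects the elements front to back.
    fn to_vec(&self) -> Vec<u32> {
        let mut out = Vec::new();
        let mut node = self;
        while let Cons(v, next) = node {
            out.push(*v);
            node = next;
        }
        out
    }
}
```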
As the len method is implemented recursively, I have done the same for the append implementation:
fn append(self, elem: u32) -> List {
match self {
Cons(current_elem, tail_box) => {
let tail = *tail_box;
let new_tail = tail.append(elem);
new_tail.prepend(current_elem)
}
Nil => {
List::new().prepend(elem)
}
}
}
One possible iterative solution would be to implement append in terms of prepend and a reverse function, like so (it won't be as performant but should still only be O(N)):
// Reverses the list
fn rev(self) -> List {
let mut result = List::new();
let mut current = self;
while let Cons(elem, tail) = current {
result = result.prepend(elem);
current = *tail;
}
result
}
fn append(self, elem: u32) -> List {
self.rev().prepend(elem).rev()
}
So, it's actually going to be slightly more difficult than you may think; mostly because Box is really missing a destructive take method which would return its content.
Easy way: the recursive way, no return.
fn append_rec(&mut self, elem: u32) {
match *self {
Cons(_, ref mut tail) => tail.append_rec(elem),
Nil => *self = Cons(elem, Box::new(Nil)),
}
}
This is relatively easy, as mentioned.
Harder way: the recursive way, with return.
fn append_rec(self, elem: u32) -> List {
match self {
Cons(e, tail) => Cons(e, Box::new((*tail).append_rec(elem))),
Nil => Cons(elem, Box::new(Nil)),
}
}
Note that this is grossly inefficient: for a list of size N, we destroy N boxes and allocate N new ones. In-place mutation (the first approach) was much better in this regard.
Harder way: the iterative way, with no return.
fn append_iter_mut(&mut self, elem: u32) {
let mut current = self;
loop {
match {current} {
&mut Cons(_, ref mut tail) => current = tail,
c @ &mut Nil => {
*c = Cons(elem, Box::new(Nil));
return;
},
}
}
}
Okay... so iterating (mutably) over a nested data structure is not THAT easy because ownership and borrow-checking will ensure that:
a mutable reference is never copied, only moved,
a mutable reference with an outstanding borrow cannot be modified.
This is why here:
we use {current} to move current into the match,
we use c @ &mut Nil because we need to name the match of &mut Nil, since current has been moved.
Note that thankfully rustc is smart enough to check the execution path and detect that it's okay to continue looping as long as we take the Cons branch since we reinitialize current in that branch, however it's not okay to continue after taking the Nil branch, which forces us to terminate the loop :)
Harder way: the iterative way, with return
fn append_iter(self, elem: u32) -> List {
let mut stack = List::new();
{
let mut current = self;
while let Cons(elem, tail) = current {
stack = stack.prepend(elem);
current = *tail; // move the rest of the list out of its Box
}
}
let mut result = List::new();
result = result.prepend(elem);
while let Cons(elem, tail) = stack {
result = result.prepend(elem);
stack = *tail;
}
result
}
In the recursive way, the call stack kept the items for us; here we use an explicit stack structure instead.
It's even more inefficient than the recursive way with return was: each node causes two deallocations and two allocations.
TL;DR: in-place modifications are generally more efficient, don't be afraid of using them when necessary.

Implement slice_shift_char using the std library

I'd like to use the &str method slice_shift_char, but it is marked as unstable in the documentation:
Unstable: awaiting conventions about shifting and slices and may not
be warranted with the existence of the chars and/or char_indices
iterators
What would be a good way to implement this method, with Rust's current std library? So far I have:
fn slice_shift_char(s: &str) -> Option<(char, &str)> {
let mut ixs = s.char_indices();
let next = ixs.next();
match next {
Some((next_pos, ch)) => {
let rest = unsafe {
s.slice_unchecked(next_pos, s.len())
};
Some((ch, rest))
},
None => None
}
}
I'd like to avoid the call to slice_unchecked. I'm using Rust 1.1.
Well, you can look at the source code, and you'll get https://github.com/rust-lang/rust/blob/master/src/libcollections/str.rs#L776-L778 and https://github.com/rust-lang/rust/blob/master/src/libcore/str/mod.rs#L1531-L1539 . The second:
fn slice_shift_char(&self) -> Option<(char, &str)> {
if self.is_empty() {
None
} else {
let ch = self.char_at(0);
let next_s = unsafe { self.slice_unchecked(ch.len_utf8(), self.len()) };
Some((ch, next_s))
}
}
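If you don't want the unsafe, you can just use a normal slice. Since inherent methods can't be added to str outside the standard library, this version is a free function, and it gets the first char with chars().next() instead of char_at:
fn slice_shift_char(s: &str) -> Option<(char, &str)> {
if s.is_empty() {
None
} else {
let ch = s.chars().next().unwrap();
// ch.len_utf8() is a char boundary, so this slice cannot panic
let next_s = &s[ch.len_utf8()..];
Some((ch, next_s))
}
}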
The unstable slice_shift_char function has been deprecated since Rust 1.9.0 and removed completely in Rust 1.11.0.
As of Rust 1.4.0, the recommended approach to implementing this is:
Use .chars() to get an iterator of the char content
Iterate on this iterator once to get the first character.
Call .as_str() on that iterator to recover the remaining uniterated string.
fn slice_shift_char(a: &str) -> Option<(char, &str)> {
let mut chars = a.chars();
chars.next().map(|c| (c, chars.as_str()))
}
fn main() {
assert_eq!(slice_shift_char("hello"), Some(('h', "ello")));
assert_eq!(slice_shift_char("ĺḿńóṕ"), Some(('ĺ', "ḿńóṕ")));
assert_eq!(slice_shift_char(""), None);
}
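For comparison with the question's char_indices attempt: there, next_pos is the byte index of the first char itself (always 0), so rest would be the whole string rather than the remainder. A sketch of that approach without unsafe takes the remainder's start from the second char's index instead (or s.len() if there is no second char). It uses `?` on Option, so it assumes Rust 1.22 or later rather than 1.1.

```rust
/// Safe char_indices-based variant: the remainder starts at the byte
/// index of the *second* char, or at s.len() for a one-char string.
fn slice_shift_char(s: &str) -> Option<(char, &str)> {
    let mut ixs = s.char_indices();
    let (_, ch) = ixs.next()?; // None for the empty string
    let rest_start = ixs.next().map_or(s.len(), |(i, _)| i);
    // rest_start is always a char boundary, so this slice never panics
    Some((ch, &s[rest_start..]))
}
```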

Using str and String interchangably

Suppose I'm trying to do a fancy zero-copy parser in Rust using &str, but sometimes I need to modify the text (e.g. to implement variable substitution). I really want to do something like this:
fn main() {
let mut v: Vec<&str> = "Hello there $world!".split_whitespace().collect();
for t in v.iter_mut() {
if (t.contains("$world")) {
*t = &t.replace("$world", "Earth");
}
}
println!("{:?}", &v);
}
But of course the String returned by t.replace() doesn't live long enough. Is there a nice way around this? Perhaps there is a type which means "ideally a &str but if necessary a String"? Or maybe there is a way to use lifetime annotations to tell the compiler that the returned String should be kept alive until the end of main() (or have the same lifetime as v)?
Rust has exactly what you want in the form of the Cow (clone-on-write) type.
use std::borrow::Cow;
fn main() {
let mut v: Vec<_> = "Hello there $world!".split_whitespace()
.map(|s| Cow::Borrowed(s))
.collect();
for t in v.iter_mut() {
if t.contains("$world") {
*t.to_mut() = t.replace("$world", "Earth");
}
}
println!("{:?}", &v);
}
As @sellibitze correctly notes, to_mut() creates a new String, which causes a heap allocation to copy the previously borrowed value. If you are sure you only have borrowed strings, you can use
*t = Cow::Owned(t.replace("$world", "Earth"));
If the Vec contains Cow::Owned elements, this would still throw away the existing allocation. You can prevent that with the following very fragile and unsafe code inside your for loop (it does direct byte-based manipulation of UTF-8 strings and relies on the fact that the replacement "Earth" has exactly as many bytes as "world", the pattern without its $).
let mut last_pos = 0; // so we don't start at the beginning every time
while let Some(pos) = t[last_pos..].find("$world") {
let p = pos + last_pos; // find returns an offset relative to last_pos
last_pos = p + 5; // resume after the 5 bytes of "Earth" written below
unsafe {
let s = t.to_mut().as_mut_vec(); // operating on Vec is easier
s.remove(p); // remove $ sign
for (c, sc) in "Earth".bytes().zip(&mut s[p..]) {
*sc = c;
}
}
}
Note that this is tailored exactly to the "$world" -> "Earth" mapping. Any other mappings require careful consideration inside the unsafe code.
Use std::borrow::Cow, specifically as Cow<'a, str>, where 'a is the lifetime of the string being parsed.
use std::borrow::Cow;
fn main() {
let mut v: Vec<Cow<'static, str>> = vec![];
v.push("oh hai".into());
v.push(format!("there, {}.", "Mark").into());
println!("{:?}", v);
}
Produces:
["oh hai", "there, Mark."]
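Applied to the original parsing problem, a sketch where each token stays borrowed unless it actually needs substituting. `substitute` and `parse` are illustrative names, and only the single `$world` -> `Earth` mapping from the question is assumed.

```rust
use std::borrow::Cow;

/// Returns the token borrowed (no allocation) unless it contains `var`,
/// in which case an owned String with the substitution is returned.
fn substitute<'a>(token: &'a str, var: &str, value: &str) -> Cow<'a, str> {
    if token.contains(var) {
        Cow::Owned(token.replace(var, value))
    } else {
        Cow::Borrowed(token)
    }
}

/// Tokenizes on whitespace; the Vec borrows from `text` wherever it can.
fn parse<'a>(text: &'a str) -> Vec<Cow<'a, str>> {
    text.split_whitespace()
        .map(|t| substitute(t, "$world", "Earth"))
        .collect()
}
```

Deciding Borrowed versus Owned up front avoids the extra copy that `to_mut()` makes when it first converts a borrowed value into an owned one.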

How to call count on an iterator and still use the iterator's items?

parts.count() leads to ownership transfer, so parts can't be used any more.
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
let mut parts = slice.split(|b| splitter.contains(b));
let len = parts.count(); //ownership transfer
if len >= 2 {
Some(parts.nth(1).unwrap().to_vec())
} else if len >= 1 {
Some(parts.nth(0).unwrap().to_vec())
} else {
None
}
}
fn main() {
split(&[1u8, 2u8, 3u8], &[2u8]);
}
It is also possible to avoid unnecessary allocations of Vec if you only need to use the first or the second part:
fn split<'a>(slice: &'a [u8], splitter: &[u8]) -> Option<&'a [u8]> {
let mut parts = slice.split(|b| splitter.contains(b)).fuse();
let first = parts.next();
let second = parts.next();
second.or(first)
}
Then if you actually need a Vec you can map on the result:
split(&[1u8, 2u8, 3u8], &[2u8]).map(|s| s.to_vec())
Of course, if you want, you can move to_vec() conversion to the function:
second.or(first).map(|s| s.to_vec())
I'm calling fuse() on the iterator in order to guarantee that it will always return None after the first None is returned (which is not guaranteed by the general iterator protocol).
The other answers are good suggestions to answer your problem, but I'd like to point out another general solution: create multiple iterators:
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
let mut parts = slice.split(|b| splitter.contains(b));
let parts2 = slice.split(|b| splitter.contains(b));
let len = parts2.count();
if len >= 2 {
Some(parts.nth(1).unwrap().to_vec())
} else if len >= 1 {
Some(parts.nth(0).unwrap().to_vec())
} else {
None
}
}
fn main() {
split(&[1u8, 2u8, 3u8], &[2u8]);
}
You can usually create multiple read-only iterators. Some iterators even implement Clone, so you could just say iter.clone().count(). Unfortunately, Split isn't one of them because it owns the passed-in closure.
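On current Rust the clone approach does apply here: closures are Clone when everything they capture is (since Rust 1.26), and slice::Split is Clone when its predicate is. A sketch under those assumptions; the closure only captures a reference to `splitter`, so cloning `parts` copies its current position.

```rust
/// Counts the pieces on a clone of the iterator, then reads from the original.
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
    let mut parts = slice.split(|b| splitter.contains(b));
    let len = parts.clone().count(); // consumes only the clone
    if len >= 2 {
        parts.nth(1).map(|p| p.to_vec())
    } else {
        // slice::split always yields at least one piece, even for an empty slice
        parts.next().map(|p| p.to_vec())
    }
}
```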
One thing you can do is collect the results of the split in a new owned Vec, like this:
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
let parts: Vec<&[u8]> = slice.split(|b| splitter.contains(b)).collect();
let len = parts.len();
if len >= 2 {
Some(parts[1].to_vec())
} else if len >= 1 {
Some(parts[0].to_vec())
} else {
None
}
}
