I was looking through the singly linked list example on rustbyexample.com and I noticed the implementation had no append method, so I decided to try and implement it:
fn append(self, elem: u32) -> List {
let mut node = &self;
loop {
match *node {
Cons(_, ref tail) => {
node = tail;
},
Nil => {
node.prepend(elem);
break;
},
}
}
return self;
}
The above is one of many different attempts, but I cannot seem to find a way to iterate down to the tail and modify it, then somehow return the head, without upsetting the borrow checker in some way.
I am trying to figure out a solution that doesn't involve copying data or doing additional bookkeeping outside the append method.
As described in Cannot obtain a mutable reference when iterating a recursive structure: cannot borrow as mutable more than once at a time, you need to transfer ownership of the mutable reference when performing iteration. This is needed to ensure you never have two mutable references to the same thing.
We use code similar to that Q&A to get a mutable reference to the last item (back), which will always be the Nil variant. We then call it and set that Nil item to a Cons. We wrap all that in a by-value function because that's what the API wants.
No extra allocation, no risk of running out of stack frames.
use List::*;
#[derive(Debug)]
enum List {
Cons(u32, Box<List>),
Nil,
}
impl List {
fn back(&mut self) -> &mut List {
let mut node = self;
loop {
match {node} {
&mut Cons(_, ref mut next) => node = next,
other => return other,
}
}
}
fn append_ref(&mut self, elem: u32) {
*self.back() = Cons(elem, Box::new(Nil));
}
fn append(mut self, elem: u32) -> Self {
self.append_ref(elem);
self
}
}
fn main() {
let n = Nil;
let n = n.append(1);
println!("{:?}", n);
let n = n.append(2);
println!("{:?}", n);
let n = n.append(3);
println!("{:?}", n);
}
When non-lexical lifetimes are enabled, this function can be more obvious:
fn back(&mut self) -> &mut List {
let mut node = self;
while let Cons(_, next) = node {
node = next;
}
node
}
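For reference, here is a minimal self-contained sketch that combines the shorter back with the append_ref and append wrappers from above. It assumes a compiler with non-lexical lifetimes (Rust 2018 or later); the PartialEq derive is only added so the result can be compared in a check.

```rust
#[derive(Debug, PartialEq)]
enum List {
    Cons(u32, Box<List>),
    Nil,
}

use List::{Cons, Nil};

impl List {
    /// Walks down the list and returns a mutable reference to the trailing `Nil`.
    fn back(&mut self) -> &mut List {
        let mut node = self;
        while let Cons(_, next) = node {
            node = next;
        }
        node
    }

    /// Appends in place by overwriting the trailing `Nil`.
    fn append_ref(&mut self, elem: u32) {
        *self.back() = Cons(elem, Box::new(Nil));
    }

    /// By-value wrapper, matching the signature asked for in the question.
    fn append(mut self, elem: u32) -> Self {
        self.append_ref(elem);
        self
    }
}
```

Chaining Nil.append(1).append(2).append(3) builds the list 1, 2, 3 with exactly one allocation per appended element.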
As the len method is implemented recursively, I have done the same for the append implementation:
fn append(self, elem: u32) -> List {
match self {
Cons(current_elem, tail_box) => {
let tail = *tail_box;
let new_tail = tail.append(elem);
new_tail.prepend(current_elem)
}
Nil => {
List::new().prepend(elem)
}
}
}
One possible iterative solution would be to implement append in terms of prepend and a reverse function, like so (it won't be as performant but should still only be O(N)):
// Reverses the list
fn rev(self) -> List {
let mut result = List::new();
let mut current = self;
while let Cons(elem, tail) = current {
result = result.prepend(elem);
current = *tail;
}
result
}
fn append(self, elem: u32) -> List {
self.rev().prepend(elem).rev()
}
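To try the reverse-based approach on its own, the sketch below bundles it with the two helpers it relies on. The new and prepend methods are assumed to match the ones from the Rust by Example list; they are reproduced here only so the snippet compiles by itself.

```rust
#[derive(Debug, PartialEq)]
enum List {
    Cons(u32, Box<List>),
    Nil,
}

use List::{Cons, Nil};

impl List {
    // Assumed helper: creates an empty list.
    fn new() -> List {
        Nil
    }

    // Assumed helper: puts `elem` in front of the list.
    fn prepend(self, elem: u32) -> List {
        Cons(elem, Box::new(self))
    }

    // Reverses the list by repeatedly moving the head onto `result`.
    fn rev(self) -> List {
        let mut result = List::new();
        let mut current = self;
        while let Cons(elem, tail) = current {
            result = result.prepend(elem);
            current = *tail;
        }
        result
    }

    // Reverse, prepend, reverse back: two O(N) passes per append.
    fn append(self, elem: u32) -> List {
        self.rev().prepend(elem).rev()
    }
}
```

Note that every append rebuilds the whole list twice, so building a list of N elements this way costs O(N^2) overall.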
So, it's actually going to be slightly more difficult than you may think; mostly because Box is really missing a destructive take method which would return its content.
Easy way: the recursive way, no return.
fn append_rec(&mut self, elem: u32) {
match *self {
Cons(_, ref mut tail) => tail.append_rec(elem),
Nil => *self = Cons(elem, Box::new(Nil)),
}
}
This is relatively easy, as mentioned.
Harder way: the recursive way, with return.
fn append_rec(self, elem: u32) -> List {
match self {
Cons(e, tail) => Cons(e, Box::new((*tail).append_rec(elem))),
Nil => Cons(elem, Box::new(Nil)),
}
}
Note that this is grossly inefficient. For a list of size N, we are destroying N boxes and allocating N new ones. In-place mutation (the first approach) was much better in this regard.
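If the by-value signature has to stay, one possible way to avoid the extra allocations is to reuse each existing Box instead of creating a new one. The sketch below is an illustration of that idea rather than part of the original approach: it uses std::mem::replace to move the tail out of its box, leaving a Nil placeholder behind, and then writes the appended tail back into the same allocation. It is still recursive, so the stack depth grows with the list length.

```rust
#[derive(Debug, PartialEq)]
enum List {
    Cons(u32, Box<List>),
    Nil,
}

use List::{Cons, Nil};

impl List {
    fn append_rec(self, elem: u32) -> List {
        match self {
            Cons(e, mut tail) => {
                // Move the tail's contents out, leaving a `Nil` placeholder in the box.
                let rest = std::mem::replace(&mut *tail, Nil);
                // Store the appended tail back into the same heap allocation.
                *tail = rest.append_rec(elem);
                Cons(e, tail)
            }
            // Only the new last node needs a fresh allocation.
            Nil => Cons(elem, Box::new(Nil)),
        }
    }
}
```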
Harder way: the iterative way, with no return.
fn append_iter_mut(&mut self, elem: u32) {
let mut current = self;
loop {
match {current} {
&mut Cons(_, ref mut tail) => current = tail,
c @ &mut Nil => {
*c = Cons(elem, Box::new(Nil));
return;
},
}
}
}
Okay... so iterating (mutably) over a nested data structure is not THAT easy because ownership and borrow-checking will ensure that:
a mutable reference is never copied, only moved,
a mutable reference with an outstanding borrow cannot be modified.
This is why here:
we use {current} to move current into the match,
we use c @ &mut Nil because we need to name the match of &mut Nil, since current has been moved.
Note that thankfully rustc is smart enough to check the execution path and detect that it's okay to continue looping as long as we take the Cons branch since we reinitialize current in that branch, however it's not okay to continue after taking the Nil branch, which forces us to terminate the loop :)
Harder way: the iterative way, with return
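fn append_iter(self, elem: u32) -> List {
// First pass: move every element onto a temporary list, which reverses the order.
let mut stack = List::new();
{
let mut current = self;
while let Cons(elem, tail) = current {
stack = stack.prepend(elem);
current = *tail; // dereferencing the Box moves its contents out
}
}
// Second pass: start from the new element and undo the reversal.
let mut result = List::new();
result = result.prepend(elem);
while let Cons(elem, tail) = stack {
result = result.prepend(elem);
stack = *tail;
}
result
}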
In the recursive way, we were using the stack to keep the items for us, here we use a stack structure instead.
It's even more inefficient than the recursive way with return was; each node causes two deallocations and two allocations.
TL;DR: in-place modifications are generally more efficient, don't be afraid of using them when necessary.
I have a Cons list:
#[derive(Debug)]
pub enum Cons {
Empty,
Pair(i64, Box<Cons>),
}
I want to implement FromIterator<i64> for this type, in an efficient manner.
Attempt one is straightforward: implement a push method which recursively traverses the list and transforms a Cons::Empty into a Cons::Pair(x, Box::new(Cons::Empty)); repeatedly call this push method. This operation is O(n^2) in time and O(n) in temporary space for the stack frames.
Attempt two will combine the recursion with the iterator to improve the time performance: by pulling a single item from the iterator to construct a Cons::Pair and then recursing to construct the remainder of the list, we now construct the list in O(n) time and O(n) temporary space:
impl FromIterator<i64> for Cons {
fn from_iter<I>(iter: I) -> Self
where
I: IntoIterator<Item = i64>,
{
let mut iter = iter.into_iter();
match iter.next() {
Some(x) => Cons::Pair(x, Box::new(iter.collect())),
None => Cons::Empty,
}
}
}
In C, it would be possible to implement this method using O(n) operations and O(1) working space size. However, I cannot translate it into Rust. The idea is simple, but it requires storing two mutable pointers to the same value; something that Rust forbids. A failed attempt:
impl FromIterator<i64> for Cons {
fn from_iter<I>(iter: I) -> Self
where
I: IntoIterator<Item = i64>,
{
let mut iter = iter.into_iter();
let ret = Box::new(Cons::Empty);
let mut cursor = ret;
loop {
match iter.next() {
Some(x) => {
let mut next = Box::new(Cons::Empty);
*cursor = Cons::Pair(x, next);
cursor = next;
}
None => break,
}
}
return *ret;
}
}
error[E0382]: use of moved value: `next`
--> src/lib.rs:20:30
|
18 | let mut next = Box::new(Cons::Empty);
| -------- move occurs because `next` has type `Box<Cons>`, which does not implement the `Copy` trait
19 | *cursor = Cons::Pair(x, next);
| ---- value moved here
20 | cursor = next;
| ^^^^ value used here after move
error[E0382]: use of moved value: `*ret`
--> src/lib.rs:25:16
|
13 | let ret = Box::new(Cons::Empty);
| --- move occurs because `ret` has type `Box<Cons>`, which does not implement the `Copy` trait
14 | let mut cursor = ret;
| --- value moved here
...
25 | return *ret;
| ^^^^ value used here after move
Is it possible to perform this algorithm in safe Rust? How else could I implement an efficient FromIterator for my Cons type? I understand that I may be able to make some headway by switching Box to Rc, but I'd like to avoid this if possible.
You are attempting to have two owners of a single variable, but Rust only allows a single owner. You do this twice: once for ret and once for next. Instead, use mutable references.
I chose to introduce a last() method which can be used in an implementation of Extend and participate in more abstractions.
#[derive(Debug)]
pub enum Cons {
Empty,
Pair(i64, Box<Cons>),
}
impl Cons {
fn last(&mut self) -> &mut Self {
let mut this = self;
loop {
eprintln!("1 loop turn");
match this {
Cons::Empty => return this,
Cons::Pair(_, next) => this = next,
}
}
}
}
impl FromIterator<i64> for Cons {
fn from_iter<I>(iter: I) -> Self
where
I: IntoIterator<Item = i64>,
{
let mut this = Cons::Empty;
this.extend(iter);
this
}
}
impl Extend<i64> for Cons {
fn extend<I>(&mut self, iter: I)
where
I: IntoIterator<Item = i64>,
{
let mut this = self.last();
for i in iter {
eprintln!("1 loop turn");
*this = Cons::Pair(i, Box::new(Cons::Empty));
this = match this {
Cons::Empty => unreachable!(),
Cons::Pair(_, next) => next,
};
}
}
}
fn main() {
dbg!(Cons::from_iter(0..10));
}
This produces
Pair(0, Pair(1, Pair(2, Pair(3, Pair(4, Pair(5, Pair(6, Pair(7, Pair(8, Pair(9, Empty))))))))))
0 -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 9 -> ⏚
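As a sketch of how last() lets the type "participate in more abstractions", the snippet below is a condensed copy of the implementations above, without the debug prints, with last() written as a while let loop, and with a hypothetical to_vec helper added purely so the contents can be inspected. It shows collect() building a list and extend() continuing from the end of a list that already has elements.

```rust
use std::iter::FromIterator;

#[derive(Debug)]
pub enum Cons {
    Empty,
    Pair(i64, Box<Cons>),
}

impl Cons {
    // Returns a mutable reference to the trailing `Empty`.
    fn last(&mut self) -> &mut Self {
        let mut this = self;
        while let Cons::Pair(_, next) = this {
            this = next;
        }
        this
    }

    // Hypothetical helper, not part of the original answer: copies the values out.
    fn to_vec(&self) -> Vec<i64> {
        let mut out = Vec::new();
        let mut node = self;
        while let Cons::Pair(value, next) = node {
            out.push(*value);
            node = next;
        }
        out
    }
}

impl Extend<i64> for Cons {
    fn extend<I: IntoIterator<Item = i64>>(&mut self, iter: I) {
        // Find the end once, then keep the cursor there while appending.
        let mut this = self.last();
        for i in iter {
            *this = Cons::Pair(i, Box::new(Cons::Empty));
            this = match this {
                Cons::Empty => unreachable!(),
                Cons::Pair(_, next) => next,
            };
        }
    }
}

impl FromIterator<i64> for Cons {
    fn from_iter<I: IntoIterator<Item = i64>>(iter: I) -> Self {
        let mut this = Cons::Empty;
        this.extend(iter);
        this
    }
}
```

With these in place, (0..3).collect::<Cons>() yields 0, 1, 2, and a following extend(3..5) walks to the end once and then appends 3 and 4 in constant time each.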
See also:
Adding an append method to a singly linked list
How to implement an addition method of linked list?
How do I return a mutable reference to the last element in a singly linked list to append an element?
Learn Rust With Entirely Too Many Linked Lists
I am solving a LeetCode linked list problem in Rust.
The part I am stuck on is that I have a working algorithm, but I can't return the result from the function. Below is my solution:
pub fn remove_nth_from_end(head: Option<Box<ListNode>>, n: i32) -> Option<Box<ListNode>> {
let mut cursor = head.clone().unwrap();
let mut count: i32 = 0;
while cursor.next != None {
count += 1;
cursor = cursor.next.unwrap();
}
let mut n = count - n;
let mut new_cursor = head.unwrap();
while n != 0 {
n -= 1;
new_cursor = new_cursor.next.unwrap();
}
new_cursor.next = new_cursor.next.unwrap().next;
head // <- error: use of moved value
}
I first clone the head so that I can iterate through the linked list to get its total number of nodes.
Then, I will have to remove one node from the list, hence I'm not cloning the head, instead I use it directly, in this case the variable is moved. So after I am done removing the node, I would like to return the head, so that I can return the whole linked list.
However, because of Rust's ownership system, I can't return a moved value. I can't clone the value either, because the clone would no longer point to the linked list from which I removed the node.
How would one solve this kind of issue in Rust? I am fairly new to Rust, just picked up the language recently.
One way is to use &mut over the nodes and then use Option::take to take ownership of the nodes while leaving None behind. Use those combinations to mutate the list:
impl Solution {
pub fn remove_nth_from_end(mut head: Option<Box<ListNode>>, mut n: i32) -> Option<Box<ListNode>> {
match n {
0 => head.and_then(|node| node.next),
_ => {
let mut new_head = &mut head;
while n > 0 {
new_head = if let Some(next) = new_head {
&mut next.next
} else {
return head;
};
n -= 1;
}
let to_skip = new_head.as_mut().unwrap().next.take();
new_head.as_mut().map(|node| {
node.next = if let Some(mut other_node) = to_skip {
other_node.next.take()
} else {
None
};
});
head
}
}
}
}
Disclaimer: This does not count from the end of the list but from the beginning. I didn't notice that part, but that should be the problem itself.
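For completeness, here is a hedged sketch of the same &mut cursor plus Option::take technique that does count from the end. Several things are assumptions: the ListNode definition mirrors the usual LeetCode one, n is taken as usize rather than i32 for simplicity, an out-of-range n leaves the list unchanged, and from_slice and to_vec are hypothetical helpers added only to make the snippet easy to try.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct ListNode {
    pub val: i32,
    pub next: Option<Box<ListNode>>,
}

pub fn remove_nth_from_end(mut head: Option<Box<ListNode>>, n: usize) -> Option<Box<ListNode>> {
    // First pass: count the nodes through shared references; nothing is cloned or moved.
    let mut len = 0;
    let mut node = head.as_ref();
    while let Some(inner) = node {
        len += 1;
        node = inner.next.as_ref();
    }
    // Assumption: an out-of-range `n` leaves the list untouched.
    if n == 0 || n > len {
        return head;
    }
    // Second pass: advance a mutable cursor to the link that owns the node to remove.
    let mut cursor = &mut head;
    for _ in 0..len - n {
        cursor = &mut cursor.as_mut().unwrap().next;
    }
    // Take the node out of its link, then put its successor in its place.
    let removed = cursor.take();
    *cursor = removed.and_then(|node| node.next);
    head
}

// Hypothetical helper: builds a list from a slice.
fn from_slice(values: &[i32]) -> Option<Box<ListNode>> {
    let mut head = None;
    for &val in values.iter().rev() {
        head = Some(Box::new(ListNode { val, next: head }));
    }
    head
}

// Hypothetical helper: collects the values for inspection.
fn to_vec(mut node: &Option<Box<ListNode>>) -> Vec<i32> {
    let mut out = Vec::new();
    while let Some(inner) = node {
        out.push(inner.val);
        node = &inner.next;
    }
    out
}
```

Because the cursor is a reference to a link (an Option<Box<ListNode>>) rather than to a node, removing the first node needs no special case, and head stays owned by the function so it can be returned at the end.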
In Rust, the following function is legal:
fn unwrap<T>(s:Option<T>) -> T {
s.unwrap()
}
It takes ownership of s, panics if it is a None, and returns ownership of the contents of s (which is legal since an Option owns its contents).
I was trying to write a similar function with signature
fn unwrap_set<T>(s: BTreeSet<T>) -> T {
...
}
The idea is that it panics unless s has size 1, in which case it returns the single element. It seems like this should be possible for the same reason unwrap is possible, however none of the methods on BTreeSet had the right signature (they would need to have return type T). The closest was take, and I tried to do
let mut s2 = s;
let v: &T = s2.iter().next().unwrap();
s2.take(v).unwrap()
but this failed.
Is writing a function like unwrap_set possible?
The easiest way to do this would be to use BTreeSet<T>'s implementation of IntoIterator, which would allow you to easily pull owned values out of the set one at a time:
use std::collections::BTreeSet;
fn unwrap_set<T>(s: BTreeSet<T>) -> T {
let mut it = s.into_iter();
if let Some(first) = it.next() {
// Only succeed when there is no second element
if it.next().is_none() {
return first;
}
}
panic!("set must have a single value");
}
If you wanted to indirectly rely on IntoIterator you could also use a normal loop, but I don't think it's as readable that way so I probably wouldn't do this:
fn unwrap_set<T>(s: BTreeSet<T>) -> T {
let mut result = None;
for item in s {
// If there is a second value, bail out
if let Some(_) = result {
result = None;
break;
}
result = Some(item);
}
return result.expect("set must have a single value");
}
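If panicking is not desirable, the same IntoIterator idea can return an Option instead. The following is a small sketch; the name try_unwrap_set is made up for illustration.

```rust
use std::collections::BTreeSet;

/// Returns the single element of the set, or `None` if the set
/// is empty or holds more than one element.
fn try_unwrap_set<T>(s: BTreeSet<T>) -> Option<T> {
    let mut it = s.into_iter();
    // Pull at most two owned values out of the set and inspect both.
    match (it.next(), it.next()) {
        (Some(only), None) => Some(only),
        _ => None,
    }
}
```

A caller that wants the original behaviour can still write try_unwrap_set(s).expect("set must have a single value").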
parts.count() leads to ownership transfer, so parts can't be used any more.
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
let mut parts = slice.split(|b| splitter.contains(b));
let len = parts.count(); //ownership transfer
if len >= 2 {
Some(parts.nth(1).unwrap().to_vec())
} else if len >= 1 {
Some(parts.nth(0).unwrap().to_vec())
} else {
None
}
}
fn main() {
split(&[1u8, 2u8, 3u8], &[2u8]);
}
It is also possible to avoid unnecessary allocations of Vec if you only need to use the first or the second part:
fn split<'a>(slice: &'a [u8], splitter: &[u8]) -> Option<&'a [u8]> {
let mut parts = slice.split(|b| splitter.contains(b)).fuse();
let first = parts.next();
let second = parts.next();
second.or(first)
}
Then if you actually need a Vec you can map on the result:
split(&[1u8, 2u8, 3u8], &[2u8]).map(|s| s.to_vec())
Of course, if you want, you can move to_vec() conversion to the function:
second.or(first).map(|s| s.to_vec())
I'm calling fuse() on the iterator in order to guarantee that it will always return None after the first None is returned (which is not guaranteed by the general iterator protocol).
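To illustrate what fuse() protects against, here is a small sketch with a deliberately misbehaving iterator. Flaky is a made-up type for this example only; it yields a value again after having returned None, which the Iterator trait permits.

```rust
/// Made-up iterator that alternates between `Some` and `None`.
struct Flaky {
    calls: u32,
}

impl Iterator for Flaky {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        self.calls += 1;
        // Odd calls yield the call number, even calls yield `None`.
        if self.calls % 2 == 0 {
            None
        } else {
            Some(self.calls)
        }
    }
}

/// Calls `next()` three times and records what came back.
fn first_three<I: Iterator<Item = u32>>(mut iter: I) -> [Option<u32>; 3] {
    [iter.next(), iter.next(), iter.next()]
}
```

Called on the raw iterator, first_three returns Some(1), None, Some(3); called on Flaky { calls: 0 }.fuse(), it returns Some(1), None, None, because the fused adapter keeps returning None once the first None has been seen.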
The other answers are good suggestions to answer your problem, but I'd like to point out another general solution: create multiple iterators:
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
let mut parts = slice.split(|b| splitter.contains(b));
let parts2 = slice.split(|b| splitter.contains(b));
let len = parts2.count();
if len >= 2 {
Some(parts.nth(1).unwrap().to_vec())
} else if len >= 1 {
Some(parts.nth(0).unwrap().to_vec())
} else {
None
}
}
fn main() {
split(&[1u8, 2u8, 3u8], &[2u8]);
}
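You can usually create multiple read-only iterators. Some iterators even implement Clone, so you could just say iter.clone().count(). When this answer was written, Split wasn't one of them because it owns the passed-in closure; on newer Rust versions, closures whose captures are all Clone are themselves Clone, so parts.clone().count() should work here as well.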
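A minimal sketch of the iter.clone().count() idea applied to this example. It assumes a Rust version in which closures implement Clone when everything they capture does, which in turn makes the slice Split iterator cloneable; on older compilers this will not build.

```rust
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
    let mut parts = slice.split(|b| splitter.contains(b));
    // Count on a clone so that `parts` itself is left untouched.
    let len = parts.clone().count();
    if len >= 2 {
        parts.nth(1).map(|part| part.to_vec())
    } else {
        parts.next().map(|part| part.to_vec())
    }
}
```

The clone only copies the iterator's state (a slice reference and the closure), not the underlying bytes, so the extra cost is the second pass made by count().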
One thing you can do is collect the results of the split in a new owned Vec, like this:
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
let parts: Vec<&[u8]> = slice.split(|b| splitter.contains(b)).collect();
let len = parts.len();
if len >= 2 {
Some(parts.iter().nth(1).unwrap().to_vec())
} else if len >= 1 {
Some(parts.iter().nth(0).unwrap().to_vec())
} else {
None
}
}
Editor's note: This code example is from a version of Rust prior to 1.0 when many iterators implemented Copy. Updated versions of this code produce different errors, but the answers still contain valuable information.
I'm trying to write a function to split a string into clumps of letters and numbers; for example, "test123test" would turn into [ "test", "123", "test" ]. Here's my attempt so far:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
match iter.peek() {
None => return bits,
Some(c) => if c.is_digit() {
bits.push(iter.take_while(|c| c.is_digit()).collect());
} else {
bits.push(iter.take_while(|c| !c.is_digit()).collect());
}
}
}
return bits;
}
However, this doesn't work, looping forever. It seems that it is using a clone of iter each time I call take_while, starting from the same position over and over again. I would like it to use the same iter each time, advancing the same iterator across all the take_while calls. Is this possible?
As you identified, each take_while call is duplicating iter, since take_while takes self and the Peekable chars iterator is Copy. (Only true before Rust 1.0 — editor)
You want to be modifying the iterator each time, that is, for take_while to be operating on an &mut to your iterator. Which is exactly what the .by_ref adaptor is for:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
match iter.peek().map(|c| *c) {
None => return bits,
Some(c) => if c.is_digit(10) {
bits.push(iter.by_ref().take_while(|c| c.is_digit(10)).collect());
} else {
bits.push(iter.by_ref().take_while(|c| !c.is_digit(10)).collect());
},
}
}
}
fn main() {
println!("{:?}", split("123abc456def"))
}
Prints
["123", "bc", "56", "ef"]
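However, I imagine this is not correct: take_while consumes the first element that fails its predicate, so the first character of every clump after the first one is dropped.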
I would actually recommend writing this as a normal for loop, using the char_indices iterator:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
if input.is_empty() {
return bits;
}
let mut is_digit = input.chars().next().unwrap().is_digit(10);
let mut start = 0;
for (i, c) in input.char_indices() {
let this_is_digit = c.is_digit(10);
if is_digit != this_is_digit {
bits.push(input[start..i].to_string());
is_digit = this_is_digit;
start = i;
}
}
bits.push(input[start..].to_string());
bits
}
This form also allows for doing this with much fewer allocations (that is, the Strings are not required), because each returned value is just a slice into the input, and we can use lifetimes to state this:
pub fn split<'a>(input: &'a str) -> Vec<&'a str> {
let mut bits = vec![];
if input.is_empty() {
return bits;
}
let mut is_digit = input.chars().next().unwrap().is_digit(10);
let mut start = 0;
for (i, c) in input.char_indices() {
let this_is_digit = c.is_digit(10);
if is_digit != this_is_digit {
bits.push(&input[start..i]);
is_digit = this_is_digit;
start = i;
}
}
bits.push(&input[start..]);
bits
}
All that changed was the type signature, removing the Vec<String> type hint and the .to_string calls.
One could even write an iterator like this, to avoid having to allocate the Vec. Something like fn split<'a>(input: &'a str) -> Splits<'a> { /* construct a Splits */ } where Splits is a struct that implements Iterator<Item = &'a str>.
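A possible shape for such an iterator is sketched below. The field name rest and the way each clump boundary is found are assumptions made for illustration; the answer only names the Splits type and the split signature.

```rust
/// Lazily yields clumps of digits and non-digits as slices of the input.
pub struct Splits<'a> {
    // The part of the input that has not been yielded yet (assumed field name).
    rest: &'a str,
}

impl<'a> Iterator for Splits<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        // An empty remainder means the iterator is finished.
        let first = self.rest.chars().next()?;
        let is_digit = first.is_digit(10);
        // The clump ends at the first character whose kind differs.
        let end = self
            .rest
            .char_indices()
            .find(|&(_, c)| c.is_digit(10) != is_digit)
            .map(|(i, _)| i)
            .unwrap_or(self.rest.len());
        let (clump, rest) = self.rest.split_at(end);
        self.rest = rest;
        Some(clump)
    }
}

pub fn split<'a>(input: &'a str) -> Splits<'a> {
    Splits { rest: input }
}
```

A caller who does want a Vec can still write split("test123test").collect::<Vec<_>>(), while one who only needs the first clump pays for nothing beyond it.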
take_while takes self by value: it consumes the iterator. Before Rust 1.0 it also was unfortunately able to be implicitly copied, leading to the surprising behaviour that you are observing.
You cannot use take_while for what you are wanting for these reasons. You will need to manually unroll your take_while invocations.
Here is one of many possible ways of dealing with this:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
let seeking_digits = match iter.peek() {
None => return bits,
Some(c) => c.is_digit(10),
};
if seeking_digits {
bits.push(take_while(&mut iter, |c| c.is_digit(10)));
} else {
bits.push(take_while(&mut iter, |c| !c.is_digit(10)));
}
}
}
fn take_while<I, F>(iter: &mut std::iter::Peekable<I>, predicate: F) -> String
where
I: Iterator<Item = char>,
F: Fn(&char) -> bool,
{
let mut out = String::new();
loop {
match iter.peek() {
Some(c) if predicate(c) => out.push(*c),
_ => return out,
}
let _ = iter.next();
}
}
fn main() {
println!("{:?}", split("test123test"));
}
This yields a solution with two levels of looping; another valid approach would be to model it as a state machine one level deep only. Ask if you aren’t sure what I mean and I’ll demonstrate.
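On newer Rust versions, the hand-written take_while helper can also be expressed with Peekable::next_if, which only advances the iterator when the next item satisfies the predicate. This is a sketch of that variant, not the state-machine approach mentioned above.

```rust
pub fn split(input: &str) -> Vec<String> {
    let mut bits = Vec::new();
    let mut iter = input.chars().peekable();
    // Peek to learn what kind of clump starts here without consuming anything.
    while let Some(&first) = iter.peek() {
        let seeking_digits = first.is_digit(10);
        let mut clump = String::new();
        // `next_if` leaves the first non-matching character in the iterator.
        while let Some(c) = iter.next_if(|c| c.is_digit(10) == seeking_digits) {
            clump.push(c);
        }
        bits.push(clump);
    }
    bits
}
```

Unlike by_ref().take_while(...), no character is lost at clump boundaries, because the character that ends one clump is never consumed and becomes the start of the next.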