Conditionally sort a Vec in Rust

Let's say I want to sort a Vec of non-Clone items - but only maybe (this is a boiled down example of an issue in my code).
My attempt would be something like:
fn maybe_sort<T>(x: Vec<T>) -> Vec<T>
where
T: std::cmp::Ord,
{
// First, I need a copy of the vector - but only the vector, not the items inside
let mut copied = x.iter().collect::<Vec<_>>();
copied.sort();
// In my actual code the line below depends on the sorted vec
if rand::random() {
return copied.into_iter().map(|x| *x).collect::<Vec<_>>();
} else {
return x;
}
}
Alas, the borrow checker isn't happy: I have a shared reference to each item in the Vec, and although I never return two references to the same item, Rust can't tell.
Is there a way to do this without unsafe? (And if not, what's the cleanest way to do it with unsafe?)

You can .enumerate() the values to keep track of each item's original index. Sort the pairs by the value T, then decide whether to return them in that sorted order or to undo the sort by re-sorting on the original index.
fn maybe_sort<T: Ord>(x: Vec<T>) -> Vec<T> {
let mut items: Vec<_> = x.into_iter().enumerate().collect();
items.sort_by(|(_, a), (_, b)| a.cmp(b));
if rand::random() {
// return items in current order
}
else {
// undo the sort
items.sort_by_key(|(index, _)| *index);
}
items.into_iter().map(|(_, value)| value).collect()
}

If T implements Default, you can do it with a single sort and without unsafe like this:
fn maybe_sort<T: Ord + Default>(mut x: Vec<T>) -> Vec<T> {
    let mut idx = (0..x.len()).collect::<Vec<_>>();
    idx.sort_by_key(|&i| &x[i]);
    if rand::random() {
        return x;
    } else {
        // idx[k] is the original position of the k-th smallest element, so moving
        // the elements out in that order yields the sorted vector. mem::take
        // leaves T::default() behind in x, which is why the Default bound is needed.
        let mut r = Vec::with_capacity(x.len());
        for i in idx {
            r.push(std::mem::take(&mut x[i]));
        }
        return r;
    }
}
If T does not implement Default, the same thing can be done with MaybeUninit:
use std::mem::{self, MaybeUninit};
fn maybe_sort<T: Ord>(x: Vec<T>) -> Vec<T> {
    let mut idx = (0..x.len()).collect::<Vec<_>>();
    idx.sort_by_key(|&i| &x[i]);
    if rand::random() {
        return x;
    } else {
        // Move every element into a MaybeUninit slot so that it can be taken
        // back out in sorted order without needing Clone or Default.
        let mut slots: Vec<MaybeUninit<T>> = x.into_iter().map(MaybeUninit::new).collect();
        let mut r = Vec::with_capacity(slots.len());
        for i in idx {
            // SAFETY: idx is a permutation of 0..len, so every slot is moved
            // out exactly once, and MaybeUninit never drops its contents.
            let v = mem::replace(&mut slots[i], MaybeUninit::uninit());
            r.push(unsafe { v.assume_init() });
        }
        return r;
    }
}
Finally, here's a safe solution which doesn't require T to implement Default, but allocates an extra buffer (there is theoretically a way to reorder the indices in place, but I'll leave it as an exercise to the reader ☺):
fn maybe_sort<T: Ord> (mut x: Vec<T>) -> Vec<T> {
let mut idx = (0..x.len()).collect::<Vec<_>>();
idx.sort_by_key (|&i| &x[i]);
if rand::random() {
let mut rev = vec![0; x.len()];
for (i, &j) in idx.iter().enumerate() {
rev[j] = i;
}
for i in 0..x.len() {
while rev[i] != i {
let j = rev[i];
x.swap (j, i);
rev.swap (j, i);
}
}
}
x
}

Related

Peek implementation for linked list in Rust

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=693594655ea355b40e2175542c653879
I want peek() to remove the last element of the list, returning its data. What am I missing?
type Link<T> = Option<Box<Node<T>>>;
struct Node<T> {
pub data: T,
pub next: Link<T>,
}
struct List<T> {
pub head: Link<T>,
}
impl<T> List<T> {
fn peek(&mut self) -> Option<T> {
let mut node = &self.head;
while let Some(cur_node) = &mut node {
if cur_node.next.is_some() {
node = &cur_node.next;
continue;
}
}
let last = node.unwrap();
let last = last.data;
return Some(last);
}
}
#[test]
fn peek_test() {
let mut q = List::new();
q.push(1);
q.push(2);
q.push(3);
assert_eq!(q.empty(), false);
assert_eq!(q.peek().unwrap(), 1);
assert_eq!(q.peek().unwrap(), 2);
assert_eq!(q.peek().unwrap(), 3);
assert_eq!(q.empty(), true);
}
To keep the head intact, I need to access the elements by reference, but I can't quite make the pieces fit. I looked at "too-many-lists", but there the value is simply returned by reference, whereas I want to remove the tail element.
To make this work you have to switch from taking a shared reference (&) to a mutable one.
With your code this results in borrow checker errors, which is why I had to change the while let loop into one
that checks whether the next element is Some and only then borrows the node's content mutably and advances it.
Finally, I Option::take that last node and return its data. I use Option::map to avoid having to unwrap, which would panic for empty lists anyway; if you want to keep your variant, replace unwrap with the try operator ?.
So in short you can implement a pop_back like this:
pub fn pop_back(&mut self) -> Option<T> {
let mut node = &mut self.head;
while node.as_ref().map(|n| n.next.is_some()).unwrap_or_default() {
node = &mut node.as_mut().unwrap().next;
}
node.take().map(|last| last.data)
}
I suggest something like the code below, just because I spent time on it :-)
fn peek(&mut self) -> Option<T> {
match &self.head {
None => return None,
Some(v) =>
if v.next.is_none() {
let last = self.head.take();
let last = last.unwrap().data;
return Some(last);
}
}
let mut current = &mut self.head;
loop {
match current {
None => return None,
Some(node) if node.next.is_some() && match &node.next { None => false, Some(v) => v.next.is_none()} => {
let last = node.next.take();
let last = last.unwrap().data;
return Some(last);
},
Some(node) => {
current = &mut node.next;
}
}
}
}

Can I perform binary tree search with the standard library without wrapping the float type and abusing the BTreeMap?

I would like to find the first element which is greater than a limit in an ordered collection. Iterating over it is always an option, but I need something faster. Currently I have come up with the solution below, but it feels a little hacky:
use std::cmp::Ordering;
use std::collections::BTreeMap;
use std::ops::Bound::{Included, Unbounded};
#[derive(Debug)]
struct FloatWrapper(f32);
impl Eq for FloatWrapper {}
impl PartialEq for FloatWrapper {
fn eq(&self, other: &Self) -> bool {
(self.0 - other.0).abs() < 1.17549435e-36f32
}
}
impl Ord for FloatWrapper {
fn cmp(&self, other: &Self) -> Ordering {
if (self.0 - other.0).abs() < 1.17549435e-36f32 {
Ordering::Equal
} else if self.0 - other.0 > 0.0 {
Ordering::Greater
} else if self.0 - other.0 < 0.0 {
Ordering::Less
} else {
Ordering::Equal
}
}
}
impl PartialOrd for FloatWrapper {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
Some(self.cmp(other))
}
}
The wrapper around the float is not nice, even though I am sure there will be no NaNs.
The Range is also unnecessary, since I want a single element.
Is there a better way of achieving a similar result using only Rust's standard library? I know that there are plenty of tree implementations, but they feel like overkill.
After the suggestions in the answers to use an iterator, I did a little benchmark with the following code:
use rand::{thread_rng, Rng};
use std::time::Instant;
fn main() {
let measure = vec![
10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190,
200,
];
let mut measured_binary = Vec::new();
let mut measured_iter = Vec::new();
let mut measured_vec = Vec::new();
for size in measure {
let mut ww = BTreeMap::new();
let mut what_found = Vec::new();
for _ in 0..size {
let now: f32 = thread_rng().gen_range(0.0, 1.0);
ww.insert(FloatWrapper(now), now);
}
let what_to_search: Vec<FloatWrapper> = (0..10000)
.map(|_| thread_rng().gen_range(0.0, 0.8))
.map(|x| FloatWrapper(x))
.collect();
let mut rez = 0;
for current in &what_to_search {
let now = Instant::now();
let m = find_one(&ww, current);
rez += now.elapsed().as_nanos();
what_found.push(m);
}
measured_binary.push(rez);
rez = 0;
for current in &what_to_search {
let now = Instant::now();
let m = find_two(&ww, current);
rez += now.elapsed().as_nanos();
what_found.push(m);
}
measured_iter.push(rez);
let ww_in_vec: Vec<(FloatWrapper, f32)> =
ww.iter().map(|(key, &value)| (FloatWrapper(key.0), value)).collect();
rez = 0;
for current in &what_to_search {
let now = Instant::now();
let m = find_three(&ww_in_vec, current);
rez += now.elapsed().as_nanos();
what_found.push(m);
}
measured_vec.push(rez);
println!("{:?}", what_found);
}
println!("binary :{:?}", measured_binary);
println!("iter_map :{:?}", measured_iter);
println!("iter_vec :{:?}", measured_vec);
}
fn find_one(from_what: &BTreeMap<FloatWrapper, f32>, what: &FloatWrapper) -> f32 {
let v: Vec<f32> = from_what
.range((Included(what), (Unbounded)))
.take(1)
.map(|(_, &v)| v)
.collect();
*v.get(0).expect("we are in truble")
}
fn find_two(from_what: &BTreeMap<FloatWrapper, f32>, what: &FloatWrapper) -> f32 {
from_what
.iter()
.skip_while(|(i, _)| *i < what) // Skipping all elements before it
.take(1) // Reducing the iterator to 1 element
.map(|(_, &v)| v) // Getting its value, dereferenced
.next()
.expect("we are in truble") // Our
}
fn find_three(from_what: &Vec<(FloatWrapper, f32)>, what: &FloatWrapper) -> f32 {
*from_what
.iter()
.skip_while(|(i, _)| i < what) // Skipping all elements before it
.take(1) // Reducing the iterator to 1 element
.map(|(_, v)| v) // Getting its value, dereferenced
.next()
.expect("we are in truble") // Our
}
The key takeaway for me is that it is worth using binary search beyond roughly 50 elements. In my case, with 30000 elements, that means a 200x speedup (at least based on this microbenchmark).
You said you wanted a std-only solution, but this is a common enough problem, so here's a solution using the crate ordered-float:
Cargo.toml
[dependencies]
ordered-float = "1.0"
main.rs
use ordered_float::OrderedFloat; // 1.0.2
use std::collections::BTreeMap;
fn main() {
let mut ww = BTreeMap::new();
ww.insert(OrderedFloat(1.0), "one");
ww.insert(OrderedFloat(2.0), "two");
ww.insert(OrderedFloat(3.0), "three");
ww.insert(OrderedFloat(4.0), "three");
let rez = ww.range(OrderedFloat(1.5)..).next().map(|(_, &v)| v);
println!("{:?}", rez);
}
prints
Some("two")
Now, isn't that nice and clean? If you want a less verbose syntax, I suggest wrapping the BTreeMap itself, so you can give it appropriately named methods that make sense for your application.
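For instance, a minimal sketch of such a wrapper might look like the following (the FloatMap name and the first_at_or_above method are made up here for illustration):
use ordered_float::OrderedFloat;
use std::collections::BTreeMap;

// A thin wrapper that hides the OrderedFloat key type from callers.
struct FloatMap<V>(BTreeMap<OrderedFloat<f32>, V>);

impl<V> FloatMap<V> {
    fn new() -> Self {
        FloatMap(BTreeMap::new())
    }

    fn insert(&mut self, key: f32, value: V) -> Option<V> {
        self.0.insert(OrderedFloat(key), value)
    }

    // First entry whose key is greater than or equal to `limit`.
    fn first_at_or_above(&self, limit: f32) -> Option<(f32, &V)> {
        self.0
            .range(OrderedFloat(limit)..)
            .next()
            .map(|(k, v)| (k.0, v))
    }
}

fn main() {
    let mut ww = FloatMap::new();
    ww.insert(1.0, "one");
    ww.insert(2.0, "two");
    println!("{:?}", ww.first_at_or_above(1.5).map(|(_, &v)| v)); // Some("two")
}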
NaN behavior
Be aware that OrderedFloat may not behave the way you expect in the presence of NaNs:
NaN is sorted as greater than all other values and equal to itself, in contradiction with the IEEE standard.
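A quick illustration of that documented behaviour, assuming the same ordered-float crate as above:
use ordered_float::OrderedFloat;

fn main() {
    // NaN compares greater than everything else, and equal to itself.
    assert!(OrderedFloat(f32::NAN) > OrderedFloat(f32::INFINITY));
    assert_eq!(OrderedFloat(f32::NAN), OrderedFloat(f32::NAN));
}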
Now that we've gone over and clarified the requirements a bit, there are a couple of pieces of bad news for you:
You're not getting away from the requirement to have a wrapping type. As I'm sure you've discovered, this is because no floating-point type implements Ord.
You're also not getting away from a combinator of some sort.
First, we're going to clear up your impls, as both have shortfalls described in the comments. In the future, it may make sense to use the wrapper types in eq-float, as they already implement all of this. The implementations at fault are PartialEq and Ord, and they both break down on a few points. The new implementations:
impl Ord for FloatWrapper {
fn cmp(&self, other: &Self) -> Ordering {
self.0.partial_cmp(&other.0).unwrap_or_else(|| {
if self.0.is_nan() && !other.0.is_nan() {
Ordering::Less
} else if !self.0.is_nan() && other.0.is_nan() {
Ordering::Greater
} else {
Ordering::Equal
}
})
}
}
impl PartialEq for FloatWrapper {
fn eq(&self, other: &Self) -> bool {
if self.0.is_nan() && other.0.is_nan() {
true
} else {
self.0 == other.0
}
}
}
Nothing surprising here: we just reuse the fact that f32 implements PartialOrd to implement Ord, handle the NaN cases ourselves, and surface everything on FloatWrapper itself.
Now, for the combinator. Your current combinator will force a range of elements to be stored temporarily in memory, to then discard one. We can do better by abusing the fact that iter() is a sorted iterator. So, we can skip while we search, and then take the first:
let mut first_element = ww.iter()
.skip_while(|(i, _)| *i < &FloatWrapper(1.5)) // Skipping all elements before it
.take(1) // Reducing the iterator to 1 element
.map(|(_, &v)| v) // Getting its value, dereferenced
.next(); // Our result
This yields a 10% speedup in low-element-count situations over your first implementation.

How to call count on an iterator and still use the iterator's items?

parts.count() leads to ownership transfer, so parts can't be used any more.
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
let mut parts = slice.split(|b| splitter.contains(b));
let len = parts.count(); //ownership transfer
if len >= 2 {
Some(parts.nth(1).unwrap().to_vec())
} else if len >= 1 {
Some(parts.nth(0).unwrap().to_vec())
} else {
None
}
}
fn main() {
split(&[1u8, 2u8, 3u8], &[2u8]);
}
It is also possible to avoid unnecessary allocations of Vec if you only need to use the first or the second part:
fn split<'a>(slice: &'a [u8], splitter: &[u8]) -> Option<&'a [u8]> {
let mut parts = slice.split(|b| splitter.contains(b)).fuse();
let first = parts.next();
let second = parts.next();
second.or(first)
}
Then if you actually need a Vec you can map on the result:
split(&[1u8, 2u8, 3u8], &[2u8]).map(|s| s.to_vec())
Of course, if you want, you can move to_vec() conversion to the function:
second.or(first).map(|s| s.to_vec())
I'm calling fuse() on the iterator in order to guarantee that it will always return None after the first None is returned (which is not guaranteed by the general iterator protocol).
The other answers are good solutions to your problem, but I'd like to point out another general approach: create multiple iterators:
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
let mut parts = slice.split(|b| splitter.contains(b));
let parts2 = slice.split(|b| splitter.contains(b));
let len = parts2.count();
if len >= 2 {
Some(parts.nth(1).unwrap().to_vec())
} else if len >= 1 {
Some(parts.nth(0).unwrap().to_vec())
} else {
None
}
}
fn main() {
split(&[1u8, 2u8, 3u8], &[2u8]);
}
You can usually create multiple read-only iterators. Some iterators even implement Clone, so you could just say iter.clone().count(). Unfortunately, Split isn't one of them because it owns the passed-in closure.
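For example, a plain slice iterator is Clone, so counting a clone leaves the original usable (a small illustration of the general point, not specific to Split):
fn main() {
    let data = [1u8, 2, 3, 4];
    let mut iter = data.iter();
    // Count a clone so the original iterator is not consumed.
    let len = iter.clone().count();
    assert_eq!(len, 4);
    assert_eq!(iter.next(), Some(&1u8));
}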
One thing you can do is collect the results of the split in a new owned Vec, like this:
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
let parts: Vec<&[u8]> = slice.split(|b| splitter.contains(b)).collect();
let len = parts.len();
if len >= 2 {
Some(parts.iter().nth(1).unwrap().to_vec())
} else if len >= 1 {
Some(parts.iter().nth(0).unwrap().to_vec())
} else {
None
}
}

String join on strings in Vec in reverse order without a `collect`

I'm trying to join strings in a vector into a single string, in reverse from their order in the vector. The following works:
let v = vec!["a".to_string(), "b".to_string(), "c".to_string()];
v.iter().rev().map(|s| s.clone()).collect::<Vec<String>>().connect(".")
However, this ends up creating a temporary vector that I don't actually need. Is it possible to do this without a collect? I see that connect is a StrVector method. Is there nothing for raw iterators?
I believe this is the shortest you can get:
fn main() {
let v = vec!["a".to_string(), "b".to_string(), "c".to_string()];
let mut r = v.iter()
.rev()
.fold(String::new(), |r, c| r + c.as_str() + ".");
r.pop();
println!("{}", r);
}
The addition operation on String takes its left operand by value and pushes the second operand onto it in place, which is very nice - it does not allocate a fresh string for every concatenation. You don't even need to clone() the contained strings.
I think, however, that the lack of concat()/connect() methods on iterators is a serious drawback. It bit me a lot too.
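A small check that illustrates the in-place behaviour, assuming the usual String/Vec growth rules:
fn main() {
    let s = String::with_capacity(16);
    let capacity_before = s.capacity();
    // `String + &str` pushes into the existing buffer; with enough spare
    // capacity no reallocation is needed, so the capacity stays the same.
    let s = s + "abc" + ".";
    assert_eq!(s, "abc.");
    assert_eq!(s.capacity(), capacity_before);
}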
I don't know if they've heard our Stack Overflow prayers or what, but the itertools crate happens to have just the method you need - join.
With it, your example might be laid out as follows:
use itertools::Itertools;
let v = ["a", "b", "c"];
let connected = v.iter().rev().join(".");
Here's an iterator extension trait that I whipped up, just for you!
pub trait InterleaveExt: Iterator + Sized {
fn interleave(self, value: Self::Item) -> Interleave<Self> {
Interleave {
iter: self.peekable(),
value: value,
me_next: false,
}
}
}
impl<I: Iterator> InterleaveExt for I {}
pub struct Interleave<I>
where
I: Iterator,
{
iter: std::iter::Peekable<I>,
value: I::Item,
me_next: bool,
}
impl<I> Iterator for Interleave<I>
where
I: Iterator,
I::Item: Clone,
{
type Item = I::Item;
#[inline]
fn next(&mut self) -> Option<Self::Item> {
// Don't return a value if there's no next item
if let None = self.iter.peek() {
return None;
}
let next = if self.me_next {
Some(self.value.clone())
} else {
self.iter.next()
};
self.me_next = !self.me_next;
next
}
}
It can be called like so:
fn main() {
let a = &["a", "b", "c"];
let s: String = a.iter().cloned().rev().interleave(".").collect();
println!("{}", s);
let v = vec!["a".to_string(), "b".to_string(), "c".to_string()];
let s: String = v.iter().map(|s| s.as_str()).rev().interleave(".").collect();
println!("{}", s);
}
I've since learned that this iterator adapter already exists in itertools under the name intersperse — go use that instead!
Cheating answer
You never said you needed the original vector after this, so we can reverse it in place and just use join...
let mut v = vec!["a".to_string(), "b".to_string(), "c".to_string()];
v.reverse();
println!("{}", v.join("."))

Using the same iterator multiple times in Rust

Editor's note: This code example is from a version of Rust prior to 1.0 when many iterators implemented Copy. Updated versions of this code produce different errors, but the answers still contain valuable information.
I'm trying to write a function to split a string into clumps of letters and numbers; for example, "test123test" would turn into [ "test", "123", "test" ]. Here's my attempt so far:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
match iter.peek() {
None => return bits,
Some(c) => if c.is_digit() {
bits.push(iter.take_while(|c| c.is_digit()).collect());
} else {
bits.push(iter.take_while(|c| !c.is_digit()).collect());
}
}
}
return bits;
}
However, this doesn't work, looping forever. It seems that it is using a clone of iter each time I call take_while, starting from the same position over and over again. I would like it to use the same iter each time, advancing the same iterator across all the take_while calls. Is this possible?
As you identified, each take_while call is duplicating iter, since take_while takes self and the Peekable chars iterator is Copy. (Only true before Rust 1.0 — editor)
You want to be modifying the iterator each time, that is, for take_while to be operating on an &mut to your iterator. Which is exactly what the .by_ref adaptor is for:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
match iter.peek().map(|c| *c) {
None => return bits,
Some(c) => if c.is_digit(10) {
bits.push(iter.by_ref().take_while(|c| c.is_digit(10)).collect());
} else {
bits.push(iter.by_ref().take_while(|c| !c.is_digit(10)).collect());
},
}
}
}
fn main() {
println!("{:?}", split("123abc456def"))
}
Prints
["123", "bc", "56", "ef"]
However, I imagine this is not the output you want: each take_while call consumes the first character that fails its predicate, so that character is lost.
I would actually recommend writing this as a normal for loop, using the char_indices iterator:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
if input.is_empty() {
return bits;
}
let mut is_digit = input.chars().next().unwrap().is_digit(10);
let mut start = 0;
for (i, c) in input.char_indices() {
let this_is_digit = c.is_digit(10);
if is_digit != this_is_digit {
bits.push(input[start..i].to_string());
is_digit = this_is_digit;
start = i;
}
}
bits.push(input[start..].to_string());
bits
}
This form also allows this to be done with far fewer allocations (that is, the Strings are not required), because each returned value is just a slice into the input, and we can use lifetimes to state this:
pub fn split<'a>(input: &'a str) -> Vec<&'a str> {
let mut bits = vec![];
if input.is_empty() {
return bits;
}
let mut is_digit = input.chars().next().unwrap().is_digit(10);
let mut start = 0;
for (i, c) in input.char_indices() {
let this_is_digit = c.is_digit(10);
if is_digit != this_is_digit {
bits.push(&input[start..i]);
is_digit = this_is_digit;
start = i;
}
}
bits.push(&input[start..]);
bits
}
All that changed was the type signature, removing the Vec<String> type hint and the .to_string calls.
One could even write an iterator like this, to avoid having to allocate the Vec. Something like fn split<'a>(input: &'a str) -> Splits<'a> { /* construct a Splits */ } where Splits is a struct that implements Iterator<Item = &'a str>.
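A minimal sketch of what that could look like (the Splits type and the split_runs constructor are illustrative names, not part of the original answer):
pub struct Splits<'a> {
    rest: &'a str,
}

pub fn split_runs(input: &str) -> Splits<'_> {
    Splits { rest: input }
}

impl<'a> Iterator for Splits<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let rest = self.rest;
        // Stop once the remaining input is empty.
        let first = rest.chars().next()?;
        let first_is_digit = first.is_digit(10);
        // Find where the current run of digits / non-digits ends.
        let end = rest
            .char_indices()
            .find(|(_, c)| c.is_digit(10) != first_is_digit)
            .map(|(i, _)| i)
            .unwrap_or(rest.len());
        let (run, remaining) = rest.split_at(end);
        self.rest = remaining;
        Some(run)
    }
}

fn main() {
    let bits: Vec<&str> = split_runs("test123test").collect();
    assert_eq!(bits, ["test", "123", "test"]);
}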
take_while takes self by value: it consumes the iterator. Before Rust 1.0 it also was unfortunately able to be implicitly copied, leading to the surprising behaviour that you are observing.
You cannot use take_while for what you want here, for these reasons. You will need to manually unroll your take_while invocations.
Here is one of many possible ways of dealing with this:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
let seeking_digits = match iter.peek() {
None => return bits,
Some(c) => c.is_digit(10),
};
if seeking_digits {
bits.push(take_while(&mut iter, |c| c.is_digit(10)));
} else {
bits.push(take_while(&mut iter, |c| !c.is_digit(10)));
}
}
}
fn take_while<I, F>(iter: &mut std::iter::Peekable<I>, predicate: F) -> String
where
I: Iterator<Item = char>,
F: Fn(&char) -> bool,
{
let mut out = String::new();
loop {
match iter.peek() {
Some(c) if predicate(c) => out.push(*c),
_ => return out,
}
let _ = iter.next();
}
}
fn main() {
println!("{:?}", split("test123test"));
}
This yields a solution with two levels of looping; another valid approach would be to model it as a state machine one level deep only. Ask if you aren’t sure what I mean and I’ll demonstrate.
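For reference, here is a sketch of how that single-loop, state-machine-style version might look (my reading of that suggestion, not the original author's code):
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    let mut current = String::new();
    // The single piece of state: are we currently collecting digits?
    let mut in_digits = false;
    for c in input.chars() {
        let is_digit = c.is_digit(10);
        if !current.is_empty() && is_digit != in_digits {
            // The kind of character changed, so the current clump is finished.
            bits.push(std::mem::take(&mut current));
        }
        in_digits = is_digit;
        current.push(c);
    }
    if !current.is_empty() {
        bits.push(current);
    }
    bits
}

fn main() {
    assert_eq!(split("test123test"), ["test", "123", "test"]);
}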

Resources