Join iterator of &str [duplicate] - rust

This question already has answers here:
What's an idiomatic way to print an iterator separated by spaces in Rust?
(4 answers)
Closed 3 years ago.
How do I convert an Iterator<&str> to a String, interspersed with a constant string such as "\n"?
For instance, given:
let xs = vec!["first", "second", "third"];
let it = xs.iter();
One may produce a string s by collecting into a Vec<&str> and joining the result:
let s = it
.map(|&x| x)
.collect::<Vec<&str>>()
.join("\n");
However, this unnecessarily allocates memory for a Vec<&str>.
Is there a more direct method?

You could use the itertools crate for that. The example below uses the intersperse helper, which is pretty much the join equivalent for iterators.
cloned() is needed to convert the &&str items to &str items; it does not perform any allocations. On Rust 1.36 and later it can be replaced by copied().
use itertools::Itertools; // 0.8.0
fn main() {
let words = ["alpha", "beta", "gamma"];
let merged: String = words.iter().cloned().intersperse(", ").collect();
assert_eq!(merged, "alpha, beta, gamma");
}

You can do this easily with the iterator's fold function:
let s = it.fold(String::new(), |a, b| a + b + "\n");
The full code looks like this:
fn main() {
let xs = vec!["first", "second", "third"];
let it = xs.into_iter();
// let s = it.collect::<Vec<&str>>().join("\n");
let s = it.fold(String::new(), |a, b| a + b + "\n");
let s = s.trim_end();
println!("{:?}", s);
}
EDIT: After Sebastian Redl's comment, I checked the performance cost of the fold approach and created a benchmark on the playground.
It shows that the fold approach takes significantly more time than collect-and-join when run over many iterations.
I did not check the allocated memory usage, though.
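For comparison, here is a minimal std-only sketch that writes the separator between items instead of appending it after every item, so no trim_end() is needed and only the output String is allocated. The function name join_strs is hypothetical, not something from the answers above.

```rust
/// Joins string slices with `sep`, allocating only the output `String`.
/// `join_strs` is a hypothetical helper name used for illustration.
fn join_strs<'a, I>(items: I, sep: &str) -> String
where
    I: IntoIterator<Item = &'a str>,
{
    let mut out = String::new();
    for (i, item) in items.into_iter().enumerate() {
        // Write the separator before every item except the first.
        if i > 0 {
            out.push_str(sep);
        }
        out.push_str(item);
    }
    out
}
```

With the xs from the question, join_strs(xs.iter().copied(), "\n") should give the same result as the collect-and-join version.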

There is a relevant example in the Rust documentation:
let words = ["alpha", "beta", "gamma"];
// chars() returns an iterator
let merged: String = words.iter()
.flat_map(|s| s.chars())
.collect();
assert_eq!(merged, "alphabetagamma");
You can also use the Extend trait:
fn f<'a, I: Iterator<Item=&'a str>>(data: I) -> String {
let mut ret = String::new();
ret.extend(data);
ret
}
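Note that both snippets above concatenate the items without a separator. One possible way to keep the same collect-based style and still get a separator is to emit the separator in front of every item and then skip the very first one. This is only a sketch; join_with_flat_map is a hypothetical name.

```rust
/// Interleaves `sep` between items using only std iterator adapters.
/// `join_with_flat_map` is a hypothetical name for this sketch.
fn join_with_flat_map<'a, I>(items: I, sep: &'a str) -> String
where
    I: Iterator<Item = &'a str>,
{
    items
        // Turn every item into the pair [separator, item] ...
        .flat_map(|s| [sep, s])
        // ... and drop the separator that precedes the first item.
        .skip(1)
        .collect()
}
```

The array returned from the closure lives on the stack, so the only heap allocation is the resulting String.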

Related

Rust string comparison same speed as Python. Want to parallelize the program

I am new to Rust. I want to write a function that can later be imported into Python as a module using the pyo3 crate.
Below is the Python implementation of the function I want to implement in Rust:
def pcompare(a, b):
letters = []
for i, letter in enumerate(a):
if letter != b[i]:
letters.append(f'{letter}{i + 1}{b[i]}')
return letters
The first Rust implementation I wrote looks like this:
use pyo3::prelude::*;
#[pyfunction]
fn compare_strings_to_vec(a: &str, b: &str) -> PyResult<Vec<String>> {
if a.len() != b.len() {
panic!(
"Reads are not the same length!
First string is length {} and second string is length {}.",
a.len(), b.len());
}
let a_vec: Vec<char> = a.chars().collect();
let b_vec: Vec<char> = b.chars().collect();
let mut mismatched_chars = Vec::new();
for (mut index,(i,j)) in a_vec.iter().zip(b_vec.iter()).enumerate() {
if i != j {
index += 1;
let mutation = format!("{i}{index}{j}");
mismatched_chars.push(mutation);
}
}
Ok(mismatched_chars)
}
#[pymodule]
fn compare_strings(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
m.add_function(wrap_pyfunction!(compare_strings_to_vec, m)?)?;
Ok(())
}
I built it in --release mode. The module could be imported into Python, but its performance was quite similar to that of the Python implementation.
My first question is: Why are the Python and Rust functions similar in speed?
Now I am working on a parallelization implementation in Rust. When just printing the result variable, the function works:
use rayon::prelude::*;
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mut mismatched_chars: Vec<String> = Vec::new();
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y).to_string();
println!("{mutation}");
//mismatched_chars.push(mutation);
}
});
}
However, when I try to push the mutation variable to the mismatched_chars vector:
use rayon::prelude::*;
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mut mismatched_chars: Vec<String> = Vec::new();
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y).to_string();
//println!("{mutation}");
mismatched_chars.push(mutation);
}
});
}
I get the following error:
error[E0596]: cannot borrow `mismatched_chars` as mutable, as it is a captured variable in a `Fn` closure
--> src/main.rs:16:13
|
16 | mismatched_chars.push(mutation);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
For more information about this error, try `rustc --explain E0596`.
error: could not compile `testing_compare_strings` due to previous error
I tried A LOT of different things. When I do:
use rayon::prelude::*;
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mut mismatched_chars: Vec<&str> = Vec::new();
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y).to_string();
mismatched_chars.push(&mutation);
}
});
}
The error becomes:
error[E0596]: cannot borrow `mismatched_chars` as mutable, as it is a captured variable in a `Fn` closure
--> src/main.rs:16:13
|
16 | mismatched_chars.push(&mutation);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
error[E0597]: `mutation` does not live long enough
--> src/main.rs:16:35
|
10 | let mut mismatched_chars: Vec<&str> = Vec::new();
| -------------------- lifetime `'1` appears in the type of `mismatched_chars`
...
16 | mismatched_chars.push(&mutation);
| ----------------------^^^^^^^^^-
| | |
| | borrowed value does not live long enough
| argument requires that `mutation` is borrowed for `'1`
17 | }
| - `mutation` dropped here while still borrowed
I suspect that the solution is quite simple, but I cannot see it myself.
You have the right idea, but you will want to use an iterator chain: filter removes items and map converts them into different values. Rayon also provides a collect method, similar to the one on regular iterators, that gathers the items into any type implementing FromParallelIterator (such as Vec<T>).
use rayon::prelude::*;
fn compare_strings_to_vec(a: &str, b: &str) -> Vec<String> {
// Same as with the if statement, but just a little shorter to write
// Plus, it will print out the two values it is comparing if it errors.
assert_eq!(a.len(), b.len(), "Reads are not the same length!");
// Zip the character iterators from a and b together
a.chars().zip(b.chars())
// Iterate with the index of each item
.enumerate()
// Rayon function which turns a regular iterator into a parallel one
// (note: par_bridge does not guarantee the original item order)
.par_bridge()
// Filter out values where the characters are the same
.filter(|(_, (a, b))| a != b)
// Convert the remaining values into an error string
.map(|(index, (a, b))| {
format!("{}{}{}", a, index + 1, b)
})
// Turn the items of this iterator into a Vec (or any other FromParallelIterator type).
.collect()
}
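The same filter/map/collect shape also works on a plain sequential iterator, which may be a useful baseline before reaching for rayon. The following is a sketch without par_bridge; the name compare_strings_sequential is hypothetical, and the length check counts chars rather than bytes, which is an assumption of this sketch.

```rust
/// Sequential version of the filter/map/collect chain.
/// `compare_strings_sequential` is a hypothetical name for this sketch.
fn compare_strings_sequential(a: &str, b: &str) -> Vec<String> {
    assert_eq!(
        a.chars().count(),
        b.chars().count(),
        "Reads are not the same length!"
    );
    a.chars()
        .zip(b.chars())
        .enumerate()
        // Keep only the positions where the characters differ.
        .filter(|(_, (x, y))| x != y)
        // Format each mismatch as <old><1-based position><new>.
        .map(|(index, (x, y))| format!("{}{}{}", x, index + 1, y))
        .collect()
}
```

Because nothing is shared between threads here, the results come back in input order and no locking is involved.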
Optimizing for speed
On the other hand, if you want speed we need to approach this problem from a different direction. You may have noticed, but the rayon version is quite slow since the cost of spawning a thread and using concurrency structures is orders of magnitude more than just simply comparing the bytes in the original thread. In my benchmarks, I found that even with better workload distribution, additional threads were only helpful on my machine (64GB RAM, 16 cores) when the strings were at least 1-2 million bytes long. Given that you have stated they are typically ~30,000 bytes long I think using rayon (or really any other threading for comparisons of this size) will only slow down your code.
Using criterion for benchmarking, I eventually came to this implementation. It generally takes about 2.8156 µs per run on strings of 30,000 characters with 10 different bytes. For comparison, the code posted in the original question usually takes around 61.156 µs on my system under the same conditions, so this should give a ~20x speedup. It can vary a bit, but it consistently got the best results in the benchmark. I'm guessing this should be fast enough that this step is no longer the bottleneck in your code.
The key focus of this implementation is to do the comparisons in batches. We can take advantage of the 128-bit registers on most CPUs to compare the input in 16-byte batches. When a mismatch is found, the 16-byte section it covers is re-scanned for the exact position of the discrepancy. This gives a decent boost to performance. I initially thought that a usize would work better, but that was not the case. I also attempted to use the portable_simd nightly feature to write a SIMD version of this code, but I was unable to match the speed of this code. I suspect this was due either to missed optimizations or to my lack of experience with using SIMD effectively.
I was worried about drops in speed because the alignment of chunks is not enforced for u128 values, but it seems to mostly be a non-issue. First of all, it is generally quite difficult to find allocators that are willing to allocate at an address that is not a multiple of the system word size. Of course, this is due to practicality rather than any actual requirement. When I manually gave it unaligned slices (unaligned for u128s), performance was not significantly affected. This is why I do not attempt to enforce that the start index of the slice be aligned to align_of::<u128>().
use std::convert::TryInto; // already in the prelude since the 2021 edition
use std::mem::size_of;
fn compare_strings_to_vec(a: &str, b: &str) -> Vec<String> {
let a_bytes = a.as_bytes();
let b_bytes = b.as_bytes();
let remainder = a_bytes.len() % size_of::<u128>();
// Strongly suggest to the compiler we are iterating through u128
a_bytes
.chunks_exact(size_of::<u128>())
.zip(b_bytes.chunks_exact(size_of::<u128>()))
.enumerate()
.filter(|(_, (a, b))| {
let a_block: &[u8; 16] = (*a).try_into().unwrap();
let b_block: &[u8; 16] = (*b).try_into().unwrap();
u128::from_ne_bytes(*a_block) != u128::from_ne_bytes(*b_block)
})
.flat_map(|(word_index, (a, b))| {
fast_path(a, b).map(move |x| word_index * size_of::<u128>() + x)
})
.chain(
fast_path(
&a_bytes[a_bytes.len() - remainder..],
&b_bytes[b_bytes.len() - remainder..],
)
.map(|x| a_bytes.len() - remainder + x),
)
.map(|index| {
format!(
"{}{}{}",
char::from(a_bytes[index]),
index + 1,
char::from(b_bytes[index])
)
})
.collect()
}
/// Very similar to the regular route, but with nothing fancy: just get the indices where the bytes differ
#[inline(always)]
fn fast_path<'a>(a: &'a [u8], b: &'a [u8]) -> impl 'a + Iterator<Item = usize> {
a.iter()
.zip(b.iter())
.enumerate()
.filter_map(|(x, (a, b))| (a != b).then_some(x))
}
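To make the batching idea easier to follow, here is a simplified loop-based sketch of the same technique: compare 16 bytes at a time as a u128, and rescan a block byte by byte only when the block differs. It returns 0-based byte indices rather than formatted strings. The name mismatch_indices is hypothetical, and this sketch has not been benchmarked against the implementation above.

```rust
use std::convert::TryInto;
use std::mem::size_of;

/// Returns the 0-based byte positions at which `a` and `b` differ.
/// `mismatch_indices` is a hypothetical name for this sketch.
fn mismatch_indices(a: &[u8], b: &[u8]) -> Vec<usize> {
    assert_eq!(a.len(), b.len(), "inputs must have the same length");
    const WIDTH: usize = size_of::<u128>();
    let mut out = Vec::new();
    let mut offset = 0;
    for (block_a, block_b) in a.chunks(WIDTH).zip(b.chunks(WIDTH)) {
        let differs = if block_a.len() == WIDTH {
            // Full block: one 128-bit comparison covers 16 bytes.
            let x = u128::from_ne_bytes(block_a.try_into().unwrap());
            let y = u128::from_ne_bytes(block_b.try_into().unwrap());
            x != y
        } else {
            // Short tail block: always fall back to the byte scan.
            true
        };
        if differs {
            // Rescan only this block for the exact positions.
            for (i, (p, q)) in block_a.iter().zip(block_b).enumerate() {
                if p != q {
                    out.push(offset + i);
                }
            }
        }
        offset += block_a.len();
    }
    out
}
```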
You cannot mutate the variable mismatched_chars directly from multiple threads, because the closure passed to for_each may run on several threads at once.
You can wrap it in Arc<RwLock<...>> and take the write lock before pushing:
use rayon::prelude::*;
use std::sync::{Arc, RwLock};
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mismatched_chars: Arc<RwLock<Vec<String>>> = Arc::new(RwLock::new(Vec::new()));
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y);
mismatched_chars
.write()
.expect("could not acquire write lock")
.push(mutation);
}
});
for mismatch in mismatched_chars
.read()
.expect("could not acquire read lock")
.iter()
{
eprintln!("{}", mismatch);
}
}
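The same locking pattern can be sketched with only the standard library, using scoped threads (std::thread::scope, available since Rust 1.63) and a Mutex. Scoped threads can borrow local variables, so no Arc is needed in this variant. The name mismatches_shared and the split into two halves are assumptions made for illustration.

```rust
use std::sync::Mutex;
use std::thread;

/// Collects mismatches from two worker threads into one shared Vec.
/// `mismatches_shared` is a hypothetical name for this sketch.
fn mismatches_shared(a: &str, b: &str) -> Vec<String> {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    assert_eq!(a.len(), b.len(), "Reads are not the same length!");

    // The Mutex gives each thread exclusive access while it pushes.
    let found: Mutex<Vec<(usize, String)>> = Mutex::new(Vec::new());
    let mid = a.len() / 2;

    thread::scope(|s| {
        for (start, end) in [(0, mid), (mid, a.len())] {
            // Shadow with references so the `move` closure copies only borrows.
            let (a, b, found) = (&a, &b, &found);
            s.spawn(move || {
                for i in start..end {
                    if a[i] != b[i] {
                        let mutation = format!("{}{}{}", a[i], i + 1, b[i]);
                        found.lock().expect("mutex poisoned").push((i, mutation));
                    }
                }
            });
        }
    });

    // Threads may finish in any order, so restore positional order.
    let mut found = found.into_inner().expect("mutex poisoned");
    found.sort_by_key(|(i, _)| *i);
    found.into_iter().map(|(_, m)| m).collect()
}
```

Every push takes the lock, so for short inputs this is likely slower than a sequential loop; it is shown only to illustrate how shared mutable state can be made to compile.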

How can I join a Vec<> of i32 numbers into a String? [duplicate]

This question already has answers here:
What's an idiomatic way to print an iterator separated by spaces in Rust?
(4 answers)
Closed 2 years ago.
I want to join a list of numbers into a String. I have the following code:
let mut ids: Vec<i32> = Vec::new();
ids.push(1);
ids.push(2);
ids.push(3);
ids.push(4);
let joined = ids.join(",");
print!("{}", joined);
However, I get the following compilation error:
error[E0599]: no method named `join` found for struct `std::vec::Vec<i32>` in the current scope
--> src\data\words.rs:95:22
|
95 | let joined = ids.join(",");
| ^^^^ method not found in `std::vec::Vec<i32>`
|
= note: the method `join` exists but the following trait bounds were not satisfied:
`<[i32] as std::slice::Join<_>>::Output = _`
I'm a bit unclear as to what to do. I understand the implementation of traits, but whatever trait it's expecting, I would expect to be natively implemented for i32. I would expect joining integers into a string to be more trivial than this. Should I cast all of them to Strings first?
EDIT: It's not the same as the linked question, because here I am specifically asking about numbers not being directly "joinable", and the reason for the trait to not be implemented by the number type. I looked fairly hard for something in this direction and found nothing, which is why I asked this question.
Also, it's more likely that someone will search specifically for a question phrased like this instead of the more general "idiomatic printing of iterator values".
I would do
let ids = vec![1, 2, 3, 4];
let joined: String = ids.iter().map( |&id| id.to_string() + ",").collect();
print!("{}", joined);
Generally, when you have a collection of one type in Rust and want to turn it into another type, you call .iter().map(...) on it. The advantage of this method is that you keep your ids as integers, have no mutable state, and don't need an extra library. It is also a good fit if you want a more complex transformation than a simple conversion. The disadvantage is that joined ends with a trailing comma.
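If the trailing comma is a problem, one std-only option is to map each number to a String, collect into a Vec<String>, and call join on that. This costs an intermediate vector but needs no external crate. The function name join_ids is hypothetical.

```rust
/// Joins integers with commas and no trailing separator.
/// `join_ids` is a hypothetical name for this sketch.
fn join_ids(ids: &[i32]) -> String {
    ids.iter()
        // Convert each number explicitly; i32 is not joinable by itself.
        .map(|id| id.to_string())
        .collect::<Vec<String>>()
        // [String]::join is available because String implements Borrow<str>.
        .join(",")
}
```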
If you don't want to convert to strings explicitly, you can use the Itertools::join method (note that itertools is an external crate).
Relevant code:
use itertools::Itertools;
let mut ids: Vec<i32> = ...;
let joined = Itertools::join(&mut ids.iter(), ",");
print!("{}", joined);
Frxstrem's suggestion:
let joined = ids.iter().join(".");
Using the [T]::join() method requires that [T] implements the Join trait. The Join trait is only implemented for [T] where T implements Borrow<str> (like String or &str) or Borrow<[U]> (like &[U] or Vec<U>). In other words, you can only join a vector of strings or a vector of slices/vectors.
In general, Rust requires you to be very explicit about type conversion, so in many cases you shouldn't expect the language to e.g. automatically convert an integer to a string for you.
To solve your problem, you need to explicitly convert your integers into strings before pushing them into your vector:
let mut ids: Vec<String> = Vec::new();
ids.push(1.to_string());
ids.push(2.to_string());
ids.push(3.to_string());
ids.push(4.to_string());
let joined = ids.join(",");
print!("{}", joined);
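To illustrate the Join bounds described above, the following sketch shows three element types for which [T]::join is available in the standard library: &str, String, and slices or arrays of some other type. The wrapper function join_examples is hypothetical and exists only to group the calls.

```rust
/// Shows which element types can be joined directly with `[T]::join`.
/// `join_examples` is a hypothetical wrapper for this sketch.
fn join_examples() -> (String, String, Vec<i32>) {
    // Elements are &str, which implements Borrow<str>.
    let from_strs = ["a", "b", "c"].join(",");

    // Elements are String, which also implements Borrow<str>.
    let owned = vec![String::from("x"), String::from("y")];
    let from_strings = owned.join("-");

    // Elements are arrays of i32 (Borrow<[i32]>); the separator is a
    // single element and the result is a flattened Vec<i32>.
    let from_slices = [[1, 2], [3, 4]].join(&0);

    (from_strs, from_strings, from_slices)
}
```

A plain Vec<i32> fits none of these cases, which is why the compiler reports that the trait bounds for join are not satisfied.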
If you want a generic solution:
fn join<I, T>(it: I, sep: &str) -> String
where
I: IntoIterator<Item = T>,
T: std::fmt::Display,
{
use std::fmt::Write;
let mut it = it.into_iter();
let first = it.next().map(|f| f.to_string()).unwrap_or_default();
it.fold(first, |mut acc, s| {
write!(acc, "{}{}", sep, s).expect("Writing in a String shouldn't fail");
acc
})
}
fn main() {
assert_eq!(join(Vec::<i32>::new(), ", "), "");
assert_eq!(join(vec![1], ", "), "1");
assert_eq!(join(vec![1, 2, 3, 4], ", "), "1, 2, 3, 4");
}
If you prefer method-call style, you can implement the same thing as an extension method:
trait JoinIterator {
fn join(self, sep: &str) -> String;
}
impl<I, T> JoinIterator for I
where
I: IntoIterator<Item = T>,
T: std::fmt::Display,
{
fn join(self, sep: &str) -> String {
use std::fmt::Write;
let mut it = self.into_iter();
let first = it.next().map(|f| f.to_string()).unwrap_or_default();
it.fold(first, |mut acc, s| {
write!(acc, "{}{}", sep, s).expect("Writing in a String shouldn't fail");
acc
})
}
}
fn main() {
assert_eq!(Vec::<i32>::new().join(", "), "");
assert_eq!(vec![1].join(", "), "1");
assert_eq!(vec![1, 2, 3, 4].join(", "), "1, 2, 3, 4");
}

Pretty printing a Vec<char> with a separator [duplicate]

This question already has answers here:
What's an idiomatic way to print an iterator separated by spaces in Rust?
(4 answers)
Closed 3 years ago.
I am trying to apply join (or something similar) to a Vec<char> in order to pretty print it.
What I came up with so far is this (and this does what I want):
let vec: Vec<char> = "abcdef".chars().collect();
let sep = "-";
let vec_str: String = vec
.iter().map(|c| c.to_string()).collect::<Vec<String>>().join(sep);
println!("{}", vec_str); // a-b-c-d-e-f
That seems overly complex (and allocates a Vec<String> that is not really needed).
I also tried to get std::slice::join to work by explicitly creating a slice:
let vec_str: String = (&vec[..]).join('-');
but here the compiler complains:
method not found in &[char]
Is there a simpler way to create a printable String from a Vec<char> with a separator between the elements?
You can use intersperse from the itertools crate.
use itertools::Itertools; // 0.8.2
fn main() {
let vec : Vec<_> = "abcdef".chars().collect();
let sep = '-';
let sep_str : String = vec.iter().intersperse(&sep).collect();
println!("{}", sep_str);
}
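If pulling in itertools is not desirable, a short loop over the characters does the same job with only the standard library and no intermediate Vec<String>. This is a sketch; join_chars is a hypothetical name.

```rust
/// Builds a String from chars with `sep` between neighbouring elements.
/// `join_chars` is a hypothetical name for this sketch.
fn join_chars(chars: &[char], sep: &str) -> String {
    let mut out = String::new();
    for (i, c) in chars.iter().enumerate() {
        // Separator goes between elements, so skip it before the first.
        if i > 0 {
            out.push_str(sep);
        }
        out.push(*c);
    }
    out
}
```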

How do I sort a vector of Strings alphabetically? [duplicate]

This question already has answers here:
How do I sort an array?
(1 answer)
Case-insensitive string matching in Rust
(2 answers)
Closed 3 years ago.
I want to sort a vector of strings alphabetically:
fn main() {
let mut vec = Vec::new();
vec.push("richard");
vec.push("charles");
vec.push("Peter");
println!("{:?}", vec);
}
I tried println!("{:?}", vec.sort()); and println!("{}", vec.sort_by(|a,b| b.cmp(a))); and in both cases the result is ().
I expect the following result:
["charles", "Peter", "richard"]
The sort function is defined on slices (and is available on Vecs, since they Deref to slices) as pub fn sort(&mut self), i.e. it sorts in place, mutating the existing data and returning (). So to achieve what you're trying to do, you can try the following:
fn main() {
let mut vec = Vec::new();
vec.push("richard");
vec.push("charles");
vec.push("Peter");
vec.sort();
println!("{:?}", vec);
}
Unfortunately, this isn't quite what you want, since it will sort "Peter" before "charles": the default string comparison is case-sensitive (in fact, it is locale-agnostic, since it compares based on Unicode code points). So, if you want case-insensitive sorting, here is the modification:
fn main() {
let mut vec = Vec::new();
vec.push("richard");
vec.push("charles");
vec.push("Peter");
vec.sort_by(|a, b| a.to_lowercase().cmp(&b.to_lowercase()));
println!("{:?}", vec);
}
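The comparator above calls to_lowercase() on both sides of every comparison, which allocates two new strings each time. A possible alternative is sort_by_cached_key, which computes the key once per element. This is a sketch; sort_case_insensitive is a hypothetical name.

```rust
/// Sorts names ignoring case, lowercasing each element only once.
/// `sort_case_insensitive` is a hypothetical name for this sketch.
fn sort_case_insensitive(names: &mut [&str]) {
    // The lowercase key is computed once per element and cached for the sort.
    names.sort_by_cached_key(|name| name.to_lowercase());
}
```

Calling it on the vector from the question should yield ["charles", "Peter", "richard"].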

Rust String concatenation [duplicate]

This question already has answers here:
How do I concatenate strings?
(9 answers)
Closed 7 years ago.
I started programming with Rust this week and I am having a lot of problems understanding how Strings work.
Right now, I am trying to write a simple program that prints a list of players with their order appended (for learning purposes only).
let res : String = pl.name.chars().enumerate().fold(String::new(),|res,(i,ch)| -> String {
res+=format!("{} {}\n",i.to_string(),ch.to_string());
});
println!("{}", res);
This is my idea, I know I could just use a for loop but the objective is to understand the different Iterator functions.
So, my problem is that the String concatenation does not work.
Compiling prueba2 v0.1.0 (file:///home/pancho111203/projects/prueba2)
src/main.rs:27:13: 27:16 error: binary assignment operation `+=` cannot be applied to types `collections::string::String` and `collections::string::String` [E0368]
src/main.rs:27 res+=format!("{} {}\n",i.to_string(),ch.to_string());
^~~
error: aborting due to previous error
Could not compile `prueba2`.
I tried using &str but it is not possible to create them from i and ch values.
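First, at the time of this answer x += y was not overloadable in Rust, so the += operator did not work for anything except basic numeric types. (Since Rust 1.8 the AddAssign trait makes += overloadable, and String now implements AddAssign<&str>, but String += String is still not defined.) However, even if it worked for two Strings, it would be equivalent to x = x + y, like in the following: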
res = res + format!("{} {}\n",i.to_string(),ch.to_string())
Even if this were allowed by the type system (it is not because String + String "overload" is not defined in Rust), this is still not how fold() operates. You want this:
res + &format!("{} {}\n", i, ch)
or, as a compilable example,
fn main(){
let x = "hello";
let res : String = x.chars().enumerate().fold(String::new(), |res, (i, ch)| {
res + &format!("{} {}\n", i, ch)
});
println!("{}", res);
}
When you perform a fold, you don't reassign the accumulator variable; you return the new value so that it is used on the next iteration, and this is exactly what res + &format!(...) does.
Note that I've removed to_string() invocations because they are completely unnecessary - in fact, x.to_string() is equivalent to format!("{}", x), so you only perform unnecessary allocations here.
Additionally, I'm taking format!() result by reference: &format!(...). This is necessary because + "overload" for strings is defined for String + &str pair of types, so you need to convert from String (the result of format!()) to &str, and this can be done simply by using & here (because of deref coercion).
In fact, the following would be more efficient:
use std::fmt::Write;
fn main(){
let x = "hello";
let res: String = x.chars().enumerate().fold(String::new(), |mut res, (i, ch)| {
write!(&mut res, "{} {}\n", i, ch).unwrap();
res
});
println!("{}", res);
}
which could be written more idiomatically as
use std::fmt::Write;
fn main(){
let x = "hello";
let mut res = String::new();
for (i, ch) in x.chars().enumerate() {
write!(&mut res, "{} {}\n", i, ch).unwrap();
}
println!("{}", res);
}
This way no extra allocations (i.e. new strings from format!()) are created. We just fill the string with the new data, very similar, for example, to how StringBuilder in Java works. use std::fmt::Write here is needed to allow calling write!() on &mut String.
I would also suggest reading the chapter on strings in the official Rust book (and the book as a whole if you're new to Rust). It explains what String and &str are, how they are different and how to work with them efficiently.
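On current Rust toolchains, String implements AddAssign<&str>, so += compiles as long as the right-hand side is borrowed. The sketch below shows that form for the loop from this answer; numbered_chars is a hypothetical name. The write!-based version is probably still preferable, because it avoids the temporary String created by format!.

```rust
/// Appends "<index> <char>\n" for every character using `+=`.
/// `numbered_chars` is a hypothetical name for this sketch.
fn numbered_chars(s: &str) -> String {
    let mut res = String::new();
    for (i, ch) in s.chars().enumerate() {
        // `String += &str` works; `String += String` does not compile.
        res += &format!("{} {}\n", i, ch);
    }
    res
}
```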
