Split string only once in Rust


I want to split a string by a separator only once and put it into a tuple. I tried doing
fn splitOnce(in_string: &str) -> (&str, &str) {
let mut splitter = in_string.split(':');
let first = splitter.next().unwrap();
let second = splitter.fold("".to_string(), |a, b| a + b);
(first, &second)
}
but I keep getting told that second does not live long enough. I guess it's saying that because splitter only exists inside the function block, but I'm not really sure how to address that. How do I coerce second into existing beyond the function block? Or is there a better way to split a string only once?

You are looking for str::splitn:
fn split_once(in_string: &str) -> (&str, &str) {
let mut splitter = in_string.splitn(2, ':');
let first = splitter.next().unwrap();
let second = splitter.next().unwrap();
(first, second)
}
fn main() {
let (a, b) = split_once("hello:world:earth");
println!("{} --- {}", a, b)
}
Note that Rust uses snake_case.
I guess it's saying that because splitter only exists inside the function block
Nope, it's because you've created a String and are trying to return a reference to it; you cannot do that. second is what doesn't live long enough.
How do I coerce second into existing beyond the function block?
You don't. This is a fundamental aspect of Rust. If something needs to live for a certain amount of time, you just have to make it exist for that long. In this case, as in the linked question, you'd return the String:
fn split_once(in_string: &str) -> (&str, String) {
let mut splitter = in_string.split(':');
let first = splitter.next().unwrap();
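// Note: fold concatenates the remaining pieces without the ':' separators,
// so "hello:world:earth" yields "worldearth".
let second = splitter.fold("".to_string(), |a, b| a + b);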
(first, second)
}

str::split_once is now built-in (stable since Rust 1.52).
Doc examples:
assert_eq!("cfg".split_once('='), None);
assert_eq!("cfg=".split_once('='), Some(("cfg", "")));
assert_eq!("cfg=foo".split_once('='), Some(("cfg", "foo")));
assert_eq!("cfg=foo=bar".split_once('='), Some(("cfg", "foo=bar")));
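Applied to the question, a sketch on top of the built-in could look like this. split_once_colon and split_once_or_whole are hypothetical names; returning Option instead of unwrapping lets the caller decide what a missing ':' means, and the second function assumes the whole input should go in the first slot in that case. Both halves borrow from in_string, so the lifetime problem from the question does not arise.

```rust
/// Split at the first ':' only; None if there is no ':' at all.
fn split_once_colon(in_string: &str) -> Option<(&str, &str)> {
    // Both slices borrow from `in_string`, so nothing local is referenced.
    in_string.split_once(':')
}

/// Hypothetical fallback: treat a missing ':' as "everything is the first part".
fn split_once_or_whole(in_string: &str) -> (&str, &str) {
    in_string.split_once(':').unwrap_or((in_string, ""))
}
```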

Related

Rust string comparison same speed as Python. Want to parallelize the program

I am new to Rust. I want to write a function which can later be imported into Python as a module using the pyo3 crate.
Below is the Python implementation of the function I want to implement in Rust:
def pcompare(a, b):
    letters = []
    for i, letter in enumerate(a):
        if letter != b[i]:
            letters.append(f'{letter}{i + 1}{b[i]}')
    return letters
The first Rust implementation I wrote looks like this:
use pyo3::prelude::*;
#[pyfunction]
fn compare_strings_to_vec(a: &str, b: &str) -> PyResult<Vec<String>> {
if a.len() != b.len() {
panic!(
"Reads are not the same length!
First string is length {} and second string is length {}.",
a.len(), b.len());
}
let a_vec: Vec<char> = a.chars().collect();
let b_vec: Vec<char> = b.chars().collect();
let mut mismatched_chars = Vec::new();
for (mut index,(i,j)) in a_vec.iter().zip(b_vec.iter()).enumerate() {
if i != j {
index += 1;
let mutation = format!("{i}{index}{j}");
mismatched_chars.push(mutation);
}
}
Ok(mismatched_chars)
}
#[pymodule]
fn compare_strings(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
m.add_function(wrap_pyfunction!(compare_strings_to_vec, m)?)?;
Ok(())
}
I built it in --release mode. The module could be imported into Python, but its performance was quite similar to that of the Python implementation.
My first question is: Why is the Python and Rust function similar in speed?
Now I am working on a parallelization implementation in Rust. When just printing the result variable, the function works:
use rayon::prelude::*;
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mut mismatched_chars: Vec<String> = Vec::new();
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y).to_string();
println!("{mutation}");
//mismatched_chars.push(mutation);
}
});
}
However, when I try to push the mutation variable to the mismatched_chars vector:
use rayon::prelude::*;
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mut mismatched_chars: Vec<String> = Vec::new();
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y).to_string();
//println!("{mutation}");
mismatched_chars.push(mutation);
}
});
}
I get the following error:
error[E0596]: cannot borrow `mismatched_chars` as mutable, as it is a captured variable in a `Fn` closure
--> src/main.rs:16:13
|
16 | mismatched_chars.push(mutation);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
For more information about this error, try `rustc --explain E0596`.
error: could not compile `testing_compare_strings` due to previous error
I tried A LOT of different things. When I do:
use rayon::prelude::*;
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mut mismatched_chars: Vec<&str> = Vec::new();
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y).to_string();
mismatched_chars.push(&mutation);
}
});
}
The error becomes:
error[E0596]: cannot borrow `mismatched_chars` as mutable, as it is a captured variable in a `Fn` closure
--> src/main.rs:16:13
|
16 | mismatched_chars.push(&mutation);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
error[E0597]: `mutation` does not live long enough
--> src/main.rs:16:35
|
10 | let mut mismatched_chars: Vec<&str> = Vec::new();
| -------------------- lifetime `'1` appears in the type of `mismatched_chars`
...
16 | mismatched_chars.push(&mutation);
| ----------------------^^^^^^^^^-
| | |
| | borrowed value does not live long enough
| argument requires that `mutation` is borrowed for `'1`
17 | }
| - `mutation` dropped here while still borrowed
I suspect that the solution is quite simple, but I cannot see it myself.

You have the right idea with what you are doing, but you will want to use an iterator chain with filter and map to remove items or convert them into different values. Rayon also provides a collect method, similar to the one on regular iterators, which converts the items into any type implementing FromParallelIterator (such as Vec<T>).
use rayon::prelude::*;
fn compare_strings_to_vec(a: &str, b: &str) -> Vec<String> {
// Same as with the if statement, but just a little shorter to write
// Plus, it will print out the two values it is comparing if it errors.
assert_eq!(a.len(), b.len(), "Reads are not the same length!");
// Zip the character iterators from a and b together
a.chars().zip(b.chars())
// Iterate with the index of each item
.enumerate()
// Rayon function which turns a regular iterator into a parallel one.
// Note: par_bridge does not preserve the original order of the items.
.par_bridge()
// Filter out values where the characters are the same
.filter(|(_, (a, b))| a != b)
// Convert the remaining values into an error string
.map(|(index, (a, b))| {
format!("{}{}{}", a, index + 1, b)
})
// Turn the items of this iterator into a Vec (or any other FromParallelIterator type).
.collect()
}
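Since the rest of this answer argues that threads don't pay off at ~30,000 bytes, a sequential version of the same chain is a useful baseline. This sketch simply drops par_bridge (and therefore rayon), which also keeps the results in index order; compare_strings_seq is a hypothetical name:

```rust
/// Sequential version of the filter/map/collect chain (no rayon).
/// Output order follows the positions in the input.
fn compare_strings_seq(a: &str, b: &str) -> Vec<String> {
    assert_eq!(a.len(), b.len(), "Reads are not the same length!");
    a.chars()
        .zip(b.chars())
        // 0-based position of each character pair
        .enumerate()
        // keep only the positions where the characters differ
        .filter(|(_, (x, y))| x != y)
        // report positions 1-based, like the Python version
        .map(|(i, (x, y))| format!("{}{}{}", x, i + 1, y))
        .collect()
}
```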
Optimizing for speed
On the other hand, if you want speed we need to approach this problem from a different direction. You may have noticed, but the rayon version is quite slow since the cost of spawning a thread and using concurrency structures is orders of magnitude more than just simply comparing the bytes in the original thread. In my benchmarks, I found that even with better workload distribution, additional threads were only helpful on my machine (64GB RAM, 16 cores) when the strings were at least 1-2 million bytes long. Given that you have stated they are typically ~30,000 bytes long I think using rayon (or really any other threading for comparisons of this size) will only slow down your code.
Using criterion for benchmarking, I eventually came to this implementation. It generally gets about 2.8156 µs per run on strings of 30,000 characters with 10 different bytes. For comparison, the code posted in the original question usually gets around 61.156 µs on my system under the same conditions, so this should give a ~20x speedup. It can vary a bit, but it consistently got the best results in the benchmark. I'm guessing this should be fast enough that this step is no longer the bottleneck in your code.
The key focus of this implementation is to do the comparisons in batches. We can take advantage of the 128-bit registers on most CPUs to compare the input in 16-byte batches. When an inequality is found, the 16-byte section it covers is re-scanned for the exact positions of the discrepancies. This gives a decent boost to performance. I initially thought that a usize would work better, but it seems that was not the case. I also attempted to use the portable_simd nightly feature to write a SIMD version of this code, but I was unable to match the speed of this code. I suspect this was due either to missed optimizations or to my lack of experience using SIMD effectively.
I was worried about drops in speed due to alignment of chunks not being enforced for u128 values, but it seems to mostly be a non-issue. First of all, it is generally quite difficult to find allocators which are willing to allocate to an address which is not a multiple of the system word size. Of course, this is due to practicality rather than any actual requirement. When I manually gave it unaligned slices (unaligned for u128s), performance was not significantly affected. This is why I do not attempt to enforce that the start index of the slice be aligned to align_of::<u128>().
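use std::mem::size_of;
fn compare_strings_to_vec(a: &str, b: &str) -> Vec<String> {
// Assumes equal-length ASCII input (such as DNA reads): indices are byte offsets
// and each byte is turned back into a char below.
assert_eq!(a.len(), b.len(), "Reads are not the same length!");
let a_bytes = a.as_bytes();
let b_bytes = b.as_bytes();
let remainder = a_bytes.len() % size_of::<u128>();
// Strongly suggest to the compiler we are iterating through u128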
a_bytes
.chunks_exact(size_of::<u128>())
.zip(b_bytes.chunks_exact(size_of::<u128>()))
.enumerate()
.filter(|(_, (a, b))| {
let a_block: &[u8; 16] = (*a).try_into().unwrap();
let b_block: &[u8; 16] = (*b).try_into().unwrap();
u128::from_ne_bytes(*a_block) != u128::from_ne_bytes(*b_block)
})
.flat_map(|(word_index, (a, b))| {
fast_path(a, b).map(move |x| word_index * size_of::<u128>() + x)
})
.chain(
fast_path(
&a_bytes[a_bytes.len() - remainder..],
&b_bytes[b_bytes.len() - remainder..],
)
.map(|x| a_bytes.len() - remainder + x),
)
.map(|index| {
format!(
"{}{}{}",
char::from(a_bytes[index]),
index + 1,
char::from(b_bytes[index])
)
})
.collect()
}
/// Very similar to the regular route, but with nothing fancy: just get the indices of the mismatching bytes
#[inline(always)]
fn fast_path<'a>(a: &'a [u8], b: &'a [u8]) -> impl 'a + Iterator<Item = usize> {
a.iter()
.zip(b.iter())
.enumerate()
.filter_map(|(x, (a, b))| (a != b).then_some(x))
}
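A possible variation on the re-scan step, not benchmarked here and not part of the measured code above: once a 16-byte block is known to differ, XOR the two u128 values and walk the set bits, since every non-zero byte of the XOR marks a mismatch. The name mismatches_in_block is hypothetical; loading with from_le_bytes (rather than from_ne_bytes) keeps byte i of the array at bits 8*i..8*i+8 on every platform.

```rust
/// Return the positions (0..16) of every byte where `a` and `b` differ,
/// using one u128 XOR instead of a byte-by-byte rescan.
fn mismatches_in_block(a: &[u8; 16], b: &[u8; 16]) -> Vec<usize> {
    // Little-endian load: array byte i occupies bits 8*i..8*i+8.
    let mut diff = u128::from_le_bytes(*a) ^ u128::from_le_bytes(*b);
    let mut positions = Vec::new();
    while diff != 0 {
        // The lowest set bit belongs to the first mismatch not yet reported.
        let byte = (diff.trailing_zeros() / 8) as usize;
        positions.push(byte);
        // Clear that whole byte so the loop moves on to the next mismatch.
        diff &= !(0xFFu128 << (byte * 8));
    }
    positions
}
```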

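You cannot mutate the variable mismatched_chars directly from multiple threads: rayon's for_each takes a Fn closure, which may only borrow captured variables immutably.
You can wrap it in Arc<RwLock<...>> so that each thread takes a write lock before pushing.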
use rayon::prelude::*;
use std::sync::{Arc, RwLock};
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mismatched_chars: Arc<RwLock<Vec<String>>> = Arc::new(RwLock::new(Vec::new()));
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y);
mismatched_chars
.write()
.expect("could not acquire write lock")
.push(mutation);
}
});
for mismatch in mismatched_chars
.read()
.expect("could not acquire read lock")
.iter()
{
eprintln!("{}", mismatch);
}
}
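As an aside, here is a std-only sketch of the same locking idea, swapping rayon for std::thread::scope (Rust 1.63+). Scoped threads may borrow a Mutex that lives on the caller's stack, so the Arc is not needed (rayon's for_each also does not require 'static closures). The two-way split and the name mismatches_threaded are illustrative assumptions; indices are stored next to the strings so the output can be put back in order, since threads push in no particular order.

```rust
use std::sync::Mutex;
use std::thread;

/// Collect "x{pos}y" strings for every position where `a` and `b` differ,
/// using two scoped threads that share a Mutex-protected Vec.
fn mismatches_threaded(a: &[char], b: &[char]) -> Vec<String> {
    assert_eq!(a.len(), b.len(), "Reads are not the same length!");
    // Store (index, text) so the results can be re-ordered afterwards.
    let mismatched: Mutex<Vec<(usize, String)>> = Mutex::new(Vec::new());
    // Illustrative work split: one half of the input per thread.
    let mid = a.len() / 2;
    thread::scope(|s| {
        for (start, end) in [(0, mid), (mid, a.len())] {
            // Scoped threads can borrow from this stack frame, so no Arc.
            let mismatched = &mismatched;
            s.spawn(move || {
                for i in start..end {
                    if a[i] != b[i] {
                        let mutation = format!("{}{}{}", a[i], i + 1, b[i]);
                        mismatched.lock().unwrap().push((i, mutation));
                    }
                }
            });
        }
    });
    // All threads have been joined here, so the Mutex can be unwrapped.
    let mut result = mismatched.into_inner().unwrap();
    result.sort_by_key(|&(i, _)| i);
    result.into_iter().map(|(_, m)| m).collect()
}
```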

Creating struct with values from function parameter Vec<String> and returning Vec<struct> to caller

The purpose of my program is to read questions/answers from a file (line by line), and create several structs from it, put into a Vec for further processing.
I have a rather long piece of code, which I tried to separate into several functions (full version on Playground; hopefully the link is valid).
I suppose I'm not understanding a lot about borrowing, lifetimes and other things. I've seen examples all over, but I'm not able to adapt them to my problem.
Trying to change my struct fields from &str to String didn't change anything, and neither did creating the Vec<Question> within get_question_list.
Function of concern is as follows:
fn get_question_list<'a>(mut questions: Vec<Question<'a>>, lines: Vec<String>) -> Vec<Question<'a>> {
let count = lines.len();
for i in (0..count).step_by(2) {
let q: &str = lines.get(i).unwrap();
let a: &str = lines.get(i + 1).unwrap();
questions.push(Question::new(q, a));
}
questions
}
This code fails with the compiler as following (excerpt):
error[E0597]: `lines` does not live long enough
--> src/main.rs:126:23
|
119 | fn get_question_list<'a>(mut questions: Vec<Question<'a>>, lines: Vec<String>) -> Vec<Question<'a>> {
| -- lifetime `'a` defined here
...
126 | let a: &str = lines.get(i + 1).unwrap();
| ^^^^^ borrowed value does not live long enough
127 |
128 | questions.push(Question::new(q, a));
| ----------------------------------- argument requires that `lines` is borrowed for `'a`
...
163 | }
| - `lines` dropped here while still borrowed
Call to get_question_list is around:
let lines: Vec<String> = content.split("\n").map(|s| s.to_string()).collect();
let counter = lines.len();
if counter % 2 != 0 {
return Err("Found lines in quiz file are not even (one question or answer is missing.).");
}
questions = get_question_list(questions, lines);
Ok(questions)

The issue is that your Questions are supposed to borrow something (hence the lifetime annotation), but lines gets moved into the function, so when you create a new question from a line, it's borrowing function-local data, which is going to be destroyed at the end of the function. As a consequence, the questions you're creating can't escape the function creating them.
Now what you could do is not move the lines into the function: lines: &[String] would have the lines be owned by the caller, which would "fix" get_question_list.
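A minimal sketch of that borrowing variant, assuming Question holds two &str fields named question and answer (the real struct is only in the linked playground). Because lines is borrowed for 'a, the returned questions may live as long as the caller's lines do. This sketch also creates the output Vec internally instead of taking one by value, and chunks_exact(2) silently ignores an unpaired last line, so the caller should keep its even-count check.

```rust
/// Borrowed variant: both fields point into the caller's `lines`.
struct Question<'a> {
    question: &'a str,
    answer: &'a str,
}

/// `lines` is borrowed for `'a`, so the returned questions may outlive
/// this function (but not the caller's `lines`).
fn get_question_list<'a>(lines: &'a [String]) -> Vec<Question<'a>> {
    lines
        // Pairs of (question, answer); an unpaired last line is ignored.
        .chunks_exact(2)
        .map(|pair| Question {
            question: pair[0].as_str(),
            answer: pair[1].as_str(),
        })
        .collect()
}
```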
However, the exact same problem exists in read_questions_from_file, and there it cannot be resolved: the lines are read from a file, and thus are necessarily local to the function (unless you move the line reading to main and have read_questions_from_file only borrow them as well).
Therefore the simplest proper fix is to change Question to own its data:
struct Question {
question: String,
answer: String
}
This way the question itself keeps its data alive, and the issue goes away.
We can improve things further though, I think:
First, we can strip out the entire mess around newlines by using str::lines: it handles both \n and \r\n line endings and strips them.
It also seems rather odd that get_question_list takes a vector by value only to append to it and immediately return it. A more intuitive interface would be to either:
take the "output vector" by &mut so the caller can pre-size or reuse it across multiple loads, which doesn't really seem useful in this case
or create the output vector internally, which seems like the most sensible case here
Here is what I would consider a more pleasing version: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c0d440d67654b92c75d136eba2bba0c1
fn read_questions_from_file(filename: &str) -> Result<Vec<Question>, Box<dyn Error>> {
let file_content = read_file(filename)?;
let lines: Vec<_> = file_content.lines().collect();
if lines.len() % 2 != 0 {
return Err(Box::new(OddLines));
}
let mut questions = Vec::with_capacity(lines.len() / 2);
for chunk in lines.chunks(2) {
if let [q, a] = chunk {
questions.push(Question::new(q.to_string(), a.to_string()))
} else {
unreachable!("Odd lines should already have been checked");
}
}
Ok(questions)
}
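To try the parsing logic without a file, here is a self-contained sketch that takes the already-read content as a &str. OddLines, Question and parse_questions are stand-ins written for this sketch (the answer's real definitions live in the linked playground), and chunks_exact(2) replaces chunks(2) so that every chunk is guaranteed to be a full pair, which removes the need for the unreachable! arm.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for the answer's `OddLines` error type.
#[derive(Debug)]
struct OddLines;

impl fmt::Display for OddLines {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "odd number of lines: a question or answer is missing")
    }
}

impl Error for OddLines {}

/// Owned variant: the question keeps its own data alive.
#[derive(Debug, PartialEq)]
struct Question {
    question: String,
    answer: String,
}

impl Question {
    fn new(question: String, answer: String) -> Self {
        Question { question, answer }
    }
}

/// Same logic as `read_questions_from_file`, but on already-read content.
fn parse_questions(file_content: &str) -> Result<Vec<Question>, Box<dyn Error>> {
    // `lines` strips both "\n" and "\r\n".
    let lines: Vec<&str> = file_content.lines().collect();
    if lines.len() % 2 != 0 {
        return Err(Box::new(OddLines));
    }
    // chunks_exact(2) only ever yields full pairs.
    Ok(lines
        .chunks_exact(2)
        .map(|pair| Question::new(pair[0].to_string(), pair[1].to_string()))
        .collect())
}
```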
Note that I inlined / removed get_question_list as I don't think it pulls its weight at this point, and it's both trivial and very specific.
Here is a variant which works similarly but with different tradeoffs: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3b8f95aef5bcae904545617749086dbc
fn read_questions_from_file(filename: &str) -> Result<Vec<Question>, Box<dyn Error>> {
let file_content = read_file(filename)?;
let mut lines = file_content.lines();
let mut questions = Vec::new();
while let Some(q) = lines.next() {
let a = lines.next().ok_or(OddLines)?;
questions.push(Question::new(q.to_string(), a.to_string()));
}
Ok(questions)
}
It avoids collecting the lines into a Vec, but as a result it has to process the whole file before it knows whether the file is valid, and it can't preallocate the questions Vec.
At this point, because we do not care for lines being a Vec anymore, we could operate on a BufRead and strip out read_file as well:
fn read_questions_from_file(filename: &str) -> Result<Vec<Question>, Box<dyn Error>> {
let file_content = BufReader::new(File::open(filename)?);
let mut lines = file_content.lines();
let mut questions = Vec::new();
while let Some(q) = lines.next() {
let a = lines.next().ok_or(OddLines)?;
questions.push(Question::new(q?, a?));
}
Ok(questions)
}
The extra ? are because while str::Lines yields &str, io::Lines yields Result<String, io::Error>: IO errors are reported lazily when a read is attempted, meaning every line-read could report a failure if read_to_string would have failed.
On the other hand, since io::Lines yields Result<String, ...>, we can use q and a directly without needing to convert them to String.
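The BufRead version can also be made generic over its input, which makes it testable with an in-memory buffer. A minimal sketch, assuming a two-field Question and using a plain &str error (converted into Box<dyn Error> by ?) in place of the answer's OddLines type; read_questions is a hypothetical name. A caller would pass BufReader::new(File::open(filename)?) as in the answer.

```rust
use std::error::Error;
use std::io::BufRead;

#[derive(Debug)]
struct Question {
    question: String,
    answer: String,
}

/// Generic over any BufRead: a BufReader<File> in real use,
/// an io::Cursor in tests.
fn read_questions<R: BufRead>(reader: R) -> Result<Vec<Question>, Box<dyn Error>> {
    let mut lines = reader.lines();
    let mut questions = Vec::new();
    while let Some(q) = lines.next() {
        // A plain &str error stands in for the answer's OddLines type.
        let a = lines.next().ok_or("odd number of lines")?;
        // io::Lines yields Result<String, io::Error>, hence the extra `?`.
        questions.push(Question { question: q?, answer: a? });
    }
    Ok(questions)
}
```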

Iterating over lines in a file and looking for substring from a vec! in Rust

I'm writing a project in which a struct System can be constructed from a data file.
In the data file, some lines contain keywords that indicate values to be read either inside the line or in the N subsequent lines (separated from the keyword line by a blank line).
I would like to have a vec! containing the keywords (statically known at compile time), check if the line returned by the iterator contains the keyword and do the appropriate operations.
Now my code looks like this:
impl System {
fn read_data<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>> where P: AsRef<Path> {
let file = File::open(filename)?;
let f = BufReader::new(file);
Ok(f.lines())
}
...
pub fn new_from_data<P>(dataname: P) -> System where P: AsRef<Path> {
let keywd = vec!["atoms", "atom types".into(),
"Atoms".into()];
let mut sys = System::new();
if let Ok(mut lines) = System::read_data(dataname) {
while let Some(line) = lines.next() {
for k in keywd {
let split: Vec<&str> = line.unwrap().split(" ").collect();
if split.contains(k) {
match k {
"atoms" => sys.natoms = split[0].parse().unwrap(),
"atom types" => sys.ntypes = split[0].parse().unwrap(),
"Atoms" => {
lines.next();
// assumes fields are: atom-ID molecule-ID atom-type q x y z
for _ in 1..=sys.natoms {
let atline = lines.next().unwrap().unwrap();
let data: Vec<&str> = atline.split(" ").collect();
let atid: i32 = data[0].parse().unwrap();
let molid: i32 = data[1].parse().unwrap();
let atype: i32 = data[2].parse().unwrap();
let charge: f32 = data[3].parse().unwrap();
let x: f32 = data[4].parse().unwrap();
let y: f32 = data[5].parse().unwrap();
let z: f32 = data[6].parse().unwrap();
let at = Atom::new(atid, molid, atype, charge, x, y, z);
sys.atoms.push(at);
};
},
_ => (),
}
}
}
}
}
sys
}
}
I'm very unsure on two points:
1. I don't know if I handled the line-by-line reading of the file in an idiomatic way, as I tinkered with some examples from the book and Rust by Example. But returning an iterator makes me wonder when and how to unwrap the results. For example, when calling the iterator inside the while loop, do I have to unwrap twice, like in let atline = lines.next().unwrap().unwrap();? I think the compiler does not complain about that yet because it stops at the first error it encounters, which is the one in the next point.
2. I cannot wrap my head around the type to give to the value k, as I get a typical:
error[E0308]: mismatched types
--> src/system/system.rs:65:39
|
65 | if split.contains(k) {
| ^ expected `&str`, found `str`
|
= note: expected reference `&&str`
found reference `&str`
error: aborting due to previous error
How are we supposed to declare the substring and compare it to the strings I put in keywd? I tried to dereference k in contains, to make it look at &keywd, etc., but I feel I'm wasting my time by not properly addressing the problem. Thanks in advance, any help is indeed appreciated.

Let's go through the issues one by one. I'll go through them as they appear in the code.
First you need to borrow keywd in the for loop, i.e. &keywd. Otherwise keywd is moved into the for loop during the first iteration of the while loop, which is why the compiler complains about it.
for k in &keywd {
let split: Vec<&str> = line.unwrap().split(" ").collect();
Next, when you call .unwrap() on line, that's the same problem. That causes the inner Ok value to get moved out of the Result. Instead you can do line.as_ref().unwrap() as then you get a reference to the inner Ok value and aren't consuming the line Result.
Alternatively, you can .filter_map(Result::ok) on your lines to avoid (.as_ref()).unwrap() altogether (note that this silently skips any line that failed to read).
You can add that directly to read_data and even simplify the return type using impl Iterator:
fn read_data<P>(filename: P) -> io::Result<impl Iterator<Item = String>>
where
P: AsRef<Path>,
{
let file = File::open(filename)?;
let f = BufReader::new(file);
Ok(f.lines().filter_map(Result::ok))
}
Note that you're splitting line for every keywd, which is needless. So you can move that outside of your for loop as well.
All in all, it ends up looking like this:
if let Ok(mut lines) = read_data("test.txt") {
while let Some(line) = lines.next() {
let split: Vec<&str> = line.split(" ").collect();
for k in &keywd {
if split.contains(k) {
...
Given that we borrowed &keywd, then we don't need to change k to &k, as now k is already &&str.
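A small self-contained sketch of the fixed loop, showing why split.contains(k) type-checks once the keywords are borrowed: iterating a slice of &str by reference yields k: &&str, which is exactly the &T that contains expects for a Vec<&str>. The function name matching_keywords is hypothetical. It also exposes a separate problem the answer does not cover: splitting on ' ' produces single words, so a two-word keyword such as "atom types" can never equal one of the pieces.

```rust
/// Return the keywords from `keywd` that appear as whole space-separated
/// words in `line`.
fn matching_keywords(line: &str, keywd: &[&'static str]) -> Vec<&'static str> {
    // Split once per line, outside the keyword loop.
    let split: Vec<&str> = line.split(' ').collect();
    let mut found = Vec::new();
    // Iterating a slice by reference: `k` has type `&&str`.
    for k in keywd {
        if split.contains(k) {
            found.push(*k);
        }
    }
    found
}
```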

Using an iterator as an argument to a function multiple times from one vector

I'm trying to write some Rust code to decode GPS data from an SDR receiver. I'm reading samples in from a file and converting the binary data to a series of complex numbers, which is a time-consuming process. However, there are times when I want to stream samples in without keeping them in memory (e.g. one very large file processed only one way or samples directly from the receiver) and other times when I want to keep the whole data set in memory (e.g. one small file processed in multiple different ways) to avoid repeating the work of parsing the binary file.
Therefore, I want to write functions or structs with iterators to be as general as possible, but I know they aren't sized, so I need to put them in a Box. I would have expected something like this to work.
This is the simplest example I could come up with to demonstrate the same basic problem.
fn sum_squares_plus(iter: Box<Iterator<Item = usize>>, x: usize) -> usize {
let mut ans: usize = 0;
for i in iter {
ans += i * i;
}
ans + x
}
fn main() {
// Pretend this is an expensive operation that I don't want to repeat five times
let small_data: Vec<usize> = (0..10).collect();
for x in 0..5 {
// Want to iterate over immutable references to the elements of small_data
let iterbox: Box<Iterator<Item = usize>> = Box::new(small_data.iter());
println!("{}: {}", x, sum_squares_plus(iterbox, x));
}
// 0..100 is more than 0..10 and I'm only using it once,
// so I want to 'stream' it instead of storing it all in memory
let x = 55;
println!("{}: {}", x, sum_squares_plus(Box::new(0..100), x));
}
I've tried several different variants of this, but none seem to work. In this particular case, I'm getting
error[E0271]: type mismatch resolving `<std::slice::Iter<'_, usize> as std::iter::Iterator>::Item == usize`
--> src/main.rs:15:52
|
15 | let iterbox: Box<Iterator<Item = usize>> = Box::new(small_data.iter());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected reference, found usize
|
= note: expected type `&usize`
found type `usize`
= note: required for the cast to the object type `dyn std::iter::Iterator<Item = usize>`
I'm not worried about concurrency and I'd be happy to just get it working sequentially on a single thread, but a concurrent solution would be a nice bonus.

The current error you're running into is here:
let iterbox:Box<Iterator<Item = usize>> = Box::new(small_data.iter());
You're declaring that you want an iterator that yields usize items, but small_data.iter() is an iterator that yields references to usize items (&usize). That's why you get the error "expected reference, found usize". usize is a small Copy type, so you can simply use the .cloned() iterator adapter (or .copied()) to get an iterator that actually yields usize.
let iterbox: Box<dyn Iterator<Item = usize>> = Box::new(small_data.iter().cloned());
Once you're past that hurdle, the next problem is that the iterator returned over small_data contains a reference to the small_data. Since sum_squares_plus is defined to accept a Box<Iterator<Item = usize>>, it's implied in that signature that the Iterator trait object within the box has a 'static lifetime. The iterator you're providing does not because it borrows small_data. To fix that you need to adjust the sum_squares_plus definition to
fn sum_squares_plus<'a>(iter: Box<dyn Iterator<Item = usize> + 'a>, x: usize) -> usize
Note the 'a lifetime annotations. The code should then compile, but unless there's some constraints other than what's clearly defined here, a more idiomatic and efficient approach would be to avoid using trait objects and the associated allocations. The below code should work using static dispatch without any trait objects.
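For completeness, here is the boxed trait-object version with that signature filled in, written with the dyn keyword that current editions require for trait objects. It is a sketch of the intermediate fix, kept in the same shape as the question's function:

```rust
/// Boxed trait-object version: the `+ 'a` bound lets the box hold an
/// iterator that borrows local data (such as `small_data.iter()`).
fn sum_squares_plus<'a>(iter: Box<dyn Iterator<Item = usize> + 'a>, x: usize) -> usize {
    let mut ans: usize = 0;
    // Box<dyn Iterator> is itself an Iterator, so it can be looped over.
    for i in iter {
        ans += i * i;
    }
    ans + x
}
```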
fn sum_squares_plus<I: Iterator<Item = usize>>(iter: I, x: usize) -> usize {
let mut ans: usize = 0;
for i in iter {
ans += i * i;
}
ans + x
}
fn main() {
// Pretend this is an expensive operation that I don't want to repeat five times
let small_data: Vec<usize> = (0..10).collect();
for x in 0..5 {
println!("{}: {}", x, sum_squares_plus(small_data.iter().cloned(), x));
}
// 0..100 is more than 0..10 and I'm only using it once,
// so I want to 'stream' it instead of storing it all in memory
let x = 55;
println!("{}: {}", x, sum_squares_plus(0..100, x));
}
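If you would rather not write .cloned() at every call site, one possible variation (not from the answer) is to accept any IntoIterator whose items implement Borrow<usize>. Both &usize and usize do, so &small_data and 0..100 can be passed as they are; static dispatch is kept.

```rust
use std::borrow::Borrow;

/// Accept anything iterable whose items can be borrowed as usize:
/// `&Vec<usize>` (items: &usize) as well as `0..100` (items: usize).
fn sum_squares_plus<I>(iter: I, x: usize) -> usize
where
    I: IntoIterator,
    I::Item: Borrow<usize>,
{
    iter.into_iter()
        .map(|item| {
            let i: usize = *item.borrow();
            i * i
        })
        .sum::<usize>()
        + x
}
```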

How to translate "x-y" to vec![x, x+1, … y-1, y]?

This solution seems rather inelegant:
fn parse_range(&self, string_value: &str) -> Vec<u8> {
let values: Vec<u8> = string_value
.splitn(2, "-")
.map(|part| part.parse().ok().unwrap())
.collect();
{ values[0]..(values[1] + 1) }.collect()
}
Since splitn(2, "-") returns exactly two results for any valid string_value, it would be better to assign the tuple directly to two variables first and last rather than a seemingly arbitrary-length Vec. I can't seem to do this with a tuple.
There are two instances of collect(), and I wonder if it can be reduced to one (or even zero).

Trivial implementation
fn parse_range(string_value: &str) -> Vec<u8> {
let pos = string_value.find(|c| c == '-').expect("No valid string");
let (first, second) = string_value.split_at(pos);
let first: u8 = first.parse().expect("Not a number");
let second: u8 = second[1..].parse().expect("Not a number");
{ first..second + 1 }.collect()
}
I would recommend returning a Result<Vec<u8>, Error> instead of panicking with expect/unwrap.
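Nightly implementation (now stable)
My next thought was about the second collect. This example originally needed the nightly features conservative_impl_trait and inclusive_range_syntax; both (impl Trait in return position and ..= ranges) were stabilized in Rust 1.26, so it now compiles on stable without a #![feature] line, and you won't need any collect at all.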
fn parse_range(string_value: &str) -> impl Iterator<Item = u8> {
let pos = string_value.find(|c| c == '-').expect("No valid string");
let (first, second) = string_value.split_at(pos);
let first: u8 = first.parse().expect("Not a number");
let second: u8 = second[1..].parse().expect("Not a number");
first..=second
}
fn main() {
println!("{:?}", parse_range("3-7").collect::<Vec<u8>>());
}

Instead of calling collect the first time, just advance the iterator:
let mut values = string_value
.splitn(2, "-")
.map(|part| part.parse().unwrap());
let start = values.next().unwrap();
let end = values.next().unwrap();
Do not call .ok().unwrap() — that converts the Result with useful error information to an Option, which has no information. Just call unwrap directly on the Result.
As already mentioned, if you want to return a Vec, you'll want to call collect to create it. If you want to return an iterator, you can. It's not bad even in stable Rust:
fn parse_range(string_value: &str) -> std::ops::Range<u8> {
let mut values = string_value
.splitn(2, "-")
.map(|part| part.parse().unwrap());
let start = values.next().unwrap();
let end = values.next().unwrap();
start..end + 1
}
fn main() {
assert!(parse_range("1-5").eq(1..6));
}
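Sadly, inclusive ranges were not yet stable when this was written, so you had to use +1 or switch to nightly. They have since been stabilized (Rust 1.26), so start..=end works on stable; it also avoids the overflow that end + 1 hits when end is 255.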
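On current stable Rust, both concerns in the question can be addressed directly: str::split_once (Rust 1.52+) returns exactly two pieces as a tuple, and ..= produces the inclusive range without + 1. A minimal sketch that reports errors instead of panicking, as suggested in the first answer; using String as the error type is an assumption made to keep it short.

```rust
use std::ops::RangeInclusive;

/// Parse "x-y" into the inclusive range x..=y.
/// Errors are plain Strings here, an illustrative choice.
fn parse_range(s: &str) -> Result<RangeInclusive<u8>, String> {
    // split_once yields exactly two pieces, so they bind straight to a tuple.
    let (lo, hi) = s
        .split_once('-')
        .ok_or_else(|| format!("no '-' in {s:?}"))?;
    let first = lo
        .trim()
        .parse::<u8>()
        .map_err(|e| format!("bad start {lo:?}: {e}"))?;
    let last = hi
        .trim()
        .parse::<u8>()
        .map_err(|e| format!("bad end {hi:?}: {e}"))?;
    // ..= includes `last` without computing last + 1 (which overflows for 255).
    Ok(first..=last)
}
```

Callers that need a Vec can still call .collect() on the returned range.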
Since splitn(2, "-") returns exactly two results for any valid string_value, it would be better to assign the tuple directly to two variables first and last rather than a seemingly arbitrary-length Vec. I can't seem to do this with a tuple.
This is not possible with Rust's type system. You are asking for dependent types, a way for runtime values to interact with the type system. You'd want splitn to return a (&str, &str) for a value of 2 and a (&str, &str, &str) for a value of 3. That gets even more complicated when the argument is a variable, especially when it's set at run time.
The closest workaround would be to have a runtime check that there are no more values:
assert!(values.next().is_none());
Such a check doesn't feel valuable to me.
See also:
What is the correct way to return an Iterator (or any other trait)?
How do I include the end value in a range?
