How to skip n items from inside of an iterator loop? - rust

This code:
fn main() {
let text = "abcd";
for char in text.chars() {
if char == 'b' {
// skip 2 chars
}
print!("{}", char);
}
// prints `abcd`, but I want `ad`
}
prints abcd, but I want to skip 2 chars if b was found, so that it prints ad. How do I do that?
I tried to put the iterator into a variable outside the loop and manipulate that iterator within the loop, but the Borrow Checker doesn't allow that.

AFAIK you can't do that with a for loop. You will need to desugar it by hand:
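fn main() {
    let text = "abcd";
    let mut it = text.chars();
    while let Some(char) = it.next() {
        if char == 'b' {
            // 'b' has already been consumed by `next()`; skip one more char ('c')
            // so that 2 chars are skipped in total. In general, `it.nth(k)`
            // consumes k + 1 items.
            it.next();
            continue;
        }
        print!("{}", char);
    }
    // prints `ad`
}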
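The same desugared loop generalizes to skipping an arbitrary number of items. Below is a minimal sketch; skip_after is a hypothetical helper name, and it assumes the marker itself should also be dropped (as 'b' is above), followed by the next n characters.

```rust
/// Hypothetical helper: copies `text`, dropping every `marker` character
/// together with the `n` characters that follow it.
fn skip_after(text: &str, marker: char, n: usize) -> String {
    let mut out = String::new();
    let mut it = text.chars();
    while let Some(c) = it.next() {
        if c == marker {
            // `by_ref()` lets `take(n)` borrow `it` instead of consuming it, so
            // the `while let` can keep pulling items afterwards. Unlike
            // `nth(n - 1)`, this also works for n == 0 without underflowing.
            it.by_ref().take(n).for_each(drop);
            continue;
        }
        out.push(c);
    }
    out
}

fn main() {
    // Dropping 'b' plus one more char turns "abcd" into "ad".
    println!("{}", skip_after("abcd", 'b', 1));
}
```

If fewer than n characters remain after the marker, take simply stops early, so the helper never panics.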

If you want to keep an iterator style, you can use std::iter::successors (I've replaced the special char with '!' for readability):
fn my_iter<'a>(s: &'a str) -> impl Iterator<Item = char> + 'a {
let mut it = s.chars();
std::iter::successors(it.next(), move |c| {
if *c == '!' {
it.next().and_then(|_| it.next())
} else {
it.next()
}
})
.filter(|c| *c != '!')
}
fn main() {
assert!(my_iter("a!bc").eq("ac".chars()));
assert!(my_iter("!abcd").eq("bcd".chars()));
assert!(my_iter("abc!d").eq("abc".chars()));
assert!(my_iter("abcd!").eq("abcd".chars()));
}

Related

How to make this iterator/for loop idiomatic in Rust

How can I get rid of the ugly let mut i = 0; and i += 1;? Is there a more idiomatic way I can write this loop? I tried .enumerate() but it doesn't work on a &[&str].
use std::fmt::Write;
pub fn build_proverb(list: &[&str]) -> String {
let mut proverb = String::new();
let mut i = 0;
for word in list {
i += 1;
if i < list.len() {
write!(proverb, "For want of a {} the {} was lost.\n", word, list[i]);
} else {
write!(proverb, "And all for the want of a {}.", list[0]);
}
}
proverb
}
You were close: enumerate() doesn't exist for &[&str] because enumerate() is a method on Iterator, and &[&str] is not an Iterator.
But you can call list.iter() to get an iterator, which you can then call enumerate() on.
Full example:
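use std::fmt::Write;

pub fn build_proverb(list: &[&str]) -> String {
    let mut proverb = String::new();
    for (i, word) in list.iter().enumerate() {
        // `enumerate` starts at 0, whereas the question's `i` was incremented
        // before use, so the next word is at `i + 1`.
        if i + 1 < list.len() {
            // Writing to a String cannot fail, so unwrapping is fine here.
            writeln!(proverb, "For want of a {} the {} was lost.", word, list[i + 1]).unwrap();
        } else {
            write!(proverb, "And all for the want of a {}.", list[0]).unwrap();
        }
    }
    proverb
}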
The fact you have word and list[i] (after incrementing i) makes it look like you want something like windows:
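use std::fmt::Write;

pub fn build_proverb(list: &[&str]) -> String {
    let mut proverb = String::new();
    // `windows(2)` yields each word together with the word after it.
    for words in list.windows(2) {
        let first = words[0];
        let second = words[1];
        writeln!(proverb, "For want of a {} the {} was lost.", first, second).unwrap();
    }
    // Guard against an empty list, where `list[0]` would panic.
    if let Some(first) = list.first() {
        write!(proverb, "And all for the want of a {}.", first).unwrap();
    }
    proverb
}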
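To see why the windows version needs no index arithmetic: windows(2) yields every overlapping pair of neighbouring elements, and a slice with fewer than two elements yields nothing at all, which is why the final "And all for..." line is written after the loop. A small sketch (pairs and pairs_zip are hypothetical helper names) also shows the equivalent formulation that zips the slice with itself shifted by one:

```rust
/// Hypothetical helper: each element paired with its successor, via `windows(2)`.
fn pairs<'a>(list: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    list.windows(2).map(|w| (w[0], w[1])).collect()
}

/// The same pairs, built by zipping the slice with itself shifted by one.
fn pairs_zip<'a>(list: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    list.iter().zip(list.iter().skip(1)).map(|(a, b)| (*a, *b)).collect()
}

fn main() {
    let list = ["nail", "shoe", "horse"];
    for (first, second) in pairs(&list) {
        println!("For want of a {} the {} was lost.", first, second);
    }
}
```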

Parse allowing nested parentheses in nom

I'm using nom. I'd like to parse a string that's surrounded by parentheses, allowing additional nested parentheses within the string.
So (a + b) would parse as a + b, and ((a + b)) would parse as (a + b)
This works for the first case, but not the nested case:
pub fn parse_expr(input: &str) -> IResult<&str, &str> {
// TODO: this will fail with nested parentheses, but `rest` doesn't seem to
// be working.
delimited(tag("("), take_until(")"), tag(")"))(input)
}
I tried using rest, but it consumes all of the remaining input, including the final ), so the closing tag(")") can no longer match:
pub fn parse_expr(input: &str) -> IResult<&str, &str> {
delimited(tag("("), rest, tag(")"))(input)
}
I found a reference to this in the nom issue log: https://github.com/Geal/nom/issues/1253
I'm using this function from parse_hyperlinks, which is basically a hand-written parser for this case: https://docs.rs/parse-hyperlinks/0.23.3/src/parse_hyperlinks/lib.rs.html#41 :
use nom::{
error::{Error, ErrorKind, ParseError},
Err, IResult,
};
pub fn take_until_unbalanced(
opening_bracket: char,
closing_bracket: char,
) -> impl Fn(&str) -> IResult<&str, &str> {
move |i: &str| {
let mut index = 0;
let mut bracket_counter = 0;
while let Some(n) = &i[index..].find(&[opening_bracket, closing_bracket, '\\'][..]) {
index += n;
let mut it = i[index..].chars();
match it.next().unwrap_or_default() {
c if c == '\\' => {
// Skip the escape char `\`.
index += '\\'.len_utf8();
// Skip also the following char, if there is one, so that a trailing `\`
// cannot push `index` past the end of the input.
if let Some(c) = it.next() {
index += c.len_utf8();
}
}
c if c == opening_bracket => {
bracket_counter += 1;
index += opening_bracket.len_utf8();
}
c if c == closing_bracket => {
// Closing bracket.
bracket_counter -= 1;
index += closing_bracket.len_utf8();
}
// Can not happen.
_ => unreachable!(),
};
// We found the unmatched closing bracket.
if bracket_counter == -1 {
// We do not consume it.
index -= closing_bracket.len_utf8();
return Ok((&i[index..], &i[0..index]));
};
}
if bracket_counter == 0 {
Ok(("", i))
} else {
Err(Err::Error(Error::from_error_kind(i, ErrorKind::TakeUntil)))
}
}
}
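The heart of take_until_unbalanced is a depth counter: each opening bracket increments it, and the first closing bracket met at depth zero ends the match. As a minimal std-only sketch of that idea (no nom, no escape handling; parse_parenthesized is a hypothetical name), the function below also strips the outer pair itself instead of relying on delimited, returning the inner text and the remaining input:

```rust
/// Hypothetical helper: parses "(...)" where the inner text may itself contain
/// balanced parentheses. Returns `(inner, rest)`, or `None` if there is no
/// leading "(" or no matching ")".
fn parse_parenthesized(input: &str) -> Option<(&str, &str)> {
    let body = input.strip_prefix('(')?;
    // Number of nested "(" currently open inside the outer pair.
    let mut depth = 0usize;
    for (idx, c) in body.char_indices() {
        match c {
            '(' => depth += 1,
            // A ")" at depth 0 closes the outer "(".
            ')' if depth == 0 => return Some((&body[..idx], &body[idx + 1..])),
            ')' => depth -= 1,
            _ => {}
        }
    }
    None
}

fn main() {
    println!("{:?}", parse_parenthesized("((a + b)) rest")); // Some(("(a + b)", " rest"))
}
```

With nom, the equivalent would presumably be delimited(tag("("), take_until_unbalanced('(', ')'), tag(")")), though the exact call syntax depends on the nom version in use.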

Rust - Multiple Calls to Iterator Methods

I have this following rust code:
fn tokenize(line: &str) -> Vec<&str> {
let mut tokens = Vec::new();
let mut chars = line.char_indices();
for (i, c) in chars {
match c {
'"' => {
if let Some(pos) = chars.position(|(_, x)| x == '"') {
tokens.push(&line[i..=i+pos]);
} else {
// Not a complete string
}
}
// Other options...
}
}
tokens
}
I am trying to elegantly extract a string surrounded by double quotes from the line, but since chars.position takes a mutable reference and chars is moved into the for loop, I get a compilation error - "value borrowed after move". The compiler suggests borrowing chars in the for loop but this doesn't work because an immutable reference is not an iterator (and a mutable one would cause the original problem where I can't borrow mutably again for position).
I feel like there should be a simple solution to this.
Is there an idiomatic way to do this or do I need to regress to appending characters one by one?
Because a for loop takes ownership of chars (it calls .into_iter() on it), you can instead iterate through chars manually with a while let loop:
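fn tokenize(line: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut chars = line.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            '"' => {
                // `find` yields the closing quote's byte index `j` from
                // `char_indices`; unlike `position` (which counts items), this
                // stays correct for non-ASCII text.
                if let Some((j, _)) = chars.find(|&(_, x)| x == '"') {
                    // Byte range including both quotes.
                    tokens.push(&line[i..=j]);
                } else {
                    // Not a complete string
                }
            }
            // Other options...
            _ => {}
        }
    }
    tokens
}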
It works if you just desugar the for-loop:
fn tokenize(line: &str) -> Vec<&str> {
let mut tokens = Vec::new();
let mut chars = line.char_indices();
while let Some((i, c)) = chars.next() {
match c {
'"' => {
if let Some((j, _)) = chars.find(|&(_, x)| x == '"') {
tokens.push(&line[i..=j]); // byte range from the opening to the closing quote
} else {
// Not a complete string
}
},
_ => {},
}
}
tokens
}
The normal for-loop prevents additional modification of the iterator because this usually leads to surprising and hard-to-read code. Doing it as a while-loop has no such protection.
If all you want to do is find quoted strings, I would not, however, go with an iterator at all here.
fn tokenize(line: &str) -> Vec<&str> {
let mut tokens = Vec::new();
let mut line = line;
while let Some(pos) = line.find('"') {
line = &line[(pos+1)..];
if let Some(end) = line.find('"') {
tokens.push(&line[..end]);
line = &line[(end+1)..];
} else {
// Not a complete string
}
}
tokens
}
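If you prefer this slicing approach but want to avoid manual index arithmetic, the same loop can be written with str::split_once (stable since Rust 1.52). A sketch that, like the version above, returns the contents without quotes and ignores an unterminated trailing string:

```rust
/// Variant of the `find`-based tokenizer using `str::split_once`.
/// Returns the contents of each complete "..." pair.
fn tokenize(mut line: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    // Split off everything up to and including the next opening quote.
    while let Some((_, after_open)) = line.split_once('"') {
        match after_open.split_once('"') {
            Some((quoted, rest)) => {
                tokens.push(quoted);
                line = rest;
            }
            // Not a complete string: stop scanning.
            None => break,
        }
    }
    tokens
}

fn main() {
    println!("{:?}", tokenize(r#"say "hi" and "bye" then "oops"#)); // ["hi", "bye"]
}
```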

Is there a method like JavaScript's substr in Rust?

I looked at the Rust docs for String but I can't find a way to extract a substring.
Is there a method like JavaScript's substr in Rust? If not, how would you implement it?
str.substr(start[, length])
The closest is probably slice_unchecked (since deprecated in favor of get_unchecked), but it uses byte offsets instead of character indexes and is marked unsafe.
For characters, you can use s.chars().skip(pos).take(len):
fn main() {
let s = "Hello, world!";
let ss: String = s.chars().skip(7).take(5).collect();
println!("{}", ss);
}
Beware that a char is a Unicode scalar value, which is not necessarily what a reader perceives as one character.
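For example, with a decomposed "é" (an 'e' followed by U+0301 COMBINING ACUTE ACCENT), skip and take count the accent as a separate char, and neither count matches the byte length. The sketch uses first_n_chars, a hypothetical helper wrapping the chars().take() idiom:

```rust
/// Returns the first `n` chars (Unicode scalar values) of `s`.
fn first_n_chars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

fn main() {
    // "é" spelled as 'e' + U+0301 COMBINING ACUTE ACCENT, then 't', then the
    // precomposed 'é' (U+00E9).
    let s = "e\u{301}t\u{e9}";
    println!("{} chars, {} bytes", s.chars().count(), s.len()); // 4 chars, 6 bytes
    // One char keeps the 'e' but cuts off its accent.
    println!("{:?}", first_n_chars(s, 1)); // "e"
}
```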
For bytes, you can use the slice syntax:
fn main() {
let s = b"Hello, world!";
let ss = &s[7..12];
println!("{:?}", ss);
}
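For byte offsets on a &str (rather than a byte string), indexing such as &s[7..12] works too, but it panics if an offset does not fall on a char boundary. str::get is the checked counterpart that returns None instead, which makes it a safe alternative to the unsafe unchecked slicing mentioned above. A small sketch; byte_substr is a hypothetical wrapper mirroring substr(start, length) with byte offsets:

```rust
/// Hypothetical wrapper: JavaScript-style `substr(start, length)`, but with
/// byte offsets. Returns `None` if the range is out of bounds or splits a char.
fn byte_substr(s: &str, start: usize, len: usize) -> Option<&str> {
    // `checked_add` avoids overflow for huge `len` values.
    let end = start.checked_add(len)?;
    s.get(start..end)
}

fn main() {
    let s = "Hello, wörld!";
    println!("{:?}", byte_substr(s, 7, 6)); // Some("wörld")
    println!("{:?}", byte_substr(s, 7, 2)); // None: byte 9 is inside 'ö'
}
```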
You can use the as_str method on the Chars iterator to get back a &str slice after you have advanced the iterator. So to skip the first start chars, you can call
let s = "Some text to slice into";
let mut iter = s.chars();
iter.by_ref().take(start).for_each(drop); // eat up exactly `start` chars
let slice = iter.as_str(); // get back a slice of the rest of the iterator
Now if you also want to limit the length, you first need to figure out the byte-position of the length character:
let end_pos = slice.char_indices().nth(length).map(|(n, _)| n).unwrap_or(0);
let substr = &slice[..end_pos];
This might feel a little roundabout, but Rust is not hiding anything from you that might take up CPU cycles. That said, I wonder why there's no crate yet that offers a substr method.
This code performs both substring-ing and string-slicing, without panicking or allocating:
use std::ops::{Bound, RangeBounds};
trait StringUtils {
fn substring(&self, start: usize, len: usize) -> &str;
fn slice(&self, range: impl RangeBounds<usize>) -> &str;
}
impl StringUtils for str {
fn substring(&self, start: usize, len: usize) -> &str {
let mut char_pos = 0;
let mut byte_start = 0;
let mut it = self.chars();
loop {
if char_pos == start { break; }
if let Some(c) = it.next() {
char_pos += 1;
byte_start += c.len_utf8();
}
else { break; }
}
char_pos = 0;
let mut byte_end = byte_start;
loop {
if char_pos == len { break; }
if let Some(c) = it.next() {
char_pos += 1;
byte_end += c.len_utf8();
}
else { break; }
}
&self[byte_start..byte_end]
}
fn slice(&self, range: impl RangeBounds<usize>) -> &str {
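let start = match range.start_bound() {
Bound::Included(bound) => *bound,
Bound::Excluded(bound) => *bound + 1,
Bound::Unbounded => 0,
};
let end = match range.end_bound() {
Bound::Included(bound) => *bound + 1,
Bound::Excluded(bound) => *bound,
Bound::Unbounded => self.len(),
};
// `saturating_sub` keeps a reversed range such as 5..3 from underflowing
// (and panicking); it simply yields an empty string.
let len = end.saturating_sub(start);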
self.substring(start, len)
}
}
fn main() {
let s = "abcdèfghij";
// All three statements should print:
// "abcdè, abcdèfghij, dèfgh, dèfghij."
println!("{}, {}, {}, {}.",
s.substring(0, 5),
s.substring(0, 50),
s.substring(3, 5),
s.substring(3, 50));
println!("{}, {}, {}, {}.",
s.slice(..5),
s.slice(..50),
s.slice(3..8),
s.slice(3..));
println!("{}, {}, {}, {}.",
s.slice(..=4),
s.slice(..=49),
s.slice(3..=7),
s.slice(3..));
}
For my_string.substring(start, len)-like syntax, you can write a custom trait:
trait StringUtils {
fn substring(&self, start: usize, len: usize) -> Self;
}
impl StringUtils for String {
fn substring(&self, start: usize, len: usize) -> Self {
self.chars().skip(start).take(len).collect()
}
}
// Usage:
fn main() {
let phrase: String = "this is a string".to_string();
println!("{}", phrase.substring(5, 8)); // prints "is a str"
}
The solution given by oli_obk (the as_str/char_indices approach) does not handle a substring that reaches the last character of the string: char_indices().nth(length) then returns None, and unwrap_or(0) yields an empty slice. It can be fixed with .chain(once(s.len())).
Here the function substr implements substring slicing with error handling. If an invalid index is passed, the valid part of the string slice is returned in the Err variant. All corner cases should be handled correctly.
fn substr(s: &str, begin: usize, length: Option<usize>) -> Result<&str, &str> {
use std::iter::once;
let mut itr = s.char_indices().map(|(n, _)| n).chain(once(s.len()));
let beg = itr.nth(begin);
if beg.is_none() {
return Err("");
} else if length == Some(0) {
return Ok("");
}
let end = length.map_or(Some(s.len()), |l| itr.nth(l-1));
if let Some(end) = end {
return Ok(&s[beg.unwrap()..end]);
} else {
return Err(&s[beg.unwrap()..s.len()]);
}
}
fn main() {
    let s = "abc🙂";
    assert_eq!(Ok("bc"), substr(s, 1, Some(2)));
    assert_eq!(Ok("c🙂"), substr(s, 2, Some(2)));
    assert_eq!(Ok("c🙂"), substr(s, 2, None));
    assert_eq!(Err("c🙂"), substr(s, 2, Some(99)));
    assert_eq!(Ok(""), substr(s, 2, Some(0)));
    assert_eq!(Err(""), substr(s, 5, Some(4)));
}
Note that this does not handle Unicode grapheme clusters. For example, "y̆es" contains 4 Unicode chars but 3 grapheme clusters. The unicode-segmentation crate solves this problem: grapheme clusters are handled correctly if the line
let mut itr = s.char_indices()...
is replaced with
use unicode_segmentation::UnicodeSegmentation;
let mut itr = s.grapheme_indices(true)...
Then the following also works:
assert_eq!(Ok("y̆"), substr("y̆es", 0, Some(1)));
Knowing about the various syntaxes of the slice type might be beneficial for some of the readers.
Reference to a part of a string (the indices are byte offsets, and slicing panics if an index does not fall on a char boundary):
&s[6..11]
If you start at index 0, you can omit the start value:
&s[0..1] is equivalent to &s[..1]
Likewise, if your substring includes the last byte of the string, you can omit the end value:
&s[3..s.len()] is equivalent to &s[3..]
This also applies when the slice encompasses the entire string:
&s[..]
You can also use the inclusive range operator (..=) to include the end index:
&s[..=1]
Link to docs: https://doc.rust-lang.org/book/ch04-03-slices.html
I would suggest you use the crate substring. (And look at its source code if you want to learn how to do this properly.)
I couldn't find the exact substr implementation that I'm familiar with from other programming languages like JavaScript, Dart, etc.
Here is a possible implementation of a substr method for &str.
First, define a trait so that methods can be added to existing types (like extension methods in Dart):
trait Substr {
fn substr(&self, start: usize, end: usize) -> String;
}
Then implement this trait for &str (note that the second argument is an exclusive end index, not a length as in JavaScript's substr):
impl<'a> Substr for &'a str {
fn substr(&self, start: usize, end: usize) -> String {
if start > end || start == end {
return String::new();
}
self.chars().skip(start).take(end - start).collect()
}
}
Try:
fn main() {
let string = "Hello, world!";
let substring = string.substr(0, 4);
println!("{}", substring); // Hell
}
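You can also index the String returned by .to_string() with a byte range: .to_string()[<range>].
This example slices a temporary copy created by to_string(), so the borrow refers to that copy rather than to s. That is why s can still be mutated afterwards while the slice keeps the original text.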
let mut s: String = "Hello, world!".to_string();
let substring: &str = &s.to_string()[..6];
s.replace_range(..6, "Goodbye,");
println!("{} {} universe!", s, substring);
// Goodbye, world! Hello, universe!
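I'm not very experienced in Rust, but I gave it a try. If someone can correct my answer, please don't hesitate.
fn substring(string: &str, start: usize, end: usize) -> String {
    // `end` is inclusive: substring("hello", 1, 3) is "ell".
    // Indices past the end of the string are clamped instead of panicking.
    if end < start {
        return String::new();
    }
    // A single pass over `chars()`; calling `nth` for every position would
    // rescan the string from the start each time.
    string.chars().skip(start).take((end - start).saturating_add(1)).collect()
}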

How to implement trim for Vec<u8>?

Rust provides a trim method for strings: str.trim() removes leading and trailing whitespace. I want a method that does the same for byte strings. It should take a Vec<u8> and remove leading and trailing whitespace (space, 0x20, and horizontal tab, 0x09).
Writing a trim_left() is easy: you can just use an iterator with skip_while():
fn main() {
let a: &[u8] = b" fo o ";
let b: Vec<u8> = a.iter().map(|x| x.clone()).skip_while(|x| x == &0x20 || x == &0x09).collect();
println!("{:?}", b);
}
But to trim the trailing characters, I would need to look ahead to check whether any non-whitespace byte follows the whitespace that was found.
Here's an implementation that returns a slice, rather than a new Vec<u8>, as str::trim() does. It's also implemented on [u8], since that's more general than Vec<u8> (you can obtain a slice from a vector cheaply, but creating a vector from a slice is more costly, since it involves a heap allocation and a copy).
trait SliceExt {
fn trim(&self) -> &Self;
}
impl SliceExt for [u8] {
fn trim(&self) -> &[u8] {
fn is_whitespace(c: &u8) -> bool {
*c == b'\t' || *c == b' '
}
fn is_not_whitespace(c: &u8) -> bool {
!is_whitespace(c)
}
if let Some(first) = self.iter().position(is_not_whitespace) {
if let Some(last) = self.iter().rposition(is_not_whitespace) {
&self[first..last + 1]
} else {
unreachable!();
}
} else {
&[]
}
}
}
fn main() {
let a = b" fo o ";
let b = a.trim();
println!("{:?}", b);
}
If you really need a Vec<u8> after the trim(), you can just call into() on the slice to turn it into a Vec<u8>.
fn main() {
let a = b" fo o ";
let b: Vec<u8> = a.trim().into();
println!("{:?}", b);
}
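This is a much simpler version than the other answers. Note that is_ascii_whitespace also treats \n, \r and form feed (0x0C) as whitespace, not just space and tab.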
pub fn trim_ascii_whitespace(x: &[u8]) -> &[u8] {
let from = match x.iter().position(|x| !x.is_ascii_whitespace()) {
Some(i) => i,
None => return &x[0..0],
};
let to = x.iter().rposition(|x| !x.is_ascii_whitespace()).unwrap();
&x[from..=to]
}
Weird that this wasn't in the standard library when I wrote this. I would have thought it was a common task.
Anyway here it is as a complete file/trait (with tests!) that you can copy/paste.
/// Trait to allow trimming ascii whitespace from a &[u8].
pub trait TrimAsciiWhitespace {
/// Trim ascii whitespace (based on `is_ascii_whitespace()`) from the
/// start and end of a slice.
fn trim_ascii_whitespace(&self) -> &[u8];
}
// Implemented on `[u8]` itself so that byte-string literals (`&[u8; N]`),
// `&[u8]` and `Vec<u8>` all reach it through auto-deref and unsizing.
impl TrimAsciiWhitespace for [u8] {
fn trim_ascii_whitespace(&self) -> &[u8] {
let from = match self.iter().position(|x| !x.is_ascii_whitespace()) {
Some(i) => i,
None => return &self[0..0],
};
let to = self.iter().rposition(|x| !x.is_ascii_whitespace()).unwrap();
&self[from..=to]
}
}
#[cfg(test)]
mod test {
use super::TrimAsciiWhitespace;
#[test]
fn basic_trimming() {
assert_eq!(b" A ".trim_ascii_whitespace(), b"A");
assert_eq!(b" AB ".trim_ascii_whitespace(), b"AB");
assert_eq!(b"A ".trim_ascii_whitespace(), b"A");
assert_eq!(b"AB ".trim_ascii_whitespace(), b"AB");
assert_eq!(b" A".trim_ascii_whitespace(), b"A");
assert_eq!(b" AB".trim_ascii_whitespace(), b"AB");
assert_eq!(b" A B ".trim_ascii_whitespace(), b"A B");
assert_eq!(b"A B ".trim_ascii_whitespace(), b"A B");
assert_eq!(b" A B".trim_ascii_whitespace(), b"A B");
assert_eq!(b" ".trim_ascii_whitespace(), b"");
assert_eq!(b"   ".trim_ascii_whitespace(), b"");
}
}
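Newer toolchains do cover this: byte slices gained trim_ascii, trim_ascii_start and trim_ascii_end (stabilized in Rust 1.80, as far as I know, so this sketch assumes at least that version). They use the same is_ascii_whitespace definition as the trait above, so they also strip \n, \r and form feed, not only space and tab. trim_to_vec is a hypothetical helper for when an owned Vec<u8> is required:

```rust
/// Hypothetical helper: trims ASCII whitespace and copies the result.
fn trim_to_vec(bytes: &[u8]) -> Vec<u8> {
    // `trim_ascii` returns a sub-slice borrowed from `bytes`, like `str::trim`.
    bytes.trim_ascii().to_vec()
}

fn main() {
    let raw: Vec<u8> = b"\t fo o \r\n".to_vec();
    println!("{:?}", String::from_utf8_lossy(raw.trim_ascii())); // "fo o"
    println!("{:?}", trim_to_vec(&raw)); // [102, 111, 32, 111]
}
```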
All we have to do is find the index of the first non-whitespace character, one time counting forward from the start, and another time counting backwards from the end.
fn is_not_whitespace(e: &u8) -> bool {
*e != 0x20 && *e != 0x09
}
fn main() {
let a: &[u8] = b" fo o ";
// find the index of first non-whitespace char
let begin = a.iter()
.position(is_not_whitespace);
// find the index of the last non-whitespace char
let end = a.iter()
.rev()
.position(is_not_whitespace)
.map(|j| a.len() - j);
// build it
let vec: Vec<u8> = begin.and_then(|i| end.map(|j| a[i..j].to_vec()))
.unwrap_or_default();
println!("{:?}", vec);
}
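If you already own the Vec<u8> and want to trim it in place, reusing its allocation instead of returning a slice or building a new vector, one possible sketch (trim_in_place is a hypothetical name; it trims only space and tab, as the question asks) pops trailing whitespace and then drains the leading run:

```rust
/// Hypothetical helper: trims spaces and tabs from both ends of `v` in place.
fn trim_in_place(v: &mut Vec<u8>) {
    fn is_ws(b: u8) -> bool {
        b == b' ' || b == b'\t'
    }
    // Pop trailing whitespace first, so the drain below shifts fewer bytes.
    while matches!(v.last(), Some(&b) if is_ws(b)) {
        v.pop();
    }
    // Count leading whitespace, then remove it by shifting the rest down.
    let leading = v.iter().position(|&b| !is_ws(b)).unwrap_or(v.len());
    v.drain(..leading);
}

fn main() {
    let mut v = b" \tfo o  ".to_vec();
    trim_in_place(&mut v);
    println!("{:?}", String::from_utf8_lossy(&v)); // "fo o"
}
```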
