Take N elements, keeping (but not counting) the elements that satisfy a predicate - Rust

I have this vec of strings:
vec![
"import a\n",
"\n",
"\n",
"b = 1 + 2\n",
"\n",
"print(b)\n",
"print(b + 1)\n",
"\n"
];
I want to take the first 3 lines that are not "\n", while also keeping all the "\n" lines between them, so that the result would be:
vec![
"import a\n",
"\n",
"\n",
"b = 1 + 2\n",
"\n",
"print(b)\n"
];
Ideally, it could be done like this:
lines.take_n_saving(3, |line| line == "\n")

Vec::retain() can do this in-place, but you need an external counter (captured by the closure).
fn main() {
let mut lines = vec![
"import a\n",
"\n",
"\n",
"b = 1 + 2\n",
"\n",
"print(b)\n",
"print(b + 1)\n",
"\n",
];
let mut keep = 3;
lines.retain(|l| {
let result = keep > 0;
if *l != "\n" {
keep -= 1;
}
result
});
println!("{:?}", lines);
}
/*
["import a\n", "\n", "\n", "b = 1 + 2\n", "\n", "print(b)\n"]
*/

You can use Iterator::filter. Below is an example:
fn take_n_saving<'a, F: Fn(&str) -> bool>(n: usize, input: Vec<&'a str>, f: F) -> Vec<&'a str> {
    let mut count = 0;
    input
        .into_iter()
        .filter(|x| {
            // Once `n` counted lines have been kept, drop everything else
            if count >= n {
                return false;
            }
            // Only lines that do NOT satisfy `f` count toward `n`
            if !f(*x) {
                count += 1;
            }
            true
        })
        .collect()
}
fn main() {
let input = vec![
"import a\n",
"\n",
"\n",
"b = 1 + 2\n",
"\n",
"print(b)\n",
"print(b + 1)\n",
"\n",
];
let output: Vec<_> = take_n_saving(3, input, |x: &str| x == "\n");
println!("{:?}", output);
}
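The filter version keeps calling its closure for every remaining element even after the limit is reached. A lazy variant built on Iterator::take_while stops as soon as the n-th counted line has been yielded and accepts any iterable, not just a Vec<&str>. This is a sketch; the name take_n_saving_lazy and the is_free parameter are hypothetical, not part of the standard library:

```rust
/// Hypothetical helper: yields items until `n` items for which `is_free`
/// returns false have been yielded. Items where `is_free` returns true are
/// kept but not counted.
fn take_n_saving_lazy<I, F>(iter: I, n: usize, mut is_free: F) -> impl Iterator<Item = I::Item>
where
    I: IntoIterator,
    F: FnMut(&I::Item) -> bool,
{
    let mut remaining = n;
    iter.into_iter().take_while(move |item| {
        if remaining == 0 {
            // Limit reached: stop instead of scanning the rest of the input
            return false;
        }
        if !is_free(item) {
            remaining -= 1;
        }
        true
    })
}
```

Called as take_n_saving_lazy(lines, 3, |l| *l == "\n").collect::<Vec<_>>(), it produces the same six lines as the retain and filter versions.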

I just searched through the Rust source code and found that .take() uses the Take struct under the hood. I think Take could be treated as a special case of a more general adapter, something like TakePred:
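struct TakePred<I, P> {
    iter: I,
    n: usize,
    pred: P,
}
impl<I, P> Iterator for TakePred<I, P>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        if self.n == 0 {
            return None;
        }
        let elem = self.iter.next()?;
        // Elements matching `pred` are yielded but not counted toward `n`
        if !(self.pred)(&elem) {
            self.n -= 1;
        }
        Some(elem)
    }
}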
So the solution to my problem would be:
lines.take_pred(3, |line| line == "\n")
And Take could then be expressed as (pseudocode), with a predicate that never matches so that every element counts:
Take(n) = TakePred(n, pred: |_| false)
But it is just an idea, not an exact solution.

Related

Use par_split on a String, process using rayon and collect result in a Vector

I am trying to read a file into the string messages. The file contains several blocks, each starting with a number. After I read the file contents into messages, blocks are separated by a newline and the lines within a block are separated by __SEP__. I would like to use par_split() on messages, process each block in parallel using rayon, and collect the output of each block into a vector vec_final, e.g. by calling collect() at the line marked "line 54" or via some similar mechanism, so that the result contains the vec_local produced by each block. Any pointers on how I can achieve this are highly appreciated.
My code is as follows:
use rayon::prelude::*; // brings par_split and par_lines into scope
fn starts_with_digit_or_at_sign(inp: &str) -> bool {
let mut at_sign_found = false;
if inp.len() > 0 {
let ch = inp.chars().next().unwrap();
if ch.is_numeric() || ch == '#' {
return true;
}
}
return false;
}
fn main() {
let filepath = "inp.log";
let data = std::fs::read_to_string(filepath).expect("file not found!");
let mut messages: String = String::from("");
let separator_char = '\n';
let separator: String = String::from("__SEP__");
let mut found_first_message = false;
let mut start_of_new_msg = false;
let mut line_num = 0;
for line in data.lines() {
line_num += 1;
if line.len() > 0 {
if starts_with_digit_or_at_sign(line) {
start_of_new_msg = true;
if !found_first_message {
found_first_message = true;
} else {
messages.push(separator_char);
}
}
if found_first_message {
if !start_of_new_msg {
messages.push_str(&separator);
}
messages.push_str(line);
if start_of_new_msg {
start_of_new_msg = false;
let mut tmp = String::from("Lnumber ");
tmp.push_str(&line_num.to_string());
messages.push_str(&separator);
messages.push_str(&tmp);
}
}
}
}
messages.par_split(separator_char).for_each(|l| {
println!(
"line: '{}' len: {}, {}",
l,
l.len(),
rayon::current_num_threads()
);
let vec_local: Vec<i32> = vec![l.len() as i32];
}); // <-- line 54
}
Output produced by the code is as follows:
line: '1__SEP__Lnumber 1__SEP__a__SEP__b__SEP__c' len: 41, 8
line: '3__SEP__Lnumber 9__SEP__g__SEP__h__SEP__i' len: 41, 8
line: '2__SEP__Lnumber 5__SEP__d__SEP__e__SEP__f' len: 41, 8
line: '4__SEP__Lnumber 13__SEP__j__SEP__k__SEP__l' len: 42, 8
File inp.log is as follows:
1
a
b
c
2
d
e
f
3
g
h
i
4
j
k
l
I was able to resolve the issue by using par_lines() instead as follows:
let tmp: Vec<_> = messages.par_lines().map(|l| proc_len(l)).collect();
...
...
...
fn proc_len(inp: &str) -> Vec<usize> {
let vec: Vec<usize> = vec![inp.len()];
return vec;
}

How to make this iterator/for loop idiomatic in Rust

How can I get rid of the ugly let mut i = 0; and i += 1;? Is there a more idiomatic way to write this loop? I tried .enumerate() but it doesn't work on a &[&str].
use std::fmt::Write;
pub fn build_proverb(list: &[&str]) -> String {
let mut proverb = String::new();
let mut i = 0;
for word in list {
i += 1;
if i < list.len() {
write!(proverb, "For want of a {} the {} was lost.\n", word, list[i]);
} else {
write!(proverb, "And all for the want of a {}.", list[0]);
}
}
proverb
}
You were close: enumerate() isn't available on &[&str] because enumerate() is a method of Iterator, and &[&str] is not an Iterator.
But you can call list.iter() to get an iterator, and then call enumerate() on that.
Full example:
use std::fmt::Write;
pub fn build_proverb(list: &[&str]) -> String {
    let mut proverb = String::new();
    for (i, word) in list.iter().enumerate() {
        // `enumerate` starts at 0 and `i` is not incremented first, so the next word is at `i + 1`
        if i + 1 < list.len() {
            writeln!(proverb, "For want of a {} the {} was lost.", word, list[i + 1]).unwrap();
        } else {
            write!(proverb, "And all for the want of a {}.", list[0]).unwrap();
        }
    }
    proverb
}
The fact you have word and list[i] (after incrementing i) makes it look like you want something like windows:
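use std::fmt::Write;
pub fn build_proverb(list: &[&str]) -> String {
    let mut proverb = String::new();
    // Each window is a pair of adjacent words
    for words in list.windows(2) {
        writeln!(proverb, "For want of a {} the {} was lost.", words[0], words[1]).unwrap();
    }
    // `list[0]` would panic on an empty list, so only write the closing line if there is a first word
    if let Some(first) = list.first() {
        write!(proverb, "And all for the want of a {}.", first).unwrap();
    }
    proverb
}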
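The windows approach can also be written as a single iterator chain with no mutable String and no fmt::Write: map each pair to a line, chain the closing line with std::iter::once, and collect into a String. This is a sketch, and it assumes an empty list should produce an empty proverb:

```rust
pub fn build_proverb(list: &[&str]) -> String {
    // Assumption: an empty list yields an empty proverb
    let first = match list.first() {
        Some(word) => *word,
        None => return String::new(),
    };
    list.windows(2)
        // One line per adjacent pair of words
        .map(|pair| format!("For want of a {} the {} was lost.\n", pair[0], pair[1]))
        // The closing line always refers back to the first word
        .chain(std::iter::once(format!("And all for the want of a {}.", first)))
        .collect()
}
```

Collecting works because String implements FromIterator<String>.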

Is there a way to update a string in place in rust?

Put another way: is it possible to URLify a string in place in Rust?
For example,
Problem statement: Replace whitespace with %20
Assumption: String will have enough capacity left to accommodate new characters.
Input: Hello how are you
Output: Hello%20how%20are%20you
I know there are ways to do this if we don't have to do this "in place". I am solving a problem that explicitly states that you have to update in place.
If there isn't any safe way to do this, is there any particular reason behind that?
[Edit]
I was able to solve this using an unsafe approach, but would appreciate a better, more idiomatic approach if there is one.
fn space_20(sentence: &mut String) {
if !sentence.is_ascii() {
panic!("Invalid string");
}
let chars: Vec<usize> = sentence.char_indices().filter(|(_, ch)| ch.is_whitespace()).map(|(idx, _)| idx ).collect();
let char_count = chars.len();
if char_count == 0 {
return;
}
let sentence_len = sentence.len();
sentence.push_str(&"*".repeat(char_count*2)); // filling string with * so that bytes array becomes of required size.
unsafe {
let bytes = sentence.as_bytes_mut();
let mut final_idx = sentence_len + (char_count * 2) - 1;
let mut i = sentence_len - 1;
let mut char_ptr = char_count - 1;
loop {
if i != chars[char_ptr] {
bytes[final_idx] = bytes[i];
if final_idx == 0 {
// all elements are filled.
println!("all elements are filled.");
break;
}
final_idx -= 1;
} else {
bytes[final_idx] = '0' as u8;
bytes[final_idx - 1] = '2' as u8;
bytes[final_idx - 2] = '%' as u8;
// final_idx is of type usize cannot be less than 0.
if final_idx < 3 {
println!("all elements are filled at start.");
break;
}
final_idx -= 3;
// char_ptr is of type usize cannot be less than 0.
if char_ptr > 0 {
char_ptr -= 1;
}
}
if i == 0 {
// all elements are parsed.
println!("all elements are parsed.");
break;
}
i -= 1;
}
}
}
fn main() {
let mut sentence = String::with_capacity(1000);
sentence.push_str(" hello, how are you?");
// sentence.push_str("hello, how are you?");
// sentence.push_str(" hello, how are you? ");
// sentence.push_str(" ");
// sentence.push_str("abcd");
space_20(&mut sentence);
println!("{}", sentence);
}
An O(n) solution that neither uses unsafe nor allocates (provided that the string has enough capacity), using std::mem::take:
fn urlify_spaces(text: &mut String) {
const SPACE_REPLACEMENT: &[u8] = b"%20";
// operating on bytes for simplicity
let mut buffer = std::mem::take(text).into_bytes();
let old_len = buffer.len();
let space_count = buffer.iter().filter(|&&byte| byte == b' ').count();
let new_len = buffer.len() + (SPACE_REPLACEMENT.len() - 1) * space_count;
buffer.resize(new_len, b'\0');
let mut write_pos = new_len;
for read_pos in (0..old_len).rev() {
let byte = buffer[read_pos];
if byte == b' ' {
write_pos -= SPACE_REPLACEMENT.len();
buffer[write_pos..write_pos + SPACE_REPLACEMENT.len()]
.copy_from_slice(SPACE_REPLACEMENT);
} else {
write_pos -= 1;
buffer[write_pos] = byte;
}
}
*text = String::from_utf8(buffer).expect("invalid UTF-8 during URL-ification");
}
Basically, it calculates the final length of the string, sets up a reading pointer and a writing pointer, and translates the string from right to left. Since "%20" has more characters than " ", the writing pointer never catches up with the reading pointer.
Is it possible to do this without unsafe?
Yes, like this:
fn main() {
let mut my_string = String::from("Hello how are you");
let mut insert_positions = Vec::new();
let mut char_counter = 0;
for c in my_string.chars() {
if c == ' ' {
insert_positions.push(char_counter);
char_counter += 2; // Because we will insert two extra chars here later.
}
char_counter += 1;
}
for p in insert_positions.iter() {
my_string.remove(*p);
my_string.insert(*p, '0');
my_string.insert(*p, '2');
my_string.insert(*p, '%');
}
println!("{}", my_string);
}
But should you do it?
As discussed, for example, here on Reddit, this is almost never the recommended way of doing it: both remove and insert are O(n) operations, as noted in the documentation, so calling them once per space makes the whole loop quadratic in the worst case.
Edit
A slightly better version:
fn main() {
let mut my_string = String::from("Hello how are you");
let mut insert_positions = Vec::new();
let mut char_counter = 0;
for c in my_string.chars() {
if c == ' ' {
insert_positions.push(char_counter);
char_counter += 2; // Because we will insert two extra chars here later.
}
char_counter += 1;
}
for p in insert_positions.iter() {
my_string.remove(*p);
my_string.insert_str(*p, "%20");
}
println!("{}", my_string);
}
and the corresponding Playground.
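The offset bookkeeping (char_counter += 2) can be avoided by replacing from right to left: a replacement only shifts bytes after it, so offsets to its left stay valid. A sketch using String::replace_range; it assumes only ASCII ' ' needs replacing, and it is still O(n) per replacement like remove/insert, and may reallocate if the capacity is insufficient:

```rust
fn urlify_in_place(s: &mut String) {
    // Byte offsets of every space, collected up front because `s` is mutated below
    let positions: Vec<usize> = s.match_indices(' ').map(|(i, _)| i).collect();
    // Right to left: each replacement only moves bytes that have already been handled
    for &i in positions.iter().rev() {
        s.replace_range(i..i + 1, "%20");
    }
}
```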

How to trim spaces at most n times?

How to eliminate up to n spaces at the beginning of each line?
For example, when trimming 4 spaces:
"     5" -> " 5"
"    4" -> "4"
"   3" -> "3"
const INPUT: &str = "    4\n  2\n0\n\n      6\n";
const OUTPUT: &str = "4\n2\n0\n\n  6\n";
#[test]
fn test_trim_deindent() {
    assert_eq!(&trim_deindent(INPUT, 4), OUTPUT)
}
I was about to suggest textwrap::dedent, but then I noticed the "2" line, which has fewer than 4 spaces. So you want to remove leading spaces, if there are any, up to a maximum of 4.
A quick solution could look something like this.
Your assert will pass, but note that lines ending in \r\n will be converted to \n, as lines() does not provide a way to differentiate between \n and \r\n.
fn trim_deindent(text: &str, max: usize) -> String {
let mut new_text = text
.lines()
.map(|line| {
let mut max = max;
line.chars()
// Skip while `c` is a whitespace and at most `max` spaces
.skip_while(|c| {
if max == 0 {
false
} else {
max -= 1;
c.is_whitespace()
}
})
.collect::<String>()
})
.collect::<Vec<_>>()
.join("\n");
// Did the original `text` end with a `\n` then add it again
if text.ends_with('\n') {
new_text.push('\n');
}
new_text
}
If you want to preserve both \n and \r\n, you can take a more complex route: scan through the string yourself and avoid lines() altogether.
fn trim_deindent(text: &str, max: usize) -> String {
    let mut new_text = String::with_capacity(text.len());
    let mut line_start = 0;
    while line_start < text.len() {
        let rest = &text[line_start..];
        let mut max = max;
        // Byte offset of the first char after at most `max` leading spaces
        // (`char_indices` gives byte offsets, so non-ASCII text slices safely)
        let after_space = rest
            .char_indices()
            .find(|&(_, c)| {
                // We can't use `is_whitespace` here, as that would also skip `\n` and `\r`
                if max == 0 || !is_horizontal_whitespace(c) {
                    true
                } else {
                    max -= 1;
                    false
                }
            })
            .map_or(rest.len(), |(i, _)| i);
        let line = &rest[after_space..];
        // Keep everything up to and including `\n`, or the remainder if it's the last line
        let end = line.find('\n').map_or(line.len(), |i| i + 1);
        new_text.push_str(&line[..end]);
        line_start += after_space + end;
    }
    new_text
}
#[inline]
fn is_horizontal_whitespace(c: char) -> bool {
(c != '\r') && (c != '\n') && c.is_whitespace()
}
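Since Rust 1.51, str::split_inclusive keeps the '\n' at the end of each piece, so both \n and \r\n endings survive without manual index bookkeeping. A sketch, assuming only ASCII spaces and tabs count as indentation; the name dedent_at_most is hypothetical:

```rust
fn dedent_at_most(text: &str, max: usize) -> String {
    text.split_inclusive('\n')
        .map(|line| {
            // Count leading ASCII spaces/tabs, looking at no more than `max` bytes
            let n = line
                .bytes()
                .take(max)
                .take_while(|&b| b == b' ' || b == b'\t')
                .count();
            // Slicing at `n` is safe: every skipped byte is a one-byte ASCII char
            &line[n..]
        })
        .collect()
}
```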

Split a string keeping the separators

Is there a trivial way to split a string keeping the separators?
Instead of this:
let texte = "Ten. Million. Questions. Let's celebrate all we've done together.";
let v: Vec<&str> = texte.split(|c: char| !(c.is_alphanumeric() || c == '\'')).filter(|s| !s.is_empty()).collect();
which results in ["Ten", "Million", "Questions", "Let's", "celebrate", "all", "we've", "done", "together"].
I would like something that gives me:
["Ten", ".", " ", "Million", ".", " ", "Questions", ".", " ", "Let's", " ", "celebrate", " ", "all", " ", "we've", " ", "done", " ", "together", "."].
I am trying this kind of code (it assumes the string begins with a letter and ends with a non-letter):
let texte = "Ten. Million. Questions. Let's celebrate all we've done together. ";
let v1: Vec<&str> = texte.split(|c: char| !(c.is_alphanumeric() || c == '\'')).filter(|s| !s.is_empty()).collect();
let v2: Vec<&str> = texte.split(|c: char| c.is_alphanumeric() || c == '\'').filter(|s| !s.is_empty()).collect();
let mut w: Vec<&str> = Vec::new();
let mut j = 0;
for i in v2 {
w.push(v1[j]);
w.push(i);
j = j+1;
}
It gives me almost the result I described earlier, which is good enough:
["Ten", ". ", "Million", ". ", "Questions", ". ", "Let's", " ", "celebrate", " ", "all", " ", "we've", " ", "done", " ", "together", "."]
However, is there a better way to write this? I tried to enumerate over v2 but it didn't work, and using j in the for loop looks rough.
Using str::match_indices:
let text = "Ten. Million. Questions. Let's celebrate all we've done together.";
let mut result = Vec::new();
let mut last = 0;
for (index, matched) in text.match_indices(|c: char| !(c.is_alphanumeric() || c == '\'')) {
if last != index {
result.push(&text[last..index]);
}
result.push(matched);
last = index + matched.len();
}
if last < text.len() {
result.push(&text[last..]);
}
println!("{:?}", result);
Prints:
["Ten", ".", " ", "Million", ".", " ", "Questions", ".", " ", "Let's", " ", "celebrate", " ", "all", " ", "we've", " ", "done", " ", "together", "."]
str::split_inclusive, available since Rust 1.51, returns an iterator keeping the delimiters as part of the matched strings, and may be useful in certain cases:
#[test]
fn split_with_delimiter() {
let items: Vec<_> = "alpha,beta;gamma"
.split_inclusive(&[',', ';'][..])
.collect();
assert_eq!(&items, &["alpha,", "beta;", "gamma"]);
}
#[test]
fn split_with_delimiter_allows_consecutive_delimiters() {
let items: Vec<_> = ",;".split_inclusive(&[',', ';'][..]).collect();
assert_eq!(&items, &[",", ";"]);
}
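When every delimiter is a single char, the split_inclusive output can be post-processed into separate pieces: each piece either ends with its delimiter (split that last char off) or is the undelimited remainder. A sketch on stable Rust; split_keep is a hypothetical name, and the Copy bound lets the same non-capturing closure serve as the pattern and as the check:

```rust
fn split_keep<F>(text: &str, is_delim: F) -> Vec<&str>
where
    F: Fn(char) -> bool + Copy,
{
    let mut out = Vec::new();
    for piece in text.split_inclusive(is_delim) {
        // A piece ends with its delimiter unless it is the final remainder
        match piece.char_indices().next_back() {
            Some((i, c)) if is_delim(c) => {
                if i > 0 {
                    out.push(&piece[..i]); // the text before the delimiter
                }
                out.push(&piece[i..]); // the delimiter itself
            }
            _ => out.push(piece),
        }
    }
    out
}
```

With the question's predicate, split_keep(text, |c: char| !(c.is_alphanumeric() || c == '\'')) gives the same list as the match_indices version.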
I was not able to find anything in the standard library, so I wrote my own:
This version uses the unstable pattern API, as it's more flexible, but the link above has a fallback that I've hardcoded for my specific stable use case.
#![feature(pattern)]
use std::str::pattern::{Pattern, Searcher};
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum SplitType<'a> {
Match(&'a str),
Delimiter(&'a str),
}
pub struct SplitKeepingDelimiter<'p, P>
where
P: Pattern<'p>,
{
searcher: P::Searcher,
start: usize,
saved: Option<usize>,
}
impl<'p, P> Iterator for SplitKeepingDelimiter<'p, P>
where
P: Pattern<'p>,
{
type Item = SplitType<'p>;
fn next(&mut self) -> Option<Self::Item> {
if self.start == self.searcher.haystack().len() {
return None;
}
if let Some(end_of_match) = self.saved.take() {
let s = &self.searcher.haystack()[self.start..end_of_match];
self.start = end_of_match;
return Some(SplitType::Delimiter(s));
}
match self.searcher.next_match() {
Some((start, end)) => {
if self.start == start {
let s = &self.searcher.haystack()[start..end];
self.start = end;
Some(SplitType::Delimiter(s))
} else {
let s = &self.searcher.haystack()[self.start..start];
self.start = start;
self.saved = Some(end);
Some(SplitType::Match(s))
}
}
None => {
let s = &self.searcher.haystack()[self.start..];
self.start = self.searcher.haystack().len();
Some(SplitType::Match(s))
}
}
}
}
pub trait SplitKeepingDelimiterExt: ::std::ops::Index<::std::ops::RangeFull, Output = str> {
fn split_keeping_delimiter<P>(&self, pattern: P) -> SplitKeepingDelimiter<P>
where
P: for<'a> Pattern<'a>,
{
SplitKeepingDelimiter {
searcher: pattern.into_searcher(&self[..]),
start: 0,
saved: None,
}
}
}
impl SplitKeepingDelimiterExt for str {}
#[cfg(test)]
mod test {
use super::SplitKeepingDelimiterExt;
#[test]
fn split_with_delimiter() {
use super::SplitType::*;
let delims = &[',', ';'][..];
let items: Vec<_> = "alpha,beta;gamma".split_keeping_delimiter(delims).collect();
assert_eq!(
&items,
&[
Match("alpha"),
Delimiter(","),
Match("beta"),
Delimiter(";"),
Match("gamma")
]
);
}
#[test]
fn split_with_delimiter_allows_consecutive_delimiters() {
use super::SplitType::*;
let delims = &[',', ';'][..];
let items: Vec<_> = ",;".split_keeping_delimiter(delims).collect();
assert_eq!(&items, &[Delimiter(","), Delimiter(";")]);
}
}
You'll note that I needed to track whether something was a delimiter or not, but that should be easy to adapt if you don't need it.
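If nightly is not an option, the same Match/Delimiter interface can be approximated on stable Rust by restricting the pattern to a char predicate instead of the unstable Pattern trait. This is a sketch under that assumption: each delimiter char is yielded on its own, and the free function split_keeping_delimiter is a hypothetical stand-in for the extension-trait method:

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum SplitType<'a> {
    Match(&'a str),
    Delimiter(&'a str),
}

pub struct SplitKeepingDelimiter<'a, F> {
    rest: &'a str,
    is_delim: F,
}

impl<'a, F: Fn(char) -> bool> Iterator for SplitKeepingDelimiter<'a, F> {
    type Item = SplitType<'a>;

    fn next(&mut self) -> Option<Self::Item> {
        let rest = self.rest;
        // `?` ends the iteration once the input is exhausted
        let first = rest.chars().next()?;
        let is_delim = &self.is_delim;
        if is_delim(first) {
            // Each delimiter char is yielded on its own
            let (delim, tail) = rest.split_at(first.len_utf8());
            self.rest = tail;
            Some(SplitType::Delimiter(delim))
        } else {
            // A run of non-delimiter chars up to the next delimiter or the end
            let end = rest.find(|c: char| is_delim(c)).unwrap_or(rest.len());
            let (piece, tail) = rest.split_at(end);
            self.rest = tail;
            Some(SplitType::Match(piece))
        }
    }
}

pub fn split_keeping_delimiter<F: Fn(char) -> bool>(s: &str, is_delim: F) -> SplitKeepingDelimiter<'_, F> {
    SplitKeepingDelimiter { rest: s, is_delim }
}
```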

Resources