How do I get a substring between two patterns in Rust?

I want to create a substring in Rust. It starts with an occurrence of a string and ends at the end of the string minus four characters or at a certain character.
My first approach was
string[string.find("pattern").unwrap()..string.len()-5]
That is wrong because Rust's strings are valid UTF-8 and thus byte and not char based.
My second approach is correct but too verbose:
let start_bytes = string.find("pattern").unwrap();
let mut char_byte_counter = 0;
let result = string.chars()
    .skip_while(|c| {
        // byte offset at which this char starts
        let char_start = char_byte_counter;
        char_byte_counter += c.len_utf8();
        char_start < start_bytes
    })
    .take_while(|c| *c != '<')
    .collect::<String>();
Are there simpler ways to create substrings? Is there any part of the standard library I did not find?

I don't remember a built-in library function in other languages that works exactly the way you want (give me the substring between two patterns, or between the first and the end if the second does not exist).
I think you would have to write some custom logic anyway.
The closest equivalent to a "substring" function is slicing. However (as you found out), slicing works with byte indices, not with Unicode characters, so you have to be careful with indices: in "Löwe", the 'e' is at byte index 4, not 3. But you can still use slicing in your case, because you are not computing indices by hand: find returns a byte index that always falls on a character boundary, so slicing at it is safe.
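To make the byte-versus-character distinction concrete, here is a small sketch (the helper name byte_and_char_index is made up for illustration). It writes "Löwe" with an explicit \u{f6} escape so that 'ö' is the single precomposed character, which takes two bytes in UTF-8:

```rust
/// Returns (byte offset, char position) of the first `needle` in `s`.
fn byte_and_char_index(s: &str, needle: char) -> (Option<usize>, Option<usize>) {
    // `find` reports a byte offset into the UTF-8 encoding
    let byte_index = s.find(needle);
    // `chars().position` counts Unicode scalar values instead
    let char_index = s.chars().position(|c| c == needle);
    (byte_index, char_index)
}

fn main() {
    let word = "L\u{f6}we"; // "Löwe" with a precomposed 'ö'
    // 'ö' is two bytes long, so 'e' is at byte 4 but at char position 3
    assert_eq!(byte_and_char_index(word, 'e'), (Some(4), Some(3)));
    // Byte 2 lies inside 'ö'; `&word[2..]` would panic
    assert!(!word.is_char_boundary(2));
    // Offsets returned by `find` always sit on a char boundary, so slicing is safe
    let start = word.find('w').unwrap();
    assert_eq!(&word[start..], "we");
}
```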
Here's how you could do it with slicing (as a bonus, the result borrows from line, so no new String has to be allocated):
// adding some Unicode to check that everything also works
// outside of ASCII
let line = "asdfapatterndf1老虎23<12";
// byte index where "pattern" starts, or the beginning of the line if "pattern" is not found
let start_bytes = line.find("pattern").unwrap_or(0);
// byte index of the first "<" after the start, or the end of the line;
// searching only after `start_bytes` keeps an earlier "<" from producing an invalid range
let end_bytes = line[start_bytes..].find('<').map_or(line.len(), |i| start_bytes + i);
let result = &line[start_bytes..end_bytes]; // slicing line returns "patterndf1老虎23"
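The same slicing idea can be wrapped in a reusable function. This is only a sketch, not a standard library API: the name substring_from_to and the choice to return None when the start pattern is missing (instead of falling back to index 0 as above) are assumptions. It uses str::split_once to cut at the first end character after the start:

```rust
/// Returns the part of `line` that starts at the first `start_pat` and ends
/// just before the next `end_char` (or at the end of `line` if there is none).
/// Returns `None` when `start_pat` does not occur.
fn substring_from_to<'a>(line: &'a str, start_pat: &str, end_char: char) -> Option<&'a str> {
    // Byte offset of the pattern; always a valid char boundary
    let start = line.find(start_pat)?;
    // Borrow the rest of the line, no allocation
    let rest = &line[start..];
    // Cut at the first `end_char` after the start, or keep the whole rest
    Some(rest.split_once(end_char).map_or(rest, |(before, _)| before))
}

fn main() {
    let line = "asdfapatterndf1老虎23<12";
    assert_eq!(substring_from_to(line, "pattern", '<'), Some("patterndf1老虎23"));
    // A '<' that appears before the pattern is ignored
    assert_eq!(substring_from_to("<xpattern tail", "pattern", '<'), Some("pattern tail"));
}
```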

Try using something like the following method:
// Returns the text between `start` and `end`, or an empty &str if either is not found
fn between<'a>(source: &'a str, start: &str, end: &str) -> &'a str {
    if let Some(start_position) = source.find(start) {
        let source = &source[start_position + start.len()..];
        let end_position = source.find(end).unwrap_or_default();
        return &source[..end_position];
    }
    ""
}
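One limitation of returning "" is that callers cannot tell "not found" apart from "found, but nothing in between". A sketch of an Option-returning variant (between_opt is a hypothetical name), chaining two str::split_once calls:

```rust
/// Returns the text between the first `start` and the first `end` after it,
/// or `None` if either delimiter is missing.
fn between_opt<'a>(source: &'a str, start: &str, end: &str) -> Option<&'a str> {
    // Everything after the first `start`; `?` returns None if it is missing
    let (_, after_start) = source.split_once(start)?;
    // Everything before the first `end` that follows it
    let (inside, _) = after_start.split_once(end)?;
    Some(inside)
}

fn main() {
    assert_eq!(between_opt("a[b]c", "[", "]"), Some("b"));
    // Found, but empty: distinguishable from "not found"
    assert_eq!(between_opt("a[]c", "[", "]"), Some(""));
    assert_eq!(between_opt("a[b", "[", "]"), None);
}
```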

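This function is approximately O(n) and works on grapheme clusters rather than bytes or chars. It needs the unicode-segmentation crate, which provides graphemes. It works, but I'm not sure whether there are any bugs.
use unicode_segmentation::UnicodeSegmentation;
fn between(str: &String, start: String, end: String, limit_one:bool, ignore_case: bool) -> Vec<String> {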
let mut result:Vec<String> = vec![];
let mut starts = start.graphemes(true);
let mut ends = end.graphemes(true);
let sc = start.graphemes(true).count();
let ec = end.graphemes(true).count();
let mut m = 0;
let mut started:bool = false;
let mut temp = String::from("");
let mut temp2 = String::from("");
for c in str.graphemes(true) {
if started == false {
let opt = starts.next();
match opt {
Some(d) => {
if (ignore_case && c.to_uppercase().cmp(&d.to_uppercase()) == std::cmp::Ordering::Equal) || c == d {
m += 1;
if m == sc {
started = true;
starts = start.graphemes(true);
}
} else {
m = 0;
starts = start.graphemes(true);
}
},
None => {
starts = start.graphemes(true);
let opt = starts.next();
match opt {
Some(e) => {
if (ignore_case && c.to_uppercase().cmp(&e.to_uppercase()) == std::cmp::Ordering::Equal) || c == e {
m += 1;
if m == sc {
started = true;
starts = start.graphemes(true);
}
}
},
None => {}
}
}
}
}
else if started == true {
let opt = ends.next();
match opt {
Some(e) => {
if (ignore_case && c.to_uppercase().cmp(&e.to_uppercase()) == std::cmp::Ordering::Equal) || c == e {
m += 1;
temp2.push_str(e);
}
else {
temp.push_str(&temp2.to_string());
temp2 = String::from("") ;
temp.push_str(c);
ends = end.graphemes(true);
}
},
None => {
ends = end.graphemes(true);
let opt = ends.next();
match opt {
Some(e) => {
if (ignore_case && c.to_uppercase().cmp(&e.to_uppercase()) == std::cmp::Ordering::Equal) || c == e {
m += 1;
temp2.push_str(e);
}
else {
temp.push_str(&temp2.to_string());
temp2 = String::from("") ;
temp.push_str(c);
ends = end.graphemes(true);
}
},
None => {
}
}
}
}
if temp2.graphemes(true).count() == end.graphemes(true).count() {
temp2 = String::from("") ;
result.push(temp);
if limit_one == true { return result; }
started = false;
temp = String::from("") ;
}
}
}
return result;
}
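If grapheme segmentation and case-insensitive matching are not needed, the "collect every match" part can be done with the standard library alone. This sketch (all_between is a hypothetical name) matches start and end as plain substrings, so it compares bytes rather than graphemes and is case-sensitive; taking only the first element of the result corresponds to limit_one:

```rust
/// Collects every non-overlapping substring that lies between a `start` and
/// the next `end` after it, scanning left to right.
fn all_between<'a>(source: &'a str, start: &str, end: &str) -> Vec<&'a str> {
    // Empty delimiters could stop the scan from making progress
    if start.is_empty() || end.is_empty() {
        return Vec::new();
    }
    let mut found = Vec::new();
    let mut rest = source;
    while let Some((_, after_start)) = rest.split_once(start) {
        match after_start.split_once(end) {
            Some((inside, after_end)) => {
                found.push(inside);
                // Continue scanning after the closing delimiter
                rest = after_end;
            }
            // An opening `start` without a closing `end`: stop
            None => break,
        }
    }
    found
}

fn main() {
    assert_eq!(all_between("a<b>c<d>e", "<", ">"), vec!["b", "d"]);
    // Equivalent of limit_one = true
    assert_eq!(all_between("a<b>c<d>e", "<", ">").first(), Some(&"b"));
}
```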

Related

How to convert 2 bounded loop to iteration syntax

How can I convert this loop based implementation to iteration syntax?
fn parse_number<B: AsRef<str>>(input: B) -> Option<u32> {
let mut started = false;
let mut b = String::with_capacity(50);
let radix = 16;
for c in input.as_ref().chars() {
match (started, c.is_digit(radix)) {
(false, false) => {},
(false, true) => {
started = true;
b.push(c);
},
(true, false) => {
break;
}
(true, true) => {
b.push(c);
},
}
}
if b.len() == 0 {
None
} else {
match u32::from_str_radix(b.as_str(), radix) {
Ok(v) => Some(v),
Err(_) => None,
}
}
}
The main problem that I found is that you need to terminate the iterator early and be able to ignore characters until the first numeric char is found.
.map_while() fails because it has no state.
.reduce() and .fold() would iterate over the entire str regardless of whether the number has already ended.
It looks like you want to find the first sequence of digits while ignoring any non-digits before that. You can use a combination of .skip_while and .take_while:
fn parse_number<B: AsRef<str>>(input: B) -> Option<u32> {
let input = input.as_ref();
let radix = 10;
let digits: String = input.chars()
.skip_while(|c| !c.is_digit(radix))
.take_while(|c| c.is_digit(radix))
.collect();
u32::from_str_radix(&digits, radix).ok()
}
fn main() {
dbg!(parse_number("I have 52 apples"));
}
[src/main.rs:14] parse_number("I have 52 apples") = Some(
52,
)
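The same skip-then-take idea also works on byte offsets instead of collecting into a String. This sketch (parse_first_number and its radix parameter are assumptions, not from the question) uses str::find with char predicates to locate the digit run and parses the borrowed slice directly; note that char::is_digit panics for a radix above 36:

```rust
/// Parses the first run of digits (in `radix`) found in `input`.
fn parse_first_number(input: &str, radix: u32) -> Option<u32> {
    // Byte offset of the first digit; `?` returns None if there is none
    let start = input.find(|c: char| c.is_digit(radix))?;
    let rest = &input[start..];
    // The run ends at the first non-digit, or at the end of the input
    let end = rest.find(|c: char| !c.is_digit(radix)).unwrap_or(rest.len());
    // Parse the borrowed slice directly: no intermediate String
    u32::from_str_radix(&rest[..end], radix).ok()
}

fn main() {
    assert_eq!(parse_first_number("I have 52 apples", 10), Some(52));
    assert_eq!(parse_first_number("x1f!", 16), Some(31));
}
```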

How do I convert and calculate a string expression into an arithmetic expression without an external crate?

For example: "500+10-66*32", expected result = 14208 (I do not want operator precedence; the operators should be applied from left to right).
//a = ‘+’, b = ‘-’, c = ‘*’, d = ‘/’, e = ‘(’, f = ‘)’
use std::collections::VecDeque;
fn calculate(s: String) -> i32 {
let mut multi_active = false;
const SPACE: char = ' ';
const SIGN_PLUS: char = '+';
const SIGN_MINUS: char = '-';
const SIGN_MULTIPLY: char = '*';
const SIGN_DIVIDE: char = '/';
const PAREN_OPEN: char = '(';
const PAREN_CLOSED: char = ')';
let len_s: usize = s.len();
let mut num: i32 = 0;
let mut ans: i32 = 0;
let mut sign: i32 = 1;
let mut stk: VecDeque<i32> = VecDeque::with_capacity(len_s);
stk.push_back(sign);
for ch in s.chars() {
println!("chars:{}",ch);
match ch {
'0'.. => {
num = num * 10 + (ch as i32 - '0' as i32);
println!("given numbers:{num}");
}
SIGN_PLUS | SIGN_MINUS => {
// println!("b4 ans = {ans}");
// println!("b4 sig = {sign}");
// println!("b4 num = {num}");
ans += sign * num;
sign = stk.back().unwrap() * if ch == SIGN_PLUS { 1 } else { -1 };
num = 0;
// println!("addition ans = {ans}");
// println!("multiply sig = {sign}");
// println!("multiply num = {num}");
multi_active = false;
}
PAREN_OPEN => {
stk.push_back(sign);
// println!("brak open");
// multi_active = false;
}
PAREN_CLOSED => {
stk.pop_back();
// multi_active = false;
}
SIGN_MULTIPLY => {
println!("b4 ans = {ans}"); //0 always
println!("b4 sig = {sign}"); // 1 always
println!("b4 num = {num}");// 10 first number
// 10 = 0 + 1 * 10
ans = ans + sign * num; // current ans = 10 target=>27
println!("simple multi- {}", ans);
//ans=3;
sign = stk.back().unwrap() * if ch == SIGN_MULTIPLY { 1 } else { -1 };
num = 0;
// println!("multiply ans = {ans}");
// println!("multiply sig = {sign}");
// println!("multiply num = {num}");
multi_active = true;
}
_ => {}
}
}
println!("final:{ans}####{sign}####{num}===={:?}",multi_active);
// if multi_active {
// // ans = (ans-3)*num+ 3;
// ans= ans*num;
// }
// else{
ans = ans + sign * num;
//}
ans
}
fn main() {
let inputs = "2+44+6+1".to_owned();
let outs = calculate(inputs);
println!("{outs}");
}
Expected results
Input: "500+10-66*32"
Result: 14208
I have successfully implemented addition and subtraction, but now I am stuck on evaluating multiplication in strict left-to-right order.
Here's a poor man's calculator. No negative values, no operator precedence, no parentheticals, and no graceful error handling; use at your own risk:
fn main() {
let e = "500+10-66*32";
let ops = ['+', '-', '*', '/'];
let values: Vec<f64> = e.split(&ops).map(|v| v.trim().parse().unwrap()).collect();
let operands: Vec<_> = e.matches(&ops).collect();
let (&(mut curr), values) = values.split_first().unwrap();
for (op, &value) in operands.into_iter().zip(values) {
match op {
"+" => { curr = curr + value },
"-" => { curr = curr - value },
"*" => { curr = curr * value },
"/" => { curr = curr / value },
_ => unreachable!(),
}
}
println!("{}", curr);
}
14208
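A variant of the same left-to-right idea that reports errors instead of panicking could look like the sketch below. It is an assumption-laden rewrite, not the answer's code: it uses i64 with checked arithmetic instead of f64, so division truncates, and overflow or division by zero becomes an Err. Like the answer, it has no precedence, no parentheses and no negative numbers:

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Reads one unsigned decimal integer from the front of `chars`.
fn read_number(chars: &mut Peekable<Chars<'_>>) -> Result<i64, String> {
    let mut value: i64 = 0;
    let mut seen_digit = false;
    // Consume digits for as long as the next char is one
    while let Some(digit) = chars.peek().and_then(|c| c.to_digit(10)) {
        value = value
            .checked_mul(10)
            .and_then(|v| v.checked_add(i64::from(digit)))
            .ok_or("number too large")?;
        seen_digit = true;
        chars.next();
    }
    if seen_digit {
        Ok(value)
    } else {
        Err("expected a number".to_string())
    }
}

/// Evaluates `expr` strictly left to right; whitespace is ignored.
fn eval_left_to_right(expr: &str) -> Result<i64, String> {
    let cleaned: String = expr.chars().filter(|c| !c.is_whitespace()).collect();
    let mut chars = cleaned.chars().peekable();
    let mut acc = read_number(&mut chars)?;
    // After the first number, input alternates: operator, number, operator, ...
    while let Some(op) = chars.next() {
        if !matches!(op, '+' | '-' | '*' | '/') {
            return Err(format!("unexpected character {op:?}"));
        }
        let rhs = read_number(&mut chars)?;
        let next = match op {
            '+' => acc.checked_add(rhs),
            '-' => acc.checked_sub(rhs),
            '*' => acc.checked_mul(rhs),
            _ => acc.checked_div(rhs), // only '/' is left; None on division by zero
        };
        acc = next.ok_or_else(|| format!("overflow or division by zero at {op:?}"))?;
    }
    Ok(acc)
}

fn main() {
    assert_eq!(eval_left_to_right("500+10-66*32"), Ok(14208));
    assert!(eval_left_to_right("1/0").is_err());
}
```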
Apart from a few compile errors (such as the empty char literal ''), you have no code that does the arithmetic. You are only doing lexical manipulation.
I would give more advice, but I am stopped by confusion: I don't know what you are trying to do with the a's and b's, etc.

Rust: why can't I pattern match a mut String Option inside a loop?

I have this code here, where I try to extract some text based on a delimiter:
//not working
let mut text: Option<String> = None;
for d in DELIMITERS {
let split_res = full_text.split_once(d);
if let Some((_, t0)) = split_res {
let t = t0.to_string();
match text {
None => text = Some(t),
Some(t2) => if t.len() < t2.len() {
text = Some(t);
}
}
}
}
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=8e4e0e8d7b2271b8f8ebd126896236ea
But I get these errors from the compiler
^^ value moved here, in previous iteration of loop
note: these 2 reinitializations might get skipped
What's going on here?
Why can't I pattern match on text? Is the problem that my text variable gets consumed? I can't really understand where and how?
Changing the code to use as_ref() on text fixes the error but I don't understand why this is necessary:
//working
let mut text: Option<String> = None;
for d in DELIMITERS {
let split_res = full_text.split_once(d);
if let Some((_, t0)) = split_res {
let t = t0.to_string();
match text.as_ref() {
None => text = Some(t),
Some(t2) => if t.len() < t2.len() {
text = Some(t);
}
}
}
}
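If you do not use as_ref, match text moves the String out of text into t2 in the Some arm. On the iterations where that arm does not reassign text (when t is not shorter), text is left in a moved-from state, so the next iteration of the loop cannot use it; that is what the note about reinitializations that might get skipped refers to. as_ref turns &Option<String> into Option<&String>, so the match only borrows and nothing is moved.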
You can also match on a reference, which borrows text instead of moving out of it:
match &text {}
Or you can take the inner value with Option::take, which moves it out and leaves None behind, so text is never in a moved-from state. Because take empties text, every arm has to put a value back; otherwise the shortest match found so far would be lost:
const DELIMITERS: [&'static str; 2] = [
    "example",
    "not-found",
];
fn main() {
    let full_text = String::from("This is an example test");
    let mut text: Option<String> = None;
    for d in DELIMITERS {
        let split_res = full_text.split_once(d);
        if let Some((_, t0)) = split_res {
            let t = t0.to_string();
            match text.take() {
                None => text = Some(t),
                // put back whichever of the two is shorter
                Some(t2) => text = Some(if t.len() < t2.len() { t } else { t2 }),
            }
        }
    }
    if let Some(t) = text {
        println!("{}", t);
    }
}
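Since the loop only keeps the shortest remainder, it can also be written as an iterator chain, which sidesteps the move question entirely. A sketch (shortest_suffix is a hypothetical name) that returns a borrowed &str; call .to_string() on the result if an owned String is needed. Iterator::min_by_key returns the first of several equally short candidates, which matches the strict < in the loop:

```rust
/// Returns the shortest text that follows any of `delimiters` in `full_text`.
fn shortest_suffix<'a>(full_text: &'a str, delimiters: &[&str]) -> Option<&'a str> {
    delimiters
        .iter()
        // Text after each delimiter that occurs; missing delimiters are skipped
        .filter_map(|&d| full_text.split_once(d).map(|(_, rest)| rest))
        // Shortest remainder; ties keep the earliest delimiter
        .min_by_key(|rest| rest.len())
}

fn main() {
    const DELIMITERS: [&str; 2] = ["example", "not-found"];
    let full_text = String::from("This is an example test");
    if let Some(t) = shortest_suffix(&full_text, &DELIMITERS) {
        println!("{}", t);
    }
}
```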

Is there a way to update a string in place in rust?

Put differently: is it possible to URL-ify a string in place in Rust?
For example,
Problem statement: Replace whitespace with %20
Assumption: String will have enough capacity left to accommodate new characters.
Input: Hello how are you
Output: Hello%20how%20are%20you
I know there are ways to do this if we don't have to do this "in place". I am solving a problem that explicitly states that you have to update in place.
If there isn't any safe way to do this, is there any particular reason behind that?
[Edit]
I was able to solve this using an unsafe approach, but I would appreciate a better, more idiomatic approach if there is one.
fn space_20(sentence: &mut String) {
if !sentence.is_ascii() {
panic!("Invalid string");
}
let chars: Vec<usize> = sentence.char_indices().filter(|(_, ch)| ch.is_whitespace()).map(|(idx, _)| idx ).collect();
let char_count = chars.len();
if char_count == 0 {
return;
}
let sentence_len = sentence.len();
sentence.push_str(&"*".repeat(char_count*2)); // filling string with * so that bytes array becomes of required size.
unsafe {
let bytes = sentence.as_bytes_mut();
let mut final_idx = sentence_len + (char_count * 2) - 1;
let mut i = sentence_len - 1;
let mut char_ptr = char_count - 1;
loop {
if i != chars[char_ptr] {
bytes[final_idx] = bytes[i];
if final_idx == 0 {
// all elements are filled.
println!("all elements are filled.");
break;
}
final_idx -= 1;
} else {
bytes[final_idx] = '0' as u8;
bytes[final_idx - 1] = '2' as u8;
bytes[final_idx - 2] = '%' as u8;
// final_idx is of type usize cannot be less than 0.
if final_idx < 3 {
println!("all elements are filled at start.");
break;
}
final_idx -= 3;
// char_ptr is of type usize cannot be less than 0.
if char_ptr > 0 {
char_ptr -= 1;
}
}
if i == 0 {
// all elements are parsed.
println!("all elements are parsed.");
break;
}
i -= 1;
}
}
}
fn main() {
let mut sentence = String::with_capacity(1000);
sentence.push_str(" hello, how are you?");
// sentence.push_str("hello, how are you?");
// sentence.push_str(" hello, how are you? ");
// sentence.push_str(" ");
// sentence.push_str("abcd");
space_20(&mut sentence);
println!("{}", sentence);
}
An O(n) solution that neither uses unsafe nor allocates (provided that the string has enough capacity), using std::mem::take:
fn urlify_spaces(text: &mut String) {
const SPACE_REPLACEMENT: &[u8] = b"%20";
// operating on bytes for simplicity
let mut buffer = std::mem::take(text).into_bytes();
let old_len = buffer.len();
let space_count = buffer.iter().filter(|&&byte| byte == b' ').count();
let new_len = buffer.len() + (SPACE_REPLACEMENT.len() - 1) * space_count;
buffer.resize(new_len, b'\0');
let mut write_pos = new_len;
for read_pos in (0..old_len).rev() {
let byte = buffer[read_pos];
if byte == b' ' {
write_pos -= SPACE_REPLACEMENT.len();
buffer[write_pos..write_pos + SPACE_REPLACEMENT.len()]
.copy_from_slice(SPACE_REPLACEMENT);
} else {
write_pos -= 1;
buffer[write_pos] = byte;
}
}
*text = String::from_utf8(buffer).expect("invalid UTF-8 during URL-ification");
}
Basically, it calculates the final length of the string, sets up a reading pointer and a writing pointer, and translates the string from right to left. Since "%20" has more characters than " ", the writing pointer never catches up with the reading pointer.
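To check the "no allocation" claim, the sketch below repeats urlify_spaces from this answer (with comments added) so that it compiles on its own, and compares the heap pointer before and after the call. This only holds when the String was created with enough spare capacity, as the question assumes; otherwise resize has to reallocate:

```rust
fn urlify_spaces(text: &mut String) {
    const SPACE_REPLACEMENT: &[u8] = b"%20";
    // Take ownership of the buffer; `text` is briefly an empty String
    let mut buffer = std::mem::take(text).into_bytes();
    let old_len = buffer.len();
    let space_count = buffer.iter().filter(|&&byte| byte == b' ').count();
    let new_len = old_len + (SPACE_REPLACEMENT.len() - 1) * space_count;
    // Does not reallocate when the capacity already covers `new_len`
    buffer.resize(new_len, b'\0');
    let mut write_pos = new_len;
    // Copy right to left so unread bytes are never overwritten
    for read_pos in (0..old_len).rev() {
        let byte = buffer[read_pos];
        if byte == b' ' {
            write_pos -= SPACE_REPLACEMENT.len();
            buffer[write_pos..write_pos + SPACE_REPLACEMENT.len()]
                .copy_from_slice(SPACE_REPLACEMENT);
        } else {
            write_pos -= 1;
            buffer[write_pos] = byte;
        }
    }
    // Reuses the Vec's allocation; only validates, does not copy
    *text = String::from_utf8(buffer).expect("invalid UTF-8 during URL-ification");
}

fn main() {
    let mut sentence = String::with_capacity(64);
    sentence.push_str("Hello how are you");
    let buffer_before = sentence.as_ptr();
    urlify_spaces(&mut sentence);
    assert_eq!(sentence, "Hello%20how%20are%20you");
    // Same heap buffer before and after: nothing was reallocated
    assert_eq!(sentence.as_ptr(), buffer_before);
}
```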
Is it possible to do this without unsafe?
Yes, like this (positions are tracked as byte offsets, because String::remove and String::insert take byte indices):
fn main() {
    let mut my_string = String::from("Hello how are you");
    let mut insert_positions = Vec::new();
    let mut extra_bytes = 0;
    for (byte_idx, c) in my_string.char_indices() {
        if c == ' ' {
            // Where this space will be once earlier spaces have grown by two bytes each
            insert_positions.push(byte_idx + extra_bytes);
            extra_bytes += 2; // Because we will insert two extra chars here later.
        }
    }
    for p in insert_positions.iter() {
        my_string.remove(*p);
        my_string.insert(*p, '0');
        my_string.insert(*p, '2');
        my_string.insert(*p, '%');
    }
    println!("{}", my_string);
}
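But should you do it?
As discussed, for example, on Reddit, this is almost never the recommended way of doing it: both String::remove and String::insert are O(n) operations, because they shift every byte after the given position, as noted in the documentation. Doing this once per space makes the whole loop O(n·k) for k spaces.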
Edit
A slightly better version:
fn main() {
    let mut my_string = String::from("Hello how are you");
    let mut insert_positions = Vec::new();
    // String::remove and String::insert_str take byte indices, so track bytes
    let mut extra_bytes = 0;
    for (byte_idx, c) in my_string.char_indices() {
        if c == ' ' {
            insert_positions.push(byte_idx + extra_bytes);
            extra_bytes += 2; // Because we will insert two extra chars here later.
        }
    }
    for p in insert_positions.iter() {
        my_string.remove(*p);
        my_string.insert_str(*p, "%20");
    }
    println!("{}", my_string);
}

How to trim space less than n times?

How to eliminate up to n spaces at the beginning of each line?
For example, when trimming up to 4 spaces:
"     5" -> " 5"
"    4" -> "4"
"   3" -> "3"
const INPUT: &str = "    4\n  2\n0\n\n      6\n";
const OUTPUT: &str = "4\n2\n0\n\n  6\n";
#[test]
fn main(){
assert_eq!(&trim_deindent(INPUT,4), OUTPUT)
}
I was about to suggest textwrap::dedent, but then I noticed "2", which has fewer than 4 spaces. So you want to keep removing leading spaces, if there are any, up to a maximum of 4.
A quick solution could look something like this. Your assert will pass, but note that lines ending in \r\n will be converted to \n, as lines does not provide a way to distinguish between \n and \r\n.
fn trim_deindent(text: &str, max: usize) -> String {
let mut new_text = text
.lines()
.map(|line| {
let mut max = max;
line.chars()
// Skip while `c` is a whitespace and at most `max` spaces
.skip_while(|c| {
if max == 0 {
false
} else {
max -= 1;
c.is_whitespace()
}
})
.collect::<String>()
})
.collect::<Vec<_>>()
.join("\n");
// Did the original `text` end with a `\n` then add it again
if text.ends_with('\n') {
new_text.push('\n');
}
new_text
}
If you want to retain both \n and \r\n, you can take the more involved route of scanning through the string yourself, and thus avoid using lines.
fn trim_deindent(text: &str, max: usize) -> String {
    let mut new_text = String::new();
    let mut line_start = 0;
    loop {
        let mut max = max;
        // Skip at most `max` spaces; `char_indices` yields byte offsets, which is
        // what slicing `text` needs (`chars().position` would count chars instead)
        let after_space = text[line_start..].char_indices().find(|&(_, c)| {
            // We can't use `is_whitespace` here, as that will skip past `\n` and `\r` as well
            if (max == 0) || !is_horizontal_whitespace(c) {
                true
            } else {
                max -= 1;
                false
            }
        });
        if let Some((after_space, _)) = after_space {
            let after_space = line_start + after_space;
            let line = &text[after_space..];
            // Byte index just past the `\n`, or the line length (if it's the last line)
            let end = line.find('\n').map_or(line.len(), |i| i + 1);
            // Push the line (including the line ending) onto `new_text`
            new_text.push_str(&line[..end]);
            line_start = after_space + end;
        } else {
            break;
        }
    }
    new_text
}
#[inline]
fn is_horizontal_whitespace(c: char) -> bool {
(c != '\r') && (c != '\n') && c.is_whitespace()
}
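If you are on Rust 1.51 or later, str::split_inclusive offers a middle ground: it splits after each '\n' but keeps the terminator on each piece, so both \n and \r\n survive without manual index bookkeeping. A sketch, reusing the same is_horizontal_whitespace helper:

```rust
fn is_horizontal_whitespace(c: char) -> bool {
    (c != '\r') && (c != '\n') && c.is_whitespace()
}

fn trim_deindent(text: &str, max: usize) -> String {
    text.split_inclusive('\n')
        .map(|line| {
            // Byte length of at most `max` leading horizontal whitespace chars
            let skip: usize = line
                .chars()
                .take(max)
                .take_while(|&c| is_horizontal_whitespace(c))
                .map(char::len_utf8)
                .sum();
            // Drop that prefix; the rest, including `\n` or `\r\n`, is kept as-is
            &line[skip..]
        })
        .collect()
}

fn main() {
    let input = "    4\n  2\n0\n\n      6\n";
    assert_eq!(trim_deindent(input, 4), "4\n2\n0\n\n  6\n");
    // `\r\n` line endings are preserved
    assert_eq!(trim_deindent("  a\r\n     b", 4), "a\r\n b");
}
```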
