Replacing numbered placeholders with elements of a vector in Rust?

I have the following:
A Vec<&str>.
A &str that may contain $0, $1, etc. referencing the elements in the vector.
I want to get a version of my &str where all occurrences of $i are replaced by the i-th element of the vector. So if I have vec!["foo", "bar"] and $0$1, the result would be foobar.
My first naive approach was to iterate over i = 0..N and do a search and replace for every index. However, this is quite an ugly and inefficient solution. It also gives undesired output if any of the values in the vector contains the $ character.
Is there a better way to do this in Rust?

This solution is inspired by Shepmaster's answer (including the copied test cases), but simplifies things by using the replace_all method.
use regex::{Captures, Regex};
fn template_replace(template: &str, values: &[&str]) -> String {
    let regex = Regex::new(r#"\$(\d+)"#).unwrap();
    regex
        .replace_all(template, |captures: &Captures| {
            // Indices with no matching value are replaced by an empty string.
            values.get(index(captures)).unwrap_or(&"")
        })
        .to_string()
}
fn index(captures: &Captures) -> usize {
    captures.get(1).unwrap().as_str().parse().unwrap()
}
fn main() {
    assert_eq!("ab", template_replace("$0$1", &["a", "b"]));
    assert_eq!("$1b", template_replace("$0$1", &["$1", "b"]));
    assert_eq!("moo", template_replace("moo", &[]));
    assert_eq!("abc", template_replace("a$0b$0c", &[""]));
    assert_eq!("abcde", template_replace("a$0c$1e", &["b", "d"]));
    println!("It works!");
}

I would use a regex:
use regex::Regex; // 1.1.0
fn example(s: &str, vals: &[&str]) -> String {
    let r = Regex::new(r#"\$(\d+)"#).unwrap();
    let mut start = 0;
    let mut new = String::new();
    for caps in r.captures_iter(s) {
        let m = caps.get(0).expect("Regex group 0 missing");
        let d = caps.get(1).expect("Regex group 1 missing");
        let d: usize = d.as_str().parse().expect("Could not parse index");
        // Copy non-placeholder
        new.push_str(&s[start..m.start()]);
        // Copy placeholder (panics if d is out of range for vals)
        new.push_str(vals[d]);
        start = m.end();
    }
    // Copy non-placeholder
    new.push_str(&s[start..]);
    new
}
fn main() {
    assert_eq!("ab", example("$0$1", &["a", "b"]));
    assert_eq!("$1b", example("$0$1", &["$1", "b"]));
    assert_eq!("moo", example("moo", &[]));
    assert_eq!("abc", example("a$0b$0c", &[""]));
}
See also:
Split a string keeping the separators
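If you would rather avoid the regex dependency, the same idea can be written with only the standard library: find each $, read the run of ASCII digits after it, and look that index up in the slice. This is a minimal sketch; replace_placeholders is a hypothetical name, and two behaviours are assumptions: an index with no matching value produces nothing, and a $ not followed by a digit is copied unchanged. Because substituted values go straight into the output and are never rescanned, a value that itself contains $ is left alone.

```rust
/// Hypothetical std-only helper: replaces every `$<digits>` with `values[index]`.
fn replace_placeholders(template: &str, values: &[&str]) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(dollar) = rest.find('$') {
        // Copy the text before the `$` verbatim.
        out.push_str(&rest[..dollar]);
        let after = &rest[dollar + 1..];
        // Number of ASCII digits right after the `$` (each digit is one byte).
        let digits = after.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            // A `$` without digits is kept as-is.
            out.push('$');
        } else if let Some(value) = after[..digits]
            .parse::<usize>()
            .ok()
            .and_then(|i| values.get(i))
        {
            out.push_str(value);
        }
        // Out-of-range (or overflowing) indices produce nothing.
        rest = &after[digits..];
    }
    out.push_str(rest);
    out
}

fn main() {
    let s = replace_placeholders("Hello $0, you owe $$1", &["Ann", "5"]);
    println!("{}", s); // Hello Ann, you owe $5
}
```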

Related

How can I duplicate the first and last elements of a vector?

I would like to take a vector of characters and duplicate the first letter and the last one.
The only way I managed to do that is with this ugly code:
fn repeat_ends(s: &Vec<char>) -> Vec<char> {
    let mut result: Vec<char> = Vec::new();
    let first = s.first().unwrap();
    let last = s.last().unwrap();
    result.push(*first);
    result.append(&mut s.clone());
    result.push(*last);
    result
}
fn main() {
    let test: Vec<char> = String::from("Hello world !").chars().collect();
    println!("{:?}", repeat_ends(&test)); // "HHello world !!"
}
What would be a better way to do it?
I am not sure if it is "better", but one way is using slice patterns:
fn repeat_ends(s: &Vec<char>) -> Vec<char> {
    match s[..] {
        [first, .., last] => {
            let mut out = Vec::with_capacity(s.len() + 2);
            out.push(first);
            out.extend(s);
            out.push(last);
            out
        }
        _ => panic!("whatever"), // or s.clone()
    }
}
If the vector can be modified in place:
fn repeat_ends(s: &mut Vec<char>) {
    if let [first, .., last] = s[..] {
        s.insert(0, first);
        s.push(last);
    }
}
If it's ok to mutate the original vector, this does the job:
fn repeat_ends(s: &mut Vec<char>) {
    // Panics if the vector is empty.
    let first = *s.first().unwrap();
    s.insert(0, first);
    let last = *s.last().unwrap();
    s.push(last);
}
fn main() {
    let mut test: Vec<char> = String::from("Hello world !").chars().collect();
    repeat_ends(&mut test);
    println!("{}", test.into_iter().collect::<String>()); // "HHello world !!"
}
From the documentation of Vec::insert:
Inserts an element at position index within the vector, shifting all elements after it to the right.
This means the function repeat_ends is O(n), where n is the number of characters in the vector. I'm not sure there is a more efficient method if you need to use a vector, but I'd be curious to hear it if there is.
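For comparison, here is a sketch that builds the new vector in a single pass with iterator chaining. It is still O(n), since every element has to be copied once anyway, but nothing is shifted. repeat_ends_chain is a hypothetical name, and returning an empty vector for empty input (rather than panicking) is an assumption:

```rust
use std::iter;

/// Hypothetical variant of `repeat_ends`: builds the result in one pass.
/// Returns an empty Vec for empty input instead of panicking.
fn repeat_ends_chain(s: &[char]) -> Vec<char> {
    match (s.first(), s.last()) {
        (Some(&first), Some(&last)) => iter::once(first)
            .chain(s.iter().copied())
            .chain(iter::once(last))
            // The chained length is known up front, so `collect` should
            // be able to reserve the full capacity once.
            .collect(),
        _ => Vec::new(),
    }
}

fn main() {
    let test: Vec<char> = "Hello world !".chars().collect();
    let out: String = repeat_ends_chain(&test).into_iter().collect();
    println!("{}", out); // HHello world !!
}
```

Note that for a one-element input the single character appears three times, since it is both the first and the last element.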

How to change order of letter in rust with swap?

I need to change the order of the letters in every word of a sentence. I have a string of separators to split my text into words, and I use swap(0, 1) to change the order of letters in a word. However, I need to skip the first and last letter of every word, and I can't use regular expressions for this purpose.
Here is some code:
const SEPARATORS: &str = " ,;:!?./%*$=+)#_-('\"&1234567890\r\n";
fn main() {
    dbg!(mix("According, research"));
}
fn mix(s: &str) -> String {
    let mut a: Vec<char> = s.chars().collect();
    for group in a.split_mut(|num| SEPARATORS.contains(*num)) {
        group.chunks_exact_mut(2).for_each(|x| x.swap(0, 1));
    }
    a.iter().collect()
}
Output as follows:
[src/main.rs:4] mix("According, research") = "cAocdrnig, eresrahc"
But I need output as follows:
[src/main.rs:4] mix("According, research") = "Accroidng, rseaerch"
Someone knows how to fix it ?
All you need is to apply the swaps to a sub-slice of each word. group[1..len-2] skips the first character and the last two characters of the word, which is what produces the expected output above:
const SEPARATORS: &str = " ,;:!?./%*$=+)#_-('\"&1234567890\r\n";
fn main() {
    dbg!(mix("According, research"));
}
fn mix(s: &str) -> String {
    let mut a: Vec<char> = s.chars().collect();
    for group in a.split_mut(|num| SEPARATORS.contains(*num)) {
        let len = group.len();
        // The guard also keeps `1..len-2` from panicking on short words.
        if len > 2 {
            group[1..len-2].chunks_exact_mut(2).for_each(|x| x.swap(0, 1));
        }
    }
    a.iter().collect()
}
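If the rule really is "leave only the first and last letter in place", the slice would be group[1..len - 1] instead. A sketch under that assumption (mix_strict is a hypothetical name); note that it gives a different result for "research" than the expected output in the question:

```rust
const SEPARATORS: &str = " ,;:!?./%*$=+)#_-('\"&1234567890\r\n";

/// Hypothetical variant of `mix` that keeps exactly the first and last letter of each word.
fn mix_strict(s: &str) -> String {
    let mut a: Vec<char> = s.chars().collect();
    for group in a.split_mut(|c| SEPARATORS.contains(*c)) {
        let len = group.len();
        if len > 2 {
            // Swap adjacent pairs strictly between index 0 and index len - 1.
            group[1..len - 1]
                .chunks_exact_mut(2)
                .for_each(|pair| pair.swap(0, 1));
        }
    }
    a.iter().collect()
}

fn main() {
    println!("{}", mix_strict("According, research")); // Accroidng, rseaecrh
}
```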

How to convert a string of digits into a vector of digits?

I'm trying to store a string (or str) of digits, e.g. 12345 into a vector, such that the vector contains {1,2,3,4,5}.
As I'm totally new to Rust, I'm having problems with the types (String, str, char, ...) and also with finding any information about converting between them.
My current code looks like this:
fn main() {
    let text = "731671";
    let mut v: Vec<i32>;
    let mut d = text.chars();
    for i in 0..text.len() {
        v.push(d.next().to_digit(10));
    }
}
You're close!
First, the index loop for i in 0..text.len() is not necessary since you're going to use an iterator anyway. It's simpler to loop directly over the iterator: for ch in text.chars(). Not only that, but your index loop and the character iterator can diverge, because len() returns the number of bytes while chars() yields Unicode scalar values. Since the string is UTF-8, it has fewer Unicode scalar values than bytes whenever it contains non-ASCII characters.
The next hurdle is that to_digit(10) returns an Option, telling you that there is a possibility the character won't be a digit. You can check whether to_digit(10) returned the Some variant of an Option with if let Some(digit) = ch.to_digit(10).
Pieced together, the code might now look like this:
fn main() {
    let text = "731671";
    let mut v = Vec::new();
    for ch in text.chars() {
        if let Some(digit) = ch.to_digit(10) {
            v.push(digit);
        }
    }
    println!("{:?}", v);
}
Now, this is rather imperative: you're making a vector and filling it digit by digit, all by yourself. You can try a more declarative or functional approach by applying a transformation over the string:
fn main() {
    let text = "731671";
    // filter_map keeps only the Some(digit) results, skipping non-digits.
    let v: Vec<u32> = text.chars().filter_map(|ch| ch.to_digit(10)).collect();
    println!("{:?}", v);
}
ArtemGr's answer is pretty good, but their version will skip any characters that aren't digits. If you'd rather have it fail on bad digits, you can use this version instead:
fn to_digits(text: &str) -> Option<Vec<u32>> {
    // Collecting an iterator of Option<u32> yields None as soon as one item is None.
    text.chars().map(|ch| ch.to_digit(10)).collect()
}
fn main() {
    println!("{:?}", to_digits("731671"));
    println!("{:?}", to_digits("731six71"));
}
Output:
Some([7, 3, 1, 6, 7, 1])
None
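If you also want the failure to say what went wrong, the same collect trick works with Result as well as with Option. A sketch, where to_digits_checked is a hypothetical name and the error type (byte offset, offending character) is an assumption:

```rust
/// Hypothetical variant of `to_digits`: reports the first non-digit
/// as (byte offset, character) instead of returning None.
fn to_digits_checked(text: &str) -> Result<Vec<u32>, (usize, char)> {
    text.char_indices()
        .map(|(i, ch)| ch.to_digit(10).ok_or((i, ch)))
        // Collecting into Result stops at the first Err.
        .collect()
}

fn main() {
    println!("{:?}", to_digits_checked("731671")); // Ok([7, 3, 1, 6, 7, 1])
    println!("{:?}", to_digits_checked("731six71")); // Err((3, 's'))
}
```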
To mention the quick and dirty elephant in the room: if you REALLY know your string contains only digits in the range '0'..='9', then you can avoid memory allocations and copies and use the underlying &[u8] representation of the string, obtained from str::as_bytes, directly. Subtract b'0' from each element whenever you access it.
If you are doing competitive programming, this is one of the worthwhile speed and memory optimizations.
fn main() {
    let text = "12345";
    let digit = text.as_bytes();
    println!("Text = {:?}", text);
    println!("value of digit[3] = {}", digit[3] - b'0');
}
Output:
Text = "12345"
value of digit[3] = 4
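If you are not completely sure about the input, one linear check up front still avoids any allocation or copy. A sketch; ascii_digits is a hypothetical name:

```rust
/// Hypothetical helper: returns the string's bytes only if every byte is an ASCII digit.
fn ascii_digits(text: &str) -> Option<&[u8]> {
    let bytes = text.as_bytes();
    // Single pass over the bytes; no allocation, no copy.
    if bytes.iter().all(u8::is_ascii_digit) {
        Some(bytes)
    } else {
        None
    }
}

fn main() {
    if let Some(digits) = ascii_digits("12345") {
        // Subtract b'0' on access, as described above.
        println!("value of digits[3] = {}", digits[3] - b'0'); // value of digits[3] = 4
        let sum: u32 = digits.iter().map(|&b| u32::from(b - b'0')).sum();
        println!("sum = {}", sum); // sum = 15
    }
}
```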
This solution combines ArtemGr's and notriddle's solutions. If the string contains any non-digit character, it returns an empty vector:
fn to_digits(string: &str) -> Vec<u32> {
    let opt_vec: Option<Vec<u32>> = string
        .chars()
        .map(|ch| ch.to_digit(10))
        .collect();
    match opt_vec {
        Some(vec_of_digits) => vec_of_digits,
        None => vec![],
    }
}
In my case, I implemented this function as a method on &str, using an extension trait:
pub trait ExtraProperties {
    fn to_digits(self) -> Vec<u32>;
}
impl ExtraProperties for &str {
    fn to_digits(self) -> Vec<u32> {
        let opt_vec: Option<Vec<u32>> = self
            .chars()
            .map(|ch| ch.to_digit(10))
            .collect();
        match opt_vec {
            Some(vec_of_digits) => vec_of_digits,
            None => vec![],
        }
    }
}
In this way, I transform &str to a vector containing digits.
fn main() {
    let cnpj: &str = "123456789";
    let nums: Vec<u32> = cnpj.to_digits();
    println!("cnpj: {cnpj}"); // cnpj: 123456789
    println!("nums: {nums:?}"); // nums: [1, 2, 3, 4, 5, 6, 7, 8, 9]
}

How to "crop" characters off the beginning of a string in Rust?

I want a function that can take two arguments (string, number of letters to crop off front) and return the same string except with the letters before character x gone.
If I write
let mut example = "stringofletters";
CropLetters(example, 3);
println!("{}", example);
then the output should be:
ingofletters
Is there any way I can do this?
In many uses it would make sense to simply return a slice of the input, avoiding any copy. Converting Shepmaster's solution to use immutable slices:
fn crop_letters(s: &str, pos: usize) -> &str {
    // Byte offset of the pos-th character, if the string is long enough.
    match s.char_indices().nth(pos) {
        Some((pos, _)) => &s[pos..],
        None => "",
    }
}
fn main() {
    let example = "stringofletters"; // works with a String if you take a reference
    let cropped = crop_letters(example, 3);
    println!("{}", cropped);
}
Advantages over the mutating version are:
No copy is needed. You can call cropped.to_string() if you want a newly allocated result; but you don't have to.
It works with static string slices as well as mutable String etc.
The disadvantage is that if you really do have a mutable string you want to modify, it would be slightly less efficient as you'd need to allocate a new String.
Issues with your original code:
Functions use snake_case, types and traits use CamelCase.
"foo" is a string literal of type &str. These may not be changed. You will need something that has been heap-allocated, such as a String.
If example were a String, the call crop_letters(example, 3) would transfer ownership of it to the function, which means you wouldn't be able to use the variable anymore. You must pass in a mutable reference (&mut).
Rust strings are not ASCII, they are UTF-8. You need to figure out how many bytes each character requires. char_indices is a good tool here.
You need to handle the case of when the string is shorter than 3 characters.
Once you have the byte position of the new beginning of the string, you can use drain to move a chunk of bytes out of the string. We just drop these bytes and let the String move over the remaining bytes.
fn crop_letters(s: &mut String, pos: usize) {
    match s.char_indices().nth(pos) {
        Some((pos, _)) => {
            // Remove the bytes before `pos`; the rest is shifted to the front.
            s.drain(..pos);
        }
        None => {
            s.clear();
        }
    }
}
fn main() {
    let mut example = String::from("stringofletters");
    crop_letters(&mut example, 3);
    assert_eq!("ingofletters", example);
}
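Since drain yields the removed characters as an iterator, the same approach could also hand back the cropped prefix instead of discarding it. A minimal sketch; take_front is a hypothetical name, and taking the whole string when it has fewer than n characters is an assumption:

```rust
/// Hypothetical helper: removes the first `n` chars from `s` in place
/// and returns them as a new String.
fn take_front(s: &mut String, n: usize) -> String {
    // Byte offset where the n-th char starts, or the full length if `s` is shorter.
    let end = s.char_indices().nth(n).map_or(s.len(), |(i, _)| i);
    // `drain` removes the range from `s` and yields the removed chars.
    s.drain(..end).collect()
}

fn main() {
    let mut example = String::from("stringofletters");
    let removed = take_front(&mut example, 3);
    println!("{} + {}", removed, example); // str + ingofletters
}
```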
See Chris Emerson's answer if you don't actually need to modify the original String.
I found this answer which I don't consider really idiomatic:
fn crop_with_allocation(string: &str, len: usize) -> String {
    string.chars().skip(len).collect()
}
fn crop_without_allocation(string: &str, len: usize) -> &str {
    // Note: `len` is a byte count here, not a character count,
    // and slicing panics if it does not fall on a char boundary.
    // optional length check
    if string.len() < len {
        return "";
    }
    &string[len..]
}
fn main() {
    let example = "stringofletters"; // works with a String if you take a reference
    let cropped = crop_with_allocation(example, 3);
    println!("{}", cropped);
    let cropped = crop_without_allocation(example, 3);
    println!("{}", cropped);
}
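The byte-based version can at least be made panic-free with str::get, which returns None when the range is out of bounds or does not start on a char boundary. A sketch; crop_bytes_checked is a hypothetical name, len still counts bytes (so it only matches "letters" for ASCII text), and returning "" on a mid-character offset is an assumption:

```rust
/// Hypothetical variant of `crop_without_allocation` that never panics.
/// `len` is a byte offset, not a character count.
fn crop_bytes_checked(string: &str, len: usize) -> &str {
    // str::get returns None if `len` is past the end or inside a multi-byte char.
    string.get(len..).unwrap_or("")
}

fn main() {
    println!("{}", crop_bytes_checked("stringofletters", 3)); // ingofletters
    println!("{:?}", crop_bytes_checked("ЖФ1", 1)); // "" (byte 1 is inside 'Ж')
}
```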
My version:
fn crop_str(s: &str, n: usize) -> &str {
    let mut it = s.chars();
    // Advance past the first n characters (stops early if the string is shorter).
    for _ in 0..n {
        it.next();
    }
    it.as_str()
}
#[test]
fn test_crop_str() {
    assert_eq!(crop_str("123", 1), "23");
    assert_eq!(crop_str("ЖФ1", 1), "Ф1");
    assert_eq!(crop_str("ЖФ1", 2), "1");
}

Using str and String interchangably

Suppose I'm trying to do a fancy zero-copy parser in Rust using &str, but sometimes I need to modify the text (e.g. to implement variable substitution). I really want to do something like this:
fn main() {
    let mut v: Vec<&str> = "Hello there $world!".split_whitespace().collect();
    for t in v.iter_mut() {
        if (t.contains("$world")) {
            *t = &t.replace("$world", "Earth");
        }
    }
    println!("{:?}", &v);
}
But of course the String returned by t.replace() doesn't live long enough. Is there a nice way around this? Perhaps there is a type which means "ideally a &str but if necessary a String"? Or maybe there is a way to use lifetime annotations to tell the compiler that the returned String should be kept alive until the end of main() (or have the same lifetime as v)?
Rust has exactly what you want in the form of the Cow (clone-on-write) type.
use std::borrow::Cow;
fn main() {
    let mut v: Vec<_> = "Hello there $world!"
        .split_whitespace()
        .map(Cow::Borrowed)
        .collect();
    for t in v.iter_mut() {
        if t.contains("$world") {
            *t.to_mut() = t.replace("$world", "Earth");
        }
    }
    println!("{:?}", &v);
}
As sellibitze correctly notes, calling to_mut() on a borrowed value first clones it into a new String (a heap allocation), which is then immediately overwritten. If you are sure you only have borrowed strings, you can use this instead:
*t = Cow::Owned(t.replace("$world", "Earth"));
In case the Vec contains Cow::Owned elements, this would still throw away the existing allocation. You can prevent that by using the following very fragile and unsafe code inside your for loop. It does direct byte-based manipulation of UTF-8 strings and relies on the fact that "$world" is exactly one byte longer than "Earth": it removes the $ and overwrites the remaining five bytes in place.
let mut last_pos = 0; // so we don't start at the beginning every time
while let Some(pos) = t[last_pos..].find("$world") {
    let p = pos + last_pos; // find returns an offset relative to last_pos
    last_pos = p + 5; // continue searching after the inserted "Earth"
    unsafe {
        let s = t.to_mut().as_mut_vec(); // operating on Vec is easier
        s.remove(p); // remove $ sign
        for (c, sc) in "Earth".bytes().zip(&mut s[p..]) {
            *sc = c;
        }
    }
}
Note that this is tailored exactly to the "$world" -> "Earth" mapping. Any other mappings require careful consideration inside the unsafe code.
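A safe alternative for the owned case could use String::replace_range, which edits the existing buffer in place; since the replacement is shorter than the pattern, it should not need to grow the buffer. A sketch; substitute_in_place is a hypothetical name, and tokens without "$world" are deliberately left borrowed:

```rust
use std::borrow::Cow;

/// Hypothetical safe helper: rewrites every "$world" in `t` to "Earth".
/// Borrowed values are cloned once by `to_mut`; Owned values reuse their buffer.
fn substitute_in_place(t: &mut Cow<'_, str>) {
    const PATTERN: &str = "$world";
    const REPLACEMENT: &str = "Earth";
    if !t.contains(PATTERN) {
        return; // nothing to do, so no allocation for borrowed tokens
    }
    let s = t.to_mut();
    let mut search_from = 0;
    while let Some(rel) = s[search_from..].find(PATTERN) {
        let start = search_from + rel; // `find` is relative to `search_from`
        s.replace_range(start..start + PATTERN.len(), REPLACEMENT);
        search_from = start + REPLACEMENT.len();
    }
}

fn main() {
    let mut v: Vec<Cow<str>> = "Hello there $world!"
        .split_whitespace()
        .map(Cow::Borrowed)
        .collect();
    for t in v.iter_mut() {
        substitute_in_place(t);
    }
    println!("{:?}", v); // ["Hello", "there", "Earth!"]
}
```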
Use std::borrow::Cow, specifically as Cow<'a, str>, where 'a is the lifetime of the string being parsed.
use std::borrow::Cow;
fn main() {
    let mut v: Vec<Cow<'static, str>> = vec![];
    v.push("oh hai".into()); // &'static str becomes Cow::Borrowed
    v.push(format!("there, {}.", "Mark").into()); // String becomes Cow::Owned
    println!("{:?}", v);
}
Produces:
["oh hai", "there, Mark."]
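Applied to the original question, a sketch of a tokenizer that only allocates for tokens that actually need substitution, with every other token borrowed from the input for the lifetime 'a. The function name substitute is hypothetical:

```rust
use std::borrow::Cow;

/// Hypothetical helper: splits `input` on whitespace and replaces "$world"
/// only in tokens that contain it; all other tokens borrow from `input`.
fn substitute<'a>(input: &'a str) -> Vec<Cow<'a, str>> {
    input
        .split_whitespace()
        .map(|tok| {
            if tok.contains("$world") {
                Cow::Owned(tok.replace("$world", "Earth"))
            } else {
                Cow::Borrowed(tok)
            }
        })
        .collect()
}

fn main() {
    let v = substitute("Hello there $world!");
    println!("{:?}", v); // ["Hello", "there", "Earth!"]
}
```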
