Running a number of consecutive replacements on the same string - rust

I found this example for substring replacement:
use std::str;
let string = "orange";
let new_string = str::replace(string, "or", "str");
If I want to run a number of consecutive replacements on the same string, for sanitization purposes, how can I do that without allocating a new variable for each replacement?
If you were to write idiomatic Rust, how would you write multiple chained substring replacements?

The regex engine can be used to do a single pass with multiple replacements of the string, though I would be surprised if this is actually more performant:
extern crate regex;
use regex::{Captures, Regex};
fn main() {
    let re = Regex::new("(or|e)").unwrap();
    let string = "orange";
    let result = re.replace_all(string, |cap: &Captures| {
        match &cap[0] {
            "or" => "str",
            "e" => "er",
            _ => panic!("We should never get here"),
        }.to_string()
    });
    println!("{}", result);
}

how would you write multiple chained substring replacements?
I would do it just as asked:
fn main() {
    let a = "hello";
    let b = a.replace("e", "a").replace("ll", "r").replace("o", "d");
    println!("{}", b);
}
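When the list of replacements is data rather than a fixed chain, the same sequence of replace calls can be driven by a fold. A minimal sketch, assuming a hypothetical helper named replace_in_order (not part of the standard library); it still allocates once per rule, exactly like the chained version:

```rust
/// Applies each `(from, to)` rule in order, feeding the result of one
/// replacement into the next. `replace_in_order` is a hypothetical helper name.
fn replace_in_order(input: &str, rules: &[(&str, &str)]) -> String {
    rules
        .iter()
        .fold(input.to_owned(), |acc, &(from, to)| acc.replace(from, to))
}

fn main() {
    // Same three steps as a.replace("e", "a").replace("ll", "r").replace("o", "d")
    let rules = [("e", "a"), ("ll", "r"), ("o", "d")];
    println!("{}", replace_in_order("hello", &rules));
}
```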
If you are asking how to do multiple concurrent replacements, passing through the string just once, then it does indeed get much harder.
This does require allocating new memory for each replace call, even if no replacement was needed. An alternate implementation of replace might return a Cow<str> which only includes the owned variant when the replacement would occur. A hacky implementation of that could look like:
use std::borrow::Cow;
trait MaybeReplaceExt<'a> {
    fn maybe_replace(self, needle: &str, replacement: &str) -> Cow<'a, str>;
}
impl<'a> MaybeReplaceExt<'a> for &'a str {
    fn maybe_replace(self, needle: &str, replacement: &str) -> Cow<'a, str> {
        // Assumes that searching twice is better than unconditionally allocating
        if self.contains(needle) {
            self.replace(needle, replacement).into()
        } else {
            self.into()
        }
    }
}
impl<'a> MaybeReplaceExt<'a> for Cow<'a, str> {
    fn maybe_replace(self, needle: &str, replacement: &str) -> Cow<'a, str> {
        // Assumes that searching twice is better than unconditionally allocating
        if self.contains(needle) {
            self.replace(needle, replacement).into()
        } else {
            self
        }
    }
}
fn main() {
    let a = "hello";
    let b = a.maybe_replace("e", "a")
        .maybe_replace("ll", "r")
        .maybe_replace("o", "d");
    println!("{}", b);
    let a = "hello";
    let b = a.maybe_replace("nope", "not here")
        .maybe_replace("still no", "i swear")
        .maybe_replace("but no", "allocation");
    println!("{}", b);
    assert_eq!(b.as_ptr(), a.as_ptr());
}

I would not use regex or .replace().replace().replace() or .maybe_replace().maybe_replace().maybe_replace() for this. They all have big flaws.
Regex is probably the most reasonable option but regexes are just a terrible terrible idea if you can at all avoid them. If your patterns come from user input then you're going to have to deal with escaping them which is a security nightmare.
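.replace().replace().replace() is terrible for obvious reasons: every call scans the whole string again and allocates a new String, and a later replacement can match text that an earlier one produced.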
.maybe_replace().maybe_replace().maybe_replace() is only very slightly better than that, because it only improves efficiency when a pattern doesn't match. It doesn't avoid the repeated allocations if they all match, and in that case it is actually worse because it searches the strings twice.
There's a much better solution: use the aho-corasick crate. There's even an example in the README:
use aho_corasick::AhoCorasick;
let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";
let replace_with = &["sloth", "grey", "slow"];
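// In aho-corasick 1.x, `new` returns a Result; in 0.7 it returned the automaton directly (drop the unwrap there)
let ac = AhoCorasick::new(patterns).unwrap();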
let result = ac.replace_all(haystack, replace_with);
assert_eq!(result, "The slow grey sloth.");
for sanitization purposes
I should also say that blacklisting "bad" strings is completely the wrong way to do sanitisation.

There is no way in the standard library to do this; it’s a tricky thing to get right with a large number of variations on how you would go about doing it, depending on a number of factors. You would need to write such a function yourself.
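As one illustration of writing such a function by hand, here is a naive single-pass sketch using only the standard library. The name replace_many and its tie-breaking rule (the first matching pair in the list wins at each position) are assumptions made for this example. It tries every pattern at every position, so it is only reasonable for a handful of short patterns:

```rust
/// Replaces every occurrence of any pattern in one left-to-right pass.
/// At each position the first matching pair in `pairs` wins.
/// `replace_many` is a hypothetical helper, not a standard library function.
fn replace_many(input: &str, pairs: &[(&str, &str)]) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    'outer: while !rest.is_empty() {
        for &(needle, replacement) in pairs {
            // Empty needles are skipped so the loop always makes progress.
            if !needle.is_empty() && rest.starts_with(needle) {
                out.push_str(replacement);
                rest = &rest[needle.len()..];
                continue 'outer;
            }
        }
        // No pattern matched here: copy one whole char and advance past it.
        let ch = rest.chars().next().unwrap();
        out.push(ch);
        rest = &rest[ch.len_utf8()..];
    }
    out
}

fn main() {
    println!("{}", replace_many("orange", &[("or", "str"), ("e", "er")]));
}
```

Unlike chained replace calls, text produced by one replacement is never scanned again: with the pairs ("a", "b") and ("b", "c"), the input "ab" becomes "bc" here, whereas "ab".replace("a", "b").replace("b", "c") yields "cc".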

Stumbled upon this in codewars. Credit goes to user gom68
fn replace_multiple(rstring: &str) -> String {
    rstring.chars().map(|c|
        match c {
            'A' => 'Z',
            'B' => 'Y',
            'C' => 'X',
            'D' => 'W',
            s => s
        }
    ).collect::<String>()
}

Related

Make iterator of nested iterators

How could I pack the following code into a single iterator?
use std::io::{BufRead, BufReader};
use std::fs::File;
let file = BufReader::new(File::open("sample.txt").expect("Unable to open file"));
for line in file.lines() {
    for ch in line.expect("Unable to read line").chars() {
        println!("Character: {}", ch);
    }
}
Naively, I’d like to have something like (I skipped unwraps)
let lines = file.lines().next();
Reader {
line: lines,
char: next().chars()
}
and iterate over Reader.char till hitting None, then refreshing Reader.line to a new line and Reader.char to the first character of the line. This doesn't seem to be possible though because Reader.char depends on the temporary variable.
Please notice that the question is about nested iterators, reading text files is used as an example.
You can use the flat_map() iterator utility to create new iterator that can produce any number of items for each item in the iterator it's called on.
In this case, that's complicated by the fact that lines() returns an iterator of Results, so the Err case must be handled.
There's also the issue that .chars() references the original string to avoid an additional allocation, so you have to collect the characters into another iterable container.
Solving both issues results in this mess:
fn example() -> impl Iterator<Item=Result<char, std::io::Error>> {
    let file = BufReader::new(File::open("sample.txt").expect("Unable to open file"));
    file.lines().flat_map(|line| match line {
        Err(e) => vec![Err(e)],
        Ok(line) => line.chars().map(Ok).collect(),
    })
}
If String gave us an into_chars() method we could avoid collect() here, but then we'd have differently-typed iterators and would need to use either Box<dyn Iterator> or something like either::Either.
Since you already use .expect() here, you can simplify a bit by using .expect() within the closure to avoid handling the Err case:
fn example() -> impl Iterator<Item=char> {
    let file = BufReader::new(File::open("sample.txt").expect("Unable to open file"));
    file.lines().flat_map(|line|
        line.expect("Unable to read line").chars().collect::<Vec<_>>()
    )
}
In the general case, flat_map() is usually quite easy. You just need to be mindful of whether you are iterating owned vs borrowed values; both cases have some sharp corners. In this case, iterating over owned String values makes using .chars() problematic. If we could iterate over borrowed str slices we wouldn't have to .collect().
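To illustrate the borrowed case: when the lines are already stored somewhere that outlives the iterator, flat_map() can hand out chars() directly. A small sketch, assuming the lines have been read into a slice of String beforehand (chars_of is a made-up name):

```rust
/// Iterates over every char of every line without collecting into a Vec.
/// This works because the returned iterator only borrows from `lines`,
/// which the caller keeps alive.
fn chars_of<'a>(lines: &'a [String]) -> impl Iterator<Item = char> + 'a {
    lines.iter().flat_map(|line| line.chars())
}

fn main() {
    let lines = vec![String::from("ab"), String::from("cd")];
    for ch in chars_of(&lines) {
        println!("Character: {}", ch);
    }
}
```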
Drawing on the answer from @cdhowie and this answer that suggests using IntoIter to get an iterator of owned chars, I came up with this solution, which is the closest to what I expected:
use std::fs::File;
use std::io;
use std::io::{BufRead, BufReader, Lines};
use std::vec::IntoIter;
struct Reader {
    lines: Lines<BufReader<File>>,
    iter: IntoIter<char>,
}
impl Reader {
    fn new(filename: &str) -> Self {
        let file = BufReader::new(File::open(filename).expect("Unable to open file"));
        let mut lines = file.lines();
        let iter = Reader::char_iter(lines.next().expect("Unable to read file"));
        Reader { lines, iter }
    }
    fn char_iter(line: io::Result<String>) -> IntoIter<char> {
        line.unwrap().chars().collect::<Vec<_>>().into_iter()
    }
}
impl Iterator for Reader {
    type Item = char;
    fn next(&mut self) -> Option<Self::Item> {
        match self.iter.next() {
            None => {
                self.iter = match self.lines.next() {
                    None => return None,
                    Some(line) => Reader::char_iter(line),
                };
                Some('\n')
            }
            Some(val) => Some(val),
        }
    }
}
It works as expected:
let reader = Reader::new("src/main.rs");
for ch in reader {
    print!("{}", ch);
}

String recursion in Rust

In Rust, I am trying to obtain all possible combinations of a-z characters up to a fixed length with no repeating letters.
For example, for a limited set of a-f and a length of 3 I should get:
abc
abd
abe
abf
acb
acd
ace
acf
adb
... etc
I've been struggling to do this through recursion and have been banging my head on ownership and borrows. The only way I've managed to do it is as follows, but this is cloning strings all over the place and is very inefficient. There are probably standard permutation/combination functions for this in the standard library, I don't know, but I'm interested in understanding how this can be done manually.
fn main() {
    run(&String::new());
}
fn run(target: &String) {
    for a in 97..123 { // ASCII a..z
        if !target.contains(char::from(a)) {
            let next = target.clone() + char::from(a).to_string().as_str(); // Working but terrible
            if next.len() == 3 { // Required string size
                println!("{}", next);
            } else {
                run(&next);
            }
        }
    }
}
First off, a couple of remarks:
&String is kind of an anti-pattern that is rarely seen. It serves no purpose; all the functionality that String has over str requires mutability. So it should either be &mut String or &str.
97..123 is uncommon ... use 'a'..='z'.
Now to the actual problem:
As long as you pass a non-mutable string into the recursion, you won't get around cloning the data. I'd make the string mutable, then you can simply append and remove single characters from it.
Like this:
fn main() {
    run(&mut String::new());
}
fn run(target: &mut String) {
    for a in 'a'..='z' {
        if !target.contains(a) {
            target.push(a);
            if target.len() == 3 {
                // Required string size
                println!("{}", target);
            } else {
                run(target);
            }
            target.pop();
        }
    }
}
Just for completeness, here is an alternative that does not use recursion. It uses .permutations() from the itertools crate.
Also, it's probably cleaner to return the String from the function directly instead of passing a mutable reference as an argument.
use std::ops::RangeInclusive;
use itertools::Itertools;
fn main() {
    println!("result: {}", combine('a'..='d', 3));
    println!("result: {}", combine('a'..='g', 4));
    println!("result: {}", combine('a'..='c', 3));
    println!("result: {}", combine('a'..='c', 4)); // assertion fail
}
fn combine(range: RangeInclusive<char>, depth: usize) -> String {
    assert!(*range.end() as usize - *range.start() as usize + 1 >= depth);
    let perms = range.permutations(depth);
    let mut result = String::new();
    perms.for_each(|mut item| {
        item.push(' ');
        result += &item.into_iter().collect::<String>();
    });
    result.pop(); // pop last superfluous space char
    result
}
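The push/pop backtracking from the recursive answer can also return its results instead of printing them, using only the standard library. A sketch under the assumption that collecting everything into a Vec<String> is acceptable; the names permutations and go are illustrative only:

```rust
/// Collects all strings of `len` distinct letters drawn from `alphabet`,
/// in the order the backtracking visits them.
fn permutations(alphabet: &[char], len: usize) -> Vec<String> {
    // Inner helper: extends `current` by one letter, recurses, then undoes it.
    fn go(alphabet: &[char], len: usize, current: &mut String, out: &mut Vec<String>) {
        if current.chars().count() == len {
            out.push(current.clone()); // the only clone: one per finished result
            return;
        }
        for &c in alphabet {
            if !current.contains(c) {
                current.push(c);
                go(alphabet, len, current, out);
                current.pop();
            }
        }
    }
    let mut out = Vec::new();
    go(alphabet, len, &mut String::new(), &mut out);
    out
}

fn main() {
    for s in permutations(&['a', 'b', 'c'], 2) {
        println!("{}", s);
    }
}
```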

Replacing numbered placeholders with elements of a vector in Rust?

I have the following:
A Vec<&str>.
A &str that may contain $0, $1, etc. referencing the elements in the vector.
I want to get a version of my &str where all occurrences of $i are replaced by the ith element of the vector. So if I have vec!["foo", "bar"] and $0$1, the result would be foobar.
My first naive approach was to iterate over i = 1..N and do a search and replace for every index. However, this is quite an ugly and inefficient solution. It also gives undesired output if any of the values in the vector contains the $ character.
Is there a better way to do this in Rust?
This solution is inspired (including copied test cases) by Shepmaster's, but simplifies things by using the replace_all method.
use regex::{Regex, Captures};
fn template_replace(template: &str, values: &[&str]) -> String {
    let regex = Regex::new(r#"\$(\d+)"#).unwrap();
    regex.replace_all(template, |captures: &Captures| {
        values
            .get(index(captures))
            .unwrap_or(&"")
    }).to_string()
}
fn index(captures: &Captures) -> usize {
    captures.get(1)
        .unwrap()
        .as_str()
        .parse()
        .unwrap()
}
fn main() {
    assert_eq!("ab", template_replace("$0$1", &["a", "b"]));
    assert_eq!("$1b", template_replace("$0$1", &["$1", "b"]));
    assert_eq!("moo", template_replace("moo", &[]));
    assert_eq!("abc", template_replace("a$0b$0c", &[""]));
    assert_eq!("abcde", template_replace("a$0c$1e", &["b", "d"]));
    println!("It works!");
}
I would use a regex:
use regex::Regex; // 1.1.0
fn example(s: &str, vals: &[&str]) -> String {
    let r = Regex::new(r#"\$(\d+)"#).unwrap();
    let mut start = 0;
    let mut new = String::new();
    for caps in r.captures_iter(s) {
        let m = caps.get(0).expect("Regex group 0 missing");
        let d = caps.get(1).expect("Regex group 1 missing");
        let d: usize = d.as_str().parse().expect("Could not parse index");
        // Copy non-placeholder
        new.push_str(&s[start..m.start()]);
        // Copy placeholder
        new.push_str(&vals[d]);
        start = m.end()
    }
    // Copy non-placeholder
    new.push_str(&s[start..]);
    new
}
fn main() {
    assert_eq!("ab", example("$0$1", &["a", "b"]));
    assert_eq!("$1b", example("$0$1", &["$1", "b"]));
    assert_eq!("moo", example("moo", &[]));
    assert_eq!("abc", example("a$0b$0c", &[""]));
}
See also:
Split a string keeping the separators
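For comparison, the same single-pass idea can be written without the regex crate. This is only a sketch: the name fill_placeholders is made up, and two behaviours are assumptions of this example rather than requirements from the question, namely that an out-of-range index is replaced by an empty string and that a $ not followed by digits is kept literally:

```rust
/// Replaces `$<digits>` with `values[<digits>]` in one pass over `template`.
/// Inserted values are never rescanned, so a value containing `$1` stays as is.
fn fill_placeholders(template: &str, values: &[&str]) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(dollar) = rest.find('$') {
        // Copy the text before the `$` unchanged.
        out.push_str(&rest[..dollar]);
        let after = &rest[dollar + 1..];
        // ASCII digits are one byte each, so the count is also a byte length.
        let digits_len = after.bytes().take_while(|b| b.is_ascii_digit()).count();
        match after[..digits_len].parse::<usize>() {
            Ok(i) => {
                // Assumption: a missing value becomes an empty string.
                out.push_str(values.get(i).copied().unwrap_or(""));
                rest = &after[digits_len..];
            }
            Err(_) => {
                // No usable index after `$`: keep the `$` literally.
                out.push('$');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    println!("{}", fill_placeholders("$0$1", &["foo", "bar"]));
}
```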

How do I convert reverse domain notation to PascalCase?

I want to convert "foo.bar.baz" to "FooBarBaz". My input will always be only ASCII. I tried:
let result = "foo.bar.baz"
    .to_string()
    .split(".")
    .map(|x| x[0..1].to_string().to_uppercase() + &x[1..])
    .fold("".to_string(), |acc, x| acc + &x);
println!("{}", result);
but that feels inefficient.
Your solution is a good start. You could probably make it work without heap allocations in the "functional" style; I prefer putting complex logic into normal for loops though.
Also I don't like assuming input is in ASCII without actually checking - this should work with any string.
You probably could also use String::with_capacity in your code to avoid reallocations in standard cases.
fn dotted_to_pascal_case(s: &str) -> String {
    let mut result = String::with_capacity(s.len());
    for part in s.split('.') {
        let mut cs = part.chars();
        if let Some(c) = cs.next() {
            result.extend(c.to_uppercase());
        }
        result.push_str(cs.as_str());
    }
    result
}
fn main() {
    println!("{}", dotted_to_pascal_case("foo.bar.baz"));
}
Stefan's answer is correct, but I decided to get rid of that first String allocation and go full-functional, without loops:
fn dotted_to_pascal_case(s: &str) -> String {
    s.split('.')
        .map(|piece| piece.chars())
        .flat_map(|mut chars| {
            chars
                .next()
                .expect("empty section between dots!")
                .to_uppercase()
                .chain(chars)
        })
        .collect()
}
fn main() {
    println!("{}", dotted_to_pascal_case("foo.bar.baz"));
}
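The expect() in the functional version panics on inputs such as "foo..bar", where a segment is empty. A possible variation, still without an explicit loop, simply skips empty segments. Whether skipping is the right behaviour is an assumption, and the function name is hypothetical:

```rust
/// Like the functional version, but an empty segment contributes nothing
/// instead of causing a panic.
fn dotted_to_pascal_case_lenient(s: &str) -> String {
    s.split('.')
        .flat_map(|piece| {
            let mut chars = piece.chars();
            // `next()` yields `None` for an empty segment, which turns into
            // an empty iterator rather than a panic.
            chars
                .next()
                .into_iter()
                .flat_map(char::to_uppercase)
                .chain(chars)
        })
        .collect()
}

fn main() {
    println!("{}", dotted_to_pascal_case_lenient("foo.bar.baz"));
    println!("{}", dotted_to_pascal_case_lenient("foo..bar"));
}
```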

How to "crop" characters off the beginning of a string in Rust?

I want a function that can take two arguments (string, number of letters to crop off front) and return the same string except with the letters before character x gone.
If I write
let mut example = "stringofletters";
CropLetters(example, 3);
println!("{}", example);
then the output should be:
ingofletters
Is there any way I can do this?
In many uses it would make sense to simply return a slice of the input, avoiding any copy. Converting @Shepmaster's solution to use immutable slices:
fn crop_letters(s: &str, pos: usize) -> &str {
    match s.char_indices().nth(pos) {
        Some((pos, _)) => &s[pos..],
        None => "",
    }
}
fn main() {
    let example = "stringofletters"; // works with a String if you take a reference
    let cropped = crop_letters(example, 3);
    println!("{}", cropped);
}
Advantages over the mutating version are:
No copy is needed. You can call cropped.to_string() if you want a newly allocated result; but you don't have to.
It works with static string slices as well as mutable String etc.
The disadvantage is that if you really do have a mutable string you want to modify, it would be slightly less efficient as you'd need to allocate a new String.
Issues with your original code:
Functions use snake_case, types and traits use CamelCase.
"foo" is a string literal of type &str. These may not be changed. You will need something that has been heap-allocated, such as a String.
The call crop_letters(stringofletters, 3) would transfer ownership of stringofletters to the method, which means you wouldn't be able to use the variable anymore. You must pass in a mutable reference (&mut).
Rust strings are not ASCII, they are UTF-8. You need to figure out how many bytes each character requires. char_indices is a good tool here.
You need to handle the case of when the string is shorter than 3 characters.
Once you have the byte position of the new beginning of the string, you can use drain to move a chunk of bytes out of the string. We just drop these bytes and let the String move over the remaining bytes.
fn crop_letters(s: &mut String, pos: usize) {
    match s.char_indices().nth(pos) {
        Some((pos, _)) => {
            s.drain(..pos);
        }
        None => {
            s.clear();
        }
    }
}
fn main() {
    let mut example = String::from("stringofletters");
    crop_letters(&mut example, 3);
    assert_eq!("ingofletters", example);
}
See Chris Emerson's answer if you don't actually need to modify the original String.
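To make the character-versus-byte distinction concrete, here is a small sketch of what char_indices reports for a non-ASCII string. The helper name byte_offset_of_char is made up for this example:

```rust
/// Returns the byte offset at which the `n`-th character (0-based) starts,
/// or `None` if the string has `n` characters or fewer.
fn byte_offset_of_char(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(offset, _)| offset)
}

fn main() {
    // Each Cyrillic letter below takes two bytes in UTF-8, the digit takes one.
    for (offset, ch) in "ЖФ1".char_indices() {
        println!("{} starts at byte {}", ch, offset);
    }
    println!("{:?}", byte_offset_of_char("ЖФ1", 2));
}
```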
I found this answer which I don't consider really idiomatic:
fn crop_with_allocation(string: &str, len: usize) -> String {
    // Counts characters, so this is safe for non-ASCII input
    string.chars().skip(len).collect()
}
fn crop_without_allocation(string: &str, len: usize) -> &str {
    // Here `len` is a byte offset, not a character count: the slice below
    // panics if it lands inside a multi-byte character
    // optional length check
    if string.len() < len {
        return "";
    }
    &string[len..]
}
fn main() {
    let example = "stringofletters"; // works with a String if you take a reference
    let cropped = crop_with_allocation(example, 3);
    println!("{}", cropped);
    let cropped = crop_without_allocation(example, 3);
    println!("{}", cropped);
}
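Slicing with &string[len..] treats len as a byte offset and panics when it lands inside a multi-byte character. One possible panic-free variant uses str::get, which returns None for an out-of-range or non-boundary offset; crop_bytes is a hypothetical name:

```rust
/// Drops the first `byte_len` bytes, or returns `None` when that offset is
/// past the end of the string or falls inside a multi-byte character.
fn crop_bytes(s: &str, byte_len: usize) -> Option<&str> {
    s.get(byte_len..)
}

fn main() {
    println!("{:?}", crop_bytes("stringofletters", 3));
    // Offset 1 is in the middle of the two-byte letter, so this is None.
    println!("{:?}", crop_bytes("ЖФ1", 1));
}
```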
my version
fn crop_str(s: &str, n: usize) -> &str {
    let mut it = s.chars();
    for _ in 0..n {
        it.next();
    }
    it.as_str()
}
#[test]
fn test_crop_str() {
    assert_eq!(crop_str("123", 1), "23");
    assert_eq!(crop_str("ЖФ1", 1), "Ф1");
    assert_eq!(crop_str("ЖФ1", 2), "1");
}

Resources