I have this code:
fn main() {
let hello = r##"#[0] !sell 0 100 ars \"belo lemon\" 1"##;
let x: Vec<&str> = hello.split(" ").collect();
println!("{x:?}");
}
I want to have this output
["#[0]", "sell", "0", "100", "ars", "belo lemon", "1"]
I am fairly new to Rust and can't find how to do this. Any ideas?
I would do this with Regex.
The idea of this regex is to consider two branches:
quote, anything but quote, quote
anything (not empty) but quote or space
When dealing with the capture, we have to test which of these two branches matches.
Revised in order to avoid owned strings and to remove the surrounding quotes in the result.
(Note that the example in the question has changed since the first post, so this answer no longer matches the situation in the question.)
// [dependencies]
// regex = "1.5"
use regex::Regex;
fn main() {
let re = Regex::new(r#"["]([^"]*)["]|([^" ]+)"#).unwrap();
let hello = r#"[0] sell 0 100 ars "belo lemon" 1"#;
let x: Vec<&str> = re
.captures_iter(hello)
.map(|cap| {
if let Some(quoted) = cap.get(1) {
quoted
} else {
cap.get(2).unwrap()
}
.as_str()
})
.collect();
println!("{x:?}");
}
/*
["[0]", "sell", "0", "100", "ars", "belo lemon", "1"]
*/
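For comparison, the same two-branch idea can be sketched with the standard library only. This is a minimal sketch, not the answer's method: split_quoted is a hypothetical helper name, and taking the rest of the input when a quote is never closed is an assumption made here.

```rust
/// Splits `input` on spaces, treating a double-quoted run as a single token
/// with the quotes removed. Hypothetical helper, standard library only.
fn split_quoted(input: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut rest = input.trim_start_matches(' ');
    while !rest.is_empty() {
        if let Some(after_quote) = rest.strip_prefix('"') {
            // Branch 1: quote, anything but quote, quote.
            match after_quote.find('"') {
                Some(end) => {
                    tokens.push(&after_quote[..end]);
                    rest = &after_quote[end + 1..];
                }
                None => {
                    // Assumption: an unterminated quote swallows the remainder.
                    tokens.push(after_quote);
                    rest = "";
                }
            }
        } else {
            // Branch 2: anything (not empty) but quote or space.
            let end = rest
                .find(|c: char| c == ' ' || c == '"')
                .unwrap_or(rest.len());
            tokens.push(&rest[..end]);
            rest = &rest[end..];
        }
        rest = rest.trim_start_matches(' ');
    }
    tokens
}

fn main() {
    let hello = r#"[0] sell 0 100 ars "belo lemon" 1"#;
    println!("{:?}", split_quoted(hello));
}
```

The tokens borrow from the input, so no owned strings are created, as in the regex version.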
I need to change the order of the letters in every word of a sentence. I have a string of separators to split the text into words, and I use swap(0, 1) to exchange adjacent letters within a word. However, I need to skip the first and last letter of every word, and I can't use regular expressions for this.
Here is some code:
const SEPARATORS: &str = " ,;:!?./%*$=+)#_-('\"&1234567890\r\n";
fn main() {
dbg!(mix("According, research"));
}
fn mix(s: &str) -> String {
let mut a: Vec<char> = s.chars().collect();
for group in a.split_mut(|num| SEPARATORS.contains(*num)) {
group.chunks_exact_mut(2).for_each(|x| x.swap(0, 1));
}
a.iter().collect()
}
Output as follows:
[src/main.rs:4] mix("According, research") = "cAocdrnig, eresrahc"
But I need output as follows:
[src/main.rs:4] mix("According, research") = "Accroidng, rseaerch"
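Does anyone know how to fix it?
All you need is to swap within a sub-slice that leaves out the ends of each word, using group[1..len-2]. The end of a Rust range is exclusive, so this slice skips the first character and the last two, which is what reproduces the expected output above; group[1..len-1] would skip exactly the first and last character and give "rseaecrh" for "research":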
const SEPARATORS: &str = " ,;:!?./%*$=+)#_-('\"&1234567890\r\n";
fn main() {
dbg!(mix("According, research"));
}
fn mix(s: &str) -> String {
let mut a: Vec<char> = s.chars().collect();
for group in a.split_mut(|num| SEPARATORS.contains(*num)) {
let len = group.len();
if len > 2 {
group[1..len-2].chunks_exact_mut(2).for_each(|x| x.swap(0, 1));
}
}
a.iter().collect()
}
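To see how the end of the range affects the result, the sketch below makes the number of untouched trailing characters a parameter. The name mix_keeping and the keep_tail parameter are hypothetical; keep_tail = 2 corresponds to the 1..len-2 slice, keep_tail = 1 to 1..len-1.

```rust
const SEPARATORS: &str = " ,;:!?./%*$=+)#_-('\"&1234567890\r\n";

/// Swaps adjacent letters inside each word, leaving the first character and
/// the last `keep_tail` characters of the word untouched.
fn mix_keeping(s: &str, keep_tail: usize) -> String {
    let mut chars: Vec<char> = s.chars().collect();
    for word in chars.split_mut(|c| SEPARATORS.contains(*c)) {
        let len = word.len();
        // Only slice when there is something between the kept ends.
        if len > 1 + keep_tail {
            word[1..len - keep_tail]
                .chunks_exact_mut(2)
                .for_each(|pair| pair.swap(0, 1));
        }
    }
    chars.into_iter().collect()
}

fn main() {
    println!("{}", mix_keeping("According, research", 2));
    println!("{}", mix_keeping("According, research", 1));
}
```

For odd-length middles the two choices coincide, because chunks_exact_mut(2) leaves a trailing single element alone; they differ only when the middle has an even number of letters.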
I'm working on a parser for a mini language, and I have the need to differentiate between plain strings ("hello") and strings that are meant to be operators/commands, and start with a specific sigil character (e.g. "$add").
I also want to add a way for the user to escape the sigil, in which a double-sigil gets consolidated into one, and then is treated like a plain string.
As an example:
"hello" becomes Str("hello")
"$add" becomes Operator(Op::Add)
"$$add" becomes Str("$add")
What would be the best way to do this check and manipulation? I was looking for a method that counts how many times a character appears at the start of a string, to no avail.
Can't you just use starts_with?
fn main() {
let line_list = ["hello", "$add", "$$add"];
let mut result;
for line in line_list.iter() {
if line.starts_with("$$") {
result = line[1..].to_string();
}
else if line.starts_with("$") {
result = format!("operator:{}", &line[1..]);
}
else {
result = line.to_string();
}
println!("result = {}", result);
}
}
Output
result = hello
result = operator:add
result = $add
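A variant of the same check can be sketched with str::strip_prefix, which returns the remainder only when the prefix is present and so avoids manual slicing. The Token enum and the classify function here are illustrative assumptions, not the asker's real types.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Str(String),
    Operator(String),
}

/// Classifies a line by its leading sigil. Hypothetical helper.
fn classify(line: &str) -> Token {
    if let Some(rest) = line.strip_prefix("$$") {
        // A doubled sigil is an escape: keep one `$` and treat it as text.
        Token::Str(format!("${rest}"))
    } else if let Some(name) = line.strip_prefix('$') {
        // Note: a lone "$" yields an operator with an empty name.
        Token::Operator(name.to_string())
    } else {
        Token::Str(line.to_string())
    }
}

fn main() {
    for line in ["hello", "$add", "$$add"] {
        println!("{:?}", classify(line));
    }
}
```

The "$$" case has to be tested first, since every string that starts with "$$" also starts with "$".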
According to the comments, your problem seems to be related to the access to the first chars.
The proper and efficient way is to get a char iterator:
#[derive(Debug)]
enum Token {
Str(String),
Operator(String),
}
impl From<&str> for Token {
fn from(s: &str) -> Self {
let mut chars = s.chars();
let first_char = chars.next();
let second_char = chars.next();
match (first_char, second_char) {
(Some('$'), Some('$')) => {
Token::Str(format!("${}", chars.as_str()))
}
(Some('$'), Some(c)) => {
// your real handling here is probably different
Token::Operator(format!("{}{}", c, chars.as_str()))
}
_ => {
Token::Str(s.to_string())
}
}
}
}
fn main() {
println!("{:?}", Token::from("π"));
println!("{:?}", Token::from("hello"));
println!("{:?}", Token::from("$add"));
println!("{:?}", Token::from("$$add"));
}
Result:
Str("π")
Str("hello")
Operator("add")
Str("$add")
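The question also asks for a way to count how many times a character appears at the start of a string. One possible sketch uses take_while on the char iterator; split_sigils is a hypothetical helper name.

```rust
/// Returns how many times `sigil` repeats at the start of `s`,
/// together with the remainder after those sigils. Hypothetical helper.
fn split_sigils(s: &str, sigil: char) -> (usize, &str) {
    let count = s.chars().take_while(|&c| c == sigil).count();
    // Every counted char is `sigil`, so the byte offset is count * its UTF-8 length.
    (count, &s[count * sigil.len_utf8()..])
}

fn main() {
    for input in ["hello", "$add", "$$add", "π"] {
        println!("{:?}", split_sigils(input, '$'));
    }
}
```

Computing the offset from len_utf8 keeps the slice on a character boundary even when the sigil or the text is not ASCII.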
I have the following:
A Vec<&str>.
A &str that may contain $0, $1, etc. referencing the elements in the vector.
I want to get a version of my &str where all occurrences of $i are replaced by the ith element of the vector. So if I have vec!["foo", "bar"] and $0$1, the result would be foobar.
My first naive approach was to iterate over i = 1..N and do a search and replace for every index. However, this is quite an ugly and inefficient solution. It also gives undesired output if any of the values in the vector contains the $ character.
Is there a better way to do this in Rust?
This solution is inspired (including copied test cases) by Shepmaster's, but simplifies things by using the replace_all method.
use regex::{Regex, Captures};
fn template_replace(template: &str, values: &[&str]) -> String {
let regex = Regex::new(r#"\$(\d+)"#).unwrap();
regex.replace_all(template, |captures: &Captures| {
values
.get(index(captures))
.unwrap_or(&"")
}).to_string()
}
fn index(captures: &Captures) -> usize {
captures.get(1)
.unwrap()
.as_str()
.parse()
.unwrap()
}
fn main() {
assert_eq!("ab", template_replace("$0$1", &["a", "b"]));
assert_eq!("$1b", template_replace("$0$1", &["$1", "b"]));
assert_eq!("moo", template_replace("moo", &[]));
assert_eq!("abc", template_replace("a$0b$0c", &[""]));
assert_eq!("abcde", template_replace("a$0c$1e", &["b", "d"]));
println!("It works!");
}
I would use a regex
use regex::Regex; // 1.1.0
fn example(s: &str, vals: &[&str]) -> String {
let r = Regex::new(r#"\$(\d+)"#).unwrap();
let mut start = 0;
let mut new = String::new();
for caps in r.captures_iter(s) {
let m = caps.get(0).expect("Regex group 0 missing");
let d = caps.get(1).expect("Regex group 1 missing");
let d: usize = d.as_str().parse().expect("Could not parse index");
// Copy non-placeholder
new.push_str(&s[start..m.start()]);
// Copy placeholder
new.push_str(&vals[d]);
start = m.end()
}
// Copy non-placeholder
new.push_str(&s[start..]);
new
}
fn main() {
assert_eq!("ab", example("$0$1", &["a", "b"]));
assert_eq!("$1b", example("$0$1", &["$1", "b"]));
assert_eq!("moo", example("moo", &[]));
assert_eq!("abc", example("a$0b$0c", &[""]));
}
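If the regex dependency is not wanted, the same single-pass scan can be sketched with the standard library alone. This is only a sketch under stated assumptions: expand is a hypothetical name, an out-of-range index is replaced by nothing, and a $ that is not followed by digits is copied through unchanged.

```rust
/// Replaces `$N` in `template` with `values[N]` in a single pass.
/// Substituted values are never rescanned. Hypothetical helper.
fn expand(template: &str, values: &[&str]) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        // Copy the text before the placeholder.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        let digits_len = after
            .find(|c: char| !c.is_ascii_digit())
            .unwrap_or(after.len());
        if digits_len == 0 {
            // Assumption: a `$` without digits is kept literally.
            out.push('$');
        } else if let Some(value) = after[..digits_len]
            .parse::<usize>()
            .ok()
            .and_then(|index| values.get(index))
        {
            out.push_str(value);
        }
        rest = &after[digits_len..];
    }
    out.push_str(rest);
    out
}

fn main() {
    println!("{}", expand("$0$1", &["foo", "bar"]));
}
```

Because the output is built separately from the input being scanned, a value that itself contains $1 is not expanded again.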
See also:
Split a string keeping the separators
I do not expect the following code to work, but as part of grammar exploration, I tried in playground:
fn main() {
struct EOF {};
let lines = vec![Ok("line 1"), Ok("line 2"), Err(EOF {})];
for Ok(line) in lines {
println!("{}", line);
}
}
The error message is
error[E0005]: refutable pattern in `for` loop binding: `Err(_)` not covered
--> src/main.rs:4:9
|
4 | for Ok(line) in lines {
| ^^^^^^^^ pattern `Err(_)` not covered
According to the message above it looks like I only need to add a match arm for the Err case. But what is the right grammar to do so?
You can use patterns as the binding in a for loop, but not refutable patterns. The difference between refutable and irrefutable patterns is described in the refutability section of The Rust Programming Language. The gist is that a pattern that could fail to match cannot be used in a let statement, a for loop, the parameter of a function or closure, or any other place where the syntax specifically requires an irrefutable pattern.
An example of an irrefutable pattern being used in a for loop might be something like this:
use std::collections::HashMap;
let mut numbers = HashMap::new();
numbers.insert("one", 1);
numbers.insert("two", 2);
numbers.insert("three", 3);
for (name, number) in &numbers {
println!("{}: {}", name, number);
}
(name, number) is an irrefutable pattern, because any place where it type checks, it will match. It type checks here because the items being iterated over (defined by the implementation of IntoIterator for &HashMap) are tuples. You could also write the above as
for tuple in &numbers {
let (name, number) = tuple;
println!("{}: {}", name, number);
}
because let is another place where only irrefutable patterns are allowed.
Yes, you can use patterns in many places, but not all of them allow you to conditionally branch when there are multiple possible patterns.
A for loop is one place where you cannot add conditions. That's what the error is telling you with "refutable pattern": there's a pattern that will not be handled. Instead, you mostly use the pattern to perform destructuring of the loop variable:
struct Thing {
foo: u8,
}
fn main() {
let things = vec![Thing { foo: 1 }, Thing { foo: 2 }, Thing { foo: 3 }];
for Thing { foo } in things {
println!("{}", foo);
}
}
Conditional:
match
if let
while let
Unconditional:
for
let
function parameters
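The two groups can be seen side by side in a small sketch. The describe function and its data are made up for illustration only.

```rust
/// Hypothetical helper showing where each kind of pattern is accepted.
/// The parameter itself is an irrefutable tuple pattern.
fn describe((id, value): (u32, Option<&str>)) -> String {
    // `let` also requires an irrefutable pattern.
    let (label, fallback) = ("item", "<none>");
    // `if let` accepts the refutable pattern `Some(text)`.
    if let Some(text) = value {
        format!("{label} {id}: {text}")
    } else {
        format!("{label} {id}: {fallback}")
    }
}

fn main() {
    let items = vec![(1, Some("one")), (2, None)];
    // `for` destructures each element with an irrefutable tuple pattern.
    for (id, value) in items {
        println!("{}", describe((id, value)));
    }
}
```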
From the question: "But what is the right grammar to do so?"
This gets the result you want:
fn main() {
struct EOF;
let lines = vec![Ok("line 1"), Ok("line 2"), Err(EOF)];
for line in lines.into_iter().flat_map(|e| e) {
println!("{}", line);
}
}
Note that you can use flat_map here because Result implements the IntoIterator trait: its iterator yields the contained value once for Ok and nothing for Err.
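Since the closure passed to flat_map is the identity, the same thing can be written with Iterator::flatten. A minimal sketch, where ok_values is a hypothetical helper name:

```rust
/// Keeps only the `Ok` values, silently dropping every `Err`.
/// Hypothetical helper built on `Result`'s `IntoIterator` impl.
fn ok_values<T, E>(results: Vec<Result<T, E>>) -> Vec<T> {
    results.into_iter().flatten().collect()
}

fn main() {
    struct Eof;
    let lines = vec![Ok("line 1"), Ok("line 2"), Err(Eof)];
    for line in ok_values(lines) {
        println!("{line}");
    }
}
```

Note that this skips errors wherever they occur; it does not stop at the first one.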
This is another option using if let:
fn main() {
struct EOF;
let lines = vec![Ok("line 1"), Ok("line 2"), Err(EOF)];
for result in lines {
if let Ok(line) = result {
println!("{}", line);
}
}
}
You may also want to stop iteration on an Err case:
fn main() {
struct EOF;
let lines = vec![Ok("line 1"), Ok("line 2"), Err(EOF), Ok("line 3") ];
let mut lines_iter = lines.into_iter();
while let Some(Ok(line)) = lines_iter.next() {
println!("{}", line);
}
}
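The stop-at-first-error behaviour can also be sketched as an iterator adapter with map_while, which ends the iteration as soon as the closure returns None. The helper name until_first_err is an assumption made for this example.

```rust
/// Collects the `Ok` values up to, but not including, the first `Err`.
/// Hypothetical helper; anything after the first `Err` is ignored.
fn until_first_err<T, E>(results: Vec<Result<T, E>>) -> Vec<T> {
    results.into_iter().map_while(Result::ok).collect()
}

fn main() {
    struct Eof;
    let lines = vec![Ok("line 1"), Ok("line 2"), Err(Eof), Ok("line 3")];
    for line in until_first_err(lines) {
        println!("{line}");
    }
}
```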
I found this example for substring replacement:
use std::str;
let string = "orange";
let new_string = str::replace(string, "or", "str");
If I want to run a number of consecutive replacements on the same string, for sanitization purposes, how can I do that without allocating a new variable for each replacement?
If you were to write idiomatic Rust, how would you write multiple chained substring replacements?
The regex engine can be used to do a single pass with multiple replacements of the string, though I would be surprised if this is actually more performant:
extern crate regex;
use regex::{Captures, Regex};
fn main() {
let re = Regex::new("(or|e)").unwrap();
let string = "orange";
let result = re.replace_all(string, |cap: &Captures| {
match &cap[0] {
"or" => "str",
"e" => "er",
_ => panic!("We should never get here"),
}.to_string()
});
println!("{}", result);
}
From the question: "how would you write multiple chained substring replacements?"
I would do it just as asked:
fn main() {
let a = "hello";
let b = a.replace("e", "a").replace("ll", "r").replace("o", "d");
println!("{}", b);
}
If you are asking how to do multiple concurrent replacements, passing through the string just once, then it does indeed get much harder.
This does require allocating new memory for each replace call, even if no replacement was needed. An alternate implementation of replace might return a Cow<str> which only includes the owned variant when the replacement would occur. A hacky implementation of that could look like:
use std::borrow::Cow;
trait MaybeReplaceExt<'a> {
fn maybe_replace(self, needle: &str, replacement: &str) -> Cow<'a, str>;
}
impl<'a> MaybeReplaceExt<'a> for &'a str {
fn maybe_replace(self, needle: &str, replacement: &str) -> Cow<'a, str> {
// Assumes that searching twice is better than unconditionally allocating
if self.contains(needle) {
self.replace(needle, replacement).into()
} else {
self.into()
}
}
}
impl<'a> MaybeReplaceExt<'a> for Cow<'a, str> {
fn maybe_replace(self, needle: &str, replacement: &str) -> Cow<'a, str> {
// Assumes that searching twice is better than unconditionally allocating
if self.contains(needle) {
self.replace(needle, replacement).into()
} else {
self
}
}
}
fn main() {
let a = "hello";
let b = a.maybe_replace("e", "a")
.maybe_replace("ll", "r")
.maybe_replace("o", "d");
println!("{}", b);
let a = "hello";
let b = a.maybe_replace("nope", "not here")
.maybe_replace("still no", "i swear")
.maybe_replace("but no", "allocation");
println!("{}", b);
assert_eq!(b.as_ptr(), a.as_ptr());
}
I would not use regex or .replace().replace().replace() or .maybe_replace().maybe_replace().maybe_replace() for this. They all have big flaws.
Regex is probably the most reasonable option of the three, but regexes are a terrible idea if you can avoid them at all. If your patterns come from user input, you have to escape them, which is a security nightmare.
.replace().replace().replace() is terrible for obvious reasons.
.maybe_replace().maybe_replace().maybe_replace() is only very slightly better than that, because it only improves efficiency when a pattern doesn't match. It doesn't avoid the repeated allocations if they all match, and in that case it is actually worse because it searches the strings twice.
There's a much better solution: use the aho-corasick crate. There's even an example in the readme:
use aho_corasick::AhoCorasick;
let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";
let replace_with = &["sloth", "grey", "slow"];
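// With aho-corasick 0.7 this returns the automaton directly; in 1.x `new` returns a Result, so append `.unwrap()`.
let ac = AhoCorasick::new(patterns);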
let result = ac.replace_all(haystack, replace_with);
assert_eq!(result, "The slow grey sloth.");
From the question: "for sanitization purposes"
I should also say that blacklisting "bad" strings is completely the wrong way to do sanitisation.
There is no way in the standard library to do this; it’s a tricky thing to get right with a large number of variations on how you would go about doing it, depending on a number of factors. You would need to write such a function yourself.
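As one illustration of what such a hand-written function might look like, here is a naive single-pass sketch. The name replace_many and the first-matching-rule policy are assumptions, and it tries every rule at every position, so it is not tuned for large rule sets.

```rust
/// Applies all `(needle, replacement)` rules in one left-to-right pass.
/// At each position the first matching rule wins, and replaced text is
/// never scanned again. Hypothetical helper, not a std API.
fn replace_many(input: &str, rules: &[(&str, &str)]) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    'scan: while let Some(ch) = rest.chars().next() {
        for (needle, replacement) in rules {
            // Empty needles are skipped so the scan always makes progress.
            if !needle.is_empty() && rest.starts_with(*needle) {
                out.push_str(replacement);
                rest = &rest[needle.len()..];
                continue 'scan;
            }
        }
        // No rule matched here: copy one character and move on.
        out.push(ch);
        rest = &rest[ch.len_utf8()..];
    }
    out
}

fn main() {
    println!("{}", replace_many("orange", &[("or", "str"), ("e", "er")]));
}
```

Unlike chained replace calls, the output of one rule is not fed into the next, so swapping rules such as ("a", "b") and ("b", "a") behave as a true swap.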
I stumbled upon this on Codewars; credit goes to user gom68. It only covers single-character replacements:
fn replace_multiple(rstring: &str) -> String {
rstring.chars().map(|c|
match c {
'A' => 'Z',
'B' => 'Y',
'C' => 'X',
'D' => 'W',
s => s
}
).collect::<String>()
}