It's easy to use nom to parse a string until a character is found; the question "How to use nom to gobble a string until a delimiter or the end?" deals with this.
How do I do the same with a string (multiple characters) instead of a single delimiter?
For example, to parse abchello, I want to parse everything until hello is found.
take_until parses everything up to, but not including, the provided string.
use nom::{bytes::complete::take_until, IResult};
fn parser(s: &str) -> IResult<&str, &str> {
take_until("hello")(s)
}
fn main() {
let result = parser("abchello");
assert_eq!(Ok(("hello", "abc")), result);
}
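This code returns the expected result for this input, but only by coincidence: is_not treats its argument as a set of characters, so it stops at the first h, e, l or o rather than at the substring hello.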
use nom::{IResult, bytes::complete::is_not};
fn parser(s: &str) -> IResult<&str, &str> {
is_not("hello")(s)
}
fn main() {
let result = parser("abchello");
println!("{:?}", result);
}
The documentation is here.
cargo run
-> Ok(("hello", "abc"))
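The difference between stopping at a substring and stopping at any character from a set can be illustrated with the standard library alone. The sketch below is not nom; until_substring and until_any_char are hypothetical helper names that only mimic the two behaviours, each returning (remaining, consumed) in the same order nom uses.

```rust
/// Splits at the first occurrence of the whole substring `pat`.
/// Returns (remaining, consumed), or None if `pat` does not occur.
fn until_substring<'a>(s: &'a str, pat: &str) -> Option<(&'a str, &'a str)> {
    let idx = s.find(pat)?;
    Some((&s[idx..], &s[..idx]))
}

/// Splits at the first character that belongs to the set `set`.
/// Returns (remaining, consumed); consumes everything if no such character exists.
fn until_any_char<'a>(s: &'a str, set: &str) -> (&'a str, &'a str) {
    let idx = s.find(|c: char| set.contains(c)).unwrap_or(s.len());
    (&s[idx..], &s[..idx])
}
```

For "abchello" both helpers agree, but for an input such as "able hello" the set-based version already stops at the first l, while the substring version consumes "able ".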
Related
From input "\"Name 1\" something else" I want to extract "Name 1" and the remaining string as " something else". Notice the escaped \".
My current solution is:
use nom::bytes::complete::{tag, is_not};
use nom::sequence::pair;
use nom::IResult;
fn parse_between(i: &str) -> IResult<&str, &str> {
let (i, (_o1, o2)) = pair(tag("\""), is_not("\""))(i)?;
if let Some(res) = i.strip_prefix("\"") {
return Ok((res, o2));
}
Ok((i, o2))
}
fn main() {
println!("{:?}", parse_between("\"Name 1\" something else"));
}
where the output is Ok((" something else", "Name 1")).
Is there a better way to do this? I feel as though calling strip_prefix is an extra step I shouldn't be doing?
I believe you're looking for delimited. Here's an example from the nom docs:
use nom::{
IResult,
sequence::delimited,
// see the "streaming/complete" paragraph lower for an explanation of these submodules
character::complete::char,
bytes::complete::is_not
};
fn parens(input: &str) -> IResult<&str, &str> {
delimited(char('('), is_not(")"), char(')'))(input)
}
Adapting it to do what you're looking for:
use nom::bytes::complete::is_not;
use nom::character::complete::char;
use nom::sequence::delimited;
use nom::IResult;
fn parse_between(i: &str) -> IResult<&str, &str> {
delimited(char('"'), is_not("\""), char('"'))(i)
}
fn main() {
println!("{:?}", parse_between("\"Name 1\" something else"));
// Ok((" something else", "Name 1"))
}
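For comparison, the same extraction can be sketched without nom, using only standard-library string methods. The name between_quotes is made up for this illustration, and the function is only an analogue of the delimited parser above, not a drop-in replacement (it accepts an empty quoted string, for instance).

```rust
/// Extracts the text between a leading '"' and the next '"'.
/// Returns (remaining, inner), or None if either quote is missing.
fn between_quotes(s: &str) -> Option<(&str, &str)> {
    // The input must start with an opening quote.
    let after_open = s.strip_prefix('"')?;
    // Split at the closing quote; the quote itself is dropped.
    let (inner, rest) = after_open.split_once('"')?;
    Some((rest, inner))
}
```

Calling between_quotes("\"Name 1\" something else") yields Some((" something else", "Name 1")).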
So basically, I have a text file with the following syntax:
String int
String int
String int
I know how to read the values if there is only one entry per line, but not when there are several.
In Java, I would do something simple with while and Scanner, but in Rust I have no clue.
I am fairly new to Rust so please help me.
Thanks for your help in advance
Solution
Here is my modified version of @netwave's code:
use std::fs;
use std::io::{BufRead, BufReader, Error};
fn main() -> Result<(), Error> {
let file = "data.txt"; // assumed path; replace with your own file
let buff_reader = BufReader::new(fs::File::open(file)?);
for line in buff_reader.lines() {
let parsed = sscanf::scanf!(line?, "{} {}", String, i32);
println!("{:?}\n", parsed);
}
Ok(())
}
You can use the BufRead trait, which has a read_line method. You can also use lines.
The easiest option is to wrap the File instance in a BufReader:
use std::fs;
use std::io::{BufRead, BufReader};
...
let mut buff_reader = BufReader::new(fs::File::open(path)?);
loop {
let mut buff = String::new();
// read_line needs a mutable reader and returns Ok(0) at end of file
if buff_reader.read_line(&mut buff)? == 0 {
break;
}
println!("{}", buff);
}
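The read_line pattern can be tried without a file, since any BufRead works, including a byte slice. The following is a minimal sketch; read_all_lines is a hypothetical helper name, and collecting into a Vec is just one way to hand the lines on.

```rust
use std::io::{self, BufRead};

/// Reads every line from `reader`, without the trailing line terminator.
fn read_all_lines<R: BufRead>(mut reader: R) -> io::Result<Vec<String>> {
    let mut lines = Vec::new();
    loop {
        let mut buf = String::new();
        // read_line returns Ok(0) once the end of the input is reached.
        if reader.read_line(&mut buf)? == 0 {
            break;
        }
        // read_line keeps the line terminator, so strip it before storing.
        lines.push(buf.trim_end_matches(&['\r', '\n'][..]).to_string());
    }
    Ok(lines)
}
```

For example, read_all_lines("str1 100\nstr2 200\n".as_bytes()) returns the two lines as owned strings; passing a BufReader wrapped around a File works the same way.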
Once you have each line, you can use the sscanf crate to parse it into the types you need:
let parsed = sscanf::scanf!(buff, "{} {}", String, i32);
Based on: https://doc.rust-lang.org/rust-by-example/std_misc/file/read_lines.html
Given a data.txt containing:
str1 100
str2 200
str3 300
use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;
fn main() {
// The file data.txt must exist in the current path before this produces output
if let Ok(lines) = read_lines("./data.txt") {
// Consumes the iterator; each item is an io::Result<String>
for line in lines {
if let Ok(data) = line {
let values: Vec<&str> = data.split(' ').collect();
match values.len() {
2 => {
let strdata = values[0].parse::<String>();
let intdata = values[1].parse::<i32>();
println!("Got: {:?} {:?}", strdata, intdata);
},
_ => panic!("Invalid input line {}", data),
};
}
}
}
}
// The output is wrapped in a Result to allow matching on errors
// Returns an Iterator to the Reader of the lines of the file.
fn read_lines<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>>
where P: AsRef<Path>, {
let file = File::open(filename)?;
Ok(io::BufReader::new(file).lines())
}
Outputs:
Got: Ok("str1") Ok(100)
Got: Ok("str2") Ok(200)
Got: Ok("str3") Ok(300)
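If panicking on a malformed line is not wanted, the per-line parsing can be pulled into a small function that reports failure through Option. This is only a sketch: parse_line and parse_all are hypothetical names, and split_whitespace is used instead of split(' ') so that repeated spaces or tabs do not produce empty fields.

```rust
/// Parses one "name value" line; returns None if the line is malformed.
fn parse_line(line: &str) -> Option<(String, i32)> {
    let mut parts = line.split_whitespace();
    let name = parts.next()?.to_string();
    let value = parts.next()?.parse::<i32>().ok()?;
    // Reject lines with extra fields.
    if parts.next().is_some() {
        return None;
    }
    Some((name, value))
}

/// Parses every line of `text`, skipping empty or malformed ones.
fn parse_all(text: &str) -> Vec<(String, i32)> {
    text.lines().filter_map(parse_line).collect()
}
```

parse_line can be applied to each line coming out of read_lines, while parse_all is convenient when the whole file has already been read into a String.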
I'm trying to remove newline characters from a String (file content read from a file) and convert it to a Vec<u8>.
Example input string:
let ss = String::from("AAAAAAAA\nBBBBBBBBB\nCCCCCC\nDDDDD\n\n");
fn parse(s: String) -> Vec<u8> {
let s = s.chars().skip_while(|c| *c == '\n');
let sett = s.into_iter().map(|c| c as u8).collect();
sett
}
While I get no error, skip_while doesn't seem to remove the newline characters from the string. What am I doing wrong here?
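skip_while only skips leading elements: it stops skipping at the first character that is not a newline and then yields everything after it, including later newlines. To remove every \n, replace it in the string and then convert the result to a Vec<u8> with into_bytes():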
fn parse(s: String) -> Vec<u8> {
s.replace("\n", "").into_bytes()
}
If you want to do it with iterators, you can use filter (note that c as u8 truncates non-ASCII characters):
fn parse(s: String) -> Vec<u8> {
s.chars().filter(|c| *c != '\n').map(|c| c as u8).collect()
}
You can call it as follows:
use std::str::from_utf8;
fn main() {
let my_string = String::from("AAAAAAAA\nBBBBBBBBB\nCCCCCC\nDDDDD\n\n");
let parsed_string = parse(my_string.clone());
println!("{:?}", from_utf8(&parsed_string));
}
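A byte-based variant avoids the char to u8 cast altogether, and a second helper makes the behaviour of the original skip_while attempt visible. Both function names are made up for this sketch.

```rust
/// Removes every b'\n' byte; it works on raw bytes, so non-ASCII text is preserved.
fn strip_newlines(s: &str) -> Vec<u8> {
    s.bytes().filter(|&b| b != b'\n').collect()
}

/// What the skip_while attempt actually does: it drops only leading newlines.
fn skip_leading_newlines(s: &str) -> String {
    s.chars().skip_while(|c| *c == '\n').collect()
}
```

With the input "\n\nAA\nBB\n", skip_leading_newlines keeps the inner and trailing newlines, whereas strip_newlines removes all of them.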
I have the following:
A Vec<&str>.
A &str that may contain $0, $1, etc. referencing the elements in the vector.
I want to get a version of my &str where all occurrences of $i are replaced by the ith element of the vector. So if I have vec!["foo", "bar"] and $0$1, the result would be foobar.
My first naive approach was to iterate over i = 1..N and do a search and replace for every index. However, this is quite an ugly and inefficient solution. It also gives undesired output if any of the values in the vector contains the $ character.
Is there a better way to do this in Rust?
This solution is inspired (including copied test cases) by Shepmaster's, but simplifies things by using the replace_all method.
use regex::{Regex, Captures};
fn template_replace(template: &str, values: &[&str]) -> String {
let regex = Regex::new(r#"\$(\d+)"#).unwrap();
regex.replace_all(template, |captures: &Captures| {
values
.get(index(captures))
.unwrap_or(&"")
}).to_string()
}
fn index(captures: &Captures) -> usize {
captures.get(1)
.unwrap()
.as_str()
.parse()
.unwrap()
}
fn main() {
assert_eq!("ab", template_replace("$0$1", &["a", "b"]));
assert_eq!("$1b", template_replace("$0$1", &["$1", "b"]));
assert_eq!("moo", template_replace("moo", &[]));
assert_eq!("abc", template_replace("a$0b$0c", &[""]));
assert_eq!("abcde", template_replace("a$0c$1e", &["b", "d"]));
println!("It works!");
}
I would use a regex:
use regex::Regex; // 1.1.0
fn example(s: &str, vals: &[&str]) -> String {
let r = Regex::new(r#"\$(\d+)"#).unwrap();
let mut start = 0;
let mut new = String::new();
for caps in r.captures_iter(s) {
let m = caps.get(0).expect("Regex group 0 missing");
let d = caps.get(1).expect("Regex group 1 missing");
let d: usize = d.as_str().parse().expect("Could not parse index");
// Copy non-placeholder
new.push_str(&s[start..m.start()]);
// Copy placeholder
new.push_str(&vals[d]);
start = m.end()
}
// Copy non-placeholder
new.push_str(&s[start..]);
new
}
fn main() {
assert_eq!("ab", example("$0$1", &["a", "b"]));
assert_eq!("$1b", example("$0$1", &["$1", "b"]));
assert_eq!("moo", example("moo", &[]));
assert_eq!("abc", example("a$0b$0c", &[""]));
}
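If pulling in the regex crate is not desirable, the same single-pass idea can be sketched with the standard library only. This is an illustrative alternative, not part of either answer: expand_placeholders is a hypothetical name, only ASCII digits are recognised, and an out-of-range index is assumed to expand to nothing.

```rust
/// Replaces each `$<digits>` in `template` with `values[<digits>]` in one pass.
/// Substituted values are never rescanned, so a value containing '$' is safe.
fn expand_placeholders(template: &str, values: &[&str]) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(dollar) = rest.find('$') {
        // Copy the text before the '$' unchanged.
        out.push_str(&rest[..dollar]);
        let after = &rest[dollar + 1..];
        // Number of ASCII digits (one byte each) directly after the '$'.
        let digits = after.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            // A lone '$' is not a placeholder; keep it literally.
            out.push('$');
            rest = after;
            continue;
        }
        // An index that is out of range, or too large to parse, expands to nothing.
        let value = after[..digits]
            .parse::<usize>()
            .ok()
            .and_then(|i| values.get(i));
        if let Some(v) = value {
            out.push_str(v);
        }
        rest = &after[digits..];
    }
    // Copy whatever follows the last placeholder.
    out.push_str(rest);
    out
}
```

The function takes the same arguments as the regex-based versions, for example expand_placeholders("$0$1", &["a", "b"]).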
See also:
Split a string keeping the separators
I want a function that can take two arguments (string, number of letters to crop off front) and return the same string except with the letters before character x gone.
If I write
let mut example = "stringofletters";
CropLetters(example, 3);
println!("{}", example);
then the output should be:
ingofletters
Is there any way I can do this?
In many uses it would make sense to simply return a slice of the input, avoiding any copy. Converting @Shepmaster's solution to use immutable slices:
fn crop_letters(s: &str, pos: usize) -> &str {
match s.char_indices().nth(pos) {
Some((pos, _)) => &s[pos..],
None => "",
}
}
fn main() {
let example = "stringofletters"; // works with a String if you take a reference
let cropped = crop_letters(example, 3);
println!("{}", cropped);
}
Advantages over the mutating version are:
No copy is needed. You can call cropped.to_string() if you want a newly allocated result; but you don't have to.
It works with static string slices as well as mutable String etc.
The disadvantage is that if you really do have a mutable string you want to modify, it would be slightly less efficient as you'd need to allocate a new String.
Issues with your original code:
Functions use snake_case, types and traits use CamelCase.
"foo" is a string literal of type &str. These may not be changed. You will need something that has been heap-allocated, such as a String.
The call crop_letters(stringofletters, 3) would transfer ownership of stringofletters to the method, which means you wouldn't be able to use the variable anymore. You must pass in a mutable reference (&mut).
Rust strings are not ASCII, they are UTF-8. You need to figure out how many bytes each character requires. char_indices is a good tool here.
You need to handle the case of when the string is shorter than 3 characters.
Once you have the byte position of the new beginning of the string, you can use drain to move a chunk of bytes out of the string. We just drop these bytes and let the String move over the remaining bytes.
fn crop_letters(s: &mut String, pos: usize) {
match s.char_indices().nth(pos) {
Some((pos, _)) => {
s.drain(..pos);
}
None => {
s.clear();
}
}
}
fn main() {
let mut example = String::from("stringofletters");
crop_letters(&mut example, 3);
assert_eq!("ingofletters", example);
}
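The gap between character positions and byte positions is easiest to see with non-ASCII input. The helper below is a small sketch (byte_offset_of_char is a made-up name) that exposes the offset char_indices reports for the n-th character.

```rust
/// Byte offset at which the `n`-th character (0-based) starts, if there is one.
fn byte_offset_of_char(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(offset, _)| offset)
}
```

For ASCII text the offset equals the character position, but in "ЖФ1" each Cyrillic letter takes two bytes, so the character at position 1 starts at byte 2 and the one at position 2 starts at byte 4.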
See Chris Emerson's answer if you don't actually need to modify the original String.
I found this answer, which I don't consider really idiomatic:
fn crop_with_allocation(string: &str, len: usize) -> String {
string.chars().skip(len).collect()
}
fn crop_without_allocation(string: &str, len: usize) -> &str {
// len is a byte offset; the slice below panics if it is not on a char boundary
if string.len() < len {
return "";
}
&string[len..]
}
fn main() {
let example = "stringofletters"; // works with a String if you take a reference
let cropped = crop_with_allocation(example, 3);
println!("{}", cropped);
let cropped = crop_without_allocation(example, 3);
println!("{}", cropped);
}
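When cropping by byte offset, str::get gives a non-panicking form of the same slice. The sketch below uses a hypothetical name, crop_bytes_checked, and returns None both when the offset is past the end and when it falls inside a multi-byte character.

```rust
/// Crops `len` bytes from the front, returning None instead of panicking
/// when `len` is out of range or falls inside a multi-byte character.
fn crop_bytes_checked(string: &str, len: usize) -> Option<&str> {
    string.get(len..)
}
```

This keeps the zero-copy behaviour of the slicing version while leaving it to the caller to decide what an invalid offset should mean.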
My version:
fn crop_str(s: &str, n: usize) -> &str {
let mut it = s.chars();
for _ in 0..n {
it.next();
}
it.as_str()
}
#[test]
fn test_crop_str() {
assert_eq!(crop_str("123", 1), "23");
assert_eq!(crop_str("ЖФ1", 1), "Ф1");
assert_eq!(crop_str("ЖФ1", 2), "1");
}