I'm looking at the nom crate for Rust, which contains lots of functions to parse bytes/characters.
Many of these functions, such as tag() shown below, take their input not as a parameter but in a second set of parentheses that follows the parameter list. To use a needle-in-a-haystack analogy: tag() takes the needle as its own parameter, but the haystack is supplied separately, in parentheses of its own after the parameter list (perhaps because it's a single-value tuple?).
use nom::bytes::complete::tag;
use nom::IResult;

fn parser(s: &str) -> IResult<&str, &str> {
    tag("Hello")(s)
}
In the example above, tag()'s job is to test whether the input s starts with Hello. You can call parser, passing in "Hello everybody!", and the tag() function does indeed verify that s starts with Hello. But how did (s) find its way into tag()?
Can someone explain this syntax to me, or show me where to read about it? It works, and I can use it, but I don't understand what I'm looking at!
thanks
The return value of tag() is impl Fn(Input) -> IResult<Input, Input, Error>, i.e. the function returns another function. The first set of parentheses is for calling tag(); the second set is for calling the function it returns.
This allows you to store the "parser" returned by these functions in a variable and use it multiple times. Or, put differently, instead of the function definition in your question you could also write
let parser = tag("Hello");
and then call parser the same way you would call the function.
tag("Hello") just returns a function, which is then immediately invoked with the argument s, i.e. tag("Hello")(s). Here's a simple implementation example:
fn tag<'a>(needle: &'a str) -> impl Fn(&str) -> bool + 'a {
    move |haystack: &str| haystack.starts_with(needle)
}

fn parser(s: &str) -> bool {
    tag("Hello")(s)
}

fn main() {
    println!("{}", parser("Hello everybody!")); // true
    println!("{}", parser("Bye everybody!")); // false
}
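Because the value returned by the first call is an ordinary closure, it can also be bound to a name and reused. Here is a minimal sketch of that, assuming a simplified stand-in called simple_tag (a hypothetical name, not nom's real API) that returns the rest of the input after the match:

```rust
// Simplified stand-in for nom's `tag`; `simple_tag` is a hypothetical name,
// not part of nom. It returns a closure that strips `needle` from the front
// of its input and yields the remainder on success.
fn simple_tag<'a>(needle: &'a str) -> impl Fn(&str) -> Option<&str> + 'a {
    move |haystack| haystack.strip_prefix(needle)
}

fn main() {
    // First call: build the parser once and store it.
    let hello = simple_tag("Hello");

    // Second call: apply the stored parser to as many inputs as needed.
    println!("{:?}", hello("Hello everybody!")); // Some(" everybody!")
    println!("{:?}", hello("Bye everybody!")); // None

    // The one-liner form performs both calls back to back.
    println!("{:?}", simple_tag("Bye")("Bye everybody!")); // Some(" everybody!")
}
```

The two sets of parentheses in the one-liner are therefore two separate calls, not one call with a special argument syntax.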
Looking at question number 3 here.
As an example, I've edited as such.
fn main() {
    never_return();
    // println!("Failed!");
}

fn never_return() -> ! {
    // Implement this function, don't modify the fn signatures
    panic!("stop")
}
Normally, the value returned from a fn is its final expression, written without a trailing ;. In the case above, panic!(_) returns the never type and does what I would expect. However, with the same signature returning !, the function compiles whether or not there is a ; after the panic! macro. I assume this comes down to how panic! works internally, but I couldn't find a technical explanation that I understood.
Why is this the case?
I think you're misunderstanding the ! type.
The panic! macro does not "return a type never", rather it never returns.
By using a fn foo() -> ! signature, you're declaring that this function never actually returns, and invoking another function/macro that never returns satisfies that.
Similarly, the following compiles:
fn never_returns() -> ! {
    loop {}
}
Since it loops forever, it never returns.
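To make the point about the semicolon concrete, here is a small sketch with both variants side by side. The function names are made up for illustration, and catch_unwind is used only so the demo can observe both panics and keep running:

```rust
use std::panic;

// With a trailing semicolon: the body ends in a statement, but control
// never gets past `panic!`, so the function still satisfies `-> !`.
fn with_semicolon() -> ! {
    panic!("stop");
}

// Without a trailing semicolon: `panic!` is the tail expression.
fn without_semicolon() -> ! {
    panic!("stop")
}

fn main() {
    // Both calls diverge by panicking; `catch_unwind` turns each panic
    // into an `Err` so the program can report on it.
    let a = panic::catch_unwind(|| {
        with_semicolon();
    });
    let b = panic::catch_unwind(|| {
        without_semicolon();
    });
    println!("{} {}", a.is_err(), b.is_err()); // true true
}
```

In both cases the compiler only needs to see that the end of the function body is unreachable; the semicolon makes no difference to that.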
I have an Add implementation that looks like this,
impl<T: Into<u64>> Add<T> for Sequence {
    type Output = Self;
    fn add(self, rhs: T) -> Self::Output {
        let mut o = self.clone();
        o.myadd(rhs.into()).unwrap();
        o
    }
}
The function myadd returns a Result, and this works fine. The problem is that in the real code the method is named add, an inherent method implemented on Sequence, and that is the one I want the Add implementation to call. If I rename myadd to add like this,
o.add(rhs.into()).unwrap();
then it no longer compiles. Instead I get "error[E0599]: no method named unwrap found for struct sequence::Sequence in the current scope", which tells me that the add being resolved does not return a Result; it returns a Sequence, which is not what I want. How can I qualify which add the call refers to?
I was able to target add that I supplied under impl Sequence {} with the following,
Sequence::add(r, rhs.into()).unwrap();
You can also use
Self::add(r, rhs.into()).unwrap();
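A self-contained sketch of the whole arrangement might look like this. The field value and the signature of the inherent add (taking &mut self and returning Result<(), String>) are assumptions made for illustration, since the original Sequence type isn't shown:

```rust
use std::ops::Add;

#[derive(Debug)]
struct Sequence {
    value: u64, // assumed field, for illustration only
}

impl Sequence {
    // Assumed inherent method: fallible, in-place addition.
    fn add(&mut self, n: u64) -> Result<(), String> {
        self.value = self
            .value
            .checked_add(n)
            .ok_or_else(|| "overflow".to_string())?;
        Ok(())
    }
}

impl<T: Into<u64>> Add<T> for Sequence {
    type Output = Self;

    fn add(self, rhs: T) -> Self::Output {
        let mut o = self;
        // Path syntax resolves to the inherent `add` above rather than
        // `Add::add`; `Self::add(&mut o, ...)` works the same way here.
        Sequence::add(&mut o, rhs.into()).unwrap();
        o
    }
}

fn main() {
    // The `+` operator goes through the `Add` trait implementation.
    let s = Sequence { value: 40 } + 2u32;
    println!("{}", s.value); // 42
}
```

With method-call syntax, o.add(...) finds the by-value trait method first because o is a Sequence, which is why the path form is needed to reach the inherent method.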
I'm doing the Exercism Rust problem in which a string has arbitrary length, but could be null, and needs to be classified based on its last two graphemes.
My understanding is that Option is used to account for something that could be null, or could be not null, when this is unknown at compile time, so I've tried this:
extern crate unicode_segmentation;
use unicode_segmentation::UnicodeSegmentation;
pub fn reply(message: &str) -> &str {
    let message_opt: Option<[&str; 2]> = message.graphemes(true).rev().take(2).nth(0).collect();
}
My understanding is that the right-hand side will give an array of two &strs if the string is non-zero in length, or None otherwise, and that the left-hand side will store it as an Option (so that I can later match on Some or None).
The error is:
no method named 'collect' found for type std::option::Option<&str> in the current scope
This doesn't make sense to me, as I (think) I'm trying to collect the output of an iterator, I am not collecting an option.
The error message isn't lying to you. Option does not have a method called collect.
I (think) I'm trying to collect the output of an iterator
Iterator::nth returns an Option. Option does not implement Iterator; you cannot call collect on it.
Option<[&str; 2]>
You can't do this, either; see "How do I collect into an array?".
I'd write this as
let mut graphemes = message.graphemes(true).fuse();
let message_opt = match (graphemes.next_back(), graphemes.next_back()) {
    (Some(a), Some(b)) => Some([a, b]),
    _ => None,
};
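The same pattern can be tried out without the external crate. This sketch uses chars() from the standard library as a stand-in for graphemes(true), so it works on code points rather than true graphemes; last_two is a hypothetical helper name:

```rust
// Returns the last two characters (last one first), or None if the input
// has fewer than two. `chars()` stands in for `graphemes(true)` here.
fn last_two(message: &str) -> Option<[char; 2]> {
    let mut chars = message.chars();
    match (chars.next_back(), chars.next_back()) {
        (Some(a), Some(b)) => Some([a, b]),
        _ => None,
    }
}

fn main() {
    // `nth` yields a single Option, not an iterator, so there is nothing
    // left to call `collect` on.
    let last: Option<char> = "hi?".chars().rev().nth(0);
    println!("{:?}", last); // Some('?')

    println!("{:?}", last_two("hi?")); // Some(['?', 'i'])
    println!("{:?}", last_two("x")); // None
    println!("{:?}", last_two("")); // None
}
```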
I'm trying to create a vector of Strings to test arg parsing (since this is what std::env::args() returns) but struggling with how to do this concisely.
What I want:
let test_args = vec!["-w", "60", "arg"]; // should be a Vec of Strings
let expected_results = my_arg_parser(test_args);
This obviously doesn't work because the vector's contents are all &strs.
Using String::from works, but it doesn't scale well and is ugly :)
let args = vec![String::from("-w"), String::from("60"), String::from("args")];
I could map over the references and return string objects, but this seems very verbose:
let args = vec!["-w", "60", "args"].iter().map(|x| x.to_string()).collect::<Vec<String>>();
Should I just create a helper function to do the conversion, or is there an easier way?
You can use the to_string() method directly on the literals:
let test_args = vec!["-w".to_string(), "60".to_string(), "arg".to_string()];
Otherwise a macro to do this would be as simple as:
macro_rules! vec_of_strings {
    ($($x:expr),*) => (vec![$($x.to_string()),*]);
}
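For completeness, here is how such a macro might be used in a test. This sketch adds an optional trailing comma to the pattern, and count_args is a hypothetical stand-in for a parser that needs owned Strings:

```rust
// Same idea as the macro above, additionally accepting a trailing comma.
macro_rules! vec_of_strings {
    ($($x:expr),* $(,)?) => (vec![$($x.to_string()),*]);
}

// Hypothetical stand-in for an argument parser that takes owned Strings.
fn count_args(args: Vec<String>) -> usize {
    args.len()
}

fn main() {
    let test_args: Vec<String> = vec_of_strings!["-w", "60", "arg"];
    println!("{:?}", test_args); // ["-w", "60", "arg"]
    println!("{}", count_args(test_args)); // 3
}
```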
JDemler already provided a nice answer. I have two additional things to say:
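First, you can also use into() instead of to_string() for all elements but the first. The first element fixes the vector's element type as String, so the compiler can infer the target type of the remaining into() calls. This is slightly shorter than, and equivalent to, to_string()/String::from(). It looks like this: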
vec!["a".to_string(), "b".into(), "c".into()];
Second, you might want to redesign your arg parsing. I will assume here that you won't mutate the Strings you get from env::args(). I imagine your current function to look like:
fn parse_args(args: &[String]) -> SomeResult { ... }
But you can make that function more generic by not accepting Strings but AsRef<str>. It would look like this:
fn parse_args<T: AsRef<str>>(args: &[T]) -> SomeResult { ... }
In the documentation you can see that String as well as str itself implement that trait. Therefore you can pass a &[String] and a &[&str] into your function. Awesome, eh?
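As a rough illustration, assuming a made-up helper called count_flags in place of your real parser, the same generic function accepts both kinds of slice:

```rust
// Hypothetical helper: counts arguments that look like flags.
// It only needs to view each element as a &str, so it asks for AsRef<str>.
fn count_flags<T: AsRef<str>>(args: &[T]) -> usize {
    let mut count = 0;
    for arg in args {
        let s: &str = arg.as_ref();
        if s.starts_with('-') {
            count += 1;
        }
    }
    count
}

fn main() {
    // Owned strings, as std::env::args() would produce once collected.
    let owned: Vec<String> = vec!["-w".to_string(), "60".to_string(), "arg".to_string()];
    // Plain string literals, convenient in tests.
    let borrowed = ["-w", "60", "arg"];

    println!("{}", count_flags(&owned)); // 1
    println!("{}", count_flags(&borrowed)); // 1
}
```

With this signature the test no longer needs to build Strings at all.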
In similar fashion, if you want to accept anything that can be converted into an owned String, you can accept <T: Into<String>> and if you want to return either a String or an &str, you can use Cow. You can read more about that here and here.
Apart from all that: there are plenty of good CLI-Arg parsers out there (clap-rs, docopt-rs, ...), so you might not need to write your own.
I agree that Lukas Kalbertodt's answer is the best: use generics to accept anything that can look like a slice of strings.
However, you can clean up the map version a little bit:
There's no need to allocate a vector for the initial set of strings.
There's no need to use the complete type (Vec<String>); you could specify just the collection (Vec<_>). If you pass the result to a function that only accepts a Vec<String>, then you don't need any explicit types at all; it can be completely inferred.
You can use a slightly shorter s.into() in the map.
fn do_stuff_with_args(args: Vec<String>) { println!("{}", args.len()) }

fn main() {
    let args = ["-w", "60", "args"].iter().map(|&s| s.into()).collect();
    do_stuff_with_args(args);
}
Forgive me if this is a dumb question, but I'm new to Rust, and having a hard time writing this toy program to test my understanding.
I want a function that given a string, returns the first word in each line, as an iterator (because the input could be huge, I don't want to buffer the result as an array). Here's the program I wrote which collects the result as an array first:
fn get_first_words(input: ~str) -> ~[&str] {
    return input.lines_any().filter_map(|x| x.split_str(" ").nth(0)).collect();
}

fn main() {
    let s = ~"Hello World\nFoo Bar";
    let words = get_first_words(s);
    for word in words.iter() {
        println!("{}", word);
    }
}
Result (as expected):
Hello
Foo
How do I modify this to return an Iterator instead? I'm apparently not allowed to make Iterator<&str> the return type. If I try @Iterator<&str>, rustc says
error: The managed box syntax is being replaced by the `std::gc::Gc` and `std::rc::Rc` types. Equivalent functionality to managed trait objects will be implemented but is currently missing.
I can't figure out for the life of me how to make that work.
Similarly, trying to return ~Iterator<&str> makes rustc complain that the actual type is std::iter::FilterMap<....blah...>.
In C# this is really easy, as you simply return the result of the equivalent map call as an IEnumerable<string>. Then the caller doesn't have to know what the actual returned type is; it only uses methods available in the IEnumerable interface.
Is there nothing like returning an interface in Rust??
(I'm using Rust 0.10)
I believe that the equivalent of the C# example would be returning ~Iterator<&str>. This can be done, but must be written explicitly: rather than returning x, return ~x as ~Iterator<&'a str>. (By the way, your function is going to have to take &'a str rather than ~str; if you don't know why, ask and I'll explain.)
This is not, however, idiomatic Rust, because it is needlessly inefficient: the boxed trait object costs a heap allocation and dynamic dispatch on every call. The idiomatic approach is to spell out the concrete return type. You can define it in one place like this if you like:
use std::iter::{FilterMap, Map};
use std::str::CharSplits;
type Foo<'a> = FilterMap<'a, &'a str, &'a str,
                         Map<'a, &'a str, &'a str,
                             CharSplits<'a, char>>>;
And then list Foo as the return type.
Yes, this is cumbersome. At present, there is no such thing as inferring a return type in any way. This has, however, been discussed and I believe it likely that it will come eventually in some syntax similar to fn foo<'a>(&'a str) -> Iterator<&'a str>. For now, though, there is no fancy sugar.
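For readers on a current toolchain: Rust has since gained impl Trait in return position (stable since Rust 1.26), which covers this case. The following is a sketch in modern Rust, not Rust 0.10, and uses today's lines() and split() in place of lines_any() and split_str():

```rust
// Modern-Rust sketch: the concrete iterator type stays hidden from the
// caller, who only sees "some Iterator yielding &str". Nothing is buffered.
fn get_first_words<'a>(input: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    input.lines().filter_map(|line| line.split(' ').next())
}

fn main() {
    let s = "Hello World\nFoo Bar";
    // Prints "Hello" and then "Foo", one per line.
    for word in get_first_words(s) {
        println!("{}", word);
    }
}
```

Unlike a boxed trait object, this is resolved statically, so it has no allocation or dynamic-dispatch cost.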