Take prefix slice of &str that matches `Pattern` in Rust

My ultimate goal is to parse the prefix number of a &str if there is one. So I want a function that given "123abc345" will give me a pair (u32, &str) which is (123, "abc345").
My idea is that if I have a Pattern type I should be able to do something like
/// `None` if there is no prefix in `s` that matches `p`,
/// Otherwise a pair of the longest matching prefix and the rest
/// of the string `s`.
fn split_prefix<'a, P: Pattern<'a>>(s: &'a str, p: P) -> Option<(&'a str, &'a str)>;
My goal would be achieved by doing something like
let num: Option<u32> = if let Some((num_s, rest)) = split_prefix(s, |c: char| c.is_ascii_digit()) {
    s = rest;
    num_s.parse().ok()
} else {
    None
};
What's the best way to get that?

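I looked at the source for str::split_once and modified it slightly to return the greedily matched prefix together with the rest of the string. Note that the Pattern API is unstable, so this needs a nightly compiler.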
#![feature(pattern)]
use std::str::pattern::{Pattern, Searcher};
/// See source code for `std::str::split_once`
fn split_prefix<'a, P: Pattern<'a>>(s: &'a str, p: P) -> Option<(&'a str, &'a str)> {
    // If nothing is rejected, the whole string matches and the split point is `s.len()`
    let start = p.into_searcher(s).next_reject().map_or(s.len(), |(start, _)| start);
    // `start` is the start of the unmatched (rejected) substring, so that is our sole delimiting index.
    // SAFETY: a `Searcher` only returns indices on UTF-8 boundaries of `s`, and `s.len()` is one too.
    unsafe { Some((s.get_unchecked(..start), s.get_unchecked(start..))) }
    // If constrained to strictly safe Rust code, an alternative is:
    // s.get(..start).zip(s.get(start..))
}
This generic prefix splitter could then be wrapped in a specialized function to parse out numerical prefixes:
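fn parse_numeric_prefix<'a>(s: &'a str) -> Option<(u32, &'a str)> {
    // `char::is_numeric` also accepts non-ASCII numerics such as '½' that `parse` rejects,
    // so match ASCII digits only
    split_prefix(s, |c: char| c.is_ascii_digit())
        .and_then(|(num_s, rest)| num_s.parse().ok().zip(Some(rest)))
}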
UPDATE:
I just re-read your question and realized you want a None when there is no prefix match. Updated functions:
fn split_prefix<'a, P: Pattern<'a>>(s: &'a str, p: P) -> Option<(&'a str, &'a str)> {
    // No rejection means the whole string matched, so split at `s.len()`
    let start = p.into_searcher(s).next_reject().map_or(s.len(), |(start, _)| start);
    if start == 0 {
        None
    } else {
        unsafe { Some((s.get_unchecked(..start), s.get_unchecked(start..))) }
    }
}
fn parse_numeric_prefix<'a>(s: &'a str) -> Option<(u32, &'a str)> {
    split_prefix(s, |c: char| c.is_ascii_digit())
        // The prefix is non-empty and all ASCII digits, but `parse` can still
        // fail on overflow (e.g. "99999999999"), so map the error to `None`
        .and_then(|(num_s, rest)| num_s.parse::<u32>().ok().map(|n| (n, rest)))
}
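For readers who cannot use nightly, the same split can probably be done on stable Rust with `str::find` and a closure. This is only a sketch: the predicate-based signature below is my own simplification (it takes a `Fn(char) -> bool` rather than a general `Pattern`), and both function names are reused here purely for illustration.

```rust
/// Splits `s` into the longest prefix whose chars all satisfy `pred`
/// and the remainder. Returns `None` when that prefix is empty.
fn split_prefix(s: &str, pred: impl Fn(char) -> bool) -> Option<(&str, &str)> {
    // Byte index of the first char that does NOT satisfy the predicate;
    // if there is none, the whole string is the prefix.
    let end = s.find(|c: char| !pred(c)).unwrap_or(s.len());
    if end == 0 {
        None
    } else {
        // `end` comes from `find` or `len`, so it is a valid char boundary.
        Some(s.split_at(end))
    }
}

/// Parses a leading run of ASCII digits as `u32`.
/// Returns `None` if there are no leading digits or the number overflows.
fn parse_numeric_prefix(s: &str) -> Option<(u32, &str)> {
    let (num_s, rest) = split_prefix(s, |c| c.is_ascii_digit())?;
    num_s.parse().ok().map(|n| (n, rest))
}
```

With this version, `parse_numeric_prefix("123abc345")` should give `Some((123, "abc345"))`, and an input that is all digits yields an empty remainder.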

Related

Multiple parameter lists in Rust

I've just started learning Rust, and I wonder how best to translate the pattern of multiple parameter lists.
In Scala, I can define functions taking multiple parameter lists as follows:
def add(n1: Int)(n2: Int) = n1 + n2
This can be used, for example, for function specialisation:
val incrementer = add(1)
val three = incrementer(2)
val four = incrementer(three)
One of my favourite uses of this feature is incrementally constructing immutable data structures. For example, where the initial caller might not know all of the required fields, so they can fill some of them, get back a closure taking the rest of the fields, and then pass that along for someone else to fill in the rest, completing construction.
I tried to implement this partial construction pattern in Rust:
#[derive(Debug)]
#[non_exhaustive]
pub struct Name<'a> {
pub first: &'a str,
pub last: &'a str,
}
impl<'a> Name<'a> {
pub fn new(first: &'a str, last: &'a str) -> Result<Self, &'static str> {
if first.len() > 0 && last.len() > 0 {
return Ok(Self { first, last });
}
return Err("first and last must not be empty");
}
pub fn first(first: &'a str) -> impl Fn(&'a str) -> Result<Name<'a>, &'static str> {
move |last| Name::new(first, last)
}
}
But it's extremely verbose, and it seems like there should be a much easier way (imagine there were 5 fields, I'd have to write 5 functions).
In essence I would like something like this (pseudo-Rust):
pub fn first(first: &'a str)(last: &'a str) -> Result<Name, &'static str> {
Name::new(first, last)
}
let takes_last = first("John")
let name = takes_last("Smith").unwrap()
What is the best way to have this pattern in Rust?
As Chayim said in a comment, you did the currying part the best way possible. Of course, in Rust you'd usually just define the function taking all parameters; if you want to partially apply it, just use a closure at the call site:
fn add(a: i32, b: i32) -> i32 {
a + b
}
let incrementer = |b| add(1, b);
let three = incrementer(2);
let four = incrementer(three);
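If you do want the curried shape of the Scala `add(n1)(n2)`, a function can return a closure. The following is a minimal sketch; `three_and_four` is a hypothetical helper that just mirrors the Scala usage from the question.

```rust
/// Curried addition: takes `n1` now and returns a closure waiting for `n2`.
fn add(n1: i32) -> impl Fn(i32) -> i32 {
    // `move` copies `n1` into the closure so it can outlive this call.
    move |n2| n1 + n2
}

/// Mirrors the Scala snippet: specialise once, then reuse the closure.
fn three_and_four() -> (i32, i32) {
    let incrementer = add(1);
    let three = incrementer(2);
    let four = incrementer(three);
    (three, four)
}
```

The cost is the one the question already noticed: each extra parameter list needs another layer of `impl Fn` in the signature, which is why a closure at the call site is usually the lighter option.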
"One of my favourite uses of this feature is incrementally constructing immutable data structures."
Very well. Rust's ownership rules and type system can be quite awesome for this. Consider your toy example. Here is one way to implement it:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5f823b81c3b2ca2d01e1ffb0d23aff72
struct NameBuilder {
first: Option<String>,
last: Option<String>,
}
struct Name {
first: String,
last: String,
}
impl NameBuilder {
fn new() -> Self {
NameBuilder {
first: None,
last: None,
}
}
fn with_first(mut self, first: String) -> Result<Self, &'static str> {
if first.len() == 0 {
Err("First name cannot be empty")
} else {
self.first = Some(first);
Ok(self)
}
}
fn with_last(mut self, last: String) -> Result<NameBuilder, &'static str> {
if last.len() == 0 {
Err("Last name cannot be empty")
} else {
self.last = Some(last);
Ok(self)
}
}
fn to_name(self) -> Result<Name, &'static str> {
Ok(Name {
first: self.first.ok_or("Must provide a first name!")?,
last: self.last.ok_or("Must provide a last name!")?,
})
}
}
fn main() -> Result<(), &'static str> {
let name_builder = NameBuilder::new();
assert!(name_builder.to_name().is_err());
let name_builder = NameBuilder::new()
.with_first("Homer".to_string())?
.with_last("Simpson".to_string())?;
let name = name_builder.to_name()?;
assert_eq!(name.first, "Homer");
assert_eq!(name.last, "Simpson");
Ok(())
}
This is obviously overkill for what you're doing in your example, but it can work really nicely in situations where there are lots of parameters and any given concrete use case only sets a few of them explicitly, using default values for the rest.
An added benefit is that you can freely choose the order in which you build it.
In my example I opted for String rather than &'a str, mostly so I don't have to type so many awkward &'a annotations :p
NOTE: Even though the with_first method takes in mut self as argument, we still are dealing with an essentially immutable data structure, because we're just consuming self (i.e. taking ownership). Basically, there's no way that someone would hold a reference and then be surprised by us setting the first name to something else, because you can't move self while someone is still borrowing it.
This is of course not the only way to make a fluent interface. You could also think of a purely functional approach where data is immutable and you don't consume self. Then we're entering "persistent data structures" territory, e.g. https://github.com/orium/rpds
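To make the non-consuming variant concrete, here is a naive sketch in which every `with_*` method borrows `self` and returns a fresh copy. The struct and method names are reused from above for illustration only, validation is left out for brevity, and plain `Clone` stands in for the structural sharing that a persistent data structure library would provide.

```rust
#[derive(Clone, Debug, Default, PartialEq)]
struct NameBuilder {
    first: Option<String>,
    last: Option<String>,
}

impl NameBuilder {
    /// Returns a new builder with `first` set; `self` is left untouched.
    fn with_first(&self, first: &str) -> Self {
        NameBuilder {
            first: Some(first.to_string()),
            ..self.clone()
        }
    }

    /// Returns a new builder with `last` set; `self` is left untouched.
    fn with_last(&self, last: &str) -> Self {
        NameBuilder {
            last: Some(last.to_string()),
            ..self.clone()
        }
    }
}

/// Builds two names from one shared, partially filled builder.
fn simpsons() -> (NameBuilder, NameBuilder, NameBuilder) {
    let base = NameBuilder::default().with_last("Simpson");
    let homer = base.with_first("Homer");
    let bart = base.with_first("Bart");
    (base, homer, bart)
}
```

Because nothing is consumed, the partially filled `base` can be handed to several callers, which is close to the "pass the closure along" use case from the question.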

Assembling a string and returning it with lifetime parameters for an L-system

I'm trying to implement an L-System struct and am struggling with it. I have already tried different approaches, but my main struggle comes from the lifetimes of references. What I'm trying to achieve is passing the value of the applied axioms back to my system variable, which I passed with the necessary lifetime in apply_axioms_once.
use std::collections::HashMap;
struct LSytem<'a> {
axioms: HashMap<&'a char, &'a str>,
}
impl<'a> LSytem<'a> {
fn apply_axioms_once(&mut self, system: &'a mut str) -> &'a str {
let mut applied: String = String::new();
for c in system.chars() {
let axiom = self.axioms.get(&c).unwrap();
for s in axiom.chars() {
applied.push(s);
}
}
system = applied.as_str();
system
}
fn apply_axioms(&mut self, system: &'a str, iterations: u8) -> &'a str {
let mut applied: &str = system;
// check for 0?
for _ in 0..iterations {
applied = self.apply_axioms_once(applied);
}
&applied
}
}
I already read a couple of similar questions, but still can't quite wrap my head around it. What seems to be the most on point answer is https://stackoverflow.com/a/42506211/18422275, but I'm still puzzled about how to apply this to my issue.
I am still a beginner in Rust, and a much greener one than I thought.
This can't work because you return a reference to data created inside the function. That data only lives until the end of the function scope, so the returned reference would point to nothing.
You should return a String from your functions instead, so that the returned data is owned.
I made this example to try out:
use std::collections::HashMap;
struct LSytem<'a> {
axioms: HashMap<&'a char, &'a str>,
}
impl<'a> LSytem<'a> {
fn apply_axioms_once(&mut self, system: &String) -> String {
let mut applied: String = String::new();
for c in system.chars() {
let axiom = self.axioms.get(&c).unwrap();
for s in axiom.chars() {
applied.push(s);
}
}
applied
}
fn apply_axioms(&mut self, system: &String, iterations: u8) ->String{
let mut applied = String::from(system);
// check for 0?
for _ in 0..iterations {
// Feed the previous result back in, not the original `system`
applied = self.apply_axioms_once(&applied);
}
applied
}
}
fn main() {
let mut ls = LSytem {axioms: HashMap::new()};
ls.axioms.insert(&'a', "abc");
let s = String::from("a");
ls.apply_axioms(&s,1);
}
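A variation on the same idea that avoids lifetime parameters entirely is to let the struct own its rules. This is a sketch under a few assumptions of mine: the names `LSystem`, `rules`, `apply_once`, `apply` and `algae` are hypothetical, characters without a rule are copied through unchanged instead of panicking, and the example rules are the classic algae system (A becomes AB, B becomes A).

```rust
use std::collections::HashMap;

/// An L-system that owns its rewrite rules, so no lifetime parameter is needed.
struct LSystem {
    rules: HashMap<char, String>,
}

impl LSystem {
    /// Rewrites every character of `system` once and returns a new owned String.
    fn apply_once(&self, system: &str) -> String {
        let mut applied = String::new();
        for c in system.chars() {
            match self.rules.get(&c) {
                Some(replacement) => applied.push_str(replacement),
                // Assumption: symbols without a rule are constants and stay as they are.
                None => applied.push(c),
            }
        }
        applied
    }

    /// Applies the rules `iterations` times, feeding each result into the next round.
    fn apply(&self, system: &str, iterations: u8) -> String {
        let mut applied = system.to_string();
        for _ in 0..iterations {
            applied = self.apply_once(&applied);
        }
        applied
    }
}

/// Runs the algae system starting from "A".
fn algae(iterations: u8) -> String {
    let rules = HashMap::from([('A', "AB".to_string()), ('B', "A".to_string())]);
    LSystem { rules }.apply("A", iterations)
}
```

Taking `&str` for the input and returning `String` keeps the borrow checker out of the picture: each round produces a fresh owned value and the old one is simply dropped.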

Create peek_while for iterator

I'm trying to create a peek_while method for my iterator which should basically do the same as take_while but only consume the character after the predicate matched. I've taken some inspiration from https://stackoverflow.com/a/30540952/10315665 and from the actual take_while source code https://github.com/rust-lang/rust/blob/2c7bc5e33c25e29058cbafefe680da8d5e9220e9/library/core/src/iter/adapters/take_while.rs#L42-L54 and arrived at this result:
pub struct PeekWhile<I: Iterator, P> {
iter: Peekable<I>,
predicate: P,
}
impl<I, P> fmt::Debug for PeekWhile<I, P>
where
I: Iterator + Debug,
<I as Iterator>::Item: Debug,
{
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("PeekWhile")
.field("iter", &self.iter)
.finish()
}
}
impl<I, P> Iterator for PeekWhile<I, P>
where
I: Iterator,
P: FnMut(&I::Item) -> bool,
{
type Item = I::Item;
fn next(&mut self) -> Option<I::Item> {
let n = self.iter.peek()?;
while let Some(n) = self.iter.peek() {
if (self.predicate)(&n) {
return self.iter.next();
} else {
break;
}
}
None
}
}
pub trait PeekWhileExt: Iterator {
fn peek_while<P>(self, predicate: P) -> PeekWhile<Self, P>
where
Self: Iterator,
Self: Sized,
P: FnMut(&Self::Item) -> bool,
{
PeekWhile {
iter: self.peekable(),
predicate,
}
}
}
impl<I: Iterator> PeekWhileExt for I {}
This is causing an infinite loop, and while I'm not sure why, I see that the take_while::next method does not have a loop, so I changed that to:
let n = self.iter.peek()?;
if (self.predicate)(n) {
Some(n)
} else {
None
}
which now gives me:
mismatched types
expected associated type `<I as Iterator>::Item`
found reference `&<I as Iterator>::Item`
So how would I go about creating such an iterator? Is the code so far correct, and how do I complete it? I know that itertools has https://docs.rs/itertools/0.10.1/itertools/trait.Itertools.html#method.peeking_take_while which sounds promising, but this is a learning project, both in programming concepts and Rust itself (I'm creating a JSON parser btw), so I would very much be interested in completing that portion of the code without any libraries.
Example use case (not tested):
let chars = "keyword:".chars();
assert_eq!(chars.peek_while(|c| c.is_alphabetic()).collect::<String>(), "keyword");
assert_eq!(chars.next().unwrap(), ':');
// ^
// Very important that the next char doesn't get lost
Appreciate any help! 😊
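One way to get the behaviour asked for is to borrow a `Peekable` mutably instead of taking ownership of the iterator, so the caller still has the iterator (and the item that failed the predicate) afterwards. The following is a minimal sketch built on std's `Peekable::next_if`; the `&mut Peekable<I>` design and the shape of the extension trait are assumptions of mine, not the asker's final code.

```rust
use std::iter::Peekable;

/// Yields items while the predicate holds, without consuming the first
/// item that fails it. Borrows the underlying `Peekable` mutably.
pub struct PeekWhile<'a, I: Iterator, P> {
    iter: &'a mut Peekable<I>,
    predicate: P,
}

impl<'a, I, P> Iterator for PeekWhile<'a, I, P>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        // `next_if` peeks at the next item and only consumes it
        // when the predicate returns true.
        self.iter.next_if(&mut self.predicate)
    }
}

/// Extension trait implemented for `Peekable` only, since peeking is required.
pub trait PeekWhileExt<I: Iterator> {
    fn peek_while<P>(&mut self, predicate: P) -> PeekWhile<'_, I, P>
    where
        P: FnMut(&I::Item) -> bool;
}

impl<I: Iterator> PeekWhileExt<I> for Peekable<I> {
    fn peek_while<P>(&mut self, predicate: P) -> PeekWhile<'_, I, P>
    where
        P: FnMut(&I::Item) -> bool,
    {
        PeekWhile { iter: self, predicate }
    }
}
```

Usage would look like `let mut chars = "keyword:".chars().peekable();` followed by `chars.peek_while(|c| c.is_alphabetic()).collect::<String>()`, after which `chars.next()` should still return the `':'`. The type mismatch in the question comes from `peek` handing out a reference, whereas `next_if` returns the owned item.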

Rayon find_any, and return the found item's value

Let's say I have a function f with signature
fn f(a: u8) -> Result<bool, SomeError> {}
Now I have a Vec<u8> and I wish to find if there's any element in this Vec whose value by f is Ok(b), and, in that case, return the value b (and stop calculating f for the rest of the Vec). I wish to have a function with signature
fn my_function(v: Vec<u8>) -> Option<bool> {}
Here's my first implementation:
fn my_function(v: Vec<u8>) -> Option<bool> {
let found = v.par_iter().find_any(|a| f(**a).is_ok());
match found {
Some(a) => Some(f(*a).unwrap()),
None => None
}
}
But I'm doing one useless f calculation at the end. How could I refactor the code to avoid this additional call to f?
Rayon's map, filter, reduce won't work because they go through the whole Vec, which I want to avoid.
Rayon's .flat_map(…) method will apply f to each element, treat each return value as an iterator, and flatten all of those results into a single new iterator. Results can be used as iterables of one (if Ok) or zero (if Err) elements, so this has the effect of unwrapping Ok results and discarding Errs. You can then apply .find_any(|_| true) to get the first available resulting value without requiring a second call to f(…).
use rayon::prelude::*;
fn my_function(v: Vec<u8>) -> Option<bool> {
v.par_iter().flat_map(|x| f(*x)).find_any(|_| true)
}
fn f(a: u8) -> Result<bool, SomeError> {
if a == 42 {
Ok(true)
} else {
Err(SomeError {})
}
}
fn main() {
println!("{:?}", my_function(vec![0, 1, 2, 42, 3, 42, 0]));
}
#[derive(Debug)]
struct SomeError {}
impl std::error::Error for SomeError {}
impl std::fmt::Display for SomeError {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(f, "{:?}", self)
}
}
Output:
Some(true)
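The "map and find in one step" idea is easiest to see in sequential form with std's `Iterator::find_map`, sketched below with a simplified `SomeError` and a slice parameter (both are my assumptions for brevity). Rayon's `ParallelIterator` also documents a `find_map_any` method, which should allow `v.par_iter().find_map_any(|x| f(*x).ok())`; treat that line as untested here.

```rust
#[derive(Debug)]
struct SomeError;

/// Stand-in for the real `f`: succeeds only for 42.
fn f(a: u8) -> Result<bool, SomeError> {
    if a == 42 {
        Ok(true)
    } else {
        Err(SomeError)
    }
}

/// Sequential analogue: calls `f` once per element and stops at the first `Ok`.
fn my_function(v: &[u8]) -> Option<bool> {
    // `.ok()` turns `Result<bool, _>` into `Option<bool>`, which is
    // exactly what `find_map` expects from its closure.
    v.iter().find_map(|&x| f(x).ok())
}
```

Either way, `f` runs at most once per element, and the value it produced is returned directly instead of being recomputed.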

Why does the borrow checker need lifetime annotations for the output when the inputs are very clear?

Why does the borrow checker get confused about the lifetimes in the code below?
fn main() {
let ss = "abc"; // lets say 'a scope
let tt = "def"; // lets say 'b scope
let result = func(ss, tt);
}
fn func(s: &str, t: &str) -> &str {
t
}
| fn func(s: &str, t: &str) -> &str {
| ^ expected lifetime parameter
|
= help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `s` or `t`
Why does it even matter what is returned from this code? Am I missing some very important edge case?
But when I add lifetime annotations, it works:
fn func<'a>(s: &'a str, t: &'a str) -> &'a str {
t
}
I read that each variable binding (let) creates an implicit scope, so how come the 2 input variables have the same scope? Correct me if I'm wrong. In the stack frame of the call to func, "s" will be pushed first and then "t", so "s" and "t" have different lifetimes: first "t" is dropped and then "s".
You haven’t told the compiler whether the return value may borrow from s, from t, from both, or from neither:
fn from_s<'a, 'b>(s: &'a str, t: &'b str) -> &'a str {
// can be abbreviated: fn from_s<'a>(s: &'a str, t: &str) -> &'a str
s
}
fn from_t<'a, 'b>(s: &'a str, t: &'b str) -> &'b str {
// can be abbreviated: fn from_t<'a>(s: &str, t: &'a str) -> &'a str
t
}
fn from_both<'a>(s: &'a str, t: &'a str) -> &'a str {
if s < t {
s
} else {
t
}
}
fn from_neither<'a, 'b>(s: &'a str, t: &'b str) -> &'static str {
// can be abbreviated: fn from_neither(s: &str, t: &str) -> &'static str
"foo"
}
The compiler can assume the last one isn’t what you wanted if you didn’t write 'static. But you still need to disambiguate between the first three.
To see why the difference would matter, consider a caller like
fn main() {
let s = String::from("s");
let r;
{
let t = String::from("t");
r = from_s(&s, &t);
// t goes out of scope
}
println!("{}", r);
}
If the compiler allowed you to call from_t instead of from_s, you’d be printing a string that had already been freed.
If I understand correctly, the question is "why may both arguments have the same lifetime?" The short answer is that lifetime annotations are not concrete values but rather bounds: each one states that "this value must live no longer than / at least as long as this lifetime".
When you write your code as you do in the question, fn func<'a>(s: &'a str, t: &'a str) -> &'a str, you're literally saying the following:
There is some lifetime, let's name it 'a, which can be different at every call site.
Arguments s and t must both live no less than 'a (for string literals this is always the case, since they are 'static, but it may not hold for a &String coerced to &str); that is, the function type is contravariant over argument types (and the lifetime is part of a type).
The return value must live no more than 'a; the function type is covariant over the return type.
(for more information on variance see the Rustonomicon)
Simplified, this means that both arguments must outlive the return value. This is not always what you want - consider the following case (note that I'm returning s now, so that the initialization order doesn't change):
fn main() {
let ss = "abc";
let mut result = "";
{
let tt = "def".to_string();
result = func(ss, &tt);
}
println!("{}", result);
}
fn func<'a>(s: &'a str, t: &'a str) -> &'a str {
s
}
This code won't compile, although it is logically correct, because the lifetime annotations don't agree with the logic: the second argument, t, is in no way connected to the return value, and yet, according to the function's annotations, it limits the return value's lifetime. But when we change the function to the following:
fn func<'a, 'b>(s: &'a str, t: &'b str) -> &'a str {
s
}
...it compiles and returns the desired result (although with some warnings), since now the lifetime 'b isn't connected with 'a and, in fact, can be removed altogether; lifetime elision will do its work well.
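For illustration, here is what the version with 'b elided might look like, together with a caller in which the second argument is dropped before the result is used. The `demo` function and the `_t` parameter name are hypothetical; the rest follows the example above.

```rust
// `_t` gets its own (elided) lifetime, so it no longer constrains the result.
fn func<'a>(s: &'a str, _t: &str) -> &'a str {
    s
}

/// Uses the result after the second argument has gone out of scope.
fn demo() -> &'static str {
    let ss = "abc";
    let result;
    {
        let tt = "def".to_string();
        result = func(ss, &tt);
        // `tt` is dropped at the end of this block,
        // but `result` only borrows from `ss`.
    }
    result
}
```

Since `ss` is a string literal, the returned reference can even be `'static` here, which the signature with a single shared `'a` would not allow while `tt` is a short-lived `String`.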
