I was looking at the following code, which checks whether one path string is a prefix of another.
pub fn prefix_matches(prefix: &str, request_path: &str) -> bool {
let mut prefixes = prefix
.split('/')
.map(|p| Some(p))
.chain(std::iter::once(None));
let mut request_paths = request_path
.split('/')
.map(|p| Some(p))
.chain(std::iter::once(None));
for (prefix, request_path) in prefixes.by_ref().zip(&mut request_paths) {
match (prefix, request_path) {
(Some(prefix), Some(request_path)) => {
if (prefix != "*") && (prefix != request_path) {
return false;
}
}
(Some(_), None) => return false,
(None, None) => break,
(None, Some(_)) => break,
}
}
true
}
I'd like to understand why .chain(std::iter::once(None)) is necessary. I know it is probably for "padding", but I'm not exactly sure how it helps.
Consider the example prefix_matches("/v1/publishers", "/v1"). Without .chain(std::iter::once(None)), the item after "v1" for the second argument "/v1" is None, since every iterator ends with None. So the next iteration of the for loop should see ("publishers", None), and the function "should" return false. But when I removed .chain(std::iter::once(None)), I got a panic instead.
Let's examine what the iterator returns in your example:
Assume iter = prefixes.by_ref().zip(&mut request_paths).

Without the inserted None:

| # | iter.next() | prefix | request_path | request_paths.next() |
|---|-------------|--------|--------------|----------------------|
| 1 | Some((Some(""), Some(""))) | Some("") | Some("") | Some(Some("")) |
| 2 | Some((Some("v1"), Some("v1"))) | Some("v1") | Some("v1") | Some(Some("v1")) |
| 3 | None, because request_paths is exhausted | - | - | None |

Because request_paths is exhausted before we see a difference, the for loop exits and we return true.

With the inserted None:

| # | iter.next() | prefix | request_path | request_paths.next() |
|---|-------------|--------|--------------|----------------------|
| 1 | Some((Some(""), Some(""))) | Some("") | Some("") | Some(Some("")) |
| 2 | Some((Some("v1"), Some("v1"))) | Some("v1") | Some("v1") | Some(Some("v1")) |
| 3 | Some((Some("publishers"), None)) | Some("publishers") | None | Some(None) |
| 4 | not reached, because we return early | - | - | None |
In the second variant, where a None is chained onto the end of request_paths, the loop gets to observe that prefixes still has elements after request_paths has run out, so it can return false early through this arm:
(Some(_), None) => return false,
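To make the two tables concrete, here is a small sketch that collects what zip yields with and without the trailing None on the request-path side. The helper names zipped_plain and zipped_with_sentinel are made up for this illustration and are not part of the original code.

```rust
/// Pairs up the '/'-separated segments of both strings with a plain `zip`.
/// `zip` stops as soon as either side is exhausted.
fn zipped_plain<'a>(prefix: &'a str, path: &'a str) -> Vec<(&'a str, &'a str)> {
    prefix.split('/').zip(path.split('/')).collect()
}

/// Same pairing, but the path side is wrapped in `Some` and followed by a
/// single `None` sentinel, so one extra pair is produced when the prefix
/// has more segments than the path.
fn zipped_with_sentinel<'a>(prefix: &'a str, path: &'a str) -> Vec<(&'a str, Option<&'a str>)> {
    prefix
        .split('/')
        .zip(path.split('/').map(Some).chain(std::iter::once(None)))
        .collect()
}
```

With the arguments "/v1/publishers" and "/v1", the plain version never produces a pair containing "publishers", while the sentinel version ends with ("publishers", None), which is the pair the early return relies on.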
The other .chain(...) isn't needed at all, since all you do when the prefixes iterator returns None is break out of the loop. You could in fact shorten your function to this:
pub fn prefix_matches(prefix: &str, request_path: &str) -> bool {
let mut prefixes = prefix.split('/');
let mut request_paths = request_path
.split('/')
.map(|p| Some(p))
.chain(std::iter::once(None));
for (prefix, request_path) in prefixes.by_ref().zip(&mut request_paths) {
match request_path {
Some(request_path) => {
if (prefix != "*") && (prefix != request_path) {
return false;
}
}
None => return false,
}
}
true
}
A better way is to use itertools's zip_longest():
use itertools::{Itertools, EitherOrBoth};
pub fn prefix_matches(prefix: &str, request_path: &str) -> bool {
for item in prefix.split('/').zip_longest(request_path.split('/')) {
match item {
EitherOrBoth::Both(prefix, request_path) => {
if (prefix != "*") && (prefix != request_path) {
return false;
}
}
EitherOrBoth::Left(_) => return false,
EitherOrBoth::Right(_) => break,
}
}
true
}
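If adding itertools is not an option, the same check can probably be written with the standard library alone. This is only a sketch under the assumption that the behavior of the versions above is what you want; the name prefix_matches_std is mine.

```rust
/// Returns true if every '/'-separated segment of `prefix` either is "*"
/// or equals the segment at the same position in `request_path`.
pub fn prefix_matches_std(prefix: &str, request_path: &str) -> bool {
    let mut request_segments = request_path.split('/');
    // `all` stops at the first segment that fails the check.
    prefix.split('/').all(|p| match request_segments.next() {
        // Both sides still have a segment: wildcard or exact match.
        Some(r) => p == "*" || p == r,
        // The prefix has more segments than the request path.
        None => false,
    })
}
```

Pulling the request segments by hand with next() inside the closure replaces the None sentinel: the "prefix is longer" case shows up as a None from next() rather than as a padded item.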
How can I convert this loop based implementation to iteration syntax?
fn parse_number<B: AsRef<str>>(input: B) -> Option<u32> {
let mut started = false;
let mut b = String::with_capacity(50);
let radix = 16;
for c in input.as_ref().chars() {
match (started, c.is_digit(radix)) {
(false, false) => {},
(false, true) => {
started = true;
b.push(c);
},
(true, false) => {
break;
}
(true, true) => {
b.push(c);
},
}
}
if b.len() == 0 {
None
} else {
match u32::from_str_radix(b.as_str(), radix) {
Ok(v) => Some(v),
Err(_) => None,
}
}
}
The main problem that I found is that you need to terminate the iterator early and be able to ignore characters until the first numeric char is found.
.map_while() fails because it has no state.
.reduce() and .fold() would iterate over the entire str regardless of whether the number has already ended.
It looks like you want to find the first sequence of digits while ignoring any non-digits before it. You can use a combination of .skip_while and .take_while (note that the example below uses radix 10, whereas the code in the question uses radix 16):
fn parse_number<B: AsRef<str>>(input: B) -> Option<u32> {
let input = input.as_ref();
let radix = 10;
let digits: String = input.chars()
.skip_while(|c| !c.is_digit(radix))
.take_while(|c| c.is_digit(radix))
.collect();
u32::from_str_radix(&digits, radix).ok()
}
fn main() {
dbg!(parse_number("I have 52 apples"));
}
[src/main.rs:14] parse_number("I have 52 apples") = Some(
52,
)
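The radix matters for the result here. As a hedged sketch, the variant below takes the radix as a parameter (the name parse_number_radix and the extra parameter are my additions) so the two cases can be compared. It assumes the radix is in the range 2 to 36, since char::is_digit and u32::from_str_radix panic outside that range.

```rust
/// Parses the first run of digits (in the given radix) found in `input`.
/// Assumes `radix` is between 2 and 36 inclusive.
fn parse_number_radix(input: &str, radix: u32) -> Option<u32> {
    let digits: String = input
        .chars()
        // Ignore everything before the first digit.
        .skip_while(|c| !c.is_digit(radix))
        // Stop at the first non-digit after that; the rest is never visited.
        .take_while(|c| c.is_digit(radix))
        .collect();
    // An empty string or an overflowing value both end up as `None`.
    u32::from_str_radix(&digits, radix).ok()
}
```

With radix 16 the input "I have 52 apples" yields Some(10) rather than Some(52), because the 'a' in "have" is itself a hexadecimal digit and the run ends at the following 'v'.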
Minimal example of the structure of my code:
struct Error;
fn answer() -> Result<Option<i64>, Error> {
(0..100_i64)
.map(|i| -> Result<Option<i64>, Error> {
let candidate = i * 7;
if candidate <= 42 {
Ok(Some(candidate))
} else if candidate == 666 {
Err(Error)
} else {
Ok(None)
}
})
.max()
}
The goal is to take the maximum over the i64 values, returning Ok(None) if none of the Options contained a value, and immediately returning Err(Error) if any of the values were Err(Error).
Of course this doesn't compile as is, because we can't take the max() over an iterable of Results.
With a plain for loop, this would be possible (but inelegant):
fn answer() -> Result<Option<i64>, Error> {
let items = (0..100_i64)
.map(|i| -> Result<Option<i64>, Error> {
let candidate = i * 7;
if candidate <= 42 {
Ok(Some(candidate))
} else if candidate == 666 {
Err(Error)
} else {
Ok(None)
}
});
let mut max = None;
for item in items {
match item {
Ok(candidate) => {
// Conveniently, None < Some(_).
max = std::cmp::max(max, candidate);
}
Err(Error) => {
return Err(Error);
}
}
}
Ok(max)
}
Can it be done using chaining syntax and ? instead?
If you don't want to use an external crate, you can use Iterator's try_fold adaptor, which is only a little more verbose:
struct Error;
fn answer() -> Result<Option<i64>, Error> {
(0..100_i64)
.map(|i| -> Result<Option<i64>, Error> {
let candidate = i * 7;
if candidate <= 42 {
Ok(Some(candidate))
} else if candidate == 666 {
Err(Error)
} else {
Ok(None)
}
})
.try_fold(None, |prev, next| next.map(|ok| std::cmp::max(prev, ok)))
}
Using Itertools::fold_ok from the itertools crate:
use itertools::Itertools; // brings fold_ok into scope
fn answer() -> Result<Option<i64>, Error> {
(0..100_i64)
.map(|i| -> Result<Option<i64>, Error> {
let candidate = i * 7;
if candidate <= 42 {
Ok(Some(candidate))
} else if candidate == 666 {
Err(Error)
} else {
Ok(None)
}
})
.fold_ok(None, std::cmp::max) // Conveniently, None < Some(_)
}
I guess that the very existence of this function means that we'd need a Result-aware max function, like max_ok, in order to do this more cleanly.
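For what it's worth, such a helper is short to write on top of try_fold. The following is a sketch only: max_ok is a hypothetical free function defined here, not something provided by the standard library or, as far as this thread establishes, by itertools.

```rust
/// Returns the maximum of all `Ok` values, or the first `Err` encountered.
/// An empty input, or one containing only `Ok(None)`, gives `Ok(None)`.
fn max_ok<I, E>(items: I) -> Result<Option<i64>, E>
where
    I: IntoIterator<Item = Result<Option<i64>, E>>,
{
    items
        .into_iter()
        // `try_fold` stops at the first `Err` and returns it unchanged.
        // `None < Some(_)`, so `max` on the options does the right thing.
        .try_fold(None, |best, item| item.map(|value| std::cmp::max(best, value)))
}
```

With this in place, the body of answer() could end in a call like max_ok(...) around the mapped range instead of spelling out the fold at the call site.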
Don't ask why I'm learning Rust using linked lists. I want to mutably iterate down a recursive structure of Option<Rc<RefCell<Node>>> while keeping the ability to swap out nodes and unwrap them. I have a singly-linked list type with a tail pointer to the last node.
use std::cell::RefCell;
use std::mem;
use std::rc::Rc;
pub struct List<T> {
maybe_head: Option<Rc<RefCell<Node<T>>>>,
maybe_tail: Option<Rc<RefCell<Node<T>>>>,
length: usize,
}
struct Node<T> {
value: T,
maybe_next: Option<Rc<RefCell<Node<T>>>>,
}
Let's say we have a constructor and a prepend function:
impl<T> List<T> {
pub fn new() -> Self {
List {
maybe_head: None,
maybe_tail: None,
length: 0,
}
}
pub fn put_first(&mut self, t: T) -> &mut Self {
let new_node_rc = Rc::new(RefCell::new(Node {
value: t,
maybe_next: mem::replace(&mut self.maybe_head, None),
}));
match self.length == 0 {
true => {
let new_node_rc_clone = new_node_rc.clone();
self.maybe_head = Some(new_node_rc);
self.maybe_tail = Some(new_node_rc_clone);
},
false => {
self.maybe_head = Some(new_node_rc);
},
}
self.length += 1;
self
}
}
I want to remove and return the final node by moving the tail pointer to its predecessor, then returning the old tail. After iterating down the list using RefCell::borrow() and Rc::clone(), the first version of remove_last() below panics when trying to unwrap the tail's Rc. How do I iterate down this recursive structure without incrementing each node's strong count?
PANICKING VERSION
pub fn remove_last(&mut self) -> Option<T> {
let mut opt: Option<Rc<RefCell<Node<T>>>>;
if let Some(rc) = &self.maybe_head {
opt = Some(Rc::clone(rc))
} else {
return None;
};
let mut rc: Rc<RefCell<Node<T>>>;
let mut countdown_to_penultimate: i32 = self.length as i32 - 2;
loop {
rc = match opt {
None => panic!(),
Some(ref wrapped_rc) => Rc::clone(wrapped_rc),
};
match RefCell::borrow(&rc).maybe_next {
Some(ref next_rc) => {
if countdown_to_penultimate == 0 {
self.maybe_tail = Some(Rc::clone(&rc));
}
opt = Some(Rc::clone(next_rc));
countdown_to_penultimate -= 1;
},
None => {
let grab_tail = match Rc::try_unwrap(opt.take().unwrap()) {
Ok(something) => {
return Some(something.into_inner().value);
}
Err(_) => panic!(),
};
},
}
}
}
If all I do during iteration is move the tail pointer and enclose the iteration code in a {...} block to drop cloned references, I can then safely swap out and return the old tail, but this is obviously unsatisfying.
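The reason the extra block helps can be shown in isolation: Rc::try_unwrap only succeeds when the strong count is exactly 1, so any clone that is still alive makes it fail. A minimal sketch, with function names invented for this illustration:

```rust
use std::rc::Rc;

/// A second handle is still alive, so the strong count is 2 and
/// `try_unwrap` returns `Err`.
fn unwrap_while_clone_alive() -> bool {
    let tail = Rc::new(5);
    let _walker = Rc::clone(&tail);
    Rc::try_unwrap(tail).is_ok()
}

/// The clone lives only inside the inner block, so the strong count is
/// back to 1 by the time `try_unwrap` runs.
fn unwrap_after_clone_dropped() -> Option<i32> {
    let tail = Rc::new(5);
    {
        let _walker = Rc::clone(&tail);
    } // `_walker` is dropped here.
    Rc::try_unwrap(tail).ok()
}
```

In the panicking version, the clones held in opt and rc play the role of _walker here.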
UNSATISFYING WORKING VERSION
pub fn remove_last(&mut self) -> Option<T> {
// Inner block: every Rc clone made while walking the list is dropped at its end.
{
let mut opt: Option<Rc<RefCell<Node<T>>>>;
if let Some(rc) = &self.maybe_head {
opt = Some(Rc::clone(rc))
} else {
return None;
};
let mut rc: Rc<RefCell<Node<T>>>;
let mut countdown_to_penultimate: i32 = self.length as i32 - 2;
loop {
rc = match opt {
None => panic!(),
Some(ref wrapped_rc) => Rc::clone(wrapped_rc),
};
match RefCell::borrow(&rc).maybe_next {
Some(ref next_rc) => {
if countdown_to_penultimate == 0 {
self.maybe_tail = Some(Rc::clone(&rc));
}
opt = Some(Rc::clone(next_rc));
countdown_to_penultimate -= 1;
},
None => {
break;
},
}
}
}
match self.maybe_tail {
None => panic!(),
Some(ref rc) => {
let tail = mem::replace(&mut RefCell::borrow_mut(rc).maybe_next, None);
return Some(Rc::try_unwrap(tail.unwrap()).ok().unwrap().into_inner().value);
}
};
}
I wrote a List::remove_last() that I can live with, although I'd still like to know what more idiomatic Rust code here might look like. I find that this traversal idiom also extends naturally into things like removing the n-th node or removing the first node that matches some predicate.
fn remove_last(&mut self) -> Option<T> {
let mut opt: Option<Rc<RefCell<Node<T>>>>;
let mut rc: Rc<RefCell<Node<T>>>;
#[allow(unused_must_use)]
match self.length {
0 => {
return None;
}
1 => {
let head = mem::replace(&mut self.maybe_head, None);
mem::replace(&mut self.maybe_tail, None);
self.length -= 1;
return Some(
Rc::try_unwrap(head.unwrap())
.ok()
.unwrap()
.into_inner()
.value,
);
}
_ => {
opt = Some(Rc::clone(self.maybe_head.as_ref().unwrap()));
}
}
loop {
rc = match opt {
None => unreachable!(),
Some(ref wrapped_rc) => Rc::clone(wrapped_rc),
};
let mut borrowed_node = RefCell::borrow_mut(&rc);
let maybe_next = &mut borrowed_node.maybe_next;
match maybe_next {
None => unreachable!(),
Some(_)
if std::ptr::eq(
maybe_next.as_ref().unwrap().as_ptr(),
self.maybe_tail.as_ref().unwrap().as_ptr(),
) =>
{
borrowed_node.maybe_next = None;
let old_tail = self.maybe_tail.replace(Rc::clone(&rc));
self.length -= 1;
return Some(
Rc::try_unwrap(old_tail.unwrap())
.ok()
.unwrap()
.into_inner()
.value,
);
}
Some(ref next_rc) => {
opt = Some(Rc::clone(next_rc));
}
}
}
}
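As one possible shape for a shorter version (a sketch, not a claim about what is canonical), the traversal can lean on Option::take, Rc::ptr_eq and a single cursor variable. The Link type alias is my addition, and the list type is repeated so the example stands alone; it assumes the invariant that maybe_tail always points at the last node.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical alias to shorten the signatures.
type Link<T> = Option<Rc<RefCell<Node<T>>>>;

struct Node<T> {
    value: T,
    maybe_next: Link<T>,
}

pub struct List<T> {
    maybe_head: Link<T>,
    maybe_tail: Link<T>,
    length: usize,
}

impl<T> List<T> {
    pub fn new() -> Self {
        List { maybe_head: None, maybe_tail: None, length: 0 }
    }

    pub fn put_first(&mut self, t: T) -> &mut Self {
        let node = Rc::new(RefCell::new(Node {
            value: t,
            maybe_next: self.maybe_head.take(),
        }));
        if self.maybe_tail.is_none() {
            // First node: it is both head and tail.
            self.maybe_tail = Some(Rc::clone(&node));
        }
        self.maybe_head = Some(node);
        self.length += 1;
        self
    }

    pub fn remove_last(&mut self) -> Option<T> {
        // Returns None for an empty list.
        let old_tail = self.maybe_tail.take()?;
        if self.length == 1 {
            // Head and tail were the same node; release the head handle too.
            self.maybe_head = None;
        } else {
            // Walk until `current` is the node whose successor is the old tail.
            let mut current = Rc::clone(self.maybe_head.as_ref()?);
            loop {
                // The temporary borrow ends with this statement.
                let next = current.borrow().maybe_next.clone();
                match next {
                    Some(next) if !Rc::ptr_eq(&next, &old_tail) => current = next,
                    _ => break,
                }
            }
            // Unlink the old tail and make its predecessor the new tail.
            current.borrow_mut().maybe_next = None;
            self.maybe_tail = Some(current);
        }
        self.length -= 1;
        // All other handles are gone by now, so this should succeed.
        Rc::try_unwrap(old_tail).ok().map(|cell| cell.into_inner().value)
    }
}
```

The cursor still clones an Rc per step, but each clone is dropped or moved before the final try_unwrap, so the strong count of the old tail is 1 at that point.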
Problem description
Using serde_json to deserialize a very long array of objects into a Vec<T> can take a long time, because the entire array must be read into memory up front. I'd like to iterate over the items in the array instead to avoid the up-front processing and memory requirements.
My approach so far
StreamDeserializer cannot be used directly, because it can only iterate over self-delimiting types placed back-to-back. So what I've done so far is to write a custom struct to implement Read, wrapping another Read but omitting the starting and ending square brackets, as well as any commas.
For example, the reader will transform the JSON [{"name": "foo"}, {"name": "bar"}, {"name": "baz"}] into {"name": "foo"} {"name": "bar"} {"name": "baz"} so it can be used with StreamDeserializer.
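The transformation itself can be illustrated on an in-memory string, separately from the Read plumbing. The function below is a simplified sketch written for this explanation (flatten_top_level_array is not part of the reader that follows); it assumes well-formed input whose outermost value is an array.

```rust
/// Drops the outermost brackets of a JSON array and replaces the commas
/// between top-level items with spaces. Nested structures and string
/// contents are copied through unchanged.
fn flatten_top_level_array(json: &str) -> String {
    let mut out = String::with_capacity(json.len());
    let mut depth = 0usize; // 1 means "directly inside the outer array"
    let mut in_string = false;
    let mut escaped = false;
    for c in json.chars() {
        if in_string {
            out.push(c);
            if escaped {
                escaped = false; // this character was escaped; take it literally
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => {
                in_string = true;
                out.push(c);
            }
            '[' | '{' => {
                depth += 1;
                if depth > 1 {
                    out.push(c); // keep everything except the outermost '['
                }
            }
            ']' | '}' => {
                if depth > 1 {
                    out.push(c); // keep everything except the outermost ']'
                }
                depth = depth.saturating_sub(1);
            }
            ',' if depth == 1 => out.push(' '), // separator between top-level items
            _ => out.push(c),
        }
    }
    out
}
```

Tracking whether the scanner is inside a string, and whether the previous character was a backslash, is what keeps commas and brackets inside string values from being mistaken for structure.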
Here is the code in its entirety:
use std::io;
/// An implementation of `Read` that transforms JSON input where the outermost
/// structure is an array. The enclosing brackets and commas are removed,
/// causing the items to be adjacent to one another. This works with
/// [`serde_json::StreamDeserializer`].
pub(crate) struct ArrayStreamReader<T> {
inner: T,
depth: Option<usize>,
inside_string: bool,
escape_next: bool,
}
impl<T: io::Read> ArrayStreamReader<T> {
pub(crate) fn new_buffered(inner: T) -> io::BufReader<Self> {
io::BufReader::new(ArrayStreamReader {
inner,
depth: None,
inside_string: false,
escape_next: false,
})
}
}
#[inline]
fn do_copy(dst: &mut [u8], src: &[u8], len: usize) {
if len == 1 {
dst[0] = src[0]; // Avoids memcpy call.
} else {
dst[..len].copy_from_slice(&src[..len]);
}
}
impl<T: io::Read> io::Read for ArrayStreamReader<T> {
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
if buf.is_empty() {
return Ok(0);
}
let mut tmp = vec![0u8; buf.len()];
// The outer loop is here in case every byte was skipped, which can happen
// easily if `buf.len()` is 1. In this situation, the operation is retried
// until either no bytes are read from the inner stream, or at least 1 byte
// is written to `buf`.
loop {
let byte_count = self.inner.read(&mut tmp)?;
if byte_count == 0 {
return if self.depth.is_some() {
Err(io::ErrorKind::UnexpectedEof.into())
} else {
Ok(0)
};
}
let mut tmp_pos = 0;
let mut buf_pos = 0;
// Only look at the bytes that were actually read in this call.
for (i, b) in tmp[..byte_count].iter().copied().enumerate() {
if self.depth.is_none() {
match b {
b'[' => {
tmp_pos = i + 1;
self.depth = Some(0);
},
b if b.is_ascii_whitespace() => {},
b'\0' => break,
_ => return Err(io::ErrorKind::InvalidData.into()),
}
continue;
}
if self.inside_string {
match b {
_ if self.escape_next => self.escape_next = false,
b'\\' => self.escape_next = true,
b'"' if !self.escape_next => self.inside_string = false,
_ => {},
}
continue;
}
let depth = self.depth.unwrap();
match b {
b'[' | b'{' => self.depth = Some(depth + 1),
b']' | b'}' if depth > 0 => self.depth = Some(depth - 1),
b'"' => self.inside_string = true,
b'}' if depth == 0 => return Err(io::ErrorKind::InvalidData.into()),
b',' | b']' if depth == 0 => {
let len = i - tmp_pos;
do_copy(&mut buf[buf_pos..], &tmp[tmp_pos..], len);
tmp_pos = i + 1;
buf_pos += len;
// Then write a space to separate items.
buf[buf_pos] = b' ';
buf_pos += 1;
if b == b']' {
// Reached the end of outer array. If another array
// follows, the stream will continue.
self.depth = None;
}
},
_ => {},
}
}
if tmp_pos < byte_count {
let len = byte_count - tmp_pos;
do_copy(&mut buf[buf_pos..], &tmp[tmp_pos..], len);
buf_pos += len;
}
if buf_pos > 0 {
// If at least some data was read, return with the amount. Otherwise, the outer
// loop will try again.
return Ok(buf_pos);
}
}
}
}
It is used like so:
use std::io;
use serde::Deserialize;
#[derive(Deserialize)]
struct Item {
name: String,
}
fn main() -> io::Result<()> {
let json = br#"[{"name": "foo"}, {"name": "bar"}]"#;
let wrapped = ArrayStreamReader::new_buffered(&json[..]);
let first_item: Item = serde_json::Deserializer::from_reader(wrapped)
.into_iter()
.next()
.unwrap()?;
assert_eq!(first_item.name, "foo");
Ok(())
}
At last, a question
There must be a better way to do this, right?
Can you put another match expression inside one of the arms of a match, like this:
pub fn is_it_file(input_file: &str) -> String {
let path3 = Path::new(input_file);
match path3.is_file() {
true => "File!".to_string(),
false => match path3.is_dir() {
true => "Dir!".to_string(),
_ => "Don't care",
}
}
}
If not, why not?
Yes, you can (see Qantas' answer). But Rust often has prettier ways to do what you want. You can do multiple matches at once by using tuples.
use std::path::Path;
pub fn is_it_file(input_file: &str) -> String {
let path3 = Path::new(input_file);
match (path3.is_file(), path3.is_dir()) {
(true, false) => "File!",
(false, true) => "Dir!",
_ => "Neither or Both... bug?",
}.to_string()
}
Sure you can, match is an expression:
fn main() {
fn foo() -> i8 {
let a = true;
let b = false;
match a {
true => match b {
true => 1,
false => 2
},
false => 3
}
}
println!("{}", foo()); // 2
}
You can view the results of this on the Rust playpen.
The only thing that seems off about your code to me is the inconsistent use of .to_string(): the last match arm doesn't have it, so its type (&str) doesn't match the String produced by the other arms.
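To compare the two styles without touching the file system, here is a sketch that takes the two booleans directly. The function names are invented for this example; in the real code the values would come from Path::is_file and Path::is_dir.

```rust
/// Nested form: every arm, including the innermost ones, must produce
/// the same type, so each one calls `.to_string()`.
fn describe_nested(is_file: bool, is_dir: bool) -> String {
    match is_file {
        true => "File!".to_string(),
        false => match is_dir {
            true => "Dir!".to_string(),
            false => "Don't care".to_string(),
        },
    }
}

/// Tuple form: all arms yield `&str`, and the conversion to `String`
/// happens once on the result of the whole `match`.
fn describe_tuple(is_file: bool, is_dir: bool) -> String {
    match (is_file, is_dir) {
        (true, _) => "File!",
        (false, true) => "Dir!",
        (false, false) => "Don't care",
    }
    .to_string()
}
```

Both functions return the same result for every combination of inputs; the tuple form just makes the full set of cases visible in one place.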