I'm working through the book and I'm not understanding why this function doesn't compile:
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    contents
        .lines() // Fetch an iterator over the lines in `contents`
        .map(|x| x.to_lowercase()) // (x is now String) Convert each line to lowercase
        .filter(|x| x.contains(query)) // Filter out lines that do not contain query
        .map(|x| x.trim()) // Eliminate extra whitespace
        .collect() // Consume iterator and produce Vec<&str>
}
Without the to_lowercase() line it compiles, and I'm guessing that is because to_lowercase() returns a String instead of the &str we need to output at the end. However, when I substitute a conversion back to &str like:
// -- snip --
    .map(|x| x.to_lowercase().as_str())
// -- snip --
This states that a temporary value is being referenced, which I assume is because the &str references the String: when the String is released, my &str becomes invalid as well.
Are closures just not a good way of handling this, and should I break it into separate statements?
This states that a temporary value is being referenced, which I assume is because the &str references the String: when the String is released, my &str becomes invalid as well.
This assumption is correct.
Are closures just not a good way of handling this, and should I break it into separate statements?
No amount of refactoring will change the fact that to_lowercase() cannot modify the borrowed &str in place and has to allocate and return a new String, so if lowercasing the contents is a requirement then this is the best you can do:
fn search(query: &str, contents: &str) -> Vec<String> {
    contents
        .lines() // Fetch an iterator for each line in contents
        .map(|x| x.trim().to_lowercase()) // Trim & lowercase string
        .filter(|x| x.contains(query)) // Filter out lines that do not contain query
        .collect() // Consume iterator and produce Vec<String>
}
If you want to perform case-insensitive filtering but still return the unmodified contents (no lowercasing) then you can do this:
fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    contents
        .lines() // Fetch an iterator for each line in contents
        .filter(|x| x.to_lowercase().contains(query)) // Filter out lines that do not contain query (assumes query is already lowercase)
        .map(|x| x.trim()) // Trim whitespace
        .collect() // Consume iterator and produce Vec<&'a str>
}
For anyone who follows behind me, here's where I ended up. Thanks @pretzelhammer.
pub fn search_case_insensitive<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let query = query.to_lowercase(); // transform query to lowercase
    contents
        .lines()
        .filter(|x| x.to_lowercase().contains(&query))
        .map(|x| x.trim())
        .collect()
}
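As a quick sanity check of that final version, here is a minimal sketch that repeats the function and runs it on some sample text. The sample text and the `demo_matches` helper are assumptions made up for illustration; they are not from the book.

```rust
pub fn search_case_insensitive<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let query = query.to_lowercase();
    contents
        .lines()
        // The lowercased String is a temporary that lives only inside the closure;
        // the item passed down the chain is still the original `&'a str` line.
        .filter(|x| x.to_lowercase().contains(&query))
        .map(|x| x.trim())
        .collect()
}

// Hypothetical helper for illustration: runs the search on made-up sample text.
fn demo_matches() -> Vec<&'static str> {
    let contents = "Rust:\nsafe, fast, productive.\n  Trust me.  ";
    search_case_insensitive("rUsT", contents)
}
```

Because the lowercasing happens only inside `filter`, the returned slices keep their original casing and still borrow from `contents`.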
My ultimate goal is to parse the prefix number of a &str if there is one. So I want a function that given "123abc345" will give me a pair (u32, &str) which is (123, "abc345").
My idea is that if I have a Pattern type I should be able to do something like
/// `None` if there is no prefix in `s` that matches `p`,
/// otherwise a pair of the longest matching prefix and the rest
/// of the string `s`.
fn split_prefix<'a, P: Pattern<'a>>(s: &'a str, p: P) -> Option<(&'a str, &'a str)>;
My goal would be achieved by doing something like
let num = if let Some((num_s, rest)) = split_prefix(s, |c: char| c.is_ascii_digit()) {
    s = rest;
    num_s.parse::<u32>().ok()
} else {
    None
};
What's the best way to get that?
I looked at the source for str::split_once and modified it slightly to return a greedily matched prefix together with the rest of the string.
#![feature(pattern)]
use std::str::pattern::{Pattern, Searcher};
/// See source code for `std::str::split_once`
fn split_prefix<'a, P: Pattern<'a>>(s: &'a str, p: P) -> Option<(&'a str, &'a str)> {
    let (start, _) = p.into_searcher(s).next_reject()?;
    // `start` here is the start of the unmatched (rejected) substring, so that is our sole delimiting index
    unsafe { Some((s.get_unchecked(..start), s.get_unchecked(start..))) }
    // If constrained to strictly safe rust code, an alternative is:
    // s.get(..start).zip(s.get(start..))
}
This generic prefix splitter could then be wrapped in a specialized function to parse out numerical prefixes:
fn parse_numeric_prefix<'a>(s: &'a str) -> Option<(u32, &'a str)> {
    split_prefix(s, char::is_numeric)
        .and_then(|(num_s, rest)| num_s.parse().ok().zip(Some(rest)))
}
UPDATE:
I just re-read your question and realized you want a None when there is no prefix match. Updated functions:
fn split_prefix<'a, P: Pattern<'a>>(s: &'a str, p: P) -> Option<(&'a str, &'a str)> {
    let (start, _) = p.into_searcher(s).next_reject()?;
    if start == 0 {
        None
    } else {
        unsafe { Some((s.get_unchecked(..start), s.get_unchecked(start..))) }
    }
}
fn parse_numeric_prefix<'a>(s: &'a str) -> Option<(u32, &'a str)> {
    split_prefix(s, char::is_numeric)
        // `parse` can still fail here (overflow, or a non-ASCII numeric such as '¾'
        // that `char::is_numeric` accepts), so map a parse error to `None`
        .and_then(|(num_s, rest)| Some((num_s.parse().ok()?, rest)))
}
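The Pattern trait used above is only available on nightly. For stable Rust, the same idea can be sketched with str::find and str::split_at. This is a minimal sketch under assumptions, not the answerer's code: the `split_prefix_by` name is hypothetical, it takes a plain closure instead of a Pattern, it only treats ASCII digits as numeric, and unlike the nightly version it also handles a string that consists entirely of digits.

```rust
/// Hypothetical stable variant: splits off the longest prefix whose chars
/// all satisfy `pred`. Returns `None` when that prefix is empty.
fn split_prefix_by(s: &str, pred: impl Fn(char) -> bool) -> Option<(&str, &str)> {
    // Byte index of the first char that does NOT match, or the end of the
    // string when every char matches.
    let end = s.find(|c: char| !pred(c)).unwrap_or(s.len());
    if end == 0 {
        None
    } else {
        // `find` returns a char boundary, so `split_at` cannot panic here.
        Some(s.split_at(end))
    }
}

fn parse_numeric_prefix(s: &str) -> Option<(u32, &str)> {
    let (num_s, rest) = split_prefix_by(s, |c| c.is_ascii_digit())?;
    // Parsing can still fail on overflow, so a parse error becomes `None`.
    num_s.parse().ok().map(|n| (n, rest))
}
```

With this sketch, "123abc345" yields the pair (123, "abc345") asked for in the question, and a string with no leading digits yields None.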
I have a reasonably simple function (let's call it intersection) that takes two parameters of type &[u32], and I'd like the return type to be &[u32]. This function takes in two slices (arrays?) and returns a new slice (array?) containing the elements that are in both slices.
pub fn intersection<'a>(left: &'a [u32], right: &'a [u32]) -> &'a [u32] {
    let left_set: HashSet<u32> = left.iter().cloned().collect();
    let right_set: HashSet<u32> = right.iter().cloned().collect();
    // I can't figure out how to get a
    // `&[u32]` output idiomatically
    let result: &[u32] = left_set
        .intersection(&right_set)
        .into_iter()
        .....
        .....
    result //<- this is a slice
}
I suppose I could create a Vec<u32>, but then the borrow checker doesn't like me returning a slice of that Vec<u32>.
pub fn intersection<'a>(left: &'a [u32], right: &'a [u32]) -> &'a [u32] {
    .....
    .....
    let mut result: Vec<u32> = left_set
        .intersection(&right_set)
        .into_iter()
        .cloned()
        .collect();
    result.sort();
    result.as_slice() //<-- ERROR cannot return reference to local variable
    // `result` returns a reference to data owned by the current function
}
I'm probably missing a trick here. Any advice on how to do this idiomatically in Rust?
This function takes in two arrays
No, it takes two slices.
I'm probably missing a trick here. Any advice on how to do this idiomatically in Rust?
There is no trick and you can't. A slice is a form of borrow: by definition, a slice refers to memory owned by some other collection (static memory, a vector, an array, ...).
This means that, like every other borrow, it can't be returned if it borrows data from the local scope: that would result in a dangling pointer (as the actual owner is destroyed when the scope ends).
The correct thing to do is to just return a Vec:
use std::collections::HashSet;
pub fn intersection<'a>(left: &'a [u32], right: &'a [u32]) -> Vec<u32> {
    left.iter().collect::<HashSet<_>>().intersection(
        &right.iter().collect()
    ).map(|&&v| v).collect()
}
Or, if it's very common for one of the slices to be a subset of the other and you're happy paying for the check (possibly because you can use something like a bitmap), you could return a Cow and, in the subset case, return the subset slice itself:
use std::borrow::Cow;
use std::collections::HashSet;
pub fn intersection<'a>(left: &'a [u32], right: &'a [u32]) -> Cow<'a, [u32]> {
    if issubset(left, right) {
        Cow::Borrowed(left)
    } else if issubset(right, left) {
        Cow::Borrowed(right)
    } else {
        Cow::Owned(
            left.iter().collect::<HashSet<_>>().intersection(
                &right.iter().collect()
            ).map(|&&v| v).collect()
        )
    }
}
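The answer leaves issubset undefined. As a minimal sketch, assuming a plain HashSet-based check rather than the bitmap the answer hints at, it could look like the following. The `issubset` body is hypothetical, and the owned branch is written with a filter plus sort and dedup (a swapped-in variation, not the answer's exact expression) so that its output is deterministic.

```rust
use std::borrow::Cow;
use std::collections::HashSet;

// Hypothetical helper: true if every element of `a` also appears in `b`.
fn issubset(a: &[u32], b: &[u32]) -> bool {
    let b_set: HashSet<u32> = b.iter().copied().collect();
    a.iter().all(|v| b_set.contains(v))
}

pub fn intersection<'a>(left: &'a [u32], right: &'a [u32]) -> Cow<'a, [u32]> {
    if issubset(left, right) {
        // No allocation: the result is simply the borrowed input slice.
        Cow::Borrowed(left)
    } else if issubset(right, left) {
        Cow::Borrowed(right)
    } else {
        let right_set: HashSet<u32> = right.iter().copied().collect();
        let mut common: Vec<u32> = left
            .iter()
            .copied()
            .filter(|v| right_set.contains(v))
            .collect();
        // Sort and dedup so the owned result is stable and free of duplicates.
        common.sort_unstable();
        common.dedup();
        Cow::Owned(common)
    }
}
```

Note that in the borrowed case the input slice is returned as-is, so any duplicates or ordering in that slice are preserved; whether that is acceptable depends on the caller.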
So I'm following along with the Rust Book tutorial on writing a grep clone in Rust. The book first gives this function to search a file's contents for a given string:
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut results = Vec::new();
    for line in contents.lines() {
        if line.contains(query) {
            results.push(line);
        }
    }
    results
}
I then modified it so that the results would include the line number the match was found on, like so:
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<String> {
    let mut results = Vec::new();
    for (index, line) in contents.lines().enumerate() {
        if line.to_lowercase().contains(query) {
            let line_found = index + 1;
            results.push(format!("Line {line_found}: {line}"));
        }
    }
    results
}
Afterwards, the book shows how to use an iterator to make the code simpler and cleaner:
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    contents
        .lines()
        .filter(|line| line.contains(query))
        .collect()
}
I'm struggling to figure out how to get the same functionality, including the line number the match was found on, with this function. In collect(), is there a way for me to access the index of the iterator and the line itself?
Use enumerate, which transforms an Iterator<Item = T> into an Iterator<Item = (usize, T)> where the first element of the tuple is your index. You already used it in your second example; it can be used in the iterator version as well, since it's just another iterator combinator.
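A minimal sketch of what that could look like, mirroring the for-loop version from the question (the exact output format is copied from there; like the question's code, it assumes the query is already lowercase):

```rust
pub fn search(query: &str, contents: &str) -> Vec<String> {
    contents
        .lines()
        // Pair every line with its zero-based index before filtering,
        // so the index still refers to the position in `contents`.
        .enumerate()
        .filter(|(_, line)| line.to_lowercase().contains(query))
        // Line numbers shown to the user are one-based.
        .map(|(index, line)| format!("Line {}: {}", index + 1, line))
        .collect()
}
```

The order matters: calling enumerate after filter would number the matches rather than the lines of the file.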
Suppose I have a Vec<&str> (I have no control over this type) and I need to transform each string. Since the elements are string slices (which don't own the string), I need to store the actual strings somewhere.
I thought about storing the new strings in a container that's able to grow without reallocating. So I came up with the following:
use std::collections::VecDeque;
fn transform(s: &str) -> String {
    s.to_owned() + "blah"
}
pub fn func<'a>(strings: &mut Vec<&'a str>, string_storage: &'a mut VecDeque<String>) {
    for string in strings.iter_mut() {
        string_storage.push_back(transform(string));
        *string = string_storage.back().unwrap().as_str();
    }
}
Of course, it doesn't work (I get error[E0502]: cannot borrow `*string_storage` as mutable because it is also borrowed as immutable). I understand why the error happens, but I've struggled for some time to figure out a solution.
The best I could come up with is to separate the pushes and assignments into two loops:
pub fn func<'a>(strings: &mut Vec<&'a str>, string_storage: &'a mut Vec<String>) {
    for string in strings.iter() {
        string_storage.push(transform(string));
    }
    for (i, string) in strings.iter_mut().enumerate() {
        *string = string_storage[i].as_str();
    }
}
But it seems weird to have to iterate twice. Is there a simpler solution?
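One possible variation on the two-loop version, offered as a sketch rather than a definitive answer: the two phases seem hard to avoid with a plain Vec, because every push needs a mutable borrow and must therefore finish before any shared borrow lasting for 'a is taken. The phases can be written more compactly with extend and zip, though. The `start` offset is an addition of mine so that a storage vector that already holds strings does not throw the indices off.

```rust
fn transform(s: &str) -> String {
    s.to_owned() + "blah"
}

pub fn func<'a>(strings: &mut Vec<&'a str>, string_storage: &'a mut Vec<String>) {
    // Remember where the new strings begin, in case the storage is not empty.
    let start = string_storage.len();
    // Phase 1: all mutation of the storage happens here.
    string_storage.extend(strings.iter().map(|s| transform(s)));
    // Phase 2: the storage is only read from now on, so the shared borrows
    // can live for the whole of 'a.
    for (dst, src) in strings.iter_mut().zip(&string_storage[start..]) {
        *dst = src.as_str();
    }
}
```

This still walks the data twice, so it is a tidier spelling of the same approach rather than a single-pass solution.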
Is there a way to write a function that will look like this:
fn read_and_iter_u32_line<'a>(mut buf: String) -> Iterator<Item=u32> {
    buf.truncate(0);
    io::stdin().read_line(&mut buf).unwrap();
    buf.split_whitespace()
        .map(|s| s.parse::<u32>().unwrap())
}
Iterators are lazy. This means the data they are operating on needs to exist as long as the iterator itself, but buf ceases to exist when the function returns. If we keep buf around for longer it can work though.
Writing functions that return complex iterators is tricky at the moment, but it's possible:
use std::io;
use std::iter::{Iterator, Map};
use std::str::SplitWhitespace;
fn read_and_iter_u32_line(buf: &mut String) -> Map<SplitWhitespace, fn(&str) -> u32> {
    buf.truncate(0);
    io::stdin().read_line(buf).unwrap();
    buf.split_whitespace().map(parse)
}
fn parse(s: &str) -> u32 {
    s.parse::<u32>().unwrap()
}
Iterators are lazy, therefore they must borrow their input:
- this requires that their input lives longer than they do
- this requires, for safety reasons, that the borrowing relationship be exposed at function boundaries
The former point requires that the buf be passed as a reference and not a value.
The latter point prevents you from returning a bare generic Iterator<Item=u32> (even boxed) with no lifetime attached, because that would hide the borrowing relationship between the returned iterator and the passed-in buffer. So instead you must be explicit:
use std::io;
fn read_and_iter_u32_line<'a>(buf: &'a mut String)
    -> std::iter::Map<std::str::SplitWhitespace<'a>, fn(&'a str) -> u32>
{
    fn convert(s: &str) -> u32 { s.parse().unwrap() }
    buf.truncate(0);
    io::stdin().read_line(buf).unwrap();
    buf.split_whitespace().map(convert)
}
fn main() {
    let mut buf = "".to_string();
    for i in read_and_iter_u32_line(&mut buf) {
        println!("{}", i);
    }
}
Note: actually, the lifetime annotation can be elided, but I have exposed it here to highlight why the generic iterator is not possible.
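On Rust 1.26 and later, return-position impl Trait is another way to keep the borrow visible in the signature without spelling out the full Map type. The following is a minimal sketch under that assumption; the `parse_u32s` helper is hypothetical and not part of either answer. The `+ '_` bound is what ties the returned iterator to the borrow of the buffer.

```rust
use std::io;

// Hypothetical helper: lazily parses whitespace-separated u32 values.
// `+ '_` states that the iterator borrows from `line`.
fn parse_u32s(line: &str) -> impl Iterator<Item = u32> + '_ {
    line.split_whitespace().map(|s| s.parse::<u32>().unwrap())
}

// Same shape as the function above: the caller owns the buffer, and the
// returned iterator borrows from it for as long as it is used.
fn read_and_iter_u32_line(buf: &mut String) -> impl Iterator<Item = u32> + '_ {
    buf.clear();
    io::stdin().read_line(buf).unwrap();
    parse_u32s(buf.as_str())
}
```

A caller would use it exactly like the explicit version, for example by creating a String and looping over read_and_iter_u32_line(&mut buf). As in the original, a token that is not a valid u32 makes the unwrap panic when that item is reached.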