How to flat_map and split from an iterator of Strings - rust

Is there a way to make this work without switching to using iter instead of into_iter?
let strings: Vec<String> = vec!["1 2".to_string(), "3 4".to_string()];
strings.into_iter().flat_map(|str| str.split(" "));
The problem is
error[E0515]: cannot return value referencing function parameter `str`
--> src/lib.rs:3:40
|
3 | strings.into_iter().flat_map(|str| str.split(" "));
| ---^^^^^^^^^^^
| |
| returns a value referencing data owned by the current function
| `str` is borrowed here
When using iter instead of into_iter, I get an iterator of references and everything works but I'd like to know if it's possible to make this work on an iterator of Strings.

The issue with your code is that you are doing those actions:
You are consuming your vector with into_iter
Thus, inside the closure, you are taking a String by value that you borrow with split
In your temporary iterator, you hold a reference to this string
Conclusion: you are trying to return a reference to a local variable.
To solve this issue, you must create owned strings from the split string, and collect them to not hold a reference anymore:
fn main() {
let strings = vec!["1 2".to_string(), "3 4".into()];
let result = strings.into_iter().flat_map(|str| str.split(" ").map(str::to_owned).collect::<Vec<_>>());
println!("{:?}", result.collect::<Vec<_>>());
}
In fact, this would be less costly to not consume the vector at first:
fn main() {
let strings = vec!["1 2".to_string(), "3 4".into()];
let result = strings.iter().flat_map(|str| str.split(" ")).map(str::to_owned);
println!("{:?}", result.collect::<Vec<_>>());
}

I had the same problem and end up creating an iterator that take the ownership of the string, here is the code. Since you're splitting by space it should fit
struct Words {
buf: String,
offset: usize,
}
impl Words {
fn new(buf: String) -> Words {
Words { buf, offset: 0 }
}
}
impl Iterator for Words {
type Item = String;
fn next(&mut self) -> Option<String> {
let s = &(self.buf)[self.offset..];
let left = s.chars().take_while(|x| x.is_whitespace()).count();
let right = left + s[left..].chars().take_while(|x| !x.is_whitespace()).count();
if left < right {
self.offset += right;
return Some(String::from(&s[left..right]));
}
None
}
}
Here how I use it
fn read_file<'a>(buf: impl BufRead) -> impl Iterator<Item = String> {
buf.lines().filter_map(Result::ok).flat_map(Words::new)
}

Related

Temporarily cache owned value between iterator adapters

I'd like to know if there's a way to cache an owned value between iterator adapters, so that adapters later in the chain can reference it.
(Or if there's another way to allow later adapters to reference an owned value that lives inside the iterator chain.)
To illustrate what I mean, let's look at this (contrived) example:
I have a function that returns a String, which is called in an Iterator map() adapter, yielding an iterator over Strings. I'd like to get an iterator over the chars() in those Strings, but the chars() method requires a string slice, meaning a reference.
Is this possible to do, without first collecting the Strings?
Here's a minimal example that of course fails:
fn greet(c: &str) -> String {
"Hello, ".to_owned() + c
}
fn main() {
let names = ["Martin", "Helena", "Ingrid", "Joseph"];
let iterator = names.into_iter().map(greet);
let fails = iterator.flat_map(<str>::chars);
}
Playground
Using a closure instead of <str>::chars - |s| s.chars() - does of course not work either. It makes the types match, but breaks lifetimes.
Edit (2022-10-03): In response to the comments, here's some pseudocode of what I have in mind, but with incorrect lifetimes:
struct IteratorCache<'a, T, I>{
item : Option<T>,
inner : I,
_p : core::marker::PhantomData<&'a T>
}
impl<'a, T, I> Iterator for IteratorCache<'a, T,I>
where I: Iterator<Item=T>
{
type Item=&'a T;
fn next(&mut self) -> Option<&'a T> {
self.item = self.inner.next();
if let Some(x) = &self.item {
Some(&x)
} else {
None
}
}
}
The idea would be that the reference could stay valid until the next call to next(). However I don't know if this can be expressed with the function signature of the Iterator trait. (Or if this can be expressed at all.)
I don't think something like this exists yet, and collecting into a Vec<char> creates some overhead, but you can write such an iterator yourself with a little bit of trickery:
struct OwnedCharsIter {
s: String,
index: usize,
}
impl OwnedCharsIter {
pub fn new(s: String) -> Self {
Self { s, index: 0 }
}
}
impl Iterator for OwnedCharsIter {
type Item = char;
fn next(&mut self) -> Option<Self::Item> {
// Slice of leftover characters
let slice = &self.s[self.index..];
// Iterator over leftover characters
let mut chars = slice.chars();
// Query the next char
let next_char = chars.next()?;
// Compute the new index by looking at how many bytes are left
// after querying the next char
self.index = self.s.len() - chars.as_str().len();
// Return next char
Some(next_char)
}
}
fn greet(c: &str) -> String {
"Hello, ".to_owned() + c
}
fn main() {
let names = ["Martin", "Helena", "Ingrid", "Joseph"];
let iterator = names.into_iter().map(greet);
let chars_iter = iterator.flat_map(OwnedCharsIter::new);
println!("{:?}", chars_iter.collect::<String>())
}
"Hello, MartinHello, HelenaHello, IngridHello, Joseph"

Returns a reference to str from !format

I´m following the rust's reference book and in the chapter 10.3. Validating References with Lifetimes I'm playing with various values and ways to wrasp the concepts rigth.
But i can't found a solution to return an &str with a diferent lifetime for the longestv2 function:
fn main() {
let result:&str;
let result2:&str;
{
let string1 = String::from("long string is long");
{
let string2 = String::from("xyz");
result = longestv1(string1.as_str(), string2.as_str());
result2= longestv2(string1.as_str(), string2.as_str());
}
}
println!("The longestv1 string is {}", result);
println!("The longestv2 string is {}", result2);
}
fn longestv1<'a,'b>(x: &'a str, y: &'a str) -> &'b str {
if x.len() > y.len() {
"x"
} else {
"y"
}
}
fn longestv2<'a,'b>(x: &'a str, y: &'a str) -> &'b str {
if x.len() > y.len() {
format!("{} with {}",x, x.len()).as_str()
} else {
format!("{} with {}",y, y.len()).as_str()
}
}
It will give me this errors:
error[E0515]: cannot return reference to temporary value
--> src\main.rs:31:9
|
31 | format!("{} with {}",x, x.len()).as_str()
| --------------------------------^^^^^^^^^
| |
| returns a reference to data owned by the current function
| temporary value created here
error[E0515]: cannot return reference to temporary value
--> src\main.rs:33:9
|
33 | format!("{} with {}",y, y.len()).as_str()
| --------------------------------^^^^^^^^^
| |
| returns a reference to data owned by the current function
| temporary value created here
I want to return an &str like the first function longestv1, I know that String works but i want to wrasp the concepts
I tried to wrap and Option and then use as_ref() and other things in the web, copy() ,clone() , into() etc... but nothing trick rust to move out the temporaly value
TL;DR: You cannot.
You cannot return a reference to a temporary. Period. You can leak it, but don't.
The compiler will free your String at the end of the function, and then you'll have a dangling reference. This is bad.
Just return String.

reusable rust vector with associated vector of slices

I need rust code to read lines of a file, and break them into an array of slices. The working code is
use std::io::{self, BufRead};
fn main() {
let stdin = io::stdin();
let mut f = stdin.lock();
let mut line : Vec<u8> = Vec::new();
loop {
line.clear();
let sz = f.read_until(b'\n', &mut line).unwrap();
if sz == 0 {break};
let body : Vec<&[u8]> = line.split(|ch| *ch == b'\t').collect();
DoStuff(body);
}
}
However, that code is slower than I'd like. The code I want to write is
use std::io::{self, BufRead};
fn main() {
let stdin = io::stdin();
let mut f = stdin.lock();
let mut line : Vec<u8> = Vec::new();
let mut body: Vec<&[u8]> = Vec::new();
loop {
line.clear();
let sz = f.read_until(b'\n', &mut line).unwrap();
if sz == 0 {break};
body.extend(&mut line.split(|ch| *ch == b'\t'));
DoStuff(body);
body.clear();
}
}
but that runs afoul of the borrow checker.
In general, I'd like a class containing a Vec<u8> and an associated Vec<&[u8]>, which is the basis of a lot of C++ code I'm trying to replace.
Is there any way I can accomplish this?
I realize that I could replace the slices with pairs of integers, but that seems clumsy.
No, I can't just use the items from the iterator as they come through -- I need random access to the individual column values. In the simplified case where I do use the iterator directly, I get a 3X speedup, which is why I suspect a significant speedup by replacing collect with extend.
Other comments on this code is also welcome.
Just for sake of completeness, and since you are coming from C++, a more Rusty way of writing the code would be
use std::io::{self, BufRead};
fn do_stuff(body: &[&str]) {}
fn main() {
for line in io::stdin().lock().lines() {
let line = line.unwrap();
let body = line.split('\t').collect::<Vec<_>>();
do_stuff(&body);
}
}
This uses .lines() from BufRead to get an iterator over \n-delimited lines from the input. It assumes that your input is actually valid UTF8, which in your code was not a requirement. If it is not UTF8, use .split(b'\n'), .split(b'\t') and &[&u8] instead.
Notice that this does allocate and subsequently free a new Vec via .collect() every time the loop executes. We are somewhat relying on the allocator's free-list to make this cheap. But it is correct in all cases.
The reason your second example does not compile (after fixing the DoStuff(&body) is this:
12 | line.clear();
| ^^^^^^^^^^^^ mutable borrow occurs here
...
15 | body.extend(&mut line.split(|ch| *ch == b'\t'));
| ---- ---- immutable borrow occurs here
| |
| immutable borrow later used here
The problem here is the loop: Line 12 line.clear() will execute after line 15 body.extend() from the second iteration onwards. But the compiler has figured out that body borrows from line (it contains references to the fields inside line). The call to line.clear() mutably borrows line - all of line - and as far as the compiler is concerned is free to do anything it wants with the data it holds. This is an error because line.clear() could possibly mutate data that body has borrowed immutably. The compiler does not reason about the fact that .clear() obviously does not mutate the borrowed data, quite the opposite in fact, but the compiler's reasoning stops at the function signature.
I seems like the answer is
No, it's not possible to reuse the vector of slices.
The way to go is to make something like a slice, but with integer offsets rather than pointers. Code is attached, comments welcome.
Performance is currently 15% better than the C++, but the C++ is part of a larger system, and is probably doing some additional stuff.
/// pointers into a vector, simulating a slice without the ownership issues
#[derive(Debug, Clone)]
pub struct FakeSlice {
begin: u32,
end: u32,
}
/// A line of a text file, broken into columns.
/// Access to the `lines` and `parts` is allowed, but should seldom be necessary
/// `line` does not include the trailing newline
/// An empty line contains one empty column
///```
/// use std::io::BufRead;
/// let mut data = b"one\ttwo\tthree\n";
/// let mut dp = &data[..];
/// let mut line = cdx::TextLine::new();
/// let eof = line.read(&mut dp).unwrap();
/// assert_eq!(eof, false);
/// assert_eq!(line.strlen(), 13);
/// line.split(b'\t');
/// assert_eq!(line.len(), 3);
/// assert_eq!(line.get(1), b"two");
///```
#[derive(Debug, Clone)]
pub struct TextLine {
pub line: Vec<u8>,
pub parts: Vec<FakeSlice>,
}
impl TextLine {
/// make a new TextLine
pub fn new() -> TextLine {
TextLine {
line: Vec::new(),
parts: Vec::new(),
}
}
fn clear(&mut self) {
self.parts.clear();
self.line.clear();
}
/// How many column in the line
pub fn len(&self) -> usize {
self.parts.len()
}
/// How many bytes in the line
pub fn strlen(&self) -> usize {
self.line.len()
}
/// should always be false, but required by clippy
pub fn is_empty(&self) -> bool {
self.parts.is_empty()
}
/// Get one column. Return an empty column if index is too big.
pub fn get(&self, index: usize) -> &[u8] {
if index >= self.parts.len() {
&self.line[0..0]
} else {
&self.line[self.parts[index].begin as usize..self.parts[index].end as usize]
}
}
/// Read a new line from a file, should generally be followed by `split`
pub fn read<T: std::io::BufRead>(&mut self, f: &mut T) -> std::io::Result<bool> {
self.clear();
let sz = f.read_until(b'\n', &mut self.line)?;
if sz == 0 {
Ok(true)
} else {
if self.line.last() == Some(&b'\n') {
self.line.pop();
}
Ok(false)
}
}
/// split the line into columns
/// hypothetically you could split on one delimiter, do some work, then split on a different delimiter.
pub fn split(&mut self, delim: u8) {
self.parts.clear();
let mut begin: u32 = 0;
let mut end: u32 = 0;
#[allow(clippy::explicit_counter_loop)] // I need the counter to be u32
for ch in self.line.iter() {
if *ch == delim {
self.parts.push(FakeSlice { begin, end });
begin = end + 1;
}
end += 1;
}
self.parts.push(FakeSlice { begin, end });
}
}

How to "crop" characters off the beginning of a string in Rust?

I want a function that can take two arguments (string, number of letters to crop off front) and return the same string except with the letters before character x gone.
If I write
let mut example = "stringofletters";
CropLetters(example, 3);
println!("{}", example);
then the output should be:
ingofletters
Is there any way I can do this?
In many uses it would make sense to simply return a slice of the input, avoiding any copy. Converting #Shepmaster's solution to use immutable slices:
fn crop_letters(s: &str, pos: usize) -> &str {
match s.char_indices().skip(pos).next() {
Some((pos, _)) => &s[pos..],
None => "",
}
}
fn main() {
let example = "stringofletters"; // works with a String if you take a reference
let cropped = crop_letters(example, 3);
println!("{}", cropped);
}
Advantages over the mutating version are:
No copy is needed. You can call cropped.to_string() if you want a newly allocated result; but you don't have to.
It works with static string slices as well as mutable String etc.
The disadvantage is that if you really do have a mutable string you want to modify, it would be slightly less efficient as you'd need to allocate a new String.
Issues with your original code:
Functions use snake_case, types and traits use CamelCase.
"foo" is a string literal of type &str. These may not be changed. You will need something that has been heap-allocated, such as a String.
The call crop_letters(stringofletters, 3) would transfer ownership of stringofletters to the method, which means you wouldn't be able to use the variable anymore. You must pass in a mutable reference (&mut).
Rust strings are not ASCII, they are UTF-8. You need to figure out how many bytes each character requires. char_indices is a good tool here.
You need to handle the case of when the string is shorter than 3 characters.
Once you have the byte position of the new beginning of the string, you can use drain to move a chunk of bytes out of the string. We just drop these bytes and let the String move over the remaining bytes.
fn crop_letters(s: &mut String, pos: usize) {
match s.char_indices().nth(pos) {
Some((pos, _)) => {
s.drain(..pos);
}
None => {
s.clear();
}
}
}
fn main() {
let mut example = String::from("stringofletters");
crop_letters(&mut example, 3);
assert_eq!("ingofletters", example);
}
See Chris Emerson's answer if you don't actually need to modify the original String.
I found this answer which I don't consider really idiomatic:
fn crop_with_allocation(string: &str, len: usize) -> String {
string.chars().skip(len).collect()
}
fn crop_without_allocation(string: &str, len: usize) -> &str {
// optional length check
if string.len() < len {
return &"";
}
&string[len..]
}
fn main() {
let example = "stringofletters"; // works with a String if you take a reference
let cropped = crop_with_allocation(example, 3);
println!("{}", cropped);
let cropped = crop_without_allocation(example, 3);
println!("{}", cropped);
}
my version
fn crop_str(s: &str, n: usize) -> &str {
let mut it = s.chars();
for _ in 0..n {
it.next();
}
it.as_str()
}
#[test]
fn test_crop_str() {
assert_eq!(crop_str("123", 1), "23");
assert_eq!(crop_str("ЖФ1", 1), "Ф1");
assert_eq!(crop_str("ЖФ1", 2), "1");
}

error: `line` does not live long enough but it's ok in playground

I can't figure it out why my local var line does not live long enough. You can see bellow my code. It work on the Rust's playground.
I may have an idea of the issue: I use a structure (load is a function of this structure). As I want to store the result of the line in a member of my struct, it could be the issue. But I don't see what should I do to resolve this problem.
pub struct Config<'a> {
file: &'a str,
params: HashMap<&'a str, &'a str>
}
impl<'a> Config<'a> {
pub fn new(file: &str) -> Config {
Config { file: file, params: HashMap::new() }
}
pub fn load(&mut self) -> () {
let f = match fs::File::open(self.file) {
Ok(e) => e,
Err(e) => {
println!("Failed to load {}, {}", self.file, e);
return;
}
};
let mut reader = io::BufReader::new(f);
let mut buffer = String::new();
loop {
let result = reader.read_line(&mut buffer);
if result.is_ok() && result.ok().unwrap() > 0 {
let line: Vec<String> = buffer.split("=").map(String::from).collect();
let key = line[0].trim();
let value = line[1].trim();
self.params.insert(key, value);
}
buffer.clear();
}
}
...
}
And I get this error:
src/conf.rs:33:27: 33:31 error: `line` does not live long enough
src/conf.rs:33 let key = line[0].trim();
^~~~
src/conf.rs:16:34: 41:6 note: reference must be valid for the lifetime 'a as defined on the block at 16:33...
src/conf.rs:16 pub fn load(&mut self) -> () {
src/conf.rs:17 let f = match fs::File::open(self.file) {
src/conf.rs:18 Ok(e) => e,
src/conf.rs:19 Err(e) => {
src/conf.rs:20 println!("Failed to load {}, {}", self.file, e);
src/conf.rs:21 return;
...
src/conf.rs:31:87: 37:14 note: ...but borrowed value is only valid for the block suffix following statement 0 at 31:86
src/conf.rs:31 let line: Vec<String> = buffer.split("=").map(String::from).collect();
src/conf.rs:32
src/conf.rs:33 let key = line[0].trim();
src/conf.rs:34 let value = line[1].trim();
src/conf.rs:35
src/conf.rs:36 self.params.insert(key, value);
...
There are three steps in realizing why this does not work.
let line: Vec<String> = buffer.split("=").map(String::from).collect();
let key = line[0].trim();
let value = line[1].trim();
self.params.insert(key, value);
line is a Vec of Strings, meaning the vector owns the strings its containing. An effect of this is that when the vector is freed from memory, the elements, the strings, are also freed.
If we look at string::trim here, we see that it takes and returns a &str. In other words, the function does not allocate anything, or transfer ownership - the string it returns is simply a slice of the original string. So if we were to free the original string, the trimmed string would not have valid data.
The signature of HashMap::insert is fn insert(&mut self, k: K, v: V) -> Option<V>. The function moves both the key and the value, because these needs to be valid for as long as they may be in the hashmap. We would like to give the hashmap the two strings. However, both key and value are just references to strings which is owned by the vector - we are just borrowing them - so we can't give them away.
The solution is simple: copy the strings after they have been split.
let line: Vec<String> = buffer.split("=").map(String::from).collect();
let key = line[0].trim().to_string();
let value = line[1].trim().to_string();
self.params.insert(key, value);
This will allocate two new strings, and copy the trimmed slices into the new strings.
We could have moved the string out of the vector(ie. with Vec::remove), if we didn't trim the strings afterwards; I was unable to find a easy way of trimming a string without allocating a new one.
In addition, as malbarbo mentions, we can avoid the extra allocation that is done with map(String::from), and the creation of the vector with collect(), by simply omitting them.
In this case you have to use String instead of &str. See this to understand the difference.
You can also eliminate the creation of the intermediate vector and use the iterator return by split direct
pub struct Config<'a> {
file: &'a str,
params: HashMap<String, String>
}
...
let mut line = buffer.split("=");
let key = line.next().unwrap().trim().to_string();
let value = line.next().unwrap().trim().to_string();

Resources