I have a hard time fixing the needless collect clippy warning.
pub fn import_selection(packages: &mut Vec<PackageRow>) -> io::Result<()> {
let file = fs::File::open("uad_exported_selection.txt")?;
let reader = BufReader::new(file);
let imported_selection: Vec<String> = reader
.lines()
.map(|l| l.expect("Could not read exported selection"))
.collect();
for (i,p) in packages.iter_mut().enumerate() {
if imported_selection.contains(&p.name) {
p.selected = true;
...
} else {
p.selected = false;
}
}
Ok(())
}
I tried to use any() directly on the iterator; it compiles but doesn't seem to work (it doesn't find anything when it should).
Is it really possible to remove the collect in this case?
This is a known issue with current Clippy. The lint points out that collecting an iterator first and then calling contains(), len(), etc. on the collection is usually unnecessary. However, it does not take into account that in your case the result of collect() is used multiple times during the loop, which saves you from re-executing the lines().map() iterator on every iteration of that loop. (An iterator can only be consumed once, which is likely why calling any() on it directly stopped finding matches after the first package.)
It is a false positive.
You can mark the method or function with #[allow(clippy::needless_collect)] to suppress the lint.
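A minimal sketch of where the attribute goes, assuming a simplified PackageRow with only the two fields used in the question and a hypothetical apply_selection helper that receives the lines as an iterator instead of opening the file itself:

```rust
// Simplified stand-in for the PackageRow in the question (assumption).
pub struct PackageRow {
    pub name: String,
    pub selected: bool,
}

// Hypothetical helper: `lines` stands in for `reader.lines().map(...)`.
#[allow(clippy::needless_collect)]
pub fn apply_selection<I>(lines: I, packages: &mut [PackageRow])
where
    I: Iterator<Item = String>,
{
    // Collect once, so every package below is checked against the full list.
    let imported_selection: Vec<String> = lines.collect();
    for p in packages.iter_mut() {
        p.selected = imported_selection.contains(&p.name);
    }
}

fn main() {
    let mut packages = vec![
        PackageRow { name: "com.example.a".to_string(), selected: false },
        PackageRow { name: "com.example.b".to_string(), selected: true },
    ];
    let lines = vec!["com.example.a".to_string()].into_iter();
    apply_selection(lines, &mut packages);
    for p in &packages {
        println!("{} selected: {}", p.name, p.selected);
    }
}
```

If the imported list is large, collecting into a HashSet<String> instead of a Vec<String> makes each contains() lookup cheaper on average.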
I'm trying to perform a parallel operation on several chunks of strings at a time, and I'm having an issue with the borrow checker:
(for context, identifiers is a Vec<String> from a CSV file, client is reqwest and target is an Arc<String> that is write once read many)
use futures::{stream, StreamExt};
use std::sync::Arc;
async fn nop(
person_ids: &[String],
target: &str,
url: &str,
) -> String {
let noop = format!("{} {}", target, url);
let noop2 = person_ids.iter().for_each(|f| {f.as_str();});
"Some text".into()
}
#[tokio::main]
async fn main() {
let target = Arc::new(String::from("sometext"));
let url = "http://example.com";
let identifiers = vec!["foo".into(), "bar".into(), "baz".into(), "qux".into(), "quux".into(), "quuz".into(), "corge".into(), "grault".into(), "garply".into(), "waldo".into(), "fred".into(), "plugh".into(), "xyzzy".into()];
let id_sets: Vec<&[String]> = identifiers.chunks(2).collect();
let responses = stream::iter(id_sets)
.map(|person_ids| {
let target = target.clone();
tokio::spawn( async move {
let resptext = nop(person_ids, target.as_str(), url).await;
})
})
.buffer_unordered(2);
responses
.for_each(|b| async { })
.await;
}
Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e41c635e99e422fec8fc8a581c28c35e
Given that chunks yields &[String] slices, the compiler complains that identifiers doesn't live long enough, because it could go out of scope while the slices are still being referenced. Realistically this won't happen because there's an await. Is there a way to tell the compiler that this is safe, or is there another way of getting chunks as a set of owned Strings for each thread?
There was a similarly asked question that used into_owned() as a solution, but when I try that, rustc complains about the slice size not being known at compile time in the request_user function.
EDIT: Some other questions as well:
Is there a more direct way of using target in each thread without needing Arc? From the moment it is created, it never needs to be modified, just read from. If not, is there a way of pulling it out of the Arc that doesn't require the .as_str() method?
How do you handle multiple error types within the tokio::spawn() block? In the real world use, I'm going to receive quick_xml::Error and reqwest::Error within it. It works fine without tokio spawn for concurrency.
Is there a way to tell the compiler that this is safe, or is there another way of getting chunks as a set of owned Strings for each thread?
You can chunk a Vec<T> into a Vec<Vec<T>> without cloning by using the itertools crate:
use itertools::Itertools;
fn main() {
let items = vec![
String::from("foo"),
String::from("bar"),
String::from("baz"),
];
let chunked_items: Vec<Vec<String>> = items
.into_iter()
.chunks(2)
.into_iter()
.map(|chunk| chunk.collect())
.collect();
for chunk in chunked_items {
println!("{:?}", chunk);
}
}
["foo", "bar"]
["baz"]
This is based on the answers here.
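If pulling in itertools is not an option, the same owned chunking can be sketched with the standard library alone. The helper name into_chunks is hypothetical; like the itertools version, it moves the elements and does not clone them:

```rust
// Hypothetical helper: splits `items` into owned chunks of at most `size` elements.
fn into_chunks<T>(items: Vec<T>, size: usize) -> Vec<Vec<T>> {
    assert!(size > 0, "chunk size must be non-zero");
    let mut chunks = Vec::new();
    let mut iter = items.into_iter();
    loop {
        // `by_ref` lets `take` borrow the iterator instead of consuming it.
        let chunk: Vec<T> = iter.by_ref().take(size).collect();
        if chunk.is_empty() {
            break;
        }
        chunks.push(chunk);
    }
    chunks
}

fn main() {
    let items = vec![
        String::from("foo"),
        String::from("bar"),
        String::from("baz"),
    ];
    for chunk in into_chunks(items, 2) {
        println!("{:?}", chunk);
    }
}
```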
Your issue here is that id_sets is a vector of slice references that borrow from identifiers. tokio::spawn requires the spawned future to be 'static, so borrowed slices cannot be moved into the async move block: the compiler cannot prove that identifiers outlives the spawned tasks.
The solution to the immediate problem is to convert the Vec<&[String]> to a Vec<Vec<String>> type.
A way of accomplishing that (at the cost of cloning each String) would be:
let id_sets: Vec<Vec<String>> = identifiers
.chunks(2)
.map(|x: &[String]| x.to_vec())
.collect();
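To show the owned chunks actually crossing into spawned work, here is a dependency-free sketch that swaps in std::thread::spawn for tokio::spawn; both require the spawned code to be 'static, so the ownership reasoning is the same. process_chunks and the string it builds are made up for illustration and are not the request logic from the question:

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical helper standing in for the request logic in the question.
fn process_chunks(identifiers: &[String], target: Arc<String>) -> Vec<String> {
    // Each chunk is cloned into its own Vec<String>, so nothing borrows `identifiers`.
    let id_sets: Vec<Vec<String>> = identifiers.chunks(2).map(|c| c.to_vec()).collect();

    let handles: Vec<thread::JoinHandle<String>> = id_sets
        .into_iter()
        .map(|person_ids| {
            let target = Arc::clone(&target);
            // The closure owns `person_ids` and its own handle to `target`.
            thread::spawn(move || format!("{} {}", target, person_ids.join("+")))
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    let identifiers: Vec<String> = ["foo", "bar", "baz"].iter().map(|s| s.to_string()).collect();
    let target = Arc::new(String::from("sometext"));
    for line in process_chunks(&identifiers, target) {
        println!("{}", line);
    }
}
```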
The following code reads space-delimited records from stdin, and writes comma-delimited records to stdout. Even with optimized builds it's rather slow (about twice as slow as using, say, awk).
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
let fields: Vec<_> = line.split(' ').collect();
println!("{}", fields.join(","));
}
}
One obvious improvement would be to use itertools to join without allocating a vector (the collect call causes an allocation). However, I tried a different approach:
fn main() {
let stdin = std::io::stdin();
let mut cache = Vec::<&str>::new();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
cache.extend(line.split(' '));
println!("{}", cache.join(","));
cache.clear();
}
}
This version tries to reuse the same vector over and over. Unfortunately, the compiler complains:
error: `line` does not live long enough
--> src/main.rs:7:22
|
7 | cache.extend(line.split(' '));
| ^^^^
|
note: reference must be valid for the block suffix following statement 1 at 5:39...
--> src/main.rs:5:40
|
5 | let mut cache = Vec::<&str>::new();
| ^
note: ...but borrowed value is only valid for the for at 6:4
--> src/main.rs:6:5
|
6 | for line in stdin.lock().lines().map(|x| x.unwrap()) {
| ^
error: aborting due to previous error
Which of course makes sense: the line variable is only alive in the body of the for loop, whereas cache keeps a pointer into it across iterations. But that error still looks spurious to me: since the cache is cleared after each iteration, no reference to line can be kept, right?
How can I tell the borrow checker about this?
The only way to do this is to use transmute to change the Vec<&'a str> into a Vec<&'b str>. transmute is unsafe and Rust will not raise an error if you forget the call to clear here. You might want to extend the unsafe block up to after the call to clear to make it clear (no pun intended) where the code returns to "safe land".
use std::io::BufRead;
use std::mem;
fn main() {
let stdin = std::io::stdin();
let mut cache = Vec::<&str>::new();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
let cache: &mut Vec<&str> = unsafe { mem::transmute(&mut cache) };
cache.extend(line.split(' '));
println!("{}", cache.join(","));
cache.clear();
}
}
In this case Rust doesn't know what you're trying to do. Unfortunately, .clear() does not affect how .extend() is checked.
The cache is a "vector of strings that live as long as the main function", but in extend() calls you're appending "strings that live only as long as one loop iteration", so that's a type mismatch. The call to .clear() doesn't change the types.
Usually such limited-time uses are expressed by making a long-lived opaque object that enables access to its memory by borrowing a temporary object with the right lifetime, like RefCell.borrow() gives a temporary Ref object. Implementation of that would be a bit involved and would require unsafe methods for recycling Vec's internal memory.
In this case an alternative solution could be to avoid any allocations at all (.join() allocates too) and stream the printing using the Peekable iterator wrapper:
for line in stdin.lock().lines().map(|x| x.unwrap()) {
let mut fields = line.split(' ').peekable();
while let Some(field) = fields.next() {
print!("{}", field);
if fields.peek().is_some() {
print!(",");
}
}
println!();
}
BTW: Francis' answer with transmute is good too. You can use unsafe to say you know what you're doing and override the lifetime check.
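Another safe option along the same lines is to reuse an owned String as the buffer instead of a Vec<&str>. A String holds no references, so its type carries no lifetime that could conflict with the per-iteration line. This is only a sketch; join_fields_into is a hypothetical helper and the input lines are hard-coded instead of read from stdin:

```rust
// Hypothetical helper: rewrites `line` into `out`, replacing spaces with commas.
fn join_fields_into(line: &str, out: &mut String) {
    out.clear(); // keeps the allocation, drops the old contents
    for (i, field) in line.split(' ').enumerate() {
        if i > 0 {
            out.push(',');
        }
        out.push_str(field);
    }
}

fn main() {
    // One buffer is reused for every line.
    let mut out = String::new();
    for line in ["a b c", "d e"].iter() {
        join_fields_into(line, &mut out);
        println!("{}", out);
    }
}
```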
Itertools has .format() for the purpose of lazy formatting, which skips allocating a string too.
use std::io::BufRead;
use itertools::Itertools;
fn main() {
let stdin = std::io::stdin();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
println!("{}", line.split(' ').format(","));
}
}
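As a digression, something like this is a “safe abstraction”, in the smallest sense, over the transmute solution in another answer here:
use std::mem::transmute;
fn repurpose<'a, T: ?Sized>(mut v: Vec<&T>) -> Vec<&'a T> {
    v.clear();
    // The vector is empty, so no reference with the old lifetime survives.
    unsafe { transmute(v) }
}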
Another approach is to refrain from storing references altogether, and to store indices instead. This trick can also be useful in other data structure contexts, so this might be a nice opportunity to try it out.
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
let mut cache = Vec::new();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
cache.push(0);
cache.extend(line.match_indices(' ').map(|x| x.0 + 1));
// cache now contains the indices where new words start
// do something with this information
for i in 0..(cache.len() - 1) {
print!("{},", &line[cache[i]..(cache[i + 1] - 1)]);
}
println!("{}", &line[*cache.last().unwrap()..]);
cache.clear();
}
}
Though you made the remark yourself in the question, I feel the need to point out that there are more elegant ways to do this using iterators, which might avoid the allocation of a vector altogether.
The approach above was inspired by a similar question here, and becomes more useful if you need to do something more complicated than printing.
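As one illustration of "something more complicated than printing", the index idea can be sketched with a reusable vector of byte ranges, one per field. The helper field_ranges and the "longest field" task are assumptions chosen for the example:

```rust
use std::ops::Range;

// Hypothetical helper: fills `ranges` with the byte range of each space-separated field.
fn field_ranges(line: &str, ranges: &mut Vec<Range<usize>>) {
    ranges.clear(); // reuse the allocation from the previous line
    let mut start = 0;
    for (idx, _) in line.match_indices(' ') {
        ranges.push(start..idx);
        start = idx + 1;
    }
    ranges.push(start..line.len());
}

fn main() {
    // The vector stores plain indices, so it is not tied to any line's lifetime.
    let mut ranges = Vec::new();
    for &line in ["a bb ccc", "dddd e"].iter() {
        field_ranges(line, &mut ranges);
        let longest = ranges
            .iter()
            .max_by_key(|r| r.len())
            .map(|r| &line[r.clone()]);
        println!("{:?}", longest);
    }
}
```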
Elaborating on Francis's answer about using transmute(), this could be safely abstracted, I think, with this simple function:
pub fn zombie_vec<'a, 'b, T: ?Sized>(mut data: Vec<&'a T>) -> Vec<&'b T> {
data.clear();
unsafe {
std::mem::transmute(data)
}
}
Using this, the original code would be:
use std::io::BufRead;
fn main() {
    let stdin = std::io::stdin();
    let mut cache0 = Vec::<&str>::new();
    for line in stdin.lock().lines().map(|x| x.unwrap()) {
        let mut cache = cache0; // into the loop
        cache.extend(line.split(' '));
        println!("{}", cache.join(","));
        cache0 = zombie_vec(cache); // out of the loop
    }
}
You need to move the outer vector into every loop iteration and move it back out before the iteration ends, while safely erasing the local lifetime.
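A safe way to empty the vector is to use .drain(..) instead of .clear(), where .. is the full range. It returns an iterator, so the drained elements can be processed in a loop. It is also available for other collections (String, HashMap, etc.). Note that drain(..) does not change the vector's element type: the example below compiles because its string literals are 'static, and it does not by itself lift the lifetime restriction for lines read from stdin.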
fn main() {
let mut cache = Vec::<&str>::new();
for line in ["first line allocates for", "second"].iter() {
println!("Size and capacity: {}/{}", cache.len(), cache.capacity());
cache.extend(line.split(' '));
println!(" {}", cache.join(","));
cache.drain(..);
}
}
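To make the "processed in a loop" part concrete, here is a small sketch in which the drained elements are consumed while the vector keeps its buffer. drain_joined is a hypothetical helper, and the input is again a 'static string literal:

```rust
// Hypothetical helper: moves every element out of `cache` and returns them joined.
fn drain_joined(cache: &mut Vec<&str>) -> String {
    let mut out = String::new();
    for (i, field) in cache.drain(..).enumerate() {
        if i > 0 {
            out.push(',');
        }
        out.push_str(field);
    }
    out
}

fn main() {
    let mut cache: Vec<&str> = Vec::new();
    cache.extend("first line allocates for".split(' '));
    let capacity_before = cache.capacity();
    println!("{}", drain_joined(&mut cache));
    // The vector is empty again, but its buffer is still allocated.
    println!("len {} capacity {}", cache.len(), cache.capacity());
    assert_eq!(cache.capacity(), capacity_before);
}
```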
I'm trying to get into Rust from a Python background and I'm having an issue with a PoC I'm messing around with. I've read through a bunch of blogs and documentation on how to handle errors in Rust, but I can't figure out how to implement it when I use unwrap and get a panic. Here is part of the code:
fn main() {
let listener = TcpListener::bind("127.0.0.1:5432").unwrap();
// The .0 at the end is indexing a tuple, FYI
loop {
let stream = listener.accept().unwrap().0;
stream.set_read_timeout(Some(Duration::from_millis(100)));
handle_request(stream);
}
}
// Things change a bit in here
fn handle_request(stream: TcpStream) {
let address = stream.peer_addr().unwrap();
let mut reader = BufReader::new(stream);
let mut payload = "".to_string();
for line in reader.by_ref().lines() {
let brap = line.unwrap();
payload.push_str(&*brap);
if brap == "" {
break;
}
}
println!("{0} -> {1}", address, payload);
send_response(reader.into_inner());
}
It is handling the socket not receiving anything with set_read_timeout on the stream as expected, but when that triggers my unwrap on line in the loop it is causing a panic. Can someone help me understand how I'm properly supposed to apply a match or Option to this code?
There seems to be a large disconnect here. unwrap or expect handle errors by panicking the thread. You aren't really supposed to "handle" a panic in 99.9% of Rust programs; you just let things die.
If you don't want a panic, don't use unwrap or expect. Instead, pass back the error via a Result or an Option, as described in the Error Handling section of The Rust Programming Language.
You can match (or any other pattern matching technique) on the Result or Option and handle an error appropriately for your case. One example of handling the error in your outer loop:
use std::net::{TcpStream, TcpListener};
use std::time::Duration;
use std::io::prelude::*;
use std::io::BufReader;
fn main() {
let listener = TcpListener::bind("127.0.0.1:5432")
.expect("Unable to bind to the port");
loop {
if let Ok((stream, _)) = listener.accept() {
stream
.set_read_timeout(Some(Duration::from_millis(100)))
.expect("Unable to set timeout");
handle_request(stream);
}
}
}
Note that I highly recommend using expect instead of unwrap in just about every case.
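The same idea applies to the per-line unwrap inside handle_request, which is where the read timeout surfaces. A minimal sketch, assuming a hypothetical read_payload function that is generic over BufRead so it can be exercised without a socket:

```rust
use std::io::{self, BufRead};

// Hypothetical replacement for the line loop in handle_request.
fn read_payload<R: BufRead>(reader: R) -> io::Result<String> {
    let mut payload = String::new();
    for line in reader.lines() {
        // `?` returns early with the io::Error (for example a read timeout).
        let line = line?;
        if line.is_empty() {
            break;
        }
        payload.push_str(&line);
    }
    Ok(payload)
}

fn main() {
    // A byte slice implements BufRead, so it can stand in for the socket here.
    let input = "GET / HTTP/1.1\nHost: example\n\nbody".as_bytes();
    match read_payload(input) {
        Ok(payload) => println!("payload: {}", payload),
        Err(e) => eprintln!("could not read request: {}", e),
    }
}
```

In the original handle_request, passing &mut reader (a &mut BufReader also implements BufRead) would keep the BufReader available for reader.into_inner() afterwards.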
As the title already says, I'm trying to move a &[&str] into a thread. Well, actually, the code below works, but I have two problems with it:
let args2: Vec<_> = args.iter().map(|arg| { arg.to_string() }).collect(); seems a bit verbose to convert a &[&str] into a Vec<String>. Can this be done "nicer"?
If I understand it correctly, the strings get copied twice: first by the let cmd2 and let args2 statements; then by moving them inside the move closure. Is this correct? And if so, can it be done with one copy?
I'm aware of thread::scoped, but it is deprecated at the moment. I'm also coding this to learn a bit more about Rust, so comments about "unrusty" code are appreciated too.
use std::process::{Command,Output};
use std::thread;
use std::thread::JoinHandle;
pub struct Process {
joiner: JoinHandle<Output>,
}
impl Process {
pub fn new(cmd: &str, args: &[&str]) -> Process {
// Copy the strings for the thread
let cmd2 = cmd.to_string();
let args2: Vec<_> = args.iter().map(|arg| { arg.to_string() }).collect();
let child = thread::spawn(move || {
Command::new(cmd2).args(&args2[..]).output().unwrap_or_else(|e| {
panic!("Failed to execute process: {}", e)
})
});
Process { joiner: child }
}
}
let args2: Vec<_> = args.iter().map(|arg| { arg.to_string() }).collect(); seems a bit verbose to convert a &[&str] into a Vec<String>. Can this be done "nicer"?
I don't think so. There are a few minor variations of this that also work (e.g. args.iter().cloned().map(String::from).collect();), but I can't think of one that is substantially nicer. One minor point is that, in older versions of Rust, using to_string to convert a &str to a String was not quite as efficient as using String::from or to_owned; newer versions specialize to_string for str, so the choice is now mostly stylistic.
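For comparison, the variations can be sketched side by side. The function names are illustrative only, and all three are meant to produce the same Vec<String>:

```rust
// Three equivalent spellings; the function names are illustrative only.
fn with_to_owned(args: &[&str]) -> Vec<String> {
    // `arg` is a `&&str` here, so dereference once before calling to_owned;
    // calling to_owned on the `&&str` itself would only copy the `&str`.
    args.iter().map(|arg| (*arg).to_owned()).collect()
}

fn with_string_from(args: &[&str]) -> Vec<String> {
    // `cloned` turns the `&&str` items into `&str` before the conversion.
    args.iter().cloned().map(String::from).collect()
}

fn with_to_string(args: &[&str]) -> Vec<String> {
    args.iter().map(|arg| arg.to_string()).collect()
}

fn main() {
    let args = ["-l", "-a"];
    println!("{:?}", with_to_owned(&args));
    println!("{:?}", with_string_from(&args));
    println!("{:?}", with_to_string(&args));
}
```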
If I understand it correctly, the strings get copied twice: first by the let cmd2 and let args2 statements; then by moving them inside the move closure. Is this correct? And if so, can it be done with one copy?
No, the strings are only copied where you call to_string. Strings don't implement Copy, so they're never copied implicitly. If you try to access the strings after they have been moved to the closure, you will get a compiler error.
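One way to convince yourself that the move does not copy the string data is to compare the address of the heap buffer before and after the String is moved into the thread. This is only an illustrative sketch; buffer_addresses is a made-up helper:

```rust
use std::thread;

// Returns the heap address of the string buffer before and after the move.
fn buffer_addresses(s: String) -> (usize, usize) {
    let before = s.as_ptr() as usize;
    // Moving `s` into the closure transfers ownership of the same buffer.
    let after = thread::spawn(move || s.as_ptr() as usize)
        .join()
        .expect("worker thread panicked");
    (before, after)
}

fn main() {
    let (before, after) = buffer_addresses(String::from("cmd"));
    println!("same buffer: {}", before == after);
}
```

Only the small String handle (pointer, length, capacity) is moved; the characters themselves stay where they were allocated.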
Consider the following code example: I have a vector of JoinHandles which I need to iterate over to join them back to the main thread. However, upon doing so I am getting the error "cannot move out of borrowed content".
let threads = Arc::new(Mutex::new(Vec::new()));
for _x in 0..100 {
    let handle = thread::spawn(move || {
        // do some work
    });
    threads.lock().unwrap().push(handle);
}
for t in threads.lock().unwrap().iter() {
    t.join(); // error: cannot move out of borrowed content
}
Unfortunately, you can't do this directly. When Mutex consumes the data structure you fed to it, you can't get it back by value through the lock guard. You can only get a &mut reference to it, which won't allow moving out of it. So even into_iter() won't work: it needs a self argument, which it can't get from a MutexGuard.
There is a workaround, however. You can use Arc<Mutex<Option<Vec<_>>>> instead of Arc<Mutex<Vec<_>>> and then just take() the value out of the mutex:
for t in threads.lock().unwrap().take().unwrap().into_iter() {
}
Then into_iter() will work just fine as the value is moved into the calling thread.
Of course, you will need to construct the vector and push to it appropriately:
let threads = Arc::new(Mutex::new(Some(Vec::new())));
...
threads.lock().unwrap().as_mut().unwrap().push(handle);
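Put together, the Option-based workaround might look like the following sketch. spawn_and_join is a hypothetical name and the workers do nothing:

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

// Hypothetical helper: spawns `n` idle workers, then joins them; returns how many were joined.
fn spawn_and_join(n: usize) -> usize {
    let threads: Arc<Mutex<Option<Vec<JoinHandle<()>>>>> =
        Arc::new(Mutex::new(Some(Vec::new())));

    for _ in 0..n {
        let handle = thread::spawn(|| {
            // do some work
        });
        threads
            .lock()
            .unwrap()
            .as_mut()
            .expect("vector already taken")
            .push(handle);
    }

    // `take` moves the vector out and leaves `None` inside the mutex.
    let taken = threads.lock().unwrap().take().expect("vector already taken");
    let count = taken.len();
    for t in taken {
        t.join().expect("worker thread panicked");
    }
    count
}

fn main() {
    println!("joined {} threads", spawn_and_join(4));
}
```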
However, the best way is to just drop the Arc<Mutex<..>> layer altogether (of course, if this value is not used from other threads).
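A sketch of that simpler layout, assuming the handles are only needed on the spawning thread. run_workers and the doubled return value are made up for illustration:

```rust
use std::thread;

// Hypothetical helper: each worker returns twice its index.
fn run_workers(n: u32) -> Vec<u32> {
    // A plain Vec owns the handles, so they can be moved out with into_iter().
    let handles: Vec<thread::JoinHandle<u32>> = (0..n)
        .map(|i| thread::spawn(move || i * 2))
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    println!("{:?}", run_workers(3));
}
```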
As referenced in "How to take ownership of T from Arc<Mutex<T>>?", this is now possible without any trickery using Arc::try_unwrap and Mutex::into_inner():
let threads = Arc::new(Mutex::new(Vec::new()));
for _x in 0..100 {
let handle = thread::spawn(move || {
println!("{}", _x);
});
threads.lock().unwrap().push(handle);
}
let threads_unwrapped: Vec<JoinHandle<_>> = Arc::try_unwrap(threads).unwrap().into_inner().unwrap();
for t in threads_unwrapped.into_iter() {
t.join().unwrap();
}
Play around with it in this playground to verify.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9d5635e7f778bc744d1fb855b92db178
While drain is a good solution, you can also do the following:
// with a copy
let built_words: Arc<Mutex<Vec<String>>> = Arc::new(Mutex::new(vec![]));
let result: Vec<String> = built_words.lock().unwrap().clone();
// using drain
let mut locked_result = built_words.lock().unwrap();
let mut result: Vec<String> = vec![];
result.extend(locked_result.drain(..));
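I would prefer to clone the data so that the original value stays in place. Note that clone() allocates a new vector and copies every String, whereas drain(..) moves the elements out without copying their contents.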
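A third option, not mentioned above, is std::mem::take, which swaps the locked vector with an empty one and hands back the original without copying any strings. A minimal sketch, with take_words as a hypothetical helper:

```rust
use std::sync::{Arc, Mutex};

// Hypothetical helper: moves the vector out of the mutex, leaving an empty one behind.
fn take_words(shared: &Mutex<Vec<String>>) -> Vec<String> {
    let mut guard = shared.lock().unwrap();
    std::mem::take(&mut *guard)
}

fn main() {
    let built_words = Arc::new(Mutex::new(vec!["a".to_string(), "b".to_string()]));
    let result = take_words(&built_words);
    println!(
        "{:?}, left behind: {}",
        result,
        built_words.lock().unwrap().len()
    );
}
```

Unlike drain(..), this leaves a fresh empty Vec in the mutex, so the original allocation travels with the returned vector.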