Rust chunks method with owned values? - rust

I'm trying to perform a parallel operation on several chunks of strings at a time, and I'm having an issue with the borrow checker:
(For context, identifiers is a Vec<String> from a CSV file, client is reqwest and target is an Arc<String> that is write once, read many.)
use futures::{stream, StreamExt};
use std::sync::Arc;

async fn nop(
    person_ids: &[String],
    target: &str,
    url: &str,
) -> String {
    let noop = format!("{} {}", target, url);
    let noop2 = person_ids.iter().for_each(|f| {f.as_str();});
    "Some text".into()
}

#[tokio::main]
async fn main() {
    let target = Arc::new(String::from("sometext"));
    let url = "http://example.com";
    let identifiers = vec![
        "foo".into(), "bar".into(), "baz".into(), "qux".into(), "quux".into(),
        "quuz".into(), "corge".into(), "grault".into(), "garply".into(),
        "waldo".into(), "fred".into(), "plugh".into(), "xyzzy".into(),
    ];
    let id_sets: Vec<&[String]> = identifiers.chunks(2).collect();
    let responses = stream::iter(id_sets)
        .map(|person_ids| {
            let target = target.clone();
            tokio::spawn( async move {
                let resptext = nop(person_ids, target.as_str(), url).await;
            })
        })
        .buffer_unordered(2);
    responses
        .for_each(|b| async { })
        .await;
}
Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e41c635e99e422fec8fc8a581c28c35e
Given chunks yields a Vec<&[String]>, the compiler complains that identifiers doesn't live long enough because it potentially goes out of scope while the slices are being referenced. Realistically this won't happen because there's an await. Is there a way to tell the compiler that this is safe, or is there another way of getting chunks as a set of owned Strings for each thread?
There was a similarly asked question that used into_owned() as a solution, but when I try that, rustc complains about the slice size not being known at compile time in the request_user function.
EDIT: Some other questions as well:
Is there a more direct way of using target in each thread without needing Arc? From the moment it is created, it never needs to be modified, just read from. If not, is there a way of pulling it out of the Arc that doesn't require the .as_str() method?
How do you handle multiple error types within the tokio::spawn() block? In the real world use, I'm going to receive quick_xml::Error and reqwest::Error within it. It works fine without tokio spawn for concurrency.

Is there a way to tell the compiler that this is safe, or is there another way of getting chunks as a set of owned Strings for each thread?
You can chunk a Vec<T> into a Vec<Vec<T>> without cloning by using the itertools crate:
use itertools::Itertools;

fn main() {
    let items = vec![
        String::from("foo"),
        String::from("bar"),
        String::from("baz"),
    ];
    let chunked_items: Vec<Vec<String>> = items
        .into_iter()
        .chunks(2)
        .into_iter()
        .map(|chunk| chunk.collect())
        .collect();
    for chunk in chunked_items {
        println!("{:?}", chunk);
    }
}
["foo", "bar"]
["baz"]
This is based on the answers here.

Your issue here is that id_sets is a Vec of slices that borrow from identifiers. tokio::spawn requires the future it receives to be 'static, because the spawned task may outlive the function that created it, so the async move block cannot hold those borrows.
The solution to the immediate problem is to convert the Vec<&[String]> to a Vec<Vec<String>>, so that each task owns its chunk.
One way of accomplishing that (note that to_vec clones each String) would be:
let id_sets: Vec<Vec<String>> = identifiers
    .chunks(2)
    .map(|x: &[String]| x.to_vec())
    .collect();
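If the original Vec is no longer needed, the clone can be avoided with the standard library alone. The following is a minimal sketch, not taken from either answer: into_owned_chunks and join_chunks_in_threads are hypothetical helper names, and plain std threads stand in for the tokio tasks so that the example has no external dependencies.

```rust
use std::thread;

/// Splits `items` into owned chunks of at most `size` elements.
/// Elements are moved, not cloned.
fn into_owned_chunks<T>(items: Vec<T>, size: usize) -> Vec<Vec<T>> {
    assert!(size > 0, "chunk size must be non-zero");
    let mut chunks = Vec::new();
    let mut iter = items.into_iter();
    loop {
        // `by_ref` lets `take` borrow the iterator instead of consuming it.
        let chunk: Vec<T> = iter.by_ref().take(size).collect();
        if chunk.is_empty() {
            break;
        }
        chunks.push(chunk);
    }
    chunks
}

/// Hypothetical stand-in for the per-chunk work: each worker thread owns
/// its chunk and joins the strings with a comma.
fn join_chunks_in_threads(chunks: Vec<Vec<String>>) -> Vec<String> {
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| thread::spawn(move || chunk.join(",")))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let identifiers: Vec<String> = ["foo", "bar", "baz"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let chunks = into_owned_chunks(identifiers, 2);
    println!("{:?}", join_chunks_in_threads(chunks));
}
```

Because every chunk is an owned Vec<String>, the closure passed to thread::spawn satisfies the 'static bound, which is the same requirement tokio::spawn places on its future.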

Related

Waiting on multiple futures borrowing mutable self

Each of the following methods needs &mut self to operate. The following code gives the error:
cannot borrow *self as mutable more than once at a time
How can I achieve this correctly?
loop {
    let future1 = self.handle_new_connections(sender_to_connector.clone());
    let future2 = self.handle_incoming_message(&mut receiver_from_peers);
    let future3 = self.handle_outgoing_message();
    tokio::pin!(future1, future2, future3);
    tokio::select! {
        _ = future1 => {},
        _ = future2 => {},
        _ = future3 => {}
    }
}
You are not allowed to have multiple mutable references to an object, and there's a good reason for that.
Imagine you pass an object mutably to two different functions and they modify it without any synchronization mechanism in place. You'd end up with something called a race condition.
To prevent this bug, Rust allows only one mutable reference to an object at a time. You can have multiple immutable references, though, and you will often see people use interior mutability patterns.
In your case, you don't want the data to be modified by two different threads at the same time, so you'd wrap it in a Mutex or RwLock. Since you want multiple threads to be able to own this value, you'd then wrap that in an Arc.
You can read about interior mutability in more detail here.
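The Arc plus Mutex wrapping described above can be sketched with the standard library only. This is an illustration under assumptions, not the asker's code: record_from_threads is a hypothetical function, and std threads are used instead of async tasks to keep it dependency-free.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `workers` threads that each push their id into a shared log.
fn record_from_threads(workers: usize) -> Vec<usize> {
    // Arc gives every thread shared ownership; Mutex serializes mutation.
    let log = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let log = Arc::clone(&log);
            thread::spawn(move || {
                log.lock().unwrap().push(id);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Thread scheduling order is not deterministic, so sort before returning.
    let mut entries = log.lock().unwrap().clone();
    entries.sort();
    entries
}

fn main() {
    println!("{:?}", record_from_threads(4));
}
```

Each thread only ever holds the lock for the duration of one push, so no thread observes a partially updated vector.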
Alternatively, you could add explicit lifetimes to your function signatures to indicate that the resulting Future is awaited in the same context. Since your code waits for the futures before the next iteration, that would do the trick as well.
I encountered the same problem when dealing with async code. Here is what I figured out:
Let's say you have an Engine, that contains both incoming and outgoing:
struct Engine {
    log: Arc<Mutex<Vec<String>>>,
    outgoing: UnboundedSender<String>,
    incoming: UnboundedReceiver<String>,
}
Our goal is to create two functions, process_incoming and process_logic, and then poll them simultaneously without running into trouble with the borrow checker.
What is important here is that:
You cannot pass &mut self to these async functions simultaneously.
Each of incoming and outgoing is held by at most one function.
The data accessed by both process_incoming and process_logic needs to be wrapped in a lock.
Any attempt to lock the whole Engine directly will lead to a deadlock at runtime.
So that leaves us giving up methods in favor of associated functions:
impl Engine {
    // ...
    async fn process_logic(outgoing: &mut UnboundedSender<String>, log: Arc<Mutex<Vec<String>>>) {
        loop {
            Delay::new(Duration::from_millis(1000)).await.unwrap();
            let msg: String = "ping".into();
            println!("outgoing: {}", msg);
            log.lock().push(msg.clone());
            outgoing.send(msg).await.unwrap();
        }
    }

    async fn process_incoming(
        incoming: &mut UnboundedReceiver<String>,
        log: Arc<Mutex<Vec<String>>>,
    ) {
        while let Some(msg) = incoming.next().await {
            println!("incoming: {}", msg);
            log.lock().push(msg);
        }
    }
}
And we can then write main as:
fn main() {
    futures::executor::block_on(async {
        let mut engine = Engine::new();
        let a = Engine::process_incoming(&mut engine.incoming, engine.log.clone()).fuse();
        let b = Engine::process_logic(&mut engine.outgoing, engine.log).fuse();
        futures::pin_mut!(a, b);
        select! {
            _ = a => {},
            _ = b => {},
        }
    });
}
I put the whole example here.
It's a workable solution; just be aware that you need to add futures and futures-timer to your dependencies.
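The core idea, borrowing individual fields instead of the whole struct, does not depend on async. Below is a dependency-free sketch of the same pattern, assuming Rust 1.63 or later for std::thread::scope. The field types are simplified stand-ins (plain Vec<String> instead of channels), and run is a hypothetical helper, so this illustrates the borrow structure rather than reproducing the example above.

```rust
use std::sync::Mutex;
use std::thread;

struct Engine {
    log: Mutex<Vec<String>>,
    incoming: Vec<String>,
    outgoing: Vec<String>,
}

impl Engine {
    // Associated functions: each borrows only the fields it needs,
    // so neither requires `&mut self`.
    fn process_incoming(incoming: &mut Vec<String>, log: &Mutex<Vec<String>>) {
        for msg in incoming.drain(..) {
            log.lock().unwrap().push(format!("incoming: {}", msg));
        }
    }

    fn process_logic(outgoing: &mut Vec<String>, log: &Mutex<Vec<String>>) {
        let msg = String::from("ping");
        log.lock().unwrap().push(format!("outgoing: {}", msg));
        outgoing.push(msg);
    }
}

/// Runs both functions concurrently on disjoint fields of one Engine.
fn run(engine: &mut Engine) {
    // Destructuring yields separate borrows of each field.
    let Engine { log, incoming, outgoing } = engine;
    // The log is only shared, never exclusively borrowed.
    let log: &Mutex<Vec<String>> = log;
    thread::scope(|s| {
        s.spawn(|| Engine::process_incoming(incoming, log));
        s.spawn(|| Engine::process_logic(outgoing, log));
    });
}

fn main() {
    let mut engine = Engine {
        log: Mutex::new(Vec::new()),
        incoming: vec![String::from("hello")],
        outgoing: Vec::new(),
    };
    run(&mut engine);
    println!("{:?}", engine.log.lock().unwrap());
}
```

The compiler accepts this because the two mutable borrows refer to different fields, and the only state touched by both threads sits behind the Mutex.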

How to store one of two constants in a value, where the constants share traits?

Depending on configuration, I need to select either stdout or sink once, and pass the result as an output destination for subsequent output calls.
My Java and C++ experience tell me that abstracting away from the concrete type is wise and makes room for future design changes. This code however won't compile:
let out = if std::env::var("LOG").is_ok() {
    std::io::stdout()
} else {
    std::io::sink()
};
Stating...
`if` and `else` have incompatible types
What is the Rust-o-matic way of solving this?
Dynamic dispatch using trait objects is probably what you need:
use std::io::{self, Write};
use std::env;

fn get_output() -> Box<dyn Write> {
    if env::var("LOG").is_ok() {
        Box::new(io::stdout())
    } else {
        Box::new(io::sink())
    }
}

let out = get_output();
The approach from Peter's answer is probably what you need, but it does require an extra allocation. (Which probably doesn't matter in the least in this case, but could matter in other scenarios.) If you are only passing out downward, i.e. as argument to functions, you can avoid the allocation by using two variables to store the different outputs:
let (mut stdout, mut sink);
let out: &mut dyn Write = if std::env::var("LOG").is_ok() {
    stdout = std::io::stdout();
    &mut stdout
} else {
    sink = std::io::sink();
    &mut sink
};
// ...proceed to use out...
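To show what "passing out downward" might look like, here is a small self-contained sketch. write_report is a hypothetical function introduced only for illustration; any function that accepts &mut dyn Write works the same way with stdout, sink, or an in-memory buffer.

```rust
use std::io::{self, Write};

/// Writes each line to whatever destination the caller selected.
fn write_report(out: &mut dyn Write, lines: &[&str]) -> io::Result<()> {
    for line in lines {
        writeln!(out, "{}", line)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Declared up front so that the chosen writer outlives the `if`.
    let (mut stdout, mut sink);
    let out: &mut dyn Write = if std::env::var("LOG").is_ok() {
        stdout = io::stdout();
        &mut stdout
    } else {
        sink = io::sink();
        &mut sink
    };
    write_report(out, &["first", "second"])
}
```

Since Vec<u8> also implements Write, the same function can be exercised in tests without touching the real stdout.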

How to convert hyper's Body stream into a Result<Vec<String>>?

I'm updating code to the newest versions of hyper and futures, but everything I've tried is missing a trait implementation of one kind or another.
A non-working example (playground) for this:
extern crate futures; // 0.3.5
extern crate hyper; // 0.13.6

use futures::{future, FutureExt, StreamExt, TryFutureExt, TryStreamExt};
use hyper::body;

fn get_body_as_vec<'a>(b: body::Body) -> future::BoxFuture<'a, Result<Vec<String>, hyper::Error>> {
    let f = b.and_then(|bytes| {
        let s = std::str::from_utf8(&bytes).expect("sends no utf-8");
        let mut lines: Vec<String> = Vec::new();
        for l in s.lines() {
            lines.push(l.to_string());
        }
        future::ok(lines)
    });
    Box::pin(f)
}
This produces the error:
error[E0277]: the trait bound `futures::stream::AndThen<hyper::Body, futures::future::Ready<std::result::Result<std::vec::Vec<std::string::String>, hyper::Error>>, [closure@src/lib.rs:8:24: 15:6]>: futures::Future` is not satisfied
  --> src/lib.rs:17:5
   |
17 |     Box::pin(f)
   |     ^^^^^^^^^^^ the trait `futures::Future` is not implemented for `futures::stream::AndThen<hyper::Body, futures::future::Ready<std::result::Result<std::vec::Vec<std::string::String>, hyper::Error>>, [closure@src/lib.rs:8:24: 15:6]>`
   |
   = note: required for the cast to the object type `dyn futures::Future<Output = std::result::Result<std::vec::Vec<std::string::String>, hyper::Error>> + std::marker::Send`
I'm unable to create a compatible future. Body is a stream and I can't find any "converter" function with the required traits implemented.
With hyper 0.12, I used concat2().
From the reference of and_then:
Note that this function consumes the receiving stream and returns a
wrapped version of it.
To process the entire stream and return a single future representing
success or error, use try_for_each instead.
Yes, your f is still a Stream. try_for_each will work as the reference suggests, but try_fold is a better choice for gathering the bytes so they can be split into lines in a vector. However, as @Shepmaster points out in the comments, if we convert each chunk to UTF-8 directly, we can break multi-byte characters that are split across chunks of the response.
To keep the data consistent, the easiest solution is probably to collect all the bytes before converting them to UTF-8.
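use futures::{future, FutureExt, TryStreamExt};
use hyper::body;

// Assumed alias (the original snippet does not show it): a boxed error type
// that both hyper::Error and Utf8Error convert into through `?`.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;

fn get_body_as_vec<'a>(b: body::Body) -> future::BoxFuture<'a, Result<Vec<String>>> {
    let f = b
        // Accumulate every chunk into one byte buffer.
        .try_fold(vec![], |mut vec, bytes| {
            vec.extend_from_slice(&bytes);
            future::ok(vec)
        })
        // Decode once, after the whole body has arrived.
        .map(|x| {
            Ok(std::str::from_utf8(&x?)?
                .lines()
                .map(ToString::to_string)
                .collect())
        });
    Box::pin(f)
}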
Playground
You can test the multiple-chunk behavior by using a channel from hyper's Body. Here I've created a scenario where a line is split across chunks. This works fine with the code above, but if you process the chunks directly you will lose consistency.
let (mut sender, body) = body::Body::channel();
tokio::spawn(async move {
    sender
        .send_data("Line1\nLine2\nLine3\nLine4\nLine5".into())
        .await;
    sender
        .send_data("next bytes of Line5\nLine6\nLine7\nLine8\n----".into())
        .await;
});
println!("{:?}", get_body_as_vec(body).await);
Playground ( Success scenario )
Playground ( Fail scenario: "next bytes of Line5" will be represented as a new line in the Vec )
Note: I've used std::error::Error as the error in the return type since both hyper::Error and Utf8Error implement it. You may still use your expect strategy with hyper::Error.
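The difference between decoding per chunk and decoding after concatenation can be reproduced without hyper at all. The following sketch uses plain byte slices as stand-ins for body chunks; decode_per_chunk and decode_after_concat are hypothetical names chosen for this illustration.

```rust
use std::str::Utf8Error;

/// Decodes each chunk on its own. A multi-byte character that straddles a
/// chunk boundary causes an error, and a split line becomes two entries.
fn decode_per_chunk(chunks: &[&[u8]]) -> Result<Vec<String>, Utf8Error> {
    let mut lines = Vec::new();
    for chunk in chunks {
        let text = std::str::from_utf8(chunk)?;
        lines.extend(text.lines().map(ToString::to_string));
    }
    Ok(lines)
}

/// Concatenates all chunks first, then decodes and splits once.
fn decode_after_concat(chunks: &[&[u8]]) -> Result<Vec<String>, Utf8Error> {
    let bytes: Vec<u8> = chunks.concat();
    let text = std::str::from_utf8(&bytes)?;
    Ok(text.lines().map(ToString::to_string).collect())
}

fn main() {
    // "é" is encoded as the two bytes 0xC3 0xA9; the chunk boundary splits it.
    let chunks: [&[u8]; 2] = [b"caf\xC3", b"\xA9\nbar"];
    println!("per chunk:    {:?}", decode_per_chunk(&chunks));
    println!("after concat: {:?}", decode_after_concat(&chunks));
}
```

Buffering the whole body costs memory proportional to its size, which is the trade-off for getting correct character and line boundaries with such a simple approach.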
I found two solutions, each of them is pretty simple:
/*
WARNING for beginners!!! This use statement
is important so we can later use .data() method!!!
*/
use hyper::body::{to_bytes, HttpBody};

// Option 1: takes only a single chunk of data!
let my_vector: Vec<u8> = request.into_body().data().await.unwrap().unwrap().to_vec();

// Option 2 (use instead of option 1): takes all data chunks, not just the first one:
let my_bytes = to_bytes(request.into_body()).await?;
let my_vector: Vec<u8> = my_bytes.to_vec();

// Either way, convert the collected bytes to a String:
let my_string = String::from_utf8(my_vector).unwrap();
This example doesn't handle errors properly, ensure your code does.

Moving a &[&str] into a thread

As the title already says, I'm trying to move a &[&str] into a thread. Well, actually, the code below works, but I have two problems with it:
let args2: Vec<_> = args.iter().map(|arg| { arg.to_string() }).collect(); seems a bit verbose to convert a &[&str] into a Vec<String>. Can this be done "nicer"?
If I understand it correctly, the strings get copied twice: first by the let cmd2 and let args2 statements; then by moving them inside the move closure. Is this correct? And if so, can it be done with one copy?
I'm aware of thread::scoped, but it is deprecated at the moment. I'm also coding this to learn a bit more about Rust, so comments about "unrusty" code are appreciated too.
use std::process::{Command,Output};
use std::thread;
use std::thread::JoinHandle;

pub struct Process {
    joiner: JoinHandle<Output>,
}

impl Process {
    pub fn new(cmd: &str, args: &[&str]) -> Process {
        // Copy the strings for the thread
        let cmd2 = cmd.to_string();
        let args2: Vec<_> = args.iter().map(|arg| { arg.to_string() }).collect();
        let child = thread::spawn(move || {
            Command::new(cmd2).args(&args2[..]).output().unwrap_or_else(|e| {
                panic!("Failed to execute process: {}", e)
            })
        });
        Process { joiner: child }
    }
}
let args2: Vec<_> = args.iter().map(|arg| { arg.to_string() }).collect(); seems a bit verbose to convert a &[&str] into a Vec<String>. Can this be done "nicer"?
I don't think so. There are a few minor variations of this that also work (e.g. args.iter().cloned().map(String::from).collect();), but I can't think of one that is substantially nicer. One minor point: in older Rust versions, using to_string to convert a &str to a String was not quite as efficient as using String::from or to_owned, because it went through the formatting machinery. Current compilers specialize to_string for str, so the difference no longer applies.
If I understand it correctly, the strings get copied twice: first by the let cmd2 and let args2 statements; then by moving them inside the move closure. Is this correct? And if so, can it be done with one copy?
No, the strings are only copied where you call to_string. Strings don't implement Copy, so they're never copied implicitly. If you try to access the strings after they have been moved to the closure, you will get a compiler error.
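A small experiment can make the "moved, not copied" point concrete. This is a sketch with hypothetical helper names (spawn_join_args, buffer_address_survives_move), and it replaces the Command invocation from the question with simple string formatting so that it runs anywhere.

```rust
use std::thread;

/// Converts the borrowed arguments to owned Strings once, then moves them
/// into the thread.
fn spawn_join_args(cmd: &str, args: &[&str]) -> thread::JoinHandle<String> {
    // The only copies of the string data happen on these two lines.
    let cmd = cmd.to_owned();
    let args: Vec<String> = args.iter().map(|&arg| String::from(arg)).collect();
    // `move` transfers ownership; the heap buffers are not duplicated.
    thread::spawn(move || format!("{} {}", cmd, args.join(" ")))
}

/// Returns true if the String's heap buffer has the same address before
/// and after being moved into a thread.
fn buffer_address_survives_move(text: &str) -> bool {
    let owned = text.to_owned();
    let before = owned.as_ptr() as usize;
    let after = thread::spawn(move || owned.as_ptr() as usize)
        .join()
        .unwrap();
    before == after
}

fn main() {
    let joined = spawn_join_args("echo", &["hello", "world"]).join().unwrap();
    println!("{}", joined);
    println!("same buffer: {}", buffer_address_survives_move("some argument"));
}
```

Moving a String copies only its pointer, length, and capacity, which is why the buffer address stays the same across the move.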

Cannot move data out of a Mutex

Consider the following code example. I have a vector of JoinHandles which I need to iterate over to join the threads back to the main thread. However, upon doing so I am getting the error: cannot move out of borrowed content.
let threads = Arc::new(Mutex::new(Vec::new()));
for _x in 0..100 {
    let handle = thread::spawn(move || {
        // do some work
    });
    threads.lock().unwrap().push(handle);
}
for t in threads.lock().unwrap().iter() {
    t.join();
}
Unfortunately, you can't do this directly through the lock. A MutexGuard only gives you a &mut reference to the data, which doesn't allow moving out of it. So even into_iter() won't work: it needs a self argument, which it can't get from a MutexGuard.
There is a workaround, however. You can use Arc<Mutex<Option<Vec<_>>>> instead of Arc<Mutex<Vec<_>>> and then just take() the value out of the mutex:
for t in threads.lock().unwrap().take().unwrap().into_iter() {
}
Then into_iter() will work just fine as the value is moved into the calling thread.
Of course, you will need to construct the vector and push to it appropriately:
let threads = Arc::new(Mutex::new(Some(Vec::new())));
...
threads.lock().unwrap().as_mut().unwrap().push(handle);
However, the best way is to just drop the Arc<Mutex<..>> layer altogether (of course, if this value is not used from other threads).
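When the handles are only used by the spawning thread, dropping the Arc<Mutex<..>> layer might look like the following sketch. run_workers is a hypothetical function, and the doubling closure is placeholder work.

```rust
use std::thread::{self, JoinHandle};

/// Spawns `count` threads and joins them, collecting each result in order.
fn run_workers(count: usize) -> Vec<usize> {
    // A plain Vec owns the handles; no lock or reference counting is needed.
    let handles: Vec<JoinHandle<usize>> = (0..count)
        .map(|x| thread::spawn(move || x * 2))
        .collect();
    // `into_iter` yields each handle by value, which `join` requires.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", run_workers(5));
}
```

Results come back in spawn order because the handles are joined in the order they were created, regardless of which thread finishes first.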
As referenced in "How to take ownership of T from Arc<Mutex<T>>?", this is now possible without any trickery in Rust, using Arc::try_unwrap and Mutex::into_inner():
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

let threads = Arc::new(Mutex::new(Vec::new()));
for _x in 0..100 {
    let handle = thread::spawn(move || {
        println!("{}", _x);
    });
    threads.lock().unwrap().push(handle);
}
// try_unwrap succeeds only if this is the last remaining Arc.
let threads_unwrapped: Vec<JoinHandle<_>> = Arc::try_unwrap(threads).unwrap().into_inner().unwrap();
for t in threads_unwrapped.into_iter() {
    t.join().unwrap();
}
Play around with it in this playground to verify.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9d5635e7f778bc744d1fb855b92db178
While drain is a good solution, you can also do the following:
// with a copy
let built_words: Arc<Mutex<Vec<String>>> = Arc::new(Mutex::new(vec![]));
let result: Vec<String> = built_words.lock().unwrap().clone();

// using drain
let mut locked_result = built_words.lock().unwrap();
let mut result: Vec<String> = vec![];
result.extend(locked_result.drain(..));
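I would prefer to clone the data, since that leaves the original value in place. Note that cloning copies every String, so it does have a performance overhead compared with drain, which moves the elements out.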
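A third option, not mentioned in the answers above, is std::mem::take, which swaps the vector behind the lock for an empty one and returns the original by value. A minimal sketch, where take_all is a hypothetical helper name:

```rust
use std::mem;
use std::sync::{Arc, Mutex};

/// Moves the whole Vec out of the mutex, leaving an empty Vec behind.
fn take_all(shared: &Arc<Mutex<Vec<String>>>) -> Vec<String> {
    let mut guard = shared.lock().unwrap();
    // No element is cloned; only the Vec's pointer, length and capacity move.
    mem::take(&mut *guard)
}

fn main() {
    let built_words = Arc::new(Mutex::new(vec![String::from("foo"), String::from("bar")]));
    let result = take_all(&built_words);
    println!("taken: {:?}", result);
    println!("left behind: {:?}", built_words.lock().unwrap());
}
```

Unlike Arc::try_unwrap, this works even while other clones of the Arc are still alive, at the cost of leaving those holders with an empty vector.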
