How to create a Stream from reading and transforming a file? - rust

I'm trying to read a file, decrypt it, and return the data. Because the file is potentially very big, I want to do this in a stream.
I cannot find a good pattern to implement the stream. I'm trying to do something like this:
let stream = stream::unfold(decrypted_init_length, |decrypted_length| async move {
if decrypted_length < start + length {
let mut encrypted_chunk = vec![0u8; encrypted_block_size];
match f.read(&mut encrypted_chunk[..]) {
Ok(size) => {
if size > 0 {
let decrypted = my_decrypt_fn(&encrypted_chunk[..]);
let updated_decrypted_length = decrypted_length + decrypted.len();
Some((decrypted, updated_decrypted_length))
} else {
None
}
}
Err(e) => {
println!("Error {}", e);
None
}
}
} else {
None
}
});
The problem is that f.read is not allowed in the above async closure with the following error:
89 | | match f.read(&mut encrypted_chunk[..]) {
| | -
| | |
| | move occurs because `f` has type `std::fs::File`, which does not implement the `Copy` trait
| | move occurs due to use in generator
I don't want to open f inside the closure itself. Is there any better way to fix this? I am OK with using a different crate or trait, or method (i.e. not stream::unfold).

I found a solution: using the async-stream crate.
One of the reasons stream::unfold did not work for me is that the async move closure does not allow mutable access to variables outside of it, for example the f file handle.
Now with async-stream, I changed my code to the following, and it works (note the yield added by this crate):
use async_stream::try_stream;
<snip>
try_stream! {
while decrypted_length < start + length {
match f.read(&mut encrypted_chunk[..]) {
Ok(size) =>
if size > 0 {
println!("read {} bytes", size);
let decrypted = my_decrypt_fn(&encrypted_chunk[..size], ..);
decrypted_length = decrypted_length + decrypted.len();
yield decrypted;
} else {
break
}
Err(e) => {
println!("Error {}", e);
break
}
}
}
}
UPDATE:
I found that async-stream has some limitations that I cannot ignore. I ended up implementing Stream directly and no longer using async-stream. Now my code looks like this:
pub struct DecryptFileStream {
f: File,
<other_fields>,
}
impl Stream for DecryptFileStream {
type Item = io::Result<Vec<u8>>;
fn poll_next(self: Pin<&mut Self>,
_cx: &mut Context<'_>) -> Poll<Option<io::Result<Vec<u8>>>> {
// read the file `f` of self and business_logic
//
if decrypted.len() > 0 {
Poll::Ready(Some(Ok(decrypted)))
} else {
Poll::Ready(None)
}
}
}
// then use the above stream:
let stream = DecryptFileStream::new(...);
Response::new(Body::wrap_stream(stream))

stream::unfold produces a Stream, which in Rust is used exclusively for asynchronous programming. If you want to read synchronously, what you're calling a "stream" is a type that implements Read in Rust. You can call Read::read() to read some data from the current position of your File (at most as many bytes as the buffer you pass in can hold), and then decrypt that data.
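As a rough illustration of the synchronous approach, here is a minimal sketch that wraps any Read in an iterator of decrypted chunks. The names DecryptChunks and block_size are made up for this example, and my_decrypt_fn is only a stand-in (a fixed-key XOR), not the real decryption from the question.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for the question's `my_decrypt_fn`: XOR with a fixed key.
fn my_decrypt_fn(chunk: &[u8]) -> Vec<u8> {
    chunk.iter().map(|b| b ^ 0x5A).collect()
}

/// Hypothetical helper: reads blocks from any `Read` and yields decrypted chunks.
struct DecryptChunks<R> {
    reader: R,
    block_size: usize,
}

impl<R: Read> Iterator for DecryptChunks<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buf = vec![0u8; self.block_size];
        // `read` may return fewer bytes than requested; a real block cipher
        // would have to keep reading until a whole block is available.
        match self.reader.read(&mut buf) {
            Ok(0) => None, // end of input
            Ok(n) => Some(Ok(my_decrypt_fn(&buf[..n]))),
            Err(e) => Some(Err(e)),
        }
    }
}
```

Because the iterator owns the reader, there is no closure trying to capture the File, so the "move occurs because `f` has type `std::fs::File`" error does not come up. A caller could construct it with a File (or a Cursor for testing) and loop over the chunks with a plain for loop.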

Related

Sharing state between threads with notify-rs

I'm new to Rust.
I'm trying to write a file_sensor that will start a counter after a file is created. The plan is that after an amount of time, if a second file is not received, the sensor will exit with a zero exit code.
I could write the code to continue that work, but I feel the code below illustrates the problem (I have also left out, for example, the post function referred to).
I have been struggling with this problem for several hours; I've tried Arc and mutexes and even global variables.
The Timer implementation is Ticktock-rs.
I need to be able to either get heartbeat in the match body for EventKind::Create(CreateKind::Folder) or file_count in the loop.
The code I've attached here runs, but file_count is always zero in the loop.
use std::env;
use std::path::Path;
use std::{thread, time};
use std::process::ExitCode;
use ticktock::Timer;
use notify::{
Watcher,
RecommendedWatcher,
RecursiveMode,
Result,
event::{EventKind, CreateKind, ModifyKind, Event}
};
fn main() -> Result<()> {
let now = time::Instant::now();
let mut heartbeat = Timer::apply(
|_, count| {
*count += 1;
*count
},
0,
)
.every(time::Duration::from_millis(500))
.start(now);
let mut file_count = 0;
let args = Args::parse();
let REQUEST_SENSOR_PATH = env::var("REQUEST_SENSOR_PATH").expect("$REQUEST_SENSOR_PATH is not set");
let mut watcher = notify::recommended_watcher(move|res: Result<Event>| {
match res {
Ok(event) => {
match event.kind {
EventKind::Create(CreateKind::File) => {
file_count += 1;
// do something with file
}
_ => { /* something else changed */ }
}
println!("{:?}", event);
},
Err(e) => {
println!("watch error: {:?}", e);
ExitCode::from(101);
},
}
})?;
watcher.watch(Path::new(&REQUEST_SENSOR_PATH), RecursiveMode::Recursive)?;
loop {
let now = time::Instant::now();
if let Some(n) = heartbeat.update(now){
println!("Heartbeat: {}, fileCount: {}", n, file_count);
if n > 10 {
heartbeat.set_value(0);
// This function will reset timer when a file arrives
}
}
}
Ok(())
}
Your compiler warnings show you the problem:
warning: unused variable: `file_count`
--> src/main.rs:31:25
|
31 | file_count += 1;
| ^^^^^^^^^^
|
= note: `#[warn(unused_variables)]` on by default
= help: did you mean to capture by reference instead?
The problem here is that you use file_count inside of a move || closure. file_count is an i32, which is Copy. Using it in a move || closure creates a copy of it, so assigning to that copy no longer updates the original variable.
Either way, an event handler cannot directly modify a local variable of main(). Event handlers require a 'static lifetime for anything they reference, because Rust cannot guarantee that the event handler lives shorter than main.
One solution for this problem is to use reference counting and interior mutability. In this case, I will use Arc for reference counting and AtomicI32 for interior mutability. Note that notify::recommended_watcher requires thread safety; otherwise, instead of an Arc<AtomicI32> we could have used an Rc<Cell<i32>>, which is the same thing but only for single-threaded environments, with a little less overhead.
use notify::{
event::{CreateKind, Event, EventKind},
RecursiveMode, Result, Watcher,
};
use std::time;
use std::{env, sync::atomic::Ordering};
use std::{path::Path, sync::Arc};
use std::{process::ExitCode, sync::atomic::AtomicI32};
use ticktock::Timer;
fn main() -> Result<()> {
let now = time::Instant::now();
let mut heartbeat = Timer::apply(
|_, count| {
*count += 1;
*count
},
0,
)
.every(time::Duration::from_millis(500))
.start(now);
let file_count = Arc::new(AtomicI32::new(0));
let REQUEST_SENSOR_PATH =
env::var("REQUEST_SENSOR_PATH").expect("$REQUEST_SENSOR_PATH is not set");
let mut watcher = notify::recommended_watcher({
let file_count = Arc::clone(&file_count);
move |res: Result<Event>| {
match res {
Ok(event) => {
match event.kind {
EventKind::Create(CreateKind::File) => {
file_count.fetch_add(1, Ordering::AcqRel);
// do something with file
}
_ => { /* something else changed */ }
}
println!("{:?}", event);
}
Err(e) => {
println!("watch error: {:?}", e);
ExitCode::from(101);
}
}
}
})?;
watcher.watch(Path::new(&REQUEST_SENSOR_PATH), RecursiveMode::Recursive)?;
loop {
let now = time::Instant::now();
if let Some(n) = heartbeat.update(now) {
println!(
"Heartbeat: {}, fileCount: {}",
n,
file_count.load(Ordering::Acquire)
);
if n > 10 {
heartbeat.set_value(0);
// This function will reset timer when a file arrives
}
}
}
}
Also, note that the ExitCode::from(101); gives you a warning. It does not actually exit the program; it only creates an exit code value and then discards it again. You probably intended to write std::process::exit(101);. I would discourage that, though, because it does not clean up properly (it does not call any Drop implementations). I'd use panic here instead. This is exactly the use case panic is meant for.
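To see the Arc plus atomic pattern in isolation, here is a minimal sketch that uses only the standard library, with plain threads standing in for the watcher's callback thread. The function name count_events and its parameters are made up for this illustration.

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: every worker thread bumps the same shared counter,
/// and the caller reads the total once all workers have finished.
fn count_events(workers: usize, events_per_worker: i32) -> i32 {
    let file_count = Arc::new(AtomicI32::new(0));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each closure gets its own handle to the same counter.
            let file_count = Arc::clone(&file_count);
            thread::spawn(move || {
                for _ in 0..events_per_worker {
                    // A plain counter needs atomicity but no stronger ordering.
                    file_count.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        // Joining a thread makes all of its writes visible to this thread.
        handle.join().expect("worker thread panicked");
    }

    file_count.load(Ordering::Relaxed)
}
```

The clone of the Arc is what gets moved into each closure, so the original handle stays usable in the calling function, which is the same shape as the watcher example above.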

Clean way to get Option::unwrap_or_else behaviour with an Option<&T>

I would like to know if there's any elegant solution for getting code/behaviour similar to unwrap_or_else on an Option<&T>. My use case is to pass an optional reference to a function and if it's not used then create a default value of the same type to use. Here's a boiled-down version of my code:
#[derive(Debug)]
struct ExpensiveUnclonableThing {}
fn make_the_thing() -> ExpensiveUnclonableThing {
// making the thing is slow
// ...
ExpensiveUnclonableThing {}
}
fn use_the_thing(thing_ref: &ExpensiveUnclonableThing) {
dbg!(thing_ref);
}
fn use_or_default(thing_ref_opt: Option<&ExpensiveUnclonableThing>) {
enum MaybeDefaultedRef<'a> {
Passed(&'a ExpensiveUnclonableThing),
Defaulted(ExpensiveUnclonableThing),
}
let thing_md = match thing_ref_opt {
Some(thing_ref) => MaybeDefaultedRef::Passed(thing_ref),
None => MaybeDefaultedRef::Defaulted(make_the_thing()),
};
let thing_ref = match &thing_md {
MaybeDefaultedRef::Passed(thing) => thing,
MaybeDefaultedRef::Defaulted(thing) => thing,
};
use_the_thing(thing_ref);
}
fn use_or_default_nicer(thing_ref_opt: Option<&ExpensiveUnclonableThing>) {
let thing_ref = thing_ref_opt.unwrap_or_else(|| &make_the_thing());
use_the_thing(thing_ref);
}
fn main() {
let thing = make_the_thing();
use_or_default(Some(&thing));
use_or_default(None);
use_or_default_nicer(Some(&thing));
use_or_default_nicer(None);
}
The thing is dropped right away when the unwrap_or_else closure ends, so I of course get an error stating that I can't do that:
error[E0515]: cannot return reference to temporary value
--> src/main.rs:31:53
|
31 | let thing_ref = thing_ref_opt.unwrap_or_else(|| &make_the_thing());
| ^----------------
| ||
| |temporary value created here
| returns a reference to data owned by the current function
What is the 'idiomatic Rust' way of writing use_or_default? Is there a way I can get it to look similar to how use_or_default_nicer is implemented other than by creating a generic MaybeDefaultedRef<T> type + with some convenience methods? I am open to refactoring the whole thing if there's a better way.
You can write something like this:
fn use_or_default_nicer(thing_ref_opt: Option<&ExpensiveUnclonableThing>) {
let mut maybe = None;
let thing_ref = thing_ref_opt.unwrap_or_else(
|| maybe.insert(make_the_thing())
);
use_the_thing(thing_ref);
}
That is, you keep the value itself outside of the closure and assign to it only if necessary. Unfortunately, an uninitialized variable cannot be captured by a closure, so you have to make the variable an Option<ExpensiveUnclonableThing> and initialize it with None.
But in real code of mine, I had the same issue and wrote a manual match:
fn use_or_default_nicer(thing_ref_opt: Option<&ExpensiveUnclonableThing>) {
let maybe;
let thing_ref = match thing_ref_opt {
Some(x) => x,
None => {
maybe = make_the_thing();
&maybe
}
};
use_the_thing(thing_ref);
}
In my opinion this is nicer, even if a bit longer, because you don't need the Option<_>, the maybe variable being mutable, or the fake initialization.
Some people feel a bit of a defeat when they match on an Option, and think it is un-idiomatic, but I don't particularly care.
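To make the deferred-initialization variant checkable, here is a small self-contained sketch. Thing, id_or_default and the defaults_made counter are invented for this example; the counter only exists to show that the default is built on the None path and nowhere else.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for `ExpensiveUnclonableThing`.
struct Thing {
    id: u32,
}

/// Returns the id of the passed thing, or of a freshly built default.
fn id_or_default(passed: Option<&Thing>, defaults_made: &Cell<u32>) -> u32 {
    let fallback; // declared here, but only initialized on the `None` path
    let thing: &Thing = match passed {
        Some(t) => t,
        None => {
            defaults_made.set(defaults_made.get() + 1);
            fallback = Thing { id: 0 };
            &fallback
        }
    };
    thing.id
}
```

Since fallback is declared before the match, it lives until the end of the function, so the reference taken in the None arm remains valid for the rest of the body.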
A plain old if/else would also do; there is no need to complicate things:
fn use_or_default_nicer(thing_ref_opt: Option<&ExpensiveUnclonableThing>) {
if let Some(e) = thing_ref_opt {
use_the_thing(e);
} else {
let e = make_the_thing();
use_the_thing(&e);
}
}

Why does match not release the mutable borrow until the end of its expression?

I want to return client. In this function, the code never continues after the match, so Rust should allow returning client.
pub async fn call_query2(mut client: Client<Compat<TcpStream>>, query:&str) -> Result<(Vec<tiberius::Row>,Client<Compat<TcpStream>>),(tiberius::error::Error,Client<Compat<TcpStream>>)> {
match client.query(query, &[]).await {
Ok(stream) =>{
match stream.into_first_result().await {
Ok(rows) => Ok((rows,client)),
Err(e) => Err((e,client))
}
},
Err(e) => Err((e,client))
}
}
But the compiler returns this error message:
match client.query(query, &[]).await {
| ------------------------------
| |
| borrow of `client` occurs here
| a temporary with access to the borrow is created here ...
...
102 | Err(e) => Err((e,client))
| ^^^^^^ move out of `client` occurs here
103 | }
104 | }
| - ... and the borrow might be used here, when that temporary is dropped and runs the destructor for type `Result<QueryStream<'_>, tiberius::error::Error>`
It seems that match does not release the mutable borrow until the end of its expression, because this code works, but then I'm not able to return client in the error case:
pub async fn call_query(mut client: Client<Compat<TcpStream>>, query:&str) -> Result<(Vec<tiberius::Row>,Client<Compat<TcpStream>>),tiberius::error::Error> {
let stream = client.query(query, &[]).await?;
return Ok((stream.into_first_result().await?, client));
}
Any idea?
The main thing is that the temporary value that is created with the match expression is not dropped (through the Drop trait) until after the match-expression. In a sense you can think of the code being something like this:
pub async fn call_query2(mut client: Client<Compat<TcpStream>>, query:&str) -> Result<(Vec<tiberius::Row>,Client<Compat<TcpStream>>), (tiberius::error::Error,Client<Compat<TcpStream>>)> {
let result;
let tmp = client.query(query, &[]).await;
match tmp {
Ok(stream) => {
match stream.into_first_result().await {
Ok(rows) => result = Ok((rows,client)),
Err(e) => result = Err((e,client))
}
},
Err(e) => result = Err((e,client))
}
// tmp.drop(); happens here implicitly
return result;
}
Note that the implicit call tmp.drop() here theoretically might need access to client from the borrow checker's perspective.
Your other example works because you're basically dropping result before the return statement. Conceptually something like this:
pub async fn call_query(mut client: Client<Compat<TcpStream>>, query:&str) -> Result<(Vec<tiberius::Row>,Client<Compat<TcpStream>>),tiberius::error::Error> {
let result = client.query(query, &[]).await;
if let Err(e) = result {
return Err( e );
}
let stream = result.unwrap();
return Ok((stream.into_first_result().await?, client));
}
Note that you couldn't return an Err( (e,client) ) inside the if here either or you'd get again the same error from the borrow checker, since result hasn't been dropped yet.
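If the client really has to be handed back by value in both the success and the error case, one possible workaround is to finish with the borrowing temporary first and only move client afterwards. The sketch below does not use tiberius: Client, Rows and call_query_owned are stand-ins invented for this example, with Rows playing the role of a query stream that mutably borrows the client and has a destructor.

```rust
#[derive(Debug, Default)]
struct Client {
    log: Vec<String>,
}

/// Stand-in for a query stream: borrows the client mutably and implements Drop.
struct Rows<'a> {
    client: &'a mut Client,
    data: Vec<u32>,
}

impl Drop for Rows<'_> {
    fn drop(&mut self) {
        self.client.log.push("rows dropped".to_string());
    }
}

impl Client {
    fn query(&mut self, fail: bool) -> Result<Rows<'_>, String> {
        if fail {
            Err("query failed".to_string())
        } else {
            Ok(Rows { client: self, data: vec![1, 2, 3] })
        }
    }
}

fn call_query_owned(
    mut client: Client,
    fail: bool,
) -> Result<(Vec<u32>, Client), (String, Client)> {
    // Step 1: reduce the borrowing result to owned data. The temporary created
    // by `client.query(..)` is dropped at the end of this `let` statement.
    let outcome: Result<Vec<u32>, String> = match client.query(fail) {
        Ok(rows) => Ok(rows.data.clone()),
        Err(e) => Err(e),
    };
    // Step 2: nothing borrows `client` any more, so it can be moved freely.
    match outcome {
        Ok(data) => Ok((data, client)),
        Err(e) => Err((e, client)),
    }
}
```

The difference from the failing version is only where client is moved: after the statement containing the query has ended, rather than inside one of the arms of the match on the query result.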
That being said, why would you want to return client in the first place? Probably because you want to use client in the calling code again. But then your function shouldn't require the caller to give up ownership of client in the first place. Just change mut client: ... in the signature of your function to client: &mut ... and remove Client from the return value type like this:
pub async fn call_query2(client: &mut Client<Compat<TcpStream>>, query:&str) -> Result<Vec<tiberius::Row>, tiberius::error::Error> {
// same code as before, but changing the result so that
// it doesn't return client anymore, i.e,
// Ok( (rows,client) ) => Ok(rows) and same for Err
}
Now your calling code can still refer to client without needing it "passed back" from your function. I.e., you go from
if let Ok((rows, old_new_client)) = call_query(client, query).await {
old_new_client.whatever();
}
to a much nicer
if let Ok(rows) = call_query2(&mut client, query).await {
client.whatever();
}

Dispatching Rust TcpStream from TcpListener to separate method in loop

I'm new to Rust and have the following code:
while let Some(tcp_stream) = incoming.next() {
match tcp_stream {
Ok(s) => {
handle_request(tcp_stream.unwrap(), &routes);
}
Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => {
continue;
}
Err(e) => panic!("Error"),
};
}
And I get the following error:
use of moved value: `tcp_stream`
value used here after partial move
note: move occurs because value has type `std::net::TcpStream`, which does not implement the `Copy` trait rustc(E0382)
lib.rs(72, 20): value moved here
lib.rs(73, 36): value used here after partial move
What I'm looking to do is handle the new connection/TcpStream in a separate method handle_request but I'm unsure how to go about doing this. Passing a reference gives the same issue. I tried implementing the Copy trait for TcpStream but since I'm outside of its crate I can't seem to do that.
Any pointers would be greatly appreciated!
One fix is to perform error checking in one arm:
async fn listen(self, listener: TcpListener) {
let routes = self.routes;
let mut incoming = listener.incoming();
while let Some(tcp_stream) = incoming.next() {
match tcp_stream {
Ok(s) => {
// `s` already owns the stream; calling `tcp_stream.unwrap()` here would use `tcp_stream` after it was moved
handle_request(s, &routes);
}
Err(e) => {
if e.kind() == io::ErrorKind::WouldBlock {
continue;
}
panic!("Error");
},
};
}
}
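The rule behind the error is that matching on Ok(s) moves the stream out of the Result, so the arm has to use s rather than the original variable. Here is a minimal sketch of that dispatch loop without any networking: the connections are plain strings, and dispatch and handle_request are stand-ins written for this example, not the functions from the question.

```rust
use std::io;

/// Hypothetical handler: records the connection it was given.
fn handle_request(conn: String, handled: &mut Vec<String>) {
    handled.push(conn);
}

/// Dispatches every successful item, skips `WouldBlock`, stops on other errors.
fn dispatch<I>(incoming: I) -> io::Result<Vec<String>>
where
    I: Iterator<Item = io::Result<String>>,
{
    let mut handled = Vec::new();
    for conn in incoming {
        match conn {
            // `c` is the value moved out of the `Result`; pass it on directly.
            Ok(c) => handle_request(c, &mut handled),
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => continue,
            // Returning the error lets the caller decide instead of panicking.
            Err(e) => return Err(e),
        }
    }
    Ok(handled)
}
```

With a real listener, the same match could be applied to the items yielded by incoming(), passing the bound TcpStream to the handler.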

How to return an error from FuturesUnordered?

I have a set of futures to be run in parallel and if one fails I would like to get the error to return to the caller.
Here is what I have been testing so far:
use futures::prelude::*;
use futures::stream::futures_unordered::FuturesUnordered;
use futures::{future, Future};
fn main() {
let tasks: FuturesUnordered<_> = (1..10).map(|_| async_func(false)).collect();
let mut runtime = tokio::runtime::Runtime::new().expect("Unable to start runtime");
let res = runtime.block_on(tasks.into_future());
if let Err(_) = res {
println!("err");
}
}
fn async_func(success: bool) -> impl Future<Item = (), Error = String> {
if success {
future::ok(())
} else {
future::err("Error".to_string())
}
}
How can I get the error from any failed futures? Even better would be to stop running any pending futures if a single future fails.
Your code is already returning and handling the error. If you attempt to use the error, the compiler will quickly direct you to the solution:
if let Err(e) = res {
println!("err: {}", e);
}
error[E0277]: `(std::string::String, futures::stream::futures_unordered::FuturesUnordered<impl futures::future::Future>)` doesn't implement `std::fmt::Display`
--> src/main.rs:12:29
|
12 | println!("err: {}", e);
| ^ `(std::string::String, futures::stream::futures_unordered::FuturesUnordered<impl futures::future::Future>)` cannot be formatted with the default formatter
|
= help: the trait `std::fmt::Display` is not implemented for `(std::string::String, futures::stream::futures_unordered::FuturesUnordered<impl futures::future::Future>)`
= note: in format strings you may be able to use `{:?}` (or {:#?} for pretty-print) instead
= note: required by `std::fmt::Display::fmt`
The Err value is a tuple of your error and the original stream to continue pulling after you have dealt with the error. This is what Stream::into_future / StreamFuture does.
Access the first value in the tuple to get to the error:
if let Err((e, _)) = res {
println!("err: {}", e);
}
If you want to see all of the values, you could keep polling the stream over and over (but don't do this because it's probably inefficient):
let mut f = tasks.into_future();
loop {
match runtime.block_on(f) {
Ok((None, _)) => {
println!("Stream complete");
break;
}
Ok((Some(v), next)) => {
println!("Success: {:?}", v);
f = next.into_future();
}
Err((e, next)) => {
println!("Error: {:?}", e);
f = next.into_future();
}
}
}

Resources