Rust: Reading a stream into a buffer till it is 'complete'

Rust: Reading a stream into a buffer till it is 'complete' - rust

I have a piece of code which uses the following pattern:
let mut request_buf: Vec<u8> = vec![];
let mut buf = [0 as u8; 50]; // 50 byte read buffer
while !req.parse(&request_buf).unwrap().is_complete() {
match stream.read(&mut buf) {
Ok(size) => {
request_buf.extend(&buf[0..size]);
},
Err(_) => {
// Handle err...
}
}
}
Basically the code is supposed to read data in from a socket into the temporary buffer buf and accumulates it till the fn is_complete() returns true.
I'm running into problems with the borrow checker, but I can't think of an alternative. (cannot borrow request_buf as immutable because it is also borrowed as mutable).
Even if I try to clone request_buf to get around this, which is horribly inefficient, i get a temporary value dropped while borrowedno matter how widely I scope the variable for the cloned buffer.
Is this kind of pattern simply impossible to implement in Rust? I feel I've tried everything I can think of and Rust is playing cat and mouse with me at every turn.

The problem is that the Request comes with a lifetime 'b that the buffer must outlive. Thus calling parse() associates the Request object with the buffer, which means it's now req that holds a (shared) borrow of the request_buf. This removes the possibility of getting a mutable borrow of request_buf as long as req lives.
This is not just a whim of the borrow checker: that signature of parse() means that req intends (typically for efficiency) to hold on to slices pointing inside request_buf. A mutable borrow of request_buf would allow you to shrink the container or to grow it past its current capacity, causing it to reallocate. Either scenario would invalidate the slices held by req and cause a crash.
Comments on reddit clarify that the parser stores no parsing state. You're expected to use a large enough buffer that partial parses are rare, and in case they happen, you just start from scratch, presumably by creating a new Request and giving it the whole buffer (the latter of which you were already doing). For example:
fn parse(mut stream: impl Read) {
let mut request_buf: Vec<u8> = vec![];
let mut buf = [0 as u8; 16384]; // 16k read buffer
let (req, headers) = loop {
match stream.read(&mut buf) {
Ok(size) => request_buf.extend(&buf[0..size]),
Err(_) => {} // handle err,
}
let mut headers = [httparse::EMPTY_HEADER; 16];
let mut req = Request::new(&mut headers);
if req.parse(&request_buf).unwrap().is_complete() {
break (req, headers);
}
};
}

Related

Lifetime struggles with "borrowed value does not live long enough" for lazy_static value

Rust newbie here that has been struggling for a full day on how to get the compiler to recognize that the lifetime of a lazy_static struct instance is 'static. A minimal example of what I am trying to do is the following:
use redis::{Client, Connection, PubSub};
use std::sync::Mutex;
#[macro_use]
extern crate lazy_static;
lazy_static! {
static ref REDIS_CLIENT: Mutex<Client> =
Mutex::new(Client::open("redis://127.0.0.1/").unwrap());
static ref RECEIVER_CONNECTIONS: Mutex<Vec<Connection>> = Mutex::new(vec![]);
static ref RECEIVERS: Mutex<Vec<PubSub<'static>>> = Mutex::new(vec![]);
}
pub fn create_receiver() -> u64 {
let client_instance = match REDIS_CLIENT.lock() {
Ok(i) => i,
Err(_) => return 0,
};
let connection: Connection = match client_instance.get_connection() {
Ok(conn) => conn,
Err(_) => return 0,
};
let mut receiver_connections_instance = match RECEIVER_CONNECTIONS.lock() {
Ok(i) => i,
Err(_) => return 0,
};
let receiver_connection_index = receiver_connections_instance.len();
receiver_connections_instance.push(connection);
let receiver_connection = &mut receiver_connections_instance[receiver_connection_index];
let receiver = receiver_connection.as_pubsub();
let mut receivers_instance = match RECEIVERS.lock() {
Ok(i) => i,
Err(_) => return 0,
};
receivers_instance.push(receiver);
let receiver_handle = receivers_instance.len();
receiver_handle.try_into().unwrap()
}
But I am getting the following error:
error[E0597]: `receiver_connections_instance` does not live long enough
--> src/lib.rs:33:36
|
33 | let receiver_connection = &mut receiver_connections_instance[receiver_connection_index];
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ borrowed value does not live long enough
34 | let receiver = receiver_connection.as_pubsub();
| ------------------------------- argument requires that `receiver_connections_instance` is borrowed for `'static`
...
45 | }
| - `receiver_connections_instance` dropped here while still borrowed
I don't understand this because RECEIVER_CONNECTIONS is a lazy_static variable and I don't think my code uses the receiver_connections_instance past the end of the function.
Many thanks and infinite karma to whoever can help me understand what I'm doing wrong here. :)

The problem is that your Connection isn't 'static at the time you invoke as_pubsub(), you access it through a mutex guard with a limited lifetime. As soon as you drop the guard, the connection is no longer exclusively yours, and neither is the PubSub - which is why PubSub<'static> is not allowed. The redis Rust API doesn't seem to allow exactly what you're after (at least without unsafe), because Connection::as_pubsub() requires &mut self, prohibiting you from invoking as_pubsub() directly on a globally stored Connection.
But since your connections are global and never removed anyway, you could simply not store the connection, but "leak" it instead and only store the PubSub. Here leak is meant in a technical sense of creating a value that is allocated and then never dropped, much like a global variable, not to an uncontrolled memory leak that would indicate a bug. Leaking the connection gives you &'static mut Connection which you can use to create a PubSub<'static>, which you can store in a global variable. For example, this compiles:
lazy_static! {
static ref REDIS_CLIENT: Client = Client::open("redis://127.0.0.1/").unwrap();
static ref RECEIVERS: Mutex<Vec<PubSub<'static>>> = Default::default();
}
pub fn create_receiver() -> RedisResult<usize> {
let connection = REDIS_CLIENT.get_connection()?;
let connection = Box::leak(Box::new(connection)); // make it immortal
let mut receivers = RECEIVERS.lock().unwrap();
receivers.push(connection.as_pubsub());
Ok(receivers.len() - 1)
}
Several tangential notes:
redis Client doesn't need to be wrapped in Mutex because get_connection() takes &self.
you don't need to pattern-match every mutex lock - locking can fail only if a thread that held the lock panicked. In that case you most likely want to just propagate the panic, so an unwrap() is appropriate.
using 0 as a special value is not idiomatic Rust, you can use Option<u64> or Result<u64> to signal that a value could not be returned. That allows the function to use the ? operator.
The code above has these improvements applied, resulting in a significantly reduced line count.

The TLDR is that the relevant reference is not 'static because it's tied to the lifetime of the mutex guard. I'll explain the issue by walking through the relevant parts of the code.
You start by locking the RECEIVER_CONNECTIONS mutex, storing the guard in receiver_connections_instance:
let mut receiver_connections_instance = match RECEIVER_CONNECTIONS.lock() {
Ok(i) => i,
Err(_) => return 0,
};
Then you get a mutable reference to data inside the guard, and store it in receiver_connection:
let receiver_connection = &mut receiver_connections_instance[receiver_connection_index];
You then call the as_pubsub() method on receiver_connection and store the result in receiver:
let receiver = receiver_connection.as_pubsub();
The signature of that as_pubsub() method is the following:
fn as_pubsub(&mut self) -> PubSub<'_>
which if we un-elide the lifetimes can be written as
fn as_pubsub<'a>(&'a mut self) -> PubSub<'a>
We can see from the lifetimes that the return type PubSub captures the input lifetime. (This is because PubSub stores the mutable reference inside itself). So all of this means the lifetime of receiver is bound to the lifetime of the mutex guard. The code that follows then tries to store receiver in the static RECEIVERS variable, but that cannot work because receiver cannot outlive the mutex guard receiver_connections_instance, which is dropped at the end of the function.

Lifetimes in lambda-based iterators

My questions seems to be closely related to Rust error "cannot infer an appropriate lifetime for borrow expression" when attempting to mutate state inside a closure returning an Iterator, but I think it's not the same. So, this
use std::iter;
fn example(text: String) -> impl Iterator<Item = Option<String>> {
let mut i = 0;
let mut chunk = None;
iter::from_fn(move || {
if i <= text.len() {
let p_chunk = chunk;
chunk = Some(&text[..i]);
i += 1;
Some(p_chunk.map(|s| String::from(s)))
} else {
None
}
})
}
fn main() {}
does not compile. The compiler says it cannot determine the appropriate lifetime for &text[..i]. This is the smallest example I could come up with. The idea being, there is an internal state, which is a slice of text, and the iterator returns new Strings allocated from that internal state. I'm new to Rust, so maybe it's all obvious, but how would I annotate lifetimes so that this compiles?
Note that this example is different from the linked example, because there point was passed as a reference, while here text is moved. Also, the answer there is one and half years old by now, so maybe there is an easier way.
EDIT: Added p_chunk to emphasize that chunk needs to be persistent across calls to next and so cannot be local to the closure but should be captured by it.

Your code is an example of attempting to create a self-referential struct, where the struct is implicitly created by the closure. Since both text and chunk are moved into the closure, you can think of both as members of a struct. As chunk refers to the contents in text, the result is a self-referential struct, which is not supported by the current borrow checker.
While self-referential structs are unsafe in general due to moves, in this case it would be safe because text is heap-allocated and is not subsequently mutated, nor does it escape the closure. Therefore it is impossible for the contents of text to move, and a sufficiently smart borrow checker could prove that what you're trying to do is safe and allow the closure to compile.
The answer to the [linked question] says that referencing through an Option is possible but the structure cannot be moved afterwards. In my case, the self-reference is created after text and chunk were moved in place, and they are never moved again, so in principle it should work.
Agreed - it should work in principle, but it is well known that the current borrow checker doesn't support it. The support would require multiple new features: the borrow checker should special-case heap-allocated types like Box or String whose moves don't affect references into their content, and in this case also prove that you don't resize or mem::replace() the closed-over String.
In this case the best workaround is the "obvious" one: instead of persisting the chunk slice, persist a pair of usize indices (or a Range) and create the slice when you need it.

If you move the chunk Option into the closure, your code compiles. I can't quite answer why declaring chunk outside the closure results in a lifetime error for the borrow of text inside the closure, but the chunk Option looks superfluous anyways and the following code should be equivalent:
fn example(text: String) -> impl Iterator<Item = Option<String>> {
let mut i = 0;
iter::from_fn(move || {
if i <= text.len() {
let chunk = text[..i].to_string();
i += 1;
Some(Some(chunk))
} else {
None
}
})
}
Additionally, it seems unlikely that you really want an Iterator<Item = Option<String>> here instead of an Iterator<Item<String>>, since the iterator never yields Some(None) anyways.
fn example(text: String) -> impl Iterator<Item = String> {
let mut i = 0;
iter::from_fn(move || {
if i <= text.len() {
let chunk = text[..i].to_string();
i += 1;
Some(chunk)
} else {
None
}
})
}
Note, you can also go about this iterator without allocating a String for each chunk, if you take a &str as an argument and tie the lifetime of the output to the input argument:
fn example<'a>(text: &'a str) -> impl Iterator<Item = &'a str> + 'a {
let mut i = 0;
iter::from_fn(move || {
if i <= text.len() {
let chunk = &text[..i];
i += 1;
Some(chunk)
} else {
None
}
})
}

How to return the captured variable from `FnMut` closure, which is a captor at the same time

I have a function, collect_n, that returns a Future that repeatedly polls a futures::sync::mpsc::Receiver and collects the results into a vector, and resolves with that vector. The problem is that it consumes the Receiver, so that Receiver can't be used again. I'm trying to write a version that takes ownership of the Receiver but then returns ownership back to the caller when the returned Future resolves.
Here's what I wrote:
/// Like collect_n but returns the mpsc::Receiver it consumes so it can be reused.
pub fn collect_n_reusable<T>(
mut rx: mpsc::Receiver<T>,
n: usize,
) -> impl Future<Item = (mpsc::Receiver<T>, Vec<T>), Error = ()> {
let mut events = Vec::new();
future::poll_fn(move || {
while events.len() < n {
let e = try_ready!(rx.poll()).unwrap();
events.push(e);
}
let ret = mem::replace(&mut events, Vec::new());
Ok(Async::Ready((rx, ret)))
})
}
This results in a compile error:
error[E0507]: cannot move out of `rx`, a captured variable in an `FnMut` closure
--> src/test_util.rs:200:26
|
189 | mut rx: mpsc::Receiver<T>,
| ------ captured outer variable
...
200 | Ok(Async::Ready((rx, ret)))
| ^^ move occurs because `rx` has type `futures::sync::mpsc::Receiver<T>`, which does not implement the `Copy` trait
How can I accomplish what I'm trying to do?

It is not possible unless your variable is shareable like Rc or Arc, since FnMut can be called multiple times it is possible that your closure needs to return the captured variable more than one. But after returning you lose the ownership of the variable so you cannot return it back, due to safety Rust doesn't let you do this.
According to your logic we know that once your Future is ready it will not need to be polled again, so we can create a solution without using smart pointer. Lets consider a container object like below, I've used Option:
use futures::sync::mpsc;
use futures::{Future, Async, try_ready};
use futures::stream::Stream;
use core::mem;
pub fn collect_n_reusable<T>(
mut rx: mpsc::Receiver<T>,
n: usize,
) -> impl Future<Item = (mpsc::Receiver<T>, Vec<T>), Error = ()> {
let mut events = Vec::new();
let mut rx = Some(rx); //wrapped with an Option
futures::future::poll_fn(move || {
while events.len() < n {
let e = try_ready!(rx.as_mut().unwrap().poll()).unwrap();//used over an Option
events.push(e);
}
let ret = mem::replace(&mut events, Vec::new());
//We took it from option and returned.
//Careful this will panic if the return line got execute more than once.
Ok(Async::Ready((rx.take().unwrap(), ret)))
})
.fuse()
}
Basically we captured the container not the receiver. So this made compiler happy. I used fuse() to make sure the closure will not be called again after returning Async::Ready, otherwise it would panic for further polls.
The other solution would be using std::mem::swap :
futures::future::poll_fn(move || {
while events.len() < n {
let e = try_ready!(rx.poll()).unwrap();
events.push(e);
}
let mut place_holder_receiver = futures::sync::mpsc::channel(0).1; //creating object with same type to swap with actual one.
let ret = mem::replace(&mut events, Vec::new());
mem::swap(&mut place_holder_receiver, &mut rx); //Swapping the actual receiver with the placeholder
Ok(Async::Ready((place_holder_receiver, ret))) //so we can return placeholder in here
})
.fuse()
Simply I've swapped actual receiver with the place_holder_receiver. Since placeholder is created in FnMut(Not captured) we can return it as many time as we want, so compiler is happy again. Thanks to fuse() this closure will not be called after successful return, otherwise it would return the Receivers with a dropped Sender.
See also:
https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.fuse

Maintain a HashMap of TcpStreams in a loop

I'm writing an application that uses a Distributed Hashtable (DHT) to distribute data to various nodes. When inserting data, I have to loop through it all and write various parts to different nodes. Instead of opening a new TcpStream object for each write, I would like to maintain a map of streams that I can use to write the data as needed. I'm very new to the Rust language and I'm having issues with lifetimes, specifically the errors
cannot borrow 'streams' as mutable because it is already borrowed as mutable
'stream' does not live long enough.
I'm sure there is a fancy Rust way of doing this. The code I'm working with is below.
let mut streams = HashMap::new();
...
//get socket address to send data too
loop {
match streams.get(&socket_addr) {
Some(stream) => {
capnp::serialize::write_message(*stream, &msg_builder).unwrap();
},
None => {
let mut stream = TcpStream::connect(socket_addr).unwrap();
streams.insert(socket_addr, &mut stream);
capnp::serialize::write_message(&mut stream, &msg_builder).unwrap();
}
}
}

You cannot insert a reference to the stream in the HashMap, since the stream is a local variable that goes out of scope at the end of the match expression. The HashMap must own the stream.
The easiest way to implement this is using the entry() method on HashMap to open the stream at first use.
fn main() {
let socket_addr = /* ... */;
let mut streams = HashMap::new();
let msg_builder = /* ... */;
loop {
let stream = streams.entry(&socket_addr).or_insert_with(|| {
TcpStream::connect(socket_addr).unwrap()
});
capnp::serialize::write_message(stream, &msg_builder).unwrap();
}
}

Cannot move data out of a Mutex

Consider the following code example, I have a vector of JoinHandlers in which I need it iterate over to join back to the main thread, however, upon doing so I am getting the error error: cannot move out of borrowed content.
let threads = Arc::new(Mutex::new(Vec::new()));
for _x in 0..100 {
let handle = thread::spawn(move || {
//do some work
}
threads.lock().unwrap().push((handle));
}
for t in threads.lock().unwrap().iter() {
t.join();
}

Unfortunately, you can't do this directly. When Mutex consumes the data structure you fed to it, you can't get it back by value again. You can only get &mut reference to it, which won't allow moving out of it. So even into_iter() won't work - it needs self argument which it can't get from MutexGuard.
There is a workaround, however. You can use Arc<Mutex<Option<Vec<_>>>> instead of Arc<Mutex<Vec<_>>> and then just take() the value out of the mutex:
for t in threads.lock().unwrap().take().unwrap().into_iter() {
}
Then into_iter() will work just fine as the value is moved into the calling thread.
Of course, you will need to construct the vector and push to it appropriately:
let threads = Arc::new(Mutex::new(Some(Vec::new())));
...
threads.lock().unwrap().as_mut().unwrap().push(handle);
However, the best way is to just drop the Arc<Mutex<..>> layer altogether (of course, if this value is not used from other threads).

As referenced in How to take ownership of T from Arc<Mutex<T>>? this is now possible to do without any trickery in Rust using Arc::try_unwrap and Mutex.into_inner()
let threads = Arc::new(Mutex::new(Vec::new()));
for _x in 0..100 {
let handle = thread::spawn(move || {
println!("{}", _x);
});
threads.lock().unwrap().push(handle);
}
let threads_unwrapped: Vec<JoinHandle<_>> = Arc::try_unwrap(threads).unwrap().into_inner().unwrap();
for t in threads_unwrapped.into_iter() {
t.join().unwrap();
}
Play around with it in this playground to verify.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9d5635e7f778bc744d1fb855b92db178

while the drain is a good solution, you can also do the following thing
// with a copy
let built_words: Arc<Mutex<Vec<String>>> = Arc::new(Mutex::new(vec![]));
let result: Vec<String> = built_words.lock().unwrap().clone();
// using drain
let mut locked_result = built_words.lock().unwrap();
let mut result: Vec<String> = vec![];
result.extend(locked_result.drain(..));
I would prefer to clone the data to get the original value. Not sure if it has any performance overhead.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string