I'm writing an application that uses a distributed hash table (DHT) to distribute data to various nodes. When inserting data, I have to loop through it all and write various parts to different nodes. Instead of opening a new TcpStream object for each write, I would like to maintain a map of streams that I can use to write the data as needed. I'm very new to the Rust language and I'm having issues with lifetimes, specifically the errors
cannot borrow 'streams' as mutable because it is already borrowed as mutable
'stream' does not live long enough.
I'm sure there is a fancy Rust way of doing this. The code I'm working with is below.
let mut streams = HashMap::new();
...
// get socket address to send data to
loop {
    match streams.get(&socket_addr) {
        Some(stream) => {
            capnp::serialize::write_message(*stream, &msg_builder).unwrap();
        },
        None => {
            let mut stream = TcpStream::connect(socket_addr).unwrap();
            streams.insert(socket_addr, &mut stream);
            capnp::serialize::write_message(&mut stream, &msg_builder).unwrap();
        }
    }
}
You cannot insert a reference to the stream in the HashMap, since the stream is a local variable that goes out of scope at the end of the match expression. The HashMap must own the stream.
The easiest way to implement this is to use the entry() method on HashMap to open the stream on first use.
use std::collections::HashMap;
use std::net::TcpStream;

fn main() {
    let socket_addr = /* ... */;
    let mut streams = HashMap::new();
    let msg_builder = /* ... */;
    loop {
        // entry() takes the key by value; SocketAddr is Copy, so this is cheap
        let stream = streams.entry(socket_addr).or_insert_with(|| {
            TcpStream::connect(socket_addr).unwrap()
        });
        capnp::serialize::write_message(stream, &msg_builder).unwrap();
    }
}
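For a self-contained illustration of the same entry() pattern, here is a sketch that replaces the capnp call with a plain write_all and uses a hard-coded address purely as a placeholder:

use std::collections::HashMap;
use std::io::Write;
use std::net::{SocketAddr, TcpStream};

fn main() -> std::io::Result<()> {
    let mut streams: HashMap<SocketAddr, TcpStream> = HashMap::new();
    // placeholder targets; in the DHT these would come from the routing logic
    let targets: Vec<SocketAddr> = vec!["127.0.0.1:4000".parse().unwrap()];
    for socket_addr in targets {
        // connect on first use, reuse the cached stream afterwards
        let stream = streams
            .entry(socket_addr)
            .or_insert_with(|| TcpStream::connect(socket_addr).unwrap());
        stream.write_all(b"payload")?;
    }
    Ok(())
}

Because the map owns each TcpStream, nothing here goes out of scope at the end of a match arm, and the double mutable borrow from the original match disappears.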
Hi, I am new to Rust and I am learning async (the tokio crate) and ownership. To practice, I am developing a chat server for which I want to have some basic log-in options. For that I use a HashMap with the user as key and the password as value. The code goes like this:
#[tokio::main]
async fn main() {
    let mut users_map: HashMap<String, String> = HashMap::new();
    let listener = TcpListener::bind("localhost:8881").await.unwrap();
    // stuff
    loop {
        let (mut socket, addr) = listener.accept().await.unwrap();
        // more stuff
        tokio::spawn(async move {
            if users_map.contains_key(&user) == true { // Here is the problem
                // more stuff
                users_map.insert(user, password);
            }
        });
    }
}
So according to what I read in the Rust book, when I use that if, the HashMap moves into the spawned task's closure, so I can't use it outside that scope. But then how can I do operations on the HashMap defined for the whole scope? I tried cloning, but if I clone it I would create a separate HashMap for each connection acceptance, which is really bad, because I want a single HashMap shared by all the tasks.
Thanks
You can use Arc together with DashMap (from the dashmap crate), e.g.:
use dashmap::DashMap;
use std::sync::Arc;

let users_map = Arc::new(DashMap::<String, String>::new());
...
loop {
    let map_clone = users_map.clone();
    tokio::spawn(async move {
        if map_clone.contains_key(&user) { // no problem now: DashMap allows insertion through a shared reference
            // more stuff
            map_clone.insert(user, password);
        }
    });
    ...
}
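If you would rather keep the standard HashMap, the same sharing works with Arc plus tokio's async Mutex. A minimal sketch, with placeholder user/password values standing in for whatever your "//more stuff" produces:

use std::collections::HashMap;
use std::sync::Arc;
use tokio::net::TcpListener;
use tokio::sync::Mutex;

#[tokio::main]
async fn main() {
    let users_map = Arc::new(Mutex::new(HashMap::<String, String>::new()));
    let listener = TcpListener::bind("localhost:8881").await.unwrap();
    loop {
        let (_socket, _addr) = listener.accept().await.unwrap();
        // placeholder credentials; in the real server they would be read from the socket
        let (user, password) = (String::from("alice"), String::from("secret"));
        let map_clone = Arc::clone(&users_map);
        tokio::spawn(async move {
            // the lock is awaited, so waiting tasks don't block the executor thread
            let mut map = map_clone.lock().await;
            if !map.contains_key(&user) {
                map.insert(user, password);
            }
        });
    }
}

Each task only clones the Arc (a cheap reference-count bump), not the HashMap itself.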
Each of the following methods needs &mut self to operate. The following code gives the error:
cannot borrow *self as mutable more than once at a time
How can I achieve this correctly?
loop {
    let future1 = self.handle_new_connections(sender_to_connector.clone());
    let future2 = self.handle_incoming_message(&mut receiver_from_peers);
    let future3 = self.handle_outgoing_message();
    tokio::pin!(future1, future2, future3);
    tokio::select! {
        _ = future1 => {},
        _ = future2 => {},
        _ = future3 => {}
    }
}
You are not allowed to have multiple mutable references to an object, and there's a good reason for that.
Imagine you passed an object mutably to 2 different functions and they edited it out of sync, with no mechanism in place to coordinate them; you'd end up with a race condition.
To prevent this bug, Rust allows only one mutable reference to an object at a time, but you can have multiple immutable references, and you often see people use interior mutability patterns.
In your case, you don't want the data to be modified by 2 different threads at the same time, so you'd wrap it in a Mutex or RwLock; then, since you want multiple threads to be able to own this value, you'd wrap that in an Arc (a sketch of this is shown below).
Here you can read about interior mutability in more detail.
Alternatively, when declaring your functions you could add proper lifetimes to indicate that the resulting Future will be awaited in the same context; since your code awaits the futures before the next iteration, that would do the trick as well.
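As a minimal sketch of that Arc<RwLock<...>> route (the Node struct, its peers field, and the method bodies below are invented purely for illustration): the methods take &self instead of &mut self, so the futures in select! no longer conflict.

use std::sync::Arc;
use tokio::sync::RwLock;

struct Node {
    // shared, mutably-accessed state lives behind an async RwLock
    peers: Arc<RwLock<Vec<String>>>,
}

impl Node {
    async fn handle_new_connections(&self) {
        // exclusive access only while the write guard is held
        self.peers.write().await.push("new-peer".to_string());
    }

    async fn handle_outgoing_message(&self) {
        // concurrent readers are fine
        let peers = self.peers.read().await;
        println!("known peers: {}", peers.len());
    }
}

#[tokio::main]
async fn main() {
    let node = Node { peers: Arc::new(RwLock::new(Vec::new())) };
    // both futures borrow `node` immutably, so this compiles
    tokio::select! {
        _ = node.handle_new_connections() => {},
        _ = node.handle_outgoing_message() => {},
    }
}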
I encountered the same problem when dealing with async code. Here is what I figured out:
Let's say you have an Engine that contains both incoming and outgoing:
struct Engine {
    log: Arc<Mutex<Vec<String>>>,
    outgoing: UnboundedSender<String>,
    incoming: UnboundedReceiver<String>,
}
Our goal is to create two functions, process_incoming and process_logic, and then poll them simultaneously without running afoul of the borrow checker.
What is important here is that:
You cannot pass &mut self to both of these async functions simultaneously.
Each of incoming and outgoing is held by at most one function.
The data accessed by both process_incoming and process_logic needs to be wrapped in a lock.
Any attempt to lock the whole Engine directly would lead to a deadlock at runtime.
So that leaves us giving up on methods in favor of associated functions:
impl Engine {
    // ...
    async fn process_logic(outgoing: &mut UnboundedSender<String>, log: Arc<Mutex<Vec<String>>>) {
        loop {
            Delay::new(Duration::from_millis(1000)).await.unwrap();
            let msg: String = "ping".into();
            println!("outgoing: {}", msg);
            // `lock()` returning the guard directly assumes a parking_lot-style Mutex;
            // with std::sync::Mutex you would call `.lock().unwrap()` instead
            log.lock().push(msg.clone());
            outgoing.send(msg).await.unwrap();
        }
    }

    async fn process_incoming(
        incoming: &mut UnboundedReceiver<String>,
        log: Arc<Mutex<Vec<String>>>,
    ) {
        while let Some(msg) = incoming.next().await {
            println!("incoming: {}", msg);
            log.lock().push(msg);
        }
    }
}
And we can then write main as:
fn main() {
    futures::executor::block_on(async {
        let mut engine = Engine::new();
        let a = Engine::process_incoming(&mut engine.incoming, engine.log.clone()).fuse();
        let b = Engine::process_logic(&mut engine.outgoing, engine.log).fuse();
        futures::pin_mut!(a, b);
        select! {
            _ = a => {},
            _ = b => {},
        }
    });
}
I put the whole example here.
It's a workable solution; just be aware that you need to add futures and futures-timer to your dependencies.
I have a piece of code which uses the following pattern:
let mut request_buf: Vec<u8> = vec![];
let mut buf = [0 as u8; 50]; // 50 byte read buffer
while !req.parse(&request_buf).unwrap().is_complete() {
    match stream.read(&mut buf) {
        Ok(size) => {
            request_buf.extend(&buf[0..size]);
        },
        Err(_) => {
            // Handle err...
        }
    }
}
Basically the code is supposed to read data from a socket into the temporary buffer buf and accumulate it until is_complete() returns true.
I'm running into problems with the borrow checker, but I can't think of an alternative (cannot borrow request_buf as immutable because it is also borrowed as mutable).
Even if I try to clone request_buf to get around this, which is horribly inefficient, I get a "temporary value dropped while borrowed" error no matter how widely I scope the variable for the cloned buffer.
Is this kind of pattern simply impossible to implement in Rust? I feel I've tried everything I can think of and Rust is playing cat and mouse with me at every turn.
The problem is that the Request comes with a lifetime 'b that the buffer must outlive. Thus calling parse() associates the Request object with the buffer, which means it's now req that holds a (shared) borrow of the request_buf. This removes the possibility of getting a mutable borrow of request_buf as long as req lives.
This is not just a whim of the borrow checker: that signature of parse() means that req intends (typically for efficiency) to hold on to slices pointing inside request_buf. A mutable borrow of request_buf would allow you to shrink the container or to grow it past its current capacity, causing it to reallocate. Either scenario would invalidate the slices held by req and cause a crash.
Comments on reddit clarify that the parser stores no parsing state. You're expected to use a large enough buffer that partial parses are rare, and in case they happen, you just start from scratch, presumably by creating a new Request and giving it the whole buffer (the latter of which you were already doing). For example:
use httparse::Request;
use std::io::Read;

fn parse(mut stream: impl Read) {
    let mut request_buf: Vec<u8> = vec![];
    let mut buf = [0u8; 16384]; // 16k read buffer
    loop {
        match stream.read(&mut buf) {
            Ok(size) => request_buf.extend(&buf[0..size]),
            Err(_) => {} // handle err
        }
        // start from scratch on every attempt: fresh headers and a fresh
        // Request borrowing the whole accumulated buffer
        let mut headers = [httparse::EMPTY_HEADER; 16];
        let mut req = Request::new(&mut headers);
        if req.parse(&request_buf).unwrap().is_complete() {
            // use `req` here, while its borrow of `request_buf` is still alive
            break;
        }
    }
}
I'm trying to perform a parallel operation on several chunks of strings at a time, and I'm running into an issue with the borrow checker:
(For context, identifiers is a Vec<String> from a CSV file, client is reqwest, and target is an Arc<String> that is written once and read many times.)
use futures::{stream, StreamExt};
use std::sync::Arc;

async fn nop(
    person_ids: &[String],
    target: &str,
    url: &str,
) -> String {
    let noop = format!("{} {}", target, url);
    let noop2 = person_ids.iter().for_each(|f| {f.as_str();});
    "Some text".into()
}

#[tokio::main]
async fn main() {
    let target = Arc::new(String::from("sometext"));
    let url = "http://example.com";
    let identifiers = vec!["foo".into(), "bar".into(), "baz".into(), "qux".into(), "quux".into(), "quuz".into(), "corge".into(), "grault".into(), "garply".into(), "waldo".into(), "fred".into(), "plugh".into(), "xyzzy".into()];
    let id_sets: Vec<&[String]> = identifiers.chunks(2).collect();
    let responses = stream::iter(id_sets)
        .map(|person_ids| {
            let target = target.clone();
            tokio::spawn(async move {
                let resptext = nop(person_ids, target.as_str(), url).await;
            })
        })
        .buffer_unordered(2);
    responses
        .for_each(|b| async { })
        .await;
}
Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e41c635e99e422fec8fc8a581c28c35e
Given that chunks() yields slices, the collect produces a Vec<&[String]>, and the compiler complains that identifiers doesn't live long enough because it could go out of scope while the slices are still referenced. Realistically this won't happen because there's an await. Is there a way to tell the compiler that this is safe, or is there another way of getting the chunks as a set of owned Strings for each thread?
There was a similar question that used into_owned() as a solution, but when I try that, rustc complains about the slice size not being known at compile time in the request_user function.
EDIT: Some other questions as well:
Is there a more direct way of using target in each thread without needing Arc? From the moment it is created, it never needs to be modified, just read from. If not, is there a way of pulling it out of the Arc that doesn't require the .as_str() method?
How do you handle multiple error types within the tokio::spawn() block? In real-world use, I'm going to receive quick_xml::Error and reqwest::Error within it. It works fine without tokio::spawn for concurrency.
Is there a way to tell the compiler that this is safe, or is there another way of getting chunks as a set of owned Strings for each thread?
You can chunk a Vec<T> into a Vec<Vec<T>> without cloning by using the itertools crate:
use itertools::Itertools;
fn main() {
    let items = vec![
        String::from("foo"),
        String::from("bar"),
        String::from("baz"),
    ];

    let chunked_items: Vec<Vec<String>> = items
        .into_iter()
        .chunks(2)
        .into_iter()
        .map(|chunk| chunk.collect())
        .collect();

    for chunk in chunked_items {
        println!("{:?}", chunk);
    }
}
["foo", "bar"]
["baz"]
This is based on the answers here.
Your issue here is that id_sets is a vector of slices borrowing from identifiers, and those borrows are not guaranteed to still be alive by the time the spawned task runs (which is what the async move block inside tokio::spawn has to be prepared for).
Your solution to the immediate problem is to convert the Vec<&[String]> to a Vec<Vec<String>> type.
A way of accomplishing that would be:
let id_sets: Vec<Vec<String>> = identifiers
    .chunks(2)
    .map(|x: &[String]| x.to_vec())
    .collect();
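Putting that together with the original main, each spawned task then owns its chunk outright, so nothing borrows from identifiers any more. A rough sketch along those lines (nop is reduced to a stub here; the chunking uses the to_vec() approach above):

use futures::{stream, StreamExt};
use std::sync::Arc;

// stub standing in for the real request logic
async fn nop(person_ids: &[String], target: &str, url: &str) -> String {
    format!("{} ids, {} {}", person_ids.len(), target, url)
}

#[tokio::main]
async fn main() {
    let target = Arc::new(String::from("sometext"));
    let url = "http://example.com";
    let identifiers: Vec<String> = vec!["foo".into(), "bar".into(), "baz".into()];
    // owned chunks: each Vec<String> can be moved into its own task
    let id_sets: Vec<Vec<String>> = identifiers.chunks(2).map(|x| x.to_vec()).collect();
    stream::iter(id_sets)
        .map(|person_ids| {
            let target = Arc::clone(&target);
            tokio::spawn(async move {
                let _resptext = nop(&person_ids, target.as_str(), url).await;
            })
        })
        .buffer_unordered(2)
        .for_each(|_| async {})
        .await;
}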
Consider the following code example. I have a vector of JoinHandles which I need to iterate over to join back to the main thread; however, upon doing so I am getting the error cannot move out of borrowed content.
let threads = Arc::new(Mutex::new(Vec::new()));

for _x in 0..100 {
    let handle = thread::spawn(move || {
        // do some work
    });
    threads.lock().unwrap().push(handle);
}

for t in threads.lock().unwrap().iter() {
    t.join();
}
Unfortunately, you can't do this directly. Once the Mutex consumes the data structure you fed to it, you can't get it back by value again; you can only get a &mut reference to it, which won't allow moving out of it. So even into_iter() won't work: it needs a self argument by value, which it can't get through a MutexGuard.
There is a workaround, however. You can use Arc<Mutex<Option<Vec<_>>>> instead of Arc<Mutex<Vec<_>>> and then just take() the value out of the mutex:
for t in threads.lock().unwrap().take().unwrap().into_iter() {
    t.join();
}
Then into_iter() will work just fine as the value is moved into the calling thread.
Of course, you will need to construct the vector and push to it appropriately:
let threads = Arc::new(Mutex::new(Some(Vec::new())));
...
threads.lock().unwrap().as_mut().unwrap().push(handle);
However, the best way is to just drop the Arc<Mutex<..>> layer altogether (of course, if this value is not used from other threads).
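A minimal sketch of that last suggestion, with the handles kept in a plain Vec owned by the spawning thread, so no Arc or Mutex is needed at all:

use std::thread;

fn main() {
    let mut threads = Vec::new();
    for x in 0..100 {
        threads.push(thread::spawn(move || {
            // do some work
            println!("{}", x);
        }));
    }
    // iterating over the Vec by value hands each JoinHandle to join() by value
    for t in threads {
        t.join().unwrap();
    }
}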
As referenced in How to take ownership of T from Arc<Mutex<T>>?, this is now possible to do without any trickery in Rust, using Arc::try_unwrap and Mutex::into_inner():
let threads = Arc::new(Mutex::new(Vec::new()));

for _x in 0..100 {
    let handle = thread::spawn(move || {
        println!("{}", _x);
    });
    threads.lock().unwrap().push(handle);
}

let threads_unwrapped: Vec<JoinHandle<_>> = Arc::try_unwrap(threads).unwrap().into_inner().unwrap();
for t in threads_unwrapped.into_iter() {
    t.join().unwrap();
}
Play around with it in this playground to verify.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9d5635e7f778bc744d1fb855b92db178
While drain is a good solution, you can also do the following:
// with a copy
let built_words: Arc<Mutex<Vec<String>>> = Arc::new(Mutex::new(vec![]));
let result: Vec<String> = built_words.lock().unwrap().clone();
// using drain
let mut locked_result = built_words.lock().unwrap();
let mut result: Vec<String> = vec![];
result.extend(locked_result.drain(..));
I would prefer to clone the data to get the original value, though note that cloning does copy every element, so it has some overhead for large vectors.