Implementing "move" thread semantics - multithreading

I want to write a function to be called like this:
send("message","address");
Where some other thread that is doing
let k = recv("address");
println!("{}",k);
sees message.
In particular, the message may be large, and so I'd like "move" or "zero-copy" semantics for sending the message.
In C, the solution is something like:
1. Allocate messages on the heap
2. Have a global, threadsafe hashmap that maps "address" to some memory location
3. Write pointers into the memory location on send, and wake up the receiver using a semaphore
4. Read pointers out of the memory location on receive, and wait on a semaphore to process new messages
But according to another SO question, step #2 "sounds like a bad idea". So I'd like to see a more Rust-idiomatic way to approach this problem.

You get this sort of move semantics automatically, and can achieve light-weight moves by placing large values into a Box (i.e. allocating them on the heap). Using type ConcurrentHashMap<K, V> = Mutex<HashMap<K, V>>; as the threadsafe hashmap (there are various ways this could be improved), one might have:
use lazy_static::lazy_static; // external crate: lazy_static
// VecDeque is the current name of the type formerly called RingBuf
use std::collections::{HashMap, VecDeque};
use std::sync::Mutex;
type ConcurrentHashMap<K, V> = Mutex<HashMap<K, V>>;
lazy_static! {
pub static ref MAP: ConcurrentHashMap<String, VecDeque<String>> = Mutex::new(HashMap::new());
}
fn send(message: String, address: String) {
MAP.lock()
// lock() fails only if another thread panicked while holding the lock
.unwrap()
// find the place this message goes,
// creating a new VecDeque if this address was empty
.entry(address)
.or_default()
// add the message on the back
.push_back(message);
}
fn recv(address: &str) -> Option<String> {
MAP.lock()
.unwrap()
.get_mut(address)
// pull the message off the front
.and_then(|buf| buf.pop_front())
}
That code uses the lazy_static! macro to achieve a global hashmap (it may be better to use a local object that wraps an Arc<ConcurrentHashMap<...>>, fwiw, since global state can make reasoning about program behaviour hard). It also uses a double-ended queue (RingBuf, which the standard library has since renamed to VecDeque) so that messages bank up for a given address. If you only wish to support one message at a time, the type could be ConcurrentHashMap<String, String>, send could become MAP.lock().unwrap().insert(address, message) and recv just MAP.lock().unwrap().remove(address).
(NB. I haven't compiled this, so the types may not match up precisely.)
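The answer suggests that a local object wrapping an Arc<ConcurrentHashMap<...>> may be preferable to a global. Here is a minimal sketch of that idea, using only the standard library. The Mailboxes type and the round_trip helper are hypothetical names chosen for illustration, and VecDeque is the current name of RingBuf. The helper also records the address of the String's heap buffer before sending and after receiving, to illustrate that only the small (pointer, length, capacity) handle moves between threads.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical handle type; cloning it only bumps the Arc's reference count.
#[derive(Clone, Default)]
struct Mailboxes {
    inner: Arc<Mutex<HashMap<String, VecDeque<String>>>>,
}

impl Mailboxes {
    /// Queue `message` under `address`, taking ownership of it.
    fn send(&self, message: String, address: &str) {
        self.inner
            .lock()
            .unwrap()
            .entry(address.to_string())
            .or_default()
            .push_back(message);
    }

    /// Pop the oldest message for `address`, if any (non-blocking).
    fn recv(&self, address: &str) -> Option<String> {
        self.inner
            .lock()
            .unwrap()
            .get_mut(address)
            .and_then(|queue| queue.pop_front())
    }
}

/// Sends `message` from a second thread, receives it on the calling thread,
/// and reports the heap buffer's address before sending and after receiving.
fn round_trip(message: String) -> (usize, usize, String) {
    let boxes = Mailboxes::default();
    let before = message.as_ptr() as usize;

    let sender = boxes.clone();
    thread::spawn(move || sender.send(message, "address"))
        .join()
        .unwrap();

    let received = boxes.recv("address").expect("a message was queued");
    let after = received.as_ptr() as usize;
    (before, after, received)
}

fn main() {
    let (before, after, received) = round_trip("x".repeat(1 << 20));
    println!("{} bytes, same buffer: {}", received.len(), before == after);
}
```

This recv does not block. To get the "wait on a semaphore" behaviour from the question, one option would be to pair the Mutex with a std::sync::Condvar.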

Related

Moving context into several closures?

I have found a way to move context into several closures, but it looks ugly. I do it with help of Rc and cloning each variable I need to use for each closure. Particularly I don't like to clone every variable for every closure I want to use:
let mut context = Rc::new( Context { a : 13 } );
..
let context_clone_1 = Rc::clone( &context );
engine.on_event1( Box::new( move ||
{
println!( "on_event1 : {}", context_clone_1.a );
...
let context_clone_2 = Rc::clone( &context );
engine.on_event2( Box::new( move ||
{
println!( "on_event2 : {}", context_clone_2.a );
...
This is a verbose way to do it, and I feel there must be a better one. Also, uncommenting the line // context_clone_1.a += 1; breaks the compilation. What is the proper way of solving problems like this in Rust?
Here is a playground with minimal code.
There are two "problems" here:
Since you specifically asked about context_clone_1.a += 1;: When putting a value into an Rc, there could be multiple references to that value, derived from the independent Rc owners. If mutation were allowed, this would also allow simultaneous mutation and aliasing, which is not allowed in Rust; therefore Rc does not allow mutating its inner value. A common approach to regain mutability is to put the value into a RefCell, which provides mutability through borrow_mut() (or the non-panicking try_borrow_mut()) with a runtime check that ensures no aliasing occurs. An Rc<RefCell<T>> is commonly seen in Rust.
Regarding the use of Rc: The way your code is currently set up is actually fine, at least if that's how it should work. The way the code is currently structured allows for flexibility, including cases where multiple Context-objects provide callback implementations on different events. For example, this is currently possible:
let context1 = Context { a : 13 };
engine.on_event1(Box::new(move ||
{
println!("on_event1 : {}", context1.a );
}));
let context2 = Context { a : 999 };
engine.on_event2(Box::new(move ||
{
println!("on_event2 : {}", context2.a );
}));
In case you have exactly one Context (as in your example), and since the Engine needs to make sure that all callbacks are alive while it itself is alive, you'll need to put each callback - which is structured as a completely separate thing - into a Rc. In your case, all Rc end up pointing to the same object; but they don't have to and this is what your code currently allows for.
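To make the first point (mutating through an Rc) concrete, here is a minimal sketch of the Rc<RefCell<T>> pattern. The Engine below is a hypothetical stand-in that only stores and fires callbacks through a single on_event method; it is not the engine from the question.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Context {
    a: i32,
}

/// Hypothetical stand-in for the question's engine: it just stores callbacks.
#[derive(Default)]
struct Engine {
    callbacks: Vec<Box<dyn FnMut()>>,
}

impl Engine {
    fn on_event(&mut self, callback: Box<dyn FnMut()>) {
        self.callbacks.push(callback);
    }

    fn fire_all(&mut self) {
        for callback in self.callbacks.iter_mut() {
            callback();
        }
    }
}

/// Registers two callbacks that both increment the shared context,
/// fires each of them once, and returns the final value of `a`.
fn run() -> i32 {
    let context = Rc::new(RefCell::new(Context { a: 13 }));
    let mut engine = Engine::default();

    for _ in 0..2 {
        // One Rc clone per closure is still needed; RefCell adds mutability.
        let context = Rc::clone(&context);
        engine.on_event(Box::new(move || {
            context.borrow_mut().a += 1;
        }));
    }

    engine.fire_all();
    let a = context.borrow().a;
    a
}

fn main() {
    println!("a = {}", run());
}
```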
A simpler solution would be to define a trait for Context, something along the lines of
trait EventDriver {
fn event1(&mut self, engine: &Engine);
fn event2(&mut self, engine: &Engine);
}
... and then have Context implement the trait. The Engine-struct then becomes generic over E: EventDriver and Context becomes the E in that. This solution only allows for exactly one instance of Context to provide event callbacks. But since Engine is the owner of that object, it can be sure that all callbacks are alive while it itself is alive and the whole Rc-thing goes away.
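A rough sketch of that design follows, under a few assumptions: the event methods take no &Engine argument (to keep the example short), and fire_event1, fire_event2 and into_driver are hypothetical method names added so the example can be run.

```rust
/// Callbacks the engine can invoke. Unlike the signature sketched in the
/// answer, these take no `&Engine` argument, to keep the example short.
trait EventDriver {
    fn event1(&mut self);
    fn event2(&mut self);
}

struct Context {
    a: i32,
}

impl EventDriver for Context {
    fn event1(&mut self) {
        self.a += 1;
    }

    fn event2(&mut self) {
        self.a *= 2;
    }
}

/// The engine owns its driver, so no Rc and no cloning is needed.
struct Engine<E: EventDriver> {
    driver: E,
}

impl<E: EventDriver> Engine<E> {
    fn new(driver: E) -> Self {
        Engine { driver }
    }

    fn fire_event1(&mut self) {
        self.driver.event1();
    }

    fn fire_event2(&mut self) {
        self.driver.event2();
    }

    /// Hands the driver back once the engine is done with it.
    fn into_driver(self) -> E {
        self.driver
    }
}

/// Fires event1 then event2 on a context starting at 13.
fn run() -> i32 {
    let mut engine = Engine::new(Context { a: 13 });
    engine.fire_event1();
    engine.fire_event2();
    engine.into_driver().a
}

fn main() {
    println!("a = {}", run());
}
```

Because both callbacks are methods on one owned value, mutation works through plain &mut self and the borrow checker verifies it at compile time.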

How can I make this Rust code more idiomatic

Recently I started to learn Rust and one of my main struggles is converting years of Object Oriented thinking into procedural code.
I'm trying to parse an XML document whose tags are each processed by a specific handler that can deal with the data it gets from the children.
Furthermore, I have some field members that are common between the handlers, and I would prefer not to have to write the same fields in all of them.
I tried my hand on it and my code came out like this:
use roxmltree::Node; // roxmltree = "0.14.0"
fn get_data_from(node: &Node) -> String {
let tag_name = get_node_name(node);
let tag_handler: Box<dyn XMLTagHandler> = match tag_name {
"name" => Box::new(NameHandler::new()),
"phone" => Box::new(PhoneHandler::new()),
_ => Box::new(DefaultHandler::new()),
};
if tag_handler.is_recursive() {
for child in node.children() {
let child_value = get_data_from(&child);
// do something with child value
}
}
let value: String = tag_handler.value();
value
}
// consider that handlers are on my project and can be adapted to my needs, and that XMLTagHandler is the trait that they share in common.
My main issues with this are:
This feels like an object-oriented approach to it;
is_recursive needs to be reimplemented for each struct because traits cannot have field members, and I will have to add more fields later, which means more boilerplate for each new field;
I could use one type for a Handler and pass it a function pointer, but this approach seems dirty, e.g. => Handler::new(my_other_params, phone_handler_func)
"This feels like an object-oriented approach to it"
Actually, I don't think so. This code is in clear violation of the Tell-Don't-Ask principle, which falls out from the central idea of object-oriented programming: the encapsulation of data and related behavior into objects. The objects (NameHandler, PhoneHandler, etc.) don't have enough knowledge about what they are to do things on their own, so get_data_from has to query them for information and decide what to do, rather than simply sending a message and letting the object figure out how to deal with it.
So let's start by moving the knowledge about what to do with each kind of tag into the handler itself:
trait XmlTagHandler {
fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, callback: F);
}
impl XmlTagHandler for NameHandler {
fn foreach_child<F: FnMut(&Node)>(&self, _node: &Node, _callback: F) {
// "name" is not a recursive tag, so do nothing
}
}
impl XmlTagHandler for DefaultHandler {
fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, mut callback: F) {
// all other tags may be recursive
for child in node.children() {
// children() yields nodes by value, so pass a reference
callback(&child);
}
}
}
This way you call foreach_child on every kind of Handler, and let the handler itself decide whether the right action is to recurse or not. After all, that's why they have different types -- right?
To get rid of the dyn part, which is unnecessary, let's write a little generic helper function that uses XmlTagHandler to handle one specific kind of tag, and modify get_data_from so it just dispatches to the correct parameterized version of it. (I'll suppose that XmlTagHandler also has a new function so that you can create one generically.)
fn handle_tag<H: XmlTagHandler>(node: &Node) -> String {
let handler = H::new();
handler.foreach_child(node, |child| {
// do something with child value
});
handler.value()
}
fn get_data_from(node: &Node) -> String {
let tag_name = get_node_name(node);
match tag_name {
"name" => handle_tag::<NameHandler>(node),
"phone" => handle_tag::<PhoneHandler>(node),
_ => handle_tag::<DefaultHandler>(node),
}
}
If you don't like handle_tag::<SomeHandler>(node), also consider making handle_tag a provided method of XmlTagHandler, so you can instead write SomeHandler::handle(node).
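As a rough sketch of the provided-method variant, the following compiles without any crates by using a hypothetical Node struct in place of roxmltree::Node. The value signature (taking the node and the collected child values) is an assumption for illustration only; the answer does not specify it.

```rust
/// Hypothetical stand-in for roxmltree::Node, so the sketch needs no crates.
struct Node {
    text: String,
    children: Vec<Node>,
}

trait XmlTagHandler: Sized {
    fn new() -> Self;

    /// Calls `callback` once per child the handler chooses to visit.
    fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, callback: F);

    /// Assumed signature: builds the result from the node and its child values.
    fn value(&self, node: &Node, child_values: &[String]) -> String;

    /// Provided method: the generic driver lives on the trait itself.
    fn handle(node: &Node) -> String {
        let handler = Self::new();
        let mut child_values = Vec::new();
        handler.foreach_child(node, |child| child_values.push(child.text.clone()));
        handler.value(node, &child_values)
    }
}

struct NameHandler;
struct DefaultHandler;

impl XmlTagHandler for NameHandler {
    fn new() -> Self {
        NameHandler
    }

    // "name" is not a recursive tag, so no child is visited.
    fn foreach_child<F: FnMut(&Node)>(&self, _node: &Node, _callback: F) {}

    fn value(&self, node: &Node, _child_values: &[String]) -> String {
        node.text.clone()
    }
}

impl XmlTagHandler for DefaultHandler {
    fn new() -> Self {
        DefaultHandler
    }

    fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, mut callback: F) {
        for child in &node.children {
            callback(child);
        }
    }

    fn value(&self, _node: &Node, child_values: &[String]) -> String {
        child_values.join(",")
    }
}

fn main() {
    let leaf = |text: &str| Node { text: text.to_string(), children: Vec::new() };
    let node = Node { text: "root".to_string(), children: vec![leaf("a"), leaf("b")] };
    println!("{} / {}", NameHandler::handle(&node), DefaultHandler::handle(&node));
}
```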
Note that I have not really changed any of the data structures. Your presumption of an XmlTagHandler trait and various Handler implementors is a pretty normal way to organize code. However, in this case, it doesn't offer any real improvement over just writing three separate functions:
fn get_data_from(node: &Node) -> String {
let tag_name = get_node_name(node);
match tag_name {
"name" => get_name_from(node),
"phone" => get_phone_from(node),
_ => get_other_from(node),
}
}
In some languages, such as Java, all code has to be part of some class – so you can find yourself writing classes that don't exist for any other reason than to group related things together. In Rust you don't need to do this, so make sure that any added complication such as XmlTagHandler is actually pulling its weight.
"is_recursive needs to be reimplemented for each struct because traits cannot have field members, and I will have to add more fields later, which means more boilerplate for each new field"
Without more information about the fields, it's impossible to really understand what problem you're facing here; however, in general, if there is a family of structs that have some data in common, you may want to make a generic struct instead of a trait. See the answers to How to reuse codes for Binary Search Tree, Red-Black Tree, and AVL Tree? for more suggestions.
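One possible reading of the "generic struct instead of a trait" suggestion is sketched below. The shared field lives once in a generic Handler<K> wrapper, and each tag type only supplies its own behaviour. TagKind, Name, Phone and the text-based value signature are all hypothetical names and choices made for illustration.

```rust
/// Fields every handler shares live once, in the generic wrapper.
struct Handler<K> {
    is_recursive: bool,
    kind: K,
}

/// Per-tag behaviour; the only thing each tag type has to implement.
trait TagKind {
    fn value(&self, text: &str) -> String;
}

struct Name;
struct Phone;

impl TagKind for Name {
    fn value(&self, text: &str) -> String {
        text.trim().to_string()
    }
}

impl TagKind for Phone {
    fn value(&self, text: &str) -> String {
        // Keep only the digits of the phone number.
        text.chars().filter(|c| c.is_ascii_digit()).collect()
    }
}

impl<K: TagKind> Handler<K> {
    fn new(is_recursive: bool, kind: K) -> Self {
        Handler { is_recursive, kind }
    }

    fn value(&self, text: &str) -> String {
        self.kind.value(text)
    }
}

fn main() {
    let name = Handler::new(false, Name);
    let phone = Handler::new(false, Phone);
    println!("{} (recursive: {})", name.value(" Ann "), name.is_recursive);
    println!("{} (recursive: {})", phone.value("555-0100"), phone.is_recursive);
}
```

Adding another shared field later means touching Handler<K> only, not every tag type.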
"I could use one type for a Handler and pass to it a function pointer, but this approach seems dirty"
Elegance is sometimes a useful thing, but it is subjective. I would recommend closures rather than function pointers, but this suggestion doesn't seem "dirty" to me. Making closures and putting them in data structures is a very normal way to write Rust code. If you can elaborate on what you don't like about it, perhaps someone could point out ways to improve it.
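For comparison, here is a small sketch of the closure-based variant: one Handler type whose tag-specific part is a boxed closure. The field names and the handler_for function are hypothetical, and the closures work on plain text rather than on an XML node to keep the example free of external crates.

```rust
/// One handler type for every tag; the tag-specific part is a stored closure.
struct Handler {
    is_recursive: bool,
    extract: Box<dyn Fn(&str) -> String>,
}

impl Handler {
    fn new(is_recursive: bool, extract: impl Fn(&str) -> String + 'static) -> Self {
        Handler {
            is_recursive,
            extract: Box::new(extract),
        }
    }

    fn value(&self, text: &str) -> String {
        (self.extract)(text)
    }
}

/// Picks the closure for a tag name, mirroring the match in get_data_from.
fn handler_for(tag_name: &str) -> Handler {
    match tag_name {
        "name" => Handler::new(false, |text| text.trim().to_string()),
        "phone" => Handler::new(false, |text| {
            text.chars().filter(|c| c.is_ascii_digit()).collect()
        }),
        _ => Handler::new(true, |text| text.to_string()),
    }
}

fn main() {
    let handler = handler_for("phone");
    println!(
        "{} (recursive: {})",
        handler.value("(555) 0100"),
        handler.is_recursive
    );
}
```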

How can I transfer some values into a Rust generator at each step?

I use generators as long-lived asynchronous threads (see How to implement a lightweight long-lived thread based on a generator or asynchronous function in Rust?) in a user interaction scenario. I need to pass user input into the generator at each step. I think I can do it with a RefCell, but it is not clear how to pass a reference to the RefCell into the generator when creating it.
fn user_scenario() -> impl Generator<Yield = String, Return = String> {
|| {
yield format!("what is your name?");
yield format!("{}, how are you feeling?", "anon");
return format!("{}, bye !", "anon");
}
}
The UserData structure contains user input, the second structure contains a user session consisting of UserData and the generator instance. Sessions are collected in a HashMap.
struct UserData {
sid: String,
msg_in: String,
msg_out: String,
}
struct UserSession {
udata_cell: RefCell<UserData>,
scenario: Pin<Box<dyn Generator<Yield = String, Return = String>>>,
}
type UserSessions = HashMap<String, UserSession>;
let mut sessions: UserSessions = HashMap::new();
UserData is created at the time of receiving user input. At this moment I need to pass a reference to UserData into the generator, wrapping it in a RefCell, but I don't know how to do it, since the generator has a 'static lifetime and the RefCell does not live that long!
let mut udata: UserData = read_udata(&mut stream);
let session: &mut UserSession;
if udata.sid == "" { //new session
let sid = rnd.gen::<u64>().to_string();
udata.sid = sid.clone();
sessions.insert(
sid.clone(),
UserSession {
udata_cell: RefCell::new(udata),
scenario: Box::pin(user_scenario())
}
);
session = sessions.get_mut(&sid).unwrap();
}
The full code is here, but the generator here does not see user input.
Disclaimer: resumption arguments are a planned extension for generators, so at some point in the future it will be possible to resume the generator with a &UserData argument.
For now, I recommend sharing ownership. The cost is fairly minor (one memory allocation, one indirection) and will save you a lot of trouble:
struct UserSession {
user_data: Rc<RefCell<UserData>>,
scenario: ..,
}
Which is built with:
let user_data = Rc::new(RefCell::new(udata));
UserSession {
user_data: user_data.clone(),
scenario: Box::pin(user_scenario(user_data))
}
Then, both the session and the generator have access to the UserData each on their turn, and everything is fine.
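Since generators are not available on stable Rust, the following sketch uses a plain FnMut closure as a stand-in: each call plays the role of one resume. It only illustrates the shared-ownership wiring; the trimmed UserData, the step and new_session functions, and the closure-based user_scenario are assumptions for illustration, not the code from the question.

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Default)]
struct UserData {
    msg_in: String,
    msg_out: String,
}

struct UserSession {
    user_data: Rc<RefCell<UserData>>,
    scenario: Box<dyn FnMut()>,
}

/// Stand-in for the generator: each call plays the role of one resume.
/// It reads the latest input and writes the reply through the shared cell.
fn user_scenario(user_data: Rc<RefCell<UserData>>) -> impl FnMut() {
    let mut step = 0;
    move || {
        let mut data = user_data.borrow_mut();
        let reply = match step {
            0 => "what is your name?".to_string(),
            _ => format!("{}, how are you feeling?", data.msg_in),
        };
        data.msg_out = reply;
        step += 1;
        // `data` is dropped here, releasing the borrow before control returns.
    }
}

fn new_session() -> UserSession {
    let user_data = Rc::new(RefCell::new(UserData::default()));
    UserSession {
        user_data: user_data.clone(),
        scenario: Box::new(user_scenario(user_data)),
    }
}

/// Stores the user's input, advances the scenario once, and returns its reply.
fn step(session: &mut UserSession, input: &str) -> String {
    session.user_data.borrow_mut().msg_in = input.to_string();
    (session.scenario)();
    let reply = session.user_data.borrow().msg_out.clone();
    reply
}

fn main() {
    let mut session = new_session();
    println!("{}", step(&mut session, ""));
    println!("{}", step(&mut session, "Ann"));
}
```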
There is one little wrinkle: be careful of scopes. If you keep a .borrow() alive across a yield point, which is possible, then you will have a run-time error when trying to write to it outside the generator.
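The wrinkle can be reproduced without a generator. In this small sketch, a shared borrow that is still alive (as it would be if held across a yield point) makes a write attempt fail. The function name is hypothetical, and try_borrow_mut is used so the failure shows up as an Err rather than a panic.

```rust
use std::cell::RefCell;

/// Returns (write blocked while a borrow is alive, write allowed afterwards).
fn write_while_borrowed() -> (bool, bool) {
    let cell = RefCell::new(String::from("anon"));

    // Imagine this guard being kept alive across a yield point.
    let guard = cell.borrow();
    let blocked = cell.try_borrow_mut().is_err();

    // Once the guard is gone, writing works again.
    drop(guard);
    let allowed = cell.try_borrow_mut().is_ok();

    (blocked, allowed)
}

fn main() {
    let (blocked, allowed) = write_while_borrowed();
    println!("blocked while borrowed: {blocked}, allowed after drop: {allowed}");
}
```

With borrow_mut() in place of try_borrow_mut(), the first write attempt would panic at run time instead of returning an error.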
A more involved solution would be to use a queue of messages, which would also involve memory allocation, etc. I would consider your UserData structure to be a degenerate form of a pair of queues: it's two queues with capacity for one message each. You could make it more explicit with a regular queue, but that would not buy you much.

golang threading model comparison

I have a piece of data
type data struct {
// all good data here
...
}
This data is owned by a manager and used by other threads for reading only. The manager needs to periodically update the data. How do I design the threading model for this? I can think of two options:
1.
type manager struct {
// acquire read lock when other threads read the data.
// acquire write lock when manager wants to update.
lock sync.RWMutex
// a pointer holding a pointer to the data
p *data
}
2.
type manager struct {
// copy the pointer when other threads want to use the data.
// When manager updates, just change p to point to the new data.
p *data
}
Does the second approach work? It seems I don't need any lock. If other threads get a pointer pointing to the old data, it would be fine if manager updates the original pointer. As GoLang will do GC, after all other threads read the old data it will be auto released. Am I correct?
Your first option is fine and perhaps the simplest to do. However, it could lead to poor performance with many readers, as the manager could struggle to obtain the write lock.
As the comments on your question have stated, your second option (as-is) can cause a race condition and lead to unpredictable behaviour.
You could implement your second option by using atomic.Value. This would allow you to store the pointer to some data struct and atomically update this for the next readers to use. For example:
// Data shared with readers
type data struct {
// all the fields
}
// Manager
type manager struct {
v atomic.Value
}
// Method used by readers to obtain a fresh copy of data to
// work with, e.g. inside loop.
// Note: the type assertion panics if update() has not stored a value yet.
func (m *manager) Data() *data {
return m.v.Load().(*data)
}
// Internal method called to set new data for readers
func (m *manager) update() {
d := &data{
// ... set values here
}
m.v.Store(d)
}

Declaring a map in a separate file and reading its contents

I'm trying to declare a map in a separate file, and then access it from my main function.
I want Rust's equivalent (or whatever comes closest) to this C++ map:
static const std::map<std::string, std::vector<std::string>> table = {
{ "a", { "foo" } },
{ "e", { "bar", "baz" } }
};
This is my attempt in Rust.
table.rs
use std::container::Map;
pub static table: &'static Map<~str, ~[~str]> = (~[
(~"a", ~[~"foo"]),
(~"e", ~[~"bar", ~"baz"])
]).move_iter().collect();
main.rs
mod table;
fn main() {
println(fmt!("%?", table::table));
}
The above gives two compiler errors in table.rs, saying "constant contains unimplemented expression type".
I also have the feeling that the map declaration is less than optimal for the purpose.
Finally, I'm using Rust 0.8.
As Chris Morgan noted, Rust doesn't allow you to run user code in order to initialize global variables before main is entered, unlike C++. So you are mostly limited to primitive types that you can initialize with literal expressions. This is, afaik, part of the design and unlikely to change, even though the particular error message is probably not final.
Depending on your use case, you might want to change your code so you're manually passing your map as an argument to all the functions that will want to use it (ugh!), use task-local storage to initialize a tls slot with your map early on and then refer to it later in the same task (ugh?), or use unsafe code and a static mut variable to do much the same with your map wrapped in an Option maybe so it can start its life as None (ugh!).
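The answer above targets Rust 0.8. For readers on a current toolchain, and as an assumption that goes beyond that answer, the standard library's std::sync::OnceLock (stable since Rust 1.70) allows a map to be built lazily on first access, which stays consistent with the "no user code before main" rule. A minimal sketch, where the accessor function table is a hypothetical name:

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Returns the shared table, building it the first time it is requested.
pub fn table() -> &'static HashMap<&'static str, Vec<&'static str>> {
    static TABLE: OnceLock<HashMap<&'static str, Vec<&'static str>>> = OnceLock::new();
    TABLE.get_or_init(|| {
        HashMap::from([
            ("a", vec!["foo"]),
            ("e", vec!["bar", "baz"]),
        ])
    })
}

fn main() {
    println!("{:?}", table().get("e"));
}
```

The function could live in table.rs and be called from main.rs as table::table(), matching the file layout in the question.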

Resources