Related
I'm following the example from the official documentation. I'll copy the code here for simplicity:
#[derive(Debug, PartialEq)]
pub struct Foo {
name: String,
list: Vec<u8>,
}
let foo = {
let mut uninit: MaybeUninit<Foo> = MaybeUninit::uninit();
let ptr = uninit.as_mut_ptr();
// Initializing the `name` field
// Using `write` instead of assignment via `=` to not call `drop` on the
// old, uninitialized value.
unsafe { addr_of_mut!((*ptr).name).write("Bob".to_string()); }
// Initializing the `list` field
// If there is a panic here, then the `String` in the `name` field leaks.
unsafe { addr_of_mut!((*ptr).list).write(vec![0, 1, 2]); }
// All the fields are initialized, so we call `assume_init` to get an initialized Foo.
unsafe { uninit.assume_init() }
};
What bothers me is the second unsafe comment: If there is a panic here, then the String in the name field leaks. This is exactly what I want to avoid. I modified the example so now it reflects my concerns:
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;
#[derive(Debug, PartialEq)]
pub struct Foo {
name: String,
list: Vec<u8>,
}
#[allow(dead_code)]
fn main() {
let mut uninit: MaybeUninit<Foo> = MaybeUninit::uninit();
let ptr = uninit.as_mut_ptr();
init_foo(ptr);
// this is wrong because it tries to read the uninitialized field
// I could avoid this call if the function `init_foo` returns a `Result`
// but I'd like to know which fields are initialized so I can cleanup
let _foo = unsafe { uninit.assume_init() };
}
fn init_foo(foo_ptr: *mut Foo) {
unsafe { addr_of_mut!((*foo_ptr).name).write("Bob".to_string()); }
// something happened and `list` field left uninitialized
return;
}
The code builds and runs. But using MIRI I see the error:
Undefined Behavior: type validation failed at .value.list.buf.ptr.pointer: encountered uninitialized raw pointer
The question is how I can figure out which fields are initialized and which are not? Sure, I could return a result with the list of field names or similar, for example. But I don't want to do it - my struct can have dozens of fields, it changes over time and I'm too lazy to maintain an enum that should reflect the fields. Ideally I'd like to have something like this:
if addr_initialized!((*ptr).name) {
clean(addr_of_mut!((*ptr).name));
}
Update: Here's an example of what I want to achieve. I'm doing some Vulkan programming (with ash crate, but that's not important). I want to create a struct that holds all the necessary objects, like Device, Instance, Surface, etc.:
struct VulkanData {
pub instance: Instance,
pub device: Device,
pub surface: Surface,
// 100500 other fields
}
fn init() -> Result<VulkanData, Error> {
// let vulkan_data = VulkanData{}; // can't do that because some fields are not default constructible.
let instance = create_instance(); // can fail
let device = create_device(instance); // can fail, in this case instance have to be destroyed
let surface = create_surface(device); // can fail, in this case instance and device have to be destroyed
//other initialization routines
VulkanData{instance, device, surface, ...}
}
As you can see, for every such object, there's a corresponding create_x function, which can fail. Obviously, if I fail in the middle of the process, I don't want to proceed. But I want to clear already created objects. As you mentioned, I could create a wrapper. But it's very tedious work to create wrappers for hundreds of types, I absolutely want to avoid this (btw, ash is already a wrapper over C-types). Moreover, because of the asynchronous nature of CPU-GPU communication, sometimes it makes no sense to drop an object, it can lead to errors. Instead, some form of a signal should come from the GPU that indicates that an object is safe to destroy. That's the main reason why I can't implement Drop for the wrappers.
But as soon as the struct is successfully initialized I know that it's safe to read any of its fields. That's why don't want to use an Option - it adds some overhead and makes no sense in my particular example.
All that is trivially achievable in C++ - create an uninitialized struct (well, by default all Vulkan objects are initialized with VK_NULL_HANDLE), start to fill it field-by-field, if something went wrong just destroy the objects that are not null.
There is no general purpose way to tell if something is initialized or not. Miri can detect this because it adds a lot of instrumentation and overhead to track memory operations.
All that is trivially achievable in C++ - create an uninitialized struct (well, by default all Vulkan objects are initialized with VK_NULL_HANDLE), start to fill it field-by-field, if something went wrong just destroy the objects that are not null.
You could theoretically do the same in Rust, however this is quite unsafe and makes a lot of assumptions about the construction of the ash types.
If the functions didn't depend on each other, I might suggest something like this:
let instance = create_instance();
let device = create_device();
let surface = create_surface();
match (instance, device, surface) {
(Ok(instance), Ok(device), Ok(surface)) => {
Ok(VulkanData{
instance,
device,
surface,
})
}
instance, device, surface {
// clean up the `Ok` ones and return some error
}
}
However, your functions are dependent on others succeeding (e.g. need the Instance to create a Device) and this also has the disadvantage that it would keep creating values when one already failed.
Creating wrappers with custom drop behavior is the most robust way to accomplish this. There is the vulkano crate that is built on top of ash that does this among other things. But if that's not to your liking you can use something like scopeguard to encapsulate drop logic on the fly.
use scopeguard::{guard, ScopeGuard}; // 1.1.0
fn init() -> Result<VulkanData, Error> {
let instance = guard(create_instance()?, destroy_instance);
let device = guard(create_device(&instance)?, destroy_device);
let surface = guard(create_surface(&device)?, destroy_surface);
Ok(VulkanData {
// use `into_inner` to escape the drop behavior
instance: ScopeGuard::into_inner(instance),
device: ScopeGuard::into_inner(device),
surface: ScopeGuard::into_inner(surface),
})
}
See a full example on the playground. No unsafe required.
I believe MaybeUninit is designed for the cases when you have all the information about its contents and can make the code safe "by hand".
If you need to figure out in runtime if a field has a value, then use Option<T>.
Per the documentation:
You can think of MaybeUninit<T> as being a bit like Option<T> but without any of the run-time tracking and without any of the safety checks.
I'm trying to create async Reader and Writer in tokio, these require Send, and must be thread safe. (does not seem to be a way to write single threaded tokio code that avoids mutexts)
The reader and writer both need to interact, e.g read data might result in a response.
I would like both Reader and Writer to have a thread safe pointer to a Session that can ensure communication between the two.
/// function on the
impl Session {
pub fn split(&mut self, sock: TcpStream) -> (Reader, Writer) {
let (read_half, write_half) = sock.split();
let session = Arc::new(RwLock::new(self)); // <- expected lifetime
( Reader::new(session, read_half), Writer::new(Arc::clone(session), write_half) )
}
...
}
pub struct Reader {
session: Arc<RwLock<&mut StompSession>>,
read_half: ReadHalf<TcpStream>,
...
pub struct Writer {
session: Arc<RwLock<&mut StompSession>>,
write_half: WriteHalf<TcpStream>,
....
I dont really understand the difference between Arc<RwLock<&mut StompSession>> and Arc<RwLock<StompSession>> both can only be talking about pointers.
Naturally come at this by stuggling with the borrow checker and rust book only has examples for RRwLock with integers, not mutable "objects".
Let's start by clearing a few things:
does not seem to be a way to write single threaded tokio code that avoids mutexes
The Mutex requirement has nothing to do with single-threadedness but with mutable borrows. Whenever you spawn a future, that future is its own entity; it isn't magically part of your struct and most definitely does not know how to keep a &mut self. That is the point of the Mutex - it allows you to dynamically acquire a mutable reference to inner state - and the Arc allows you to have access to the Mutex itself in multiple places.
Their non-synchronized equivalents are Rc and Cell/RefCell, by the way, and their content (whether synchronized or unsynchronized) should be an owned type.
The Send requirement actually shows up when you use futures on top of tokio, as an Executor requires futures spawned on it to be Send (for obvious reasons - you could make a spawn_local method, but that'd cause more problems than it solves).
Now, back to your problem. I'm going to give you the shortest path to an answer to your problem. This, however, will not be completely the right way of doing things; however, since I do not know what protocol you're going to lay on top of TcpStream or what kind of requirements you have, I cannot point you in the right direction (yet). Comments are there for that reason - give me more requirements and I'll happily edit this!
Anyway. Back to the problem. Since Mutex<_> is better used with an owned type, we're going to do that right now and "fix" both your Reader and Writer:
pub struct Reader {
session: Arc<RwLock<Session>>,
read_half: ReadHalf<TcpStream>,
}
pub struct Writer {
session: Arc<RwLock<StompSession>>,
write_half: WriteHalf<TcpStream>,
}
Since we changed this, we also need to reflect it elsewhere, but to be able to do this we're going to need to consume self. It is fine, however, as we're going to have an Arc<Mutex<_>> copy in either object we return:
impl Session {
pub fn split(self, sock: TcpStream) -> (Reader, Writer) {
let (read_half, write_half) = sock.split();
let session = Arc::new(RwLock::new(self));
( Reader::new(session.clone(), read_half), Writer::new(session, write_half) )
}
}
And lo and behold, it compiles and each Writer/Reader pair now has its own borrowable (mutably and non-mutably) reference to our session!
The playground snippet highlights the changes made. As I said, it works now, but it's going to bite you in the ass the moment you try to do something as you're going to need something on top of both ReadHalf and WriteHalf to be able to use them properly.
I'm trying to implement a simple interpreter in Rust for a made up programming language called rlox, following Bob Nystrom's book Crafting Interpreters.
I want errors to be able to occur in any child module, and for them to be "reported" in the main module (this is done in the book, with Java, by simply calling a static method on the containing class which prints the offending token and line). However, if an error occurs, it's not like I can just return early with Result::Err (which is, I assume, the idiomatic way to handle errors in Rust) because the interpreter should keep running - continually looking for errors.
Is there an (idiomatic) way for me to emulate the Java behaviour of calling a parent class' static method from a child class in Rust with modules? Should I abandon something like that entirely?
I thought about a strategy where I inject a reference to some ErrorReporter struct as a dependency into the Scanner and Token structs, but that seems unwieldy to me (I don't feel like an error reporter should be part of the struct's signature, am I wrong?):
struct Token {
error_reporter: Rc<ErrorReporter>, // Should I avoid this?
token_type: token::Type,
lexeme: String,
line: u32
}
This is the layout of my project if you need to visualise what I'm talking about with regards to module relationships. Happy to provide some source code if necessary.
rlox [package]
└───src
├───main.rs (uses scanner + token mods, should contain logic for handling errors)
├───lib.rs (just exports scanner and token mods)
├───scanner.rs (uses token mod, declares scanner struct and impl)
└───token.rs (declares token struct and impl)
Literal translation
Importantly, a Java static method has no access to any instance state. That means that it can be replicated in Rust by either a function or an associated function, neither of which have any state. The only difference is in how you call them:
fn example() {}
impl Something {
fn example() {}
}
fn main() {
example();
Something::example();
}
Looking at the source you are copying, it doesn't "just" report the error, it has code like this:
public class Lox {
static boolean hadError = false;
static void error(int line, String message) {
report(line, "", message);
}
private static void report(int line, String where, String message) {
System.err.println(
"[line " + line + "] Error" + where + ": " + message);
hadError = true;
}
}
I'm no JVM expert, but I'm pretty sure that using a static variable like that means that your code is no longer thread safe. You simply can't do that in safe Rust; you can't "accidentally" make memory-unsafe code.
The most literal translation of this that is safe would use associated functions and atomic variables:
use std::sync::atomic::{AtomicBool, Ordering, ATOMIC_BOOL_INIT};
static HAD_ERROR: AtomicBool = ATOMIC_BOOL_INIT;
struct Lox;
impl Lox {
fn error(line: usize, message: &str) {
Lox::report(line, "", message);
}
fn report(line: usize, where_it_was: &str, message: &str) {
eprintln!("[line {}] Error{}: {}", line, where_it_was, message);
HAD_ERROR.store(true, Ordering::SeqCst);
}
}
You can also choose more rich data structures to store in your global state by using lazy_static and a Mutex or RwLock, if you need them.
Idiomatic translation
Although it might be convenient, I don't think such a design is good. Global state is simply terrible. I'd prefer to use dependency injection.
Define an error reporter structure that has the state and methods you need and pass references to the error reporter down to where it needs to be:
struct LoggingErrorSink {
had_error: bool,
}
impl LoggingErrorSink {
fn error(&mut self, line: usize, message: &str) {
self.report(line, "", message);
}
fn report(&mut self, line: usize, where_it_was: &str, message: &str) {
eprintln!("[line {} ] Error {}: {}", line, where_it_was, message);
self.had_error = true;
}
}
fn some_parsing_thing(errors: &mut LoggingErrorSink) {
errors.error(0, "It's broken");
}
In reality, I'd rather define a trait for things that allow reporting errors and implement it for a concrete type. Rust makes this nice because there's zero performance difference when using these generics.
trait ErrorSink {
fn error(&mut self, line: usize, message: &str) {
self.report(line, "", message);
}
fn report(&mut self, line: usize, where_it_was: &str, message: &str);
}
struct LoggingErrorSink {
had_error: bool,
}
impl LoggingErrorSink {
fn report(&mut self, line: usize, where_it_was: &str, message: &str) {
eprintln!("[line {} ] Error {}: {}", line, where_it_was, message);
self.had_error = true;
}
}
fn some_parsing_thing<L>(errors: &mut L)
where
L: ErrorSink,
{
errors.error(0, "It's broken");
}
There's lots of variants of implementing this, all depending on your tradeoffs.
You could choose to have the logger take &self instead of &mut, which would force this case to use something like a Cell to gain internal mutability of had_error.
You could use something like an Rc to avoid adding any extra lifetimes to the calling chain.
You could choose to store the logger as a struct member instead of a function parameter.
For your extra keyboard work, you get the benefit of being able to test your errors. Simply whip up a dummy implementation of the trait that saves information to internal variables and pass it in at test time.
Opinions, ahoy!
a strategy where I inject a reference to some ErrorReporter struct as a dependency into the Scanner
Yes, dependency injection is an amazing solution to a large number of coding issues.
and Token structs
I don't know why a token would need to report errors, but it would make sense for the tokenizer to do so.
but that seems unwieldy to me. I don't feel like an error reporter should be part of the struct's signature, am I wrong?
I'd say yes, you are wrong; you've stated this as an absolute truth, of which very few exist in programming.
Concretely, very few people care about what is inside your type, probably only to be the implementer. The person who constructs a value of your type might care a little because they need to pass in dependencies, but this is a good thing. They now know that this value can generate errors that they need to handle "out-of-band", as opposed to reading some documentation after their program doesn't work.
A few more people care about the actual signature of your type. This is a double-edged blade. In order to have maximal performance, Rust will force you to expose your generic types and lifetimes in your type signatures. Sometimes, this sucks, but either the performance gain is worth it, or you can hide it somehow and take the tiny hit. That's the benefit of a language that gives you choices.
See also
How to synchronize a static variable among threads running different instances of a class in Java?
Where are static methods and static variables stored in Java?
Static fields in a struct in Rust
How can you make a safe static singleton in Rust?
How do I create a global, mutable singleton?
How can I avoid a ripple effect from changing a concrete struct to generic?
There is a new Pin type in unstable Rust and the RFC is already merged. It is said to be kind of a game changer when it comes to passing references, but I am not sure how and when one should use it.
Can anyone explain it in layman's terms?
What is pinning ?
In programming, pinning X means instructing X not to move.
For example:
Pinning a thread to a CPU core, to ensure it always executes on the same CPU,
Pinning an object in memory, to prevent a Garbage Collector to move it (in C# for example).
What is the Pin type about?
The Pin type's purpose is to pin an object in memory.
It enables taking the address of an object and having a guarantee that this address will remain valid for as long as the instance of Pin is alive.
What are the usecases?
The primary usecase, for which it was developed, is supporting Generators.
The idea of generators is to write a simple function, with yield, and have the compiler automatically translate this function into a state machine. The state that the generator carries around is the "stack" variables that need to be preserved from one invocation to another.
The key difficulty of Generators that Pin is designed to fix is that Generators may end up storing a reference to one of their own data members (after all, you can create references to stack values) or a reference to an object ultimately owned by their own data members (for example, a &T obtained from a Box<T>).
This is a subcase of self-referential structs, which until now required custom libraries (and lots of unsafe). The problem of self-referential structs is that if the struct move, the reference it contains still points to the old memory.
Pin apparently solves this years-old issue of Rust. As a library type. It creates the extra guarantee that as long as Pin exist the pinned value cannot be moved.
The usage, therefore, is to first create the struct you need, return it/move it at will, and then when you are satisfied with its place in memory initialize the pinned references.
One of the possible uses of the Pin type is self-referencing objects; an article by ralfj provides an example of a SelfReferential struct which would be very complicated without it:
use std::ptr;
use std::pin::Pin;
use std::marker::PhantomPinned;
struct SelfReferential {
data: i32,
self_ref: *const i32,
_pin: PhantomPinned,
}
impl SelfReferential {
fn new() -> SelfReferential {
SelfReferential { data: 42, self_ref: ptr::null(), _pin: PhantomPinned }
}
fn init(self: Pin<&mut Self>) {
let this : &mut Self = unsafe { self.get_unchecked_mut() };
// Set up self_ref to point to this.data.
this.self_ref = &mut this.data as *const i32;
}
fn read_ref(self: Pin<&Self>) -> Option<i32> {
let this : &Self= self.get_ref();
// Dereference self_ref if it is non-NULL.
if this.self_ref == ptr::null() {
None
} else {
Some(unsafe { *this.self_ref })
}
}
}
fn main() {
let mut data: Pin<Box<SelfReferential>> = Box::new(SelfReferential::new()).into();
data.as_mut().init();
println!("{:?}", data.as_ref().read_ref()); // prints Some(42)
}
I am writing a game and have a player list defined as follows:
pub struct PlayerList {
by_name: HashMap<String, Arc<Mutex<Player>>>,
by_uuid: HashMap<Uuid, Arc<Mutex<Player>>>,
}
This struct has methods for adding, removing, getting players, and getting the player count.
The NetworkServer and Server shares this list as follows:
NetworkServer {
...
player_list: Arc<Mutex<PlayerList>>,
...
}
Server {
...
player_list: Arc<Mutex<PlayerList>>,
...
}
This is inside an Arc<Mutex> because the NetworkServer accesses the list in a different thread (network loop).
When a player joins, a thread is spawned for them and they are added to the player_list.
Although the only operation I'm doing is adding to player_list, I'm forced to use Arc<Mutex<Player>> instead of the more natural Rc<RefCell<Player>> in the HashMaps because Mutex<PlayerList> requires it. I am not accessing players from the network thread (or any other thread) so it makes no sense to put them under a Mutex. Only the HashMaps need to be locked, which I am doing using Mutex<PlayerList>. But Rust is pedantic and wants to protect against all misuses.
As I'm only accessing Players in the main thread, locking every time to do that is both annoying and less performant. Is there a workaround instead of using unsafe or something?
Here's an example:
use std::cell::Cell;
use std::collections::HashMap;
use std::ffi::CString;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Uuid([u8; 16]);
struct Player {
pub name: String,
pub uuid: Uuid,
}
struct PlayerList {
by_name: HashMap<String, Arc<Mutex<Player>>>,
by_uuid: HashMap<Uuid, Arc<Mutex<Player>>>,
}
impl PlayerList {
fn add_player(&mut self, p: Player) {
let name = p.name.clone();
let uuid = p.uuid;
let p = Arc::new(Mutex::new(p));
self.by_name.insert(name, Arc::clone(&p));
self.by_uuid.insert(uuid, p);
}
}
struct NetworkServer {
player_list: Arc<Mutex<PlayerList>>,
}
impl NetworkServer {
fn start(&mut self) {
let player_list = Arc::clone(&self.player_list);
thread::spawn(move || {
loop {
// fake network loop
// listen for incoming connections, accept player and add them to player_list.
player_list.lock().unwrap().add_player(Player {
name: "blahblah".into(),
uuid: Uuid([0; 16]),
});
}
});
}
}
struct Server {
player_list: Arc<Mutex<PlayerList>>,
network_server: NetworkServer,
}
impl Server {
fn start(&mut self) {
self.network_server.start();
// main game loop
loop {
// I am only accessing players in this loop in this thread. (main thread)
// so Mutex for individual player is not needed although rust requires it.
}
}
}
fn main() {
let player_list = Arc::new(Mutex::new(PlayerList {
by_name: HashMap::new(),
by_uuid: HashMap::new(),
}));
let network_server = NetworkServer {
player_list: Arc::clone(&player_list),
};
let mut server = Server {
player_list,
network_server,
};
server.start();
}
As I'm only accessing Players in the main thread, locking everytime to do that is both annoying and less performant.
You mean, as right now you are only accessing Players in the main thread, but at any time later you may accidentally introduce an access to them in another thread?
From the point of view of the language, if you can get a reference to a value, you may use the value. Therefore, if multiple threads have a reference to a value, this value should be safe to use from multiple threads. There is no way to enforce, at compile-time, that a particular value, although accessible, is actually never used.
This raises the question, however:
If the value is never used by a given thread, why does this thread have access to it in the first place?
It seems to me that you have a design issue. If you can manage to redesign your program so that only the main thread has access to the PlayerList, then you will immediately be able to use Rc<RefCell<...>>.
For example, you could instead have the network thread send a message to the main thread announcing that a new player connected.
At the moment, you are "Communicating by Sharing", and you could shift toward "Sharing by Communicating" instead. The former usually has synchronization primitives (such as mutexes, atomics, ...) all over the place, and may face contention/dead-lock issues, while the latter usually has communication queues (channels) and requires an "asynchronous" style of programming.
Send is a marker trait that governs which objects can have ownership transferred across thread boundaries. It is automatically implemented for any type that is entirely composed of Send types. It is also an unsafe trait because manually implementing this trait can cause the compiler to not enforce the concurrency safety that we love about Rust.
The problem is that Rc<RefCell<Player>> isn't Send and thus your PlayerList isn't Send and thus can't be sent to another thread, even when wrapped in an Arc<Mutex<>>. The unsafe workaround would be to unsafe impl Send for your PlayerList struct.
Putting this code into your playground example allows it to compile the same way as the original with Arc<Mutex<Player>>
struct PlayerList {
by_name: HashMap<String, Rc<RefCell<Player>>>,
by_uuid: HashMap<Uuid, Rc<RefCell<Player>>>,
}
unsafe impl Send for PlayerList {}
impl PlayerList {
fn add_player(&mut self, p: Player) {
let name = p.name.clone();
let uuid = p.uuid;
let p = Rc::new(RefCell::new(p));
self.by_name.insert(name, Rc::clone(&p));
self.by_uuid.insert(uuid, p);
}
}
Playground
The Nomicon is sadly a little sparse at explaining what rules have have to be enforced by the programmer when unsafely implementing Send for a type containing Rcs, but accessing in only one thread seems safe enough...
For completeness, here's TRPL's bit on Send and Sync
I suggest solving this threading problem using a multi-sender-single-receiver channel. The network threads get a Sender<Player> and no direct access to the player list.
The Receiver<Player> gets stored inside the PlayerList. The only thread accessing the PlayerList is the main thread, so you can remove the Mutex around it. Instead in the place where the main-thread used to lock the mutexit dequeue all pending players from the Receiver<Player>, wraps them in an Rc<RefCell<>> and adds them to the appropriate collections.
Though looking at the bigger designing, I wouldn't use a per-player thread in the first place. Instead I'd use some kind single threaded event-loop based design. (I didn't look into which Rust libraries are good in that area, but tokio seems popular)