How to safely wrap C pointers in rust structs - rust

I am building safe bindings for a C library in Rust and I started facing a weird issue.
I created a struct to own the unsafe pointer to the objects returned by the library and free them safely.
This is what I have:
pub struct VipsImage {
    pub(crate) ctx: *mut bindings::VipsImage,
}
impl Drop for VipsImage {
    fn drop(&mut self) {
        unsafe {
            if !self.ctx.is_null() {
                bindings::g_object_unref(self.ctx as *mut c_void);
            }
        }
    }
}
This works fine as long as I don't share it between async calls. If I return one of these objects from an async function and use it afterwards, it is corrupted. If I use and free them in a single operation, they work as expected.
How would I implement Send and Sync for a struct like that so I can safely share it between threads?
If someone wants to check the full library code, here's the link
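For illustration only, here is a minimal sketch of what opting into Send looks like for a pointer-owning wrapper. Handle is a hypothetical stand-in (the pointer comes from a Box, not from libvips), and the unsafe impl is only sound under the stated assumption that the underlying object may be used and released from a thread other than the one that created it. Whether that holds for a given C library has to be checked against its documentation.

```rust
use std::thread;

/// Hypothetical stand-in for a wrapper that owns a pointer handed out by a C library.
struct Handle {
    ptr: *mut i32,
}

impl Handle {
    fn new(value: i32) -> Self {
        // In a real binding this pointer would come from the C library.
        Handle { ptr: Box::into_raw(Box::new(value)) }
    }

    fn get(&self) -> i32 {
        // The pointer is valid for as long as the Handle is alive.
        unsafe { *self.ptr }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        if !self.ptr.is_null() {
            // Release the allocation exactly once.
            unsafe { drop(Box::from_raw(self.ptr)) };
        }
    }
}

// SAFETY (assumption): the pointee is uniquely owned by Handle and can be
// accessed and freed from any thread. Raw pointers are !Send by default,
// so this promise is made by the author, not checked by the compiler.
unsafe impl Send for Handle {}

/// Moves a Handle into another thread and reads it there.
fn send_roundtrip(value: i32) -> i32 {
    let handle = Handle::new(value);
    thread::spawn(move || handle.get()).join().unwrap()
}
```

Sync is a stronger promise (shared access from several threads at once), so it would need its own justification. This sketch deliberately leaves it out.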

Related

Rust: two objects referring to each other (borrow_mut pattern)

A similar question I posted earlier is here
Rust can't modify RefCell in Rc, but completely different.
I want to simulate some natural process, so I have a Simulator and a reactor such as a NuclearReactor. The simulator modifies the reactor, and the reactor can in turn influence the simulator by modifying it. One important constraint is that the NuclearReactor is wrapped from somewhere else, so solid_function must take an immutable &self.
So after reading the Rust book's chapter on RefCell, I wrote something like this. It compiles, but crashes at runtime.
use std::borrow::BorrowMut;
use std::cell::RefCell;
use std::rc::{Rc, Weak};
pub struct Simulator {
    nr: NuclearReactor,
    data: Vec<f64>,
}
impl Simulator {
    pub fn on_nuclear_data(&mut self, x: i64) {
        // modify self
    }
    pub fn run_simulation(&mut self) {
        self.nr.write_simulator();
    }
}
pub struct NuclearReactor {
    simulator: Option<Weak<RefCell<Simulator>>>,
}
impl NuclearReactor {
    pub fn solid_function(&self, x: i64) {
        /*
        this function `&self` is solid, so I have to use a RefCell to wrap Simulator
        */
    }
    pub fn write_simulator(&self) {
        /*
        thread 'main' panicked at 'already borrowed: BorrowMutError'
        */
        (*self.simulator.as_ref().unwrap().upgrade().unwrap()).borrow_mut().on_nuclear_data(0);
    }
}
pub fn main() {
    let nr_ = NuclearReactor {
        simulator: None
    };
    let mut sm_ = Rc::new(RefCell::new(Simulator {
        nr: nr_,
        data: vec![],
    }));
    (*sm_).borrow_mut().nr.simulator = Some(Rc::downgrade(&sm_));
    (*sm_).borrow_mut().run_simulation();
}
Obviously, the runtime check of borrow_mut fails.
Actually, the NuclearReactor is my online code and the Simulator is an offline test, so I want to modify the NuclearReactor at minimal cost to let it run in the offline environment. That's why I have to keep solid_function with an immutable &self. Changing it to &mut self and then moving the code that modifies objects into a separate function is feasible, but then I would have to modify the online code frequently, at a high cost. Is there anything cool that can solve it?
Ok, after reading this: http://smallcultfollowing.com/babysteps/blog/2018/11/01/after-nll-interprocedural-conflicts/
I finally realized that what I am doing is something like the code below, and Rust was helping me avoid bugs.
let mut v: Vec<i64> = vec![1,2,3];
for ele in v.iter_mut(){
    // error[E0499]: cannot borrow `v` as mutable more than once at a time
    v.push(1);
}
Thankfully, pushing the NuclearReactor's modifications to a temporary buffer and then applying them to the Simulator is enough to solve my problem.
Also, I didn't explain the question clearly (actually I didn't get the point to describe the question until I solved it), so the community can't help me.
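The buffering idea can be sketched roughly as follows. This is my own minimal reconstruction, not the code from the original project: the pending field, take_pending and new_simulator are hypothetical names, and the reactor simply records the value it would have passed to on_nuclear_data.

```rust
use std::cell::RefCell;

pub struct NuclearReactor {
    // Hypothetical buffer: changes the reactor wants to make to the simulator.
    pending: RefCell<Vec<i64>>,
}

impl NuclearReactor {
    // Still takes `&self`; it only records the change instead of applying it.
    pub fn write_simulator(&self) {
        self.pending.borrow_mut().push(0);
    }

    // Hands the buffered changes over and leaves an empty buffer behind.
    pub fn take_pending(&self) -> Vec<i64> {
        self.pending.take()
    }
}

pub struct Simulator {
    nr: NuclearReactor,
    data: Vec<f64>,
}

impl Simulator {
    pub fn on_nuclear_data(&mut self, x: i64) {
        self.data.push(x as f64);
    }

    pub fn run_simulation(&mut self) {
        self.nr.write_simulator();
        // The reactor is no longer borrowed once take_pending has returned,
        // so the simulator can be mutated freely while applying the changes.
        for x in self.nr.take_pending() {
            self.on_nuclear_data(x);
        }
    }
}

pub fn new_simulator() -> Simulator {
    Simulator {
        nr: NuclearReactor { pending: RefCell::new(Vec::new()) },
        data: Vec::new(),
    }
}
```

With this shape the reactor no longer needs a Weak<RefCell<Simulator>> back-pointer at all, which is what removes the double borrow.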

How do I send read-only data to other threads without copying?

I'm trying to send a "view" of read-only data to another thread for processing. Basically the main thread does work, and continuously updates a set of data. Whenever an update occurs, the main thread should send the updated data down to other threads where they will process it in a read-only manner. I do not want to copy the data as it may be very large. (The main thread also keeps a "cache" of the data in-memory anyway.)
I can achieve this with Arc<RwLock<T>>, where T is my data structure.
However, there is nothing stopping the side threads from updating the data. A side thread can simply call write() and then modify the data.
My question: is there something similar to RwLock where only the owner/creator has write access, and all other instances have read-only access? This way I would get compile-time checking against logic bugs where side threads accidentally update the data.
Regarding these questions:
Sharing read-only object between threads in Rust?
How can I pass a reference to a stack variable to a thread?
The above questions suggest solving it with Arc<Mutex<T>> or Arc<RwLock<T>> which is all fine. But it still doesn't give compile time enforcement of only one writer.
Additionally: crossbeam or rayon's scoped threads don't help here as I want my side threads to outlive my main thread.
You can create a wrapper type over an Arc<RwLock<T>> that only exposes cloning via a read only wrapper:
mod shared {
    use std::sync::{Arc, LockResult, RwLock, RwLockReadGuard, RwLockWriteGuard};
    pub struct Lock<T> {
        inner: Arc<RwLock<T>>,
    }
    impl<T> Lock<T> {
        pub fn new(val: T) -> Self {
            Self {
                inner: Arc::new(RwLock::new(val)),
            }
        }
        pub fn write(&self) -> LockResult<RwLockWriteGuard<'_, T>> {
            self.inner.write()
        }
        pub fn read(&self) -> LockResult<RwLockReadGuard<'_, T>> {
            self.inner.read()
        }
        pub fn read_only(&self) -> ReadOnly<T> {
            ReadOnly {
                inner: self.inner.clone(),
            }
        }
    }
    pub struct ReadOnly<T> {
        inner: Arc<RwLock<T>>,
    }
    impl<T> ReadOnly<T> {
        pub fn read(&self) -> LockResult<RwLockReadGuard<'_, T>> {
            self.inner.read()
        }
    }
}
Now you can pass read only versions of the value to spawned threads, and continue writing in the main thread:
fn main() {
    let val = shared::Lock::new(String::new());
    for _ in 0..10 {
        let view = val.read_only();
        std::thread::spawn(move || {
            // view.write().unwrap().push_str("...");
            // ERROR: no method named `write` found for struct `ReadOnly` in the current scope
            println!("{}", view.read().unwrap());
        });
    }
    val.write().unwrap().push_str("...");
    println!("{}", val.read().unwrap());
}

Circular reference between two structs with high performance

I have two structs Node and Communicator. Node contains "business-logic" while Communicator contains methods for sending and receiving UDP messages. Node needs to call methods on Communicator when it wants to send messages and Communicator needs to call methods on Node when it receives a UDP message. If they are the same struct, there is no problem at all. But I want to separate them because they clearly have different responsibilities. Having it all in one struct would become unmanageable. My code looks as follows:
fn main() {
    use std::sync::{Arc, Mutex, Condvar, Weak};
    use std::thread;
    pub struct Node {
        communicator: Option<Arc<Communicator>>
    }
    impl Node {
        pub fn new() -> Node {
            Node {
                communicator: None
            }
        }
        pub fn set_communicator(&mut self, communicator: Arc<Communicator>) {
            self.communicator = Some(communicator);
        }
    }
    pub struct Communicator {
        node: Option<Weak<Node>>
    }
    impl Communicator {
        pub fn new() -> Communicator {
            Communicator {
                node: None
            }
        }
        pub fn set_node(&mut self, node: Weak<Node>) {
            self.node = Some(node);
        }
    }
    let mut my_node = Arc::new(Node::new());
    let mut my_communicator = Arc::new(Communicator::new());
    Arc::get_mut(&mut my_node).unwrap().set_communicator(Arc::clone(&my_communicator));
    //Arc::get_mut(&mut my_communicator).unwrap().set_node(Arc::downgrade(&my_node));
}
My code crashes, as I predicted, if I uncomment both of the two last lines. But it is something like that I want to achieve.
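As a side note on why the second call panics: Arc::get_mut only returns Some while the Arc is the sole pointer to its value. After Arc::clone(&my_communicator) has been stored in the node, a second strong reference exists, so get_mut yields None and unwrap panics. A minimal sketch of that behaviour (get_mut_availability is just an illustrative helper name):

```rust
use std::sync::Arc;

/// Returns whether `Arc::get_mut` succeeds before and after a clone exists.
fn get_mut_availability() -> (bool, bool) {
    let mut value = Arc::new(0u32);

    // Only one pointer to the value exists, so mutable access is granted.
    let while_unique = Arc::get_mut(&mut value).is_some();

    // A second strong reference is now alive...
    let _other = Arc::clone(&value);

    // ...so mutable access is refused.
    let while_shared = Arc::get_mut(&mut value).is_some();

    (while_unique, while_shared)
}
```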
I see a couple of options:
Use Mutex or RwLock to gain interior mutability. But that gives a performance penalty.
Use Cell or RefCell. But they are not thread-safe.
Use AtomicCell from crossbeam. Looks like the best option so far. But what about performance?
Use unsafe. But where? And how to ensure the code remains memory safe in spite of unsafe?
In theory I could split Communicator into a Sender and a Receiver, so that there would be no cyclical references. But for my program, I know I will have a similar situation in the future where this will not be possible.
It seems that since I just split a struct into two, for code-structuring reasons, and gaining no new functionality, there should be a way to do this without having to pay any performance penalty like Mutex would give.
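One further possibility, not among the options listed above and offered only as a hedged sketch: if the back-reference is set exactly once during setup and never changed afterwards, std::sync::OnceLock (stable since Rust 1.70) gives write-once interior mutability without wrapping the whole struct in a Mutex. The build function and the node accessor below are hypothetical names, and I have not measured how this compares with the other options.

```rust
use std::sync::{Arc, OnceLock, Weak};

pub struct Node {
    communicator: Arc<Communicator>,
}

pub struct Communicator {
    // Written once after both halves exist, read-only afterwards.
    node: OnceLock<Weak<Node>>,
}

impl Communicator {
    /// Returns the owning node, if it has been set and is still alive.
    pub fn node(&self) -> Option<Arc<Node>> {
        self.node.get().and_then(|weak| weak.upgrade())
    }
}

/// Builds the pair and wires up the back-reference.
pub fn build() -> Arc<Node> {
    let communicator = Arc::new(Communicator { node: OnceLock::new() });
    let node = Arc::new(Node { communicator: Arc::clone(&communicator) });

    // `set` takes `&self`, so it works through the shared Arc.
    if communicator.node.set(Arc::downgrade(&node)).is_err() {
        panic!("node was already set");
    }
    node
}
```

The Weak back-pointer keeps the cycle from leaking, at the cost of an upgrade on each access from the Communicator side.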

Drop a Rust void pointer stored in an FFI

I'm wrapping a C API which allows the caller to set/get an arbitrary pointer via function calls. In this way, the C API allows a caller to associate arbitrary data with one of the C API objects. This data is not used in any callbacks, it's just a pointer that a user can stash away and get at later.
My wrapper struct implements the Drop trait for the C object that contains this pointer. What I'd like to be able to do, but am not sure it's possible, is have the data dropped correctly if the pointer is not null when the wrapper struct drops. I'm not sure how I would recover the correct type though from a raw c_void pointer.
Two alternatives I'm thinking of are
Implement the behavior of these two calls in the wrapper. Don't make any calls to the C API.
Don't attempt to offer any kind of safer interface to these functions. Document that the pointer must be managed by the caller of the wrapper.
Is what I want to do possible? If not, is there a generally accepted practice for these kinds of situations?
A naive + fully automatic approach is NOT possible for the following reasons:
freeing memory does not call drop/destructors/...: the C API can be used from languages whose objects must be destructed properly, e.g. C++ or Rust itself. So when you only store a memory pointer, you do not know how to call the proper function (you know neither which function to call nor what its calling convention looks like).
which memory allocator?: memory allocation and deallocation isn't a trivial thing. Your program needs to request memory from the OS and then manage these resources in an intelligent way to be efficient and correct. This is usually done by a library. Rust used jemalloc by default when this was written; since Rust 1.32 the default is the system allocator, and it can still be changed. So even when you ask the API caller to only pass Plain Old Data (which should be easier to destruct), you still don't know which library function to call to deallocate the memory. Just using libc::free won't work reliably (it might, but it could also fail horribly).
Solutions:
dealloc callback: you can ask the API user to set an additional pointer to, let's say, a void destruct(void* ptr) function. If this one is not NULL, you call that function during your drop. You could also use int as a return type to signal when the destruction went wrong. In that case you could, for example, panic!.
global callback: let's assume you requested your user to only pass POD (plain old data). To know which free function of the memory allocator to call, you could request the user to register a global void (*free)(void* ptr) pointer which is called during drop. You could also make that one optional.
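The dealloc-callback idea could look roughly like this on the Rust side. This is a sketch under assumptions, not part of any real C API: UserData, Destructor, drop_boxed and stash_and_drop are hypothetical names, and the pointer is kept in a plain struct instead of being passed through the C library. Payload and the DROPS counter exist only to make the destructor call observable.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical destructor signature registered together with the pointer.
type Destructor = unsafe extern "C" fn(*mut c_void);

/// Hypothetical record: the stashed pointer plus the function that destroys it.
struct UserData {
    ptr: *mut c_void,
    destroy: Option<Destructor>,
}

impl Drop for UserData {
    fn drop(&mut self) {
        if self.ptr.is_null() {
            return;
        }
        // Without a registered destructor the pointer is left alone (leaked).
        if let Some(destroy) = self.destroy {
            unsafe { destroy(self.ptr) };
        }
    }
}

/// Generic destructor for pointers created with `Box::into_raw(Box::new(T))`.
unsafe extern "C" fn drop_boxed<T>(ptr: *mut c_void) {
    unsafe { drop(Box::from_raw(ptr as *mut T)) };
}

// Counts how many payloads have been dropped, for demonstration purposes.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Payload(i32);

impl Drop for Payload {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Stashes a boxed Payload, drops the wrapper, and reports how many
/// Payload destructors ran as a result.
fn stash_and_drop() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    let data = UserData {
        ptr: Box::into_raw(Box::new(Payload(1))) as *mut c_void,
        destroy: Some(drop_boxed::<Payload> as Destructor),
    };
    drop(data);
    DROPS.load(Ordering::SeqCst) - before
}
```

Because the destructor is chosen at the point where the concrete type is still known, the wrapper never has to recover the type from the raw c_void pointer.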
Although I was able to follow the advice in this thread, I wasn't entirely satisfied with my results, so I asked the question on the Rust forums and found the answer I was really looking for. (play)
use std::any::Any;
// Stand-in for the pointer slot that the C API would provide.
static mut FOREIGN_PTR: *mut () = std::ptr::null_mut();
unsafe fn api_set_fp(ptr: *mut ()) {
    FOREIGN_PTR = ptr;
}
unsafe fn api_get_fp() -> *mut () {
    FOREIGN_PTR
}
struct ApiWrapper {}
impl ApiWrapper {
    fn set_foreign<T: Any>(&mut self, value: Box<T>) {
        self.free_foreign();
        unsafe {
            // Box<dyn Any> is a fat pointer, so it is boxed again to get a thin pointer.
            let raw = Box::into_raw(Box::new(value as Box<dyn Any>));
            api_set_fp(raw as *mut ());
        }
    }
    fn get_foreign_ref<T: Any>(&self) -> Option<&T> {
        unsafe {
            let raw = api_get_fp() as *const Box<dyn Any>;
            if !raw.is_null() {
                let b: &Box<dyn Any> = &*raw;
                b.downcast_ref()
            } else {
                None
            }
        }
    }
    fn get_foreign_mut<T: Any>(&mut self) -> Option<&mut T> {
        unsafe {
            let raw = api_get_fp() as *mut Box<dyn Any>;
            if !raw.is_null() {
                let b: &mut Box<dyn Any> = &mut *raw;
                b.downcast_mut()
            } else {
                None
            }
        }
    }
    fn free_foreign(&mut self) {
        unsafe {
            let raw = api_get_fp() as *mut Box<dyn Any>;
            if !raw.is_null() {
                // Dropping the outer Box runs the destructor of the stored value.
                drop(Box::from_raw(raw));
                // Clear the slot so a second call cannot free the value twice.
                api_set_fp(std::ptr::null_mut());
            }
        }
    }
}
impl Drop for ApiWrapper {
    fn drop(&mut self) {
        self.free_foreign();
    }
}
struct MyData {
    i: i32,
}
impl Drop for MyData {
    fn drop(&mut self) {
        println!("Dropping MyData with value {}", self.i);
    }
}
fn main() {
    let p1 = Box::new(MyData {i: 1});
    let mut api = ApiWrapper{};
    api.set_foreign(p1);
    {
        let p2 = api.get_foreign_ref::<MyData>().unwrap();
        println!("i is {}", p2.i);
    }
    api.set_foreign(Box::new("Hello!"));
    {
        let p3 = api.get_foreign_ref::<&'static str>().unwrap();
        println!("payload is {}", p3);
    }
}

Recommended way to wrap C lib initialization/destruction routine

I am writing a wrapper/FFI for a C library that requires a global initialization call in the main thread as well as one for destruction.
Here is how I am currently handling it:
struct App;
impl App {
    fn init() -> Self {
        unsafe { ffi::InitializeMyCLib(); }
        App
    }
}
impl Drop for App {
    fn drop(&mut self) {
        unsafe { ffi::DestroyMyCLib(); }
    }
}
which can be used like:
fn main() {
    let _init_ = App::init();
    // ...
}
This works fine, but it feels like a hack, tying these calls to the lifetime of an unnecessary struct. Having the destructor in a finally (Java) or at_exit (Ruby) block seems theoretically more appropriate.
Is there some more graceful way to do this in Rust?
EDIT
Would it be possible/safe to use this setup like so (using the lazy_static crate), instead of my second block above:
lazy_static! {
    static ref APP: App = App::new();
}
Would this reference be guaranteed to be initialized before any other code and destroyed on exit? Is it bad practice to use lazy_static in a library?
This would also make it easier to facilitate access to the FFI through this one struct, since I wouldn't have to bother passing around the reference to the instantiated struct (called _init_ in my original example).
This would also make it safer in some ways, since I could make the App struct default constructor private.
I know of no way of enforcing that a method be called in the main thread beyond strongly-worded documentation. So, ignoring that requirement... :-)
Generally, I'd use std::sync::Once, which seems basically designed for this case:
A synchronization primitive which can be used to run a one-time global
initialization. Useful for one-time initialization for FFI or related
functionality. This type can only be constructed with the ONCE_INIT
value.
Note that there's no provision for any cleanup; many times you just have to leak whatever the library has done. Usually if a library has a dedicated cleanup path, it has also been structured to store all that initialized data in a type that is then passed into subsequent functions as some kind of context or environment. This would map nicely to Rust types.
Warning
Your current code is not as protective as you hope it is. Since your App is an empty struct, an end-user can construct it without calling your method:
let _init_ = App;
We will use a private zero-sized field to prevent this. See also What's the Rust idiom to define a field pointing to a C opaque pointer? for the proper way to construct opaque types for FFI.
Altogether, I'd use something like this:
use std::sync::Once;
mod ffi {
    extern "C" {
        pub fn InitializeMyCLib();
        pub fn CoolMethod(arg: u8);
    }
}
static C_LIB_INITIALIZED: Once = Once::new();
#[derive(Copy, Clone)]
struct TheLibrary(());
impl TheLibrary {
    fn new() -> Self {
        C_LIB_INITIALIZED.call_once(|| unsafe {
            ffi::InitializeMyCLib();
        });
        TheLibrary(())
    }
    fn cool_method(&self, arg: u8) {
        unsafe { ffi::CoolMethod(arg) }
    }
}
fn main() {
    let lib = TheLibrary::new();
    lib.cool_method(42);
}
I did some digging around to see how other FFI libs handle this situation. Here is what I am currently using (similar to @Shepmaster's answer and based loosely on the initialization routine of curl-rust):
use std::sync::Once;
fn initialize() {
    // Once::new() replaces the deprecated ONCE_INIT constant.
    static INIT: Once = Once::new();
    INIT.call_once(|| unsafe {
        ffi::InitializeMyCLib();
        // Register the cleanup routine to run at process exit.
        assert_eq!(libc::atexit(cleanup), 0);
    });
    extern "C" fn cleanup() {
        unsafe { ffi::DestroyMyCLib(); }
    }
}
I then call this function inside the public constructors for my public structs.

Resources