Passing a safe Rust function pointer to C - rust

I've created Rust bindings to a C library and am currently writing safe wrappers around it.
The question is about C functions that take C function pointers which cannot carry any custom user data.
It's easier to explain with an example.
C Library:
// The function pointer I need to pass,
typedef void (*t_function_pointer_library_wants)(const char *argument);
// The function which I need to pass the function pointer to,
void register_hook(const t_function_pointer_library_wants hook);
Bindings:
// For the function pointer type
pub type t_function_pointer_library_wants = ::std::option::Option<unsafe extern "C" fn(argument: *const ::std::os::raw::c_char)>;
// For the function which accepts the pointer
extern "C" {
pub fn register_hook(hook: t_function_pointer_library_wants);
}
It would have been very nice if I could expose an api to the user like the following,
// Let's assume my safe wrapper is named on_something
// ..
on_something(|argument|{
// Do something with the argument..
});
// ..
although, according to the sources below, the inability to hand over to C the management of the memory that would store my closure's state prevents me from creating this sort of API, because a C function pointer is stateless and does not accept any user data. (Please correct me if I'm wrong.)
I've come to this conclusion by reading these sources and similar ones:
Trampoline Technique
Similar Trampoline Technique
Hacky Thread Local Technique
Sources in Shepmaster's answer
As a fallback, I can maybe imagine an API like this, where I pass a function pointer instead:
fn handler(argument: &str) {
// Do something with the argument..
}
//..
on_something(handler);
//..
But I am a little confused about converting an fn(&str)
to an unsafe extern "C" fn(argument: *const std::os::raw::c_char).
I'd be very glad if you could point me in the right direction.
* The actual library in focus is libpd, and there is an issue I've created related to this.
Thanks a lot.

First off, this is a pretty hard problem to solve. Obviously, you need some way to pass data into a function outside of its arguments. However, pretty much any method of doing that via a static could easily result in race conditions or worse, depending on what the C library does and how the library is used. The other option is to JIT some glue code that calls your closure. At first glance, that seems even worse, but libffi abstracts most of that away. A wrapper using the libffi crate would look like this:
use std::ffi::CStr;
use std::os::raw::c_char;
use libffi::high::Closure1;

fn on_something<F: Fn(&str) + Send + Sync + 'static>(f: F) {
    let closure: &'static _ = Box::leak(Box::new(move |string: *const c_char| {
        let string = unsafe { CStr::from_ptr(string).to_str().unwrap() };
        f(string);
    }));
    let callback = Closure1::new(closure);
    let &code = callback.code_ptr();
    let ptr: unsafe extern "C" fn(*const c_char) = unsafe { std::mem::transmute(code) };
    std::mem::forget(callback);
    unsafe { register_hook(Some(ptr)) };
}
I don't have a playground link, but it worked fine when I tested it locally. There are two important things to note with this code:
It's maximally pessimistic about what the C code does, assuming the function is repeatedly called from multiple threads for the entire duration of the program. You may be able to get away with fewer restrictions, depending on what libpd does.
It leaks memory to ensure the callback is valid for the life of the program. This is probably fine since callbacks are typically only set once. There is no way to safely recover this memory without keeping around a pointer to the callback that was registered.
It's also worth noting that the libffi::high::ClosureMutN structs are unsound, as they permit aliasing mutable references to the wrapped closure. There is a PR to fix that awaiting merge, though.

Related

Deref Mutex/RwLock to inner object with implicit locking

I have a struct Obj with a large API that I would also like to use through an Arc<RwLock<Obj>>. I have defined a struct ObjRef(Arc<RwLock<Obj>>) and want to call functions of Obj on ObjRef without needing to call .read().unwrap() or .write().unwrap() every time. (ObjRef implements Deref<Target=Arc<RwLock<Obj>>>, so I only have to call objRef.read().unwrap())
I could implement all of the functions of Obj again for ObjRef in a manner like this:
impl ObjRef {
    fn foo(&self, arg: Arg) -> Ret {
        self.read().unwrap().foo(arg)
    }
}
But doing this for every function results in a lot of boilerplate.
Implementing Deref<Target=Obj> for ObjRef does not work, because the deref implementation would return a reference into the RwLockReadGuard returned by .read().unwrap(), which would be dropped after deref:
impl Deref for ObjRef {
    type Target = Obj;
    fn deref(&self) -> &Self::Target {
        let guard: RwLockReadGuard<'_, Obj> = self.0.read().unwrap();
        &*guard // <- return reference to Obj referenced by guard
        // guard is dropped now, reference lifetime ends
    }
}
Is there a way to call the Obj Api through a low boilerplate implementation doing the locking internally?
I am thinking about introducing a trait for the Api with a default implementation and a single function, to be implemented by users of the trait, for getting a generic Deref<Target=Obj>, but this puts the Api under some limitations; for example, using generic parameters becomes much more complicated.
Is there maybe a better way, something that lets me use any Obj through a lock without explicitly calling the locking mechanism every time?
It is not possible to do this through Deref, because Deref::deref always returns a plain reference — the validity of it must only depend on the argument continuing to exist (which the borrow checker understands), not on any additional state such as "locked".
Also, adding locking generically to an interface designed without it is dangerous, because it may accidentally cause deadlocks, or allow unwanted outcomes due to doing two operations in sequence that should have been done using a single locking.
I recommend considering making your type internally an Arc<RwLock<...>>, that is,
pub struct Obj {
    lock: Arc<RwLock<ObjData>>,
}

// internal helper, not public
struct ObjData {
    // whatever fields Obj would have had go here
}
Then, you can define your methods once, directly on Obj, and the caller never needs to deal with the lock explicitly, but your methods can handle it exactly as needed. This is a fairly common pattern in problem domains that benefit from it, like UI objects.
If there is some reason why Obj really needs to be usable with or without the lock, then define a trait.

How do I use a serial number in rust?

I'm looking for something like this (below is C)
void MyFunction() {
    //Do work
    static int serial;
    ++serial;
    printf("Function run #%d\n", serial);
    //Do more work
}
As Ice Giant mentions in their answer, you can use static mut to solve this. However, it is unsafe for a good reason: if you use it, you are responsible for making sure that the code is thread safe.
It's very easy to misuse and introduce unsound code and undefined behavior into your Rust program, so what I would recommend instead is to use a type that provides safe interior mutability, such as Mutex, RwLock, or in the case of integers, atomics like AtomicU32:
use std::sync::atomic::{AtomicU32, Ordering};
fn my_function() {
    // NOTE: AtomicU32 has safe interior mutability, so we don't need `mut` here
    static SERIAL: AtomicU32 = AtomicU32::new(0);
    // fetch and increment SERIAL in a single atomic operation
    let serial = SERIAL.fetch_add(1, Ordering::Relaxed);
    // do something with `serial`...
    println!("function run #{}", serial);
}
Playground example
Because this program does not contain any unsafe code, we can be certain that the program does not contain the same data race bugs that can be caused by naïve use of static mut (or the equivalent use of static in the C code).

Initializing a struct field-by-field. Is it possible to know if all the fields were initialized?

I'm following the example from the official documentation. I'll copy the code here for simplicity:
#[derive(Debug, PartialEq)]
pub struct Foo {
    name: String,
    list: Vec<u8>,
}

let foo = {
    let mut uninit: MaybeUninit<Foo> = MaybeUninit::uninit();
    let ptr = uninit.as_mut_ptr();
    // Initializing the `name` field
    // Using `write` instead of assignment via `=` to not call `drop` on the
    // old, uninitialized value.
    unsafe { addr_of_mut!((*ptr).name).write("Bob".to_string()); }
    // Initializing the `list` field
    // If there is a panic here, then the `String` in the `name` field leaks.
    unsafe { addr_of_mut!((*ptr).list).write(vec![0, 1, 2]); }
    // All the fields are initialized, so we call `assume_init` to get an initialized Foo.
    unsafe { uninit.assume_init() }
};
What bothers me is the second unsafe comment: If there is a panic here, then the String in the name field leaks. This is exactly what I want to avoid. I modified the example so now it reflects my concerns:
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

#[derive(Debug, PartialEq)]
pub struct Foo {
    name: String,
    list: Vec<u8>,
}

#[allow(dead_code)]
fn main() {
    let mut uninit: MaybeUninit<Foo> = MaybeUninit::uninit();
    let ptr = uninit.as_mut_ptr();
    init_foo(ptr);
    // this is wrong because it tries to read the uninitialized field
    // I could avoid this call if the function `init_foo` returned a `Result`
    // but I'd like to know which fields are initialized so I can clean up
    let _foo = unsafe { uninit.assume_init() };
}

fn init_foo(foo_ptr: *mut Foo) {
    unsafe { addr_of_mut!((*foo_ptr).name).write("Bob".to_string()); }
    // something happened and the `list` field was left uninitialized
    return;
}
The code builds and runs. But using MIRI I see the error:
Undefined Behavior: type validation failed at .value.list.buf.ptr.pointer: encountered uninitialized raw pointer
The question is how I can figure out which fields are initialized and which are not? Sure, I could return a result with the list of field names or similar, for example. But I don't want to do it - my struct can have dozens of fields, it changes over time and I'm too lazy to maintain an enum that should reflect the fields. Ideally I'd like to have something like this:
if addr_initialized!((*ptr).name) {
    clean(addr_of_mut!((*ptr).name));
}
Update: Here's an example of what I want to achieve. I'm doing some Vulkan programming (with ash crate, but that's not important). I want to create a struct that holds all the necessary objects, like Device, Instance, Surface, etc.:
struct VulkanData {
    pub instance: Instance,
    pub device: Device,
    pub surface: Surface,
    // 100500 other fields
}

fn init() -> Result<VulkanData, Error> {
    // let vulkan_data = VulkanData{}; // can't do that because some fields are not default constructible.
    let instance = create_instance(); // can fail
    let device = create_device(instance); // can fail; in this case instance has to be destroyed
    let surface = create_surface(device); // can fail; in this case instance and device have to be destroyed
    // other initialization routines
    VulkanData { instance, device, surface, ... }
}
As you can see, for every such object there's a corresponding create_x function, which can fail. Obviously, if I fail in the middle of the process, I don't want to proceed. But I do want to clean up the already created objects. As you mentioned, I could create a wrapper, but it's very tedious work to create wrappers for hundreds of types, and I absolutely want to avoid this (btw, ash is already a wrapper over C types). Moreover, because of the asynchronous nature of CPU-GPU communication, sometimes it makes no sense to drop an object immediately; doing so can lead to errors. Instead, some form of signal should come from the GPU indicating that an object is safe to destroy. That's the main reason why I can't implement Drop for the wrappers.
But as soon as the struct is successfully initialized, I know that it's safe to read any of its fields. That's why I don't want to use an Option - it adds some overhead and makes no sense in my particular example.
All that is trivially achievable in C++ - create an uninitialized struct (well, by default all Vulkan objects are initialized with VK_NULL_HANDLE), start to fill it field-by-field, if something went wrong just destroy the objects that are not null.
There is no general purpose way to tell if something is initialized or not. Miri can detect this because it adds a lot of instrumentation and overhead to track memory operations.
All that is trivially achievable in C++ - create an uninitialized struct (well, by default all Vulkan objects are initialized with VK_NULL_HANDLE), start to fill it field-by-field, if something went wrong just destroy the objects that are not null.
You could theoretically do the same in Rust, however this is quite unsafe and makes a lot of assumptions about the construction of the ash types.
If the functions didn't depend on each other, I might suggest something like this:
let instance = create_instance();
let device = create_device();
let surface = create_surface();
match (instance, device, surface) {
    (Ok(instance), Ok(device), Ok(surface)) => {
        Ok(VulkanData {
            instance,
            device,
            surface,
        })
    }
    (instance, device, surface) => {
        // clean up the `Ok` ones and return some error
    }
}
However, your functions are dependent on others succeeding (e.g. need the Instance to create a Device) and this also has the disadvantage that it would keep creating values when one already failed.
Creating wrappers with custom drop behavior is the most robust way to accomplish this. There is the vulkano crate that is built on top of ash that does this among other things. But if that's not to your liking you can use something like scopeguard to encapsulate drop logic on the fly.
use scopeguard::{guard, ScopeGuard}; // 1.1.0

fn init() -> Result<VulkanData, Error> {
    let instance = guard(create_instance()?, destroy_instance);
    let device = guard(create_device(&instance)?, destroy_device);
    let surface = guard(create_surface(&device)?, destroy_surface);
    Ok(VulkanData {
        // use `into_inner` to escape the drop behavior
        instance: ScopeGuard::into_inner(instance),
        device: ScopeGuard::into_inner(device),
        surface: ScopeGuard::into_inner(surface),
    })
}
See a full example on the playground. No unsafe required.
I believe MaybeUninit is designed for the cases when you have all the information about its contents and can make the code safe "by hand".
If you need to figure out in runtime if a field has a value, then use Option<T>.
Per the documentation:
You can think of MaybeUninit<T> as being a bit like Option<T> but without any of the run-time tracking and without any of the safety checks.

Why doesn't Rayon require Arc<_>?

On page 465 of Programming Rust you can find the code and explanation (emphasis added by me)
use std::sync::Arc;
fn process_files_in_parallel(filenames: Vec<String>,
glossary: Arc<GigabyteMap>)
-> io::Result<()>
{
...
for worklist in worklists {
// This call to .clone() only clones the Arc and bumps the
// reference count. It does not clone the GigabyteMap.
let glossary_for_child = glossary.clone();
thread_handles.push(
spawn(move || process_files(worklist, &glossary_for_child))
);
}
...
}
We have changed the type of glossary: to run the analysis in parallel, the caller must pass in an Arc<GigabyteMap>, a smart pointer to a GigabyteMap that’s been moved into the heap, by doing Arc::new(giga_map). When we call glossary.clone(), we are making a copy of the Arc smart pointer, not the whole GigabyteMap. This amounts to incrementing a reference count. With this change, the program compiles and runs, because it no longer depends on reference lifetimes. As long as any thread owns an Arc<GigabyteMap>, it will keep the map alive, even if the parent thread bails out early. There won’t be any data races, because data in an Arc is immutable.
In the next section they show this rewritten with Rayon,
extern crate rayon;
use rayon::prelude::*;

fn process_files_in_parallel(filenames: Vec<String>, glossary: &GigabyteMap)
    -> io::Result<()>
{
    filenames.par_iter()
        .map(|filename| process_file(filename, glossary))
        .reduce_with(|r1, r2| {
            if r1.is_err() { r1 } else { r2 }
        })
        .unwrap_or(Ok(()))
}
You can see in the section rewritten to use Rayon that it accepts &GigabyteMap rather than Arc<GigabyteMap>. They don't explain how this works though. Why doesn't Rayon require Arc<GigabyteMap>? How does Rayon get away with accepting a direct reference?
Rayon can guarantee that the iterator does not outlive the current stack frame, unlike what I assume is thread::spawn in the first code example. Specifically, par_iter under the hood uses something like Rayon's scope function, which allows one to spawn a unit of work that's "attached" to the stack and will join before the stack ends.
Because Rayon can guarantee (via lifetime bounds, from the user's perspective) that the tasks/threads are joined before the function calling par_iter exits, it can provide this API which is more ergonomic to use than the standard library's thread::spawn.
Rayon expands on this in the scope function's documentation.

How to return a Rust closure to JavaScript via WebAssembly?

The comments on closure.rs are pretty great, however I can't make it work for returning a closure from a WebAssembly library.
I have a function like this:
#[wasm_bindgen]
pub fn start_game(
    start_time: f64,
    screen_width: f32,
    screen_height: f32,
    on_render: &js_sys::Function,
    on_collision: &js_sys::Function,
) -> ClosureTypeHere {
    // ...
}
Inside that function I make a closure (assuming Closure::wrap is one piece of the puzzle, and copying from closure.rs):
let cb = Closure::wrap(Box::new(move |time| time * 4.2) as Box<FnMut(f64) -> f64>);
How do I return this callback from start_game and what should ClosureTypeHere be?
The idea is that start_game will create local mutable objects - like a camera, and the JavaScript side should be able to call the function Rust returns in order to update that camera.
This is a good question, and one that has some nuance too! It's worth calling out the closures example in the wasm-bindgen guide (and the section about passing closures to JavaScript) as well, and it'd be good to contribute back to that as well if necessary!
To get you started, though, you can do something like this:
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn start_game(
    start_time: f64,
    screen_width: f32,
    screen_height: f32,
    on_render: &js_sys::Function,
    on_collision: &js_sys::Function,
) -> JsValue {
    let cb = Closure::wrap(Box::new(move |time| {
        time * 4.2
    }) as Box<FnMut(f64) -> f64>);
    // Extract the `JsValue` from this `Closure`, the handle
    // on a JS function representing the closure
    let ret = cb.as_ref().clone();
    // Once `cb` is dropped it'll "neuter" the closure and
    // cause invocations to throw a JS exception. Memory
    // management here will come later, so just leak it
    // for now.
    cb.forget();
    return ret;
}
Above, the return value is just a plain-old JS object (here as a JsValue), and we create it with the Closure type you've seen already. This will allow you to quickly return a closure to JS, and you'll be able to call it from JS as well.
You've also asked about storing mutable objects and such, and that can all be done through normal Rust closures, capturing, etc. For example the declaration of FnMut(f64) -> f64 above is the signature of the JS function, and that can be any set of types such as FnMut(String, MyCustomWasmBindgenType, f64) -> Vec<u8> if you really want. For capturing local objects you can do:
let mut camera = Camera::new();
let mut state = State::new();
let cb = Closure::wrap(Box::new(move |arg1, arg2| { // note the `move`
    if arg1 {
        camera.update(&arg2);
    } else {
        state.update(&arg2);
    }
}) as Box<_>);
(or something like that)
Here the camera and state variables will be owned by the closure and dropped at the same time. More info about just closures can be found in the Rust book.
It's also worth briefly covering the memory management aspect here. In the example above we're calling forget(), which leaks memory and can be a problem if the Rust function is called many times (as it would leak a lot of memory). The fundamental problem is that there's memory allocated on the WASM heap which the created JS function object references. This allocated memory in theory needs to be deallocated whenever the JS function object is GC'd, but we have no way of knowing when that happens (until WeakRef exists!).
In the meantime we've chosen an alternate strategy: the associated memory is deallocated whenever the Closure type itself is dropped, providing deterministic destruction. This, however, makes it difficult to work with, as we need to figure out manually when to drop the Closure. If forget doesn't work for your use case, some ideas for dropping the Closure are:
First, if it's a JS closure only invoked once, then you can use Rc/RefCell to drop the Closure inside the closure itself (using some interior mutability shenanigans). We should also eventually provide native support for FnOnce in wasm-bindgen as well!
Next, you can return an auxiliary JS object to Rust which has a manual free method, for example a #[wasm_bindgen]-annotated wrapper. This wrapper would then need to be manually freed in JS when appropriate.
If you can get by, forget is by far the easiest thing to do for now, but this is definitely a pain point! We can't wait for WeakRef to exist :)
As far as I understand from the documentation, you aren't supposed to export Rust closures; they may only be passed as parameters to imported JS functions, but all of this happens in Rust code.
https://rustwasm.github.io/wasm-bindgen/reference/passing-rust-closures-to-js.html#passing-rust-closures-to-imported-javascript-functions
I made a couple of experiments, and when a Rust function returns the mentioned Closure type, the compiler reports an error: the trait wasm_bindgen::convert::IntoWasmAbi is not implemented for wasm_bindgen::prelude::Closure<(dyn std::ops::FnMut() -> u32 + 'static)>
In all examples, closures are wrapped in an arbitrary struct, but after that you can no longer call them on the JS side.