How can I create "C Blocks" when using FFI? - rust

I'm working with the CoreFoundation framework on OS X, but I don't know how to map this function in Rust:
void CFRunLoopPerformBlock(CFRunLoopRef fl, CFTypeRef mode, void (^block)(void));
The last parameter is void(^block)(void) — how can I create arguments of this type?

Short, probably helpful answer: there's the block crate, which looks like it might do the job.
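For completeness, here is a minimal sketch of what using that crate might look like; this is an addition here, untested, and assumes the crate's ConcreteBlock/RcBlock API:

extern crate block;

use block::ConcreteBlock;

fn make_block() {
    // Build a block from a Rust closure.
    let block = ConcreteBlock::new(|| println!("hello from a block"));
    // Copy it to the heap so it can outlive the current stack frame.
    let block = block.copy();
    // `&*block` is then what you would hand to an API that expects `void (^)(void)`.
    let _ = &*block;
}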
Short, unhelpful answer: Insofar as I am aware, Rust doesn't have any support for Apple's block extension. There is no equivalent Rust type, assuming you want to call an API that expects a block.
Longer, marginally less unhelpful answer: From what I can gather from some Clang documentation on the Apple Block ABI, void(^)(void) would be the same size as a regular pointer.
As such, my advice is as follows: treat blocks as opaque, pointer-sized values. To invoke one, write a function in C which calls it for you.
The following is untested (I don't have a Mac), but should at least get you going in the right direction. Also, I'm marking this community wiki so anyone who can test it can fix it if need be.
In Rust:
// These are the "raw" representations involved. I'm not using std::raw
// because that's not yet stabilised.
use std::mem;

#[derive(Copy, Clone)]
#[repr(C)]
struct AppleBlock(*const ());

#[derive(Copy, Clone)]
#[repr(C)]
struct RustClosure(*const (), *const ());

// Functions that we need to be written in C:
extern "C" {
    fn rust_closure_to_block(closure_blob: RustClosure) -> AppleBlock;
    fn block_release(block_blob: AppleBlock);
}

// The function that the C code will need. Note that this is *specific* to
// FnMut() closures. If you wanted to generalise this, you could write a
// generic version and pass a pointer to that to `rust_closure_to_block`.
extern "C" fn call_rust_closure(closure_blob: RustClosure) {
    // Reconstitute the fat `&mut dyn FnMut()` reference from the two raw pointers.
    let closure_ref: &mut dyn FnMut() = unsafe { mem::transmute(closure_blob) };
    closure_ref();
}

// This is what you call in order to *temporarily* turn a closure into a
// block. So, you'd use it as:
//
//     with_closure_as_block(
//         || do_stuff(),
//         |block| CFRunLoopPerformBlock(fl, mode, block),
//     );
fn with_closure_as_block<C, B, R>(mut closure: C, body: B) -> R
where C: FnMut(), B: FnOnce(AppleBlock) -> R {
    let closure_ref: &mut dyn FnMut() = &mut closure;
    let closure_blob: RustClosure = unsafe { mem::transmute(closure_ref) };
    let block_blob = unsafe { rust_closure_to_block(closure_blob) };
    let r = body(block_blob);
    unsafe { block_release(block_blob) };
    r
}
In C:
#include <Block.h>

typedef struct AppleBlock {
    void *ptr;
} AppleBlock;

typedef struct RustClosure {
    void *ptr;
    void *vt;
} RustClosure;

// Implemented on the Rust side.
void call_rust_closure(RustClosure closure_blob);

AppleBlock rust_closure_to_block(RustClosure closure_blob) {
    AppleBlock block = { (void *)Block_copy(^{
        call_rust_closure(closure_blob);
    }) };
    return block;
}

// I'm not using Block_release directly because I don't know if or how
// blocks change name mangling or calling. You might be able to just
// use Block_release directly from Rust.
void block_release(AppleBlock block) {
    Block_release(block.ptr);
}
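One practical detail worth noting: the C shim has to be compiled with blocks support and linked into the crate. A hypothetical build.rs sketch using the cc crate (the file name block_shim.c and the build-dependency on cc are assumptions here; on non-Apple platforms you would also need a blocks runtime such as libBlocksRuntime):

// build.rs
fn main() {
    cc::Build::new()
        .file("src/block_shim.c")
        .flag_if_supported("-fblocks") // Apple clang enables blocks by default
        .compile("block_shim");
}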


Invoke libc::c_void-Pointer as function in Rust with parameters [duplicate]

Hello people of the internet,
I'm struggling to invoke a function that is stored in a libc::c_void pointer. I can't tell Rust that the pointer is callable, and I can't figure out how to do it.
I want to translate this C++ Code
void *malloc(size_t size) {
    static void *(*real_malloc)(size_t) = nullptr;
    if (real_malloc == nullptr) {
        real_malloc = reinterpret_cast<void *(*)(size_t)>(dlsym(RTLD_NEXT, "malloc"));
    }
    // do some logging stuff
    void *ptr = real_malloc(size);
    return ptr;
}
to Rust.
#[no_mangle]
pub extern fn malloc(bytes: usize) {
    let c_string = "malloc\0".as_mut_ptr() as *mut i8; // char array for libc
    let real_malloc: *mut libc::c_void = libc::dlsym(libc::RTLD_NEXT, c_string);
    return real_malloc(bytes);
}
That's my progress so far after an hour of searching the internet and experimenting. I'm new to Rust and not yet familiar with Rust FFI or using libc from Rust. I tried a lot with unsafe {} and casts with as, but I always got stuck at the following problem:
return real_malloc(bytes);
^^^^^^^^^^^^^^^^^^ expected (), found *-ptr
Q1: How can I call the function behind the void-Pointer stored in real_malloc?
Q2: Is my Rust-String to C-String conversion feasible this way?
I figured it out! Perhaps there is a better way, but it works.
The trick is to "cast" the void pointer to a C function type with std::mem::transmute, since it won't work with as:
// The real malloc is a C function, so declare the pointer with the C ABI.
type LibCMallocT = extern "C" fn(usize) -> *mut libc::c_void;

// C-style string for the symbol name
let c_string = "malloc\0".as_ptr() as *const i8; // char array for libc
// Void pointer to the address of the symbol
let real_malloc_addr: *mut libc::c_void = unsafe { libc::dlsym(libc::RTLD_NEXT, c_string) };
// transmute: "Reinterprets the bits of a value of one type as another type"
// Transform the void-pointer type into a callable C function
let real_malloc: LibCMallocT = unsafe { std::mem::transmute(real_malloc_addr) };
When the shared object is built, one can verify that it works like this:
LD_PRELOAD=./target/debug/libmalloc_log_lib.so some-binary
My full code:
extern crate libc;

use std::io::Write;

const MSG: &str = "HELLO WORLD\n";

// The real malloc is a C function, so declare the pointer with the C ABI.
type LibCMallocT = extern "C" fn(usize) -> *mut libc::c_void;

#[no_mangle] // "malloc" is then the symbol name, so ELF files can find it (if this lib is preloaded)
pub extern fn malloc(bytes: usize) -> *mut libc::c_void {
    // Disable logging, i.e. immediately return the pointer from the real malloc (libc malloc)
    static mut RETURN_IMMEDIATELY: bool = false;

    // C-style string for the symbol name
    let c_string = "malloc\0".as_ptr() as *const i8; // char array for libc
    // Void pointer to the address of the symbol
    let real_malloc_addr: *mut libc::c_void = unsafe { libc::dlsym(libc::RTLD_NEXT, c_string) };
    // transmute: "Reinterprets the bits of a value of one type as another type"
    // Transform the void-pointer type into a callable C function
    let real_malloc: LibCMallocT = unsafe { std::mem::transmute(real_malloc_addr) };

    unsafe {
        if !RETURN_IMMEDIATELY {
            // let's do logging and other stuff that potentially
            // needs malloc() itself

            // This variable prevents infinite loops, because 'std::io::stdout().write_all'
            // also uses malloc itself
            // TODO: Do proper synchronisation
            // (lock whole method? thread_local variable?)
            RETURN_IMMEDIATELY = true;
            match std::io::stdout().write_all(MSG.as_bytes()) {
                _ => ()
            };
            RETURN_IMMEDIATELY = false
        }
    }

    (real_malloc)(bytes)
}
PS: Thanks to https://stackoverflow.com/a/46134764/2891595 (After I googled a lot more I found the trick with transmute!)
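A side note on Q2: appending "\0" by hand works, but the more robust conversion goes through std::ffi::CString, which guarantees the terminating NUL and rejects interior NUL bytes. A small sketch (an addition here, not part of the original answer):

use std::ffi::CString;

let symbol = CString::new("malloc").expect("no interior NUL bytes");
let real_malloc_addr = unsafe { libc::dlsym(libc::RTLD_NEXT, symbol.as_ptr()) };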

How do I keep internal state in a WebAssembly module written in Rust?

I want to do computations on a large set of data each frame of my web app. Only a subset of this will be used by JavaScript, so instead of sending the entire set of data back and forth between WebAssembly and JavaScript each frame, it would be nice if the data was maintained internally in my WebAssembly module.
In C, something like this works:
#include <emscripten/emscripten.h>

int state = 0;

void EMSCRIPTEN_KEEPALIVE inc() {
    state++;
}

int EMSCRIPTEN_KEEPALIVE get() {
    return state;
}
Is the same thing possible in Rust? I tried doing it with a static like this:
static mut state: i32 = 0;

pub fn main() {}

#[no_mangle]
pub fn add() {
    state += 1;
}

#[no_mangle]
pub fn get() -> i32 {
    state
}
But it seems static variables cannot be mutable.
Francis Gagné is absolutely correct that global variables generally make your code worse and you should avoid them.
However, for the specific case of WebAssembly as it is today, one concern from that answer does not apply: "if you have multiple threads". WebAssembly currently runs single-threaded.
We can thus choose to use mutable static variables, if we have a very good reason to do so:
// Only valid because we are using this in a WebAssembly
// context without threads.
static mut STATE: i32 = 0;

#[no_mangle]
pub extern fn add() {
    unsafe { STATE += 1 };
}

#[no_mangle]
pub extern fn get() -> i32 {
    unsafe { STATE }
}
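As an aside (not part of the original answer): if you would rather avoid static mut entirely, a sketch using an atomic keeps the same exported interface without any unsafe blocks, and behaves identically from JavaScript:

use std::sync::atomic::{AtomicI32, Ordering};

static STATE: AtomicI32 = AtomicI32::new(0);

#[no_mangle]
pub extern fn add() {
    STATE.fetch_add(1, Ordering::SeqCst);
}

#[no_mangle]
pub extern fn get() -> i32 {
    STATE.load(Ordering::SeqCst)
}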
We can see the behavior with this NodeJS driver program:
const fs = require('fs-extra');

fs.readFile(__dirname + '/target/wasm32-unknown-unknown/release/state.wasm')
    .then(bytes => WebAssembly.instantiate(bytes))
    .then(({ module, instance }) => {
        const { get, add } = instance.exports;

        console.log(get());
        add();
        add();
        console.log(get());
    });
0
2
The compiler error for the code in the question is actually more specific than "static variables cannot be mutable":
error[E0133]: use of mutable static requires unsafe function or block
In general, accessing mutable global variables is unsafe, which means that you can only do it in an unsafe block. With mutable global variables, it's easy to accidentally create dangling references (think of a reference to an item of a global mutable Vec), data races (if you have multiple threads – Rust doesn't care that you don't actually use threads) or otherwise invoke undefined behavior.
Global variables are usually not the best solution to a problem because it makes your software less flexible and less reusable. Instead, consider passing the state explicitly (by reference, so you don't need to copy it) to the functions that need to operate on it. This lets the calling code work with multiple independent states.
Here's an example of allocating unique state and modifying that:
type State = i32;

#[no_mangle]
pub extern fn new() -> *mut State {
    Box::into_raw(Box::new(0))
}

#[no_mangle]
pub extern fn free(state: *mut State) {
    // Reconstituting the Box and dropping it deallocates the state.
    unsafe { Box::from_raw(state) };
}

#[no_mangle]
pub extern fn add(state: *mut State) {
    unsafe { *state += 1 };
}

#[no_mangle]
pub extern fn get(state: *mut State) -> i32 {
    unsafe { *state }
}
const fs = require('fs-extra');

fs.readFile(__dirname + '/target/wasm32-unknown-unknown/release/state.wasm')
    .then(bytes => WebAssembly.instantiate(bytes))
    .then(({ module, instance }) => {
        const { new: newFn, free, get, add } = instance.exports;

        const state1 = newFn();
        const state2 = newFn();

        add(state1);
        add(state2);
        add(state1);

        console.log(get(state1));
        console.log(get(state2));

        free(state1);
        free(state2);
    });
2
1
Note — This currently needs to be compiled in release mode to work. Debugging mode has some issues at the moment.
Admittedly, this is not less unsafe because you're passing raw pointers around, but it makes it clearer in the calling code that there is some mutable state being manipulated. Also note that it is now the responsibility of the caller to ensure that the state pointer is being handled correctly.

Convenient 'Option<Box<Any>>' access when success is assured?

When writing callbacks for generic interfaces, it can be useful for them to define their own local data which they are responsible for creating and accessing.
In C I would just use a void pointer, C-like example:
struct SomeTool {
    int type;
    void *custom_data;
};

void invoke(SomeTool *tool) {
    StructOnlyForThisTool *data = malloc(sizeof(*data));
    /* ... fill in the data ... */
    tool->custom_data = data;
}

void execute(SomeTool *tool) {
    StructOnlyForThisTool *data = tool->custom_data;
    if (data->foo_bar) { /* do something */ }
}
When writing something similar in Rust, replacing void * with Option<Box<Any>>, I'm finding that accessing the data is unreasonably verbose, e.g.:
struct SomeTool {
    type_: i32, // `type` is a keyword in Rust
    custom_data: Option<Box<Any>>,
}

fn invoke(tool: &mut SomeTool) {
    let data = StructOnlyForThisTool { /* my custom data */ };
    /* ... fill in the data ... */
    tool.custom_data = Some(Box::new(data));
}

fn execute(tool: &mut SomeTool) {
    let data = tool.custom_data.as_ref().unwrap().downcast_ref::<StructOnlyForThisTool>().unwrap();
    if data.foo_bar { /* do something */ }
}
There is one line here which I'd like to be able to write in a more compact way:
tool.custom_data.as_ref().unwrap().downcast_ref::<StructOnlyForThisTool>().unwrap()
tool.custom_data.as_ref().unwrap().downcast_mut::<StructOnlyForThisTool>().unwrap()
While each method makes sense on its own, in practice it's not something I'd want to write throughout a code-base, and not something I'm going to want to type out often or remember easily.
By convention, the uses of unwrap here aren't dangerous because:
While only some tools define custom data, the ones that do always define it.
When the data is set, by convention the tool only ever sets its own data. So there is no chance of having the wrong data.
Any time these conventions aren't followed, it's a bug and should panic.
Given these conventions, and assuming accessing custom-data from a tool is something that's done often - what would be a good way to simplify this expression?
Some possible options:
Remove the Option and just use Box<Any>, with Box::new(()) representing None, so access can be simplified a little.
Use a macro or function to hide the verbosity by passing in the Option<Box<Any>>: this will work of course, but I'd prefer not to and would use it as a last resort.
Add a trait to Option<Box<Any>> which exposes a method such as tool.custom_data.unwrap_box::<StructOnlyForThisTool>(), with a matching unwrap_box_mut (see the sketch below).
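As a rough sketch of that third option (the trait and method names here are made up, not an existing API):

use std::any::Any;

trait UnwrapBox {
    fn unwrap_box<T: Any>(&self) -> &T;
    fn unwrap_box_mut<T: Any>(&mut self) -> &mut T;
}

impl UnwrapBox for Option<Box<dyn Any>> {
    fn unwrap_box<T: Any>(&self) -> &T {
        self.as_ref().unwrap().downcast_ref::<T>().unwrap()
    }
    fn unwrap_box_mut<T: Any>(&mut self) -> &mut T {
        self.as_mut().unwrap().downcast_mut::<T>().unwrap()
    }
}

// Usage: let data = tool.custom_data.unwrap_box::<StructOnlyForThisTool>();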
Update 1: since asking this question, a point I didn't include seems relevant.
There may be multiple callback functions like execute which must all be able to access the custom_data. At the time I didn't think this was important to point out.
Update 2: wrapping this in a function which takes tool isn't practical, since the borrow checker then prevents further access to members of tool until the cast variable goes out of scope. I found the only reliable way to do this was to write a macro, roughly like the one sketched below.
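A rough sketch of such a macro (the name is invented here; because it expands in place, only the custom_data field is borrowed rather than the whole tool):

macro_rules! tool_data {
    ($tool:expr, $t:ty) => {
        $tool.custom_data.as_ref().unwrap().downcast_ref::<$t>().unwrap()
    };
}

// let data = tool_data!(tool, StructOnlyForThisTool);
// if data.foo_bar { /* do something */ }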
If the implementation really only has a single method with a name like execute, that is a strong indication to consider using a closure to capture the implementation data. SomeTool can incorporate an arbitrary callable in a type-erased manner using a boxed FnMut, as shown in this answer. execute() then boils down to invoking the closure stored in the struct field using (self.impl_)(). For a more general approach, which will also work when you have more methods on the implementation, read on.
An idiomatic and type-safe equivalent of the type+dataptr C pattern is to store the implementation type and pointer to data together as a trait object. The SomeTool struct can contain a single field, a boxed SomeToolImpl trait object, where the trait specifies tool-specific methods such as execute. This has the following characteristics:
You no longer need an explicit type field because the run-time type information is incorporated in the trait object.
Each tool's implementation of the trait methods can access its own data in a type-safe manner without casts or unwraps. This is because the trait object's vtable automatically invokes the correct function for the correct trait implementation, and it is a compile-time error to try to invoke a different one.
The "fat pointer" representation of the trait object has the same performance characteristics as the type+dataptr pair - for example, the size of SomeTool will be two pointers, and accessing the implementation data will still involve a single pointer dereference.
Here is an example implementation:
struct SomeTool {
    impl_: Box<SomeToolImpl>,
}

impl SomeTool {
    fn execute(&mut self) {
        self.impl_.execute();
    }
}

trait SomeToolImpl {
    fn execute(&mut self);
}

struct SpecificTool1 {
    foo_bar: bool
}

impl SpecificTool1 {
    pub fn new(foo_bar: bool) -> SomeTool {
        let my_data = SpecificTool1 { foo_bar: foo_bar };
        SomeTool { impl_: Box::new(my_data) }
    }
}

impl SomeToolImpl for SpecificTool1 {
    fn execute(&mut self) {
        println!("I am {}", self.foo_bar);
    }
}

struct SpecificTool2 {
    num: u64
}

impl SpecificTool2 {
    pub fn new(num: u64) -> SomeTool {
        let my_data = SpecificTool2 { num: num };
        SomeTool { impl_: Box::new(my_data) }
    }
}

impl SomeToolImpl for SpecificTool2 {
    fn execute(&mut self) {
        println!("I am {}", self.num);
    }
}

pub fn main() {
    let mut tool1: SomeTool = SpecificTool1::new(true);
    let mut tool2: SomeTool = SpecificTool2::new(42);
    tool1.execute();
    tool2.execute();
}
Note that, in this design, it doesn't make sense to make implementation an Option because we always associate the tool type with the implementation. While it is perfectly valid to have an implementation without data, it must always have a type associated with it.

Detecting new struct initialization

I'm coming from mostly OOP languages, so getting this concept to work in Rust kinda seems hard. I want to implement a basic counter that keeps count of how many "instances" I've made of that type, and keep them in a vector for later use.
I've tried many different things. The first was making a static vector variable, but that can't be done because statics aren't allowed to hold types that have destructors.
This was my first try:
struct Entity {
    name: String,
}

struct EntityCounter {
    count: i64,
}

impl Entity {
    pub fn init() {
        let counter = EntityCounter { count: 0 };
    }
    pub fn new(name: String) {
        println!("Entity named {} was made.", name);
        counter += 1; // counter variable inaccessible (is there a way to make it global to the struct?)
    }
}

fn main() {
    Entity::init();
    Entity::new("Hello".to_string());
}
Second:
struct Entity {
    name: String,
    counter: i32,
}

impl Entity {
    pub fn new(self) {
        println!("Entity named {} was made.", self.name);
        self.counter = self.counter + 1;
    }
}

fn main() {
    Entity::new(Entity { name: "Test".to_string() });
}
None of those work; I was just trying out some concepts for how I might implement such a feature.
Your problems appear to be somewhat more fundamental than what you describe. You're kind of throwing code at the wall to see what sticks, and that's simply not going to get you anywhere. I'd recommend reading the Rust Book completely before continuing. If you don't understand something in it, ask about it. As it stands, you're demonstrating you don't understand variable scoping, return types, how instance construction works, how statics work, and how parameters are passed. That's a really shaky base to try and build any understanding on.
In this particular case, you're asking for something that's deliberately not straightforward. You say you want a counter and a vector of instances. The counter is simple enough, but a vector of instances? Rust doesn't allow easy sharing like other languages, so how you go about doing that depends heavily on what it is you're actually intending to use this for.
What follows is a very rough guess at something that's maybe vaguely similar to what you want.
/*!
Because we need the `lazy_static` crate, you need to add the following to your
`Cargo.toml` file:

```cargo
[dependencies]
lazy_static = "0.2.1"
```
*/
#[macro_use] extern crate lazy_static;

mod entity {
    use std::sync::{Arc, Weak, Mutex};
    use std::sync::atomic;

    pub struct Entity {
        pub name: String,
    }

    impl Entity {
        pub fn new(name: String) -> Arc<Self> {
            println!("Entity named {} was made.", name);
            let ent = Arc::new(Entity {
                name: name,
            });
            bump_counter();
            remember_instance(ent.clone());
            ent
        }
    }

    /*
    The counter is simple enough, though I'm not clear on *why* you even want
    it in the first place. You don't appear to be using it for anything...
    */
    static COUNTER: atomic::AtomicUsize = atomic::ATOMIC_USIZE_INIT;

    fn bump_counter() {
        // Add one using the most conservative ordering.
        COUNTER.fetch_add(1, atomic::Ordering::SeqCst);
    }

    pub fn get_counter() -> usize {
        COUNTER.load(atomic::Ordering::SeqCst)
    }

    /*
    There are *multiple* ways of doing this part, and you simply haven't given
    enough information on what it is you're trying to do. This is, at best,
    a *very* rough guess.

    `Mutex` lets us safely mutate the vector from any thread, and `Weak`
    prevents `INSTANCES` from keeping every instance alive *forever*. I mean,
    maybe you *want* that, but you didn't specify.

    Note that I haven't written a "cleanup" function here to remove dead weak
    references.
    */
    lazy_static! {
        static ref INSTANCES: Mutex<Vec<Weak<Entity>>> = Mutex::new(vec![]);
    }

    fn remember_instance(entity: Arc<Entity>) {
        // Downgrade to a weak reference. Type constraint is just for clarity.
        let entity: Weak<Entity> = Arc::downgrade(&entity);
        INSTANCES
            // Lock mutex
            .lock().expect("INSTANCES mutex was poisoned")
            // Push entity
            .push(entity);
    }

    pub fn get_instances() -> Vec<Arc<Entity>> {
        /*
        This is about as inefficient as I could write this, but again, without
        knowing your access patterns, I can't really do any better.
        */
        INSTANCES
            // Lock mutex
            .lock().expect("INSTANCES mutex was poisoned")
            // Get a borrowing iterator from the Vec.
            .iter()
            /*
            Convert each `&Weak<Entity>` into a fresh `Arc<Entity>`. If we
            couldn't (because the weak ref is dead), just drop that element.
            */
            .filter_map(|weak_entity| weak_entity.upgrade())
            // Collect into a new `Vec`.
            .collect()
    }
}

fn main() {
    use entity::Entity;

    let e0 = Entity::new("Entity 0".to_string());
    println!("e0: {}", e0.name);

    {
        let e1 = Entity::new("Entity 1".to_string());
        println!("e1: {}", e1.name);

        /*
        `e1` is dropped here, which should cause the underlying `Entity` to
        stop existing, since there are no other "strong" references to it.
        */
    }

    let e2 = Entity::new("Entity 2".to_string());
    println!("e2: {}", e2.name);

    println!("Counter: {}", entity::get_counter());

    println!("Instances:");
    for ent in entity::get_instances() {
        println!("- {}", ent.name);
    }
}

Recommended way to wrap C lib initialization/destruction routine

I am writing a wrapper/FFI for a C library that requires a global initialization call in the main thread as well as one for destruction.
Here is how I am currently handling it:
struct App;

impl App {
    fn init() -> Self {
        unsafe { ffi::InitializeMyCLib(); }
        App
    }
}

impl Drop for App {
    fn drop(&mut self) {
        unsafe { ffi::DestroyMyCLib(); }
    }
}
which can be used like:
fn main() {
    let _init_ = App::init();
    // ...
}
This works fine, but it feels like a hack, tying these calls to the lifetime of an unnecessary struct. Having the destructor in a finally (Java) or at_exit (Ruby) block seems theoretically more appropriate.
Is there some more graceful way to do this in Rust?
EDIT
Would it be possible/safe to use this setup like so (using the lazy_static crate), instead of my second block above:
lazy_static! {
    static ref APP: App = App::new();
}
Would this reference be guaranteed to be initialized before any other code and destroyed on exit? Is it bad practice to use lazy_static in a library?
This would also make it easier to facilitate access to the FFI through this one struct, since I wouldn't have to bother passing around the reference to the instantiated struct (called _init_ in my original example).
This would also make it safer in some ways, since I could make the App struct default constructor private.
I know of no way of enforcing that a method be called in the main thread beyond strongly-worded documentation. So, ignoring that requirement... :-)
Generally, I'd use std::sync::Once, which seems basically designed for this case:
A synchronization primitive which can be used to run a one-time global
initialization. Useful for one-time initialization for FFI or related
functionality. This type can only be constructed with the ONCE_INIT
value.
Note that there's no provision for any cleanup; many times you just have to leak whatever the library has done. Usually if a library has a dedicated cleanup path, it has also been structured to store all that initialized data in a type that is then passed into subsequent functions as some kind of context or environment. This would map nicely to Rust types.
Warning
Your current code is not as protective as you hope it is. Since your App is an empty struct, an end-user can construct it without calling your method:
let _init_ = App;
We will use a zero-sized argument to prevent this. See also What's the Rust idiom to define a field pointing to a C opaque pointer? for the proper way to construct opaque types for FFI.
Altogether, I'd use something like this:
use std::sync::Once;

mod ffi {
    extern "C" {
        pub fn InitializeMyCLib();
        pub fn CoolMethod(arg: u8);
    }
}

static C_LIB_INITIALIZED: Once = Once::new();

#[derive(Copy, Clone)]
struct TheLibrary(());

impl TheLibrary {
    fn new() -> Self {
        C_LIB_INITIALIZED.call_once(|| unsafe {
            ffi::InitializeMyCLib();
        });
        TheLibrary(())
    }

    fn cool_method(&self, arg: u8) {
        unsafe { ffi::CoolMethod(arg) }
    }
}

fn main() {
    let lib = TheLibrary::new();
    lib.cool_method(42);
}
I did some digging around to see how other FFI libs handle this situation. Here is what I am currently using (similar to Shepmaster's answer above and based loosely on the initialization routine of curl-rust):
use std::sync::Once;

fn initialize() {
    static INIT: Once = Once::new();

    INIT.call_once(|| unsafe {
        ffi::InitializeMyCLib();
        assert_eq!(libc::atexit(cleanup), 0);
    });

    extern "C" fn cleanup() {
        unsafe { ffi::DestroyMyCLib(); }
    }
}
I then call this function inside the public constructors for my public structs.
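For example, a hypothetical public handle type (the name Handle is made up here) only needs to call initialize() in its constructor; the Once makes repeated calls harmless:

pub struct Handle(()); // zero-sized private field, so users must go through `new`

impl Handle {
    pub fn new() -> Handle {
        initialize(); // runs InitializeMyCLib and registers the atexit cleanup exactly once
        Handle(())
    }
}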
