Recommended way to wrap C lib initialization/destruction routine - rust

I am writing a wrapper/FFI for a C library that requires a global initialization call in the main thread as well as one for destruction.
Here is how I am currently handling it:
struct App;
impl App {
    fn init() -> Self {
        unsafe { ffi::InitializeMyCLib(); }
        App
    }
}
impl Drop for App {
    fn drop(&mut self) {
        unsafe { ffi::DestroyMyCLib(); }
    }
}
which can be used like:
fn main() {
    let _init_ = App::init();
    // ...
}
This works fine, but it feels like a hack, tying these calls to the lifetime of an unnecessary struct. Having the destructor in a finally (Java) or at_exit (Ruby) block seems theoretically more appropriate.
Is there some more graceful way to do this in Rust?
EDIT
Would it be possible/safe to use this setup like so (using the lazy_static crate), instead of my second block above:
lazy_static! {
    static ref APP: App = App::init();
}
Would this reference be guaranteed to be initialized before any other code and destroyed on exit? Is it bad practice to use lazy_static in a library?
This would also make it easier to facilitate access to the FFI through this one struct, since I wouldn't have to bother passing around the reference to the instantiated struct (called _init_ in my original example).
This would also make it safer in some ways, since I could make the App struct default constructor private.

I know of no way of enforcing that a method be called in the main thread beyond strongly-worded documentation. So, ignoring that requirement... :-)
Generally, I'd use std::sync::Once, which seems basically designed for this case:
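A synchronization primitive which can be used to run a one-time global initialization. Useful for one-time initialization for FFI or related functionality. This type can only be constructed with the ONCE_INIT value.
(That last sentence comes from older documentation: ONCE_INIT has since been deprecated, and a Once is now created with the const function Once::new(), as in the code below.)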
Note that there's no provision for any cleanup; many times you just have to leak whatever the library has done. Usually if a library has a dedicated cleanup path, it has also been structured to store all that initialized data in a type that is then passed into subsequent functions as some kind of context or environment. This would map nicely to Rust types.
Warning
Your current code is not as protective as you hope it is. Since your App is an empty struct, an end-user can construct it without calling your method:
let _init_ = App;
We will use a private zero-sized field to prevent this: code outside the defining module cannot construct a struct whose fields it cannot see. See also What's the Rust idiom to define a field pointing to a C opaque pointer? for the proper way to construct opaque types for FFI.
Altogether, I'd use something like this:
use std::sync::Once;
mod ffi {
    extern "C" {
        pub fn InitializeMyCLib();
        pub fn CoolMethod(arg: u8);
    }
}
static C_LIB_INITIALIZED: Once = Once::new();
#[derive(Copy, Clone)]
struct TheLibrary(());
impl TheLibrary {
    fn new() -> Self {
        C_LIB_INITIALIZED.call_once(|| unsafe {
            ffi::InitializeMyCLib();
        });
        TheLibrary(())
    }
    fn cool_method(&self, arg: u8) {
        unsafe { ffi::CoolMethod(arg) }
    }
}
fn main() {
    let lib = TheLibrary::new();
    lib.cool_method(42);
}
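To see the Once guarantee in action without linking a real C library, here is a sketch in which the two extern functions are replaced by ordinary Rust functions. The contents of the ffi module and the INIT_CALLS counter are hypothetical stand-ins, not part of any real binding. Several threads construct TheLibrary, yet the initializer runs only once.

```rust
use std::sync::atomic::Ordering;
use std::sync::Once;

// Hypothetical stand-ins for the C library; a real binding would declare
// these functions in an `extern "C"` block instead.
#[allow(non_snake_case)]
mod ffi {
    use std::sync::atomic::{AtomicUsize, Ordering};

    // Counts how many times the stand-in "C library" has been initialized.
    pub static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

    pub unsafe fn InitializeMyCLib() {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
    }

    pub unsafe fn CoolMethod(arg: u8) {
        println!("CoolMethod called with {}", arg);
    }
}

static C_LIB_INITIALIZED: Once = Once::new();

#[derive(Copy, Clone)]
pub struct TheLibrary(());

impl TheLibrary {
    pub fn new() -> Self {
        // call_once runs the closure at most once, even if threads race here.
        C_LIB_INITIALIZED.call_once(|| unsafe { ffi::InitializeMyCLib() });
        TheLibrary(())
    }

    pub fn cool_method(&self, arg: u8) {
        unsafe { ffi::CoolMethod(arg) }
    }
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|i| std::thread::spawn(move || TheLibrary::new().cool_method(i)))
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let lib = TheLibrary::new();
    lib.cool_method(42);
    // Five constructions, but only one initialization.
    assert_eq!(ffi::INIT_CALLS.load(Ordering::SeqCst), 1);
}
```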

I did some digging around to see how other FFI libs handle this situation. Here is what I am currently using (similar to @Shepmaster's answer and based loosely on the initialization routine of curl-rust):
use std::sync::Once;
fn initialize() {
    // ONCE_INIT is deprecated; Once::new() is a const fn.
    static INIT: Once = Once::new();
    INIT.call_once(|| unsafe {
        ffi::InitializeMyCLib();
        // Register the C-side teardown to run when the process exits normally.
        assert_eq!(libc::atexit(cleanup), 0);
    });
    // libc::atexit expects an `extern "C" fn()`.
    extern "C" fn cleanup() {
        unsafe { ffi::DestroyMyCLib(); }
    }
}
I then call this function inside the public constructors for my public structs.

Related

How can I store variables in traits so that it can be used in other methods on trait?

I have several objects in my code that have common functionality. These objects act essentially like services: they have a lifetime (controlled by start/stop) and perform work until their lifetime ends. I am in the process of trying to refactor my code to reduce duplication but am getting stuck.
At a high level, my objects all have the code below implemented in them.
impl SomeObject {
    fn start(&self) {
        // starts a new thread.
        // stores the 'JoinHandle' so thread can be joined later.
    }
    fn do_work(&self) {
        // perform work in the context of the new thread.
    }
    fn stop(&self) {
        // interrupts the work that this object is doing.
        // stops the thread.
    }
}
Essentially, these objects act like "services" so in order to refactor, my first thought was that I should create a trait called "service" as shown below.
trait Service {
    fn start(&self) {}
    fn do_work(&self);
    fn stop(&self) {}
}
Then, I can just update my objects to each implement the "Service" trait. The issue that I am having though, is that since traits are not allowed to have fields/properties, I am not sure how I can go about saving the 'JoinHandle' in the trait so that I can use it in the other methods.
Is there an idiomatic way to handle this problem in Rust?
TL;DR: how can I save variables in a trait so that they can be reused in different trait methods?
Edit:
Here is the solution I settled on. Any feedback is appreciated.
extern crate log;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::JoinHandle;
pub struct Context {
    name: String,
    thread_join_handle: JoinHandle<()>,
    is_thread_running: Arc<AtomicBool>,
}
// `'static` is needed because the spawned thread calls Self::do_work().
pub trait Service: 'static {
    fn start(&self, name: String) -> Context {
        log::trace!("starting service, name=[{}]", name);
        let is_thread_running = Arc::new(AtomicBool::new(true));
        let cloned_is_thread_running = is_thread_running.clone();
        // No outer `loop` here: once the flag is cleared the thread must
        // return, otherwise join() in stop() would block forever.
        let thread_join_handle = std::thread::spawn(move || {
            while cloned_is_thread_running.load(Ordering::SeqCst) {
                Self::do_work();
            }
        });
        log::trace!("started service, name=[{}]", name);
        Context {
            name,
            thread_join_handle,
            is_thread_running,
        }
    }
    fn stop(context: Context) {
        log::trace!("stopping service, name=[{}]", context.name);
        context.is_thread_running.store(false, Ordering::SeqCst);
        context
            .thread_join_handle
            .join()
            .expect("joining service thread");
        log::trace!("stopped service, name=[{}]", context.name);
    }
    fn do_work();
}
I think you just need to save your JoinHandle in the struct itself; the other methods can then access it too, because they all receive the struct's data through self already.
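use std::thread::{self, JoinHandle};
struct SomeObject {
    // None until start() has been called.
    join_handle: Option<JoinHandle<()>>,
}
// This assumes the trait's start() and stop() take `&mut self`, since they modify the struct.
impl Service for SomeObject {
    fn start(&mut self) {
        // starts a new thread.
        // stores the 'JoinHandle' so thread can be joined later.
        self.join_handle = Some(thread::spawn(|| { /* work */ })); // Store it here like this
    }
    fn do_work(&self) {
        // perform work in the context of the new thread.
    }
    fn stop(&mut self) {
        // interrupts the work that this object is doing.
        // stops the thread.
        if let Some(handle) = self.join_handle.take() {
            handle.join().expect("service thread panicked");
        }
    }
}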
Hopefully that works for you.
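Since a trait cannot hold fields, one common workaround is to keep the state in the struct, as above, and have the trait require an accessor for it, so that start and stop can be written once as default methods. The following is a sketch of that idea, not a definitive design; ServiceState, state(), WORK_DONE and Counter are hypothetical names, and do_work is an associated function (as in the asker's own solution) so the worker thread does not need to borrow self.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

// The per-object state that start()/stop() need; each implementor stores one.
#[derive(Default)]
pub struct ServiceState {
    handle: Option<JoinHandle<()>>,
    running: Arc<AtomicBool>,
}

// `'static` because the worker thread calls Self::do_work().
pub trait Service: 'static {
    // Implementors tell the trait where their state lives...
    fn state(&mut self) -> &mut ServiceState;
    // ...and what one unit of work is.
    fn do_work();

    fn start(&mut self) {
        let state = self.state();
        state.running.store(true, Ordering::SeqCst);
        let running = Arc::clone(&state.running);
        state.handle = Some(thread::spawn(move || {
            while running.load(Ordering::SeqCst) {
                Self::do_work();
            }
        }));
    }

    fn stop(&mut self) {
        let state = self.state();
        state.running.store(false, Ordering::SeqCst);
        if let Some(handle) = state.handle.take() {
            handle.join().expect("service thread panicked");
        }
    }
}

// Hypothetical example service that just counts iterations.
static WORK_DONE: AtomicUsize = AtomicUsize::new(0);

struct Counter {
    state: ServiceState,
}

impl Service for Counter {
    fn state(&mut self) -> &mut ServiceState {
        &mut self.state
    }
    fn do_work() {
        WORK_DONE.fetch_add(1, Ordering::SeqCst);
        thread::sleep(Duration::from_millis(1));
    }
}

fn main() {
    let mut counter = Counter { state: ServiceState::default() };
    counter.start();
    thread::sleep(Duration::from_millis(20));
    counter.stop();
    let after_stop = WORK_DONE.load(Ordering::SeqCst);
    thread::sleep(Duration::from_millis(10));
    // The worker has been joined, so no further work happens after stop().
    assert_eq!(WORK_DONE.load(Ordering::SeqCst), after_stop);
}
```

Each new service then only writes state() and do_work(); the thread bookkeeping lives in one place.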
Depending on how you're using the trait, you could restructure your code so that start returns a JoinHandle and the other functions take the join handle as input:
trait Service {
    fn start(&self) -> JoinHandle<()>;
    fn do_work(&self, handle: &mut JoinHandle<()>);
    fn stop(&self, handle: JoinHandle<()>);
}
(Maybe with different function arguments, depending on what you need.) This way you could probably cut down on duplicate code by putting all the code that handles the handles (haha) outside of the structs themselves, which makes it more generic. If you want the structs to use the JoinHandle outside of this trait, I'd say it's best to do what Equinox suggested and make it a field.

How do I wrap a native library with init and exit functions? [duplicate]

I am making a cross-platform terminal library. Because my library changes the state of the terminal, I need to revert all the changes made to the terminal when the process ends. I am now implementing this feature and thinking about how to restore the original terminal state at the end.
I thought that a static variable is initialized when the program starts and destroyed when the program ends. Since my static variable is a struct that implements the Drop trait, I expected it to be dropped at the end of the program, but this is not the case: the string "drop called" is never printed:
static mut SOME_STATIC_VARIABLE: SomeStruct = SomeStruct { some_value: None };
struct SomeStruct {
    pub some_value: Option<i32>,
}
impl Drop for SomeStruct {
    fn drop(&mut self) {
        println!("drop called");
    }
}
Why is drop() not called when the program ends? Are my thoughts wrong and should I do this another way?
One way to enforce initialization and clean-up code in a library is to introduce a Context type that can only be constructed through a public new() function and that implements the Drop trait. Every function in the library that requires initialization can take a Context as an argument, so the user needs to create one before calling these functions. Any clean-up code can go in Context::drop().
pub struct Context {
    // private field to enforce use of Context::new()
    some_value: Option<i32>,
}
impl Context {
    pub fn new() -> Context {
        // Add initialization code here.
        Context { some_value: Some(42) }
    }
}
impl Drop for Context {
    fn drop(&mut self) {
        // Add cleanup code here
        println!("Context dropped");
    }
}
// The type system will statically enforce that the initialization
// code in Context::new() is called before this function, and the
// cleanup code in drop() when the context goes out of scope.
pub fn some_function(_ctx: &Context, some_arg: i32) {
    println!("some_function called with argument {}", some_arg);
}
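The following sketch checks that the Drop-based clean-up actually runs when the Context goes out of scope. The INITS and CLEANUPS counters are hypothetical stand-ins for real terminal setup and restore calls, _private replaces some_value as the field that blocks direct construction, and some_function returns a value here only so it can be checked.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counters standing in for real terminal setup/restore calls.
static INITS: AtomicUsize = AtomicUsize::new(0);
static CLEANUPS: AtomicUsize = AtomicUsize::new(0);

pub struct Context {
    // Private field: code outside this module cannot write `Context { .. }`.
    _private: (),
}

impl Context {
    pub fn new() -> Context {
        INITS.fetch_add(1, Ordering::SeqCst); // initialization goes here
        Context { _private: () }
    }
}

impl Drop for Context {
    fn drop(&mut self) {
        CLEANUPS.fetch_add(1, Ordering::SeqCst); // restoration goes here
    }
}

pub fn some_function(_ctx: &Context, some_arg: i32) -> i32 {
    some_arg * 2
}

fn main() {
    {
        let ctx = Context::new();
        assert_eq!(some_function(&ctx, 21), 42);
        assert_eq!(CLEANUPS.load(Ordering::SeqCst), 0);
    } // ctx goes out of scope here, so Drop runs
    assert_eq!(INITS.load(Ordering::SeqCst), 1);
    assert_eq!(CLEANUPS.load(Ordering::SeqCst), 1);
}
```

Note that destructors do not run if the program calls std::process::exit, or if it is built with panic = "abort" and then panics, so clean-up placed in Drop is skipped in those cases.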
One of the principles of Rust is no life before main, which implies no life after main.
There are considerable challenges in correctly ordering constructors and destructors before or after main. In C++ the situation is referred to as the static initialization order fiasco, and while there are workarounds for it, its counterpart (the static destruction order fiasco) has none.
In Rust, the challenge is exacerbated by the 'static lifetime: running destructors on statics could let one destructor observe other statics that have already been partially destroyed, which is unsafe.
In order to allow safe destruction of statics, the language would need to introduce subsets of 'static lifetimes to order the construction/destruction of statics while having those lifetimes still be 'static from inside main...
How to run code at the start/end of the program?
Simply run code at the start/end of main. Note that any structure built at the beginning of main will be dropped at its end in reverse order of construction.
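A small sketch of that ordering: two guards created at the start of a function (standing in for main) are dropped in reverse order when it returns. Guard, run and DROP_LOG are hypothetical names used only to record the order; "terminal" and "logger" are illustrative labels.

```rust
use std::sync::Mutex;

// Records the order in which guards are dropped.
static DROP_LOG: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

struct Guard(&'static str);

impl Drop for Guard {
    fn drop(&mut self) {
        DROP_LOG.lock().unwrap().push(self.0);
    }
}

fn run() {
    let _terminal = Guard("terminal"); // e.g. set up the terminal first
    let _logger = Guard("logger");
    // ... program body ...
} // locals are dropped here, in reverse order of declaration

fn main() {
    run();
    assert_eq!(*DROP_LOG.lock().unwrap(), ["logger", "terminal"]);
}
```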
And if I am not writing main myself?
Ask the writer of main, nicely.

How to safely wrap C pointers in rust structs

I am building safe bindings for a C library in Rust and I started facing a weird issue.
I created a struct to own the unsafe pointer to the objects returned by the library and free them safely.
This is what I have:
pub struct VipsImage {
    pub(crate) ctx: *mut bindings::VipsImage,
}
impl Drop for VipsImage {
    fn drop(&mut self) {
        unsafe {
            if !self.ctx.is_null() {
                bindings::g_object_unref(self.ctx as *mut c_void);
            }
        }
    }
}
This works fine as long as I don't share these objects between async calls. If I return one of these objects from an async function and use it afterwards, it is corrupted. If I use and free them within a single operation, they work as expected.
How would I implement Send and Sync for a struct like that so I can safely share it between threads?
If someone wants to check the full library code, here's the link
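On the mechanics of the question: raw pointers are neither Send nor Sync, so a struct holding one loses both auto traits, and opting back in requires an `unsafe impl`. The sketch below assumes the C library permits using and freeing the object from a thread other than the one that created it; that has to be verified against the library's documentation, and Sync should only be added if concurrent calls through shared references are also safe. A heap-allocated i32 stands in for the C object, and Handle is a hypothetical name.

```rust
use std::thread;

// Hypothetical stand-in for the C object: a heap allocation owned via a raw pointer.
struct Handle {
    ptr: *mut i32,
}

impl Handle {
    fn new(value: i32) -> Self {
        Handle { ptr: Box::into_raw(Box::new(value)) }
    }

    fn get(&self) -> i32 {
        // Safety: ptr is valid until drop().
        unsafe { *self.ptr }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        if !self.ptr.is_null() {
            // Safety: ptr came from Box::into_raw and is freed exactly once.
            unsafe { drop(Box::from_raw(self.ptr)) };
        }
    }
}

// Safety (assumption to verify for the real library): the object may be
// used and freed on a thread other than the one that created it.
unsafe impl Send for Handle {}

fn main() {
    let h = Handle::new(7);
    // Without the Send impl, moving `h` into the thread would not compile.
    let value = thread::spawn(move || h.get()).join().unwrap();
    assert_eq!(value, 7);
}
```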

Is there a way to emulate the Java behaviour of calling a parent class' static method for simple global-ish error handling?

I'm trying to implement a simple interpreter in Rust for a made up programming language called rlox, following Bob Nystrom's book Crafting Interpreters.
I want errors to be able to occur in any child module, and for them to be "reported" in the main module (this is done in the book, with Java, by simply calling a static method on the containing class which prints the offending token and line). However, if an error occurs, it's not like I can just return early with Result::Err (which is, I assume, the idiomatic way to handle errors in Rust) because the interpreter should keep running - continually looking for errors.
Is there an (idiomatic) way for me to emulate the Java behaviour of calling a parent class' static method from a child class in Rust with modules? Should I abandon something like that entirely?
I thought about a strategy where I inject a reference to some ErrorReporter struct as a dependency into the Scanner and Token structs, but that seems unwieldy to me (I don't feel like an error reporter should be part of the struct's signature, am I wrong?):
struct Token {
    error_reporter: Rc<ErrorReporter>, // Should I avoid this?
    token_type: token::Type,
    lexeme: String,
    line: u32,
}
This is the layout of my project if you need to visualise what I'm talking about with regards to module relationships. Happy to provide some source code if necessary.
rlox [package]
└───src
├───main.rs (uses scanner + token mods, should contain logic for handling errors)
├───lib.rs (just exports scanner and token mods)
├───scanner.rs (uses token mod, declares scanner struct and impl)
└───token.rs (declares token struct and impl)
Literal translation
Importantly, a Java static method has no access to any instance state. That means that it can be replicated in Rust by either a function or an associated function, neither of which has any state. The only difference is in how you call them:
struct Something;
fn example() {}
impl Something {
    fn example() {}
}
fn main() {
    example();
    Something::example();
}
Looking at the source you are copying, it doesn't "just" report the error, it has code like this:
public class Lox {
    static boolean hadError = false;
    static void error(int line, String message) {
        report(line, "", message);
    }
    private static void report(int line, String where, String message) {
        System.err.println(
            "[line " + line + "] Error" + where + ": " + message);
        hadError = true;
    }
}
I'm no JVM expert, but I'm pretty sure that using a static variable like that means that your code is no longer thread safe. You simply can't do that in safe Rust; you can't "accidentally" make memory-unsafe code.
The most literal translation of this that is safe would use associated functions and atomic variables:
use std::sync::atomic::{AtomicBool, Ordering};
// ATOMIC_BOOL_INIT is deprecated; AtomicBool::new is a const fn.
static HAD_ERROR: AtomicBool = AtomicBool::new(false);
struct Lox;
impl Lox {
    fn error(line: usize, message: &str) {
        Lox::report(line, "", message);
    }
    fn report(line: usize, where_it_was: &str, message: &str) {
        eprintln!("[line {}] Error{}: {}", line, where_it_was, message);
        HAD_ERROR.store(true, Ordering::SeqCst);
    }
}
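You can also store richer data structures in your global state by using lazy_static with a Mutex or RwLock, if you need them. (Since Rust 1.63, Mutex::new and RwLock::new are const functions, so a plain static can hold them directly when the inner value also has a const constructor, such as Vec::new().)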
Idiomatic translation
Although it might be convenient, I don't think such a design is good. Global state is simply terrible. I'd prefer to use dependency injection.
Define an error reporter structure that has the state and methods you need and pass references to the error reporter down to where it needs to be:
struct LoggingErrorSink {
    had_error: bool,
}
impl LoggingErrorSink {
    fn error(&mut self, line: usize, message: &str) {
        self.report(line, "", message);
    }
    fn report(&mut self, line: usize, where_it_was: &str, message: &str) {
        eprintln!("[line {}] Error{}: {}", line, where_it_was, message);
        self.had_error = true;
    }
}
fn some_parsing_thing(errors: &mut LoggingErrorSink) {
    errors.error(0, "It's broken");
}
In reality, I'd rather define a trait for things that allow reporting errors and implement it for a concrete type. Rust makes this nice because there's zero performance difference when using these generics.
trait ErrorSink {
    fn error(&mut self, line: usize, message: &str) {
        self.report(line, "", message);
    }
    fn report(&mut self, line: usize, where_it_was: &str, message: &str);
}
struct LoggingErrorSink {
    had_error: bool,
}
// Implement the trait (not an inherent method) so the generic function accepts it.
impl ErrorSink for LoggingErrorSink {
    fn report(&mut self, line: usize, where_it_was: &str, message: &str) {
        eprintln!("[line {}] Error{}: {}", line, where_it_was, message);
        self.had_error = true;
    }
}
fn some_parsing_thing<L>(errors: &mut L)
where
    L: ErrorSink,
{
    errors.error(0, "It's broken");
}
There are many variants of this approach, depending on your tradeoffs.
You could choose to have the logger take &self instead of &mut self, which would force this case to use something like a Cell to gain interior mutability of had_error.
You could use something like an Rc to avoid adding any extra lifetimes to the calling chain.
You could choose to store the logger as a struct member instead of a function parameter.
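For the first variant, here is a minimal sketch of the trait taking &self, with a Cell<bool> holding the flag. It reuses the names from the example above; the single-threaded setting is an assumption.

```rust
use std::cell::Cell;

trait ErrorSink {
    fn error(&self, line: usize, message: &str) {
        self.report(line, "", message);
    }
    fn report(&self, line: usize, where_it_was: &str, message: &str);
}

struct LoggingErrorSink {
    // Cell provides interior mutability, so report() can set the flag through &self.
    had_error: Cell<bool>,
}

impl ErrorSink for LoggingErrorSink {
    fn report(&self, line: usize, where_it_was: &str, message: &str) {
        eprintln!("[line {}] Error{}: {}", line, where_it_was, message);
        self.had_error.set(true);
    }
}

// A shared reference is now enough, so several components can hold the same sink.
fn some_parsing_thing<L: ErrorSink>(errors: &L) {
    errors.error(0, "It's broken");
}

fn main() {
    let sink = LoggingErrorSink { had_error: Cell::new(false) };
    some_parsing_thing(&sink);
    assert!(sink.had_error.get());
}
```

Cell<T> is not Sync, which is fine for a single-threaded interpreter; sharing the sink across threads would call for an AtomicBool instead.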
For your extra keyboard work, you get the benefit of being able to test your errors. Simply whip up a dummy implementation of the trait that saves information to internal variables and pass it in at test time.
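As an illustration of that testing benefit, here is a sketch of such a dummy sink. CollectingErrorSink is a hypothetical name, and the ErrorSink trait is the one defined above, repeated so the block is self-contained.

```rust
trait ErrorSink {
    fn error(&mut self, line: usize, message: &str) {
        self.report(line, "", message);
    }
    fn report(&mut self, line: usize, where_it_was: &str, message: &str);
}

// Hypothetical test double: records errors instead of printing them.
#[derive(Default)]
struct CollectingErrorSink {
    errors: Vec<(usize, String)>,
}

impl ErrorSink for CollectingErrorSink {
    fn report(&mut self, line: usize, where_it_was: &str, message: &str) {
        self.errors.push((line, format!("{}{}", where_it_was, message)));
    }
}

fn some_parsing_thing<L: ErrorSink>(errors: &mut L) {
    errors.error(0, "It's broken");
}

fn main() {
    let mut sink = CollectingErrorSink::default();
    some_parsing_thing(&mut sink);
    // The test can now inspect exactly which errors were reported.
    assert_eq!(sink.errors, vec![(0, "It's broken".to_string())]);
}
```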
Opinions, ahoy!
a strategy where I inject a reference to some ErrorReporter struct as a dependency into the Scanner
Yes, dependency injection is an amazing solution to a large number of coding issues.
and Token structs
I don't know why a token would need to report errors, but it would make sense for the tokenizer to do so.
but that seems unwieldy to me. I don't feel like an error reporter should be part of the struct's signature, am I wrong?
I'd say yes, you are wrong; you've stated this as an absolute truth, of which very few exist in programming.
Concretely, very few people care about what is inside your type, probably only the implementer. The person who constructs a value of your type might care a little because they need to pass in dependencies, but this is a good thing. They now know that this value can generate errors that they need to handle "out-of-band", as opposed to reading some documentation after their program doesn't work.
A few more people care about the actual signature of your type. This is a double-edged blade. In order to have maximal performance, Rust will force you to expose your generic types and lifetimes in your type signatures. Sometimes, this sucks, but either the performance gain is worth it, or you can hide it somehow and take the tiny hit. That's the benefit of a language that gives you choices.
See also
How to synchronize a static variable among threads running different instances of a class in Java?
Where are static methods and static variables stored in Java?
Static fields in a struct in Rust
How can you make a safe static singleton in Rust?
How do I create a global, mutable singleton?
How can I avoid a ripple effect from changing a concrete struct to generic?

Convenient 'Option<Box<Any>>' access when success is assured?

When writing callbacks for generic interfaces, it can be useful for them to define their own local data which they are responsible for creating and accessing.
In C I would just use a void pointer, C-like example:
typedef struct SomeTool {
    int type;
    void *custom_data;
} SomeTool;
void invoke(SomeTool *tool) {
    StructOnlyForThisTool *data = malloc(sizeof(*data));
    /* ... fill in the data ... */
    tool->custom_data = data;
}
void execute(SomeTool *tool) {
    StructOnlyForThisTool *data = tool->custom_data;
    if (data->foo_bar) { /* do something */ }
}
When writing something similar in Rust, replacing void * with Option<Box<Any>>, I'm finding that accessing the data is unreasonably verbose, e.g.:
use std::any::Any;
struct SomeTool {
    kind: i32, // `type` is a reserved keyword in Rust
    custom_data: Option<Box<dyn Any>>,
}
fn invoke(tool: &mut SomeTool) {
    let data = StructOnlyForThisTool { /* my custom data */ };
    /* ... fill in the data ... */
    tool.custom_data = Some(Box::new(data));
}
fn execute(tool: &mut SomeTool) {
    let data = tool.custom_data.as_ref().unwrap().downcast_ref::<StructOnlyForThisTool>().unwrap();
    if data.foo_bar { /* do something */ }
}
There is one line here which I'd like to be able to write in a more compact way:
tool.custom_data.as_ref().unwrap().downcast_ref::<StructOnlyForThisTool>().unwrap()
tool.custom_data.as_mut().unwrap().downcast_mut::<StructOnlyForThisTool>().unwrap()
While each method makes sense on its own, in practice it's not something I'd want to write throughout a code-base, and not something I'm going to want to type out often or remember easily.
By convention, the uses of unwrap here aren't dangerous because:
While only some tools define custom data, the ones that do always define it.
When the data is set, by convention the tool only ever sets its own data. So there is no chance of having the wrong data.
Any time these conventions aren't followed, it's a bug and should panic.
Given these conventions, and assuming accessing custom-data from a tool is something that's done often - what would be a good way to simplify this expression?
Some possible options:
Remove the Option, just use Box<Any> with Box::new(()) representing None so access can be simplified a little.
Use a macro or function to hide the verbosity, passing in the Option<Box<Any>>. This will work, of course, but I'd prefer not to; I would use it only as a last resort.
Add a trait to Option<Box<Any>> which exposes a method such as tool.custom_data.unwrap_box::<StructOnlyForThisTool>() with matching unwrap_box_mut.
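A sketch of the third option, assuming the field is typed Option<Box<dyn Any>> (the modern spelling of Option<Box<Any>>). UnwrapBox is a hypothetical trait name, and kind stands in for the type field because type is a Rust keyword. Because the method borrows only the custom_data field rather than the whole tool, other fields of tool remain usable while the result is alive, which sidesteps the borrow problem described in Update 2 (a function taking the whole tool borrows all of it).

```rust
use std::any::Any;

// Hypothetical extension trait implementing the third option.
trait UnwrapBox {
    fn unwrap_box<T: Any>(&self) -> &T;
    fn unwrap_box_mut<T: Any>(&mut self) -> &mut T;
}

impl UnwrapBox for Option<Box<dyn Any>> {
    fn unwrap_box<T: Any>(&self) -> &T {
        self.as_ref()
            .expect("custom_data is not set")
            .downcast_ref::<T>()
            .expect("custom_data has an unexpected type")
    }

    fn unwrap_box_mut<T: Any>(&mut self) -> &mut T {
        self.as_mut()
            .expect("custom_data is not set")
            .downcast_mut::<T>()
            .expect("custom_data has an unexpected type")
    }
}

struct StructOnlyForThisTool {
    foo_bar: bool,
}

struct SomeTool {
    kind: i32, // renamed from `type`, which is a keyword in Rust
    custom_data: Option<Box<dyn Any>>,
}

fn execute(tool: &mut SomeTool) {
    // Only tool.custom_data is borrowed here, so tool.kind stays usable.
    let data = tool.custom_data.unwrap_box_mut::<StructOnlyForThisTool>();
    tool.kind += 1;
    data.foo_bar = !data.foo_bar;
}

fn main() {
    let mut tool = SomeTool {
        kind: 0,
        custom_data: Some(Box::new(StructOnlyForThisTool { foo_bar: false })),
    };
    execute(&mut tool);
    assert!(tool.custom_data.unwrap_box::<StructOnlyForThisTool>().foo_bar);
    assert_eq!(tool.kind, 1);
}
```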
Update 1: Since asking this question, I realized that a point I didn't include is relevant.
There may be multiple callback functions like execute which must all be able to access the custom_data. At the time I didn't think this was important to point out.
Update 2: Wrapping this in a function that takes tool isn't practical, since the borrow checker then prevents further access to members of tool until the cast variable goes out of scope. I found the only reliable way to do this was to write a macro.
If the implementation really only has a single method with a name like execute, that is a strong indication to consider using a closure to capture the implementation data. SomeTool can incorporate an arbitrary callable in a type-erased manner using a boxed FnMut, as shown in this answer. execute() then boils down to invoking the closure stored in the struct field, using (self.impl_)(). For a more general approach that also works when the implementation has more methods, read on.
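A minimal sketch of that closure variant, under the assumption that the tool needs only one operation. specific_tool is a hypothetical constructor, and execute returns a String here only so the result can be checked. The captured variables (foo_bar and a call counter) play the role of the tool's custom data.

```rust
struct SomeTool {
    // The closure owns whatever data this particular tool needs.
    impl_: Box<dyn FnMut() -> String>,
}

impl SomeTool {
    fn execute(&mut self) -> String {
        (self.impl_)()
    }
}

fn specific_tool(foo_bar: bool) -> SomeTool {
    let mut calls = 0;
    SomeTool {
        impl_: Box::new(move || {
            calls += 1; // captured state persists between calls
            format!("foo_bar = {}, call #{}", foo_bar, calls)
        }),
    }
}

fn main() {
    let mut tool = specific_tool(true);
    assert_eq!(tool.execute(), "foo_bar = true, call #1");
    assert_eq!(tool.execute(), "foo_bar = true, call #2");
}
```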
An idiomatic and type-safe equivalent of the type+dataptr C pattern is to store the implementation type and pointer to data together as a trait object. The SomeTool struct can contain a single field, a boxed SomeToolImpl trait object, where the trait specifies tool-specific methods such as execute. This has the following characteristics:
You no longer need an explicit type field because the run-time type information is incorporated in the trait object.
Each tool's implementation of the trait methods can access its own data in a type-safe manner without casts or unwraps. This is because the trait object's vtable automatically invokes the correct function for the correct trait implementation, and it is a compile-time error to try to invoke a different one.
The "fat pointer" representation of the trait object has the same performance characteristics as the type+dataptr pair - for example, the size of SomeTool will be two pointers, and accessing the implementation data will still involve a single pointer dereference.
Here is an example implementation:
struct SomeTool {
    impl_: Box<dyn SomeToolImpl>,
}
impl SomeTool {
    fn execute(&mut self) {
        self.impl_.execute();
    }
}
trait SomeToolImpl {
    fn execute(&mut self);
}
struct SpecificTool1 {
    foo_bar: bool,
}
impl SpecificTool1 {
    pub fn new(foo_bar: bool) -> SomeTool {
        let my_data = SpecificTool1 { foo_bar };
        SomeTool { impl_: Box::new(my_data) }
    }
}
impl SomeToolImpl for SpecificTool1 {
    fn execute(&mut self) {
        println!("I am {}", self.foo_bar);
    }
}
struct SpecificTool2 {
    num: u64,
}
impl SpecificTool2 {
    pub fn new(num: u64) -> SomeTool {
        let my_data = SpecificTool2 { num };
        SomeTool { impl_: Box::new(my_data) }
    }
}
impl SomeToolImpl for SpecificTool2 {
    fn execute(&mut self) {
        println!("I am {}", self.num);
    }
}
pub fn main() {
    let mut tool1: SomeTool = SpecificTool1::new(true);
    let mut tool2: SomeTool = SpecificTool2::new(42);
    tool1.execute();
    tool2.execute();
}
Note that, in this design, it doesn't make sense to make implementation an Option because we always associate the tool type with the implementation. While it is perfectly valid to have an implementation without data, it must always have a type associated with it.
