I'm currently playing around with DynamicLibrary.
The code of my dynamic library (compiled with rustc --crate-type dylib dylib.rs):
// dylib.rs
#[no_mangle]
pub fn minicall() -> u8 {
3u8
}
And the code to call it:
// caller.rs
use std::dynamic_lib::DynamicLibrary;
fn main() {
let mut v = Vec::new();
DynamicLibrary::prepend_search_path(&::std::os::getcwd());
match DynamicLibrary::open(Some("./libdylib.so")) {
Err(e) => panic!("ERROR: {}", e),
Ok(lib) => {
println!("Unsafe bloc !");
let func = unsafe {
match lib.symbol::< fn() -> u8 >("minicall") {
Err(e) => { panic!("ERROR: {}", e) },
Ok(f) => { *f },
}
};
println!("call func !");
let new_value = func();
println!("extend vec !");
v.push(new_value);
}
}
println!("v is: {}", v);
}
I have this output :
~> ./caller
Unsafe bloc !
call func !
Illegal instruction
And here I'm quite lost. What am I doing wrong ?
The problem here is how the symbol function works. It has signature:
unsafe fn symbol<T>(&self, symbol: &str) -> Result<*mut T, String>
A loaded library is basically a big array in memory with certain addresses labelled with a name (the symbol names). Querying for a symbol looks up the address and returns a pointer straight to it. A function in a library is a long sequence of instructions, so querying for a function's name returns a (function) pointer directly to the start. This can then be called as a normal function pointer. The Rust DynamicLibrary API is returning this pointer, that is, *mut T points directly to the chunk of memory in the dynamic library (which is supposedly/hopefully of type T).
The type fn(...) -> ... is a function pointer itself, that is, it is 8 bytes (or 4 bytes) storing the address of the start of the function it represents. Hence, calling lib.symbol::< fn() -> u8 >("minicall") is saying "find me the address of the thing called minicall (which is a pointer to a function)", it is not saying "find me the address of the thing called minicall (which is a function)". The return value of *mut (fn() -> u8) is then doubly-indirect, and dereferencing it to call it is interpreting the first 8 (or 4) bytes of the function code as a pointer (i.e. random machine instructions/function prelude), it is not executing them.
(Side-note: it would probably work if you had #[no_mangle] pub static minicall: fn() -> u8 = the_real_minicall; in your library, but you probably don't want this.)
The call to lib.symbol::<T>("minicall") is returning the exact function pointer we want (that is, it is returning a pointer to the start of the code of minicall), so it just becomes a question of expressing this to the compiler. Unfortunately, there is currently no type T that makes *mut T a function pointer, so one must first set T = u8 (i.e. lib.symbol::<u8>("minicall")) and then cast the return value to the appropriate function pointer type via transmute::<_, fn() -> u8>(pointer).
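To make the difference between a function's address and a pointer to a function pointer concrete, here is a minimal sketch that runs without any dynamic library. It assumes the address returned by a symbol lookup behaves like the address of an ordinary function; minicall is redefined locally rather than loaded, and the helper name call_minicall_via_address is hypothetical.

```rust
use std::mem;

// Stand-in for the library function; in the question it lives in libdylib.so.
fn minicall() -> u8 {
    3
}

// Hypothetical helper: turns the function into a plain address (roughly what
// a symbol lookup hands back) and then calls it through that address.
fn call_minicall_via_address() -> u8 {
    let addr: *const () = minicall as fn() -> u8 as *const ();
    // SAFETY: `addr` was obtained from a `fn() -> u8` one line above.
    // The address itself *is* the function pointer, so it is transmuted,
    // not dereferenced.
    let f = unsafe { mem::transmute::<*const (), fn() -> u8>(addr) };
    f()
}

fn main() {
    println!("minicall returned {}", call_minicall_via_address());
}
```

Dereferencing addr instead of transmuting it would read the first bytes of the machine code as if they were an address, which is the mistake in the question.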
(I'm answering this even after the other answer was accepted because I don't think it explained the cause very well, just gave the solution.)
One last thing: this isn't a problem in this case, but it trips people up a lot. The Rust ABI (the calling convention used for functions of type fn(...) -> ...) is not the same as the C ABI, so functions loaded from C dynamic libraries should be given type extern "C" fn(...) -> ..., not fn(...) -> ....
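As a small illustration of how the ABI is part of the pointer type, the sketch below defines a function with the C calling convention in Rust and calls it through an extern "C" function pointer. The names c_style_double, CFn and call_c are made up for this example; a real C library function would be declared with the same kind of type.

```rust
// A function using the C calling convention, standing in for one that
// would normally be loaded from a C library.
extern "C" fn c_style_double(x: i32) -> i32 {
    x * 2
}

// The pointer type spells out the ABI; `fn(i32) -> i32` is a different type.
type CFn = extern "C" fn(i32) -> i32;

fn call_c(f: CFn, arg: i32) -> i32 {
    f(arg)
}

fn main() {
    println!("{}", call_c(c_style_double, 21));
}
```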
I think the problem stems from the fact that you are casting between incompatible types. Specifically, the dereference *f is going to point to the wrong place. I looked in the Rust code to see how the library is supposed to be used and found an example in src/librustc/plugin/load.rs. I adapted that code to your example:
let func = unsafe {
// Let this return a `*mut u8`, a very generic pointer
match lib.symbol("minicall") {
Err(e) => { panic!("ERROR: {}", e) },
// And then cast that pointer to a function
Ok(f) => { std::mem::transmute::<*mut u8, fn() -> u8>(f) },
}
};
println!("call func !");
let new_value = func();
The output:
$ ./caller
Unsafe bloc !
call func !
extend vec !
v is: [3]
Since this question/answer, the std::dynamic_lib API seems to have gone away. As of this writing it looks like libloading is the most popular way of dynamically loading libraries on crates.io.
Consider the following code:
enum Deferred<F, T> {
Fn(F),
Ready(T),
}
impl<F, T> Deferred<F, T>
where
F: FnOnce() -> T,
T: Clone,
{
fn get_clone(&mut self) -> T {
let clone;
*self = match *self {
Self::Fn(f) => { // `f` moved here
let data = f(); // this needs to consume `f`
clone = data.clone();
Self::Ready(data)
},
Self::Ready(data) => { // `data` moved here
clone = data.clone();
Self::Ready(data)
},
};
clone
}
}
The compiler complains about the *self = match *self {...}; statement because its right hand side takes ownership of contents of self. Is there a way to accomplish this behaviour with just a mutable reference to self?
I found a workaround using F: FnMut() -> T so that f doesn't have to be moved but this approach clearly has its limitations.
I also tried what the answer to Is there a safe way to temporarily retrieve an owned value from a mutable reference in Rust? suggested but it led to an issue with initialization of clone (the compiler could no longer reason that the match statement would initialize clone because the code was moved into a closure) so I had to use MaybeUninit with unsafe.
At that point it was better to read/write self through a raw pointer:
unsafe {
std::ptr::write(
self as *mut Self,
match std::ptr::read(self as *const Self) {
Self::Fn(f) => {
let data = f();
clone = data.clone();
Self::Ready(data)
},
Self::Ready(data) => {
clone = data.clone();
Self::Ready(data)
},
}
);
}
It is not possible to do this in a straightforward safe fashion, because while f is being executed there is no possible valid value of Deferred as defined: you don't yet have a T to go in Deferred::Ready, and you're consuming the F so you can't have Deferred::Fn.
If you use the take_mut crate or similar, that accomplishes this by replacing panicking with aborting, so the invalid state can never be observed. But, in your case, I would recommend introducing a third state to the enum instead — this changes the semantics but in a way that, shall we say, respects the fact that f can fail.
enum Deferred<F, T> {
Fn(F),
Ready(T),
Failed,
}
impl<F, T> Deferred<F, T>
where
F: FnOnce() -> T,
T: Clone,
{
fn get_clone(&mut self) -> T {
match std::mem::replace(self, Self::Failed) {
Self::Ready(data) => {
let clone = data.clone();
*self = Self::Ready(data);
clone
},
Self::Fn(f) => {
let data = f();
*self = Self::Ready(data.clone());
data
}
Self::Failed => {
panic!("A previous call failed");
}
}
}
}
With this solution, the very first thing we do is swap out *self for Self::Failed, so we own the Deferred value and can freely move out the non-clonable F value. Notice that the expression being matched is not a borrow of self, so we aren't blocked from further modifying *self.
This particular solution does have a disadvantage: it's unnecessarily writing to *self on every call (which is unobservable, but could reduce performance). We can fix that, by separating the decision of what to do from doing it, but this requires writing a second pattern match to extract the value:
impl<F, T> Deferred<F, T>
where
F: FnOnce() -> T,
T: Clone,
{
fn get_clone(&mut self) -> T {
match self {
Self::Ready(data) => {
return data.clone();
},
Self::Failed => {
panic!("A previous call failed");
}
Self::Fn(_) => {
// Fall through below, relinquishing the borrow of self.
}
}
match std::mem::replace(self, Self::Failed) {
Self::Fn(f) => {
let data = f();
*self = Self::Ready(data.clone());
data
}
_ => unreachable!()
}
}
}
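A quick way to check the three-state approach is to count how often the closure runs. The sketch below repeats the first get_clone implementation so that it is self-contained; the helper run_twice and the counter are additions for illustration only.

```rust
use std::cell::Cell;

enum Deferred<F, T> {
    Fn(F),
    Ready(T),
    Failed,
}

impl<F, T> Deferred<F, T>
where
    F: FnOnce() -> T,
    T: Clone,
{
    fn get_clone(&mut self) -> T {
        // Park `Failed` in `*self` so the old value can be owned and consumed.
        match std::mem::replace(self, Self::Failed) {
            Self::Ready(data) => {
                let clone = data.clone();
                *self = Self::Ready(data);
                clone
            }
            Self::Fn(f) => {
                let data = f();
                *self = Self::Ready(data.clone());
                data
            }
            Self::Failed => panic!("A previous call failed"),
        }
    }
}

// Hypothetical check: returns both results and how often the closure ran.
fn run_twice() -> (String, String, u32) {
    let calls = Cell::new(0u32);
    let mut deferred = Deferred::Fn(|| {
        calls.set(calls.get() + 1);
        String::from("expensive")
    });
    let first: String = deferred.get_clone();
    let second: String = deferred.get_clone();
    (first, second, calls.get())
}

fn main() {
    let (first, second, calls) = run_twice();
    println!("{} / {} after {} call(s)", first, second, calls);
}
```

The closure is consumed on the first call, so the second call is served from the Ready variant.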
The fundamental problem here is trying to take ownership of the value in the enum while simultaneously having a live reference (via &mut self) to it. Safe Rust code will not let you do that, which is the reason for the compiler error. This is also a problem with your unsafe solution: what if any code inside one of the match arms panics? Then the owned value created via ptr::read gets dropped as the stack unwinds, and the caller might catch that panic and is then perfectly capable of observing the Deferred it owns (of which it gave a &mut to get_clone()) after it has been dropped, that is, in an invalid state, therefore causing Undefined Behaviour (see the docs for ptr::read for an example).
What must be achieved therefore is to hold &mut self in a valid/defined state while taking ownership. You can do that by having a cheap or possibly even free variant on Deferred, which is put into &mut self while the new value is being constructed.
For instance:
#[derive(Debug)]
enum FooBar {
Foo(String),
Bar(&'static str),
}
impl FooBar {
fn switch(&mut self) {
// notice here
*self = match std::mem::replace(self, FooBar::Bar("temporary")) {
FooBar::Foo(_) => FooBar::Bar("default"),
FooBar::Bar(s) => FooBar::Foo(s.to_owned()),
}
}
}
fn main() {
let mut s = FooBar::Bar("hello");
s.switch();
println!("{:?}", s);
}
Here, we use std::mem::replace to switch the value behind &mut self with a cheaply constructed temporary value; replace() returns ownership of the original value, which we match on to construct a new value; the value returned by the match-expression is then put into place via the *self-assignment. If everything goes well, the FooBar::Bar("temporary") is not ever observed; but it could be observed if the match panicked; but even then, the code is at least safe. In case your match can't unwind at all, the compiler might even be able to eliminate that useless store entirely.
Coming back to your original code, I don't see how to construct a Deferred without T being Default, as you can neither construct the Fn nor the Ready case in a safe way. Either that can be added via an additional bound, or you can add a third variant Empty to your enum.
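If the enum can implement Default, std::mem::take is a shorthand for replacing with the default value. The sketch below reworks the FooBar example under the assumption that an extra Empty variant is acceptable; the variant name and the choice to make it the default are illustrative.

```rust
#[derive(Debug, Default, PartialEq)]
enum FooBar {
    #[default]
    Empty,
    Foo(String),
    Bar(&'static str),
}

impl FooBar {
    fn switch(&mut self) {
        // mem::take leaves FooBar::Empty behind and hands back the old value.
        *self = match std::mem::take(self) {
            FooBar::Foo(_) => FooBar::Bar("default"),
            FooBar::Bar(s) => FooBar::Foo(s.to_owned()),
            FooBar::Empty => FooBar::Empty,
        };
    }
}

fn main() {
    let mut s = FooBar::Bar("hello");
    s.switch();
    println!("{:?}", s);
}
```

If the match panicked, callers would observe Empty, which is a valid value, so the code stays safe.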
As I see it, the difference between your problem and the one that you referenced is that you also want to return the inner T while replacing self, which is something that the take_mut crate proposed there does not provide. I also noticed that it seems a bit unmaintained. So I had a short look around and found the replace_with crate instead. It basically does the same as take_mut, but it also has a replace_with_or_abort_and_return function, which is exactly what you need:
use replace_with::replace_with_or_abort_and_return;
impl<F, T> Deferred<F, T>
where
F: FnOnce() -> T,
T: Clone,
{
fn get_clone(&mut self) -> T {
replace_with_or_abort_and_return(self, |s| match s {
Self::Fn(f) => {
// `f` moved here
let data = f(); // this needs to consume `f`
let clone = data.clone();
(clone, Self::Ready(data))
}
Self::Ready(data) => {
// `data` moved here
let clone = data.clone();
(clone, Self::Ready(data))
}
})
}
}
Note, though, that taking a value out of a mutable borrow has the potential for UB, specifically if your f() panics, because self would be left in an uninitialized state. That is why we get this slightly cumbersome function name, which indicates that if f() panics, the program will be aborted (which is the safe thing to do). However, if you expect f() to panic and don't want to abort the program, you can instead provide a default for self via the replace_with_and_return or replace_with_or_default_and_return function.
I wrote some Rust code that provides a FFI for some C code, which I recently discovered a bug in. Turns out unsafe is hard and error prone — who knew! I think I've fixed the bug but I am curious to understand the issue more.
One function took a Vec, called into_boxed_slice on it and returned the pointer (via as_mut_ptr) and length to the caller. It called mem::forget on the Box before returning.
The corresponding "free" function only accepted the pointer and called Box::from_raw with it. Now this is wrong, but the amazing thing about undefined behaviour is that it can work most of the time. And this did, except when the source Vec was empty, in which case it would segfault. Also of note, MIRI correctly identifies the issue: "Undefined Behavior: inbounds test failed: 0x4 is not a valid pointer".
Anyway the fix was to take the length in the free function as well, reconstitute the slice, then Box::from_raw that. E.g. Box::from_raw(slice::from_raw_parts_mut(p, len))
I've tried to capture all of this in this playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7fe80cb9f0c5c1eee4ac821e58787f17
Here's the playground code for reference:
use std::slice;
fn main() {
// This one does not crash
demo(vec![1]);
// These do not crash
hopefully_correct(vec![2]);
hopefully_correct(vec![]);
// This one seg faults
demo(vec![]);
}
// MIRI complains about UB in this one (in Box::from_raw)
fn demo(v: Vec<i32>) {
let mut s: Box<[i32]> = dbg!(v.into_boxed_slice());
let p: *mut i32 = dbg!(s.as_mut_ptr());
assert!(!p.is_null());
std::mem::forget(s);
// Pretend the pointer is returned to an FFI interface here
// Imagine this is the free function counterpart to the imaginary FFI.
unsafe { Box::from_raw(p) };
}
// MIRI does not complain about this one
fn hopefully_correct(v: Vec<i32>) {
let mut s: Box<[i32]> = dbg!(v.into_boxed_slice());
let p: *mut i32 = dbg!(s.as_mut_ptr());
let len = s.len();
assert!(!p.is_null());
std::mem::forget(s);
// Pretend the pointer is returned to an FFI interface here
// Imagine this is the free function counterpart to the imaginary FFI.
unsafe { Box::from_raw(slice::from_raw_parts_mut(p, len)) };
}
I've looked through the Box source and done a bunch of searching but it's unclear to me how rebuilding the slice helps. It would seem that the pointers are the same but there is some empty optimisation handled properly in the fixed example somewhere, possibly as part of Unique?
Can anyone explain what's going on here?
I found these three links useful but not enough to answer my query:
How to expose a Rust Vec<T> to FFI?
How to pass a boxed slice (Box<[T]>) to a C function?
Box<[T]>::into_raw is useless
That's because when you deconstruct your empty vector, you get a dangling pointer (non-null and aligned, but not pointing at any allocation; this is the 0x4 in the MIRI message) and a zero length.
When you call Box::from_raw on it as a *mut i32, you claim that it points to a heap-allocated i32, which breaks the requirement that the pointer came from a matching allocation. Then when Rust drops the box, it attempts to deallocate memory that was never allocated. With a one-element vector the call happens to work because a [i32] of length one has the same layout as a single i32.
OTOH when you call slice::from_raw_parts_mut, Rust builds a fat pointer that contains the data pointer and the zero length, and Box::from_raw turns this into a Box<[i32]>. When dropping the box, Rust sees that a slice of length zero occupies zero bytes, so there is nothing to free.
Note also that in the non-working case you reconstruct a Box<i32>, whereas in the working case you reconstruct a Box<[i32]>, as shown if you try to compile the following code:
use std::slice;
fn demo(v: Vec<i32>) {
let mut s: Box<[i32]> = dbg!(v.into_boxed_slice());
let p: *mut i32 = dbg!(s.as_mut_ptr());
assert!(!p.is_null());
std::mem::forget(s);
// Pretend the pointer is returned to an FFI interface here
// Imagine this is the free function counterpart to the imaginary FFI.
let _b: () = unsafe { Box::from_raw(p) };
}
// MIRI does not complain about this one
fn hopefully_correct(v: Vec<i32>) {
let mut s: Box<[i32]> = dbg!(v.into_boxed_slice());
let p: *mut i32 = dbg!(s.as_mut_ptr());
let len = s.len();
assert!(!p.is_null());
std::mem::forget(s);
// Pretend the pointer is returned to an FFI interface here
// Imagine this is the free function counterpart to the imaginary FFI.
let _b: () = unsafe { Box::from_raw(slice::from_raw_parts_mut(p, len)) };
}
Playground
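For the FFI pair itself, a minimal sketch of a matching export/free pair could look like the following. The function names into_raw_parts and free_raw_parts are hypothetical, and the sketch uses Box::into_raw instead of as_mut_ptr plus mem::forget, which is a substitution on my part rather than what the question does.

```rust
/// Hypothetical export side: hands the buffer to the caller as (pointer, length).
fn into_raw_parts(v: Vec<i32>) -> (*mut i32, usize) {
    let boxed: Box<[i32]> = v.into_boxed_slice();
    let len = boxed.len();
    // Box::into_raw gives up ownership without running the destructor,
    // so no mem::forget is needed.
    let ptr = Box::into_raw(boxed) as *mut i32;
    (ptr, len)
}

/// Hypothetical free side: needs both values to rebuild the Box<[i32]>.
///
/// Safety: `ptr` and `len` must come from one call to `into_raw_parts`
/// and must be passed here exactly once.
unsafe fn free_raw_parts(ptr: *mut i32, len: usize) {
    let slice: *mut [i32] = std::ptr::slice_from_raw_parts_mut(ptr, len);
    drop(unsafe { Box::from_raw(slice) });
}

fn main() {
    for v in [vec![], vec![1], vec![1, 2, 3]] {
        let (ptr, len) = into_raw_parts(v);
        // Even the empty case yields a non-null (but dangling) pointer.
        assert!(!ptr.is_null());
        unsafe { free_raw_parts(ptr, len) };
    }
}
```

Because the length travels with the pointer, the free side reconstructs exactly the type that was given away, including the empty case.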
I'm wrapping a C API which allows the caller to set/get an arbitrary pointer via function calls. In this way, the C API allows a caller to associate arbitrary data with one of the C API objects. This data is not used in any callbacks, it's just a pointer that a user can stash away and get at later.
My wrapper struct implements the Drop trait for the C object that contains this pointer. What I'd like to be able to do, but am not sure it's possible, is have the data dropped correctly if the pointer is not null when the wrapper struct drops. I'm not sure how I would recover the correct type though from a raw c_void pointer.
Two alternatives I'm thinking of are
Implement the behavior of these two calls in the wrapper. Don't make any calls to the C API.
Don't attempt to offer any kind of safer interface to these functions. Document that the pointer must be managed by the caller of the wrapper.
Is what I want to do possible? If not, is there a generally accepted practice for these kinds of situations?
A naive + fully automatic approach is NOT possible for the following reasons:
freeing memory does not call drop/destructors/...: the C API can be used from languages which can have objects that should be destructed properly, e.g. C++ or Rust itself. So when you only store a memory pointer, you do not know how to call the proper function (you know neither which function to call nor what its calling convention looks like).
which memory allocator?: memory allocation and deallocation isn't a trivial thing. Your program needs to request memory from the OS and then manage these resources in an intelligent way to be efficient and correct. This is usually done by a library. In the case of Rust, jemalloc was used at the time of writing (but this can be changed). So even when you ask the API caller to only pass Plain Old Data (which should be easier to destruct), you still don't know which library function to call to deallocate memory. Just using libc::free won't work reliably (it can work, but it could also fail horribly).
Solutions:
dealloc callback: you can ask the API user to set an additional pointer to, let's say, a void destruct(void* ptr) function. If this one is not NULL, you call that function during your drop. You could also use int as a return type to signal when the destruction went wrong. In that case you could for example panic!.
global callback: let's assume you requested your user to only pass POD (plain old data). To know which free function of the memory allocator to call, you could request the user to register a global void (*free)(void* ptr) pointer which is called during drop. You could also make that one optional.
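The dealloc callback idea can be sketched on the Rust side roughly as follows. Everything here (UserData, from_box, drop_boxed, the DROPS counter) is hypothetical scaffolding for illustration; in a real wrapper the pointer and the destructor would be stored through the C API's own set/get calls.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Signature of the user-supplied destructor, as in `void destruct(void* ptr)`.
type Destructor = unsafe extern "C" fn(*mut c_void);

/// Hypothetical wrapper-side record: the stashed pointer plus how to destroy it.
struct UserData {
    ptr: *mut c_void,
    destroy: Option<Destructor>,
}

impl UserData {
    /// Convenience for Rust callers: box the value and remember a matching destructor.
    fn from_box<T>(value: Box<T>) -> Self {
        let destroy: Destructor = drop_boxed::<T>;
        UserData {
            ptr: Box::into_raw(value) as *mut c_void,
            destroy: Some(destroy),
        }
    }
}

impl Drop for UserData {
    fn drop(&mut self) {
        if self.ptr.is_null() {
            return;
        }
        if let Some(destroy) = self.destroy {
            // SAFETY: `ptr` and `destroy` were registered together in `from_box`.
            unsafe { destroy(self.ptr) };
        }
    }
}

/// Destructor for pointers produced by `Box::<T>::into_raw`.
unsafe extern "C" fn drop_boxed<T>(ptr: *mut c_void) {
    drop(unsafe { Box::from_raw(ptr as *mut T) });
}

/// Counts how many payloads have been dropped, for demonstration only.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Payload(i32);

impl Drop for Payload {
    fn drop(&mut self) {
        println!("dropping Payload({})", self.0);
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn main() {
    let data = UserData::from_box(Box::new(Payload(7)));
    drop(data);
    println!("payloads dropped: {}", DROPS.load(Ordering::SeqCst));
}
```

Storing the destructor next to the pointer means the wrapper never has to recover the original type from the void pointer; the type is baked into the monomorphized drop_boxed::<T>.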
Although I was able to follow the advice in this thread, I wasn't entirely satisfied with my results, so I asked the question on the Rust forums and found the answer I was really looking for. (play)
use std::any::Any;
static mut foreign_ptr: *mut () = 0 as *mut ();
unsafe fn api_set_fp(ptr: *mut ()) {
foreign_ptr = ptr;
}
unsafe fn api_get_fp() -> *mut() {
foreign_ptr
}
struct ApiWrapper {}
impl ApiWrapper {
fn set_foreign<T: Any>(&mut self, value: Box<T>) {
self.free_foreign();
unsafe {
let raw = Box::into_raw(Box::new(value as Box<Any>));
api_set_fp(raw as *mut ());
}
}
fn get_foreign_ref<T: Any>(&self) -> Option<&T> {
unsafe {
let raw = api_get_fp() as *const Box<Any>;
if !raw.is_null() {
let b: &Box<Any> = &*raw;
b.downcast_ref()
} else {
None
}
}
}
fn get_foreign_mut<T: Any>(&mut self) -> Option<&mut T> {
unsafe {
let raw = api_get_fp() as *mut Box<Any>;
if !raw.is_null() {
let b: &mut Box<Any> = &mut *raw;
b.downcast_mut()
} else {
None
}
}
}
fn free_foreign(&mut self) {
unsafe {
let raw = api_get_fp() as *mut Box<Any>;
if !raw.is_null() {
Box::from_raw(raw);
}
}
}
}
impl Drop for ApiWrapper {
fn drop(&mut self) {
self.free_foreign();
}
}
struct MyData {
i: i32,
}
impl Drop for MyData {
fn drop(&mut self) {
println!("Dropping MyData with value {}", self.i);
}
}
fn main() {
let p1 = Box::new(MyData {i: 1});
let mut api = ApiWrapper{};
api.set_foreign(p1);
{
let p2 = api.get_foreign_ref::<MyData>().unwrap();
println!("i is {}", p2.i);
}
api.set_foreign(Box::new("Hello!"));
{
let p3 = api.get_foreign_ref::<&'static str>().unwrap();
println!("payload is {}", p3);
}
}
I'm trying to select a function to call depending on a condition. I want to store that function in a variable so that I can call it again later without carrying the condition around. Here's a working minimal example:
fn foo() {
println! ("Foo");
}
fn bar() {
println! ("Bar");
}
fn main() {
let selector = 0;
let foo: &Fn() = &foo;
let bar: &Fn() = &bar;
let test = match selector {
0 => foo,
_ => bar
};
test();
}
My question is: is it possible to get rid of the intermediate variables? I've tried simply removing them:
fn foo() {
println! ("Foo");
}
fn bar() {
println! ("Bar");
}
fn main() {
let selector = 0;
let test = match selector {
0 => &foo as &Fn(),
_ => &bar as &Fn()
};
test();
}
but then the borrow checker complains that the borrowed values are only valid until the end of the match (btw, why? the functions are 'static anyway so should be valid to the end of times). I've also tried making the 'static lifetime explicit by using &foo as &'static Fn() but that doesn't work either.
The following works, if you only need to work with static functions and not closures:
fn foo() {
println!("Foo");
}
fn bar() {
println!("Bar");
}
fn main() {
let selector = 0;
let test: fn() = match selector {
0 => foo,
_ => bar
};
test();
}
(try on playground)
Here I've used function type instead of function trait.
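The same idea can be wrapped in a function that returns the selected pointer. In this sketch the functions return a string instead of printing so the result can be checked, and the select helper is hypothetical. It also shows that a closure which captures nothing coerces to a function pointer as well.

```rust
fn foo() -> &'static str {
    "Foo"
}

fn bar() -> &'static str {
    "Bar"
}

// Each arm is coerced from its own function item type to `fn() -> &'static str`.
fn select(selector: u32) -> fn() -> &'static str {
    match selector {
        0 => foo,
        1 => bar,
        // A closure that captures nothing coerces to a function pointer too.
        _ => || "Other",
    }
}

fn main() {
    let test = select(0);
    println!("{}", test());
}
```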
The reason that the borrowed trait object doesn't work is probably the following. Any trait object is a fat pointer which consists of a pointer to some value and a pointer to a virtual table. When the trait object is created out of a closure, everything is clear - the value would be represented by the closure itself (internally being an instance of a structure containing all captured variables) and the virtual table would contain a pointer to the implementation of the corresponding Fn*() trait generated by the compiler whose body would be the closure body.
With functions, however, things are not so clear. There is no value to create a trait object from because the function itself should correspond to the implementation of Fn() trait. Therefore, rustc probably generates an empty structure and implements Fn() for it, and this implementation calls the static function directly (not actual Rust, but something close):
struct SomeGeneratedStructFoo;
impl Fn<()> for SomeGeneratedStructFoo {
type Output = ();
fn call(&self, args: ()) -> () {
foo();
}
}
Therefore, when a trait object is created out of fn foo(), a reference is taken in fact to a temporary value of type SomeGeneratedStructFoo. However, this value is created inside the match, and only a reference to it is returned from the match, thus this value does not live long enough, and that's what the error is about.
fn() is a function pointer type. It's already a pointer type. You can check this with std::mem::size_of::<fn()>(). It is not a zero-sized type.
When you do &foo, you take a pointer to a stack allocated pointer. This inner pointer does not survive very long, causing the error.
You can cast these to the generic fn() type as suggested. I would be interested in knowing why you can't cast fn() to &Fn(), though.
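The size check mentioned above can be written out as a short sketch. One detail worth knowing is that the bare name foo has its own zero-sized function item type and only becomes a pointer-sized value once it is coerced to fn(); the sizes helper is made up for this example.

```rust
use std::mem::{size_of, size_of_val};

fn foo() {}

// Returns (size of the `fn()` type, size of a coerced pointer, size of the bare item).
fn sizes() -> (usize, usize, usize) {
    let as_pointer: fn() = foo;
    (size_of::<fn()>(), size_of_val(&as_pointer), size_of_val(&foo))
}

fn main() {
    let (pointer_type, coerced, item) = sizes();
    println!("fn(): {}, coerced: {}, item: {}", pointer_type, coerced, item);
}
```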
Rust slices do not currently support some iterator methods, i.e. take_while. What is the best way to implement take_while for slices?
const STRHELLO:&'static[u8] = b"HHHello";
fn main() {
let subslice:&[u8] = STRHELLO.iter().take_while(|c|(**c=='H' as u8)).collect();
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3),subslice);
assert!(subslice==STRHELLO.slice_to(3));
}
results in the error:
<anon>:6:74: 6:83 error: the trait `core::iter::FromIterator<&u8>` is not implemented for the type `&[u8]`
This code in the playpen:
http://is.gd/1xkcUa
First of all, the issue you have is that collect is about creating a new collection, while a slice is about referencing a contiguous range of items in an existing array (be it dynamically allocated or not).
I am afraid that due to the nature of traits, the fact that the original container (STRHELLO) was a contiguous range has been lost, and cannot be reconstructed after the fact. I am also afraid that any use of "generic" iterators simply cannot lead to the desired output; the type system would have to somehow carry the fact that:
the original container was a contiguous range
the chain of operations performed so far conserve this property
This may be doable or not, but I do not see it done now, and I am unsure in what way it could be elegantly implemented.
On the other hand, you can go about it in the do-it-yourself way:
fn take_while<'a>(initial: &'a [u8], predicate: |&u8| -> bool) -> &'a [u8] {
let mut i = 0u;
for c in initial.iter() {
if predicate(c) { i += 1; } else { break; }
}
initial.slice_to(i)
}
And then:
fn main() {
let subslice: &[u8] = take_while(STRHELLO, |c|(*c==b'H'));
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3), subslice);
assert!(subslice == STRHELLO.slice_to(3));
}
Note: 'H' as u8 can be rewritten as b'H' as shown here, which is symmetric with the strings.
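The code above uses syntax from an old Rust release (the |&u8| -> bool closure type, 0u literals and slice_to no longer exist). A minimal sketch of the same do-it-yourself approach in current Rust, generalized over the element type as an assumption on my part, might look like this:

```rust
/// Returns the longest prefix of `initial` whose elements all satisfy `predicate`.
fn take_while<'a, T>(initial: &'a [T], mut predicate: impl FnMut(&T) -> bool) -> &'a [T] {
    // Index of the first element that fails the predicate, or the full length.
    let end = initial
        .iter()
        .position(|item| !predicate(item))
        .unwrap_or(initial.len());
    &initial[..end]
}

const STRHELLO: &[u8] = b"HHHello";

fn main() {
    let subslice = take_while(STRHELLO, |c| *c == b'H');
    println!("Expecting: {:?}, Got {:?}", &STRHELLO[..3], subslice);
    assert_eq!(subslice, &STRHELLO[..3]);
}
```

The result borrows from the original slice, so nothing is copied or allocated.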
It is possible via some heavy gymnastics to implement this functionality using the stock iterators:
use std::raw::Slice;
use std::mem::transmute;
/// Splice together two slices of the same type that are contiguous in memory.
/// Panics if the slices aren't contiguous with "a" coming first.
/// i.e. slice b must follow slice a immediately in memory.
fn splice<'a>(a:&'a[u8], b:&'a[u8]) -> &'a[u8] {
unsafe {
let aa:Slice<u8> = transmute(a);
let bb:Slice<u8> = transmute(b);
let pa = aa.data as *const u8;
let pb = bb.data as *const u8;
let off = aa.len as int; // Risks overflow into negative!!!
assert!(pa.offset(off) == pb, "Slices were not contiguous!");
let cc = Slice{data:aa.data,len:aa.len+bb.len};
transmute(cc)
}
}
/// Wrapper around splice that lets you use None as a base case for fold
/// Will panic if the slices cannot be spliced! See splice.
fn splice_for_fold<'a>(oa:Option<&'a[u8]>, b:&'a[u8]) -> Option<&'a[u8]> {
match oa {
Some(a) => Some(splice(a,b)),
None => Some(b),
}
}
/// Implementation using pure iterators
fn take_while<'a>(initial: &'a [u8],
predicate: |&u8| -> bool) -> Option<&'a [u8]> {
initial
.chunks(1)
.take_while(|x|(predicate(&x[0])))
.fold(None, splice_for_fold)
}
usage:
const STRHELLO:&'static[u8] = b"HHHello";
let subslice: &[u8] = super::take_while(STRHELLO, |c|(*c==b'H')).unwrap();
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3), subslice);
assert!(subslice == STRHELLO.slice_to(3));
Matthieu's implementation is way cleaner if you just need take_while. I am posting this anyway since it may be a path towards solving the more general problem of using iterator functions on slices cleanly.