Initializing "external" functions before they are created - rust

Basically, I have a function called gen_func() which takes a &mut extern fn(i64) -> i64, generates some code, and stores a pointer to that code through the reference.
The problem arises when I try to create an input to this function.
I don't know what the code to get such a function allocated would look like.
let mut func = /* what goes here? */;
gen_func(&mut func);
Edit:
As people in the comments pointed out, this question is a lot more nuanced than I thought, so here is some further information:
I have complete control over the gen_func() function. It uses a code generation library (GNU lightning) to generate a function.
My goal is to have a function which can generate another function which can then be executed.

It is undefined behavior to create a reference to a value that doesn't exist, so the only way you can accomplish this without undefined behavior given your current constraints is to initialize func to some other extern fn(i64) -> i64 that already exists.
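A minimal sketch of the "initialize it to an existing function first" approach. `placeholder`, `add_one` and this `gen_func` body are hypothetical stand-ins (the real `gen_func` would emit code with GNU lightning); the point is only that `func` holds a valid function pointer before `&mut func` is ever created. Note that the binding needs an explicit fn-pointer type, otherwise it would get the zero-sized type of the `placeholder` item.

```rust
/// A real function with the required signature, used only as an initial value.
extern "C" fn placeholder(_x: i64) -> i64 {
    // Not meant to be called: gen_func overwrites the pointer first.
    0
}

/// Stands in for generated code.
extern "C" fn add_one(x: i64) -> i64 {
    x + 1
}

/// Hypothetical stand-in for the real gen_func, which would generate code
/// and store a pointer to it; here it just stores `add_one`.
fn gen_func(out: &mut extern "C" fn(i64) -> i64) {
    *out = add_one;
}
```

Usage: `let mut func: extern "C" fn(i64) -> i64 = placeholder; gen_func(&mut func);` after which `func` points at the new code.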
If you are able to change the signature of gen_func you could have it return the function pointer instead:
fn gen_func() -> extern fn(i64) -> i64 { todo!(); }
I suspect that gen_func is itself an extern function implemented in C, which accepts an "out" pointer-to-function-pointer. In that case, you can get around the issue by changing the Rust-side signature to take either a raw pointer (*mut extern fn(i64) -> i64) or a safer type like MaybeUninit.
For example:
use std::mem::MaybeUninit;
extern "C" {
fn gen_func(fp: &mut MaybeUninit<extern fn(i64) -> i64>);
}
Now we can implement a safe interface without UB. We create an uninitialized MaybeUninit, pass a mutable reference to it to gen_func (which writes the function pointer into it), and then assume it is initialized.
fn gen_func_wrapper() -> extern fn(i64) -> i64 {
let mut func = MaybeUninit::uninit();
unsafe {
gen_func(&mut func);
func.assume_init()
}
}
Note that MaybeUninit<T> and T are guaranteed to have the same layout, so the non-Rust side would not see any difference here.
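To try the wrapper without a C library, here is a sketch in which a hypothetical Rust function plays the role of the C-side `gen_func`. It only writes through the out-parameter (via `MaybeUninit::write`) and never reads the uninitialized contents, which is the contract the `assume_init` call relies on. `double` is a stand-in for generated code.

```rust
use std::mem::MaybeUninit;

/// Stands in for generated code.
extern "C" fn double(x: i64) -> i64 {
    x * 2
}

/// Hypothetical Rust stand-in for the C-side gen_func: it only writes
/// through the out-parameter and never reads the uninitialized contents.
fn gen_func(fp: &mut MaybeUninit<extern "C" fn(i64) -> i64>) {
    fp.write(double);
}

fn gen_func_wrapper() -> extern "C" fn(i64) -> i64 {
    let mut func = MaybeUninit::uninit();
    gen_func(&mut func);
    // SAFETY: gen_func always initializes `func` before returning.
    unsafe { func.assume_init() }
}
```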

Related

Why are string constant pointers different across crates in Rust?

While working with a HashMap that uses &'static str as the key type, I created a newtype to hash by the pointer rather than by the string contents to reduce overhead.
use std::hash::{Hash, Hasher};
pub struct StaticStr(&'static str);
impl Hash for StaticStr {
fn hash<H: Hasher>(&self, state: &mut H) {
self.0.as_ptr().hash(state)
}
}
impl PartialEq for StaticStr {
fn eq(&self, other: &Self) -> bool {
self.0.as_ptr() == other.0.as_ptr()
}
}
impl Eq for StaticStr {}
It turns out that this does not work consistently, as in the following example.
use std::collections::HashMap;
pub type MyMap = HashMap<StaticStr, u8>;
pub const A: &str = "A";
pub fn make_map() -> MyMap {
let mut map = MyMap::new();
map.insert(StaticStr(A), 1);
map
}
pub fn get_value(control: &MyMap) -> Option<u8> {
control.get(&StaticStr(A)).cloned()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
pub fn map_made_in_lib() {
let map = make_map();
assert_eq!(get_value(&map), Some(1));
}
#[test]
pub fn map_made_in_test() {
// Same as make_map()
let mut map = MyMap::new();
map.insert(StaticStr(A), 1);
// This check fails
assert_eq!(get_value(&map), Some(1));
}
}
Notice that in the first test, the string constant A is only used directly in the lib crate. In the second test, A is used directly in both the lib crate and the test crate. I discovered that although both tests use the same string constant, the pointers are different depending on which crate refers to the string constant by name. This is demonstrated in the minimal reproduction I created. I would have expected the string literal to be included only once, in the crate that defines it, or at least the linker to deduplicate identical string literals. Is there a reason for this behavior?
Try a static instead of a const. The Rust Reference explains the difference:
A constant item is an optionally named constant value which is not associated with a specific memory location in the program. Constants are essentially inlined wherever they are used, meaning that they are copied directly into the relevant context when used. This includes usage of constants from external crates, and non-Copy types. References to the same constant are not necessarily guaranteed to refer to the same memory address. -- The Rust Reference
A static item is similar to a constant, except that it represents a precise memory location in the program. All references to the static refer to the same memory location. Static items have the static lifetime, which outlives all other lifetimes in a Rust program. Static items do not call drop at the end of the program. -- The Rust Reference
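A sketch of the question's code with `A` turned into a `static`. Because the static has a single memory location, every crate that names `A` reads the same `&str` value, so the data pointer matches. One addition not in the question: the key compares the length as well as the address (`std::ptr::eq` on `&str` checks both), so two keys that happen to start at the same address are not conflated. A single file cannot reproduce the cross-crate difference, so this only shows the intended usage.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

pub struct StaticStr(&'static str);

impl Hash for StaticStr {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the address and the length of the string data.
        self.0.as_ptr().hash(state);
        self.0.len().hash(state);
    }
}

impl PartialEq for StaticStr {
    fn eq(&self, other: &Self) -> bool {
        // For `&str`, ptr::eq compares both the address and the length.
        std::ptr::eq(self.0, other.0)
    }
}

impl Eq for StaticStr {}

pub type MyMap = HashMap<StaticStr, u8>;

// A `static` has exactly one memory location, so every use of `A`, from any
// crate, reads the same `&str` value (same data pointer).
pub static A: &str = "A";

pub fn make_map() -> MyMap {
    let mut map = MyMap::new();
    map.insert(StaticStr(A), 1);
    map
}

pub fn get_value(control: &MyMap) -> Option<u8> {
    control.get(&StaticStr(A)).cloned()
}
```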

Ensuring value lives for its entire scope

I have a type (specifically CFData from core-foundation) whose memory is managed by C APIs, and which I need to pass to and receive from C functions as a *const c_void. For simplicity, consider the following struct:
use std::ffi;
struct Data {
ptr: *mut ffi::c_void,
}
impl Data {
pub fn new() -> Self {
// allocate memory via an unsafe C-API
Self {
ptr: std::ptr::null_mut(), // Just so this compiles.
}
}
pub fn to_raw(&self) -> *const ffi::c_void {
self.ptr
}
}
impl Drop for Data {
fn drop(&mut self) {
unsafe {
// Free memory via a C-API
}
}
}
Its interface is safe, including to_raw(), since to_raw() only returns a raw pointer and never dereferences it. The caller doesn't dereference it either; it is just passed to a callback.
pub extern "C" fn called_from_C_ok(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// Do things to compute Data.
// This is actually async code, which is why there's a completion handler.
on_complete(data.to_raw()); // data survives through the function call
}
This is fine. Data is safe to manipulate, and (I strongly believe) Rust promises that data will live until the end of the on_complete function call.
On the other hand, this is not ok:
pub extern "C" fn called_from_C_broken(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// ...
let ptr = data.to_raw(); // data can be dropped at this point, so ptr is dangling.
on_complete(ptr); // This may crash when the receiver dereferences it.
}
In my code, I made this mistake and it started crashing. It's easy to see why and it's easy to fix. But it's subtle, and it's easy for a future developer (me) to modify the ok version into the broken version without realizing the problem (and it may not always crash).
What I'd like to do is to ensure data lives as long as ptr. In Swift, I'd do this with:
withExtendedLifetime(&data) { data in
// ...data cannot be dropped until this closure ends...
}
Is there a similar construct in Rust that explicitly marks the minimum lifetime for a variable to a scope (that the optimizer may not reorder), even if it's not directly accessed? (I'm sure it's trivial to build a custom with_extended_lifetime in Rust, but I'm looking for a more standard solution so that it will be obvious to other developers what's going on).
I do believe the following "works" but I'm not sure how flexible it is, or if it's just replacing a more standard solution:
fn with_extended_lifetime<T, U, F>(value: &T, f: F) -> U
where
F: Fn(&T) -> U,
{
f(value)
}
with_extended_lifetime(&data, |data| {
let ptr = data.to_raw();
on_complete(ptr)
});
The optimizer is not allowed to change when a value is dropped. If you assign a value to a variable (and that value is not then moved elsewhere or overwritten by assignment), it will always be dropped at the end of the block, not earlier.
You say that this code is incorrect:
pub extern "C" fn called_from_C_broken(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// ...
let ptr = data.to_raw(); // data can be dropped at this point, so ptr is dangling.
on_complete(ptr); // This may crash when the receiver dereferences it.
}
but in fact data cannot be dropped at that point, and this code is sound. What you may be confusing this with is the mistake of not assigning the value to a variable:
let ptr = Data::new().to_raw();
on_complete(ptr);
In this case, the pointer is dangling, because the result of Data::new() is stored in a temporary, which is dropped at the end of the statement, rather than in a local variable, which would be dropped at the end of the block.
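The difference between a local variable and a temporary can be observed with a type that records when it is dropped. In this minimal sketch, `Noisy` is a hypothetical stand-in for `Data`, and the thread-local log exists only to make the order of events visible.

```rust
use std::cell::RefCell;

thread_local! {
    // Records events so their order can be inspected afterwards.
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

fn log(event: &'static str) {
    LOG.with(|l| l.borrow_mut().push(event));
}

fn take_log() -> Vec<&'static str> {
    LOG.with(|l| std::mem::take(&mut *l.borrow_mut()))
}

/// Hypothetical stand-in for `Data`: logs when it is dropped.
struct Noisy;

impl Noisy {
    fn to_raw(&self) -> *const Noisy {
        self
    }
}

impl Drop for Noisy {
    fn drop(&mut self) {
        log("dropped");
    }
}

fn local_variable() {
    let data = Noisy;
    let _ptr = data.to_raw();
    log("callback"); // `data` is still alive here
} // `data` is dropped here, at the end of the block

fn temporary() {
    let _ptr = Noisy.to_raw(); // the temporary is dropped at the end of this statement
    log("callback"); // `_ptr` is already dangling here
}
```

Calling `local_variable()` logs `callback` before `dropped`; calling `temporary()` logs them in the opposite order.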
If you want to adopt a programming style which makes explicit when values are dropped, the usual pattern is to use the standard drop() function to mark the exact time of drop:
let data = Data::new();
// ...
on_complete(data.to_raw());
drop(data); // We have stopped using the raw pointer now
(Note that drop() is not magic: it is simply a function which takes one argument and does nothing with it other than dropping. Its presence in the standard library is to give a name to this pattern, not to provide special functionality.)
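To illustrate that claim, here is a sketch of a hypothetical `my_drop` with the same (empty) body as `std::mem::drop`. Taking the argument by value moves it into the function, and it is dropped when the function returns. `Guard` is a made-up type that flips a flag in its destructor.

```rust
use std::cell::Cell;

/// Same body as std::mem::drop: the argument is moved in and
/// dropped when this function returns.
fn my_drop<T>(_value: T) {}

/// Hypothetical guard that records when it has been dropped.
struct Guard<'a>(&'a Cell<bool>);

impl<'a> Drop for Guard<'a> {
    fn drop(&mut self) {
        self.0.set(true);
    }
}
```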
However, if you want to, there isn't anything wrong with using your with_extended_lifetime (other than nonstandard style and arguably a misleading name) if you want to make the code structure even more strongly indicate the scope of the value. One nitpick: the function parameter should be FnOnce, not Fn, for maximum generality (this allows it to be passed functions that can't be called more than once).
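A version of the question's helper with the `FnOnce` bound suggested here. The usage below passes a closure that moves a captured `Vec` out of itself, so it can only be called once; it is accepted with `FnOnce` but would be rejected by the original `Fn` bound.

```rust
/// Calls `f` with a borrow of `value`. Because `value` is borrowed for the
/// whole call, it cannot be moved or dropped until `f` returns.
fn with_extended_lifetime<T, U, F>(value: &T, f: F) -> U
where
    F: FnOnce(&T) -> U,
{
    f(value)
}
```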
Other than explicitly dropping as the other answer mentions, there is another way to help prevent these types of accidental drops: use a wrapper around a raw pointer that has lifetime information.
use std::marker::PhantomData;
#[repr(transparent)]
struct PtrWithLifetime<'a>{
ptr: *mut ffi::c_void,
_data: PhantomData<&'a ffi::c_void>,
}
impl Data {
fn to_raw(&self) -> PtrWithLifetime<'_>{
PtrWithLifetime{
ptr: self.ptr,
_data: PhantomData,
}
}
}
The #[repr(transparent)] guarantees that PtrWithLifetime has the same layout and ABI as its *mut ffi::c_void field, so you can adjust the declaration of on_complete to
fn called_from_c(on_complete: extern "C" fn(PtrWithLifetime<'_>)){
//stuff
}
without causing any major inconvenience to any downstream users, especially since the ffi bindings can be adjusted in a similar fashion.
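A self-contained sketch of the wrapper in use. Assumptions for illustration only: a `Box<u32>` stands in for the C-managed CFData, `read_value` is a hypothetical callback, and the callback returns a `u32` so the result can be checked. Writing `let ptr = Data::new(42).to_raw(); on_complete(ptr)` instead is rejected by the borrow checker, because the temporary `Data` would be dropped while `ptr` still borrows it.

```rust
use std::ffi::c_void;
use std::marker::PhantomData;

/// Owns a heap value; a Box<u32> stands in for the C-managed CFData.
struct Data {
    ptr: *mut c_void,
}

impl Data {
    fn new(value: u32) -> Self {
        Data { ptr: Box::into_raw(Box::new(value)) as *mut c_void }
    }

    /// The returned pointer borrows `self`, so it cannot outlive it.
    fn to_raw(&self) -> PtrWithLifetime<'_> {
        PtrWithLifetime { ptr: self.ptr, _data: PhantomData }
    }
}

impl Drop for Data {
    fn drop(&mut self) {
        // Reclaim the Box allocated in `new`.
        unsafe { drop(Box::from_raw(self.ptr as *mut u32)) }
    }
}

#[repr(transparent)]
struct PtrWithLifetime<'a> {
    ptr: *mut c_void,
    _data: PhantomData<&'a c_void>,
}

/// Hypothetical callback; across FFI it would receive a plain `void *`.
extern "C" fn read_value(p: PtrWithLifetime<'_>) -> u32 {
    unsafe { *(p.ptr as *const u32) }
}

fn called_from_c(on_complete: extern "C" fn(PtrWithLifetime<'_>) -> u32) -> u32 {
    let data = Data::new(42);
    // `data` is borrowed by the PtrWithLifetime for the duration of the call.
    on_complete(data.to_raw())
}
```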

How do I obtain the address of a function?

How do I obtain a function address in Rust? What does '&somefunction' exactly mean?
What addresses do I get doing
std::mem::transmute::<_, u32>(function)
or
std::mem::transmute::<_, u32>(&function)
(on 32-bit system, of course)?
What does
&function as *const _ as *const c_void
give?
If I just wanted to know the address of a function, I'd probably just print it out:
fn moo() {}
fn main() {
println!("{:p}", moo as *const ());
}
However, I can't think of a useful reason to want to do this. Usually, there's something you want to do with the function. In those cases, you might as well just pass the function directly, no need to deal with the address:
fn moo() {}
fn do_moo(f: fn()) {
f()
}
fn main() {
do_moo(moo);
}
I'm less sure about this, but I think that std::mem::transmute::<_, u32>(&function) creates a temporary holding the function item (a zero-sized value) and takes a reference to that temporary, so the result is not the address of the function's code. This would match how this code works:
fn main() {
let a = &42;
}
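A small sketch contrasting the two expressions. Casting the function (`moo as *const ()`) yields its code address, while `&moo` is a reference to the function item, whose type is zero-sized, so its address says nothing about where the code lives. A plain `fn` pointer is also shown to be pointer-sized.

```rust
use std::mem::size_of_val;

fn moo() -> u32 {
    7
}

/// The code address of `moo`, obtained with the cast used above.
fn moo_address() -> *const () {
    moo as *const ()
}

/// `&moo` refers to the function *item*, a zero-sized value.
fn item_size() -> usize {
    size_of_val(&moo)
}
```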
I don't need to work with them in Rust; I need an address because I have some FFI that takes the address of a symbol in the current process.
You can still just pass the function as-is to the extern functions that will use the callback:
extern {
fn a_thing_that_does_a_callback(callback: extern fn(u8) -> bool);
}
extern fn zero(a: u8) -> bool { a == 0 }
fn main() {
unsafe { a_thing_that_does_a_callback(zero); }
}
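If the foreign API really takes a bare address (a `void *`) rather than a typed callback, the function can be cast to a raw pointer and turned back into a callable fn pointer on the other side. In this sketch, `call_through_address` is a hypothetical Rust stand-in for such a C API; `transmute` from a raw pointer to a fn pointer is only sound if the address really came from a function with exactly that signature.

```rust
use std::ffi::c_void;
use std::mem;

extern "C" fn zero(a: u8) -> bool {
    a == 0
}

/// Hypothetical stand-in for a C API that takes a bare address
/// (`void *`) and later calls it as a callback.
fn call_through_address(addr: *const c_void, arg: u8) -> bool {
    // SAFETY: the caller promises `addr` came from an `extern "C" fn(u8) -> bool`.
    let f: extern "C" fn(u8) -> bool = unsafe { mem::transmute(addr) };
    f(arg)
}
```

Usage: `let addr = zero as *const c_void;` gives the address to hand over.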
The answer by @Shepmaster answers this question (it also includes other information that is not directly relevant but may be useful to somebody). I summarize it here.
Obtaining the address is easy, just:
function as *const ()
Taking a reference to a function seems to create a local variable, just like with
let a = &42;

C library freeing a pointer coming from Rust

I want to do Rust bindings to a C library which requires a callback, and this callback must return a C-style char* pointer to the C library which will then free it.
The callback must be in some sense exposed to the user of my library (probably using closures), and I want to provide a Rust interface as convenient as possible (meaning accepting a String output if possible).
However, the C library complains when trying to free() a pointer coming from memory allocated by Rust, probably because Rust uses jemalloc and the C library uses malloc.
So currently I can see two workarounds using libc::malloc(), but both of them have disadvantages:
Give the user of the library a slice that he must fill (inconvenient, and imposes length restrictions)
Take his String output, copy it to an array allocated by malloc, and then free the String (useless copy and allocation)
Can anybody see a better solution?
Here is an equivalent of the interface of the C library, along with the implementation of the ideal case (if the C library could free a String allocated in Rust):
extern crate libc;
use std::ffi::CString;
use libc::{c_char, c_void};
extern "C" {
// The second parameter of this function gets passed as an argument of my callback
fn need_callback(callback: extern "C" fn(arbitrary_data: *mut c_void) -> *mut c_char,
arbitrary_data: *mut c_void);
}
// This function must return a C-style char[] that will be freed by the C library
extern "C" fn my_callback(arbitrary_data: *mut c_void) -> *mut c_char {
unsafe {
// arbitrary_data is the address of the caller's `&mut dyn FnMut() -> String`
let user_callback = &mut *(arbitrary_data as *mut &mut dyn FnMut() -> String);
let user_string = (*user_callback)();
let c_string = CString::new(user_string).unwrap();
// into_raw() gives up ownership, so Rust will not deallocate the buffer
c_string.into_raw()
}
}
pub fn call_callback(mut user_callback: &mut dyn FnMut() -> String) {
unsafe {
need_callback(my_callback, &mut user_callback as *mut _ as *mut c_void);
}
}
The C part would be equivalent to this:
#include <stdlib.h>
typedef char* (*callback)(void *arbitrary_data);
void need_callback(callback cb, void *arbitrary_data) {
char *user_return = cb(arbitrary_data);
free(user_return); // Complains as the pointer has been allocated with jemalloc
}
It might require some annoying work on your part, but what about exposing a type that implements Write, but is backed by memory allocated via malloc? Then, your client can use the write! macro (and friends) instead of allocating a String.
Here's how it currently works with Vec:
use std::io::Write;
let mut v = Vec::new();
write!(&mut v, "hello, world").unwrap();
You would "just" need to implement the two required methods, write and flush, and then you would have a stream-like interface.
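A minimal sketch of such a type, under stated assumptions: `MallocWriter` and `into_c_string` are hypothetical names, and `realloc`/`free` are declared directly against the C runtime (instead of going through the `libc` crate) so the example is self-contained, assuming `size_t` is the same as `usize`, as on mainstream targets. The buffer always keeps one spare byte for the trailing NUL; embedded NUL bytes written by the user would truncate the string as seen from C.

```rust
use std::ffi::{c_char, c_void};
use std::io::{self, Write};
use std::ptr;

// Assumes size_t == usize on the target.
extern "C" {
    fn realloc(p: *mut c_void, size: usize) -> *mut c_void;
    fn free(p: *mut c_void);
}

/// Hypothetical writer whose buffer lives in malloc'd memory, so C can free() it.
pub struct MallocWriter {
    buf: *mut u8,
    len: usize,
    cap: usize,
}

impl MallocWriter {
    pub fn new() -> Self {
        MallocWriter { buf: ptr::null_mut(), len: 0, cap: 0 }
    }

    /// Ensures room for `additional` bytes plus one byte for the trailing NUL.
    fn reserve(&mut self, additional: usize) -> io::Result<()> {
        let needed = self.len.checked_add(additional).and_then(|n| n.checked_add(1))
            .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "capacity overflow"))?;
        if needed <= self.cap {
            return Ok(());
        }
        let new_cap = needed.max(self.cap.saturating_mul(2)).max(16);
        // realloc(NULL, n) behaves like malloc(n).
        let p = unsafe { realloc(self.buf as *mut c_void, new_cap) } as *mut u8;
        if p.is_null() {
            return Err(io::Error::new(io::ErrorKind::Other, "realloc failed"));
        }
        self.buf = p;
        self.cap = new_cap;
        Ok(())
    }

    /// NUL-terminates the buffer and gives up ownership; the receiver must free() it.
    pub fn into_c_string(mut self) -> io::Result<*mut c_char> {
        self.reserve(0)?;
        unsafe { *self.buf.add(self.len) = 0 };
        let p = self.buf as *mut c_char;
        std::mem::forget(self); // skip Drop so the buffer is not freed here
        Ok(p)
    }
}

impl Write for MallocWriter {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.reserve(data.len())?;
        unsafe { ptr::copy_nonoverlapping(data.as_ptr(), self.buf.add(self.len), data.len()) };
        self.len += data.len();
        Ok(data.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl Drop for MallocWriter {
    fn drop(&mut self) {
        unsafe { free(self.buf as *mut c_void) } // free(NULL) is a no-op
    }
}
```

The user's closure would receive a `&mut MallocWriter` and use `write!` on it; the callback then returns `into_c_string()`, and the C library's `free()` matches the allocator.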

What is the correct type for a method on a lifetime-parameterized struct?

I have a struct that contains a reference and so it has a lifetime parameter. I'd like to pass around the function pointer of a method of this struct. Later, I will call that function with an instance of the struct. I ran into snags while trying to store the function pointer, eventually finding this solution:
struct Alpha<'a> { a: &'a u8 }
impl<'a> Alpha<'a> {
fn alpha(&self) -> u8 { *self.a }
}
struct Try1(fn(&Alpha) -> u8);
struct Try2(for<'z> fn(&Alpha<'z>) -> u8);
struct Try3<'z>(fn(&Alpha<'z>) -> u8);
fn main() {
Try1(Alpha::alpha); // Nope
Try2(Alpha::alpha); // Nope
Try3(Alpha::alpha);
}
Unfortunately, this solution doesn't work for my real case because I want to implement a trait that has its own notion of lifetimes:
trait Zippy {
fn greet<'a>(&self, a: &Alpha<'a>);
}
impl<'z> Zippy for Try3<'z> {
fn greet<'a>(&self, a: &Alpha<'a>) { println!("Hello, {}", self.0(a)) }
}
Produces the error:
error: mismatched types:
expected `&Alpha<'z>`,
found `&Alpha<'a>`
I feel that I shouldn't need to tie the lifetime of my struct Try3 to the lifetime of the parameter of the function pointer, but the compiler must be seeing something I'm not.
Unfortunately, the method alpha implemented on the struct Alpha effectively takes the struct's lifetime as a parameter, despite not actually using it. This is a limitation of the syntax for defining methods on structs with lifetimes. So even though it is possible to take a pointer to it as a fn(&Alpha<'z>) -> u8 for one specific lifetime 'z (as in Try3), it is not possible to treat it as a fn(&Alpha) -> u8 or a for<'z> fn(&Alpha<'z>) -> u8, even though the definition suggests this should be possible.
This can be worked around by defining a function that invokes the method and take a pointer to it instead:
fn workaround(a: &Alpha) -> u8 { Alpha::alpha(a) }
Try1(workaround);
In fact, it may be better to do it the other way around, with the logic defined in the function and the method invoking the function. Then calling through a fn(&Alpha) -> u8 pointer does not require a second jump into the method, and calls to the method can be inlined as calls to the function.
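A sketch of that reversed layout together with the question's Zippy trait. `alpha_value` is a hypothetical name for the free function; because its lifetimes are late-bound, it coerces to `fn(&Alpha<'_>) -> u8`, so `Try1` needs no lifetime parameter of its own, and the trait impl type-checks.

```rust
struct Alpha<'a> {
    a: &'a u8,
}

// The logic lives in a free function whose lifetimes are late-bound,
// so it can be stored as `fn(&Alpha<'_>) -> u8`.
fn alpha_value(a: &Alpha<'_>) -> u8 {
    *a.a
}

impl<'a> Alpha<'a> {
    // The method simply forwards to the free function.
    fn alpha(&self) -> u8 {
        alpha_value(self)
    }
}

struct Try1(fn(&Alpha<'_>) -> u8);

trait Zippy {
    fn greet<'a>(&self, a: &Alpha<'a>);
}

impl Zippy for Try1 {
    fn greet<'a>(&self, a: &Alpha<'a>) {
        println!("Hello, {}", (self.0)(a))
    }
}
```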
