FFI: Convert nullable pointer to option - rust

I'm using rust-bindgen to access a C library from Rust. Some functions return nullable pointers to structs, which bindgen represents as
extern "C" {
pub fn get_some_data() -> *const SomeStruct;
}
Now, for a higher-level wrapper, I would like to convert this to an Option<&'a SomeStruct> with an appropriate lifetime. Due to the nullable pointer optimization, this is actually represented identically to *const SomeStruct. However, I couldn't find any concise syntax to cast between the two. Transmuting
let data: Option<&'a SomeStruct> = unsafe { mem::transmute( get_some_data() ) };
and reborrowing
let data_ptr = get_some_data();
let data = if data_ptr.is_null() { None } else { Some(unsafe { &*data_ptr }) };
could be used. The docs for mem::transmute state that
transmute is incredibly unsafe. There are a vast number of ways to cause undefined behavior with this function. transmute should be the absolute last resort.
and recommend re-borrowing instead for
Turning a *mut T into an &mut T
However, for the nullable pointer, this is quite clumsy as shown in the second example.
Q: Is there a more concise syntax for this cast? Alternatively, is there a way to tell bindgen to generate
extern "C" {
pub fn get_some_data() -> Option<&SomeStruct>;
}
directly?

Use <*const T>::as_ref¹:
let data = unsafe { get_some_data().as_ref() };
Since a raw pointer may not point to a valid object of sufficient lifetime for any 'a, as_ref is unsafe to call.
There is a corresponding as_mut for *mut T → Option<&mut T>.
¹ This is a different as_ref from, for example, AsRef::as_ref and Option::as_ref, both of which are common in safe code.
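As a minimal sketch of this conversion: the example below replaces the real C function with a Rust stand-in so that it can run on its own. The `value` field, the `present` parameter, and the `read_value` helper are assumptions made for illustration and are not part of the question's library.

```rust
#[repr(C)]
pub struct SomeStruct {
    pub value: i32, // hypothetical field, for illustration only
}

static DATA: SomeStruct = SomeStruct { value: 7 };

// Stand-in for the bindgen declaration: returns null when `present` is false.
pub extern "C" fn get_some_data(present: bool) -> *const SomeStruct {
    if present {
        &DATA as *const SomeStruct
    } else {
        std::ptr::null()
    }
}

pub fn read_value(present: bool) -> Option<i32> {
    // SAFETY: the pointer is either null or points at `DATA`, a static that
    // stays valid and unmodified for the whole program.
    let data: Option<&'static SomeStruct> = unsafe { get_some_data(present).as_ref() };
    data.map(|s| s.value)
}
```

With a real C library the lifetime cannot be `'static` in general; the wrapper has to pick a lifetime that matches what the library actually guarantees.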

Related

How to write a Rust function with an argument that accepts both *const and *mut pointers

I'm trying to find out how to write a function that accepts both *const and *mut pointers. Rust doesn't seem to have a trait for pointers. Is there a solution in the latest version of Rust?
Here is an example: the function needs a pointer but does not care about its mutability.
use std::ffi::c_void;
fn main() {
let const_ptr = 0usize as *const c_void; // null const pointer
function(const_ptr);
let mut_ptr = 0usize as *mut c_void; // null mut pointer
function(mut_ptr);
}
fn function<T>(arg: *ANY T) -> *ANY T {
arg
}
Rust does not support any kind of direct overloading in fn or method signatures. But, as always, there is an idiomatic way to go: use traits.
trait Pointereable {}
fn accept_diff_pointers<T: Pointereable>(ptr: T) { /* do stuff */ }
Then, you have to implement that new trait for the desired types:
impl Pointereable for X {}
impl Pointereable for Y {}
...
and so on and so forth, where in this example, X and Y are the exact pointer types that you want to work with.
One last note:
Transmuting a shared reference into a mutable one is undefined behaviour. The Nomicon states:
Transmuting an & to &mut is Undefined Behavior. While certain usages may appear safe, note that the Rust optimizer is free to assume that a shared reference won't change through its lifetime and thus such transmutation will run afoul of those assumptions.
Read more here: https://doc.rust-lang.org/nomicon/transmutes.html
I mention this because, with the trait-bound approach to accepting different pointers, you will have to do some work to match the correct pointer type.
Also note that the Nomicon is talking about references; transmuting them as described above is undefined behaviour.
Casting raw pointers in this context is always fine, but dereferencing them is a lot more complex. It may or may not lead to undefined behaviour, so you must do it carefully.
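The trait approach above could be sketched as follows. The trait name `RawPointer` and its `as_void` method are hypothetical and are not part of the standard library; the blanket impls cover `*const T` and `*mut T` for sized `T` only.

```rust
use std::ffi::c_void;

/// Hypothetical trait (not in the standard library) meaning "any raw pointer".
pub trait RawPointer: Copy {
    /// View the pointer as an untyped const pointer.
    fn as_void(self) -> *const c_void;
}

impl<T> RawPointer for *const T {
    fn as_void(self) -> *const c_void {
        self as *const c_void
    }
}

impl<T> RawPointer for *mut T {
    fn as_void(self) -> *const c_void {
        self as *const c_void
    }
}

/// Accepts either pointer kind and hands the same pointer back unchanged,
/// so the caller keeps the mutability it started with.
pub fn function<P: RawPointer>(arg: P) -> P {
    arg
}

/// Example of logic that does not care about mutability.
pub fn is_null<P: RawPointer>(arg: P) -> bool {
    arg.as_void().is_null()
}
```

Because the function is generic over the pointer type, a `*const T` goes in and comes out as `*const T`, and likewise for `*mut T`, so no const-to-mut cast is ever needed.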

What is the preferred unsafe way to extend lifetimes which are correct but not provable?

I have a minimal arena allocator which demonstrates the intention, although it isn't optimized for minimizing allocations/deallocations like a true arena would be:
#[derive(Clone)]
pub struct Arena(Arc<Mutex<Vec<Box<[u8]>>>>);
impl Arena {
/// Allocate memory of the given size.
pub fn allocate(&self, size: usize) -> &mut [u8] {
let inner = &mut *self.0.lock().unwrap();
let mut new_mem = vec![0u8; size].into_boxed_slice();
let slice = &mut new_mem[..]; // THIS OBVIOUSLY DOESNT WORK
inner.push(new_mem);
slice
}
}
allocate is the only operation, therefore I know that I can safely take a reference to the memory contained in new_mem with the same lifetime as &self, because I don't provide any operations that would allow the boxed memory to become aliased, and I know because the memory block is boxed, it won't move even if the vector it is stored in has to reallocate to add additional blocks.
There's also no way I know of to safely tell the compiler that the reference to the memory block is safe. Using &mut new_mem[..] fails because the compiler thinks I'm borrowing new_mem while trying to move it into the vector in push. I could invert the order and do push followed by &mut inner.last().unwrap()[..], but that also fails, because the compiler sees that reference as being owned by the mutex guard.
That means that to tell the compiler that this borrow is OK, I need to do something unsafe to create a reference with a longer lifetime than normal borrowing would produce in this case.
I know of two ways to extend this lifetime:
std::mem::transmute:
let slice = {
// Strongly-typed line to make sure we aren't accidentally starting from a pointer to
// the box itself by accident.
let slice: &mut [u8] = &mut new_mem[..];
unsafe { mem::transmute(slice) }
};
Dereferencing a raw pointer:
let slice = {
// Strongly-typed line to make sure we aren't accidentally starting from a pointer to
// the box itself by accident.
let slice: &mut [u8] = &mut new_mem[..];
let ptr: *mut _ = slice;
unsafe { &mut *ptr }
};
Is there any particular advantage to either of these options? For example, are there classes of mistakes that are possible with one option that aren't possible, or are harder to make, with the other? Are there other ways to do this that have different advantages? Or are all options about the same?
For classes of mistakes, I'm particularly wondering whether there are type inference mistakes that can occur in one that aren't possible in the other. Obviously transmute can convert to anything with the same size, and raw pointers allow casting to any type regardless of size, but I wonder if type inference on pointers is more restricted in the absence of an as cast.
How about this?
impl Arena {
/// Allocate memory of the given size.
pub fn allocate(&self, size: usize) -> &mut [u8] {
let inner = &mut *self.0.lock().unwrap();
let mut new_mem = vec![0u8; size].into_boxed_slice();
let mem_ptr: *mut u8 = new_mem.as_mut_ptr();
let slice = unsafe { std::slice::from_raw_parts_mut(mem_ptr, size) };
inner.push(new_mem);
slice
}
}
I don't know the pros and cons, but that's how I would do it.
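Another variation, offered only as a sketch and not as what either poster proposed, is to store raw pointers obtained from `Box::into_raw` and free them in `Drop`. That way no `Box` is moved after a pointer into it has been taken. The example swaps the question's `Arc<Mutex<...>>` for a `RefCell`, so this hypothetical `Arena` is single-threaded.

```rust
use std::cell::RefCell;

/// Hypothetical single-threaded arena. Blocks are kept as raw pointers and
/// are only freed when the arena itself is dropped.
pub struct Arena {
    blocks: RefCell<Vec<*mut [u8]>>,
}

impl Arena {
    pub fn new() -> Self {
        Arena { blocks: RefCell::new(Vec::new()) }
    }

    /// Allocate zeroed memory of the given size.
    pub fn allocate(&self, size: usize) -> &mut [u8] {
        // Give up ownership of the box; the arena now tracks the raw pointer.
        let raw: *mut [u8] = Box::into_raw(vec![0u8; size].into_boxed_slice());
        self.blocks.borrow_mut().push(raw);
        // SAFETY: `raw` comes from `Box::into_raw`, so it is valid and unique.
        // The stored copy is not dereferenced until `drop`, which cannot run
        // while a borrow tied to `&self` is still alive.
        unsafe { &mut *raw }
    }
}

impl Drop for Arena {
    fn drop(&mut self) {
        for raw in self.blocks.get_mut().drain(..) {
            // SAFETY: each pointer was produced by `Box::into_raw` in
            // `allocate` and is freed exactly once here.
            unsafe { drop(Box::from_raw(raw)) };
        }
    }
}
```

The returned slices borrow from `&self`, so the borrow checker still prevents the arena from being dropped while any of them is in use.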

Why do most ffi functions use raw pointers instead of references?

Both in ffi tutorials and in automatically generated interfaces, *const T pointers are used most of the time. As far as I know the difference between &T and *const T is only that *const T doesn't have to fulfill certain conditions like not being null and is unsafe to dereference.
fn main() {
unsafe {
do_something(&TestStruct {data: 3})
}
}
#[repr(C)]
pub struct TestStruct {
data: u32
}
extern "C" {
fn do_something(arg: &TestStruct);
}
This code can be compiled and runs. Because external functions are similar in usage to internal ones, I don't understand why raw pointers are used there as the default.
Part of the answer can probably be found in the fact that references must be aligned. Since using unaligned references is undefined behaviour, and the alignment of pointers cannot be guaranteed across FFI, defaulting to raw pointers seems to be a sane choice.
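The null case mentioned in the question gives a second, easily demonstrated reason. The sketch below assumes a hypothetical function `read_data` that is exported to C; because a C caller can pass NULL, the parameter is a raw pointer and the null check is explicit. The fallback value `0` is an arbitrary choice for the example.

```rust
#[repr(C)]
pub struct TestStruct {
    pub data: u32,
}

/// Hypothetical function exported to C. A C caller may pass NULL, which a
/// `&TestStruct` parameter would not be allowed to hold.
pub extern "C" fn read_data(arg: *const TestStruct) -> u32 {
    // SAFETY: the caller must pass either null or a pointer to a valid,
    // properly aligned `TestStruct`; null is handled explicitly below.
    match unsafe { arg.as_ref() } {
        Some(s) => s.data,
        None => 0, // arbitrary fallback chosen for this sketch
    }
}
```

From Rust, a reference coerces to the raw pointer at the call site (`read_data(&value)`), so using raw pointers in the signature costs little on the Rust side.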

Ensuring value lives for its entire scope

I have a type (specifically CFData from core-foundation) whose memory is managed by C APIs and that I need to pass to and receive from C functions as a *const c_void. For simplicity, consider the following struct:
struct Data {
ptr: *mut ffi::c_void,
}
impl Data {
pub fn new() -> Self {
// allocate memory via an unsafe C-API
Self {
ptr: std::ptr::null_mut(), // Just so this compiles.
}
}
pub fn to_raw(&self) -> *const ffi::c_void {
self.ptr
}
}
impl Drop for Data {
fn drop(&mut self) {
unsafe {
// Free memory via a C-API
}
}
}
Its interface is safe, including to_raw(), since it only returns a raw pointer. It doesn't dereference it. And the caller doesn't dereference it. It's just used in a callback.
pub extern "C" fn called_from_C_ok(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// Do things to compute Data.
// This is actually async code, which is why there's a completion handler.
on_complete(data.to_raw()); // data survives through the function call
}
This is fine. Data is safe to manipulate, and (I strongly believe) Rust promises that data will live until the end of the on_complete function call.
On the other hand, this is not ok:
pub extern "C" fn called_from_C_broken(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// ...
let ptr = data.to_raw(); // data can be dropped at this point, so ptr is dangling.
on_complete(ptr); // This may crash when the receiver dereferences it.
}
In my code, I made this mistake and it started crashing. It's easy to see why and it's easy to fix. But it's subtle, and it's easy for a future developer (me) to modify the ok version into the broken version without realizing the problem (and it may not always crash).
What I'd like to do is to ensure data lives as long as ptr. In Swift, I'd do this with:
withExtendedLifetime(&data) { data in
// ...data cannot be dropped until this closure ends...
}
Is there a similar construct in Rust that explicitly marks the minimum lifetime for a variable to a scope (that the optimizer may not reorder), even if it's not directly accessed? (I'm sure it's trivial to build a custom with_extended_lifetime in Rust, but I'm looking for a more standard solution so that it will be obvious to other developers what's going on).
I do believe the following "works" but I'm not sure how flexible it is, or if it's just replacing a more standard solution:
fn with_extended_lifetime<T, U, F>(value: &T, f: F) -> U
where
F: Fn(&T) -> U,
{
f(value)
}
with_extended_lifetime(&data, |data| {
let ptr = data.to_raw();
on_complete(ptr)
});
The optimizer is not allowed to change when a value is dropped. If you assign a value to a variable (and that value is not then moved elsewhere or overwritten by assignment), it will always be dropped at the end of the block, not earlier.
You say that this code is incorrect:
pub extern "C" fn called_from_C_broken(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// ...
let ptr = data.to_raw(); // data can be dropped at this point, so ptr is dangling.
on_complete(ptr); // This may crash when the receiver dereferences it.
}
but in fact data may not be dropped at that point, and this code is sound. What you may be confusing this with is the mistake of not assigning the value to a variable:
let ptr = Data::new().to_raw();
on_complete(ptr);
In this case, the pointer is dangling, because the result of Data::new() is stored in a temporary variable within the statement, which is dropped at the end of the statement, not a local variable, which is dropped at the end of the block.
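The difference between a local variable and a temporary can be observed directly. The sketch below uses a hypothetical `Noisy` type that records its own drop in a shared log; `marker` plays the role of `to_raw` and returns a raw pointer that is never dereferenced.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical type that records its own drop in a shared log.
struct Noisy {
    log: Log,
}

impl Noisy {
    fn new(log: &Log) -> Self {
        Noisy { log: Rc::clone(log) }
    }

    /// Analogous to `to_raw`: returns a raw pointer without borrowing.
    fn marker(&self) -> *const Noisy {
        self as *const Noisy
    }
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push("dropped");
    }
}

/// Value bound to a local variable: dropped at the end of the block.
pub fn local_binding() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let value = Noisy::new(&log);
        let _marker = value.marker();
        log.borrow_mut().push("after statement");
    } // `value` is dropped here
    let out = log.borrow().clone();
    out
}

/// Temporary value: dropped at the end of the `let` statement.
pub fn temporary() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _marker = Noisy::new(&log).marker(); // temporary dropped here
        log.borrow_mut().push("after statement");
    }
    let out = log.borrow().clone();
    out
}
```

`local_binding` logs the drop after the later statement, while `temporary` logs it before, which is exactly the situation in which the raw pointer would dangle.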
If you want to adopt a programming style which makes explicit when values are dropped, the usual pattern is to use the standard drop() function to mark the exact time of drop:
let data = Data::new();
...
on_complete(data.to_raw());
drop(data); // We have stopped using the raw pointer now
(Note that drop() is not magic: it is simply a function which takes one argument and does nothing with it other than dropping. Its presence in the standard library is to give a name to this pattern, not to provide special functionality.)
However, if you want to, there isn't anything wrong with using your with_extended_lifetime (other than nonstandard style and arguably a misleading name) if you want to make the code structure even more strongly indicate the scope of the value. One nitpick: the function parameter should be FnOnce, not Fn, for maximum generality (this allows it to be passed functions that can't be called more than once).
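A minimal sketch of the `FnOnce` variant suggested above. The `demo` function and its values are assumptions made for illustration; its closure moves a `String` out when called, so it implements only `FnOnce` and would be rejected by the original `Fn` bound.

```rust
/// The helper from the question, with the bound relaxed to `FnOnce`.
pub fn with_extended_lifetime<T, U, F>(value: &T, f: F) -> U
where
    F: FnOnce(&T) -> U,
{
    f(value)
}

/// Hypothetical usage: the closure gives away `owned`, so it can be called
/// at most once.
pub fn demo() -> (usize, String) {
    let data = vec![1u8, 2, 3];
    let owned = String::from("moved into the closure");
    with_extended_lifetime(&data, move |d| (d.len(), owned))
}
```

Every `Fn` closure is also `FnOnce`, so relaxing the bound does not break any existing caller.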
Other than explicitly dropping as the other answer mentions, there is another way to help prevent these types of accidental drops: use a wrapper around a raw pointer that has lifetime information.
use std::marker::PhantomData;
#[repr(transparent)]
struct PtrWithLifetime<'a>{
ptr: *mut ffi::c_void,
_data: PhantomData<&'a ffi::c_void>,
}
impl Data {
fn to_raw(&self) -> PtrWithLifetime<'_>{
PtrWithLifetime{
ptr: self.ptr,
_data: PhantomData,
}
}
}
The #[repr(transparent)] guarantees that PtrWithLifetime is stored in memory the same as *const ffi::c_void is, so you can adjust the declaration of on_complete to
fn called_from_c(on_complete: extern "C" fn(PtrWithLifetime<'_>)){
//stuff
}
without causing any major inconvenience to any downstream users, especially since the ffi bindings can be adjusted in a similar fashion.
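A self-contained sketch of this wrapper follows. It is only an illustration: this `Data` owns a boxed byte instead of memory from a C API so that the example can run, and the `is_live` and `same_size_as_raw_pointer` helpers are hypothetical.

```rust
use std::ffi::c_void;
use std::marker::PhantomData;
use std::mem::size_of;

#[repr(transparent)]
#[derive(Clone, Copy)]
pub struct PtrWithLifetime<'a> {
    ptr: *mut c_void,
    _data: PhantomData<&'a c_void>,
}

/// Stand-in for the question's `Data`; it owns a heap byte so the sketch
/// does not need a C library.
pub struct Data {
    ptr: *mut c_void,
}

impl Data {
    pub fn new() -> Self {
        Data { ptr: Box::into_raw(Box::new(0u8)) as *mut c_void }
    }

    /// The returned wrapper borrows from `self`, so it cannot outlive it.
    pub fn to_raw(&self) -> PtrWithLifetime<'_> {
        PtrWithLifetime { ptr: self.ptr, _data: PhantomData }
    }
}

impl Drop for Data {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `Box::into_raw` on a `Box<u8>` in `new`.
        unsafe { drop(Box::from_raw(self.ptr as *mut u8)) };
    }
}

/// Hypothetical consumer standing in for the `on_complete` callback.
pub fn is_live(p: PtrWithLifetime<'_>) -> bool {
    !p.ptr.is_null()
}

pub fn same_size_as_raw_pointer() -> bool {
    size_of::<PtrWithLifetime<'static>>() == size_of::<*mut c_void>()
}

// The borrow checker rejects the dangling pattern, because the wrapper
// would borrow from a temporary that is dropped at the end of the statement:
//     let ptr = Data::new().to_raw();
//     is_live(ptr);
```

Binding the value first (`let data = Data::new(); is_live(data.to_raw());`) compiles, which is the behaviour the wrapper is meant to enforce.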

How to share parts of a string with Rc?

I want to create some references to a str with Rc, without cloning str:
fn main() {
let s = Rc::<str>::from("foo");
let t = Rc::clone(&s); // Creating a new pointer to the same address is easy
let u = Rc::clone(&s[1..2]); // But how can I create a new pointer to a part of `s`?
let w = Rc::<str>::from(&s[0..2]); // This seems to clone str
assert_ne!(&w as *const _, &s as *const _);
}
How can I do this?
While it's possible in principle, the standard library's Rc does not support the case you're trying to create: a counted reference to a part of reference-counted memory.
However, we can get the effect for strings using a fairly straightforward wrapper around Rc which remembers the substring range:
use std::ops::{Deref, Range};
use std::rc::Rc;
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub struct RcSubstr {
string: Rc<str>,
span: Range<usize>,
}
impl RcSubstr {
fn new(string: Rc<str>) -> Self {
let span = 0..string.len();
Self { string, span }
}
fn substr(&self, span: Range<usize>) -> Self {
// A full implementation would also have bounds checks to ensure
// the requested range is not larger than the current substring
Self {
string: Rc::clone(&self.string),
span: (self.span.start + span.start)..(self.span.start + span.end)
}
}
}
impl Deref for RcSubstr {
type Target = str;
fn deref(&self) -> &str {
&self.string[self.span.clone()]
}
}
fn main() {
let s = RcSubstr::new(Rc::<str>::from("foo"));
let u = s.substr(1..2);
// We need to deref to print the string rather than the wrapper struct.
// A full implementation would `impl Debug` and `impl Display` to produce
// the expected substring.
println!("{}", &*u);
}
There are a lot of conveniences missing here, such as suitable implementations of Display, Debug, AsRef, Borrow, From, and Into — I've provided only enough code to illustrate how it can work. Once supplemented with the appropriate trait implementations, this should be just as usable as Rc<str> (with the one edge case that it can't be passed to a library type that wants to store Rc<str> in particular).
The crate arcstr claims to offer a finished version of this basic idea, but I haven't used or studied it and so can't guarantee its quality.
The crate owning_ref provides a way to hold references to parts of an Rc or other smart pointer, but there are concerns about its soundness and I don't fully understand which circumstances that applies to (issue search which currently has 3 open issues).
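Two of the missing pieces mentioned above, bounds checks and `Display`, could be sketched like this. The method names `try_substr` and `shares_allocation_with` are hypothetical additions for illustration, not part of the answer's code or of any crate.

```rust
use std::fmt;
use std::ops::{Deref, Range};
use std::rc::Rc;

#[derive(Clone, Debug)]
pub struct RcSubstr {
    string: Rc<str>,
    span: Range<usize>,
}

impl RcSubstr {
    pub fn new(string: Rc<str>) -> Self {
        let span = 0..string.len();
        Self { string, span }
    }

    /// Hypothetical checked variant of `substr`: returns `None` if `span` is
    /// out of range for the current substring or splits a UTF-8 character.
    pub fn try_substr(&self, span: Range<usize>) -> Option<Self> {
        let current: &str = self;
        // `str::get` performs both the range and char-boundary checks.
        current.get(span.clone())?;
        Some(Self {
            string: Rc::clone(&self.string),
            span: (self.span.start + span.start)..(self.span.start + span.end),
        })
    }

    /// True if both values point at the same reference-counted allocation.
    pub fn shares_allocation_with(&self, other: &Self) -> bool {
        Rc::ptr_eq(&self.string, &other.string)
    }
}

impl Deref for RcSubstr {
    type Target = str;
    fn deref(&self) -> &str {
        &self.string[self.span.clone()]
    }
}

impl fmt::Display for RcSubstr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print only the visible substring, not the whole backing string.
        f.write_str(&self.string[self.span.clone()])
    }
}
```

With `Display` in place, `println!("{}", u)` prints the substring directly and the explicit `&*u` is no longer needed.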
