Why do most ffi functions use raw pointers instead of references?


Both in FFI tutorials and in automatically generated interfaces, *const T pointers are used most of the time. As far as I know, the only difference between &T and *const T is that *const T doesn't have to fulfill certain conditions, such as not being null, and is unsafe to dereference.
fn main() {
    unsafe {
        do_something(&TestStruct { data: 3 })
    }
}
#[repr(C)]
pub struct TestStruct {
    data: u32
}
extern "C" {
    fn do_something(arg: &TestStruct);
}
This code compiles and runs. Because external functions are used just like internal ones, I don't understand why raw pointers are the default there.

Part of the answer can probably be found in the fact that references must be aligned. Creating an unaligned reference is undefined behaviour, and since the alignment (and non-nullness) of pointers coming across an FFI boundary cannot be guaranteed, defaulting to raw pointers seems to be a sane choice.
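To make the trade-off concrete, here is a minimal sketch of the Rust side of such an interface, assuming a hypothetical exported function read_data (not from the question) and an arbitrary convention of returning 0 for a bad pointer. Because a C caller can pass anything, the signature takes *const TestStruct and performs the null and alignment checks that a &TestStruct parameter would silently assume.

```rust
#[repr(C)]
pub struct TestStruct {
    data: u32,
}

/// Hypothetical exported function. The 0-on-error convention is an
/// assumption made for this sketch only.
#[no_mangle]
pub extern "C" fn read_data(arg: *const TestStruct) -> u32 {
    // A reference can never be null, but a pointer from C can.
    if arg.is_null() {
        return 0;
    }
    // Creating a misaligned reference is undefined behaviour, so check first.
    if (arg as usize) % std::mem::align_of::<TestStruct>() != 0 {
        return 0;
    }
    // SAFETY: non-null and aligned; we still have to trust the caller that
    // it points to a live, initialised TestStruct.
    let s: &TestStruct = unsafe { &*arg };
    s.data
}

fn main() {
    let value = TestStruct { data: 3 };
    println!("{}", read_data(&value)); // 3
    println!("{}", read_data(std::ptr::null())); // 0
}
```

With a &TestStruct parameter, none of these checks could be expressed: the compiler would already assume they hold the moment the C code calls the function.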

Related

Are mutable static primitives actually `unsafe` if single-threaded?

I'm developing for a single-core embedded chip. In C & C++ it's common to statically-define mutable values that can be used globally. The Rust equivalent is roughly this:
static mut MY_VALUE: usize = 0;
pub fn set_value(val: usize) {
    unsafe { MY_VALUE = val }
}
pub fn get_value() -> usize {
    unsafe { MY_VALUE }
}
Now anywhere can call the free functions get_value and set_value.
I think that this should be entirely safe in single-threaded embedded Rust, but I've not been able to find a definitive answer. I'm only interested in types that don't require allocation or destruction (like the primitive in the example here).
The only gotcha I can see is the compiler or processor reordering accesses in unexpected ways (which could be solved using the volatile access methods), but is that unsafe per se?
Edit:
The book suggests that this is safe so long as we can guarantee no multi-threaded data races (obviously the case here)
With mutable data that is globally accessible, it’s difficult to ensure there are no data races, which is why Rust considers mutable static variables to be unsafe.
The docs are phrased less definitively, suggesting that data races are only one way this can be unsafe but not expanding on other examples
accessing mutable statics can cause undefined behavior in a number of ways, for example due to data races in a multithreaded context
The nomicon suggests that this should be safe so long as you don't somehow dereference a bad pointer.

Be aware that there is no such thing as single-threaded code as long as interrupts are enabled: an interrupt handler can preempt the main code at any point. So even on microcontrollers, mutable statics are unsafe.
If you really can guarantee single-threaded access, your assumption is correct that accessing primitive types should be safe. That's why the Cell type exists: it allows mutation of primitive types, with the caveat that it is not Sync (meaning it explicitly prevents shared access across threads).
That said, to create a safe static variable, it needs to implement Sync for exactly the reason mentioned above; which Cell doesn't do, for obvious reasons.
To actually have a mutable global variable with a primitive type without using an unsafe block, I personally would use an Atomic. Atomics do not allocate and are available in the core library, meaning they work on microcontrollers.
use core::sync::atomic::{AtomicUsize, Ordering};
static MY_VALUE: AtomicUsize = AtomicUsize::new(0);
pub fn set_value(val: usize) {
    MY_VALUE.store(val, Ordering::Relaxed)
}
pub fn get_value() -> usize {
    MY_VALUE.load(Ordering::Relaxed)
}
fn main() {
    println!("{}", get_value());
    set_value(42);
    println!("{}", get_value());
}
Relaxed atomic loads and stores are zero-overhead on almost all architectures: they compile to ordinary load and store instructions.

In this case it's not unsound, but you should still avoid it because it is too easy to misuse in a way that causes UB.
Instead, use a wrapper around UnsafeCell that is Sync:
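use core::cell::UnsafeCell;
pub struct SyncCell<T>(UnsafeCell<T>);
// SAFETY: callers of the unsafe set/get methods promise never to access the cell concurrently.
unsafe impl<T> Sync for SyncCell<T> {}
impl<T> SyncCell<T> {
    // No trailing semicolon: the constructed value must be returned.
    pub const fn new(v: T) -> Self { Self(UnsafeCell::new(v)) }
    pub unsafe fn set(&self, v: T) { *self.0.get() = v; }
}
impl<T: Copy> SyncCell<T> {
    pub unsafe fn get(&self) -> T { *self.0.get() }
}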
If you use nightly, you can use the standard library's SyncUnsafeCell (behind the sync_unsafe_cell feature) instead.

Mutable statics are unsafe in general because they circumvent the normal borrow checker rules, which enforce that at any time either exactly one mutable borrow or any number of immutable borrows (including zero) exists. This allows you to write code that causes undefined behavior. For instance, the following compiles on editions before 2024 (the 2024 edition rejects it by default through the static_mut_refs lint) and prints 2 2:
static mut COUNTER: i32 = 0;
fn main() {
    unsafe {
        let mut_ref1 = &mut COUNTER;
        let mut_ref2 = &mut COUNTER;
        *mut_ref1 += 1;
        *mut_ref2 += 1;
        println!("{mut_ref1} {mut_ref2}");
    }
}
However we have two mutable references to the same location in memory existing concurrently, which is UB.
I believe the code that you posted there is safe, but I generally would not recommend using static mut. Use an atomic, SyncUnsafeCell/UnsafeCell, a wrapper around a Cell that implements Sync which is safe since your environment is single-threaded, or honestly just about anything else. static mut is wildly unsafe and its use is highly discouraged.
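The "wrapper around a Cell that implements Sync" option could look roughly like the sketch below. SingleThreadCell is a hypothetical name; the unsafe impl Sync is a promise, unchecked by the compiler, that the value is only ever touched from a single thread and never from an interrupt handler. Under that assumption the get/set methods themselves can stay safe, unlike the UnsafeCell-based SyncCell above.

```rust
use std::cell::Cell;

/// Hypothetical wrapper: only sound if the program never accesses it from
/// more than one thread or interrupt context.
pub struct SingleThreadCell<T>(Cell<T>);

// SAFETY: relies entirely on the single-context assumption above.
unsafe impl<T> Sync for SingleThreadCell<T> {}

impl<T> SingleThreadCell<T> {
    pub const fn new(v: T) -> Self {
        Self(Cell::new(v))
    }
}

impl<T: Copy> SingleThreadCell<T> {
    pub fn get(&self) -> T {
        self.0.get()
    }
    pub fn set(&self, v: T) {
        self.0.set(v)
    }
}

// A static must be Sync, which the wrapper provides.
static MY_VALUE: SingleThreadCell<usize> = SingleThreadCell::new(0);

pub fn set_value(val: usize) {
    MY_VALUE.set(val)
}

pub fn get_value() -> usize {
    MY_VALUE.get()
}

fn main() {
    println!("{}", get_value()); // 0
    set_value(42);
    println!("{}", get_value()); // 42
}
```

Compared with an atomic, this removes even the nominal synchronization, so it only makes sense where the single-context guarantee genuinely holds.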

To sidestep the question of exactly how mutable statics can be used safely in single-threaded code, another option is to use thread-local storage (this needs std, so it is not available on no_std embedded targets):
use std::cell::Cell;
thread_local! {
    static MY_VALUE: Cell<usize> = Cell::new(0);
}
pub fn set_value(val: usize) {
    MY_VALUE.with(|cell| cell.set(val))
}
pub fn get_value() -> usize {
    MY_VALUE.with(|cell| cell.get())
}

How to safely create an opaque struct and then free it over the FFI boundary?

I am using cbindgen to generate C bindings for a small Rust crate that implements the ULID specification. To avoid leaking information, I am generating an opaque struct ulid_ctx and returning a pointer to that context object when it is first created. I'm struggling a little bit with reconciling Rust's ownership semantics and C's laissez-faire approach to memory.
#[allow(non_camel_case_types)]
pub struct ulid_ctx {
    seed: u32,
}
#[no_mangle]
pub extern "C" fn ulid_create(seed: u32) -> *mut ulid_ctx {
    let ctx = ulid_ctx { seed };
    Box::leak(Box::new(ctx))
}
#[no_mangle]
pub unsafe extern "C" fn ulid_ctx_destroy(ctx: *mut ulid_ctx) {
    Box::from_raw(ctx);
}
Two questions:
Does Box::leak(Box::new(ctx)) correctly allocate a ctx value on the heap and then inform Rust that the function no longer owns it?
Will Box::from_raw(ctx); re-create a Box and then immediately drop it, thereby freeing the memory?
Although it's not a lot of data (32 bits), I would like to avoid creating a memory leak if possible.

Does Box::leak(Box::new(ctx)) correctly allocate a ctx value on the heap and then inform Rust that the function no longer owns it?
Yes. As the name says, it leaks the data, so it will not be dropped when it goes out of scope.
As per user4815162342's comment, consider using Box::into_raw instead; it is the documented counterpart of Box::from_raw.
Will Box::from_raw(ctx); re-create a Box and then immediately drop it, thereby freeing the memory?
Also true: it builds a Box, which is then dropped immediately, freeing the memory. As a note, it may be nice to make the drop explicit:
#[no_mangle]
pub unsafe extern "C" fn ulid_ctx_destroy(ctx: *mut ulid_ctx) {
    drop(unsafe { Box::from_raw(ctx) })
}
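Putting both suggestions together, a create/destroy pair might look like the sketch below. It assumes Box::into_raw in place of Box::leak and adds a null check in the destructor, since Box::from_raw on a null pointer is undefined behaviour and C callers commonly expect destroy(NULL) to be a no-op, like free(NULL). The accessor ulid_ctx_seed is hypothetical and exists only to show the pointer round trip.

```rust
#[allow(non_camel_case_types)]
pub struct ulid_ctx {
    seed: u32,
}

#[no_mangle]
pub extern "C" fn ulid_create(seed: u32) -> *mut ulid_ctx {
    // Move the context to the heap and hand ownership to the caller.
    Box::into_raw(Box::new(ulid_ctx { seed }))
}

/// Hypothetical accessor, not part of the original crate.
#[no_mangle]
pub unsafe extern "C" fn ulid_ctx_seed(ctx: *const ulid_ctx) -> u32 {
    // as_ref turns a null pointer into None instead of dereferencing it.
    match unsafe { ctx.as_ref() } {
        Some(c) => c.seed,
        None => 0,
    }
}

#[no_mangle]
pub unsafe extern "C" fn ulid_ctx_destroy(ctx: *mut ulid_ctx) {
    // Destroying a null context is a no-op.
    if ctx.is_null() {
        return;
    }
    // Reclaim ownership; the Box is dropped here, freeing the allocation.
    drop(unsafe { Box::from_raw(ctx) });
}

fn main() {
    let ctx = ulid_create(42);
    unsafe {
        println!("{}", ulid_ctx_seed(ctx)); // 42
        ulid_ctx_destroy(ctx);
        ulid_ctx_destroy(std::ptr::null_mut());
    }
}
```

The pointer must still be destroyed exactly once; calling ulid_ctx_destroy twice on the same non-null pointer would be a double free.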

FFI: Convert nullable pointer to option

I'm using rust-bindgen to access a C library from Rust. Some functions return nullable pointers to structs, which bindgen represents as
extern "C" {
    pub fn get_some_data() -> *const SomeStruct;
}
Now, for a higher-level wrapper, I would like to convert this to an Option<&'a SomeStruct> with an appropriate lifetime. Due to the nullable pointer optimization, this is actually represented identically to *const SomeStruct. However, I couldn't find any concise syntax to cast between the two. Transmuting
let data: Option<&'a SomeStruct> = unsafe { mem::transmute( get_some_data() ) };
and reborrowing
let data_ptr = unsafe { get_some_data() };
let data = if data_ptr.is_null() { None } else { Some(unsafe { &*data_ptr }) };
could be used. The docs for mem::transmute state that
transmute is incredibly unsafe. There are a vast number of ways to cause undefined behavior with this function. transmute should be the absolute last resort.
and recommend re-borrowing instead for
Turning a *mut T into an &mut T
However, for the nullable pointer, this is quite clumsy as shown in the second example.
Q: Is there a more concise Syntax for this cast? Alternatively, is there a way to tell bindgen to generate
extern "C" {
    pub fn get_some_data() -> Option<&SomeStruct>;
}
directly?

Use <*const T>::as_ref¹:
let data = unsafe { get_some_data().as_ref() };
Since a raw pointer may not point to a valid object of sufficient lifetime for any 'a, as_ref is unsafe to call.
There is a corresponding as_mut for *mut T → Option<&mut T>.
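A safe wrapper built on as_ref might look like the sketch below. So that it runs without a C library, get_some_data is replaced by a hypothetical Rust stand-in that takes a bool and returns either null or a pointer to static data. Returning Option<&'static SomeStruct> is an assumption about how long the library keeps the data alive; a real wrapper should pick whatever lifetime the C API actually documents.

```rust
#[repr(C)]
pub struct SomeStruct {
    value: i32,
}

static DATA: SomeStruct = SomeStruct { value: 7 };

// Hypothetical stand-in for the bindgen-generated function.
unsafe fn get_some_data(present: bool) -> *const SomeStruct {
    if present {
        &DATA as *const SomeStruct
    } else {
        std::ptr::null()
    }
}

/// Safe wrapper: null becomes None, anything else becomes Some(&T).
pub fn some_data(present: bool) -> Option<&'static SomeStruct> {
    // SAFETY (assumption): non-null results point to data that lives forever.
    unsafe { get_some_data(present).as_ref() }
}

fn main() {
    match some_data(true) {
        Some(s) => println!("value = {}", s.value), // value = 7
        None => println!("null"),
    }
    println!("{}", some_data(false).is_none()); // true
}
```

Keeping the unsafe block inside the wrapper means callers only ever see the Option, which is the API the question was after.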
¹ This is a different as_ref from, for example, AsRef::as_ref and Option::as_ref, both of which are common in safe code.

Is storing data and a mutable pointer to that data in a struct safe?

Let's consider a Rust wrapper library around a C library. This C library defines a struct and uses it as a mutable pointer throughout its API.
The Rust wrapper defines the following struct with ptr pointing at data.
struct Wrapper {
    data: struct_from_c_t,
    ptr: *mut struct_from_c_t,
}
If all uses of this pointer are made within the Wrapper struct's lifetime, what other potential issues can I run into when using this pointer in unsafe code?
Is dereference and use of this pointer always safe in this construct?
For detailed context, the goal is to be able to call FFI functions using this pointer from functions borrowing Wrapper non-mutably.

This is generally a bad idea and it can go wrong very easily.
First, go read Why can't I store a value and a reference to that value in the same struct? for an in-depth explanation about why safe Rust prevents this construct at compile time.
TL;DR, if you ever move the Wrapper struct, the pointer will be invalid. Dereferencing it will cause undefined behavior (a bad thing).
If you can ensure that either of:
The Wrapper is never moved.
The ptr is updated every time you move the struct.
Then the pointer will be valid and safe to dereference (assuming all the other caveats about unsafe code are upheld).
What's worse is that there's no reason to keep the pointer in the first place; you can take a reference to a value and convert it into a pointer whenever you need:
extern "C" {
    fn ffi_fn(data: *mut struct_from_c_t);
}
struct Wrapper {
    data: struct_from_c_t,
}
impl Wrapper {
    fn do_thing(&mut self) {
        unsafe { ffi_fn(&mut self.data) }
    }
}
from functions borrowing Wrapper non-mutably
Without context, this seems like a dubious decision, but Rust has tools for interior mutability:
use std::cell::RefCell;
struct Wrapper {
    data: RefCell<struct_from_c_t>,
}
impl Wrapper {
    fn do_thing(&self) {
        unsafe { ffi_fn(&mut *self.data.borrow_mut()) }
    }
}
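A runnable version of the RefCell approach is sketched below. The C struct and ffi_fn are hypothetical Rust stand-ins (a counter field and an increment) so the code runs without a C library. The point it demonstrates is that deriving the pointer on every call keeps it valid even after Wrapper has been moved, which a stored self-pointer would not.

```rust
use std::cell::RefCell;

// Hypothetical stand-in for the C struct.
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Default)]
pub struct struct_from_c_t {
    counter: u32,
}

// Hypothetical stand-in for the C function: mutates through the raw pointer.
unsafe extern "C" fn ffi_fn(data: *mut struct_from_c_t) {
    unsafe { (*data).counter += 1 }
}

struct Wrapper {
    data: RefCell<struct_from_c_t>,
}

impl Wrapper {
    fn do_thing(&self) {
        // The pointer is taken from the current address on each call, so it
        // is valid no matter how often Wrapper has been moved.
        unsafe { ffi_fn(&mut *self.data.borrow_mut()) }
    }

    fn counter(&self) -> u32 {
        self.data.borrow().counter
    }
}

fn main() {
    let w = Wrapper { data: RefCell::new(struct_from_c_t::default()) };
    w.do_thing();
    let moved = Box::new(w); // move the Wrapper to the heap
    moved.do_thing();
    println!("{}", moved.counter()); // 2
}
```

RefCell also turns an accidental re-entrant call into a panic at borrow_mut rather than two aliasing &mut pointers being handed to C.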

Rust not creating function in lib extern FFI

I have many Rust functions working perfectly across Ruby FFI. But following directions from two different sites for creating a free_array method is not making the method available in the linked library.
This example is the working example of freeing a String returned from Ruby.
use libc::c_char;
use std::ffi::CString;
#[no_mangle]
pub extern "C" fn free_string(s: *mut c_char) {
    unsafe {
        if s.is_null() { return }
        CString::from_raw(s)
    };
}
And here are two attempts at implementing a way to free the memory of an Array.
use std::mem::transmute;
use ruby_array::RubyArray;
#[no_mangle]
pub extern "C" fn free_array(ra: *mut RubyArray) {
    let _ra: Box<RubyArray> = unsafe { transmute(ra) };
}
// OR
#[no_mangle]
pub extern "C" fn free_array(ptr: *mut RubyArray) {
    if ptr.is_null() { return }
    unsafe { Box::from_raw(ptr); }
}
This results in an error:
Function 'free_array' not found in [/libfaster_path.so] (FFI::NotFoundError)
Here's the struct I'm using, which is created in Rust and passed to Ruby without any problems.
use libc;
use std::mem;
#[repr(C)]
pub struct RubyArray {
    len: libc::size_t,
    data: *const libc::c_void,
}
impl RubyArray {
    #[allow(dead_code)]
    pub fn from_vec<T>(vec: Vec<T>) -> RubyArray {
        let array = RubyArray {
            data: vec.as_ptr() as *const libc::c_void,
            len: vec.len() as libc::size_t
        };
        mem::forget(vec);
        array
    }
}
But that's not relevant as it's not the issue. The issue is the method is not being made available in the library output for FFI to read from. What's wrong with this? Rust is happy and I've written many other methods in similar manner that work. What makes this not found in the .so file?
The file is included in the main src/lib.rs with pub mod so there's nothing wrong there. It's the same as the other working methods.
I'll be posting a blog with much fuller implementation details later and I'll add a link to the comment section below for those who are interested.
Minor Update
I'm pretty sure this is an issue with Rust. I've used both Ruby's Fiddle and FFI to verify that this method couldn't be accessed, whereas other methods could be accessed by both.
I grepped the binary and found the text showing the free_array method in the binary but apparently that's not compiled correctly to be read by FFI.

The information I provided was not enough to debug the issue. The real problem was that I had switched from building in release mode to debug mode and had not updated the library folder that FFI links against, so the code kept pointing at an old library.

Develop Reference
