I'm trying to call pthread_join with a pointer to my struct so that the C thread can fill in the struct in the memory I point it to. (Yes, I'm aware that this is highly unsafe.)
The function signature of pthread_join:
pub unsafe extern fn pthread_join(native: pthread_t,
value: *mut *mut c_void)
-> c_int
I'm doing this as an exercise of porting C code from a book to Rust. The C code:
pthread_t tid1;
struct foo *fp;
err = pthread_create(&tid1, NULL, thr_fn1, NULL);
err = pthread_join(tid1, (void *)&fp);
I came up with this code:
extern crate libc;
use libc::{pthread_t, pthread_join};
struct Foo {}
fn main() {
let tid1:pthread_t = std::mem::uninitialized();
let mut fp:Box<Foo> = std::mem::uninitialized();
let value = &mut fp;
pthread_join(tid1, &mut value);
}
But the error I see is:
error[E0308]: mismatched types
--> src/bin/11-threads/f04-bogus-pthread-exit.rs:51:24
|
51 | pthread_join(tid1, &mut value);
| ^^^^^^^^^^ expected *-ptr, found mutable reference
|
= note: expected type `*mut *mut libc::c_void`
found type `&mut &mut std::boxed::Box<Foo>`
Is it even possible to achieve this just using casts, or do I need to transmute?
There are several issues here:
Box is a pointer to a heap-allocated resource; you can extract the pointer itself using Box::into_raw(some_box).
References are only coerced into raw pointers to the same pointee type (even though they have the same representation); changing the pointee type requires an explicit cast.
You need to cast from your concrete type to c_void; type inference may be able to do that.
You have a reference to a reference to a pointer, but you need a pointer to a pointer; you have one level of indirection too many.
Let's make it work:
// pthread interface, reduced
struct Void;
fn sample(_: *mut *mut Void) {}
// actual code
struct Foo {}
fn main() {
let mut p = Box::into_raw(Box::new(Foo{})) as *mut Void;
sample(&mut p as *mut _);
}
Note that this leaks memory (as a result of into_raw). Normally the memory should be put back into a Box with Box::from_raw, so that the destructor of Foo is called and the memory is freed.
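Applied to the original question, the same cast chain lets a raw-pointer slot be handed to a function that expects `*mut *mut c_void`, and the leak can be avoided by reclaiming the allocation afterwards. A minimal std-only sketch: `fill_slot` is a hypothetical stand-in for a C function with a `void **` output parameter, and the slot is a `*mut Foo` rather than a `Box<Foo>`, since a Box must never hold an uninitialized or foreign pointer.

```rust
use std::ffi::c_void;
use std::ptr;

struct Foo {
    value: i32,
}

// Hypothetical stand-in for a C function with a `void **` output parameter:
// it allocates a Foo and stores its address through `out`.
unsafe extern "C" fn fill_slot(out: *mut *mut c_void) {
    let foo = Box::new(Foo { value: 42 });
    unsafe { *out = Box::into_raw(foo) as *mut c_void };
}

fn main() {
    // A raw-pointer slot (not a Box) that the callee fills in.
    let mut fp: *mut Foo = ptr::null_mut();

    unsafe {
        // `&mut fp` is `&mut *mut Foo`: cast it to `*mut *mut Foo`, then to `*mut *mut c_void`.
        fill_slot(&mut fp as *mut *mut Foo as *mut *mut c_void);

        // Reclaim ownership so Foo's destructor runs and the memory is freed.
        let foo: Box<Foo> = Box::from_raw(fp);
        println!("value = {}", foo.value);
    }
}
```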
The code can't work as written, because the C thread doesn't really "fill in the struct" in the memory you point to. It is responsible for allocating its own memory (or receiving it from another thread beforehand) and filling it out. The only thing the C thread "returns" is a single address, and this address is picked up by pthread_join.
This is why pthread_join receives a void **, i.e. the pointer to a void *. That kind of output parameter enables pthread_join to store (return) the void * pointer provided by the freshly finished thread. The thread can provide the pointer either by passing it to pthread_exit or by returning it from the start_routine passed to pthread_create. In Rust, the raw pointer can be received with code like this:
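let mut c_result: *mut libc::c_void = std::ptr::null_mut();
// pthread_join is an unsafe FFI call; a non-zero return value is an error number
let err = unsafe { libc::pthread_join(tid1, &mut c_result as *mut _) };
assert_eq!(err, 0);
// C_RESULT now contains the raw pointer returned by the worker's
// start routine, or passed to pthread_exit()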
The contents and size of the memory that the returned pointer points to are a matter of contract between the thread being joined and the thread that is joining it. If the worker thread is implemented in C and designed to be invoked by other C code, then an obvious choice is for it to allocate memory for a result structure, fill it out, and provide a pointer to allocated memory. For example:
struct ThreadResult { ... };
...
struct ThreadResult *result = malloc(sizeof(struct ThreadResult));
result->field1 = value1;
...
pthread_exit(result);
In that case your Rust code that joins the thread can interpret the result by replicating the C structure and picking up its ownership:
// obtain a raw-pointer c_result through pthread_join as
// shown above:
let mut c_result = ...;
unsafe {
    libc::pthread_join(tid1, &mut c_result as *mut _);
}
#[repr(C)]
struct ThreadResult { ... } // fields copy-pasted from C
unsafe {
    // convert the raw pointer to a Rust reference, so that we may
    // inspect its contents
    let result = &mut *(c_result as *mut ThreadResult);
    // ... inspect result.field1, etc ...
    // free the memory allocated in the thread
    libc::free(c_result);
    // RESULT is no longer usable
}
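As a dependency-free sketch of that contract, the example below writes the worker's start routine in Rust and calls it directly instead of going through pthread_create/pthread_join. The `worker` function and the `ThreadResult` fields are hypothetical, and `Box` stands in for `malloc`/`free` here; memory that really comes from C's `malloc` must be released with `libc::free`, not `Box::from_raw`.

```rust
use std::ffi::c_void;
use std::ptr;

// Hypothetical result layout shared by the worker and the joiner.
#[repr(C)]
struct ThreadResult {
    field1: i32,
    field2: f64,
}

// Shaped like a pthread start routine: takes and returns `void *`.
extern "C" fn worker(_arg: *mut c_void) -> *mut c_void {
    let result = Box::new(ThreadResult { field1: 7, field2: 2.5 });
    // Ownership of the allocation passes to whoever receives this pointer.
    Box::into_raw(result) as *mut c_void
}

fn main() {
    // In real code this pointer would arrive through pthread_join's out parameter.
    let c_result: *mut c_void = worker(ptr::null_mut());
    assert!(!c_result.is_null());

    unsafe {
        // Take ownership back with the concrete type so the memory is freed correctly.
        let result = Box::from_raw(c_result as *mut ThreadResult);
        println!("field1 = {}, field2 = {}", result.field1, result.field2);
    } // `result` is dropped here; c_result must not be used afterwards.
}
```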
I'm writing a foreign function interface (FFI) to expose the API of a pre-existing C++ library to some new Rust code I am writing. I am using the Rust cxx crate for this.
I am running into some problems related to Pin (a topic which I have to admit I don't have a complete grasp of).
My C++ module has an API that exposes pointers to some contained objects from the main object that owns these contained objects. Here is a contrived example:
// test.hpp
#include <string>
#include <memory>
class Row {
std::string row_data;
public:
void set_value(const std::string& new_value) {
this->row_data = new_value;
}
};
class Table {
Row row;
public:
Table() : row() {};
Row* get_row() {
return &this->row;
}
};
inline std::unique_ptr<Table> make_table() {
return std::make_unique<Table>();
}
The idea is that you create a Table object, which then allows you to obtain a pointer to its Row so you can manipulate it.
My attempt to create a Rust FFI looks like this:
// main.rs
use std::pin::Pin;
use cxx::let_cxx_string;
#[cxx::bridge]
mod ffi {
unsafe extern "C++" {
include!("src/test.hpp");
type Table;
pub fn make_table() -> UniquePtr<Table>;
fn get_row(self: Pin<&mut Table>) -> *mut Row;
type Row;
pub fn set_value(self: Pin<&mut Row>, value: &CxxString);
}
}
impl ffi::Table {
pub fn get_row_ref<'a>(self: Pin<&'a mut ffi::Table>) -> Pin<&'a mut ffi::Row> {
unsafe { Pin::new_unchecked(&mut *self.get_row()) }
}
}
fn main() {
let mut table = ffi::make_table();
let row = table.pin_mut().get_row_ref();
let_cxx_string!(hello="hello world");
row.set_value(&hello);
let_cxx_string!(hello2="bye world");
row.set_value(&hello2);
}
Note that:
The cxx module requires that non-const C++ methods take Pin<&mut T> as their receiver
The C++ get_row method returns a pointer, which I want to convert into a reference to the Row which has the same lifetime as the owning Table object - that's what the get_row_ref wrapper is for.
I have two problems:
Is it sound for me to call Pin::new_unchecked here? The documentation implies it is not:
calling Pin::new_unchecked on an &'a mut T is unsafe because while you are
able to pin it for the given lifetime 'a, you have no control over whether
it is kept pinned once 'a ends
If that is not safe, how do I proceed?
This program fails to compile with the following error:
error[E0382]: use of moved value: `row`
--> src/main.rs:41:2
|
34 | let row = table.pin_mut().get_row_ref();
| --- move occurs because `row` has type `Pin<&mut Row>`, which does not implement the `Copy` trait
...
38 | row.set_value(&hello);
| --- value moved here
...
41 | row.set_value(&hello2);
| ^^^ value used here after move
The first call to set_value consumes the pinned reference, and after that it can't be used again. &mut T is not Copy, so Pin<&mut Row> is not Copy either.
How do I set up the API so that the reference to Row can be used for multiple successive method calls (within the constraints established by cxx)?
For those wanting to try it out:
# Cargo.toml
[dependencies]
cxx = "1.0.52"
[build-dependencies]
cxx-build = "1.0"
// build.rs
fn main() {
cxx_build::bridge("src/main.rs")
.flag("-std=c++17")
.include(".")
.compile("test");
}
Is it sound for me to call Pin::new_unchecked here?
Yes, it's sound. In this context, we know the Row is pinned because:
The Table is pinned;
The Row is stored inline in the Table;
C++'s move semantics essentially mean that every C++ object is "pinned" anyway.
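The same reasoning can be reproduced in plain Rust: because the Row lives inline inside a pinned Table, projecting the pin onto that field cannot let anyone move it. A minimal std-only sketch using `Pin::map_unchecked_mut`; these `Table` and `Row` structs are hypothetical Rust stand-ins for the C++ classes, not the cxx-generated types.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Row {
    row_data: String,
    _pin: PhantomPinned, // opt out of Unpin, like a C++ object behind cxx
}

struct Table {
    row: Row, // stored inline, as in the C++ class
}

impl Row {
    fn set_value(self: Pin<&mut Self>, new_value: &str) {
        // Sound: we replace a field in place and never move the Row itself.
        unsafe { self.get_unchecked_mut().row_data = new_value.to_owned() };
    }
}

impl Table {
    fn new() -> Self {
        Table { row: Row { row_data: String::new(), _pin: PhantomPinned } }
    }

    // Structural pin projection: the Row stays pinned for as long as the Table does.
    fn get_row(self: Pin<&mut Self>) -> Pin<&mut Row> {
        unsafe { self.map_unchecked_mut(|t| &mut t.row) }
    }
}

fn main() {
    let mut table = Box::pin(Table::new());
    table.as_mut().get_row().set_value("hello world");
    println!("{}", table.row.row_data);
}
```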
This program fails to compile with the following error:
When you call a method on a normal mutable reference (&mut T), the compiler implicitly performs a reborrow in order to avoid moving the mutable reference, because &mut T is not Copy. Unfortunately, this compiler "magic" doesn't extend to Pin<&mut T> (which is not Copy either), so instead we must reborrow explicitly.
The easiest way to reborrow is to use Pin::as_mut(). This use case is even called out in the documentation:
This method is useful when doing multiple calls to functions that consume the pinned type.
fn main() {
let mut table = ffi::make_table();
let mut row = table.pin_mut().get_row_ref();
let_cxx_string!(hello="hello world");
row.as_mut().set_value(&hello);
let_cxx_string!(hello2="bye world");
row.as_mut().set_value(&hello2);
}
The use of as_mut() on the last use of row is not strictly necessary, but applying it consistently is probably clearer. When compiled with optimizations, this function is probably a no-op anyway (for Pin<&mut T>).
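The reborrowing behaviour does not depend on cxx; it shows up with any `!Unpin` type whose methods take `Pin<&mut Self>` by value. A minimal std-only sketch (the `Counter` type is hypothetical) in which one pinned reference survives several consuming calls because each call receives a fresh reborrow from `Pin::as_mut()`:

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Counter {
    count: u32,
    _pin: PhantomPinned,
}

impl Counter {
    // Like the cxx-generated methods, this takes `Pin<&mut Self>` by value.
    fn bump(self: Pin<&mut Self>) {
        // Sound: we mutate a field in place and never move the Counter.
        unsafe { self.get_unchecked_mut().count += 1 };
    }
}

fn main() {
    let mut counter = Box::pin(Counter { count: 0, _pin: PhantomPinned });
    let mut pinned: Pin<&mut Counter> = counter.as_mut();

    // Each as_mut() yields a short-lived Pin<&mut Counter>, so `pinned` is not moved.
    pinned.as_mut().bump();
    pinned.as_mut().bump();
    pinned.bump(); // the last use may consume `pinned` directly

    println!("count = {}", counter.count); // count = 3
}
```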
How do I set up the API so that the reference to Row can be used for multiple successive method calls (within the constraints established by cxx)?
If you want to hide the as_mut() calls, you could add a method that accepts a &mut Pin<&mut ffi::Row> and does the as_mut() call. (Note that as_mut() is defined on &mut Pin<P>, so the compiler will insert a reborrow of the outer &mut.) Yes, this means there are now two levels of indirection.
impl ffi::Row {
pub fn set_value2(self: &mut Pin<&mut ffi::Row>, value: &cxx::CxxString) {
self.as_mut().set_value(value)
}
}
fn main() {
let mut table = ffi::make_table();
let mut row = table.pin_mut().get_row_ref();
let_cxx_string!(hello="hello world");
row.set_value2(&hello);
let_cxx_string!(hello2="bye world");
row.set_value2(&hello2);
}
I have the following simple OpenCL kernel, which copies all entries pointed to by a into b:
__kernel void mmcopy(__global float* a, __global float* b) {
unsigned pos = get_global_id(0);
b[pos] = a[pos];
}
The following code snippet shows the OpenCL function calls for creating a buffer memory object out of four floats and setting the first argument of the kernel to that buffer object.
let mut v = [1f32, 1f32, 1f32, 1f32];
let size = mem::size_of_val(&v) as size_t;
let mut error_buffer = 0 as i32;
let buffer = unsafe {
clCreateBuffer(
context.id.unwrap(),
(CL_MEM_COPY_HOST_PTR | CL_MEM_READ_WRITE) as u64,
size,
v.as_mut_ptr() as *mut c_void,
&mut error_buffer,
)
};
let real_size = mem::size_of::<cl_mem>() as size_t;
let error = unsafe {
clSetKernelArg(
self.id.unwrap(), // here `self` is a wrapper. `id` is of type `cl_kernel`
0 as cl_uint,
real_size,
buffer as *const c_void,
)
};
However, executing the code results in an error CL_INVALID_MEM_OBJECT.
It looks like creating the buffer didn't succeed, even though it returned without an error.
The spec is also not very precise when it comes to describing the error in more detail:
for an argument declared to be a memory object when the specified arg_value is not a valid memory object.
Note: the OpenCL functions and types have been generated by rust-bindgen.
update 1
To clarify how the opaque types are represented in Rust, here is the representation of cl_mem:
pub struct _cl_mem {
_unused: [u8; 0],
}
pub type cl_mem = *mut _cl_mem;
The FFI declaration of clSetKernelArg:
extern "C" {
pub fn clSetKernelArg(
kernel: cl_kernel,
arg_index: cl_uint,
arg_size: size_t,
arg_value: *const ::std::os::raw::c_void,
) -> cl_int;
}
and of clCreateBuffer:
extern "C" {
pub fn clCreateBuffer(
context: cl_context,
flags: cl_mem_flags,
size: size_t,
host_ptr: *mut ::std::os::raw::c_void,
errcode_ret: *mut cl_int,
) -> cl_mem;
}
My understanding is that Rust (or rather bindgen) uses zero-sized types (ZSTs) to represent opaque external types, so cl_mem is already a pointer.
update 2
According to pmdj's answer, the correct way is to pass a pointer to the cl_mem buffer handle:
let error = unsafe {
clSetKernelArg(
self.id.unwrap(), // here `self` is a wrapper. `id` is of type `cl_kernel`
0 as cl_uint,
real_size,
&buffer as *const _ as *const c_void,
)
};
That actually fixes the problem, and clSetKernelArg now returns CL_SUCCESS. The spec for clSetKernelArg also mentions a pointer to data:
A pointer to data that should be used as the argument value for argument specified by arg_index. The argument data pointed to by arg_value is copied and the arg_value pointer can therefore be reused by the application after clSetKernelArg returns. The argument value specified is the value used by all API calls that enqueue kernel (clEnqueueNDRangeKernel) until the argument value is changed by a call to clSetKernelArg for kernel [...]
Before I dig in, I'll point out that I'm a relative beginner with Rust and I'm not particularly familiar with what bindgen produces, but I know OpenCL quite well. So please bear with me if my Rust syntax is off.
The most obvious thing that sticks out for me is that passing the buffer to clSetKernelArg using buffer as *const c_void looks suspicious. My understanding is that your code is roughly equivalent to this C:
cl_int error_buffer = 0;
cl_mem buffer = clCreateBuffer(
context.id,
(CL_MEM_COPY_HOST_PTR | CL_MEM_READ_WRITE),
size,
v,
&error_buffer
);
size_t real_size = sizeof(buffer);
cl_int error = clSetKernelArg(self.id, 0, real_size, buffer);
However, the last line is incorrect, it should be:
cl_int error = clSetKernelArg(self.id, 0, real_size, &buffer);
// yes, we want a POINTER to the buffer handle-------^
Although cl_mem is defined as a pointer to an opaque struct type, you need to pass the pointer to that pointer as the argument, just as with any other type of kernel argument: conceptually, I find it useful to think of it as clSetKernelArg performing a memcpy(internal_buffer, arg_value, arg_size); internally - so arg_size must always be the size of the object pointed to by arg_value. I find this helps me work out the correct level of indirection.
So in Rust this is probably along the lines of:
let error = unsafe {
    clSetKernelArg(
        self.id.unwrap(),
        0 as cl_uint,
        real_size,
        // take the address of the handle, then erase its type
        &buffer as *const cl_mem as *const c_void,
    )
};
but I haven't run it past rustc so it's probably wrong. You get the drift though.
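To see why the extra level of indirection is needed, here is a std-only mock of the "memcpy" mental model described above. `mock_set_kernel_arg` is hypothetical and only copies `arg_size` bytes from `arg_value`; `_cl_mem` and `cl_mem` mirror the bindgen output shown in the question, and the handle value is fake. Passing `&buffer` copies the handle itself, which is pointer-sized, whereas passing `buffer` would make the copy read from whatever the handle points at.

```rust
use std::ffi::c_void;
use std::mem;
use std::ptr;

// Mirrors the bindgen output from the question: an opaque type behind a pointer.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct _cl_mem {
    _unused: [u8; 0],
}
#[allow(non_camel_case_types)]
pub type cl_mem = *mut _cl_mem;

// Hypothetical mock: store a copy of the `arg_size` bytes found at `arg_value`.
unsafe fn mock_set_kernel_arg(storage: &mut Vec<u8>, arg_size: usize, arg_value: *const c_void) {
    storage.clear();
    storage.resize(arg_size, 0);
    unsafe { ptr::copy_nonoverlapping(arg_value as *const u8, storage.as_mut_ptr(), arg_size) };
}

fn main() {
    // A fake handle value; a real one would come from clCreateBuffer.
    let buffer: cl_mem = 0x1234usize as cl_mem;
    let real_size = mem::size_of::<cl_mem>(); // the size of the handle, i.e. of a pointer

    let mut storage = Vec::new();
    unsafe {
        // Correct: arg_value points AT the handle, so the handle's own bytes are copied.
        mock_set_kernel_arg(&mut storage, real_size, &buffer as *const cl_mem as *const c_void);
    }

    // Read the stored bytes back as a handle: they match the original value.
    let mut copied: cl_mem = ptr::null_mut();
    unsafe {
        ptr::copy_nonoverlapping(storage.as_ptr(), &mut copied as *mut cl_mem as *mut u8, real_size)
    };
    assert_eq!(copied, buffer);
    println!("stored handle: {:p}", copied);
}
```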
I would like to return some strings to C via a Rust FFI call. I also would like to ensure they're cleaned up properly.
I'm creating the strings on the Rust side and turning them into an address of an array of strings.
use core::mem;
use std::ffi::CString;
#[no_mangle]
pub extern "C" fn drop_rust_memory(mem: *mut ::libc::c_void) {
unsafe {
let boxed = Box::from_raw(mem);
mem::drop(boxed);
}
}
#[no_mangle]
pub extern "C" fn string_xfer(strings_out: *mut *mut *mut ::libc::c_char) -> usize {
unsafe {
let s1 = CString::new("String 1").unwrap();
let s2 = CString::new("String 2").unwrap();
let s1 = s1.into_raw();
let s2 = s2.into_raw();
let strs = vec![s1, s2];
let len = strs.len();
let mut boxed_slice = strs.into_boxed_slice();
*strings_out = boxed_slice.as_mut_ptr() as *mut *mut ::libc::c_char;
mem::forget(boxed_slice);
len
}
}
On the C side, I call the Rust FFI function, print the strings and then attempt to delete them via another Rust FFI call.
#include <stddef.h>
#include <stdio.h>
extern size_t string_xfer(char ***out);
extern void drop_rust_memory(void* mem);
int main() {
char **receiver;
int len = string_xfer(&receiver);
for (int i = 0; i < len; i++) {
printf("<%s>\n", receiver[i]);
}
drop_rust_memory(receiver);
printf("# rust memory dropped\n");
for (int i = 0; i < len; i++) {
printf("<%s>\n", receiver[i]);
}
return 0;
}
This appears to work. For the second printing after the drop, I would expect a crash or some undefined behavior, but I get this:
<String 1>
<String 2>
# rust memory dropped
<(null)>
<String 2>
which makes me less sure about the entire thing.
First, you may want to take a look at "Catching panic! when Rust called from C FFI, without spawning threads", because a panic unwinding into C invokes undefined behaviour in this case. So you had better catch the panic, or avoid code that can panic.
Secondly, into_boxed_slice() is primarily useful when you no longer need the features of a vector, i.e. "a contiguous growable array type". You could also use as_mut_ptr() and forget the vector; it's a choice of whether you want to carry the capacity information into C (so that the vector could grow later) or not. (I think Vec is missing an into_raw() method, but you can write one yourself, just an example, to avoid repeating this critical code.) You can also use Box::into_raw() followed by a cast to turn the boxed slice into a pointer:
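use std::ffi::CString;
use std::panic;
use std::ptr;

#[no_mangle]
pub unsafe extern "C" fn string_xfer(len: &mut libc::size_t) -> *mut *mut libc::c_char {
    // Don't let a panic unwind across the FFI boundary.
    let result = panic::catch_unwind(|| {
        let s1 = CString::new("String 1").unwrap();
        let s2 = CString::new("String 2").unwrap();
        let strs = vec![s1.into_raw(), s2.into_raw()];
        Box::into_raw(strs.into_boxed_slice())
    });
    match result {
        Ok(slice) => {
            *len = (*slice).len();
            // the cast drops the slice length; C receives a plain char **
            slice as *mut *mut libc::c_char
        }
        // a null pointer (not Option, which is not FFI-safe here) signals failure
        Err(_) => ptr::null_mut(),
    }
}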
Third, your drop_rust_memory() only drops a single pointer, and I think that is outright UB. Rust's allocator needs the real size of the allocation (its capacity), but you never pass the size of your slice: you are telling Rust "free this pointer to a single c_void", which is not the layout that was allocated. You need to use slice::from_raw_parts_mut(), so your C code must pass the length of the slice back to Rust. You also need to free each CString properly by calling CString::from_raw() on it (more information here):
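use std::ffi::CString;

#[no_mangle]
pub unsafe extern "C" fn drop_rust_memory(mem: *mut *mut libc::c_char, len: libc::size_t) {
    // Rebuild the Box<[*mut c_char]> with its real length so the allocator sees the right layout.
    let slice = Box::from_raw(std::slice::from_raw_parts_mut(mem, len));
    for &x in slice.iter() {
        // CString::from_raw reclaims the string; dropping it frees the memory.
        drop(CString::from_raw(x));
    }
    // `slice` is freed here; don't use mem or its elements afterwards.
}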
To conclude, you should read more about undefined behaviour: it's not about "expecting a crash" or expecting "something" in particular to happen. When your program triggers UB, anything can happen; you are in uncharted territory. Read more about UB in this great LLVM blog post.
A note about C style: prefer returning the pointer rather than the size, because strings_out: *mut *mut *mut ::libc::c_char is ugly. Write pub extern "C" fn string_xfer(size: &mut libc::size_t) -> *mut *mut libc::c_char instead. Also see "How to check if function pointer passed from C is non-NULL".
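Putting these suggestions together, the allocation and release can be exercised from Rust alone. A minimal sketch that uses `std::os::raw::c_char` and `usize` instead of the `libc` types (an assumption made only so it builds without extra crates); the C caller is expected to keep both the pointer and the length and pass them back unchanged.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::panic;
use std::ptr;

// In the real library these functions also need #[no_mangle]
// (#[unsafe(no_mangle)] in edition 2024) so that C can link to them.
pub unsafe extern "C" fn string_xfer(len: &mut usize) -> *mut *mut c_char {
    let result = panic::catch_unwind(|| {
        let strs = vec![
            CString::new("String 1").unwrap().into_raw(),
            CString::new("String 2").unwrap().into_raw(),
        ];
        Box::into_raw(strs.into_boxed_slice())
    });
    match result {
        Ok(slice) => {
            *len = unsafe { (*slice).len() };
            slice as *mut *mut c_char // thin pointer to the first element
        }
        Err(_) => ptr::null_mut(), // never let a panic unwind into C
    }
}

pub unsafe extern "C" fn drop_rust_memory(mem: *mut *mut c_char, len: usize) {
    // Rebuild the boxed slice with its real length, then free every string in it.
    let slice = unsafe { Box::from_raw(ptr::slice_from_raw_parts_mut(mem, len)) };
    for &s in slice.iter() {
        drop(unsafe { CString::from_raw(s) });
    }
    // The array itself is freed when `slice` goes out of scope.
}

fn main() {
    let mut len = 0usize;
    unsafe {
        let strings = string_xfer(&mut len);
        for i in 0..len {
            println!("<{}>", CStr::from_ptr(*strings.add(i)).to_str().unwrap());
        }
        drop_rust_memory(strings, len);
    }
}
```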
Some C code calls into the Rust open call below which returns a pointer. Later the C code passes the exact same pointer back to the close function which tries to drop (free) it. It segfaults in free(3). Why?
use std::os::raw::{c_int, c_void};
struct Handle;
extern "C" fn open(_readonly: c_int) -> *mut c_void {
let h = Handle;
let h = Box::new(h);
return Box::into_raw(h) as *mut c_void;
}
extern "C" fn close(h: *mut c_void) {
let h = unsafe { Box::from_raw(h) };
// XXX This segfaults - why?
drop(h);
}
In close, you end up creating a Box<c_void> instead of a Box<Handle> because you didn't cast the *mut c_void back to *mut Handle before invoking Box::from_raw.
extern "C" fn close(h: *mut c_void) {
    let h = unsafe { Box::from_raw(h as *mut Handle) };
    drop(h);
}
By the way, Box doesn't actually allocate any memory for a zero-sized type (such as Handle here) and uses a fixed, non-zero pointer value instead (which, in the current implementation, is the alignment of the type; a zero-sized type has an alignment of 1 by default). The destructor for a boxed zero-sized type knows not to try to deallocate memory at this fictitious memory address, but c_void is not a zero-sized type (it has size 1), so the destructor for Box<c_void> tries to free memory at address 0x1, which causes the segfault.
If Handle weren't zero-sized, the code might not crash, but it would still run the wrong destructor (c_void's, which does nothing), which may cause memory leaks, and Box would deallocate with the wrong layout, which is undefined behaviour in itself. A destructor runs Drop::drop for the type, if present, and then drops the type's fields.
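The effect of casting back can be observed directly with a destructor that counts its calls. A minimal sketch, assuming `Handle` is given a `Drop` impl (the `DROPS` counter is hypothetical instrumentation, not part of the original code); with the cast back to `*mut Handle`, the destructor runs exactly once.

```rust
use std::os::raw::{c_int, c_void};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical instrumentation: counts how many times Handle::drop runs.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Handle;

impl Drop for Handle {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

extern "C" fn open(_readonly: c_int) -> *mut c_void {
    Box::into_raw(Box::new(Handle)) as *mut c_void
}

extern "C" fn close(h: *mut c_void) {
    // Cast back to the concrete type so Box<Handle>, not Box<c_void>, is rebuilt.
    let h = unsafe { Box::from_raw(h as *mut Handle) };
    drop(h); // runs Handle::drop; nothing is deallocated for a zero-sized type
}

fn main() {
    let h = open(0);
    close(h);
    println!("Handle destructor ran {} time(s)", DROPS.load(Ordering::SeqCst));
}
```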
The problem is you didn't cast the pointer back to a Handle pointer while transforming it back to a Box, and got a Box of the wrong type.
This works:
extern "C" fn close(h: *mut c_void) {
    let h = unsafe { Box::from_raw(h as *mut Handle) };
    //                               ^^^^^^^^^^^^^^
    drop(h);
}
In your code, h is a std::boxed::Box<std::ffi::c_void>.
I want to write Rust bindings for a C library that requires a callback, and this callback must return a C-style char* pointer to the C library, which will then free it.
The callback must be in some sense exposed to the user of my library (probably using closures), and I want to provide a Rust interface as convenient as possible (meaning accepting a String output if possible).
However, the C library complains when trying to free() a pointer coming from memory allocated by Rust, probably because Rust uses jemalloc and the C library uses malloc.
So currently I can see two workarounds using libc::malloc(), but both of them have disadvantages:
Give the user of the library a slice that he must fill (inconvenient, and imposes length restrictions)
Take his String output, copy it to an array allocated by malloc, and then free the String (useless copy and allocation)
Can anybody see a better solution?
Here is an equivalent of the interface of the C library, and the implementation of the ideal case (if the C library could free a String allocated in Rust):
extern crate libc;
use std::ffi::CString;
use libc::{c_char, c_void};

extern "C" {
    // The second parameter of this function gets passed as an argument of my callback
    fn need_callback(callback: extern "C" fn(arbitrary_data: *mut c_void) -> *mut c_char,
                     arbitrary_data: *mut c_void);
}

// This function must return a C-style char[] that will be freed by the C library
extern "C" fn my_callback(arbitrary_data: *mut c_void) -> *mut c_char {
    unsafe {
        // arbitrary_data points at the `&mut dyn FnMut() -> String` owned by call_callback
        let user_callback = &mut *(arbitrary_data as *mut &mut dyn FnMut() -> String);
        let user_string = (*user_callback)();
        let c_string = CString::new(user_string).unwrap();
        // Hand the buffer over to C without letting Rust deallocate it
        c_string.into_raw()
    }
}

pub fn call_callback(mut user_callback: &mut dyn FnMut() -> String) {
    unsafe {
        need_callback(my_callback,
                      &mut user_callback as *mut &mut dyn FnMut() -> String as *mut c_void);
    }
}
The C part would be equivalent to this:
#include <stdlib.h>
typedef char* (*callback)(void *arbitrary_data);
void need_callback(callback cb, void *arbitrary_data) {
char *user_return = cb(arbitrary_data);
free(user_return); // Complains as the pointer has been allocated with jemalloc
}
It might require some annoying work on your part, but what about exposing a type that implements Write, but is backed by memory allocated via malloc? Then, your client can use the write! macro (and friends) instead of allocating a String.
Here's how it currently works with Vec:
use std::io::Write;

let mut v = Vec::new();
write!(&mut v, "hello, world").unwrap();
You would "just" need to implement the two required methods of Write (write and flush), and then you would have a stream-like interface.
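A rough sketch of such a type, under loudly stated assumptions: `RawBufWriter` is hypothetical, and it uses `std::alloc` only to stay dependency-free. Real bindings would call `libc::malloc`/`libc::realloc` in the same places, so that the C library's `free()` could release the result without the capacity that `std::alloc::dealloc` needs. Note that any NUL byte the client writes would truncate the string as C sees it.

```rust
use std::alloc::{alloc, dealloc, realloc, Layout};
use std::ffi::CStr;
use std::io::{self, Write};
use std::os::raw::c_char;
use std::ptr;

// Hypothetical writer over a raw byte buffer (std::alloc stands in for malloc/realloc).
struct RawBufWriter {
    ptr: *mut u8,
    len: usize,
    cap: usize,
}

impl RawBufWriter {
    fn new() -> Self {
        let cap = 16;
        let ptr = unsafe { alloc(Layout::array::<u8>(cap).unwrap()) };
        assert!(!ptr.is_null(), "allocation failed");
        RawBufWriter { ptr, len: 0, cap }
    }

    // Grow the buffer (like realloc) so that `extra` more bytes fit.
    fn reserve(&mut self, extra: usize) {
        let needed = self.len + extra;
        if needed <= self.cap {
            return;
        }
        let new_cap = needed.max(self.cap * 2);
        let old_layout = Layout::array::<u8>(self.cap).unwrap();
        let new_ptr = unsafe { realloc(self.ptr, old_layout, new_cap) };
        assert!(!new_ptr.is_null(), "allocation failed");
        self.ptr = new_ptr;
        self.cap = new_cap;
    }

    // NUL-terminate and give up ownership. The capacity is returned too, because
    // std::alloc (unlike free()) needs it to release the buffer.
    fn into_c_string(mut self) -> (*mut c_char, usize) {
        self.reserve(1);
        unsafe { *self.ptr.add(self.len) = 0 };
        let out = (self.ptr as *mut c_char, self.cap);
        std::mem::forget(self); // skip Drop: the caller now owns the buffer
        out
    }
}

impl Write for RawBufWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.reserve(buf.len());
        unsafe { ptr::copy_nonoverlapping(buf.as_ptr(), self.ptr.add(self.len), buf.len()) };
        self.len += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl Drop for RawBufWriter {
    fn drop(&mut self) {
        unsafe { dealloc(self.ptr, Layout::array::<u8>(self.cap).unwrap()) };
    }
}

fn main() {
    let mut w = RawBufWriter::new();
    write!(w, "hello, {}", "world").unwrap();
    let (s, cap) = w.into_c_string();
    println!("{}", unsafe { CStr::from_ptr(s) }.to_str().unwrap());
    // With malloc, C would simply call free(s); std::alloc needs the layout back.
    unsafe { dealloc(s as *mut u8, Layout::array::<u8>(cap).unwrap()) };
}
```

The client only ever sees a `Write` implementation, so the user callback can format directly into the C-owned buffer instead of building a String that must then be copied.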