How to expose a Rust `Vec<T>` to FFI?

I'm trying to construct a pair of elements:
array: *mut T
array_len: usize
array is intended to own the data
However, Box::into_raw will return *mut [T]. I cannot find any info on converting raw pointers to slices. What is its layout in memory? How do I use it from C? Should I convert to *mut T? If so, how?

If you just want some C function to mutably borrow the Vec, you can do it like this:
extern "C" {
    fn some_c_function(ptr: *mut i32, len: libc::size_t);
}
fn safe_wrapper(a: &mut [i32]) {
    unsafe {
        some_c_function(a.as_mut_ptr(), a.len() as libc::size_t);
    }
}
Of course, the C function shouldn't store this pointer somewhere else because that would break aliasing assumptions.
If you want to "pass ownership" of the data to C code, you'd do something like this:
use std::mem;
extern "C" {
    fn c_sink(ptr: *mut i32, len: libc::size_t);
}
fn sink_wrapper(mut vec: Vec<i32>) {
    vec.shrink_to_fit();
    assert!(vec.len() == vec.capacity());
    let ptr = vec.as_mut_ptr();
    let len = vec.len();
    mem::forget(vec); // prevent deallocation in Rust
    // The array is still there but no Rust object
    // feels responsible. We only have ptr/len now
    // to reach it.
    unsafe {
        c_sink(ptr, len as libc::size_t);
    }
}
Here, the C function "takes ownership" in the sense that we expect it to eventually return the pointer and length to Rust, for example, by calling a Rust function to deallocate it:
/// This is intended for the C code to call for deallocating the
/// Rust-allocated i32 array.
#[no_mangle]
pub unsafe extern "C" fn deallocate_rust_buffer(ptr: *mut i32, len: libc::size_t) {
    let len = len as usize;
    drop(Vec::from_raw_parts(ptr, len, len));
}
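Because Vec::from_raw_parts expects three parameters (a pointer, a length and a capacity), we either have to keep track of the capacity as well, or make sure the capacity equals the length before passing the pointer and length to the C function. Vec's shrink_to_fit usually achieves this but does not guarantee it, which is why the wrapper above asserts it; into_boxed_slice always discards the excess capacity. Either way, this might involve a reallocation.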
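Keeping track of the capacity avoids the reallocation altogether. The following is only a sketch of that option: the struct RawI32Buf and the functions vec_into_raw and vec_from_raw are hypothetical names introduced for illustration, and the C side is assumed to hand the whole triple back unchanged.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical C-compatible description of a Rust-allocated i32 buffer.
#[repr(C)]
pub struct RawI32Buf {
    pub ptr: *mut i32,
    pub len: usize,
    pub cap: usize,
}

/// Give up ownership of the Vec without shrinking or reallocating it.
pub fn vec_into_raw(vec: Vec<i32>) -> RawI32Buf {
    // ManuallyDrop suppresses the destructor while still allowing us
    // to read the pointer, length and capacity.
    let mut vec = ManuallyDrop::new(vec);
    RawI32Buf {
        ptr: vec.as_mut_ptr(),
        len: vec.len(),
        cap: vec.capacity(),
    }
}

/// Rebuild the Vec so that Rust deallocates the buffer when it is dropped.
///
/// # Safety
/// `raw` must come from `vec_into_raw`, must be unmodified, and must not
/// be used again afterwards.
pub unsafe fn vec_from_raw(raw: RawI32Buf) -> Vec<i32> {
    unsafe { Vec::from_raw_parts(raw.ptr, raw.len, raw.cap) }
}
```

The price is a wider interface: the C code has to store three values instead of two.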

You could use [T]::as_mut_ptr to obtain the *mut T pointer directly from Vec<T>, Box<[T]> or any other DerefMut-to-slice types.
use std::mem;
let mut boxed_slice: Box<[T]> = vector.into_boxed_slice();
let array: *mut T = boxed_slice.as_mut_ptr();
let array_len: usize = boxed_slice.len();
// Prevent the slice from being destroyed (Leak the memory).
mem::forget(boxed_slice);
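As for the *mut [T] that Box::into_raw returns: it is a fat pointer (data pointer plus length) whose exact layout Rust does not guarantee, so it should not be passed to C as-is. It can, however, be cast to a thin *mut T with `as`. A minimal sketch of the round trip, where leak_parts and reclaim_parts are hypothetical helper names:

```rust
use std::ptr;

/// Split a Vec into a thin pointer and a length, leaking the allocation.
pub fn leak_parts<T>(vector: Vec<T>) -> (*mut T, usize) {
    let len = vector.len();
    // Box::into_raw yields a fat `*mut [T]` and does not run the destructor.
    let fat: *mut [T] = Box::into_raw(vector.into_boxed_slice());
    // The cast discards the length and keeps the address of element 0.
    (fat as *mut T, len)
}

/// Reassemble the boxed slice so that Rust frees it when it is dropped.
///
/// # Safety
/// `array` and `array_len` must come from a single call to `leak_parts`,
/// and the pair must not be reclaimed twice.
pub unsafe fn reclaim_parts<T>(array: *mut T, array_len: usize) -> Box<[T]> {
    // slice_from_raw_parts_mut rebuilds the fat pointer from its two halves.
    unsafe { Box::from_raw(ptr::slice_from_raw_parts_mut(array, array_len)) }
}
```

Because a boxed slice has no spare capacity, the length alone is enough to rebuild it.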

Related

How to make a field lifetime same as the struct?

I am getting memory allocated by an external C function. I then convert the memory to a [u8] slice using std::slice::from_raw_parts(). To avoid repeated calls to from_raw_parts() I want to store the slice as a field. So far I have this.
struct ImageBuffer<'a> {
//Slice view of buffer allocated by C
pub bytes: &'a [u8],
}
But this is wrong right there. This says bytes has a lifetime larger than the ImageBuffer instance. This does not reflect the reality. The buffer is freed up from drop() and hence bytes should have a lifetime same as the struct instance. How do I model that?
With the current code it is easy to use after free.
impl <'a> ImageBuffer<'a> {
pub fn new() -> ImageBuffer<'a> {
let size: libc::size_t = 100;
let ptr = unsafe {libc::malloc(size)};
let bytes = unsafe {std::slice::from_raw_parts(ptr as *const u8, size)};
ImageBuffer {
bytes,
}
}
}
impl <'a> Drop for ImageBuffer<'a> {
fn drop(&mut self) {
unsafe {
libc::free(self.bytes.as_ptr() as *mut libc::c_void)
};
}
}
fn main() {
let bytes;
{
let img = ImageBuffer::new();
bytes = img.bytes;
}
//Use after free!
println!("Size: {}. First: {}", bytes.len(), bytes[0]);
}
I can solve this problem by writing a getter function for bytes.
First make bytes private.
struct ImageBuffer<'a> {
bytes: &'a [u8],
}
Then write this getter method. This establishes the fact that the returned bytes has a lifetime same as the struct instance.
impl <'a> ImageBuffer<'a> {
pub fn get_bytes(&'a self) -> &'a [u8] {
self.bytes
}
}
Now, a use after free will not be allowed. The following will not compile.
fn main() {
let bytes;
{
let img = ImageBuffer::new();
bytes = img.get_bytes();
}
println!("Size: {}. First: {}", bytes.len(), bytes[0]);
}
I find this solution deeply disturbing. The struct declaration is still conveying a wrong meaning (it still says bytes has a larger lifetime than the struct instance). The get_bytes() method counters that and conveys the correct meaning. I'm looking for an explanation of this situation and what the best way to handle it.

Lifetimes cannot be used to express what you are doing, and therefore you should not be using a reference, since references always use lifetimes.
Instead, store a raw pointer in your struct. It can be a slice pointer; just not a slice reference.
struct ImageBuffer {
bytes: *const [u8],
}
To create the pointer, convert it from the C pointer without involving references. To create a safe reference, do it in your getter (or a Deref implementation):
impl ImageBuffer {
    pub fn new() -> ImageBuffer {
        let size: libc::size_t = 100;
        let ptr = unsafe { libc::malloc(size) };
        assert!(!ptr.is_null());
        // slice_from_raw_parts is a safe function, so no unsafe block is needed
        let bytes = std::ptr::slice_from_raw_parts(ptr as *const u8, size);
        ImageBuffer { bytes }
    }
    pub fn get_bytes(&self) -> &[u8] {
        // Safety: the pointer is valid until `*self` is dropped, and it
        // cannot be dropped while it is borrowed by this reference
        unsafe { &*self.bytes }
    }
}
This is essentially what Box, the basic owning pointer type, does, except that your pointer was allocated by the C allocator and this needs to be freed using it too (using your Drop implementation).
All of this is the normal and routine thing to do to make a safe wrapper for an owning C pointer.
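To show the whole pattern in one place, including the Drop implementation, here is a self-contained sketch. It is not the code from the question: the type name OwnedBytes is hypothetical, and std::alloc stands in for the C allocator (malloc/free) so that the example runs without the libc crate. The buffer is zero-initialized because exposing uninitialized memory as &[u8] would not be sound.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr;

/// Owns a heap buffer through a raw slice pointer (no lifetime parameter).
pub struct OwnedBytes {
    bytes: *mut [u8],
}

impl OwnedBytes {
    fn layout(len: usize) -> Layout {
        Layout::array::<u8>(len).expect("buffer size overflow")
    }

    pub fn new(len: usize) -> OwnedBytes {
        assert!(len > 0, "this sketch does not handle empty buffers");
        // Stand-in for the C allocation call (assumption of this sketch).
        let ptr = unsafe { alloc_zeroed(Self::layout(len)) };
        assert!(!ptr.is_null());
        OwnedBytes {
            bytes: ptr::slice_from_raw_parts_mut(ptr, len),
        }
    }

    /// The returned borrow is tied to `&self`, so it cannot outlive the owner.
    pub fn get_bytes(&self) -> &[u8] {
        // SAFETY: the allocation stays valid until `self` is dropped.
        unsafe { &*self.bytes }
    }
}

impl Drop for OwnedBytes {
    fn drop(&mut self) {
        let len = self.get_bytes().len();
        // Free with the same allocator that produced the buffer.
        unsafe { dealloc(self.bytes as *mut u8, Self::layout(len)) };
    }
}
```

With this shape, the use-after-free from the question no longer compiles, because get_bytes borrows from the OwnedBytes value itself.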

Safely handling a buffer from C

I have a Rust function like this:
pub fn get_buffer() -> &[u8] {
// returns *mut c_char
let ptr = unsafe { get_buffer_from_c_code() };
// returns &[u8]
let buf = unsafe { core::slice::from_raw_parts(ptr as *const u8, 10) };
buf
}
It generates this error:
pub fn get_buffer() -> &[u8] {
| ^ expected named lifetime parameter
|
= help: this function's return type contains a borrowed value, but there is no value for it to be borrowed from
help: consider using the `'static` lifetime
|
19 | pub fn get_buffer() -> &'static [u8] {
| ~~~~~~~~
I understand the error. It makes sense.
Question: should I take the compiler's suggestion and add a static lifetime specifier?
I'm connecting Rust to a C library that allocates memory internally and returns a pointer to it. Later, the C library takes care of de-allocating that memory on its own. I'd like the Rust code to be able to read what's in the memory, but otherwise leave it alone.
Rust is allocating the slice, though, in its own memory, and the slice itself (the pointer and the length) need to be dropped eventually.
Does a static lifetime do what I need it to do? Will Rust drop the slice, but not try to free the underlying buffer?

Question: should I take the compiler's suggestion and add a static lifetime specifier?
No. If your function returns a static reference then you're promising your caller that it can keep the reference around (and read through it) for as long as it likes, which is only true if the buffer is never deallocated and never modified.
I'm connecting Rust to a C library that allocates memory internally and returns a pointer to it. Later, the C library takes care of de-allocating that memory on its own.
The solution to this problem depends entirely on when the deallocation happens. You need to ensure that there is some lifetime of a borrow such that there is no possibility to cause the deallocation until the borrow ends. You wrote in a comment
When my Rust code returns then C deallocates stuff.
That's key to picking the solution. That means that the reference should be obtained when the Rust code is called. That is:
extern "C" fn wrapper_called_from_c_code() {
    let ptr = unsafe { get_buffer_from_c_code() };
    let buf = unsafe { core::slice::from_raw_parts(ptr as *const u8, 10) };
    // Constrain the lifetime of the slice to be the duration of
    // this call by passing it through a lifetime-generic function.
    // (<'a> is just for explicitness and could be elided.)
    fn shim<'a>(buf: &'a [u8]) {
        safe_rust_code(buf);
    }
    shim(buf);
    // Now, after the function call returns, it's safe to proceed with
    // allowing the C code to deallocate the buffer.
}
fn safe_rust_code(buf: &[u8]) {
    // write whatever you like here
}
safe_rust_code can do whatever it likes in its function body, but the borrow checker will ensure it cannot hang onto the &'a [u8] slice reference longer than is safe.
The shim function exists so that the requirement of wrapper_called_from_c_code (that the slice reference is passed to a lifetime-generic function, not to one that accepts &'static [u8]) is enforced inside wrapper_called_from_c_code itself, rather than being stated as a constraint on another function. I consider it good practice to keep invariants in the narrowest scope possible, to reduce the chance that they are broken by someone merely editing safe code without reading the comments.
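The same constraint can also be expressed with a closure instead of a nested function. This is a minimal sketch rather than the answer's own code: with_buffer is a hypothetical name, and a local array stands in for the buffer that get_buffer_from_c_code would return.

```rust
/// Lend the buffer to `f` for the duration of this call only.
///
/// The closure must accept a slice of *any* lifetime, and `R` is chosen
/// independently of that lifetime, so the slice cannot escape through
/// the return value.
pub fn with_buffer<R>(f: impl FnOnce(&[u8]) -> R) -> R {
    // Stand-in for the C-owned buffer (assumption of this sketch).
    let c_owned = [1u8, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    let ptr = c_owned.as_ptr();
    // SAFETY: `ptr` points at 10 initialized bytes that stay alive
    // until this function returns.
    let buf = unsafe { std::slice::from_raw_parts(ptr, c_owned.len()) };
    f(buf)
    // After `f` returns, the buffer may be released.
}
```

A caller can compute and return owned values, for example with_buffer(|buf| buf.to_vec()), but an attempt to return buf itself is rejected by the borrow checker.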

Wrap the buffer in a struct that frees the buffer when the struct is dropped. The struct can then own the buffer, much like a Vec owns a block of data on the heap. When it hands out references their lifetimes will naturally be tied to the lifetime of the struct.
use std::ops::Deref;
use std::slice;
pub struct Buffer {
    ptr: *const u8,
    len: usize,
}
impl Buffer {
    pub fn new() -> Self {
        Self {
            // get_buffer_from_c_code returns *mut c_char, hence the cast
            ptr: unsafe { get_buffer_from_c_code() } as *const u8,
            len: 10,
        }
    }
}
impl Drop for Buffer {
    fn drop(&mut self) {
        unsafe {
            free_buffer_from_c_code(self.ptr);
        }
    }
}
impl Deref for Buffer {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        // SAFETY: The C library must not modify the contents of the buffer
        // for the lifetime of the slice.
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}

The solution I ended up going with was something like this:
pub fn get_buffer<'a>(foo: u32) -> &'a [u8] {
let ptr = unsafe { get_buffer_from_c_code() };
let buf = unsafe { core::slice::from_raw_parts(ptr as *const u8, 10) };
buf
}
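Adding a parameter and then defining a lifetime on the function itself compiles. Note, however, that 'a is not tied to foo, which is passed by value and carries no lifetime: the lifetime is unbounded, so the caller may pick any lifetime, including 'static, and the compiler will not stop the slice from outliving the buffer.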
You can also do this:
pub fn get_buffer(foo: &u32) -> &[u8] {
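If foo is a reference then you can elide the lifetime. Under the lifetime elision rules, a function with exactly one reference parameter assigns that parameter's lifetime to every elided lifetime in the return type, so the slice is treated as living as long as the foo borrow. A value parameter such as u32 carries no lifetime, so there is nothing for the return type to borrow from.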
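Tying the slice to a reference is only meaningful if that reference borrows something whose lifetime matches the buffer's. One way to arrange that is a handle type that represents the C library's state. The following is a sketch under that assumption: Library, buffer and refresh are hypothetical names, and a Vec stands in for the memory the C side would own.

```rust
/// Hypothetical handle for the C library's state.
pub struct Library {
    // Stand-in for memory owned by the C side (assumption of this sketch).
    storage: Vec<u8>,
}

impl Library {
    pub fn new() -> Library {
        Library { storage: vec![0; 10] }
    }

    /// Elision ties the returned slice to the `&self` borrow.
    pub fn buffer(&self) -> &[u8] {
        let ptr = self.storage.as_ptr();
        // SAFETY: `ptr` is valid for `len` bytes while `self` is borrowed.
        unsafe { std::slice::from_raw_parts(ptr, self.storage.len()) }
    }

    /// Anything that may invalidate the buffer takes `&mut self`, so it
    /// cannot be called while a slice from `buffer` is still in use.
    pub fn refresh(&mut self) {
        self.storage = vec![1; 10];
    }
}
```

With this shape the borrow checker rejects code that calls refresh while a slice obtained from buffer is still alive.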

casting *mut u8 to &[u8] without std

I'm writing Rust code for WebAssembly to handle strings from JavaScript land.
Since WebAssembly has no real string type, I'm trying to pass a pointer to WebAssembly memory object which points to UTF-8 encoded string.
#[no_mangle]
pub extern "C" fn check(ptr: *mut u8, length: u32) -> u32 {
unsafe {
let buf: &[u8] = std::slice::from_raw_parts(ptr, length as usize);
// do some operations on buf
0
}
}
It works fine, except that I have to depend on the std crate, which bloats the final binary to about 600KB.
Is there any way to get rid of std::slice::from_raw_parts but still be able to cast a raw pointer to a slice?

You cannot cast a raw pointer to a slice because in Rust, a slice is not a mere pointer, it is a pointer and a size (otherwise it could not be safe).
If you do not want to use std, you can use the core crate:
extern crate core;
#[no_mangle]
pub extern "C" fn check(ptr: *mut u8, length: u32) -> u32 {
    unsafe {
        let buf: &mut [u8] = core::slice::from_raw_parts_mut(ptr, length as usize);
        // do some operations on buf
    }
    0
}
The core crate is the subset of the std crate that is suitable for embedded targets: everything that requires neither heap allocation nor operating-system support.
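As an illustration of how far core alone goes, the sketch below builds the slice and validates it as UTF-8 using only core paths. The function name count_digits and its behaviour are made up for this example. It is written as an ordinary function so it can be tried in a normal program; an actual no_std build would additionally need the #![no_std] crate attribute and a panic handler.

```rust
/// Count the ASCII digits in a UTF-8 string passed as pointer + length.
/// Returns 0 for a null pointer or invalid UTF-8.
///
/// # Safety
/// If non-null, `ptr` must point to `length` readable, initialized bytes.
pub unsafe fn count_digits(ptr: *const u8, length: u32) -> u32 {
    if ptr.is_null() {
        return 0;
    }
    // core::slice::from_raw_parts is the same function std re-exports.
    let buf: &[u8] = unsafe { core::slice::from_raw_parts(ptr, length as usize) };
    // UTF-8 validation also lives in core.
    match core::str::from_utf8(buf) {
        Ok(s) => s.chars().filter(|c| c.is_ascii_digit()).count() as u32,
        Err(_) => 0,
    }
}
```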

It is possible to manually construct something similar to a slice, which is a fat pointer that consists of a thin pointer and a length. Then cast a pointer-to-this-construct to a pointer-to-slice.
This approach is not only unsafe, it also relies on Rust internals (the memory layout of a slice) that are not guaranteed to remain stable between compiler versions, or even between systems, I suppose. @Boiethios' answer is the way to go if you want to be sure that your code works correctly in the future. However, for educational purposes, the code below may still be interesting:
unsafe fn make_slice<'a>(ptr: *const u8, len: usize) -> &'a [u8] {
    // place pointer address and length in contiguous memory
    let x: [usize; 2] = [ptr as usize, len];
    // cast pointer to array as pointer to slice
    let slice_ptr = &x as *const _ as *const &[u8];
    // dereference pointer to slice, so we get a slice
    *slice_ptr
}
fn main() {
let src: Vec<u8> = vec![1, 2, 3, 4, 5, 6];
let raw_ptr = &src[1] as *const u8;
unsafe {
println!("{:?}", make_slice(raw_ptr, 3)); // [2, 3, 4]
}
}
(tested on playground with Rust Stable 1.26.2)
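For comparison, current stable Rust offers a supported way to assemble the fat pointer without assuming anything about its layout: ptr::slice_from_raw_parts, which is also available as core::ptr::slice_from_raw_parts. A minimal sketch with the same signature as the function above:

```rust
use std::ptr;

/// Build a slice reference from a thin pointer and a length.
///
/// # Safety
/// `ptr` must point to `len` initialized bytes that stay valid and
/// unmodified for the whole lifetime `'a`.
pub unsafe fn make_slice<'a>(ptr: *const u8, len: usize) -> &'a [u8] {
    // Safe call: it only combines the two halves into a `*const [u8]`.
    let fat: *const [u8] = ptr::slice_from_raw_parts(ptr, len);
    // The dereference is the unsafe step.
    unsafe { &*fat }
}
```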

Creating a Vec in Rust from a C array pointer and safely freeing it?

I'm calling a C function from Rust which takes a null pointer as an argument, then allocates some memory to point it to.
What is the correct way to efficiently (i.e. avoiding unnecessary copies) and safely (i.e. avoid memory leaks or segfaults) turn data from the C pointer into a Vec?
I've got something like:
extern "C" {
// C function that allocates an array of floats
fn allocate_data(data_ptr: *mut *const f32, data_len: *mut i32);
}
fn get_vec() -> Vec<f32> {
// C will set this to length of array it allocates
let mut data_len: i32 = 0;
// C will point this at the array it allocates
let mut data_ptr: *const f32 = std::ptr::null_mut();
unsafe { allocate_data(&mut data_ptr, &mut data_len) };
let data_slice = unsafe { slice::from_raw_parts(data_ptr as *const f32, data_len as usize) };
data_slice.to_vec()
}
If I understand correctly, .to_vec() will copy data from the slice into a new Vec, so the underlying memory will still need to be freed (as the underlying memory for the slice won't be freed when it's dropped).
What is the correct approach for dealing with the above?
can I create a Vec which takes ownership of the underlying memory, which is freed when the Vec is freed?
if not, where/how in Rust should I free the memory that the C function allocated?
anything else in the above that could/should be improved on?

can I create a Vec which takes ownership of the underlying memory, which is freed when the Vec is freed?
Not safely, no. You must not use Vec::from_raw_parts unless the pointer came from a Vec originally (well, from the same memory allocator). Otherwise, you will try to free memory that your allocator doesn't know about; a very bad idea.
Note that the same thing is true for String::from_raw_parts, as a String is a wrapper for a Vec<u8>.
where/how in Rust should I free the memory that the C function allocated?
As soon as you are done with it and no sooner.
anything else in the above that could/should be improved on?
There's no need to cast the pointer when calling slice::from_raw_parts
There's no need for explicit types on the variables
Use ptr::null, not ptr::null_mut
Perform a NULL pointer check
Check the length is non-negative
use std::{ptr, slice};
extern "C" {
fn allocate_data(data_ptr: *mut *const f32, data_len: *mut i32);
fn deallocate_data(data_ptr: *const f32);
}
fn get_vec() -> Vec<f32> {
let mut data_ptr = ptr::null();
let mut data_len = 0;
unsafe {
allocate_data(&mut data_ptr, &mut data_len);
assert!(!data_ptr.is_null());
assert!(data_len >= 0);
let v = slice::from_raw_parts(data_ptr, data_len as usize).to_vec();
deallocate_data(data_ptr);
v
}
}
fn main() {}
You didn't state why you need it to be a Vec, but if you never need to change the size, you can create your own type that can be dereferenced as a slice and drops the data when appropriate:
use std::{ptr, slice};
extern "C" {
fn allocate_data(data_ptr: *mut *const f32, data_len: *mut i32);
fn deallocate_data(data_ptr: *const f32);
}
struct CVec {
ptr: *const f32,
len: usize,
}
impl std::ops::Deref for CVec {
type Target = [f32];
fn deref(&self) -> &[f32] {
unsafe { slice::from_raw_parts(self.ptr, self.len) }
}
}
impl Drop for CVec {
fn drop(&mut self) {
unsafe { deallocate_data(self.ptr) };
}
}
fn get_vec() -> CVec {
let mut ptr = ptr::null();
let mut len = 0;
unsafe {
allocate_data(&mut ptr, &mut len);
assert!(!ptr.is_null());
assert!(len >= 0);
CVec {
ptr,
len: len as usize,
}
}
}
fn main() {}
See also:
How to convert a *const pointer into a Vec to correctly drop it?
Is it possible to call a Rust function taking a Vec from C?

How to convert a *const pointer into a Vec to correctly drop it?

After asking how I should go about freeing memory across the FFI boundary, someone on the Rust reddit suggested that rather than wrapping my structs in a Box, I could use Vec::from_raw_parts to construct a vector from the following struct, and that this could be safely dropped:
#[repr(C)]
pub struct Array {
data: *const c_void,
len: libc::size_t,
}
However, from_raw_parts seems to require *mut _ data, so I'm not sure how to proceed…

The very short answer is self.data as *mut u8. But, let's talk more details...
First, words of warning:
Do not use Vec::from_raw_parts unless the pointer came from a Vec originally. There is no guarantee that an arbitrary pointer will be compatible with a Vec and you are likely to create giant holes in your program if you proceed.
Do not free a pointer that you don't own. Doing so leads to double frees, which will blow other large holes in your program.
You need to know the capacity of the vector before you can reconstruct it. Your example struct only contains a len. This is only acceptable if the len and capacity are equal.
Now, let's see if I can follow my own rules...
extern crate libc;
use std::mem;
#[repr(C)]
pub struct Array {
    data: *const libc::c_void,
    len: libc::size_t,
}
// Note that both of these methods should probably be implementations
// of the `From` trait to allow them to participate in more places.
impl Array {
    fn from_vec(mut v: Vec<u8>) -> Array {
        v.shrink_to_fit(); // try to make capacity == len
        // shrink_to_fit may leave excess capacity, so verify the result
        assert_eq!(v.len(), v.capacity());
        let a = Array {
            data: v.as_ptr() as *const libc::c_void,
            len: v.len(),
        };
        mem::forget(v);
        a
    }
    fn into_vec(self) -> Vec<u8> {
        unsafe { Vec::from_raw_parts(self.data as *mut u8, self.len, self.len) }
    }
}
fn main() {
    let v = vec![1, 2, 3];
    let a = Array::from_vec(v);
    let v = a.into_vec();
    println!("{:?}", v);
}
Note that we don't have to do any explicit dropping of the Vec because the normal Drop implementation of Vec comes into play. We just have to make sure that we construct a Vec properly.
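The comment in the code above suggests implementing the From trait instead of ad-hoc methods. A sketch of what that could look like follows. It departs from the code above in two ways that are assumptions of this example: it uses std::ffi::c_void and usize instead of the libc types so that it has no external dependency, and it goes through into_boxed_slice, which discards excess capacity, rather than shrink_to_fit.

```rust
use std::ffi::c_void;

#[repr(C)]
pub struct Array {
    data: *const c_void,
    len: usize,
}

impl From<Vec<u8>> for Array {
    fn from(v: Vec<u8>) -> Array {
        let len = v.len();
        // A boxed slice has no spare capacity, so capacity == len holds.
        // Box::into_raw leaks the allocation instead of freeing it.
        let data = Box::into_raw(v.into_boxed_slice()) as *mut u8;
        Array {
            data: data as *const c_void,
            len,
        }
    }
}

impl From<Array> for Vec<u8> {
    fn from(a: Array) -> Vec<u8> {
        // SAFETY: relies on `data` and `len` being exactly what the
        // conversion above produced; C code must not alter them.
        unsafe { Vec::from_raw_parts(a.data as *mut u8, a.len, a.len) }
    }
}
```

With these impls, Array::from(vec) and array.into() work wherever a From or Into bound is expected.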
