Safely handling a buffer from C - rust

I have a Rust function like this:
pub fn get_buffer() -> &[u8] {
    // returns *mut c_char
    let ptr = unsafe { get_buffer_from_c_code() };
    // returns &[u8]
    let buf = unsafe { core::slice::from_raw_parts(ptr as *const u8, 10) };
    buf
}
It generates this error:
   | pub fn get_buffer() -> &[u8] {
   |                        ^ expected named lifetime parameter
   |
   = help: this function's return type contains a borrowed value, but there is no value for it to be borrowed from
help: consider using the `'static` lifetime
   |
19 | pub fn get_buffer() -> &'static [u8] {
   |                        ~~~~~~~~
I understand the error. It makes sense.
Question: should I take the compiler's suggestion and add a static lifetime specifier?
I'm connecting Rust to a C library that allocates memory internally and returns a pointer to it. Later, the C library takes care of de-allocating that memory on its own. I'd like the Rust code to be able to read what's in the memory, but otherwise leave it alone.
Rust is allocating the slice, though, in its own memory, and the slice itself (the pointer and the length) needs to be dropped eventually.
Does a static lifetime do what I need it to do? Will Rust drop the slice, but not try to free the underlying buffer?

Question: should I take the compiler's suggestion and add a static lifetime specifier?
No. If your function returns a static reference, then you're promising your caller that it can keep the reference around (and read through it) for as long as it likes, which is only true if the buffer is never deallocated and never modified.
I'm connecting Rust to a C library that allocates memory internally and returns a pointer to it. Later, the C library takes care of de-allocating that memory on its own.
The solution to this problem depends entirely on when the deallocation happens. You need to pick a borrow lifetime such that nothing can trigger the deallocation before the borrow ends. You wrote in a comment:
When my Rust code returns then C deallocates stuff.
That's key to picking the solution. That means that the reference should be obtained when the Rust code is called. That is:
pub extern "C" fn wrapper_called_from_c_code() {
    let ptr = unsafe { get_buffer_from_c_code() };
    let buf = unsafe { core::slice::from_raw_parts(ptr as *const u8, 10) };

    // Constrain the lifetime of the slice to be the duration of
    // this call by passing it through a lifetime-generic function.
    // (<'a> is just for explicitness and could be elided.)
    fn shim<'a>(buf: &'a [u8]) {
        safe_rust_code(buf);
    }
    shim(buf);

    // Now, after the function call returns, it's safe to proceed with
    // allowing the C code to deallocate the buffer.
}

fn safe_rust_code(buf: &[u8]) {
    // write whatever you like here
}
safe_rust_code can do whatever it likes in its function body, but the borrow checker will ensure it cannot hang onto the &'a [u8] slice reference longer than is safe.
The shim function exists to express what wrapper_called_from_c_code needs (that the slice reference is passed to a lifetime-generic function, not to one that accepts &'static [u8]) inside wrapper_called_from_c_code itself, rather than stating it as a constraint on another function. I consider it good practice to keep invariants in the narrowest possible scope, to reduce the chance that they are broken by someone merely editing safe code without reading the comments.
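Here is a minimal runnable sketch of the shim pattern. `C_OWNED` and this `get_buffer_from_c_code` are hypothetical stand-ins for the C library (a static array instead of C-allocated memory), and `safe_rust_code` returns a sum only so that there is something to check.

```rust
// Hypothetical stand-in for memory owned by the C library.
static C_OWNED: [u8; 10] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// Hypothetical stand-in for the C function; returns a pointer to C-owned memory.
unsafe fn get_buffer_from_c_code() -> *const u8 {
    C_OWNED.as_ptr()
}

pub extern "C" fn wrapper_called_from_c_code() -> u32 {
    let ptr = unsafe { get_buffer_from_c_code() };
    let buf = unsafe { std::slice::from_raw_parts(ptr, 10) };

    // The slice can only flow into code that is generic over its lifetime.
    fn shim<'a>(buf: &'a [u8]) -> u32 {
        safe_rust_code(buf)
    }
    shim(buf)
    // After this returns, the C side may free the buffer.
}

fn safe_rust_code(buf: &[u8]) -> u32 {
    // Storing `buf` somewhere longer-lived, e.g. in a
    // `static mut KEPT: Option<&'static [u8]>`, would be rejected
    // here with a lifetime error.
    buf.iter().map(|&b| u32::from(b)).sum()
}
```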

Wrap the buffer in a struct that frees the buffer when the struct is dropped. The struct can then own the buffer, much like a Vec owns a block of data on the heap. When it hands out references, their lifetimes will naturally be tied to the lifetime of the struct.
use std::ops::Deref;
use std::slice;

pub struct Buffer {
    ptr: *const u8,
    len: usize,
}

impl Buffer {
    pub fn new() -> Self {
        // get_buffer_from_c_code returns *mut c_char; view it as bytes.
        let ptr = unsafe { get_buffer_from_c_code() } as *const u8;
        assert!(!ptr.is_null());
        Self { ptr, len: 10 }
    }
}

impl Drop for Buffer {
    fn drop(&mut self) {
        unsafe {
            free_buffer_from_c_code(self.ptr);
        }
    }
}

impl Deref for Buffer {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        // SAFETY: The C library must not modify the contents of the buffer
        // for the lifetime of the slice.
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}
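To see the ownership wrapper work end to end, here is a self-contained sketch. The C functions are replaced with hypothetical Rust stand-ins that allocate and free through Box and count live buffers, so you can check that dropping a Buffer frees exactly once; with the real library you would declare the C functions in an extern "C" block instead.

```rust
use std::cell::Cell;
use std::ops::Deref;
use std::slice;

thread_local! {
    // Number of buffers the fake C library currently has outstanding.
    static LIVE_BUFFERS: Cell<usize> = Cell::new(0);
}

// Hypothetical stand-in for the C allocator: hands out 10 bytes set to 7.
unsafe fn get_buffer_from_c_code() -> *mut u8 {
    LIVE_BUFFERS.with(|n| n.set(n.get() + 1));
    Box::into_raw(Box::new([7u8; 10])) as *mut u8
}

// Hypothetical stand-in for the C library's free function.
unsafe fn free_buffer_from_c_code(ptr: *mut u8) {
    LIVE_BUFFERS.with(|n| n.set(n.get() - 1));
    drop(unsafe { Box::from_raw(ptr as *mut [u8; 10]) });
}

fn live_buffers() -> usize {
    LIVE_BUFFERS.with(|n| n.get())
}

pub struct Buffer {
    ptr: *mut u8,
    len: usize,
}

impl Buffer {
    pub fn new() -> Self {
        let ptr = unsafe { get_buffer_from_c_code() };
        assert!(!ptr.is_null());
        Buffer { ptr, len: 10 }
    }
}

impl Drop for Buffer {
    fn drop(&mut self) {
        // Runs exactly once, when the owning Buffer goes away.
        unsafe { free_buffer_from_c_code(self.ptr) }
    }
}

impl Deref for Buffer {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        // SAFETY: the memory stays allocated until drop, and the returned
        // slice borrows `self`, so it cannot outlive the Buffer.
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}
```

Because every slice comes from `deref(&self)`, the borrow checker refuses any use of it after the Buffer is dropped.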

The solution I ended up going with was something like this:
pub fn get_buffer<'a>(foo: u32) -> &'a [u8] {
    let ptr = unsafe { get_buffer_from_c_code() };
    let buf = unsafe { core::slice::from_raw_parts(ptr as *const u8, 10) };
    buf
}
Adding a parameter and declaring a lifetime on the function itself makes it compile. Note, however, that 'a is not actually tied to foo: a u32 passed by value carries no lifetime, so the caller is free to pick any 'a, including 'static. This signature is therefore no better checked than returning &'static [u8].
You can also do this:
pub fn get_buffer(foo: &u32) -> &[u8] {
If foo is a reference, then you can elide the lifetime: the elision rules tie the returned slice to the borrow of foo, because foo is the only reference parameter. A value parameter has no lifetime, so there is nothing for the output to borrow from; that is why the same trick doesn't work without the reference.
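The reference-parameter version can be sketched like this, with a hypothetical static array standing in for the C buffer. The elided form and the explicit `<'a>` form are the same signature; in both, the returned slice cannot outlive the borrow of `foo`, so `foo` acts as a token for how long the buffer may be used.

```rust
// Hypothetical stand-in for a buffer owned by the C library.
static C_OWNED: [u8; 10] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];

// Elided: the output lifetime comes from the single reference parameter.
pub fn get_buffer(foo: &u32) -> &[u8] {
    let _ = foo; // only used as a lifetime anchor
    unsafe { std::slice::from_raw_parts(C_OWNED.as_ptr(), 10) }
}

// The same signature with the lifetime written out.
pub fn get_buffer_explicit<'a>(foo: &'a u32) -> &'a [u8] {
    get_buffer(foo)
}
```

Writing `let b; { let token = 0u32; b = get_buffer(&token); } b.len();` fails to compile, because `token` does not live long enough.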

Related

How to make a field lifetime same as the struct?

I am getting memory allocated by an external C function. I then convert the memory to a [u8] slice using std::slice::from_raw_parts(). To avoid repeated calls to from_raw_parts() I want to store the slice as a field. So far I have this.
struct ImageBuffer<'a> {
    // Slice view of buffer allocated by C
    pub bytes: &'a [u8],
}
But this is wrong right there. It says bytes has a lifetime longer than the ImageBuffer instance, which does not reflect reality: the buffer is freed in drop(), so bytes should have the same lifetime as the struct instance. How do I model that?
With the current code it is easy to cause a use-after-free:
impl<'a> ImageBuffer<'a> {
    pub fn new() -> ImageBuffer<'a> {
        let size: libc::size_t = 100;
        let ptr = unsafe { libc::malloc(size) };
        let bytes = unsafe { std::slice::from_raw_parts(ptr as *const u8, size) };
        ImageBuffer {
            bytes,
        }
    }
}

impl<'a> Drop for ImageBuffer<'a> {
    fn drop(&mut self) {
        unsafe {
            libc::free(self.bytes.as_ptr() as *mut libc::c_void)
        };
    }
}

fn main() {
    let bytes;
    {
        let img = ImageBuffer::new();
        bytes = img.bytes;
    }
    // Use after free!
    println!("Size: {}. First: {}", bytes.len(), bytes[0]);
}
I can solve this problem by writing a getter function for bytes.
First make bytes private.
struct ImageBuffer<'a> {
    bytes: &'a [u8],
}
Then write this getter method. It establishes that the returned bytes have the same lifetime as the struct instance.
impl<'a> ImageBuffer<'a> {
    pub fn get_bytes(&'a self) -> &'a [u8] {
        self.bytes
    }
}
Now, a use-after-free will not be allowed. The following will not compile.
fn main() {
    let bytes;
    {
        let img = ImageBuffer::new();
        bytes = img.get_bytes();
    }
    println!("Size: {}. First: {}", bytes.len(), bytes[0]);
}
I find this solution deeply disturbing. The struct declaration still conveys the wrong meaning (it still says bytes has a longer lifetime than the struct instance); only the get_bytes() method counters that and conveys the correct meaning. I'm looking for an explanation of this situation and the best way to handle it.
Lifetimes cannot be used to express what you are doing, and therefore you should not be using a reference, since references always use lifetimes.
Instead, store a raw pointer in your struct. It can be a slice pointer; just not a slice reference.
struct ImageBuffer {
    bytes: *const [u8],
}
To create the pointer, convert it from the C pointer without involving references. To create a safe reference, do it in your getter (or a Deref implementation):
impl ImageBuffer {
    pub fn new() -> ImageBuffer {
        let size: libc::size_t = 100;
        // calloc zero-fills, so the bytes are initialized before they are
        // ever exposed as a &[u8]
        let ptr = unsafe { libc::calloc(size, 1) };
        assert!(!ptr.is_null());
        // slice_from_raw_parts is a safe function: it only builds a raw pointer
        let bytes = std::ptr::slice_from_raw_parts(ptr as *const u8, size);
        ImageBuffer {
            bytes,
        }
    }

    pub fn get_bytes(&self) -> &[u8] {
        // Safety: the pointer is valid until `*self` is dropped, and it
        // cannot be dropped while it is borrowed by this reference
        unsafe { &*self.bytes }
    }
}
This is essentially what Box, the basic owning pointer type, does, except that your pointer was allocated by the C allocator and must be freed by it too (in your Drop implementation).
All of this is the normal and routine thing to do to make a safe wrapper for an owning C pointer.
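A self-contained version of this answer, including the Drop implementation it refers to, might look like the sketch below. It swaps libc's allocation functions for std::alloc's alloc_zeroed/dealloc, an assumption made only so the sketch runs without the libc crate; with real C memory you would keep the C allocator and call libc::free in drop.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr;

const SIZE: usize = 100;

struct ImageBuffer {
    // Raw slice pointer: carries the length but no lifetime.
    bytes: *const [u8],
}

impl ImageBuffer {
    fn layout() -> Layout {
        Layout::array::<u8>(SIZE).unwrap()
    }

    pub fn new() -> ImageBuffer {
        // Stand-in for the C allocator (zeroed, so the bytes are initialized).
        let ptr = unsafe { alloc_zeroed(Self::layout()) };
        assert!(!ptr.is_null());
        ImageBuffer {
            bytes: ptr::slice_from_raw_parts(ptr, SIZE),
        }
    }

    pub fn get_bytes(&self) -> &[u8] {
        // SAFETY: the allocation lives until `*self` is dropped, and `*self`
        // cannot be dropped while this borrow is alive.
        unsafe { &*self.bytes }
    }
}

impl Drop for ImageBuffer {
    fn drop(&mut self) {
        // Stand-in for libc::free: release with the same layout used in new().
        unsafe { dealloc(self.bytes as *const u8 as *mut u8, Self::layout()) }
    }
}
```

The `main` from the question, rewritten to call `get_bytes()`, is still rejected, because the returned slice borrows `img`.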

Returning a reference to data that a raw pointer points to

Here's my code:
use std::ptr::NonNull;

struct S {
    i: i32
}

impl Clone for S {
    fn clone(&self) -> Self {
        S {
            i: self.i
        }
    }
}

struct F {
    v: Vec<NonNull<S>>
}

impl F {
    pub fn func<'a>(&'a self) -> &'a mut S {
        let s = &mut unsafe {
            *self.v[0].as_ptr()
        };
        s
    }
}

fn main() {
    let f = F {
        v: vec![NonNull::new_unchecked(Box::into_raw(Box::new(S{i: 32}.clone()))); 5]
    };
    f.func();
}
When I compile it, the compiler reminds me that "returns a value referencing data owned by the current function".
Here's my question: when I dereference a raw pointer, shouldn't the data it points to still be owned by self? How did it become owned by the current function? The compiler also tells me that a "move occurs" when I dereference the raw pointer, and since I do not implement the Copy trait for the struct S, it fails.
Could someone explain this to me please? Thanks in advance.
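Without reading too much into the context of what you are trying to do, this issue is easy to fix. The error comes from the unsafe {} block splitting the operation into two parts: the block evaluates *ptr as a value, which tries to move S out from behind the pointer, and the &mut then borrows that temporary, which is local to the function. Whenever you want to turn a raw pointer into a reference, don't split up the &mut *x: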
// Wrong: take a new mutable reference to whatever the unsafe {} block returns
let s = &mut unsafe {
    // Move the value stored at the raw pointer out into a new temporary
    // owned by this function (S is not Copy, hence the "move occurs" error)
    *self.v[0].as_ptr()
};

// Right: reborrow the pointee directly
let s = unsafe {
    // Create a new mutable reference to the data behind the raw pointer
    &mut *self.v[0].as_ptr()
};
After that, you just need to wrap the NonNull::new_unchecked call in an unsafe block and you are good to go.
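Putting the fix together, a compiling version might look like this. Two changes go beyond the answer and are assumptions on my part: func takes &mut self, so the borrow checker allows only one &mut S to be handed out at a time, and each element gets its own allocation (vec![x; 5] would copy one pointer five times), which is what makes the freeing Drop implementation sound.

```rust
use std::ptr::NonNull;

struct S {
    i: i32,
}

struct F {
    v: Vec<NonNull<S>>,
}

impl F {
    pub fn func(&mut self) -> &mut S {
        // `&mut *ptr` stays inside one unsafe block, so this reborrows the
        // pointee instead of moving it out into a temporary.
        unsafe { &mut *self.v[0].as_ptr() }
    }
}

impl Drop for F {
    fn drop(&mut self) {
        for p in self.v.drain(..) {
            // SAFETY: each pointer came from Box::into_raw and is freed once.
            drop(unsafe { Box::from_raw(p.as_ptr()) });
        }
    }
}

fn make_f() -> F {
    let v = (0..5)
        // SAFETY: Box::into_raw never returns a null pointer.
        .map(|_| unsafe { NonNull::new_unchecked(Box::into_raw(Box::new(S { i: 32 }))) })
        .collect();
    F { v }
}
```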

How to expose a Rust `Vec<T>` to FFI?

I'm trying to construct a pair of elements:
array: *mut T
array_len: usize
array is intended to own the data
However, Box::into_raw will return *mut [T]. I cannot find any info on converting raw pointers to slices. What is its layout in memory? How do I use it from C? Should I convert to *mut T? If so, how?
If you just want some C function to mutably borrow the Vec, you can do it like this:
extern "C" {
    fn some_c_function(ptr: *mut i32, len: libc::size_t);
}

fn safe_wrapper(a: &mut [i32]) {
    unsafe {
        some_c_function(a.as_mut_ptr(), a.len() as libc::size_t);
    }
}
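Of course, the C function must not keep this pointer after it returns (for example, by storing it somewhere else), because that would break Rust's aliasing assumptions: once the borrow ends, Rust may read, modify or free the data again.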
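To try the borrowing version without a C compiler, here is a sketch in which some_c_function is a hypothetical Rust function with the C ABI that doubles each element. It uses the pointer only while the call is in progress, which is the contract described above; the real C function would be declared in an extern "C" block as in the answer, with usize standing in for size_t here.

```rust
// Hypothetical stand-in for the C function: doubles each element in place
// and does not keep `ptr` after it returns.
extern "C" fn some_c_function(ptr: *mut i32, len: usize) {
    // SAFETY: the caller passes a valid, exclusively borrowed slice.
    let data = unsafe { std::slice::from_raw_parts_mut(ptr, len) };
    for x in data {
        *x *= 2;
    }
}

fn safe_wrapper(a: &mut [i32]) {
    // The mutable borrow of `a` lasts for exactly this call.
    some_c_function(a.as_mut_ptr(), a.len());
}
```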
If you want to "pass ownership" of the data to C code, you'd do something like this:
use std::mem;

extern "C" {
    fn c_sink(ptr: *mut i32, len: libc::size_t);
}

fn sink_wrapper(mut vec: Vec<i32>) {
    vec.shrink_to_fit();
    assert!(vec.len() == vec.capacity());
    let ptr = vec.as_mut_ptr();
    let len = vec.len();
    mem::forget(vec); // prevent deallocation in Rust
    // The array is still there but no Rust object
    // feels responsible. We only have ptr/len now
    // to reach it.
    unsafe {
        c_sink(ptr, len as libc::size_t);
    }
}
Here, the C function "takes ownership" in the sense that we expect it to eventually return the pointer and length to Rust, for example, by calling a Rust function to deallocate it:
/// This is intended for the C code to call for deallocating the
/// Rust-allocated i32 array.
#[no_mangle]
pub unsafe extern "C" fn deallocate_rust_buffer(ptr: *mut i32, len: libc::size_t) {
    let len = len as usize;
    drop(Vec::from_raw_parts(ptr, len, len));
}
Because Vec::from_raw_parts expects three parameters (a pointer, a length and a capacity), we either have to keep track of the capacity somehow as well, or use Vec's shrink_to_fit before passing the pointer and length to the C function. This might involve a reallocation, though, and shrink_to_fit does not strictly guarantee that the capacity ends up equal to the length, which is why the assert above checks it.
You could use [T]::as_mut_ptr to obtain the *mut T pointer directly from a Vec<T>, a Box<[T]>, or any other type that implements DerefMut to a slice.
use std::mem;

fn into_raw_parts<T>(vector: Vec<T>) -> (*mut T, usize) {
    let mut boxed_slice: Box<[T]> = vector.into_boxed_slice();
    let array: *mut T = boxed_slice.as_mut_ptr();
    let array_len: usize = boxed_slice.len();
    // Prevent the slice from being destroyed (leak the memory).
    mem::forget(boxed_slice);
    (array, array_len)
}
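To connect this back to the question: Box::into_raw on a Box<[T]> gives a fat *mut [T], which holds the data address and the length. The layout of such wide pointers is unspecified, so don't hand one to C directly. Casting it with `as *mut T` keeps just the data address, and ptr::slice_from_raw_parts_mut rebuilds the fat pointer when ownership comes back. The helper names vec_into_raw and vec_from_raw are hypothetical.

```rust
use std::ptr;

// Give up ownership: returns the (array, array_len) pair for the C side.
pub fn vec_into_raw<T>(vector: Vec<T>) -> (*mut T, usize) {
    let len = vector.len();
    // into_boxed_slice discards excess capacity, so the allocation holds exactly `len` items.
    let fat: *mut [T] = Box::into_raw(vector.into_boxed_slice());
    // Casting a fat slice pointer to a thin one keeps only the data address.
    (fat as *mut T, len)
}

/// Take ownership back once the C side is done with the array.
///
/// # Safety
/// `array` and `array_len` must come from `vec_into_raw::<T>` and be passed here only once.
pub unsafe fn vec_from_raw<T>(array: *mut T, array_len: usize) -> Box<[T]> {
    // Rebuild the fat pointer that Box::into_raw produced, then let Box own it again.
    let fat: *mut [T] = ptr::slice_from_raw_parts_mut(array, array_len);
    unsafe { Box::from_raw(fat) }
}
```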

How to get the v-ptr for a given Trait/Struct combination?

In Rust, a &T where T is a trait is a fat reference, which actually corresponds to raw::TraitObject:
pub struct TraitObject {
    pub data: *mut (),
    pub vtable: *mut (),
}
Using TraitObject, one can de-construct and re-construct a &T at leisure.
However, while obtaining the vtable from de-constructing a &T is easy, what if I never have the &T in the first place, but just a T and S; essentially, something along the lines of:
fn make_vptr<T: ?Sized, S>() -> *mut ();
How could I divine the v-ptr from there? Is there any intrinsic I could use?
Note: the naive implementation of creating a S (or conjuring it from thin-air) and then making a &T reference does not work; the compiler complains that T is not necessarily a trait and therefore that &T is either one pointer or two pointers in size.
A possibility is to use a macro to do the magic job:
#![feature(raw)]

macro_rules! make_vptr(
    ($S:ty, $T:ty) => ({
        let s: &$S = unsafe { ::std::mem::uninitialized() };
        let t: &$T = s;
        let r: ::std::raw::TraitObject = unsafe { ::std::mem::transmute(t) };
        r.vtable
    })
);
This code will not compile if T is not a trait (thanks to transmute(..) checking that &T is a fat pointer) or if T is not implemented by S (thanks to the assignment).
Then, it can be used directly:
use std::fmt::Display;

fn main() {
    let u32_display_vtable = make_vptr!(u32, Display);

    let x = 42u32;
    let disp: &Display = unsafe {
        ::std::mem::transmute(::std::raw::TraitObject {
            data: &x as *const _ as *mut _,
            vtable: u32_display_vtable,
        })
    };

    println!("{}", disp);
}
I don't believe this is currently possible.
In order for this to work, you'd need to be able to constrain the T generic parameter to only accept traits. You can't do this. As a result, it won't ever let you do anything with &T that depends on it being a trait, such as getting the vtable.

How do I globally store a trait object to make it accessible to a C API callback?

Suppose a C API calls a callback before returning. Unfortunately, there is no way to send data to the callback except through global variables. There is only one thread, by the way.
To make this example compile, I've added a dummy implementation of it in Rust; the real thing is extern "C":
unsafe fn c_api(c_api_callback: extern fn()) {
    c_api_callback();
}
I want to encapsulate some state for this API:
pub trait State {
    fn called(&mut self); // c_api_callback should call this on self
}
It should work generically, since multiple independent implementations of State can exist:
struct MyState {
    value: i32
}

impl State for MyState {
    fn called(&mut self) {
        println!("I hope this prints 123:{}", self.value);
    }
}

pub fn main() {
    let mut mystate = MyState { value: 123 };
    do_call(&mut mystate);
}
The basic question: How do I implement what follows?
//rustc says: error: explicit lifetime bound required [E0228]
static static_state: *mut State = 0 as *mut State;
//This doesn't work
//static static_state: *'static mut State = 0 as *mut State;
//error: bare raw pointers are no longer allowed, you should likely use `*mut T`, but otherwise `*T` is now known as `*const T`

extern fn my_callback_impl() {
    static_state.called();
}

pub fn do_call(state: &mut State) {
    static_state = state;
    unsafe {
        c_api(my_callback_impl);
    }
    static_state = 0 as *mut State;
}
I tried all kinds of horrible workarounds, up to wrapping the trait in a struct and using transmute on it to cast it to *u8, and I have a nice collection of weird error messages and compiler crashes as a result.
As this is the second time I've been confused by static in Rust, I would also appreciate pointers to blogs or good example code that clarify what's going on here.
The 'static lifetime actually isn't too complicated - it simply denotes that something is guaranteed to live for the entire life of the program. In this case, a global value, by definition, needs to be available for that long.
A problem often occurs because people want to initialize that global value during runtime of the program, which means that it isn't available for the entire program.
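As a small illustration of that distinction, here is a sketch (not part of the original answer) of a global known at compile time next to one that is filled in at runtime. std::sync::OnceLock, available since Rust 1.70, is one standard way to get the second kind while still handing out &'static references; CONFIG and its contents are made up for the example.

```rust
use std::sync::OnceLock;

// Known at compile time: valid for the whole program.
static GREETING: &str = "hello";

// Filled in once at runtime; afterwards every reader sees the same value.
static CONFIG: OnceLock<String> = OnceLock::new();

fn config() -> &'static str {
    // The first call initializes; later calls return the stored value.
    CONFIG.get_or_init(|| format!("{} world", GREETING))
}
```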
Now, the meat of the problem. Solution presented with very little guarantee on how safe it is.
First, storing the trait object pointer directly doesn't work out: *mut State is a fat pointer (data plus vtable), so it can't be initialized with 0 as *mut State, and older compilers also demanded an explicit lifetime bound there (the E0228 error above). To work around that, we wrap the trait object in a little dummy struct (Holder) that gives the trait object somewhere to live.
Then, we stick a pointer to the holder into the global, mutable, scary location. Call the callback, and presto, there it is!
use std::mem;
use std::ptr;

struct Holder<'a>(&'a mut (dyn State + 'a));

// You'd truly better never use this in multiple threads!
static mut static_state: *mut Holder<'static> = ptr::null_mut();

pub trait State {
    fn called(&mut self);
}

struct MyState {
    value: i32,
}

impl State for MyState {
    fn called(&mut self) {
        println!("I hope this prints 123:{}", self.value);
    }
}

unsafe fn c_api(c_api_callback: extern "C" fn()) {
    c_api_callback();
}

extern "C" fn my_callback_impl() {
    // really should check that it's not null here...
    let h = unsafe { &mut *static_state };
    h.0.called();
}

pub fn do_call(state: &mut dyn State) {
    let mut h = Holder(state);
    unsafe {
        // Straight-up lie to the compiler: "yeah, this is static"
        static_state = mem::transmute(&mut h);
        c_api(my_callback_impl);
        static_state = ptr::null_mut();
    }
}

pub fn main() {
    let mut mystate = MyState { value: 123 };
    do_call(&mut mystate);
}
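On current Rust, the same idea can be sketched without static mut. The version below is an alternative, not the answer's code: it keeps a thin pointer to do_call's `&mut dyn State` in a thread_local Cell (so a stray call from another thread just sees null), clears it after the C call, and checks for null in the callback. The Counter type is a hypothetical State implementation used only for checking.

```rust
use std::cell::Cell;
use std::ptr;

pub trait State {
    fn called(&mut self);
}

thread_local! {
    // Thin pointer to the `&mut dyn State` on do_call's stack frame. Storing the
    // reference's address (not the fat pointer itself) avoids naming its lifetime.
    static CURRENT: Cell<*mut ()> = Cell::new(ptr::null_mut());
}

// Dummy stand-in for the real extern "C" API, as in the question.
unsafe fn c_api(c_api_callback: extern "C" fn()) {
    c_api_callback();
}

extern "C" fn my_callback_impl() {
    let p = CURRENT.with(|c| c.get());
    assert!(!p.is_null(), "callback invoked outside do_call");
    // SAFETY: p points to the `&mut dyn State` in do_call, which stays alive
    // (and untouched by do_call) until c_api returns.
    let state = unsafe { &mut *(p as *mut &mut dyn State) };
    state.called();
}

pub fn do_call(mut state: &mut dyn State) {
    CURRENT.with(|c| c.set(&mut state as *mut &mut dyn State as *mut ()));
    unsafe { c_api(my_callback_impl) };
    CURRENT.with(|c| c.set(ptr::null_mut()));
}

// Hypothetical State implementation that counts callbacks.
pub struct Counter {
    pub hits: u32,
}

impl State for Counter {
    fn called(&mut self) {
        self.hits += 1;
    }
}
```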
