Rust ffi pointer does not panic when out of bounds - rust

For some reason, when I run the code below, it does not panic or throw any errors...?
Isn't this a seg fault?
Why is this happening? How do I check the size of the passed pointer to avoid panics? (without the user having to pass a "size" variable as well)
#[repr(C)]
pub struct MyStruct {
pub item: u32
// a bunch of other fields as well
}
#[no_mangle]
pub unsafe extern fn do_something(mut data: *mut MyStruct) {
println!("{:p}", data);
data = data.offset(100);
println!("{:p}", data);
println!("{}", (*data).item);
if data.is_null() {
println!("data is null");
}
}
After I build, (and generate header using cbindgen) I link and use in a sample program like so:
#include "my_bindings.h"
int main() {
MyStruct *data = new MyStruct[2];
do_something(data);
return 0;
}
This is the output I get:
0x55f0ba739eb0
0x55f0ba73a108
0

An out-of-bounds access is not necessarily a segmentation fault; it is undefined behaviour. The memory you read out of bounds may still belong to your process, in which case the OS has no reason to kill it.
Unfortunately this is unsafe code, so Rust can't check anything for you. A raw pointer carries no size information, so the caller must pass the length. You should then wrap the pointer and length in a safe Rust container (for example a slice) that panics on out-of-bounds access, as in the following answer: Creating a Vec in Rust from a C array pointer and safely freeing it?
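As a rough illustration of "pass the length along with the pointer", here is a minimal sketch. The function name read_item, its parameters and its bool return value are hypothetical and not part of the question's API. It uses slice::get so that an out-of-range index is reported to the caller instead of panicking, since a panic that tries to unwind out of an extern "C" function aborts the process.

```rust
/// Mirrors the `MyStruct` from the question (other fields omitted).
#[repr(C)]
pub struct MyStruct {
    pub item: u32,
}

/// Hypothetical bounds-checked accessor: the caller passes the element count.
/// Add `#[no_mangle]` (or `#[unsafe(no_mangle)]` on edition 2024) when exporting.
///
/// # Safety
/// `data` must point to `len` initialized `MyStruct` values and `out` must be
/// valid for writes. Rust cannot verify either claim; it has to trust the caller.
pub unsafe extern "C" fn read_item(
    data: *const MyStruct,
    len: usize,
    index: usize,
    out: *mut u32,
) -> bool {
    if data.is_null() || out.is_null() {
        return false;
    }
    // From here on, indexing is checked against `len`.
    let items = unsafe { std::slice::from_raw_parts(data, len) };
    match items.get(index) {
        Some(s) => {
            unsafe { *out = s.item };
            true
        }
        None => false, // index >= len: report failure instead of reading garbage
    }
}
```

Note that the check is only as good as the len the caller passes; if the C side lies about the length, the behaviour is still undefined.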

Related

Rust type that requires manual drop

Is there a way to have the Rust compiler error if a struct is to be automatically dropped?
Example: I'm implementing a memory pool and want the allocations to be manually returned to the memory pool to prevent leakage. Is there something like RequireManualDrop from the below example?
impl MemoryPool {
pub fn allocate(&mut self) -> Option<Allocation> { /* ... */ }
pub fn free(&mut self, alloc: Allocation) { let (ptr, size) = alloc.inner.into_inner(); /* ... */ }
}
pub struct Allocation {
inner: RequireManualDrop<(*mut u8, usize)>,
}
fn valid_usage(mem: &mut MemoryPool) {
let chunk = mem.allocate();
/* ... */
mem.free(chunk);
}
/* compile error: Allocation.inner needs to be manually dropped */
fn will_have_compile_error(mem: &mut MemoryPool) {
let chunk = mem.allocate();
/* ... */
}
Rust doesn't have a direct solution for this. However, you can use a neat trick (taken from https://github.com/Kixunil/dont_panic).
If you call a function that is declared but not defined, the linker will report an error. We can use that to error on drop: by calling an undefined function in the destructor, the code will fail to link unless the destructor is never called (or the call is optimized away).
#[derive(Debug)]
pub struct Undroppable(pub i32);
impl Drop for Undroppable {
fn drop(&mut self) {
extern "C" {
// This will show (somewhat) useful error message instead of complete gibberish
#[link_name = "\n\nERROR: `Undroppable` implicitly dropped\n\n"]
fn trigger() -> !;
}
unsafe { trigger() }
}
}
pub fn manually_drop(v: Undroppable) {
let v = std::mem::ManuallyDrop::new(v);
println!("Manually dropping {v:?}...");
}
Playground.
However, beware that while in this case it worked even in debug builds, it may require optimizations in other cases to eliminate the call. And unwinding may complicate the process even more, because it can lead to unexpected implicit drops. For example, if I change manually_drop() to the following seemingly identical version:
pub fn manually_drop(v: Undroppable) {
println!("Manually dropping {v:?}...");
std::mem::forget(v);
}
It doesn't work, because println!() may unwind, and then std::mem::forget(v) won't be reached. If a function that takes Undroppable by value must unwind, you probably have no options but to wrap it with ManuallyDrop and manually ensure it is correctly dropped (though you can just drop it at the end of the function, and let the unwind case leak it, as leaking in panicking is mostly fine).
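If the link-time trick turns out to be too fragile, a different and weaker technique is a runtime "drop bomb": the destructor panics when the value is dropped implicitly. This is not what the answer above does, and it gives a runtime panic rather than a build error. The Allocation type and free function below are simplified stand-ins for the ones in the question, with a hypothetical id field in place of the pointer and size.

```rust
use std::mem::ManuallyDrop;

/// Simplified stand-in for the question's `Allocation` (hypothetical `id` field).
#[derive(Debug)]
pub struct Allocation {
    pub id: u32,
}

impl Drop for Allocation {
    fn drop(&mut self) {
        // Skip the check while already unwinding: a second panic would abort.
        if !std::thread::panicking() {
            panic!("Allocation {} dropped without being returned to the pool", self.id);
        }
    }
}

/// Consumes the allocation without running its destructor.
/// Wrapping in `ManuallyDrop` first means nothing below can trigger the bomb.
pub fn free(alloc: Allocation) -> u32 {
    let alloc = ManuallyDrop::new(alloc);
    alloc.id
}
```

Going through free() is silent, while letting an Allocation fall out of scope panics, so a forgotten free() should at least show up in tests.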

Rust FFI - Dangling pointer

I work on a Rust library used, through C headers, in a Swift UI.
I can read in Rust what Swift sends, but I can't send what I've just read straight back to Swift (from Rust).
In short: I can successfully convert a *const i8 holding hello world into a String. But calling as_ptr() on that same String does not give a pointer Swift can use (Swift fails to parse it as UTF-8):
Swift sends hello world as a *const i8.
Rust handles it successfully through let input: &str (print #1 in get_message() correctly prints hello world).
Now I can't convert this input &str back to a pointer:
the pointer can't be decoded by Swift
the pointer value changes on every call of the function (I expected the same output every time, as with "hello world".as_ptr())
Basically, why does
"hello world".as_ptr() always give the same output, which Swift can decode,
while input.as_ptr() gives a different output on every call, which Swift can never decode (even though printing input correctly shows hello world)?
Do you guys have ideas?
#[derive(Debug)]
#[repr(C)]
pub struct MessageC {
pub message_bytes: *const u8,
pub message_len: libc::size_t,
}
/// # Safety
/// call of c_string_safe from Swift
/// => https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr
unsafe fn c_string_safe(cstring: *const i8) -> String {
CStr::from_ptr(cstring).to_string_lossy().into_owned()
}
/// # Safety
/// call of c_string_safe from Swift
/// => https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr
/// on `async extern "C"` => <https://stackoverflow.com/a/52521592/7281870>
#[no_mangle]
#[tokio::main] // allow async function, needed to call here other async functions (not this example but needed)
pub async unsafe extern "C" fn get_message(
user_input: *const i8,
) -> MessageC {
let input: &str = &c_string_safe(user_input);
println!("from Swift: {}", input); // [consistent] from Swift: hello world
println!("converted to ptr: {:?}", input.as_ptr()); // [inconsistent] converted to ptr: 0x60000079d770 / converted to ptr: 0x6000007b40b0
println!("directly to ptr: {:?}", "hello world".as_ptr()); // [consistent] directly to ptr: 0x1028aaf6f
MessageC {
message_bytes: input.as_ptr(),
message_len: input.len() as libc::size_t,
}
}
The way you construct MessageC is unsound and returns a dangling pointer. The code in get_message() is equivalent to this:
pub async unsafe extern "C" fn get_message(user_input: *const i8) -> MessageC {
let _invisible = c_string_safe(user_input);
let input: &str = &_invisible;
// let's skip the prints
let msg = MessageC {
message_bytes: input.as_ptr(),
message_len: input.len() as libc::size_t,
};
drop(_invisible);
return msg;
}
Hopefully this formulation highlights the issue: c_string_safe() returns an owned heap-allocated String which gets dropped (and its data deallocated) by the end of the function. input is a slice that refers to the data allocated by that String. In safe Rust you wouldn't be allowed to return a slice referring to a local variable such as input - you'd have to either return the String itself or limit yourself to passing the slice downwards to functions.
However, you're not using safe Rust and you're creating a pointer to the heap-allocated data. Now you have a problem because as soon as get_message() returns, the _invisible String gets deallocated, and the pointer you're returning is dangling. The dangling pointer may even appear to work because deallocation is not obligated to clear the data from memory, it just marks it as available for future allocations. But those future allocations can and will happen, perhaps from a different thread. Thus a program that references freed memory is bound to misbehave, often in an unpredictable fashion - which is precisely what you have observed.
In all-Rust code you'd resolve the issue by safely returning String instead. But you're doing FFI, so you must reduce the string to a pointer/length pair. Rust allows you to do just that, the easiest way being to just call std::mem::forget() to prevent the string from getting deallocated:
pub async unsafe extern "C" fn get_message(user_input: *const i8) -> MessageC {
let mut input = c_string_safe(user_input);
input.shrink_to_fit(); // ensure string capacity == len
let msg = MessageC {
message_bytes: input.as_ptr(),
message_len: input.len() as libc::size_t,
};
std::mem::forget(input); // prevent input's data from being deallocated on return
msg
}
But now you have a different problem: get_message() allocates a string, but how do you deallocate it? Just dropping MessageC won't do it because it just contains pointers. (And doing so by implementing Drop would probably be unwise because you're sending it to Swift or whatever.) The solution is to provide a separate function that re-creates the String from the MessageC and drops it immediately:
pub unsafe fn free_message_c(m: MessageC) {
// The call to `shrink_to_fit()` above makes it sound to re-assemble
// the string using a capacity equal to its length
drop(String::from_raw_parts(
m.message_bytes as *mut _,
m.message_len,
m.message_len,
));
}
You should call this function once you're done with MessageC, i.e. when the Swift code has done its job. (You could even make it extern "C" and call it from Swift.)
Finally, using "hello world".as_ptr() directly works because "hello world" is a static &str which is baked into the executable and never gets deallocated. In other words, it doesn't point to a String, it points to some static data that comes with the program.
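The shrink_to_fit() approach relies on the capacity ending up equal to the length. A variant that avoids depending on that is to convert the String into a Box<str>, whose allocation is exactly as long as the string. The sketch below is one possible shape, not the answer's code: it drops the async/tokio parts, uses c_char instead of i8 for portability, and uses usize where the question uses libc::size_t.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

#[repr(C)]
pub struct MessageC {
    pub message_bytes: *const u8,
    pub message_len: usize,
}

/// # Safety
/// `user_input` must point to a valid NUL-terminated C string.
pub unsafe extern "C" fn get_message(user_input: *const c_char) -> MessageC {
    let owned: String = unsafe { CStr::from_ptr(user_input) }
        .to_string_lossy()
        .into_owned();
    // A boxed str has no spare capacity, so pointer + length fully describe it.
    let boxed: Box<str> = owned.into_boxed_str();
    let len = boxed.len();
    // `into_raw` hands ownership to the caller; nothing is freed on return.
    let ptr = Box::into_raw(boxed) as *mut u8;
    MessageC { message_bytes: ptr, message_len: len }
}

/// # Safety
/// `m` must come from `get_message` and must be freed exactly once.
pub unsafe extern "C" fn free_message_c(m: MessageC) {
    if m.message_bytes.is_null() {
        return;
    }
    // Rebuild the fat pointer from its two halves, then let the Box free it.
    let bytes = std::ptr::slice_from_raw_parts_mut(m.message_bytes as *mut u8, m.message_len);
    drop(unsafe { Box::from_raw(bytes as *mut str) });
}
```

As with the original answer, the Swift side has to call free_message_c once it is done with the bytes, and must not touch them afterwards.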

How to safely create an opaque struct and then free it over the FFI boundary?

I am using cbindgen to generate C bindings for a small Rust crate that implements the ULID specification. To avoid leaking information, I am generating an opaque struct ulid_ctx and returning a pointer to that context object when it is first created. I'm struggling a little bit with reconciling Rust's ownership semantics and C's laissez-faire approach to memory.
#[allow(non_camel_case_types)]
pub struct ulid_ctx {
seed: u32,
}
#[no_mangle]
pub extern "C" fn ulid_create(seed: u32) -> *mut ulid_ctx {
let ctx = ulid_ctx { seed };
Box::leak(Box::new(ctx))
}
#[no_mangle]
pub unsafe extern "C" fn ulid_ctx_destroy(ctx: *mut ulid_ctx) {
Box::from_raw(ctx);
}
Two questions:
Does Box::leak(Box::new(ctx)) correctly allocate a ctx value on the heap and then inform Rust that the function no longer owns it?
Will Box::from_raw(ctx); re-create a Box and then immediately drop it, thereby freeing the memory?
Although it's not a lot of data (32 bits), I would like to avoid creating a memory leak if possible.
Does Box::leak(Box::new(ctx)) correctly allocate a ctx value on the heap and then inform Rust that the function no longer owns it?
Yes. As the name says, it leaks the data, so the value will not be dropped when the Box goes out of scope.
As per @user4815162342's comment, consider using Box::into_raw instead.
Will Box::from_raw(ctx); re-create a Box and then immediately drop it, thereby freeing the memory?
Also yes: it rebuilds the Box, which is then dropped. As a note, it may be nicer to make the drop explicit:
#[no_mangle]
pub unsafe extern "C" fn ulid_ctx_destroy(ctx: *mut ulid_ctx) {
drop(unsafe { Box::from_raw(ctx) })
}
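Putting both suggestions together (Box::into_raw on creation, an explicit drop on destruction), the pair might look like the sketch below. The null check and the ulid_ctx_seed accessor are additions for illustration and are not in the question.

```rust
#[allow(non_camel_case_types)]
pub struct ulid_ctx {
    seed: u32,
}

/// Allocates the context on the heap and hands ownership to the caller.
/// Add `#[no_mangle]` (or `#[unsafe(no_mangle)]` on edition 2024) when exporting.
pub extern "C" fn ulid_create(seed: u32) -> *mut ulid_ctx {
    Box::into_raw(Box::new(ulid_ctx { seed }))
}

/// Hypothetical accessor, only here so the round trip can be observed.
///
/// # Safety
/// `ctx` must be null or a live pointer returned by `ulid_create`.
pub unsafe extern "C" fn ulid_ctx_seed(ctx: *const ulid_ctx) -> u32 {
    unsafe { ctx.as_ref() }.map_or(0, |c| c.seed)
}

/// # Safety
/// `ctx` must be null or a pointer returned by `ulid_create` that has not
/// been destroyed yet. Destroying the same pointer twice is a double free.
pub unsafe extern "C" fn ulid_ctx_destroy(ctx: *mut ulid_ctx) {
    if ctx.is_null() {
        return; // tolerate NULL, like C's free()
    }
    // Re-create the Box and drop it explicitly, which frees the allocation.
    drop(unsafe { Box::from_raw(ctx) });
}
```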

Wouldn't ManuallyDrop without a drop call cause a memory leak?

I am going through the wasm-bindgen guide and I came across the glue code it generates for interaction between JS and Rust. A reference to a value is passed from JS to Rust. Rust has to wrap it in ManuallyDrop so that it won't call the Drop implemented on JsValue.
pub fn foo(a: &JsValue) {
// ...
}
#[export_name = "foo"]
pub extern "C" fn __wasm_bindgen_generated_foo(arg0: u32) {
let arg0 = unsafe {
ManuallyDrop::new(JsValue::__from_idx(arg0))
};
let arg0 = &*arg0;
foo(arg0);
}
But I do not see ManuallyDrop::drop being called on arg0. So the JsValue wrapped in ManuallyDrop would not be dropped unless ManuallyDrop::drop(arg0) is called? Wouldn't that cause a memory leak?
ManuallyDrop does not stop the inner value from being destroyed. It only stops drop from being called. Consider a Vec:
pub struct Vec<T> {
ptr: *mut T,
cap: usize,
len: usize,
}
The fields ptr, cap, and len will still be destroyed even when wrapped by ManuallyDrop. However, any dynamic resources managed (in this case the data referenced by ptr) will not be released since drop is not called.
Since JsValue simply holds a u32, no leak will occur on the Rust side. And since the glue code ensures proper cleanup for &JsValue arguments, no memory is leaked on the JavaScript side.
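To see the "drop is simply not called" behaviour in isolation, here is a small sketch with a hypothetical Tracked type that counts how often its destructor runs. It has nothing to do with wasm-bindgen itself; it only demonstrates ManuallyDrop.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

/// Hypothetical type whose destructor bumps a shared counter.
pub struct Tracked<'a> {
    pub drops: &'a Cell<u32>,
}

impl<'a> Drop for Tracked<'a> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Returns the drop count after each of three scenarios.
pub fn drops_observed() -> (u32, u32, u32) {
    let drops = Cell::new(0);

    {
        let _plain = Tracked { drops: &drops };
    } // destructor runs here
    let after_plain = drops.get();

    {
        let _wrapped = ManuallyDrop::new(Tracked { drops: &drops });
    } // the wrapper's storage goes away, but `Tracked::drop` is not called
    let after_wrapped = drops.get();

    {
        let mut wrapped = ManuallyDrop::new(Tracked { drops: &drops });
        // Opting back in is explicit and unsafe: it must happen at most once.
        unsafe { ManuallyDrop::drop(&mut wrapped) };
    }
    let after_explicit = drops.get();

    (after_plain, after_wrapped, after_explicit)
}
```

Calling drops_observed() returns (1, 1, 2): the wrapped value in the second scope never ran its destructor.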

How do I use cbindgen to return and free a Box<Vec<_>>?

I have a struct returned to C code from Rust. I have no idea if it's a good way to do things, but it does work for rebuilding the struct and freeing memory without leaks.
#[repr(C)]
pub struct s {
// ...
}
#[repr(C)]
#[allow(clippy::box_vec)]
pub struct s_arr {
arr: *const s,
n: i8,
vec: Box<Vec<s>>,
}
/// Frees memory that was returned to C code
pub unsafe extern "C" fn free_s_arr(a: *mut s_arr) {
drop(Box::from_raw(a)); // rebuild the Box from the pointer and free it
}
/// Generates an array for the C code
pub unsafe extern "C" fn gen_s_arr() -> *mut s_arr {
let mut many_s: Box<Vec<s>> = Box::new(Vec::new()); // boxed to match the `vec` field
// ... logic here
Box::into_raw(Box::new(s_arr {
arr: many_s.as_mut_ptr(),
n: many_s.len() as i8,
vec: many_s,
}))
}
The C header is currently written by hand, but I wanted to try out cbindgen. The manual C definition for s_arr is:
struct s_arr {
struct s *arr;
int8_t n;
void *_;
};
cbindgen generates the following for s_arr:
typedef struct Box_Vec_s Box_Vec_s;
typedef struct s_arr {
const s *arr;
int8_t n;
Box_Vec_s vec;
} s_arr;
This doesn't work since struct Box_Vec_s is not defined. Ideally I would just want to override the cbindgen type generated for vec to make it void * since it requires no code changes and thus no additional testing, but I am open to other suggestions.
I have looked through the cbindgen documentation, though not the examples, and couldn't find anything.
Your question is a bit unclear, but I think that if I understood you right, you're confusing two things and being led down a dark alley as a result.
In C, a dynamically-sized array, as you probably know, is identified by two things:
Its starting position, as a pointer
Its length
Rust follows the same convention: a Vec<_>, under the hood, shares the same structure (well, almost; it has a capacity as well, but that's beside the point).
Passing the boxed vector on top of a pointer is not only overkill, but extremely unwise. FFI bindings may be smart, but they're not smart enough to deal with a boxed complex type most of the time.
To solve this, we're going to simplify your bindings. I've added a single element in struct S to show you how it works. I've also cleaned up your FFI boundary:
#[repr(C)]
pub struct S {
foo: u8
}
#[repr(C)]
pub struct s_arr {
arr: *mut S,
n: usize,
cap: usize
}
// Retrieve the vector back (Rust-side helper: a Vec is not FFI-safe, so this is not extern "C")
pub unsafe fn recombine_s_arr(ptr: *mut S, n: usize, cap: usize) -> Vec<S> {
Vec::from_raw_parts(ptr, n, cap)
}
#[no_mangle]
pub unsafe extern "C" fn gen_s_arr() -> s_arr {
let mut many_s: Vec<S> = Vec::new();
let output = s_arr {
arr: many_s.as_mut_ptr(),
n: many_s.len(),
cap: many_s.capacity()
};
std::mem::forget(many_s);
output
}
With this, cbindgen returns the expected header definitions:
typedef struct {
uint8_t foo;
} so58311426S;
typedef struct {
so58311426S *arr;
uintptr_t n;
uintptr_t cap;
} so58311426s_arr;
so58311426s_arr gen_s_arr(void);
This allows us to call gen_s_arr() from either C or Rust and retrieve a struct that is usable across both parts of the FFI boundary (so58311426s_arr). This struct contains all we need to be able to modify our array of S (well, so58311426S according to cbindgen).
When passing through FFI, you need to make sure of a few simple things:
You cannot pass raw boxes or non-primitive types; you will almost universally need to convert down to a set of pointers or change your definitions to accommodate (as I have done here).
You most definitely do not pass raw vectors. At most, you pass the pointer and length that describe a slice, as those are primitive types (see the point above).
You make sure to std::mem::forget() whatever you do not want to deallocate, and make sure to remember to deallocate it or reform it somewhere else.
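The last point (remember to deallocate it somewhere else) could be covered by a matching free function, sketched below. free_s_arr here is a hypothetical counterpart to gen_s_arr, the fields are made pub for the example, and ManuallyDrop is used in place of std::mem::forget, which has the same effect of skipping the Vec's destructor.

```rust
use std::mem::ManuallyDrop;

#[repr(C)]
pub struct S {
    pub foo: u8,
}

#[repr(C)]
#[allow(non_camel_case_types)]
pub struct s_arr {
    pub arr: *mut S,
    pub n: usize,
    pub cap: usize,
}

/// Hands a heap-allocated array to the caller as pointer, length and capacity.
/// Add `#[no_mangle]` (or `#[unsafe(no_mangle)]` on edition 2024) when exporting.
pub extern "C" fn gen_s_arr() -> s_arr {
    let many_s: Vec<S> = vec![S { foo: 1 }, S { foo: 2 }, S { foo: 3 }];
    // Prevent the Vec from freeing its buffer when this function returns.
    let mut many_s = ManuallyDrop::new(many_s);
    s_arr {
        arr: many_s.as_mut_ptr(),
        n: many_s.len(),
        cap: many_s.capacity(),
    }
}

/// # Safety
/// `a` must come from `gen_s_arr`, with `n` and `cap` unchanged, and must be
/// freed exactly once.
pub unsafe extern "C" fn free_s_arr(a: s_arr) {
    if a.arr.is_null() {
        return;
    }
    // Reassemble the Vec from its raw parts and let it free the buffer.
    drop(unsafe { Vec::from_raw_parts(a.arr, a.n, a.cap) });
}
```

The C side may read and write the n elements in place, but it must not free the pointer itself or change cap, since the Rust allocator needs the original values back.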
I will edit this question in an hour; I have a plane to get on to. Let me know if any of this needs clarifications and I'll get to it once I'm in the right country :-)
