I work on a Rust library used, through C headers, in a Swift UI.
I can read data sent from Swift in Rust, but I can't immediately send back to Swift (from Rust) what I've just read.
--
Basically, I can successfully convert a *const i8 containing hello world into a String.
But calling as_ptr() on that same String does not give a consistent result (so Swift cannot parse it as UTF-8):
Swift sends hello world as *const i8.
Rust handles it through let input: &str successfully (the first print in get_message()) and correctly prints hello world.
Now I can't convert this input &str to a pointer again:
the pointer can't be decoded by Swift
the pointer value changes on every call of the function (I expected the same output every time, as with "hello world".as_ptr())
Basically, why does
"hello world".as_ptr() always give the same output, which Swift can decode,
while input.as_ptr() gives a different output on every call and can never be decoded by Swift (even though printing input correctly shows hello world)?
Does anyone have ideas?
#[derive(Debug)]
#[repr(C)]
pub struct MessageC {
pub message_bytes: *const u8,
pub message_len: libc::size_t,
}
/// # Safety
/// call of c_string_safe from Swift
/// => https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr
unsafe fn c_string_safe(cstring: *const i8) -> String {
CStr::from_ptr(cstring).to_string_lossy().into_owned()
}
/// # Safety
/// call of c_string_safe from Swift
/// => https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr
/// on `async extern "C"` => <https://stackoverflow.com/a/52521592/7281870>
#[no_mangle]
#[tokio::main] // allow async function, needed to call here other async functions (not this example but needed)
pub async unsafe extern "C" fn get_message(
user_input: *const i8,
) -> MessageC {
let input: &str = &c_string_safe(user_input);
println!("from Swift: {}", input); // [consistent] from Swift: hello world
println!("converted to ptr: {:?}", input.as_ptr()); // [inconsistent] converted to ptr: 0x60000079d770 / converted to ptr: 0x6000007b40b0
println!("directly to ptr: {:?}", "hello world".as_ptr()); // [consistent] directly to ptr: 0x1028aaf6f
MessageC {
message_bytes: input.as_ptr(),
message_len: input.len() as libc::size_t,
}
}
The way you construct MessageC is unsound and returns a dangling pointer. The code in get_message() is equivalent to this:
pub async unsafe extern "C" fn get_message(user_input: *const i8) -> MessageC {
let _invisible = c_string_safe(user_input);
let input: &str = &_invisible;
// let's skip the prints
let msg = MessageC {
message_bytes: input.as_ptr(),
message_len: input.len() as libc::size_t,
};
drop(_invisible);
return msg;
}
Hopefully this formulation highlights the issue: c_string_safe() returns an owned heap-allocated String which gets dropped (and its data deallocated) by the end of the function. input is a slice that refers to the data allocated by that String. In safe Rust you wouldn't be allowed to return a slice referring to a local variable such as input - you'd have to either return the String itself or limit yourself to passing the slice downwards to functions.
However, you're not using safe Rust and you're creating a pointer to the heap-allocated data. Now you have a problem because as soon as get_message() returns, the _invisible String gets deallocated, and the pointer you're returning is dangling. The dangling pointer may even appear to work because deallocation is not obligated to clear the data from memory, it just marks it as available for future allocations. But those future allocations can and will happen, perhaps from a different thread. Thus a program that references freed memory is bound to misbehave, often in an unpredictable fashion - which is precisely what you have observed.
In all-Rust code you'd resolve the issue by safely returning String instead. But you're doing FFI, so you must reduce the string to a pointer/length pair. Rust allows you to do just that, the easiest way being to just call std::mem::forget() to prevent the string from getting deallocated:
pub async unsafe extern "C" fn get_message(user_input: *const i8) -> MessageC {
let mut input = c_string_safe(user_input);
input.shrink_to_fit(); // ensure string capacity == len
let msg = MessageC {
message_bytes: input.as_ptr(),
message_len: input.len() as libc::size_t,
};
std::mem::forget(input); // prevent input's data from being deallocated on return
msg
}
But now you have a different problem: get_message() allocates a string, but how do you deallocate it? Just dropping MessageC won't do it because it just contains pointers. (And doing so by implementing Drop would probably be unwise because you're sending it to Swift or whatever.) The solution is to provide a separate function that re-creates the String from the MessageC and drops it immediately:
pub unsafe fn free_message_c(m: MessageC) {
// The call to `shrink_to_fit()` above makes it sound to re-assemble
// the string using a capacity equal to its length
drop(String::from_raw_parts(
m.message_bytes as *mut _,
m.message_len,
m.message_len,
));
}
You should call this function once you're done with MessageC, i.e. when the Swift code has done its job. (You could even make it extern "C" and call it from Swift.)
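A variation on the above, as a minimal sketch rather than a definitive implementation: the documentation for Vec::shrink_to_fit() notes that some excess capacity may remain, so one option is to convert the String into a Box<str> first, whose allocation is exactly len bytes. The names message_from_string and free_message are hypothetical, and usize stands in for libc::size_t to keep the sketch free of dependencies.

```rust
#[repr(C)]
pub struct MessageC {
    pub message_bytes: *const u8,
    pub message_len: usize, // assumption: stands in for libc::size_t
}

/// Hypothetical helper: hands ownership of `s` to the caller as pointer + length.
pub fn message_from_string(s: String) -> MessageC {
    // A boxed str has no spare capacity, so the length alone describes the allocation.
    let boxed: Box<str> = s.into_boxed_str();
    let message_len = boxed.len();
    // `Box::into_raw` gives up ownership without freeing; the cast keeps only the address.
    let message_bytes = Box::into_raw(boxed) as *const u8;
    MessageC { message_bytes, message_len }
}

/// Hypothetical counterpart that releases the string again.
///
/// # Safety
/// `m` must come from `message_from_string` and must be freed exactly once.
pub unsafe extern "C" fn free_message(m: MessageC) {
    // Rebuild the fat pointer (address + length), then let the Box free it.
    let bytes = std::ptr::slice_from_raw_parts_mut(m.message_bytes as *mut u8, m.message_len);
    drop(unsafe { Box::from_raw(bytes as *mut str) });
}
```

The Swift side would read message_len bytes from message_bytes and then pass the same MessageC back to the free function.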
Finally, using "hello world".as_ptr() directly works because "hello world" is a static &str which is baked into the executable and never gets deallocated. In other words, it doesn't point to a String, it points to some static data that comes with the program.
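To illustrate the point about static data, here is a small sketch (the function names static_message and read_back are made up for this example). A pointer taken from a string literal stays valid after the function that produced it has returned, because nothing ever frees the literal.

```rust
/// Returns a pointer/length pair for a string literal.
/// The literal lives in the program's static data, so the pointer never dangles.
pub fn static_message() -> (*const u8, usize) {
    let s: &'static str = "hello world";
    (s.as_ptr(), s.len())
}

/// Reads `len` bytes back from `ptr`, the way a foreign caller would.
///
/// # Safety
/// `ptr` must point to `len` readable bytes that are still alive.
pub unsafe fn read_back(ptr: *const u8, len: usize) -> String {
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    String::from_utf8_lossy(bytes).into_owned()
}
```

Doing the same with a pointer obtained from a local String would be undefined behavior, since the String's buffer is freed when the function returns.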
I have a type (specifically CFData from core-foundation), whose memory is managed by C APIs and that I need to pass and receive from C functions as a *c_void. For simplicity, consider the following struct:
struct Data {
ptr: *mut ffi::c_void,
}
impl Data {
pub fn new() -> Self {
// allocate memory via an unsafe C-API
Self {
ptr: std::ptr::null_mut(), // Just so this compiles (the field is *mut).
}
}
pub fn to_raw(&self) -> *const ffi::c_void {
self.ptr
}
}
impl Drop for Data {
fn drop(&mut self) {
unsafe {
// Free memory via a C-API
}
}
}
Its interface is safe, including to_raw(), since it only returns a raw pointer. It doesn't dereference it. And the caller doesn't dereference it. It's just used in a callback.
pub extern "C" fn called_from_C_ok(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// Do things to compute Data.
// This is actually async code, which is why there's a completion handler.
on_complete(data.to_raw()); // data survives through the function call
}
This is fine. Data is safe to manipulate, and (I strongly believe) Rust promises that data will live until the end of the on_complete function call.
On the other hand, this is not ok:
pub extern "C" fn called_from_C_broken(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// ...
let ptr = data.to_raw(); // data can be dropped at this point, so ptr is dangling.
on_complete(ptr); // This may crash when the receiver dereferences it.
}
In my code, I made this mistake and it started crashing. It's easy to see why and it's easy to fix. But it's subtle, and it's easy for a future developer (me) to modify the ok version into the broken version without realizing the problem (and it may not always crash).
What I'd like to do is to ensure data lives as long as ptr. In Swift, I'd do this with:
withExtendedLifetime(&data) { data in
// ...data cannot be dropped until this closure ends...
}
Is there a similar construct in Rust that explicitly marks the minimum lifetime for a variable to a scope (that the optimizer may not reorder), even if it's not directly accessed? (I'm sure it's trivial to build a custom with_extended_lifetime in Rust, but I'm looking for a more standard solution so that it will be obvious to other developers what's going on).
I do believe the following "works" but I'm not sure how flexible it is, or if it's just replacing a more standard solution:
fn with_extended_lifetime<T, U, F>(value: &T, f: F) -> U
where
F: Fn(&T) -> U,
{
f(value)
}
with_extended_lifetime(&data, |data| {
let ptr = data.to_raw();
on_complete(ptr)
});
The optimizer is not allowed to change when a value is dropped. If you assign a value to a variable (and that value is not then moved elsewhere or overwritten by assignment), it will always be dropped at the end of the block, not earlier.
You say that this code is incorrect:
pub extern "C" fn called_from_C_broken(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// ...
let ptr = data.to_raw(); // data can be dropped at this point, so ptr is dangling.
on_complete(ptr); // This may crash when the receiver dereferences it.
}
but in fact data may not be dropped at that point, and this code is sound. What you may be confusing this with is the mistake of not assigning the value to a variable:
let ptr = Data::new().to_raw();
on_complete(ptr);
In this case, the pointer is dangling, because the result of Data::new() is stored in a temporary variable within the statement, which is dropped at the end of the statement, not a local variable, which is dropped at the end of the block.
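The difference between the two cases can be observed directly. The following is only a sketch with a hypothetical Tracked type that records when its destructor runs; it stands in for Data and never dereferences the raw pointer.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for `Data`: sets a flag when it is dropped.
struct Tracked<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

impl Tracked<'_> {
    fn to_raw(&self) -> *const Self {
        self
    }
}

/// The value is a temporary: it is dropped at the end of the `let` statement.
pub fn temporary_is_dropped_early() -> bool {
    let flag = Cell::new(false);
    let _ptr = Tracked { dropped: &flag }.to_raw();
    flag.get() // already dropped here, so `_ptr` is dangling
}

/// The value is bound to a variable: it lives until the end of the block.
pub fn local_is_still_alive() -> bool {
    let flag = Cell::new(false);
    let data = Tracked { dropped: &flag };
    let _ptr = data.to_raw();
    flag.get() // not dropped yet, so `_ptr` is still valid here
}
```

The first function returns true and the second returns false, regardless of optimization level.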
If you want to adopt a programming style which makes explicit when values are dropped, the usual pattern is to use the standard drop() function to mark the exact time of drop:
let data = Data::new();
...
on_complete(data.to_raw());
drop(data); // We have stopped using the raw pointer now
(Note that drop() is not magic: it is simply a function which takes one argument and does nothing with it other than dropping. Its presence in the standard library is to give a name to this pattern, not to provide special functionality.)
However, if you want to, there isn't anything wrong with using your with_extended_lifetime (other than nonstandard style and arguably a misleading name) if you want to make the code structure even more strongly indicate the scope of the value. One nitpick: the function parameter should be FnOnce, not Fn, for maximum generality (this allows it to be passed functions that can't be called more than once).
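As a sketch of the FnOnce variant: the helper below is the same function with only the bound changed, and the demo function is a hypothetical caller whose closure consumes a captured String, which an Fn bound would reject.

```rust
/// Same helper as in the question, but accepting closures that can run only once.
pub fn with_extended_lifetime<T, U, F>(value: &T, f: F) -> U
where
    F: FnOnce(&T) -> U,
{
    f(value)
}

/// Hypothetical caller: the closure moves `label` out of its captured state,
/// so it implements `FnOnce` but not `Fn`.
pub fn demo() -> String {
    let data = 5;
    let label = String::from("value=");
    with_extended_lifetime(&data, move |d| {
        let mut out = label; // consumes the captured String
        out.push_str(&d.to_string());
        out
    })
}
```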
Other than explicitly dropping as the other answer mentions, there is another way to help prevent these types of accidental drops: use a wrapper around a raw pointer that has lifetime information.
use std::marker::PhantomData;
#[repr(transparent)]
struct PtrWithLifetime<'a>{
ptr: *mut ffi::c_void,
_data: PhantomData<&'a ffi::c_void>,
}
impl Data {
fn to_raw(&self) -> PtrWithLifetime<'_>{
PtrWithLifetime{
ptr: self.ptr,
_data: PhantomData,
}
}
}
The #[repr(transparent)] guarantees that PtrWithLifetime is stored in memory the same way as the raw ffi::c_void pointer it wraps, so you can adjust the declaration of on_complete to
fn called_from_c(on_complete: extern "C" fn(PtrWithLifetime<'_>)){
//stuff
}
without causing any major inconvenience to any downstream users, especially since the ffi bindings can be adjusted in a similar fashion.
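Here is a self-contained sketch of how the wrapper catches the mistake at compile time. It assumes a simplified Data that owns a boxed u32 rather than C-managed memory, and on_complete is a hypothetical callback defined in Rust for the sake of the example.

```rust
use std::ffi::c_void;
use std::marker::PhantomData;

#[repr(transparent)]
pub struct PtrWithLifetime<'a> {
    ptr: *const c_void,
    _data: PhantomData<&'a c_void>,
}

/// Simplified stand-in for the real `Data` (assumption: owns a Box, not C memory).
pub struct Data {
    value: Box<u32>,
}

impl Data {
    pub fn to_raw(&self) -> PtrWithLifetime<'_> {
        PtrWithLifetime {
            ptr: (&*self.value as *const u32).cast::<c_void>(),
            _data: PhantomData,
        }
    }
}

/// Hypothetical callback that reads the value behind the pointer.
extern "C" fn on_complete(p: PtrWithLifetime<'_>) -> u32 {
    unsafe { *p.ptr.cast::<u32>() }
}

pub fn call_with_data() -> u32 {
    let data = Data { value: Box::new(7) };
    let ptr = data.to_raw();
    // drop(data); // rejected by the borrow checker: `data` is still borrowed by `ptr`
    on_complete(ptr)
}
```

Because the wrapper borrows from data, moving or dropping data while ptr is still in use becomes a compile error instead of a runtime crash.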
I am going through the wasm-bindgen guide and I came across the glue code it generates for the interaction between JS and Rust. A reference to a value is passed from JS to Rust. Rust has to wrap it in ManuallyDrop so that it won't call the Drop implemented on JsValue.
pub fn foo(a: &JsValue) {
// ...
}
#[export_name = "foo"]
pub extern "C" fn __wasm_bindgen_generated_foo(arg0: u32) {
let arg0 = unsafe {
ManuallyDrop::new(JsValue::__from_idx(arg0))
};
let arg0 = &*arg0;
foo(arg0);
}
But I do not see ManuallyDrop::drop being called on arg0. So would the JsValue wrapped in ManuallyDrop never be dropped unless ManuallyDrop::drop(arg0) is called? Wouldn't that cause a memory leak?
ManuallyDrop does not stop the inner value from being destroyed. It only stops drop from being called. Consider a Vec:
pub struct Vec<T> {
ptr: *mut T,
cap: usize,
len: usize,
}
The fields ptr, cap, and len will still be destroyed even when wrapped by ManuallyDrop. However, any dynamic resources managed (in this case the data referenced by ptr) will not be released since drop is not called.
Since JsValue simply holds a u32, no leak will occur on the Rust-side. And since the glue code ensures proper cleanup for &JsValue arguments, no memory is leaked on the Javascript-side.
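The effect of ManuallyDrop can be shown with a small sketch. Handle is a hypothetical type that, like JsValue, just holds a u32 index; its destructor bumps a counter so that we can see whether it ran.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many times `Handle::drop` has run.
pub static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical handle type holding only an index.
pub struct Handle(pub u32);

impl Drop for Handle {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Mirrors the generated glue: build a handle from an index, lend it out,
/// and never run its destructor because the caller still owns the slot.
pub fn borrow_without_dropping(idx: u32) -> u32 {
    let handle = ManuallyDrop::new(Handle(idx));
    let borrowed: &Handle = &*handle;
    borrowed.0
    // `handle` goes out of scope here; its u32 is discarded, but `drop` is not called.
}
```

Calling borrow_without_dropping leaves DROPS unchanged, whereas letting a plain Handle go out of scope increments it by one.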
I wrote some Rust code that provides a FFI for some C code, which I recently discovered a bug in. Turns out unsafe is hard and error prone — who knew! I think I've fixed the bug but I am curious to understand the issue more.
One function took a Vec, called into_boxed_slice on it and returned the pointer (via as_mut_ptr) and length to the caller. It called mem::forget on the Box before returning.
The corresponding "free" function only accepted the pointer and called Box::from_raw with it. Now this is wrong, but the amazing thing about undefined behaviour is that it can work most of the time. And this did. Except when the source Vec was empty, in which case it would segfault. Also of note, MIRI correctly identifies the issue: "Undefined Behavior: inbounds test failed: 0x4 is not a valid pointer".
Anyway the fix was to take the length in the free function as well, reconstitute the slice, then Box::from_raw that. E.g. Box::from_raw(slice::from_raw_parts_mut(p, len))
I've tried to capture all of this in this playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7fe80cb9f0c5c1eee4ac821e58787f17
Here's the playground code for reference:
use std::slice;
fn main() {
// This one does not crash
demo(vec![1]);
// These do not crash
hopefully_correct(vec![2]);
hopefully_correct(vec![]);
// This one seg faults
demo(vec![]);
}
// MIRI complains about UB in this one (in Box::from_raw)
fn demo(v: Vec<i32>) {
let mut s: Box<[i32]> = dbg!(v.into_boxed_slice());
let p: *mut i32 = dbg!(s.as_mut_ptr());
assert!(!p.is_null());
std::mem::forget(s);
// Pretend the pointer is returned to an FFI interface here
// Imagine this is the free function counterpart to the imaginary FFI.
unsafe { Box::from_raw(p) };
}
// MIRI does not complain about this one
fn hopefully_correct(v: Vec<i32>) {
let mut s: Box<[i32]> = dbg!(v.into_boxed_slice());
let p: *mut i32 = dbg!(s.as_mut_ptr());
let len = s.len();
assert!(!p.is_null());
std::mem::forget(s);
// Pretend the pointer is returned to an FFI interface here
// Imagine this is the free function counterpart to the imaginary FFI.
unsafe { Box::from_raw(slice::from_raw_parts_mut(p, len)) };
}
I've looked through the Box source and done a bunch of searching but it's unclear to me how rebuilding the slice helps. It would seem that the pointers are the same but there is some empty optimisation handled properly in the fixed example somewhere, possibly as part of Unique?
Can anyone explain what's going on here?
I found these three links useful but not enough to answer my query:
How to expose a Rust Vec<T> to FFI?
How to pass a boxed slice (Box<[T]>) to a C function?
Box<[T]>::into_raw is useless
That's because when you deconstruct an empty vector, you get a zero length and a dangling pointer: it is non-null and well aligned (0x4 in the MIRI message, the alignment of i32), but there is no allocation behind it.
When you call Box::from_raw(p) on that pointer, you break one of the box invariants: a Box<T> for a non-zero-sized T must point to a live allocation with the layout of T. Then when Rust drops the box, it attempts to deallocate four bytes at an address that was never allocated.
OTOH when you call slice::from_raw_parts_mut, Rust builds a fat pointer (a value, not a new allocation) that pairs the same address with the zero length, and Box::from_raw stores this fat pointer in the Box. When dropping the box, Rust drops the slice elements (none here) and computes the size of the allocation from the length; since that size is zero, there is nothing to free and the allocator is never called.
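A minimal sketch of a matching pair of functions, assuming the FFI caller keeps both the pointer and the length (vec_into_raw and free_raw are hypothetical names):

```rust
/// Gives up ownership of the vector's buffer as a pointer/length pair.
pub fn vec_into_raw(v: Vec<i32>) -> (*mut i32, usize) {
    let boxed: Box<[i32]> = v.into_boxed_slice();
    let len = boxed.len();
    // `Box::into_raw` replaces the `as_mut_ptr` + `mem::forget` combination.
    let ptr = Box::into_raw(boxed) as *mut i32;
    (ptr, len)
}

/// Takes the buffer back and frees it.
///
/// # Safety
/// `ptr` and `len` must come from one call to `vec_into_raw`, freed exactly once.
pub unsafe fn free_raw(ptr: *mut i32, len: usize) {
    // Rebuild the fat pointer so the Box knows how large the allocation is
    // (zero bytes for an empty vector, in which case nothing is deallocated).
    let slice = std::ptr::slice_from_raw_parts_mut(ptr, len);
    drop(unsafe { Box::from_raw(slice) });
}
```

This round-trips both empty and non-empty vectors, because the reconstructed type is Box<[i32]> in both cases.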
Note also that in the non-working case you reconstruct a Box<i32>, whereas in the working case you reconstruct a Box<[i32]>, as shown if you try to compile the following code:
use std::slice;
fn demo(v: Vec<i32>) {
let mut s: Box<[i32]> = dbg!(v.into_boxed_slice());
let p: *mut i32 = dbg!(s.as_mut_ptr());
assert!(!p.is_null());
std::mem::forget(s);
// Pretend the pointer is returned to an FFI interface here
// Imagine this is the free function counterpart to the imaginary FFI.
let _b: () = unsafe { Box::from_raw(p) };
}
// MIRI does not complain about this one
fn hopefully_correct(v: Vec<i32>) {
let mut s: Box<[i32]> = dbg!(v.into_boxed_slice());
let p: *mut i32 = dbg!(s.as_mut_ptr());
let len = s.len();
assert!(!p.is_null());
std::mem::forget(s);
// Pretend the pointer is returned to an FFI interface here
// Imagine this is the free function counterpart to the imaginary FFI.
let _b: () = unsafe { Box::from_raw(slice::from_raw_parts_mut(p, len)) };
}
Can I somehow get an array from std::ptr::read?
I'd like to do something close to:
let mut v: Vec<u8> = ...
let view = &some_struct as *const _ as *const u8;
v.write(&std::ptr::read<[u8, ..30]>(view));
Which is not valid in this form (can't use the array signature).
If you want to obtain a slice from a raw pointer, use std::slice::from_raw_parts():
let slice = unsafe { std::slice::from_raw_parts(some_pointer, count_of_items) };
If you want to obtain a mutable slice from a raw pointer, use std::slice::from_raw_parts_mut():
let slice = unsafe { std::slice::from_raw_parts_mut(some_pointer, count_of_items) };
Are you sure you want read()? Without special care it will cause disaster on structs with destructors. Also, read() does not read a value of some specified type from a pointer to bytes; it reads exactly one value of the type behind the pointer (e.g. if it is *const u8 then read() will read one byte) and returns it.
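To make the last point concrete, here is a sketch: read() returns one value of the pointee type, so reading an array requires a pointer whose pointee type is that array. The function name first_four_bytes is made up, and a u32 is used as the source so that there are no destructors or padding bytes to worry about.

```rust
/// Reads the four bytes of a `u32` as a `[u8; 4]` through `std::ptr::read`.
pub fn first_four_bytes(value: &u32) -> [u8; 4] {
    // The cast changes the pointee type, and therefore what `read` returns.
    let view = value as *const u32 as *const [u8; 4];
    // Sound here: the source is 4 readable bytes and `[u8; 4]` has alignment 1.
    unsafe { std::ptr::read(view) }
}
```

The result has the same byte order as the value in memory, i.e. it equals value.to_ne_bytes().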
If you only want to write byte contents of a structure into a vector, you can obtain a slice from the raw pointer:
use std::mem;
use std::io::Write;
struct SomeStruct {
a: i32,
}
fn main() {
let some_struct = SomeStruct { a: 32 };
let mut v: Vec<u8> = Vec::new();
let view = &some_struct as *const _ as *const u8;
let slice = unsafe { std::slice::from_raw_parts(view, mem::size_of::<SomeStruct>()) };
v.write_all(slice).expect("Unable to write"); // write_all guarantees the whole slice is written
println!("{:?}", v);
}
This makes your code platform-dependent and even compiler-dependent: if you use types of variable size (e.g. isize/usize) in your struct or if you don't use #[repr(C)], the data you wrote into the vector is likely to be read as garbage on another machine (and even #[repr(C)] may not lift this problem sometimes, as far as I remember).
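If the bytes have to be readable on another machine, one common alternative (a different technique from the raw view above) is to serialize each field explicitly with a fixed byte order. A sketch for the SomeStruct from the example, with a hypothetical serialize function:

```rust
pub struct SomeStruct {
    pub a: i32,
}

/// Writes the fields one by one in little-endian order,
/// independent of the host's endianness and of struct layout.
pub fn serialize(s: &SomeStruct) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&s.a.to_le_bytes());
    out
}

/// Reads the struct back; returns `None` if the input is too short.
pub fn deserialize(bytes: &[u8]) -> Option<SomeStruct> {
    let raw: [u8; 4] = bytes.get(..4)?.try_into().ok()?;
    Some(SomeStruct { a: i32::from_le_bytes(raw) })
}
```

This needs no unsafe code, at the cost of listing the fields by hand.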
I have a C function that expects *const std::os::raw::c_char and I have done the following in Rust:
use std::os::raw::c_char;
use std::ffi::{CString, CStr};
extern crate libc;
fn main() {
let _test_str: *const c_char = CString::new("Hello World").unwrap().as_ptr();
let fmt: *const c_char = CString::new("%s\n").unwrap().as_ptr();
unsafe { libc::printf(fmt, _test_str); }
unsafe {
let slice = CStr::from_ptr(_test_str);
println!("string buffer size without nul terminator: {}", slice.to_bytes().len());
}
}
However, I cannot get _test_str to print, and the output of the above program is simply
string buffer size without nul terminator: 0
If I pass _test_str into some C function, I see that it is an empty string. What did I do wrong?
You are creating a CString in the same statement as creating a pointer to it. The CString is owned but not bound to a variable so it only lives as long as the enclosing statement, causing the pointer to become invalid. This is specifically warned about by the documentation for as_ptr:
For example, the following code will cause undefined behavior when ptr is used inside the unsafe block:
use std::ffi::{CString};
let ptr = CString::new("Hello").expect("CString::new failed").as_ptr();
unsafe {
// `ptr` is dangling
*ptr;
}
This happens because the pointer returned by as_ptr does not carry any lifetime information and the CString is deallocated immediately after the CString::new("Hello").expect("CString::new failed").as_ptr() expression is evaluated.
You can fix the problem by introducing variables which will live for the entire function, and then create pointers to those variables:
fn main() {
let owned_test = CString::new("Hello World").unwrap();
let _test_str: *const c_char = owned_test.as_ptr();
let owned_fmt = CString::new("%s\n").unwrap();
let fmt: *const c_char = owned_fmt.as_ptr();
unsafe {
libc::printf(fmt, _test_str);
}
unsafe {
let slice = CStr::from_ptr(_test_str);
println!(
"string buffer size without nul terminator: {}",
slice.to_bytes().len()
);
}
// owned_fmt is dropped here, making fmt invalid
// owned_test is dropped here, making _test_str invalid
}
If you are working with raw pointers, you need to be extra careful that they are always pointing at live data. Introducing a variable is the best way to control exactly how long that data lives - it will live from the initialization of the variable to the moment the variable goes out of scope.
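The same rule in a form that can be run without libc, as a sketch: c_len is a hypothetical stand-in for a C function that receives the pointer, and measure keeps the CString bound to a variable for as long as the pointer is in use.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function: reads a NUL-terminated string.
///
/// # Safety
/// `p` must point to a live, NUL-terminated string.
pub unsafe fn c_len(p: *const c_char) -> usize {
    unsafe { CStr::from_ptr(p) }.to_bytes().len()
}

pub fn measure() -> usize {
    // Bind the owner first; the pointer is valid for as long as `owned` lives.
    let owned = CString::new("Hello World").expect("no interior NUL byte");
    let ptr = owned.as_ptr();
    unsafe { c_len(ptr) }
    // `owned` is dropped here, after the last use of `ptr`.
}
```

With the owner bound to a variable, measure returns 11 instead of reading freed memory.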