I have a C function that expects *const std::os::raw::c_char and I have done the following in Rust:
use std::os::raw::c_char;
use std::ffi::{CString, CStr};
extern crate libc;
fn main() {
let _test_str: *const c_char = CString::new("Hello World").unwrap().as_ptr();
let fmt: *const c_char = CString::new("%s\n").unwrap().as_ptr();
unsafe { libc::printf(fmt, _test_str); }
unsafe {
let slice = CStr::from_ptr(_test_str);
println!("string buffer size without nul terminator: {}", slice.to_bytes().len());
}
}
However, I cannot get _test_str print out and the output of the above program is simply
string buffer size without nul terminator: 0
If I pass the _test_str into some C function and see it is an empty string. What did I do wrong?
You are creating a CString in the same statement as creating a pointer to it. The CString is owned but not bound to a variable so it only lives as long as the enclosing statement, causing the pointer to become invalid. This is specifically warned about by the documentation for as_ptr:
For example, the following code will cause undefined behavior when ptr is used inside the unsafe block:
use std::ffi::{CString};
let ptr = CString::new("Hello").expect("CString::new failed").as_ptr();
unsafe {
// `ptr` is dangling
*ptr;
}
This happens because the pointer returned by as_ptr does not carry any lifetime information and the CString is deallocated immediately after the CString::new("Hello").expect("CString::new failed").as_ptr() expression is evaluated.
You can fix the problem by introducing variables which will live for the entire function, and then create pointers to those variables:
fn main() {
let owned_test = CString::new("Hello World").unwrap();
let _test_str: *const c_char = owned_test.as_ptr();
let owned_fmt = CString::new("%s\n").unwrap();
let fmt: *const c_char = owned_fmt.as_ptr();
unsafe {
libc::printf(fmt, _test_str);
}
unsafe {
let slice = CStr::from_ptr(_test_str);
println!(
"string buffer size without nul terminator: {}",
slice.to_bytes().len()
);
}
// owned_fmt is dropped here, making fmt invalid
// owned_test is dropped here, making _test_str invalid
}
If you are working with raw pointers, you need to be extra careful that they are always pointing at live data. Introducing a variable is the best way to control exactly how long that data lives - it will live from the initialization of the variable to the moment the variable goes out of scope.
Related
I work on a Rust library used, through C headers, in a Swift UI.
I can read from Swift in Rust, but I can't write right away to Swift (so from Rust) what I've just read.
--
Basically, I get to convert successfully in String an *const i8 saying hello world.
But the same String fails to be handled with consistency by as_ptr() (and so being parsed as UTF-8 in Swift) =>
Swift send hello world as *const i8
Rust handle it through let input: &str successfully (#1 print in get_message()) => rightly prints hello world
Now I can't convert this input &strto a pointer again:
the pointer can't be decoded by Swift
the "pointer encoding" changes at every call of the function (should be always the same output, as for "hello world".as_ptr())
Basically, why
"hello world".as_ptr() always have the same output and can be decoded by Swift
when input.as_ptr() has a different output every time called and can't never be decoded by Swift (where printing input rightly returns hello world)?
Do you guys have ideas?
#[derive(Debug)]
#[repr(C)]
pub struct MessageC {
pub message_bytes: *const u8,
pub message_len: libc::size_t,
}
/// # Safety
/// call of c_string_safe from Swift
/// => https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr
unsafe fn c_string_safe(cstring: *const i8) -> String {
CStr::from_ptr(cstring).to_string_lossy().into_owned()
}
/// # Safety
/// call of c_string_safe from Swift
/// => https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr
/// on `async extern "C"` => <https://stackoverflow.com/a/52521592/7281870>
#[no_mangle]
#[tokio::main] // allow async function, needed to call here other async functions (not this example but needed)
pub async unsafe extern "C" fn get_message(
user_input: *const i8,
) -> MessageC {
let input: &str = &c_string_safe(user_input);
println!("from Swift: {}", input); // [consistent] from Swift: hello world
println!("converted to ptr: {:?}", input.as_ptr()); // [inconsistent] converted to ptr: 0x60000079d770 / converted to ptr: 0x6000007b40b0
println!("directly to ptr: {:?}", "hello world".as_ptr()); // [consistent] directly to ptr: 0x1028aaf6f
MessageC {
message_bytes: input.as_ptr(),
message_len: input.len() as libc::size_t,
}
}
The way you construct MessageC is unsound and returns a dangling pointer. The code in get_message() is equivalent to this:
pub async unsafe extern "C" fn get_message(user_input: *const i8) -> MessageC {
let _invisible = c_string_safe(user_input);
let input: &str = &_invisible;
// let's skip the prints
let msg = MessageC {
message_bytes: input.as_ptr(),
message_len: input.len() as libc::size_t,
};
drop(_invisible);
return msg;
}
Hopefully this formulation highlights the issue: c_string_safe() returns an owned heap-allocated String which gets dropped (and its data deallocated) by the end of the function. input is a slice that refers to the data allocated by that String. In safe Rust you wouldn't be allowed to return a slice referring to a local variable such as input - you'd have to either return the String itself or limit yourself to passing the slice downwards to functions.
However, you're not using safe Rust and you're creating a pointer to the heap-allocated data. Now you have a problem because as soon as get_message() returns, the _invisible String gets deallocated, and the pointer you're returning is dangling. The dangling pointer may even appear to work because deallocation is not obligated to clear the data from memory, it just marks it as available for future allocations. But those future allocations can and will happen, perhaps from a different thread. Thus a program that references freed memory is bound to misbehave, often in an unpredictable fashion - which is precisely what you have observed.
In all-Rust code you'd resolve the issue by safely returning String instead. But you're doing FFI, so you must reduce the string to a pointer/length pair. Rust allows you to do just that, the easiest way being to just call std::mem::forget() to prevent the string from getting deallocated:
pub async unsafe extern "C" fn get_message(user_input: *const i8) -> MessageC {
let mut input = c_string_safe(user_input);
input.shrink_to_fit(); // ensure string capacity == len
let msg = MessageC {
message_bytes: input.as_ptr(),
message_len: input.len() as libc::size_t,
};
std::mem::forget(input); // prevent input's data from being deallocated on return
msg
}
But now you have a different problem: get_message() allocates a string, but how do you deallocate it? Just dropping MessageC won't do it because it just contains pointers. (And doing so by implementing Drop would probably be unwise because you're sending it to Swift or whatever.) The solution is to provide a separate function that re-creates the String from the MessageC and drops it immediately:
pub unsafe fn free_message_c(m: MessageC) {
// The call to `shrink_to_fit()` above makes it sound to re-assemble
// the string using a capacity equal to its length
drop(String::from_raw_parts(
m.message_bytes as *mut _,
m.message_len,
m.message_len,
));
}
You should call this function once you're done with MessageC, i.e. when the Swift code has done its job. (You could even make it extern "C" and call it from Swift.)
Finally, using "hello world".as_ptr() directly works because "hello world" is a static &str which is baked into the executable and never gets deallocated. In other words, it doesn't point to a String, it points to some static data that comes with the program.
I would like to return some strings to C via a Rust FFI call. I also would like to ensure they're cleaned up properly.
I'm creating the strings on the Rust side and turning them into an address of an array of strings.
use core::mem;
use std::ffi::CString;
#[no_mangle]
pub extern "C" fn drop_rust_memory(mem: *mut ::libc::c_void) {
unsafe {
let boxed = Box::from_raw(mem);
mem::drop(boxed);
}
}
#[no_mangle]
pub extern "C" fn string_xfer(strings_out: *mut *mut *mut ::libc::c_char) -> usize {
unsafe {
let s1 = CString::new("String 1").unwrap();
let s2 = CString::new("String 2").unwrap();
let s1 = s1.into_raw();
let s2 = s2.into_raw();
let strs = vec![s1, s2];
let len = strs.len();
let mut boxed_slice = strs.into_boxed_slice();
*strings_out = boxed_slice.as_mut_ptr() as *mut *mut ::libc::c_char;
mem::forget(boxed_slice);
len
}
}
On the C side, I call the Rust FFI function, print the strings and then attempt to delete them via another Rust FFI call.
extern size_t string_xfer(char ***out);
extern void drop_rust_memory(void* mem);
int main() {
char **receiver;
int len = string_xfer(&receiver);
for (int i = 0; i < len; i++) {
printf("<%s>\n", receiver[i]);
}
drop_rust_memory(receiver);
printf("# rust memory dropped\n");
for (int i = 0; i < len; i++) {
printf("<%s>\n", receiver[i]);
}
return 0;
}
This appears to work. For the second printing after the drop, I would expect to get a crash or some undefined behavior, but I get this
<String 1>
<String 2>
# rust memory dropped
<(null)>
<String 2>
which makes me less sure about the entire thing.
First you may want take a look at Catching panic! when Rust called from C FFI, without spawning threads. Because panic will invoke undefined behaviour in this case. So you better catch the panic or avoid have code that can panic.
Secondly, into_boxed_slice() is primary use when you don't need vector feature any more so "A contiguous growable array type". You could also use as_mut_ptr() and forget the vector. That a choice either you want to carry the capacity information into C so you can make the vector grow or you don't want. (I think vector is missing a into_raw() method but I'm sure you can code one (just an example) to avoid critical code repetition). You could also use Box::into_raw() followed with a cast to transform the slice to pointer:
use std::panic;
use std::ffi::CString;
pub unsafe extern "C" fn string_xfer(len: &mut libc::size_t) -> Option<*mut *mut libc::c_char> {
if let Ok(slice) = panic::catch_unwind(move || {
let s1 = CString::new("String 1").unwrap();
let s2 = CString::new("String 2").unwrap();
let strs = vec![s1.into_raw(), s2.into_raw()];
Box::into_raw(strs.into_boxed_slice())
}) {
*len = (*slice).len();
Some(slice as _)
} else {
None
}
}
Third, your drop_rust_memory() only drop a pointer, I think you are doing a total UB here. Rust memory allocation need the real size of the allocation (capacity). And you didn't give the size of your slice, you tell to Rust "free this pointer that contain a pointer to nothing (void so 0)" but that not the good capacity. You need to use from_raw_parts_mut(), your C code must give the size of the slice to the Rust code. Also, you need to properly free your CString you need to call from_raw() to do it (More information here):
use std::ffi::CString;
pub unsafe extern "C" fn drop_rust_memory(mem: *mut *mut libc::c_char, len: libc::size_t) {
let slice = Box::from_raw(std::slice::from_raw_parts_mut(mem, len));
for &x in slice.iter() {
CString::from_raw(x);
} // CString will free resource don't use mem/vec element after
}
To conclude, you should read more about undefined behaviour, it's not about "expect a crash" or "something" should happen. When your program trigger a UB, everything can happen, you go into a random zone, read more about UB on this amazing LLVM blog post
Note about C style prefer return the pointer and not the size because strings_out: *mut *mut *mut ::libc::c_char is a ugly thing so do pub extern fn string_xfer(size: &mut libc::size_t) -> *mut *mut libc::c_char. Also, How to check if function pointer passed from C is non-NULL
Can I somehow get an array from std::ptr::read?
I'd like to do something close to:
let mut v: Vec<u8> = ...
let view = &some_struct as *const _ as *const u8;
v.write(&std::ptr::read<[u8, ..30]>(view));
Which is not valid in this form (can't use the array signature).
If you want to obtain a slice from a raw pointer, use std::slice::from_raw_parts():
let slice = unsafe { std::slice::from_raw_parts(some_pointer, count_of_items) };
If you want to obtain a mutable slice from a raw pointer, use std::slice::from_raw_parts_mut():
let slice = unsafe { std::slice::from_raw_parts_mut(some_pointer, count_of_items) };
Are you sure you want read()? Without special care it will cause disaster on structs with destructors. Also, read() does not read a value of some specified type from a pointer to bytes; it reads exactly one value of the type behind the pointer (e.g. if it is *const u8 then read() will read one byte) and returns it.
If you only want to write byte contents of a structure into a vector, you can obtain a slice from the raw pointer:
use std::mem;
use std::io::Write;
struct SomeStruct {
a: i32,
}
fn main() {
let some_struct = SomeStruct { a: 32 };
let mut v: Vec<u8> = Vec::new();
let view = &some_struct as *const _ as *const u8;
let slice = unsafe { std::slice::from_raw_parts(view, mem::size_of::<SomeStruct>()) };
v.write(slice).expect("Unable to write");
println!("{:?}", v);
}
This makes your code platform-dependent and even compiler-dependent: if you use types of variable size (e.g. isize/usize) in your struct or if you don't use #[repr(C)], the data you wrote into the vector is likely to be read as garbage on another machine (and even #[repr(C)] may not lift this problem sometimes, as far as I remember).
I'd like to print out the result of libc::getcwd. My issue is that to create getcwd takes an i8 (c_char) buffer, whereas String::from_utf8 needs a u8 buffer. I started with:
static BUF_BYTES: usize = 4096;
fn main() {
unsafe {
let mut buf: Vec<i8> = Vec::with_capacity(BUF_BYTES as usize);
libc::getcwd(buf.as_mut_ptr(), buf.len());
let s = String::from_utf8(buf).expect("Found invalid UTF-8");
println!("result: {}", s);
}
}
Which produces the error:
14:32 error: mismatched types:
expected `std::vec::Vec<u8>`,
found `std::vec::Vec<i8>` [E0308]
Thanks to the comments, I changed the buf into a Vec<u8> and cast it to a c_char buffer in the getcwd call:
let mut buf: Vec<u8> = Vec::with_capacity(BUF_BYTES as usize);
libc::getcwd(buf.as_mut_ptr() as *mut c_char, buf.len());
This compiles but now, when printing the string it is empty (length: 0)
I found that getcwd returns NULL (libc::getcwd(...).is_null() is true), reading the last error via external crate errno (why is this a separate crate to libc?) reveals that getcwd fails with "Invalid argument". The source of the problem seems that buf.len() returns 0.
In most cases, you should just use env::current_dir. This correctly handles all the platform-specifics for you, such as the "other" encodings mentioned in the comments.
C strings are kind of terrible. getcwd fills a buffer of some length, but doesn't tell you where it ends; you have to manually find the terminating NUL byte.
extern crate libc;
static BUF_BYTES: usize = 4096;
fn main() {
let buf = unsafe {
let mut buf = Vec::with_capacity(BUF_BYTES);
let res = libc::getcwd(buf.as_mut_ptr() as *mut i8, buf.capacity());
if res.is_null() {
panic!("Not long enough");
}
let mut len = 0;
while *buf.as_mut_ptr().offset(len as isize) != 0 { len += 1 }
buf.set_len(len);
buf
};
let s = String::from_utf8(buf).expect("Found invalid UTF-8");
println!("result: {}", s);
}
seems that buf.len() returns 0
Yes, the length is zero because no one told the vector that data was added. Vectors are comprised of three parts - a pointer to data, a length, and a capacity.
The capacity is how much memory is available, the size is how much is used. When treating the vector as a blob to store data into, you want to use the capacity. You then need to inform the vector how many of those bytes were used, so that String::from_utf8 knows where the end is.
You'll note that I changed the scope of unsafe to only include the truly unsafe aspects and the code that makes that code actually safe.
In fact, you could just copy the implementation of env::current_dir for Unix-like systems. It handles the failure cases much nicer and uses the proper types (paths aren't strings). Of course, it's even easier to just call env::current_dir. ^_^
fyi: I ended up with this
extern crate libc;
use std::ffi::CStr;
use std::io;
use std::str;
static BUF_BYTES: usize = 4096;
fn main() {
let buf = unsafe {
let mut buf = Vec::with_capacity(BUF_BYTES);
let ptr = buf.as_mut_ptr() as *mut libc::c_char;
if libc::getcwd(ptr, buf.capacity()).is_null() {
panic!(io::Error::last_os_error());
}
CStr::from_ptr(ptr).to_bytes()
};
println!("result: {}", str::from_utf8(buf).unwrap());
}
This is unsafe and will lead to crashes (in the best case) or silent memory corruption or worse.
When a block ends, any variables within it will be dropped. In this case, the unsafe block creates buf, takes a pointer to it, makes a CStr with the pointer, then frees the Vec, invalidating the pointer. It then returns that CStr containing an invalid reference from the block.
Something like this is better:
extern crate libc;
use std::ffi::{CStr, CString};
use std::io;
use std::str;
static BUF_BYTES: usize = 4096;
fn main() {
let buf = unsafe {
// Allocate some space to store the result
let mut buf = Vec::with_capacity(BUF_BYTES);
// Call the function, panicking if it fails
let ptr = buf.as_mut_ptr() as *mut libc::c_char;
if libc::getcwd(ptr, buf.capacity()).is_null() {
panic!(io::Error::last_os_error());
}
// Find the first NUL and inform the vector of that
let s = CStr::from_ptr(ptr);
buf.set_len(s.to_bytes().len());
// Transfer ownership of the Vec to a CString, ensuring there are no interior NULs
CString::new(buf)
};
let s = buf.expect("Not a C string").into_string().expect("Not UTF-8");
println!("result: {}", s);
}
I wonder why this has actually worked
Likely because nothing changed the memory before you attempted to access it. In a heavily multithreaded environment, I could see more issues arising.
why is it possible to have two mutable references to the vector? First as mut buf and then as ptr = buf.as_mut_ptr(). The ownership has not moved, has it? Otherwise, why is it possible to call buf.capacity()
You don't actually have two references. buf owns the value, then you get a mutable pointer. There is no compiler protection for pointers, which is part of the reason that an unsafe block is needed
I'm trying to write a readline custom completer (tab completion) in Rust. I think I have everything straight, but when I try it en vivo it heads off into the weeds and never comes back. Oddly, when I call it directly from main() I appear to get a valid pointer back. I never see a crash or panic in either case. Backtrace output is not consistent over runs (it's busy doing something). Perhaps one clue is that gdb indicates that the arguments passed to the completer are incorrect (although I'm not actually using them). Eg, after callback:
#2 0x00007f141f701272 in readlinetest::complete (text=0x7f141ff27d10 "", start=2704437, end=499122176) at src/main.rs:24
Or directly, breakpointing the call in main:
#0 readlinetest::complete (text=0x555555559190 <complete::hcda8d6cb2ef52a1bKaa> "dH;$%p", start=0, end=0) at src/main.rs:21
Do I have an ABI problem? Seems unlikely and the function signature isn't complicated :(
Here is a small test project: Cargo.toml:
[package]
name = "readlinetest"
version = "0.1.0"
authors = ["You <you#example.com>"]
[dependencies]
libc = "*"
readline = "*"
And main.rs:
extern crate libc;
extern crate readline;
use libc::{c_char, c_int};
use std::ffi::CString;
use std::process::exit;
use std::ptr;
use std::str;
extern { fn puts(s: *const libc::c_char); }
#[link(name = "readline")]
// Define the global in libreadline that will point to our completion function.
extern {
static mut rl_completion_entry_function: extern fn(text: *const c_char,
start: c_int,
end: c_int) -> *const *const c_char;
}
// Our completion function. Returns two strings.
extern fn complete(text: *const c_char, start: c_int, end: c_int) -> *const *const c_char {
let _ = text; let _ = start; let _ = end;
let mut words:Vec<*const c_char> =
vec!(CString::new("one").unwrap(), CString::new("two").unwrap()).
iter().
map(|arg| arg.as_ptr()).
collect();
words.push(ptr::null()); // append null
words.as_ptr() as *const *const c_char
}
fn main() {
let words = complete(string_to_mut_c_char("hi"), 1, 2);
unsafe { puts(*words) } // prints "one"
//unsafe { puts((*words + ?)) } // not sure hot to get to "two"
unsafe { rl_completion_entry_function = complete }
// Loop until EOF: echo input to stdout
loop {
if let Ok(input) = readline::readline_bare(&CString::new("> ").unwrap()) {
let text = str::from_utf8(&input.to_bytes()).unwrap();
println!("{}", text);
} else { // EOF/^D
exit(0)
}
}
}
// Just for testing
fn string_to_mut_c_char(s: &str) -> *mut c_char {
let mut bytes = s.to_string().into_bytes(); // Vec<u8>
bytes.push(0); // terminating null
let mut cchars = bytes.iter().map(|b| *b as c_char).collect::<Vec<c_char>>();
let name: *mut c_char = cchars.as_mut_ptr();
name
}
Ubuntu 14.04, 64 bit with Rust 1.3.
What am I missing? Thanks for any pointers (ha ha...).
and the function signature isn't complicated
It's not, but it does help to have the right one... ^_^ From my local version of readline (6.3.8):
extern rl_compentry_func_t *rl_completion_entry_function;
typedef char *rl_compentry_func_t PARAMS((const char *, int));
Additionally, you have multiple use after free errors:
vec![CString::new("one").unwrap()].iter().map(|s| s.as_ptr());
This creates a CString and gets the pointer to it. When the statement is done, nothing owns the vector that owns the strings. The vector will be immediately dropped, dropping the strings, invalidating the pointers.
words.as_ptr() as *const *const c_char
Similar thing here — you take the pointer, but then nothing owns the words vector anymore, so it is dropped, invalidating that pointer. So now you have an invalid pointer which attempts to point to a sequence of invalid pointers.
The same problem can be found in string_to_mut_c_char.
I don't know enough readline to understand who is supposed to own the returned strings, but it looks like you pass ownership to readline and it frees them. If so, that means you are going to have to use the same allocator that readline does so that it can free the strings for you. You will likely have to write some custom code that copies a CString's data using the appropriate allocator.
Style-wise, you can use underscores in variable names to indicate they are unused:
extern fn complete(_text: *const c_char, _start: c_int, _end: c_int)
There should be a space after : and there's no need to specify the type of the vector's contents:
let mut words: Vec<_>