I'm trying to write a custom readline completer (tab completion) in Rust. I think I have everything straight, but when I try it live it heads off into the weeds and never comes back. Oddly, when I call it directly from main() I appear to get a valid pointer back. I never see a crash or panic in either case. Backtrace output is not consistent between runs (it's busy doing something). Perhaps one clue is that gdb indicates that the arguments passed to the completer are incorrect (although I'm not actually using them). E.g., after the callback:
#2 0x00007f141f701272 in readlinetest::complete (text=0x7f141ff27d10 "", start=2704437, end=499122176) at src/main.rs:24
Or directly, breakpointing the call in main:
#0 readlinetest::complete (text=0x555555559190 <complete::hcda8d6cb2ef52a1bKaa> "dH;$%p", start=0, end=0) at src/main.rs:21
Do I have an ABI problem? Seems unlikely and the function signature isn't complicated :(
Here is a small test project: Cargo.toml:
[package]
name = "readlinetest"
version = "0.1.0"
authors = ["You <you#example.com>"]
[dependencies]
libc = "*"
readline = "*"
And main.rs:
extern crate libc;
extern crate readline;
use libc::{c_char, c_int};
use std::ffi::CString;
use std::process::exit;
use std::ptr;
use std::str;
extern { fn puts(s: *const libc::c_char); }
#[link(name = "readline")]
// Define the global in libreadline that will point to our completion function.
extern {
static mut rl_completion_entry_function: extern fn(text: *const c_char,
start: c_int,
end: c_int) -> *const *const c_char;
}
// Our completion function. Returns two strings.
extern fn complete(text: *const c_char, start: c_int, end: c_int) -> *const *const c_char {
let _ = text; let _ = start; let _ = end;
let mut words:Vec<*const c_char> =
vec!(CString::new("one").unwrap(), CString::new("two").unwrap()).
iter().
map(|arg| arg.as_ptr()).
collect();
words.push(ptr::null()); // append null
words.as_ptr() as *const *const c_char
}
fn main() {
let words = complete(string_to_mut_c_char("hi"), 1, 2);
unsafe { puts(*words) } // prints "one"
//unsafe { puts((*words + ?)) } // not sure how to get to "two"
unsafe { rl_completion_entry_function = complete }
// Loop until EOF: echo input to stdout
loop {
if let Ok(input) = readline::readline_bare(&CString::new("> ").unwrap()) {
let text = str::from_utf8(&input.to_bytes()).unwrap();
println!("{}", text);
} else { // EOF/^D
exit(0)
}
}
}
// Just for testing
fn string_to_mut_c_char(s: &str) -> *mut c_char {
let mut bytes = s.to_string().into_bytes(); // Vec<u8>
bytes.push(0); // terminating null
let mut cchars = bytes.iter().map(|b| *b as c_char).collect::<Vec<c_char>>();
let name: *mut c_char = cchars.as_mut_ptr();
name
}
Ubuntu 14.04, 64 bit with Rust 1.3.
What am I missing? Thanks for any pointers (ha ha...).
and the function signature isn't complicated
It's not, but it does help to have the right one... ^_^ From my local version of readline (6.3.8):
extern rl_compentry_func_t *rl_completion_entry_function;
typedef char *rl_compentry_func_t PARAMS((const char *, int));
Additionally, you have multiple use after free errors:
vec![CString::new("one").unwrap()].iter().map(|s| s.as_ptr());
This creates a CString and gets the pointer to it. When the statement is done, nothing owns the vector that owns the strings. The vector will be immediately dropped, dropping the strings, invalidating the pointers.
words.as_ptr() as *const *const c_char
Similar thing here — you take the pointer, but then nothing owns the words vector anymore, so it is dropped, invalidating that pointer. So now you have an invalid pointer which attempts to point to a sequence of invalid pointers.
The same problem can be found in string_to_mut_c_char.
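To make the ownership point concrete, here is a minimal sketch (not the asker's code; `WordList` and `collect_words` are hypothetical names). It keeps the CStrings and the pointer array together in one struct, so the pointers stay valid for as long as that struct is alive, and it also shows how to reach the second entry with `add`. It does not address the readline case, where the library may take ownership of what you return.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical helper: owns the strings *and* the NULL-terminated pointer array.
pub struct WordList {
    // Never read directly, but must stay alive: `ptrs` points into these buffers.
    _owned: Vec<CString>,
    ptrs: Vec<*const c_char>,
}

impl WordList {
    pub fn new(words: &[&str]) -> WordList {
        let owned: Vec<CString> = words
            .iter()
            .map(|w| CString::new(*w).expect("interior NUL in word"))
            .collect();
        // The CString heap buffers do not move when `owned` is moved into the struct.
        let mut ptrs: Vec<*const c_char> = owned.iter().map(|s| s.as_ptr()).collect();
        ptrs.push(ptr::null()); // C-style terminator
        WordList { _owned: owned, ptrs }
    }

    /// The returned pointer is valid only while `self` is alive.
    pub fn as_ptr(&self) -> *const *const c_char {
        self.ptrs.as_ptr()
    }
}

/// Walks a NULL-terminated array of C strings, the way C code would.
///
/// # Safety
/// `list` must point to a NULL-terminated array of valid C strings.
pub unsafe fn collect_words(list: *const *const c_char) -> Vec<String> {
    let mut out = Vec::new();
    let mut i = 0;
    loop {
        // `list.add(i)` is the pointer-arithmetic equivalent of `list[i]` in C.
        let p = unsafe { *list.add(i) };
        if p.is_null() {
            break;
        }
        out.push(unsafe { CStr::from_ptr(p) }.to_string_lossy().into_owned());
        i += 1;
    }
    out
}
```

Calling `collect_words(list.as_ptr())` while `list` is still in scope is fine; doing it after `list` has been dropped would be the same use after free described above.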
I don't know enough readline to understand who is supposed to own the returned strings, but it looks like you pass ownership to readline and it frees them. If so, that means you are going to have to use the same allocator that readline does so that it can free the strings for you. You will likely have to write some custom code that copies a CString's data using the appropriate allocator.
Style-wise, you can use underscores in variable names to indicate they are unused:
extern fn complete(_text: *const c_char, _start: c_int, _end: c_int)
There should be a space after : and there's no need to specify the type of the vector's contents:
let mut words: Vec<_>
Related
I would like to return some strings to C via a Rust FFI call. I also would like to ensure they're cleaned up properly.
I'm creating the strings on the Rust side and turning them into an address of an array of strings.
use core::mem;
use std::ffi::CString;
#[no_mangle]
pub extern "C" fn drop_rust_memory(mem: *mut ::libc::c_void) {
unsafe {
let boxed = Box::from_raw(mem);
mem::drop(boxed);
}
}
#[no_mangle]
pub extern "C" fn string_xfer(strings_out: *mut *mut *mut ::libc::c_char) -> usize {
unsafe {
let s1 = CString::new("String 1").unwrap();
let s2 = CString::new("String 2").unwrap();
let s1 = s1.into_raw();
let s2 = s2.into_raw();
let strs = vec![s1, s2];
let len = strs.len();
let mut boxed_slice = strs.into_boxed_slice();
*strings_out = boxed_slice.as_mut_ptr() as *mut *mut ::libc::c_char;
mem::forget(boxed_slice);
len
}
}
On the C side, I call the Rust FFI function, print the strings and then attempt to delete them via another Rust FFI call.
extern size_t string_xfer(char ***out);
extern void drop_rust_memory(void* mem);
int main() {
char **receiver;
int len = string_xfer(&receiver);
for (int i = 0; i < len; i++) {
printf("<%s>\n", receiver[i]);
}
drop_rust_memory(receiver);
printf("# rust memory dropped\n");
for (int i = 0; i < len; i++) {
printf("<%s>\n", receiver[i]);
}
return 0;
}
This appears to work. For the second round of printing after the drop, I would expect to get a crash or some undefined behavior, but I get this:
<String 1>
<String 2>
# rust memory dropped
<(null)>
<String 2>
which makes me less sure about the entire thing.
First, you may want to take a look at "Catching panic! when Rust called from C FFI, without spawning threads", because a panic that unwinds into C code is undefined behaviour. So you should either catch the panic or avoid code that can panic.
Second, into_boxed_slice() is primarily useful when you no longer need the features of a vector ("a contiguous growable array type"). You could also call as_mut_ptr() and forget the vector. Which one you choose depends on whether you want to carry the capacity over to C so that the vector can grow later. (I think Vec is missing an into_raw() method, but you can write one yourself to avoid repeating critical code.) You could also use Box::into_raw() followed by a cast to turn the slice into a pointer:
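use std::ffi::CString;
use std::panic;
use std::ptr;
#[no_mangle]
pub unsafe extern "C" fn string_xfer(len: &mut libc::size_t) -> *mut *mut libc::c_char {
    // Catch any panic so that it cannot unwind across the FFI boundary.
    match panic::catch_unwind(|| {
        let s1 = CString::new("String 1").unwrap();
        let s2 = CString::new("String 2").unwrap();
        let strs = vec![s1.into_raw(), s2.into_raw()];
        Box::into_raw(strs.into_boxed_slice())
    }) {
        Ok(slice) => {
            *len = (&*slice).len();
            slice as *mut *mut libc::c_char
        }
        // Option<*mut T> is not guaranteed to be layout-compatible with a C
        // pointer, so failure is signalled with a null pointer instead.
        Err(_) => ptr::null_mut(),
    }
}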
Third, your drop_rust_memory() only drops a pointer, and I think that is undefined behaviour. Rust's allocator needs the real size of the allocation (the capacity) when freeing. You did not pass the size of your slice, so you are telling Rust to free a Box<c_void>, which is not the layout that was allocated. You need to use from_raw_parts_mut(), and your C code must give the length of the slice back to the Rust code. You also need to free each CString properly by calling CString::from_raw() on it (More information here):
use std::ffi::CString;
#[no_mangle]
pub unsafe extern "C" fn drop_rust_memory(mem: *mut *mut libc::c_char, len: libc::size_t) {
    // Rebuild the boxed slice so that Rust knows the size of the allocation.
    let slice = Box::from_raw(std::slice::from_raw_parts_mut(mem, len));
    for &x in slice.iter() {
        drop(CString::from_raw(x)); // reclaim and free each string
    }
    // The boxed slice is freed here; don't use mem or its elements afterwards.
}
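The two halves can be exercised together without any C code. The following is a sketch under stated assumptions: `export_strings` and `free_strings` are hypothetical names, and they are plain Rust functions rather than extern "C" ones so that the round trip can be tested in isolation. It relies only on CString::into_raw/from_raw and Box::into_raw/from_raw.

```rust
use std::ffi::CString;
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical exporter: leaks a boxed slice of C strings and reports its length.
pub fn export_strings(words: &[&str], len_out: &mut usize) -> *mut *mut c_char {
    let raw: Vec<*mut c_char> = words
        .iter()
        .map(|w| CString::new(*w).expect("interior NUL").into_raw())
        .collect();
    // A boxed slice has no spare capacity, so the length alone describes the allocation.
    let boxed: Box<[*mut c_char]> = raw.into_boxed_slice();
    *len_out = boxed.len();
    Box::into_raw(boxed) as *mut *mut c_char
}

/// Hypothetical counterpart: rebuilds the owners so that Rust frees everything.
/// Returns the number of strings that were freed.
///
/// # Safety
/// `mem` and `len` must come from a single `export_strings` call and must be
/// passed here at most once.
pub unsafe fn free_strings(mem: *mut *mut c_char, len: usize) -> usize {
    if mem.is_null() {
        return 0;
    }
    let slice: Box<[*mut c_char]> =
        unsafe { Box::from_raw(ptr::slice_from_raw_parts_mut(mem, len)) };
    let mut freed = 0;
    for &p in slice.iter() {
        // Taking ownership back lets the CString destructor release the buffer.
        drop(unsafe { CString::from_raw(p) });
        freed += 1;
    }
    freed
    // `slice` is dropped here, releasing the pointer array itself.
}
```

The length has to travel with the pointer in both directions; that is the piece of information the original drop_rust_memory(void *) was missing.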
To conclude, you should read more about undefined behaviour. It is not about "expecting a crash" or that "something" should happen. When your program triggers UB, anything can happen; read more about UB in this amazing LLVM blog post.
A note on C style: prefer returning the pointer rather than the size, because strings_out: *mut *mut *mut ::libc::c_char is an ugly thing, so write pub extern fn string_xfer(size: &mut libc::size_t) -> *mut *mut libc::c_char. See also: How to check if function pointer passed from C is non-NULL
Hello people of the internet,
I'm struggling to invoke a function that is stored in a libc::c_void pointer. I can't figure out how to tell Rust that the pointer is callable.
I want to translate this C++ Code
void * malloc(size_t size) {
static void *(*real_malloc)(size_t) = nullptr;
if (real_malloc == nullptr) {
real_malloc = reinterpret_cast<void *(*)(size_t)> (dlsym(RTLD_NEXT, "malloc"));
}
// do some logging stuff
void * ptr = real_malloc(size);
return ptr;
}
to Rust.
#[no_mangle]
pub extern fn malloc(bytes: usize) {
let c_string = "malloc\0".as_mut_ptr() as *mut i8; // char array for libc
let real_malloc: *mut libc::c_void = libc::dlsym(libc::RTLD_NEXT, c_string);
return real_malloc(bytes);
}
That's my progress so far after an hour of searching the internet and experimenting. I'm new to Rust and not yet familiar with Rust FFI or with using Rust together with libc. I tried a lot with unsafe {} and casts with as, but I always get stuck at the following problem:
return real_malloc(bytes);
^^^^^^^^^^^^^^^^^^ expected (), found *-ptr
Q1: How can I call the function behind the void-Pointer stored in real_malloc?
Q2: Is my Rust-String to C-String conversion feasible this way?
I figured it out! Perhaps there is a better way but it works.
The trick is to "cast" the void pointer to a C function type with std::mem::transmute, since it won't work with as:
type LibCMallocT = extern "C" fn(usize) -> *mut libc::c_void; // must use the C ABI
// C-style string for the symbol name
let c_string = "malloc\0".as_ptr() as *const libc::c_char; // char array for libc
// Void pointer to the address of the symbol
let real_malloc_addr: *mut libc::c_void = unsafe { libc::dlsym(libc::RTLD_NEXT, c_string) };
// transmute: "Reinterprets the bits of a value of one type as another type"
// Transform the void pointer into a callable C function
let real_malloc: LibCMallocT = unsafe { std::mem::transmute(real_malloc_addr) };
When the shared object is built, one can verify that it works like this:
LD_PRELOAD=./target/debug/libmalloc_log_lib.so some-binary
My full code:
extern crate libc;
use std::io::Write;
const MSG: &str = "HELLO WORLD\n";
type LibCMallocT = extern "C" fn(usize) -> *mut libc::c_void; // must use the C ABI
#[no_mangle] // then "malloc" is the symbol name so that ELF-Files can find it (if this lib is preloaded)
pub extern fn malloc(bytes: usize) -> *mut libc::c_void {
/// Disable logging aka return immediately the pointer from the real malloc (libc malloc)
static mut RETURN_IMMEDIATELY: bool = false;
// C-Style string for symbol-name
let c_string = "malloc\0".as_ptr() as *const libc::c_char; // char array for libc
// Void-Pointer to address of symbol
let real_malloc_addr: *mut libc::c_void = unsafe {libc::dlsym(libc::RTLD_NEXT, c_string)};
// transmute: "Reinterprets the bits of a value of one type as another type"
// Transform void-pointer-type to callable C-Function
let real_malloc: LibCMallocT = unsafe { std::mem::transmute(real_malloc_addr) };
unsafe {
if !RETURN_IMMEDIATELY {
// Do the logging and any other work that may itself need malloc().
// This flag prevents infinite recursion, because
// 'std::io::stdout().write_all' also calls malloc.
// TODO: Do proper synchronisation
// (lock whole method? thread_local variable?)
RETURN_IMMEDIATELY = true;
let _ = std::io::stdout().write_all(MSG.as_bytes()); // ignore I/O errors
RETURN_IMMEDIATELY = false;
}
}
(real_malloc)(bytes)
}
PS: Thanks to https://stackoverflow.com/a/46134764/2891595 (After I googled a lot more I found the trick with transmute!)
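A small, self-contained sketch of the same transmute idea, with one addition: wrapping the function-pointer type in Option, so that a null address (which dlsym returns when the symbol is not found) becomes None instead of an invalid function pointer. The names `UnaryFn`, `double_it`, `as_unary_fn` and `demo_address` are hypothetical, and `double_it` merely stands in for a symbol looked up at run time.

```rust
use std::mem;
use std::os::raw::c_void;

/// Function-pointer type using the C ABI.
type UnaryFn = unsafe extern "C" fn(usize) -> usize;

/// Stand-in for a symbol that `dlsym` would normally return (hypothetical).
unsafe extern "C" fn double_it(n: usize) -> usize {
    n * 2
}

/// Converts an untyped address into a callable function pointer, or `None` if it is null.
///
/// # Safety
/// A non-null `addr` must really be the address of a function with the
/// `UnaryFn` signature and ABI.
pub unsafe fn as_unary_fn(addr: *mut c_void) -> Option<UnaryFn> {
    // `Option<fn>` has the same size as a raw pointer, and null maps to `None`.
    unsafe { mem::transmute::<*mut c_void, Option<UnaryFn>>(addr) }
}

/// Produces an untyped address, as a dynamic symbol lookup would.
pub fn demo_address() -> *mut c_void {
    double_it as UnaryFn as *const c_void as *mut c_void
}
```

With this shape, the caller can write `match as_unary_fn(addr) { Some(f) => ..., None => ... }` and handle a failed lookup explicitly.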
I have a C function that expects *const std::os::raw::c_char and I have done the following in Rust:
use std::os::raw::c_char;
use std::ffi::{CString, CStr};
extern crate libc;
fn main() {
let _test_str: *const c_char = CString::new("Hello World").unwrap().as_ptr();
let fmt: *const c_char = CString::new("%s\n").unwrap().as_ptr();
unsafe { libc::printf(fmt, _test_str); }
unsafe {
let slice = CStr::from_ptr(_test_str);
println!("string buffer size without nul terminator: {}", slice.to_bytes().len());
}
}
However, I cannot get _test_str to print, and the output of the above program is simply
string buffer size without nul terminator: 0
If I pass _test_str into some C function, I see that it is an empty string. What did I do wrong?
You are creating a CString in the same statement as creating a pointer to it. The CString is owned but not bound to a variable so it only lives as long as the enclosing statement, causing the pointer to become invalid. This is specifically warned about by the documentation for as_ptr:
For example, the following code will cause undefined behavior when ptr is used inside the unsafe block:
use std::ffi::{CString};
let ptr = CString::new("Hello").expect("CString::new failed").as_ptr();
unsafe {
// `ptr` is dangling
*ptr;
}
This happens because the pointer returned by as_ptr does not carry any lifetime information and the CString is deallocated immediately after the CString::new("Hello").expect("CString::new failed").as_ptr() expression is evaluated.
You can fix the problem by introducing variables which will live for the entire function, and then create pointers to those variables:
fn main() {
let owned_test = CString::new("Hello World").unwrap();
let _test_str: *const c_char = owned_test.as_ptr();
let owned_fmt = CString::new("%s\n").unwrap();
let fmt: *const c_char = owned_fmt.as_ptr();
unsafe {
libc::printf(fmt, _test_str);
}
unsafe {
let slice = CStr::from_ptr(_test_str);
println!(
"string buffer size without nul terminator: {}",
slice.to_bytes().len()
);
}
// owned_fmt is dropped here, making fmt invalid
// owned_test is dropped here, making _test_str invalid
}
If you are working with raw pointers, you need to be extra careful that they are always pointing at live data. Introducing a variable is the best way to control exactly how long that data lives - it will live from the initialization of the variable to the moment the variable goes out of scope.
mexPrintf, just like printf, accepts a variable number of arguments, but I don't know the best way to wrap it in Rust. There is an RFC for variadic generics, but what can we do today?
In this example, I want to print the number of inputs and outputs, but the wrapped function just prints garbage. Any idea how to fix this?
#![allow(non_snake_case)]
#![allow(unused_variables)]
extern crate mex_sys;
use mex_sys::mxArray;
use std::ffi::CString;
use std::os::raw::c_int;
use std::os::raw::c_void;
type VarArgs = *mut c_void;
// attempt to wrap mex_sys::mexPrintf
fn mexPrintf(fmt: &str, args: VarArgs) {
let cs = CString::new(fmt).unwrap();
unsafe {
mex_sys::mexPrintf(cs.as_ptr(), args);
}
}
#[no_mangle]
pub extern "system" fn mexFunction(
nlhs: c_int,
plhs: *mut *mut mxArray,
nrhs: c_int,
prhs: *mut *mut mxArray,
) {
let hw = CString::new("hello world\n").unwrap();
unsafe {
mex_sys::mexPrintf(hw.as_ptr());
}
let inout = CString::new("%d inputs and %d outputs\n").unwrap();
unsafe {
mex_sys::mexPrintf(inout.as_ptr(), nrhs, nlhs);
}
mexPrintf("hello world wrapped\n", std::ptr::null_mut());
let n = Box::new(nrhs);
let p = Box::into_raw(n);
mexPrintf("inputs %d\n", p as VarArgs);
let mut v = vec![3];
mexPrintf("vec %d\n", v.as_mut_ptr() as VarArgs);
}
Contrary to popular belief, it is possible to call variadic / vararg functions that were defined in C. That doesn't mean that doing so is very easy, and it's definitely even easier to do something bad because there are even fewer types for the compiler to check your work with.
Here's an example of calling printf. I've hard-coded just about everything:
extern crate libc;
fn my_thing() {
unsafe {
libc::printf(b"Hello, %s (%d)\0".as_ptr() as *const i8, b"world\0".as_ptr(), 42i32);
}
}
fn main() {
my_thing()
}
Note that I have to very explicitly make sure my format string and arguments are all the right types and the strings are NUL-terminated.
Normally, you'll use tools like CString:
extern crate libc;
use std::ffi::CString;
fn my_thing(name: &str, number: i32) {
let fmt = CString::new("Hello, %s (%d)").expect("Invalid format string");
let name = CString::new(name).expect("Invalid name");
unsafe {
libc::printf(fmt.as_ptr(), name.as_ptr(), number);
}
}
fn main() {
my_thing("world", 42)
}
The Rust compiler test suite also has an example of calling a variadic function.
A word of warning specifically for printf-like functions: C compiler-writers realized that people screw up this particular type of variadic function call all the time. To help combat that, they've encoded special logic that parses the format string and attempts to check the argument types against the types the format string expects. The Rust compiler will not check your C-style format strings for you!
I had confused a "variable list of arguments" with a va_list. I'm going to avoid both if I can and in this situation, I'm just going to do the string formatting in Rust before passing it to interop. Here is what worked for me in this case:
#![allow(non_snake_case)]
#![allow(unused_variables)]
extern crate mex_sys;
use mex_sys::mxArray;
use std::ffi::CString;
use std::os::raw::c_int;
// attempt to wrap mex_sys::mexPrintf
fn mexPrintf(text: &str) {
let cs = CString::new(text).expect("Invalid text");
unsafe {
mex_sys::mexPrintf(cs.as_ptr());
}
}
#[no_mangle]
pub extern "C" fn mexFunction(
nlhs: c_int,
plhs: *mut *mut mxArray,
nrhs: c_int,
prhs: *mut *mut mxArray,
) {
mexPrintf(&format!("{} inputs and {} outputs\n", nrhs, nlhs));
}
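One caveat with formatting in Rust first: the resulting text is still handed over as the printf-style format argument, so a literal % in it would be interpreted on the C side. Below is a minimal sketch of one way to guard against that, assuming a hypothetical helper named `to_printf_literal` that doubles every % and reports interior NUL bytes as an error instead of panicking. Passing "%s" as the format and the text as the argument would be another option.

```rust
use std::ffi::{CString, NulError};

/// Hypothetical helper: turns arbitrary Rust text into a C string that a
/// printf-style function prints verbatim when it is used as the format.
pub fn to_printf_literal(text: &str) -> Result<CString, NulError> {
    // "%%" is how printf-style formats spell a literal percent sign.
    CString::new(text.replace('%', "%%"))
}
```

In the wrapper above, this would replace the direct CString::new(text) call, with the caller deciding what to do about the error case.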
I'd like to print out the result of libc::getcwd. My issue is that getcwd takes an i8 (c_char) buffer, whereas String::from_utf8 needs a u8 buffer. I started with:
static BUF_BYTES: usize = 4096;
fn main() {
unsafe {
let mut buf: Vec<i8> = Vec::with_capacity(BUF_BYTES as usize);
libc::getcwd(buf.as_mut_ptr(), buf.len());
let s = String::from_utf8(buf).expect("Found invalid UTF-8");
println!("result: {}", s);
}
}
Which produces the error:
14:32 error: mismatched types:
expected `std::vec::Vec<u8>`,
found `std::vec::Vec<i8>` [E0308]
Thanks to the comments, I changed the buf into a Vec<u8> and cast it to a c_char buffer in the getcwd call:
let mut buf: Vec<u8> = Vec::with_capacity(BUF_BYTES as usize);
libc::getcwd(buf.as_mut_ptr() as *mut c_char, buf.len());
This compiles, but now the string is empty when printed (length: 0).
I found that getcwd returns NULL (libc::getcwd(...).is_null() is true). Reading the last error via the external errno crate (why is this a separate crate from libc?) reveals that getcwd fails with "Invalid argument". The source of the problem seems to be that buf.len() returns 0.
In most cases, you should just use env::current_dir. This correctly handles all the platform-specifics for you, such as the "other" encodings mentioned in the comments.
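For reference, a minimal sketch of the standard-library route. The wrapper names `cwd` and `cwd_for_display` are hypothetical; the only real API used is std::env::current_dir, which returns an io::Result<PathBuf>.

```rust
use std::env;
use std::io;
use std::path::PathBuf;

/// Thin wrapper (hypothetical name) around the standard library call.
pub fn cwd() -> io::Result<PathBuf> {
    env::current_dir()
}

/// Paths are not guaranteed to be valid UTF-8, so convert lossily for display only.
pub fn cwd_for_display() -> io::Result<String> {
    Ok(cwd()?.to_string_lossy().into_owned())
}
```

Keeping the value as a PathBuf for as long as possible avoids the encoding questions entirely; the lossy conversion is only meant for printing.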
C strings are kind of terrible. getcwd fills a buffer of some length, but doesn't tell you where it ends; you have to manually find the terminating NUL byte.
extern crate libc;
static BUF_BYTES: usize = 4096;
fn main() {
let buf = unsafe {
let mut buf = Vec::with_capacity(BUF_BYTES);
let res = libc::getcwd(buf.as_mut_ptr() as *mut libc::c_char, buf.capacity());
if res.is_null() {
panic!("Not long enough");
}
let mut len = 0;
while *buf.as_mut_ptr().offset(len as isize) != 0 { len += 1 }
buf.set_len(len);
buf
};
let s = String::from_utf8(buf).expect("Found invalid UTF-8");
println!("result: {}", s);
}
seems that buf.len() returns 0
Yes, the length is zero because no one told the vector that data was added. Vectors consist of three parts: a pointer to the data, a length, and a capacity.
The capacity is how much memory is available; the length is how much of it is used. When treating the vector as a blob to store data into, you want to use the capacity. You then need to inform the vector how many of those bytes were used, so that String::from_utf8 knows where the end is.
You'll note that I changed the scope of unsafe to only include the truly unsafe aspects and the code that makes that code actually safe.
In fact, you could just copy the implementation of env::current_dir for Unix-like systems. It handles the failure cases much nicer and uses the proper types (paths aren't strings). Of course, it's even easier to just call env::current_dir. ^_^
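The length/capacity distinction can be seen without calling into libc at all. This is a sketch under stated assumptions: `fake_c_fill` is a hypothetical stand-in for a C function such as getcwd that writes a NUL-terminated string into a caller-supplied buffer, and `read_into_vec` is a hypothetical name for the calling side.

```rust
use std::ptr;

/// Stand-in for a C function such as `getcwd` (hypothetical): writes a
/// NUL-terminated string into `dst` and returns false if it does not fit.
unsafe fn fake_c_fill(dst: *mut u8, cap: usize) -> bool {
    let src = b"/tmp/demo\0";
    if src.len() > cap {
        return false;
    }
    unsafe { ptr::copy_nonoverlapping(src.as_ptr(), dst, src.len()) };
    true
}

pub fn read_into_vec(cap: usize) -> Option<Vec<u8>> {
    let mut buf: Vec<u8> = Vec::with_capacity(cap);
    // Memory is reserved, but the vector still considers itself empty.
    debug_assert_eq!(buf.len(), 0);
    unsafe {
        // Pass the capacity, not the length, as the writable size.
        if !fake_c_fill(buf.as_mut_ptr(), buf.capacity()) {
            return None;
        }
        // Find the terminating NUL, then tell the vector how many bytes are initialized.
        let mut len = 0;
        while *buf.as_ptr().add(len) != 0 {
            len += 1;
        }
        buf.set_len(len);
    }
    Some(buf)
}
```

Passing buf.len() instead of buf.capacity() here would hand the callee a size of zero, which mirrors the "Invalid argument" failure described in the question.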
fyi: I ended up with this
extern crate libc;
use std::ffi::CStr;
use std::io;
use std::str;
static BUF_BYTES: usize = 4096;
fn main() {
let buf = unsafe {
let mut buf = Vec::with_capacity(BUF_BYTES);
let ptr = buf.as_mut_ptr() as *mut libc::c_char;
if libc::getcwd(ptr, buf.capacity()).is_null() {
panic!(io::Error::last_os_error());
}
CStr::from_ptr(ptr).to_bytes()
};
println!("result: {}", str::from_utf8(buf).unwrap());
}
This is unsafe and will lead to crashes (in the best case) or silent memory corruption or worse.
When a block ends, any variables within it will be dropped. In this case, the unsafe block creates buf, takes a pointer to it, makes a CStr with the pointer, then frees the Vec, invalidating the pointer. It then returns that CStr containing an invalid reference from the block.
Something like this is better:
extern crate libc;
use std::ffi::{CStr, CString};
use std::io;
use std::str;
static BUF_BYTES: usize = 4096;
fn main() {
let buf = unsafe {
// Allocate some space to store the result
let mut buf = Vec::with_capacity(BUF_BYTES);
// Call the function, panicking if it fails
let ptr = buf.as_mut_ptr() as *mut libc::c_char;
if libc::getcwd(ptr, buf.capacity()).is_null() {
panic!("{}", io::Error::last_os_error());
}
// Find the first NUL and inform the vector of that
let s = CStr::from_ptr(ptr);
buf.set_len(s.to_bytes().len());
// Transfer ownership of the Vec to a CString, ensuring there are no interior NULs
CString::new(buf)
};
let s = buf.expect("Not a C string").into_string().expect("Not UTF-8");
println!("result: {}", s);
}
I wonder why this has actually worked
Likely because nothing changed the memory before you attempted to access it. In a heavily multithreaded environment, I could see more issues arising.
why is it possible to have two mutable references to the vector? First as mut buf and then as ptr = buf.as_mut_ptr(). The ownership has not moved, has it? Otherwise, why is it possible to call buf.capacity()
You don't actually have two references. buf owns the value, then you get a mutable raw pointer to it. There is no compiler protection for raw pointers, which is part of the reason that an unsafe block is needed.