How do I expose a compile-time-generated static C string through FFI? - rust

I am trying to embed a version number into a library. Ideally, this should be a static C string that can be read without any additional allocation.
On the Rust side, I am using vergen to generate the versioning information like this:
pub static VERSION: &str = env!("VERGEN_SEMVER");
and I would like to end up with something like
#[no_mangle]
pub static VERSION_C: *const u8 = ... ;
There seems to be a way to achieve this with string literals, but I haven't found a way to do it with compile-time strings. Creating a new CString seems to be beyond the current capabilities of static variables and tends to end with error E0015.
A function returning the pointer like this would be acceptable, as long as it does not allocate new memory.
#[no_mangle]
pub extern "C" fn get_version() -> *const u8 {
// ...
}
The final type of the variable (or return type of the function) doesn't have to be based on u8, but should be translatable through cbindgen. If some other FFI type is more appropriate, using that is perfectly fine.

By ensuring that the static string slice is compatible with a C-style string (that is, it ends with the NUL terminator byte \0 and contains no other NUL bytes), we can safely fetch a pointer to the beginning of the slice and pass that across the boundary.
use std::os::raw::c_char;
pub static VERSION: &str = concat!(env!("VERGEN_SEMVER"), "\0");
#[no_mangle]
pub extern "C" fn get_version() -> *const c_char {
    // Points into the binary's static data: valid forever, nothing to free
    VERSION.as_ptr() as *const c_char
}
Here's an example in the Playground, where I used the package's version as the environment variable to fetch and called the function in Rust.
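A self-contained sketch of that approach, reading the string back the way C code would. The literal "1.2.3" is a hypothetical stand-in for env!("VERGEN_SEMVER") so the example builds without vergen; everything else follows the answer.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Hypothetical stand-in for concat!(env!("VERGEN_SEMVER"), "\0"), so this
// compiles without vergen; only the literal differs from the answer.
pub static VERSION: &str = concat!("1.2.3", "\0");

#[no_mangle]
pub extern "C" fn get_version() -> *const c_char {
    // The bytes live in the binary's read-only data: no allocation, and the
    // pointer stays valid for the whole program (callers must not free it).
    VERSION.as_ptr() as *const c_char
}

fn main() {
    // Read it back the way a C caller would: up to the first NUL byte.
    let version = unsafe { CStr::from_ptr(get_version()) };
    println!("{}", version.to_str().unwrap()); // prints 1.2.3
}
```

If the version string ever contained a NUL byte of its own, C callers would see it truncated at that byte, so the compile-time value should be a plain version string.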

Related

Why are string constant pointers different across crates in Rust?

While working with a HashMap that uses &'static str as the key type, I created a newtype to hash by the pointer rather than by the string contents to reduce overhead.
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
pub struct StaticStr(&'static str);
impl Hash for StaticStr {
fn hash<H: Hasher>(&self, state: &mut H) {
self.0.as_ptr().hash(state)
}
}
impl PartialEq for StaticStr {
fn eq(&self, other: &Self) -> bool {
self.0.as_ptr() == other.0.as_ptr()
}
}
impl Eq for StaticStr {}
It turns out that this does not work consistently, as in the following example.
pub type MyMap = HashMap<StaticStr, u8>;
pub const A: &str = "A";
pub fn make_map() -> MyMap {
let mut map = MyMap::new();
map.insert(StaticStr(A), 1);
map
}
pub fn get_value(control: &MyMap) -> Option<u8> {
control.get(&StaticStr(A)).cloned()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
pub fn map_made_in_lib() {
let map = make_map();
assert_eq!(get_value(&map), Some(1));
}
#[test]
pub fn map_made_in_test() {
// Same as make_map()
let mut map = MyMap::new();
map.insert(StaticStr(A), 1);
// This check fails
assert_eq!(get_value(&map), Some(1));
}
}
Notice that in the first test, the string constant A is only used directly in the lib crate. In the second test, A is used directly in both the lib crate and the test crate. I discovered that although both tests use the same string constant, the pointers differ depending on which crate refers to the constant by name. This is demonstrated in the minimal reproduction I created. I would have expected the string literal to be included only once, in the crate that defines it, or at least the linker to be smart enough to deduplicate identical string literals. Is there a reason for this behavior?
Instead of a const, try a static:
A constant item is an optionally named constant value which is not associated with a specific memory location in the program. Constants are essentially inlined wherever they are used, meaning that they are copied directly into the relevant context when used. This includes usage of constants from external crates, and non-Copy types. References to the same constant are not necessarily guaranteed to refer to the same memory address. -- The Rust Reference
A static item is similar to a constant, except that it represents a precise memory location in the program. All references to the static refer to the same memory location. Static items have the static lifetime, which outlives all other lifetimes in a Rust program. Static items do not call drop at the end of the program. -- The Rust Reference
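A minimal sketch of the suggested change, assuming the rest of the question's code stays the same: only pub const A becomes pub static A. Because the static has one memory location, every crate that names A reads the same &str value and therefore the same data pointer. This only holds for keys that always come from that one static; two separately written "A" literals are still not guaranteed to share an address.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

pub struct StaticStr(&'static str);

impl Hash for StaticStr {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the address of the string data, not its contents
        self.0.as_ptr().hash(state)
    }
}

impl PartialEq for StaticStr {
    fn eq(&self, other: &Self) -> bool {
        self.0.as_ptr() == other.0.as_ptr()
    }
}

impl Eq for StaticStr {}

pub type MyMap = HashMap<StaticStr, u8>;

// The only change: a static instead of a const, so there is a single
// stored &str (and thus a single data pointer) shared by all crates.
pub static A: &str = "A";

fn main() {
    let mut map = MyMap::new();
    map.insert(StaticStr(A), 1);
    println!("{:?}", map.get(&StaticStr(A))); // Some(1)
}
```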

Rust FFI - Dangling pointer

I work on a Rust library used, through C headers, in a Swift UI.
I can read data sent from Swift in Rust, but I can't immediately send what I've just read back to Swift.
Basically, I can successfully convert a *const i8 containing hello world into a String.
But as_ptr() on that same String doesn't behave consistently (so Swift can't parse the result as UTF-8):
Swift sends hello world as a *const i8
Rust handles it successfully through let input: &str (the first print in get_message() correctly prints hello world)
Now I can't convert this input &str back into a pointer:
the pointer can't be decoded by Swift
the pointer value changes on every call of the function (it should always be the same, as with "hello world".as_ptr())
Basically, why does
"hello world".as_ptr() always produce the same output, which Swift can decode,
while input.as_ptr() produces a different output every time it's called and can never be decoded by Swift (even though printing input correctly shows hello world)?
Do you guys have any ideas?
use std::ffi::CStr;
#[derive(Debug)]
#[repr(C)]
pub struct MessageC {
pub message_bytes: *const u8,
pub message_len: libc::size_t,
}
/// # Safety
/// call of c_string_safe from Swift
/// => https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr
unsafe fn c_string_safe(cstring: *const i8) -> String {
CStr::from_ptr(cstring).to_string_lossy().into_owned()
}
/// # Safety
/// call of c_string_safe from Swift
/// => https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr
/// on `async extern "C"` => <https://stackoverflow.com/a/52521592/7281870>
#[no_mangle]
#[tokio::main] // allow async function, needed to call here other async functions (not this example but needed)
pub async unsafe extern "C" fn get_message(
user_input: *const i8,
) -> MessageC {
let input: &str = &c_string_safe(user_input);
println!("from Swift: {}", input); // [consistent] from Swift: hello world
println!("converted to ptr: {:?}", input.as_ptr()); // [inconsistent] converted to ptr: 0x60000079d770 / converted to ptr: 0x6000007b40b0
println!("directly to ptr: {:?}", "hello world".as_ptr()); // [consistent] directly to ptr: 0x1028aaf6f
MessageC {
message_bytes: input.as_ptr(),
message_len: input.len() as libc::size_t,
}
}
The way you construct MessageC is unsound and returns a dangling pointer. The code in get_message() is equivalent to this:
pub async unsafe extern "C" fn get_message(user_input: *const i8) -> MessageC {
let _invisible = c_string_safe(user_input);
let input: &str = &_invisible;
// let's skip the prints
let msg = MessageC {
message_bytes: input.as_ptr(),
message_len: input.len() as libc::size_t,
};
drop(_invisible);
return msg;
}
Hopefully this formulation highlights the issue: c_string_safe() returns an owned heap-allocated String which gets dropped (and its data deallocated) by the end of the function. input is a slice that refers to the data allocated by that String. In safe Rust you wouldn't be allowed to return a slice referring to a local variable such as input - you'd have to either return the String itself or limit yourself to passing the slice downwards to functions.
However, you're not using safe Rust and you're creating a pointer to the heap-allocated data. Now you have a problem because as soon as get_message() returns, the _invisible String gets deallocated, and the pointer you're returning is dangling. The dangling pointer may even appear to work because deallocation is not obligated to clear the data from memory, it just marks it as available for future allocations. But those future allocations can and will happen, perhaps from a different thread. Thus a program that references freed memory is bound to misbehave, often in an unpredictable fashion - which is precisely what you have observed.
In all-Rust code you'd resolve the issue by safely returning String instead. But you're doing FFI, so you must reduce the string to a pointer/length pair. Rust allows you to do just that, the easiest way being to just call std::mem::forget() to prevent the string from getting deallocated:
pub async unsafe extern "C" fn get_message(user_input: *const i8) -> MessageC {
let mut input = c_string_safe(user_input);
input.shrink_to_fit(); // drop excess capacity so that (in practice) capacity == len
let msg = MessageC {
message_bytes: input.as_ptr(),
message_len: input.len() as libc::size_t,
};
std::mem::forget(input); // prevent input's data from being deallocated on return
msg
}
But now you have a different problem: get_message() allocates a string, but how do you deallocate it? Just dropping MessageC won't do it because it just contains pointers. (And doing so by implementing Drop would probably be unwise because you're sending it to Swift or whatever.) The solution is to provide a separate function that re-creates the String from the MessageC and drops it immediately:
pub unsafe fn free_message_c(m: MessageC) {
// The call to `shrink_to_fit()` above makes it sound to re-assemble
// the string using a capacity equal to its length
drop(String::from_raw_parts(
m.message_bytes as *mut _,
m.message_len,
m.message_len,
));
}
You should call this function once you're done with MessageC, i.e. when the Swift code has done its job. (You could even make it extern "C" and call it from Swift.)
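A synchronous sketch of the same leak-then-free pattern, written with Box<str> instead of String. A Box<str> carries no spare capacity, so reassembling it only needs the pointer and length; this avoids relying on shrink_to_fit(), whose documentation allows the allocator to leave a little extra capacity. The names get_message/free_message mirror the answer but are plain extern "C" functions here (no tokio), and usize stands in for libc::size_t so the sketch needs no external crates.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

#[repr(C)]
pub struct MessageC {
    pub message_bytes: *const u8,
    pub message_len: usize, // stand-in for libc::size_t
}

/// Copies the C string into an exact-size Box<str> and hands ownership to the caller.
#[no_mangle]
pub unsafe extern "C" fn get_message(user_input: *const c_char) -> MessageC {
    let input: Box<str> = CStr::from_ptr(user_input)
        .to_string_lossy()
        .into_owned()
        .into_boxed_str(); // discards excess capacity; length == allocation size
    let len = input.len();
    let ptr = Box::into_raw(input).cast::<u8>() as *const u8; // not freed on return
    MessageC { message_bytes: ptr, message_len: len }
}

/// Reclaims a message produced by get_message. Call exactly once per message.
#[no_mangle]
pub unsafe extern "C" fn free_message(m: MessageC) {
    let slice = std::ptr::slice_from_raw_parts_mut(m.message_bytes as *mut u8, m.message_len);
    drop(Box::from_raw(slice as *mut str));
}

fn main() {
    let input = CString::new("hello world").unwrap();
    let msg = unsafe { get_message(input.as_ptr()) };
    let bytes = unsafe { std::slice::from_raw_parts(msg.message_bytes, msg.message_len) };
    println!("{}", std::str::from_utf8(bytes).unwrap());
    unsafe { free_message(msg) }; // after the last use of `bytes`
}
```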
Finally, using "hello world".as_ptr() directly works because "hello world" is a static &str which is baked into the executable and never gets deallocated. In other words, it doesn't point to a String, it points to some static data that comes with the program.

Map C-like packed data structure to Rust struct

I'm fairly new to Rust and have spent most of my time writing code in C/C++. I have a Flask webserver that returns a packed data structure in the form of length + null-terminated string:
test_data = "Hello there bob!" + "\x00"
test_data = test_data.encode("utf-8")
data = struct.pack("<I", len(test_data ))
data += test_data
return data
In my Rust code, I'm using the easy_http_request crate and can successfully get the response back by calling get_from_url_str. What I'm trying to do is map the returned response back to the Test data structure (if possible). I've tried, unsuccessfully, to use align_to to map the string data onto the structure.
extern crate easy_http_request;
extern crate libc;
use easy_http_request::DefaultHttpRequest;
use libc::c_char;
#[repr(C, packed)]
#[derive(Debug, Clone, Copy)]
struct Test {
a: u32,
b: *const c_char // TODO: What do I put here???
}
fn main() {
let response = DefaultHttpRequest::get_from_url_str("http://localhost:5000/").unwrap().send().unwrap();
let (head, body, _tail) = unsafe { response.body.align_to::<Test>() };
let my_test: Test = body[0];
println!("{}", my_test.a); // Correctly prints '17'
println!("{:?}", my_test.b); // Fails
}
I'm not sure this is possible in Rust. In response.body I can correctly see the null-terminated string, so I know the data is there. I'm just unsure whether there's a way to map it to a string in the Test structure. There's no reason I need to use a null-terminated string. Ultimately, I'm just trying to map a data structure consisting of a size and a string to a Rust struct of similar types.
It looks like you are confused by two different meanings of pack:
* In Python, pack is a protocol of sorts to serialize data into an array of bytes.
* In Rust, packed (as in #[repr(C, packed)]) is an attribute on a struct that removes the padding between members and lowers the struct's alignment to 1.
While the two can be used together to make a protocol work, that is not the case here, because your pack has a variable-length member. And trying to serialize/deserialize a pointer value directly is a very bad idea.
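To make the Rust meaning concrete, a small sketch (the struct names are made up for illustration) comparing a repr(C) struct with its packed counterpart. Both have a fixed size known at compile time, which is why neither can hold the variable-length string that the Python side packs.

```rust
use std::mem::{align_of, size_of};

// repr(C) keeps declaration order and pads so that `b` is 4-byte aligned.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    a: u8,
    b: u32,
}

// repr(C, packed) removes that padding and lowers the alignment to 1.
#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
}

fn main() {
    println!("Padded: size {}, align {}", size_of::<Padded>(), align_of::<Padded>()); // size 8, align 4
    println!("Packed: size {}, align {}", size_of::<Packed>(), align_of::<Packed>()); // size 5, align 1
}
```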
Your packed flask message is basically:
4 bytes: a little-endian value with the number of bytes in the string.
as many bytes as indicated above for the string, encoded in UTF-8 (including the trailing \0).
For that you do not need a packed struct. The easiest way is to just read the fields manually, one by one. Something like this (error checking omitted):
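use std::convert::TryInto;
let body = &response.body;
// "<I" on the Python side is an unsigned 32-bit little-endian integer
let len = u32::from_le_bytes(body[0..4].try_into().unwrap()) as usize;
// len counts the trailing \0, which is stripped from the resulting &str
let b = std::str::from_utf8(&body[4..4 + len]).unwrap().trim_end_matches('\0');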
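The same idea as a small function with the error checking filled in, as a sketch (parse_message is a made-up name). It works on any byte slice, so response.body can be passed in directly, and it returns None instead of panicking on truncated or malformed input.

```rust
use std::convert::TryInto;

/// Parses `<u32 little-endian length><payload>`, where the payload ends with
/// a NUL byte that is counted in the length (as the Flask code produces).
fn parse_message(bytes: &[u8]) -> Option<&str> {
    let header: [u8; 4] = bytes.get(0..4)?.try_into().ok()?;
    let len = u32::from_le_bytes(header) as usize;
    let end = 4usize.checked_add(len)?;
    let payload = bytes.get(4..end)?; // None if the body is shorter than announced
    // Split off the trailing NUL and insist that it really is a NUL.
    let (&last, text) = payload.split_last()?;
    if last != 0 {
        return None;
    }
    std::str::from_utf8(text).ok()
}

fn main() {
    // The exact bytes the Python snippet sends: length 17, then the text and a NUL.
    let mut wire = 17u32.to_le_bytes().to_vec();
    wire.extend_from_slice(b"Hello there bob!\0");
    println!("{:?}", parse_message(&wire)); // Some("Hello there bob!")
    println!("{:?}", parse_message(&wire[..10])); // None
}
```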
Don't use raw pointers: they are unsafe to use and recommended only when there are strong reasons to get around Rust’s safety guarantees.
At minimum, a struct that fits your requirement is something like:
struct Test<'a> {
value: &'a str
}
or one holding an owned String, to avoid lifetime dependencies.
A &str is a pointer plus a length (it is not a C-like char * pointer).
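A small sketch illustrating that difference: a &str is two machine words (address and length), so the string data needs no terminating NUL.

```rust
use std::mem::size_of;

fn main() {
    let s: &str = "Hello there bob!";
    // A &str is a fat pointer: the data address plus the length in bytes.
    println!("size_of::<&str>() = {} bytes", size_of::<&str>());
    println!("ptr = {:p}, len = {}", s.as_ptr(), s.len());
    // Because the length travels with the pointer, no terminating NUL is stored.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    assert!(!s.as_bytes().contains(&0));
}
```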
By the way, the hard part is not parsing the wire protocol but correctly handling all the possible decoding errors, to avoid unexpected runtime failures in case of buggy or malicious clients.
In order not to reinvent the wheel, an example with the parse combinator nom:
use nom::{
number::complete::le_u32,
bytes::complete::take,
error::ErrorKind,
IResult
};
use easy_http_request::DefaultHttpRequest;
use std::str::from_utf8;
#[derive(Debug, Clone)]
struct Test {
value: String
}
fn decode_len_value(bytes: &[u8]) -> IResult<&[u8], Test> {
let (buffer, len) = le_u32(bytes)?;
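// take len-1 bytes because null char (\0) is accounted into len;
// a zero length cannot hold the \0, so reject it instead of underflowing
if len == 0 {
return Err(nom::Err::Error((buffer, ErrorKind::LengthValue)));
}
let (remaining, val) = take(len - 1)(buffer)?;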
match from_utf8(val) {
Ok(strval) => Ok((remaining, Test {value: strval.to_owned()})),
Err(_) => Err(nom::Err::Error((remaining, ErrorKind::Char)))
}
}
fn main() {
let response = DefaultHttpRequest::get_from_url_str("http://localhost:5000/").unwrap().send().unwrap();
let result = decode_len_value(&response.body);
println!("{:?}", result);
}

static struct with C strings for lv2 plugin [duplicate]

This question already has answers here:
Creating a static C struct containing strings
(3 answers)
Closed 7 years ago.
I'm trying to learn Rust (I'm new to low-level programming), and want to translate a tiny lv2 amplifier (audio) plugin "amp.c" from C to Rust. I actually got it working (here), but when the host terminates, valgrind says that "64 bytes in 1 blocks are definitely lost". I think I know why this happens, but I don't know how to fix it.
Before you get tired of reading, here is the final question:
How do I statically allocate a struct that contains a C string?
And here is the introduction:
Why it happens (I think):
Host loads the library and calls lv2_descriptor()
const LV2_Descriptor*
lv2_descriptor()
{
return &descriptor;
}
which returns a pointer to a STATICALLY allocated struct of type LV2_Descriptor,
static const LV2_Descriptor descriptor = {
AMP_URI,
...
};
which is defined as
typedef struct _LV2_Descriptor {
const char * URI;
...
} LV2_Descriptor;
Why is it statically allocated? In the amp.c it says:
It is best to define descriptors statically to avoid leaking memory and non-portable shared library constructors and destructors to clean up properly.
However, I translated lv2_descriptor() to Rust as:
#[no_mangle]
pub extern fn lv2_descriptor(index:i32) -> *const LV2Descriptor {
let s = "http://example.org/eg-amp_rust";
let cstr = CString::new(s).unwrap();
let ptr = cstr.as_ptr();
mem::forget(cstr);
let mybox = Box::new(LV2Descriptor { amp_uri: ptr, ... });
let bxptr = &*mybox as *const LV2Descriptor;
mem::forget(mybox);
return bxptr
}
So it's not statically allocated and I never free it; I guess that's why valgrind complains?
How am I trying to solve it?
I'm trying to do the same thing in Rust as the C-code does, i.e. statically allocate the struct (outside of lv2_descriptor()). The goal is to be fully compatible to the lv2 library, i.e "...to avoid leaking memory..." etc., as it says in the quote, right? So I tried something like:
static ptr1: *const u8 = (b"http://example.org/eg-amp_rust\n\0").as_ptr();
static ptr2: *const libc::c_char = ptr1 as *const libc::c_char;
static desc: LV2Descriptor = LV2Descriptor{amp_uri: ptr2, ...};
But this does not compile, there are error messages like
src/lib.rs:184:26: 184:72 error: the trait `core::marker::Sync` is not implemented for the type `*const u8` [E0277]
src/lib.rs:184 static ptr1: *const u8 = b"http://example.org/eg-amp_rust\n\0".as_ptr();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/lib.rs:184:26: 184:72 note: `*const u8` cannot be shared between threads safely
src/lib.rs:184 static ptr1: *const u8 = b"http://example.org/eg-amp_rust\n\0".as_ptr();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/lib.rs:184:26: 184:72 error: static contains unimplemented expression type [E0019]
src/lib.rs:184 static ptr1: *const u8 = b"http://example.org/eg-amp_rust\n\0".as_ptr();
Specific problem/question:
How do I statically allocate a struct that contains a C string?
The short answer is, you don't for now. Future Rust will probably gain this ability.
What you can do, is statically allocate a struct that contains null pointers, and set those null pointers to something useful when you call the function. Rust has static mut. It requires unsafe code, is not threadsafe at all and is (to the best of my knowledge) considered a code smell.
Right here I consider it a workaround to the fact that there is no way to turn a &[T] into a *const T in a static.
static S: &'static [u8] = b"http://example.org/eg-amp_rust\n\0";
static mut desc: LV2Descriptor = LV2Descriptor {
amp_uri: 0 as *const libc::c_char, // ptr::null() isn't const fn (yet)
};
#[no_mangle]
pub extern fn lv2_descriptor(index: i32) -> *const LV2Descriptor {
let ptr = S.as_ptr() as *const libc::c_char;
unsafe {
desc.amp_uri = ptr;
&desc as *const LV2Descriptor
}
}
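For comparison, a sketch of the fully static approach the question asked for. It assumes a reasonably recent Rust compiler, where slice::as_ptr() can be called in a static initializer (the E0019 error above appears to come from an older compiler), and it uses a hypothetical LV2Descriptor trimmed down to the URI field. The remaining obstacle, E0277, is handled with an explicit unsafe impl Sync: a raw pointer is not Sync, but this one only ever points at immutable 'static bytes. LV2 declares the entry point as lv2_descriptor(uint32_t index), so u32 is used, and the question's trailing \n is left out of the URI.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Hypothetical, trimmed-down mirror of the C struct: only the URI field.
#[repr(C)]
pub struct LV2Descriptor {
    pub uri: *const c_char,
}

// Raw pointers are not Sync, so a static holding one is rejected (E0277).
// The pointer only refers to immutable 'static data, so sharing it is fine.
unsafe impl Sync for LV2Descriptor {}

// Fully static: no heap allocation, nothing for valgrind to report.
static DESCRIPTOR: LV2Descriptor = LV2Descriptor {
    uri: b"http://example.org/eg-amp_rust\0".as_ptr() as *const c_char,
};

#[no_mangle]
pub extern "C" fn lv2_descriptor(index: u32) -> *const LV2Descriptor {
    match index {
        0 => &DESCRIPTOR as *const LV2Descriptor,
        _ => std::ptr::null(), // no more plugins in this library
    }
}

fn main() {
    let d = lv2_descriptor(0);
    let uri = unsafe { CStr::from_ptr((*d).uri) };
    println!("{}", uri.to_str().unwrap()); // http://example.org/eg-amp_rust
}
```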

C library freeing a pointer coming from Rust

I want to do Rust bindings to a C library which requires a callback, and this callback must return a C-style char* pointer to the C library which will then free it.
The callback must be in some sense exposed to the user of my library (probably using closures), and I want to provide a Rust interface as convenient as possible (meaning accepting a String output if possible).
However, the C library complains when trying to free() a pointer coming from memory allocated by Rust, probably because Rust uses jemalloc and the C library uses malloc.
So currently I can see two workarounds using libc::malloc(), but both of them have disadvantages:
Give the user of the library a slice that he must fill (inconvenient, and imposes length restrictions)
Take his String output, copy it to an array allocated by malloc, and then free the String (useless copy and allocation)
Can anybody see a better solution?
Here is an equivalent of the interface of the C library, and the implementation of the ideal case (if the C library could free a String allocated in Rust)
extern crate libc;
use std::ffi::CString;
use libc::*;
use std::mem;
extern "C" {
// The second parameter of this function gets passed as an argument of my callback
fn need_callback(callback: extern fn(arbitrary_data: *mut c_void) -> *mut c_char,
arbitrary_data: *mut c_void);
}
// This function must return a C-style char[] that will be freed by the C library
extern "C" fn my_callback(arbitrary_data: *mut c_void) -> *mut c_char {
unsafe {
// Recover the closure reference that call_callback() passed through the C library
let user_callback: *mut &mut dyn FnMut() -> String = arbitrary_data as *mut _;
let user_string = (*user_callback)();
let c_string = CString::new(user_string).unwrap();
// into_raw() gives up ownership so Rust won't free the buffer,
// but the buffer still comes from Rust's allocator, not malloc
c_string.into_raw()
}
}
pub fn call_callback(mut user_callback: &mut dyn FnMut() -> String) {
unsafe {
need_callback(my_callback, &mut user_callback as *mut _ as *mut c_void);
}
}
The C part would be equivalent to this:
#include <stdlib.h>
typedef char* (*callback)(void *arbitrary_data);
void need_callback(callback cb, void *arbitrary_data) {
char *user_return = cb(arbitrary_data);
free(user_return); // Complains as the pointer has been allocated with jemalloc
}
It might require some annoying work on your part, but what about exposing a type that implements Write, but is backed by memory allocated via malloc? Then, your client can use the write! macro (and friends) instead of allocating a String.
Here's how it currently works with Vec:
use std::io::Write;
let mut v = Vec::new();
write!(&mut v, "hello, world").unwrap();
You would "just" need to implement the two required methods, write and flush, and then you would have a stream-like interface.
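A sketch of that idea, with loud assumptions: MallocWriter is a hypothetical name, realloc and free are declared directly in an extern block (the libc crate exposes the same functions), and the growth policy is arbitrary. The buffer always keeps room for a trailing NUL, so the finished data can be handed to C as a char * that the C library later releases with free().

```rust
use std::ffi::c_void;
use std::io::{self, Write};
use std::os::raw::c_char;
use std::ptr;

extern "C" {
    fn realloc(p: *mut c_void, size: usize) -> *mut c_void;
    fn free(p: *mut c_void);
}

/// Hypothetical writer whose buffer lives in malloc'd memory.
pub struct MallocWriter {
    buf: *mut u8,
    len: usize,
    cap: usize,
}

impl MallocWriter {
    pub fn new() -> Self {
        MallocWriter { buf: ptr::null_mut(), len: 0, cap: 0 }
    }

    /// Grows the buffer so `extra` more bytes plus a trailing NUL fit.
    fn reserve(&mut self, extra: usize) -> io::Result<()> {
        let needed = self.len.checked_add(extra).and_then(|n| n.checked_add(1))
            .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "size overflow"))?;
        if needed <= self.cap {
            return Ok(());
        }
        let new_cap = needed.max(self.cap.saturating_mul(2)).max(16);
        // realloc(NULL, n) behaves like malloc(n), so the first call works too.
        let p = unsafe { realloc(self.buf as *mut c_void, new_cap) } as *mut u8;
        if p.is_null() {
            return Err(io::Error::new(io::ErrorKind::Other, "out of memory"));
        }
        self.buf = p;
        self.cap = new_cap;
        Ok(())
    }

    /// NUL-terminates the data and gives up ownership; the receiver must free() it.
    pub fn into_c_string(mut self) -> io::Result<*mut c_char> {
        self.reserve(0)?;
        unsafe { *self.buf.add(self.len) = 0 };
        let p = self.buf as *mut c_char;
        std::mem::forget(self); // skip Drop so the buffer is not freed here
        Ok(p)
    }
}

impl Write for MallocWriter {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.reserve(data.len())?;
        unsafe { ptr::copy_nonoverlapping(data.as_ptr(), self.buf.add(self.len), data.len()) };
        self.len += data.len();
        Ok(data.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl Drop for MallocWriter {
    fn drop(&mut self) {
        unsafe { free(self.buf as *mut c_void) } // free(NULL) is a no-op
    }
}

fn main() {
    let mut w = MallocWriter::new();
    write!(w, "hello, {}", "world").unwrap();
    let p = w.into_c_string().unwrap();
    // Stand-in for the C library: read the string, then release it with free().
    unsafe {
        println!("{}", std::ffi::CStr::from_ptr(p).to_str().unwrap());
        free(p as *mut c_void);
    }
}
```

In my_callback, the user's closure could then receive a &mut MallocWriter and use write!, with into_c_string() producing the pointer to return. Any NUL byte written into the middle of the data would make the string look truncated to the C side.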
