I've got a C function that (simplified) looks like this:
static char buffer[13];
void get_string(const char **s) {
sprintf(buffer, "Hello World!");
*s = buffer;
}
I've declared it in Rust:
extern pub fn get_string(s: *mut *const c_char);
But I can't figure out the required incantation to call it, and convert the result to a Rust string. Everything I've tried either fails to compile, or causes a SEGV.
Any pointers?
First of all, char in Rust is not equivalent to char in C:
The char type represents a single character. More specifically, since 'character' isn't a well-defined concept in Unicode, char is a 'Unicode scalar value', which is similar to, but not the same as, a 'Unicode code point'.
In Rust, the equivalent is u8 or i8, depending on the platform. You can use std::os::raw::c_char for this:
Equivalent to C's char type.
C's char type is completely unlike Rust's char type; while Rust's type represents a unicode scalar value, C's char type is just an ordinary integer. This type will always be either i8 or u8, as the type is defined as being one byte long.
C chars are most commonly used to make C strings. Unlike Rust, where the length of a string is included alongside the string, C strings mark the end of a string with the character '\0'. See CStr for more information.
First, we need a variable, which can be passed to the function:
let mut ptr: *const c_char = std::ptr::null();
To pass it as a *mut, you can simply use a mutable reference:
get_string(&mut ptr);
Now use the *const c_char for creating a CStr:
let c_str = CStr::from_ptr(ptr);
For converting it to a String you can choose:
c_str.to_string_lossy().to_string()
or
c_str.to_str().unwrap().to_string()
However, you shouldn't use String if you don't really need to. In most scenarios, a Cow<str> fulfills the needs. It can be obtained with c_str.to_string_lossy():
If the contents of the CStr are valid UTF-8 data, this function will return a Cow::Borrowed(&str) with the corresponding &str slice. Otherwise, it will replace any invalid UTF-8 sequences with U+FFFD REPLACEMENT CHARACTER and return a Cow::Owned(String) with the result.
You can see this in action on the Playground. This Playground shows the usage with to_string_lossy().
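Putting these steps together, here is a minimal sketch that runs without a C toolchain. The static BUFFER and the Rust-side get_string are stand-ins for the C code in the question (assumptions for illustration, not a real binding), and fetch_string is a hypothetical helper name:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::ptr;

// Stand-in for the C side: a NUL-terminated static buffer (assumption for this sketch).
static BUFFER: [u8; 13] = *b"Hello World!\0";

// Hypothetical Rust re-implementation of the C `get_string`, so the example links without C.
extern "C" fn get_string(s: *mut *const c_char) {
    // Write the address of the buffer through the out-pointer.
    unsafe { *s = BUFFER.as_ptr() as *const c_char };
}

// Calls `get_string` and copies the result into an owned Rust String.
fn fetch_string() -> String {
    let mut ptr: *const c_char = ptr::null();
    get_string(&mut ptr);
    assert!(!ptr.is_null(), "get_string did not set the pointer");
    // SAFETY: `ptr` now points at a NUL-terminated buffer that lives for the whole program.
    unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned()
}

fn main() {
    println!("{}", fetch_string());
}
```

Starting from a null pointer means a callee that forgets to write the out-parameter is caught by the assertion instead of reading garbage.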
Combine Passing a Rust variable to a C function that expects to be able to modify it
unsafe {
let mut c_buf = std::ptr::null();
get_string(&mut c_buf);
}
With How do I convert a C string into a Rust string and back via FFI?:
extern crate libc;
use libc::c_char;
use std::ffi::CStr;
use std::str;
extern "C" {
fn get_string(s: *mut *const c_char);
}
fn main() {
unsafe {
let mut c_buf = std::ptr::null();
get_string(&mut c_buf);
let c_str = CStr::from_ptr(c_buf);
let str_slice: &str = c_str.to_str().unwrap();
let str_buf: String = str_slice.to_owned(); // if necessary
};
}
How does one create a std::ffi::CString from a String in Rust?
Assume the String is already stored in a variable that can be moved if necessary, NOT a literal like it is in many of the examples for constructing a CString.
I have studied the docs for both CString:
https://doc.rust-lang.org/std/ffi/struct.CString.html
and String:
https://doc.rust-lang.org/std/string/struct.String.html
and I still don't see the path. Presumably you have to go through one of the many pointer types; Into and From don't appear to be implemented for these types, so .into() doesn't work.
String implements Into<Vec<u8>> already:
use std::ffi::CString;
fn main() {
let ss = "Hello world".to_string();
let s = CString::new(ss).unwrap();
println!("{:?}", s);
}
Playground
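As a small sketch of what the unwrap above is guarding against: CString::new fails if the input contains an interior NUL byte, and into_string converts back when the bytes are valid UTF-8. The helper name to_c_string is hypothetical:

```rust
use std::ffi::CString;

// Moves the String into a CString; returns None if it contains an interior NUL byte.
fn to_c_string(s: String) -> Option<CString> {
    CString::new(s).ok()
}

fn main() {
    let owned = String::from("Hello world");
    let c = to_c_string(owned).expect("no interior NUL");
    // The CString owns the bytes plus a trailing NUL terminator.
    println!("{:?}", c.as_bytes_with_nul());

    // An interior NUL cannot be represented in a C string.
    assert!(to_c_string(String::from("a\0b")).is_none());

    // Converting back to a String consumes the CString.
    let back: String = c.into_string().expect("valid UTF-8");
    println!("{}", back);
}
```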
The first part of the question is probably pretty common and there are enough code samples that explain how to generate a random string of alphanumerics. The piece of code I use is from here.
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;
fn main() {
let rand_string: String = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect();
println!("{}", rand_string);
}
However, this piece of code does not compile (note: I'm on nightly):
error[E0277]: a value of type `String` cannot be built from an iterator over elements of type `u8`
--> src/main.rs:8:10
|
8 | .collect();
| ^^^^^^^ value of type `String` cannot be built from `std::iter::Iterator<Item=u8>`
|
= help: the trait `FromIterator<u8>` is not implemented for `String`
Ok, the elements that are generated are of type u8. So I guess this is an array or vector of u8:
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;
fn main() {
let r = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect::<Vec<_>>();
let s = String::from_utf8_lossy(&r);
println!("{}", s);
}
And this compiles and works!
2dCsTqoNUR1f0EzRV60IiuHlaM4TfK
All good, except that I would like to ask if someone could explain what exactly happens regarding the types and how this can be optimised.
Questions
.sample_iter(&Alphanumeric) produces u8 and not chars?
How can I avoid the second variable s and directly interpret a u8 as a UTF-8 character? I guess the representation in memory would not change at all?
The length of these strings should always be 30. How can I optimise the heap allocation of a Vec away? Also they could actually be char[] instead of Strings.
.sample_iter(&Alphanumeric) produces u8 and not chars?
Yes, this was changed in rand v0.8. You can see in the docs for 0.7.3:
impl Distribution<char> for Alphanumeric
But then in the docs for 0.8.0:
impl Distribution<u8> for Alphanumeric
How can I avoid the second variable s and directly interpret a u8 as a UTF-8 character? I guess the representation in memory would not change at all?
There are a couple of ways to do this, the most obvious being to just cast every u8 to a char:
let s: String = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.map(|x| x as char)
.collect();
Or, using the From<u8> instance of char:
let s: String = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.map(char::from)
.collect();
Of course here, since you know every u8 must be valid UTF-8, you can use String::from_utf8_unchecked, which is faster than from_utf8_lossy (although probably around the same speed as the as char method):
let s = unsafe {
String::from_utf8_unchecked(
thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect::<Vec<_>>(),
)
};
If, for some reason, the unsafe bothers you and you want to stay safe, you can use the slower String::from_utf8 and unwrap the Result, so that invalid data would cause a panic instead of undefined behavior (although this code should never trigger either):
let s = String::from_utf8(
thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect::<Vec<_>>(),
).unwrap();
The length of these strings should always be 30. How can I optimise the heap allocation of a Vec away? Also they could actually be char[] instead of Strings.
First of all, trust me, you don't want arrays of chars. They are not fun to work with. If you want a stack string, have a u8 array then use a function like std::str::from_utf8 or the faster std::str::from_utf8_unchecked (again only usable since you know valid utf8 will be generated.)
As to optimizing the heap allocation away, refer to this answer. Basically, it's not possible without a bit of hackiness/ugliness (such as making your own function that collects an iterator into an array of 30 elements).
Once const generics are finally stabilized, there'll be a much prettier solution.
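Const generics have since been stabilized (Rust 1.51), so such a collecting function can be sketched as follows. This is only an illustration: collect_array is a hypothetical helper, and the cycling letter iterator stands in for thread_rng().sample_iter(&Alphanumeric) so that the example has no external dependencies:

```rust
// Collects exactly N bytes from an iterator into a stack array, without heap allocation.
// Returns None if the iterator runs out before N items were produced.
fn collect_array<I, const N: usize>(mut iter: I) -> Option<[u8; N]>
where
    I: Iterator<Item = u8>,
{
    let mut out = [0u8; N];
    for slot in out.iter_mut() {
        *slot = iter.next()?;
    }
    Some(out)
}

fn main() {
    // Stand-in for the random alphanumeric source (assumption for this sketch).
    let source = (b'a'..=b'z').cycle();
    let bytes: [u8; 30] = collect_array(source).expect("iterator too short");
    // ASCII bytes are always valid UTF-8, so the checked conversion succeeds.
    let s: &str = std::str::from_utf8(&bytes).expect("ASCII is valid UTF-8");
    println!("{}", s);
}
```

The array length is inferred from the type annotation on the binding, so the same helper works for any fixed size.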
The first example in the docs for rand::distributions::Alphanumeric shows that if you want to convert the u8s into chars you should map them using the char::from function:
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;
fn main() {
let rand_string: String = thread_rng()
.sample_iter(&Alphanumeric)
.map(char::from) // map added here
.take(30)
.collect();
println!("{}", rand_string);
}
playground
I want to return a vector in a pub extern "C" fn. Since a vector has an arbitrary length, I guess I need to return a struct with
the pointer to the vector, and
the number of elements in the vector
My current code is:
extern crate libc;
use self::libc::{size_t, int32_t, int64_t};
// struct to represent an array and its size
#[repr(C)]
pub struct array_and_size {
values: int64_t, // this is probably not how you denote a pointer, right?
size: int32_t,
}
// The vector I want to return the address of is already in a Boxed struct,
// which I have a pointer to, so I guess the vector is on the heap already.
// Dunno if this changes/simplifies anything?
#[no_mangle]
pub extern "C" fn rle_show_values(ptr: *mut Rle) -> array_and_size {
let rle = unsafe {
assert!(!ptr.is_null());
&mut *ptr
};
// this is the Vec<i32> I want to return
// the address and length of
let values = rle.values;
let length = values.len();
array_and_size {
values: Box::into_raw(Box::new(values)),
size: length as i32,
}
}
#[derive(Debug, PartialEq)]
pub struct Rle {
pub values: Vec<i32>,
}
The error I get is
$ cargo test
Compiling ranges v0.1.0 (file:///Users/users/havpryd/code/rust-ranges)
error[E0308]: mismatched types
--> src/rle.rs:52:17
|
52 | values: Box::into_raw(Box::new(values)),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected i64, found *-ptr
|
= note: expected type `i64`
= note: found type `*mut std::vec::Vec<i32>`
error: aborting due to previous error
error: Could not compile `ranges`.
To learn more, run the command again with --verbose.
-> exit code: 101
I posted the whole thing because I could not find an example of returning arrays/vectors in the eminently useful Rust FFI Omnibus.
Is this the best way to return a vector of unknown size from Rust? How do I fix my remaining compile error? Thanks!
Bonus q: if the fact that my vector is in a struct changes the answer, perhaps you could also show how to do this if the vector was not in a Boxed struct already (which I think means the vector it owns is on the heap too)? I guess many people looking up this q will not have their vectors boxed already.
Bonus q2: I only return the vector to view its values (in Python), but I do not want to let the calling code change the vector. But I guess there is no way to make the memory read-only and ensure the calling code does not fudge with the vector? const is just for showing intent, right?
Ps: I do not know C or Rust well, so my attempt might be completely WTF.
pub struct array_and_size {
values: int64_t, // this is probably not how you denote a pointer, right?
size: int32_t,
}
First of all, you're correct. The type you want for values is *mut int32_t.
In general, and note that there are a variety of C coding styles, C often doesn't "like" returning ad-hoc sized array structs like this. The more common C API would be
int32_t rle_values_size(RLE *rle);
int32_t *rle_values(RLE *rle);
(Note: many internal programs do in fact use sized array structs, but this is by far the most common for user-facing libraries because it's automatically compatible with the most basic way of representing arrays in C).
In Rust, this would translate to:
extern "C" fn rle_values_size(rle: *mut RLE) -> int32_t
extern "C" fn rle_values(rle: *mut RLE) -> *mut int32_t
The size function is straightforward, to return the array, simply do
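extern "C" fn rle_values(rle: *mut RLE) -> *mut int32_t {
    // as_mut_ptr also works for an empty Vec, where indexing with [0] would panic
    let rle = unsafe { &mut *rle };
    rle.values.as_mut_ptr()
}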
This gives a raw pointer to the first element of the Vec's underlying buffer, which is all C-style arrays really are.
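As a rough, self-contained sketch of this size/pointer pair, the following uses a simplified Rle struct and plays the role of the C caller from Rust's main. It uses usize for the length and *const i32 for the pointer, which are choices made for this illustration rather than what the question used:

```rust
pub struct Rle {
    pub values: Vec<i32>,
}

/// Number of elements in `values` (0 for a null handle).
pub extern "C" fn rle_values_size(rle: *const Rle) -> usize {
    if rle.is_null() {
        return 0;
    }
    // SAFETY: the caller promises `rle` points to a live Rle.
    let rle = unsafe { &*rle };
    rle.values.len()
}

/// Pointer to the first element; valid only while the Rle is alive and unmodified.
pub extern "C" fn rle_values(rle: *const Rle) -> *const i32 {
    if rle.is_null() {
        return std::ptr::null();
    }
    // SAFETY: the caller promises `rle` points to a live Rle.
    let rle = unsafe { &*rle };
    rle.values.as_ptr()
}

fn main() {
    // Pretend to be the foreign caller: hold an opaque handle, then query it.
    let handle: *mut Rle = Box::into_raw(Box::new(Rle { values: vec![1, 2, 3] }));
    let (ptr, len) = (rle_values(handle), rle_values_size(handle));
    let seen = unsafe { std::slice::from_raw_parts(ptr, len) };
    println!("{:?}", seen);
    // Give ownership back to Rust so the Rle is freed.
    unsafe { drop(Box::from_raw(handle)) };
}
```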
If, instead of giving C a reference to your data you want to give C the data, the most common option would be to allow the user to pass in a buffer that you clone the data into:
extern "C" fn rle_values_buf(rle: *mut RLE, buf: *mut int32_t, len: int32_t) {
    use std::ptr;
    unsafe {
        let values = &(*rle).values;
        // Copy at most `len` elements so we overrun neither the caller's buffer nor our Vec
        let count = (len.max(0) as usize).min(values.len());
        ptr::copy_nonoverlapping(values.as_ptr(), buf, count);
    }
}
Which, from C, looks like
void rle_values_buf(RLE *rle, int32_t *buf, int32_t len);
This (shallowly) copies your data into the presumably C-allocated buffer, which the C user is then responsible for destroying. It also prevents multiple mutable copies of your array from floating around at the same time (assuming you don't implement the version that returns a pointer).
Note that you could sort of "move" the array into C as well, but it's not particularly recommended: it involves the use of mem::forget and expecting the C user to explicitly call a destruction function, and it requires both you and the user to obey some discipline that may be difficult to structure the program around.
If you want to receive an array from C, you essentially just ask for both a *mut i32 and an i32 corresponding to the buffer start and length. You can assemble this into a slice using the from_raw_parts function, and then use the to_vec function to create an owned Vec containing the values, allocated from the Rust side. If you don't plan on needing to own the values, you can simply pass around the slice you produced via from_raw_parts.
However, it is imperative that all values be initialized from either side, typically to zero. Otherwise you invoke legitimately undefined behavior which often results in segmentation faults (which tend to frustratingly disappear when inspected with GDB).
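The receiving direction described above can be sketched like this. The function name sum_values is hypothetical, and the length is taken as usize rather than i32 to avoid a sign check (an assumption of this sketch):

```rust
use std::slice;

/// Hypothetical entry point: receives a pointer and a length from the caller,
/// copies the values into an owned Vec, and returns their sum.
pub extern "C" fn sum_values(data: *const i32, len: usize) -> i64 {
    if data.is_null() || len == 0 {
        return 0;
    }
    // SAFETY: the caller promises `data` points to `len` initialized i32 values.
    let borrowed: &[i32] = unsafe { slice::from_raw_parts(data, len) };
    // to_vec copies the values into memory owned by the Rust side.
    let owned: Vec<i32> = borrowed.to_vec();
    owned.iter().map(|&v| i64::from(v)).sum()
}

fn main() {
    // Stand-in for a C array passed across the boundary.
    let c_side = [1, 2, 3, 4];
    println!("{}", sum_values(c_side.as_ptr(), c_side.len()));
}
```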
There are multiple ways to pass an array to C.
First of all, while C has the concept of fixed-size arrays (int a[5] has type int[5] and sizeof(a) will return 5 * sizeof(int)), it is not possible to directly pass an array to a function or return an array from it.
On the other hand, it is possible to wrap a fixed size array in a struct and return that struct.
Furthermore, when using an array, all elements must be initialized, otherwise a memcpy technically has undefined behavior (as it is reading from undefined values) and valgrind will definitely report the issue.
Using a dynamic array
A dynamic array is an array whose length is unknown at compile-time.
One may choose to return a dynamic array if no reasonable upper bound is known, or if this bound is deemed too large for passing by value.
There are two ways to handle this situation:
ask C to pass a suitably sized buffer
allocate a buffer and return it to C
They differ in who allocates the memory: the former is simpler, but may require either a way to hint at a suitable size or the ability to "rewind" if the size proves unsuitable.
Ask C to pass a suitably sized buffer
// file.h
int rust_func(int32_t* buffer, size_t buffer_length);
// file.rs
#[no_mangle]
pub extern fn rust_func(buffer: *mut libc::int32_t, buffer_length: libc::size_t) -> libc::c_int {
// your code here
}
Note the existence of std::slice::from_raw_parts_mut to transform this pointer + length into a mutable slice (do initialize it with 0s before making it a slice or ask the client to).
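A possible body for such a function, as a sketch only: fill_buffer and its fixed data are hypothetical, and returning the number of elements written (or -1 for a null buffer) is just one convention among several:

```rust
use std::slice;

/// Writes as many values as fit into the caller's buffer.
/// Returns the number of elements written, or -1 if `buffer` is null.
pub extern "C" fn fill_buffer(buffer: *mut i32, buffer_length: usize) -> i32 {
    if buffer.is_null() {
        return -1;
    }
    // SAFETY: the caller promises `buffer` points to `buffer_length` initialized i32 values.
    let out: &mut [i32] = unsafe { slice::from_raw_parts_mut(buffer, buffer_length) };
    // Hypothetical payload for the sketch.
    let data = [10, 20, 30];
    let n = data.len().min(out.len());
    out[..n].copy_from_slice(&data[..n]);
    n as i32
}

fn main() {
    // The caller (normally C) owns a zero-initialized buffer.
    let mut buf = [0i32; 5];
    let written = fill_buffer(buf.as_mut_ptr(), buf.len());
    println!("{} written: {:?}", written, buf);
}
```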
Allocate a buffer and return it to C
// file.h
struct DynArray {
    int32_t* array;
    size_t length;
};
struct DynArray rust_alloc(void);
void rust_free(struct DynArray);
// file.rs
#[repr(C)]
struct DynArray {
array: *mut libc::int32_t,
length: libc::size_t,
}
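#[no_mangle]
pub extern fn rust_alloc() -> DynArray {
    let v: Vec<i32> = vec!(...);
    // A boxed slice has no spare capacity, so pointer + length is enough to free it later
    let mut boxed = v.into_boxed_slice();
    let result = DynArray {
        array: boxed.as_mut_ptr(),
        length: boxed.len() as _,
    };
    std::mem::forget(boxed);
    result
}
#[no_mangle]
pub extern fn rust_free(array: DynArray) {
    if !array.array.is_null() {
        // Rebuild the boxed slice from pointer + length, then drop it to free the memory
        let raw = std::ptr::slice_from_raw_parts_mut(array.array, array.length);
        unsafe { drop(Box::from_raw(raw)); }
    }
}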
Using a fixed-size array
Similarly, a struct containing a fixed size array can be used. Note that both in Rust and C all elements should be initialized, even if unused; zeroing them works well.
Similarly to the dynamic case, it can be either passed by mutable pointer or returned by value.
// file.h
struct FixedArray {
int32_t array[32];
};
// file.rs
#[repr(C)]
struct FixedArray {
array: [libc::int32_t; 32],
}
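Both ways of handing over the fixed-size struct could look roughly like this. The function names first_squares and fill_fixed are hypothetical and exist only for this sketch:

```rust
#[repr(C)]
#[derive(Clone, Copy)]
pub struct FixedArray {
    pub array: [i32; 32],
}

/// Returns the struct by value; slots beyond `count` stay zeroed.
pub extern "C" fn first_squares(count: usize) -> FixedArray {
    let mut result = FixedArray { array: [0; 32] };
    for (i, slot) in result.array.iter_mut().take(count).enumerate() {
        *slot = (i * i) as i32;
    }
    result
}

/// Fills a caller-provided struct through a mutable pointer.
pub extern "C" fn fill_fixed(out: *mut FixedArray, value: i32) {
    if out.is_null() {
        return;
    }
    // SAFETY: the caller promises `out` points to a valid FixedArray.
    let out = unsafe { &mut *out };
    out.array = [value; 32];
}

fn main() {
    let by_value = first_squares(4);
    println!("{:?}", &by_value.array[..5]);

    let mut by_pointer = FixedArray { array: [0; 32] };
    fill_fixed(&mut by_pointer, 7);
    println!("{:?}", &by_pointer.array[..3]);
}
```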
What is wrong with this:
fn main() {
let word: &str = "lowks";
assert_eq!(word.chars().rev(), "skwol");
}
I get an error like this:
error[E0369]: binary operation `==` cannot be applied to type `std::iter::Rev<std::str::Chars<'_>>`
--> src/main.rs:4:5
|
4 | assert_eq!(word.chars().rev(), "skwol");
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: an implementation of `std::cmp::PartialEq` might be missing for `std::iter::Rev<std::str::Chars<'_>>`
= note: this error originates in a macro outside of the current crate
What is the correct way to do this?
Since, as @DK. suggested, .graphemes() isn't available on &str in stable, you might as well just do what @huon suggested in the comments:
fn main() {
let foo = "palimpsest";
println!("{}", foo.chars().rev().collect::<String>());
}
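To make the original assertion work at the code-point level, you can either collect into a String first or compare the two iterators directly with Iterator::eq. A small sketch, with hypothetical helper names:

```rust
// Reverses the code points of a string into a new String.
fn reversed(s: &str) -> String {
    s.chars().rev().collect()
}

// Compares lazily, without allocating: is `a` the code-point reversal of `b`?
fn is_reverse_of(a: &str, b: &str) -> bool {
    a.chars().rev().eq(b.chars())
}

fn main() {
    let word: &str = "lowks";
    assert_eq!(reversed(word), "skwol");
    assert!(is_reverse_of(word, "skwol"));
    println!("{}", reversed(word));
}
```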
The first, and most fundamental, problem is that this isn't how you reverse a Unicode string. You are reversing the order of the code points, where you want to reverse the order of graphemes. There may be other issues with this that I'm not aware of. Text is hard.
The second issue is pointed out by the compiler: you are trying to compare a string literal to a char iterator. chars and rev don't produce new strings, they produce lazy sequences, as with iterators in general. The following works:
/*!
Add the following to your `Cargo.toml`:
```cargo
[dependencies]
unicode-segmentation = "0.1.2"
```
*/
extern crate unicode_segmentation;
use unicode_segmentation::UnicodeSegmentation;
fn main() {
let word: &str = "loẅks";
let drow: String = word
// Split the string into an Iterator of &strs, where each element is an
// extended grapheme cluster.
.graphemes(true)
// Reverse the order of the grapheme iterator.
.rev()
// Collect all the grapheme clusters into a new owned String.
.collect();
assert_eq!(drow, "skẅol");
// Print it out to be sure.
println!("drow = `{}`", drow);
}
Note that graphemes used to be in the standard library as an unstable method, so the above will break with sufficiently old versions of Rust. In that case, you need to use UnicodeSegmentation::graphemes(s, true) instead.
If you are only dealing with ASCII characters, you can reverse the bytes in place with the reverse method for slices.
It does something like this:
fn main() {
    let mut slice = *b"lowks";
    let len = slice.len();
    // Swap the outer pairs, moving inwards; this also handles even and empty lengths
    for i in 0..len / 2 {
        slice.swap(i, len - 1 - i);
    }
    assert_eq!(std::str::from_utf8(&slice).unwrap(), "skwol");
}
Playground
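For an owned String, one way to apply the same idea without unsafe code is to check that the content is ASCII, reverse the underlying bytes, and rebuild the String from the same allocation. The helper name reverse_ascii is hypothetical:

```rust
// Reverses an ASCII-only String by reversing its bytes, reusing the allocation.
// Returns None for non-ASCII input, where reversing bytes would corrupt the UTF-8.
fn reverse_ascii(s: String) -> Option<String> {
    if !s.is_ascii() {
        return None;
    }
    let mut bytes = s.into_bytes();
    bytes.reverse();
    // Reversed ASCII is still ASCII, so this conversion cannot fail here.
    String::from_utf8(bytes).ok()
}

fn main() {
    println!("{:?}", reverse_ascii(String::from("lowks")));
    println!("{:?}", reverse_ascii(String::from("loẅks")));
}
```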
How do I convert a String into a &str? More specifically, I would like to convert it into a str with the static lifetime (&'static str).
Updated for Rust 1.0
You cannot obtain a &'static str from a String because Strings may not live for the entire life of your program, and that's what the 'static lifetime means. You can only get a slice whose lifetime is bounded by the String's own lifetime.
To go from a String to a slice &'a str you can use slicing syntax:
let s: String = "abcdefg".to_owned();
let s_slice: &str = &s[..]; // take a full slice of the string
Alternatively, you can use the fact that String implements Deref<Target=str> and perform an explicit reborrowing:
let s_slice: &str = &*s; // s : String
// *s : str (via Deref<Target=str>)
// &*s: &str
There is even another way which allows for even more concise syntax but it can only be used if the compiler is able to determine the desired target type (e.g. in function arguments or explicitly typed variable bindings). It is called deref coercion and it allows using just & operator, and the compiler will automatically insert an appropriate amount of *s based on the context:
let s_slice: &str = &s; // okay
fn take_name(name: &str) { ... }
take_name(&s); // okay as well
let not_correct = &s; // this will give &String, not &str,
// because the compiler does not know
// that you want a &str
Note that this pattern is not unique for String/&str - you can use it with every pair of types which are connected through Deref, for example, with CString/CStr and OsString/OsStr from std::ffi module or PathBuf/Path from std::path module.
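A short sketch of the same coercion with the other pairs mentioned. The three functions are hypothetical and only exist to provide a borrowed target type for the compiler to coerce to:

```rust
use std::ffi::{CStr, CString};
use std::path::{Path, PathBuf};

fn name_len(name: &str) -> usize {
    name.len()
}

fn c_len(s: &CStr) -> usize {
    s.to_bytes().len()
}

fn has_extension(p: &Path) -> bool {
    p.extension().is_some()
}

fn main() {
    let s = String::from("abcdefg");
    let c = CString::new("abc").unwrap();
    let p = PathBuf::from("notes.txt");
    // &String -> &str, &CString -> &CStr, &PathBuf -> &Path, all via deref coercion.
    println!("{} {} {}", name_len(&s), c_len(&c), has_extension(&p));
}
```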
You can do it, but it involves leaking the memory of the String. This is not something you should do lightly. By leaking the memory of the String, we guarantee that the memory will never be freed (thus the leak). Therefore, any references to the inner object can be interpreted as having the 'static lifetime.
fn string_to_static_str(s: String) -> &'static str {
Box::leak(s.into_boxed_str())
}
fn main() {
let mut s = String::new();
std::io::stdin().read_line(&mut s).unwrap();
let s: &'static str = string_to_static_str(s);
}
As of Rust version 1.26, it is possible to convert a String to &'static str without using unsafe code:
fn string_to_static_str(s: String) -> &'static str {
Box::leak(s.into_boxed_str())
}
This converts the String instance into a boxed str and immediately leaks it. This frees all excess capacity the string may currently occupy.
Note that there are almost always solutions that are preferable over leaking objects, e.g. using the crossbeam crate if you want to share state between threads.
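As one illustration of such an alternative, and assuming Rust 1.63 or later, the standard library's scoped threads let spawned threads borrow a String without requiring 'static. The helper count_in_threads is hypothetical:

```rust
use std::thread;

// Borrows `s` from two threads at once; no 'static lifetime and no leak required.
fn count_in_threads(s: &str) -> (usize, usize) {
    thread::scope(|scope| {
        let bytes = scope.spawn(|| s.len());
        let chars = scope.spawn(|| s.chars().count());
        // Both threads are joined before the scope ends, so the borrow stays valid.
        (bytes.join().unwrap(), chars.join().unwrap())
    })
}

fn main() {
    let s = String::from("héllo");
    let (bytes, chars) = count_in_threads(&s);
    println!("{} bytes, {} chars", bytes, chars);
}
```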
TL;DR: you can get a &'static str from a String which itself has a 'static lifetime.
Although the other answers are correct and most useful, there's a (not so useful) edge case, where you can indeed convert a String to a &'static str:
The lifetime of a reference must always be shorter than or equal to the lifetime of the referenced object. In other words, the referenced object has to live at least as long as the reference. Since 'static means the entire lifetime of a program, a longer lifetime does not exist. But an equal lifetime is sufficient. So if a String has a lifetime of 'static, you can get a &'static str reference from it.
Creating a static of type String became possible in principle with Rust 1.31, when the const fn feature was released. String::new() is a const function; it was initially behind a feature gate (requiring nightly), but has been usable as a const fn on stable since Rust 1.39.
So the following code does the desired conversion. It has no practical use beyond showing, for completeness, that it is possible in this edge case.
static MY_STRING: String = String::new();
fn do_something(_: &'static str) {
// ...
}
fn main() {
do_something(&MY_STRING);
}