casting *mut u8 to &[u8] without std - rust

I'm writing Rust code for WebAssembly to handle strings from JavaScript land.
Since WebAssembly has no real string type, I'm trying to pass a pointer into the WebAssembly memory object that points to a UTF-8 encoded string.
#[no_mangle]
pub extern "C" fn check(ptr: *mut u8, length: u32) -> u32 {
unsafe {
let buf: &[u8] = std::slice::from_raw_parts(ptr, length as usize);
// do some operations on buf
0
}
}
It works fine, except that I have to depend on the std crate, which bloats the final binary to about 600KB.
Is there any way to get rid of std::slice::from_raw_parts but still be able to cast a raw pointer to a slice?

You cannot cast a raw pointer to a slice because in Rust, a slice is not a mere pointer, it is a pointer and a size (otherwise it could not be safe).
If you do not want to use std, you can use the core crate:
extern crate core;
#[no_mangle]
pub extern "C" fn check(ptr: *mut u8, length: u32) -> u32 {
unsafe {
let buf: &mut [u8] = core::slice::from_raw_parts_mut(ptr, length as usize);
// do some operations on buf (it is only in scope inside this block)
}
0
}
The core crate is the subset of the std crate that needs neither heap allocation nor operating system support, which makes it suitable for embedded and other no_std targets.
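As a minimal sketch of what "do some operations on buf" could look like with only core, the helper below builds the slice with core::slice::from_raw_parts and validates it with core::str::from_utf8. The name `is_valid_utf8` and the 1/0 return convention are illustrative assumptions, not part of the question.

```rust
/// Hypothetical helper: returns 1 if the buffer holds valid UTF-8, 0 otherwise.
///
/// # Safety
/// `ptr` must be null or valid for reads of `length` bytes during the call.
pub unsafe fn is_valid_utf8(ptr: *const u8, length: u32) -> u32 {
    if ptr.is_null() {
        return 0;
    }
    // SAFETY: the caller guarantees that `ptr` is valid for `length` bytes.
    let buf: &[u8] = unsafe { core::slice::from_raw_parts(ptr, length as usize) };
    // `core::str::from_utf8` needs no allocator, so it also works under `#![no_std]`.
    match core::str::from_utf8(buf) {
        Ok(_) => 1,
        Err(_) => 0,
    }
}
```

Only `core::` paths are used, so the same function body should compile unchanged in a `#![no_std]` crate.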

It is possible to manually construct something similar to a slice, which is a fat pointer that consists of a thin pointer and a length, and then cast a pointer to this construct to a pointer to a slice.
This approach is not only unsafe, it also relies on Rust internals (the memory layout of a slice) that are not guaranteed to remain stable between compiler versions, or even between systems, I suppose. @Boiethios's answer is the way to go if you want to be sure that your code works correctly in the future. However, for educational purposes, the code below may still be interesting:
unsafe fn make_slice<'a>(ptr: *const u8, len: usize) -> &'a [u8] {
// place pointer address and length in contiguous memory
let x: [usize; 2] = [ptr as usize, len];
// cast pointer to array as pointer to slice
let slice_ptr = &x as * const _ as *const &[u8];
// dereference pointer to slice, so we get a slice
*slice_ptr
}
fn main() {
let src: Vec<u8> = vec![1, 2, 3, 4, 5, 6];
let raw_ptr = &src[1] as *const u8;
unsafe {
println!("{:?}", make_slice(raw_ptr, 3)); // [2, 3, 4]
}
}
(tested on playground with Rust Stable 1.26.2)
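For comparison, here is a sketch of the same helper that does not depend on the layout of a fat pointer. It uses ptr::slice_from_raw_parts (also available as core::ptr::slice_from_raw_parts) to build the wide raw pointer. The `demo` function is a hypothetical usage example.

```rust
use std::ptr;

/// # Safety
/// `ptr` must be valid for reads of `len` bytes for the whole lifetime `'a`.
pub unsafe fn make_slice<'a>(ptr: *const u8, len: usize) -> &'a [u8] {
    // Build the wide raw pointer through the supported API instead of
    // assuming how a slice reference is laid out in memory.
    let raw: *const [u8] = ptr::slice_from_raw_parts(ptr, len);
    // SAFETY: upheld by the caller, see the contract above.
    unsafe { &*raw }
}

/// Illustrative usage: view three bytes starting at index 1.
pub fn demo() -> Vec<u8> {
    let src: Vec<u8> = vec![1, 2, 3, 4, 5, 6];
    // Take the pointer from a sub-slice so it may be used for all three elements.
    let raw_ptr = src[1..].as_ptr();
    unsafe { make_slice(raw_ptr, 3) }.to_vec()
}
```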

Related

How to create a smart pointer to an unsized type with an embedded slice?

I'm trying to avoid multiple heap allocations by using something like C's flexible-array member. For that, I need to allocate an unsized struct, but I didn't find any way to do that through smart pointers. I'm specifically interested in Rc, but this is also the case for Box, so that's what I'll use in the example.
Here is the closest I've gotten so far:
use std::alloc::{self, Layout};
struct Inner {/* Sized fields */}
#[repr(C)] // Ensure the array is always last
// Both `inner` and `arr` need to be allocated, but preferably not separately
struct Unsized {
inner: Inner,
arr: [usize],
}
pub struct Exposed(Box<Unsized>);
impl Exposed {
pub fn new(capacity: usize) -> Self {
// Create a layout of an `Inner` followed by the array
let (layout, arr_base) = Layout::array::<usize>(capacity)
.and_then(|arr_layout| Layout::new::<Inner>().extend(arr_layout))
.unwrap();
let ptr = unsafe { alloc::alloc(layout) };
// At this point, `ptr` is `*mut u8` and the compiler doesn't know the size of the allocation
if ptr.is_null() {
panic!("Internal allocation error");
}
unsafe {
ptr.cast::<Inner>()
.write(Inner {/* Initialize sized fields */});
let tmp_ptr = ptr.add(arr_base).cast::<usize>();
// Initialize the array elements, in this case to 0
(0..capacity).for_each(|i| tmp_ptr.add(i).write(0));
// At this point everything is initialized and can safely be converted to `Box`
Self(Box::from_raw(ptr as *mut _))
}
}
}
This doesn't compile:
error[E0607]: cannot cast thin pointer `*mut u8` to fat pointer `*mut Unsized`
--> src/lib.rs:32:28
|
32 | Self(Box::from_raw(ptr as *mut _))
| ^^^^^^^^^^^^^
I could work directly with *mut u8, but that seems extremely error-prone and requires manual dropping.
Is there a way to create a fat pointer from ptr, since I actually know the allocation size, or to create a smart pointer from a compound unsized type?
The problem is that *mut Unsized is a wide pointer: it holds not just an address, but an address and the number of elements in the slice. The pointer *mut u8, on the other hand, carries no information about the length of the slice. The standard library supplies
std::ptr::slice_from_raw_parts and
std::ptr::slice_from_raw_parts_mut
for this situation. So you first create a fake (and wrong) *mut usize
ptr as *mut usize
which then allows
slice_from_raw_parts_mut(ptr as *mut usize, capacity)
to create a fake (and still wrong) *mut [usize] with the correct length field in the wide pointer, which we then unceremoniously cast
slice_from_raw_parts_mut(ptr as *mut usize, capacity) as *mut Unsized
which does nothing but change the type (the value is unchanged), so we get the correct pointer that we can now finally feed into Box::from_raw
Full example demonstrating this post:
use std::alloc::{self, Layout};
struct Inner {/* Sized fields */}
#[repr(C)] // Ensure the array is always last
// Both `inner` and `arr` need to be allocated, but preferably not separately
struct Unsized {
inner: Inner,
arr: [usize],
}
pub struct Exposed(Box<Unsized>);
impl Exposed {
pub fn new(capacity: usize) -> Self {
// Create a layout of an `Inner` followed by the array
let (layout, arr_base) = Layout::array::<usize>(capacity)
.and_then(|arr_layout| Layout::new::<Inner>().extend(arr_layout))
.unwrap();
let ptr = unsafe { alloc::alloc(layout) };
// At this point, `ptr` is `*mut u8` and the compiler doesn't know the size of the allocation
if ptr.is_null() {
panic!("Internal allocation error");
}
unsafe {
ptr.cast::<Inner>()
.write(Inner {/* Initialize sized fields */});
let tmp_ptr = ptr.add(arr_base).cast::<usize>();
// Initialize the array elements, in this case to 0
(0..capacity).for_each(|i| tmp_ptr.add(i).write(0));
}
// At this point everything is initialized and can safely be converted to `Box`
unsafe {
Self(Box::from_raw(
std::ptr::slice_from_raw_parts_mut(ptr as *mut usize, capacity) as *mut Unsized,
))
}
}
}
Side note: You do not need #[repr(C)] to make sure that the unsized slice field is at the end, that is guaranteed. What you need it for is knowing the offsets to the fields.
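To make the point about offsets concrete, here is a sketch with one concrete sized field. `Record`, `tag` and `new_record` are hypothetical names chosen for illustration. It also calls pad_to_align on the combined layout, on the assumption that the allocation layout should match the size a #[repr(C)] struct reports when the Box is dropped.

```rust
use std::alloc::{self, Layout};
use std::ptr;

#[repr(C)] // fixes field order and offsets so they match the computed layout
pub struct Record {
    pub tag: u32, // hypothetical sized field
    pub arr: [usize],
}

pub fn new_record(tag: u32, capacity: usize) -> Box<Record> {
    let arr_layout = Layout::array::<usize>(capacity).unwrap();
    // `extend` returns the combined layout and the offset of the array.
    let (layout, arr_base) = Layout::new::<u32>().extend(arr_layout).unwrap();
    // Round the size up to the alignment, as `repr(C)` struct layout does.
    let layout = layout.pad_to_align();
    unsafe {
        let ptr = alloc::alloc(layout);
        if ptr.is_null() {
            alloc::handle_alloc_error(layout);
        }
        ptr.cast::<u32>().write(tag);
        let arr = ptr.add(arr_base).cast::<usize>();
        for i in 0..capacity {
            arr.add(i).write(0);
        }
        // The slice pointer only contributes the length; the cast retypes it.
        let wide = ptr::slice_from_raw_parts_mut(ptr.cast::<usize>(), capacity);
        Box::from_raw(wide as *mut Record)
    }
}
```

Dropping the returned Box frees the single allocation, and `arr.len()` reports the capacity stored in the wide pointer.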

Creating a Vec in Rust from a C array pointer and safely freeing it?

I'm calling a C function from Rust which takes a null pointer as an argument, then allocates some memory to point it to.
What is the correct way to efficiently (i.e. avoiding unnecessary copies) and safely (i.e. avoid memory leaks or segfaults) turn data from the C pointer into a Vec?
I've got something like:
extern "C" {
// C function that allocates an array of floats
fn allocate_data(data_ptr: *mut *const f32, data_len: *mut i32);
}
fn get_vec() -> Vec<f32> {
// C will set this to length of array it allocates
let mut data_len: i32 = 0;
// C will point this at the array it allocates
let mut data_ptr: *const f32 = std::ptr::null_mut();
unsafe { allocate_data(&mut data_ptr, &mut data_len) };
let data_slice = unsafe { slice::from_raw_parts(data_ptr as *const f32, data_len as usize) };
data_slice.to_vec()
}
If I understand correctly, .to_vec() will copy data from the slice into a new Vec, so the underlying memory will still need to be freed (as the underlying memory for the slice won't be freed when it's dropped).
What is the correct approach for dealing with the above?
can I create a Vec which takes ownership of the underlying memory, which is freed when the Vec is freed?
if not, where/how in Rust should I free the memory that the C function allocated?
anything else in the above that could/should be improved on?
can I create a Vec which takes ownership of the underlying memory, which is freed when the Vec is freed?
Not safely, no. You must not use Vec::from_raw_parts unless the pointer came from a Vec originally (well, from the same memory allocator). Otherwise, you will try to free memory that your allocator doesn't know about; a very bad idea.
Note that the same thing is true for String::from_raw_parts, as a String is a wrapper for a Vec<u8>.
where/how in Rust should I free the memory that the C function allocated?
As soon as you are done with it and no sooner.
anything else in the above that could/should be improved on?
There's no need to cast the pointer when calling slice::from_raw_parts
There's no need for explicit types on the variables
Use ptr::null, not ptr::null_mut
Perform a NULL pointer check
Check the length is non-negative
use std::{ptr, slice};
extern "C" {
fn allocate_data(data_ptr: *mut *const f32, data_len: *mut i32);
fn deallocate_data(data_ptr: *const f32);
}
fn get_vec() -> Vec<f32> {
let mut data_ptr = ptr::null();
let mut data_len = 0;
unsafe {
allocate_data(&mut data_ptr, &mut data_len);
assert!(!data_ptr.is_null());
assert!(data_len >= 0);
let v = slice::from_raw_parts(data_ptr, data_len as usize).to_vec();
deallocate_data(data_ptr);
v
}
}
fn main() {}
You didn't state why you need it to be a Vec, but if you never need to change the size, you can create your own type that can be dereferenced as a slice and drops the data when appropriate:
use std::{ptr, slice};
extern "C" {
fn allocate_data(data_ptr: *mut *const f32, data_len: *mut i32);
fn deallocate_data(data_ptr: *const f32);
}
struct CVec {
ptr: *const f32,
len: usize,
}
impl std::ops::Deref for CVec {
type Target = [f32];
fn deref(&self) -> &[f32] {
unsafe { slice::from_raw_parts(self.ptr, self.len) }
}
}
impl Drop for CVec {
fn drop(&mut self) {
unsafe { deallocate_data(self.ptr) };
}
}
fn get_vec() -> CVec {
let mut ptr = ptr::null();
let mut len = 0;
unsafe {
allocate_data(&mut ptr, &mut len);
assert!(!ptr.is_null());
assert!(len >= 0);
CVec {
ptr,
len: len as usize,
}
}
}
fn main() {}
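The wrapper can be exercised without a C library by substituting Rust stand-ins for the two C functions. In the sketch below, `allocate_data` and `deallocate_data` are mock implementations (an assumption for illustration, backed by a leaked Box<[f32]> of fixed length), and the `FREED` counter exists only to show that Drop runs. Returning Option instead of asserting is also an illustrative choice.

```rust
use std::ops::Deref;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::{ptr, slice};

// Stand-ins for the C side (assumption): a fixed three-element array.
const MOCK_LEN: usize = 3;
pub static FREED: AtomicUsize = AtomicUsize::new(0);

unsafe extern "C" fn allocate_data(data_ptr: *mut *const f32, data_len: *mut i32) {
    let boxed: Box<[f32]> = vec![1.0, 2.0, 3.0].into_boxed_slice();
    unsafe {
        *data_len = MOCK_LEN as i32;
        *data_ptr = Box::into_raw(boxed).cast::<f32>() as *const f32;
    }
}

unsafe extern "C" fn deallocate_data(data_ptr: *const f32) {
    let raw = ptr::slice_from_raw_parts_mut(data_ptr as *mut f32, MOCK_LEN);
    drop(unsafe { Box::from_raw(raw) });
    FREED.fetch_add(1, Ordering::SeqCst);
}

pub struct CVec {
    ptr: *const f32,
    len: usize,
}

impl Deref for CVec {
    type Target = [f32];
    fn deref(&self) -> &[f32] {
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Drop for CVec {
    fn drop(&mut self) {
        // The memory goes back to whoever allocated it.
        unsafe { deallocate_data(self.ptr) };
    }
}

pub fn get_vec() -> Option<CVec> {
    let mut data_ptr: *const f32 = ptr::null();
    let mut data_len: i32 = 0;
    unsafe { allocate_data(&mut data_ptr, &mut data_len) };
    if data_ptr.is_null() {
        return None;
    }
    if data_len < 0 {
        // Do not leak the allocation when the length is unusable.
        unsafe { deallocate_data(data_ptr) };
        return None;
    }
    Some(CVec { ptr: data_ptr, len: data_len as usize })
}
```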
See also:
How to convert a *const pointer into a Vec to correctly drop it?
Is it possible to call a Rust function taking a Vec from C?

How to get the offset of a struct member in Rust? (offsetof) [duplicate]

I have a type:
struct Foo {
memberA: Bar,
memberB: Baz,
}
and a pointer which I know is a pointer to memberB in Foo:
p: *const Baz
What is the correct way to get a new pointer p: *const Foo which points to the original struct Foo?
My current implementation is the following, which I'm pretty sure invokes undefined behavior due to the dereference of (p as *const Foo) where p is not a pointer to a Foo:
let p2 = p as usize -
((&(*(p as *const Foo)).memberB as *const _ as usize) - (p as usize));
This is part of FFI - I can't easily restructure the code to avoid needing to perform this operation.
This is very similar to Get pointer to object from pointer to some member but for Rust, which as far as I know has no offsetof macro.
The dereference expression produces an lvalue, but that lvalue is not actually read from, we're just doing pointer math on it, so in theory, it should be well defined. That's just my interpretation though.
My solution involves using a null pointer to retrieve the offset to the field, so it's a bit simpler than yours as it avoids one subtraction (we'd be subtracting 0). I believe I saw some C compilers/standard libraries implementing offsetof by essentially returning the address of a field from a null pointer, which is what inspired the following solution.
fn main() {
let p: *const Baz = 0x1248 as *const _;
let p2: *const Foo = unsafe { ((p as usize) - (&(*(0 as *const Foo)).memberB as *const _ as usize)) as *const _ };
println!("{:p}", p2);
}
We can also define our own offset_of! macro:
macro_rules! offset_of {
($ty:ty, $field:ident) => {
unsafe { &(*(0 as *const $ty)).$field as *const _ as usize }
}
}
fn main() {
let p: *const Baz = 0x1248 as *const _;
let p2: *const Foo = ((p as usize) - offset_of!(Foo, memberB)) as *const _;
println!("{:p}", p2);
}
With the implementation of RFC 2582, raw reference MIR operator, it is now possible to get the address of a field in a struct without an instance of the struct and without invoking undefined behavior.
use std::{mem::MaybeUninit, ptr};
struct Example {
a: i32,
b: u8,
c: bool,
}
fn main() {
let offset = unsafe {
let base = MaybeUninit::<Example>::uninit();
let base_ptr = base.as_ptr();
let c = ptr::addr_of!((*base_ptr).c);
(c as usize) - (base_ptr as usize)
};
println!("{}", offset);
}
The implementation of this is tricky and nuanced. It is best to use a crate that is well-maintained, such as memoffset.
Before this functionality was stabilized, you had to have a valid instance of the struct. You can use tools like once_cell to minimize the overhead of the dummy value that you need to create:
use once_cell::sync::Lazy; // 1.4.1
struct Example {
a: i32,
b: u8,
c: bool,
}
static DUMMY: Lazy<Example> = Lazy::new(|| Example {
a: 0,
b: 0,
c: false,
});
static OFFSET_C: Lazy<usize> = Lazy::new(|| {
let base: *const Example = &*DUMMY;
let c: *const bool = &DUMMY.c;
(c as usize) - (base as usize)
});
fn main() {
println!("{}", *OFFSET_C);
}
If you must have this at compile time, you can place similar code into a build script and write out a Rust source file with the offsets. However, that will span multiple compiler invocations, so you are relying on the struct layout not changing between those invocations. Using something with a known representation would reduce that risk.
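On newer toolchains (Rust 1.77 and later), the standard library provides std::mem::offset_of!, which is evaluated at compile time. Here is a sketch of the original "pointer to member back to pointer to struct" problem using it. The field types and the #[repr(C)] attribute are assumptions, since the question does not show Bar and Baz.

```rust
use std::mem::offset_of;

#[repr(C)] // assumption: a fixed layout, as is typical for FFI structs
pub struct Foo {
    pub member_a: u64,
    pub member_b: u32,
}

// Usable in constant context, so no build script is needed.
pub const OFFSET_B: usize = offset_of!(Foo, member_b);

/// # Safety
/// `p` must point to the `member_b` field of a live `Foo` and must have been
/// derived from a pointer to that whole struct.
pub unsafe fn foo_from_member_b(p: *const u32) -> *const Foo {
    // Step back by the field offset in bytes, then retype the pointer.
    unsafe { p.cast::<u8>().sub(OFFSET_B).cast::<Foo>() }
}
```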
See also:
How do I create a global, mutable singleton?
How to create a static string at compile time

How to expose a Rust `Vec<T>` to FFI?

I'm trying to construct a pair of elements:
array: *mut T
array_len: usize
array is intended to own the data
However, Box::into_raw will return *mut [T]. I cannot find any info on converting raw pointers to slices. What is its layout in memory? How do I use it from C? Should I convert to *mut T? If so, how?
If you just want some C function to mutably borrow the Vec, you can do it like this:
extern "C" {
fn some_c_function(ptr: *mut i32, len: ffi::size_t);
}
fn safe_wrapper(a: &mut [i32]) {
unsafe {
some_c_function(a.as_mut_ptr(), a.len() as ffi::size_t);
}
}
Of course, the C function shouldn't store this pointer somewhere else because that would break aliasing assumptions.
If you want to "pass ownership" of the data to C code, you'd do something like this:
use std::mem;
extern "C" {
fn c_sink(ptr: *mut i32, len: ffi::size_t);
}
fn sink_wrapper(mut vec: Vec<i32>) {
vec.shrink_to_fit();
assert!(vec.len() == vec.capacity());
let ptr = vec.as_mut_ptr();
let len = vec.len();
mem::forget(vec); // prevent deallocation in Rust
// The array is still there but no Rust object
// feels responsible. We only have ptr/len now
// to reach it.
unsafe {
c_sink(ptr, len as ffi::size_t);
}
}
Here, the C function "takes ownership" in the sense that we expect it to eventually return the pointer and length to Rust, for example, by calling a Rust function to deallocate it:
#[no_mangle]
/// This is intended for the C code to call for deallocating the
/// Rust-allocated i32 array.
unsafe extern "C" fn deallocate_rust_buffer(ptr: *mut i32, len: ffi::size_t) {
let len = len as usize;
drop(Vec::from_raw_parts(ptr, len, len));
}
Because Vec::from_raw_parts expects three parameters, a pointer, a size and a capacity, we either have to keep track of the capacity as well somehow, or we use Vec's shrink_to_fit before passing the pointer and length to the C function. This might involve a reallocation, though.
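A sketch of the "keep track of the capacity" alternative: all three values are handed out, so no shrink_to_fit (and no possible reallocation) is needed. `vec_into_parts` and `vec_from_parts` are hypothetical helper names, and ManuallyDrop is used here in place of mem::forget.

```rust
use std::mem::ManuallyDrop;

/// Gives up ownership of the buffer and returns (pointer, length, capacity).
pub fn vec_into_parts(v: Vec<i32>) -> (*mut i32, usize, usize) {
    // `ManuallyDrop` stops the destructor from running while still
    // allowing the accessors below to be called.
    let mut v = ManuallyDrop::new(v);
    (v.as_mut_ptr(), v.len(), v.capacity())
}

/// # Safety
/// The three values must come from `vec_into_parts` and must not have been
/// passed to this function before.
pub unsafe fn vec_from_parts(ptr: *mut i32, len: usize, cap: usize) -> Vec<i32> {
    unsafe { Vec::from_raw_parts(ptr, len, cap) }
}
```

The cost is that the C side has to store and return one extra integer.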
You could use [T]::as_mut_ptr to obtain the *mut T pointer directly from Vec<T>, Box<[T]> or any other DerefMut-to-slice types.
use std::mem;
let mut boxed_slice: Box<[T]> = vector.into_boxed_slice();
let array: *mut T = boxed_slice.as_mut_ptr();
let array_len: usize = boxed_slice.len();
// Prevent the slice from being destroyed (Leak the memory).
mem::forget(boxed_slice);
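The snippet above leaks the buffer. A sketch of the matching reclaim step follows, assuming the C side eventually hands the same pointer and length back. `leak_for_c` and `reclaim_from_c` are illustrative names.

```rust
use std::ptr;

/// Leaks the vector's buffer and returns the pointer/length pair for C.
pub fn leak_for_c<T>(v: Vec<T>) -> (*mut T, usize) {
    // `into_boxed_slice` drops any excess capacity, so the length alone
    // describes the allocation from here on.
    let boxed: Box<[T]> = v.into_boxed_slice();
    let len = boxed.len();
    (Box::into_raw(boxed).cast::<T>(), len)
}

/// # Safety
/// `ptr` and `len` must come from `leak_for_c` and must not be reclaimed twice.
pub unsafe fn reclaim_from_c<T>(ptr: *mut T, len: usize) {
    // Rebuild the wide pointer, then let `Box` free the buffer.
    let raw: *mut [T] = ptr::slice_from_raw_parts_mut(ptr, len);
    drop(unsafe { Box::from_raw(raw) });
}
```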

How to convert a *const pointer into a Vec to correctly drop it?

After asking how I should go about freeing memory across the FFI boundary, someone on the Rust reddit suggested that rather than wrapping my structs in a Box, I could use Vec::from_raw_parts to construct a vector from the following struct, and that this could be safely dropped:
#[repr(C)]
pub struct Array {
data: *const c_void,
len: libc::size_t,
}
However, from_raw_parts seems to require *mut _ data, so I'm not sure how to proceed…
The very short answer is self.data as *mut u8. But, let's talk more details...
First, words of warning:
Do not use Vec::from_raw_parts unless the pointer came from a Vec originally. There is no guarantee that an arbitrary pointer will be compatible with a Vec and you are likely to create giant holes in your program if you proceed.
Do not free a pointer that you don't own. Doing so leads to double frees, which will blow other large holes in your program.
You need to know the capacity of the vector before you can reconstruct it. Your example struct only contains a len. This is only acceptable if the len and capacity are equal.
Now, let's see if I can follow my own rules...
extern crate libc;
use std::mem;
#[repr(C)]
pub struct Array {
data: *const libc::c_void,
len: libc::size_t,
}
// Note that both of these methods should probably be implementations
// of the `From` trait to allow them to participate in more places.
impl Array {
fn from_vec(mut v: Vec<u8>) -> Array {
v.shrink_to_fit(); // ensure capacity == size
let a = Array {
data: v.as_ptr() as *const libc::c_void,
len: v.len(),
};
mem::forget(v);
a
}
fn into_vec(self) -> Vec<u8> {
unsafe { Vec::from_raw_parts(self.data as *mut u8, self.len, self.len) }
}
}
fn main() {
let v = vec![1, 2, 3];
let a = Array::from_vec(v);
let v = a.into_vec();
println!("{:?}", v);
}
Note that we don't have to do any explicit dropping of the Vec because the normal Drop implementation of Vec comes into play. We just have to make sure that we construct a Vec properly.
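As the comment in the code suggests, the two methods can be written as From implementations. The sketch below is one possible shape, not the answer's own code: it swaps shrink_to_fit for into_boxed_slice so that the length alone describes the allocation, and uses std::ffi::c_void and usize instead of the libc types. The safe conversion back to Vec<u8> relies on the assumption that an Array is only ever created from a Vec<u8> and is not modified on the C side.

```rust
use std::ffi::c_void;
use std::ptr;

#[repr(C)]
pub struct Array {
    data: *const c_void,
    len: usize,
}

impl From<Vec<u8>> for Array {
    fn from(v: Vec<u8>) -> Self {
        // A boxed slice has no spare capacity to keep track of.
        let boxed: Box<[u8]> = v.into_boxed_slice();
        let len = boxed.len();
        let data = Box::into_raw(boxed).cast::<c_void>() as *const c_void;
        Array { data, len }
    }
}

impl From<Array> for Vec<u8> {
    fn from(a: Array) -> Self {
        let raw = ptr::slice_from_raw_parts_mut(a.data as *mut u8, a.len);
        // SAFETY: assumes `a` was produced by `From<Vec<u8>>` above and that
        // its fields were left untouched.
        unsafe { Box::from_raw(raw) }.into_vec()
    }
}
```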
