How can I copy a row of pixels from an i32 slice into an existing [u8] slice of pixels?
Both slices are in the same memory layout (i.e. RGBA) but I don't know the unsafe syntax to copy one efficiently into the other. In C it would just be a memcpy().
You can flat_map the byte representation of each i32 into a Vec<u8>:
fn main() {
let pixels: &[i32] = &[-16776961, 16711935, 65535, -1];
let bytes: Vec<u8> = pixels
.iter()
.flat_map(|e| e.to_ne_bytes())
.collect();
println!("{bytes:?}");
}
There are different ways to handle the endianness of the system. I used to_ne_bytes to preserve the native order, but there are also to_le_bytes and to_be_bytes if the byte order needs to be controlled.
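To make the difference between the three conversions concrete, here is a small sketch. The helper name pixel_bytes and the sample value 0x11223344 are only illustrative and do not come from the question.

```rust
/// Returns the little-endian, big-endian and native-endian bytes of one pixel.
/// (`pixel_bytes` is a hypothetical helper used only for this illustration.)
fn pixel_bytes(pixel: i32) -> ([u8; 4], [u8; 4], [u8; 4]) {
    (pixel.to_le_bytes(), pixel.to_be_bytes(), pixel.to_ne_bytes())
}

fn main() {
    let (le, be, ne) = pixel_bytes(0x11223344);
    // Little-endian stores the least significant byte first.
    println!("le = {le:02x?}");
    // Big-endian stores the most significant byte first.
    println!("be = {be:02x?}");
    // Native order matches one of the two, depending on the target.
    println!("ne = {ne:02x?}");
}
```

If the [u8] buffer is later handed to something that expects a fixed channel order, picking to_le_bytes or to_be_bytes explicitly keeps the result the same on every target.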
Alternatively, if you know the size of your pixel buffer ahead of time, you can use an unsafe transmute:
const BUF_LEN: usize = 4; // this is your buffer length
fn main() {
let pixels: [i32; BUF_LEN] = [-16776961, 16711935, 65535, -1];
let bytes = unsafe {
std::mem::transmute::<[i32; BUF_LEN], [u8; BUF_LEN * 4]>(pixels)
};
println!("{bytes:?}");
}
Assuming that you in fact do not need any byte reordering, the bytemuck library is the tool to use here, as it allows you to write the i32 to u8 reinterpretation without needing to consider safety (because bytemuck has checked it for you).
Specifically, bytemuck::cast_slice() will allow converting &[i32] to &[u8].
(In general, the function may panic if there is an alignment or size problem, but there never can be such a problem when converting to u8 or any other one-byte type.)
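If you would rather avoid both unsafe and an extra crate, copying into an existing [u8] slice can also be sketched with the standard library alone. The function name copy_row and the buffer sizes below are assumptions made for the example; it uses native byte order, like the memcpy it replaces.

```rust
/// Copies every i32 pixel of `src` into `dst` as 4 native-endian bytes.
/// (`copy_row` is a hypothetical helper name.)
fn copy_row(src: &[i32], dst: &mut [u8]) {
    assert_eq!(dst.len(), src.len() * 4, "destination must hold 4 bytes per pixel");
    // Walk the destination in 4-byte chunks, one chunk per source pixel.
    for (chunk, pixel) in dst.chunks_exact_mut(4).zip(src) {
        chunk.copy_from_slice(&pixel.to_ne_bytes());
    }
}

fn main() {
    let row: [i32; 2] = [-1, 0];
    // An existing frame buffer with room for four pixels.
    let mut frame = [7u8; 16];
    // Overwrite only the second half of the frame with the row.
    copy_row(&row, &mut frame[8..16]);
    println!("{frame:?}");
}
```

With bytemuck, the same copy would be a single copy_from_slice of the cast slice. The loop above trades that brevity for having no dependency.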
It is a common pattern to see this 'shortcut' code in rust:
unsafe fn any_as_u8_slice<T: Sized>(p: &T) -> &[u8] {
::std::slice::from_raw_parts(
(p as *const T) as *const u8,
::std::mem::size_of::<T>(),
)
}
That is, given a struct, unsafely convert the underlying pointer to &[u8] to read the bytes.
However, is it valid to take the same approach when using Vec<T>?
For example, this appears to work:
use std::mem::size_of;
use std::slice::from_raw_parts;
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct Point {
pub x: u8,
pub y: u8,
pub z: u8,
}
fn as_bytes(data: &[Point]) -> &[u8] {
unsafe {
let raw_pointer = data.as_ptr();
from_raw_parts(raw_pointer as *const u8, size_of::<Point>() * data.len())
}
}
fn main() {
let points = vec![Point{x: 0u8, y: 1u8, z: 2u8}, Point{x: 3u8, y: 4u8, z: 5u8}];
let slice = points.as_slice();
println!("{:?}", slice);
let bytes = as_bytes(slice);
println!("{:?}", bytes);
assert!(bytes.len() == 6);
assert!(bytes[0] == 0u8);
assert!(bytes[1] == 1u8);
assert!(bytes[2] == 2u8);
assert!(bytes[3] == 3u8);
assert!(bytes[4] == 4u8);
assert!(bytes[5] == 5u8);
}
...but is it reliable to assume that Vec<T> is represented as a single contiguous block of data this way?
The documentation on https://doc.rust-lang.org/std/vec/struct.Vec.html#capacity-and-reallocation says:
If a Vec has allocated memory, then the memory it points to is on the heap (as defined by the allocator Rust is configured to use by default), and its pointer points to len initialized, contiguous elements in order (what you would see if you coerced it to a slice), followed by capacity-len logically uninitialized, contiguous elements.
...but I'm not really sure if I understand what it means. Does this actually mean that for Vec<T> the underlying pointer is to a block of memory of length size_of::<T> * length of the Vec?
Yes, a Vec<T> can be made into something that can be treated as a pointer to a block of memory of length std::mem::size_of::<T>() times the length of Vec.
There is one caveat, as what you are actually interested in is the slice of T, which the Vec can provide; the Vec itself should be considered an implementation detail. Besides that:
A Vec<T> can deref to a slice [T]. Take that slice.
The Rust Reference defines that a slice has the same layout as the section of the array it slices. So when we deref from a Vec<T> to a [T], this slice of length n is guaranteed to have the same memory layout as an array [T; n].
The Rust Reference defines the memory layout of an array:
Arrays are laid out so that the nth element of the array is offset
from the start of the array by n * the size of the type bytes. An
array of [T; n] has a size of size_of::<T>() * n and the same
alignment of T.
We know n (from [T]) and we know "the size of the type bytes" (via mem::size_of::<T>()). Since all members of an array must be fully initialized at all times, and given the two sentences from the paragraph above, we know it is safe to access all bytes up to mem::size_of::<T>() * the length of the Vec (strictly, the length of the slice, which is what brings in the array memory layout rule).
To make use of all that, you should make sure that you get a slice of the Vec first, use as_ptr() on the slice, and cast the raw pointer you get. This ensures the sequence of definitions as above. Your fn as_bytes(data: &[Point]) -> &[u8] is exactly correct.
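The same reasoning can be sketched for a primitive element type. The example below uses u32, which has no padding bytes, and std::mem::size_of_val to compute the byte length of the slice. The choice of u32 and the sample values are assumptions for illustration; a struct with padding would need extra care, because padding bytes are not initialized.

```rust
use std::mem::size_of_val;
use std::slice::from_raw_parts;

/// Views a slice of u32 as its underlying bytes without copying.
fn as_bytes(data: &[u32]) -> &[u8] {
    // SAFETY: the slice is one contiguous, fully initialized block of
    // size_of_val(data) bytes, u8 has alignment 1, and the returned
    // slice borrows from `data`, so it cannot outlive it.
    unsafe { from_raw_parts(data.as_ptr() as *const u8, size_of_val(data)) }
}

fn main() {
    let values = vec![1u32, 0x0102_0304];
    // Take the slice of the Vec first, then reinterpret it.
    let bytes = as_bytes(values.as_slice());
    println!("{} bytes: {:?}", bytes.len(), bytes);
}
```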
Is there a better way to cast Vec<i8> to Vec<u8> in Rust than these two?
1. creating a copy by mapping and casting every entry
2. using std::mem::transmute
Option 1 is slow, and for option 2 the docs say "transmute should be the absolute last resort".
A bit of background maybe: I'm getting a Vec<i8> from the unsafe gl::GetShaderInfoLog() call and want to create a string from this vector of chars by using String::from_utf8().
The other answers provide excellent solutions for the underlying problem of creating a string from Vec<i8>. To answer the question as posed, creating a Vec<u8> from data in a Vec<i8> can be done without copying or transmuting the vector. As pointed out by @trentcl, transmuting the vector directly constitutes undefined behavior because Vec is allowed to have different layout for different types.
The correct (though still requiring the use of unsafe) way to transfer a vector's data without copying it is:
obtain the *mut i8 pointer to the data in the vector, along with its length and capacity
leak the original vector to prevent it from freeing the data
use Vec::from_raw_parts to build a new vector, giving it the pointer cast to *mut u8 - this is the unsafe part, because we are vouching that the pointer contains valid and initialized data, and that it is not in use by other objects, and so on.
This is not UB because the new Vec is given the pointer of the correct type from the start. Code:
fn vec_i8_into_u8(v: Vec<i8>) -> Vec<u8> {
// ideally we'd use Vec::into_raw_parts, but it's unstable,
// so we have to do it manually:
// first, make sure v's destructor doesn't free the data
// it thinks it owns when it goes out of scope
let mut v = std::mem::ManuallyDrop::new(v);
// then, pick apart the existing Vec
let p = v.as_mut_ptr();
let len = v.len();
let cap = v.capacity();
// finally, adopt the data into a new Vec
unsafe { Vec::from_raw_parts(p as *mut u8, len, cap) }
}
fn main() {
let v = vec![-1i8, 2, 3];
assert!(vec_i8_into_u8(v) == vec![255u8, 2, 3]);
}
transmute on a Vec is always, 100% wrong, causing undefined behavior, because the layout of Vec is not specified. However, as the page you linked also mentions, you can use raw pointers and Vec::from_raw_parts to perform this correctly. user4815162342's answer shows how.
(std::mem::transmute is the only item in the Rust standard library whose documentation consists mostly of suggestions for how not to use it. Take that how you will.)
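However, in this case, from_raw_parts is also unnecessary. The best way to deal with C strings in Rust is with the wrappers in std::ffi, CStr and CString. There may be better ways to work this in to your real code, but here's one way you could use CStr to borrow the log buffer as a &str. The buffer is allocated as a Vec<u8>, because CStr::from_bytes_with_nul takes a &[u8], and only its pointer is cast to *mut c_char for the call:
use std::ffi::CStr;
use std::os::raw::c_char;
const BUF_SIZE: usize = 1000;
let mut info_log: Vec<u8> = vec![0; BUF_SIZE];
let mut len: gl::types::GLsizei = 0;
unsafe {
    gl::GetShaderInfoLog(
        shader,
        BUF_SIZE as gl::types::GLsizei,
        &mut len,
        info_log.as_mut_ptr() as *mut c_char,
    );
}
// `len` does not count the nul terminator, so include one more byte
let log = CStr::from_bytes_with_nul(&info_log[..len as usize + 1])
    .expect("Slice must be nul terminated and contain no nul bytes")
    .to_str()
    .expect("Slice must be valid UTF-8 text");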
Notice there is no unsafe code except to call the FFI function; you could also use with_capacity + set_len (as in wasmup's answer) to skip initializing the Vec to 1000 zeros, and use from_bytes_with_nul_unchecked to skip checking the validity of the returned string.
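The CStr approach can be tried without OpenGL. In the sketch below, fake_get_log is a hypothetical stand-in for the FFI call (it is not part of any real library), and CStr::from_bytes_until_nul (available since Rust 1.69) finds the terminator itself, so no length needs to be tracked.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function that writes a nul-terminated
/// message into a caller-provided buffer.
unsafe fn fake_get_log(buf: *mut c_char, buf_size: usize) {
    let msg = b"shader ok\0";
    assert!(msg.len() <= buf_size);
    // SAFETY: the caller guarantees `buf` is valid for `buf_size` bytes.
    unsafe { std::ptr::copy_nonoverlapping(msg.as_ptr() as *const c_char, buf, msg.len()) };
}

fn read_log() -> String {
    const BUF_SIZE: usize = 64;
    // Zero-initialized, so the buffer always contains a nul byte.
    let mut buf = vec![0u8; BUF_SIZE];
    unsafe { fake_get_log(buf.as_mut_ptr() as *mut c_char, BUF_SIZE) };
    // Borrow the bytes up to the first nul as a CStr, then as a &str.
    CStr::from_bytes_until_nul(&buf)
        .expect("buffer must contain a nul terminator")
        .to_str()
        .expect("log must be valid UTF-8")
        .to_owned()
}

fn main() {
    println!("{}", read_log());
}
```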
See this:
fn get_compilation_log(&self) -> String {
let mut len = 0;
unsafe { gl::GetShaderiv(self.id, gl::INFO_LOG_LENGTH, &mut len) };
assert!(len > 0);
let mut buf = Vec::with_capacity(len as usize);
let buf_ptr = buf.as_mut_ptr() as *mut gl::types::GLchar;
unsafe {
gl::GetShaderInfoLog(self.id, len, std::ptr::null_mut(), buf_ptr);
buf.set_len(len as usize);
};
match String::from_utf8(buf) {
Ok(log) => log,
Err(vec) => panic!("Could not convert compilation log from buffer: {}", vec),
}
}
See ffi:
// CStr::from_ptr is unsafe: strz_ptr must point to a valid nul-terminated string
let s = unsafe { CStr::from_ptr(strz_ptr) }.to_str().unwrap();
Using the Piston image crate, I can write an image by feeding it a Vec<u8>, but my actual data is Vec<Rgb<u8>> (because that is a lot easier to deal with, and I want to grow it dynamically).
How can I convert Vec<Rgb<u8>> to Vec<u8>? Rgb<u8> is really [u8; 3]. Does this have to be an unsafe conversion?
The answer depends on whether you are fine with copying the data. If copying is not an issue for you, you can do something like this:
let img: Vec<Rgb<u8>> = ...;
let buf: Vec<u8> = img.iter().flat_map(|rgb| rgb.data.iter()).cloned().collect();
If you want to perform the conversion without copying, though, we first need to make sure that your source and destination types actually have the same memory layout. Rust makes very few guarantees about the memory layout of structs. It currently does not even guarantee that a struct with a single member has the same memory layout as the member itself.
In this particular case, the Rust memory layout is not relevant though, since Rgb is defined as
#[repr(C)]
pub struct Rgb<T: Primitive> {
pub data: [T; 3],
}
The #[repr(C)] attribute specifies that the memory layout of the struct should be the same as an equivalent C struct. The C memory layout is not fully specified in the C standard, but according to the unsafe code guidelines, there are some rules that hold for "most" platforms:
Field order is preserved.
The first field begins at offset 0.
Assuming the struct is not packed, each field's offset is aligned to the ABI-mandated alignment for that field's type, possibly creating unused padding bits.
The total size of the struct is rounded up to its overall alignment.
As pointed out in the comments, the C standard theoretically allows additional padding at the end of the struct. However, the Piston image library itself makes the assumption that a slice of channel data has the same memory layout as the Rgb struct, so if you are on a platform where this assumption does not hold, all bets are off anyway (and I couldn't find any evidence that such a platform exists).
Rust does guarantee that arrays, slices and vectors are densely packed, and that structs and arrays have an alignment equal to the maximum alignment of their elements. Together with the assumption that the layout of Rgb is as specified by the rules I quoted above, this guarantees that Rgb<u8> is indeed laid out as three consecutive bytes in memory, and that Vec<Rgb<u8>> is indeed a consecutive, densely packed buffer of RGB values, so our conversion is safe. We still need to use unsafe code to write it:
let p = img.as_mut_ptr();
let len = img.len() * mem::size_of::<Rgb<u8>>();
let cap = img.capacity() * mem::size_of::<Rgb<u8>>();
mem::forget(img);
let buf: Vec<u8> = unsafe { Vec::from_raw_parts(p as *mut u8, len, cap) };
If you want to protect against the case that there is additional padding at the end of Rgb, you can check whether size_of::<Rgb<u8>>() is indeed 3. If it is, you can use the unsafe non-copying version, otherwise you have to use the first version above.
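A sketch of that size check, combined with the copying fallback, could look like the following. It defines its own minimal Rgb struct so that it runs without the image crate; that local struct and the name into_bytes are assumptions for illustration, not the crate's API.

```rust
use std::mem;

/// Local stand-in for the crate's pixel type, with the same repr(C) layout.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rgb {
    data: [u8; 3],
}

/// Converts without copying when the layout allows it, otherwise copies.
fn into_bytes(img: Vec<Rgb>) -> Vec<u8> {
    if mem::size_of::<Rgb>() == 3 && mem::align_of::<Rgb>() == 1 {
        // Prevent the original Vec from freeing the buffer.
        let mut img = mem::ManuallyDrop::new(img);
        let p = img.as_mut_ptr() as *mut u8;
        let len = img.len() * 3;
        let cap = img.capacity() * 3;
        // SAFETY: no padding (size 3), same alignment as u8, and the
        // length and capacity are rescaled from pixels to bytes.
        unsafe { Vec::from_raw_parts(p, len, cap) }
    } else {
        // Layout differs from three packed bytes: fall back to copying.
        img.iter().flat_map(|rgb| rgb.data).collect()
    }
}

fn main() {
    let img = vec![Rgb { data: [1, 2, 3] }, Rgb { data: [4, 5, 6] }];
    println!("{:?}", into_bytes(img));
}
```

Checking the alignment as well as the size matters because Vec::from_raw_parts requires the element alignment to match the one the buffer was allocated with.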
You chose the Vec<Rgb<u8>> storage format because it's easier to deal with and you want it to grow dynamically. But as you noticed, there's no guarantee of compatibility of its storage with a Vec<u8>, and no safe conversion.
Why not approach the problem the other way around and build a convenient facade over a Vec<u8>?
type Rgb = [u8; 3];
#[derive(Debug)]
struct Img(Vec<u8>);
impl Img {
fn new() -> Img {
Img(Vec::new())
}
fn push(&mut self, rgb: &Rgb) {
self.0.push(rgb[0]);
self.0.push(rgb[1]);
self.0.push(rgb[2]);
}
// other convenient methods
}
fn main() {
let mut img = Img::new();
let rgb : Rgb = [1, 2, 3];
img.push(&rgb);
img.push(&rgb);
println!("{:?}", img);
}
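The "other convenient methods" placeholder could be filled in along these lines. The method names len, get and as_bytes are only suggestions, and push takes the pixel by value here for brevity.

```rust
type Rgb = [u8; 3];

#[derive(Debug, Default)]
struct Img(Vec<u8>);

impl Img {
    /// Appends one pixel as three consecutive bytes.
    fn push(&mut self, rgb: Rgb) {
        self.0.extend_from_slice(&rgb);
    }

    /// Number of pixels, not bytes.
    fn len(&self) -> usize {
        self.0.len() / 3
    }

    /// Returns a copy of the pixel at `index`, or None when out of range.
    fn get(&self, index: usize) -> Option<Rgb> {
        let chunk = self.0.get(index * 3..index * 3 + 3)?;
        let mut rgb = [0u8; 3];
        rgb.copy_from_slice(chunk);
        Some(rgb)
    }

    /// The flat byte buffer, ready to hand to an image writer.
    fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

fn main() {
    let mut img = Img::default();
    img.push([1, 2, 3]);
    img.push([4, 5, 6]);
    println!("{} pixels, second = {:?}", img.len(), img.get(1));
    println!("{:?}", img.as_bytes());
}
```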
I want to convert arrays.
Example:
func()-> *mut *mut f32;
...
let buffer = func();
for n in 0..48000 {
buffer[0][n] = 1.0;
buffer[1][n] = 3.0;
}
In Rust &[T]/&mut [T] is called a slice. A slice is not an array; it is a pointer to the beginning of an array and the number of items in this array. Therefore, to create &mut [T] out of *mut T, you need to know the length of the array behind the pointer.
*mut *mut T looks like a C implementation of a 2D, possibly jagged, array, i.e. an array of arrays (this is different from a contiguous 2D array, as you probably know). There is no free way to convert it to &mut [&mut [T]], because, as I said before, *mut T is one pointer-sized number, while &mut [T] is two pointer-sized numbers. So you can't, for example, transmute *mut T to &mut [T], it would be a size mismatch. Therefore, you can't simply transform *mut *mut f32 to &mut [&mut [f32]] because of the layout mismatch.
In order to safely work with numbers stored in *mut *mut f32, you first need to determine the length of the outer array and the lengths of all of the inner arrays. For simplicity, let's assume that they are all known statically:
const ROWS: usize = 48000;
const COLUMNS: usize = 48000;
Now, since you know the length, you can convert the outer pointer to a slice of raw pointers:
use std::slice;
let buffer: *mut *mut f32 = func();
let buf_slice: &mut [*mut f32] = unsafe {
    // SAFETY: `buffer` must point to ROWS valid, initialized row pointers
    slice::from_raw_parts_mut(buffer, ROWS)
};
Now you need to go through this slice and convert each item to a slice, collecting the results into a vector:
let mut matrix: Vec<&mut [f32]> = buf_slice.iter_mut()
    // SAFETY: each row pointer must point to COLUMNS valid f32 values
    .map(|p| unsafe { slice::from_raw_parts_mut(*p, COLUMNS) })
    .collect();
And now you can indeed access your buffer by indices:
for n in 0..COLUMNS {
matrix[0][n] = 1.0;
matrix[1][n] = 3.0;
}
(I have put explicit types on bindings for readability; most of them can in fact be omitted.)
So, there are two main things to consider when converting raw pointers to slices:
you need to know exact length of the array to create a slice from it; if you know it, you can use slice::from_raw_parts() or slice::from_raw_parts_mut();
if you are converting nested arrays, you need to rebuild each layer of the indirection because pointers have different size than slices.
And naturally, you have to track who is the owner of the buffer and when it will be freed, otherwise you can easily get a slice pointing to a buffer which does not exist anymore. This is unsafe, after all.
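The steps above can be tried end to end with a stand-in for the C side. In this sketch, func is a hypothetical replacement that builds the pointer-to-pointers buffer from leaked boxed slices, and ROWS and COLUMNS are kept small. None of these values come from the question.

```rust
use std::slice;

const ROWS: usize = 2;
const COLUMNS: usize = 4;

/// Hypothetical stand-in for the FFI function: returns a pointer to ROWS
/// row pointers, each pointing to COLUMNS zeroed floats. The memory is
/// leaked on purpose, as if it were owned by the C side.
fn func() -> *mut *mut f32 {
    let rows: Vec<*mut f32> = (0..ROWS)
        .map(|_| Box::leak(vec![0.0f32; COLUMNS].into_boxed_slice()).as_mut_ptr())
        .collect();
    Box::leak(rows.into_boxed_slice()).as_mut_ptr()
}

/// Rebuilds both layers of indirection, fills the rows and returns a copy.
fn fill(buffer: *mut *mut f32) -> Vec<Vec<f32>> {
    // SAFETY: `buffer` points to ROWS valid row pointers.
    let rows: &mut [*mut f32] = unsafe { slice::from_raw_parts_mut(buffer, ROWS) };
    let mut matrix: Vec<&mut [f32]> = rows
        .iter()
        // SAFETY: every row pointer points to COLUMNS valid floats,
        // and no two rows alias each other.
        .map(|&p| unsafe { slice::from_raw_parts_mut(p, COLUMNS) })
        .collect();
    for n in 0..COLUMNS {
        matrix[0][n] = 1.0;
        matrix[1][n] = 3.0;
    }
    matrix.iter().map(|row| row.to_vec()).collect()
}

fn main() {
    println!("{:?}", fill(func()));
}
```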
Since your array seems to be an array of pointers to an array of 48000 f32s, you can simply use fixed size arrays ([T; N]) instead of slices ([T]):
fn func() -> *mut *mut f32 { unimplemented!() }
fn main() {
let buffer = func();
let buffer: &mut [&mut [f32; 48000]; 2] = unsafe { std::mem::transmute(buffer) };
for n in 0..48000 {
buffer[0][n] = 1.0;
buffer[1][n] = 3.0;
}
}
I have something that is Read; currently it's a File. I want to read a number of bytes from it that is only known at runtime (length prefix in a binary data structure).
So I tried this:
let mut vec = Vec::with_capacity(length);
let count = file.read(vec.as_mut_slice()).unwrap();
but count is zero because vec.as_mut_slice().len() is zero as well.
[0u8;length] of course doesn't work because the size must be known at compile time.
I wanted to do
let mut vec = Vec::with_capacity(length);
let count = file.take(length).read_to_end(vec).unwrap();
but take's receiver parameter is a T and I only have &mut T (and I'm not really sure why it's needed anyway).
I guess I can replace File with BufReader and dance around with fill_buf and consume which sounds complicated enough but I still wonder: Have I overlooked something?
Like the Iterator adaptors, the IO adaptors take self by value to be as efficient as possible. Also like the Iterator adaptors, a mutable reference to a Read is also a Read.
To solve your problem, you just need Read::by_ref:
use std::io::Read;
use std::fs::File;
fn main() {
let mut file = File::open("/etc/hosts").unwrap();
let length = 5;
let mut vec = Vec::with_capacity(length);
file.by_ref().take(length as u64).read_to_end(&mut vec).unwrap();
let mut the_rest = Vec::new();
file.read_to_end(&mut the_rest).unwrap();
}
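The same pattern works for any reader that is only available as a &mut reference. The sketch below uses an in-memory std::io::Cursor instead of a file so that it runs anywhere; the helper names read_n and demo are made up for the example.

```rust
use std::io::{self, Cursor, Read};

/// Reads up to `length` bytes from a reader that is only borrowed.
fn read_n<R: Read>(reader: &mut R, length: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(length);
    // by_ref() lends the reader to take(), so it stays usable afterwards.
    reader.by_ref().take(length as u64).read_to_end(&mut buf)?;
    Ok(buf)
}

/// Reads a 5-byte head and then everything that is left.
fn demo() -> (Vec<u8>, Vec<u8>) {
    let mut source = Cursor::new(b"hello world".to_vec());
    let head = read_n(&mut source, 5).unwrap();
    let mut rest = Vec::new();
    source.read_to_end(&mut rest).unwrap();
    (head, rest)
}

fn main() {
    let (head, rest) = demo();
    println!("head = {:?}, rest = {:?}", head, rest);
}
```

Note that take() caps the number of bytes but does not guarantee them: if the source ends early, the returned vector is simply shorter than length.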
1. Fill-this-vector version
Your first solution is close to working. You identified the problem but did not try to solve it! The problem is that whatever the capacity of the vector, it is still empty (vec.len() == 0). Instead, you could actually fill it with empty elements, such as:
let mut vec = vec![0u8; length];
The following full code works:
// `as_mut_slice()` has been stable since Rust 1.7, so no feature gate is needed
use std::fs::File;
use std::io::Read;
fn main() {
let mut file = File::open("/usr/share/dict/words").unwrap();
let length: usize = 100;
let mut vec = vec![0u8; length];
let count = file.read(vec.as_mut_slice()).unwrap();
println!("read {} bytes.", count);
println!("vec = {:?}", vec);
}
Of course, you still have to check whether count == length, and read more data into the buffer if that's not the case.
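For the length-prefixed case from the question, the check-and-retry loop can be left to Read::read_exact, which keeps reading until the slice is full and returns an UnexpectedEof error otherwise. The sketch below assumes a one-byte length prefix and an in-memory Cursor, both chosen only to keep the example small.

```rust
use std::io::{self, Cursor, Read};

/// Reads a one-byte length prefix followed by exactly that many bytes.
/// (The one-byte prefix format is an assumption for this example.)
fn read_len_prefixed<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 1];
    reader.read_exact(&mut prefix)?;
    // Zero-filled so the slice passed to read_exact has the full length.
    let mut payload = vec![0u8; prefix[0] as usize];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}

/// Returns the payload of a well-formed input and whether a truncated
/// input was rejected with UnexpectedEof.
fn demo() -> (Vec<u8>, bool) {
    let mut complete = Cursor::new(vec![3, b'a', b'b', b'c', b'd']);
    let payload = read_len_prefixed(&mut complete).unwrap();
    let mut truncated = Cursor::new(vec![5, b'x']);
    let rejected = matches!(
        read_len_prefixed(&mut truncated),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof
    );
    (payload, rejected)
}

fn main() {
    let (payload, rejected) = demo();
    println!("payload = {:?}, truncated input rejected = {}", payload, rejected);
}
```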
2. Iterator version
Your second solution is better because you won't have to check how many bytes have been read, and you won't have to re-read in case count != length. You need to use the bytes() function on the Read trait (implemented by File). This transforms the file into a stream (i.e., an iterator). Because errors can still happen, you don't get an Iterator<Item=u8> but an Iterator<Item=Result<u8, std::io::Error>>. Hence you need to deal with failures explicitly within the iterator. We're going to use unwrap() here for simplicity:
use std::fs::File;
use std::io::Read;
fn main() {
let file = File::open("/usr/share/dict/words").unwrap();
let length: usize = 100;
let vec: Vec<u8> = file
.bytes()
.take(length)
.map(|r: Result<u8, _>| r.unwrap()) // or deal explicitly with failure!
.collect();
println!("vec = {:?}", vec);
}
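You can use a bit of unsafe to create a vector of uninitialized memory and skip the zeroing. Be aware that this is not sound, even for primitive types: the documentation of Read::read states that calling it with an uninitialized buffer can lead to undefined behavior, because an implementation is allowed to read from the buffer before writing to it.
let mut vec: Vec<u8> = Vec::with_capacity(length);
unsafe { vec.set_len(length); }
let count = file.read(vec.as_mut_slice()).unwrap();
This way, vec.len() is set to its capacity and the bytes in it are uninitialized (likely zeros, but possibly garbage). It avoids zeroing the memory, but vec![0u8; length] remains the sound choice unless the cost of zeroing is a measured problem.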
Note that the read() method on Read is not guaranteed to fill the whole slice; it may return fewer bytes than the slice length. Read::read_exact, stable since Rust 1.6, fills this gap: it keeps reading until the slice is full or returns an error.