How to convert 'struct' to '&[u8]'? - rust

I want to send my struct via a TcpStream. I could send String or u8, but I can not send an arbitrary struct. For example:
struct MyStruct {
id: u8,
data: [u8; 1024],
}
let my_struct = MyStruct { id: 0, data: [1; 1024] };
let bytes: &[u8] = convert_struct(my_struct); // how??
tcp_stream.write(bytes);
After receiving the data, I want to convert &[u8] back to MyStruct. How can I convert between these two representations?
I know Rust has a JSON module for serializing data, but I don't want to use JSON because I want to send data as fast and small as possible, so I want to no or very small overhead.

A correctly sized struct as zero-copied bytes can be done using stdlib and a generic function.
In the example below there there is a reusable function called any_as_u8_slice instead of convert_struct, since this is a utility to wrap cast and slice creation.
Note that the question asks about converting, this example creates a read-only slice, so has the advantage of not needing to copy the memory.
Heres a working example based on the question:
unsafe fn any_as_u8_slice<T: Sized>(p: &T) -> &[u8] {
::core::slice::from_raw_parts(
(p as *const T) as *const u8,
::core::mem::size_of::<T>(),
)
}
fn main() {
struct MyStruct {
id: u8,
data: [u8; 1024],
}
let my_struct = MyStruct { id: 0, data: [1; 1024] };
let bytes: &[u8] = unsafe { any_as_u8_slice(&my_struct) };
// tcp_stream.write(bytes);
println!("{:?}", bytes);
}
Note 1) even though 3rd party crates might be better in some cases, this is such a primitive operation that its useful to know how to do in Rust.
Note 2) at time of writing (Rust 1.15), there is no support for const functions. Once there is, it will be possible to cast into a fixed sized array instead of a slice.
Note 3) the any_as_u8_slice function is marked unsafe because any padding bytes in the struct may be uninitialized memory (giving undefined behavior).
If there were a way to ensure input arguments used only structs which were #[repr(packed)], then it could be safe.
Otherwise the function is fairly safe since it prevents buffer over-run since the output is read-only, fixed number of bytes, and its lifetime is bound to the input.If you wanted a version that returned a &mut [u8], that would be quite dangerous since modifying could easily create inconsistent/corrupt data.

(Shamelessly stolen and adapted from Renato Zannon's comment on a similar question)
Perhaps a solution like bincode would suit your case? Here's a working excerpt:
Cargo.toml
[package]
name = "foo"
version = "0.1.0"
authors = ["An Devloper <an.devloper#example.com>"]
edition = "2018"
[dependencies]
bincode = "1.0"
serde = { version = "1.0", features = ["derive"] }
main.rs
use serde::{Deserialize, Serialize};
use std::fs::File;
#[derive(Serialize, Deserialize)]
struct A {
id: i8,
key: i16,
name: String,
values: Vec<String>,
}
fn main() {
let a = A {
id: 42,
key: 1337,
name: "Hello world".to_string(),
values: vec!["alpha".to_string(), "beta".to_string()],
};
// Encode to something implementing `Write`
let mut f = File::create("/tmp/output.bin").unwrap();
bincode::serialize_into(&mut f, &a).unwrap();
// Or just to a buffer
let bytes = bincode::serialize(&a).unwrap();
println!("{:?}", bytes);
}
You would then be able to send the bytes wherever you want. I assume you are already aware of the issues with naively sending bytes around (like potential endianness issues or versioning), but I'll mention them just in case ^_^.

Related

How to write a Vec of structs to a file? [duplicate]

I want to send my struct via a TcpStream. I could send String or u8, but I can not send an arbitrary struct. For example:
struct MyStruct {
id: u8,
data: [u8; 1024],
}
let my_struct = MyStruct { id: 0, data: [1; 1024] };
let bytes: &[u8] = convert_struct(my_struct); // how??
tcp_stream.write(bytes);
After receiving the data, I want to convert &[u8] back to MyStruct. How can I convert between these two representations?
I know Rust has a JSON module for serializing data, but I don't want to use JSON because I want to send data as fast and small as possible, so I want to no or very small overhead.
A correctly sized struct as zero-copied bytes can be done using stdlib and a generic function.
In the example below there there is a reusable function called any_as_u8_slice instead of convert_struct, since this is a utility to wrap cast and slice creation.
Note that the question asks about converting, this example creates a read-only slice, so has the advantage of not needing to copy the memory.
Heres a working example based on the question:
unsafe fn any_as_u8_slice<T: Sized>(p: &T) -> &[u8] {
::core::slice::from_raw_parts(
(p as *const T) as *const u8,
::core::mem::size_of::<T>(),
)
}
fn main() {
struct MyStruct {
id: u8,
data: [u8; 1024],
}
let my_struct = MyStruct { id: 0, data: [1; 1024] };
let bytes: &[u8] = unsafe { any_as_u8_slice(&my_struct) };
// tcp_stream.write(bytes);
println!("{:?}", bytes);
}
Note 1) even though 3rd party crates might be better in some cases, this is such a primitive operation that its useful to know how to do in Rust.
Note 2) at time of writing (Rust 1.15), there is no support for const functions. Once there is, it will be possible to cast into a fixed sized array instead of a slice.
Note 3) the any_as_u8_slice function is marked unsafe because any padding bytes in the struct may be uninitialized memory (giving undefined behavior).
If there were a way to ensure input arguments used only structs which were #[repr(packed)], then it could be safe.
Otherwise the function is fairly safe since it prevents buffer over-run since the output is read-only, fixed number of bytes, and its lifetime is bound to the input.If you wanted a version that returned a &mut [u8], that would be quite dangerous since modifying could easily create inconsistent/corrupt data.
(Shamelessly stolen and adapted from Renato Zannon's comment on a similar question)
Perhaps a solution like bincode would suit your case? Here's a working excerpt:
Cargo.toml
[package]
name = "foo"
version = "0.1.0"
authors = ["An Devloper <an.devloper#example.com>"]
edition = "2018"
[dependencies]
bincode = "1.0"
serde = { version = "1.0", features = ["derive"] }
main.rs
use serde::{Deserialize, Serialize};
use std::fs::File;
#[derive(Serialize, Deserialize)]
struct A {
id: i8,
key: i16,
name: String,
values: Vec<String>,
}
fn main() {
let a = A {
id: 42,
key: 1337,
name: "Hello world".to_string(),
values: vec!["alpha".to_string(), "beta".to_string()],
};
// Encode to something implementing `Write`
let mut f = File::create("/tmp/output.bin").unwrap();
bincode::serialize_into(&mut f, &a).unwrap();
// Or just to a buffer
let bytes = bincode::serialize(&a).unwrap();
println!("{:?}", bytes);
}
You would then be able to send the bytes wherever you want. I assume you are already aware of the issues with naively sending bytes around (like potential endianness issues or versioning), but I'll mention them just in case ^_^.

How do I allocate a Vec<u8> that is aligned to the size of the cache line?

I need to allocate a buffer for reading from a File, but this buffer must be aligned to the size of the cache line (64 bytes). I am looking for a function somewhat like this for Vec:
pub fn with_capacity_and_aligned(capacity: usize, alignment: u8) -> Vec<T>
which would give me the 64 byte alignment that I need. This obviously doesn't exist, but there might be some equivalences (i.e. "hacks") that I don't know about.
So, when I use this function (which will give me the desired alignment), I could write this code safely:
#[repr(C)]
struct Header {
magic: u32,
some_data1: u32,
some_data2: u64,
}
let cache_line_size = 64; // bytes
let buffer: Vec<u8> = Vec::<u8>::with_capacity_and_alignment(some_size, cache_line_size);
match file.read_to_end(&mut buffer) {
Ok(_) => {
let header: Header = {
// and since the buffer is aligned to 64 bytes, I wont get any SEGFAULT
unsafe { transmute(buffer[0..(size_of::<Header>())]) }
};
}
}
and not get any panics because of alignment issues (like launching an instruction).
You can enforce the alignment of a type to a certain size using #[repr(align(...))]. We also use repr(C) to ensure that this type has the same memory layout as an array of bytes.
You can then create a vector of the aligned type and transform it to a vector of appropriate type:
use std::mem;
#[repr(C, align(64))]
struct AlignToSixtyFour([u8; 64]);
unsafe fn aligned_vec(n_bytes: usize) -> Vec<u8> {
// Lazy math to ensure we always have enough.
let n_units = (n_bytes / mem::size_of::<AlignToSixtyFour>()) + 1;
let mut aligned: Vec<AlignToSixtyFour> = Vec::with_capacity(n_units);
let ptr = aligned.as_mut_ptr();
let len_units = aligned.len();
let cap_units = aligned.capacity();
mem::forget(aligned);
Vec::from_raw_parts(
ptr as *mut u8,
len_units * mem::size_of::<AlignToSixtyFour>(),
cap_units * mem::size_of::<AlignToSixtyFour>(),
)
}
There are no guarantees that the Vec<u8> will remain aligned if you reallocate the data. This means that you cannot reallocate so you will need to know how big to allocate up front.
The function is unsafe for the same reason. When the type is dropped, the memory must be back to its original allocation, but this function cannot control that.
Thanks to BurntSushi5 for corrections and additions.
See also:
How can I align a struct to a specifed byte boundary?
Align struct to cache lines in Rust
How do I convert a Vec<T> to a Vec<U> without copying the vector?
Because of the limitations and unsafety above, another potential idea would be to allocate a big-enough buffer (maybe with some wiggle room), and then use align_to to get a properly aligned chunk. You could use the same AlignToSixtyFour type as above, and then convert the &[AlignToSixtyFour] into a &[u8] with similar logic.
This technique could be used to give out (optionally mutable) slices that are aligned. Since they are slices, you don't have to worry about the user reallocating or dropping them. This would allow you to wrap it up in a nicer type.
All that being said, I think that relying on alignment here is inappropriate for your actual goal of reading a struct from a file. Simply read the bytes (u32, u32, u64) and build the struct:
use byteorder::{LittleEndian, ReadBytesExt}; // 1.3.4
use std::{fs::File, io};
#[derive(Debug)]
struct Header {
magic: u32,
some_data1: u32,
some_data2: u64,
}
impl Header {
fn from_reader(mut reader: impl io::Read) -> Result<Self, Box<dyn std::error::Error>> {
let magic = reader.read_u32::<LittleEndian>()?;
let some_data1 = reader.read_u32::<LittleEndian>()?;
let some_data2 = reader.read_u64::<LittleEndian>()?;
Ok(Self {
magic,
some_data1,
some_data2,
})
}
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut f = File::open("/etc/hosts")?;
let header = Header::from_reader(&mut f)?;
println!("{:?}", header);
Ok(())
}
See also:
How to read a struct from a file in Rust?
Is this the most natural way to read structs from a binary file?
Can I take a byte array and deserialize it into a struct?
Transmuting u8 buffer to struct in Rust

How do I specify the calling convention for a C callback that uses &mut [u8] and u32? [duplicate]

Can I somehow get an array from std::ptr::read?
I'd like to do something close to:
let mut v: Vec<u8> = ...
let view = &some_struct as *const _ as *const u8;
v.write(&std::ptr::read<[u8, ..30]>(view));
Which is not valid in this form (can't use the array signature).
If you want to obtain a slice from a raw pointer, use std::slice::from_raw_parts():
let slice = unsafe { std::slice::from_raw_parts(some_pointer, count_of_items) };
If you want to obtain a mutable slice from a raw pointer, use std::slice::from_raw_parts_mut():
let slice = unsafe { std::slice::from_raw_parts_mut(some_pointer, count_of_items) };
Are you sure you want read()? Without special care it will cause disaster on structs with destructors. Also, read() does not read a value of some specified type from a pointer to bytes; it reads exactly one value of the type behind the pointer (e.g. if it is *const u8 then read() will read one byte) and returns it.
If you only want to write byte contents of a structure into a vector, you can obtain a slice from the raw pointer:
use std::mem;
use std::io::Write;
struct SomeStruct {
a: i32,
}
fn main() {
let some_struct = SomeStruct { a: 32 };
let mut v: Vec<u8> = Vec::new();
let view = &some_struct as *const _ as *const u8;
let slice = unsafe { std::slice::from_raw_parts(view, mem::size_of::<SomeStruct>()) };
v.write(slice).expect("Unable to write");
println!("{:?}", v);
}
This makes your code platform-dependent and even compiler-dependent: if you use types of variable size (e.g. isize/usize) in your struct or if you don't use #[repr(C)], the data you wrote into the vector is likely to be read as garbage on another machine (and even #[repr(C)] may not lift this problem sometimes, as far as I remember).

How do I encode or pack a struct into bytes without using an external library? [duplicate]

I want to send my struct via a TcpStream. I could send String or u8, but I can not send an arbitrary struct. For example:
struct MyStruct {
id: u8,
data: [u8; 1024],
}
let my_struct = MyStruct { id: 0, data: [1; 1024] };
let bytes: &[u8] = convert_struct(my_struct); // how??
tcp_stream.write(bytes);
After receiving the data, I want to convert &[u8] back to MyStruct. How can I convert between these two representations?
I know Rust has a JSON module for serializing data, but I don't want to use JSON because I want to send data as fast and small as possible, so I want to no or very small overhead.
A correctly sized struct as zero-copied bytes can be done using stdlib and a generic function.
In the example below there there is a reusable function called any_as_u8_slice instead of convert_struct, since this is a utility to wrap cast and slice creation.
Note that the question asks about converting, this example creates a read-only slice, so has the advantage of not needing to copy the memory.
Heres a working example based on the question:
unsafe fn any_as_u8_slice<T: Sized>(p: &T) -> &[u8] {
::core::slice::from_raw_parts(
(p as *const T) as *const u8,
::core::mem::size_of::<T>(),
)
}
fn main() {
struct MyStruct {
id: u8,
data: [u8; 1024],
}
let my_struct = MyStruct { id: 0, data: [1; 1024] };
let bytes: &[u8] = unsafe { any_as_u8_slice(&my_struct) };
// tcp_stream.write(bytes);
println!("{:?}", bytes);
}
Note 1) even though 3rd party crates might be better in some cases, this is such a primitive operation that its useful to know how to do in Rust.
Note 2) at time of writing (Rust 1.15), there is no support for const functions. Once there is, it will be possible to cast into a fixed sized array instead of a slice.
Note 3) the any_as_u8_slice function is marked unsafe because any padding bytes in the struct may be uninitialized memory (giving undefined behavior).
If there were a way to ensure input arguments used only structs which were #[repr(packed)], then it could be safe.
Otherwise the function is fairly safe since it prevents buffer over-run since the output is read-only, fixed number of bytes, and its lifetime is bound to the input.If you wanted a version that returned a &mut [u8], that would be quite dangerous since modifying could easily create inconsistent/corrupt data.
(Shamelessly stolen and adapted from Renato Zannon's comment on a similar question)
Perhaps a solution like bincode would suit your case? Here's a working excerpt:
Cargo.toml
[package]
name = "foo"
version = "0.1.0"
authors = ["An Devloper <an.devloper#example.com>"]
edition = "2018"
[dependencies]
bincode = "1.0"
serde = { version = "1.0", features = ["derive"] }
main.rs
use serde::{Deserialize, Serialize};
use std::fs::File;
#[derive(Serialize, Deserialize)]
struct A {
id: i8,
key: i16,
name: String,
values: Vec<String>,
}
fn main() {
let a = A {
id: 42,
key: 1337,
name: "Hello world".to_string(),
values: vec!["alpha".to_string(), "beta".to_string()],
};
// Encode to something implementing `Write`
let mut f = File::create("/tmp/output.bin").unwrap();
bincode::serialize_into(&mut f, &a).unwrap();
// Or just to a buffer
let bytes = bincode::serialize(&a).unwrap();
println!("{:?}", bytes);
}
You would then be able to send the bytes wherever you want. I assume you are already aware of the issues with naively sending bytes around (like potential endianness issues or versioning), but I'll mention them just in case ^_^.

How can I get an array or a slice from a raw pointer?

Can I somehow get an array from std::ptr::read?
I'd like to do something close to:
let mut v: Vec<u8> = ...
let view = &some_struct as *const _ as *const u8;
v.write(&std::ptr::read<[u8, ..30]>(view));
Which is not valid in this form (can't use the array signature).
If you want to obtain a slice from a raw pointer, use std::slice::from_raw_parts():
let slice = unsafe { std::slice::from_raw_parts(some_pointer, count_of_items) };
If you want to obtain a mutable slice from a raw pointer, use std::slice::from_raw_parts_mut():
let slice = unsafe { std::slice::from_raw_parts_mut(some_pointer, count_of_items) };
Are you sure you want read()? Without special care it will cause disaster on structs with destructors. Also, read() does not read a value of some specified type from a pointer to bytes; it reads exactly one value of the type behind the pointer (e.g. if it is *const u8 then read() will read one byte) and returns it.
If you only want to write byte contents of a structure into a vector, you can obtain a slice from the raw pointer:
use std::mem;
use std::io::Write;
struct SomeStruct {
a: i32,
}
fn main() {
let some_struct = SomeStruct { a: 32 };
let mut v: Vec<u8> = Vec::new();
let view = &some_struct as *const _ as *const u8;
let slice = unsafe { std::slice::from_raw_parts(view, mem::size_of::<SomeStruct>()) };
v.write(slice).expect("Unable to write");
println!("{:?}", v);
}
This makes your code platform-dependent and even compiler-dependent: if you use types of variable size (e.g. isize/usize) in your struct or if you don't use #[repr(C)], the data you wrote into the vector is likely to be read as garbage on another machine (and even #[repr(C)] may not lift this problem sometimes, as far as I remember).

Resources