What is the best way to parse binary protocols with Rust - rust

Essentially I have a tcp based network protocol to parse.
In C I can just cast some memory to the type that I want. How can I accomplish something similar with Rust?

You can do the same thing in Rust. You just have to be a little careful when you define the structure.
use std::io::Read;
use std::mem;
#[repr(C, packed)]
struct YourProtoHeader {
    magic: u8,
    len: u32,
}
let mut buf = [0u8; 1024]; // large enough buffer
// read exactly one header from some Reader (a socket, perhaps)
reader.read_exact(&mut buf[..mem::size_of::<YourProtoHeader>()]).unwrap();
let ptr: *const u8 = buf.as_ptr();
let ptr: *const YourProtoHeader = ptr as *const YourProtoHeader;
// the packed struct has alignment 1, so any buffer address is suitably aligned
let header: &YourProtoHeader = unsafe { &*ptr };
// copy the field out first: references to packed fields are not allowed
let len = header.len;
println!("Data length: {}", len);
The buffer does not have to be exactly size_of::<YourProtoHeader>() bytes long; a large enough buffer works. If you want an exact fit, mem::size_of is a const fn in current Rust, so it can be used as the array length: [0u8; mem::size_of::<YourProtoHeader>()].
Here we're converting a pointer to the beginning of the buffer into a pointer to your structure. This is the same thing you would do in C. The structure should be both repr(C) and packed (written #[repr(C, packed)] in current Rust): repr(C) disallows field reordering, and packed disables the padding normally inserted for field alignment. Note that the cast reads len in the host's byte order, so a protocol that uses a different byte order needs an explicit conversion such as u32::from_be.
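If the unsafe cast is not a hard requirement, the same header can also be decoded field by field. The following is only a minimal sketch and not part of the answer above: it assumes the wire format is one magic byte followed by a big-endian (network order) length, which the question does not state, and read_header and HEADER_LEN are names made up for this illustration.

```rust
use std::io::{self, Read};

// Plain struct: no repr attributes are needed because we never cast memory.
#[derive(Debug, PartialEq)]
struct YourProtoHeader {
    magic: u8,
    len: u32,
}

// Assumed wire size: 1 byte of magic + 4 bytes of length.
const HEADER_LEN: usize = 5;

// Hypothetical helper: reads exactly one header from any reader.
fn read_header<R: Read>(reader: &mut R) -> io::Result<YourProtoHeader> {
    let mut buf = [0u8; HEADER_LEN];
    // read_exact fails with UnexpectedEof if fewer than HEADER_LEN bytes arrive.
    reader.read_exact(&mut buf)?;
    Ok(YourProtoHeader {
        magic: buf[0],
        // Assumption: the length is sent in big-endian byte order.
        len: u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]),
    })
}
```

Because the byte order is spelled out, this version behaves the same on little-endian and big-endian hosts, at the cost of listing each field by hand.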

Related

What is the preferred unsafe way to extend lifetimes which are correct but not provable?

I have a minimal arena allocator which demonstrates the intention, although it isn't optimized for minimizing allocations/deallocations like a true arena would be:
#[derive(Clone)]
pub struct Arena(Arc<Mutex<Vec<Box<[u8]>>>>);
impl Arena {
/// Allocate memory of the given size.
pub fn allocate(&self, size: usize) -> &mut [u8] {
let inner = &mut *self.0.lock().unwrap();
let mut new_mem = vec![0u8; size].into_boxed_slice();
let slice = &mut new_mem[..]; // THIS OBVIOUSLY DOESNT WORK
inner.push(new_mem);
slice
}
}
allocate is the only operation, therefore I know that I can safely take a reference to the memory contained in new_mem with the same lifetime as &self, because I don't provide any operations that would allow the boxed memory to become aliased, and I know because the memory block is boxed, it won't move even if the vector it is stored in has to reallocate to add additional blocks.
There's also no way I know of to safely tell the compiler that the reference to the memory block is safe. Using &mut new_mem[..] fails because the compiler thinks I'm borrowing new_mem while trying to move it into the vector in push. I could invert the order and do push followed by &mut inner.last().unwrap()[..], but that also fails, because the compiler sees that reference as being owned by the mutex guard.
That means that to tell the compiler that this borrow is OK, I need to do something unsafe to create a reference with a longer lifetime than normal borrowing would produce in this case.
I know of two ways to extend this lifetime:
std::mem::transmute:
let slice = {
// Strongly-typed line to make sure we aren't accidentally starting from a pointer to
// the box itself.
let slice: &mut [u8] = &mut new_mem[..];
unsafe { mem::transmute(slice) }
};
Dereferencing a raw pointer:
let slice = {
// Strongly-typed line to make sure we aren't accidentally starting from a pointer to
// the box itself.
let slice: &mut [u8] = &mut new_mem[..];
let ptr: *mut _ = slice;
unsafe { &mut *ptr }
};
Is there any particular advantage to either of these options? For example, are there classes of mistakes that are possible with one option that aren't possible, or are harder to make, with the other? Are there other ways to do this that have different advantages? Or are all options about the same?
For classes of mistakes, I'm particularly wondering whether there are type inference mistakes that can occur in one that aren't possible in the other. Obviously transmute can convert to anything with the same size, and raw pointers allow casting to any type regardless of size, but I wonder if type inference on pointers is more restricted in the absence of an as cast.
How about this?
impl Arena {
/// Allocate memory of the given size.
pub fn allocate(&self, size: usize) -> &mut [u8] {
let inner = &mut *self.0.lock().unwrap();
let mut new_mem = vec![0u8; size].into_boxed_slice();
let mem_ptr: *mut u8 = new_mem.as_mut_ptr();
let slice = unsafe { std::slice::from_raw_parts_mut(mem_ptr, size) };
inner.push(new_mem);
slice
}
}
I don't know the pros and cons, but that's how I would do it.

Map C-like packed data structure to Rust struct

I'm fairly new to Rust and have spent most of my time writing code in C/C++. I have a Flask web server that returns a packed data structure in the form of length + null-terminated string:
test_data = "Hello there bob!" + "\x00"
test_data = test_data.encode("utf-8")
data = struct.pack("<I", len(test_data ))
data += test_data
return data
In my Rust code, I'm using the easy_http_request crate and can successfully get the response by calling get_from_url_str. What I'm trying to do is map the response onto the Test data structure (if possible). I tried using align_to to map the string data onto the structure, without success.
extern crate easy_http_request;
extern crate libc;
use easy_http_request::DefaultHttpRequest;
use libc::c_char;
#[repr(C, packed)]
#[derive(Debug, Clone, Copy)]
struct Test {
a: u32,
b: *const c_char // TODO: What do I put here???
}
fn main() {
let response = DefaultHttpRequest::get_from_url_str("http://localhost:5000/").unwrap().send().unwrap();
let (head, body, _tail) = unsafe { response.body.align_to::<Test>() };
let my_test: Test = body[0];
println!("{}", my_test.a); // Correctly prints '17'
println!("{:?}", my_test.b); // Fails
}
I'm not sure this is possible in Rust. In the response.body I can correctly see the null-terminated string, so I know the data is there. Just unsure if there's a way to map it to a string in the Test structure. There's no reason I need to use a null-terminated string. Ultimately, I'm just trying to map a data structure of size and a string to a Rust struct of the similar types.
It looks like you are confusing two different meanings of "pack":
* In Python, struct.pack serializes values into an array of bytes according to a format string.
* In Rust, packed is a representation attribute (#[repr(packed)]) that removes padding between the members of a struct and lowers its alignment to 1.
While the two can be used together to implement a protocol, that does not work here, because your message contains a variable-length member. Trying to serialize or deserialize a pointer value directly is also a very bad idea: the pointer is only meaningful inside the process that created it.
Your packed Flask message is basically:
* a 4-byte little-endian value holding the number of bytes in the string;
* that many bytes of string data, encoded in UTF-8.
For that you do not need a packed struct. The easiest way is to just read the fields manually, one by one. Something like this (error checking omitted):
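use std::convert::TryInto;
// the Python side packs the length with "<I", an unsigned 32-bit integer
let a = u32::from_le_bytes(response[0..4].try_into().unwrap());
// note: a counts the trailing "\x00", so b still ends with a NUL character
let b = std::str::from_utf8(&response[4 .. 4 + a as usize]).unwrap();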
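The manual approach above, with the omitted error checking filled in, might look like the sketch below. It uses only the standard library and assumes the layout described in the question (a 4-byte little-endian length that counts the trailing NUL, then the UTF-8 bytes); parse_len_prefixed is a name chosen for this illustration, and returning Option rather than a detailed error type is a simplification.

```rust
/// Hypothetical helper: decodes `<u32 little-endian length><UTF-8 bytes ending in NUL>`.
/// Returns None on truncated input, a missing NUL terminator, or invalid UTF-8.
fn parse_len_prefixed(bytes: &[u8]) -> Option<&str> {
    // `get` returns None instead of panicking when the input is too short.
    let header = bytes.get(0..4)?;
    let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;

    // checked_add guards against overflow on targets with a 32-bit usize.
    let end = 4usize.checked_add(len)?;
    let payload = bytes.get(4..end)?;

    // The length includes the terminator, so the last byte must be NUL.
    let (&last, text) = payload.split_last()?;
    if last != 0 {
        return None;
    }
    std::str::from_utf8(text).ok()
}
```

Every failure path returns None rather than panicking, so a short or malformed response cannot crash the client.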
Don't use raw pointers: they are unsafe to use and are recommended only when there is a strong reason to get around Rust's safety guarantees.
At minimum, a struct that fits your requirement is something like:
struct Test<'a> {
value: &'a str
}
or one with an owned String value, to avoid lifetime dependencies.
A &str reference comprises a length and a pointer (it is not a C-like char * pointer).
By the way, the hard part is not parsing the wire protocol but correctly handling all the possible decoding errors, so that buggy or malicious clients cannot cause unexpected runtime failures.
In order not to reinvent the wheel, here is an example using the parser combinator library nom:
use nom::{
number::complete::le_u32,
bytes::complete::take,
error::ErrorKind,
IResult
};
use easy_http_request::DefaultHttpRequest;
use std::str::from_utf8;
#[derive(Debug, Clone)]
struct Test {
value: String
}
fn decode_len_value(bytes: &[u8]) -> IResult<&[u8], Test> {
let (buffer, len) = le_u32(bytes)?;
// take len-1 bytes because null char (\0) is accounted into len
let (remaining, val) = take(len-1)(buffer)?;
match from_utf8(val) {
Ok(strval) => Ok((remaining, Test {value: strval.to_owned()})),
Err(_) => Err(nom::Err::Error((remaining, ErrorKind::Char)))
}
}
fn main() {
let response = DefaultHttpRequest::get_from_url_str("http://localhost:5000/").unwrap().send().unwrap();
let result = decode_len_value(&response.body);
println!("{:?}", result);
}

How do I allocate a Vec<u8> that is aligned to the size of the cache line?

I need to allocate a buffer for reading from a File, but this buffer must be aligned to the size of the cache line (64 bytes). I am looking for a function somewhat like this for Vec:
pub fn with_capacity_and_aligned(capacity: usize, alignment: u8) -> Vec<T>
which would give me the 64 byte alignment that I need. This obviously doesn't exist, but there might be some equivalences (i.e. "hacks") that I don't know about.
So, when I use this function (which will give me the desired alignment), I could write this code safely:
#[repr(C)]
struct Header {
magic: u32,
some_data1: u32,
some_data2: u64,
}
let cache_line_size = 64; // bytes
let mut buffer: Vec<u8> = Vec::<u8>::with_capacity_and_aligned(some_size, cache_line_size);
match file.read_to_end(&mut buffer) {
Ok(_) => {
let header: Header = {
// and since the buffer is aligned to 64 bytes, I wont get any SEGFAULT
unsafe { transmute(buffer[0..(size_of::<Header>())]) }
};
}
}
and not get any panics because of alignment issues (like launching an instruction).
You can enforce the alignment of a type to a certain size using #[repr(align(...))]. We also use repr(C) to ensure that this type has the same memory layout as an array of bytes.
You can then create a vector of the aligned type and transform it to a vector of appropriate type:
use std::mem;
#[repr(C, align(64))]
struct AlignToSixtyFour([u8; 64]);
unsafe fn aligned_vec(n_bytes: usize) -> Vec<u8> {
// Lazy math to ensure we always have enough.
let n_units = (n_bytes / mem::size_of::<AlignToSixtyFour>()) + 1;
let mut aligned: Vec<AlignToSixtyFour> = Vec::with_capacity(n_units);
let ptr = aligned.as_mut_ptr();
let len_units = aligned.len();
let cap_units = aligned.capacity();
mem::forget(aligned);
Vec::from_raw_parts(
ptr as *mut u8,
len_units * mem::size_of::<AlignToSixtyFour>(),
cap_units * mem::size_of::<AlignToSixtyFour>(),
)
}
There are no guarantees that the Vec<u8> will remain aligned if it reallocates. This means that you must not let it reallocate, so you need to know up front how much to allocate.
The function is unsafe for the same reason. When the Vec is dropped, its memory must be released in a way that matches the original allocation, but this function cannot control that.
Thanks to BurntSushi5 for corrections and additions.
See also:
How can I align a struct to a specified byte boundary?
Align struct to cache lines in Rust
How do I convert a Vec<T> to a Vec<U> without copying the vector?
Because of the limitations and unsafety above, another potential idea would be to allocate a big-enough buffer (maybe with some wiggle room), and then use align_to to get a properly aligned chunk. You could use the same AlignToSixtyFour type as above, and then convert the &[AlignToSixtyFour] into a &[u8] with similar logic.
This technique could be used to give out (optionally mutable) slices that are aligned. Since they are slices, you don't have to worry about the user reallocating or dropping them. This would allow you to wrap it up in a nicer type.
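The align_to idea could be sketched roughly as follows. This is an illustration under assumptions rather than a vetted implementation: aligned_chunk is a made-up name, the caller is assumed to over-allocate by at least 63 bytes, and the standard library documentation allows align_to_mut to return a shorter (even empty) middle slice, so code relying on it should check the resulting length.

```rust
use std::mem;

#[repr(C, align(64))]
struct AlignToSixtyFour([u8; 64]);

/// Hypothetical helper: returns the largest 64-byte-aligned sub-slice that
/// align_to_mut finds inside `buf`, viewed as bytes again.
fn aligned_chunk(buf: &mut [u8]) -> &mut [u8] {
    // SAFETY: AlignToSixtyFour is a plain byte array, so every bit pattern
    // of the underlying u8 data is a valid value of that type.
    let (_head, aligned, _tail) = unsafe { buf.align_to_mut::<AlignToSixtyFour>() };
    let len_bytes = aligned.len() * mem::size_of::<AlignToSixtyFour>();
    // SAFETY: the pointer and length describe exactly the memory covered by
    // `aligned`, which is borrowed from `buf` for the returned lifetime.
    unsafe { std::slice::from_raw_parts_mut(aligned.as_mut_ptr() as *mut u8, len_bytes) }
}
```

Because the result is a slice borrowed from the backing buffer, the caller cannot reallocate or free it with the wrong layout, which sidesteps the problems of the Vec-based version.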
All that being said, I think that relying on alignment here is inappropriate for your actual goal of reading a struct from a file. Simply read the bytes (u32, u32, u64) and build the struct:
use byteorder::{LittleEndian, ReadBytesExt}; // 1.3.4
use std::{fs::File, io};
#[derive(Debug)]
struct Header {
magic: u32,
some_data1: u32,
some_data2: u64,
}
impl Header {
fn from_reader(mut reader: impl io::Read) -> Result<Self, Box<dyn std::error::Error>> {
let magic = reader.read_u32::<LittleEndian>()?;
let some_data1 = reader.read_u32::<LittleEndian>()?;
let some_data2 = reader.read_u64::<LittleEndian>()?;
Ok(Self {
magic,
some_data1,
some_data2,
})
}
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut f = File::open("/etc/hosts")?;
let header = Header::from_reader(&mut f)?;
println!("{:?}", header);
Ok(())
}
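If adding the byteorder dependency is not desirable, the same field-by-field read can be written with the standard library alone. The sketch below assumes the same little-endian layout as the example above (4 + 4 + 8 bytes, no padding); header_from_reader is a name invented for this illustration.

```rust
use std::io::{self, Read};

#[derive(Debug, PartialEq)]
struct Header {
    magic: u32,
    some_data1: u32,
    some_data2: u64,
}

/// Hypothetical helper: reads 16 bytes and decodes them as little-endian fields.
fn header_from_reader(mut reader: impl Read) -> io::Result<Header> {
    let mut buf = [0u8; 16];
    // Fails with UnexpectedEof if the source holds fewer than 16 bytes.
    reader.read_exact(&mut buf)?;

    // Copy each field's bytes into a fixed-size array before decoding.
    let mut magic = [0u8; 4];
    magic.copy_from_slice(&buf[0..4]);
    let mut some_data1 = [0u8; 4];
    some_data1.copy_from_slice(&buf[4..8]);
    let mut some_data2 = [0u8; 8];
    some_data2.copy_from_slice(&buf[8..16]);

    Ok(Header {
        magic: u32::from_le_bytes(magic),
        some_data1: u32::from_le_bytes(some_data1),
        some_data2: u64::from_le_bytes(some_data2),
    })
}
```

Neither version depends on the alignment of the buffer, because the integers are assembled from individual bytes.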
See also:
How to read a struct from a file in Rust?
Is this the most natural way to read structs from a binary file?
Can I take a byte array and deserialize it into a struct?
Transmuting u8 buffer to struct in Rust

How can I get an array or a slice from a raw pointer?

Can I somehow get an array from std::ptr::read?
I'd like to do something close to:
let mut v: Vec<u8> = ...
let view = &some_struct as *const _ as *const u8;
v.write(&std::ptr::read<[u8, ..30]>(view));
Which is not valid in this form (can't use the array signature).
If you want to obtain a slice from a raw pointer, use std::slice::from_raw_parts():
let slice = unsafe { std::slice::from_raw_parts(some_pointer, count_of_items) };
If you want to obtain a mutable slice from a raw pointer, use std::slice::from_raw_parts_mut():
let slice = unsafe { std::slice::from_raw_parts_mut(some_pointer, count_of_items) };
Are you sure you want read()? Without special care it will cause disaster on structs with destructors. Also, read() does not read a value of some specified type from a pointer to bytes; it reads exactly one value of the type behind the pointer (e.g. if it is *const u8 then read() will read one byte) and returns it.
If you only want to write byte contents of a structure into a vector, you can obtain a slice from the raw pointer:
use std::mem;
use std::io::Write;
struct SomeStruct {
a: i32,
}
fn main() {
let some_struct = SomeStruct { a: 32 };
let mut v: Vec<u8> = Vec::new();
let view = &some_struct as *const _ as *const u8;
let slice = unsafe { std::slice::from_raw_parts(view, mem::size_of::<SomeStruct>()) };
v.write_all(slice).expect("Unable to write");
println!("{:?}", v);
}
This makes your code platform-dependent and even compiler-dependent: if you use types of variable size (e.g. isize/usize) in your struct or if you don't use #[repr(C)], the data you wrote into the vector is likely to be read as garbage on another machine (and even #[repr(C)] may not lift this problem sometimes, as far as I remember).
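When the bytes have to be readable on another machine, one way to avoid that platform dependence is to serialize each field explicitly with a fixed byte order. The sketch below is an assumption-laden illustration: the second field b and the write_to method are invented for the example, and little-endian is chosen arbitrarily.

```rust
struct SomeStruct {
    a: i32,
    b: u16, // hypothetical extra field, added only to show several fields
}

impl SomeStruct {
    /// Hypothetical helper: appends the fields in declaration order,
    /// little-endian, with no padding between them.
    fn write_to(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.a.to_le_bytes());
        out.extend_from_slice(&self.b.to_le_bytes());
    }
}
```

This needs no unsafe code, and the output does not depend on the struct's in-memory layout, padding, or the host's byte order.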
