serde: Stateful deserialisation at top level - rust

I'm trying to deserialise a binary format (OpenType) which consists of data in multiple tables (binary structs). I would like to be able to deserialise the tables independently (because of how they're stored in the top-level file structure; imagine them being in separate files, so they have to be deserialised separately), but sometimes there are dependencies between them.
A simple example is the loca table which contains an array of either 16-bit or 32-bit offsets, depending on the value of the indexToLocFormat field in the head table. As a more complex example, these loca table offsets in turn are used as offsets into the binary data of the glyf table to locate elements. So I need to get indexToLocFormat and loca: Vec<32> "into" the serializer somehow.
Obviously I need to implement Deserialize myself and write visitors, and I've got my head around doing that. When there are dependencies from a table to a subtable, I've also been able to work that out using deserialize_seed inside the table's visitor. But I don't know how to apply that to pass in information between tables.
I think I need to store what is essentially configuration information (value of indexToLocFormat, array of offsets) when constructing my serializer object:
pub struct Deserializer<'de> {
input: &'de [u8],
ptr: usize,
locaShortVersion: Option<bool>,
glyfOffsets: Option<Vec<u32>>,
...
}
The problem is that I don't know how to retrieve that information when I'm inside the Visitor impl for the struct; I don't know how to get at the deserializer object at all, let alone how to type things so that I get at my Deserializer object with the configuration fields, not just a generic serde::de::Deserializer:
impl<'de> Visitor<'de> for LocaVisitor {
type Value = Vec<u32>;
fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(formatter, "A loca table")
}
fn visit_seq<A: SeqAccess<'de>>(self, mut seq: A) -> Result<Self::Value, A::Error> {
let locaShortVersion = /* what goes here? */;
if locaShortVersion {
Ok(seq.next_element::Vec<u16>()?
.ok_or_else(|| serde::de::Error::custom("Oops"))?
.map { |x| x as u32 }
} else {
Ok(seq.next_element::Vec<u32>()?
.ok_or_else(|| serde::de::Error::custom("Oops"))?
}
}
}
(terrible code here; if you're wondering why I'm writing Yet Another OpenType Parsing Crate, it's because I want to both read and write font files.)

Actually, I think I've got it. The trick is to do the deserialization in stages. Rather than calling the deserializer module's from_bytes function (which wraps the struct creation, and T::deserialize call), do this instead:
use serde::de::DeserializeSeed; // Having this trait in scope is also key
let mut de = Deserializer::from_bytes(&binary_loca_table);
let ssd: SomeSpecialistDeserializer { ... configuration goes here .. };
let loca_table: Vec<u32> = ssd.deserialize(&mut de).unwrap();
In this case, I use a LocaDeserializer defined like so:
pub struct LocaDeserializer { locaIs32Bit: bool }
impl<'de> DeserializeSeed<'de> for LocaDeserializer {
type Value = Vec<u32>;
fn deserialize<D>(self, deserializer: D) -> std::result::Result<Self::Value, D::Error>
where
D: serde::de::Deserializer<'de>,
{
struct LocaDeserializerVisitor {
locaIs32Bit: bool,
}
impl<'de> Visitor<'de> for LocaDeserializerVisitor {
type Value = Vec<u32>;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
write!(formatter, "a loca table")
}
fn visit_seq<A>(self, mut seq: A) -> std::result::Result<Vec<u32>, A::Error>
where
A: SeqAccess<'de>,
{
if self.locaIs32Bit {
Ok(seq.next_element::<u32>()?.ok_or_else(|| serde::de::Error::custom(format!("Expecting a 32 bit glyph offset")))?)
} else {
Ok(seq.next_element::<u16>()?.ok_or_else(|| serde::de::Error::custom(format!("Expecting a 16 bit glyph offset")))?
.iter()
.map(|x| (*x as u32) * 2)
.collect())
}
}
}
deserializer.deserialize_seq(LocaDeserializerVisitor {
locaIs32Bit: self.locaIs32Bit,
})
}
}
And now:
fn loca_de() {
let binary_loca = vec![
0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x1a,
];
let mut de = Deserializer::from_bytes(&binary_loca);
let cs: loca::LocaDeserializer = loca::LocaDeserializer { locaIs32Bit: false };
let floca: Vec<u32> = cs.deserialize(&mut de).unwrap();
println!("{:?}", floca);
// [2, 0, 2, 0, 0, 52]
let mut de = Deserializer::from_bytes(&binary_loca);
let cs: loca::LocaDeserializer = loca::LocaDeserializer { locaIs32Bit: true };
let floca: Vec<u32> = cs.deserialize(&mut de).unwrap();
println!("{:?}", floca);
// [65536, 65536, 26]
}
Serde is very nice - once you have got your head around it.

Related

Is there a way in Rust to overload method for a specific type?

The following is only an example. If there's a native solution for this exact problem with reading bytes - cool, but my goal is to learn how to do it by myself, for any other purpose as well.
I'd like to do something like this: (pseudo-code below)
let mut reader = Reader::new(bytesArr);
let int32: i32 = reader.read(); // separate implementation to read 4 bits and convert into int32
let int64: i64 = reader.read(); // separate implementation to read 8 bits and convert into int64
I imagine it looking like this: (pseudo-code again)
impl Reader {
read<T>(&mut self) -> T {
// if T is i32 ... else if ...
}
}
or like this:
impl Reader {
read(&mut self) -> i32 {
// ...
}
read(&mut self) -> i64 {
// ...
}
}
But haven't found anything relatable yet.
(I actually have, for the first case (if T is i32 ...), but it looked really unreadable and inconvenient)
You could do this by having a Readable trait which you implement on i32 and i64, which does the operation. Then on Reader you could have a generic function which takes any type that is Readable and return it, for example:
struct Reader {
n: u8,
}
trait Readable {
fn read_from_reader(reader: &mut Reader) -> Self;
}
impl Readable for i32 {
fn read_from_reader(reader: &mut Reader) -> i32 {
reader.n += 1;
reader.n as i32
}
}
impl Readable for i64 {
fn read_from_reader(reader: &mut Reader) -> i64 {
reader.n += 1;
reader.n as i64
}
}
impl Reader {
fn read<T: Readable>(&mut self) -> T {
T::read_from_reader(self)
}
}
fn main() {
let mut r = Reader { n: 0 };
let int32: i32 = r.read();
let int64: i64 = r.read();
println!("{} {}", int32, int64);
}
You can try it on the playground
After some trials and searches, I found that implementing them in current Rust seems a bit difficult, but not impossible.
Here is the code, I'll explain it afterwards:
#![feature(generic_const_exprs)]
use std::{
mem::{self, MaybeUninit},
ptr,
};
static DATA: [u8; 8] = [
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
];
struct Reader;
impl Reader {
fn read<T: Copy + Sized>(&self) -> T
where
[(); mem::size_of::<T>()]: ,
{
let mut buf = [unsafe { MaybeUninit::uninit().assume_init() }; mem::size_of::<T>()];
unsafe {
ptr::copy_nonoverlapping(DATA.as_ptr(), buf.as_mut_ptr(), buf.len());
mem::transmute_copy(&buf)
}
}
}
fn main() {
let reader = Reader;
let v_u8: u8 = reader.read();
dbg!(v_u8);
let v_u16: u16 = reader.read();
dbg!(v_u16);
let v_u32: u32 = reader.read();
dbg!(v_u32);
let v_u64: u64 = reader.read();
dbg!(v_u64);
}
Suppose the global static variable DATA is the target data you want to read.
In current Rust, we cannot directly use the size of a generic parameter as the length of an array. This does not work:
fn example<T: Copy + Sized>() {
let mut _buf = [0_u8; mem::size_of::<T>()];
}
The compiler gives a weird error:
error: unconstrained generic constant
--> src\main.rs:34:31
|
34 | let mut _buf = [0_u8; mem::size_of::<T>()];
| ^^^^^^^^^^^^^^^^^^^
|
= help: try adding a `where` bound using this expression: `where [(); mem::size_of::<T>()]:`
There is an issue that is tracking it, if you want to go deeper into this error you can take a look.
We just follow the compiler's suggestion to add a where bound. This requires feature generic_const_exprs to be enabled.
Next, unsafe { MaybeUninit::uninit().assume_init() } is optional, which drops the overhead of initializing this array, since we will eventually overwrite it completely. You can replace it with 0_u8 if you don't like it.
Finally, copy the data you need and transmute this array to your generic type, return.
I think you will see the output you expect:
[src\main.rs:38] v_u8 = 255
[src\main.rs:41] v_u16 = 65535
[src\main.rs:44] v_u32 = 4294967295
[src\main.rs:47] v_u64 = 18446744073709551615

If an ffi function modifies a pointer, should the owning struct be referenced mutable?

I am currently experimenting with the FFI functionality of Rust and implemented a simble HTTP request using libcurl as an exercise. Consider the following self-contained example:
use std::ffi::c_void;
#[repr(C)]
struct CURL {
_private: [u8; 0],
}
// Global CURL codes
const CURL_GLOBAL_DEFAULT: i64 = 3;
const CURLOPT_WRITEDATA: i64 = 10001;
const CURLOPT_URL: i64 = 10002;
const CURLOPT_WRITEFUNCTION: i64 = 20011;
// Curl types
type CURLcode = i64;
type CURLoption = i64;
// Curl function bindings
#[link(name = "curl")]
extern "C" {
fn curl_easy_init() -> *mut CURL;
fn curl_easy_setopt(handle: *mut CURL, option: CURLoption, value: *mut c_void) -> CURLcode;
fn curl_easy_perform(handle: *mut CURL) -> CURLcode;
fn curl_global_init(flags: i64) -> CURLcode;
}
// Curl callback for data retrieving
extern "C" fn callback_writefunction(
data: *mut u8,
size: usize,
nmemb: usize,
user_data: *mut c_void,
) -> usize {
let slice = unsafe { std::slice::from_raw_parts(data, size * nmemb) };
let mut vec = unsafe { Box::from_raw(user_data as *mut Vec<u8>) };
vec.extend_from_slice(slice);
Box::into_raw(vec);
nmemb * size
}
type Result<T> = std::result::Result<T, CURLcode>;
// Our own curl handle
pub struct Curl {
handle: *mut CURL,
data_ptr: *mut Vec<u8>,
}
impl Curl {
pub fn new() -> std::result::Result<Curl, CURLcode> {
let ret = unsafe { curl_global_init(CURL_GLOBAL_DEFAULT) };
if ret != 0 {
return Err(ret);
}
let handle = unsafe { curl_easy_init() };
if handle.is_null() {
return Err(2); // CURLE_FAILED_INIT according to libcurl-errors(3)
}
// Set data callback
let ret = unsafe {
curl_easy_setopt(
handle,
CURLOPT_WRITEFUNCTION,
callback_writefunction as *mut c_void,
)
};
if ret != 0 {
return Err(2);
}
// Set data pointer
let data_buf = Box::new(Vec::new());
let data_ptr = Box::into_raw(data_buf);
let ret = unsafe {
curl_easy_setopt(handle, CURLOPT_WRITEDATA, data_ptr as *mut std::ffi::c_void)
};
match ret {
0 => Ok(Curl { handle, data_ptr }),
_ => Err(2),
}
}
pub fn set_url(&self, url: &str) -> Result<()> {
let url_cstr = std::ffi::CString::new(url.as_bytes()).unwrap();
let ret = unsafe {
curl_easy_setopt(
self.handle,
CURLOPT_URL,
url_cstr.as_ptr() as *mut std::ffi::c_void,
)
};
match ret {
0 => Ok(()),
x => Err(x),
}
}
pub fn perform(&self) -> Result<String> {
let ret = unsafe { curl_easy_perform(self.handle) };
if ret == 0 {
let b = unsafe { Box::from_raw(self.data_ptr) };
let data = (*b).clone();
Box::into_raw(b);
Ok(String::from_utf8(data).unwrap())
} else {
Err(ret)
}
}
}
fn main() -> Result<()> {
let my_curl = Curl::new().unwrap();
my_curl.set_url("https://www.example.com")?;
my_curl.perform().and_then(|data| Ok(println!("{}", data)))
// No cleanup code in this example for the sake of brevity.
}
While this works, I found it surprising that my_curl does not need to be declared mut, since none of the methods use &mut self, even though they pass a mut* pointer to the FFI function
s.
Should I change the declaration of perform to use &mut self instead of &self (for safety), since the internal buffer gets modified? Rust does not enforce this, but of course Rust does not know that the buffer gets modified by libcurl.
This small example runs fine, but I am unsure if I would be facing any kind of issues in larger programs, when the compiler might optimize for non-mutable access on the Curl struct, even though the instance of the struct is getting modified - or at least the data the pointer is pointing to.
Contrary to popular belief, there is absolutely no borrowchecker-induced restriction in Rust on passing *const/*mut pointers. There doesn't need to be, because dereferencing pointers is inherently unsafe, and can only be done in such blocks, with the programmer verifying all necessary invariants manually. In your case, you need to tell the compiler that is a mutable reference, as you already suspected.
The interested reader should definitely give the ffi section of the nomicon a read, to find out about some interesting ways to shoot yourself in the foot with it.

How do I read CSV data without knowing the structure at compile time?

I'm pretty new to Rust and trying to implement some kind of database. Users should create tables by giving a table name, a vector of column names and a vector of column types (realized over an enum). Filling tables should be done by specifying csv files. However, this requires the structure of the table rows to be specified at compile time, like shown in the basic example:
#[derive(Debug, Deserialize, Eq, PartialEq)]
struct Row {
key: u32,
name: String,
comment: String
}
use std::error::Error;
use csv::ReaderBuilder;
use serde::Deserialize;
use std::fs;
fn read_from_file(path: &str) -> Result<(), Box<dyn Error>> {
let data = fs::read_to_string(path).expect("Unable to read file");
let mut rdr = ReaderBuilder::new()
.has_headers(false)
.delimiter(b'|')
.from_reader(data.as_bytes());
let mut iter = rdr.deserialize();
if let Some(result) = iter.next() {
let record:Row = result?;
println!("{:?}", record);
Ok(())
} else {
Err(From::from("expected at least one record but got none"))
}
}
Is there a possibility to use the generic table information instead of the "Row"-struct to cast the results from the deserialization? Is it possible to simply allocate memory according to the combined sizes of the column types and parse the records in? I would do something like this in C...
Is there a possibility to use the generic table information instead of the "Row"-struct to cast the results from the deserialization?
All generics replaced with concrete types at compile time. If you do not know types you will need in runtime, "generics" is not what you need.
Is it possible to simply allocate memory according to the combined sizes of the column types and parse the records in? I would do something like this in C...
I suggest using Box<dyn Any> instead, to be able to store reference of any type and, still, know what type it is.
Maintenance cost for this approach is pretty high. You have to manage each possible value type everywhere you want to use a cell's value. On the other hand, you do not need to parse value each time, just make some type checks in runtime.
I have used std::any::TypeId to identify type, but it can not be used in match expressions. You can consider using custom enum as type identifier.
use std::any::{Any, TypeId};
use std::io::Read;
use csv::Reader;
#[derive(Default)]
struct Table {
name: String,
headers: Vec<(String, TypeId)>,
data: Vec<Vec<Box<dyn Any>>>,
}
impl Table {
fn add_header(&mut self, header: String, _type: TypeId) {
self.headers.push((header, _type));
}
fn populate_data<R: Read>(
&mut self,
rdr: &mut Reader<R>,
) -> Result<(), Box<dyn std::error::Error>> {
for record in rdr.records() {
let record = record?;
let mut row: Vec<Box<dyn Any>> = vec![];
for (&(_, type_id), value) in self.headers.iter().zip(record.iter()) {
if type_id == TypeId::of::<u32>() {
row.push(Box::new(value.parse::<u32>()?));
} else if type_id == TypeId::of::<String>() {
row.push(Box::new(value.to_owned()));
}
}
self.data.push(row);
}
Ok(())
}
}
impl std::fmt::Display for Table {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
writeln!(f, "Table: {}", self.name)?;
for (name, _) in self.headers.iter() {
write!(f, "{}, ", name)?;
}
writeln!(f)?;
for row in self.data.iter() {
for cell in row.iter() {
if let Some(&value) = cell.downcast_ref::<u32>() {
write!(f, "{}, ", value)?;
} else if let Some(value) = cell.downcast_ref::<String>() {
write!(f, "{}, ", value)?;
}
}
writeln!(f)?;
}
Ok(())
}
}
fn main() {
let mut table: Table = Default::default();
table.name = "Foo".to_owned();
table.add_header("key".to_owned(), TypeId::of::<u32>());
table.add_header("name".to_owned(), TypeId::of::<String>());
table.add_header("comment".to_owned(), TypeId::of::<String>());
let data = "\
key,name,comment
1,foo,foo comment
2,bar,bar comment
";
let mut rdr = Reader::from_reader(data.as_bytes());
table.populate_data(&mut rdr).unwrap();
print!("{}", table);
}

How can you allocate a raw mutable pointer in stable Rust?

I was trying to build a naive implementation of a custom String-like struct with small string optimization. Now that unions are allowed in stable Rust, I came up with the following code:
struct Large {
capacity: usize,
buffer: *mut u8,
}
struct Small([u8; 16]);
union Container {
large: Large,
small: Small,
}
struct MyString {
len: usize,
container: Container,
}
I can't seem to find a way how to allocate that *mut u8. Is it possible to do in stable Rust? It looks like using alloc::heap would work, but it is only available in nightly.
As of Rust 1.28, std::alloc::alloc is stable.
Here is an example which shows in general how it can be used.
use std::{
alloc::{self, Layout},
cmp, mem, ptr, slice, str,
};
// This really should **not** be copied
#[derive(Copy, Clone)]
struct Large {
capacity: usize,
buffer: *mut u8,
}
// This really should **not** be copied
#[derive(Copy, Clone, Default)]
struct Small([u8; 16]);
union Container {
large: Large,
small: Small,
}
struct MyString {
len: usize,
container: Container,
}
impl MyString {
fn new() -> Self {
MyString {
len: 0,
container: Container {
small: Small::default(),
},
}
}
fn as_buf(&self) -> &[u8] {
unsafe {
if self.len <= 16 {
&self.container.small.0[..self.len]
} else {
slice::from_raw_parts(self.container.large.buffer, self.len)
}
}
}
pub fn as_str(&self) -> &str {
unsafe { str::from_utf8_unchecked(self.as_buf()) }
}
// Not actually UTF-8 safe!
fn push(&mut self, c: u8) {
unsafe {
use cmp::Ordering::*;
match self.len.cmp(&16) {
Less => {
self.container.small.0[self.len] = c;
}
Equal => {
let capacity = 17;
let layout = Layout::from_size_align(capacity, mem::align_of::<u8>())
.expect("Bad layout");
let buffer = alloc::alloc(layout);
{
let buf = self.as_buf();
ptr::copy_nonoverlapping(buf.as_ptr(), buffer, buf.len());
}
self.container.large = Large { capacity, buffer };
*self.container.large.buffer.offset(self.len as isize) = c;
}
Greater => {
let Large {
mut capacity,
buffer,
} = self.container.large;
capacity += 1;
let layout = Layout::from_size_align(capacity, mem::align_of::<u8>())
.expect("Bad layout");
let buffer = alloc::realloc(buffer, layout, capacity);
self.container.large = Large { capacity, buffer };
*self.container.large.buffer.offset(self.len as isize) = c;
}
}
self.len += 1;
}
}
}
impl Drop for MyString {
fn drop(&mut self) {
unsafe {
if self.len > 16 {
let Large { capacity, buffer } = self.container.large;
let layout =
Layout::from_size_align(capacity, mem::align_of::<u8>()).expect("Bad layout");
alloc::dealloc(buffer, layout);
}
}
}
}
fn main() {
let mut s = MyString::new();
for _ in 0..32 {
s.push(b'a');
println!("{}", s.as_str());
}
}
I believe this code to be correct with respect to allocations, but not for anything else. Like all unsafe code, verify it yourself. It's also completely inefficient as it reallocates for every additional character.
If you'd like to allocate a collection of u8 instead of a single u8, you can create a Vec and then convert it into the constituent pieces, such as by calling as_mut_ptr:
use std::mem;
fn main() {
let mut foo = vec![0; 1024]; // or Vec::<u8>::with_capacity(1024);
let ptr = foo.as_mut_ptr();
let cap = foo.capacity();
let len = foo.len();
mem::forget(foo); // Avoid calling the destructor!
let foo_again = unsafe { Vec::from_raw_parts(ptr, len, cap) }; // Rebuild it to drop it
// Do *NOT* use `ptr` / `cap` / `len` anymore
}
Re allocating is a bit of a pain though; you'd have to convert back to a Vec and do the whole dance forwards and backwards
That being said, your Large struct seems to be missing a length, which would be distinct from capacity. You could just use a Vec instead of writing it out. I see now it's up a bit in the hierarchy.
I wonder if having a full String wouldn't be a lot easier, even if it were a bit less efficient in that the length is double-counted...
union Container {
large: String,
small: Small,
}
See also:
What is the right way to allocate data to pass to an FFI call?
How do I use the Rust memory allocator for a C library that can be provided an allocator?
What about Box::into_raw()?
struct TypeMatches(*mut u8);
TypeMatches(Box::into_raw(Box::new(0u8)));
But it's difficult to tell from your code snippet if this is what you really need. You probably want a real allocator, and you could use libc::malloc with an as cast, as in this example.
There's a memalloc crate which provides a stable allocation API. It's implemented by allocating memory with Vec::with_capacity, then extracting the pointer:
let vec = Vec::with_capacity(cap);
let ptr = buf.as_mut_ptr();
mem::forget(vec);
To free the memory, use Vec::from_raw_parts.

How to read a struct from a file in Rust?

Is there a way I can read a structure directly from a file in Rust? My code is:
use std::fs::File;
struct Configuration {
item1: u8,
item2: u16,
item3: i32,
item4: [char; 8],
}
fn main() {
let file = File::open("config_file").unwrap();
let mut config: Configuration;
// How to read struct from file?
}
How would I read my configuration directly into config from the file? Is this even possible?
Here you go:
use std::io::Read;
use std::mem;
use std::slice;
#[repr(C, packed)]
#[derive(Debug, Copy, Clone)]
struct Configuration {
item1: u8,
item2: u16,
item3: i32,
item4: [char; 8],
}
const CONFIG_DATA: &[u8] = &[
0xfd, // u8
0xb4, 0x50, // u16
0x45, 0xcd, 0x3c, 0x15, // i32
0x71, 0x3c, 0x87, 0xff, // char
0xe8, 0x5d, 0x20, 0xe7, // char
0x5f, 0x38, 0x05, 0x4a, // char
0xc4, 0x58, 0x8f, 0xdc, // char
0x67, 0x1d, 0xb4, 0x64, // char
0xf2, 0xc5, 0x2c, 0x15, // char
0xd8, 0x9a, 0xae, 0x23, // char
0x7d, 0xce, 0x4b, 0xeb, // char
];
fn main() {
let mut buffer = CONFIG_DATA;
let mut config: Configuration = unsafe { mem::zeroed() };
let config_size = mem::size_of::<Configuration>();
unsafe {
let config_slice = slice::from_raw_parts_mut(&mut config as *mut _ as *mut u8, config_size);
// `read_exact()` comes from `Read` impl for `&[u8]`
buffer.read_exact(config_slice).unwrap();
}
println!("Read structure: {:#?}", config);
}
Try it here (Updated for Rust 1.38)
You need to be careful, however, as unsafe code is, well, unsafe. After the slice::from_raw_parts_mut() invocation, there exist two mutable handles to the same data at the same time, which is a violation of Rust aliasing rules. Therefore you would want to keep the mutable slice created out of a structure for the shortest possible time. I also assume that you know about endianness issues - the code above is by no means portable, and will return different results if compiled and run on different kinds of machines (ARM vs x86, for example).
If you can choose the format and you want a compact binary one, consider using bincode. Otherwise, if you need e.g. to parse some pre-defined binary structure, byteorder crate is the way to go.
As Vladimir Matveev mentions, using the byteorder crate is often the best solution. This way, you account for endianness issues, don't have to deal with any unsafe code, or worry about alignment or padding:
use byteorder::{LittleEndian, ReadBytesExt}; // 1.2.7
use std::{
fs::File,
io::{self, Read},
};
struct Configuration {
item1: u8,
item2: u16,
item3: i32,
}
impl Configuration {
fn from_reader(mut rdr: impl Read) -> io::Result<Self> {
let item1 = rdr.read_u8()?;
let item2 = rdr.read_u16::<LittleEndian>()?;
let item3 = rdr.read_i32::<LittleEndian>()?;
Ok(Configuration {
item1,
item2,
item3,
})
}
}
fn main() {
let file = File::open("/dev/random").unwrap();
let config = Configuration::from_reader(file);
// How to read struct from file?
}
I've ignored the [char; 8] for a few reasons:
Rust's char is a 32-bit type and it's unclear if your file has actual Unicode code points or C-style 8-bit values.
You can't easily parse an array with byteorder, you have to parse N values and then build the array yourself.
The following code does not take into account any endianness or padding issues and is intended to be used with POD types. struct Configuration should be safe in this case.
Here is a function that can read a struct (of a POD type) from a file:
use std::io::{self, Read};
use std::slice;
fn read_struct<T, R: Read>(mut read: R) -> io::Result<T> {
let num_bytes = ::std::mem::size_of::<T>();
unsafe {
let mut s = ::std::mem::uninitialized();
let buffer = slice::from_raw_parts_mut(&mut s as *mut T as *mut u8, num_bytes);
match read.read_exact(buffer) {
Ok(()) => Ok(s),
Err(e) => {
::std::mem::forget(s);
Err(e)
}
}
}
}
// use
// read_struct::<Configuration>(reader)
If you want to read a sequence of structs from a file, you can execute read_struct multiple times or read all the file at once:
use std::fs::{self, File};
use std::io::BufReader;
use std::path::Path;
fn read_structs<T, P: AsRef<Path>>(path: P) -> io::Result<Vec<T>> {
let path = path.as_ref();
let struct_size = ::std::mem::size_of::<T>();
let num_bytes = fs::metadata(path)?.len() as usize;
let num_structs = num_bytes / struct_size;
let mut reader = BufReader::new(File::open(path)?);
let mut r = Vec::<T>::with_capacity(num_structs);
unsafe {
let buffer = slice::from_raw_parts_mut(r.as_mut_ptr() as *mut u8, num_bytes);
reader.read_exact(buffer)?;
r.set_len(num_structs);
}
Ok(r)
}
// use
// read_structs::<StructName, _>("path/to/file"))

Resources