Read large f64 binary file into array - rust

I'm looking for help or examples on how to read a relatively large (>12M) binary file of double-precision numbers into a Rust array. I have metadata giving the number of f64 values in the file.
I've read on this and seen the byteorder crate but did not find the documentation/examples particularly helpful.
This is not something that needs to be BufRead, since that likely won't help performance.
Thank you!

The easiest way to do it is to read 8 bytes at a time and convert them to an f64 using one of the f64::from_*_bytes() methods, chosen according to the byte order of the file:
from_ne_bytes()
from_be_bytes()
from_le_bytes()
These methods are used like this:
let mut buffer = [0u8; 8]; // the buffer can be reused!
reader.read_exact(&mut buffer)?;
let float = f64::from_be_bytes(buffer);
So you can either read the file 8 bytes at a time or in larger chunks:
use std::error::Error;
use std::fs::File;
use std::io::{BufReader, ErrorKind, Read};
fn main() -> Result<(), Box<dyn Error>> {
let file = File::open("./path/to/file")?;
let mut reader = BufReader::new(file);
let mut buffer = [0u8; 8];
loop {
if let Err(e) = reader.read_exact(&mut buffer) {
// if you know how many bytes are expected, then it's better not to rely on `UnexpectedEof`!
if e.kind() == ErrorKind::UnexpectedEof {
// nothing more to read
break;
}
return Err(e.into());
}
// or use `from_le_bytes()` depending on the byte-order
let float = f64::from_be_bytes(buffer);
//do something with the f64
println!("{}", float);
}
Ok(())
}
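If you don't mind adding a dependency to your project, you can also use the byteorder crate, which has convenience methods for reading whole slices:
use byteorder::{ByteOrder, LittleEndian};
// Build a sample byte buffer; in practice this is the data read from the file.
let numbers_given = [1.0, 2.0, 31.312e211, -11.32e91];
let mut bytes = [0; 32];
LittleEndian::write_f64_into(&numbers_given, &mut bytes);
// Decode the whole buffer into f64 values in one call.
let mut numbers_got = [0.0; 4];
LittleEndian::read_f64_into(&bytes, &mut numbers_got);
assert_eq!(numbers_given, numbers_got);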
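Since the question says the number of values is known up front, here is a minimal std-only sketch of the "larger chunks" variant. The function name read_f64_vec, the 64 KiB scratch size and the little-endian byte order are assumptions made for illustration; swap in from_be_bytes if the file is big-endian.

```rust
use std::io::{self, Read};

/// Reads exactly `count` little-endian f64 values from `reader`.
/// `count` is assumed to come from the file's metadata.
fn read_f64_vec<R: Read>(reader: &mut R, count: usize) -> io::Result<Vec<f64>> {
    // Number of f64 values decoded per read; 8 * 1024 values = 64 KiB of bytes.
    const CHUNK_VALUES: usize = 8 * 1024;
    let mut values = Vec::with_capacity(count);
    let mut scratch = vec![0u8; CHUNK_VALUES * 8];

    let mut remaining = count;
    while remaining > 0 {
        let n = remaining.min(CHUNK_VALUES);
        let bytes = &mut scratch[..n * 8];
        // Fails with `UnexpectedEof` if the file is shorter than promised.
        reader.read_exact(bytes)?;
        for chunk in bytes.chunks_exact(8) {
            let mut raw = [0u8; 8];
            raw.copy_from_slice(chunk);
            values.push(f64::from_le_bytes(raw));
        }
        remaining -= n;
    }
    Ok(values)
}
```

Called as read_f64_vec(&mut File::open(path)?, count), this reads straight from the File without a BufReader, because each read already covers many values.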

Related

What is the most efficient way to read the first line of a file separately to the rest of the file?

I am trying to figure out the best way to read the contents of a file. The problem is that I need to read the first line separately, because I need that to be parsed as a usize which I need for the dimension of a Array2 by ndarray.
I tried the following:
use ndarray::prelude::*;
use std::io::{BufRead, BufReader};
use std::fs;
fn read_inputfile(geom_filename: &str) -> (Vec<i32>, Array2<f64>, usize) {
//* Step 1: Read the coord data from input
println!("Inputfile: {geom_filename}");
let geom_file = fs::File::open(geom_filename).expect("Geometry file not found!");
let geom_file_reader = BufReader::new(geom_file);
let geom_file_lines: Vec<String> = geom_file_reader
.lines()
.map(|line| line.expect("Failed to read line!"))
.collect();
//* Read no of atoms first for array size
let no_atoms: usize = geom_file_lines[0].parse().unwrap();
let mut Z_vals: Vec<i32> = Vec::new();
let mut geom_matr: Array2<f64> = Array2::zeros((no_atoms, 3));
for (atom_idx, line) in geom_file_lines[1..].iter().enumerate() {
//* into_iter would do the same
let line_split: Vec<&str> = line.split_whitespace().collect();
Z_vals.push(line_split[0].parse().unwrap());
(0..3).for_each(|cart_coord| {
geom_matr[(atom_idx, cart_coord)] = line_split[cart_coord + 1].parse().unwrap();
});
}
(Z_vals, geom_matr, no_atoms)
}
Does this not kind of defeat the purpose of the BufReader? I am still relatively new to Rust, so I might have misunderstood something, but I thought that one uses the BufReader so that the whole file does not need to be read into memory.
With the Vec<String> for geom_file_lines I am most likely loading the whole file into memory again, right?
Does this not kind of defeat the purpose of the BufReader?
It very much does, yes. lines() gives you an iterator, so you can read them without loading all of them into memory at once. You force them all into memory, though, as you call collect().
Simply don't do that. Use the iterator as an iterator. Especially as you convert it back to an iterator later, via geom_file_lines[1..].iter().
Like this:
use ndarray::prelude::*;
use std::fs;
use std::io::{BufRead, BufReader};
pub fn read_inputfile(geom_filename: &str) -> (Vec<i32>, Array2<f64>, usize) {
//* Step 1: Read the coord data from input
println!("Inputfile: {geom_filename}");
let geom_file = fs::File::open(geom_filename).expect("Geometry file not found!");
let geom_file_reader = BufReader::new(geom_file);
let mut geom_file_lines = geom_file_reader
.lines()
.map(|line| line.expect("Failed to read line!"));
//* Read no of atoms first for array size
let no_atoms: usize = geom_file_lines.next().unwrap().parse().unwrap();
let mut z_vals: Vec<i32> = Vec::new();
let mut geom_matr: Array2<f64> = Array2::zeros((no_atoms, 3));
for (atom_idx, line) in geom_file_lines.enumerate() {
let line_split: Vec<&str> = line.split_whitespace().collect();
z_vals.push(line_split[0].parse().unwrap());
(0..3).for_each(|cart_coord| {
geom_matr[(atom_idx, cart_coord)] = line_split[cart_coord + 1].parse().unwrap();
});
}
(z_vals, geom_matr, no_atoms)
}
You can apply the same logic in your for loop:
for (atom_idx, line) in geom_file_lines.enumerate() {
let mut line_split = line.split_whitespace();
z_vals.push(line_split.next().unwrap().parse().unwrap());
(0..3).for_each(|cart_coord| {
geom_matr[(atom_idx, cart_coord)] = line_split.next().unwrap().parse().unwrap();
});
}
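As a variation on the same idea, the header can also be pulled off with BufRead::read_line before iterating over the rest. The sketch below is dependency-free: it returns a Vec<[f64; 3]> instead of an ndarray Array2, reports malformed input as an io::Error rather than panicking, and the names read_geometry and invalid are made up for this example.

```rust
use std::io::{self, BufRead};

/// Wraps a parse error in an `io::Error` so one error type covers everything.
fn invalid<E: std::error::Error + Send + Sync + 'static>(e: E) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, e)
}

/// Reads a header line holding the atom count, then one
/// "<Z> <x> <y> <z>" record per remaining line.
fn read_geometry<R: BufRead>(mut reader: R) -> io::Result<(Vec<i32>, Vec<[f64; 3]>)> {
    // Read only the first line; the rest of the file is still unread.
    let mut header = String::new();
    reader.read_line(&mut header)?;
    let no_atoms: usize = header.trim().parse().map_err(invalid)?;

    let mut z_vals = Vec::with_capacity(no_atoms);
    let mut coords = Vec::with_capacity(no_atoms);
    // `lines()` continues where `read_line` stopped, one line at a time.
    for line in reader.lines() {
        let line = line?;
        let mut fields = line.split_whitespace();
        let mut next = || {
            fields
                .next()
                .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "missing field"))
        };
        let z = next()?.parse::<i32>().map_err(invalid)?;
        let mut xyz = [0.0f64; 3];
        for c in xyz.iter_mut() {
            *c = next()?.parse::<f64>().map_err(invalid)?;
        }
        z_vals.push(z);
        coords.push(xyz);
    }
    Ok((z_vals, coords))
}
```

Only one line is held in memory at a time, and the atom count from the header is used purely to pre-size the vectors.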

Get file size in uefi-rs

I am making a basic UEFI application that is supposed to load an ELF kernel. I have gotten to the point where I have the file loaded, and a buffer with the file info. But to actually read the file and do anything with it, I need to know the file size so I can make the buffer for it. I know uefi-rs has a FileInfo struct, but I do not know how to cast the buffer I have to this struct.
I have tried looking for solutions to similar problems and came across "Transmuting u8 buffer to struct in Rust". None of the solutions there worked: I kept getting errors because I cannot cast the thin u8 pointer to the fat FileInfo pointer.
This is my source code so far:
#![no_main]
#![no_std]
#![feature(abi_efiapi)]
#![allow(stable_features)]
#[macro_use]
extern crate alloc;
use elf_rs::{Elf, ElfFile};
use log::info;
use uefi::{prelude::{entry, BootServices, cstr16}, Handle, table::{SystemTable, Boot}, Status, Char16, proto::{loaded_image::LoadedImage, media::{file::{File, FileHandle, FileMode, FileAttribute, FileInfo}, fs::SimpleFileSystem}}, CStr16, data_types::Align};
fn load_file(path: &CStr16, boot_services: &BootServices) -> FileHandle {
let loaded_image = boot_services.open_protocol_exclusive::<LoadedImage>(boot_services.image_handle()).unwrap();
let mut file_system = boot_services.open_protocol_exclusive::<SimpleFileSystem>(loaded_image.device()).unwrap();
let mut directory = file_system.open_volume().unwrap();
directory.open(path, FileMode::Read, FileAttribute::READ_ONLY).unwrap()
}
#[entry]
fn main(image_handle: Handle, mut system_table: SystemTable<Boot>) -> Status {
uefi_services::init(&mut system_table).unwrap();
info!("Loading kernel...");
let mut kernel = load_file(cstr16!("kernel.elf"), system_table.boot_services()).into_regular_file().unwrap();
let mut small_buffer = vec![0u8; 0];
let size = kernel.get_info::<FileInfo>(&mut small_buffer).err().unwrap().data().unwrap();
let mut file_info = vec![0u8; size];
kernel.get_info::<FileInfo>(&mut file_info);
let info: FileInfo; //this is what I need
let elf_buffer = vec![0u8; info.file_size().try_into().unwrap()];
let elf = Elf::from_bytes(&mut elf_buffer).expect("Kernel loading failed!");
info!("{:?} header: {:?}", elf, elf.elf_header());
for p in elf.program_header_iter() {
info!("{:x?}", p);
}
for s in elf.section_header_iter() {
info!("{:x?}", s);
}
let s = elf.lookup_section(b".text");
info!("s {:?}", s);
system_table.boot_services().stall(100_000_000);
Status::SUCCESS
}

Writing to statically-sized file with offset

I am trying to write bytes to a file with random access, in the context of downloading random chunks of the file and assembling them in place. The std::io::Write trait seems to be limited to sequential writing, so I am currently using the following implementation, which first downloads all the chunks and then does one big write_all once they are all in memory.
use rand::prelude::*;
use rand; // 0.7.3
fn download<W: std::io::Write>(writer: &mut W) -> std::io::Result<()>
{
// the size (20) is known
let mut buf = [0u8; 20];
// simulate downloading four 5 byte chunks at random disjoint slices in *buf*
for n in (0..20).step_by(5).collect::<Vec<usize>>().choose_multiple(&mut thread_rng(), 4) {
let n = *n as usize;
let chunk: [u8; 5] = rand::random();
buf[n..n+5].clone_from_slice(&chunk);
}
println!("filled buffer: {:?}", buf);
// finally, do (unbuffered) write
writer.write_all(&buf)?;
Ok(())
}
fn main() -> std::io::Result<()> {
let mut f = std::fs::File::create("foo.txt")?;
download(&mut f)?;
Ok(())
}
With large files I think this solution may consume too much memory, and I would rather write each chunk directly into the file without any intermediate buffer. I see there is write_at in the documentation, but is this method specific to Unix systems? I am looking for a generic solution.
Intuitively, I would want something like this to work:
use rand::prelude::*;
use rand; // 0.7.3
fn download<W: std::io::Write>(writer: &mut W) -> std::io::Result<()>
{
// the size (20) is known
writer.reserve(20)?;
// simulate downloading four 5 byte chunks at random disjoint slices in *buf*
for n in (0..20).step_by(5).collect::<Vec<usize>>().choose_multiple(&mut thread_rng(), 4) {
let n = *n as usize;
let chunk: [u8; 5] = rand::random();
writer.write_at(chunk, n)?;
}
Ok(())
}
fn main() -> std::io::Result<()> {
let mut f = std::fs::File::create("foo.txt")?;
download(&mut f)?;
Ok(())
}
Is there any specialization for writing to random access file output? Can I use a memory mapped file? Doing a quick search I could only find the positioned-io crate, which has not been updated in 3 years. Is there anything similar to this crate in the rust standard library?
You're looking for Seek. File implements both Seek and Write so you just need to bound on Seek as well:
use std::io::{Write, Seek, SeekFrom, Result as IoResult};
fn download<W: Write + Seek>(writer: &mut W) -> IoResult<()> { /* ... */ }
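Now you can call writer.seek(SeekFrom::Start(offset))?; before writing out each chunk, where offset is the chunk's byte position in the file. In the question's loop n is already a byte offset, so that is *n as u64; if you track chunk indices instead, multiply the index by the chunk size.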
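To make that concrete, here is a minimal sketch of a positioned write built from Seek and Write. The helper name write_chunk_at is made up for this example and is not a standard-library method.

```rust
use std::io::{self, Seek, SeekFrom, Write};

/// Writes `chunk` at absolute byte `offset`, wherever the cursor currently is.
/// Works for any writer that is both `Write` and `Seek`, such as `std::fs::File`.
fn write_chunk_at<W: Write + Seek>(writer: &mut W, offset: u64, chunk: &[u8]) -> io::Result<()> {
    // Move to the chunk's position, measured from the start of the output.
    writer.seek(SeekFrom::Start(offset))?;
    // Then write the whole chunk there.
    writer.write_all(chunk)
}
```

Chunks can arrive in any order. For a real file with a known final size, File::set_len can be called once up front to give the file its full length, which covers the reserve(20) step sketched in the question.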

Read binary file in units of f64 in Rust

Assuming you have a binary file example.bin and you want to read it in units of f64, i.e. the first 8 bytes give one float, the next 8 bytes give the next, etc. (assuming you know the endianness), how can this be done in Rust?
I know that one can use std::fs::read("example.bin") to get a Vec<u8> of the data, but then you have to do quite a bit of "gymnastics" to convert each group of 8 bytes to an f64, i.e.
fn eight_bytes_to_array(barry: &[u8]) -> &[u8; 8] {
barry.try_into().expect("slice with incorrect length")
}
let file_content = std::fs::read("example.bin").expect("Could not read file!");
let nr = eight_bytes_to_array(&file_content[0..8]);
let nr = f64::from_be_bytes(*nr);
I saw this post, but it's from 2015 and a lot has changed in Rust since then, so I was wondering if there is a better or faster way these days?
Here is an example without proper error handling, and without handling files whose size is not a multiple of 8 bytes.
use std::fs::File;
use std::io::{BufReader, Read};
fn main() {
// Use BufReader because files in std are unbuffered by default,
// and issuing a separate 8-byte read for every value is really slow.
let mut input = BufReader::new(
File::open("floats.bin")
.expect("Failed to open file")
);
let mut floats = Vec::new();
loop {
use std::io::ErrorKind;
// You may use 8 instead of `size_of` but size_of is less error-prone.
let mut buffer = [0u8; std::mem::size_of::<f64>()];
// Using read_exact because `read` may return less
// than 8 bytes even if there are bytes in the file.
// This, however, prevents us from handling cases
// when file size cannot be divided by 8.
let res = input.read_exact(&mut buffer);
match res {
// We detect if we read until the end.
// If there were some excess bytes after last read, they are lost.
Err(error) if error.kind() == ErrorKind::UnexpectedEof => break,
// Add more cases of errors you want to handle.
_ => {}
}
// You should do better error-handling probably.
// This simply panics.
res.expect("Unexpected error during read");
// Use `from_be_bytes` if the numbers in the file are big-endian
let f = f64::from_le_bytes(buffer);
floats.push(f);
}
}
I would create a generic iterator that returns f64 values, for flexibility and reusability.
use std::{fs, io};
struct F64Reader<R: io::BufRead> {
inner: R,
}
impl<R: io::BufRead> F64Reader<R> {
pub fn new(inner: R) -> Self {
Self{
inner
}
}
}
impl<R: io::BufRead> Iterator for F64Reader<R> {
type Item = f64;
fn next(&mut self) -> Option<Self::Item> {
let mut buff: [u8; 8] = [0;8];
self.inner.read_exact(&mut buff).ok()?;
Some(f64::from_be_bytes(buff))
}
}
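This means that if the file is large, you can loop through the values without storing them all in memory. Note that because of .ok()?, any read error, not just end of file, silently ends the iteration.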
let input = fs::File::open("example.bin")?;
for f in F64Reader::new(io::BufReader::new(input)) {
println!("{}", f)
}
Or, if you want all the values, you can collect them:
let input = fs::File::open("example.bin")?;
let values : Vec<f64> = F64Reader::new(io::BufReader::new(input)).collect();
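If the whole file is already in memory as a Vec<u8> (for example from std::fs::read, as in the question), the "gymnastics" can be reduced to chunks_exact. This is only a sketch: the function name decode_f64_be is hypothetical, and big-endian data is assumed to match the iterator above.

```rust
/// Decodes a byte slice as consecutive big-endian f64 values.
/// Returns `None` if the length is not a multiple of 8.
fn decode_f64_be(bytes: &[u8]) -> Option<Vec<f64>> {
    let chunks = bytes.chunks_exact(8);
    // `remainder()` holds the trailing bytes that do not fill a whole chunk.
    if !chunks.remainder().is_empty() {
        return None;
    }
    Some(
        chunks
            .map(|chunk| {
                let mut raw = [0u8; 8];
                raw.copy_from_slice(chunk);
                f64::from_be_bytes(raw)
            })
            .collect(),
    )
}
```

Unlike the read_exact loops, this makes a file with leftover bytes an explicit failure instead of silently dropping them, at the cost of holding both the raw bytes and the decoded values in memory.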

Why does a generic function replicating C's fread for unsigned integers always return zero?

I am trying to read in binary 16-bit machine instructions from a 16-bit architecture (the exact nature of that is irrelevant here), and print them back out as hexadecimal values. In C, I found this simple by using the fread function to read 16 bits into a uint16_t.
I figured that I would try to replicate fread in Rust. It seems to be reasonably trivial if I can know ahead-of-time the exact size of the variable that is being read into, and I had that working specifically for 16 bits.
I decided that I wanted to try to make the fread function generic over the various built-in unsigned integer types. For that I came up with the below function, using some traits from the Num crate:
fn fread<T>(
buffer: &mut T,
element_count: usize,
stream: &mut BufReader<File>,
) -> Result<usize, std::io::Error>
where
T: num::PrimInt + num::Unsigned,
{
let type_size = std::mem::size_of::<T>();
let mut buf = Vec::with_capacity(element_count * type_size);
let buf_slice = buf.as_mut_slice();
let bytes_read = match stream.read_exact(buf_slice) {
Ok(()) => element_count * type_size,
Err(ref e) if e.kind() == std::io::ErrorKind::UnexpectedEof => 0,
Err(e) => panic!("{}", e),
};
*buffer = buf_slice
.iter()
.enumerate()
.map(|(i, &b)| {
let mut holder2: T = num::zero();
holder2 = holder2 | T::from(b).expect("Casting from u8 to T failed");
holder2 << ((type_size - i) * 8)
})
.fold(num::zero(), |acc, h| acc | h);
Ok(bytes_read)
}
The issue is that when I call it in the main function, I seem to always get 0x00 back out, but the number of bytes read that is returned by the function is always 2, so that the program enters an infinite loop:
extern crate num;
use std::fs::File;
use std::io::BufReader;
use std::io::prelude::Read;
fn main() -> Result<(), std::io::Error> {
let cmd_line_args = std::env::args().collect::<Vec<_>>();
let f = File::open(&cmd_line_args[1])?;
let mut reader = BufReader::new(f);
let mut instructions: Vec<u16> = Vec::new();
let mut next_instruction: u16 = 0;
fread(&mut next_instruction, 1, &mut reader)?;
let base_address = next_instruction;
while fread(&mut next_instruction, 1, &mut reader)? > 0 {
instructions.push(next_instruction);
}
println!("{:#04x}", base_address);
for i in instructions {
println!("0x{:04x}", i);
}
Ok(())
}
It appears to me that I'm somehow never reading anything from the file, so the function always just returns the number of bytes it was supposed to read. I'm clearly not using something correctly here, but I'm honestly unsure what I'm doing wrong.
This is compiled on Rust 1.26 stable for Windows if that matters.
What am I doing wrong, and what should I do differently to replicate fread? I realise that this is probably a case of the XY problem (in that there's almost certainly a better Rust way to repeatedly read some bytes from a file and pack them into one unsigned integer), but I'm really curious as to what I'm doing wrong here.
Your problem is that this line:
let mut buf = Vec::with_capacity(element_count * type_size);
creates a zero-length vector, even though it allocates memory for element_count * type_size bytes. Therefore you are asking stream.read_exact to read zero bytes. One way to fix this is to replace the above line with:
let mut buf = vec![0; element_count * type_size];
Side note: when the read succeeds, bytes_read receives the number of bytes you expected to read, not the number of bytes you actually read. You should probably use std::mem::size_of_val(buf_slice) to get the true byte count.
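The difference between the two constructors is easy to check in isolation. In this small sketch the helper bytes_consumed is made up for the demonstration, and a byte slice stands in for the file, since &[u8] implements Read.

```rust
use std::io::Read;

/// Fills `buf` (as it currently is) from `source` with `read_exact`
/// and returns how many bytes were actually taken from `source`.
fn bytes_consumed(buf: &mut Vec<u8>, source: &[u8]) -> usize {
    let mut remaining = source;
    // `read_exact` fills the slice it is given: an empty slice reads nothing.
    remaining
        .read_exact(buf.as_mut_slice())
        .expect("source too short");
    source.len() - remaining.len()
}
```

With Vec::with_capacity(2) the vector has length 0, so as_mut_slice() is empty and the function returns 0. With vec![0u8; 2] the length is 2, so two bytes are consumed and end up in the vector.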
in that there's almost certainly a better Rust way to repeatedly read some bytes from a file and pack them into one unsigned integer
Yes, use the byteorder crate. This requires no unneeded heap allocation (the Vec in the original code):
extern crate byteorder;
use byteorder::{LittleEndian, ReadBytesExt};
use std::{
fs::File, io::{self, BufReader, Read},
};
fn read_instructions_to_end<R>(mut rdr: R) -> io::Result<Vec<u16>>
where
R: Read,
{
let mut instructions = Vec::new();
loop {
match rdr.read_u16::<LittleEndian>() {
Ok(instruction) => instructions.push(instruction),
Err(e) => {
return if e.kind() == std::io::ErrorKind::UnexpectedEof {
Ok(instructions)
} else {
Err(e)
}
}
}
}
}
fn main() -> Result<(), std::io::Error> {
let name = std::env::args().skip(1).next().expect("no file name");
let f = File::open(name)?;
let mut f = BufReader::new(f);
let base_address = f.read_u16::<LittleEndian>()?;
let instructions = read_instructions_to_end(f)?;
println!("{:#04x}", base_address);
for i in &instructions {
println!("0x{:04x}", i);
}
Ok(())
}

Resources