Writing to statically-sized file with offset - rust

I am trying to write bytes to a file with random access in the context of downloading random chunks of the file and assembling them in-place. The std::io::Write seem to be limited to sequential writing so I am currently using the following implementation which first needs to download all the chunks and then do big write_all once all the chunks are in memory.
use rand::prelude::*;
use rand; // 0.7.3
fn download<W: std::io::Write>(writer: &mut W) -> std::io::Result<()>
{
// the size (20) is known
let mut buf = [0u8; 20];
// simulate downloading four 5 byte chunks at random disjoint slices in *buf*
for n in (0..20).step_by(5).collect::<Vec<usize>>().choose_multiple(&mut thread_rng(), 4) {
let n = *n as usize;
let chunk: [u8; 5] = rand::random();
buf[n..n+5].clone_from_slice(&chunk);
}
println!("filled buffer: {:?}", buf);
// finally, do (unbuffered) write
writer.write_all(&buf)?;
Ok(())
}
fn main() -> std::io::Result<()> {
let mut f = std::fs::File::create("foo.txt")?;
download(&mut f)?;
Ok(())
}
playground link
With large files I think this solution may consume to much memory and I would rather write each chunk directly into the file without any intermediate buffer. I see there is write_at in the documentation, but is this method unique for unix systems? I am looking for a generic solution.
Intuitively, I would want something like this to work:
use rand::prelude::*;
use rand; // 0.7.3
fn download<W: std::io::Write>(writer: &mut W) -> std::io::Result<()>
{
// the size (20) is known
writer.reserve(20)?;
// simulate downloading four 5 byte chunks at random disjoint slices in *buf*
for n in (0..20).step_by(5).collect::<Vec<usize>>().choose_multiple(&mut thread_rng(), 4) {
let n = *n as usize;
let chunk: [u8; 5] = rand::random();
writer.write_at(chunk, n)?;
}
Ok(())
}
fn main() -> std::io::Result<()> {
let mut f = std::fs::File::create("foo.txt")?;
download(&mut f)?;
Ok(())
}
Is there any specialization for writing to random access file output? Can I use a memory mapped file? Doing a quick search I could only find the positioned-io crate, which has not been updated in 3 years. Is there anything similar to this crate in the rust standard library?

You're looking for Seek. File implements both Seek and Write so you just need to bound on Seek as well:
use std::io::{Write, Seek, SeekFrom, Result as IoResult};
fn download<W: Write + Seek>(writer: &mut W) -> IoResult<()> { /* ... */ }
Now you can do writer.seek(SeekFrom::Start(*n as u64 * CHUNK_SIZE))?; before writing out the chunk.

Related

Read binary file in units of f64 in Rust

Assuming you have a binary file example.bin and you want to read that file in units of f64, i.e. the first 8 bytes give a float, the next 8 bytes give a number, etc. (assuming you know endianess) How can this be done in Rust?
I know that one can use std::fs::read("example.bin") to get a Vec<u8> of the data, but then you have to do quite a bit of "gymnastics" to convert always 8 of the bytes to a f64, i.e.
fn eight_bytes_to_array(barry: &[u8]) -> &[u8; 8] {
barry.try_into().expect("slice with incorrect length")
}
let mut file_content = std::fs::read("example.bin").expect("Could not read file!");
let nr = eight_bytes_to_array(&file_content[0..8]);
let nr = f64::from_be_bytes(*nr_dp_per_spectrum);
I saw this post, but its from 2015 and a lot of changes have happend in Rust since then, so I was wondering if there is a better/faster way these days?
Example without proper error handling and checking for cases when file contains not divisible amount of bytes.
use std::fs::File;
use std::io::{BufReader, Read};
fn main() {
// Using BufReader because files in std is unbuffered by default
// And reading by 8 bytes is really bad idea.
let mut input = BufReader::new(
File::open("floats.bin")
.expect("Failed to open file")
);
let mut floats = Vec::new();
loop {
use std::io::ErrorKind;
// You may use 8 instead of `size_of` but size_of is less error-prone.
let mut buffer = [0u8; std::mem::size_of::<f64>()];
// Using read_exact because `read` may return less
// than 8 bytes even if there are bytes in the file.
// This, however, prevents us from handling cases
// when file size cannot be divided by 8.
let res = input.read_exact(&mut buffer);
match res {
// We detect if we read until the end.
// If there were some excess bytes after last read, they are lost.
Err(error) if error.kind() == ErrorKind::UnexpectedEof => break,
// Add more cases of errors you want to handle.
_ => {}
}
// You should do better error-handling probably.
// This simply panics.
res.expect("Unexpected error during read");
// Use `from_be_bytes` if numbers in file is big-endian
let f = f64::from_le_bytes(buffer);
floats.push(f);
}
}
I would create a generic iterator that returns f64 for flexibility and reusability.
struct F64Reader<R: io::BufRead> {
inner: R,
}
impl<R: io::BufRead> F64Reader<R> {
pub fn new(inner: R) -> Self {
Self{
inner
}
}
}
impl<R: io::BufRead> Iterator for F64Reader<R> {
type Item = f64;
fn next(&mut self) -> Option<Self::Item> {
let mut buff: [u8; 8] = [0;8];
self.inner.read_exact(&mut buff).ok()?;
Some(f64::from_be_bytes(buff))
}
}
This means if the file is large, you can loop through the values without storing it all in memory
let input = fs::File::open("example.bin")?;
for f in F64Reader::new(io::BufReader::new(input)) {
println!("{}", f)
}
Or if you want all the values you can collect them
let input = fs::File::open("example.bin")?;
let values : Vec<f64> = F64Reader::new(io::BufReader::new(input)).collect();

Read large f64 binary file into array

I'm looking for help/examples on how to read a relatively large (>12M) binary file of double precision numbers into a rust array. I have metadata on the number of f64 values in the file.
I've read on this and seen the byteorder crate but did not find the documentation/examples particularly helpful.
This is not something that needs to be BufRead, since that likely won't help performance.
Thank you!
The easiest way to do it is to read 8 bytes and convert it to f64 using one of the f64::from_byte-order_bytes() methods:
from_ne_bytes()
from_be_bytes()
from_le_bytes()
These methods are used like that:
let mut buffer = [0u8; 8]; // the buffer can be reused!
reader.read_exact(&mut buffer) ?;
let float = f64::from_be_bytes(buffer);
So you can either read the file 8 bytes at a time or on some larger chunks:
fn main() -> Result<(), Box<dyn Error>> {
let file = File::open("./path/to/file")?;
let mut reader = BufReader::new(file);
let mut buffer = [0u8; 8];
loop {
if let Err(e) = reader.read_exact(&mut buffer) {
// if you know how many bytes are expected, then it's better not to rely on `UnexpectedEof`!
if e.kind() == ErrorKind::UnexpectedEof {
// nothing more to read
break;
}
return Err(e.into());
}
// or use `from_le_bytes()` depending on the byte-order
let float = f64::from_be_bytes(buffer);
//do something with the f64
println!("{}", float);
}
Ok(())
}
If you don't mind adding an additional dependency to your project, then you can also use the ByteOrder crate which has convenience methods to read whole slices:
use byteorder::{ByteOrder, LittleEndian};
let mut bytes = [0; 32]; // the buffer you've read the file into
let mut numbers_got = [0.0; 4];
LittleEndian::read_f64_into(&bytes, &mut numbers_got);
assert_eq!(numbers_given, numbers_got)

How to do simple math with a list of numbers from a file and print out the result in Rust?

use std::fs::File;
use std::io::prelude::*;
use std::io::BufReader;
use std::iter::Iterator;
fn main() -> std::io::Result<()> {
let file = File::open("input")?; // file is input
let mut buf_reader = BufReader::new(file);
let mut contents = String::new();
buf_reader.read_to_string(&mut contents)?;
for i in contents.parse::<i32>() {
let i = i / 2;
println!("{}", i);
}
Ok(())
}
list of numbers:
50951
69212
119076
124303
95335
65069
109778
113786
124821
103423
128775
111918
138158
141455
92800
50908
107279
77352
129442
60097
84670
143682
104335
105729
87948
59542
81481
147508
str::parse::<i32> can only parse a single number at a time, so you will need to split the text first and then parse each number one by one. For example if you have one number per line and no extra whitespace, you can use BufRead::lines to process the text line by line:
use std::fs::File;
use std::io::{BufRead, BufReader};
fn main() -> std::io::Result<()> {
let file = File::open("input")?; // file is input
let mut buf_reader = BufReader::new(file);
for line in buf_reader.lines() {
let value = line?
.parse::<i32>()
.expect("Not able to parse: Content is malformed !");
println!("{}", value / 2);
}
Ok(())
}
As an extra bonus this avoids reading the whole file into memory, which can be important if the file is big.
For tiny examples like this, I'd read the entire string at once, then split it up on lines.
use std::fs;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let contents = fs::read_to_string("input")?;
for line in contents.trim().lines() {
let i: i32 = line.trim().parse()?;
let i = i / 2;
println!("{}", i);
}
Ok(())
}
See also:
What's the de-facto way of reading and writing files in Rust 1.x?
For tightly-controlled examples like this, I'd ignore errors occurring while parsing:
use std::fs;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let contents = fs::read_to_string("input")?;
for i in contents.trim().lines().flat_map(|l| l.trim().parse::<i32>()) {
let i = i / 2;
println!("{}", i);
}
Ok(())
}
See also:
Why does `Option` support `IntoIterator`?
For fixed-input examples like this, I'd avoid opening the file at runtime at all, pushing the error to compile-time:
fn main() -> Result<(), Box<dyn std::error::Error>> {
let contents = include_str!("../input");
for i in contents.trim().lines().flat_map(|l| l.trim().parse::<i32>()) {
let i = i / 2;
println!("{}", i);
}
Ok(())
}
See also:
Is there a good way to include external resource data into Rust source code?
If I wanted to handle failures to parse but treat the iterator as if errors were impossible, I'd use Itertools::process_results:
use itertools; // 0.8.2
fn main() -> Result<(), Box<dyn std::error::Error>> {
let contents = include_str!("../input");
let numbers = contents.trim().lines().map(|l| l.trim().parse::<i32>());
let sum = itertools::process_results(numbers, |i| i.sum::<i32>());
println!("{:?}", sum);
Ok(())
}
See also:
How do I perform iterator computations over iterators of Results without collecting to a temporary vector?
How do I stop iteration and return an error when Iterator::map returns a Result::Err?

Why does a generic function replicating C's fread for unsigned integers always return zero?

I am trying to read in binary 16-bit machine instructions from a 16-bit architecture (the exact nature of that is irrelevant here), and print them back out as hexadecimal values. In C, I found this simple by using the fread function to read 16 bits into a uint16_t.
I figured that I would try to replicate fread in Rust. It seems to be reasonably trivial if I can know ahead-of-time the exact size of the variable that is being read into, and I had that working specifically for 16 bits.
I decided that I wanted to try to make the fread function generic over the various built-in unsigned integer types. For that I came up with the below function, using some traits from the Num crate:
fn fread<T>(
buffer: &mut T,
element_count: usize,
stream: &mut BufReader<File>,
) -> Result<usize, std::io::Error>
where
T: num::PrimInt + num::Unsigned,
{
let type_size = std::mem::size_of::<T>();
let mut buf = Vec::with_capacity(element_count * type_size);
let buf_slice = buf.as_mut_slice();
let bytes_read = match stream.read_exact(buf_slice) {
Ok(()) => element_count * type_size,
Err(ref e) if e.kind() == std::io::ErrorKind::UnexpectedEof => 0,
Err(e) => panic!("{}", e),
};
*buffer = buf_slice
.iter()
.enumerate()
.map(|(i, &b)| {
let mut holder2: T = num::zero();
holder2 = holder2 | T::from(b).expect("Casting from u8 to T failed");
holder2 << ((type_size - i) * 8)
})
.fold(num::zero(), |acc, h| acc | h);
Ok(bytes_read)
}
The issue is that when I call it in the main function, I seem to always get 0x00 back out, but the number of bytes read that is returned by the function is always 2, so that the program enters an infinite loop:
extern crate num;
use std::fs::File;
use std::io::BufReader;
use std::io::prelude::Read;
fn main() -> Result<(), std::io::Error> {
let cmd_line_args = std::env::args().collect::<Vec<_>>();
let f = File::open(&cmd_line_args[1])?;
let mut reader = BufReader::new(f);
let mut instructions: Vec<u16> = Vec::new();
let mut next_instruction: u16 = 0;
fread(&mut next_instruction, 1, &mut reader)?;
let base_address = next_instruction;
while fread(&mut next_instruction, 1, &mut reader)? > 0 {
instructions.push(next_instruction);
}
println!("{:#04x}", base_address);
for i in instructions {
println!("0x{:04x}", i);
}
Ok(())
}
It appears to me that I'm somehow never reading anything from the file, so the function always just returns the number of bytes it was supposed to read. I'm clearly not using something correctly here, but I'm honestly unsure what I'm doing wrong.
This is compiled on Rust 1.26 stable for Windows if that matters.
What am I doing wrong, and what should I do differently to replicate fread? I realise that this is probably a case of the XY problem (in that there's almost certainly a better Rust way to repeatedly read some bytes from a file and pack them into one unsigned integer), but I'm really curious as to what I'm doing wrong here.
Your problem is that this line:
let mut buf = Vec::with_capacity(element_count * type_size);
creates a zero-length vector, even though it allocates memory for element_count * type_size bytes. Therefore you are asking stream.read_exact to read zero bytes. One way to fix this is to replace the above line with:
let mut buf = vec![0; element_count * type_size];
Side note: when the read succeeds, bytes_read receives the number of bytes you expected to read, not the number of bytes you actually read. You should probably use std::mem::size_of_val (buf_slice) to get the true byte count.
in that there's almost certainly a better Rust way to repeatedly read some bytes from a file and pack them into one unsigned integer
Yes, use the byteorder crate. This requires no unneeded heap allocation (the Vec in the original code):
extern crate byteorder;
use byteorder::{LittleEndian, ReadBytesExt};
use std::{
fs::File, io::{self, BufReader, Read},
};
fn read_instructions_to_end<R>(mut rdr: R) -> io::Result<Vec<u16>>
where
R: Read,
{
let mut instructions = Vec::new();
loop {
match rdr.read_u16::<LittleEndian>() {
Ok(instruction) => instructions.push(instruction),
Err(e) => {
return if e.kind() == std::io::ErrorKind::UnexpectedEof {
Ok(instructions)
} else {
Err(e)
}
}
}
}
}
fn main() -> Result<(), std::io::Error> {
let name = std::env::args().skip(1).next().expect("no file name");
let f = File::open(name)?;
let mut f = BufReader::new(f);
let base_address = f.read_u16::<LittleEndian>()?;
let instructions = read_instructions_to_end(f)?;
println!("{:#04x}", base_address);
for i in &instructions {
println!("0x{:04x}", i);
}
Ok(())
}

How to convert a String to a &[u8] in order to write it to a file? [duplicate]

With Rust being comparatively new, I've seen far too many ways of reading and writing files. Many are extremely messy snippets someone came up with for their blog, and 99% of the examples I've found (even on Stack Overflow) are from unstable builds that no longer work. Now that Rust is stable, what is a simple, readable, non-panicking snippet for reading or writing files?
This is the closest I've gotten to something that works in terms of reading a text file, but it's still not compiling even though I'm fairly certain I've included everything I should have. This is based off of a snippet I found on Google+ of all places, and the only thing I've changed is that the old BufferedReader is now just BufReader:
use std::fs::File;
use std::io::BufReader;
use std::path::Path;
fn main() {
let path = Path::new("./textfile");
let mut file = BufReader::new(File::open(&path));
for line in file.lines() {
println!("{}", line);
}
}
The compiler complains:
error: the trait bound `std::result::Result<std::fs::File, std::io::Error>: std::io::Read` is not satisfied [--explain E0277]
--> src/main.rs:7:20
|>
7 |> let mut file = BufReader::new(File::open(&path));
|> ^^^^^^^^^^^^^^
note: required by `std::io::BufReader::new`
error: no method named `lines` found for type `std::io::BufReader<std::result::Result<std::fs::File, std::io::Error>>` in the current scope
--> src/main.rs:8:22
|>
8 |> for line in file.lines() {
|> ^^^^^
To sum it up, what I'm looking for is:
brevity
readability
covers all possible errors
doesn't panic
None of the functions I show here panic on their own, but I am using expect because I don't know what kind of error handling will fit best into your application. Go read The Rust Programming Language's chapter on error handling to understand how to appropriately handle failure in your own program.
Rust 1.26 and onwards
If you don't want to care about the underlying details, there are one-line functions for reading and writing.
Read a file to a String
use std::fs;
fn main() {
let data = fs::read_to_string("/etc/hosts").expect("Unable to read file");
println!("{}", data);
}
Read a file as a Vec<u8>
use std::fs;
fn main() {
let data = fs::read("/etc/hosts").expect("Unable to read file");
println!("{}", data.len());
}
Write a file
use std::fs;
fn main() {
let data = "Some data!";
fs::write("/tmp/foo", data).expect("Unable to write file");
}
Rust 1.0 and onwards
These forms are slightly more verbose than the one-line functions that allocate a String or Vec for you, but are more powerful in that you can reuse allocated data or append to an existing object.
Reading data
Reading a file requires two core pieces: File and Read.
Read a file to a String
use std::fs::File;
use std::io::Read;
fn main() {
let mut data = String::new();
let mut f = File::open("/etc/hosts").expect("Unable to open file");
f.read_to_string(&mut data).expect("Unable to read string");
println!("{}", data);
}
Read a file as a Vec<u8>
use std::fs::File;
use std::io::Read;
fn main() {
let mut data = Vec::new();
let mut f = File::open("/etc/hosts").expect("Unable to open file");
f.read_to_end(&mut data).expect("Unable to read data");
println!("{}", data.len());
}
Write a file
Writing a file is similar, except we use the Write trait and we always write out bytes. You can convert a String / &str to bytes with as_bytes:
use std::fs::File;
use std::io::Write;
fn main() {
let data = "Some data!";
let mut f = File::create("/tmp/foo").expect("Unable to create file");
f.write_all(data.as_bytes()).expect("Unable to write data");
}
Buffered I/O
I felt a bit of a push from the community to use BufReader and BufWriter instead of reading straight from a file
A buffered reader (or writer) uses a buffer to reduce the number of I/O requests. For example, it's much more efficient to access the disk once to read 256 bytes instead of accessing the disk 256 times.
That being said, I don't believe a buffered reader/writer will be useful when reading the entire file. read_to_end seems to copy data in somewhat large chunks, so the transfer may already be naturally coalesced into fewer I/O requests.
Here's an example of using it for reading:
use std::fs::File;
use std::io::{BufReader, Read};
fn main() {
let mut data = String::new();
let f = File::open("/etc/hosts").expect("Unable to open file");
let mut br = BufReader::new(f);
br.read_to_string(&mut data).expect("Unable to read string");
println!("{}", data);
}
And for writing:
use std::fs::File;
use std::io::{BufWriter, Write};
fn main() {
let data = "Some data!";
let f = File::create("/tmp/foo").expect("Unable to create file");
let mut f = BufWriter::new(f);
f.write_all(data.as_bytes()).expect("Unable to write data");
}
A BufReader is more useful when you want to read line-by-line:
use std::fs::File;
use std::io::{BufRead, BufReader};
fn main() {
let f = File::open("/etc/hosts").expect("Unable to open file");
let f = BufReader::new(f);
for line in f.lines() {
let line = line.expect("Unable to read line");
println!("Line: {}", line);
}
}
For anybody who is writing to a file, the accepted answer is good but if you need to append to the file you have to use the OpenOptions struct instead:
use std::io::Write;
use std::fs::OpenOptions;
fn main() {
let data = "Some data!\n";
let mut f = OpenOptions::new()
.append(true)
.create(true) // Optionally create the file if it doesn't already exist
.open("/tmp/foo")
.expect("Unable to open file");
f.write_all(data.as_bytes()).expect("Unable to write data");
}
Buffered writing still works the same way:
use std::io::{BufWriter, Write};
use std::fs::OpenOptions;
fn main() {
let data = "Some data!\n";
let f = OpenOptions::new()
.append(true)
.open("/tmp/foo")
.expect("Unable to open file");
let mut f = BufWriter::new(f);
f.write_all(data.as_bytes()).expect("Unable to write data");
}
By using the Buffered I/O you can copy the file size is greater than the actual memory.
use std::fs::{File, OpenOptions};
use std::io::{BufReader, BufWriter, Write, BufRead};
fn main() {
let read = File::open(r#"E:\1.xls"#);
let write = OpenOptions::new().write(true).create(true).open(r#"E:\2.xls"#);
let mut reader = BufReader::new(read.unwrap());
let mut writer = BufWriter::new(write.unwrap());
let mut length = 1;
while length > 0 {
let buffer = reader.fill_buf().unwrap();
writer.write(buffer);
length = buffer.len();
reader.consume(length);
}
}

Resources