What is the most efficient way to read the first line of a file separately to the rest of the file? - rust

I am trying to figure out the best way to read the contents of a file. The problem is that I need to read the first line separately, because I need that to be parsed as a usize which I need for the dimension of a Array2 by ndarray.
I tried the following:
use ndarray::prelude::*;
use std::io:{BufRead,BufReader};
use std::fs;
fn read_inputfile(geom_filename: &str) -> (Vec<i32>, Array2<f64>, usize) {
//* Step 1: Read the coord data from input
println!("Inputfile: {geom_filename}");
let geom_file = fs::File::open(geom_filename).expect("Geometry file not found!");
let geom_file_reader = BufReader::new(geom_file);
let geom_file_lines: Vec<String> = geom_file_reader
.lines()
.map(|line| line.expect("Failed to read line!"))
.collect();
//* Read no of atoms first for array size
let no_atoms: usize = geom_file_lines[0].parse().unwrap();
let mut Z_vals: Vec<i32> = Vec::new();
let mut geom_matr: Array2<f64> = Array2::zeros((no_atoms, 3));
for (atom_idx, line) in geom_file_lines[1..].iter().enumerate() {
//* into_iter would do the same
let line_split: Vec<&str> = line.split_whitespace().collect();
Z_vals.push(line_split[0].parse().unwrap());
(0..3).for_each(|cart_coord| {
geom_matr[(atom_idx, cart_coord)] = line_split[cart_coord + 1].parse().unwrap();
});
}
(Z_vals, geom_matr, no_atoms)
}
Does this not kind of defeat the purpose of the BufReader? I am still relative new to Rust, so I might have misunderstood something, but I thought that one uses the BufReader so that the whole file does not need to be read into memory.
With the Vec<String> for geom_file_lines I am mostlike loading the whole file into memory again, right?

Does this not kind of defeat the purpose of the BufReader?
It very much does, yes. lines() gives you an iterator, so you can read them without loading all of them into memory at once. You force them all into memory, though, as you call collect().
Simply don't do that. Use the iterator as an iterator. Especially as you convert it back to an iterator later, via geom_file_lines[1..].iter().
Like this:
use ndarray::prelude::*;
use std::fs;
use std::io::{BufRead, BufReader};
pub fn read_inputfile(geom_filename: &str) -> (Vec<i32>, Array2<f64>, usize) {
//* Step 1: Read the coord data from input
println!("Inputfile: {geom_filename}");
let geom_file = fs::File::open(geom_filename).expect("Geometry file not found!");
let geom_file_reader = BufReader::new(geom_file);
let mut geom_file_lines = geom_file_reader
.lines()
.map(|line| line.expect("Failed to read line!"));
//* Read no of atoms first for array size
let no_atoms: usize = geom_file_lines.next().unwrap().parse().unwrap();
let mut z_vals: Vec<i32> = Vec::new();
let mut geom_matr: Array2<f64> = Array2::zeros((no_atoms, 3));
for (atom_idx, line) in geom_file_lines.enumerate() {
let line_split: Vec<&str> = line.split_whitespace().collect();
z_vals.push(line_split[0].parse().unwrap());
(0..3).for_each(|cart_coord| {
geom_matr[(atom_idx, cart_coord)] = line_split[cart_coord + 1].parse().unwrap();
});
}
(z_vals, geom_matr, no_atoms)
}
You can apply the same logic in your for loop:
for (atom_idx, line) in geom_file_lines.enumerate() {
let mut line_split = line.split_whitespace();
z_vals.push(line_split.next().unwrap().parse().unwrap());
(0..3).for_each(|cart_coord| {
geom_matr[(atom_idx, cart_coord)] = line_split.next().unwrap().parse().unwrap();
});
}

Related

Splitting a Vec of strings into Vec<Vec<String>>

I am attempting to relearn data-science in rust.
I have a Vec<String> that includes a delimiter "|" and a new line "!end".
What I'd like to end up with is Vec<Vec<String>> that can be put into a 2D ND array.
I have this python Code:
file = open('somefile.dat')
lst = []
for line in file:
lst += [line.split('|')]
df = pd.DataFrame(lst)
SAMV2FinalDataFrame = pd.DataFrame(lst,columns=column_names)
And i've recreated it here in rust:
fn lines_from_file(filename: impl AsRef<Path>) -> Vec<String> {
let file = File::open(filename).expect("no such file");
let buf = BufReader::new(file);
buf.lines()
.map(|l| l.expect("Could not parse line"))
.collect()
}
fn main() {
let lines = lines_from_file(".dat");
let mut new_arr = vec![];
//Here i get a lines immitable borrow
for line in lines{
new_arr.push([*line.split("!end")]);
}
// here i get expeected closure found str
let x = lines.split("!end");
let array = Array::from(lines)
what i have: ['1','1','1','end!','2','2','2','!end']
What i need: [['1','1','1'],['2','2','2']]
Edit: also why when i turbo fish does it make it disappear on Stack Overflow?
I think part of the issue you ran into was due how you worked with arrays. For example, Vec::push will only add a single element so you would want to use Vec::extend instead. I also ran into a few cases of empty strings due to splitting by "!end" would leave trailing '|' on the ends of substrings. The errors were quite strange, I am not completely sure where the closure came from.
let lines = vec!["1|1|1|!end|2|2|2|!end".to_string()];
let mut new_arr = Vec::new();
// Iterate over &lines so we don't consume lines and it can be used again later
for line in &lines {
new_arr.extend(line.split("!end")
// Remove trailing empty string
.filter(|x| !x.is_empty())
// Convert each &str into a Vec<String>
.map(|x| {
x.split('|')
// Remove empty strings from ends split (Ex split: "|2|2|2|")
.filter(|x| !x.is_empty())
// Convert &str into owned String
.map(|x| x.to_string())
// Turn iterator into Vec<String>
.collect::<Vec<_>>()
}));
}
println!("{:?}", new_arr);
I also came up with this other version which should handle your use case better. The earlier approach dropped all empty strings, while this one should preserve them while correctly handling the "!end".
use std::io::{self, BufRead, BufReader, Read, Cursor};
fn split_data<R: Read>(buffer: &mut R) -> io::Result<Vec<Vec<String>>> {
let mut sections = Vec::new();
let mut current_section = Vec::new();
for line in BufReader::new(buffer).lines() {
for item in line?.split('|') {
if item != "!end" {
current_section.push(item.to_string());
} else {
sections.push(current_section);
current_section = Vec::new();
}
}
}
Ok(sections)
}
In this example, I used Read for easier testing, but it will also work with a file.
let sample_input = b"1|1|1|!end|2|2|2|!end";
println!("{:?}", split_data(&mut Cursor::new(sample_input)));
// Output: Ok([["1", "1", "1"], ["2", "2", "2"]])
// You can also use a file instead
let mut file = File::new("somefile.dat");
let solution: Vec<Vec<String>> = split_data(&mut file).unwrap();
playground link

Read large f64 binary file into array

I'm looking for help/examples on how to read a relatively large (>12M) binary file of double precision numbers into a rust array. I have metadata on the number of f64 values in the file.
I've read on this and seen the byteorder crate but did not find the documentation/examples particularly helpful.
This is not something that needs to be BufRead, since that likely won't help performance.
Thank you!
The easiest way to do it is to read 8 bytes and convert it to f64 using one of the f64::from_byte-order_bytes() methods:
from_ne_bytes()
from_be_bytes()
from_le_bytes()
These methods are used like that:
let mut buffer = [0u8; 8]; // the buffer can be reused!
reader.read_exact(&mut buffer) ?;
let float = f64::from_be_bytes(buffer);
So you can either read the file 8 bytes at a time or on some larger chunks:
fn main() -> Result<(), Box<dyn Error>> {
let file = File::open("./path/to/file")?;
let mut reader = BufReader::new(file);
let mut buffer = [0u8; 8];
loop {
if let Err(e) = reader.read_exact(&mut buffer) {
// if you know how many bytes are expected, then it's better not to rely on `UnexpectedEof`!
if e.kind() == ErrorKind::UnexpectedEof {
// nothing more to read
break;
}
return Err(e.into());
}
// or use `from_le_bytes()` depending on the byte-order
let float = f64::from_be_bytes(buffer);
//do something with the f64
println!("{}", float);
}
Ok(())
}
If you don't mind adding an additional dependency to your project, then you can also use the ByteOrder crate which has convenience methods to read whole slices:
use byteorder::{ByteOrder, LittleEndian};
let mut bytes = [0; 32]; // the buffer you've read the file into
let mut numbers_got = [0.0; 4];
LittleEndian::read_f64_into(&bytes, &mut numbers_got);
assert_eq!(numbers_given, numbers_got)

How to do simple math with a list of numbers from a file and print out the result in Rust?

use std::fs::File;
use std::io::prelude::*;
use std::io::BufReader;
use std::iter::Iterator;
fn main() -> std::io::Result<()> {
let file = File::open("input")?; // file is input
let mut buf_reader = BufReader::new(file);
let mut contents = String::new();
buf_reader.read_to_string(&mut contents)?;
for i in contents.parse::<i32>() {
let i = i / 2;
println!("{}", i);
}
Ok(())
}
list of numbers:
50951
69212
119076
124303
95335
65069
109778
113786
124821
103423
128775
111918
138158
141455
92800
50908
107279
77352
129442
60097
84670
143682
104335
105729
87948
59542
81481
147508
str::parse::<i32> can only parse a single number at a time, so you will need to split the text first and then parse each number one by one. For example if you have one number per line and no extra whitespace, you can use BufRead::lines to process the text line by line:
use std::fs::File;
use std::io::{BufRead, BufReader};
fn main() -> std::io::Result<()> {
let file = File::open("input")?; // file is input
let mut buf_reader = BufReader::new(file);
for line in buf_reader.lines() {
let value = line?
.parse::<i32>()
.expect("Not able to parse: Content is malformed !");
println!("{}", value / 2);
}
Ok(())
}
As an extra bonus this avoids reading the whole file into memory, which can be important if the file is big.
For tiny examples like this, I'd read the entire string at once, then split it up on lines.
use std::fs;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let contents = fs::read_to_string("input")?;
for line in contents.trim().lines() {
let i: i32 = line.trim().parse()?;
let i = i / 2;
println!("{}", i);
}
Ok(())
}
See also:
What's the de-facto way of reading and writing files in Rust 1.x?
For tightly-controlled examples like this, I'd ignore errors occurring while parsing:
use std::fs;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let contents = fs::read_to_string("input")?;
for i in contents.trim().lines().flat_map(|l| l.trim().parse::<i32>()) {
let i = i / 2;
println!("{}", i);
}
Ok(())
}
See also:
Why does `Option` support `IntoIterator`?
For fixed-input examples like this, I'd avoid opening the file at runtime at all, pushing the error to compile-time:
fn main() -> Result<(), Box<dyn std::error::Error>> {
let contents = include_str!("../input");
for i in contents.trim().lines().flat_map(|l| l.trim().parse::<i32>()) {
let i = i / 2;
println!("{}", i);
}
Ok(())
}
See also:
Is there a good way to include external resource data into Rust source code?
If I wanted to handle failures to parse but treat the iterator as if errors were impossible, I'd use Itertools::process_results:
use itertools; // 0.8.2
fn main() -> Result<(), Box<dyn std::error::Error>> {
let contents = include_str!("../input");
let numbers = contents.trim().lines().map(|l| l.trim().parse::<i32>());
let sum = itertools::process_results(numbers, |i| i.sum::<i32>());
println!("{:?}", sum);
Ok(())
}
See also:
How do I perform iterator computations over iterators of Results without collecting to a temporary vector?
How do I stop iteration and return an error when Iterator::map returns a Result::Err?

How to convert a String to a &[u8] in order to write it to a file? [duplicate]

With Rust being comparatively new, I've seen far too many ways of reading and writing files. Many are extremely messy snippets someone came up with for their blog, and 99% of the examples I've found (even on Stack Overflow) are from unstable builds that no longer work. Now that Rust is stable, what is a simple, readable, non-panicking snippet for reading or writing files?
This is the closest I've gotten to something that works in terms of reading a text file, but it's still not compiling even though I'm fairly certain I've included everything I should have. This is based off of a snippet I found on Google+ of all places, and the only thing I've changed is that the old BufferedReader is now just BufReader:
use std::fs::File;
use std::io::BufReader;
use std::path::Path;
fn main() {
let path = Path::new("./textfile");
let mut file = BufReader::new(File::open(&path));
for line in file.lines() {
println!("{}", line);
}
}
The compiler complains:
error: the trait bound `std::result::Result<std::fs::File, std::io::Error>: std::io::Read` is not satisfied [--explain E0277]
--> src/main.rs:7:20
|>
7 |> let mut file = BufReader::new(File::open(&path));
|> ^^^^^^^^^^^^^^
note: required by `std::io::BufReader::new`
error: no method named `lines` found for type `std::io::BufReader<std::result::Result<std::fs::File, std::io::Error>>` in the current scope
--> src/main.rs:8:22
|>
8 |> for line in file.lines() {
|> ^^^^^
To sum it up, what I'm looking for is:
brevity
readability
covers all possible errors
doesn't panic
None of the functions I show here panic on their own, but I am using expect because I don't know what kind of error handling will fit best into your application. Go read The Rust Programming Language's chapter on error handling to understand how to appropriately handle failure in your own program.
Rust 1.26 and onwards
If you don't want to care about the underlying details, there are one-line functions for reading and writing.
Read a file to a String
use std::fs;
fn main() {
let data = fs::read_to_string("/etc/hosts").expect("Unable to read file");
println!("{}", data);
}
Read a file as a Vec<u8>
use std::fs;
fn main() {
let data = fs::read("/etc/hosts").expect("Unable to read file");
println!("{}", data.len());
}
Write a file
use std::fs;
fn main() {
let data = "Some data!";
fs::write("/tmp/foo", data).expect("Unable to write file");
}
Rust 1.0 and onwards
These forms are slightly more verbose than the one-line functions that allocate a String or Vec for you, but are more powerful in that you can reuse allocated data or append to an existing object.
Reading data
Reading a file requires two core pieces: File and Read.
Read a file to a String
use std::fs::File;
use std::io::Read;
fn main() {
let mut data = String::new();
let mut f = File::open("/etc/hosts").expect("Unable to open file");
f.read_to_string(&mut data).expect("Unable to read string");
println!("{}", data);
}
Read a file as a Vec<u8>
use std::fs::File;
use std::io::Read;
fn main() {
let mut data = Vec::new();
let mut f = File::open("/etc/hosts").expect("Unable to open file");
f.read_to_end(&mut data).expect("Unable to read data");
println!("{}", data.len());
}
Write a file
Writing a file is similar, except we use the Write trait and we always write out bytes. You can convert a String / &str to bytes with as_bytes:
use std::fs::File;
use std::io::Write;
fn main() {
let data = "Some data!";
let mut f = File::create("/tmp/foo").expect("Unable to create file");
f.write_all(data.as_bytes()).expect("Unable to write data");
}
Buffered I/O
I felt a bit of a push from the community to use BufReader and BufWriter instead of reading straight from a file
A buffered reader (or writer) uses a buffer to reduce the number of I/O requests. For example, it's much more efficient to access the disk once to read 256 bytes instead of accessing the disk 256 times.
That being said, I don't believe a buffered reader/writer will be useful when reading the entire file. read_to_end seems to copy data in somewhat large chunks, so the transfer may already be naturally coalesced into fewer I/O requests.
Here's an example of using it for reading:
use std::fs::File;
use std::io::{BufReader, Read};
fn main() {
let mut data = String::new();
let f = File::open("/etc/hosts").expect("Unable to open file");
let mut br = BufReader::new(f);
br.read_to_string(&mut data).expect("Unable to read string");
println!("{}", data);
}
And for writing:
use std::fs::File;
use std::io::{BufWriter, Write};
fn main() {
let data = "Some data!";
let f = File::create("/tmp/foo").expect("Unable to create file");
let mut f = BufWriter::new(f);
f.write_all(data.as_bytes()).expect("Unable to write data");
}
A BufReader is more useful when you want to read line-by-line:
use std::fs::File;
use std::io::{BufRead, BufReader};
fn main() {
let f = File::open("/etc/hosts").expect("Unable to open file");
let f = BufReader::new(f);
for line in f.lines() {
let line = line.expect("Unable to read line");
println!("Line: {}", line);
}
}
For anybody who is writing to a file, the accepted answer is good but if you need to append to the file you have to use the OpenOptions struct instead:
use std::io::Write;
use std::fs::OpenOptions;
fn main() {
let data = "Some data!\n";
let mut f = OpenOptions::new()
.append(true)
.create(true) // Optionally create the file if it doesn't already exist
.open("/tmp/foo")
.expect("Unable to open file");
f.write_all(data.as_bytes()).expect("Unable to write data");
}
Buffered writing still works the same way:
use std::io::{BufWriter, Write};
use std::fs::OpenOptions;
fn main() {
let data = "Some data!\n";
let f = OpenOptions::new()
.append(true)
.open("/tmp/foo")
.expect("Unable to open file");
let mut f = BufWriter::new(f);
f.write_all(data.as_bytes()).expect("Unable to write data");
}
By using the Buffered I/O you can copy the file size is greater than the actual memory.
use std::fs::{File, OpenOptions};
use std::io::{BufReader, BufWriter, Write, BufRead};
fn main() {
let read = File::open(r#"E:\1.xls"#);
let write = OpenOptions::new().write(true).create(true).open(r#"E:\2.xls"#);
let mut reader = BufReader::new(read.unwrap());
let mut writer = BufWriter::new(write.unwrap());
let mut length = 1;
while length > 0 {
let buffer = reader.fill_buf().unwrap();
writer.write(buffer);
length = buffer.len();
reader.consume(length);
}
}

Does Read::read guarantee to append data and not overwrite any existing one?

I'm working on an SMTP library that reads lines over the network using a buffered reader.
I want a nice, safe way to read data from the network, without depending on Rust internals to make sure the code works as expected. Specifically, I'm wondering if the Read trait guarantees that data read with Read::read is appended to the buffer passed as an argument rather than overwriting the buffer entirely.
At the moment, I use a Range to make sure existing data is not overwritten without depending on Rust internals.
However, given that Rust used to have a nice way to do what I want, I'm wondering if the current code can be improved, possibly removing the unsafe blocks too.
No, it does not guarantee that:
use std::io::prelude::*;
use std::str;
fn main() {
let mut source1 = "hello, world!".as_bytes();
let mut source2 = "moo".as_bytes();
let mut dest = [0; 128];
source1.read(&mut dest).unwrap();
source2.read(&mut dest).unwrap();
let s = str::from_utf8(&dest[..16]).unwrap();
println!("{:?}", s)
}
This prints
"moolo, world!\u{0}\u{0}\u{0}"
Specifically, it cannot do what you want, based purely on the type signature:
fn read(&mut self, buf: &mut [u8]) -> Result<usize>;
All that the read method has access to is your mutable slice - there's nowhere to store information like "how far in the buffer you are". Furthermore, you aren't allowed to "extend" a mutable slice with more elements - you are only allowed to mutate the values within the slice.
For your particular case, you may want to look at BufRead::read_until. Here's a barely-tested example:
use std::io::{BufRead,BufReader};
use std::str;
fn main() {
let source1 = "header 1\r\nheader 2\r\n".as_bytes();
let mut reader = BufReader::new(source1);
let mut buf = vec![];
buf.reserve(128); // Maybe more efficient?
loop {
match reader.read_until(b'\n', &mut buf) {
Ok(0) => break,
Ok(_) => {},
Err(_) => panic!("Handle errors"),
}
if buf.len() < 2 { continue }
if buf[buf.len() - 2] == b'\r' {
{
let s = str::from_utf8(&buf).unwrap();
println!("Got a header {:?}", s);
}
buf.clear();
}
}
}

Resources