Rust - How to check EOF in File?

I am trying to implement the last exercise of https://github.com/pingcap/talent-plan/blob/master/rust/building-blocks/bb-2.md.
In short, I need to serialize some enums to bson, write to a file and read them back from file.
I got stuck on checking whether EOF has been reached. I googled for some time but failed to find an answer.
use std::path::Path;
use std::fs::{create_dir_all, OpenOptions};
use std::io::{Write, Seek, Cursor, SeekFrom};
use serde::{Serialize, Deserialize};
use std::str;
use bson;
use bson::{encode_document, decode_document};
fn main() {
    let dir = Path::new("D:/download/rust");
    create_dir_all(dir).expect("Cannot create dir");
    let txt = dir.join("move.txt");

    // Encode some moves
    let mut file = OpenOptions::new()
        .append(true)
        .create(true)
        .open(&txt)
        .expect("Cannot open txt");
    encode_and_write_move(&mut file, &Move::Up(1));
    encode_and_write_move(&mut file, &Move::Down(2));
    encode_and_write_move(&mut file, &Move::Left(3));
    encode_and_write_move(&mut file, &Move::Right(4));

    // Decode and print moves
    let mut file = OpenOptions::new()
        .read(true)
        .open(&txt)
        .expect("Cannot open txt");
    loop {
        let end_of_file = false; // How?
        if end_of_file {
            break;
        }
        let doc = decode_document(&mut file).expect("cannot decode doc");
        println!("doc = {:?}", doc);
    }
}

fn encode_and_write_move<W: Write>(writer: &mut W, mov: &Move) {
    let serialized = bson::to_bson(mov).unwrap();
    let doc = serialized.as_document().unwrap();
    encode_document(writer, doc).expect("failed to encode doc");
}

#[derive(Debug, Serialize, Deserialize)]
enum Move {
    Up(i32),
    Down(i32),
    Left(i32),
    Right(i32),
}
Update:
It seems the only way to see if EOF is reached is to check the returned Err. Here is my attempt for the whole exercise.
use std::path::Path;
use std::fs::{create_dir_all, OpenOptions};
use std::io::Write;
use serde::{Serialize, Deserialize};
use std::{str, io};
use bson;
use bson::{encode_document, decode_document, DecoderError};
fn main() {
    let dir = Path::new("D:/download/rust");
    create_dir_all(dir).expect("Cannot create dir");
    let txt = dir.join("move.txt");

    // Encode some moves
    let mut writer = OpenOptions::new()
        .append(true)
        .create(true)
        .open(&txt)
        .expect("Cannot open txt");
    encode_and_write_move(&mut writer, &Move::Up(1));
    encode_and_write_move(&mut writer, &Move::Down(2));
    encode_and_write_move(&mut writer, &Move::Left(3));
    encode_and_write_move(&mut writer, &Move::Right(4));

    // Decode and print moves
    let mut reader = OpenOptions::new()
        .read(true)
        .open(&txt)
        .expect("Cannot open txt");
    loop {
        match decode_document(&mut reader) {
            Result::Ok(doc) => println!("doc = {:?}", &doc),
            Result::Err(DecoderError::IoError(e)) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Result::Err(err) => panic!("Decoding failed with {}", err),
        }
    }
}

fn encode_and_write_move<W: Write>(writer: &mut W, mov: &Move) {
    let serialized = bson::to_bson(mov).unwrap();
    let doc = serialized.as_document().unwrap();
    encode_document(writer, doc).expect("failed to encode doc");
}

#[derive(Debug, Serialize, Deserialize)]
enum Move {
    Up(i32),
    Down(i32),
    Left(i32),
    Right(i32),
}

bson::decode_document returns DecoderResult<Document>, which is an alias for Result<T, DecoderError>. If you check the possible variants of the DecoderError enum, you will see EndOfStream; in practice, though, decode_document returns Err(DecoderError::IoError(_)) with kind UnexpectedEof when it hits the end of the file.
So instead of trying to detect EOF in your code, it is more straightforward to consume the stream (by repeatedly calling decode_document) until an error is raised. Then the error can be handled, or, in the EOF case, processing can continue normally.
You can try something like this:
loop {
    match decode_document(&mut file) {
        Result::Ok(doc) => println!("doc = {:?}", &doc),
        Result::Err(DecoderError::IoError(io_error)) => match io_error.kind() {
            std::io::ErrorKind::UnexpectedEof => break,
            _ => panic!("Decoding failed with I/O error {}", io_error),
        },
        Result::Err(err) => panic!("Decoding failed with {}", err),
    }
}

Related

Why am I getting a "bad file descriptor" error when writing file in rust?

Ignoring the fact that I'm using unwrap to ignore what happens if the file doesn't exist, it seems to me like this short bit of code should work (as long as the file does exist):
use std::fs::File;
use std::io::Write;
fn main() {
    let mut f = File::open("test.txt").unwrap();
    let result = f.write_all(b"some data");
    match result {
        Ok(_) => println!("Data written successfully"),
        Err(e) => panic!("Failed to write data: {}", e),
    }
}
Instead, I'm getting this:
thread 'main' panicked at 'Failed to write data: Bad file descriptor (os error 9)', src/main.rs:10:19
To be clear, I know if I follow one of the many examples online, I can write to a file. The question isn't "how do I write to a file?". It's why THIS isn't working.
It isn't working because File::open() opens a file in read-only mode. Instead you have to use File::create(), which opens a file in write-only mode. Alternatively, you can use OpenOptions to specify further behavior, such as append()ing to the file instead (a small append sketch follows the examples below).
use std::fs::File;
use std::io::Write;

fn main() {
    let mut f = File::create("test.txt").unwrap();
    let result = f.write_all(b"some data");
    match result {
        Ok(_) => println!("Data written successfully"),
        Err(err) => panic!("Failed to write data: {}", err),
    }
}
Using File::create() is the same as using OpenOptions in the following way:
use std::fs::OpenOptions;
use std::io::Write;

fn main() {
    let mut f = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open("test.txt")
        .unwrap();
    let result = f.write_all(b"some data");
    match result {
        Ok(_) => println!("Data written successfully"),
        Err(err) => panic!("Failed to write data: {}", err),
    }
}
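And if instead you want to keep the existing contents and add to them, a minimal sketch of the append() route mentioned above (same test.txt name; assuming the file may not exist yet):
use std::fs::OpenOptions;
use std::io::Write;

fn main() {
    // append(true) implies write access and positions every write at the
    // end of the file, so existing contents are preserved.
    let mut f = OpenOptions::new()
        .append(true)
        .create(true)
        .open("test.txt")
        .unwrap();
    f.write_all(b"some data").expect("Failed to append data");
}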

How to convert a Bytes Iterator into a Stream in Rust

I'm trying to figure out how to build a feature which requires reading the contents of a file into a futures::stream::BoxStream, but I'm having a tough time figuring out what I need to do.
I have figured out how to read a file byte by byte via Bytes, which implements Iterator.
use std::fs::File;
use std::io::prelude::*;
use std::io::{BufReader, Bytes};

// TODO: Convert this to an async Stream
fn async_read() -> Box<dyn Iterator<Item = Result<u8, std::io::Error>>> {
    let f = File::open("/dev/random").expect("Could not open file");
    let reader = BufReader::new(f);
    let iter = reader.bytes().into_iter();
    Box::new(iter)
}

fn main() {
    ctrlc::set_handler(move || {
        println!("received Ctrl+C!");
        std::process::exit(0);
    })
    .expect("Error setting Ctrl-C handler");
    for b in async_read().into_iter() {
        println!("{:?}", b);
    }
}
However, I've been struggling a bunch trying to figure out how I can turn this Box<dyn Iterator<Item = Result<u8, std::io::Error>>> into a Stream.
I would have thought something like this would work:
use futures::stream;
use std::fs::File;
use std::io::prelude::*;
use std::io::{BufReader, Bytes};

// TODO: Convert this to an async Stream
fn async_read() -> stream::BoxStream<'static, dyn Iterator<Item = Result<u8, std::io::Error>>> {
    let f = File::open("/dev/random").expect("Could not open file");
    let reader = BufReader::new(f);
    let iter = reader.bytes().into_iter();
    std::pin::Pin::new(Box::new(stream::iter(iter)))
}

fn main() {
    ctrlc::set_handler(move || {
        println!("received Ctrl+C!");
        std::process::exit(0);
    })
    .expect("Error setting Ctrl-C handler");
    while let Some(b) = async_read().poll() {
        println!("{:?}", b);
    }
}
But I keep getting a ton of compiler errors; I've tried other permutations but am generally getting nowhere.
One of the compiler errors:
  --> src/main.rs:14:24
   |
14 |     std::pin::Pin::new(Box::new(stream::iter(iter)))
   |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected trait object `dyn std::iter::Iterator`, found enum `std::result::Result`
Anyone have any advice?
I'm pretty new to Rust, and specifically Streams/lower level stuff so I apologize if I got anything wrong, feel free to correct me.
For some additional background, I'm trying to do this so you can Ctrl-C out of a command in nushell.
I think you are overcomplicating it a bit: you can just return impl Stream from async_read; there is no need to box or pin (the same goes for the original Iterator-based version). Then you need to set up an async runtime in order to poll the stream (this example just uses the runtime provided by futures::executor::block_on), and you can call futures::stream::StreamExt::next() on the stream to get a future representing the next item.
Here is one way to do this:
use futures::prelude::*;
use std::{
    fs::File,
    io::{prelude::*, BufReader},
};

fn async_read() -> impl Stream<Item = Result<u8, std::io::Error>> {
    let f = File::open("/dev/random").expect("Could not open file");
    let reader = BufReader::new(f);
    stream::iter(reader.bytes())
}

async fn async_main() {
    // Create the stream once and poll it repeatedly (calling async_read()
    // inside the loop condition would re-open the file on every iteration).
    let mut stream = async_read();
    while let Some(b) = stream.next().await {
        println!("{:?}", b);
    }
}

fn main() {
    ctrlc::set_handler(move || {
        println!("received Ctrl+C!");
        std::process::exit(0);
    })
    .expect("Error setting Ctrl-C handler");
    futures::executor::block_on(async_main());
}

Reading ZIP file in Rust causes data owned by the current function

I'm new to Rust and likely have a huge knowledge gap. Basically, I'm hoping to create a utility function that would accept a regular text file or a ZIP file and return a BufRead that the caller can start processing line by line. It is working well for non-ZIP files, but I am not understanding how to achieve the same for ZIP files. The ZIP files will only contain a single file within the archive, which is why I'm only processing the first file in the ZipArchive.
I'm running into the following error.
error[E0515]: cannot return value referencing local variable `archive_contents`
--> src/file_reader.rs:30:9
|
27 | let archive_file: zip::read::ZipFile = archive_contents.by_index(0).unwrap();
| ---------------- `archive_contents` is borrowed here
...
30 | Ok(Box::new(BufReader::with_capacity(128 * 1024, archive_file)))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ returns a value referencing data owned by the current function
It seems the archive_contents is preventing the BufRead object from being returned to the caller. I'm just not sure how to work around this.
file_reader.rs
use std::ffi::OsStr;
use std::fs::File;
use std::io::BufRead;
use std::io::BufReader;
use std::path::Path;

pub struct FileReader {
    pub file_reader: Result<Box<BufRead>, &'static str>,
}

pub fn file_reader(filename: &str) -> Result<Box<BufRead>, &'static str> {
    let path = Path::new(filename);
    let file = match File::open(&path) {
        Ok(file) => file,
        Err(why) => panic!(
            "ERROR: Could not open file, {}: {}",
            path.display(),
            why.to_string()
        ),
    };
    if path.extension() == Some(OsStr::new("zip")) {
        // Processing ZIP file.
        let mut archive_contents: zip::read::ZipArchive<std::fs::File> =
            zip::ZipArchive::new(file).unwrap();
        let archive_file: zip::read::ZipFile = archive_contents.by_index(0).unwrap();
        // ERRORS: returns a value referencing data owned by the current function
        Ok(Box::new(BufReader::with_capacity(128 * 1024, archive_file)))
    } else {
        // Processing non-ZIP file.
        Ok(Box::new(BufReader::with_capacity(128 * 1024, file)))
    }
}
main.rs
mod file_reader;
use std::io::BufRead;

fn main() {
    let mut files: Vec<String> = Vec::new();
    files.push("/tmp/text_file.txt".to_string());
    files.push("/tmp/zip_file.zip".to_string());
    for f in files {
        let mut fr = match file_reader::file_reader(&f) {
            Ok(fr) => fr,
            Err(e) => panic!("Error reading file."),
        };
        fr.lines().for_each(|l| match l {
            Ok(l) => {
                println!("{}", l);
            }
            Err(e) => {
                println!("ERROR: Failed to read line:\n {}", e);
            }
        });
    }
}
Any help is greatly appreciated!
It seems the archive_contents is preventing the BufRead object from being returned to the caller. I'm just not sure how to work around this.

You have to restructure the code somehow. The issue here is that, well, the archive data is part of the archive. So unlike file, archive_file is not an independent item; it is rather a pointer of sorts into the archive itself, which means the archive needs to live longer than archive_file for this code to be correct.
In a GC'd language this isn't an issue: archive_file has a reference to archive and will keep it alive however long it needs. Not so for Rust.
A simple way to fix this would be to just copy the data out of archive_file and into an owned buffer you can return to the parent. Another option might be to return a wrapper around (archive_contents, item_index) which delegates the reading (might be somewhat tricky though; see the sketch below). Yet another would be to not have a file_reader function at all.
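To make the wrapper option concrete, here is a rough, unoptimized sketch using only the zip APIs that appear elsewhere in this thread (ZipArchive::new, by_index). Because by_index borrows the archive mutably, the entry cannot be stored alongside it; this sketch recreates the entry on every read and skips the bytes already consumed, which restarts decompression each time, so it is illustrative only:
use std::fs::File;
use std::io::{self, Read};
use zip::ZipArchive;

// Owns the archive and behaves like a reader for one entry.
struct ZipEntryReader {
    archive: ZipArchive<File>,
    index: usize,
    pos: u64, // bytes already handed out to the caller
}

impl Read for ZipEntryReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // `by_index` borrows `self.archive`, so a fresh entry is created
        // per call; skip what was already read (quadratic, but owned).
        let mut entry = self
            .archive
            .by_index(self.index)
            .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
        io::copy(&mut (&mut entry).take(self.pos), &mut io::sink())?;
        let n = entry.read(buf)?;
        self.pos += n as u64;
        Ok(n)
    }
}
A ZipEntryReader could then be wrapped in BufReader::with_capacity just like a plain file. In practice, the copy-into-a-Vec approach below is far simpler.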
Thanks to #Masklinn for the direction! Here's the working solution using their suggestion.
file_reader.rs
use std::ffi::OsStr;
use std::fs::File;
use std::io::BufRead;
use std::io::BufReader;
use std::io::Cursor;
use std::io::Error;
use std::io::Read;
use std::path::Path;
use zip::read::ZipArchive;

pub fn file_reader(filename: &str) -> Result<Box<dyn BufRead>, Error> {
    let path = Path::new(filename);
    let file = match File::open(&path) {
        Ok(file) => file,
        Err(why) => return Err(why),
    };
    if path.extension() == Some(OsStr::new("zip")) {
        let mut archive_contents = ZipArchive::new(file)?;
        let mut archive_file = archive_contents.by_index(0)?;

        // Read the contents of the file into a vec.
        let mut data = Vec::new();
        archive_file.read_to_end(&mut data)?;

        // Wrap vec in a std::io::Cursor.
        let cursor = Cursor::new(data);
        Ok(Box::new(cursor))
    } else {
        // Processing non-ZIP file.
        Ok(Box::new(BufReader::with_capacity(128 * 1024, file)))
    }
}
While the solution you have settled on does work, it has a few disadvantages. One is that when you read from a zip file, you have to read the contents of the file you want to process into memory before proceeding, which might be impractical for a large file. Another is that you have to heap-allocate the BufReader in either case.
Another possibly more idiomatic solution is to restructure your code, such that the BufReader does not need to be returned from the function at all - rather, structure your code so that it has a function that opens the file, which in turn calls a function that processes the file:
use std::ffi::OsStr;
use std::fs::File;
use std::io::BufRead;
use std::io::BufReader;
use std::path::Path;

pub fn process_file(filename: &str) -> Result<usize, String> {
    let path = Path::new(filename);
    let file = match File::open(&path) {
        Ok(file) => file,
        Err(why) => {
            return Err(format!(
                "ERROR: Could not open file, {}: {}",
                path.display(),
                why.to_string()
            ))
        }
    };
    if path.extension() == Some(OsStr::new("zip")) {
        // Handling a zip file
        let mut archive_contents = zip::ZipArchive::new(file).unwrap();
        let mut buf_reader =
            BufReader::with_capacity(128 * 1024, archive_contents.by_index(0).unwrap());
        process_reader(&mut buf_reader)
    } else {
        // Handling a plain file.
        process_reader(&mut BufReader::with_capacity(128 * 1024, file))
    }
}

pub fn process_reader(reader: &mut dyn BufRead) -> Result<usize, String> {
    // Example: just count the number of lines.
    Ok(reader.lines().count())
}

fn main() {
    let mut files: Vec<String> = Vec::new();
    files.push("/tmp/text_file.txt".to_string());
    files.push("/tmp/zip_file.zip".to_string());
    for f in files {
        match process_file(&f) {
            Ok(count) => println!("File {} Count: {}", &f, count),
            Err(e) => println!("Error reading file: {}", e),
        };
    }
}
This way, you don't need any Boxes and you don't need to read the file into memory before processing it.
A drawback to this solution would be if you had multiple functions that need to be able to read from zip files. One way to handle that would be to define process_file to take a callback function to do the processing. First you would change the definition of process_file to be:
pub fn process_file<C>(filename: &str, process_reader: C) -> Result<usize, String>
where
    C: FnOnce(&mut dyn BufRead) -> Result<usize, String>,
The rest of the function body can be left unchanged. Now, process_reader can be passed into the function, like this:
process_file(&f, count_lines)
where count_lines would be the original simple function to count the lines, for instance.
This would also allow you to pass in a closure:
process_file(&f, |reader| Ok(reader.lines().count()))
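Putting it all together, a sketch of the callback version (the body is the same as before; count_lines is the example callback named above):
use std::ffi::OsStr;
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;

pub fn process_file<C>(filename: &str, process_reader: C) -> Result<usize, String>
where
    C: FnOnce(&mut dyn BufRead) -> Result<usize, String>,
{
    let path = Path::new(filename);
    let file = File::open(&path)
        .map_err(|why| format!("ERROR: Could not open file, {}: {}", path.display(), why))?;
    if path.extension() == Some(OsStr::new("zip")) {
        // Hand the first entry of the archive to the callback.
        let mut archive_contents = zip::ZipArchive::new(file).unwrap();
        let mut buf_reader =
            BufReader::with_capacity(128 * 1024, archive_contents.by_index(0).unwrap());
        process_reader(&mut buf_reader)
    } else {
        // Hand the plain file to the callback.
        process_reader(&mut BufReader::with_capacity(128 * 1024, file))
    }
}

fn count_lines(reader: &mut dyn BufRead) -> Result<usize, String> {
    Ok(reader.lines().count())
}

fn main() {
    for f in ["/tmp/text_file.txt", "/tmp/zip_file.zip"] {
        match process_file(f, count_lines) {
            Ok(count) => println!("File {} Count: {}", f, count),
            Err(e) => println!("Error reading file: {}", e),
        }
    }
}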

Writing to file with Append(false) doesn't work as expected

I am learning to program with Rust and decided to build a CLI to manage my personal library. I'm still working on a quick proof of concept before going further so I have the barebones of what I need to work.
I am saving data to a file called "books.json" using std::fs and serde_json. The program works great the first time I run it, but on the second run, instead of overwriting the file, it is appending the data (for test purposes, it would add the same book twice).
Here's the code I have written so far. By using OpenOptions.append(false), shouldn't the file be overwritten when I write to it?
use serde::{Deserialize, Serialize};
use serde_json::Error;
use std::fs;
use std::fs::File;
use std::io::Read;
use std::io::Write;

#[derive(Serialize, Deserialize)]
struct Book {
    title: String,
    author: String,
    isbn: String,
    pub_year: usize,
}

fn main() -> Result<(), serde_json::Error> {
    let mut file = fs::OpenOptions::new()
        .read(true)
        .write(true)
        .append(false)
        .create(true)
        .open("books.json")
        .expect("Unable to open");

    let mut data = String::new();
    file.read_to_string(&mut data);

    let mut bookshelf: Vec<Book> = Vec::new();
    if file.metadata().unwrap().len() != 0 {
        bookshelf = serde_json::from_str(&data)?;
    }

    let book = Book {
        title: "The Institute".to_string(),
        author: "Stephen King".to_string(),
        isbn: "9781982110567".to_string(),
        pub_year: 2019,
    };
    bookshelf.push(book);

    let j: String = serde_json::to_string(&bookshelf)?;
    file.write_all(j.as_bytes()).expect("Unable to write data");
    Ok(())
}
books.json after running the program twice:
[{"title":"The Institute","author":"Stephen King","isbn":"9781982110567","pub_year":2019}]
[{"title":"The Institute","author":"Stephen King","isbn":"9781982110567","pub_year":2019},
{"title":"The Institute","author":"Stephen King","isbn":"9781982110567","pub_year":2019}]%
Members of the Rust Discord community pointed out that, because the file is read before it is written, the file pointer was ending up at the end of the file when I wrote to it. They suggested I use fs::read and fs::write, and that worked. I then added some code to handle cases where the file did not already exist.
The main() function would then need to look something like this:
use std::io::ErrorKind; // needed in addition to the question's imports

fn main() -> std::io::Result<()> {
    // Create the file if it does not exist yet.
    let f = File::open("books.json");
    let _ = match f {
        Ok(file) => file,
        Err(error) => match error.kind() {
            ErrorKind::NotFound => match File::create("books.json") {
                Ok(fc) => fc,
                Err(e) => panic!("Problem creating the file: {:?}", e),
            },
            other => panic!("Problem opening the file: {:?}", other),
        },
    };

    let data = fs::read_to_string("books.json").expect("Unable to read file");
    let mut bookshelf: Vec<Book> = Vec::new();
    if fs::metadata("books.json").unwrap().len() != 0 {
        bookshelf = serde_json::from_str(&data)?;
    }

    let book = Book {
        title: "The Institute".to_string(),
        author: "Stephen King".to_string(),
        isbn: "9781982110567".to_string(),
        pub_year: 2019,
    };
    bookshelf.push(book);

    let json: String = serde_json::to_string(&bookshelf)?;
    fs::write("books.json", &json).expect("Unable to write file");
    println!("{}", &json);
    Ok(())
}
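Alternatively, the single-handle version from the question can be kept by rewinding and truncating the file before the final write. A minimal sketch of that variant (same Book struct as above; assuming truncating via set_len is acceptable here):
use std::fs;
use std::io::{Read, Seek, SeekFrom, Write};

fn main() -> Result<(), serde_json::Error> {
    let mut file = fs::OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open("books.json")
        .expect("Unable to open");

    // Reading leaves the cursor at the end of the file...
    let mut data = String::new();
    file.read_to_string(&mut data).expect("Unable to read");

    let mut bookshelf: Vec<Book> = if data.is_empty() {
        Vec::new()
    } else {
        serde_json::from_str(&data)?
    };
    bookshelf.push(Book {
        title: "The Institute".to_string(),
        author: "Stephen King".to_string(),
        isbn: "9781982110567".to_string(),
        pub_year: 2019,
    });

    // ...so rewind and truncate before writing, to overwrite instead of append.
    file.seek(SeekFrom::Start(0)).expect("Unable to seek");
    file.set_len(0).expect("Unable to truncate");
    let j = serde_json::to_string(&bookshelf)?;
    file.write_all(j.as_bytes()).expect("Unable to write data");
    Ok(())
}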

How can I read a file line-by-line, eliminate duplicates, then write back to the same file?

I want to read a file, eliminate all duplicates and write the rest back into the file - like a duplicate cleaner.
I use a Vec because a normal array has a fixed size but my .txt is flexible (am I doing this right?).
Read, lines into a Vec + delete duplicates (still missing: write back to file):
use std::io;

fn main() {
    let path = Path::new("test.txt");
    let mut file = io::BufferedReader::new(io::File::open(&path, R));
    let mut lines: Vec<String> = file.lines().map(|x| x.unwrap()).collect();
    // dedup() deletes all duplicates if sort() before
    lines.sort();
    lines.dedup();
    for e in lines.iter() {
        print!("{}", e.as_slice());
    }
}
Read + write to file (untested, but should work I guess).
Missing: lines into a Vec, because it doesn't seem to work without BufferedReader (or I'm doing something else wrong, also a good chance).
use std::io;

fn main() {
    let path = Path::new("test.txt");
    let mut file = match io::File::open_mode(&path, io::Open, io::ReadWrite) {
        Ok(f) => f,
        Err(e) => panic!("file error: {}", e),
    };
    let mut lines: Vec<String> = file.lines().map(|x| x.unwrap()).collect();
    lines.sort();
    // dedup() deletes all duplicates if sort() before
    lines.dedup();
    for e in lines.iter() {
        file.write("{}", e);
    }
}
So .... how do I get those 2 together? :)
Ultimately, you are going to run into a problem: you are trying to write to the same file you are reading from. In this case, it's safe because you are going to read the entire file, so you don't need it after that. However, if you did try to write to the file, you'd see that opening a file for reading doesn't allow writing! Here's the code to do that:
use std::{
    fs::File,
    io::{BufRead, BufReader, Write},
};

fn main() {
    let mut file = File::open("test.txt").expect("file error");
    let reader = BufReader::new(&mut file);
    let mut lines: Vec<_> = reader
        .lines()
        .map(|l| l.expect("Couldn't read a line"))
        .collect();
    lines.sort();
    lines.dedup();
    for line in lines {
        file.write_all(line.as_bytes())
            .expect("Couldn't write to file");
    }
}
Here's the output:
% cat test.txt
a
a
b
a
% cargo run
thread 'main' panicked at 'Couldn't write to file: Os { code: 9, kind: Other, message: "Bad file descriptor" }', src/main.rs:12:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
You could open the file for both reading and writing:
use std::{
    fs::OpenOptions,
    io::{BufRead, BufReader, Write},
};

fn main() {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .open("test.txt")
        .expect("file error");
    // Remaining code unchanged
}
But then you'd see that (a) the output is appended and (b) all the newlines are lost, because BufRead's lines() doesn't include them.
We could reset the file pointer back to the beginning, but then you'd probably leave trailing stuff at the end (deduplicating is likely to write fewer bytes than were read). It's easier to just reopen the file for writing, which will truncate the file. Also, let's use a set data structure to do the deduplication for us!
use std::{
    collections::BTreeSet,
    fs::File,
    io::{BufRead, BufReader, Write},
};

fn main() {
    let file = File::open("test.txt").expect("file error");
    let reader = BufReader::new(file);
    let lines: BTreeSet<_> = reader
        .lines()
        .map(|l| l.expect("Couldn't read a line"))
        .collect();
    let mut file = File::create("test.txt").expect("file error");
    for line in lines {
        file.write_all(line.as_bytes())
            .expect("Couldn't write to file");
        file.write_all(b"\n").expect("Couldn't write to file");
    }
}
And the output:
% cat test.txt
a
a
b
a
a
b
a
b
% cargo run
% cat test.txt
a
b
The less-efficient but shorter solution is to read the entire file as one string and use str::lines:
use std::{
    collections::BTreeSet,
    fs::{self, File},
    io::Write,
};

fn main() {
    let contents = fs::read_to_string("test.txt").expect("can't read");
    let lines: BTreeSet<_> = contents.lines().collect();
    let mut file = File::create("test.txt").expect("can't create");
    for line in lines {
        writeln!(file, "{}", line).expect("can't write");
    }
}
See also:
What's the de-facto way of reading and writing files in Rust 1.x?
What is the best variant for appending a new line in a text file?
