Writing to file with Append(false) doesn't work as expected

I am learning to program with Rust and decided to build a CLI to manage my personal library. I'm still working on a quick proof of concept before going further so I have the barebones of what I need to work.
I am saving data to a file called "books.json" using std::fs and serde_json. The program works great the first time I run it, but on the second run, instead of overwriting the file, it is appending the data (for test purposes, it would add the same book twice).
Here's the code I have written so far. By using OpenOptions.append(false), shouldn't the file be overwritten when I write to it?
use serde::{Deserialize, Serialize};
use serde_json::Error;
use std::fs;
use std::fs::File;
use std::io::Read;
use std::io::Write;
#[derive(Serialize, Deserialize)]
struct Book {
title: String,
author: String,
isbn: String,
pub_year: usize,
}
fn main() -> Result<(), serde_json::Error> {
let mut file = fs::OpenOptions::new()
.read(true)
.write(true)
.append(false)
.create(true)
.open("books.json")
.expect("Unable to open");
let mut data = String::new();
file.read_to_string(&mut data);
let mut bookshelf: Vec<Book> = Vec::new();
if file.metadata().unwrap().len() != 0 {
bookshelf = serde_json::from_str(&data)?;
}
let book = Book {
title: "The Institute".to_string(),
author: "Stephen King".to_string(),
isbn: "9781982110567".to_string(),
pub_year: 2019,
};
bookshelf.push(book);
let j: String = serde_json::to_string(&bookshelf)?;
file.write_all(j.as_bytes()).expect("Unable to write data");
Ok(())
}
books.json after running the program twice:
[{"title":"The Institute","author":"Stephen King","isbn":"9781982110567","pub_year":2019}]
[{"title":"The Institute","author":"Stephen King","isbn":"9781982110567","pub_year":2019},
{"title":"The Institute","author":"Stephen King","isbn":"9781982110567","pub_year":2019}]%

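Members of the Rust Discord community pointed out that, with a single handle from OpenOptions, the file cursor was already at the end of the file by the time I wrote to it (read_to_string leaves it there), so the new data landed after the old. append(false) is the default and neither truncates the file nor rewinds the cursor. They suggested I use fs::read_to_string and fs::write, and that worked. I then added some code to handle cases where the file did not already exist.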
The main() function would then need to look something like this:
fn main() -> std::io::Result<()> {
let f = File::open("books.json");
let _ = match f {
Ok(file) => file,
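Err(error) => match error.kind() {
// Requires `use std::io::ErrorKind;` at the top of the file.
ErrorKind::NotFound => match File::create("books.json") {
Ok(fc) => fc,
Err(e) => panic!("Problem creating the file: {:?}", e),
},
// Every other error kind needs an arm too, or the match is not exhaustive.
other => panic!("Problem opening the file: {:?}", other),
},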
};
let data = fs::read_to_string("books.json").expect("Unable to read file");
let mut bookshelf: Vec<Book> = Vec::new();
if fs::metadata("books.json").unwrap().len() != 0 {
bookshelf = serde_json::from_str(&data)?;
}
let book = Book {
title: "The Institute".to_string(),
author: "Stephen King".to_string(),
isbn: "9781982110567".to_string(),
pub_year: 2019,
};
bookshelf.push(book);
let json: String = serde_json::to_string(&bookshelf)?;
fs::write("books.json", &json).expect("Unable to write file");
println!("{}", &json);
Ok(())
}
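For completeness, the original single-handle approach can also be made to work by rewinding and truncating before the write. Below is a minimal sketch using only std; the function name rewrite_in_place and the update closure are made up for illustration, and plain strings stand in for the serde_json round trip.

```rust
use std::fs::OpenOptions;
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Reads the whole file, lets `update` compute the new contents, then
/// replaces the old contents through the same file handle.
/// (Hypothetical helper, not part of the original program.)
pub fn rewrite_in_place<F>(path: &Path, update: F) -> io::Result<String>
where
    F: FnOnce(&str) -> String,
{
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open(path)?;

    let mut old = String::new();
    file.read_to_string(&mut old)?; // the cursor now sits at the end of the file

    let new = update(&old);

    file.seek(SeekFrom::Start(0))?; // move the cursor back to the start
    file.set_len(0)?; // discard the old bytes so nothing stale is left behind
    file.write_all(new.as_bytes())?;
    Ok(new)
}
```

In the bookshelf program, the closure would deserialize the old string, push the new book, and serialize the vector again. Whether this is preferable to fs::read_to_string plus fs::write is mostly a matter of taste.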

Related

how to generalize from `File` to `Read`?

I have some working code that reads a file, but I need to generalize it to pull data from additional sources other than simple disk files.
Is Read the correct generalization I should work with in order to replace File?
If so, how can I fix example2 in the following sample code? As is, it fails with the compile error dyn async_std::io::Read cannot be unpinned at the commented line. If not, what type should I return instead from get_read and are there any corresponding changes required in example2?
//! [dependencies]
//! tokio = { version = "1.0.1", features = ["full"] }
//! async-std = "1.8.0"
//! anyhow = "1.0.32"
use async_std::io::prelude::*;
use async_std::fs::File;
use anyhow::Result;
#[tokio::main]
async fn main() -> Result<()> {
example1().await?;
example2().await?;
Ok(())
}
// Example of consuming `File` ... works great!
async fn example1() -> Result<()> {
let mut file = get_file().await?;
let mut contents = String::new();
let _ = file.read_to_string(&mut contents).await?;
println!("read {} characters", contents.len());
Ok(())
}
// Example of consuming `Read` ... does not compile?
async fn example2() -> Result<()> {
let mut read = get_read().await?;
let mut contents = String::new();
// ERROR: `dyn async_std::io::Read` cannot be unpinned
let _ = read.read_to_string(&mut contents).await?;
println!("read {} characters", contents.len());
Ok(())
}
async fn get_read() -> Result<Box<dyn Read>> {
let file = get_file().await?;
Ok(Box::new(file))
}
async fn get_file() -> Result<File> {
let file = File::open("/etc/hosts").await?;
Ok(file)
}

You need to pin:
// Requires `use std::pin::Pin;` alongside the other imports.
async fn get_read() -> Result<Pin<Box<dyn Read>>> {
let file = get_file().await?;
Ok(Box::pin(file))
}
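This is needed because read_to_string requires the reader to be Unpin, and dyn Read on its own makes no such promise. Box<File> (without Pin) works because File implements Unpin. Box<dyn Read + Unpin> would work too.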

How to convert a Bytes Iterator into a Stream in Rust

I'm trying to build a feature that requires reading the contents of a file into a futures::stream::BoxStream, but I'm having a tough time figuring out what I need to do.
I have figured out how to read a file byte by byte via Bytes which implements an iterator.
use std::fs::File;
use std::io::prelude::*;
use std::io::{BufReader, Bytes};
// TODO: Convert this to a async Stream
fn async_read() -> Box<dyn Iterator<Item = Result<u8, std::io::Error>>> {
let f = File::open("/dev/random").expect("Could not open file");
let reader = BufReader::new(f);
let iter = reader.bytes().into_iter();
Box::new(iter)
}
fn main() {
ctrlc::set_handler(move || {
println!("received Ctrl+C!");
std::process::exit(0);
})
.expect("Error setting Ctrl-C handler");
for b in async_read().into_iter() {
println!("{:?}", b);
}
}
However, I've been struggling to figure out how I can turn this Box<dyn Iterator<Item = Result<u8, std::io::Error>>> into a Stream.
I would have thought something like this would work:
use futures::stream;
use std::fs::File;
use std::io::prelude::*;
use std::io::{BufReader, Bytes};
// TODO: Convert this to a async Stream
fn async_read() -> stream::BoxStream<'static, dyn Iterator<Item = Result<u8, std::io::Error>>> {
let f = File::open("/dev/random").expect("Could not open file");
let reader = BufReader::new(f);
let iter = reader.bytes().into_iter();
std::pin::Pin::new(Box::new(stream::iter(iter)))
}
fn main() {
ctrlc::set_handler(move || {
println!("received Ctrl+C!");
std::process::exit(0);
})
.expect("Error setting Ctrl-C handler");
while let Some(b) = async_read().poll() {
println!("{:?}", b);
}
}
But I keep getting a ton of compiler errors. I've tried other permutations, but I'm generally getting nowhere.
One of the compiler errors:
--> src/main.rs:14:24
|
14 | std::pin::Pin::new(Box::new(stream::iter(iter)))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected trait object `dyn std::iter::Iterator`, found enum `std::result::Result`
Anyone have any advice?
I'm pretty new to Rust, and specifically Streams/lower level stuff so I apologize if I got anything wrong, feel free to correct me.
For some additional background, I'm trying to do this so you can CTRL-C out of a command in nushell.

I think you are overcomplicating it a bit. You can just return impl Stream from async_read; there is no need to box or pin (the same goes for the original Iterator-based version). You then need an executor to poll the stream (in this example I just use futures::executor::block_on). With that in place, you can call futures::stream::StreamExt::next() on the stream to get a future that resolves to the next item.
Here is one way to do this:
use futures::prelude::*;
use std::{
fs::File,
io::{prelude::*, BufReader},
};
fn async_read() -> impl Stream<Item = Result<u8, std::io::Error>> {
let f = File::open("/dev/random").expect("Could not open file");
let reader = BufReader::new(f);
stream::iter(reader.bytes())
}
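async fn async_main() {
// Build the stream once; calling async_read() in the loop condition
// would reopen the file and create a fresh stream on every iteration.
let mut bytes = async_read();
while let Some(b) = bytes.next().await {
println!("{:?}", b);
}
}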
fn main() {
ctrlc::set_handler(move || {
println!("received Ctrl+C!");
std::process::exit(0);
})
.expect("Error setting Ctrl-C handler");
futures::executor::block_on(async_main());
}

Reading ZIP file in Rust causes data owned by the current function

I'm new to Rust and likely have a huge knowledge gap. Basically, I'm hoping to create a utility function that accepts a regular text file or a ZIP file and returns a BufRead that the caller can process line by line. It works well for non-ZIP files, but I don't understand how to achieve the same for ZIP files. Each ZIP file contains only a single file, which is why I only process the first file in the ZipArchive.
I'm running into the following error.
error[E0515]: cannot return value referencing local variable `archive_contents`
--> src/file_reader.rs:30:9
|
27 | let archive_file: zip::read::ZipFile = archive_contents.by_index(0).unwrap();
| ---------------- `archive_contents` is borrowed here
...
30 | Ok(Box::new(BufReader::with_capacity(128 * 1024, archive_file)))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ returns a value referencing data owned by the current function
It seems the archive_contents is preventing the BufRead object from returning to the caller. I'm just not sure how to work around this.
file_reader.rs
use std::ffi::OsStr;
use std::fs::File;
use std::io::BufRead;
use std::io::BufReader;
use std::path::Path;
pub struct FileReader {
pub file_reader: Result<Box<BufRead>, &'static str>,
}
pub fn file_reader(filename: &str) -> Result<Box<BufRead>, &'static str> {
let path = Path::new(filename);
let file = match File::open(&path) {
Ok(file) => file,
Err(why) => panic!(
"ERROR: Could not open file, {}: {}",
path.display(),
why.to_string()
),
};
if path.extension() == Some(OsStr::new("zip")) {
// Processing ZIP file.
let mut archive_contents: zip::read::ZipArchive<std::fs::File> =
zip::ZipArchive::new(file).unwrap();
let archive_file: zip::read::ZipFile = archive_contents.by_index(0).unwrap();
// ERRORS: returns a value referencing data owned by the current function
Ok(Box::new(BufReader::with_capacity(128 * 1024, archive_file)))
} else {
// Processing non-ZIP file.
Ok(Box::new(BufReader::with_capacity(128 * 1024, file)))
}
}
main.rs
mod file_reader;
use std::io::BufRead;
fn main() {
let mut files: Vec<String> = Vec::new();
files.push("/tmp/text_file.txt".to_string());
files.push("/tmp/zip_file.zip".to_string());
for f in files {
let mut fr = match file_reader::file_reader(&f) {
Ok(fr) => fr,
Err(e) => panic!("Error reading file."),
};
fr.lines().for_each(|l| match l {
Ok(l) => {
println!("{}", l);
}
Err(e) => {
println!("ERROR: Failed to read line:\n {}", e);
}
});
}
}
Any help is greatly appreciated!

"It seems the archive_contents is preventing the BufRead object from returning to the caller. I'm just not sure how to work around this."
You have to restructure the code somehow. The issue here is that the archive entry's data is part of the archive. So unlike file, archive_file is not an independent item; it is rather a pointer of sorts into the archive itself. That means the archive needs to live at least as long as archive_file for this code to be correct.
In a GC'd language this isn't an issue: archive_file holds a reference to the archive and keeps it alive for however long it needs. Not so in Rust.
A simple way to fix this would be to copy the data out of archive_file into an owned buffer that you can return to the caller. Another option might be to return a wrapper for (archive_contents, item_index) that delegates the reading (this might be somewhat tricky, though). Yet another would be to not have file_reader at all.

Thanks to @Masklinn for the direction! Here's the working solution using their suggestion.
file_reader.rs
use std::ffi::OsStr;
use std::fs::File;
use std::io::BufRead;
use std::io::BufReader;
use std::io::Cursor;
use std::io::Error;
use std::io::Read;
use std::path::Path;
use zip::read::ZipArchive;
pub fn file_reader(filename: &str) -> Result<Box<dyn BufRead>, Error> {
let path = Path::new(filename);
let file = File::open(&path)?;
if path.extension() == Some(OsStr::new("zip")) {
let mut archive_contents = ZipArchive::new(file)?;
let mut archive_file = archive_contents.by_index(0)?;
// Read the contents of the file into a vec.
let mut data = Vec::new();
archive_file.read_to_end(&mut data)?;
// Wrap vec in a std::io::Cursor.
let cursor = Cursor::new(data);
Ok(Box::new(cursor))
} else {
// Processing non-ZIP file.
Ok(Box::new(BufReader::with_capacity(128 * 1024, file)))
}
}

While the solution you have settled on does work, it has a few disadvantages. One is that when you read from a zip file, you have to read the contents of the file you want to process into memory before proceeding, which might be impractical for a large file. Another is that you have to heap allocate the BufReader in either case.
Another possibly more idiomatic solution is to restructure your code, such that the BufReader does not need to be returned from the function at all - rather, structure your code so that it has a function that opens the file, which in turn calls a function that processes the file:
use std::ffi::OsStr;
use std::fs::File;
use std::io::BufRead;
use std::io::BufReader;
use std::path::Path;
pub fn process_file(filename: &str) -> Result<usize, String> {
let path = Path::new(filename);
let file = match File::open(&path) {
Ok(file) => file,
Err(why) => return Err(format!(
"ERROR: Could not open file, {}: {}",
path.display(),
why.to_string()
)),
};
if path.extension() == Some(OsStr::new("zip")) {
// Handling a zip file
let mut archive_contents = zip::ZipArchive::new(file).unwrap();
let mut buf_reader = BufReader::with_capacity(128 * 1024, archive_contents.by_index(0).unwrap());
process_reader(&mut buf_reader)
} else {
// Handling a plain file.
process_reader(&mut BufReader::with_capacity(128 * 1024, file))
}
}
pub fn process_reader(reader: &mut dyn BufRead) -> Result<usize, String> {
// Example, just count the number of lines
Ok(reader.lines().count())
}
fn main() {
let mut files: Vec<String> = Vec::new();
files.push("/tmp/text_file.txt".to_string());
files.push("/tmp/zip_file.zip".to_string());
for f in files {
match process_file(&f) {
Ok(count) => println!("File {} Count: {}", &f, count),
Err(e) => println!("Error reading file: {}", e),
};
}
}
This way, you don't need any Boxes and you don't need to read the file into memory before processing it.
A drawback to this solution would arise if you had multiple functions that need to be able to read from zip files. One way to handle that would be to define process_file to take a callback function that does the processing. First you would change the definition of process_file to be:
pub fn process_file<C>(filename: &str, process_reader: C) -> Result<usize, String>
where C: FnOnce(&mut dyn BufRead)->Result<usize, String>
The rest of the function body can be left unchanged. Now, process_reader can be passed into the function, like this:
process_file(&f, count_lines)
where count_lines would be the original simple function to count the lines, for instance.
This would also allow you to pass in a closure:
process_file(&f, |reader| Ok(reader.lines().count()))
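If different callers need different result types, the callback version can be made generic over its return value. The following is only a sketch under some assumptions: it uses std alone, so the ZIP branch is left as a comment, and the names with_reader and count_lines are made up for illustration. It also returns io::Result instead of Result<usize, String>.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Opens `path`, wraps it in a BufReader, and hands the reader to `process`.
/// The reader never leaves this function, so nothing borrowed escapes.
pub fn with_reader<T, F>(path: &Path, process: F) -> io::Result<T>
where
    F: FnOnce(&mut dyn BufRead) -> io::Result<T>,
{
    let file = File::open(path)?;
    // A ZIP branch would go here: open the archive, take entry 0, wrap it in
    // a BufReader, and call `process` while the archive is still alive.
    let mut reader = BufReader::with_capacity(128 * 1024, file);
    process(&mut reader)
}

/// Example callback: counts lines and propagates read errors.
pub fn count_lines(reader: &mut dyn BufRead) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        line?; // stop at the first I/O or UTF-8 error
        count += 1;
    }
    Ok(count)
}
```

A caller can then write with_reader(path, count_lines), or pass a closure that returns some other type, without touching the file-opening logic.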

Rust - How to check EOF in File?

I am trying to implement the last exercise of https://github.com/pingcap/talent-plan/blob/master/rust/building-blocks/bb-2.md.
In short, I need to serialize some enums to bson, write to a file and read them back from file.
I got stuck in checking whether EOF is reached. I googled for some time but failed to find an answer.
use std::path::Path;
use std::fs::{create_dir_all, OpenOptions};
use std::io::{Write, Seek, Cursor, SeekFrom};
use serde::{Serialize, Deserialize};
use std::str;
use bson;
use bson::{encode_document, decode_document};
fn main() {
let dir = Path::new("D:/download/rust");
create_dir_all(dir).expect("Cannot create dir");
let txt = dir.join("move.txt");
// Encode some moves
let mut file = OpenOptions::new()
.append(true)
.create(true)
.open(&txt)
.expect("Cannot open txt");
encode_and_write_move(&mut file, &Move::Up(1));
encode_and_write_move(&mut file, &Move::Down(2));
encode_and_write_move(&mut file, &Move::Left(3));
encode_and_write_move(&mut file, &Move::Right(4));
// Decode and print moves
let mut file = OpenOptions::new()
.read(true)
.open(&txt)
.expect("Cannot open txt");
loop {
let end_of_file = false; // How?
if end_of_file {
break;
}
let doc = decode_document(&mut file).expect("cannot decode doc");
println!("doc = {:?}", doc);
}
}
fn encode_and_write_move<W: Write>(writer: &mut W, mov: &Move) {
let serialized = bson::to_bson(mov).unwrap();
let doc = serialized.as_document().unwrap();
encode_document(writer, doc).expect("failed to encode doc");
}
#[derive(Debug, Serialize, Deserialize)]
enum Move {
Up(i32),
Down(i32),
Left(i32),
Right(i32),
}
Update:
It seems the only way to see if EOF is reached is to check the returned Err. Here is my attempt for the whole exercise.
use std::path::Path;
use std::fs::{create_dir_all, OpenOptions};
use std::io::Write;
use serde::{Serialize, Deserialize};
use std::{str, io};
use bson;
use bson::{encode_document, decode_document, DecoderError};
fn main() {
let dir = Path::new("D:/download/rust");
create_dir_all(dir).expect("Cannot create dir");
let txt = dir.join("move.txt");
// Encode some moves
let mut writer = OpenOptions::new()
.append(true)
.create(true)
.open(&txt)
.expect("Cannot open txt");
encode_and_write_move(&mut writer, &Move::Up(1));
encode_and_write_move(&mut writer, &Move::Down(2));
encode_and_write_move(&mut writer, &Move::Left(3));
encode_and_write_move(&mut writer, &Move::Right(4));
// Decode and print moves
let mut reader = OpenOptions::new()
.read(true)
.open(&txt)
.expect("Cannot open txt");
loop {
match decode_document(&mut reader) {
Result::Ok(doc) => println!("doc = {:?}", &doc),
Result::Err(DecoderError::IoError(e)) if e.kind() == io::ErrorKind::UnexpectedEof => break,
Result::Err(err) => panic!("Decoding failed with {}", err)
}
}
}
fn encode_and_write_move<W: Write>(writer: &mut W, mov: &Move) {
let serialized = bson::to_bson(mov).unwrap();
let doc = serialized.as_document().unwrap();
encode_document(writer, doc).expect("failed to encode doc");
}
#[derive(Debug, Serialize, Deserialize)]
enum Move {
Up(i32),
Down(i32),
Left(i32),
Right(i32),
}

bson::decode_document returns DecoderResult<Document>, which is an alias for Result<T, DecoderError>. If you check the possible values of enum DecoderError, you will see EndOfStream. However, it seems decode_document returns Err(DecoderError::IoError(_)) with kind UnexpectedEof when it hits EOF.
So instead of trying to detect EOF in your code, it would be more straightforward to consume the stream (by repeatedly calling decode_document) until an error is raised. Then the error can be handled or, if it only signals the end of the input, processing can continue normally.
You can try something like this:
loop {
match decode_document(&mut file) {
Result::Ok(doc) => println!("doc = {:?}", &doc),
Result::Err(DecoderError::IoError(io_error)) => match io_error.kind() {
std::io::ErrorKind::UnexpectedEof => break,
_ => panic!("Decoding failed with I/O error {}", io_error)
}
Result::Err(err) => panic!("Decoding failed with {}", err)
}
}
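The same "read until the decoder reports EOF" pattern can be tried without the bson crate. The sketch below is not bson: it assumes a made-up record format (a 4-byte little-endian length followed by that many bytes), and the names read_record and read_all_records are hypothetical. It only illustrates how UnexpectedEof can be mapped to a normal end of input.

```rust
use std::io::{self, ErrorKind, Read};

/// Reads one record: a 4-byte little-endian length, then that many bytes.
/// Returns Ok(None) when the stream ends before a full header could be read
/// (in this sketch a partial header is also treated as end of stream).
pub fn read_record<R: Read>(reader: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut header = [0u8; 4];
    match reader.read_exact(&mut header) {
        Ok(()) => {}
        Err(e) if e.kind() == ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let len = u32::from_le_bytes(header) as usize;
    let mut body = vec![0u8; len];
    // EOF in the middle of a body means a truncated record, so it stays an error.
    reader.read_exact(&mut body)?;
    Ok(Some(body))
}

/// Keeps reading records until `read_record` signals the end of the stream.
pub fn read_all_records<R: Read>(reader: &mut R) -> io::Result<Vec<Vec<u8>>> {
    let mut records = Vec::new();
    while let Some(record) = read_record(reader)? {
        records.push(record);
    }
    Ok(records)
}
```

The loop has no separate EOF check; the end of the input is just one of the outcomes of trying to read the next record.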

How to add special NotReady logic to tokio-io?

I'm trying to make a Stream that waits until a specific character is in the buffer. I know there's read_until() on BufRead, but I actually need a custom solution, as this is a stepping stone towards waiting until a specific string is in the buffer (or, for example, until a regexp match happens).
In the project where I first encountered the problem, future processing simply hung whenever I got a Ready(_) from the inner future and returned NotReady from my function. I discovered from the docs (last paragraph) that I shouldn't do that. What I didn't get, however, is the actual alternative promised in that paragraph. I have read all the published documentation on the Tokio site and it doesn't make sense to me at the moment.
So the following is my current code. Unfortunately I couldn't make it simpler or smaller, as it's already broken. The current result is this:
Err(Custom { kind: Other, error: Error(Shutdown) })
Err(Custom { kind: Other, error: Error(Shutdown) })
Err(Custom { kind: Other, error: Error(Shutdown) })
<ad infinitum>
The expected result is getting some Ok(Ready(_)) out of it, while printing W and W', and waiting for a specific character in the buffer.
extern crate futures;
extern crate tokio_core;
extern crate tokio_io;
extern crate tokio_io_timeout;
extern crate tokio_process;
use futures::stream::poll_fn;
use futures::{Async, Poll, Stream};
use tokio_core::reactor::Core;
use tokio_io::AsyncRead;
use tokio_io_timeout::TimeoutReader;
use tokio_process::CommandExt;
use std::process::{Command, Stdio};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
struct Process {
child: tokio_process::Child,
stdout: Arc<Mutex<tokio_io_timeout::TimeoutReader<tokio_process::ChildStdout>>>,
}
impl Process {
fn new(
command: &str,
reader_timeout: Option<Duration>,
core: &tokio_core::reactor::Core,
) -> Self {
let mut cmd = Command::new(command);
let cat = cmd.stdout(Stdio::piped());
let mut child = cat.spawn_async(&core.handle()).unwrap();
let stdout = child.stdout().take().unwrap();
let mut timeout_reader = TimeoutReader::new(stdout);
timeout_reader.set_timeout(reader_timeout);
let timeout_reader = Arc::new(Mutex::new(timeout_reader));
Self {
child,
stdout: timeout_reader,
}
}
}
fn work() -> Result<(), ()> {
let window = Arc::new(Mutex::new(Vec::new()));
let mut core = Core::new().unwrap();
let process = Process::new("cat", Some(Duration::from_secs(20)), &core);
let mark = Arc::new(Mutex::new(b'c'));
let read_until_stream = poll_fn({
let window = window.clone();
let timeout_reader = process.stdout.clone();
move || -> Poll<Option<u8>, std::io::Error> {
let mut buf = [0; 8];
let poll;
{
let mut timeout_reader = timeout_reader.lock().unwrap();
poll = timeout_reader.poll_read(&mut buf);
}
match poll {
Ok(Async::Ready(0)) => Ok(Async::Ready(None)),
Ok(Async::Ready(x)) => {
{
let mut window = window.lock().unwrap();
println!("W: {:?}", *window);
println!("buf: {:?}", &buf[0..x]);
window.extend(buf[0..x].into_iter().map(|x| *x));
println!("W': {:?}", *window);
if let Some(_) = window.iter().find(|c| **c == *mark.lock().unwrap()) {
Ok(Async::Ready(Some(1)))
} else {
Ok(Async::NotReady)
}
}
}
Ok(Async::NotReady) => Ok(Async::NotReady),
Err(e) => Err(e),
}
}
});
let _stream_thread = thread::spawn(move || {
for o in read_until_stream.wait() {
println!("{:?}", o);
}
});
match core.run(process.child) {
Ok(_) => {}
Err(e) => {
println!("Child error: {:?}", e);
}
}
Ok(())
}
fn main() {
work().unwrap();
}
This is a complete example project.

If you need more data you need to call poll_read again until you either find what you were looking for or poll_read returns NotReady.
You might want to avoid looping in one task for too long, so you can build yourself a yield_task function to call instead if poll_read didn't return NotReady; it makes sure your task gets called again ASAP after other pending tasks were run.
To use it, just write: return yield_task();
fn yield_inner() {
use futures::task;
task::current().notify();
}
#[inline(always)]
pub fn yield_task<T, E>() -> Poll<T, E> {
yield_inner();
Ok(Async::NotReady)
}
Also see futures-rs#354: Handle long-running, always-ready futures fairly.
With the new async/await API futures::task::current is gone; instead you'll need a std::task::Context reference, which is provided as parameter to the new std::future::Future::poll trait method.
If you're already manually implementing the std::future::Future trait you can simply insert:
context.waker().wake_by_ref();
return std::task::Poll::Pending;
Or build yourself a Future-implementing type that yields exactly once:
pub struct Yield {
ready: bool,
}
impl core::future::Future for Yield {
type Output = ();
fn poll(self: core::pin::Pin<&mut Self>, cx: &mut core::task::Context<'_>) -> core::task::Poll<Self::Output> {
let this = self.get_mut();
if this.ready {
core::task::Poll::Ready(())
} else {
cx.waker().wake_by_ref();
this.ready = true; // ready next round
core::task::Poll::Pending
}
}
}
pub fn yield_task() -> Yield {
Yield { ready: false }
}
And then use it in async code like this:
yield_task().await;
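To see what the yielding future does without pulling in a runtime, it can be polled by hand. This is only a sketch using std: Yield and yield_task are repeated from above (with std:: paths) so the block compiles on its own, while CountingWaker and poll_to_completion are hypothetical helpers written for this illustration. A real executor would reschedule the task instead of polling in a busy loop.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Same shape as the `Yield` type above: pending once, then ready.
pub struct Yield {
    ready: bool,
}

impl Future for Yield {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let this = self.get_mut();
        if this.ready {
            Poll::Ready(())
        } else {
            cx.waker().wake_by_ref(); // ask to be polled again
            this.ready = true;
            Poll::Pending
        }
    }
}

pub fn yield_task() -> Yield {
    Yield { ready: false }
}

/// Hypothetical waker for experiments: counts how often it was woken.
pub struct CountingWaker(pub AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls `fut` in a busy loop until it completes.
/// Returns (number of polls, number of wake-ups requested).
pub fn poll_to_completion<F: Future + Unpin>(mut fut: F) -> (usize, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(Arc::clone(&counter));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if Pin::new(&mut fut).poll(&mut cx).is_ready() {
            return (polls, counter.0.load(Ordering::SeqCst));
        }
    }
}
```

Calling poll_to_completion(yield_task()) reports two polls and one wake-up: the first poll requests a wake-up and returns Pending, the second returns Ready.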
