How do I read and process N lines of a file at a time in Rust?

I would like to read N lines of a file at a time, possibly using itertools::Itertools::chunks.
When I do:
for line in stdin.lock().lines() {
    ... // this is processing one line at a time
}
... although I'm buffering input, I am not processing the buffer.

You could use chunks() from itertools:
use itertools::Itertools; // 0.8.0
use std::io::BufRead;

fn main() {
    let stdin = std::io::stdin();
    let n = 3;
    for lines in &stdin.lock().lines().chunks(n) {
        for (i, line) in lines.enumerate() {
            println!("Line {}: {:?}", i, line);
        }
    }
}
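Note the & in &stdin.lock().lines().chunks(n): chunks returns an IntoChunks adapter, and it is the reference to it that can be iterated. If you'd rather avoid the itertools dependency, here's a minimal hand-rolled sketch of the same batching (the process helper is hypothetical, and read errors are silently skipped for brevity):

use std::io::{self, BufRead};

fn main() {
    let stdin = io::stdin();
    let n = 3;
    let mut batch = Vec::with_capacity(n);
    // Lines that fail to read are skipped here for brevity.
    for line in stdin.lock().lines().filter_map(|r| r.ok()) {
        batch.push(line);
        if batch.len() == n {
            process(&batch);
            batch.clear();
        }
    }
    if !batch.is_empty() {
        process(&batch); // final, possibly short, batch
    }
}

fn process(lines: &[String]) {
    for (i, line) in lines.iter().enumerate() {
        println!("Line {}: {}", i, line);
    }
}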

Related

Streaming version for Rust (nom 7) multiline parser

I'm trying to learn nom, but I'm also new to Rust.
I have a text file with a word on each line.
I want to split it into two files: valid ASCII (without control codes) and everything else.
extern crate nom;

use std::fs::File;
use std::io::{prelude::*, BufReader};

fn main() -> std::io::Result<()> {
    let file = File::open("/words.txt")?;
    let reader = BufReader::new(file);
    for line in reader.lines().filter_map(|result| result.ok()) {
        parse(line);
    }
    Ok(())
}

fn parse(line: String) {
    for c in line.chars() {
        if c.is_ascii_control() || !c.is_ascii() {
            println!("C> {}", line);
            return;
        }
    }
    if !line.is_empty() {
        println!("A{}> {}", line.len(), line);
    }
}
But the input file is too large for in-memory processing, so I should use nom's streaming functionality instead.
How can I modify this code to combine a streaming buffer with limited capacity (1000 characters) with a line_ending check?
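One possible shape for this (a sketch, untested; it assumes nom 7's streaming combinators take_till and line_ending, reuses the parse function above, and reads in 1000-byte chunks):

use std::fs::File;
use std::io::Read;

use nom::bytes::streaming::take_till;
use nom::character::streaming::line_ending;
use nom::sequence::terminated;
use nom::{Err, IResult};

// One line: everything up to a line ending, then the line ending itself.
// In streaming mode, running out of input yields `Err::Incomplete` instead
// of a successful match, which is the signal to refill the buffer.
fn next_line(input: &[u8]) -> IResult<&[u8], &[u8]> {
    terminated(take_till(|c| c == b'\r' || c == b'\n'), line_ending)(input)
}

fn main() -> std::io::Result<()> {
    let mut file = File::open("/words.txt")?;
    let mut buffer: Vec<u8> = Vec::new();
    let mut chunk = [0u8; 1000]; // limited-capacity read buffer

    loop {
        let n = file.read(&mut chunk)?;
        if n == 0 {
            break; // EOF
        }
        buffer.extend_from_slice(&chunk[..n]);

        // Drain every complete line currently in the buffer.
        loop {
            match next_line(&buffer) {
                Ok((rest, line)) => {
                    let consumed = buffer.len() - rest.len();
                    parse(String::from_utf8_lossy(line).into_owned());
                    buffer.drain(..consumed);
                }
                Err(Err::Incomplete(_)) => break, // need more input
                Err(e) => panic!("parse error: {:?}", e),
            }
        }
    }
    // A final line without a trailing newline never matches `line_ending`.
    if !buffer.is_empty() {
        parse(String::from_utf8_lossy(&buffer).into_owned());
    }
    Ok(())
}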

How to pass every line from a text file as an argument in Rust

I wrote this code to check for live URLs listed in a text file. It originally checked a single URL and worked, but when I tried to make it multithreaded I got an error.
Here is the original code:
use hyper_tls::HttpsConnector;
use hyper::Client;
use tokio::io::BufReader;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let https = HttpsConnector::new();
    let url = std::env::args().nth(1).expect("no list given");
    let client = Client::builder().build::<_, hyper::Body>(https);
    let reader = BufReader::new(url);
    let lines = reader.lines();
    for l in lines {
        let sep = l.parse()?;
        // Await the response...
        let resp = client.get(sep).await?;
        if resp.status() == 200 {
            println!("{}", l);
        }
        if resp.status() == 301 {
            println!("{}", l);
        }
    }
    Ok(())
}
The issue seems to be that you are passing the file's name, rather than its contents, to the BufReader.
In order to read the contents instead, you can use a tokio::fs::File.
Here's an example of reading a file and printing its lines to stdout using tokio and a BufReader:
use tokio::{
    fs::File,
    io::{
        // This trait needs to be imported, as the `lines` function
        // used on the reader is defined there
        AsyncBufReadExt,
        BufReader,
    },
};

#[tokio::main]
async fn main() {
    // get file command line argument
    let file_argument = std::env::args()
        .nth(1)
        .expect("Please provide a file as command line argument.");
    // open file
    let file = File::open(file_argument).await.expect("Failed to open file");
    // create reader using file
    let reader = BufReader::new(file);
    // get iterator over lines
    let mut lines = reader.lines();
    // This has to be used instead of a for loop, since `lines` isn't a
    // normal iterator but a `Lines` struct, the next element of which
    // can be obtained using the `next_line` function.
    while let Some(line) = lines.next_line().await.expect("Failed to read file") {
        // print current line
        println!("{}", line);
    }
}
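To connect this back to the original URL checker, here's a sketch (untested, assuming the hyper 0.14-style API used in the question) that feeds each line to the client. Note it is still sequential; spawning a task per URL would be a separate step:

use hyper::Client;
use hyper_tls::HttpsConnector;
use tokio::{
    fs::File,
    io::{AsyncBufReadExt, BufReader},
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let https = HttpsConnector::new();
    let client = Client::builder().build::<_, hyper::Body>(https);

    // The command line argument is the path of the URL list, not its contents.
    let path = std::env::args().nth(1).expect("no list given");
    let file = File::open(path).await?;
    let mut lines = BufReader::new(file).lines();

    while let Some(line) = lines.next_line().await? {
        let uri: hyper::Uri = line.parse()?;
        let resp = client.get(uri).await?;
        if resp.status() == 200 || resp.status() == 301 {
            println!("{}", line);
        }
    }
    Ok(())
}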

How to capture the output of a process piped into a Rust program?

I know how to read the command line arguments, but I am having difficulties reading the command output from a pipe.
Connect a program (A) that outputs data to my Rust program using a pipe:
A | R
The program should consume the data line by line as it comes.
$ pwd | cargo run should print the pwd output.
OR
$ find . | cargo run should output the find command output which is more than 1 line.
Use BufRead::lines on a locked handle to standard input:
use std::io::{self, BufRead};

fn main() {
    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let line = line.expect("Could not read line from standard in");
        println!("{}", line);
    }
}
If you wanted to reuse the allocation of the String, you could use the loop form:
use std::io::{self, BufRead};

fn main() {
    let stdin = io::stdin();
    let mut stdin = stdin.lock(); // locking is optional

    let mut line = String::new();

    // Could also `match` on the `Result` if you wanted to handle `Err`
    while let Ok(n_bytes) = stdin.read_line(&mut line) {
        if n_bytes == 0 {
            break; // EOF
        }
        println!("{}", line);
        line.clear();
    }
}
You just need to read from Stdin.
This is based on an example taken from the documentation:
use std::io;

fn main() {
    loop {
        let mut input = String::new();
        match io::stdin().read_line(&mut input) {
            Ok(len) => {
                if len == 0 {
                    return;
                } else {
                    println!("{}", input);
                }
            }
            Err(error) => {
                eprintln!("error: {}", error);
                return;
            }
        }
    }
}
It's mostly the docs example wrapped in a loop, breaking out of the loop when there is no more input, or if there is an error.
The other change is that, in your context, it's better to write errors to stderr, which is why the error branch uses eprintln! instead of println!. That macro probably wasn't available when that documentation was written.
Another variant trims each line and stops at the first empty one:
use std::io;

fn main() {
    loop {
        let mut input = String::new();
        io::stdin()
            .read_line(&mut input)
            .expect("failed to read from pipe");
        input = input.trim().to_string();
        if input == "" {
            break;
        }
        println!("Pipe output: {}", input);
    }
}
OUTPUT:
[18:50:29 Abhinickz#wsl -> pipe$ pwd
/mnt/d/Abhinickz/dev_work/learn_rust/pipe
[18:50:46 Abhinickz#wsl -> pipe$ pwd | cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Running `target/debug/pipe`
Pipe output: /mnt/d/Abhinickz/dev_work/learn_rust/pipe
You can also do it in a pretty snazzy and concise way with Rust's iterator methods:
use std::io::{self, BufRead};

fn main() {
    // get piped input
    // e.g. `cat file | ./program`
    // (`cat file | cargo run` also works)
    let input = io::stdin().lock().lines().fold("".to_string(), |acc, line| {
        acc + &line.unwrap() + "\n"
    });
    dbg!(input);
}
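Note that this folds the entire input into one String before doing anything with it, so unlike the earlier loops it doesn't consume the pipe line by line as data arrives; for large or unbounded streams, prefer one of the incremental approaches above.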

How can I read a file line-by-line, eliminate duplicates, then write back to the same file?

I want to read a file, eliminate all duplicates, and write the rest back into the file, like a duplicate cleaner. I'm using a Vec because a normal array has a fixed size but my .txt is flexible (am I doing this right?).
First attempt: read the lines into a Vec and delete the duplicates; writing back to the file is still missing:
use std::io;

fn main() {
    let path = Path::new("test.txt");
    let mut file = io::BufferedReader::new(io::File::open(&path, R));
    let mut lines: Vec<String> = file.lines().map(|x| x.unwrap()).collect();
    // dedup() deletes all duplicates if sort() is called before
    lines.sort();
    lines.dedup();
    for e in lines.iter() {
        print!("{}", e.as_slice());
    }
}
Second attempt: read + write to the file (untested, but it should work, I guess). Here the lines-into-Vec part is missing, because it doesn't seem to work without BufferedReader (or I'm doing something else wrong, there's also a good chance of that):
use std::io;

fn main() {
    let path = Path::new("test.txt");
    let mut file = match io::File::open_mode(&path, io::Open, io::ReadWrite) {
        Ok(f) => f,
        Err(e) => panic!("file error: {}", e),
    };
    let mut lines: Vec<String> = file.lines().map(|x| x.unwrap()).collect();
    lines.sort();
    // dedup() deletes all duplicates if sort() is called before
    lines.dedup();
    for e in lines.iter() {
        file.write("{}", e);
    }
}
So... how do I get those two together? :)
Ultimately, you are going to run into a problem: you are trying to write to the same file you are reading from. In this case it's safe, because you read the entire file up front, so you don't need it afterwards. However, if you did try to write to the file, you'd see that opening a file for reading doesn't allow writing! Here's code that demonstrates the issue:
use std::{
    fs::File,
    io::{BufRead, BufReader, Write},
};

fn main() {
    let mut file = File::open("test.txt").expect("file error");
    let reader = BufReader::new(&mut file);
    let mut lines: Vec<_> = reader
        .lines()
        .map(|l| l.expect("Couldn't read a line"))
        .collect();
    lines.sort();
    lines.dedup();
    for line in lines {
        file.write_all(line.as_bytes())
            .expect("Couldn't write to file");
    }
}
Here's the output:
% cat test.txt
a
a
b
a
% cargo run
thread 'main' panicked at 'Couldn't write to file: Os { code: 9, kind: Other, message: "Bad file descriptor" }', src/main.rs:12:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
You could open the file for both reading and writing:
use std::{
    fs::OpenOptions,
    io::{BufRead, BufReader, Write},
};

fn main() {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .open("test.txt")
        .expect("file error");

    // Remaining code unchanged
}
But then you'd see that (a) the output is appended and (b) all the newlines are lost, because BufRead::lines doesn't include them.
We could reset the file pointer back to the beginning, but then you'd probably leave trailing bytes at the end (deduplicating is likely to write fewer bytes than were read). It's easier to just reopen the file for writing, which truncates it. Also, let's use a set data structure to do the deduplication for us!
use std::{
    collections::BTreeSet,
    fs::File,
    io::{BufRead, BufReader, Write},
};

fn main() {
    let file = File::open("test.txt").expect("file error");
    let reader = BufReader::new(file);
    let lines: BTreeSet<_> = reader
        .lines()
        .map(|l| l.expect("Couldn't read a line"))
        .collect();

    let mut file = File::create("test.txt").expect("file error");
    for line in lines {
        file.write_all(line.as_bytes())
            .expect("Couldn't write to file");
        file.write_all(b"\n").expect("Couldn't write to file");
    }
}
And the output:
% cat test.txt
a
a
b
a
a
b
a
b
% cargo run
% cat test.txt
a
b
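For completeness, here's a sketch (untested) of the rewind alternative mentioned above: seek the read/write handle back to the start, write the deduplicated lines, and call File::set_len afterwards so the leftover tail of the old contents is truncated:

use std::{
    collections::BTreeSet,
    fs::OpenOptions,
    io::{BufRead, BufReader, Seek, SeekFrom, Write},
};

fn main() {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .open("test.txt")
        .expect("file error");

    let lines: BTreeSet<_> = BufReader::new(&mut file)
        .lines()
        .map(|l| l.expect("Couldn't read a line"))
        .collect();

    // Rewind to the start, write the deduplicated lines, then cut off
    // whatever remains of the old contents past the new end.
    file.seek(SeekFrom::Start(0)).expect("Couldn't seek");
    let mut written = 0u64;
    for line in &lines {
        file.write_all(line.as_bytes()).expect("Couldn't write");
        file.write_all(b"\n").expect("Couldn't write");
        written += line.len() as u64 + 1;
    }
    file.set_len(written).expect("Couldn't truncate");
}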
The less-efficient but shorter solution is to read the entire file as one string and use str::lines:
use std::{
    collections::BTreeSet,
    fs::{self, File},
    io::Write,
};

fn main() {
    let contents = fs::read_to_string("test.txt").expect("can't read");
    let lines: BTreeSet<_> = contents.lines().collect();

    let mut file = File::create("test.txt").expect("can't create");
    for line in lines {
        writeln!(file, "{}", line).expect("can't write");
    }
}
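As a side note, a BTreeSet hands the lines back in sorted order. If you wanted to keep the first-occurrence order instead, a sketch with a HashSet guard (not part of the original answers) would look like this:

use std::collections::HashSet;
use std::fs::{self, File};
use std::io::Write;

fn main() {
    let contents = fs::read_to_string("test.txt").expect("can't read");
    let mut seen = HashSet::new();
    let mut file = File::create("test.txt").expect("can't create");
    for line in contents.lines() {
        // insert returns false if the line was already present
        if seen.insert(line) {
            writeln!(file, "{}", line).expect("can't write");
        }
    }
}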
See also:
What's the de-facto way of reading and writing files in Rust 1.x?
What is the best variant for appending a new line in a text file?

How to combine reading a file line by line and iterating over each character in each line?

I started from this code, which just reads every line in a file, and which works well:
use std::io::{BufRead, BufReader};
use std::fs::File;

fn main() {
    let file = File::open("chry.fa").expect("cannot open file");
    let file = BufReader::new(file);
    for line in file.lines() {
        print!("{}", line.unwrap());
    }
}
... but then I tried to also loop over each character in each line, something like this:
use std::io::{BufRead, BufReader};
use std::fs::File;

fn main() {
    let file = File::open("chry.fa").expect("cannot open file");
    let file = BufReader::new(file);
    for line in file.lines() {
        for c in line.chars() {
            print!("{}", c.unwrap());
        }
    }
}
... but it turns out that this innermost for loop is not correct. I get the following error message:
error[E0599]: no method named `chars` found for type `std::result::Result<std::string::String, std::io::Error>` in the current scope
--> src/main.rs:8:23
|
8 | for c in line.chars() {
| ^^^^^
You need to handle the potential error that could arise from each IO operation, represented by an io::Result which can contain either the requested data or an error. There are different ways to handle errors.
One way is to just ignore them and read whatever data we can get.
The code below shows how this can be done:
use std::io::{BufRead, BufReader};
use std::fs::File;

fn main() {
    let file = File::open("chry.fa").expect("cannot open file");
    let file = BufReader::new(file);
    for line in file.lines().filter_map(|result| result.ok()) {
        for c in line.chars() {
            print!("{}", c);
        }
    }
}
The key points: file.lines() is an iterator that yields io::Result<String>. In the filter_map, we convert each io::Result into an Option and filter out any occurrences of None. We're then left with just plain lines (i.e. Strings).
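If you'd rather not silently drop read errors, another option is to propagate them with ? from a main that returns io::Result, stopping at the first failure:

use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    let file = File::open("chry.fa")?;
    let file = BufReader::new(file);
    for line in file.lines() {
        let line = line?; // stop at the first IO error instead of skipping it
        for c in line.chars() {
            print!("{}", c);
        }
    }
    Ok(())
}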