How to read a GBK-encoded file into a String? - rust

use std::env;
use std::fs::File;
use std::io::prelude::*;

fn main() {
    let args: Vec<String> = env::args().collect();
    let filename = &args[1];
    let mut f = File::open(filename).expect("file not found");
    let mut contents = String::new();
    f.read_to_string(&mut contents).expect("something went wrong reading the file");
    println!("file content:\n{}", contents);
}
When I attempt to read a GBK encoded file, I get the following error:
thread 'main' panicked at 'something went wrong reading the file: Error { repr: Custom(Custom { kind: InvalidData, error: StringError("stream did not contain valid UTF-8") }) }', /checkout/src/libcore/result.rs:860
It says the stream must contain valid UTF-8. How can I read a GBK file?

I figured out how to read a GBK-encoded file line by line:
extern crate encoding; // only needed on the 2015 edition

use std::env;
use std::fs::File;
use std::io::prelude::*; // brings BufRead::split into scope
use std::io::BufReader;

use encoding::all::GBK;
use encoding::{DecoderTrap, Encoding};

fn main() {
    let args: Vec<String> = env::args().collect();
    let filename = &args[1];
    let file = File::open(filename).expect("file not found");
    let reader = BufReader::new(file);
    // Split on raw b'\n' bytes; each chunk is a Vec<u8> that need not be UTF-8.
    // This is safe for GBK, whose multi-byte sequences never contain 0x0A.
    let lines = reader.split(b'\n').map(|l| l.unwrap());
    for line in lines {
        // Decode each chunk from GBK into a UTF-8 String.
        let decoded_string = GBK.decode(&line, DecoderTrap::Strict).unwrap();
        println!("{}", decoded_string);
    }
}

You likely want the encoding crate.
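Whichever decoding crate you use, the first step is the same: read the file as raw bytes instead of with read_to_string, which insists on UTF-8. The std-only sketch below shows why read_to_string fails on such input and how to get at the bytes; the sample bytes are a hypothetical stand-in for GBK file content, and the actual GBK decoding would still be done by a crate such as encoding.

```rust
use std::env;
use std::fs;
use std::io;

/// Returns true if `bytes` are valid UTF-8, which is what
/// `read_to_string` requires; GBK text with non-ASCII characters is not.
fn is_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn report(bytes: &[u8]) {
    if is_utf8(bytes) {
        println!("valid UTF-8, read_to_string would succeed");
    } else {
        // Lossy view: each invalid sequence becomes U+FFFD. A real GBK
        // decoder (e.g. from the encoding crate) should be applied instead.
        println!("not UTF-8; lossy view: {}", String::from_utf8_lossy(bytes));
    }
}

fn main() -> io::Result<()> {
    match env::args().nth(1) {
        // fs::read returns Vec<u8> and never fails on non-UTF-8 content.
        Some(path) => report(&fs::read(path)?),
        // Hypothetical sample: bytes that are not valid UTF-8.
        None => report(&[0xD6, 0xD0, 0xCE, 0xC4]),
    }
    Ok(())
}
```

The bytes from fs::read are exactly what a decoder like GBK.decode expects as input.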

Related

How can I parse the .nth() line in a file as an integer?

I am trying to parse a specific line in a file as a u32, but I keep getting "method not found in Option<String>" when I try to parse an Option<String>.
Is there a way to parse it, or am I approaching this wrong?
use std::io::{BufRead, BufReader};
use std::fs::File;

fn main() {
    let reader = BufReader::new(File::open("input").expect("Cannot open file"));
    let lines = reader.lines();
    let number: u32 = lines.nth(5).unwrap().ok().parse::<u32>();
    println!("{}", number);
}
You can't parse a number out of an Option<String>: if it is None, there is nothing to parse. You must unwrap the Option first (or do proper error handling). Note that nth is zero-based, so nth(5) returns the sixth line:
use std::io::{BufRead, BufReader};
use std::fs::File;

fn main() {
    let reader = BufReader::new(File::open("input").expect("Cannot open file"));
    let number: u32 = reader
        .lines()
        .nth(5) // zero-based: the sixth line
        .expect("input is not 6 lines long") // Option: the line may not exist
        .expect("could not read 6th line") // io::Result: reading may fail
        .parse::<u32>()
        .expect("invalid number");
    println!("{}", number);
}
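If you'd rather not panic, the same chain can return a Result. A sketch, where parse_nth_line is a hypothetical helper and an in-memory Cursor stands in for the file; any BufRead works, including BufReader<File>. Trimming the line before parsing is an assumption about the input.

```rust
use std::io::{self, BufRead};

/// Parse the `n`th line (zero-based) of `reader` as a u32, reporting each
/// failure mode as a String instead of panicking.
fn parse_nth_line<R: BufRead>(reader: R, n: usize) -> Result<u32, String> {
    let line: String = reader
        .lines()
        .nth(n)
        // None: the input is too short.
        .ok_or_else(|| format!("input has fewer than {} lines", n + 1))?
        // Err: an I/O error while reading.
        .map_err(|e: io::Error| format!("could not read line {}: {}", n + 1, e))?;
    line.trim()
        .parse::<u32>()
        .map_err(|e| format!("line {} is not a u32: {}", n + 1, e))
}

fn main() {
    let input = io::Cursor::new("a\nb\nc\nd\ne\n42\n");
    match parse_nth_line(input, 5) {
        Ok(number) => println!("{}", number),
        Err(msg) => eprintln!("error: {}", msg),
    }
}
```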

How to read a text file in Rust and read multiple values per line

So basically, I have a text file with the following syntax:
String int
String int
String int
I have an idea how to read the values if there is only one entry per line, but if there are multiple, I do not know how to do it.
In Java, I would do something simple with a while loop and a Scanner, but in Rust I have no clue.
I am fairly new to Rust, so please help me.
Thanks in advance for your help.
Solution
Here is my modified version of @netwave's code:
use std::fs;
use std::io::{BufRead, BufReader, Error};

fn main() -> Result<(), Error> {
    let file = "data.txt"; // path to the input file (example name)
    let buff_reader = BufReader::new(fs::File::open(file)?);
    for line in buff_reader.lines() {
        // Parse "<string> <int>" into (String, i32) with the sscanf crate.
        let parsed = sscanf::scanf!(line?, "{} {}", String, i32);
        println!("{:?}\n", parsed);
    }
    Ok(())
}
You can use the BufRead trait, which has a read_line method. You can also use lines.
The easiest way to get it is to wrap the File instance in a BufReader:
use std::fs;
use std::io::{BufRead, BufReader};
...
let mut buff_reader = BufReader::new(fs::File::open(path)?);
loop {
    let mut buff = String::new();
    // read_line returns Ok(0) at end of file; stop there instead of looping forever.
    if buff_reader.read_line(&mut buff)? == 0 {
        break;
    }
    // buff keeps its trailing newline, so print! avoids doubling it.
    print!("{}", buff);
}
Once you have each line, you can use the sscanf crate to parse it into the types you need:
let parsed = sscanf::scanf!(buff, "{} {}", String, i32);
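If you'd rather avoid a dependency, the standard library handles this simple format too. A sketch; parse_line is a hypothetical helper, and treating extra fields as an error is an assumption about the input:

```rust
/// Parse a "<word> <int>" line into (String, i32) using only std.
/// split_whitespace tolerates repeated spaces and tabs between fields.
fn parse_line(line: &str) -> Option<(String, i32)> {
    let mut parts = line.split_whitespace();
    let name = parts.next()?.to_string();
    let value = parts.next()?.parse::<i32>().ok()?;
    // Reject lines with more than two fields.
    if parts.next().is_some() {
        return None;
    }
    Some((name, value))
}

fn main() {
    for line in ["str1 100", "str2   200", "bad line", "str3 300 extra"] {
        println!("{:?} -> {:?}", line, parse_line(line));
    }
}
```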
Based on: https://doc.rust-lang.org/rust-by-example/std_misc/file/read_lines.html
Given a data.txt containing:
str1 100
str2 200
str3 300
use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;

fn main() {
    // data.txt must exist in the current directory before this produces output
    if let Ok(lines) = read_lines("./data.txt") {
        // Consumes the iterator, which yields io::Result<String>
        for line in lines {
            if let Ok(data) = line {
                let values: Vec<&str> = data.split(' ').collect();
                match values.len() {
                    2 => {
                        let strdata = values[0].parse::<String>();
                        let intdata = values[1].parse::<i32>();
                        println!("Got: {:?} {:?}", strdata, intdata);
                    }
                    _ => panic!("Invalid input line {}", data),
                };
            }
        }
    }
}

// The output is wrapped in a Result to allow matching on errors.
// Returns an Iterator over the lines of the file.
fn read_lines<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>>
where
    P: AsRef<Path>,
{
    let file = File::open(filename)?;
    Ok(io::BufReader::new(file).lines())
}
Outputs:
Got: Ok("str1") Ok(100)
Got: Ok("str2") Ok(200)
Got: Ok("str3") Ok(300)
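The panic! on a malformed line can be replaced by a returned error that names the offending line. A sketch assuming the same data.txt layout; read_pairs is a hypothetical name, and an in-memory Cursor stands in for io::BufReader<File>:

```rust
use std::io::{self, BufRead};

/// Read every "<word> <int>" line into a Vec, returning an error that
/// names the offending (1-based) line number instead of panicking.
fn read_pairs<R: BufRead>(reader: R) -> Result<Vec<(String, i32)>, String> {
    let mut pairs = Vec::new();
    for (idx, line) in reader.lines().enumerate() {
        let line = line.map_err(|e| format!("line {}: {}", idx + 1, e))?;
        // Split at the first space; trim tolerates extra spaces before the number.
        let (name, num) = line
            .split_once(' ')
            .ok_or_else(|| format!("line {}: expected two fields", idx + 1))?;
        let num: i32 = num
            .trim()
            .parse()
            .map_err(|e| format!("line {}: {}", idx + 1, e))?;
        pairs.push((name.to_string(), num));
    }
    Ok(pairs)
}

fn main() {
    let data = io::Cursor::new("str1 100\nstr2 200\nstr3 300\n");
    match read_pairs(data) {
        Ok(pairs) => println!("{:?}", pairs),
        Err(e) => eprintln!("error: {}", e),
    }
}
```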

Rust: access line in file by index, or is there another way to compare two lines

I have a simple .txt file with one value per line.
Is it possible to compare two lines?
I was looking for a way to index each line and then compare index [n] with index [n+1].
So far I am able to print each line, but not to compare the entries.
Here is my code:
use std::fs::File;
use std::env;
use std::io::{self, BufReader, BufRead};

fn read_file(filename: &String) -> io::Result<()> {
    let file = File::open(filename)?;
    let content = BufReader::new(file);
    for line in content.lines() {
        println!("{}", line.unwrap());
    }
    Ok(())
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let filename = &args[1];
    read_file(filename).expect("error reading file");
}
One solution is to collect all lines into a vector and pair each line with the next one using the Iterator::zip method:
fn read_file(filename: &String) -> io::Result<()> {
    let file = File::open(filename)?;
    let content = BufReader::new(file);
    let lines: Vec<String> = content
        .lines()
        .map(|line| line.expect("Something went wrong"))
        .collect();
    for (current, next) in lines.iter().zip(lines.iter().skip(1)) {
        println!("{}, {}", current, next);
    }
    Ok(())
}
So for an input file with the content
1
2
3
4
the read_file function will print
1, 2
2, 3
3, 4
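Collecting is fine for small files; for large ones the comparison can also be done while streaming, by remembering only the previous line. A sketch; for_each_adjacent is a hypothetical helper that hands each (previous, current) pair to a closure:

```rust
use std::io::{self, BufRead};

/// Call `f(previous, current)` for every pair of adjacent lines while
/// streaming, so the whole file never has to be held in memory.
fn for_each_adjacent<R, F>(reader: R, mut f: F) -> io::Result<()>
where
    R: BufRead,
    F: FnMut(&str, &str),
{
    let mut prev: Option<String> = None;
    for line in reader.lines() {
        let line = line?;
        if let Some(p) = &prev {
            f(p, &line);
        }
        prev = Some(line);
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let input = io::Cursor::new("1\n2\n3\n4\n");
    for_each_adjacent(input, |current, next| println!("{}, {}", current, next))
}
```

If the lines are already in a Vec, as in the answer above, lines.windows(2) yields the same adjacent pairs as two-element slices.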

How to unzip a Reqwest/Hyper response using streams?

I need to download a 60MB ZIP file and extract the only file that comes within it. I want to download it and extract it using streams. How can I achieve this using Rust?
fn main() {
    let mut res = reqwest::get("myfile.zip").unwrap();
    // extract the response body to myfile.txt
}
In Node.js I would do something like this:
http.get('myfile.zip', response => {
  response.pipe(unzip.Parse())
    .on('entry', entry => {
      if (entry.path.endsWith('.txt')) {
        entry.pipe(fs.createWriteStream('myfile.txt'))
      }
    })
})
With reqwest you can get the .zip file:
reqwest::get("myfile.zip")
Since reqwest can only retrieve the file, ZipArchive from the zip crate can be used to unpack it. It's not possible to stream the .zip file into ZipArchive: ZipArchive::new(reader: R) requires R to implement both Read (which reqwest's Response does) and Seek (which Response does not). Seeking is needed because a ZIP archive's central directory is stored at the end of the file.
As a workaround you may use a temporary file:
copy_to(&mut tmpfile)
As File implements both Seek and Read, zip can be used here:
zip::ZipArchive::new(tmpfile)
This is a working example of the described method:
// Assumes reqwest with its "blocking" feature enabled, plus the tempfile and zip crates.
fn main() {
    // An anonymous temporary file; File implements both Read and Seek.
    let mut tmpfile = tempfile::tempfile().unwrap();
    reqwest::blocking::get("myfile.zip")
        .unwrap()
        .copy_to(&mut tmpfile)
        .unwrap();
    let zip = zip::ZipArchive::new(tmpfile).unwrap();
    println!("{:#?}", zip);
}
tempfile is a handy crate, which lets you create a temporary file, so you don't have to think of a name.
Here's how I'd read the file hello.txt (containing hello world) from the archive hello.zip served by a local server:
use std::io::{Cursor, Read};

// Assumes reqwest with its "blocking" feature enabled, plus the zip crate.
fn main() {
    let mut res = reqwest::blocking::get("http://localhost:8000/hello.zip").unwrap();
    // Buffer the whole body in memory...
    let mut buf: Vec<u8> = Vec::new();
    res.read_to_end(&mut buf).unwrap();
    // ...and wrap it in a Cursor, which implements both Read and Seek.
    let reader = Cursor::new(buf);
    let mut zip = zip::ZipArchive::new(reader).unwrap();
    let mut file_zip = zip.by_name("hello.txt").unwrap();
    let mut content = String::new();
    file_zip.read_to_string(&mut content).unwrap();
    println!("{}", content);
}
This will output hello world
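Both workarounds rest on the same point: an API that requires Read + Seek (as ZipArchive::new does) can't consume a forward-only stream directly, but anything buffered into a File or a Cursor<Vec<u8>> qualifies. A std-only sketch of that idea; make_seekable and read_tail are hypothetical helpers, with read_tail loosely imitating a reader that has to start from the end of the data:

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Buffer any forward-only reader (e.g. an HTTP body) into memory so it
/// becomes seekable. Memory use grows with the size of the body.
fn make_seekable<R: Read>(mut r: R) -> io::Result<Cursor<Vec<u8>>> {
    let mut buf = Vec::new();
    r.read_to_end(&mut buf)?;
    Ok(Cursor::new(buf))
}

/// Read the last `n` bytes of a seekable source; this only works because
/// R: Seek lets us jump to the end first.
fn read_tail<R: Read + Seek>(mut r: R, n: u64) -> io::Result<Vec<u8>> {
    let len = r.seek(SeekFrom::End(0))?;
    r.seek(SeekFrom::Start(len.saturating_sub(n)))?;
    let mut tail = Vec::new();
    r.read_to_end(&mut tail)?;
    Ok(tail)
}

fn main() -> io::Result<()> {
    // A &[u8] slice implements Read but not Seek, like a response body.
    let body: &[u8] = b"header...payload...END";
    let seekable = make_seekable(body)?;
    println!("{}", String::from_utf8_lossy(&read_tail(seekable, 3)?));
    Ok(())
}
```

For a 60MB download, the temporary-file variant keeps that buffer on disk rather than in RAM.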
async solution using Tokio
It's a bit convoluted, but you can do this using tokio, futures, tokio_util::compat and async_compression. The key is to create a futures::io::AsyncRead stream using .into_async_read() and then convert it into a tokio::io::AsyncRead using .compat().
For simplicity, it downloads a txt.gz file and prints it line by line.
use async_compression::tokio::bufread::GzipDecoder;
use futures::stream::TryStreamExt;
use tokio::io::AsyncBufReadExt;
use tokio_util::compat::FuturesAsyncReadCompatExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let url = "https://f001.backblazeb2.com/file/korteur/hello-world.txt.gz";
    let response = reqwest::get(url).await?;
    let stream = response
        .bytes_stream()
        .map_err(|e| futures::io::Error::new(futures::io::ErrorKind::Other, e))
        .into_async_read()
        .compat();
    let gzip_decoder = GzipDecoder::new(stream);

    // Print decompressed txt content
    let buf_reader = tokio::io::BufReader::new(gzip_decoder);
    let mut lines = buf_reader.lines();
    while let Some(line) = lines.next_line().await? {
        println!("{line}");
    }

    Ok(())
}
Credit to Benjamin Kay.

How can I read a file line-by-line, eliminate duplicates, then write back to the same file?

I want to read a file, eliminate all duplicate lines, and write the rest back into the file, like a duplicate cleaner.
I'm using a Vec because a normal array has a fixed size, but my .txt file can have any number of lines (am I doing this right?).
First attempt: read the lines into a Vec and delete duplicates.
Missing: writing back to the file.
use std::io;

fn main() {
    let path = Path::new("test.txt");
    let mut file = io::BufferedReader::new(io::File::open(&path, R));
    let mut lines: Vec<String> = file.lines().map(|x| x.unwrap()).collect();
    // dedup() only removes all duplicates if the Vec is sorted first
    lines.sort();
    lines.dedup();
    for e in lines.iter() {
        print!("{}", e.as_slice());
    }
}
Second attempt: read and write the file (untested, but I guess it should work).
Missing: collecting the lines into a Vec, because that doesn't seem to work without BufferedReader (or I'm doing something else wrong, which is also quite possible).
use std::io;

fn main() {
    let path = Path::new("test.txt");
    let mut file = match io::File::open_mode(&path, io::Open, io::ReadWrite) {
        Ok(f) => f,
        Err(e) => panic!("file error: {}", e),
    };
    let mut lines: Vec<String> = file.lines().map(|x| x.unwrap()).collect();
    lines.sort();
    // dedup() only removes all duplicates if the Vec is sorted first
    lines.dedup();
    for e in lines.iter() {
        file.write("{}", e);
    }
}
So how do I combine the two?
Ultimately, you are going to run into a problem: you are trying to write to the same file you are reading from. In this case it's safe, because you read the entire file first and don't need its contents after that. However, if you do try to write to the file, you'll see that a file opened for reading doesn't allow writing! Here's code that demonstrates this:
use std::{
    fs::File,
    io::{BufRead, BufReader, Write},
};

fn main() {
    let mut file = File::open("test.txt").expect("file error");
    let reader = BufReader::new(&mut file);
    let mut lines: Vec<_> = reader
        .lines()
        .map(|l| l.expect("Couldn't read a line"))
        .collect();
    lines.sort();
    lines.dedup();
    for line in lines {
        file.write_all(line.as_bytes())
            .expect("Couldn't write to file");
    }
}
Here's the output:
% cat test.txt
a
a
b
a
% cargo run
thread 'main' panicked at 'Couldn't write to file: Os { code: 9, kind: Other, message: "Bad file descriptor" }', src/main.rs:12:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
You could open the file for both reading and writing:
use std::{
    fs::OpenOptions,
    io::{BufRead, BufReader, Write},
};

fn main() {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .open("test.txt")
        .expect("file error");

    // Remaining code unchanged
}
But then you'd see that (a) the output is appended and (b) the newlines are missing from the written lines, because BufRead::lines strips them.
We could reset the file pointer back to the beginning, but then you'd probably leave stale bytes at the end (deduplicating usually writes fewer bytes than were read). It's easier to reopen the file with File::create, which truncates it. Also, let's use a set data structure to do the deduplication for us!
use std::{
    collections::BTreeSet,
    fs::File,
    io::{BufRead, BufReader, Write},
};

fn main() {
    let file = File::open("test.txt").expect("file error");
    let reader = BufReader::new(file);
    let lines: BTreeSet<_> = reader
        .lines()
        .map(|l| l.expect("Couldn't read a line"))
        .collect();

    let mut file = File::create("test.txt").expect("file error");
    for line in lines {
        file.write_all(line.as_bytes())
            .expect("Couldn't write to file");
        file.write_all(b"\n").expect("Couldn't write to file");
    }
}
And the output:
% cat test.txt
a
a
b
a
a
b
a
b
% cargo run
% cat test.txt
a
b
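For completeness, the rewind route mentioned above also works if the stale bytes are removed explicitly with File::set_len after seeking back to the start. A sketch of that variant; dedup_in_place is a hypothetical name, and like the BTreeSet version it sorts the lines as a side effect:

```rust
use std::collections::BTreeSet;
use std::fs::OpenOptions;
use std::io::{self, BufRead, BufReader, Seek, SeekFrom, Write};
use std::path::Path;

/// Sort and deduplicate the lines of `path` in place through one
/// read+write handle: read everything, rewind, truncate, rewrite.
fn dedup_in_place(path: &Path) -> io::Result<()> {
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;
    // A BTreeSet drops duplicates and yields the lines in sorted order.
    let lines: BTreeSet<String> = BufReader::new(&mut file)
        .lines()
        .collect::<io::Result<_>>()?;
    // Rewind, then cut the file to zero length so no stale bytes survive
    // when the deduplicated output is shorter than the input.
    file.seek(SeekFrom::Start(0))?;
    file.set_len(0)?;
    for line in &lines {
        // writeln! restores the newline that lines() stripped.
        writeln!(file, "{}", line)?;
    }
    Ok(())
}

fn main() {
    // Default path matches the question; another can be passed as an argument.
    let path = std::env::args().nth(1).unwrap_or_else(|| "test.txt".to_string());
    if let Err(e) = dedup_in_place(Path::new(&path)) {
        eprintln!("error: {}", e);
    }
}
```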
A shorter but less memory-efficient solution is to read the entire file into one string and use str::lines:
use std::{
    collections::BTreeSet,
    fs::{self, File},
    io::Write,
};

fn main() {
    let contents = fs::read_to_string("test.txt").expect("can't read");
    let lines: BTreeSet<_> = contents.lines().collect();
    // File::create truncates the file; File::open would open it read-only.
    let mut file = File::create("test.txt").expect("can't create");
    for line in lines {
        writeln!(file, "{}", line).expect("can't write");
    }
}
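Both set-based versions (like sort followed by dedup) reorder the lines. If the original order matters, one option is to keep the first occurrence of each line using a HashSet, whose insert returns false for values already present. A sketch; dedup_keep_order is a hypothetical helper that works on the file's contents as a string, whose result could then be written back with writeln! as above:

```rust
use std::collections::HashSet;

/// Remove duplicate lines while keeping the first occurrence of each,
/// preserving the original order (unlike sort + dedup or a BTreeSet).
fn dedup_keep_order(contents: &str) -> Vec<&str> {
    let mut seen = HashSet::new();
    contents
        .lines()
        // insert returns false if this line was already seen.
        .filter(|line| seen.insert(*line))
        .collect()
}

fn main() {
    let contents = "b\na\nb\na\nc\n";
    for line in dedup_keep_order(contents) {
        println!("{}", line);
    }
}
```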
See also:
What's the de-facto way of reading and writing files in Rust 1.x?
What is the best variant for appending a new line in a text file?
