I've written this:
/* Turn a Vec<Vec<u8>> into a Vec<String>, without re-allocating the inner bytes. */
let lines: Vec<String> = u8_lines.into_iter()
.filter_map(|u8_line| match String::from_utf8(u8_line) {
Ok(u8_line) => Some(u8_line),
Err(_) => None,
})
.collect();
Is there a way of simplifying this by replacing the match or the filter_map?
You can use Result::ok to turn a Result<T, E> into an Option<T>:
/* Turn a Vec<Vec<u8>> into a Vec<String>, without re-allocating the inner bytes. */
let lines: Vec<String> = u8_lines.into_iter()
.filter_map(|u8_line| String::from_utf8(u8_line).ok())
.collect();
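As an aside (not part of the original answer): if you would rather stop at the first invalid line instead of silently dropping it, you can collect into a Result, since Result implements FromIterator:
/* A sketch: propagate the first FromUtf8Error instead of discarding invalid lines. */
let lines: Result<Vec<String>, std::string::FromUtf8Error> = u8_lines.into_iter()
    .map(String::from_utf8)
    .collect();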
How could I pack the following code into a single iterator?
use std::io::{BufRead, BufReader};
use std::fs::File;
let file = BufReader::new(File::open("sample.txt").expect("Unable to open file"));
for line in file.lines() {
for ch in line.expect("Unable to read line").chars() {
println!("Character: {}", ch);
}
}
Naively, I’d like to have something like (I skipped unwraps)
let lines = file.lines().next();
Reader {
line: lines,
char: next().chars()
}
and iterate over Reader.char till hitting None, then refreshing Reader.line to a new line and Reader.char to the first character of the line. This doesn't seem to be possible though because Reader.char depends on the temporary variable.
Please notice that the question is about nested iterators, reading text files is used as an example.
You can use the flat_map() iterator adapter to create a new iterator that can produce any number of items for each item of the iterator it's called on.
In this case, that's complicated by the fact that lines() returns an iterator of Results, so the Err case must be handled.
There's also the issue that .chars() borrows from the String rather than taking ownership, and the String is dropped at the end of the closure, so you have to collect the characters into another, owning container.
Solving both issues results in this mess:
fn example() -> impl Iterator<Item=Result<char, std::io::Error>> {
let file = BufReader::new(File::open("sample.txt").expect("Unable to open file"));
file.lines().flat_map(|line| match line {
Err(e) => vec![Err(e)],
Ok(line) => line.chars().map(Ok).collect(),
})
}
If String gave us an into_chars() method we could avoid collect() here, but then we'd have differently-typed iterators and would need to use either Box<dyn Iterator> or something like either::Either.
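For illustration, here is a sketch of that situation (not from the original answer, and assuming the either crate): the two match arms below are differently-typed iterators, which is exactly what Either handles. Note the Ok arm still has to collect, because chars() borrows the String.
use either::Either;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn example_either() -> impl Iterator<Item = Result<char, std::io::Error>> {
    let file = BufReader::new(File::open("sample.txt").expect("Unable to open file"));
    file.lines().flat_map(|line| match line {
        // A one-item iterator for the error case...
        Err(e) => Either::Left(std::iter::once(Err(e))),
        // ...and a Vec-backed iterator over the line's characters.
        Ok(line) => Either::Right(line.chars().map(Ok).collect::<Vec<_>>().into_iter()),
    })
}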
Since you already use .expect() here, you can simplify a bit by using .expect() within the closure to avoid handling the Err case:
fn example() -> impl Iterator<Item=char> {
let file = BufReader::new(File::open("sample.txt").expect("Unable to open file"));
file.lines().flat_map(|line|
line.expect("Unable to read line").chars().collect::<Vec<_>>()
)
}
In the general case, flat_map() is usually quite easy. You just need to be mindful of whether you are iterating owned vs borrowed values; both cases have some sharp corners. In this case, iterating over owned String values makes using .chars() problematic. If we could iterate over borrowed str slices we wouldn't have to .collect().
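As a sketch of the borrowed case (not from the original answer, and assuming the whole file may be read into memory first): with borrowed &str lines, chars() can borrow directly and no per-line collect() is needed.
use std::fs;

// The returned iterator borrows `contents`, so nothing is collected per line.
// Note that lines() strips the newline characters, as in the examples above.
fn chars_of(contents: &str) -> impl Iterator<Item = char> + '_ {
    contents.lines().flat_map(|line| line.chars())
}

fn main() -> std::io::Result<()> {
    let contents = fs::read_to_string("sample.txt")?;
    for ch in chars_of(&contents) {
        print!("{}", ch);
    }
    Ok(())
}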
Drawing on the answer from #cdhowie and this answer that suggests using IntoIter to get an iterator of owned chars, I was able to come up with this solution that is the closest to what I expected:
use std::fs::File;
use std::io;
use std::io::{BufRead, BufReader, Lines};
use std::vec::IntoIter;
struct Reader {
lines: Lines<BufReader<File>>,
iter: IntoIter<char>,
}
impl Reader {
fn new(filename: &str) -> Self {
let file = BufReader::new(File::open(filename).expect("Unable to open file"));
let mut lines = file.lines();
let iter = Reader::char_iter(lines.next().expect("Unable to read file"));
Reader { lines, iter }
}
fn char_iter(line: io::Result<String>) -> IntoIter<char> {
line.unwrap().chars().collect::<Vec<_>>().into_iter()
}
}
impl Iterator for Reader {
type Item = char;
fn next(&mut self) -> Option<Self::Item> {
match self.iter.next() {
None => {
// Current line exhausted: advance to the next line, or stop at end of file.
self.iter = match self.lines.next() {
None => return None,
Some(line) => Reader::char_iter(line),
};
// Emit the newline that BufRead::lines() strips from each line.
Some('\n')
}
Some(val) => Some(val),
}
}
}
It works as expected:
let reader = Reader::new("src/main.rs");
for ch in reader {
print!("{}", ch);
}
Assuming you have a binary file example.bin and you want to read that file in units of f64, i.e. the first 8 bytes give the first float, the next 8 bytes give the second, etc. (assuming you know the endianness). How can this be done in Rust?
I know that one can use std::fs::read("example.bin") to get a Vec<u8> of the data, but then you have to do quite a bit of "gymnastics" to convert 8 bytes at a time into an f64, i.e.
fn eight_bytes_to_array(barry: &[u8]) -> &[u8; 8] {
barry.try_into().expect("slice with incorrect length")
}
let file_content = std::fs::read("example.bin").expect("Could not read file!");
let nr = eight_bytes_to_array(&file_content[0..8]);
let nr = f64::from_be_bytes(*nr);
I saw this post, but it's from 2015 and a lot has changed in Rust since then, so I was wondering if there is a better/faster way these days?
An example without proper error handling, and without handling the case where the file's size is not a multiple of 8 bytes:
use std::fs::File;
use std::io::{BufReader, Read};
fn main() {
// Using BufReader because files in std are unbuffered by default,
// and reading 8 bytes at a time would be really slow without buffering.
let mut input = BufReader::new(
File::open("floats.bin")
.expect("Failed to open file")
);
let mut floats = Vec::new();
loop {
use std::io::ErrorKind;
// You may use 8 instead of `size_of` but size_of is less error-prone.
let mut buffer = [0u8; std::mem::size_of::<f64>()];
// Using read_exact because `read` may return less
// than 8 bytes even if there are bytes in the file.
// This, however, prevents us from handling cases
// when file size cannot be divided by 8.
let res = input.read_exact(&mut buffer);
match res {
// We detect if we read until the end.
// If there were some excess bytes after last read, they are lost.
Err(error) if error.kind() == ErrorKind::UnexpectedEof => break,
// Add more cases of errors you want to handle.
_ => {}
}
// You should do better error-handling probably.
// This simply panics.
res.expect("Unexpected error during read");
// Use `from_be_bytes` if the numbers in the file are big-endian.
let f = f64::from_le_bytes(buffer);
floats.push(f);
}
}
I would create a generic iterator that returns f64 for flexibility and reusability.
use std::io;

struct F64Reader<R: io::BufRead> {
inner: R,
}
impl<R: io::BufRead> F64Reader<R> {
pub fn new(inner: R) -> Self {
Self{
inner
}
}
}
impl<R: io::BufRead> Iterator for F64Reader<R> {
type Item = f64;
fn next(&mut self) -> Option<Self::Item> {
let mut buff: [u8; 8] = [0;8];
// Any read error (including reaching end of file) ends the iteration.
self.inner.read_exact(&mut buff).ok()?;
Some(f64::from_be_bytes(buff))
}
}
This means that if the file is large, you can loop through the values without storing them all in memory:
let input = fs::File::open("example.bin")?;
for f in F64Reader::new(io::BufReader::new(input)) {
println!("{}", f)
}
Or, if you want all the values, you can collect them:
let input = fs::File::open("example.bin")?;
let values : Vec<f64> = F64Reader::new(io::BufReader::new(input)).collect();
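If the whole file is already in memory anyway (for instance from the std::fs::read call in the question), here is a sketch of a slice-based alternative (not from the original answer): chunks_exact avoids the manual index arithmetic and silently skips a trailing partial chunk.
fn read_all_f64s(path: &str) -> std::io::Result<Vec<f64>> {
    let bytes = std::fs::read(path)?;
    Ok(bytes
        // Non-overlapping 8-byte windows over the file's bytes.
        .chunks_exact(std::mem::size_of::<f64>())
        // try_into() converts each &[u8] slice into a [u8; 8] array.
        .map(|chunk| f64::from_be_bytes(chunk.try_into().unwrap()))
        .collect())
}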
I need to put some futures in a Vec for later joining. However, if I try to build the Vec by collecting an iterator, the compiler doesn't seem to be able to determine the type for the vector.
I'm trying to create a command line utility that accepts an arbitrary number of IP addresses, communicates with those remotes and collects the results for printing. The communication function works well, I've cut down the program to show the failure I need to understand.
use futures::future::join_all;
use itertools::Itertools;
use std::net::SocketAddr;
use std::str::from_utf8;
use std::fmt;
#[tokio::main(flavor = "current_thread")]
pub async fn main() -> Result<(), Box<dyn std::error::Error>> {
let socket: Vec<SocketAddr> = vec![
"192.168.20.33:502".parse().unwrap(),
"192.168.20.34:502".parse().unwrap(),];
let async_vec = vec![
MyStruct::get(socket[0]),
MyStruct::get(socket[1]),];
// The above 3 lines happen to work to build a Vec because there are
// 2 sockets. But I need to build a Vec to join_all from an arbitrary
// number of addresses. Why doesn't the line below work instead?
//let async_vec = socket.iter().map(|x| MyStruct::get(*x)).collect();
let rt = join_all(async_vec).await;
let results = rt.iter().map(|x| x.as_ref().unwrap().to_string()).join("\n");
let mut rvec: Vec<String> = results.split("\n").map(|x| x.to_string()).collect();
rvec.sort_by(|a, b| a[15..20].cmp(&b[15..20]));
println!("{}", rvec.join("\n"));
Ok(())
}
struct MyStruct {
serial: [u8; 12],
placeholder: String,
}
impl fmt::Display for MyStruct {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
let serial = match from_utf8(&self.serial) {
Ok(v) => v,
Err(_) => "(invalid)",
};
let lines = (1..4).map(|x| format!("{}, line{}, {}", serial, x, self.placeholder)).join("\n");
write!(f, "{}", lines)
}
}
impl MyStruct {
pub async fn get(sockaddr: SocketAddr) -> Result<MyStruct, Box<dyn std::error::Error>> {
let char = sockaddr.ip().to_string().chars().last().unwrap();
let rv = MyStruct{serial: [char as u8;12], placeholder: sockaddr.to_string(), };
Ok(rv)
}
}
This line:
let async_vec = socket.iter().map(|x| MyStruct::get(*x)).collect();
doesn't work because the compiler can't know that you want to collect everything into a Vec. You might want to collect into some other container (e.g. a linked list or a set). Therefore you need to tell the compiler the kind of container you want with:
let async_vec = socket.iter().map(|x| MyStruct::get(*x)).collect::<Vec<_>>();
or:
let async_vec: Vec<_> = socket.iter().map(|x| MyStruct::get(*x)).collect();
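To illustrate why the annotation is needed (a sketch, not from the original answer): the very same iterator could just as well be collected into a different container, so the compiler cannot pick one for you.
use std::collections::LinkedList;

// Exactly the same expression, collected into a LinkedList instead of a Vec.
let as_list: LinkedList<_> = socket.iter().map(|x| MyStruct::get(*x)).collect();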
I'm trying to read a file into a vector, then print out a random line from that vector.
What am I doing wrong?
I'm asking here because I know I'm making a big conceptual mistake, but I'm having trouble identifying exactly where it is.
I know the error -
error[E0308]: mismatched types
26 | processor(&lines)
| ^^^^^^ expected &str, found struct std::string::String
And I see that there's a mismatch - but I don't know how to give the right type, or refactor the code for that (very short) function.
My code is below:
use std::{
fs::File,
io::{prelude::*, BufReader},
path::Path,
};
fn lines_from_file(filename: impl AsRef<Path>) -> Vec<String> {
let file = File::open(filename).expect("no such file");
let buf = BufReader::new(file);
buf.lines()
.map(|l| l.expect("Could not parse line"))
.collect()
}
fn processor(vectr: &Vec<&str>) -> () {
let vec = vectr;
let index = (rand::random::<f32>() * vec.len() as f32).floor() as usize;
println!("{}", vectr[index]);
}
fn main() {
let lines = lines_from_file("./example.txt");
for line in lines {
println!("{:?}", line);
}
processor(&lines);
}
When you call the processor function you're passing a &Vec<String> (lines_from_file returns a Vec<String>), but processor is expecting a &Vec<&str>. You can change processor to match what it actually receives:
fn processor(vectr: &Vec<String>) -> () {
let vec = vectr;
let index = (rand::random::<f32>() * vec.len() as f32).floor() as usize;
println!("{}", vectr[index]);
}
The main function:
fn main() {
let lines = lines_from_file("./example.txt");
for line in &lines { // iterate by reference (&lines) to avoid moving the variable
println!("{:?}", line);
}
processor(&lines);
}
More generally, a String is not the same as a string slice &str, and therefore Vec<String> is not the same as Vec<&str>. I'd recommend checking the Rust Book: https://doc.rust-lang.org/nightly/book/ch04-03-slices.html?highlight=String#string-slices
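If you would rather keep processor's original &Vec<&str> signature, here is a sketch of the other direction (not from the original answer): borrow a &str slice from each owned String before the call.
fn main() {
    let lines = lines_from_file("./example.txt");
    // Borrow a &str for each owned String; `refs` lives no longer than `lines`.
    let refs: Vec<&str> = lines.iter().map(String::as_str).collect();
    processor(&refs);
}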
Is it possible to read a file line by line with std::fs::File and std::io::BufReader if it is not UTF-8 encoded?
I looked at std::io::Lines and it returns Result<String>, so I worry: do I have to implement my own BufReader that does the same but returns Vec<u8> instead, or can I reuse std::io::BufReader in some way?
You do not have to re-implement BufReader itself; it provides exactly the method you need for your use case, read_until:
fn read_until(&mut self, byte: u8, buf: &mut Vec<u8>) -> Result<usize>
You supply your own Vec<u8> and the content of the file will be appended until byte is encountered (0x0A being LF).
There are several potential gotchas:
the buffer may end not only with a LF byte, but with a CR LF sequence,
it is up to you to clear buf between subsequent calls.
A simple loop around reader.read_until(b'\n', &mut buffer), stopping when it returns Ok(0) (which signals end of file), should let you read your file easily enough.
You may consider implementing a std::io::Lines equivalent that converts from the source encoding to UTF-8 to provide a nicer API, though it will have a performance cost.
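A minimal sketch of that loop (not from the original answer; the file name is just an example):
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    let mut reader = BufReader::new(File::open("sample.txt")?);
    let mut buffer = Vec::new();
    // read_until returns the number of bytes read; Ok(0) signals end of file.
    while reader.read_until(b'\n', &mut buffer)? != 0 {
        // `buffer` holds the raw bytes of one line, possibly ending in LF or CR LF.
        println!("read a line of {} bytes", buffer.len());
        // Clear between calls, as noted above, or lines will accumulate.
        buffer.clear();
    }
    Ok(())
}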
To read files (UTF-8 or not) line by line, I use:
/*
// Cargo.toml:
[dependencies]
encoding_rs = "0.8"
encoding_rs_io = "0.1.7"
...
*/
use encoding_rs::WINDOWS_1252;
use encoding_rs_io::DecodeReaderBytesBuilder;
use std::{
fs,
io::{BufRead, BufReader, Read},
};
fn main() {
let file_path: &str = "/tmp/file.txt";
let buffer: Box<dyn BufRead> = read_file(file_path);
for (index, result_vec_bytes) in buffer.split(b'\n').enumerate() {
let line_number: usize = index + 1;
let line_utf8: String = get_string_utf8(result_vec_bytes, line_number);
println!("{line_utf8}");
}
}
Such that:
fn read_file(file_path: &str) -> Box<dyn BufRead> {
let file = match fs::File::open(file_path) {
Ok(f) => f,
Err(why) => panic!("Problem opening the file: \"{file_path}\"\n{why:?}"),
};
Box::new(BufReader::new(file))
}
And
fn get_string_utf8(result_vec_bytes: Result<Vec<u8>, std::io::Error>, line_number: usize) -> String {
let vec_bytes: Vec<u8> = match result_vec_bytes {
Ok(values) => values,
Err(why) => panic!("Failed to read line nº {line_number}: {why}"),
};
// from_utf8() checks to ensure that the bytes are valid UTF-8
let line_utf8: String = match std::str::from_utf8(&vec_bytes) {
Ok(str) => str.to_string(),
Err(_) => {
let mut data = DecodeReaderBytesBuilder::new()
.encoding(Some(WINDOWS_1252))
.build(vec_bytes.as_slice());
let mut buffer = String::new();
let _number_of_bytes = match data.read_to_string(&mut buffer) {
Ok(num) => num,
Err(why) => {
eprintln!("Problem reading data from file in buffer!");
eprintln!("Line nº {line_number}");
eprintln!("Used encoding type: WINDOWS_1252.");
eprintln!("Try another encoding type!");
panic!("Failed to convert data to UTF-8!: {why}")
},
};
buffer
}
};
// remove the Windows line ending: "\r\n"
line_utf8.trim_end_matches('\r').to_string()
}