Reading all file contents in current directory to a vector - rust

I want to read all the files in the current directory.
Here's my progress:
use std::fs;
fn main() {
let files = fs::read_dir(".").unwrap();
files
.filter_map(Result::ok)
.filter(|d| if let Some(e) = d.path().extension() { e == "txt" } else {false})
.for_each(|f| println!("{:?}", f));
}
Here I got a little lost: how can I read all the file contents? Should I add them to a growing Vec in the for_each block? If so, how?

If you want a single Vec with the bytes of all the files concatenated, you can use flat_map:
use std::ffi::OsString;
let target_ext = OsString::from("txt");
let files = fs::read_dir(".").unwrap();
let file_bytes : Vec<u8> = files
.filter_map(Result::ok)
.map(|d| d.path())
.filter(|path| path.extension() == Some(&target_ext))
.flat_map(|path| fs::read(path).expect("Failed to read"))
.collect();
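If you want a Vec that contains each file's contents separately, change flat_map to map and it will return a Vec<Vec<u8>>. The first snippet consumes the read_dir iterator, so call read_dir again if you use both:
let files = fs::read_dir(".").unwrap();
let file_bytes : Vec<Vec<u8>> = files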
.filter_map(Result::ok)
.map(|d| d.path())
.filter(|path| path.extension() == Some(&target_ext))
.map(|path| fs::read(path).expect("Failed to read"))
.collect();
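If panicking on the first unreadable file is not acceptable, the same idea can be written as a function that returns an io::Result. This is only a sketch: the name read_txt_files and the choice to return (path, bytes) pairs sorted by path are assumptions made for illustration, not part of the answer above.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: read every regular `.txt` file directly inside `dir`.
/// Errors from listing the directory or reading a file are returned to the caller.
fn read_txt_files(dir: &Path) -> io::Result<Vec<(PathBuf, Vec<u8>)>> {
    let mut out = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // `extension()` is `None` for names without an extension.
        let is_txt = path.extension().map_or(false, |ext| ext == "txt");
        if is_txt && path.is_file() {
            let bytes = fs::read(&path)?;
            out.push((path, bytes));
        }
    }
    // `read_dir` makes no ordering promise, so sort for a stable result.
    out.sort();
    Ok(out)
}
```

Calling read_txt_files(Path::new(".")) would give one entry per file, so the file name stays attached to its contents.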

Related

Splitting a Vec of strings into Vec<Vec<String>>

I am attempting to relearn data science in Rust.
I have a Vec<String> that includes a delimiter "|" and a row terminator "!end".
What I'd like to end up with is a Vec<Vec<String>> that can be put into a 2D ndarray.
I have this Python code:
file = open('somefile.dat')
lst = []
for line in file:
lst += [line.split('|')]
df = pd.DataFrame(lst)
SAMV2FinalDataFrame = pd.DataFrame(lst,columns=column_names)
And I've recreated it here in Rust:
fn lines_from_file(filename: impl AsRef<Path>) -> Vec<String> {
let file = File::open(filename).expect("no such file");
let buf = BufReader::new(file);
buf.lines()
.map(|l| l.expect("Could not parse line"))
.collect()
}
fn main() {
let lines = lines_from_file(".dat");
let mut new_arr = vec![];
// Here I get an immutable borrow error on lines
for line in lines{
new_arr.push([*line.split("!end")]);
}
// Here I get "expected closure, found str"
let x = lines.split("!end");
let array = Array::from(lines)
What I have: ['1','1','1','!end','2','2','2','!end']
What I need: [['1','1','1'],['2','2','2']]
Edit: also, why does the turbofish disappear when I post it on Stack Overflow?
I think part of the issue you ran into was due to how you worked with the collections. For example, Vec::push adds only a single element, so you would want to use Vec::extend to add every section produced by a split. The "expected closure" error most likely comes from calling split on the Vec itself: that resolves to the slice method split, which takes a predicate closure rather than a string pattern. I also ran into a few empty strings, because splitting by "!end" leaves a trailing '|' on the ends of the substrings.
let lines = vec!["1|1|1|!end|2|2|2|!end".to_string()];
let mut new_arr = Vec::new();
// Iterate over &lines so we don't consume lines and it can be used again later
for line in &lines {
new_arr.extend(line.split("!end")
// Remove trailing empty string
.filter(|x| !x.is_empty())
// Convert each &str into a Vec<String>
.map(|x| {
x.split('|')
// Remove empty strings from ends split (Ex split: "|2|2|2|")
.filter(|x| !x.is_empty())
// Convert &str into owned String
.map(|x| x.to_string())
// Turn iterator into Vec<String>
.collect::<Vec<_>>()
}));
}
println!("{:?}", new_arr);
I also came up with this other version which should handle your use case better. The earlier approach dropped all empty strings, while this one should preserve them while correctly handling the "!end".
use std::io::{self, BufRead, BufReader, Read, Cursor};
fn split_data<R: Read>(buffer: &mut R) -> io::Result<Vec<Vec<String>>> {
let mut sections = Vec::new();
let mut current_section = Vec::new();
for line in BufReader::new(buffer).lines() {
for item in line?.split('|') {
if item != "!end" {
current_section.push(item.to_string());
} else {
sections.push(current_section);
current_section = Vec::new();
}
}
}
Ok(sections)
}
In this example, I used Read for easier testing, but it will also work with a file.
let sample_input = b"1|1|1|!end|2|2|2|!end";
println!("{:?}", split_data(&mut Cursor::new(sample_input)));
// Output: Ok([["1", "1", "1"], ["2", "2", "2"]])
// You can also use a file instead (needs `use std::fs::File;`)
let mut file = File::open("somefile.dat").unwrap();
let solution: Vec<Vec<String>> = split_data(&mut file).unwrap();
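One thing to watch with split_data is that items after the last "!end" are silently dropped. Below is a small variation that works on a &str and keeps such a trailing section. The name split_sections and the rule "keep the tail only if it has a non-empty item" are assumptions for this sketch, not something the question specifies.

```rust
/// Hypothetical variant: split `input` into sections terminated by "!end",
/// with items separated by '|'. Empty items inside a section are preserved.
fn split_sections(input: &str) -> Vec<Vec<String>> {
    let mut sections = Vec::new();
    let mut current: Vec<String> = Vec::new();
    for line in input.lines() {
        for item in line.split('|') {
            if item == "!end" {
                // Move the finished section out and leave an empty Vec behind.
                sections.push(std::mem::take(&mut current));
            } else {
                current.push(item.to_string());
            }
        }
    }
    // Keep an unterminated tail, unless it holds nothing but empty strings
    // (for example the "" left after a final '|').
    if current.iter().any(|s| !s.is_empty()) {
        sections.push(current);
    }
    sections
}
```

With the sample input "1|1|1|!end|2|2|2|!end" this yields the same two sections as split_data, and "1|2|!end|3|4" yields a second section ["3", "4"] instead of losing it.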

How to read specific file from zip file

I'm totally stuck reading a file from a variable path structure of a zip file without decompressing it.
My file is located here:
/info/[random-string]/info.json
Here [random-string] is the only entry in the info folder.
So the steps are: open the 'info' folder, enter the first folder inside it, and read 'info.json'.
Any ideas how to do that with one of these libraries (zip or rc_zip)?
let file_path = file.to_str().unwrap();
let file = File::open(file_path).unwrap();
let reader = BufReader::new(file);
let mut archive = zip::ZipArchive::new(reader).unwrap();
let info_folder = archive.by_name("info").unwrap();
// how to list files of info_folder
Here you are:
use std::error::Error;
use std::ffi::OsStr;
use std::fs::File;
use std::path::Path;
use zip::ZipArchive; // zip 0.5.13
fn main() -> Result<(), Box<dyn Error>> {
let archive = File::open("./info.zip")?;
let mut archive = ZipArchive::new(archive)?;
// iterate over all files, because you don't know the exact name
for idx in 0..archive.len() {
let entry = archive.by_index(idx)?;
let name = entry.enclosed_name();
if let Some(name) = name {
// process only entries which are named info.json
if name.file_name() == Some(OsStr::new("info.json")) {
// the ancestors() iterator lets you walk up the path segments
let mut ancestors = name.ancestors();
// skip self - the first entry is always the full path
ancestors.next();
// skip the random string
ancestors.next();
let expect_info = ancestors.next();
// the remainder must be only 'info/', otherwise this is the wrong entry
if expect_info == Some(Path::new("info/")) {
// do something with the file
println!("Found!!!");
break;
}
}
}
}
Ok(())
}
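The path test in the loop can also be pulled out into a helper that depends only on std::path, which makes it easy to check on its own. This is a sketch under the assumption that entry names look like info/[random-string]/info.json with no leading slash. The function name is made up for illustration.

```rust
use std::path::{Component, Path};

/// Hypothetical helper: true only for paths of the exact shape
/// `info/<any single folder>/info.json`.
fn is_nested_info_json(name: &Path) -> bool {
    let parts: Vec<_> = name.components().collect();
    match parts.as_slice() {
        // Exactly three plain components: "info", anything, "info.json".
        [Component::Normal(first), Component::Normal(_), Component::Normal(last)] => {
            first.to_str() == Some("info") && last.to_str() == Some("info.json")
        }
        // Leading slashes, "..", or deeper or shallower nesting are rejected.
        _ => false,
    }
}
```

Inside the loop above, the check would then read `if is_nested_info_json(name) { ... }`, replacing the manual ancestors() walk.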

How can I filter a list of filenames by their extension?

I managed to read all files from one path to a Vec, and now I want to filter by extension.
let path = Path::new(r"C:\Testpath");
let mut faxvec: Vec<String> = Vec::new();
for element in path.read_dir().unwrap() {
// if match element
faxvec.push(element);
}
I want to only push the files to the vector that end with ".txt"
Maybe you're looking for something along the lines of:
fn main() {
let mut faxvec: Vec<std::path::PathBuf> = Vec::new();
for element in std::path::Path::new(r"C:\Testpath").read_dir().unwrap() {
let path = element.unwrap().path();
if let Some(extension) = path.extension() {
if extension == "txt" {
faxvec.push(path);
}
}
}
}
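On Windows, files are often named with upper-case extensions such as FAX.TXT, which the exact comparison above would skip. If that matters, a case-insensitive check could look like this. The helper name has_extension is an assumption, and whether case should be ignored at all depends on your data.

```rust
use std::path::Path;

/// Hypothetical helper: does `path` end in the extension `wanted`,
/// ignoring ASCII case? `wanted` is given without the leading dot.
fn has_extension(path: &Path, wanted: &str) -> bool {
    path.extension()
        .map_or(false, |ext| ext.eq_ignore_ascii_case(wanted))
}
```

In the loop above, the two nested if blocks would become `if has_extension(&path, "txt") { faxvec.push(path); }`.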

How do I use include_str! for multiple files or an entire directory?

I would like to copy an entire directory to a location in a user's $HOME. Individually copying files to that directory is straightforward:
let contents = include_str!("resources/profiles/default.json");
let fpath = dpath.join(&fname);
fs::write(fpath, contents).expect(&format!("failed to create profile: {}", n));
I haven't found a way to adapt this to multiple files:
for n in ["default"] {
let fname = format!("{}{}", n, ".json");
let x = format!("resources/profiles/{}", fname).as_str();
let contents = include_str!(x);
let fpath = dpath.join(&fname);
fs::write(fpath, contents).expect(&format!("failed to create profile: {}", n));
}
...the compiler complains that x must be a string literal.
As far as I know, there are two options:
Write a custom macro.
Replicate the first code for each file I want to copy.
What is the best way of doing this?
I would create a build script that iterates through a directory, building up an array of tuples containing the name and another macro call to include the raw data:
use std::{
env,
error::Error,
fs::{self, File},
io::Write,
path::Path,
};
const SOURCE_DIR: &str = "some/path/to/include";
fn main() -> Result<(), Box<dyn Error>> {
let out_dir = env::var("OUT_DIR")?;
let dest_path = Path::new(&out_dir).join("all_the_files.rs");
let mut all_the_files = File::create(&dest_path)?;
writeln!(&mut all_the_files, r##"["##,)?;
for f in fs::read_dir(SOURCE_DIR)? {
let f = f?;
if !f.file_type()?.is_file() {
continue;
}
writeln!(
&mut all_the_files,
r##"("{name}", include_bytes!(r#"{name}"#)),"##,
name = f.path().display(),
)?;
}
writeln!(&mut all_the_files, r##"]"##,)?;
Ok(())
}
This has some weaknesses, namely that it requires the path to be expressible as a &str. Since you were already using include_str!, I don't think that's an extra requirement. It also means that the generated string has to be a valid Rust string. We use raw strings inside the generated file, but this can still fail if a filename contains the sequence "#. A better solution would probably use str::escape_default.
Since we are including files, I used include_bytes! instead of include_str!, but you can switch back if you really need to. Including raw bytes skips UTF-8 validation at compile time, so it's a small win.
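To make the str::escape_default idea concrete, here is a rough sketch of how one generated line could be built with ordinary (non-raw) string literals. The function generated_entry is hypothetical. In the build script its result would take the place of the raw-string format in the writeln! call.

```rust
/// Hypothetical helper: build one `("name", include_bytes!("name")),` line
/// for the generated file, escaping the path so it is a valid string literal.
fn generated_entry(path: &str) -> String {
    // escape_default turns `"` into `\"`, `\` into `\\`, and non-ASCII
    // characters into `\u{...}` escapes, all of which Rust literals accept.
    let escaped: String = path.escape_default().collect();
    format!("(\"{0}\", include_bytes!(\"{0}\")),", escaped)
}
```

A name containing "# no longer ends the literal early, because the quote is emitted as \" inside a normal string.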
Using it involves importing the generated value:
const ALL_THE_FILES: &[(&str, &[u8])] = &include!(concat!(env!("OUT_DIR"), "/all_the_files.rs"));
fn main() {
for (name, data) in ALL_THE_FILES {
println!("File {} is {} bytes", name, data.len());
}
}
See also:
How can I locate resources for testing with Cargo?
You can use the include_dir macro (from the include_dir crate).
use include_dir::{include_dir, Dir};
use std::path::Path;
const PROJECT_DIR: Dir = include_dir!(".");
// of course, you can retrieve a file by its full path
let lib_rs = PROJECT_DIR.get_file("src/lib.rs").unwrap();
// you can also inspect the file's contents
let body = lib_rs.contents_utf8().unwrap();
assert!(body.contains("SOME_INTERESTING_STRING"));
Using a macro:
macro_rules! incl_profiles {
( $( $x:expr ),* ) => {
{
let mut profs = Vec::new();
$(
profs.push(($x, include_str!(concat!("resources/profiles/", $x, ".json"))));
)*
profs
}
};
}
...
let prof_tups: Vec<(&str, &str)> = incl_profiles!("default", "python");
for (prof_name, prof_str) in prof_tups {
let fname = format!("{}{}", prof_name, ".json");
let fpath = dpath.join(&fname);
fs::write(fpath, prof_str).expect(&format!("failed to create profile: {}", prof_name));
}
Note: This is not dynamic. The files ("default" and "python") are specified in the call to the macro.

How can I read a file line-by-line, eliminate duplicates, then write back to the same file?

I want to read a file, eliminate all duplicates and write the rest back into the file - like a duplicate cleaner.
I'm using a Vec because a normal array has a fixed size, but my .txt file can vary in length (am I doing this right?).
Read the lines into a Vec and delete duplicates (missing: writing back to the file):
use std::io;
fn main() {
let path = Path::new("test.txt");
let mut file = io::BufferedReader::new(io::File::open(&path, R));
let mut lines: Vec<String> = file.lines().map(|x| x.unwrap()).collect();
// dedup() deletes all duplicates if sort() before
lines.sort();
lines.dedup();
for e in lines.iter() {
print!("{}", e.as_slice());
}
}
Read + write to file (untested, but I guess it should work).
This one is missing the lines-to-Vec step, because that doesn't seem to work without BufferedReader (or I'm doing something else wrong, which is also quite possible).
use std::io;
fn main() {
let path = Path::new("test.txt");
let mut file = match io::File::open_mode(&path, io::Open, io::ReadWrite) {
Ok(f) => f,
Err(e) => panic!("file error: {}", e),
};
let mut lines: Vec<String> = file.lines().map(|x| x.unwrap()).collect();
lines.sort();
// dedup() deletes all duplicates if sort() before
lines.dedup();
for e in lines.iter() {
file.write("{}", e);
}
}
So .... how do I get those 2 together? :)
Ultimately, you are going to run into a problem: you are trying to write to the same file you are reading from. In this case that's safe, because you read the entire file first and don't need its old contents afterwards. However, if you try to write to the file, you'll see that a file opened for reading doesn't allow writing! Here's code that demonstrates this:
use std::{
fs::File,
io::{BufRead, BufReader, Write},
};
fn main() {
let mut file = File::open("test.txt").expect("file error");
let reader = BufReader::new(&mut file);
let mut lines: Vec<_> = reader
.lines()
.map(|l| l.expect("Couldn't read a line"))
.collect();
lines.sort();
lines.dedup();
for line in lines {
file.write_all(line.as_bytes())
.expect("Couldn't write to file");
}
}
Here's the output:
% cat test.txt
a
a
b
a
% cargo run
thread 'main' panicked at 'Couldn't write to file: Os { code: 9, kind: Other, message: "Bad file descriptor" }', src/main.rs:12:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
You could open the file for both reading and writing:
use std::{
fs::OpenOptions,
io::{BufRead, BufReader, Write},
};
fn main() {
let mut file = OpenOptions::new()
.read(true)
.write(true)
.open("test.txt")
.expect("file error");
// Remaining code unchanged
}
But then you'd see that (a) the output is appended and (b) the newlines are missing from the newly written lines, because BufRead::lines doesn't include them.
We could reset the file pointer back to the beginning, but then you'd probably leave trailing data at the end (deduplicating is likely to write fewer bytes than were read). It's easier to just reopen the file for writing, which truncates it. Also, let's use a set data structure to do the deduplication for us!
use std::{
collections::BTreeSet,
fs::File,
io::{BufRead, BufReader, Write},
};
fn main() {
let file = File::open("test.txt").expect("file error");
let reader = BufReader::new(file);
let lines: BTreeSet<_> = reader
.lines()
.map(|l| l.expect("Couldn't read a line"))
.collect();
let mut file = File::create("test.txt").expect("file error");
for line in lines {
file.write_all(line.as_bytes())
.expect("Couldn't write to file");
file.write_all(b"\n").expect("Couldn't write to file");
}
}
And the output:
% cat test.txt
a
a
b
a
a
b
a
b
% cargo run
% cat test.txt
a
b
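For completeness, the single-handle route (reset the file pointer and get rid of the stale tail) can also work if the file is explicitly truncated before writing. This is a sketch, not a recommendation over reopening the file. The function name dedup_in_place is an assumption.

```rust
use std::collections::BTreeSet;
use std::fs::OpenOptions;
use std::io::{self, BufRead, BufReader, Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical helper: sort and deduplicate the lines of `path`
/// using one read+write file handle.
fn dedup_in_place(path: &Path) -> io::Result<()> {
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;
    // `&File` implements Read, so the handle can be borrowed for reading.
    let lines: BTreeSet<String> = BufReader::new(&file)
        .lines()
        .collect::<io::Result<_>>()?;
    // Drop the old contents, then move the cursor back to the start;
    // set_len alone does not move the cursor.
    file.set_len(0)?;
    file.seek(SeekFrom::Start(0))?;
    for line in &lines {
        writeln!(file, "{}", line)?;
    }
    Ok(())
}
```

Note that a crash between set_len and the last write would leave the file partially written, which is equally true of the reopen-and-truncate version.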
The less-efficient but shorter solution is to read the entire file as one string and use str::lines:
use std::{
collections::BTreeSet,
fs::{self, File},
io::Write,
};
fn main() {
let contents = fs::read_to_string("test.txt").expect("can't read");
let lines: BTreeSet<_> = contents.lines().collect();
let mut file = File::create("test.txt").expect("can't create");
for line in lines {
writeln!(file, "{}", line).expect("can't write");
}
}
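Both set-based versions write the lines back in sorted order, as sort() followed by dedup() would. If the original order of first occurrences should be kept instead, a HashSet can track what has been seen while the lines are filtered in place. The helper below is a sketch with a made-up name.

```rust
use std::collections::HashSet;

/// Hypothetical helper: drop repeated lines, keeping the first occurrence
/// of each one in its original position.
fn dedup_keep_order<'a>(lines: impl IntoIterator<Item = &'a str>) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    lines
        .into_iter()
        // HashSet::insert returns false when the value was already present.
        .filter(|line| seen.insert(*line))
        .collect()
}
```

It would slot into the last example as `let lines = dedup_keep_order(contents.lines());` in place of the BTreeSet collection.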
See also:
What's the de-facto way of reading and writing files in Rust 1.x?
What is the best variant for appending a new line in a text file?
