Pasting columns of multiple files together line by line by iterating - rust

I want to concatenate an unknown number of files side by side, but I'm making a bit of a mess. The operation would be roughly similar to unix paste. I thought I could iterate line by line over every file and write every element from every file to stdout, but it is proving harder than expected. Maybe there is a far better approach?
Every file looks like
name1 value1
name2 value2
name3 value3
name4 value4
I want to treat the first file specially, because each row has an identifier (name in the example above). The files are known to be sorted and of the same length, so I don't have to check anything while pasting them together. For every file after the first I don't have to write the name field again; I can just take the value field. I haven't even started splitting those columns because I'm stuck iterating over all files simultaneously.
The code below doesn't compile, since
use of moved value: `iterfiles` (rustc E0382)
combine.rs(17, 14): `iterfiles` moved due to this method call, in previous iteration of loop
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::PathBuf;
pub fn combine(calls: Vec<PathBuf>) {
    let file1 = File::open(calls[0].clone()).unwrap();
    let reader = BufReader::new(file1).lines();
    let mut files = Vec::new();
    for file in &calls[1..] {
        files.push(BufReader::new(File::open(file).unwrap()).lines());
    }
    let iterfiles = files.iter();
    for line in reader {
        let mut line_out = Vec::new();
        line_out.push(line.unwrap());
        let rest_of_files: Vec<String> = iterfiles
            .map(|file2| file2.next().unwrap().unwrap())
            .collect();
    }
}

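The iterator returned by files.iter() is consumed by map on the first pass through the loop, so you need to create it inside the loop body. And because next() needs mutable access to each line reader, use iter_mut instead of iter: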
for line in reader {
    let mut line_out = Vec::new();
    line_out.push(line.unwrap());
    let rest_of_files: Vec<String> = files.iter_mut()
        .map(|file2| file2.next().unwrap().unwrap())
        .collect();
}
By the way, you can construct files like this:
let mut files: Vec<_> = calls[1..].iter()
    .map(|file| BufReader::new(File::open(file).unwrap()).lines())
    .collect();
And you don't need the clone for file1:
let file1 = File::open(&calls[0]).unwrap();
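Putting these pieces together, here is a minimal sketch of the whole paste operation. It assumes whitespace-separated "name value" lines and files of equal length, as described in the question. The function name paste_columns and the choice to take generic readers and a writer (so it can be tried without real files) are illustrative, not part of the original code.

```rust
use std::io::{self, BufRead, Write};

/// Writes each line of `first` in full, followed by the value column
/// (second whitespace-separated field) of the matching line in every
/// reader in `rest`.
fn paste_columns<R: BufRead, W: Write>(first: R, rest: Vec<R>, mut out: W) -> io::Result<()> {
    // One line iterator per additional file, kept alive across the outer loop.
    let mut others: Vec<_> = rest.into_iter().map(|r| r.lines()).collect();

    for line in first.lines() {
        let mut fields = vec![line?];
        // iter_mut gives mutable access so each iterator can be advanced.
        for other in others.iter_mut() {
            let next = match other.next() {
                Some(l) => l?,
                // Assumption from the question: all files have the same length.
                None => {
                    return Err(io::Error::new(
                        io::ErrorKind::UnexpectedEof,
                        "a file is shorter than the first file",
                    ))
                }
            };
            // Skip the name column and keep only the value column.
            let value = next.split_whitespace().nth(1).unwrap_or("").to_string();
            fields.push(value);
        }
        writeln!(out, "{}", fields.join(" "))?;
    }
    Ok(())
}
```

With real files you would wrap each File in a BufReader and pass io::stdout().lock() as the writer.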

Related

Iterate over set of strings and concatenate them

I have code that repeatedly asks for input and then executes that input as a shell command. I understand that the output I get from the shell commands is in a buffer of some sort. Since many commands output lots of lines, I would like to collect all of the output into one single string.
extern crate subprocess;
use std::io;
use std::io::{BufRead, BufReader};
use subprocess::Exec;
fn main() {
    loop {
        let mut mycommand_string: String = String::new();
        io::stdin()
            .read_line(&mut mycommand_string)
            .expect("Failed to read line");
        let mycommand: &str = &*mycommand_string;
        let x = Exec::shell(mycommand).stream_stdout().unwrap();
        let br = BufReader::new(x);
        let full: String = " ".to_string();
        let string = for (i, line) in br.lines().enumerate() {
            let string: String = line.unwrap().to_string();
            let full = format!("{}{}", full, string);
            println!("{}", full);
        };
        println!("{}", string);
    }
}
This is my code. As you can see, I am trying to iterate over br.lines() and append each line of output to a string, so that all the output ends up in one single string, preferably with "\n" between the lines, but not necessarily.
Specifically, I would like to iterate over the result of the variable br, which has a type I don't understand, and concatenate all the strings together.
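If you have an iterator of lines, you can simply collect it into a string. Because each item is an io::Result<String>, collect into a Result; note that this concatenates the lines without any separator:
let full: io::Result<String> = br.lines().collect();
That said, there are few situations where doing this is actually useful.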
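If the "\n" separators the question asks for matter, one option is to collect the lines into a Vec first and then join them. The sketch below uses only the standard library; the helper name join_lines is made up for illustration, and it accepts any BufRead, so the BufReader around the command's output could be passed in directly.

```rust
use std::io::{self, BufRead};

/// Reads every line from `reader` and joins them with "\n".
/// Stops and returns the error if any line cannot be read.
fn join_lines<R: BufRead>(reader: R) -> io::Result<String> {
    // Collecting into io::Result<Vec<String>> short-circuits on the first error.
    let lines: Vec<String> = reader.lines().collect::<io::Result<_>>()?;
    Ok(lines.join("\n"))
}
```

The result has no trailing newline, because lines() strips line endings and join only inserts separators between items.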

How to call regexes in a loop without cloning the data

I am writing some code that applies N regexes to some contents, and if possible I'd like to avoid cloning the string every time, since not every regex will actually match. Is that even possible?
Here is my attempt:
use std::borrow::Cow;
use regex::Regex;
fn main() {
    let test = "abcde";
    let regexes = vec![
        (Regex::new("a").unwrap(), "b"),
        (Regex::new("b").unwrap(), "c"),
        (Regex::new("z").unwrap(), "-"),
    ];
    let mut contents = Cow::Borrowed(test);
    for (regex, new_value) in regexes {
        contents = regex.replace_all(&contents, new_value);
    }
    println!("{}", contents);
}
If it worked, the expected result would be cccde with only two clones. But to make it compile, I have to clone on every iteration:
fn main() {
    let test = "abcde";
    let regexes = vec![
        (Regex::new("a").unwrap(), "b"),
        (Regex::new("b").unwrap(), "c"),
        (Regex::new("z").unwrap(), "-"),
    ];
    let mut contents = test.to_string();
    for (regex, new_value) in regexes {
        contents = regex.replace_all(&contents, new_value).to_string();
    }
    println!("{}", contents);
}
This outputs cccde, but with three clones.
Is it possible to avoid that somehow? I know I could call every regex and rebind the return value, but I do not control how many regexes there will be.
Thanks in advance!
EDIT 1
For those who want to see the real code:
It performs O(n^2) regex operations.
It starts here https://github.com/jaysonsantos/there-i-fixed-it/blob/ad214a27606bc595d80bb7c5968d4f80ac032e65/src/plan/executor.rs#L185-L192 and calls this https://github.com/jaysonsantos/there-i-fixed-it/blob/main/src/plan/mod.rs#L107-L115
EDIT 2
Here is the new code with the accepted answer https://github.com/jaysonsantos/there-i-fixed-it/commit/a4f5916b3e80749de269efa219b0689cb08551f2
You can do it by using a string as the persistent owner of the string as it is being replaced, and on each iteration, checking if the returned Cow is owned. If it is owned, you know the replacement was successful, so you assign the string that is owned by the Cow into the loop variable.
let mut contents = test.to_owned();
for (regex, new_value) in regexes {
    let new_contents = regex.replace_all(&contents, new_value);
    if let Cow::Owned(new_string) = new_contents {
        contents = new_string;
    }
}
Note that assignment in Rust is by default a 'move' - this means that the value of new_string is moved rather than copied into contents.
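To see the pattern in isolation, here is a small sketch that runs without the regex crate. The replace_all function below is a hypothetical stand-in that mimics the one property the answer relies on (returning Cow::Borrowed when nothing matched); it does plain substring replacement, not regex matching. The allocation counter is added only to make the effect visible.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for `Regex::replace_all`: returns the input
/// borrowed when there is nothing to replace, and a new String otherwise.
fn replace_all<'a>(text: &'a str, from: &str, to: &str) -> Cow<'a, str> {
    if text.contains(from) {
        Cow::Owned(text.replace(from, to))
    } else {
        Cow::Borrowed(text)
    }
}

/// Applies every (from, to) rule in order and returns the final text
/// together with the number of replacement strings that were allocated.
fn apply_all(input: &str, rules: &[(&str, &str)]) -> (String, usize) {
    let mut contents = input.to_owned();
    let mut allocations = 0;
    for (from, to) in rules {
        let replaced = replace_all(&contents, from, to);
        // Only take ownership when a replacement actually happened;
        // the String is moved into `contents`, not copied.
        if let Cow::Owned(new_string) = replaced {
            contents = new_string;
            allocations += 1;
        }
    }
    (contents, allocations)
}
```

For the input "abcde" and the three rules from the question, only the first two rules produce a new String; the "z" rule leaves contents untouched.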

CSV from_writer works on stdout(), but fails on from_path

Rust beginner here.
I've been trying to learn the CSV crate but got stuck on the following case.
My goal is to:
Parse a nested array
Set column names to array values
Write to CSV
Firstly here is the code that outputs exactly what I want it to.
use serde::Serialize;
use serde::Deserialize;
use csv;
use serde_json;
use std::io;
#[derive(Debug,Serialize,Deserialize)]
#[serde(transparent)]
struct Parent {
    arr_field: Vec<Row>
}
#[derive(Debug,Serialize,Deserialize)]
struct Row {
    a: u8,
    b: u8,
    c: u8,
}
fn main() {
    let resp = r#" [[1,2,3],[3,2,1],[4,5,6]] "#;
    let mut wtr = csv::WriterBuilder::new().from_writer(io::stdout());
    let v: Parent = serde_json::from_str(resp).unwrap();
    for row in v.arr_field{
        wtr.serialize(row);
    }
}
The output of this code is:
a,b,c
1,2,3
3,2,1
4,5,6
But when I want to save the output to a local file rather than stdout, like so:
let mut wtr = csv::WriterBuilder::new().from_path("./foo.csv");
I'm getting the following error at wtr.serialize
error[E0599]: no method named `serialize` found for enum `std::result::Result<Writer<File>, csv::Error>` in the current scope
Thank you for your help.
The error message tells you all you need to know: from_path returns a Result<Writer<File>, csv::Error> rather than a Writer, because opening the file can fail. from_writer is different: no file needs to be opened, so there is no error to report.
To fix this, you can call .unwrap() on the result, as you already do with serde_json::from_str on the line below. If an error occurs, this panics and terminates your program immediately.
let mut wtr = csv::WriterBuilder::new().from_path("./foo.csv").unwrap();
Note that serialize also returns a result, so you should also add .unwrap() or some other logic to handle errors in your for loop. Rust will likely show a warning that there is an unused result.
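As an alternative to unwrapping, the error can be handed back to the caller with the ? operator. The sketch below shows the idea using only the standard library: File::create stands in for from_path as the fallible constructor, and writeln! stands in for serialize. The function write_rows is hypothetical, and the csv crate itself is not used here.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Writes a fixed header and the given rows as comma-separated lines.
/// Every fallible step uses `?`, so the caller decides how to handle errors.
fn write_rows(path: &Path, rows: &[[u8; 3]]) -> io::Result<()> {
    // Like from_path, File::create returns a Result because opening can fail.
    let mut out = BufWriter::new(File::create(path)?);
    writeln!(out, "a,b,c")?;
    for row in rows {
        // Each write also returns a Result that should not be ignored.
        writeln!(out, "{},{},{}", row[0], row[1], row[2])?;
    }
    // Flush explicitly so buffered data reaches the file and errors surface.
    out.flush()
}
```

A main function declared as fn main() -> io::Result<()> could then call write_rows(Path::new("./foo.csv"), &rows)? and have any failure reported on exit instead of panicking.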

How to determine messages in .rs files generated from rust-protobuf included by include! macro

I previously asked How can I include an arbitrary set of Protobuf-built files without knowing their names? - this is a follow up question based on the results of that.
I now have a file that I include that contains the different modules on their own line - i.e.:
mod foo;
mod bar;
These modules and their names can be totally random depending on what the user has put in the directory for the proto files.
I need to perform operations on those random modules. For instance, the first thing I would like to do is get all the messages that exist in those new modules and present them back as strings that I can push onto a vector.
So really a 2 part question:
Is there a way to use the structures inside the modules that I am now including in this file with include!, generically, without knowing the module names?
After the above, how to get all the possible messages inside a protobuf generated .rs file/module. Each .rs file has a FileDescriptorProto() method, which looking on the Google protobuf documentation, looks similar to this: Google Protobuf FileDescriptor
What if you include a single file that is generated by the build.rs script? The script can scan the given directory and generate the proper file.
I do have an example I can link to, but it includes solutions to Project Euler problems, so I'm not sure how people feel about that.
Here is the build.rs that I use:
// Generate the problem list based on available modules.
use std::env;
use std::fs;
use std::io::prelude::*;
use std::fs::File;
use std::path::Path;
use regex::Regex;
extern crate regex;
fn main() {
    let odir = env::var("OUT_DIR").unwrap();
    let cwd = env::current_dir().unwrap().to_str().unwrap().to_owned();
    let dst = Path::new(&odir);
    let gen_name = dst.join("plist.rs");
    let mut f = File::create(&gen_name).unwrap();
    writeln!(&mut f, "// Auto-generated, do not edit.").unwrap();
    writeln!(&mut f, "").unwrap();
    writeln!(&mut f, "pub use super::Problem;").unwrap();
    writeln!(&mut f, "").unwrap();
    let problems = get_problems();
    // Generate the inputs.
    for &p in problems.iter() {
        writeln!(&mut f, "#[path=\"{1}/src/pr{0:03}.rs\"] mod pr{0:03};", p, cwd).unwrap();
    }
    writeln!(&mut f, "").unwrap();
    // Make the problem set. Trait objects need `dyn` in current Rust.
    writeln!(&mut f, "pub fn make() -> Vec<Box<dyn Problem>> {{").unwrap();
    writeln!(&mut f, " let mut probs = Vec::new();").unwrap();
    for &p in problems.iter() {
        writeln!(&mut f, " add_problem!(probs, pr{:03}::Solution);", p).unwrap();
    }
    writeln!(&mut f, " probs").unwrap();
    writeln!(&mut f, "}}").unwrap();
    drop(f);
}
// Get all of the problems, based on standard filenames of "src/prxxx.rs" where xxx is the problem
// number. Returns the result, sorted.
fn get_problems() -> Vec<u32> {
    let mut result = vec![];
    let re = Regex::new(r"^.*/pr(\d\d\d)\.rs$").unwrap();
    for entry in fs::read_dir(&Path::new("src")).unwrap() {
        let entry = entry.unwrap();
        let p = entry.path();
        let n = p.as_os_str().to_str();
        let name = match n {
            Some(n) => n,
            None => continue,
        };
        match re.captures(name) {
            None => continue,
            Some(cap) => {
                // Indexing replaces `cap.at(1)` from pre-1.0 versions of regex.
                let num: u32 = cap[1].parse().unwrap();
                result.push(num);
            },
        }
    }
    result.sort();
    result
}
Another source file under src then has the following:
include!(concat!(env!("OUT_DIR"), "/plist.rs"));
I have figured out a way to do this, based on @Shepmaster's suggestion in the comment on the original post:
Since Rust doesn't support reflection (at the time of this post), I had to expand my cargo build script to write code in the file that is being generated to have symbols that I would know would always be there.
I generated specific functions for each of the modules that I was including (since I had their module names at that point), and then generated "aggregate" functions that had generic names, that I could call back in my main code.
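A rough sketch of what such a generator could look like is below. It only builds the text of the generated file; a build script would write that text to a file under OUT_DIR as in the answer above. The names generate_aggregate and all_file_descriptors are made up for illustration, and the sketch assumes each protobuf-generated module exposes a file_descriptor_proto() function, as the question describes.

```rust
/// Builds the source text of the generated file for the given module names.
/// The emitted `all_file_descriptors` function has a fixed, known name,
/// so the main code can call it without knowing the module names.
fn generate_aggregate(modules: &[&str]) -> String {
    let mut src = String::from("// Auto-generated, do not edit.\n");

    // One `mod` line per discovered module.
    for m in modules {
        src.push_str(&format!("mod {};\n", m));
    }

    // Aggregate function with a generic name that calls into every module.
    // Assumption: each module has a `file_descriptor_proto()` function.
    src.push_str(
        "\npub fn all_file_descriptors() -> Vec<&'static protobuf::descriptor::FileDescriptorProto> {\n",
    );
    src.push_str("    vec![\n");
    for m in modules {
        src.push_str(&format!("        {}::file_descriptor_proto(),\n", m));
    }
    src.push_str("    ]\n}\n");
    src
}
```

The main code can then iterate over the returned descriptors to list the message names, whatever modules happened to be present at build time.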

Is this the right way to read lines from file and split them into words in Rust?

Editor's note: This code example is from a version of Rust prior to 1.0 and is not syntactically valid Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.
I've implemented the following method to return me the words from a file in a 2 dimensional data structure:
fn read_terms() -> Vec<Vec<String>> {
    let path = Path::new("terms.txt");
    let mut file = BufferedReader::new(File::open(&path));
    return file.lines().map(|x| x.unwrap().as_slice().words().map(|x| x.to_string()).collect()).collect();
}
Is this the right, idiomatic and efficient way in Rust? I'm wondering if collect() needs to be called so often and whether it's necessary to call to_string() here to allocate memory. Maybe the return type should be defined differently to be more idiomatic and efficient?
There is a shorter and more readable way of getting words from a text file.
use std::io::{BufRead, BufReader};
use std::fs::File;
let reader = BufReader::new(File::open("file.txt").expect("Cannot open file.txt"));
for line in reader.lines() {
    for word in line.unwrap().split_whitespace() {
        println!("word '{}'", word);
    }
}
You could instead read the entire file as a single String and then build a structure of references that points to the words inside:
use std::io::{self, Read};
use std::fs::File;
fn filename_to_string(s: &str) -> io::Result<String> {
    let mut file = File::open(s)?;
    let mut s = String::new();
    file.read_to_string(&mut s)?;
    Ok(s)
}
fn words_by_line<'a>(s: &'a str) -> Vec<Vec<&'a str>> {
    s.lines().map(|line| {
        line.split_whitespace().collect()
    }).collect()
}
fn example_use() {
    let whole_file = filename_to_string("terms.txt").unwrap();
    let wbyl = words_by_line(&whole_file);
    println!("{:?}", wbyl)
}
This will read the file with less overhead because it can slurp it into a single buffer, whereas reading lines with BufReader implies a lot of copying and allocating: first into the buffer inside BufReader, then into a newly allocated String for each line, and then into a newly allocated String for each word. It will also use less memory, because the single large String and the vectors of references are more compact than many individual Strings.
A drawback is that you can't directly return the structure of references, because it can't outlive the stack frame that holds the single large String. In example_use above, we have to bind the large String with a let in order to call words_by_line. It is possible to get around this with unsafe code, wrapping the String and the references in a private struct, but that is much more complicated.
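One way to sidestep that drawback without unsafe code, offered here as a sketch rather than as the approach the answer has in mind, is to store byte ranges instead of references. A struct can then own both the String and the ranges and be returned freely. The type Terms and its methods are hypothetical names.

```rust
use std::ops::Range;

/// Owns the whole text plus, for each line, the byte range of every word.
struct Terms {
    text: String,
    lines: Vec<Vec<Range<usize>>>,
}

impl Terms {
    fn new(text: String) -> Self {
        // Address of the start of the text, used to turn each word slice
        // back into an offset within `text`.
        let base = text.as_ptr() as usize;
        let lines: Vec<Vec<Range<usize>>> = text
            .lines()
            .map(|line| {
                line.split_whitespace()
                    .map(|word| {
                        let start = word.as_ptr() as usize - base;
                        start..start + word.len()
                    })
                    .collect()
            })
            .collect();
        // The ranges hold no borrow, so `text` can be moved into the struct.
        Terms { text, lines }
    }

    /// Returns the word at the given line and position, if it exists.
    fn word(&self, line: usize, index: usize) -> Option<&str> {
        let range = self.lines.get(line)?.get(index)?;
        Some(&self.text[range.clone()])
    }
}
```

The trade-off is that each lookup re-slices the String, and a Range<usize> is the same size as a &str, so this buys the ability to return the structure rather than any memory savings.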
