Cannot get Hash::get_mut() and File::open() to agree about mutability - rust

During a lengthy computation, I need to look up some data in a number of different files. I cannot know beforehand how many or which files exactly, but chances are high that each file is used many times (on the order of 100 million times).
In the first version, I opened the file (whose name is an intermediate result of the computation) each time for lookup.
In the second version, I have a HashMap<String, Box<File>> where I remember already open files and open new ones lazily on demand.
I couldn't work out how to handle the mutability issues that arise because the Files need to be mutable. I got something working, but it looks overly silly:
let path = format!("egtb/{}.egtb", self.signature());
let hentry = hash.get_mut(&self.signature());
let mut file = match hentry {
Some(f) => f,
None => {
let rfile = File::open(&path);
let wtf = Box::new(match rfile {
Err(ioe) => return Err(format!("could not open EGTB file {} ({})", path, ioe)),
Ok(opened) => opened,
});
hash.insert(self.signature(), wtf);
// the following won't work
// wtf
// &wtf
// &mut wtf
// So I came up with the following, but it doesn't feel right, does it?
hash.get_mut(&self.signature()).unwrap()
}
};
Is there a canonical way to get a mut File from File::open() or File::create()? In the manuals, this is always done with:
let mut file = File::open("foo.txt")?;
This means my function would have to return Result<_, io::Error> and I can't have that.
The problem seems to be that with the hash-lookup Some(f) gives me a &mut File but the Ok(f) from File::open gives me just a File, and I don't know how to make a mutable reference from that, so that the match arm's types match. I have no clear idea why the version as above at least compiles, but I'd very much like to learn how to do that without getting the File from the HashMap again.

Attempts to use wtf after it has been inserted into the hashmap fail to compile because the value was moved into the hashmap. Instead, you need to obtain the reference into the value stored in the hashmap. To do so without a second lookup, you can use the entry API:
use std::collections::hash_map::Entry;
let path = format!("egtb/{}.egtb", self.signature());
let mut file = match hash.entry(self.signature()) {
Entry::Occupied(e) => e.into_mut(),
Entry::Vacant(e) => {
let rfile = File::open(&path)
.map_err(|ioe| format!("could not open EGTB file {} ({})", path, ioe))?;
e.insert(Box::new(rfile))
}
};
// `file` is `&mut Box<File>` (which derefs to `File`), use it as needed
Note that map_err() allows you to use ? even when your function returns a Result not immediately compatible with the one you have.
Also note that there is no reason to box the File, a HashMap<String, File> would work just as nicely.
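As a minimal sketch of that last point, the lookup can be wrapped in a helper that caches unboxed `File` values. The helper name `open_cached` and its `String` error type are assumptions made for illustration; they are not part of the original code.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::fs::File;

/// Hypothetical helper: returns the cached handle for `path`, opening the
/// file only on the first request. Each call performs a single hash lookup.
fn open_cached<'a>(
    cache: &'a mut HashMap<String, File>,
    path: &str,
) -> Result<&'a mut File, String> {
    match cache.entry(path.to_string()) {
        // Already open: hand out a reference to the stored value.
        Entry::Occupied(e) => Ok(e.into_mut()),
        // Not open yet: open the file, store it, and return a reference
        // to the value that now lives inside the map.
        Entry::Vacant(e) => {
            let file = File::open(path)
                .map_err(|ioe| format!("could not open EGTB file {} ({})", path, ioe))?;
            Ok(e.insert(file))
        }
    }
}
```

Note that `entry` takes an owned key, so this sketch allocates a `String` on every call.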

Related

error handling when unwrapping several try_into calls

I have a case where I need to parse some different values out from a vector.
I made a function for it that returns an Option, which should be either Some or None, depending on whether the unwrapping succeeds.
Currently it looks like this:
fn extract_edhoc_message(msg : Vec<u8>)-> Option<EdhocMessage>{
let mtype = msg[0];
let fcnt = msg[1..3].try_into().unwrap();
let devaddr = msg[3..7].try_into().unwrap();
let msg = msg[7..].try_into().unwrap();
Some(EdhocMessage {
m_type: mtype,
fcntup: fcnt,
devaddr: devaddr,
edhoc_msg: msg,
})
}
But, I would like to be able to return a None, if any of the unwrap calls fail.
I can do that by pattern matching on each of them and then explicitly returning None if anything fails, but that would be a lot of repeated code.
Is there any way to say something like:
"if any of these unwraps fail, return a None?"
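This is exactly what ? does. Because try_into() returns a Result and the function returns an Option, each Result is first converted with .ok(). It's even shorter than the .unwrap() version: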
fn extract_edhoc_message(msg: Vec<u8>) -> Option<EdhocMessage> {
let m_type = msg[0];
let fcntup = msg[1..3].try_into().ok()?;
let devaddr = msg[3..7].try_into().ok()?;
let edhoc_msg = msg[7..].try_into().ok()?;
Some(EdhocMessage {
m_type,
fcntup,
devaddr,
edhoc_msg
})
}
See this relevant part of the Rust Book.
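One caveat: indexing such as `msg[0]` or `msg[1..3]` still panics when the input is too short, and `?` does not catch that. A hedged sketch of a variant that uses `get` so that short input also yields `None` follows. The field types of `EdhocMessage` are assumptions, since the original struct definition is not shown, and the name `extract_edhoc_message_checked` is hypothetical.

```rust
use std::convert::TryInto;

// Assumed layout; the real struct is not shown in the question.
#[derive(Debug, PartialEq)]
struct EdhocMessage {
    m_type: u8,
    fcntup: [u8; 2],
    devaddr: [u8; 4],
    edhoc_msg: Vec<u8>,
}

/// Hypothetical variant: returns `None` instead of panicking on short input.
fn extract_edhoc_message_checked(msg: &[u8]) -> Option<EdhocMessage> {
    // `get` returns `None` when the index or range is out of bounds.
    let m_type = *msg.get(0)?;
    let fcntup: [u8; 2] = msg.get(1..3)?.try_into().ok()?;
    let devaddr: [u8; 4] = msg.get(3..7)?.try_into().ok()?;
    let edhoc_msg = msg.get(7..)?.to_vec();
    Some(EdhocMessage {
        m_type,
        fcntup,
        devaddr,
        edhoc_msg,
    })
}
```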

Is there a way to treat an absolute std::path::Path as a relative one when joining? [duplicate]

I think this should be quite doable, given that there is a nice function canonicalize which normalizes paths (so I can start by normalizing my two input paths) and Path and PathBuf give us a way of iterating over the parts of paths through components. I imagine something could be worked out here to factor out a common prefix, then prepend as many .. components as remain in the anchor path to what remains of the initial input path.
My problem seems to be pretty common:
How to find relative path given two absolute paths?
Find a path in Windows relative to another
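The idea described in the question (strip the common prefix, then prepend one `..` per remaining component of the anchor path) can be sketched with the standard library alone. The function name `relative_to` is hypothetical, and the sketch assumes both paths are absolute, already normalized (no `.` or `..` components), and free of symlinks.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper. Assumes both paths are absolute, normalized,
/// and contain no symlinks.
fn relative_to(path: &Path, base: &Path) -> PathBuf {
    let path_parts: Vec<Component> = path.components().collect();
    let base_parts: Vec<Component> = base.components().collect();

    // Length of the shared leading run of components.
    let common = path_parts
        .iter()
        .zip(base_parts.iter())
        .take_while(|(a, b)| a == b)
        .count();

    let mut result = PathBuf::new();
    // One `..` for every component of `base` beyond the common prefix.
    for _ in common..base_parts.len() {
        result.push("..");
    }
    // Then whatever remains of `path`.
    for part in &path_parts[common..] {
        result.push(part.as_os_str());
    }
    result
}
```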
This now exists as the pathdiff crate, using the code from kennytm's answer
You can use it as:
extern crate pathdiff;
pathdiff::diff_paths(path, base);
where base is the directory to which the returned relative path should be joined to obtain path.
If one path is a base of another, you could use Path::strip_prefix, but it won't calculate the ../ for you (instead returns an Err):
use std::path::*;
let base = Path::new("/foo/bar");
let child_a = Path::new("/foo/bar/a");
let child_b = Path::new("/foo/bar/b");
println!("{:?}", child_a.strip_prefix(base)); // Ok("a")
println!("{:?}", child_a.strip_prefix(child_b)); // Err(StripPrefixError(()))
The previous incarnation of strip_prefix was path_relative_from, which used to add ../, but this behavior was dropped because of symlinks. The two behaviors were described as follows:
The current behavior where joining the result onto the first path unambiguously refers to the same thing the second path does, even if there's symlinks (which basically means base needs to be a prefix of self)
The old behavior where the result can start with ../ components. Symlinks mean traversing the base path and then traversing the returned relative path may not put you in the same directory that traversing the self path does. But this operation is useful when either you're working with a path-based system that doesn't care about symlinks, or you've already resolved symlinks in the paths you're working with.
If you need the ../ behavior, you could copy the implementation from librustc_back (the compiler backend). I didn't find any packages on crates.io providing this yet.
// This routine is adapted from the *old* Path's `path_relative_from`
// function, which works differently from the new `relative_from` function.
// In particular, this handles the case on unix where both paths are
// absolute but with only the root as the common directory.
fn path_relative_from(path: &Path, base: &Path) -> Option<PathBuf> {
use std::path::Component;
if path.is_absolute() != base.is_absolute() {
if path.is_absolute() {
Some(PathBuf::from(path))
} else {
None
}
} else {
let mut ita = path.components();
let mut itb = base.components();
let mut comps: Vec<Component> = vec![];
loop {
match (ita.next(), itb.next()) {
(None, None) => break,
(Some(a), None) => {
comps.push(a);
comps.extend(ita.by_ref());
break;
}
(None, _) => comps.push(Component::ParentDir),
(Some(a), Some(b)) if comps.is_empty() && a == b => (),
(Some(a), Some(b)) if b == Component::CurDir => comps.push(a),
(Some(_), Some(b)) if b == Component::ParentDir => return None,
(Some(a), Some(_)) => {
comps.push(Component::ParentDir);
for _ in itb {
comps.push(Component::ParentDir);
}
comps.push(a);
comps.extend(ita.by_ref());
break;
}
}
}
Some(comps.iter().map(|c| c.as_os_str()).collect())
}
}

Why can't this be done with if?

I'm trying to handle errors received from an async call:
let res: Result<TcpStream, Box<dyn std::error::Error>> = session.runtime().borrow_mut().block_on(async {
let fut = TcpStream::connect(session.configuration().socket()).await?;
Ok(fut)
});
I tried to do it the old school way with an if but the compiler didn't like it:
if res.is_err() {
return Err(res);
}
After some googling I came across this:
let mut stream = match res {
Ok(res) => res,
Err(res) => return Err(res),
};
which feels very much the same but with Rust's equivalent of a switch statement. Why can't I use the if?
if res.is_err() { return res } should work, provided the function's return type is the same as the type of res. Result is an enum with two variants: Ok, which by convention holds a "successful" result, and Err, which holds error information. As John pointed out, wrapping the existing Result (which happens to hold an Err) in another Err doesn't make sense or, more precisely, doesn't match the return type of the function.
When you use match, you unpack the result into its constituent values, and then in the error case re-pack the error into a new result. Note that instead of the match expression you can use the ? operator, which would compress the declaration to just:
let mut stream = res?;
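The three forms can be compared side by side in a small standalone sketch. It uses string parsing instead of a TCP connection so that it runs without a network; all function names here are hypothetical.

```rust
use std::error::Error;

type BoxResult<T> = Result<T, Box<dyn Error>>;

// Stand-in for the fallible call in the question.
fn parse_port(s: &str) -> BoxResult<u16> {
    Ok(s.trim().parse::<u16>()?)
}

// `if` form: returning `res` itself compiles only because the function's
// return type is exactly the type of `res`.
fn next_port_if(s: &str) -> BoxResult<u16> {
    let res = parse_port(s);
    if res.is_err() {
        return res;
    }
    Ok(res.unwrap().saturating_add(1))
}

// `match` form: unpack the value, re-wrap the error.
fn next_port_match(s: &str) -> BoxResult<u16> {
    let port = match parse_port(s) {
        Ok(port) => port,
        Err(e) => return Err(e),
    };
    Ok(port.saturating_add(1))
}

// `?` form: same early return, written as an operator.
fn next_port_question(s: &str) -> BoxResult<u16> {
    let port = parse_port(s)?;
    Ok(port.saturating_add(1))
}
```

The `?` form also converts the error type through `From` where needed, which neither the `if` form nor the hand-written `match` does.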

Idiomatic rust way to properly parse Clap ArgMatches

I'm learning Rust and trying to make a find-like utility (yes, another one). I'm using clap and trying to support both the command line and a config file for the program's parameters (this has nothing to do with the clap yml file).
I'm trying to parse the commands, and if no commands were passed to the app, I will try to load them from a config file.
Now I don't know how to do this in an idiomatic way.
fn main() {
let matches = App::new("findx")
.version(crate_version!())
.author(crate_authors!())
.about("find + directory operations utility")
.arg(
Arg::with_name("paths")
...
)
.arg(
Arg::with_name("patterns")
...
)
.arg(
Arg::with_name("operation")
...
)
.get_matches();
let paths;
let patterns;
let operation;
if matches.is_present("patterns") && matches.is_present("operation") {
patterns = matches.values_of("patterns").unwrap().collect();
paths = matches.values_of("paths").unwrap_or(clap::Values<&str>{"./"}).collect(); // this doesn't work
operation = match matches.value_of("operation").unwrap() { // I dont like this
"Append" => Operation::Append,
"Prepend" => Operation::Prepend,
"Rename" => Operation::Rename,
_ => {
print!("Operation unsupported");
process::exit(1);
}
};
}else if Path::new("findx.yml").is_file(){
//TODO: try load from config file
}else{
eprintln!("Command line parameters or findx.yml file must be provided");
process::exit(1);
}
if let Err(e) = findx::run(Config {
paths: paths,
patterns: patterns,
operation: operation,
}) {
eprintln!("Application error: {}", e);
process::exit(1);
}
}
Is there an idiomatic way to extract Option and Result values into the enclosing scope? All the examples I have read use match or if let Some(x) to consume the x value inside the scope of the pattern match, but I need to assign the value to a variable.
Can someone help me with this, or point me to the right direction?
Best Regards
Personally I see nothing wrong with using the match statements and folding it or placing it in another function. But if you want to remove it there are many options.
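As a sketch of placing the match in another function, the string-to-enum conversion can live in a `std::str::FromStr` implementation, which lets the result be assigned directly to a variable. This uses only the standard library; the helper `operation_or_default`, the default of `Append`, and the `String` error type are assumptions for illustration, and `raw` stands in for whatever the argument parser returns for the "operation" value.

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
enum Operation {
    Append,
    Prepend,
    Rename,
}

impl FromStr for Operation {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Append" => Ok(Operation::Append),
            "Prepend" => Ok(Operation::Prepend),
            "Rename" => Ok(Operation::Rename),
            other => Err(format!("Operation unsupported: {}", other)),
        }
    }
}

/// Hypothetical helper: parses the optional raw value, falling back to
/// `Append` (an assumed default) when no value was given.
fn operation_or_default(raw: Option<&str>) -> Result<Operation, String> {
    match raw {
        Some(s) => s.parse::<Operation>(),
        None => Ok(Operation::Append),
    }
}
```

The caller can then write `let operation = operation_or_default(raw)?;` or handle the error once in `main`.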
One option is the .default_value_if() method on clap::Arg, which gives an argument a default value that depends on the value of another argument.
From the clap documentation
//sets value of arg "other" to "default" if value of "--opt" is "special"
let m = App::new("prog")
.arg(Arg::with_name("opt")
.takes_value(true)
.long("opt"))
.arg(Arg::with_name("other")
.long("other")
.default_value_if("opt", Some("special"), "default"))
.get_matches_from(vec![
"prog", "--opt", "special"
]);
assert_eq!(m.value_of("other"), Some("default"));
In addition you can add a validator to your operation OR convert your valid operation values into flags.
Here's an example converting your match arm values into individual flags (smaller example for clarity).
extern crate clap;
use clap::{Arg,App};
fn command_line_interface<'a>() -> clap::ArgMatches<'a> {
//Sets the command line interface of the program.
App::new("something")
.version("0.1")
.arg(Arg::with_name("rename")
.help("renames something")
.short("r")
.long("rename"))
.arg(Arg::with_name("prepend")
.help("prepends something")
.short("p")
.long("prepend"))
.arg(Arg::with_name("append")
.help("appends something")
.short("a")
.long("append"))
.get_matches()
}
#[derive(Debug)]
enum Operation {
Rename,
Append,
Prepend,
}
fn main() {
let matches = command_line_interface();
let operation = if matches.is_present("rename") {
Operation::Rename
} else if matches.is_present("prepend"){
Operation::Prepend
} else {
//DEFAULT
Operation::Append
};
println!("Value of operation is {:?}",operation);
}
I hope this helps!
EDIT:
You can also use Subcommands with your specific operations. It all depends on what you want the interface to be like.
let app_m = App::new("git")
.subcommand(SubCommand::with_name("clone"))
.subcommand(SubCommand::with_name("push"))
.subcommand(SubCommand::with_name("commit"))
.get_matches();
match app_m.subcommand() {
("clone", Some(sub_m)) => {}, // clone was used
("push", Some(sub_m)) => {}, // push was used
("commit", Some(sub_m)) => {}, // commit was used
_ => {}, // Either no subcommand or one not tested for...
}

Why is this hashmap search slower than expected?

What is the best way to check a hash map for a key?
Currently I am using this:
let hashmap = HashMap::<&str, &str>::new(); // Empty hashmap
let name = "random";
for i in 0..5000000 {
if !hashmap.contains_key(&name) {
// Do nothing
}
}
This seems to be fast in most cases and takes 0.06 seconds when run as shown, but when I use it in this following loop it becomes very slow and takes almost 1 min on my machine. (This is compiling with cargo run --release).
The code aims to open an external program, and loop over the output from that program.
let a = vec!["view", "-h"]; // Arguments to open process with
let mut child = Command::new("samtools").args(&a)
.stdout(Stdio::piped())
.spawn()
.unwrap();
let collect_pairs = HashMap::<&str, &str>::new();
if let Some(ref mut stdout) = child.stdout {
for line in BufReader::new(stdout).lines() {
// Do stuff here
let name = "random";
if !collect_pairs.contains_key(&name) {
// Do nothing
}
}
}
For some reason adding the if !collect_pairs.contains_key( line increases the run time by almost a minute. The output from child is around 5 million lines. All this code exists in fn main()
EDIT
This appears to fix the problem, resulting in a fast run time, but I do not know why the !hashmap.contains_key does not work well here:
let n: Option<&&str> = collect_pairs.get(name);
if match n {Some(v) => 1, None => 0} == 1 {
// Do something
}
One thing to consider is that HashMap<K, V> by default uses a hashing algorithm (currently SipHash) chosen for its resistance to HashDoS attacks rather than for raw speed, so it will always be a bit slow by nature.
get() boils down to
self.search(k).map(|bucket| bucket.into_refs().1)
contains_key is
self.search(k).is_some()
As such, that get() is faster for you seems strange to me, it's doing more work!
Also,
if match n {Some(v) => 1, None => 0} == 1 {
This can be written more idiomatically as
if let Some(v) = n {
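For reference, a small sketch of both lookup styles. The function names are hypothetical. It also shows that `!map.contains_key(..)` is true for a key that is absent, so on an empty map the body of such an `if` always runs.

```rust
use std::collections::HashMap;

/// Counts how many of `keys` are present, once via `contains_key`
/// and once via `get` with `if let`.
fn count_hits(map: &HashMap<&str, &str>, keys: &[&str]) -> (usize, usize) {
    let mut via_contains = 0;
    let mut via_get = 0;
    for &key in keys {
        if map.contains_key(key) {
            via_contains += 1;
        }
        if let Some(_value) = map.get(key) {
            via_get += 1;
        }
    }
    (via_contains, via_get)
}

/// Mirrors the condition from the question: true when `key` is absent.
fn is_missing(map: &HashMap<&str, &str>, key: &str) -> bool {
    !map.contains_key(key)
}
```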
I've found my problem; I'm sorry I didn't pick it up until now. I wasn't checking the return value of if !collect_pairs.contains_key(&name) properly. The condition evaluates to true, which results in the rest of the if block being run; I had assumed it was evaluating to false. Thanks for the help.
