Clap - subcommands with possibly shared sets of default values? - rust

I've been going in circles on this for a while and haven't found a good solution:
I've got a bunch of simulations in the same codebase. I'm trying to be an adult and use command line arguments to pick which simulation runs and with what parameters. The issue is that lots of the arguments are shared between different simulations, and some sets are passed to shared components.
For instance, in the code below I have three simulations: two of them share the "Threaded" argument subset, and another two share the MazeCli argument subset, which would be wanted by the Maze trait. (In the full thing, there are more arguments and combinations.)
use clap::*;

#[derive(Parser, Debug)]
#[command(author, version, about, long_about = None)]
struct Cli {
    #[command(subcommand)]
    sim: Simulation,
}

#[derive(Subcommand, Debug)]
enum Simulation {
    Maze {
        maze_args: MazeCli,
        thread_args: ThreadCli,
    },
    ThreadedMaze {
        maze_args: MazeCli,
    },
    Threaded {
        thread_args: ThreadCli,
    },
}

trait Maze {
    fn new(args: MazeCli) -> Self;
}

#[derive(Args, Debug, Clone)]
struct MazeCli {
    squares: usize,
    openness: f64,
}

#[derive(Args, Debug, Clone)]
struct ThreadCli {
    threads: usize,
}

fn main() {
    let config = Cli::parse();
    println!("{:?}", config);
}
That version fails because it wants the MazeCli and ThreadCli to implement ValueEnum (as far as I can tell - the actual error is long and unhelpful). ValueEnum can't be derived for structs, though. I've tried a few other approaches, but have not gotten anything to compile.
I could do everything as one flat arg list, but then error messages won't tell you what's expected for the specific subcommand you run.
I could manually specify every list, but that's a bunch of boilerplate, especially when default values are factored in.
What's the right way to do this? Is it something clap supports at all?
Bonus questions:
If this is possible at all, is there any way for specific subcommands to override generic defaults for their specific case (like, if one simulation wanted twice as many threads as the norm by default)?
Can I make the default values somehow fall back to a Default trait implementation for that struct?

Use #[command(flatten)]:
#[derive(Subcommand, Debug)]
enum Simulation {
    Maze {
        #[command(flatten)]
        maze_args: MazeCli,
        #[command(flatten)]
        thread_args: ThreadCli,
    },
    ThreadedMaze {
        #[command(flatten)]
        maze_args: MazeCli,
    },
    Threaded {
        #[command(flatten)]
        thread_args: ThreadCli,
    },
}
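With the flattened structs, each subcommand's --help only lists the arguments that subcommand actually takes, and consuming the parsed value is an ordinary match. A sketch (the match-arm bodies just print here; in the real program they'd hand the args to the simulation constructors):

fn main() {
    let config = Cli::parse();
    match config.sim {
        Simulation::Maze { maze_args, thread_args } => {
            // e.g. pass maze_args to an impl of Maze::new and spawn thread_args.threads workers
            println!("maze: {:?}, threads: {:?}", maze_args, thread_args);
        }
        Simulation::ThreadedMaze { maze_args } => println!("threaded maze: {:?}", maze_args),
        Simulation::Threaded { thread_args } => println!("threaded: {:?}", thread_args),
    }
}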
Can I make the default values somehow fall back to a Default trait implementation for that struct?
Use default_value_t without a parameter. It defaults to Default::default():
#[derive(Args, Debug, Clone)]
struct MazeCli {
    #[arg(long, default_value_t)]
    squares: usize, // will be optional, with a default of 0
    openness: f64,
}
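default_value_t also accepts an expression, so if you'd rather the fallback come from your own Default impl for the whole struct (instead of each field type's default, e.g. 0 for usize), you can spell that out per field. A sketch; the concrete default values here are made up:

#[derive(Args, Debug, Clone)]
struct MazeCli {
    #[arg(long, default_value_t = MazeCli::default().squares)]
    squares: usize,
    #[arg(long, default_value_t = MazeCli::default().openness)]
    openness: f64,
}

impl Default for MazeCli {
    fn default() -> Self {
        MazeCli { squares: 16, openness: 0.5 } // hypothetical defaults
    }
}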
If this is possible at all, is there any way for specific subcommands to override generic defaults for their specific case (like, if one simulation wanted twice as many threads as the norm by default)?
I don't think it's possible without using separate structs.
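If you do go the separate-structs route, the override is just a second struct whose only difference is the default. A sketch with made-up numbers (8 as the usual default, 16 for the simulation that wants twice as many threads):

#[derive(Args, Debug, Clone)]
struct ThreadCli {
    #[arg(long, default_value_t = 8)]
    threads: usize,
}

#[derive(Args, Debug, Clone)]
struct WideThreadCli {
    #[arg(long, default_value_t = 16)] // twice the usual default
    threads: usize,
}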

Related

How do I inspect function arguments at runtime in Rust?

Say I have a trait that looks like this:
use std::{error::Error, fmt::Debug};

use super::CheckResult;

/// A Checker is a component that is responsible for checking a
/// particular aspect of the node under investigation, be that metrics,
/// system information, API checks, load tests, etc.
#[async_trait::async_trait]
pub trait Checker: Debug + Sync + Send {
    type Input: Debug;

    /// This function is expected to take input, whatever that may be,
    /// and return a vec of check results.
    async fn check(&self, input: &Self::Input) -> anyhow::Result<Vec<CheckResult>>;
}
And say I have two implementations of this trait:
pub struct ApiData {
    some_response: String,
}

pub struct MetricsData {
    number_of_events: u64,
}

pub struct ApiChecker;

impl Checker for ApiChecker {
    type Input = ApiData;
    // implement check function
}

pub struct MetricsChecker;

impl Checker for MetricsChecker {
    type Input = MetricsData;
    // implement check function
}
In my code I have a Vec of these Checkers that looks like this:
pub struct MyServer {
    checkers: Vec<Box<dyn Checker>>,
}
What I want to do is figure out, based on what Checkers are in this Vec, what data I need to fetch. For example, if it just contained an ApiChecker, I would only need to fetch the ApiData. If both ApiChecker and MetricsChecker were there, I'd need both ApiData and MetricsData. You can also imagine a third checker where Input = (ApiData, MetricsData). In that case I'd still just need to fetch ApiData and MetricsData once.
I imagine an approach where the Checker trait has an additional function on it that looks like this:
fn required_data(&self) -> HashSet<DataId>;
This could then return something like [DataId::Api, DataId::Metrics]. I would then run this for all Checkers in my vec and end up with a complete list of data I need to get. I could then do some complicated set of checks like this:
let mut required_data = HashSet::new();
for checker in checkers {
    required_data.extend(checker.required_data());
}

let mut api_data: Option<ApiData> = None;
if required_data.contains(&DataId::Api) {
    api_data = Some(get_api_data());
}
And so on for each of the data types.
I'd then pass them into the check calls like this:
api_checker.check(
    &api_data.expect("There was some logic error and we didn't get the API data even though a Checker declared that it needed it")
);
The reasons I want to fetch the data outside of the Checkers is:
To avoid fetching the same data multiple times.
To support memoization between unrelated calls where the arguments are the same (this could be done inside some kind of Fetcher trait implementation for example).
To support generic retry logic.
By now you can probably see that I've got two big problems:
The declaration of what data a specific Checker needs is duplicated, once in the function signature and again from the required_data function. This naturally introduces bug potential. Ideally this information would only be declared once.
Similarly, in the calling code, I have to trust that the data that the Checkers said they needed was actually accurate (the expect in the previous snippet). If it's not, and we didn't get data we needed, there will be problems.
I think both of these problems would be solved if the function signature, and specifically the Input associated type, was able to express this "required data" declaration on its own. Unfortunately I'm not sure how to do that. I see there is a nightly feature in std::any that implements Provider and Demand: https://doc.rust-lang.org/std/any/index.html#provider-and-demand. This sort of sounds like what I want, but I have to use stable Rust, plus I figure I must be missing something and there is an easier way to do this without going rogue with semi-dynamic typing.
tl;dr: How can I inspect what types the arguments are for a function (keeping in mind that the input might be more complex than just one thing, such as a struct or tuple) at runtime from outside the trait implementer? Alternatively, is there a better way to design this code that would eliminate the need for this kind of reflection?
Your problems start way earlier than you mention:
checkers: Vec<Box<dyn Checker>>
This is an incomplete type. The associated type Input means that Checker<Input = ApiData> and Checker<Input = MetricsData> are incompatible. How would you call checkers[0].check(input)? What type would input be? If you want a collection of "checkers" then you'll need a unified API, where the arguments to .check() are all the same.
I would suggest a different route altogether: instead of providing the input, provide a type that can retrieve whatever input the checkers ask for. That way there's no need to coordinate, in a type-safe way, which data each checker will request; it's inherent in the methods the checkers themselves call. And if your primary concern is repeatedly retrieving the same data for different checkers, then all you need to do is implement caching in the provider. Same with retry logic.
Here's my suggestion:
struct DataProvider { /* cached api and metrics */ }

impl DataProvider {
    fn fetch_api_data(&mut self) -> anyhow::Result<ApiData> { todo!() }
    fn fetch_metrics_data(&mut self) -> anyhow::Result<MetricsData> { todo!() }
}

#[async_trait::async_trait]
trait Checker {
    async fn check(&self, data: &mut DataProvider) -> anyhow::Result<Vec<CheckResult>>;
}

struct ApiAndMetricsChecker;

#[async_trait::async_trait]
impl Checker for ApiAndMetricsChecker {
    async fn check(&self, data: &mut DataProvider) -> anyhow::Result<Vec<CheckResult>> {
        let _api_data = data.fetch_api_data()?;
        let _metrics_data = data.fetch_metrics_data()?;
        // do something with api and metrics data
        todo!()
    }
}
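To make the caching concrete: the provider can hold an Option per data source and only hit the real fetcher on a miss. A sketch, assuming ApiData is Clone; load_api_data() is a hypothetical helper that does the actual (retryable) work:

struct DataProvider {
    api: Option<ApiData>,
    metrics: Option<MetricsData>,
}

impl DataProvider {
    fn fetch_api_data(&mut self) -> anyhow::Result<ApiData> {
        if self.api.is_none() {
            // Only the first caller pays for the real fetch; later checkers reuse the cached value.
            self.api = Some(load_api_data()?);
        }
        Ok(self.api.clone().unwrap())
    }
}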

A macro registering marked structures into an enum

I am searching for a macro, or a procedural macro, which could register some marked structures (or enums) as variants of an enum.
Moreover, I would like that, if the marked structures implement a trait, the enum implements that trait as well.
For example, I would like something like that:
trait MyTrait {
    fn print_it(&self);
}

#[register(RegEnum, MyTrait)]
struct Foo;

#[register(RegEnum, MyTrait)]
struct Bar;

impl MyTrait for Foo {
    fn print_it(&self) { println!("Foo"); }
}

impl MyTrait for Bar {
    fn print_it(&self) { println!("Bar"); }
}

fn main() {
    let reg_1 = RegEnum::Foo;
    let reg_2 = RegEnum::Bar;
    // This will print "Foo"
    reg_1.print_it();
    // This will print "Bar"
    reg_2.print_it();
}
Notice that the macro should build the enum, by considering which structures are marked. The enum would not be defined by the programmer.
Is there a way to do that?
First, you have no way of knowing how many structs are annotated with the attribute, so you have to pass this number for each invocation.
The macro should cache the structs, and when that number is reached, it builds and emits the enum together with the trait implementation. It should also error if the number has already been reached, so you're future-proofed against new structs instead of getting silent miscompilation.
Take a look at enum_dispatch's source for an example of a macro that does this.
Note that the order of items is not guaranteed, and moreover, with incremental compilation the macro can be called with only part of the items. Also, as far as I am aware, there is no guarantee that the compiler will use the same process for each invocation, so in theory this may break, but in practice it works well (I'm not sure whether it still works with incremental compilation, though).
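To make the target concrete, here is a hand-written sketch of what the macro would ultimately have to emit for the example above, once it has seen both marked structs (for structs with fields, the variants would wrap the values instead, which is what enum_dispatch generates):

enum RegEnum {
    Foo,
    Bar,
}

impl MyTrait for RegEnum {
    fn print_it(&self) {
        match self {
            RegEnum::Foo => Foo.print_it(),
            RegEnum::Bar => Bar.print_it(),
        }
    }
}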

Why does switching from a struct to an enum break the API, exactly?

I encountered an interesting change in a public PR.
Initially they had:
#[derive(Debug, Clone, PartialEq, Eq, Copy)]
pub struct ParseError(ParseErrorKind);

#[derive(Debug, Clone, PartialEq, Eq, Copy)]
enum ParseErrorKind {
    OutOfRange, // ... omitting other values here to be short
}
ParseError cannot be instantiated by clients because ParseErrorKind is private. They are making that enum public now, which seems ok, but I suggested an alternative: have ParseError be an enum itself, and leverage the type system instead of imitating it with the notion of "kind". They told me that would be an API breakage, and therefore was not ok.
I think I understand why in theory a struct and an enum are different. But I am not sure to understand why it is incompatible in this precise case.
Since the struct ParseError had no mutable field and cannot be instantiated by clients, there was nothing we could do with the type but to assign it and compare it. It seems both struct and enum support that, so client code is unlikely to require a change to compile with a newer version exposing an enum instead of struct. Or did I miss another use we could have with the struct, that would result in requiring a change in client code?
However, there might be an ABI incompatibility too. How does Rust handle the struct in practice, knowing that only the library can construct it? Is there any sort of allocation or deallocation mechanism that requires knowing precisely what ParseError is made of at build time? And does switching from that exact struct to an enum impact that? Or could it be safe in this particular case? And is it even relevant to try to maintain the ABI, since it is not guaranteed so far?
That's because every struct has fields, and hence this pattern will work for any struct, but will not compile with an enum:
struct Foo {}

fn returns_a_foo() -> Foo {
    Foo {} // anything that may return a Foo
}

if let Foo { .. } = returns_a_foo() {}
For example, this code compiles:
fn main() {
    if let String { .. } = String::new() {}
}
And while probably not code you'd write on your own, it's still possible to write, and additionally, possible to generate through a macro. Note that this is then, obviously, not compatible with an enum pattern match:
if let Option { .. } = None {
    // Compile error.
}
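Applied to the case at hand: a client could already have written something like the function below against the struct-based ParseError (hypothetical client code, not from the PR), and it would stop compiling if ParseError were turned into an enum:

// Compiles while ParseError is a struct, even though its field is private;
// a compile error if ParseError becomes an enum.
fn is_parse_error(e: &ParseError) -> bool {
    matches!(e, ParseError { .. })
}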

Updating the elements of a vector containing polymorphic instances in Rust

I am trying to use polymorphism in Rust but am having great difficulties. Here are my basic structures:
trait Constant { ... }

struct ConstantString { ... }
impl Constant for ConstantString { ... }

struct ConstantClass { ... }
impl Constant for ConstantClass { ... }

// other Constant implementations

struct JavaClass {
    constants: Vec<Box<dyn Constant>>,
}
First, I cannot manage to downcast, say, a Constant to a ConstantString, even when using match.
Also, I cannot manage to go through the constants and initialize them, especially as they reference each other, so each initialization needs to go through the constants vector. I tried various versions of the following inside a JavaClass method:
for constant in self.constants.iter_mut() {
    constant.init(&self);
}
to no avail, as I am hitting multiple borrow or immutable borrow errors.
But am I even using the right approach? Rust is behaving very differently than other languages when it comes to memory management.
After some further tries, here's how I worked it out. First of all, I got rid of the Constant trait and created a collection for each subtype, e.g.
pub struct JavaClass {
    constants_class: HashMap<usize, ConstantClass>,
    constants_string: HashMap<usize, ConstantString>,
    constants_string_ref: HashMap<usize, ConstantStringRef>,
    constants_method: HashMap<usize, ConstantMethod>,
    constants_field: HashMap<usize, ConstantField>,
    constants_name_type: HashMap<usize, ConstantNameType>,
}
Also, by building those inside the JavaClass constructor I was able to avoid any mutable-borrow errors. I found it easier to manipulate collections declared locally than ones that belong to self.
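A minimal sketch of that constructor shape (the constant types and their fields here are stand-ins, not the real class-file layout). Because the maps are built as locals, nothing borrows self while the entries are still being wired up to reference each other:

use std::collections::HashMap;

struct ConstantClass { name_index: usize } // hypothetical field
struct ConstantString { value: String }    // hypothetical field

pub struct JavaClass {
    constants_class: HashMap<usize, ConstantClass>,
    constants_string: HashMap<usize, ConstantString>,
}

impl JavaClass {
    pub fn new() -> JavaClass {
        // Local maps can be mutated and cross-referenced freely while parsing.
        let mut constants_string = HashMap::new();
        let mut constants_class = HashMap::new();
        constants_string.insert(1, ConstantString { value: "java/lang/Object".into() });
        constants_class.insert(2, ConstantClass { name_index: 1 });
        JavaClass { constants_class, constants_string }
    }
}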

Can I implement a trait while capturing the environment?

I'm trying to implement A* search for Advent of Code 2019 (Yes, Slowpoke, I know). I've started like this:
fn find_path(start: Coords, goal: Coords, map: &Vec<Vec<char>>) -> Vec<Coords> {
    struct Node {
        distance: u32,
        pos: Coords,
    }

    impl PartialEq for Node {
        fn eq(&self, other: &Self) -> bool {
            self.distance + manhattan(self.pos, goal) == other.distance + manhattan(other.pos, goal)
        }
    }
    ...
    let mut edge = BinaryHeap::new();
    edge.push(Node { distance: 0, pos: start });
    ...
Coords is a struct with an x and a y. The problem here is that I can't use goal in the trait impl, because it's not in scope. A closure would be able to capture it, but I am doubtful whether I can use a closure instead of a fn here. If so, what's the syntax? If not, is there a fundamental reason why it can't be done? I wasn't able to find an answer online.
I know the simple solution is to include goal in Node, but it's redundant because I'll be creating thousands of Node during A*, all of which will have the same goal, wasting memory and CPU cycles. In principle, goal could be a single global variable, but that's an untidy option.
Even though I'm sure including goal in Node would work fine in practice, I'd rather not.
Is there another idiomatic way of accomplishing what I'm trying to do?
No, you cannot capture any environment in an impl block. Closures capture the environment, so you cannot use a closure as a function in an impl block.
Functions and methods are designed to be called from any context, so there's no guarantee that there even is an environment to be captured. The fact that we can declare types, functions, methods, etc. inside of another function is basically a syntax nicety.
I'd probably create a type that wraps Node and goal:
struct Foo(Node, Coords);

impl Foo {
    fn value(&self) -> WhateverType {
        self.0.distance + manhattan(self.0.pos, self.1)
    }
}

impl PartialEq for Foo {
    fn eq(&self, other: &Self) -> bool {
        self.value() == other.value()
    }
}
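To actually push the wrapper into a std BinaryHeap it also needs Eq and Ord, with the comparison reversed so the cheapest node is popped first. A self-contained sketch; the Coords, Node and manhattan definitions are just illustrative fill-ins for the question's types:

use std::cmp::Ordering;
use std::collections::BinaryHeap;

#[derive(Clone, Copy)]
struct Coords { x: i32, y: i32 }

fn manhattan(a: Coords, b: Coords) -> u32 {
    a.x.abs_diff(b.x) + a.y.abs_diff(b.y)
}

struct Node { distance: u32, pos: Coords }

// Wrapper carrying the goal alongside each node, as suggested above.
struct Scored(Node, Coords);

impl Scored {
    fn value(&self) -> u32 {
        self.0.distance + manhattan(self.0.pos, self.1)
    }
}

impl PartialEq for Scored {
    fn eq(&self, other: &Self) -> bool { self.value() == other.value() }
}
impl Eq for Scored {}
impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> { Some(self.cmp(other)) }
}
impl Ord for Scored {
    // Reversed: BinaryHeap is a max-heap, so this pops the smallest estimated cost first.
    fn cmp(&self, other: &Self) -> Ordering {
        other.value().cmp(&self.value())
    }
}

fn main() {
    let goal = Coords { x: 5, y: 5 };
    let mut edge = BinaryHeap::new();
    edge.push(Scored(Node { distance: 0, pos: Coords { x: 0, y: 0 } }, goal));
}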
See also:
The petgraph crate
Implement graph-like datastructure in Rust
Which algorithm from petgraph will find the shortest path from A to B?
The reason why I need PartialEq is that the Nodes are subsequently going into a BinaryHeap. If the BinaryHeap had an option to provide a custom comparator, that would be perfect: I wouldn't need Node to be Ord, and I could let goal reside in that comparator (closure).
It seems this is being considered: https://github.com/rust-lang/rust/pull/69454
In the meantime, there's a crate that provides that functionality: https://crates.io/crates/binary-heap-plus
For now, I'm going to accept the overhead of goal inside Node but it's good to know my options.
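For the record, with binary-heap-plus the goal can live in the comparator closure, so Node itself stays goal-free. Something along these lines, reusing the Coords/Node/manhattan definitions from the sketch above (written from memory of the crate's closure-based constructor, so double-check its docs for the exact signature):

use binary_heap_plus::BinaryHeap;

let goal = Coords { x: 5, y: 5 };
// The closure captures goal; comparison is reversed so the smallest estimate pops first.
let mut edge = BinaryHeap::new_by(|a: &Node, b: &Node| {
    (b.distance + manhattan(b.pos, goal)).cmp(&(a.distance + manhattan(a.pos, goal)))
});
edge.push(Node { distance: 0, pos: Coords { x: 0, y: 0 } });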

Resources