Options, and_then() and tuples - rust

I'm convinced there is a way to handle this 'cleanly', I am just not quite figuring it out.
use git2::Repository;
// Prints out the current branch and sha if it exists.
fn report_repo() -> () {
Repository::open(".")
.ok()
.and_then(branch_and_sha)
.and_then(|branch_sha| => { // Fails here with E0061
let (branch, sha) = branch_sha;
println!("Branch={} sha={}", branch, sha);
None
});
}
fn branch_and_sha(repo: Repository) -> Option<(String, String)> {
match repo.head().ok() {
Some(reference) => {
match (reference.name(), reference.target()){
(Some(branch), Some(sha)) => Some((branch.to_string(), sha.to_string())),
_ => None
}
},
None => None
}
}
The error that arises is E0061, and I think it's because the 'value' in the Option returned from branch_and_sha() is a tuple. branch_and_sha() effectively says, "If there is a repository, get its reference, and if that exists and has both a name (branch) and a target (sha), return an Option<(String, String)> with that info; otherwise return None." The reporting function wants to do something if all of the Repository, branch and sha can be found, and nothing otherwise. (It shouldn't error or panic.)
To some degree this is contrived: it's an example of an optimistic reporting function similar to several I'd like to write. I'm looking for a clean, idiomatic way to do it. The key thrust is 'several depths and several branches could return None, which should cause a no-op, and otherwise make specific (leaf) info available.' My specific problem is how I should be handling the and_then function; similar questions are surprisingly hard to find.

First off, you have a minor typo. Closures in Rust don't use =>. So your closure should look more like
.and_then(|branch_sha| { // Note: No => here
let (branch, sha) = branch_sha;
println!("Branch={} sha={}", branch, sha);
None
});
Then the error we get is
--> so_cleanly.rs:15:10
|
15 | .and_then(|branch_sha| {
| ^^^^^^^^ cannot infer type for type parameter `U` declared on the associated function `and_then`
|
and_then is declared with two generic arguments: U and F (technically, there's also T, but that's determined by the type of the receiver self, so we won't worry about it). Now, F is the type of the closure and is always determined by the argument. On the other hand, U is the return type of the closure.
The closure must return an Option<U>. Rust needs to look at the closure and determine what its return type is. What does the closure return? It returns None, and None can be Option<U> for any U in existence. Rust doesn't know which one to use. We need to tell it. We could do that on the line we return None from
None as Option<()>
or in the and_then call itself.
.and_then::<(), _>(|branch_sha| { ... })
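As a minimal, self-contained sketch of both fixes: lookup is a hypothetical stand-in for branch_and_sha (so git2 is not needed), and the two report_* functions are names invented for illustration. The second variant uses None::<()>, an equivalent way of annotating the None. Each function discards the value by only checking is_none(), so U really does have to come from the annotation.

```rust
// Hypothetical stand-in for branch_and_sha(); it does not touch git2.
fn lookup(found: bool) -> Option<(String, String)> {
    if found {
        Some(("main".to_string(), "abc123".to_string()))
    } else {
        None
    }
}

// Fix 1: name U explicitly with the turbofish on and_then.
fn report_turbofish(found: bool) -> bool {
    let result = lookup(found).and_then::<(), _>(|(branch, sha)| {
        println!("Branch={} sha={}", branch, sha);
        None
    });
    result.is_none()
}

// Fix 2: annotate the None itself, so U is inferred from the closure body.
fn report_annotated(found: bool) -> bool {
    let result = lookup(found).and_then(|(branch, sha)| {
        println!("Branch={} sha={}", branch, sha);
        None::<()>
    });
    result.is_none()
}

fn main() {
    // Both chains end in None whether or not the lookup succeeded.
    assert!(report_turbofish(true));
    assert!(report_annotated(false));
}
```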
However, the compiler is making a very valid point. and_then and company produce a result of type Option, which you're ignoring. You're writing a piece of code which has side effects and doesn't produce a value, which is sensible, but you're using a functional interface intended for returning values. It can be done, but it's probably not idiomatic. I had to look at your code a few times before realizing that the () return value was not a typo.
One option is to return Option<()> from your report_repo. The () on the inside indicates that we don't care about anything except the side effects, and the Option lets the caller of report_repo handle (or ignore) any errors that occur during the process, whereas your current function simply suppresses all errors unconditionally.
fn report_repo() -> Option<()> {
Repository::open(".")
.ok()
.and_then(branch_and_sha)
.map(|branch_sha| {
let (branch, sha) = branch_sha;
println!("Branch={} sha={}", branch, sha);
// Implicit return of () here, which we could do explicitly if we wanted
})
}
I've made several subtle changes here. The return type is Option<()> now. In accordance with that, there's no semicolon at the end of the line inside the function (we're returning that value). Finally, the last and_then is a map, since the final step can't fail and simply does some work on Some.
That's an improvement, but it's probably still not how I'd write this function.
Instead, if you're performing code for side effects, consider using the ? operator, which does and_then and map shenanigans but keeps the control flow relatively linear. and_then and its friends are great for constructing values, but the point of your function is that it should read like a sequence of instructions, not a constructor for a value. This is how I would write that function.
fn report_repo() -> Option<()> {
let repo = Repository::open(".").ok()?;
let (branch, sha) = branch_and_sha(repo)?;
println!("Branch={} sha={}", branch, sha);
Some(())
}
Each line that ends in a ? effectively says "If this thing is None, return None now. Otherwise, keep going." But a cursory glance at the code reads "open the repo, get the branch and sha, and then print", which is exactly what you want people to see at a glance.
If we wanted to be really proper about this, we should probably return Result<(), Error>, where Error is some more detailed error type, but that's overkill for this simple example snippet.
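For the curious, a Result-based version might look roughly like the sketch below. Everything here is an assumption made for illustration: ReportError and its variants are invented, and open_repo and branch_and_sha are dependency-free stand-ins for the git2 calls above.

```rust
use std::fmt;

// Hypothetical error type; the variants are invented for this sketch.
#[derive(Debug, PartialEq)]
enum ReportError {
    NoRepository,
    NoBranchOrSha,
}

impl fmt::Display for ReportError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReportError::NoRepository => write!(f, "no repository found"),
            ReportError::NoBranchOrSha => write!(f, "HEAD has no branch name or target"),
        }
    }
}

impl std::error::Error for ReportError {}

// Stand-in for Repository::open(path).ok(); only "." counts as a repository.
fn open_repo(path: &str) -> Option<&str> {
    if path == "." { Some(path) } else { None }
}

// Stand-in for branch_and_sha(repo).
fn branch_and_sha(_repo: &str) -> Option<(String, String)> {
    Some(("main".to_string(), "abc123".to_string()))
}

fn report_repo(path: &str) -> Result<(), ReportError> {
    // ok_or turns each None into a specific error before ? propagates it.
    let repo = open_repo(path).ok_or(ReportError::NoRepository)?;
    let (branch, sha) = branch_and_sha(repo).ok_or(ReportError::NoBranchOrSha)?;
    println!("Branch={} sha={}", branch, sha);
    Ok(())
}

fn main() {
    assert_eq!(report_repo("."), Ok(()));
    assert_eq!(report_repo("/nowhere"), Err(ReportError::NoRepository));
}
```

The control flow is the same as in the Option version; the only difference is that the caller can now tell which step failed.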

You can choose an if let style too. You do not need the Option value, so you can simply stop chaining at some point, which feels more comfortable:
fn report_repo() {
if let Some((branch, sha)) = Repository::open(".").ok().and_then(branch_and_sha) {
println!("Branch={} sha={}", branch, sha);
}
}

Related

How can I best pattern match in Result::map

I know I can pattern match like this in rust
some_result.map(|some_number| {
match some_number {
1 => HttpResponse::NoContent().finish(),
_ => HttpResponse::NotFound().finish(),
}
})
but in Scala I can do like this
some_option.map {
case 1 => ???
case _ => ???
}
Is there a way to avoid the repetition of the variable some_number in the rust code above?
EDIT:
I found out I could do it this way, but I still think the original answer addressed my question best.
Ok(match result {
Ok(1) => HttpResponse::NoContent(),
Ok(_) => HttpResponse::NotFound(),
Err(_) => HttpResponse::InternalServerError()
}.finish())
It's all about the context, and in this case I didn't include much of it ...
EDIT #2:
Changed to another answer. I really like inverting the problem. And if/else is not idiomatic Rust, afaik.
If we're just bike-shedding style, you could avoid introducing some_number entirely by matching on the whole result:
match some_result {
Ok(1) => Ok(HttpResponse::NoContent().finish()),
Ok(_) => Ok(HttpResponse::NotFound().finish()),
Err(e) => Err(e)
};
But this just trades some_number for some Oks and Errs. I would generally prefer the original style, but beauty is in the eye of the beholder.
There is no way that I know of to avoid the repetition, however I think it might be more idiomatic to simply write
some_result.map(
|some_number|
if some_number == 1 {
HttpResponse::NoContent().finish()
} else {
HttpResponse::NotFound().finish()
}
)
since there is no need for a match in such a simple situation.
EDIT: Why is an if statement more idiomatic than a match in this situation?
The general idea is that match is more powerful than if (every if statement could be replaced by a match statement), therefore if is more specific, and thus should be used when possible (without matches!). The only exception is the switch/case use case, which could be expressed as a chain of if statements but where a match should be used instead.
But this is more of a guideline than an argument, so let's break down the reason why if is more idiomatic.
You start with something like
match some_number {
1 => { ... }
_ => { ... }
}
In the situation of
match x {
Pattern => { ... }
_ => { ... }
}
if let is more idiomatic. Since we're in this situation, we can rewrite
if let 1 = some_number { ... } else { ... }
However, in our case, we are matching a single literal, so it is more idiomatic to simply transform the if let into
if some_number == 1 { ... } else { ... }
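To make the three steps concrete, here is a small sketch in which all three forms behave identically. Response is a hypothetical stand-in for the HttpResponse values in the question, and the function names are invented.

```rust
// Hypothetical stand-in for the HTTP responses in the question.
#[derive(Debug, PartialEq)]
enum Response {
    NoContent,
    NotFound,
}

// Step 1: the original two-armed match.
fn with_match(some_number: i32) -> Response {
    match some_number {
        1 => Response::NoContent,
        _ => Response::NotFound,
    }
}

// Step 2: one pattern plus a catch-all becomes if let.
fn with_if_let(some_number: i32) -> Response {
    if let 1 = some_number {
        Response::NoContent
    } else {
        Response::NotFound
    }
}

// Step 3: the pattern is a single literal, so a plain comparison is enough.
fn with_if(some_number: i32) -> Response {
    if some_number == 1 {
        Response::NoContent
    } else {
        Response::NotFound
    }
}

fn main() {
    for n in [0, 1, 2] {
        assert_eq!(with_match(n), with_if_let(n));
        assert_eq!(with_match(n), with_if(n));
    }
    // Any of the three can be handed to Result::map.
    let mapped: Result<Response, String> = Ok(1).map(with_if);
    assert_eq!(mapped, Ok(Response::NoContent));
}
```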
The only exception is when you are planning to add more branching to the match statement, like
match some_number {
1 => { ... }
2 => { ... }
_ => { ... }
}
in which case it would make sense to keep it like that.
Keep in mind that being idiomatic also means being able to convey by the way you code your intention so that your programming becomes clear.
Note: Why is this more idiomatic than simply matching the whole result?
Most of the time, being idiomatic is a synonym of being concise. If you are being verbose, it's a good hint you're not being idiomatic. However, that is not always true, and this is a good example where being idiomatic means being more verbose.
When you are matching a result, you are expressing that you want to handle both the error and the ok case. When you are mapping, you are instead expressing that you are only interested in the ok case.
Most of the time, people don't want to handle the error case manually, and just add a ?. When they don't use ?, most of the time they want to handle the error case themselves. Finally, they might want neither to handle the error nor to get rid of it right away.
These three choices are increasingly verbose to implement, in line with how often each is used. This means that you should not aim for the least verbose one, but for the solution that matches your intention, so that a reader can grasp that intention just from the structure you chose.
In your original question, you seemed not to care about the error case, and also you didn't seem to want to get rid of it with ?, which is why I think that having an if statement inside a map is more idiomatic, in the sense that it is more clear and communicates better what you want to achieve. Indeed, I didn't even think about the error case, which is, IMO, what idiomatic means (ie. the capacity of adapting the way one thinks to ease the comprehension of code by writing it in the most expressive way, for a given language).
Finally, I would point out the most idiomatic choice for handling an error, that you didn't seem to take into account, and I wonder why.
if some_result? == 1 {
HttpResponse::NoContent().finish()
} else {
HttpResponse::NotFound().finish()
}
Where you have implemented an appropriate conversion from the eventual error type to the return type.
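A sketch of what such a conversion could look like, under assumptions: Response stands in for the framework's response type, DbError is an invented error type, and the function returns Result<Response, Response> so that ? has somewhere to put the converted error. A real web framework will have its own way of turning errors into responses.

```rust
// Hypothetical stand-ins for the framework response and the underlying error.
#[derive(Debug, PartialEq)]
enum Response {
    NoContent,
    NotFound,
    InternalServerError,
}

#[derive(Debug)]
struct DbError;

// The conversion that lets ? turn a DbError into the function's error type.
impl From<DbError> for Response {
    fn from(_: DbError) -> Self {
        Response::InternalServerError
    }
}

fn handle(some_result: Result<u32, DbError>) -> Result<Response, Response> {
    // On Err, ? returns early with Response::from(error).
    if some_result? == 1 {
        Ok(Response::NoContent)
    } else {
        Ok(Response::NotFound)
    }
}

fn main() {
    assert_eq!(handle(Ok(1)), Ok(Response::NoContent));
    assert_eq!(handle(Ok(2)), Ok(Response::NotFound));
    assert_eq!(handle(Err(DbError)), Err(Response::InternalServerError));
}
```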

Moving context into several closures?

I have found a way to move context into several closures, but it looks ugly. I do it with help of Rc and cloning each variable I need to use for each closure. Particularly I don't like to clone every variable for every closure I want to use:
let mut context = Rc::new( Context { a : 13 } );
..
let context_clone_1 = Rc::clone( &context );
engine.on_event1( Box::new( move ||
{
println!( "on_event1 : {}", context_clone_1.a );
...
let context_clone_2 = Rc::clone( &context );
engine.on_event2( Box::new( move ||
{
println!( "on_event2 : {}", context_clone_2.a );
...
It is a verbose way to go and I feel there must be a better way to do it. Also, uncommenting the line // context_clone_1.a += 1; breaks the compilation. What is the proper way of solving problems like this in Rust?
Here is a playground with minimal code.
There are two "problems" here:
Since you specifically asked about context_clone_1.a += 1;: When putting a value into an Rc, there could be multiple references to that value, derived from the independent Rc owners. If mutation was allowed, this would also allow simultaneous mutation and aliasing, which is not allowed in Rust; therefore Rc does not allow mutating its inner value. A common approach to regain mutability is to put the value into a RefCell, which provides mutability through try_borrow_mut() with a runtime check that ensures no aliasing occurs. A Rc<RefCell<T>> is commonly seen in Rust.
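A minimal sketch of the Rc<RefCell<T>> approach follows. Engine here is a hypothetical stand-in that merely stores and fires callbacks, not the Engine from the question, and the sketch uses borrow_mut(), which panics on a conflicting borrow; try_borrow_mut() is the variant that returns a Result instead.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Context {
    a: i32,
}

// Hypothetical minimal engine that just stores and fires callbacks.
struct Engine {
    callbacks: Vec<Box<dyn FnMut()>>,
}

impl Engine {
    fn new() -> Self {
        Engine { callbacks: Vec::new() }
    }

    fn on_event(&mut self, callback: Box<dyn FnMut()>) {
        self.callbacks.push(callback);
    }

    fn fire_all(&mut self) {
        for callback in self.callbacks.iter_mut() {
            callback();
        }
    }
}

fn run() -> i32 {
    let context = Rc::new(RefCell::new(Context { a: 13 }));
    let mut engine = Engine::new();

    // Each closure owns its own Rc handle to the same RefCell<Context>.
    let context_clone_1 = Rc::clone(&context);
    engine.on_event(Box::new(move || {
        context_clone_1.borrow_mut().a += 1;
    }));

    let context_clone_2 = Rc::clone(&context);
    engine.on_event(Box::new(move || {
        context_clone_2.borrow_mut().a += 1;
    }));

    engine.fire_all();
    let a = context.borrow().a;
    a
}

fn main() {
    // Two callbacks, each incrementing the shared counter once: 13 + 2.
    assert_eq!(run(), 15);
}
```

The cloning per closure does not go away with this approach; what changes is that mutation through the shared handle becomes possible, checked at runtime.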
Regarding the use of Rc: The way your code is currently set up is actually fine, at least if that's how it should work. The way the code is currently structured allows for flexibility, including cases where multiple Context-objects provide callback implementations on different events. For example, this is currently possible:
let context1 = Context { a : 13 };
engine.on_event1(Box::new(move ||
{
println!("on_event1 : {}", context1.a );
}));
let context2 = Context { a : 999 };
engine.on_event2(Box::new(move ||
{
println!("on_event2 : {}", context2.a );
}));
In case you have exactly one Context (as in your example), and since the Engine needs to make sure that all callbacks are alive while it itself is alive, you'll need to put each callback - which is structured as a completely separate thing - into a Rc. In your case, all Rc end up pointing to the same object; but they don't have to and this is what your code currently allows for.
A simpler solution would be to define a trait for Context, something along the lines of
trait EventDriver {
fn event1(&mut self, engine: &Engine);
fn event2(&mut self, engine: &Engine);
}
... and then have Context implement the trait. The Engine-struct then becomes generic over E: EventDriver and Context becomes the E in that. This solution only allows for exactly one instance of Context to provide event callbacks. But since Engine is the owner of that object, it can be sure that all callbacks are alive while it itself is alive and the whole Rc-thing goes away.
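A rough sketch of that design, under simplifying assumptions: the event methods here take no engine parameter (passing the engine back into its own driver needs more care with borrowing), and run and run_engine are invented for the demonstration.

```rust
trait EventDriver {
    fn event1(&mut self);
    fn event2(&mut self);
}

struct Context {
    a: i32,
}

impl EventDriver for Context {
    fn event1(&mut self) {
        // Plain &mut access: no Rc, no RefCell, no cloning.
        self.a += 1;
        println!("on_event1 : {}", self.a);
    }

    fn event2(&mut self) {
        self.a += 1;
        println!("on_event2 : {}", self.a);
    }
}

// The engine owns its single driver, so the driver lives as long as the engine.
struct Engine<E: EventDriver> {
    driver: E,
}

impl<E: EventDriver> Engine<E> {
    fn new(driver: E) -> Self {
        Engine { driver }
    }

    // Hypothetical event loop that fires each event once.
    fn run(&mut self) {
        self.driver.event1();
        self.driver.event2();
    }
}

fn run_engine() -> i32 {
    let mut engine = Engine::new(Context { a: 13 });
    engine.run();
    engine.driver.a
}

fn main() {
    assert_eq!(run_engine(), 15);
}
```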

How can I make this Rust code more idiomatic

Recently I started to learn Rust and one of my main struggles is converting years of object-oriented thinking into procedural code.
I'm trying to parse an XML document whose tags are each processed by a specific handler that can deal with the data it gets from the children.
Furthermore, I have some field members that are common between the handlers, and I would prefer not to have to write the same fields in all of them.
I tried my hand on it and my code came out like this:
use roxmltree::Node; // roxmltree = "0.14.0"
fn get_data_from(node: &Node) -> String {
let tag_name = get_node_name(node);
let tag_handler: dyn XMLTagHandler = match tag_name {
"name" => NameHandler::new(),
"phone" => PhoneHandler::new(),
_ => DefaultHandler::new()
}
if tag_handler.is_recursive() {
for child in node.children() {
let child_value = get_data_from(&child);
// do something with child value
}
}
let value: String = tag_handler.value()
value
}
// consider that handlers are on my project and can be adapted to my needs, and that XMLTagHandler is the trait that they share in common.
My main issues with this are:
This feels like a Object oriented approach to it;
is_recursive needs to be reimplemented for each struct because traits cannot have field members, and I will have to add more fields later, which means more boilerplate for each new field;
I could use one type for a Handler and pass it a function pointer, but this approach seems dirty, e.g.: => Handler::new(my_other_params, phone_handler_func)
This feels like a Object oriented approach to it
Actually, I don't think so. This code is in clear violation of the Tell-Don't-Ask principle, which falls out from the central idea of object-oriented programming: the encapsulation of data and related behavior into objects. The objects (NameHandler, PhoneHandler, etc.) don't have enough knowledge about what they are to do things on their own, so get_data_from has to query them for information and decide what to do, rather than simply sending a message and letting the object figure out how to deal with it.
So let's start by moving the knowledge about what to do with each kind of tag into the handler itself:
trait XmlTagHandler {
fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, callback: F);
}
impl XmlTagHandler for NameHandler {
fn foreach_child<F: FnMut(&Node)>(&self, _node: &Node, _callback: F) {
// "name" is not a recursive tag, so do nothing
}
}
impl XmlTagHandler for DefaultHandler {
fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, mut callback: F) {
// all other tags may be recursive
for child in node.children() {
// children() yields Node values, so pass a reference to the callback
callback(&child);
}
}
}
This way you call foreach_child on every kind of Handler, and let the handler itself decide whether the right action is to recurse or not. After all, that's why they have different types -- right?
To get rid of the dyn part, which is unnecessary, let's write a little generic helper function that uses XmlTagHandler to handle one specific kind of tag, and modify get_data_from so it just dispatches to the correct parameterized version of it. (I'll suppose that XmlTagHandler also has a new function so that you can create one generically.)
fn handle_tag<H: XmlTagHandler>(node: &Node) -> String {
let handler = H::new();
handler.foreach_child(node, |child| {
// do something with child value
});
handler.value()
}
fn get_data_from(node: &Node) -> String {
let tag_name = get_node_name(node);
match tag_name {
"name" => handle_tag::<NameHandler>(node),
"phone" => handle_tag::<PhoneHandler>(node),
_ => handle_tag::<DefaultHandler>(node),
}
}
If you don't like handle_tag::<SomeHandler>(node), also consider making handle_tag a provided method of XmlTagHandler, so you can instead write SomeHandler::handle(node).
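A sketch of the provided-method variant, with several assumptions stated up front: Node is a tiny hypothetical struct rather than roxmltree::Node, value takes the node and the collected child values (the signature in the question is not shown), and the Sized bound is there so the provided method can create a handler by value.

```rust
// Hypothetical stand-in for roxmltree::Node so the sketch has no dependencies.
struct Node {
    name: &'static str,
    text: &'static str,
    children: Vec<Node>,
}

trait XmlTagHandler: Sized {
    fn new() -> Self;
    fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, callback: F);
    // Assumed signature: combine the node and its children's values.
    fn value(&self, node: &Node, child_values: Vec<String>) -> String;

    // Provided method: every implementor gets SomeHandler::handle(node) for free.
    fn handle(node: &Node) -> String {
        let handler = Self::new();
        let mut child_values = Vec::new();
        handler.foreach_child(node, |child| child_values.push(get_data_from(child)));
        handler.value(node, child_values)
    }
}

struct NameHandler;
struct DefaultHandler;

impl XmlTagHandler for NameHandler {
    fn new() -> Self {
        NameHandler
    }
    fn foreach_child<F: FnMut(&Node)>(&self, _node: &Node, _callback: F) {
        // "name" is not a recursive tag, so do nothing
    }
    fn value(&self, node: &Node, _child_values: Vec<String>) -> String {
        node.text.to_string()
    }
}

impl XmlTagHandler for DefaultHandler {
    fn new() -> Self {
        DefaultHandler
    }
    fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, mut callback: F) {
        for child in &node.children {
            callback(child);
        }
    }
    fn value(&self, _node: &Node, child_values: Vec<String>) -> String {
        child_values.join(",")
    }
}

fn get_data_from(node: &Node) -> String {
    match node.name {
        "name" => NameHandler::handle(node),
        _ => DefaultHandler::handle(node),
    }
}

fn demo() -> String {
    let tree = Node {
        name: "person",
        text: "",
        children: vec![
            Node { name: "name", text: "Ada", children: vec![] },
            Node { name: "name", text: "Grace", children: vec![] },
        ],
    };
    get_data_from(&tree)
}

fn main() {
    assert_eq!(demo(), "Ada,Grace");
}
```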
Note that I have not really changed any of the data structures. Your presumption of an XmlTagHandler trait and various Handler implementors is a pretty normal way to organize code. However, in this case, it doesn't offer any real improvement over just writing three separate functions:
fn get_data_from(node: &Node) -> String {
let tag_name = get_node_name(node);
match tag_name {
"name" => get_name_from(node),
"phone" => get_phone_from(node),
_ => get_other_from(node),
}
}
In some languages, such as Java, all code has to be part of some class – so you can find yourself writing classes that don't exist for any other reason than to group related things together. In Rust you don't need to do this, so make sure that any added complication such as XmlTagHandler is actually pulling its weight.
is_recursive needs to be reimplemented to each struct because they traits cannot have field members, and I will have to add more fields later, which means more boilerplate for each new field
Without more information about the fields, it's impossible to really understand what problem you're facing here; however, in general, if there is a family of structs that have some data in common, you may want to make a generic struct instead of a trait. See the answers to How to reuse codes for Binary Search Tree, Red-Black Tree, and AVL Tree? for more suggestions.
I could use one type for a Handler and pass to it a function pointer, but this approach seems dirty
Elegance is sometimes a useful thing, but it is subjective. I would recommend closures rather than function pointers, but this suggestion doesn't seem "dirty" to me. Making closures and putting them in data structures is a very normal way to write Rust code. If you can elaborate on what you don't like about it, perhaps someone could point out ways to improve it.
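As one possible shape for the closure-based idea, here is a sketch with a single Handler type: shared fields are declared once, and the per-tag behavior is a boxed closure. The field names, handler_for, and the extraction rules are all invented for illustration.

```rust
// One concrete handler type: common fields live here once, and the per-tag
// behavior is stored as a boxed closure.
struct Handler {
    recursive: bool,
    extract: Box<dyn Fn(&str) -> String>,
}

impl Handler {
    fn new(recursive: bool, extract: impl Fn(&str) -> String + 'static) -> Self {
        Handler {
            recursive,
            extract: Box::new(extract),
        }
    }

    fn value(&self, raw: &str) -> String {
        (self.extract)(raw)
    }
}

// Hypothetical dispatch: choose the fields and the closure per tag name.
fn handler_for(tag_name: &str) -> Handler {
    match tag_name {
        "name" => Handler::new(false, |raw| raw.trim().to_string()),
        "phone" => Handler::new(false, |raw| {
            // Keep only the digits of the phone number.
            raw.chars().filter(|c| c.is_ascii_digit()).collect()
        }),
        _ => Handler::new(true, |raw| raw.to_string()),
    }
}

fn main() {
    assert_eq!(handler_for("name").value("  Ada "), "Ada");
    assert_eq!(handler_for("phone").value("+1 (555) 010"), "1555010");
    assert!(handler_for("address").recursive);
}
```

Adding another common field later means touching the struct and the constructor calls, not every handler implementation.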

How to solve "temporary value dropped while borrowed"

I'm learning Rust (coming from Javascript), and in Rust I'm trying to create a component-based UI template. This is the minimum example I can reproduce in a Rust playground.
I have a Vector of Enums. I want to add components that will return a new set of vectors. The component returns a vector from a member function that is not a reference.
let _new_children = match new_view.unwrap() {
View::View(children) => children, // reference &Vec<View>
View::Render(ref component) => component.render(), // struct Vec<View>
};
let _new_children = match new_view.unwrap() {
View::View(children) => children,
View::Render(ref component) => &component.render(), // temporary value dropped while borrowed
};
How can I solve this problem? Do I need to rewrite the way functions check the difference between two vectors (itertools has a zip_longest method, which I also use).
In order to return a reference to a temporary you need to make the temporary live longer than the use of that reference.
In your code the temporary object is dropped as soon as the match branch ends, so a reference to it cannot escape the match.
There is a nice trick in Rust to extend the lifetime of a temporary. It consists of declaring the temporary's name in the larger block where you want it to live, without initializing it. Then you assign-initialize it where the temporary object is actually created. Something like this:
let tmp_new;
let new_children = match new_view.unwrap() {
View::View(children) => children,
View::Render(ref component) => {
tmp_new = component.render();
&tmp_new
}
};
Now new_children is of type &Vec<_> and it will live for the shorter of the two lifetimes of the match branches.
Note that unless you initialize the temporary in every branch of your match you cannot use tmp_new after it, because you will get:
use of possibly-uninitialized variable: tmp_new
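A self-contained sketch of the deferred-initialization trick, using a cut-down, hypothetical version of the question's types (Component, its count field, and child_count are invented here):

```rust
// Hypothetical cut-down version of the question's types.
enum View {
    View(Vec<View>),
    Render(Component),
}

struct Component {
    count: usize,
}

impl Component {
    // Returns an owned Vec, like component.render() in the question.
    fn render(&self) -> Vec<View> {
        (0..self.count).map(|_| View::View(Vec::new())).collect()
    }
}

fn child_count(view: &View) -> usize {
    // Declared in the outer scope but not initialized yet.
    let rendered;
    let children: &Vec<View> = match view {
        // Already a reference into the existing view.
        View::View(children) => children,
        View::Render(component) => {
            // Initialized only on this branch; it outlives the match.
            rendered = component.render();
            &rendered
        }
    };
    children.len()
}

fn main() {
    assert_eq!(child_count(&View::Render(Component { count: 3 })), 3);
    assert_eq!(child_count(&View::View(vec![View::View(Vec::new())])), 1);
}
```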

Destructuring while iterating through a Vec<Enum(String)>

I have an enum called Token (can you guess what I'm trying to build? :P)
It looks an awful lot like this:
enum Token {
Paren(String),
Number(String),
Name(String),
}
Now, I have a function with the following signature:
fn tokenizer(input: String) -> Vec<Token>
I have no reason to believe it's not basically working, so I obviously have a Vec<Token>.
Now, in my main function I have this:
let tokens = tokenizer("(add 44 5)".to_owned());
and I'd like to do something like the following:
let mut iter = tokens.iter();
while let Some(token) = iter.next() {
match token {
Token::Paren(p) => println!("Token::Paren({})", p),
Token::Number(p) => println!("Token::Number({})", p),
Token::Name(p) => println!("Token::Name({})", p),
}
}
But obviously the borrow-checker isn't letting me get off so easily.
What's the proper way to do this? Obviously, as you can hopefully tell by the nature of this project, I'm just trying to learn Rust, so any advice would be helpful, even if it's not really related directly to the problem. =)
Your enum owns the strings that are passed in, so destructuring them will attempt to capture them by value (and move them out of the enum).
You can fix this by using ref p while destructuring to capture by reference, which stops the move.
match *token {
Token::Paren(ref p) => println!("Token::Paren({})", p),
Token::Number(ref p) => println!("Token::Number({})", p),
Token::Name(ref p) => println!("Token::Name({})", p),
}
Working sample on the Playpen
Note that you'll also need to dereference the token because you're using iter(), which returns references to the tokens in the vector. If you used into_iter(), it would transfer ownership out and you could match on non-references; however, your tokens variable would then be unusable, as the values have been moved.
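For what it's worth, on Rust 1.26 and later the loop from the question compiles as written, because default binding modes make p a reference when the match is on a &Token. A minimal sketch contrasting borrowing with iter() and consuming with into_iter(); describe, consume, and demo are helper names invented for illustration:

```rust
enum Token {
    Paren(String),
    Number(String),
    Name(String),
}

// Borrowing: iter() yields &Token, and each p below binds as &String,
// so nothing is moved out of the vector.
fn describe(tokens: &[Token]) -> Vec<String> {
    let mut lines = Vec::new();
    for token in tokens.iter() {
        let line = match token {
            Token::Paren(p) => format!("Token::Paren({})", p),
            Token::Number(p) => format!("Token::Number({})", p),
            Token::Name(p) => format!("Token::Name({})", p),
        };
        lines.push(line);
    }
    lines
}

// Consuming: into_iter() yields owned Tokens, so the Strings can be moved out.
fn consume(tokens: Vec<Token>) -> Vec<String> {
    tokens
        .into_iter()
        .map(|token| match token {
            Token::Paren(s) | Token::Number(s) | Token::Name(s) => s,
        })
        .collect()
}

fn demo() -> (Vec<String>, Vec<String>) {
    let tokens = vec![
        Token::Paren("(".to_string()),
        Token::Name("add".to_string()),
        Token::Number("44".to_string()),
    ];
    // tokens is still usable after describe, but not after consume.
    let described = describe(&tokens);
    let consumed = consume(tokens);
    (described, consumed)
}

fn main() {
    let (described, consumed) = demo();
    assert_eq!(described[1], "Token::Name(add)");
    assert_eq!(consumed, vec!["(", "add", "44"]);
}
```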

Resources