Returning objects with internal references


I'm currently working with rusqlite. I'd like to have an object that implements Iterator and returns my application struct, with values constructed from rows in a DB query.
The obvious way would be to wrap a struct around the Rows or MappedRows objects in the rusqlite API. However, each of these types contains a reference to the Statement object they derive from, and the Statement in turn refers to Connection. Thus, assuming a single long-lived Connection, the Statement object must be preserved for at least the lifetime of any Rows object I wrap.
I understand it's not generally allowed to have internal references within an object, due to concerns about it being moved. But the only way I can think of to structure this code is to preserve the Statement in the same wrapper I use around the Rows object. How can I manage this?

Well, the rusqlite docs specifically explain that Rows doesn't implement Iterator because of exactly those lifetime concerns.
So what you'll want to do is use the statement's query_map method:
https://docs.rs/rusqlite/latest/rusqlite/struct.Statement.html#method.query_map
Note that query_map returns an iterator over the results of applying a function to the rows returned by the query. So just pass in a function that constructs your application struct from a row, just like they do in the example on the main page:
let mut stmt = conn.prepare("SELECT id, name, data FROM person")?;
let person_iter = stmt.query_map([], |row| {
    Ok(Person {
        id: row.get(0)?,
        name: row.get(1)?,
        data: row.get(2)?,
    })
})?;
In this example your person_iter is an iterator over Person which itself doesn't actually wrap anything from the sql stuff (that would be poor separation of concerns anyway).
Unless I'm misunderstanding your question, this should solve your problem.
EDIT to address the comment:
The first issue (creation of application struct can fail separately from rusqlite) can be solved by disentangling things a bit:
In a first step, you'd use query_map purely to extract the raw data that will eventually be used in the application struct creation. And then in a second step you'd use that iterator to create your application struct object, which now can have any return type you want.
Now for your lifetime issues, if you insist on doing a fully lazy evaluation of your application struct, then, yeah, obviously the connection and everything else related to it needs to be kept alive and I'm not sure what the cleanest way would be for that. You could just collect the raw-data iterator into a Vec and then wrap that in an iterator that will build your application structs.
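The "collect first, then wrap" idea from the last paragraph could look roughly like the sketch below. It is std-only and does not touch rusqlite: RawPerson is a hypothetical stand-in for whatever tuple your query_map closure returns, and Person, build_person and PersonIter are made-up names for illustration. The point is only that the iterator owns its rows, so it holds no reference to a Statement or Connection.

```rust
use std::convert::TryFrom;

// Hypothetical stand-in for one row of raw column data, e.g. what a
// query_map closure might return before any application-level checks.
type RawPerson = (i64, String);

#[derive(Debug, PartialEq)]
struct Person {
    id: u32,
    name: String,
}

// Application-level construction that can fail independently of the database.
fn build_person(raw: RawPerson) -> Result<Person, String> {
    let (raw_id, name) = raw;
    let id = u32::try_from(raw_id).map_err(|_| format!("id {} out of range", raw_id))?;
    if name.is_empty() {
        return Err(format!("person {} has an empty name", id));
    }
    Ok(Person { id, name })
}

// Owns the collected rows, so it borrows nothing from the database layer.
struct PersonIter {
    rows: std::vec::IntoIter<RawPerson>,
}

impl PersonIter {
    fn new(rows: Vec<RawPerson>) -> Self {
        PersonIter { rows: rows.into_iter() }
    }
}

impl Iterator for PersonIter {
    type Item = Result<Person, String>;

    // Structs are still built lazily, one per call to next().
    fn next(&mut self) -> Option<Self::Item> {
        self.rows.next().map(build_person)
    }
}
```

The trade-off is that all raw rows are held in memory at once; only the conversion into application structs stays lazy.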

Related

rust unwrap_or_else return a reference

I have a situation in a piece of dynamic programming where I either want to get the precomputed results, or call a function to compute these results.
This is the situation in short:
let previous_results: HashMap<String, Result> = HashMap::new();
for i in some_values {
    let result = previous_results.get(i).unwrap_or_else(|| calculate_results(i));
}
The rust compiler justly complains about the function call and it says
expected reference `&HashMap<std::string::String, Result>`
found struct `HashMap<std::string::String, Result>`
This is because .get normally returns a reference to the object, and not the actual object, but the function returns an actual object. So I could just return a reference to what the function returns:
let result = previous_results.get(i).unwrap_or_else(|| &calculate_results(i));
Note the & in front of the function call. But this is also an issue, because a reference to something that is scoped within the anonymous function will be meaningless as soon as the anonymous function returns. And Rust complains:
cannot return reference to temporary value
returns a reference to data owned by the current function
What am I missing here? What is the correct approach to do this in Rust?

You cannot return a reference to a local value (in your case, the temporary created inside the unwrap_or_else callback), because the reference is invalidated as soon as that local value is dropped.
You could clone the value taken from the map, but that is usually not the most efficient way to go. And maybe the value isn't even cloneable.
So, that is one of the use cases for Cow:
use std::borrow::Cow;
for i in some_values {
    let result = previous_results
        .get(i)
        .map(Cow::Borrowed)
        .unwrap_or_else(|| Cow::Owned(calculate_results(i)));
}
And then you use the result more or less normally, as it implements Deref<Target = Result>. But remember that the value inside may be borrowed from the map, so the Cow<'_, Result> value keeps the map borrowed.
Funnily, while the name Cow stands for Clone-On-Write, many times it is used when you need an Owned_Or_Borrowed.
The main drawback of Cow is that it requires the value to be ToOwned, which basically means implementing Clone, because of that C in the name. If your Result type is not Clone, I think that currently there is no solution in std, but it is easy to implement your own or use an available crate such as maybe-owned.
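For the non-Clone case, a hand-rolled "owned or borrowed" type might look something like this. It is only a minimal sketch: MaybeOwned here is written from scratch (it is not the API of the maybe-owned crate), and Computed, calculate_results and lookup are hypothetical stand-ins for the types and functions in the question.

```rust
use std::collections::HashMap;
use std::ops::Deref;

// Either a borrowed or an owned T. Unlike Cow, T does not need to be Clone.
enum MaybeOwned<'a, T> {
    Borrowed(&'a T),
    Owned(T),
}

impl<'a, T> Deref for MaybeOwned<'a, T> {
    type Target = T;

    fn deref(&self) -> &T {
        match self {
            MaybeOwned::Borrowed(r) => *r,
            MaybeOwned::Owned(v) => v,
        }
    }
}

// Hypothetical result type that deliberately does not implement Clone.
struct Computed {
    value: u64,
}

// Hypothetical stand-in for the expensive computation in the question.
fn calculate_results(key: &str) -> Computed {
    Computed { value: key.len() as u64 }
}

// Returns the cached entry by reference, or a freshly computed owned value.
fn lookup<'a>(cache: &'a HashMap<String, Computed>, key: &str) -> MaybeOwned<'a, Computed> {
    cache
        .get(key)
        .map(MaybeOwned::Borrowed)
        .unwrap_or_else(|| MaybeOwned::Owned(calculate_results(key)))
}
```

Thanks to Deref, lookup(&cache, "abc").value reads the field the same way whether the value came from the map or was just computed.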

@rodrigo already answered my question with the correct approach, but I was a bit confused about what magic Cow was doing that solved my issue, so I want to clarify for others who might be confused as well.
The crux of my problem was that .get returns a reference to an element in the HashMap, but the function called by unwrap_or_else needs to return an owned value, or it won't be valid once we are outside of the anonymous function's scope.
So the solution that is a bit hidden in @rodrigo's answer is the map called after get, which wraps the reference returned by get in Cow::Borrowed, so that get and unwrap_or_else both produce the same Cow type, and the problem is solved.
Cow helps because cloning the value returned by .get is not the most efficient thing to do, unless you do need a copy later on. So Cow abstracts away the ownership and lets .get and unwrap_or_else return the same type, without having to clone anything.
@rodrigo's answer should still be the accepted answer as it correctly solves the problem, but I wanted to provide a bit more context as I wasn't quite sure how the ownership was resolved.

Rust E0382 - value used here after move

I am new to Rust and am really struggling to write code the Rust way. I understand its rules for enforcing memory correctness; however, I cannot figure out the changes required to make my code comply.
I have created a tree-like object from the JSON structure received from the application.
I am trying to implement two operations on the tree:
Get the leaves of tree
Get the mapping of parent -> children in a map
The high-level code looks like this:
fn rename_workspaces(conn: Connection) {
    let i3_info = I3Info::new(conn);
    let _leaves = i3_info.get_leaves();
    let _parent_child = i3_info.dfs_parent_child();
}
However, Rust complains that the i3_info variable has been used after the move. I understand the complaint, but I cannot figure out the correct Rust way to solve it.
Please help me figure out the change in thinking required to solve this. This is important, because my application really needs to perform these calculations on the tree structure multiple times.
The interesting thing is that I am not really mutating the structure, just iterating over it and returning a new structure from the function.
Source link: https://github.com/madhur/i3-auto-workspace-icons-rust/blob/main/src/main.rs

The problem is that you have declared the methods of I3Info such that they consume (move) the I3Info:
pub fn dfs_parent_child(self) ...
pub fn get_leaves(self) ...
To not consume the I3Info, allowing it to be used more than once, declare your methods to take references to the I3Info:
pub fn dfs_parent_child(&self) ...
pub fn get_leaves(&self) ...
You will need to modify the code within these methods, also, to work with references because this change also means you can no longer move things out of self — they have to be left intact. Sometimes this is as simple as putting & before a field access (&self.foo instead of self.foo), and sometimes it will require more extensive changes.
The general “Rust way of thinking” lessons here are:
Think about the type of your method receivers. self is not always right, and neither is &self.
Don't take ownership of values except when it makes sense. Passing by & reference is a good default choice (except for Copy types, like numbers).
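To make the &self version concrete, here is a small sketch. Tree and Node are simplified, hypothetical types (the real I3Info in the question wraps data from i3), and the method bodies are just one way to write the two traversals. What matters is that both methods borrow the tree and return data that refers into it, so they can be called any number of times.

```rust
use std::collections::HashMap;

// Hypothetical, simplified tree node.
struct Node {
    name: String,
    children: Vec<Node>,
}

struct Tree {
    root: Node,
}

impl Tree {
    // &self borrows the tree, so the caller can keep using it afterwards.
    fn get_leaves(&self) -> Vec<&str> {
        let mut leaves = Vec::new();
        let mut stack = vec![&self.root];
        while let Some(node) = stack.pop() {
            if node.children.is_empty() {
                leaves.push(node.name.as_str());
            }
            // Push in reverse so children are visited left to right.
            for child in node.children.iter().rev() {
                stack.push(child);
            }
        }
        leaves
    }

    // Maps each non-leaf node's name to the names of its direct children.
    fn dfs_parent_child(&self) -> HashMap<&str, Vec<&str>> {
        let mut map = HashMap::new();
        let mut stack = vec![&self.root];
        while let Some(node) = stack.pop() {
            if !node.children.is_empty() {
                let kids: Vec<&str> = node.children.iter().map(|c| c.name.as_str()).collect();
                map.insert(node.name.as_str(), kids);
            }
            for child in node.children.iter() {
                stack.push(child);
            }
        }
        map
    }
}
```

Because nothing is moved out of self, calling tree.get_leaves() followed by tree.dfs_parent_child() compiles without E0382.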

How to decide when function input params should be references or not?

When writing a function how does one decide whether to make input parameters referenced or consumed?
For example, should I do this?
fn foo(val: Bar) -> bool { check(val) } // version 1
Or use referenced param instead?
fn foo(val: &Bar) -> bool { check(*val) } // version 2
On the client side, if I only had the second version but wanted to have my value consumed, I'd have to do something like this:
// given in: Bar
let out = foo(&in); // using version 2 but wanting to consume ownership
drop(in);
On the other hand, if I only had the first version but wanted to keep my reference, I'd have to do something like this:
// given in: &Bar
let out = foo(in.clone()); // using version 1 but wanting to keep reference alive
So which is preferred, and why?
Are there any performance considerations in making this choice? Or does the compiler make them equivalent in terms of performance, and how?
And when would you want to offer both versions (via traits)? And for those times how do you write the underlying implementations for both functions -- do you duplicate the logic in each method signature or do you have one proxy to the other? Which to which, and why?

Rust's goal is to have performance and syntax similar to C/C++ without the memory problems. To do this it avoids things like garbage collection and instead enforces a particular strict memory model of "ownership" and "borrowing". These are critical concepts in Rust. I would suggest reading Understanding Ownership in The Rust Book.
The rules of memory ownership are...
Each value in Rust has a variable that’s called its owner.
There can only be one owner at a time.
When the owner goes out of scope, the value will be dropped.
Enforcing a single owner avoids a great many bugs and complications typical of C and C++ programs while avoiding complex and slow memory management at runtime.
You can't get very far with only that, so Rust provides references. A reference lets functions safely "borrow" data without taking ownership. You can have either as many immutable references as you like, or only one mutable reference.
When applied to function calls, passing a value passes ownership to the function. Passing a reference is "borrowing"; ownership is retained by the caller.
It's really, really important to understand ownership, borrowing, and later on, lifetimes. But here are some rules of thumb.
If your function needs to take ownership of the data, pass by value.
If your function only needs to read the data, pass a reference.
If your function needs to change the data, pass a mutable reference.
Note what's not in there: performance. Let the compiler take care of that.
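The three rules of thumb can be sketched with one function each. The function names and the log-redaction scenario are made up for illustration; only the parameter types matter.

```rust
// Needs ownership: the entry is stored in the log, so it is passed by value.
fn archive(log: &mut Vec<String>, entry: String) {
    log.push(entry);
}

// Only reads the data: a shared reference is enough.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

// Changes the data in place: a mutable reference.
fn redact(text: &mut String, secret: &str) {
    *text = text.replace(secret, "***");
}
```

A caller would typically go from least to most demanding: word_count(&msg), then redact(&mut msg, "abc123"), and finally archive(&mut log, msg), after which msg can no longer be used.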
Assuming check only reads data and checks that it's ok, it should take a reference. So your example would be...
fn foo(val: &Bar) -> bool { check(val) }
On the client side, if I only had the second version but wanted to have my value consumed...
There's no reason to want a function which takes a reference to do that. If it's the function's job to manage the memory, you pass it ownership. If it isn't, it's not its job to manage your memory.
There's also no need to manually call drop. You'd simply let the variable fall out of scope and it will be automatically dropped.
And when would you want to offer both versions (via traits)?
You wouldn't. If a function can take a reference there's no reason for it to take ownership.

If the function needs ownership, you should pass by value. If the function only needs a reference, you should pass by reference.
Passing by value fn foo(val: Bar) when it isn't necessary for the function to work could require the user to clone the value. Passing by reference is preferred in this case since a clone can be avoided.
Passing by reference fn foo(val: &Bar) when the function needs ownership would require it to either copy or clone the value. Pass by value is preferred in this case because it gives the user control whether an existing value's ownership is transferred or is cloned. The function doesn't have to make that decision and a clone can be avoided.
There are some exceptions, simple primitives like i32 can be passed-by-value without any performance penalty and may be more convenient.
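As an illustration of the "caller keeps control" point, here is a small sketch with a hypothetical Registry type. register needs ownership because it stores the value, so it takes Bar by value; contains only inspects, so it takes &Bar. The Bar definition is invented for the example.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Bar {
    label: String,
}

// Hypothetical container used only to show the two signatures side by side.
struct Registry {
    items: Vec<Bar>,
}

impl Registry {
    // Stores the value, so it takes ownership. The caller decides whether
    // to hand over the original or pay for a clone.
    fn register(&mut self, item: Bar) {
        self.items.push(item);
    }

    // Only inspects the value, so a reference is enough.
    fn contains(&self, item: &Bar) -> bool {
        self.items.contains(item)
    }
}
```

A caller that still needs its value writes reg.register(a.clone()); one that is done with it writes reg.register(b) and no clone happens at all.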
And when would you want to offer both versions (via traits)?
You could use the Borrow trait:
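use std::borrow::Borrow;
// Accepts anything that can be borrowed as a Bar, i.e. either Bar or &Bar.
fn foo<B: Borrow<Bar>>(val: B) -> bool {
    check(val.borrow())
}
let b: Bar = ...;
foo(&b); // both of
foo(b); // these work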

How do I collect the values of a HashMap into a vector?

I can not find a way to collect the values of a HashMap into a Vec in the documentation. I have score_table: HashMap<Id, Score> and I want to get all the Scores into all_scores: Vec<Score>.
I was tempted to use the values method (all_scores = score_table.values()), but it does not work since values is not a Vec.
I know that Values implements the ExactSizeIterator trait, but I do not know how to collect all values of an iterator into a vector without manually writing a for loop and pushing the values into the vector one by one.
I also tried to use std::iter::FromIterator; but ended with something like:
all_scores = Vec::from_iter(score_table.values());
expected type `std::vec::Vec<Score>`
found type `std::vec::Vec<&Score>`
Thanks to the question "Hash map macro refuses to type-check, failing with a misleading (and seemingly buggy) error message?", I changed it to:
all_scores = Vec::from_iter(score_table.values().cloned());
and cargo check no longer reports any errors.
Is this a good way to do it?

The method Iterator::collect is designed for this specific task. You're right that you need .cloned() if you want a vector of actual values instead of references (for types that implement Copy, like primitives, .copied() works as well), so the code looks like this:
all_scores = score_table.values().cloned().collect();
Internally, collect() just uses FromIterator, but it also infers the type of the output. Sometimes there isn't enough information to infer the type, so you may need to explicitly specify the type you want, like so:
all_scores = score_table.values().cloned().collect::<Vec<Score>>();

If you don't need score_table anymore, you can transfer the ownership of Score values to all_scores by:
let all_scores: Vec<Score> = score_table.into_iter()
.map(|(_id, score)| score)
.collect();
This approach will be faster and consume less memory than the clone approach by @apetranzilla. It also supports any struct, not only structs that implement Clone.

There are three useful methods on HashMaps, which all return iterators:
values() borrows the collection and returns references (&T).
values_mut() gives mutable references &mut T which is useful to modify elements of the collection without destroying score_table.
into_values() gives you the elements directly: T! The iterator takes ownership of all the elements. This means that score_table no longer owns them, so you can't use score_table anymore!
In your example, you call values() to get &T references, then convert them to owned values T via cloned().
Instead, if we have an iterator of owned values, then we can convert it to a Vec using Iterator::collect():
let all_scores: Vec<Score> = score_table.into_values().collect();
Sometimes, you may need to specify the collecting type:
let all_scores = score_table.into_values().collect::<Vec<Score>>();
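Here is a small sketch putting the three methods side by side. Id and Score are hypothetical definitions (the question does not show them), and the results are sorted only because HashMap iteration order is unspecified.

```rust
use std::collections::HashMap;

type Id = u32;

// Hypothetical score type; Ord is derived only so the results can be sorted.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Score(u32);

// values(): borrows the map, so each value must be cloned to get a Vec<Score>.
fn scores_cloned(table: &HashMap<Id, Score>) -> Vec<Score> {
    let mut scores: Vec<Score> = table.values().cloned().collect();
    scores.sort();
    scores
}

// values_mut(): modifies the values in place; the map stays usable.
fn add_bonus(table: &mut HashMap<Id, Score>, bonus: u32) {
    for score in table.values_mut() {
        score.0 += bonus;
    }
}

// into_values(): consumes the map and moves the values out without cloning.
fn scores_owned(table: HashMap<Id, Score>) -> Vec<Score> {
    let mut scores: Vec<Score> = table.into_values().collect();
    scores.sort();
    scores
}
```

After scores_owned(score_table) the map is gone, which is exactly the trade-off described above: no clones, but no map either.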

Prefer &str over String? Is it always the case?

The Rust documentation suggests using &str whenever possible and String only when it's not. Is that always the case? For example, I'm building a client for the REST API of a web service and I have an entity:
struct User {
    id: &str, // or String?
    name: &str, // or String?
    //......
}
So is it better to use &str or String in general and in this particular case?

In Rust everything related to a decision whether to use a reference or not stems from the basic concepts of ownership and borrowing and their applications. When you design your data structures, there is no clean rule: it wholly depends on your exact use case.
For example, if your data structure is intended to provide a view into some other data structure (like iterators do), then it makes sense to use references and slices as its fields. If, on the other hand, your structure is a DTO, it is more natural to make it own all of its data.
I believe that a suggestion to use &str where possible is more applicable to function definitions, and in this case it indeed is natural: if you make your functions accept &str instead of String, their caller will be able to use them easily and with no cost if they have either String or &str; on the other hand, if your functions accept Strings, then if their caller has &str, they will be forced to allocate a new String, and even if they have String but don't want to give up ownership, they still would need to clone it.
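A quick sketch of that difference, using a made-up validation function. The &str version accepts a borrowed String or a literal at no cost; the String version is included only for contrast and forces callers to give up or clone their value.

```rust
// Borrowing parameter: works for &String (via deref coercion) and for &str.
fn is_valid_username(name: &str) -> bool {
    !name.is_empty() && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

// Owning parameter, shown for contrast: a caller holding a &str must allocate,
// and a caller holding a String must move or clone it.
fn is_valid_username_owned(name: String) -> bool {
    is_valid_username(&name)
}
```

With let owned = String::from("alice_01"), both is_valid_username(&owned) and is_valid_username("bob") work directly, while the owned variant needs owned.clone() or "bob".to_string().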
But of course there are exceptions: sometimes you do want to transfer ownership into a function. Some data structures and traits, like Option or Read (called Reader before Rust 1.0), provide an ability to turn an owned variant into a borrowed one (Option::as_ref() and Read::by_ref()), which is sometimes useful. There is also a Cow type which kind of "abstracts" over ownership, allowing you to pass a borrowed value which will be cloned if necessary. Sometimes there is a trait like AsRef<[u8]> (which covers much of what the pre-1.0 BytesContainer trait did) that abstracts over various types, owning as well as borrowing, and allows the caller to pass values of different types.
What I wanted to stress, again, is that there is no fixed rule, and it wholly depends on the concrete task you're working on. You should use common sense and ownership/borrowing concepts when you architect your data structures.
In your particular case whether to use String or &str depends on what you will actually do with User objects - just "REST API client" is unfortunately too vague. It depends on your architecture. If these objects are used solely to perform an HTTP request, but the data is actually stored in some other source, then you would likely want to use &strs. If, on the other hand, User objects are used across your entire program, then it makes sense to make them own the data with Strings.
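The two designs could be sketched as follows. User is the owning variant; UserView is a borrowing variant that needs a lifetime parameter because it points into data stored elsewhere. The "id:name" line format and the parse_view helper are assumptions invented for the example, not part of any real API.

```rust
// Owning variant: can be stored, returned and passed around freely.
#[derive(Debug, Clone, PartialEq)]
struct User {
    id: String,
    name: String,
}

// Borrowing variant: a view into a buffer that lives somewhere else.
#[derive(Debug, PartialEq)]
struct UserView<'a> {
    id: &'a str,
    name: &'a str,
}

// Hypothetical "id:name" format; the view borrows directly from the input line.
fn parse_view<'a>(line: &'a str) -> Option<UserView<'a>> {
    let (id, name) = line.split_once(':')?;
    Some(UserView { id, name })
}

impl<'a> UserView<'a> {
    // Allocates owned copies, for when the data must outlive the buffer.
    fn to_owned_user(&self) -> User {
        User {
            id: self.id.to_string(),
            name: self.name.to_string(),
        }
    }
}
```

The view costs no allocation but cannot outlive the line it was parsed from; the owned User can, at the price of two String allocations.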
