I'm generating and expression from a string.
Let us suppose to have a data frame with a close column ant I'm trying to compute all in parallel through expressions.
If the function receives a string "rolling_mean" I need to compute close - close.rolling_mean(of some window).
I defined an expression for the close column to be used in match (don't repeat yourself):
let close_column = Expr::Column(Arc::from("close"));
After that I have a match to get the computation on close column:
close_column
- close_column.rolling_mean(RollingOptions {
window_size: Duration::new(window as i64),
min_periods: 1 as usize,
weights: None,
center: true,
by: None,
closed_window: None,
}))
The problem is that match statement moved the value of close_column inside the brace of match in the first value, so it is impossible to use for the second term (where I need rolling).
If I clone close_column the problem disappear (of course) but I'm not sure it is the best strategy.
Here my function:
pub fn generate_expr_from_str(function_name: &str, window: usize) -> Expr {
let close_column = Expr::Column(Arc::from("close"));
let expression = match function_name {
"rolling" => (close_column
- close_column.rolling_mean(RollingOptions {
window_size: Duration::new(window as i64),
min_periods: 1 as usize,
weights: None,
center: true,
by: None,
closed_window: None,
}))
.alias(format!("rolling_{window}").as_str()),
_ => panic!("function not implemented"),
};
expression
}
In case I clone even just the first close, the value is moved on the second and there is no problem for the compiler.
Suggestions?
Related
I am trying to implement a "transform" function that takes a non-copy enum value and and modifies it based on a parameter, but some of the parameters do nothing. A simplified example is:
enum NonCopy {
A,
B,
C
}
fn transform(to_transfrom: &mut NonCopy, param: u32) -> () {
*to_transfrom = match param {
// transformations
1 => NonCopy::A,
2 => NonCopy::B,
// retains the value
_ => *to_transfrom
};
}
I know that since the NonCopy doesn't implement the Copy trait the value inside to_transform cannot be moved out, however if param is neither 1 or 2 the value of *to_transform is assigned to itself, so it stays the same and nothing should be moved, but the compiler doesn't recognize that.
How can I achieve such a pattern with an assignment to a match expression?
I know I could instead assign in the match expression, but the non-example version is bigger and I do not want to repeat so much code plus it is quite ugly.
A neat little trick in Rust is that when you break out of an expression (via return, break, etc.), you don't actually need to provide a value for that expression. In this case, you can return from the match arm without providing a value:
enum NonCopy {
A,
B,
C
}
fn transform(to_transfrom: &mut NonCopy, param: u32) -> () {
*to_transfrom = match param {
// transformations
1 => NonCopy::A,
2 => NonCopy::B,
// retains the value
_ => return,
};
}
My goal is to move elements out of an owned Vec.
fn f<F>(x: Vec<F>) -> F {
match x.as_slice() {
&[a, b] => a,
_ => panic!(),
}
}
If F is copy, that is no problem as one can simply copy out of the slice. When F is not, slice patterns seem a no-go, as the slice is read only.
Is there such a thing as an "owned slice", or pattern matching on a Vec, to move elements out of x?
Edit: I now see that this code has the more general problem. The function
fn f<T>(x: Vec<T>) -> T {
x[0]
}
leaves "a hole in a Vec", even though it is dropped right after. This is not allowed. This post and this discussion describe that problem.
That leads to the updated question: How can a Vec<T> be properly consumed to do pattern matching?
If you insist on pattern matching, you could do this:
fn f<F>(x: Vec<F>) -> F {
let mut it = x.into_iter();
match (it.next(), it.next(), it.next()) {
(Some(x0), Some(_x1), None) => x0,
_ => panic!(),
}
}
However, if you just want to retrieve the first element of a 2-element vector (panicking in other cases), I guess I'd rather go with this:
fn f<F>(x: Vec<F>) -> F {
assert_eq!(x.len(), 2);
x.into_iter().next().unwrap()
}
You can't use pattern matching with slice patterns in this scenario.
As you have correctly mentioned in your question edits, moving a value out of a Vec leaves it with uninitialized memory. This could then cause Undefined Behaviour when the Vec is subsequently dropped, because its Drop implementation needs to free the heap memory, and possibly drop each element.
There is currently no way to express that your type parameter F does not have a Drop implementation or that it is safe for it to be coerced from uninitialized memory.
You pretty much have to forget the idea of using a slice pattern and write it more explicitly:
fn f<F>(mut x: Vec<F>) -> F {
x.drain(..).next().unwrap()
}
If you are dead set on pattern matching, you can use Itertools::tuples() to match on tuples instead:
use itertools::Itertools; // 0.9.0
fn f<F>(mut x: Vec<F>) -> F {
match x.drain(..).tuples().next() {
Some((a, _)) => a,
None => panic!()
}
}
One way to achieve consuming a single element of a vector is to swap the last element with the element you want to consume, and then pop the last element
fn f<F>(mut x: Vec<F>) -> F {
match x.as_slice() {
[_a, _b] => {
x.swap(0, 1);
x.pop().unwrap() // returns a
},
_ => panic!(),
}
}
The code uses an unwrap which isn't elegant.
This question already has answers here:
How to get the byte offset between `&str`
(2 answers)
Closed 3 years ago.
Given a string and a slice referring to some substring, is it possible to find the starting and ending index of the slice?
I have a ParseString function which takes in a reference to a string, and tries to parse it according to some grammar:
ParseString(inp_string: &str) -> Result<(), &str>
If the parsing is fine, the result is just Ok(()), but if there's some error, it usually is in some substring, and the error instance is Err(e), where e is a slice of that substring.
When given the substring where the error occurs, I want to say something like "Error from characters x to y", where x and y are the starting and ending indices of the erroneous substring.
I don't want to encode the position of the errors directly in Err, because I'm nesting these invocations, and the offsets in the nested slice might not correspond to the some slice in the top level string.
As long as all of your string slices borrow from the same string buffer, you can calculate offsets with simple pointer arithmetic. You need the following methods:
str::as_ptr(): Returns the pointer to the start of the string slice
A way to get the difference between two pointers. Right now, the easiest way is to just cast both pointers to usize (which is always a no-op) and then subtract those. On 1.47.0+, there is a method offset_from() which is slightly nicer.
Here is working code (Playground):
fn get_range(whole_buffer: &str, part: &str) -> (usize, usize) {
let start = part.as_ptr() as usize - whole_buffer.as_ptr() as usize;
let end = start + part.len();
(start, end)
}
fn main() {
let input = "Everyone ♥ Ümläuts!";
let part1 = &input[1..7];
println!("'{}' has offset {:?}", part1, get_range(input, part1));
let part2 = &input[7..16];
println!("'{}' has offset {:?}", part2, get_range(input, part2));
}
Rust actually used to have an unstable method for doing exactly this, but it was removed due to being obsolete, which was a bit odd considering the replacement didn't remotely have the same functionality.
That said, the implementation isn't that big, so you can just add the following to your code somewhere:
pub trait SubsliceOffset {
/**
Returns the byte offset of an inner slice relative to an enclosing outer slice.
Examples
```ignore
let string = "a\nb\nc";
let lines: Vec<&str> = string.lines().collect();
assert!(string.subslice_offset_stable(lines[0]) == Some(0)); // &"a"
assert!(string.subslice_offset_stable(lines[1]) == Some(2)); // &"b"
assert!(string.subslice_offset_stable(lines[2]) == Some(4)); // &"c"
assert!(string.subslice_offset_stable("other!") == None);
```
*/
fn subslice_offset_stable(&self, inner: &Self) -> Option<usize>;
}
impl SubsliceOffset for str {
fn subslice_offset_stable(&self, inner: &str) -> Option<usize> {
let self_beg = self.as_ptr() as usize;
let inner = inner.as_ptr() as usize;
if inner < self_beg || inner > self_beg.wrapping_add(self.len()) {
None
} else {
Some(inner.wrapping_sub(self_beg))
}
}
}
You can remove the _stable suffix if you don't need to support old versions of Rust; it's just there to avoid a name conflict with the now-removed subslice_offset method.
I have the following:
enum SomeType {
VariantA(String),
VariantB(String, i32),
}
fn transform(x: SomeType) -> SomeType {
// very complicated transformation, reusing parts of x in order to produce result:
match x {
SomeType::VariantA(s) => SomeType::VariantB(s, 0),
SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
}
}
fn main() {
let mut data = vec![
SomeType::VariantA("hello".to_string()),
SomeType::VariantA("bye".to_string()),
SomeType::VariantB("asdf".to_string(), 34),
];
}
I would now like to call transform on each element of data and store the resulting value back in data. I could do something like data.into_iter().map(transform).collect(), but this will allocate a new Vec. Is there a way to do this in-place, reusing the allocated memory of data? There once was Vec::map_in_place in Rust but it has been removed some time ago.
As a work-around, I've added a Dummy variant to SomeType and then do the following:
for x in &mut data {
let original = ::std::mem::replace(x, SomeType::Dummy);
*x = transform(original);
}
This does not feel right, and I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop. Is there a better way of doing this?
Your first problem is not map, it's transform.
transform takes ownership of its argument, while Vec has ownership of its arguments. Either one has to give, and poking a hole in the Vec would be a bad idea: what if transform panics?
The best fix, thus, is to change the signature of transform to:
fn transform(x: &mut SomeType) { ... }
then you can just do:
for x in &mut data { transform(x) }
Other solutions will be clunky, as they will need to deal with the fact that transform might panic.
No, it is not possible in general because the size of each element might change as the mapping is performed (fn transform(u8) -> u32).
Even when the sizes are the same, it's non-trivial.
In this case, you don't need to create a Dummy variant because creating an empty String is cheap; only 3 pointer-sized values and no heap allocation:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
let old = std::mem::replace(self, VariantA(String::new()));
// Note this line for the detailed explanation
*self = match old {
VariantA(s) => VariantB(s, 0),
VariantB(s, i) => VariantB(s, 2 * i),
};
}
}
for x in &mut data {
x.transform();
}
An alternate implementation that just replaces the String:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
*self = match self {
VariantA(s) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 0)
}
VariantB(s, i) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 2 * *i)
}
};
}
}
In general, yes, you have to create some dummy value to do this generically and with safe code. Many times, you can wrap your whole element in Option and call Option::take to achieve the same effect .
See also:
Change enum variant while moving the field to the new variant
Why is it so complicated?
See this proposed and now-closed RFC for lots of related discussion. My understanding of that RFC (and the complexities behind it) is that there's an time period where your value would have an undefined value, which is not safe. If a panic were to happen at that exact second, then when your value is dropped, you might trigger undefined behavior, a bad thing.
If your code were to panic at the commented line, then the value of self is a concrete, known value. If it were some unknown value, dropping that string would try to drop that unknown value, and we are back in C. This is the purpose of the Dummy value - to always have a known-good value stored.
You even hinted at this (emphasis mine):
I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop
That "should" is the problem. During a panic, that dummy value is visible.
See also:
How can I swap in a new value for a field in a mutable reference to a structure?
Temporarily move out of borrowed content
How do I move out of a struct field that is an Option?
The now-removed implementation of Vec::map_in_place spans almost 175 lines of code, most of having to deal with unsafe code and reasoning why it is actually safe! Some crates have re-implemented this concept and attempted to make it safe; you can see an example in Sebastian Redl's answer.
You can write a map_in_place in terms of the take_mut or replace_with crates:
fn map_in_place<T, F>(v: &mut [T], f: F)
where
F: Fn(T) -> T,
{
for e in v {
take_mut::take(e, f);
}
}
However, if this panics in the supplied function, the program aborts completely; you cannot recover from the panic.
Alternatively, you could supply a placeholder element that sits in the empty spot while the inner function executes:
use std::mem;
fn map_in_place_with_placeholder<T, F>(v: &mut [T], f: F, mut placeholder: T)
where
F: Fn(T) -> T,
{
for e in v {
let mut tmp = mem::replace(e, placeholder);
tmp = f(tmp);
placeholder = mem::replace(e, tmp);
}
}
If this panics, the placeholder you supplied will sit in the panicked slot.
Finally, you could produce the placeholder on-demand; basically replace take_mut::take with take_mut::take_or_recover in the first version.
I'm trying to implement a cons list in Rust as an exercise. I've managed to solve all of my compiler errors except this one:
Compiling list v0.0.1 (file:///home/nate/git/rust/list)
/home/nate/git/rust/list/src/main.rs:18:24: 18:60 error: borrowed value does not live long enough
/home/nate/git/rust/list/src/main.rs:18 List::End => list = &*(box List::Node(x, box List::End)),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/nate/git/rust/list/src/main.rs:16:34: 21:2 note: reference must be valid for the anonymous lifetime #1 defined on the block at 16:33...
/home/nate/git/rust/list/src/main.rs:16 fn add(mut list: &List, x: uint) {
/home/nate/git/rust/list/src/main.rs:17 match *list {
/home/nate/git/rust/list/src/main.rs:18 List::End => list = &*(box List::Node(x, box List::End)),
/home/nate/git/rust/list/src/main.rs:19 List::Node(_, ref next_node) => add(&**next_node, x),
/home/nate/git/rust/list/src/main.rs:20 }
/home/nate/git/rust/list/src/main.rs:21 }
/home/nate/git/rust/list/src/main.rs:18:16: 18:60 note: ...but borrowed value is only valid for the expression at 18:15
/home/nate/git/rust/list/src/main.rs:18 List::End => list = &*(box List::Node(x, box List::End)),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
error: aborting due to previous error
Could not compile `list`.
To learn more, run the command again with --verbose.
And the code that I'm trying to compile:
enum List {
Node(uint, Box<List>),
End,
}
fn main() {
let mut list = new();
add(&*list, 10);
//add(list, 20);
//add(list, 30);
print(&*list);
}
fn add(mut list: &List, x: uint) {
match *list {
List::End => list = &*(box List::Node(x, box List::End)),
List::Node(_, ref next_node) => add(&**next_node, x),
}
}
fn new() -> Box<List> {
box List::End
}
So why don't the boxed values live long enough? Is it because I immediately dereference them? I tried it this way:
match *list {
List::End => {
let end = box List::Node(x, box List::End);
list = &*end;
}
List::Node(_, ref next_node) => add(&**next_node, x),
}
But I got exactly the same error. What am I missing?
I think you’re missing some key details of Rust; there are three things that I think we need to deal with:
How patterns work;
The distinction between immutable (&) and mutable (&mut) references;
How Rust’s ownership model works (because of your &*box attempts).
I’ll deal with the patterns part first; in fn add(mut list: &List, x: uint), there are two patterns being used, mut list and x. Other examples of patterns are the left hand side of let lhs = rhs; and the bit before the => on each branch of a match expression. How are these patterns applied to calls, effectively? It’s really like you’re doing this:
fn add(__arg_0: &List, __arg_1: uint) {
let mut list = __arg_0;
let x = __arg_1;
…
}
Perhaps that way of looking at things will make it clearer; the signature of a function does not take the patterns that the variables are bound to into account at all. Your function signature is actually in canonical form fn add(&List, uint). The mut list part just means that you are binding the &List value to a mutable name; that is, you can assign a new value to the list name, but it has no effect outside the function, it’s purely a matter of the binding of a variable to a location.
Now onto the second issue: learn the distinction between immutable references (type &T, value &x) and mutable references (type &mut T, value &x). These are so fundamental that I won’t go into much detail here—they’re documented sufficiently elsewhere and you should probably read those things. Suffice it to say: if you wish to mutate something, you need &mut, not &, so your add method needs to take &mut List.
The third issue, that of ownership: in Rust, each object is owned in precisely one location; there is no garbage collection or anything, and this uniqueness of ownership means that as soon as an object passes out of scope it is destroyed. In this case, the offending expression is &*(box List::Node(x, box List::End)). You have boxed a value, but you haven’t actually stored it anywhere: you have just tried to take a reference to the value contained inside it, but the box will be immediately dropped. What you actually want in this case is to modify the contents of the List; you want to write *list = List::Node(x, box List::End), meaning “store a List::Node value inside the contents of list” instead of list = &…, meaning “assign to the variable list a new reference”.
You’ve also gone a tad overboard with the boxing of values; I’d tend to say that new() should return a List, not a Box<List>, though the question is slightly open to debate. Anyway, here’s the add method that I end up with:
fn add(list: &mut List, x: uint) {
match *list {
List::End => *list = List::Node(x, box List::End),
List::Node(_, box ref mut next_node) => add(next_node, x),
}
}
The main bit you may have difficulty with there is the pattern box ref mut next_node. The box ref mut part reads “take the value out of its box, then create a mutable reference to that value”; hence, given a Box<List>, it produces a &mut List referring to the contents of that box. Remember that patterns are completely back to front compared with normal expressions.
Finally, I would strongly recommend using impls for all of this, putting all the methods on the List type:
enum List {
Node(uint, Box<List>),
End,
}
impl List {
fn new() -> List {
List::End
}
fn add(&mut self, x: uint) {
match *self {
List::End => *self = List::Node(x, box List::End),
List::Node(_, box ref mut next_node) => next_node.add(x),
}
}
}
fn main() {
let mut list = List::new();
list.add(10);
}
Your attempts to fix the other compiler errors have, unfortunately, led you to a dark place of inconsistency. First you need to make up your mind whether you want a Box<List> or a List as your handle for the head of a (sub-)list. Let's go with List because that is more flexible and generally the path of least resistance.
Then, you need to realize there is a difference between mut list: &List and list: &mut List. The first is a read-only reference which you can change to point at other read-only things. The second is a read-write reference which you can not change to point at other read-write things. There's also mut list: &mut List because these two capabilities are orthogonal.
In add, when you write list = ..., you are only affecting your local variable. It has no effect on the caller. You want to change the list node the caller sees! Since we said we wanted to deal with List, not boxes, we'll change the contents of the list nodes that exist (replacing the final End with a Node(..., box End)). That is, signature and code change as follows:
fn add(list: &mut List, x: uint) {
match *list {
List::End => *list = List::Node(x, box List::End),
List::Node(_, box ref mut next_node) => add(next_node, x),
}
}
Note that *list = is different from list =, in that we now change the contents of the list node in-place instead of making our local reference point at a different node.
For consistency and ergonomics (and a tiny bit of efficiency), you should probably change new to return a bare List, i.e.:
fn new() -> List {
List::End
}
This also saves you all the nasty reborrowing (&*) in the calls:
let list = new(); // type: List
add(&mut list, 10);
Finally, as for why the box did not live long enough: Well, you basically created a local/temporary box, took a reference to it, and then attempted to pass on the reference without keeping the box alive. A box without an owner is deallocated, so you need to give it an owner. In the fixed code above, the owner is the next field of the List::Node we create and write to the &mut List.