Rust - how to split/flatten an iterator?

If I have a Vec<String>, I can use filter_map to process and eliminate entries. But is there an operation that does the opposite of filter?
Essentially is there an idiomatic way to do something like this -
word_list.iter().merge_map(|s| s.split(".")).collect()
^this is an imaginary method.
Turning an input ["a","b.c","d"] into ["a","b","c","d"]

Use flat_map():
word_list.iter().flat_map(|s| s.split(".")).collect()
Which is semantically equivalent to map() then flatten():
word_list.iter().map(|s| s.split(".")).flatten().collect()
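A self-contained sketch of both forms, assuming the input is a slice of String. The function names are illustrative and not from the question. collect() needs a target type, which the return type supplies here:

```rust
// Hypothetical helper: split every word on '.' and flatten the pieces.
fn split_flat_map(word_list: &[String]) -> Vec<&str> {
    word_list.iter().flat_map(|s| s.split('.')).collect()
}

// Same result written as map() followed by flatten().
fn split_map_flatten(word_list: &[String]) -> Vec<&str> {
    word_list.iter().map(|s| s.split('.')).flatten().collect()
}
```

Both functions borrow from the input, so the returned string slices stay valid only as long as word_list does.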

Related

Creating a vector and returning it along with a reference to one of its elements

In Rust, I have a function that generates a vector of Strings, and I'd like to return this vector along with a reference to one of the strings. Obviously, I would need to appropriately specify the lifetime of the reference, since it is valid only while the vector is in scope. However, I can't get this to work.
Here is a minimal example of a failed attempt:
fn foo<'a>() -> ('a Vec<String>, &'a String) {
let x = vec!["some", "data", "in", "the", "vector"].iter().map(|s| s.to_string()).collect::<Vec<String>>();
(x, &x[1])
}
(For this example, I know I could return an index into the vector, but my general problem is more complex. Also, I'd like to understand how to achieve this.)
Rust doesn't allow you to do that without unsafe code. Probably your best option is to return the vector with the index of the element in question.
This is conceptually very similar to trying to create a self-referential struct. See this for more on why this is challenging.
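A minimal sketch of the index-based alternative, reusing the data from the question. The choice of index 1 mirrors the &x[1] in the failed attempt. The caller re-borrows the element from the returned vector:

```rust
// Return the owned vector together with the index of the element of interest,
// instead of a reference into the vector.
fn foo() -> (Vec<String>, usize) {
    let x: Vec<String> = ["some", "data", "in", "the", "vector"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    (x, 1)
}
```

The caller can then write let (v, i) = foo(); and borrow &v[i] for as long as v is alive, which is the lifetime relationship the original signature tried to express.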

Writing expression in polars-lazy in rust

I need to write my own expression in polars_lazy. Based on my understanding of the source code, I need to write a function that returns Expr::Function. The problem is that constructing an object of this type requires an object of type FunctionOptions. The caveat is that this struct is public but its members are pub(crate), so one cannot construct such an object outside of the crate.
Are there ways around this?
I don't think you're meant to directly construct Exprs. Instead, you can use functions like polars_lazy::dsl::col() and polars_lazy::dsl::lit() to create expressions, then use methods on Expr to build up the expression. Several of those methods, such as map() and apply(), will give you an Expr::Function.
Personally I think the Rust API for polars is not well documented enough to really use yet. Although the other answer and comments mention apply and map, they don't mention how or the trade-offs. I hope this answer prompts others to correct me with the "right" way to do things.
First, here's how to use apply on a lazy dataframe so that it replaces the column in place (lazy dataframes don't take apply directly as a method the way eager ones do):
// not sure how you'd find this type easily from apply documentation
let o = GetOutput::from_type(DataType::UInt32);
// this mutates two in place
let lf = lf.with_column(col("two").apply(str_to_len, o));
And here's how to use it while not mutating the source column and adding a new output column instead:
let o = GetOutput::from_type(DataType::UInt32);
// this adds new column len, two is unchanged
let lf = lf.with_column(col("two").alias("len").apply(str_to_len, o));
With the str_to_len looking like:
fn str_to_len(str_val: Series) -> Result<Series> {
let x = str_val
.utf8()
.unwrap()
.into_iter()
// your actual custom function would be in this map
.map(|opt_name: Option<&str>| opt_name.map(|name: &str| name.len() as u32))
.collect::<UInt32Chunked>();
Ok(x.into_series())
}
Note that it takes Series rather than &Series and wraps in Result.
With a regular (non-lazy) dataframe, apply still mutates but doesn't require with_column:
df.apply("two", str_to_len).expect("applied");
Whereas eager/non-lazy's with_column doesn't require apply:
// the fn we use to make the column also names it
df.with_column(str_to_len(df.column("two").expect("has two"))).expect("with_column");
And str_to_len has slightly different signature:
fn str_to_len(str_val: &Series) -> Series {
let mut x = str_val
.utf8()
.unwrap()
.into_iter()
.map(|opt_name: Option<&str>| opt_name.map(|name: &str| name.len() as u32))
.collect::<UInt32Chunked>();
// NB. this is naming the chunked array, before we even get to a series
x.rename("len");
x.into_series()
}
I know there are reasons to have lazy and eager operate differently, but I wish the Rust documentation made this easier to figure out.

How to define an ordered Map/Set with a runtime-defined comparator?

This is similar to "How do I use a custom comparator function with BTreeSet?"; however, in my case I won't know the sorting criteria until runtime. The possible criteria are extensive and can't be hard-coded (think something like sort by distance to target, or sort by specific bytes in a payload, or a combination thereof). The sorting criteria won't change after the map/set is created.
The only alternatives I see are:
use a Vec, but log(n) inserts and deletes are crucial
wrap each of the elements with the sorting criteria (directly or indirectly), but that seems wasteful
This is possible with standard C++ containers std::map/std::set but doesn't seem possible with Rust's BTreeMap/BTreeSet. Is there an alternative in the standard library or in another crate that can do this? Or will I have to implement this myself?
My use-case is a database-like system where elements in the set are defined by a schema, like:
Element {
FIELD x: f32
FIELD y: f32
FIELD z: i64
ORDERBY z
}
But since the schema is user-defined at runtime, the elements are stored in a set of bytes (BTreeSet<Vec<u8>>). Likewise the order of the elements is user-defined. So the comparator I would give to BTreeSet would look like |a, b| schema.cmp(a, b). Hard-coded, the above example may look something like:
fn cmp(&self, a: &Vec<u8>, b: &Vec<u8>) -> Ordering {
// field index 2 is z, the ORDERBY field in the schema above
let a_field = self.get_field(a, 2).as_i64();
let b_field = self.get_field(b, 2).as_i64();
a_field.cmp(&b_field)
}
Would it be possible to pass the comparator closure as an argument to each node operation that needs it? It would be owned by the tree wrapper instead of cloned in every node.
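For comparison, here is a std-only sketch of the second alternative listed in the question: wrapping each element with a shared handle to the criteria. It is not the per-operation comparator suggested just above, and it costs one Rc pointer per element. The names Order, Keyed, and sorted_by_last_byte are hypothetical, and the "sort by last byte" rule is only a stand-in for a schema-driven comparison:

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;
use std::rc::Rc;

// Hypothetical comparator type: any ordering over raw byte rows built at runtime.
type Order = Rc<dyn Fn(&[u8], &[u8]) -> Ordering>;

// Hypothetical wrapper: each element carries a shared pointer to the comparator.
struct Keyed {
    bytes: Vec<u8>,
    order: Order,
}

impl Ord for Keyed {
    fn cmp(&self, other: &Self) -> Ordering {
        // Delegate to the runtime-chosen comparator.
        (self.order)(&self.bytes, &other.bytes)
    }
}
impl PartialOrd for Keyed {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(Ord::cmp(self, other))
    }
}
impl PartialEq for Keyed {
    fn eq(&self, other: &Self) -> bool {
        Ord::cmp(self, other) == Ordering::Equal
    }
}
impl Eq for Keyed {}

// Build a set ordered by the last byte of each row and return the rows in set order.
fn sorted_by_last_byte() -> Vec<Vec<u8>> {
    let order: Order = Rc::new(|a: &[u8], b: &[u8]| a.last().cmp(&b.last()));
    let mut set = BTreeSet::new();
    for bytes in vec![vec![1, 9], vec![2, 3], vec![3, 5]] {
        set.insert(Keyed { bytes, order: Rc::clone(&order) });
    }
    set.into_iter().map(|k| k.bytes).collect()
}
```

This only behaves correctly if every element in one set shares the same comparator and that comparator is a consistent total order. Rows that compare as equal are treated as duplicates by the set.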

What is the best way to dereference values within chains of iterators?

When I'm using iterators, I often find myself needing to explicitly dereference values. The following code finds the sum of all pairs of elements in a vector:
use itertools::Itertools;
fn main() {
let x: Vec<i32> = (1..4).collect();
x.iter()
.combinations(2)
.map(|xi| xi.iter().map(|bar| **bar)
.sum::<i32>())
.for_each(|bar| println!("{:?}", bar));
}
Is there a better way of performing the dereferencing than using a map?
Even better would be a way of performing these types of operations without explicitly dereferencing at all.
Using xi.iter() means that you are explicitly asking for an iterator of references to the values within xi. Here xi is a Vec<&i32>, so xi.iter() yields &&i32, while xi.into_iter() yields the elements themselves (&i32). Since i32 implements Sum<&i32>, those can be summed directly without any explicit dereference.
So you can change
.map(|xi| xi.iter().map(|bar| **bar).sum::<i32>())
to
.map(|xi| xi.into_iter().sum::<i32>())
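A std-only sketch of the same idea, so it runs without the itertools crate. The nested loops are an assumption standing in for combinations(2), and the function names are illustrative. Each pair has the type Vec<&i32> that combinations(2) yields when called on x.iter():

```rust
// Sum every pair of elements, consuming each pair with into_iter().
fn pair_sums(x: &[i32]) -> Vec<i32> {
    let mut sums = Vec::new();
    for i in 0..x.len() {
        for j in (i + 1)..x.len() {
            let pair: Vec<&i32> = vec![&x[i], &x[j]];
            // into_iter() yields `&i32`; `i32: Sum<&i32>`, so no deref is needed.
            sums.push(pair.into_iter().sum::<i32>());
        }
    }
    sums
}

// Same computation, borrowing each pair and using copied() instead of `|bar| **bar`.
fn pair_sums_copied(x: &[i32]) -> Vec<i32> {
    let mut sums = Vec::new();
    for i in 0..x.len() {
        for j in (i + 1)..x.len() {
            let pair: Vec<&i32> = vec![&x[i], &x[j]];
            // iter() yields `&&i32`; one copied() peels off one reference layer.
            sums.push(pair.iter().copied().sum::<i32>());
        }
    }
    sums
}
```

When the pair has to stay usable afterwards, copied() is the adapter to reach for; otherwise into_iter() is the shorter option.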

How do I avoid unwrap when converting a vector of Options or Results to only the successful values?

I have a Vec<Result<T, E>> and I want to ignore all Err values, converting it into a Vec<T>. I can do this:
vec.into_iter().filter(|e| e.is_ok()).map(|e| e.unwrap()).collect()
This is safe, but I want to avoid using unwrap. Is there a better way to write this?
I want to ignore all Err values
Since Result implements IntoIterator, you can convert your Vec into an iterator (whose items can each be iterated in turn) and then flatten it:
Iterator::flatten:
vec.into_iter().flatten().collect()
Iterator::flat_map:
vec.into_iter().flat_map(|e| e).collect()
These methods also work for Option, which also implements IntoIterator.
You could also convert the Result into an Option and use Iterator::filter_map:
vec.into_iter().filter_map(|e| e.ok()).collect()
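The variants above can be checked with a small sketch. The element types i32 and String and the function names are assumptions chosen for illustration:

```rust
// Keep only the Ok values by flattening: each Result iterates over zero or one item.
fn keep_ok_flatten(v: Vec<Result<i32, String>>) -> Vec<i32> {
    v.into_iter().flatten().collect()
}

// Keep only the Ok values by converting each Result into an Option first.
fn keep_ok_filter_map(v: Vec<Result<i32, String>>) -> Vec<i32> {
    v.into_iter().filter_map(|e| e.ok()).collect()
}

// The flatten() form works the same way for Option.
fn keep_some(v: Vec<Option<i32>>) -> Vec<i32> {
    v.into_iter().flatten().collect()
}
```

All three discard the Err or None entries silently, so they fit only when the errors really can be ignored.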
