How do I iterate over a pyo3 PyObject in Rust?

I have a pre-imported module on which I'm calling a method while holding the Python GIL, something like the following:
Python::with_gil(|py| {
let res = module.call_method1(py, "my_method", (arg1, arg2))?;
})
This returns the Rust type PyObject; however, the underlying value is a Python list. I want to iterate over this list to convert its contents into something I can use in Rust (it's a Python list of NumPy arrays; I'm using the numpy/ndarray crates).
I'm a little confused as to how I'm meant to iterate over this. If I try cast_as to a PyList, I get the error: UnsafeCell<PyObject> cannot be shared between threads safely. It seems extract does not work either.
How do I iterate over this PyObject? Thanks.
Edit: Adding further details as requested
In Python typing terms, the returned value is a List[numpy.ndarray]. As the lengths of the individual NumPy arrays can differ, I cannot simply combine them into a single NumPy array on the Python side and pass that through. An example output is below:
[array([214.17725372, 192.78236675, 354.27965546, 389.84558392,
0.99999297])]
What I've tried in Rust:
let pylist = res.cast_as::<PyList>(py)?;
Fails to compile with: UnsafeCell<PyObject> cannot be shared between threads safely.
let pylist = res.extract::<PyList>(py)?;
Fails to compile with: the trait 'PyClass' is not implemented for 'PyList'. Please note I have use pyo3::prelude::*; at the top.
let pyany = res.extract::<Vec<PyArray1<f64>>>(py)?;
Fails to compile with: the trait bound 'Vec<PyArray<f64, Dim<[usize; 1]>>>: pyo3::FromPyObject<'_>' is not satisfied. This PyArray is from the numpy crate.

I see that you're returning a list of NumPy arrays with dtype=float. If you are open to using another dependency, there is rust-numpy, which lets you map NumPy arrays to ndarray arrays in Rust.
ndarray arrays have a .to_vec() method for converting into standard Rust containers.
In your case you could do the following (this is not tested):
use numpy::PyArray1;
use pyo3::prelude::*;

Python::with_gil(|py| -> PyResult<()> {
    let res: Vec<&PyArray1<f64>> = module // <-- Notice the return type (f64 matches NumPy's default float)
        .call_method1(py, "my_method", (arg1, arg2))?
        .extract(py)?; // <-- Notice the extract; on a PyObject it needs the `py` token
    Ok(())
})
Mappings between Python and Rust types can be found here. You can then use res to do further computations.
println!("I just got a numpy array from python {:?}", res[0]);
println!("I converted that to a vector here: {:?}", res[0].to_vec());

Related

Rust executing methods concurrently

I'm trying to learn Rust and I'm having some issues when working with streams of futures. I have the following code:
// stocks: Vec<Stock>; Stock is my struct that implements the method get_stock_depth
let futures = stocks.iter();
let futures = futures.map(|x| x.get_stock_depth());
let stream = stream::iter(futures);
let stream = stream.buffer_unordered(10);
let result = stream.collect().await;
The stocks vector contains over 800 objects, and I figured I'd like to limit the number of concurrent executions. When I run this code I get the following error:
type inside async block must be known in this context
cannot infer type for type parameter C declared on the associated function collect
Am I missing something?
This almost certainly has nothing to do with async or futures. It's just the normal requirement that you tell collect which type to produce: collect() can build a number of different return types and doesn't know which one you want. You probably want a Vec, like:
let result: Vec<_> = stream.collect().await;
You don't typically need to tell collect what to fill the Vec with (it can usually figure that out), so you can use _, but you do need to tell it what collection type you want.
You might also write this as:
let result = stream.collect::<Vec<_>>().await;
Or, if this is the last line of a function whose return type is the collection you want, you can let the function's return type drive the inference by dropping the assignment and the semicolon:
stream.collect().await
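For context, here is a self-contained sketch of the whole pattern, assuming the futures and tokio crates; the Stock type and the body of get_stock_depth are stand-ins for the asker's own code:
use futures::{stream, StreamExt};

// Hypothetical stand-in for the asker's Stock struct.
struct Stock {
    symbol: String,
}

impl Stock {
    // Placeholder for the real async depth request.
    async fn get_stock_depth(&self) -> usize {
        self.symbol.len()
    }
}

#[tokio::main]
async fn main() {
    let stocks = vec![
        Stock { symbol: "AAPL".into() },
        Stock { symbol: "MSFT".into() },
    ];

    // Build the futures lazily, run at most 10 at a time,
    // and tell collect which container to produce.
    let result: Vec<usize> = stream::iter(stocks.iter().map(|s| s.get_stock_depth()))
        .buffer_unordered(10)
        .collect()
        .await;

    println!("{:?}", result);
}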

Writing expression in polars-lazy in rust

I need to write my own expression in polars_lazy. Based on my reading of the source code, I need to write a function that returns an Expr::Function. The problem is that in order to construct a value of this variant, a value of type FunctionOptions must be provided. The caveat is that this struct is public but its members are pub(crate), so outside of the crate one cannot construct such a value.
Are there ways around this?
I don't think you're meant to directly construct Exprs. Instead, you can use functions like polars_lazy::dsl::col() and polars_lazy::dsl::lit() to create expressions, then use methods on Expr to build up the expression. Several of those methods, such as map() and apply(), will give you an Expr::Function.
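A minimal, untested sketch of that approach, assuming a polars version from roughly the same era as this question with the lazy feature enabled (the column name and literal are made up for illustration):
use polars::prelude::*;

fn build_exprs() -> (Expr, Expr) {
    // Built entirely from the dsl helpers; no Expr variant is constructed by hand.
    let filter_expr = col("price").gt(lit(100));

    // map() wraps a custom function and yields an Expr::Function internally.
    // The identity function is used here just to keep the sketch minimal.
    let custom_expr = col("price").map(|s: Series| Ok(s), GetOutput::from_type(DataType::Int64));

    (filter_expr, custom_expr)
}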
Personally I think the Rust API for polars is not well documented enough to really use yet. Although the other answer and comments mention apply and map, they don't mention how or the trade-offs. I hope this answer prompts others to correct me with the "right" way to do things.
So first, here's how to use apply on a lazy dataframe (even though lazy dataframes don't take apply directly as a method the way eager ones do), mutating the column in place:
// not sure how you'd find this type easily from apply documentation
let o = GetOutput::from_type(DataType::UInt32);
// this mutates two in place
let lf = lf.with_column(col("two").apply(str_to_len, o));
And here's how to use it while not mutating the source column and adding a new output column instead:
let o = GetOutput::from_type(DataType::UInt32);
// this adds new column len, two is unchanged
let lf = lf.with_column(col("two").alias("len").apply(str_to_len, o));
With the str_to_len looking like:
fn str_to_len(str_val: Series) -> Result<Series> {
let x = str_val
.utf8()
.unwrap()
.into_iter()
// your actual custom function would be in this map
.map(|opt_name: Option<&str>| opt_name.map(|name: &str| name.len() as u32))
.collect::<UInt32Chunked>();
Ok(x.into_series())
}
Note that it takes Series rather than &Series and wraps in Result.
With a regular (non-lazy) dataframe, apply still mutates but doesn't require with_column:
df.apply("two", str_to_len).expect("applied");
Whereas eager/non-lazy's with_column doesn't require apply:
// the fn we use to create the column also names it (via the rename inside str_to_len)
df.with_column(str_to_len(df.column("two").expect("has two"))).expect("with_column");
And str_to_len has slightly different signature:
fn str_to_len(str_val: &Series) -> Series {
let mut x = str_val
.utf8()
.unwrap()
.into_iter()
.map(|opt_name: Option<&str>| opt_name.map(|name: &str| name.len() as u32))
.collect::<UInt32Chunked>();
// NB. this is naming the chunked array, before we even get to a series
x.rename("len");
x.into_series()
}
I know there's reasons to have lazy and eager operate differently, but I wish the Rust documentation made this easier to figure out.
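To tie the lazy pieces together, here is a small, untested end-to-end sketch under the same assumptions as above (a polars version of this answer's era, lazy feature enabled, and the made-up column name "two"):
use polars::prelude::*;

fn str_to_len(str_val: Series) -> Result<Series> {
    let lengths = str_val
        .utf8()?
        .into_iter()
        // your actual custom function would go in this map
        .map(|opt_name: Option<&str>| opt_name.map(|name: &str| name.len() as u32))
        .collect::<UInt32Chunked>();
    Ok(lengths.into_series())
}

fn main() -> Result<()> {
    let df = df!["two" => &["abc", "de", "fghij"]]?;
    let out = df
        .lazy()
        .with_column(
            col("two")
                .alias("len")
                .apply(str_to_len, GetOutput::from_type(DataType::UInt32)),
        )
        .collect()?;
    println!("{:?}", out);
    Ok(())
}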

Elementary function math operations for Rust ndarray arrays

I simply would like to do elementary math operations (e.g., sin, exp, log, sqrt, ...) on Rust ndarray arrays. However, I did not find any useful examples for doing so in ndarray's documentation.
Say, for example:
extern crate ndarray;
use ndarray as nd;
fn main() {
let matrix = nd::array![[1., 2., 3.], [9., 8., 7.]];
let result = some_math(matrix);
println!("{}", result)
}
fn some_math(...) {
//Here I would like to do elementwise exp() and sqrt
sqrt(exp(...))
// Using f64::exp would fail.
}
How can I implement such a some_math efficiently? I could of course do the elementwise operations by looping over the matrix's elements, but that isn't pretty and I'd prefer not to.
In Python's numpy, this is simply np.sqrt(np.exp(matrix)). Rust is an awesome language, but it is really inconvenient (the ecosystem is still lacking here) for even simple algebra like this.
UPDATE: There is an ongoing pull request for ndarray. If it is accepted, you will be able to simply write matrix.exp().sqrt(), etc.
There is a rather well-hidden page in the ndarray docs describing how to do such math operations.
Some related questions: 1 2
How to implement such some_math efficiently?
You can use mapv_into():
use ndarray as nd;
use ndarray::Array2;
fn some_math(matrix: Array2<f64>) -> Array2<f64> {
// np.sqrt(np.exp(matrix)) would literally translate to equivalent to
// matrix.mapv_into(f64::exp).mapv_into(f64::sqrt)
// but this version iterates over the matrix just once
matrix.mapv_into(|v| v.exp().sqrt())
}
fn main() {
let matrix = nd::array![[1., 2., 3.], [9., 8., 7.]];
let result = some_math(matrix);
println!("{:?}", result)
}
Playground
That should give you performance comparable to that of numpy, but you should measure to be sure.
To use multiple cores, which makes sense for large arrays, you'd enable the rayon feature of the crate and use par_mapv_inplace():
fn some_math(mut matrix: Array2<f64>) -> Array2<f64> {
matrix.par_mapv_inplace(|v| v.exp().sqrt());
matrix
}
(Doesn't compile on the Playground because the Playground's ndarray doesn't include the rayon feature.)
Note that in the above examples you can replace v.exp().sqrt() with f64::sqrt(f64::exp(v)) if that feels more natural.
EDIT: I was curious about timings, so I decided to do a trivial (and unscientific) benchmark: creating a random 10_000x10_000 array and comparing np.sqrt(np.exp(array)) with the Rust equivalent.
Python code used for benchmarking:
import numpy as np
import time
matrix = np.random.rand(10000, 10000)
t0 = time.time()
np.sqrt(np.exp(matrix))
t1 = time.time()
print(t1 - t0)
Rust code:
use std::time::Instant;
use ndarray::Array2;
use ndarray_rand::{RandomExt, rand_distr::Uniform};
fn main() {
let matrix: Array2<f64> = Array2::random((10000, 10000), Uniform::new(0., 1.));
let t0 = Instant::now();
let _result = matrix.mapv_into(|v| v.exp().sqrt());
let elapsed = t0.elapsed();
println!("{}", elapsed.as_secs_f64());
}
In my experiment on my ancient desktop system, Python takes 3.7 s to calculate, whereas Rust takes 2.5 s. Replacing mapv_into() with par_mapv_inplace() makes Rust drastically faster, now clocking at 0.5 s, 7.4x faster than equivalent Python.
It makes sense that the single-threaded Rust version is faster, since it iterates over the entire array only once, whereas Python does it twice. If we remove the sqrt() operation, Python clocks at 2.8 s, while Rust is still slightly faster at 2.4 s (and still 0.5 s parallel). I'm not sure if it's possible to optimize the Python version without using something like numba. Indeed, the ability to tweak the code without suffering the performance penalty for doing low-level calculations manually is the benefit of a compiled language like Rust.
The multi-threaded version is something that I don't know how to replicate in Python, but someone who knows numba could do it and compare.

Rust - the trait `StdError` is not implemented for `OsString`

I'm writing some Rust code which uses the ? operator. Here are a few lines of that code:
fn files() -> Result<Vec<std::string::String>, Box<Error>> {
let mut file_paths: Vec<std::string::String> = Vec::new();
...
file_paths.push(pathbuf.path().into_os_string().into_string()?);
...
Ok(file_paths)
}
However, even though I'm using ? on a Result, it is giving me the following error:
the trait `StdError` is not implemented for `OsString`
This is contrary to the Rust documentation here, which states that:
The ? is shorthand for the entire match statements we wrote earlier. In other words, ? applies to a Result value, and if it was an Ok, it unwraps it and gives the inner value. If it was an Err, it returns from the function you're currently in.
I've confirmed that pathbuf.path().into_os_string().into_string() is of type Result, because when I remove the ?, I get the following compiler error:
expected struct `std::string::String`, found enum `std::result::Result`
(since file_paths is a Vector of strings, not Results).
Is this a bug with the Rust language or documentation?
In fact I tried this without pushing to the Vector, but simply initializing a variable with the value of pathbuf.path().into_os_string().into_string()?, and I got the same error.
The function OsString::into_string is a little unusual. It returns a Result<String, OsString> - so the Err variant is actually not an error.
In the event that the OsString cannot be converted into a regular string, then the Err variant is returned, containing the original string.
Unfortunately this means you cannot use the ? operator directly. However, you can use map_err to map the error variant into an actual error, like this:
file_paths.push(
pathbuf.path()
.into_os_string()
.into_string()
.map_err(|e| InvalidPathError::new(e))?
);
In the above example, InvalidPathError might be your own error type. You could also use an error type from the std library.
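For completeness, here is a minimal, untested sketch of what such a custom error type could look like (InvalidPathError is a made-up name, not something from std), so that ? can convert it into the function's Box<dyn Error>:
use std::error::Error;
use std::ffi::OsString;
use std::fmt;
use std::path::PathBuf;

// Hypothetical error type wrapping the non-UTF-8 path handed back by into_string().
#[derive(Debug)]
struct InvalidPathError(OsString);

impl InvalidPathError {
    fn new(path: OsString) -> Self {
        InvalidPathError(path)
    }
}

impl fmt::Display for InvalidPathError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "path is not valid UTF-8: {:?}", self.0)
    }
}

impl Error for InvalidPathError {}

fn push_path(file_paths: &mut Vec<String>, path: PathBuf) -> Result<(), Box<dyn Error>> {
    file_paths.push(
        path.into_os_string()
            .into_string()
            .map_err(|e| InvalidPathError::new(e))?,
    );
    Ok(())
}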

Create mutable iterator in nightly build after mut_iter removed

I downloaded the nightly build of Rust and attempted to build my code, but interestingly enough I realized mut_iter() no longer exists. What was the reason for removing the ability to create mutable iterators for strings? I have the function:
//invert hex picture, this is used in the print_bitmap function
// to save space and break apart one large code base.
pub fn invert_ascii_hex_string(line: &mut [std::ascii::Ascii]) {
for c in line.mut_iter() {
*c = match c.to_char() {
'x' => ' ',
_ => 'x'
}.to_ascii();
}
}
and now I'm not sure how to go about accomplishing this without a mutable iterator. What could I now use to still iterate through the list and change each value?
Try iter_mut() instead of mut_iter()
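For reference, a minimal sketch of the same idea in modern Rust (std::ascii::Ascii was removed long ago, so this version operates on a mutable byte slice instead):
pub fn invert_ascii_hex_string(line: &mut [u8]) {
    // iter_mut() yields &mut u8, so each byte can be rewritten in place.
    for c in line.iter_mut() {
        *c = match *c {
            b'x' => b' ',
            _ => b'x',
        };
    }
}

fn main() {
    let mut row = *b"x x  xx";
    invert_ascii_hex_string(&mut row);
    println!("{}", String::from_utf8_lossy(&row));
}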
