Elementary math operations for Rust ndarray arrays

I simply would like to do elementary math operations (e.g., sin, exp, log, sqrt, ...) on Rust ndarray arrays. However, I did not find any useful examples for doing so in ndarray's documentation.
Say, for example:
extern crate ndarray;

use ndarray as nd;

fn main() {
    let matrix = nd::array![[1., 2., 3.], [9., 8., 7.]];
    let result = some_math(matrix);
    println!("{}", result)
}

fn some_math(...) {
    // Here I would like to do elementwise exp() and sqrt:
    sqrt(exp(...))
    // Using f64::exp would fail.
}
How can I implement such a some_math efficiently? I can of course do the elementwise operations by looping over matrix's elements, but that isn't pretty, and I'd prefer not to.
In Python's numpy, this is simply np.sqrt(np.exp(matrix)). I mean, Rust is an awesome language indeed; however, the ecosystem makes it really inconvenient to do even simple algebra.
UPDATE: There is an ongoing pull request for ndarray. If it is accepted, you will be able to simply write matrix.exp().sqrt(), etc.
There is also a rather hidden page in ndarray-doc explaining how to do such math operations.

How can I implement such a some_math efficiently?
You can use mapv_into():
use ndarray as nd;
use ndarray::Array2;

fn some_math(matrix: Array2<f64>) -> Array2<f64> {
    // np.sqrt(np.exp(matrix)) would literally translate to
    //     matrix.mapv_into(f64::exp).mapv_into(f64::sqrt),
    // but this version iterates over the matrix just once:
    matrix.mapv_into(|v| v.exp().sqrt())
}

fn main() {
    let matrix = nd::array![[1., 2., 3.], [9., 8., 7.]];
    let result = some_math(matrix);
    println!("{:?}", result)
}
Playground
That should give you performance comparable to that of numpy, but you should measure to be sure.
To use multiple cores, which makes sense for large arrays, you'd enable the rayon feature of the crate and use par_mapv_inplace():
fn some_math(mut matrix: Array2<f64>) -> Array2<f64> {
    matrix.par_mapv_inplace(|v| v.exp().sqrt());
    matrix
}
(Doesn't compile on the Playground because the Playground's ndarray doesn't include the rayon feature.)
Note that in the above examples you can replace v.exp().sqrt() with f64::sqrt(f64::exp(v)) if that feels more natural.
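If you'd rather keep the original array around, there is also the non-consuming mapv(), which takes the array by reference and allocates a new array for the result. A minimal sketch (the function name some_math_ref is just for illustration):

use ndarray::Array2;

// mapv() takes &self and allocates a fresh array for the result,
// so `matrix` stays usable afterwards.
fn some_math_ref(matrix: &Array2<f64>) -> Array2<f64> {
    matrix.mapv(|v| v.exp().sqrt())
}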
EDIT: I was curious about timings, so I decided to do a trivial (and unscientific) benchmark: creating a random 10_000x10_000 array and comparing np.sqrt(np.exp(array)) with the Rust equivalent.
Python code used for benchmarking:
import numpy as np
import time
matrix = np.random.rand(10000, 10000)
t0 = time.time()
np.sqrt(np.exp(matrix))
t1 = time.time()
print(t1 - t0)
Rust code:
use std::time::Instant;

use ndarray::Array2;
use ndarray_rand::{RandomExt, rand_distr::Uniform};

fn main() {
    let matrix: Array2<f64> = Array2::random((10000, 10000), Uniform::new(0., 1.));
    let t0 = Instant::now();
    let _result = matrix.mapv_into(|v| v.exp().sqrt());
    let elapsed = t0.elapsed();
    println!("{}", elapsed.as_secs_f64());
}
In my experiment on an ancient desktop system, Python takes 3.7 s to calculate, whereas Rust takes 2.5 s. Replacing mapv_into() with par_mapv_inplace() makes Rust drastically faster, now clocking in at 0.5 s, 7.4x faster than the equivalent Python.
It makes sense that the single-threaded Rust version is faster, since it iterates over the entire array only once, whereas Python does it twice. If we remove the sqrt() operation, Python clocks at 2.8 s, while Rust is still slightly faster at 2.4 s (and still 0.5 s parallel). I'm not sure if it's possible to optimize the Python version without using something like numba. Indeed, the ability to tweak the code without suffering the performance penalty for doing low-level calculations manually is the benefit of a compiled language like Rust.
The multi-threaded version is something that I don't know how to replicate in Python, but someone who knows numba could do it and compare.

Related

How do I iterate over a pyo3 PyObject in Rust?

I have a pre-imported module on which I'm calling a method while holding the Python GIL, something like the following:
Python::with_gil(|py| {
    let res = module.call_method1(py, "my_method", (arg1, arg2))?;
})
This returns the Rust object PyObject; however, what the Python method actually returns is a Python list. I want to iterate over this list to convert its contents into something I can use in Rust (it's a Python list of numpy arrays; I'm using the numpy/ndarray crates).
I'm a little confused as to how I'm meant to iterate over this. If I try cast_as to a PyList, I get the error: UnsafeCell<PyObject> cannot be shared between threads safely. It seems extract does not work either.
How do I iterate over this PyObject? Thanks.
Edit: Adding further details as requested
In Python typing terms, the returned value is a List[numpy.ndarray]. As the lengths of the numpy arrays can differ, I cannot just convert everything into a single numpy array in Python and pass that through. An example output is below:
[array([214.17725372, 192.78236675, 354.27965546, 389.84558392,
0.99999297])]
What I've tried in Rust:
let pylist = res.cast_as::<PyList>(py)?;
Fails to compile with: UnsafeCell<PyObject> cannot be shared between threads safely.
let pylist = res.extract::<PyList>(py)?;
Fails to compile with: the trait 'PyClass' is not implemented for 'PyList'. Please note I have use pyo3::prelude::*; at the top.
let pyany = res.extract::<Vec<PyArray1<f64>>>(py)?;
Fails to compile with: the trait bound 'Vec<PyArray<f64, Dim<[usize; 1]>>>: pyo3::FromPyObject<'_>' is not satisfied. This PyArray is from the numpy crate.
I see that you are returning a list of numpy arrays with dtype=float. If you are open to using another dependency, there is rust-numpy, which lets you map numpy arrays to ndarrays in Rust.
ndarrays have a .to_vec() method to convert to standard Rust containers.
In your case you could do the following (this is not tested):
use numpy::PyArray1;
use pyo3::{PyResult, Python};

Python::with_gil(|py| {
    // Bind the result first; `extract` borrows from it.
    let obj = module.call_method1(py, "my_method", (arg1, arg2))?;
    let res: Vec<&PyArray1<f32>> = obj.extract(py)?; // <-- Notice the type and the extract
})
Mappings between Python and Rust types can be found here. You can then use res for further computations:
println!("I just got a numpy array from python {:?}", res[0]);
println!("I converted that to a vector here: {:?}", res[0].to_vec());

How do I use `ndarray_stats::CorrelationExt` on a `polars::prelude::DataFrame`?

I'm trying to calculate the covariance of a data frame in Rust. The ndarray_stats crate defines such a function for arrays, and I can produce an array from a DataFrame using to_ndarray. The compiler is happy if I use the example from the documentation, but if I try to call it on an Array2 produced from a DataFrame, it doesn't work:
use polars::prelude::*;
use ndarray_stats::CorrelationExt;

fn cov(df: &DataFrame) -> Vec<f64> {
    // Both of these are Array2<f64>s
    let mat = df.to_ndarray::<Float64Type>().unwrap();
    let a = arr2(&[[1., 3., 5.], [2., 4., 6.]]);
    let x = a.cov(1.).unwrap();
    let y = mat.cov(1.).unwrap();
}
|
22 | let y = mat.cov(1.).unwrap();
| ^^^ method not found in `ndarray::ArrayBase<ndarray::data_repr::OwnedRepr<f64>, ndarray::dimension::dim::Dim<[usize; 2]>>`
Why does the compiler allow the definition of x but not y? How can I fix the code such that y can be assigned?
It is a dependency version mismatch. polars-core depends on ndarray version 0.13.x as of polars 0.14.7, whereas ndarray-stats 0.5 requires ndarray 0.15. Since you also use the latest version of ndarray in your own project, the 2D array type of x is compatible with the extension trait CorrelationExt provided by ndarray-stats, but the type of y is not.
Regardless of the nature of a type in a library, once multiple semver-incompatible versions of a library are included, their types will typically not be interchangeable. In other words, even though these Array2<_> may appear to be the same type, they are treated as different types by the compiler.
Duplicate versions of a crate in a dependency graph can be found by inspecting the output of cargo tree -d, which lists only duplicated dependencies together with the reverse tree of crates depending on them. Duplicates do not necessarily pose a problem, but problems arise if the project consumes more than one of the conflicting APIs directly.
The lowest common denominator at the time of writing is to downgrade ndarray to 0.13 and ndarray-stats to 0.3, which also has the method cov. It may also be worth looking into contributing to the polars project in order to update ndarray there.
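For concreteness, the downgrade would amount to something like this in Cargo.toml (versions as discussed above; do verify them against the releases current at the time you read this):

[dependencies]
polars = "0.14"
# Pin ndarray to the 0.13 line that polars-core uses internally,
# and pick the ndarray-stats release built against it.
ndarray = "0.13"
ndarray-stats = "0.3"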

Does partial application in Rust have overhead?

I like using partial application because (among other things) it lets me split up a complicated function call, which is more readable.
An example of partial application:
fn add(x: i32, y: i32) -> i32 {
    x + y
}

fn main() {
    let add7 = |x| add(7, x);
    println!("{}", add7(35));
}
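For comparison, the same partial application can be packaged as a function returning a closure; a small sketch (add_n is just an illustrative name, not from my real code):

fn add(x: i32, y: i32) -> i32 {
    x + y
}

// `add_n` captures `x` by value and returns an opaque,
// statically dispatched closure type via `impl Fn`.
fn add_n(x: i32) -> impl Fn(i32) -> i32 {
    move |y| add(x, y)
}

fn main() {
    let add7 = add_n(7);
    println!("{}", add7(35)); // prints 42
}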
Is there overhead to this practice?
Here is the kind of thing I like to do (from real code):
fn foo(n: u32, mut things: Vec<Things>) {
    let create_new_multiplier = |thing| ThingMultiplier::new(thing, n); // ThingMultiplier is an Iterator
    let new_things = things.clone().into_iter().flat_map(create_new_multiplier);
    things.extend(new_things);
}
This is purely stylistic; I do not like nesting things too deeply.
There should not be a performance difference between defining the closure before it's used versus defining and using it directly. There is a type-system difference, though: the compiler cannot always fully infer the types in a closure that isn't immediately called.
In code:
let create_new_multiplier = |thing| ThingMultiplier::new(thing, n);
things.clone().into_iter().flat_map(create_new_multiplier)
will be the exact same as
things.clone().into_iter().flat_map(|thing| {
    ThingMultiplier::new(thing, n)
})
In general, there should not be a performance cost for using closures. This is what Rust means by "zero-cost abstraction": the programmer could not have written it better themselves.
The compiler converts a closure into implementations of the Fn* traits on an anonymous struct. At that point, all the normal compiler optimizations kick in. Because of techniques like monomorphization, it may even be faster. This does mean that you need to do normal profiling to see if they are a bottleneck.
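To illustrate, here is a hand-written approximation of that desugaring (not the compiler's actual output; real closures implement the Fn* traits, which stable code cannot implement by hand, so an ordinary method stands in for the call operator):

fn add(x: i32, y: i32) -> i32 {
    x + y
}

// Roughly what the compiler generates for `|x| add(7, x)`:
// a struct holding the captured environment...
struct Add7 {
    captured: i32, // the `7`
}

// ...plus a call operator.
impl Add7 {
    fn call(&self, x: i32) -> i32 {
        add(self.captured, x)
    }
}

fn main() {
    let add7 = Add7 { captured: 7 };
    println!("{}", add7.call(35)); // prints 42
}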
In your particular example, yes, extend can get inlined as a loop, containing another loop for the flat_map which in turn just puts ThingMultiplier instances into the same stack slots holding n and thing.
But you're barking up the wrong efficiency tree here. Instead of wondering whether an allocation of a small struct holding two fields gets optimized away, you should rather wonder how efficient that clone is, especially for large inputs.

How to compare strings in constant time?

How does one safely compare two strings with bounded length in such a way that each comparison takes the same time? Hashing unfortunately has a timing attack vulnerability.
Is there any way to compare two strings without hashing in a way that is not vulnerable to timing-attacks?
TL;DR: Use assembly.
Constant Time code is really hard to pull off. To be truly constant time you need:
a constant time algorithm,
a constant time implementation of said algorithm.
What does "constant time algorithm" mean?
The example of string comparison is great. Most of the time, you want the comparison to take as little time as possible, which means bailing out at the first difference:
fn simple_compare(a: &str, b: &str) -> bool {
    if a.len() != b.len() { return false; }
    for (a, b) in a.bytes().zip(b.bytes()) {
        if a != b { return false; }
    }
    true
}
The constant-time version of the algorithm, however, should take the same time regardless of the input:
the input should always have the same size,
the time taken to compute the result should be identical no matter where the difference is located (if any).
The algorithm Lukas gave is almost right:
/// Prerequisite: a.len() == b.len()
fn ct_compare(a: &str, b: &str) -> bool {
    debug_assert!(a.len() == b.len());

    a.bytes().zip(b.bytes())
        .fold(0, |acc, (a, b)| acc | (a ^ b)) == 0
}
What does "constant time implementation" mean?
Even if the algorithm is constant time, the implementation may not be.
If the exact same sequence of CPU instructions is not executed for every input, then on some architectures one sequence could be faster than another, and the implementation would lose its constant-time property.
If the algorithm uses table look-up, then there could be more or less cache misses.
Can you write a constant time implementation of string comparison in Rust?
No.
The Rust language could potentially be suited to the task, however its toolchain is not:
the LLVM optimizer will wreak havoc with your algorithm, short-circuiting it, eliminating unnecessary reads, now or in the future,
the LLVM backends will wreak havoc with your implementation, picking different instructions.
The long and short of it is that, today, the only way to access a constant time implementation from Rust is to write said implementation in assembly.
To write a timing-attack-safe string comparison algorithm yourself is pretty easy in theory. There are many resources online on how to do it in other languages. The important part is to trick the optimizer into not optimizing your code in a way you don't want. Here is one example Rust implementation which uses the algorithm described here:
fn ct_compare(a: &str, b: &str) -> bool {
    if a.len() != b.len() {
        return false;
    }

    a.bytes().zip(b.bytes())
        .fold(0, |acc, (a, b)| acc | (a ^ b)) == 0
}
(Playground)
Of course, this algorithm can be easily generalized to everything that is AsRef<[u8]>. This is left as an exercise to the reader ;-)
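For example, the generalization might look roughly like this (a sketch of the same fold-based algorithm, just over byte slices):

fn ct_compare<A: AsRef<[u8]>, B: AsRef<[u8]>>(a: A, b: B) -> bool {
    let (a, b) = (a.as_ref(), b.as_ref());
    if a.len() != b.len() {
        return false;
    }

    // OR together all byte differences so every pair is inspected,
    // no matter where (or whether) the inputs differ.
    a.iter().zip(b.iter()).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn main() {
    assert!(ct_compare("secret", String::from("secret")));
    assert!(!ct_compare(b"secret", b"socket"));
}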
It looks like there is a crate already offering these kinds of comparisons: consistenttime. I haven't tested it, but the documentation looks quite good.
For those looking for a crate providing such an implementation, you can use rust-crypto, which provides the function fixed_time_eq.
The implementation is very similar to Lukas Kalbertodt's.

Overloading the Add-operator without copying the operands

I'm writing an application in Rust that will have to use vector arithmetic intensively, and I stumbled upon a problem while designing operator overloads for a structure type.
So I have a vector structure like this:
struct Vector3d {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}
and I want to be able to write something like this:
let x = Vector3d {x: 1.0, y: 0.0, z: 0.0};
let y = Vector3d {x: -1.0, y: 0.0, z: 0.0};
let u = x + y;
As far as I can see, there are three different ways to do it:
Implement the std::ops::Add trait for Vector3d directly. That works, but this trait's method signature is:
fn add(self, other: Vector3d)
So it will invalidate its arguments after usage (because it moves them), which is undesirable in my case, since many vectors will be used in multiple expressions.
Implement the Add trait for Vector3d and also implement the Copy trait. This works, but I feel iffy about it, since Vector3d isn't exactly a lightweight thing (24 bytes at least) that can be copied quickly, especially when there are many calls to arithmetic functions.
Implement Add for references to Vector3d, as suggested here. This works, but in order to apply the operator, I will have to write
let u = &x + &y;
I don't like this notation because it doesn't exactly look like its mathematical equivalent, u = x + y.
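For concreteness, option 3 would look roughly like this (a sketch, not part of my actual code):

use std::ops::Add;

// Implementing Add on references leaves both operands usable afterwards,
// at the cost of the `&x + &y` spelling.
impl Add for &Vector3d {
    type Output = Vector3d;

    fn add(self, other: &Vector3d) -> Vector3d {
        Vector3d {
            x: self.x + other.x,
            y: self.y + other.y,
            z: self.z + other.z,
        }
    }
}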
I'm not sure which variant is optimal. So, the question is: is there a way to overload the '+' operator in such a way that
It accepts its arguments as references instead of copying or moving them;
It allows to write just u = x + y instead of u = &x + &y?
Is there a way to overload the '+' operator in such a way that
It accepts its arguments as references instead of copying or moving them;
It allows to write just u = x + y instead of u = &x + &y?
No, there is no way to do that. Rust greatly values explicitness and hardly ever converts between types automatically.
However, the solution to your problem is simple: just add #[derive(Copy, Clone)]. I can assure you that 24 bytes are not a lot. Computers these days love to crunch a lot of data at once instead of working on little chunks of data.
Apart from that, Copy is not really about the performance overhead of copying/cloning:
Types that can be copied by simply copying bits (i.e. memcpy).
And later in the documentation:
Generally speaking, if your type can implement Copy, it should.
Your type Vector3d can be copied by just copying bits, so it should implement Copy (by just #[derive()]ing it).
The performance overhead is a different question. If you have a type that can (and thus does) implement Copy, but you still think the type is too big (again: 24 bytes aren't!), you should design your methods to accept references instead (it's not that easy, though; please read Matthieu's comment). This also includes the Add impl. And if you want to pass something to a function by reference, you have to write that explicitly; that's what Rust's philosophy dictates anyway.
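A minimal sketch of that recommendation, combining #[derive(Copy, Clone)] with a by-value Add impl for the Vector3d from the question:

use std::ops::Add;

#[derive(Copy, Clone)]
struct Vector3d {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}

impl Add for Vector3d {
    type Output = Vector3d;

    fn add(self, other: Vector3d) -> Vector3d {
        Vector3d {
            x: self.x + other.x,
            y: self.y + other.y,
            z: self.z + other.z,
        }
    }
}

fn main() {
    let x = Vector3d { x: 1.0, y: 0.0, z: 0.0 };
    let y = Vector3d { x: -1.0, y: 0.0, z: 0.0 };
    let u = x + y; // x and y remain usable thanks to Copy
    let v = x + u; // reads just like the mathematical notation
    println!("({}, {}, {})", v.x, v.y, v.z);
}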
