Generic function for modifying scalars and slices in place - rust

I don't understand some basics in Rust. I want to compute a function sinc(x), with x being a scalar or a slice, which modifies the values in place. I can implement methods for both types, calling them with x.sinc(), but I find it more convenient (and easier to read in long formulas) to make a function, e.g. sinc(&mut x). So how do you do that properly?
pub trait ToSinc<T> {
fn sinc(self: &mut Self) -> &mut Self;
}
pub fn sinc<T: ToSinc<T>>(y: &mut T) -> &mut T {
y.sinc()
}
impl ToSinc<f64> for f64 {
fn sinc(self: &mut Self) -> &mut Self {
*self = // omitted
self
}
}
impl<'a> ToSinc<&'a mut [f64]> for &'a mut [f64] {
fn sinc(self: &mut Self) -> &mut Self {
for yi in (**self).iter_mut() { ... }
self
}
}
This seems to work, but isn't the "double indirection" in the last impl costly? I also thought about doing
pub trait ToSinc<T> {
fn sinc(self: Self) -> Self;
}
pub fn sinc<T: ToSinc<T>>(y: T) -> T {
y.sinc()
}
impl<'a> ToSinc<&'a mut f64> for &'a mut f64 {
fn sinc(self) -> Self {
*self = ...
self
}
}
impl<'a> ToSinc<&'a mut [f64]> for &'a mut [f64] {
fn sinc(self) -> Self {
for yi in (*self).iter_mut() { ... }
self
}
}
This also works; the difference is that if x is a &mut [f64] slice, I can call sinc(x) instead of sinc(&mut x). So I have the impression there is less indirection going on in the second one, and I think that's good. Am I on the wrong track here?

I find it highly unlikely that any overhead from the double indirection would survive inlining in this case, but you're right that the second version is to be preferred.
You have ToSinc<T>, but never use T. Drop the type parameter.
That said, ToSinc should almost certainly be by-value for f64s:
impl ToSinc for f64 {
fn sinc(self) -> Self {
...
}
}
You might also want ToSinc for &mut [T] where T: ToSinc.
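Putting those two suggestions together, a minimal sketch might look like this (the sinc body is a placeholder definition, sin(x)/x with sinc(0) = 1, and the Copy bound on the slice element type is an assumption I've added):
pub trait ToSinc: Sized {
    fn sinc(self) -> Self;
}

// Free-function wrapper, as in the question, minus the unused parameter.
pub fn sinc<T: ToSinc>(y: T) -> T {
    y.sinc()
}

// By-value impl for the scalar case.
impl ToSinc for f64 {
    fn sinc(self) -> Self {
        if self == 0.0 { 1.0 } else { self.sin() / self }
    }
}

// By-mutable-reference impl for slices of any sinc-able Copy type.
impl<'a, T> ToSinc for &'a mut [T]
where
    T: ToSinc + Copy,
{
    fn sinc(self) -> Self {
        for y in self.iter_mut() {
            *y = (*y).sinc();
        }
        self
    }
}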
You might well say, "ah - one of these is by value, and the other by mutable reference; isn't that inconsistent?"
The answer depends on what you actually intend the trait to be used for.
An interface for sinc-able types
If your trait represents the set of types that sinc can be run over, which is how traits of this kind are intended to be used, the goal would be to write generic functions like
fn do_stuff<T: ToSinc>(value: T) { ... }
Now note that the interface is by-value. ToSinc takes self and returns Self: that is a value-to-value function. In fact, even when T is instantiated to some mutable reference, like &mut [f64], the function is unable to observe any mutation to the underlying memory.
In essence, these functions treat the underlying memory as an allocation source, and do value transformations on the data held in those allocations, much like a Box → Box operation is a by-value transformation of heap memory. Only the caller can observe mutations to that memory, and even then, implementations that treat their input as a value type hand back a pointer, so the caller never needs to reach back into the memory directly. The caller can treat the source data as opaque, in the same way it treats an allocator.
Operations which depend on mutability, like writing to buffers, should probably not use such an interface. Sometimes, to support those cases, it makes sense to build a mutating basis plus a convenient by-value wrapper on top. ToString is an interesting example of this: it's just a wrapper over Display.
pub trait ToSinc: Sized {
fn sinc_in_place(&mut self);
fn sinc(mut self) -> Self {
self.sinc_in_place();
self
}
}
where impls mostly just implement sinc_in_place and users tend to prefer sinc.
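For concreteness, here is a sketch of how the impls might look under that split (the trait from above is repeated so the sketch stands on its own; the sinc body is again a placeholder):
// The trait from above, repeated so the sketch compiles by itself.
pub trait ToSinc: Sized {
    fn sinc_in_place(&mut self);
    fn sinc(mut self) -> Self {
        self.sinc_in_place();
        self
    }
}

// Scalar impl: only the in-place part needs writing.
// (Placeholder definition: sin(x)/x, with sinc(0) = 1.)
impl ToSinc for f64 {
    fn sinc_in_place(&mut self) {
        *self = if *self == 0.0 { 1.0 } else { self.sin() / *self };
    }
}

// Slice impl: delegates element-wise to the scalar version.
impl<'a, T: ToSinc> ToSinc for &'a mut [T] {
    fn sinc_in_place(&mut self) {
        for y in self.iter_mut() {
            y.sinc_in_place();
        }
    }
}

fn main() {
    // Callers mostly reach for the by-value wrapper:
    let y = 0.5_f64.sinc();
    let mut xs = [0.0_f64, 0.5, 1.0];
    (&mut xs[..]).sinc();
    println!("{} {:?}", y, xs);
}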
As fakery for ad-hoc overloading
In this case, one doesn't care whether the trait is actually usable generically, or even whether it's consistent. sinc("foo") might do a song and dance, for all we care.
As such, although the trait is still needed, it should be defined as weakly as possible:
pub trait Sincable {
type Out;
fn sinc(self) -> Self::Out;
}
Then your function is far more generic:
pub fn sinc<T: Sincable>(val: T) -> T::Out {
val.sinc()
}
To implement a by-value function you do
impl Sincable for f64 {
type Out = f64;
fn sinc(self) -> f64 {
0.4324
}
}
and a by-mut-reference one is just
impl<'a, T> Sincable for &'a mut [T]
where T: Sincable<Out=T> + Copy
{
type Out = ();
fn sinc(self) {
for i in self {
*i = sinc(*i);
}
}
}
since () is the unit type, used here because there is nothing meaningful to return. This acts just like ad-hoc overloading would.
Playpen example of emulated ad-hoc overloading.
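A small usage sketch built on the Sincable code above (with the placeholder impls, every value simply becomes 0.4324; the point is the call sites):
fn main() {
    // By value: the f64 impl returns the computed value.
    let y: f64 = sinc(0.5_f64);

    // By mutable reference: the slice impl mutates in place and returns ().
    let mut xs = [0.1_f64, 0.2, 0.3];
    sinc(&mut xs[..]);

    println!("{} {:?}", y, xs);
}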

Related

Optimizing reading a `Vec` of custom types

I am writing a protocol parsing library and want to optimize its speed while reading lists of data.
The library defines a read trait
trait Readable: Sized {
fn read(reader: &mut impl std::io::Read) -> std::io::Result<Self>;
}
and implements all the primitives like so:
impl Readable for u8 {
fn read(reader: &mut impl std::io::Read) -> std::io::Result<Self> {
Ok(reader.read_u8()?)
}
}
// same for i8, u16, i16 etc
The problem is with the Vec implementation, which is defined as
impl <T> Readable for Vec<T>
where T: Readable {
fn read(reader: &mut impl std::io::Read) -> std::io::Result<Self> {
let length = u16::read(reader)?;
let mut items = Vec::with_capacity(length as usize);
for _ in 0..length {
items.push(T::read(reader)?);
}
Ok(items)
}
}
It is common in the protocol to read large Vec<u8>, so the current implementation, which calls read_u8 for every item, is not ideal. Using read_exact has almost 5x the performance at 100 bytes in my benchmarks, and the difference only gets larger as the size increases. (The reader is in-memory, not doing any IO.)
The question is: how can I optimize this trait when reading Vec<u8> while keeping the ability to read generic Vec<T: Readable> lists?
My current solutions are either:
Use specialization on nightly, but I would rather stay on the stable channel.
Add a const IS_U8: bool to the Readable trait and, in the Vec impl, call read_exact if T::IS_U8 is true, using mem::transmute to return the buffer as a Vec<T>. Adding a const for just this case is annoying, and I'm not sure I can guarantee the safety of the implementation.
Is there a solution to this problem in stable rust?
There's a way to force a sort of specialization in stable Rust. Here's something that compiles, though it's very awkward; I hope someone else has a less awkward method. The trick is to define a separate trait for the unspecialized types. For example:
trait XReadable: Sized {
fn xread(reader: &mut impl std::io::Read) -> std::io::Result<Self>;
}
impl <T: XReadable> Readable for T {
fn read(reader: &mut impl std::io::Read) -> std::io::Result<Self> {
Self::xread(reader)
}
}
So that now you can either implement Readable or XReadable for a type, and then use it as Readable. Now that you have that separation, you can define your unspecialized Vec behavior for XReadable only:
impl <T> XReadable for Vec<T>
where T: XReadable {
fn xread(reader: &mut impl std::io::Read) -> std::io::Result<Self> {
// ...
}
}
And define the specialized behavior explicitly with Readable:
impl Readable for Vec<u8> {
// ...
}
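For completeness, here is one way the specialized body might be filled in; this is an assumption rather than part of the original answer, keeping the u16 length prefix from the generic version and pulling the payload in a single read_exact call. Note that for this impl to coexist with the blanket impl, u8 itself should keep its direct Readable impl (as in the question) rather than implementing XReadable; otherwise Vec<u8> would also be covered by the blanket path.
impl Readable for Vec<u8> {
    fn read(reader: &mut impl std::io::Read) -> std::io::Result<Self> {
        use std::io::Read as _; // brings read_exact into scope

        // Same u16 length prefix as the generic implementation above.
        let length = u16::read(reader)?;

        // Allocate the whole buffer up front and fill it with one call
        // instead of one read per byte.
        let mut items = vec![0u8; length as usize];
        reader.read_exact(&mut items)?;
        Ok(items)
    }
}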

Patterns for proxying and implementation hiding in Rust?

So I'm an experienced developer, pretty new to Rust, expert in Java - but started out in assembly language, so I get memory and allocation, and have written enough compiler-y things to fathom the borrow-checker pretty well.
I decided to port a very useful, high-performance bitset-based graph library I wrote in Java, both to use it in a larger eventual project, and because it's darned useful. Since it's all integer positions in bitsets, and if you want an object graph you map indices into an array or whatever - I'm not tangled up in building a giant tree of objects that reference each other, which would be a mess to try to do in Rust. The problem I'm trying to solve is much simpler - so simple I feel like I must be missing an obvious pattern for how to do it:
Looking over the various bit-set libraries available for Rust, FixedBitSet seemed like a good fit. However, I would rather not expose it via my API and tie every consumer of my library irrevocably to FixedBitSet (it is nice, but it can also be useful to swap in, say, an implementation backed by atomics; and being married to usize may not be ideal, but FixedBitSet is married to it).
In Java, you'd just create an interface that wraps an instance of the concrete type, and exposes the functionality you want, hiding the implementation type. In Rust, you have traits, so it's easy enough to implement:
pub trait Bits<'i, S: Sized + Add<S>> {
fn size(&'i self) -> S;
fn contains(&'i self, s: S) -> bool;
...
impl<'i, 'a> Bits<'i, usize> for FixedBitSet {
fn size(&'i self) -> usize {
self.len()
}
fn contains(&'i self, s: usize) -> bool {
FixedBitSet::contains(self, s)
}
...
this gets a little ugly in that, if I don't want to expose FixedBitSet, everything has to return Box<dyn Bits<'a, usize> + 'a>, but so be it for now - though that creates its own problems with the compiler not knowing the size of dyn Bits....
So far so good. Where it gets interesting (did I say this was a weirdly simple problem?) is proxying an iterator. Rust iterators seem to be irrevocably tied to a concrete Trait type which has an associated type. So you can't really abstract it (well, sort of, via Box<dyn Iterator<Item=usize> + 'a> where ..., and it looks like it might be possible to create a trait that extends iterator and also has a type Item on it, and implement it for u32, u64, usize and ?? maybe the compiler coalesces the type Item members of the traits?). And as far as I can tell, you can't narrow the return type in a trait implementation method to something other than the trait specifies, either.
This gets further complicated by the fact that the Ones type in FixedBitSet for iterating set bits has its own lifetime - but Rust's Iterator does not - so any generic Iterator implementation is going to need to return an iterator scoped to that lifetime, not '_ or there will be issues with how long the iterator lives vis-a-vis the thing that created it.
The tidiest thing I could come up with - which is not tidy at all - after experimenting with various containers for an iterator (an implementation of Bits that adds an offset to the base value is also useful) that expose it, was something like:
pub trait Biterable<'a, 'b, S: Sized + Add<S>, I: Iterator<Item = S> + 'b> where 'a: 'b {
fn set_bits<'c>(&'a mut self) -> I where I: 'c, 'b: 'c;
}
which is implementable enough:
impl<'a, 'b> Biterable<'a, 'b, usize, Ones<'b>> for FixedBitSet where 'a: 'b {
fn set_bits<'c>(&'a mut self) -> Ones<'b> where Ones<'b>: 'c, 'b: 'c {
self.ones()
}
}
but then, we know we're going to be dealing with Boxes. So, we're going to need an implementation for that. Great! A signature like impl<'a, 'b> Biterable<'a, 'b, usize, Ones<'b>> for Box<FixedBitSet> where 'a: 'b { is implementable. BUUUUUUT, that's not what anything is going to return if we're not exposing FixedBitSet anywhere - we need it for Box<dyn Bits<...> + ...>. For that, we wind up in a hall of mirrors, scribbling out increasingly baroque and horrifying (and uncompilable) variants on
impl<'a, 'b, B> Biterable<'a, 'b, usize, &'b mut dyn Iterator<Item=usize>>
for Box<dyn B + 'a> where 'a: 'b, B : Bits<'a, usize> + Biterable<'a, 'b, usize, Ones<'b>> {
in a vain search for something that compiles and works (this fails because while Bits and Biterable are traits, evidently Biterable + Bits is not a trait). Seriously - a stateless, no-allocation-needed wrapper for one call on this thing returning one call on that thing, just not exposing that thing's type to the caller. That's it. The Java equivalent would be Supplier<T> a = ...; return () -> a.get();
I have to be thinking about this problem wrong. How?
It does certainly seem like you're over-complicating things. You have a lot of lifetime annotations that don't seem necessary. Here's a straightforward implementation (ignoring generic S):
use fixedbitset::FixedBitSet; // 0.2.0
pub trait Bits {
fn size(&self) -> usize;
fn contains(&self, s: usize) -> bool;
fn set_bits<'a>(&'a mut self) -> Box<dyn Iterator<Item = usize> + 'a>;
}
impl Bits for FixedBitSet {
fn size(&self) -> usize {
self.len()
}
fn contains(&self, s: usize) -> bool {
self.contains(s)
}
fn set_bits<'a>(&'a mut self) -> Box<dyn Iterator<Item = usize> + 'a> {
Box::new(self.ones())
}
}
pub fn get_bits_from_api() -> impl Bits {
FixedBitSet::with_capacity(64)
}
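A quick, hypothetical call site to show that consumers never need to name FixedBitSet:
fn main() {
    let mut bits = get_bits_from_api();
    println!("size = {}", bits.size());
    println!("contains 3? {}", bits.contains(3));
    // set_bits needs `bits` to be mutable in this version of the trait.
    let set: Vec<usize> = bits.set_bits().collect();
    println!("set bits: {:?}", set);
}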
If you want the index type to be anonymous as well, make it an associated type and don't specify it when exposing your Bits:
use fixedbitset::FixedBitSet; // 0.2.0
pub trait Bits {
type Idx: std::ops::Add<Self::Idx>;
fn size(&self) -> Self::Idx;
fn contains(&self, s: Self::Idx) -> bool;
fn set_bits<'a>(&'a self) -> Box<dyn Iterator<Item = Self::Idx> + 'a>;
}
impl Bits for FixedBitSet {
type Idx = usize;
fn size(&self) -> Self::Idx {
self.len()
}
fn contains(&self, s: Self::Idx) -> bool {
self.contains(s)
}
fn set_bits<'a>(&'a self) -> Box<dyn Iterator<Item = Self::Idx> + 'a> {
Box::new(self.ones())
}
}
pub fn get_bits_from_api() -> impl Bits {
// ^^^^^^^^^ doesn't have <Idx = usize>
FixedBitSet::with_capacity(64)
}
fn main() {
let bits = get_bits_from_api();
// just a demonstration that it compiles
let size = bits.size();
if bits.contains(size) {
for indexes in bits.set_bits() {
// ...
}
}
}
I highly encourage against this, though, for several reasons: 1) you'd need many more constraints than just Add for this to be remotely usable; 2) you are severely limited with impl Bits, since it's not fully defined you can't have dyn Bits or store it in a struct; 3) I don't see much benefit in being generic in this regard.

Mutually exclusive traits

I need to create operations for an operation sequence. The operations share the following behaviour: they can be evaluated, and at construction they are either parametrized by a single i32 (e.g. Sum) or not parametrized at all (e.g. Id).
I create a trait Operation. The evaluate part is trivial.
trait Operation {
fn evaluate(&self, operand: i32) -> i32;
}
But I don't know how to describe the second demand. The first option is to simply let concrete implementations of Operation handle that behaviour.
pub struct Id {}
impl Id {
pub fn new() -> Id {
Id {}
}
}
impl Operation for Id {
fn evaluate(&self, operand: i32) -> i32 {
operand
}
}
pub struct Sum {
augend: i32,
}
impl Sum {
pub fn new(augend: i32) -> Sum {
Sum { augend }
}
}
impl Operation for Sum {
fn evaluate(&self, addend: i32) -> i32 {
self.augend + addend
}
}
The second option is a new function that takes an optional i32; the concrete implementations then deal with the possibly redundant input. I find this worse than the first option.
trait Operation {
fn evaluate(&self, operand: i32) -> i32;
fn new(parameter: Option<i32>) -> Self;
}
Google has led me to mutually exclusive traits: https://github.com/rust-lang/rust/issues/51774. It seems promising, but it doesn't quite solve my problem.
Is there a way to achieve this behaviour?
trait Operation = Evaluate + (ParametrizedInit or UnparametrizedInit)
How about you use an associated type to define the initialization data?
trait Operation {
type InitData;
fn init(data: Self::InitData) -> Self;
fn evaluate(&self, operand: i32) -> i32;
}
impl Operation for Id {
type InitData = ();
fn init(_: Self::InitData) -> Self {
Id {}
}
fn evaluate(&self, operand: i32) -> i32 {
operand
}
}
impl Operation for Sum {
type InitData = i32;
fn init(augend: Self::InitData) -> Self {
Sum { augend }
}
fn evaluate(&self, addend: i32) -> i32 {
self.augend + addend
}
}
For the Id case you specify () to say that the initialization does not need data. It's still a bit meh to call Operation::init(()), but I think the trait at least captures the logic fairly well.
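A short usage sketch, assuming the Id and Sum structs from the question carry the impls above:
fn main() {
    let id = Id::init(());  // no initialization data
    let sum = Sum::init(5); // parametrized by an i32
    assert_eq!(id.evaluate(3), 3);
    assert_eq!(sum.evaluate(3), 8);
}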
To actually get mutually exclusive traits (which is apparently what you want), you have to use some workaround. The Rust language does not support mutually exclusive traits per se. But you can use associated types and some marker types to get something similar. This is a bit strange, but it works for now.
trait InitMarker {}
enum InitFromNothingMarker {}
enum InitFromI32Marker {}
impl InitMarker for InitFromNothingMarker {}
impl InitMarker for InitFromI32Marker {}
trait Operation {
type InitData: InitMarker;
fn init() -> Self
where
Self: Operation<InitData = InitFromNothingMarker>;
fn init_from(v: i32) -> Self
where
Self: Operation<InitData = InitFromI32Marker>;
}
trait UnparametrizedInit: Operation<InitData = InitFromNothingMarker> {}
trait ParametrizedInit: Operation<InitData = InitFromI32Marker> {}
impl<T: Operation<InitData = InitFromNothingMarker>> UnparametrizedInit for T {}
impl<T: Operation<InitData = InitFromI32Marker>> ParametrizedInit for T {}
(Ideally you want a Sealed trait defined in a private submodule of your crate, so that no one except you can implement it, and then make Sealed a supertrait of InitMarker.)
This is quite a bit of code, but at least you can make sure that implementors of Operation implement exactly one of ParametrizedInit and UnparametrizedInit.
In the future, you will likely be able to replace the marker types with an enum and the associated type with an associated const. But currently, "const generics" are not finished enough, so we have to take the ugly route by using marker types. I'm actually discussing these solutions in my master's thesis (section 4.2, just search for "mutually exclusive").

Understanding the 'self' parameter in the context of trait implementations

When implementing a trait, we often use the keyword self; a sample is shown below. I want to understand the meaning of the many uses of self in this code sample.
struct Circle {
x: f64,
y: f64,
radius: f64,
}
trait HasArea {
fn area(&self) -> f64; // first self: &self is equivalent to &HasArea
}
impl HasArea for Circle {
fn area(&self) -> f64 { //second self: &self is equivalent to &Circle
std::f64::consts::PI * (self.radius * self.radius) // third:self
}
}
My understanding is:
The first self: &self is equivalent to &HasArea.
The second self: &self is equivalent to &Circle.
Is the third self representing Circle? If so, if self.radius were used twice, would that cause a move problem?
Additionally, more examples to show the different usage of the self keyword in varying context would be greatly appreciated.
You're mostly right.
The way I think of it is that in a method signature, self is a shorthand:
impl S {
fn foo(self) {} // equivalent to fn foo(self: S)
fn foo(&self) {} // equivalent to fn foo(self: &S)
fn foo(&mut self) {} // equivalent to fn foo(self: &mut S)
}
It's not actually equivalent since self is a keyword and there are some special rules (for example for lifetime elision), but it's pretty close.
Back to your example:
impl HasArea for Circle {
fn area(&self) -> f64 { // like fn area(self: &Circle) -> ...
std::f64::consts::PI * (self.radius * self.radius)
}
}
The self in the body is of type &Circle. You can't move out of a reference, so self.radius can't be a move even once. In this case radius implements Copy, so it's just copied out instead of moved. If it were a more complex type which didn't implement Copy then this would be an error.
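To make that last point concrete, here is a small sketch with a hypothetical Named type holding a non-Copy field:
struct Named {
    name: String, // String does not implement Copy
}

impl Named {
    fn get_name(&self) -> String {
        // Writing `self.name` on its own here would try to move the String
        // out from behind `&self` and fail to compile (E0507); cloning it
        // (or returning &str) is the usual way out.
        self.name.clone()
    }
}

fn main() {
    let n = Named { name: "circle".to_string() };
    println!("{}", n.get_name());
}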
You are mostly correct.
There is a neat trick to let the compiler tell you the type of variables rather than trying to infer them: let () = ...;.
Using the Playground I get for the 1st case:
9 | let () = self;
| ^^ expected &Self, found ()
and for the 2nd case:
16 | let () = self;
| ^^ expected &Circle, found ()
The first case is actually special, because HasArea is not a type, it's a trait.
So what is self? It's nothing concrete yet.
Said another way, it stands in for any possible concrete type that may implement HasArea. And thus the only guarantee we have is that such a type provides at least the interface of HasArea.
The key point is that you can place additional bounds. For example you could say:
trait HasArea: Debug {
fn area(&self) -> f64;
}
And in this case, Self: HasArea + Debug, meaning that self provides both the interfaces of HasArea and Debug.
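For instance, a default method can lean on the extra bound (a sketch; the describe method is a hypothetical addition):
use std::fmt::Debug;

trait HasArea: Debug {
    fn area(&self) -> f64;

    // Inside the trait, Self is known to implement both HasArea and Debug,
    // so `{:?}` is available on `self`.
    fn describe(&self) -> String {
        format!("{:?} has area {}", self, self.area())
    }
}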
The second and third cases are much easier: we know the exact concrete type for which the HasArea trait is implemented. It's Circle.
Therefore, the type of self in the fn area(&self) method is &Circle.
Note that if the type of the parameter is &Circle then it follows that in all its uses in the method it is &Circle. Rust has static typing (and no flow-dependent typing) so the type of a given binding does not change during its lifetime.
Things can get more complicated, however.
Imagine that you have two traits:
struct Segment(Point, Point);
impl Segment {
fn length(&self) -> f64;
}
trait Segmentify {
fn segmentify(&self) -> Vec<Segment>;
}
trait HasPerimeter {
fn has_perimeter(&self) -> f64;
}
Then, you can implement HasPerimeter automatically for all shapes that can be broken down in a sequence of segments.
impl<T> HasPerimeter for T
where T: Segmentify
{
// Note: there is a "functional" implementation if you prefer
fn has_perimeter(&self) -> f64 {
let mut total = 0.0;
for s in self.segmentify() { total += s.length(); }
total
}
}
What is the type of self here? It's &T.
What's T? Any type that implements Segmentify.
And therefore, all we know about T is that it implements Segmentify and HasPerimeter, and nothing else (we could not use println!("{:?}", self); because T is not guaranteed to implement Debug).
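To see the blanket impl in action, here is a self-contained sketch with a hypothetical Triangle type; it only says how to break itself into segments and picks up has_perimeter for free:
#[derive(Clone, Copy)]
struct Point { x: f64, y: f64 }

struct Segment(Point, Point);

impl Segment {
    fn length(&self) -> f64 {
        (self.0.x - self.1.x).hypot(self.0.y - self.1.y)
    }
}

trait Segmentify {
    fn segmentify(&self) -> Vec<Segment>;
}

trait HasPerimeter {
    fn has_perimeter(&self) -> f64;
}

// Blanket impl: anything that can be broken into segments has a perimeter.
impl<T> HasPerimeter for T
where
    T: Segmentify,
{
    fn has_perimeter(&self) -> f64 {
        self.segmentify().iter().map(Segment::length).sum()
    }
}

struct Triangle([Point; 3]);

impl Segmentify for Triangle {
    fn segmentify(&self) -> Vec<Segment> {
        let [a, b, c] = self.0; // Point is Copy, so this copies
        vec![Segment(a, b), Segment(b, c), Segment(c, a)]
    }
}

fn main() {
    let t = Triangle([
        Point { x: 0.0, y: 0.0 },
        Point { x: 3.0, y: 0.0 },
        Point { x: 0.0, y: 4.0 },
    ]);
    println!("perimeter = {}", t.has_perimeter()); // 3 + 4 + 5 = 12
}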

Unable to hold/pass reference of parent to the composition object

In C++ it would be something like this: struct A is composed of struct B, and some member function of B takes a pointer to the parent object A. A member function of A calling that function of B simply passes the this pointer to it. I'm trying this in Rust but failing to get it to work; this is what I want to achieve:
struct A<Type: T> {
composition: Type,
value: usize,
}
impl<Type> A<Type> where Type: T {
fn new(obj: Type) -> A<Type> {
A {
composition: obj,
value: 999,
}
}
fn f(&mut self) {
println!("Value: {:?}", self.value);
}
fn call(&mut self) {
self.composition.f(&mut self);
}
}
trait T {
fn f(&mut self, &mut A<Self>);
}
struct B {
value: usize,
}
impl B {
fn new() -> B {
B { value: 0, }
}
}
impl T for B {
fn f(&mut self, parent: &mut A<B>) {
println!("B::f");
parent.f();
}
}
fn main() {
let objA = A::new(B::new());
// I want call sequence -> A::call() -> B::f() -> A::f()
objA.call();
}
Note that I require mutability in all the functions, although in the example above it might seem that &mut self in most function parameters doesn't make much sense. How do I do this in Rust?
This cannot work because you're violating mutable aliasing requirements - you're trying to mutably borrow A and its substructure at the same time:
self.composition.f(self);
// roughly equivalent to:
let c = &mut self.composition; // borrow substructure
c.f(self /* borrow self */);
(I've removed the explicit &mut in front of self because it is incorrect; it would give you &mut &mut A<...>. That doesn't change the overall picture, though.)
This is a natural error given Rust's borrowing rules. Suppose that the implementation of f for some particular composition type X overwrites the composition field of the object passed to it:
impl T for X {
fn f(&mut self, a: &mut A<X>) {
a.composition = create_x_somehow();
}
}
And suddenly the object this method is called on is destroyed, and self is invalidated!
Naturally, the compiler prevents you from doing this even if you know that you don't modify composition, because that kind of knowledge cannot be encoded statically (especially given that this is a trait method which can be implemented by anyone with access to your trait).
You have essentially two choices in such situations:
reformulate the problem so it does not require such architecture anymore, or
use special language/library constructs to work around such static checks.
The second point is about using things like Cell/RefCell (they are safe, i.e. they don't require unsafe blocks, but they can panic at runtime; they would probably work in your case) or, if nothing else helps, dropping down to raw pointers and unsafe code. But, frankly, the first option is usually better: if you design your code around the ownership semantics and aliasing rules enforced by the compiler, the resulting architecture will almost always be of much better quality.
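As a sketch of the first option, one possible reformulation (assuming B only needs A's value rather than the whole parent, which is an assumption on my part) is to pass just that field; the compiler accepts borrows of disjoint fields:
trait T {
    // Take only the piece of the parent that is actually needed.
    fn f(&mut self, parent_value: &mut usize);
}

struct A<Type: T> {
    composition: Type,
    value: usize,
}

impl<Type: T> A<Type> {
    fn call(&mut self) {
        // Borrowing `self.composition` and `self.value` separately is
        // accepted: the two borrows cover disjoint fields.
        self.composition.f(&mut self.value);
    }
}

struct B;

impl T for B {
    fn f(&mut self, parent_value: &mut usize) {
        println!("B::f sees parent value {}", parent_value);
        *parent_value += 1;
    }
}

fn main() {
    let mut a = A { composition: B, value: 999 };
    a.call(); // A::call -> B::f, which mutates A's value
    println!("value is now {}", a.value);
}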
