How do I use a custom comparator function with BTreeSet?

How do I use a custom comparator function with BTreeSet? - rust

In C++, it is possible to customize the code std::set uses to sort its arguments. By default it uses std::less, but that can be changed with the Compare template parameter.
Rust's BTreeSet uses the Ord trait to sort the type. I don't know of a way to override this behavior -- it's built into the type constraint of the type stored by the container.
However, it often makes sense to build a list of items that are sorted by some locally-useful metric that nevertheless is not the best way to always compare the items by. Or, suppose I would like to sort items of a used type; in this case, it's impossible to implement Ord myself for the type, even if I want to.
The workaround is of course to build a plain old Vec of the items and sort it afterward. But in my opinion, this is not as clean as automatically ordering them on insertion.
Is there a way to use alternative comparators with Rust's container types?

Custom comparators currently do not exist in the Rust standard collections. The idiomatic way to solve the issue is to define a newtype:
struct Wrapper(Wrapped);
You can then define a custom Ord implementation for Wrapper with exactly the semantics you want.
Furthermore, since you have a newtype, you can also easily implement other traits to facilitate conversion:
convert::From can be implemented, giving you convert::Into for free
ops::Deref<Target = Wrapped> can be implemented, reducing the need for mapping due to auto-deref
Note that accessing the wrapped entity is syntactically lightweight as it's just two characters: .0.

Related

How to pass a custom comparator function for BinaryHeap in rust? [duplicate]

In C++, it is possible to customize the code std::set uses to sort its arguments. By default it uses std::less, but that can be changed with the Compare template parameter.
Rust's BTreeSet uses the Ord trait to sort the type. I don't know of a way to override this behavior -- it's built into the type constraint of the type stored by the container.
However, it often makes sense to build a list of items that are sorted by some locally-useful metric that nevertheless is not the best way to always compare the items by. Or, suppose I would like to sort items of a used type; in this case, it's impossible to implement Ord myself for the type, even if I want to.
The workaround is of course to build a plain old Vec of the items and sort it afterward. But in my opinion, this is not as clean as automatically ordering them on insertion.
Is there a way to use alternative comparators with Rust's container types?

Custom comparators currently do not exist in the Rust standard collections. The idiomatic way to solve the issue is to define a newtype:
struct Wrapper(Wrapped);
You can then define a custom Ord implementation for Wrapper with exactly the semantics you want.
Furthermore, since you have a newtype, you can also easily implement other traits to facilitate conversion:
convert::From can be implemented, giving you convert::Into for free
ops::Deref<Target = Wrapped> can be implemented, reducing the need for mapping due to auto-deref
Note that accessing the wrapped entity is syntactically lightweight as it's just two characters: .0.

Why do I need "use rand::Rng" to call gen() on rand::thread_rng()?

When I'm using Rust's rand crate, if I want to produce a rand number, I would write:
use rand::{self, Rng};
let rand = rand::thread_rng().gen::<usize>();
If I don't use rand::Rng, an error occurs:
no method named gen found for struct rand::prelude::ThreadRng in the current scope
That's quite different from what I'm used to. Usually I treat mods like:
import rand from "path";
rand.generate();
Once I import the mod I don't need to import something else, and I can use every method it exports.
Why must I use rand::Rng to enable the gen method on rand::thread_rng()?

That's quite different from what I used to know.
It feels different because it is indeed different. You are probably used to dynamic dispatch via some kind of virtual method table (as in e.g. C++), or, in case of JS, to dynamic dispatch by looking up either the own properties of the receiver object, or its ancestors via the __proto__-chain. In any case, the object on which you are invoking a method carries around some data that tells it how to get the method that you're invoking. Given the signature of the invoked method, the receiver object itself knows how to get the method with that signature.
That's not the only way, though. For example,
modules / functors in OCaml or SML
Typeclasses in Haskell
implicits / givens in Scala
traits in Rust
work on a rather different principle: the methods are not tied to the receiver, but to the module / typeclass / given / trait instances. In each case, those are entities that are separate from the receiver of the method call. It opens some new possibilities, e.g. it allows you to do some ad-hoc polymorphism (i.e. to define instances of traits after the fact, for types that are not necessarily under your control). At the same time, the compiler typically requires a bit more information from you in order to be able to select the correct instances: it behaves somewhat like a little type-directed search engine, or even a little "theorem prover", and for this to work, you have to tell the compiler where to look for the suitable building blocks for those synthetically generated instances.
If you've never worked before with any language that has a compiler with a subsystem that is "searching for instances" based on type information, this should indeed feel quite foreign. The error messages and the solution approaches do indeed feel rather different, because instead of comparing your implementation against an interface and looking for conflicts, you have to guide this instance-searching mechanism by providing more hints (e.g. by importing more traits etc.).
In your particular case, rand::thread_rng returns a struct ThreadRng. On its own, the struct knows nothing about the gen method, because this method is not tied directly to the struct. Instead, it's defined in the Rng trait. But at the same time, it could be defined in some entirely unrelated trait, and have some completely different meaning. In order to disambiguate the intended meaning, you therefore have to explicitly specify that you want to work with the Rng trait. This is why you have to mention it in the use-clause.

I don't know the specific library you're using, but I can guess at the problem. I would guess that Rng is a trait which defines gen. Traits can be thought of as somewhat like Java's interfaces: they enable ad-hoc polymorphism by allowing you to define different behaviors for the same function on different datatypes.
However, Rust's traits fix one major problem (well, they fix several major problems, but one that's relevant here) with Java's interfaces. In Java, if you define an interface, then anyone writing a class can implement the interface, but you can't implement it for other people. In particular, the built-in types String and int and the like can never implement any new interfaces downstream. In Rust, either the trait writer or the struct/enum writer can implement the trait.
But this poses another issue. Now, if I have a value foo of type Foo and I write foo.bar(), then bar might not be a method defined on Foo; it might be something some trait writer implemented in some other file. We can't go search every Rust file on your computer for possible matching traits, so Rust makes the logical decision to restrict this search to traits that are in scope. If you want to call foo.bar() and bar is a method on trait Bar, then trait Bar has to be in scope when you call it. Otherwise, Rust won't see it.
So, in your case, thread_rng() returns a rand::prelude::ThreadRng. The method gen is not defined on rand::prelude::ThreadRng. Instead, it's defined on a trait called rand::Rng which is *implemented by ThreadRng. That trait has to be in-scope to use the method.

Create a map using type as key

I need a HashMap<K,V> where V is a trait (it will likely be Box or an Rc or something, that's not important), and I need to ensure that the map stores at most one of a given struct, and more importantly, that I can query the presence of (and retrieve/insert) items by their type. K can be anything that is unique to each type (a uint would be nice, but a String or even some large struct holding type information would be sufficient as long as it can be Eq and Hashable)
This is occurring in a library, so I cannot use an enum or such since new types can be added by external code.
I looked into std::any::TypeId but besides not working for non-'static types, it seems they aren't even unique (and allegedly collisions were achieved accidentally with a rather small number of types) so I'd prefer to avoid them if feasible since the number of types I'll have may be very large. (hence this is not a duplicate of this IMO)
I'd like something along the lines of a macro to ensure uniqueness but I can't figure out how to have some kind of global compile time counter. I could use a proper UUID, but it'd be nice to have guaranteed uniqueness since this is, in theory at least, statically determinable.
It is safe to assume that all relevant types are defined either in this lib or in a singular crate that directly depends on it, if that allows for a solution that might be otherwise impossible.
e.g. my thoughts are to generate ids for types in the lib, and also export a constant of the counter, which can be used by the consumer of the lib in the same macro (or a very similar one) but I don't see a way to have such a const value modified by const code in multiple places.
Is this possible or do I need some kind of build script that provides values before compile time?

Reasons for Dot Notation for Tuple

Is there any technical reason Rust is designed to use dot notation for tuples instead of using index notation (t[2])?
let t = (20u32, true, 'b')
t.2 // -> 'b'
Dot notation seems natural in accessing struct's and object's properties. I couldn't find a resource or explanation online.

I had no part in the design decisions, but here's my perspective:
Tuples contain mixed types. That is, the property type_of(t[i]) == type_of(t[j]) cannot be guaranteed.
However, conventional indexing works on the premise that the i in t[i] need not be a compile-time constant, which in turn means that the type of t[i] needs to be uniform for all possible i. This is true in all other rust collections that implement indexing. Specifically, rust types are made indexable through implementing the Index trait, defined as below:
pub trait Index<Idx> where Idx: ?Sized {
type Output: ?Sized;
fn index(&'a self, index: Idx) -> &'a Self::Output;
}
So if you wanted a tuple to implement indexing, what type should Self::Output be? The only way to pull this off would be to make Self::Output an enum, which means that element accesses would have to be wrapped around a useless match t[i] clause (or something similar) on the programmer's side, and you'll be catching type errors at runtime instead of compile-time.
Furthermore, you now have to implement bounds-checking, which is again a runtime error, unless you're clever in your tuple implementation.
You could bypass these issues by requiring that the index by a compile-time constant, but at that point tuple item accesses are pretending to behave like a normal index operation while actually behaving inconsistently with respect to all other rust containers, and there's nothing good about that.

This decision was made in RFC 184. The Motivation section has details:
Right now accessing fields of tuples and tuple structs is incredibly painful—one must rely on pattern-matching alone to extract values. This became such a problem that twelve traits were created in the standard library (core::tuple::Tuple*) to make tuple value accesses easier, adding .valN(), .refN(), and .mutN() methods to help this. But this is not a very nice solution—it requires the traits to be implemented in the standard library, not the language, and for those traits to be imported on use. On the whole this is not a problem, because most of the time std::prelude::* is imported, but this is still a hack which is not a real solution to the problem at hand. It also only supports tuples of length up to twelve, which is normally not a problem but emphasises how bad the current situation is.
The discussion in the associated pull request is also useful.

The reason for using t.2 syntax instead of t[2] is best explained in this comment:
Indexing syntax everywhere else has a consistent type, but a tuple is heterogenous so a[0] and a[1] would have different types.

I want to provide an answer from my experience using a functional language (Ocaml) for the while since I've posted this question.
Apart from #rom1v reference, indexing syntax like a[0] everywhere else also used in some kind of sequence structure, of which tuples aren't. In Ocaml, for instance, a tuple (1, "one") is said to have type int * string, which conforms to the Cartesian product in mathematics (i.e., the plane is R^2 = R * R). Plus, accessing a tuple by nth index is considered unidiomatic.
Due to its polymorphic nature, a tuple can almost be thought of as a record / object, which often prefer dot notation like a.fieldName as a convention to access its field (except in language like Javascript, which treats objects like dictionaries and allows string literal access like a["fieldname"]. The only language I'm aware of that's using indexing syntax to access a field is Lua.
Personally, I think syntax like a.(0) tends to look better than a.0, but this may be intentionally (or not) awkward considering in most functional languages it is ideal to pattern-match a tuple instead of accessing it by its index. Since Rust is also imperative, syntax like a.10 can be a good reminder to pattern-match or "go use a struct" already.

Can I determine the zero value of generic types?

The closest I managed to find was the std::num::Int and std::num::Float traits, which define zero(). However, they are specific to primitive types.

No, because it doesn't make sense in general. In fact, there are several types where "zero" is very specifically not valid at all. For example, if you were to take an appropriately-sized zero value and transmute it into a Box, that would violate memory safety!
There's an alternative to "zero", which is the Default trait. It allows you to say Default::default() to get a type's "default" value, whatever that happens to be. However, there's no consistent, sensible definition of "default" for all types. As such, you can only use it for types which explicitly implement it.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string