Is there any technical reason Rust is designed to use dot notation for tuples instead of using index notation (t[2])?
let t = (20u32, true, 'b');
t.2 // -> 'b'
Dot notation seems natural for accessing the properties of structs and objects. I couldn't find a resource or explanation online.
I had no part in the design decisions, but here's my perspective:
Tuples contain mixed types. That is, the property type_of(t[i]) == type_of(t[j]) cannot be guaranteed.
However, conventional indexing works on the premise that the i in t[i] need not be a compile-time constant, which in turn means that the type of t[i] needs to be uniform for all possible i. This is true of all other Rust collections that implement indexing. Specifically, Rust types are made indexable by implementing the Index trait, defined as follows:
pub trait Index<Idx> where Idx: ?Sized {
    type Output: ?Sized;
    fn index(&self, index: Idx) -> &Self::Output;
}
So if you wanted a tuple to implement indexing, what type should Self::Output be? The only way to pull this off would be to make Self::Output an enum, which means that element accesses would have to be wrapped in a useless match t[i] clause (or something similar) on the programmer's side, and you'd be catching type errors at runtime instead of compile time.
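To make that concrete, here is a sketch of what the caller's side would look like if t[2] on a (u32, bool, char) tuple had to return a single Output type. The Element enum is invented purely for illustration; it is not anything from std:

#[allow(dead_code)]
enum Element {
    U32(u32),
    Bool(bool),
    Char(char),
}

fn main() {
    // Imagine `t[2]` returned an `Element`: the caller still needs a match,
    // and picking the wrong variant only fails at runtime, not at compile time.
    let e = Element::Char('b');
    let c = match e {
        Element::Char(c) => c,
        _ => panic!("expected a char"),
    };
    assert_eq!(c, 'b');
}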
Furthermore, you now have to implement bounds-checking, which is again a runtime error, unless you're clever in your tuple implementation.
You could bypass these issues by requiring that the index be a compile-time constant, but at that point tuple item accesses would be pretending to behave like a normal index operation while actually behaving inconsistently with respect to all other Rust containers, and there's nothing good about that.
This decision was made in RFC 184. The Motivation section has details:
Right now accessing fields of tuples and tuple structs is incredibly painful—one must rely on pattern-matching alone to extract values. This became such a problem that twelve traits were created in the standard library (core::tuple::Tuple*) to make tuple value accesses easier, adding .valN(), .refN(), and .mutN() methods to help this. But this is not a very nice solution—it requires the traits to be implemented in the standard library, not the language, and for those traits to be imported on use. On the whole this is not a problem, because most of the time std::prelude::* is imported, but this is still a hack which is not a real solution to the problem at hand. It also only supports tuples of length up to twelve, which is normally not a problem but emphasises how bad the current situation is.
The discussion in the associated pull request is also useful.
The reason for using t.2 syntax instead of t[2] is best explained in this comment:
Indexing syntax everywhere else has a consistent type, but a tuple is heterogeneous, so a[0] and a[1] would have different types.
I want to provide an answer from my experience using a functional language (OCaml) in the while since I posted this question.
Apart from the reference in #rom1v's answer, indexing syntax like a[0] is elsewhere used on sequence-like structures, which tuples are not. In OCaml, for instance, a tuple (1, "one") is said to have type int * string, which corresponds to the Cartesian product in mathematics (e.g., the plane is R^2 = R * R). Also, accessing a tuple by the nth index is considered unidiomatic.
Due to its heterogeneous nature, a tuple can almost be thought of as a record/object, which conventionally prefers dot notation like a.fieldName to access its fields (except in a language like JavaScript, which treats objects like dictionaries and allows string-literal access like a["fieldname"]). The only language I'm aware of that uses indexing syntax to access a field is Lua.
Personally, I think syntax like a.(0) looks better than a.0, but this awkwardness may be intentional (or not), considering that in most functional languages it is idiomatic to pattern-match a tuple instead of accessing it by index. Since Rust is also imperative, syntax like a.10 can be a good reminder to pattern-match or "go use a struct" already.
In C++, it is possible to customize the code std::set uses to sort its arguments. By default it uses std::less, but that can be changed with the Compare template parameter.
Rust's BTreeSet uses the Ord trait to sort the type. I don't know of a way to override this behavior -- it's built into the type constraint of the type stored by the container.
However, it often makes sense to build a collection of items sorted by some locally useful metric that is nevertheless not the way the items should always be compared. Or suppose I would like to sort items of a type from another crate; in that case, it's impossible to implement Ord myself for the type, even if I want to.
The workaround is of course to build a plain old Vec of the items and sort it afterward. But in my opinion, this is not as clean as automatically ordering them on insertion.
Is there a way to use alternative comparators with Rust's container types?
Custom comparators currently do not exist in the Rust standard collections. The idiomatic way to solve the issue is to define a newtype:
struct Wrapper(Wrapped);
You can then define a custom Ord implementation for Wrapper with exactly the semantics you want.
Furthermore, since you have a newtype, you can also easily implement other traits to facilitate conversion:
convert::From can be implemented, giving you convert::Into for free
ops::Deref<Target = Wrapped> can be implemented, reducing the need for mapping due to auto-deref
Note that accessing the wrapped entity is syntactically lightweight as it's just two characters: .0.
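For illustration, here is a minimal sketch of the pattern, assuming the goal is a BTreeSet that iterates in descending order (Descending is a made-up name):

use std::cmp::Ordering;
use std::collections::BTreeSet;

// Newtype wrapper: same data, different ordering semantics.
#[derive(PartialEq, Eq)]
struct Descending(u32);

impl Ord for Descending {
    fn cmp(&self, other: &Self) -> Ordering {
        // Flip the comparison so larger values sort first.
        other.0.cmp(&self.0)
    }
}

impl PartialOrd for Descending {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl From<u32> for Descending {
    fn from(value: u32) -> Self {
        Descending(value)
    }
}

fn main() {
    let set: BTreeSet<Descending> = [3u32, 1, 2].iter().map(|&n| Descending::from(n)).collect();
    let order: Vec<u32> = set.iter().map(|d| d.0).collect();
    assert_eq!(order, vec![3, 2, 1]); // largest first
}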
In Rust, vectors are indexed using usize, so when writing
let my_vec: Vec<String> = vec!["Hello".to_string(), "world".to_string()];
let index: u32 = 0;
println!("{}", my_vec[index]);
you get an error, as index is expected to be of type usize. I'm aware that this can be fixed by explicitly converting index to usize:
my_vec[index as usize]
but this is tedious to write. Ideally I'd simply overload the [] operator by implementing
impl<T> std::ops::Index<u32> for Vec<T> { ... }
but that's impossible as Rust prohibits this (as neither the trait nor struct are local). The only alternative that I can see is to create a wrapper class for Vec, but that would mean having to write lots of function wrappers as well. Is there any more elegant way to address this?
Without a clear use case it is difficult to recommend the best approach.
There are basically two questions here:
do you really need indexing?
do you really need to use u32 for indices?
When using functional programming style, indexing is generally unnecessary as you operate on iterators instead. In this case, the fact that Vec only implements Index for usize really does not matter.
If your algorithm really needs indexing, then why not use usize? There are many ways to convert from u32 to usize, converting at the last moment possible is one possibility, but there are other sites where you could do the conversion, and if you find a chokepoint (or create it) you can get away with only a handful of conversions.
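For instance, a single helper function can act as that chokepoint, so the u32-to-usize conversion lives in exactly one place (the helper at is hypothetical, just to illustrate the idea):

use std::convert::TryFrom;

// The one place where u32 indices become usize.
fn at<T>(slice: &[T], index: u32) -> &T {
    &slice[usize::try_from(index).expect("index does not fit in usize")]
}

fn main() {
    let words = vec!["Hello", "world"];
    println!("{}", at(&words, 1));
}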
At least, that's the YAGNI point of view.
Personally, as a type freak, I tend to wrap things a lot. I just like to add semantic information, because, let's face it, Vec<i32> just doesn't mean anything.
Rust offers a simple way to create wrapper structures: struct MyType(WrappedType);. That's it.
Once you have your own type, adding indexing is easy. There are several ways to add other operations:
if only a few operations make sense, then adding them explicitly is best.
if many operations are necessary, and you do not mind exposing the fact that there is a Vec<X> underneath, then you can expose it:
by making it public: struct MyType(pub WrappedType);, users can then use .0 to access it.
by implementing AsRef and AsMut, or creating a getter.
by implementing Deref and DerefMut (which kicks in implicitly, so make sure you really want that).
Of course, breaking encapsulation can be annoying later, as it also prevents you from maintaining invariants, so I would consider it a last-ditch solution.
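As a rough sketch of the wrapper approach (MyVec is a made-up name, and only indexing is exposed here), implementing Index<u32> then becomes trivial:

use std::ops::Index;

// Newtype around Vec so that we are allowed to implement Index<u32> for it.
struct MyVec<T>(Vec<T>);

impl<T> Index<u32> for MyVec<T> {
    type Output = T;

    fn index(&self, index: u32) -> &T {
        // The u32-to-usize conversion happens in one place only.
        &self.0[index as usize]
    }
}

fn main() {
    let v = MyVec(vec!["Hello", "world"]);
    let i: u32 = 1;
    println!("{}", v[i]);
}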
I prefer to store "references" to nodes as u32 rather than usize. So when traversing the graph I keep retrieving adjacent vertex "references", which I then use to look up the actual vertex object in the Vec.
So you don't actually want u32: you will never do calculations on it, yet u32 happily lets you do math. What you want is an index type that can only be used for indexing and whose values are otherwise immutable.
I suggest you implement something along the lines of rustc_data_structures::indexed_vec::IndexVec.
This custom IndexVec type is not only generic over the element type, but also over the index type, and thus allows you to use a NodeId newtype wrapper around u32. You'll never accidentally use a non-id u32 to index, and you can use them just as easily as a u32. You don't even have to create any of these indices by calculating them from the vector length, instead the push method returns the index of the location where the element has just been inserted.
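A stripped-down sketch of that idea might look like the following. This is not the actual rustc type: NodeId and NodeVec are invented names, and the real IndexVec is additionally generic over the index type.

use std::ops::Index;

// A strongly typed index: stored as a u32, but not interchangeable with other u32s.
#[derive(Copy, Clone)]
struct NodeId(u32);

// A vector that can only be indexed by NodeId.
struct NodeVec<T> {
    data: Vec<T>,
}

impl<T> NodeVec<T> {
    fn new() -> Self {
        NodeVec { data: Vec::new() }
    }

    // push hands back the id of the freshly inserted element,
    // so indices never have to be computed from the vector length by hand.
    fn push(&mut self, value: T) -> NodeId {
        let id = NodeId(self.data.len() as u32);
        self.data.push(value);
        id
    }
}

impl<T> Index<NodeId> for NodeVec<T> {
    type Output = T;

    fn index(&self, id: NodeId) -> &T {
        &self.data[id.0 as usize]
    }
}

fn main() {
    let mut nodes = NodeVec::new();
    let a = nodes.push("vertex a");
    let _b = nodes.push("vertex b");
    assert_eq!(nodes[a], "vertex a");
}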
In C++, you have the ability to pass integral values as template parameters:
std::array<int, 3> arr; // fixed-size array of 3 elements
I know that Rust has built-in support for this, but what if I wanted to create something like a linear algebra vector library?
struct Vec<T, size: usize> {
    data: [T; size],
}

type Vec3f = Vec<f32, 3>;
type Vec4f = Vec<f32, 4>;
This is currently what I do in D. I have heard that Rust now has Associated Constants.
I haven't used Rust in a long time, but this doesn't seem to address the problem at all, or have I missed something?
As far as I can see, associated constants are only available in traits and that would mean I would still have to create N vector types by hand.
No, associated constants don't help and aren't intended to. Associated items of any kind are outputs, while use cases such as the one in the question want inputs. One could in principle construct something out of type parameters and a trait with associated constants (at least once it becomes possible to use associated constants of type parameters, which sadly doesn't work yet). But that has terrible ergonomics, not much better than existing hacks like typenum.
Integer type parameters are highly desired since, as you noticed, they enable numerous things that aren't really feasible in current Rust. People talk about this and plan for it but it's not there yet.
Integer type parameters are not supported as of now, however there's an RFC for that IIRC, and a long-standing discussion.
You could use the typenum crate in the meantime.
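As a rough idea of what the typenum workaround looks like (VecN is a made-up name; the length is carried by a type-level integer and only materialized via Unsigned::to_usize()):

use std::marker::PhantomData;
use typenum::{Unsigned, U3, U4};

// A vector whose length is tracked at the type level through a typenum integer.
struct VecN<T, N: Unsigned> {
    data: Vec<T>,
    _len: PhantomData<N>,
}

impl<T: Default + Clone, N: Unsigned> VecN<T, N> {
    fn new() -> Self {
        VecN {
            data: vec![T::default(); N::to_usize()],
            _len: PhantomData,
        }
    }
}

type Vec3f = VecN<f32, U3>;
type Vec4f = VecN<f32, U4>;

fn main() {
    let v = Vec3f::new();
    assert_eq!(v.data.len(), 3);
}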
What does TypeState refer to with respect to language design? I saw it mentioned in some discussions regarding a new language by Mozilla called Rust.
Note: Typestate was dropped from Rust; only a limited version (tracking uninitialized and moved-from variables) is left. See my note at the end.
The motivation behind TypeState is that types are immutable; however, some of their properties are dynamic, on a per-variable basis.
The idea is therefore to create simple predicates about a type, and to use the control-flow analysis that the compiler executes for many other reasons to statically decorate the type with those predicates.
Those predicates are not actually checked by the compiler itself, which could be too onerous; instead, the compiler simply reasons in terms of a graph.
As a simple example, you create a predicate even, which returns true if a number is even.
Now, you create two functions:
halve, which only acts on even numbers
double, which takes any number and returns an even number.
Note that the type number is not changed; you do not create an evennumber type and duplicate all the functions that previously acted on number. You just compose number with a predicate called even.
Now, let's build some graphs:
a: number -> halve(a) #! error: `a` is not `even`
a: number, even -> halve(a) # ok
a: number -> b = double(a) -> b: number, even
Simple, isn't it?
Of course it gets a bit more complicated when you have several possible paths:
a: number -> a = double(a) -> a: number, even -> halve(a) #! error: `a` is not `even`
\___________________________________/
This shows that you reason in terms of sets of predicates:
when joining two paths, the new set of predicates is the intersection of the sets of predicates given by those two paths
This can be augmented by the generic rule of a function:
to call a function, the set of predicates it requires must be satisfied
after a function is called, only the set of predicates it established is satisfied (note: arguments taken by value are not affected)
And thus the building block of TypeState in Rust:
check: checks that the predicate holds; if it does not, fail; otherwise, add the predicate to the set of predicates
Note that since Rust requires that predicates are pure functions, it can eliminate redundant check calls if it can prove that the predicate already holds at this point.
What typestate lacks is simple: composability.
If you read the description carefully, you will note this:
after a function is called, only the set of predicates it established is satisfied (note: arguments taken by value are not affected)
This means that predicates for a type are useless in themselves; the utility comes from annotating functions. Therefore, introducing a new predicate into an existing codebase is a chore, as the existing functions need to be reviewed and tweaked to state whether or not they need/preserve the invariant.
And this may lead to duplicating functions at an exponential rate when new predicates pop up: predicates are not, unfortunately, composable. The very design issue they were meant to address (proliferation of types, and thus of functions) does not seem to be addressed.
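For contrast, in today's Rust (typestate having been removed), the even guarantee has to be encoded as a dedicated type, which is exactly the kind of proliferation described above. A minimal sketch (Even, double and halve are just illustrative names):

// Without typestate, the `even` predicate becomes its own type.
struct Even(u32);

fn double(n: u32) -> Even {
    // Doubling always produces an even number.
    Even(n * 2)
}

fn halve(n: Even) -> u32 {
    // Only callable with a value that is known to be even.
    n.0 / 2
}

fn main() {
    let a: u32 = 3;
    // halve(a);       // does not compile: `a` is not known to be even
    let b = double(a); // b: Even
    assert_eq!(halve(b), 3);
}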
It's basically an extension of types, where you don't just check whether some operation is allowed in general, but in this specific context. All that at compile time.
The original paper is actually quite readable.
There's a typestate checker written for Java, and Adam Warski's explanatory page gives some useful information. I'm only just figuring this material out myself, but if you are familiar with QuickCheck for Haskell, the application of QuickCheck to monadic state seems similar: categorise the states and explain how they change when they are mutated through the interface.
Typestate is explained as:
Leverage the type system to encode state changes
Implemented by creating a type for each state
Use move semantics to invalidate a state
Return the next state from the previous state
Optionally drop the state (close files, connections, ...)
Compile-time enforcement of logic
struct Data;
struct Signed;

impl Data {
    // Consumes the unsigned state and produces the signed state.
    fn sign(self) -> Signed {
        Signed
    }
}

fn main() {
    let data = Data;
    let signed = data.sign();
    data.sign(); // Compile error: `data` was already moved by the first call
}
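Because sign takes self by value, the Data value is moved into the first call, so the second data.sign() is rejected at compile time; that is the "use move semantics to invalidate a state" point above. The returned Signed value can in turn expose its own transition methods, and a Drop implementation can handle cleanup such as closing files or connections.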