Understanding Enums in Rust

I am having a hard time understanding the Enum custom type in Rust. In a broad way, The Book describes an Enum as a custom data type that has different variants. How should I think about these variants? Are these sub-types, or are these specific values that the Enum type can take?
Looking online I see examples like:
enum Day {
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday,
}
In the case above, the variants are the possible values of the Day type. But in The Book we see examples like:
struct Ipv4Addr {
    // --snip--
}
struct Ipv6Addr {
    // --snip--
}
enum IpAddr {
    V4(Ipv4Addr),
    V6(Ipv6Addr),
}
To me, it seems like IpAddr::V4 is a sub-type rather than a specific value, but a variable is really of type IpAddr and can have a value IpAddr::V4.
Does it make sense to make the distinction I mention above? What is the correct interpretation of an Enum?

Rust is a statically and strongly typed language. It is also very fast. In many cases it is more efficient to use the stack instead of the heap. However, when you use the stack, Rust must know at compile time how much space the data needs. That's not a problem for simple fixed-size types like i16, u128, etc. It also isn't a problem for tuples, structs or arrays, because they have a fixed data structure with a known size.
However, sometimes you will need to use different data types, depending on some runtime condition/state. In languages like Java, .NET, JS, Python, PHP, etc., in such situations you will be using the heap (one way or another). In Rust you also have ways to use the heap, but that's often suboptimal. Enums in Rust allow you to define additional, variant-specific fields with custom data types. That can be very flexible and at the same time, in many cases, would be faster than solutions that make use of the heap.
Note that in languages like Java, you would often end up creating a hierarchy of classes to achieve what you can do in Rust with enums. Both approaches have their pros and cons. But if you come from a language like Java, you should keep that in mind.
Maybe a good example would be to think about how you would represent JSON in your language of choice. If the JSON has a fixed data structure, you can use standard structs in Rust, classes in Java, etc. But what if you don't know the structure of a JSON object in advance? In most modern languages the parser would create some sort of a (Linked)HashMap that contains strings for the keys and some object instances (integers, strings, lists, maps, etc.) for the values. Compare that to serde_json's Value enum. Another example, which is not for JSON, but is conceptually similar in that you can read data of different types, is the mysql crate's Value.
It might also be useful to understand how Rust allocates memory for enums. At compile time, it determines which of the variants needs the most memory. Let's say variant A needs 12 bytes, variant B needs 16 bytes, and variant C needs 4 bytes. Rust will reserve 16 bytes for the associated data of every enum instance, because that's the minimum size that all variants can fit in. On top of that, it usually needs some space for the discriminant that records which variant is currently stored.
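The sizing rule above can be checked with std::mem::size_of. This is a minimal sketch: the Message enum and its byte-array payloads are made up for illustration, and the exact total size is left to the compiler (the default representation makes no layout guarantees), so the code only checks that the enum is at least as large as its largest payload.

```rust
use std::mem::size_of;

// Hypothetical enum whose variants carry payloads of 12, 16 and 4 bytes.
#[allow(dead_code)]
enum Message {
    A([u8; 12]),
    B([u8; 16]),
    C([u8; 4]),
}

// Size of the largest payload among the three variants.
fn largest_payload() -> usize {
    size_of::<[u8; 12]>()
        .max(size_of::<[u8; 16]>())
        .max(size_of::<[u8; 4]>())
}

// Size of the whole enum, including whatever the compiler adds for the tag.
fn message_size() -> usize {
    size_of::<Message>()
}

fn main() {
    println!("largest payload: {} bytes", largest_payload());
    println!("size of Message: {} bytes", message_size());
    // Every instance has the same size, no matter which variant it holds.
    assert!(message_size() >= largest_payload());
}
```

Whichever variant is stored, a Message value occupies the same number of bytes, which is what allows it to live on the stack or inline in an array.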

It is reasonable to see Day as a C-style enum. It describes all possible values of the type and has a numeric discriminant to identify each.
The IpAddr type is a tagged union. It is some tag (a number like in the C-style enum) followed by the value you give in parentheses. It is not really a subtype, more a variant of IpAddr.
Once identified by its tag (which match and such do for you) you can use the values inside.
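To make both readings concrete, here is a small sketch. It uses the standard library's Ipv4Addr and Ipv6Addr as payloads instead of the snipped structs from the question, shortens Day to three variants, and adds two hypothetical helper functions (is_weekend and describe). Note that IpAddr::V4 is not a type: it is a constructor function that returns a value of type IpAddr.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

// Fieldless ("C-style") enum: every variant is simply one value of type `Day`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Day {
    Monday,
    Saturday,
    Sunday,
}

// Data-carrying enum (tagged union): every variant is still a value of type
// `IpAddr`, but it also holds a payload.
#[derive(Debug, PartialEq)]
enum IpAddr {
    V4(Ipv4Addr),
    V6(Ipv6Addr),
}

// Hypothetical helper: matches on the variant only.
fn is_weekend(day: Day) -> bool {
    matches!(day, Day::Saturday | Day::Sunday)
}

// Hypothetical helper: `match` inspects the tag and binds the payload.
fn describe(addr: &IpAddr) -> String {
    match addr {
        IpAddr::V4(v4) => format!("IPv4 {}", v4),
        IpAddr::V6(v6) => format!("IPv6 {}", v6),
    }
}

fn main() {
    // The discriminant of a fieldless enum can be cast to an integer.
    println!("Monday has discriminant {}", Day::Monday as u8);
    println!("Saturday is weekend: {}", is_weekend(Day::Saturday));
    println!("Sunday is weekend: {}", is_weekend(Day::Sunday));

    // `IpAddr::V4` is a function from the payload type to `IpAddr`.
    let make: fn(Ipv4Addr) -> IpAddr = IpAddr::V4;
    let home = make(Ipv4Addr::new(127, 0, 0, 1));
    let loopback = IpAddr::V6(Ipv6Addr::LOCALHOST);
    println!("{}", describe(&home));
    println!("{}", describe(&loopback));
}
```

Both home and loopback have the single type IpAddr, so they can be stored in the same variable or collection; only the match arm that runs differs.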

Related

Rust Storage of Option<i32>

How is an Option laid out in memory? Since an i32 already takes up an even number of bytes, is Rust forced to use a full byte to store the single bit None/Some?
EDIT: According to this answer, Rust in fact uses an extra 4 (!) bytes. Why?
For structs and enums declared without special layout modifiers, the Rust docs state
Nominal types without a repr attribute have the default representation. Informally, this representation is also called the rust representation.
There are no guarantees of data layout made by this representation.
Option cannot possibly be repr(transparent) or repr(i*) since it is neither a newtype struct nor a fieldless enum, and we can check the source code and see that it's not declared repr(C). So no guarantees are made about the layout.
If it were declared repr(C), then we'd get the C representation, which is what you're envisioning. We need one integer to indicate whether it's None or Some (which size of integer is implementation-defined) and then we need enough space to store the actual i32.
In reality, since Rust is given a lot of leeway here, it can do clever things. If you have a variable which is only ever Some, it needn't store the tag bit (and, again, no guarantees are made about layout, so it's free to make this change internally). If you have an i32 that starts at 0 and goes up to 10, it's provably never negative, so Rust might choose to use, say, -1 to indicate None.
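A quick way to see what the compiler actually does is to compare sizes with std::mem::size_of. The sketch below uses a hypothetical helper, option_overhead, that reports how many bytes Option adds to a type. The usual explanation for the 4 extra bytes on Option<i32> is alignment: every bit pattern of an i32 is a valid value, so the tag needs its own storage, and that storage is then padded up to the 4-byte alignment of i32 on common targets. Types that have a forbidden bit pattern, such as references or NonZeroI32 (neither can be zero), let None reuse that pattern, which the standard library documents as guaranteed.

```rust
use std::mem::size_of;
use std::num::NonZeroI32;

// Hypothetical helper: number of bytes that wrapping `T` in `Option` adds.
fn option_overhead<T>() -> usize {
    size_of::<Option<T>>() - size_of::<T>()
}

fn main() {
    // i32 has no spare bit pattern, so the tag needs extra space.
    println!("Option<i32> adds {} bytes", option_overhead::<i32>());
    // These types can never be zero, so `None` can be stored as zero.
    println!("Option<NonZeroI32> adds {} bytes", option_overhead::<NonZeroI32>());
    println!("Option<&i32> adds {} bytes", option_overhead::<&'static i32>());
}
```

If the extra bytes matter, choosing a payload type with such a spare pattern is a common way to get the compact layout without any unsafe code.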

Runtime Building: String not found in this scope

A common problem Substrate developers might run into occurs when developing a custom pallet that stores a mapping in storage using common types such as String. As an example:
#[derive(Encode, Decode, Clone, Default, RuntimeDebug)]
pub struct ClusterMetadata {
    ip_address: String,
    namespace: String,
    whitelisted_ips: String,
}
On building the runtime, you get this error for every String:
   |
21 |     ip_address: String,
   |                 ^^^^^^ not found in this scope
Why are Strings not included in scope? And other std rust types?
The error here is not related to no_std, so you probably just need to import the String type to get the real errors with using strings in the runtime.
The real issue you will find is that String is not encodable by Parity SCALE Codec, which is obviously a requirement for any storage item (or most any type you want to use) in the runtime.
So the question is "Why does SCALE not encode String"?
This is by choice. In general, String is a surprisingly complex type. The Rust book spends a whole section talking about the complexities of the type.
As such, Strings can easily become a footgun in the runtime environment if people use them incorrectly.
Furthermore, it is generally bad practice to store Strings in runtime storage. I think we can easily agree that minimizing storage usage in the runtime is a best practice, and thus you should only put into storage items which you need to be able to derive consensus and state transitions in your runtime. Most often, String data would be used for metadata, and this kind of usage is not best practice.
If you look more closely at Substrate, you will find that we break this best practice more than once, but this is a decision we explicitly make, having the information at hand to be able to correctly evaluate the cost/benefit.
All of this combined is why Strings are not treated as a first class object in the runtime. Instead, we ask users to encode strings into bytes, and then work with that byte array instead.
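As a rough illustration of the "encode strings into bytes" advice, here is a plain-Rust sketch of the struct from the question with Vec<u8> fields. It is an assumption-laden example, not Substrate code: the SCALE derives are left out so that it compiles with the standard library alone, and the constructor and accessor are hypothetical helpers. In a real no_std runtime you would also have to import Vec yourself, and how to do that depends on your Substrate version.

```rust
// Plain-Rust sketch: the SCALE derives (`Encode`, `Decode`, ...) from the
// question are omitted so that this compiles without Substrate.
#[derive(Clone, Default, Debug, PartialEq)]
pub struct ClusterMetadata {
    pub ip_address: Vec<u8>,
    pub namespace: Vec<u8>,
    pub whitelisted_ips: Vec<u8>,
}

impl ClusterMetadata {
    // Hypothetical constructor: converts text to raw UTF-8 bytes at the boundary.
    pub fn new(ip_address: &str, namespace: &str, whitelisted_ips: &str) -> Self {
        Self {
            ip_address: ip_address.as_bytes().to_vec(),
            namespace: namespace.as_bytes().to_vec(),
            whitelisted_ips: whitelisted_ips.as_bytes().to_vec(),
        }
    }

    // Hypothetical accessor: decodes the stored bytes back to text.
    // Fails if the bytes are not valid UTF-8.
    pub fn namespace_str(&self) -> Result<&str, std::str::Utf8Error> {
        std::str::from_utf8(&self.namespace)
    }
}

fn main() {
    let meta = ClusterMetadata::new("10.0.0.1", "default", "10.0.0.2,10.0.0.3");
    println!("ip bytes: {:?}", meta.ip_address);
    println!("namespace: {:?}", meta.namespace_str());
    println!("whitelist length: {} bytes", meta.whitelisted_ips.len());
}
```

The text is only interpreted as a string at the edges (when it is created or displayed); everything stored is an opaque byte array.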

How do you approach creating a complete new datatype on the "bit-level"?

I would like to create a new data type in Rust on the "bit-level".
For example, a quadruple-precision float. I could create a structure that has two double-precision floats and arbitrarily increase the precision by splitting the quad into two doubles, but I don't want to do that (that's what I mean by on the "bit-level").
I thought about using a u8-array or a bool-array, but in both cases I waste 7 bits of memory per element (because a bool also occupies a full byte). I know there are several crates that implement something like bit-arrays or bit-vectors, but looking through their source code didn't help me to understand their implementation.
How would I create such a bit-array without wasting memory, and is this the way I would want to choose when implementing something like a quad-precision type?
I don't know how to implement new data types that don't use the basic types or are structures that combine the basic types, and I haven't been able to find a solution on the internet yet; maybe I'm not searching with the right keywords.
The question you are asking has no direct answer: just like any other programming language, Rust has a basic set of rules for type layouts. This is because (most) real-world CPUs can't address individual bits, need certain alignments when referencing memory, have rules regarding how pointer arithmetic works, and so on.
For instance, if you create a type of just two bits, you'll still need an 8-bit byte to represent that type, because the instructions of most CPUs simply cannot address two individual bits; there is also no way to take the address of such a type, because addressing works at least on the byte level. More useful information regarding this can be found here, section 2, The Anatomy of a Type. Be aware that the non-wasting bit-level type you are thinking about needs to fulfill all the rules mentioned there.
It's a perfectly reasonable approach to represent what you want as a single, wrapped u128 and to implement all arithmetic on top of that type. Another, more generic, approach would be to use a Vec<u8>. Either way, you'll do a relatively large amount of bit-masking, indirection and such.
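A minimal sketch of the wrapped-u128 idea follows. It assumes the IEEE 754 binary128 bit layout (1 sign bit, 15 exponent bits, 112 fraction bits, exponent bias 16383); the Quad type and its methods are hypothetical names, and real arithmetic (addition, rounding, NaN handling) is deliberately left out. It only shows the masking and shifting that everything else would be built on.

```rust
// Hypothetical newtype holding the raw 128 bits of an IEEE 754 binary128
// value: 1 sign bit, 15 exponent bits, 112 fraction bits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Quad(u128);

const FRACTION_BITS: u32 = 112;
const EXPONENT_BITS: u32 = 15;
const SIGN_SHIFT: u32 = 127;
const EXPONENT_MASK: u128 = (1 << EXPONENT_BITS) - 1;
const FRACTION_MASK: u128 = (1 << FRACTION_BITS) - 1;
const EXPONENT_BIAS: u16 = 16383;

impl Quad {
    // Packs the three fields into one u128; excess bits are masked off.
    fn from_parts(negative: bool, biased_exponent: u16, fraction: u128) -> Self {
        let sign = (negative as u128) << SIGN_SHIFT;
        let exponent = (biased_exponent as u128 & EXPONENT_MASK) << FRACTION_BITS;
        Quad(sign | exponent | (fraction & FRACTION_MASK))
    }

    fn is_negative(self) -> bool {
        (self.0 >> SIGN_SHIFT) == 1
    }

    fn biased_exponent(self) -> u16 {
        ((self.0 >> FRACTION_BITS) & EXPONENT_MASK) as u16
    }

    fn fraction(self) -> u128 {
        self.0 & FRACTION_MASK
    }

    // Negation only flips the sign bit.
    fn neg(self) -> Self {
        Quad(self.0 ^ (1 << SIGN_SHIFT))
    }
}

fn main() {
    // 1.0 is stored as sign 0, biased exponent 16383, fraction 0.
    let one = Quad::from_parts(false, EXPONENT_BIAS, 0);
    let minus_one = one.neg();
    println!("bits of 1.0: {:#034x}", one.0);
    println!("negative: {}", minus_one.is_negative());
    println!("exponent: {}", minus_one.biased_exponent());
    println!("fraction: {}", minus_one.fraction());
    println!("size: {} bytes", std::mem::size_of::<Quad>());
}
```

Because Quad is a newtype around u128, no bits are wasted: the value takes exactly the 16 bytes of its one field.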
Having a look at rust_decimal or similar crates might also be a good idea.

Can associated constants be used to initialize the length of fixed size arrays?

In C++, you can pass integral values as template arguments:
std::array<int, 3> arr; // fixed-size array of 3
I know that Rust has built-in support for fixed-size arrays, but what if I wanted to create something like a linear algebra vector library?
struct Vec<T, size: usize> {
    data: [T; size],
}
type Vec3f = Vec<f32, 3>;
type Vec4f = Vec<f32, 4>;
This is currently what I do in D. I have heard that Rust now has Associated Constants.
I haven't used Rust in a long time but this doesn't seem to address this problem at all or have I missed something?
As far as I can see, associated constants are only available in traits and that would mean I would still have to create N vector types by hand.
No, associated constants don't help and aren't intended to. Associated anything are outputs while use cases such as the one in the question want inputs. One could in principle construct something out of type parameters and a trait with associated constants (at least, as soon as you can use associated constants of type parameters — sadly that doesn't work yet). But that has terrible ergonomics, not much better than existing hacks like typenum.
Integer type parameters are highly desired since, as you noticed, they enable numerous things that aren't really feasible in current Rust. People talk about this and plan for it but it's not there yet.
Integer type parameters are not supported as of now, however there's an RFC for that IIRC, and a long-standing discussion.
You could use the typenum crate in the meantime.
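For readers arriving later: these answers predate const generics, which were stabilized in Rust 1.51. With them, the struct from the question can be sketched almost verbatim. The type is named Vector here (an arbitrary choice, to avoid shadowing the standard Vec), and the dot method is just an illustrative extra.

```rust
// Requires Rust 1.51 or newer (const generics).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Vector<T, const N: usize> {
    data: [T; N],
}

type Vec3f = Vector<f32, 3>;
type Vec4f = Vector<f32, 4>;

// One implementation covers every length N.
impl<const N: usize> Vector<f32, N> {
    // Dot product of two vectors of the same length.
    fn dot(&self, other: &Self) -> f32 {
        self.data
            .iter()
            .zip(other.data.iter())
            .map(|(a, b)| a * b)
            .sum()
    }
}

fn main() {
    let a = Vec3f { data: [1.0, 2.0, 3.0] };
    let b = Vec3f { data: [4.0, 5.0, 6.0] };
    println!("a . b = {}", a.dot(&b));

    let c = Vec4f { data: [1.0, 0.0, 0.0, 0.0] };
    println!("c . c = {}", c.dot(&c));
    // a.dot(&c) would be rejected at compile time: the lengths differ.
}
```

The length is part of the type, so mixing a Vec3f with a Vec4f is a compile-time error rather than a runtime check.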

Reasons for Dot Notation for Tuple

Is there any technical reason Rust is designed to use dot notation for tuples instead of using index notation (t[2])?
let t = (20u32, true, 'b');
t.2 // -> 'b'
Dot notation seems natural in accessing struct's and object's properties. I couldn't find a resource or explanation online.
I had no part in the design decisions, but here's my perspective:
Tuples contain mixed types. That is, the property type_of(t[i]) == type_of(t[j]) cannot be guaranteed.
However, conventional indexing works on the premise that the i in t[i] need not be a compile-time constant, which in turn means that the type of t[i] needs to be uniform for all possible i. This is true in all other Rust collections that implement indexing. Specifically, Rust types are made indexable by implementing the Index trait, defined as below:
pub trait Index<Idx: ?Sized> {
    type Output: ?Sized;
    fn index(&self, index: Idx) -> &Self::Output;
}
So if you wanted a tuple to implement indexing, what type should Self::Output be? The only way to pull this off would be to make Self::Output an enum, which means that every element access would have to be wrapped in a match on t[i] (or something similar) on the programmer's side, and you would be catching type errors at runtime instead of at compile time.
Furthermore, you now have to implement bounds-checking, which is again a runtime error, unless you're clever in your tuple implementation.
You could bypass these issues by requiring that the index be a compile-time constant, but at that point tuple item accesses are pretending to behave like a normal index operation while actually behaving inconsistently with respect to all other Rust containers, and there's nothing good about that.
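The trade-off can be sketched in code. The Item enum and the get function below are hypothetical: they imitate what a runtime-indexed tuple would have to return (it is a plain function, not an Index implementation), and main contrasts that with the field access and pattern matching that Rust actually provides.

```rust
// Hypothetical enum that a runtime-indexed tuple would need as its element type.
#[derive(Debug, PartialEq)]
enum Item {
    Number(u32),
    Flag(bool),
    Letter(char),
}

// Hypothetical runtime "index" into a (u32, bool, char) tuple.
fn get(t: &(u32, bool, char), i: usize) -> Option<Item> {
    match i {
        0 => Some(Item::Number(t.0)),
        1 => Some(Item::Flag(t.1)),
        2 => Some(Item::Letter(t.2)),
        _ => None, // an out-of-bounds index is only discovered at run time
    }
}

fn main() {
    let t = (20u32, true, 'b');

    // Field access: the position is part of the syntax, so every access has
    // its own static type and a bad position is a compile-time error.
    let n: u32 = t.0;
    let c: char = t.2;

    // Pattern matching destructures the tuple in one step.
    let (_, flag, _) = t;
    println!("{} {} {}", n, flag, c);

    // The runtime-indexed version forces a match at every use site.
    match get(&t, 2) {
        Some(Item::Letter(ch)) => println!("letter {}", ch),
        Some(other) => println!("unexpected element {:?}", other),
        None => println!("index out of bounds"),
    }
}
```

With t.2 the compiler knows the result is a char; with get(&t, 2) the caller has to handle the other element types and the out-of-bounds case even though neither can occur.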
This decision was made in RFC 184. The Motivation section has details:
Right now accessing fields of tuples and tuple structs is incredibly painful—one must rely on pattern-matching alone to extract values. This became such a problem that twelve traits were created in the standard library (core::tuple::Tuple*) to make tuple value accesses easier, adding .valN(), .refN(), and .mutN() methods to help this. But this is not a very nice solution—it requires the traits to be implemented in the standard library, not the language, and for those traits to be imported on use. On the whole this is not a problem, because most of the time std::prelude::* is imported, but this is still a hack which is not a real solution to the problem at hand. It also only supports tuples of length up to twelve, which is normally not a problem but emphasises how bad the current situation is.
The discussion in the associated pull request is also useful.
The reason for using t.2 syntax instead of t[2] is best explained in this comment:
Indexing syntax everywhere else has a consistent type, but a tuple is heterogenous so a[0] and a[1] would have different types.
I want to provide an answer based on my experience using a functional language (OCaml) in the time since I posted this question.
Apart from @rom1v's reference, indexing syntax like a[0] is used everywhere else for some kind of sequence structure, which tuples aren't. In OCaml, for instance, a tuple (1, "one") is said to have type int * string, which corresponds to the Cartesian product in mathematics (i.e., the plane is R^2 = R * R). Plus, accessing a tuple by its nth index is considered unidiomatic.
Because its elements can have different types, a tuple can almost be thought of as a record / object, for which dot notation like a.fieldName is the usual convention for accessing a field (except in languages like JavaScript, which treat objects like dictionaries and allow string-literal access like a["fieldname"]). The only language I'm aware of that uses indexing syntax to access a field is Lua.
Personally, I think syntax like a.(0) tends to look better than a.0, but this may be intentionally (or not) awkward considering in most functional languages it is ideal to pattern-match a tuple instead of accessing it by its index. Since Rust is also imperative, syntax like a.10 can be a good reminder to pattern-match or "go use a struct" already.

Resources