Why does `Ord` require implementing `cmp`? - rust

Since Ord requires PartialOrd, which provides partial_cmp, cmp could simply unwrap its result and return it. Why do we need a standalone cmp method in Ord? In other words, why is cmp required rather than given a provided default by Ord?
The flagged duplicate question is not a duplicate. This question is not about one trait being "automatically provided" by the other, but only about the method inside the trait.
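For reference, a minimal sketch of the provided default the question has in mind, written against a hypothetical MyOrd trait rather than the real std::cmp::Ord:

use std::cmp::Ordering;

// Hypothetical trait illustrating the question's premise: a cmp default
// written in terms of PartialOrd::partial_cmp.
trait MyOrd: PartialOrd {
    fn cmp(&self, other: &Self) -> Ordering {
        // Panics whenever partial_cmp returns None (e.g. floating-point NaN,
        // which has no total order).
        self.partial_cmp(other).unwrap()
    }
}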

Related

Is transmuting (T, ()) to T safe?

I am implementing an alternative to a BTreeMap<K, V>. On top of this I'm building a BTreeSet, which is a wrapper around type MyBTreeSetContents<T> = MyBTreeMap<T, ()>.
Internally, leaf nodes of this BTree contain a Vec<(K, V)> of values.
In the case of the BTreeSet, this thus becomes a Vec<(K, ())>.
I want to provide a fast iterator over references to the values in the BTreeSet, i.e. an iterator that produces &T. But the best I can get so far without reaching for transmute is an iterator that produces &(T, ()).
So therefore the question:
Is the memory representation of K, (K,) and (K, ()) the same?
Is it therefore OK to transmute between (K, ()) and K?
And by extension, is it OK to transmute the Vec<(K, ())> to a Vec<K>?
If there are alternative approaches that circumvent usage of std::mem::transmute all-together, those would of course also be very much appreciated!
No. Going by what is currently guaranteed, transmuting (T, ()) to T is not sound to rely on. Tuples use the default representation, which does not imply anything about the layout beyond what is said in The Rust Reference. Only #[repr(transparent)] will guarantee layout compatibility.
However, it will probably work and may eventually be guaranteed. From Structs and Tuples in the Unsafe Code Guidelines:
In general, an anonymous tuple type (T1..Tn) of arity N is laid out "as if" there were a corresponding tuple struct...
...
For the purposes of struct layout 1-ZST[1] fields are ignored.
In particular, if all but one field are 1-ZST, then the struct is equivalent to a single-field struct. In other words, if all but one field is a 1-ZST, then the entire struct has the same layout as that one field.
For example:
type Zst1 = ();
struct S1(i32, Zst1); // same layout as i32
[1] Types with zero size are called zero-sized types, which is abbreviated as "ZST". This document also uses the "1-ZST" abbreviation, which stands for "one-aligned zero-sized type", to refer to zero-sized types with an alignment requirement of 1.
If my understanding of this is correct, (K, ()) has the equivalent layout to K and thus can be transmuted safely. However, that will not extend to transmuting Vec<T> to Vec<U> as mentioned in Transmutes from the Rustonomicon:
Even different instances of the same generic type can have wildly different layout. Vec<i32> and Vec<u32> might have their fields in the same order, or they might not.
Unfortunately, you should take this with a grain of salt. The Unsafe Code Guidelines is an effort to recommend what unsafe code can rely on, but currently it advertises itself only as a work in progress, noting that any concrete additions to the language specification will be moved to the official Rust Reference. I say it "will probably work" because one facet of the guidelines is to document current behavior; but as of yet, no guarantee like this has been added to the reference.
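As an aside on the question's last point (avoiding transmute entirely): assuming the leaf really stores a Vec<(K, ())> as described, an iterator over &K can be written by just mapping over the pairs; the function name here is made up for illustration:

// Hypothetical sketch: borrow each (K, ()) pair and yield only the key,
// with no std::mem::transmute involved.
fn set_iter<'a, K>(leaf: &'a [(K, ())]) -> impl Iterator<Item = &'a K> + 'a {
    leaf.iter().map(|(k, _unit)| k)
}

// e.g. let keys: Vec<&K> = set_iter(&leaf_vec).collect();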

In Rust, what is !Unpin [duplicate]

This question already has answers here:
What does the exclamation point mean in a trait implementation?
(3 answers)
Closed 9 months ago.
I am unable to locate the documentation for !Unpin referred to here in the docs.
More generally, the ! operator seems to lack corresponding documentation regarding traits. Specifically, it seems to represent Not, as in Not Unpin or perhaps Not Unpinable in this case. I suppose it is different from Pin in some way, otherwise it would be redundant. Currently, searching for the documentation is challenging since ! occurs so frequently otherwise.
It would be good if the operator behavior of ! on traits could be included in Appendix B: Operators and Symbols of the docs.
Unpin is one of several auto-traits, which are implemented automatically for any type that's compatible with it. And in the case of Unpin, that's, well, basically all of the types.
Auto-traits (and only auto-traits) can have negative implementations written by preceding the trait name with a !.
// By default, A implements Unpin
struct A {}
// But wait! Don't do that! I know something you don't, compiler.
impl !Unpin for A {}
Unpin, specifically, indicates to Rust that it is safe to move values of the implementing type T out of a Pin. Normally, Pin indicates that the thing inside shall not be moved. Unpin is the sort of opposite of that, which says "I know we just pinned this value, but I, as the writer of this type, know it's safe to move it anyway".
Generally, anything in Safe Rust is Unpin. The only time you'd want to mark something as !Unpin is if it's interfacing with something in another language like C. If you have a datatype that you're storing pointers to in C, for instance, then the C code may be written on the assumption that the data never changes addresses. In that case, you'd want to negate the Unpin implementation.
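Note that the impl !Unpin shown above requires the unstable negative_impls feature; on stable Rust the usual way to opt out of Unpin is to embed std::marker::PhantomPinned, as in this sketch (the struct and field names are made up for illustration):

use std::marker::PhantomPinned;

// Because PhantomPinned is !Unpin, embedding it makes CBacked !Unpin too,
// so safe code can no longer get a &mut CBacked out of a Pin<&mut CBacked>
// and move the value.
struct CBacked {
    registered_with_c: *const u8, // e.g. an address that C code holds on to
    _pin: PhantomPinned,
}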

inverse of EnumConstant as u32

Say I have
enum Grades { A, B, C, D, E, F }
Then I can easily use the associated value for each grade with
Grades::C as u32
However, there seems to be no opposite way. I understand that, in general, this would be a partial function, as, for example
42 as Grades
makes no sense. However, what is the idiomatic way to do this? Is it
impl From<u32> for Grades { ... }
with a panic! if the argument is invalid? Or even
impl From<u32> for Option<Grades> { ... }
A related question: is there something like an Enum trait? The need to convert between integers and enumerations often arises because (to my bloody beginner knowledge) there is apparently nothing that provides the functionality of Haskell's Enum and Bounded typeclasses. Hence, if I need a range of Grades B..E, or the next Grade (succ B), or the previous Grade (pred E), I feel out of luck.
(The suggested question doesn't answer my question fully, since it occurred to me that the deeper reason I need this functionality in the first place is most often "missing" enum functionality - that is, "missing" from the standpoint of a Haskell programmer. This is what the second part is all about.)
Although the easiest solution is probably the enum_primitive crate (which has already been suggested), the idiomatic way to implement this yourself would be the TryFrom trait from the standard library. This is also very close to what you already suggested in your question. The only difference is that this returns a Result, so you don't have to panic! when a wrong input is given.
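A minimal sketch of that TryFrom approach, assuming the Grades enum from the question keeps its default discriminants 0..=5 and using a unit error type purely for brevity:

use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
enum Grades { A, B, C, D, E, F }

impl TryFrom<u32> for Grades {
    type Error = (); // a real error type would carry the rejected value

    fn try_from(value: u32) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(Grades::A),
            1 => Ok(Grades::B),
            2 => Ok(Grades::C),
            3 => Ok(Grades::D),
            4 => Ok(Grades::E),
            5 => Ok(Grades::F),
            _ => Err(()),
        }
    }
}

// Usage: Grades::try_from(2) == Ok(Grades::C), Grades::try_from(42) == Err(())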
There is a enum_primitive crate that exports a macro enum_from_primitive! that automatically derives FromPrimitive which seems to do what you want.

How to break on Err creation

When debugging a Rust program is it possible to break execution when an Err() value is created?
This would serve the same purpose as breaking on exceptions in other languages (C++, Javascript, Java, etc.) in that it shows you the actual source of the error, rather than just the place where you unwrapped it, which is not usually very useful.
I'm using LLDB but interested in answers for any debugger. The Err I am interested in is generated deep in Serde so I cannot really modify any of the code.
I'll give this one a shot.
I believe what you want to accomplish is incompatible with how the current "one true Rust implementation" is constructed and with its take on "enum constructors", short of some serious hacks -- I'll give my best inference about why (as of the time of writing, Thu Sep 22 00:58:49 UTC 2022), along with some ideas and options.
Breaking it down: finding definitions
"What happens when you "construct" an enum, anyways...?"
As Rust does not have a formal language standard or specification document, its "semantics" are not particularly precisely defined, so there is no "legal" text to really provide the "Word of God" or final authority on this topic.
So instead, let's refer to community materials and some code:
Constructors - The Rustonomicon
There is exactly one way to create an instance of a user-defined type: name it, and initialize all its fields at once:
...
That's it. Every other way you make an instance of a type is just calling a totally vanilla function that does some stuff and eventually bottoms out to The One True Constructor.
Unlike C++, Rust does not come with a slew of built-in kinds of constructor. There are no Copy, Default, Assignment, Move, or whatever constructors. The reasons for this are varied, but it largely boils down to Rust's philosophy of being explicit.
Move constructors are meaningless in Rust because we don't enable types to "care" about their location in memory. Every type must be ready for it to be blindly memcopied to somewhere else in memory. This means pure on-the-stack-but-still-movable intrusive linked lists are simply not happening in Rust (safely).
In comparison to C++'s better-specified semantics for both enum class constructors and std::variant<T...> (its closest analogue to a Rust enum), Rust does not really say anything specific about "enum constructors" except that they are just part of "The One True Constructor."
The One True Constructor is not really a well-specified Rust concept. It's not commonly used in Rust's reference or books, and it's not a general programming-language-theory concept (at least, by that exact name -- it's most likely referring to type constructors, which we'll get to) -- but you can eke out its meaning by reading further and by comparing with the programming languages that Rust takes direct inspiration from.
In fact, where C++ might have move, copy, placement new and other types of constructors, Rust simply has a sort of universal "dumb value constructor" for all values (like struct and enum) that does not have special operational semantics besides something like "create the value, wherever it might be stored in memory".
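For instance, a trivial sketch of that "dumb value constructor" in action (the types here are made up for illustration):

struct Point { x: i32, y: i32 }
enum Shape { Dot(Point), Empty }

fn main() {
    // "The One True Constructor": name the type or variant and fill in all
    // its fields at once -- no user code runs as part of the construction.
    let p = Point { x: 1, y: 2 };
    let _dot = Shape::Dot(p);
    let _empty = Shape::Empty;
}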
But that's not very precise at all. What if we try to look at the definition of an enum?
Defining an Enum - The Rust Programming Language
...
We attach data to each variant of the enum directly, so there is no need for an extra struct. Here it’s also easier to see another detail of how enums work: the name of each enum variant that we define also becomes a function that constructs an instance of the enum. That is, IpAddr::V4() is a function call that takes a String argument and returns an instance of the IpAddr type. We automatically get this constructor function defined as a result of defining the enum.
Aha! They dropped the words "constructor function" -- so it's pretty much something like a fn(T, ...) -> U or something? So is it some sort of function? Well, as a generally introductory text to Rust, The Rust Programming Language book can be thought of as less "technical" and "precise" than The Rust Reference:
Enumerated types - The Rust Reference
An enumerated type is a nominal, heterogeneous disjoint union type, denoted by the name of an enum item. ^1 ...
...
Enum types cannot be denoted structurally as types, but must be denoted by named reference to an enum item.
...
Most of this is pretty standard -- most modern programming languages have "nominal types" (the type identifier matters for type comparison) -- but the footnote here is the interesting part:
The enum type is analogous to a data constructor declaration in ML, or a pick ADT in Limbo.
This is a good lead! Rust is known for taking a large amount of inspiration from functional programming languages, which are much closer to the mathematical foundations of programming languages.
ML is a whole family of functional programming languages (e.g. OCaml, Standard ML, F#, and sometimes Haskell) and is considered one of the important defining language-families within the functional programming language space.
Limbo is an older concurrent programming language with support for abstract data types, of which enum is one.
Both are strongly-rooted in the functional programming language space.
Summary: Rust enum in Functional Programming / Programming Language Theory
For brevity, I'll omit quotes and give a summary of the formal programming-language theory behind Rust enums.
Rust enums are known in the theory as "tagged unions", "sum types", or "variants".
Functional programming and mathematical type theory place a strong emphasis on modeling computation as basically "changes in typed-value structure" versus "changes in data state".
So, where object-oriented programming says "everything is an [interactable] object", and objects then send messages to or interact with each other --
-- functional programming says "everything is a pure [non-mutative] value", and values are then "transformed", without side effects, by "mathematically pure functions".
In fact, type theory goes as far as to say "everything is a type" -- it will do things like mock up the natural numbers by constructing a mathematical recursive type that has the same properties as the natural numbers.
To construct "[typed] values" as "structures," mathematical type theory defines a fundamental concept called a "type constructor" -- and you can think of type constructors as being just a Rust () and compositions of such.
So functional/mathematical type constructors are not intended to "execute" or have any other behavior. They are simply there to "purely construct the structure of pure data."
Conclusion: "Rust doesn't want you to inject a breakpoint into data"
Per Rust's theoretical roots and inspiring influences, Rust enum type constructors are meant to be functional and only to wrap and create type-tagged data.
In other words, Rust doesn't really want to allow you to "inject" arbitrary logic into type constructors (unlike C++, which has a whole slew of semantics regarding side effects in constructors, such as throwing exceptions, etc.).
They want to make injecting a breakpoint into Err(T) sort of like injecting a breakpoint into the literal 1 or a cast like as i32. Err(T) is more of a "data primitive" than a "transforming function/computation" like a call to foo(123).
In Code: why it's probably hard to inject a breakpoint in Err().
Let's start by looking at the definition of Err(T) itself.
The Definition of std::result::Result::Err()
Here is where you can find the definition of Err(), directly from rust-lang/rust/library/core/src/result.rs # v1.63.0 on GitHub:
/// `Result` is a type that represents either success ([`Ok`]) or failure ([`Err`]).
///
/// See the [module documentation](self) for details.
#[derive(Copy, PartialEq, PartialOrd, Eq, Ord, Debug, Hash)]
#[must_use = "this `Result` may be an `Err` variant, which should be handled"]
#[rustc_diagnostic_item = "Result"]
#[stable(feature = "rust1", since = "1.0.0")]
pub enum Result<T, E> {
    /// Contains the success value
    #[lang = "Ok"]
    #[stable(feature = "rust1", since = "1.0.0")]
    Ok(#[stable(feature = "rust1", since = "1.0.0")] T),

    /// Contains the error value
    #[lang = "Err"]
    #[stable(feature = "rust1", since = "1.0.0")]
    Err(#[stable(feature = "rust1", since = "1.0.0")] E),
}
Err() is just a sub-case of the greater enum std::result::Result<T, E> -- which means that Err() is not a function, but more like a "data-tagging constructor".
Err(T) in assembly is meant to be optimized out completely
Let's use Godbolt to break down the usage of std::result::Result::<T, E>::Err(E): https://rust.godbolt.org/z/oocqGj5cd
pub fn swap_err_ok(r: Result<i32, i32>) -> Result<i32, i32> {
    let swapped = match r {
        Ok(i) => Err(i),
        Err(e) => Ok(e),
    };
    return swapped;
}
example::swap_err_ok:
        sub     rsp, 16
        mov     dword ptr [rsp], edi
        mov     dword ptr [rsp + 4], esi
        mov     eax, dword ptr [rsp]
        test    rax, rax
        je      .LBB0_2
        jmp     .LBB0_5
.LBB0_5:
        jmp     .LBB0_3
        ud2
.LBB0_2:
        mov     eax, dword ptr [rsp + 4]
        mov     dword ptr [rsp + 12], eax
        mov     dword ptr [rsp + 8], 1
        jmp     .LBB0_4
.LBB0_3:
        mov     eax, dword ptr [rsp + 4]
        mov     dword ptr [rsp + 12], eax
        mov     dword ptr [rsp + 8], 0
.LBB0_4:
        mov     eax, dword ptr [rsp + 8]
        mov     edx, dword ptr [rsp + 12]
        add     rsp, 16
        ret
Here is the (unoptimized) assembly code that corresponds to the line Ok(i) => Err(i), that constructs the Err:
        mov     dword ptr [rsp + 12], eax
        mov     dword ptr [rsp + 8], 1
and the Err(e) construction is basically optimized away if you compile with -C opt-level=3:
example::swap_err_ok:
        mov     edx, esi
        xor     eax, eax
        test    edi, edi
        sete    al
        ret
Unlike C++, which leaves room for injecting arbitrary logic in constructors, and even for representing actions like locking a mutex, Rust discourages this in the name of optimization.
Rust is designed to discourage inserting computation in type constructor calls -- and, in fact, if there is no computation associated with a constructor, it should have no operational value or action at the machine-instruction level.
Is there any way this is possible?
If you're still here, you really want a way to do this even though it goes against Rust's philosophy.
"...And besides, how hard can it be? If gcc and MSVC can instrument ALL functions with tracing at the compiler-level, can't rustc do the same?..."
I answered a related StackOverflow question like this in the past: How to build a graph of specific function calls?
In general, you have 2 strategies:
Instrument your application with some sort of logging/tracing framework, and then try to replicate some sort of tracing mixin-like functionality to apply global/local tracing depending on which parts of code you apply the mixins.
Recompile your code with some sort of tracing instrumentation feature enabled for your compiler or runtime, and then use the associated tracing compiler/runtime-specific tools/frameworks to transform/sift through the data.
For 1, this will require you to manually insert more code, use something like _penter/_pexit for MSVC, or create some sort of ScopedLogger that would (hopefully!) log asynchronously to some external file/stream/process. This is not necessarily a bad thing, as having a separate process control the trace tracking would probably be better in the case where the traced process crashes. Regardless, you'd probably have to refactor your code, since C++ does not have great first-class support for metaprogramming to refactor/instrument code at a module/global level. However, this is not an uncommon pattern for larger applications; for example, AWS X-Ray is a commercial tracing service (though, typically, I believe it fits the use case of tracing network and RPC calls rather than in-process function calls).
For 2, you can try something like utrace or something compiler-specific: MSVC has various tools like Performance Explorer, LLVM has XRay, GCC has gprof. You essentially compile in a sort of "debug++" mode or there is some special OS/hardware/compiler magic to automatically insert tracing instructions or markers that help the runtime trace your desired code. These tracing-enabled programs/runtimes typically emit to some sort of unique tracing format that must then be read by a unique tracing format reader.
However, because Err(T) is a [data]type constructor and not really a first-class fn, this means that Err(T) will most likely NOT be instrumented like a usual fn call. Usually compilers with some sort of "instrumentation mode" will only inject "instrumentation code" at function-call boundaries, but not at data-creation points generically.
What about replacing std:: with an instrumented version such that I can instrument std::result::Result<T, E> itself? Can't I just link-in something?
Well, Err(T) simply does not represent any logical computation except the creation of a value, and so, there is no fn or function pointer to really replace or switch-out by replacing the standard library. It's not really part of the surface language-level interface of Rust to do something like this.
So now what?
If you really specifically need this, you would want a custom compiler flag or mode that injects custom instrumentation code every time you construct an Err(T) value -- and you would have to rebuild every piece of Rust code where you want it.
Possible Options
Do a text- or macro-replacement to turn every usage of /Err(.*)/ in the application code you want to instrument into your own macro or fn call (to represent the computation in the way Rust wants), and inject your own kind of instrumentation there (probably using either the log or tracing crate); a sketch of this appears after this list.
Find or ask for a custom instrumentation flag on rustc that can generate specific assembly/machine-code to instrument per every usage of Err(T).
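As a sketch of the first option (all names here are made up; this only helps in code you control, so it would not catch the Err created deep inside Serde unless you patched or vendored that code):

// Route Err construction through a helper so there is a real symbol to set a
// breakpoint on, or to emit a log/tracing event from.
#[inline(never)] // keep a distinct, breakpoint-friendly symbol even when optimized
fn traced_err<T, E: std::fmt::Debug>(e: E) -> Result<T, E> {
    eprintln!("constructing Err: {e:?}"); // or tracing::error!/log::error!
    Err(e)
}

fn parse_positive(s: &str) -> Result<u32, String> {
    match s.parse::<u32>() {
        Ok(n) if n > 0 => Ok(n),
        _ => traced_err(format!("not a positive number: {s}")), // was Err(...)
    }
}

fn main() {
    let _ = parse_positive("0"); // triggers the traced_err path
}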
Yes, it is possible to break execution when an Err() value is created. This can be done by using the debugger to break on the Err() function, and then inspecting the stack trace to find the point at which the Err() value was created.

Can we automatically derive a user-defined trait? [duplicate]

This question already has answers here:
Is it possible to add your own derivable traits, or are these fixed by the compiler?
(2 answers)
Closed 5 years ago.
I have a struct like this
#[derive(CustomTrait)]
struct Sample {
    v: Vec<u8>,
}
and my trait goes like this
trait CustomTrait {...}
Can I do the above stuff? It threw an error for me.
I want something similar to the Clone trait. Is this possible with Rust?
#[derive(Foo, Bar)] is sugar for #[derive_Foo] #[derive_Bar], so it is possible to implement your own decorator attribute in the same way as #[derive_Clone] is, but this requires you to write a compiler plugin, which is not a stable part of Rust and will not be stable in 1.0 (and will thus be unavailable in the stable and beta channels).
There is a little documentation on such matters in the book, but not much; you’re largely on your own with it.
Bear in mind that what you can actually do at that stage is limited; you have access to the struct definition only, and know nothing about the actual types mentioned. This is a good fit for all of the traits for which #[derive] support is built in, but is not for many other traits.
No, you can't. derive instructs the compiler to provide a basic implementation of the trait. You can't expect the compiler to magically know how to implement a user-defined trait.
You can only use derive with these traits (taken from http://rustbyexample.com/trait/derive.html):
Comparison traits: Eq, PartialEq, Ord, PartialOrd
Serialization: Encodable, Decodable
Clone, to create T from &T via a copy.
Hash, to compute a hash from &T.
Rand, to create a random instance of a data type.
Default, to create an empty instance of a data type.
Zero, to create a zero instance of a numeric data type.
FromPrimitive, to create an instance from a numeric primitive.
Debug, to format a value using the {:?} formatter.
NOTE: Apparently this was proposed and is being discussed here
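To make the above concrete: derive stays for the built-in traits, and the user-defined trait is implemented by hand. The describe method below is made up, since the question elides the body of CustomTrait:

// Built-in traits can be derived; CustomTrait is written out manually.
#[derive(Clone, Debug)]
struct Sample {
    v: Vec<u8>,
}

trait CustomTrait {
    fn describe(&self) -> String; // hypothetical method for illustration
}

impl CustomTrait for Sample {
    fn describe(&self) -> String {
        format!("Sample with {} bytes", self.v.len())
    }
}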

Resources