The trait documentation says
Display is similar to Debug, but Display is for user-facing output, and so cannot be derived.
But what does that mean? Should it write the full string-encoded value even if that results in a 500-character output? Should it produce a nice, friendly representation suitable for display in a user interface, even if that means to_string() does not actually return the full value as a string?
Let me illustrate:
Say I have a type that represents important data in my application. This data has a canonical string-encoding with a very specific format.
pub struct BusinessObject {
    pub name: String,
    pub reference: u32,
}
First, I want to implement Display so I can use it for making easily readable log messages:
use std::fmt::{self, Display};

impl Display for BusinessObject {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Formats like `Shampoglak (7384)`
        write!(f, "{} ({})", self.name, self.reference)
    }
}
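For reference, a quick usage check with the example values from the comment above:

let obj = BusinessObject { name: "Shampoglak".into(), reference: 7384 };
println!("Processing {}", obj);                   // Processing Shampoglak (7384)
assert_eq!(obj.to_string(), "Shampoglak (7384)"); // works via the blanket ToString impl; see below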
Now, let's implement a method that returns the canonical standard string format for BusinessObject instances. Since the as_str() method name is idiomatically reserved for returning a string slice, which isn't possible here, the most straightforward approach might seem to be implementing a to_string() method for this:
impl BusinessObject {
    fn to_string(&self) -> String {
        // Formats like `Shampoglak00007384`
        format!("{}{:0>8}", self.name, self.reference)
    }
}
But no! That method name is already taken by the blanket ToString implementation, which we get automatically because we implemented Display.
What does an idiomatic implementation of Display write? A full representation of a value as a string or a friendly, human-readable representation of it? How should I structure my code and name my methods if I need to implement both of those? I am specifically looking for a solution that can be applied generically and not just in this specific situation. I don't want to have to look up what the behavior of to_string() for a given struct is before I use it.
I didn't find anything about this in the documentation of the relevant traits or in the various Rust books and resources I looked into.
What does an idiomatic implementation of Display write? A full representation of a value as a string or a friendly, human-readable representation of it?
The latter: Display should produce a friendly, human-readable representation.
How should I structure my code and name my methods if I need to implement both of those?
"Full representations" of values as strings would more correctly be known as a string serialisation of the value. A method fn into_serialised_string(self) -> String would be one approach, but perhaps you want to consider a serialisation library like serde that separates the process of serialising (and deserialising) from the serialised format?
What does an idiomatic implementation of Display write?
It writes the obvious string form of that data. What exactly that will be depends on the data, and it may not exist for some types.
Such a string representation is not necessarily “human friendly”. For example, the Display implementation of serde_json::Value (a type which represents arbitrary JSON data structures) produces the JSON text corresponding to the value, by default without any whitespace for readability — because that's the conventional string representation.
Display should be implemented for types where you can say there is “the string representation of the data” — where there is one obvious choice. In my opinion, it should not be implemented for types where there isn't one obvious choice — instead, the API should omit it, in order to guide users of the type to think about which representation they want, rather than giving them a potentially bad default.
A full representation of a value as a string or a friendly, human-readable representation of it?
In my opinion, a Display implementation which truncates the data is incorrect. Display is for the string form of the data, not a name or snippet of the data.
How should I structure my code and name my methods if I need to implement both of those?
For convenient use in format strings, you can write one or more methods which return wrapper types that implement Display (like Path::display() in std).
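For example, a minimal sketch of that pattern for the BusinessObject above (the Canonical wrapper name and the canonical() method are hypothetical):

use std::fmt;

// Wrapper whose Display impl writes the canonical encoding,
// mirroring the Path::display() pattern.
struct Canonical<'a>(&'a BusinessObject);

impl fmt::Display for Canonical<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}{:0>8}", self.0.name, self.0.reference)
    }
}

impl BusinessObject {
    fn canonical(&self) -> Canonical<'_> {
        Canonical(self)
    }
}

// Usage: format!("{}", obj.canonical()) yields `Shampoglak00007384`,
// while format!("{}", obj) still yields `Shampoglak (7384)`.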
Related
I am trying to implement a Serde Serializer/Deserializer for the Kafka wire protocol. In this protocol, there are 4 different string representations. This poses a problem when implementing the deserializer: when deserialize_str is called to deserialize a string for a given message, there's no way to know whether the string starts with an i32 or a varint, since neither the deserialize_str method nor the provided visitor provides any kind of metadata or type information that could be used to help make this decision.
My first thought was that I could implement a new type wrapper and use a custom deserialize implementation, but I now understand this doesn’t make sense because it needs to be generic over all deserializers, not just the deserializer I’m building. The wrapper still just asks the Deserializer to read a string.
I’m struggling to come up with a good solution here, and can’t find examples of other data formats that use multiple representations for a given data type.
Here's the EXTREMELY HACKY way that I'm planning on solving this, using a newtype wrapper and a custom Deserialize implementation. More specifically, my Deserialize impl will have its own Visitor, and we're going to abuse that Visitor here.
First I was hoping that I could use TypeId on Visitor::Value, but that type isn't required to be 'static, so that doesn't work.
Instead, I'm using the "expected" error message as a type tag, using the serde::de::Expected trait:
fn deserialize_str<V>(self, visitor: V) -> Result<V::Value>
where
    V: Visitor<'de>,
{
    use serde::de::Expected;
    let ty = format!("{}", &visitor as &dyn Expected);
    if ty == "NullableString" {
        // ... do nullable string deserialization logic here
    }
    // ...
}
This... works! But it feels extremely gross. Not totally satisfied, but going to proceed with this for now.
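For completeness, a rough sketch of the wrapper side of this hack (the NullableString type and its visitor methods are assumptions; the important part is that expecting() writes exactly the tag string the Deserializer matches on):

struct NullableString(Option<String>);

impl<'de> serde::Deserialize<'de> for NullableString {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        struct NullableStringVisitor;

        impl<'de> serde::de::Visitor<'de> for NullableStringVisitor {
            type Value = NullableString;

            // `&dyn Expected` formats to exactly this string on the
            // deserializer side, so it doubles as the type tag.
            fn expecting(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
                f.write_str("NullableString")
            }

            fn visit_str<E: serde::de::Error>(self, v: &str) -> Result<Self::Value, E> {
                Ok(NullableString(Some(v.to_owned())))
            }

            fn visit_none<E: serde::de::Error>(self) -> Result<Self::Value, E> {
                Ok(NullableString(None))
            }
        }

        deserializer.deserialize_str(NullableStringVisitor)
    }
}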
I've an API that requires the object's fields to be sorted alphabetically because the struct has to be hashed.
In Java/Jackson, you can set a flag in the serializer: MapperFeature.SORT_PROPERTIES_ALPHABETICALLY. I can't find anything similar in Serde.
I'm using rmp-serde (MessagePack). It follows the annotations and serialization process used for JSON, so I thought that it would be fully compatible, but the sorting provided by #jonasbb doesn't work for it.
The struct has (a lot of) nested enums and structs which have to be flattened for the final representation. I'm using Serialize::serialize for that, but calling state.serialize_field at the right place (such that everything is alphabetical) is a pain because the enums need a match clause, so it has to be called multiple times for the same field in different places and the code is very difficult to follow.
As possible solutions, two ideas:
Create a new struct with the flat representation and sort the fields alphabetically manually.
This is a bit error prone, so a programmatic sorting solution for this flattened struct would be great.
Buffer the key values in Serialize::serialize (e.g. in a BTreeMap, which is sorted), and call state.serialize_field in a loop at the end.
The problem is that the buffered values would all need to implement Serialize, which isn't object safe, so I wasn't able to figure out how to store them in the map.
How to sort HashMap keys when serializing with serde? is similar but not the same issue, because my question is about the sorting of the struct's fields/properties, not map keys.
You are not writing which data format you are targeting. This makes it hard to find a solution, since some might not work in all cases.
This code works if you are using JSON (unless the preserve_order feature flag is used). The same would work for TOML by serializing into toml::Value as an intermediate step.
The solution will also work for other data formats, but it might result in a different serialization, for example, emitting the data as a map instead of a struct.
fn sort_alphabetically<T: Serialize, S: serde::Serializer>(
    value: &T,
    serializer: S,
) -> Result<S::Ok, S::Error> {
    let value = serde_json::to_value(value).map_err(serde::ser::Error::custom)?;
    value.serialize(serializer)
}

#[derive(Serialize)]
struct SortAlphabetically<T: Serialize>(
    #[serde(serialize_with = "sort_alphabetically")] T,
);

#[derive(Serialize, Deserialize, Default, Debug)]
struct Foo {
    z: (),
    bar: (),
    ZZZ: (),
    aAa: (),
    AaA: (),
}

println!("{}", serde_json::to_string_pretty(&SortAlphabetically(&Foo::default()))?);
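For illustration, with serde_json's preserve_order feature disabled the intermediate map keeps its keys in plain byte order (uppercase before lowercase), so the pretty-printed output should look roughly like this:

{
  "AaA": null,
  "ZZZ": null,
  "aAa": null,
  "bar": null,
  "z": null
}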
because the struct has to be hashed
While field order is one source of non-determinism, there are other factors too. Many formats allow different amounts of whitespace or alternative representations such as Unicode escapes like \u0066.
I wrote a program where I manipulate a lot of BigInt and BigUint values and perform some arithmetic operations.
I produced code where I frequently used BigInt::from(Xu8) because it is not possible to directly add numbers from different types (if I understand correctly).
I want to reduce the number of BigInt::from in my code. I thought about a function to "wrap" this, but I would need a function for each type I want to convert into BigInt/BigUint:
fn short_name(n: X) -> BigInt {
    BigInt::from(n)
}
Where X will be each type I want to convert.
I couldn't find any solution that is not in contradiction with the static typing philosophy of Rust.
I feel that I am missing something about traits, but I am not very comfortable with them, and I did not find a solution using them.
Am I trying to do something impossible in Rust? Am I missing an obvious solution?
To answer this part:
I produced code where I frequently used BigInt::from(Xu8) because it is not possible to directly add numbers from different types (if I understand correctly).
On the contrary, if you look at BigInt's documentation you'll see many impl Add:
impl<'a> Add<BigInt> for &'a u64
impl Add<u8> for BigInt
and so on. The first allows calling a_ref_to_u64 + a_bigint, the second a_bigint + an_u8 (and both set the Output type to BigInt). You don't need to convert these types to BigInt before adding them! And if you want your method to handle any such type, you just need an Add bound similar to the From bound in Frxstrem's answer. Of course, if you want many such operations, From may end up more readable.
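A small sketch of both points, assuming the num-bigint crate (the helper name add_to_big is hypothetical):

use num_bigint::BigInt;
use std::ops::Add;

// Generic over anything that can be added to a BigInt,
// mirroring the Add bound mentioned above.
fn add_to_big<T>(big: BigInt, n: T) -> BigInt
where
    BigInt: Add<T, Output = BigInt>,
{
    big + n
}

fn main() {
    let big = BigInt::from(1_000_000_000_000u64);

    // No conversion needed: `BigInt + u8` is implemented directly.
    let sum = big + 7u8;
    println!("{}", sum);

    // Same idea through the generic helper.
    println!("{}", add_to_big(sum, 42u32));
}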
The From<T> trait (and the complementary Into<T> trait) is what is typically used to convert between types in Rust. In fact, the BigInt::from method comes from the From trait.
You can modify your short_name function into a generic function with a where clause to accept all types that BigInt can be converted from:
fn short_name<T>(n: T) -> BigInt // function with generic type T
where
    BigInt: From<T>, // where BigInt implements the From<T> trait
{
    BigInt::from(n)
}
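Usage then works with any argument type that BigInt has a From impl for, for example:

let a = short_name(42u8);  // BigInt from a u8
let b = short_name(7u32);  // BigInt from a u32
let c = short_name(-3i64); // BigInt from an i64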
Context
I have a pair of related structs in my program, Rom and ProfiledRom. They both store a list of u8 values and implement a common trait, GetRom, to provide access to those values.
trait GetRom {
    fn get(&self, index: usize) -> u8;
}
The difference is that Rom just wraps a simple Vec<u8>, but ProfiledRom wraps each byte in a ProfiledByte type that counts the number of times it is returned by get.
struct Rom(Vec<u8>);
struct ProfiledRom(Vec<ProfiledByte>);

struct ProfiledByte {
    value: u8,
    get_count: u32,
}
Much of my program operates on values through the GetRom trait, so I can substitute in a Rom or a ProfiledRom depending on whether I want profiling to occur.
Question
I have implemented From<Rom> for ProfiledRom, because converting a Rom to a ProfiledRom just involves wrapping each byte in a new ProfiledByte: a simple and lossless operation.
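As a rough sketch (assuming each ProfiledByte starts with a count of zero), that conversion might look like:

impl From<Rom> for ProfiledRom {
    fn from(rom: Rom) -> Self {
        ProfiledRom(
            rom.0
                .into_iter()
                .map(|value| ProfiledByte { value, get_count: 0 })
                .collect(),
        )
    }
}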
However, I'm not sure whether it's appropriate to implement From<ProfiledRom> for Rom, because ProfiledRom contains information (the get counts) that can't be represented in a Rom. If you did a round-trip conversion, these values would be lost/reset.
Is it appropriate to implement the From trait when only parts of the source object will be used?
Related
I have seen that the standard library doesn't implement integer conversions like From<i64> for i32 because these could result in bytes being truncated/lost. However, that seems like a somewhat distinct case from what we have here.
With the potentially-truncating integer conversion, you would need to inspect the original i64 to know whether it would be converted appropriately. If you didn't, the behaviour of your code could change unexpectedly when you get an out-of-bounds value. However, in our case above, it's always statically clear what data is being preserved and what data is being lost. The conversion's behaviour won't suddenly change. It should be safer, but is it an appropriate use of the From trait?
From implementations are usually lossless, but there is currently no strict requirement that they be.
The ongoing discussion at rust-lang/rfcs#2484 is related. Some possibilities include adding a FromLossy trait and more exactly prescribing the behaviour of From. We'll have to see where that goes.
For consideration, here are some Target::from(Source) implementations in the standard library:
Lossless conversions
Each Source value is converted into a distinct Target value.
u16::from(u8), i16::from(u8) and other conversions to strictly-larger integer types.
Vec<u8>::from(String)
Vec<T>::from(BinaryHeap<T>)
OsString::from(String)
char::from(u8)
Lossy conversions
Multiple Source values may be converted into the same Target value.
BinaryHeap<T>::from(Vec<T>) loses the order of elements.
Box<[T]>::from(Vec<T>) and Box<str>::from(String) lose any excess capacity.
Vec<T>::from(VecDeque<T>) loses the internal split of elements exposed by .as_slices().
Rust newbie here. What would be a good way to go about dynamically inferring the most probable type for a given string? I am trying to write a function that, given a string, returns the most likely type, but I have no idea where to start. In Python I would probably use a try-except block. This is what I would expect to have:
"4" -> u32 (or u64)
"askdjf" -> String
"3.2" -> f64
and so on. I know that some strings can be assigned to several possible types, so the problem is not well defined, but I am only interested in the general philosophy of how to solve the problem efficiently in Rust.
There is a parse method on string slices (&str) that attempts to parse a string as a particular type. You'll have to know the specific types you're ready to handle, though. The parse method can return values of any type that implements FromStr.
fn main() {
    if let Ok(i) = "1".parse::<u32>() {
        println!("{}", i);
    }
    if let Ok(f) = "1.1".parse::<f64>() {
        println!("{}", f);
    }
}
Note that the ::<T> part is only necessary if the compiler is unable to infer what type you're trying to parse into (you'll get a compiler error in that case).
I am trying to write a function that, given a string, returns the most likely type, but I have no idea where to start.
First of all: Rust is statically typed which means that a function returns one and only one type, so you can't just return different types, like in dynamically typed languages. However, there are ways to simulate dynamic typing -- namely two (that I can think of):
enum: If you have a fixed number of possible types, you could define an enum with one variant per type, like this:
enum DynType {
    Integer(i64),
    Float(f32),
    String(String),
}

fn dyn_parse(s: &str) -> DynType {
    // ...
}
You can read more on enums in this and the following Rust book chapter.
There is a trait in the standard library designed to simulate dynamic typing: Any. There is more information here. Your code could look like this:
fn dyn_parse(s: &str) -> Box<dyn Any> {
    // ...
}
You can't return trait objects directly, so you have to put it in a Box.
Keep in mind that both possibilities require the user of your function to do additional dispatch. Since Rust is statically typed, you can't do the things you are used to in a dynamically typed language.
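To make that extra dispatch concrete, here is a sketch of what the caller of the Box<dyn Any> version ends up doing (the handle function is hypothetical):

use std::any::Any;

fn handle(value: Box<dyn Any>) {
    // The caller must try each concrete type it cares about.
    if let Some(i) = value.downcast_ref::<i64>() {
        println!("integer: {}", i);
    } else if let Some(f) = value.downcast_ref::<f32>() {
        println!("float: {}", f);
    } else if let Some(s) = value.downcast_ref::<String>() {
        println!("string: {}", s);
    } else {
        println!("unknown type");
    }
}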
Maybe you should try to solve your problems in a different way that makes more sense in the statically typed world.
About the implementation part: Like Francis Gagné said, there is parse which tries to parse a string as a type the programmer specifies. You could of course just chain those parse calls with different types and take the first one that succeeds. But this might not be what you want and maybe not the fastest implementation.
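A minimal sketch of that chaining, filling in the dyn_parse stub from the enum option above (the precedence of integer over float over string is just an assumption; adjust it to whatever rules you settle on):

fn dyn_parse(s: &str) -> DynType {
    if let Ok(i) = s.parse::<i64>() {
        DynType::Integer(i)
    } else if let Ok(f) = s.parse::<f32>() {
        DynType::Float(f)
    } else {
        DynType::String(s.to_owned())
    }
}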
Of course, you should first think about the exact rules for which strings should parse as which type. After that you could, for example, build a finite state machine that detects the type of the string. Doing that properly could be a bit tricky, though.