I am building a simulation (the coding equivalent of a model train set). It is a simulated economy with various economic agents interacting with each other. The main mode of interaction between economic agents is a transaction. At each "tic", each agent generates a list of zero or more proposed transactions (such as buying food). At each "toc" all the counter-parties process the proposed transactions that have been targeted at them in random order so that no biases are introduced. In these snippets a proposed transaction is represented as a u32.
My goal is to simulate as many of these economic agents as possible, so performance is key. I am new to Rust (or any kind of low-level language, for that matter) and my understanding from reading the Rust book is that if I want maximum performance then I should use "zero cost abstractions" and avoid dynamic dispatch.
So with that out of the way, I have come up with the following 3 approaches.
Option 1
trait EconomicAgent {
    fn proposed_transactions(&self) -> Vec<u32>;
}

struct Person {
    health: f64,
    energy: f64,
    nutrition: f64,
    money: f64,
    food: f64,
}

impl EconomicAgent for Person {
    fn proposed_transactions(&self) -> Vec<u32> {
        vec![1, 2, 3]
    }
}

struct FoodStore {
    money: f64,
    food: f64,
}

impl EconomicAgent for FoodStore {
    fn proposed_transactions(&self) -> Vec<u32> {
        vec![4, 5, 6]
    }
}
A person and a food store are different types that implement the EconomicAgent trait. I can then iterate over a vector of trait objects to get a list of proposed transactions. Each call is dynamically dispatched, I believe.
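For concreteness, iterating such a vector might look like this (a sketch; storing the agents as Box<dyn EconomicAgent> is an assumption, since the question does not show the storage):

fn all_proposals_dyn(agents: &[Box<dyn EconomicAgent>]) -> Vec<u32> {
    let mut proposals = Vec::new();
    for agent in agents {
        // Each call is resolved at runtime through the vtable.
        proposals.extend(agent.proposed_transactions());
    }
    proposals
}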
Option 2
enum EconomicAgent2 {
    Person(Person),
    FoodStore(FoodStore),
}

impl EconomicAgent2 {
    fn proposed_transactions(&self) -> Vec<u32> {
        match self {
            EconomicAgent2::Person(person) => person.proposed_transactions(),
            EconomicAgent2::FoodStore(food_store) => food_store.proposed_transactions(),
        }
    }
}
Here, an EconomicAgent is not a trait but rather an enum, and, well, you can see how it works.
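Iterating it mirrors Option 1, but inside each match arm the concrete type is known (a sketch reusing the types above; the function name is invented):

fn all_proposals_enum(agents: &[EconomicAgent2]) -> Vec<u32> {
    agents
        .iter()
        .flat_map(|agent| agent.proposed_transactions())
        .collect()
}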
Option 3
const HEALTH_INDEX: u8 = 0;
const ENERGY_INDEX: u8 = 1;
const NUTRITION_INDEX: u8 = 2;
const MONEY_INDEX: u8 = 3;
const FOOD_INDEX: u8 = 4;

enum EconomicAgentTag {
    Person,
    FoodStore,
}

struct EconomicAgent3 {
    tag: EconomicAgentTag,
    resources: [f64; 5],
    proposed_transactions: Box<fn(&EconomicAgent3) -> Vec<u32>>,
}

fn new_person() -> EconomicAgent3 {
    EconomicAgent3 {
        tag: EconomicAgentTag::Person,
        resources: [0.0, 0.0, 0.0, 0.0, 0.0],
        proposed_transactions: Box::new(|_| vec![1, 2, 3]),
    }
}

fn new_food_store() -> EconomicAgent3 {
    EconomicAgent3 {
        tag: EconomicAgentTag::FoodStore,
        resources: [0.0, 0.0, 0.0, 0.0, 0.0],
        proposed_transactions: Box::new(|_| vec![4, 5, 6]),
    }
}
Here an economic agent is a more abstract representation.
Now imagine that there are many different types of economic agents (banks, mines, farms, clothing stores etc). They all interact by proposing and accepting transactions. Option 1 seems to suffer from dynamic dispatch. Option 2 seems to be my own version of dynamic dispatch via a match expression, so it is probably no better, right? Option 3 seems like it should be the most performant, but it does not really allow much cognitive ease on the part of the programmer.
So finally the questions:
Clearly dynamic dispatch is involved in option 1. What about options 2 and 3?
Which is expected to be most performant? Note I am not really in a position to do testing as the full idea (only on paper right now) is obviously more complex than these snippets and the choice now will affect the entire structure for the whole project.
What would be an idiomatic choice here?
All your options use dynamic dispatch or branches in one way or another to call the right function for each element. The reason is that you are mixing all the agents into a single place, which is where the different performance penalties come from (not just the indirect calls or branches, but also cache misses etc.).
Instead, for a problem like this, you want to separate the different "agents" into separate, independent "entities". Then, to reuse code, you will want to factor out "components", subsets of which are iterated over by "systems".
This is what is usually called an "Entity-Component-System" (ECS) of which there are many models and implementations. They are typically used by games and other simulations.
If you search for ECS you will find many questions, articles and so on about it and the different approaches.
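As a minimal sketch of that direction (not a full ECS; it assumes the Person and FoodStore types from the question, with the EconomicAgent trait in scope), each agent type gets its own homogeneous list, and a "system" iterates over each list in turn:

fn proposal_system(persons: &[Person], food_stores: &[FoodStore]) -> Vec<u32> {
    let mut proposals = Vec::new();
    // Homogeneous loops: statically dispatched calls, predictable branches,
    // and densely packed memory.
    proposals.extend(persons.iter().flat_map(|p| p.proposed_transactions()));
    proposals.extend(food_stores.iter().flat_map(|s| s.proposed_transactions()));
    proposals
}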
What is Dynamic Dispatch?
Dynamic dispatch usually refers to indirect function calls, i.e. function calls which occur via a function pointer.
In your case, both Option 1 and Option 3 are cases of dynamic dispatch:
Traits use a virtual table, which is a table of function pointers.
fn(...) -> ... is a function pointer.
What is the performance penalty of Dynamic Dispatch?
At run-time, there is little to no difference between a regular function call and a so-called virtual call:
Indirect function calls can be predicted, there's a special predictor for them in your CPU.
The cost of the function call is mostly saving/restoring registers, which happen in both cases.
The performance penalty is more insidious, it happens at compile-time.
The mother of optimizations is inlining, which essentially copies/pastes the body of the called function right at the call site. Once a function is inlined, many other optimization passes can go to town on the (combined) code. This is especially lucrative for very small functions (a getter), but can also be quite beneficial for larger functions.
An indirect function call, however, is opaque. There are many candidate functions, and thus the optimizer cannot perform inlining... nipping many potential optimizations in the bud. Devirtualization is sometimes available -- the compiler deducing which function(s) can be called -- but should not be relied on.
Which Option to choose?
Among those presented: Option 2!
The main advantage of Option 2 is that there are no indirect function calls. In both branches of your match, the compiler has a known type for the receiver of the method and can therefore inline the method if suitable, enabling all the optimizations.
Is there better?
With an open design, an Array of Structs is a better way to structure the system, mostly avoiding branch misprediction:

struct EconomicAgents {
    persons: Vec<Person>,
    food_stores: Vec<FoodStore>,
}
This is the core design of the ECS solution proposed by @Acorn.
Note: as noted by @Acorn in comments, Array of Structs is also close to optimal cache-wise -- no indirection, very little padding between elements.
Going with a full ECS is a trickier proposition. Unless you have dynamic entities -- Persons/FoodStores being added/removed during the run -- I would not bother. An ECS is helpful for dynamism, but it has to pick trade-offs between various characteristics: do you want faster add/remove, or faster iteration? Unless you need all its features, it will likely add its own overhead due to trade-offs that do not match your needs.
How to avoid dynamic dispatch?
You could go with option 1 and, instead of having a vector of trait objects, keep each type in its own vector and iterate over them individually. It is not a nice solution, though, so...
Instead...
Choose whichever option allows you to model your simulation best and don't worry about the cost of dynamic dispatch. The overhead is small. There are other things that will impact the performance more, such as allocating a new Vec for every call.
The main cost of dynamic dispatch is the indirect branch predictor making a wrong guess. To help your CPU make better guesses, you may try to keep objects of the same type next to each other in the vector, for example by sorting it by type.
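A sketch of what that could look like with the EconomicAgent2 enum from the question (variant_tag is an invented helper, not a standard API):

// Map each variant to a number so agents can be grouped by type.
fn variant_tag(agent: &EconomicAgent2) -> u8 {
    match agent {
        EconomicAgent2::Person(_) => 0,
        EconomicAgent2::FoodStore(_) => 1,
    }
}

fn group_by_type(agents: &mut [EconomicAgent2]) {
    agents.sort_by_key(variant_tag);
}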
Which one is idiomatic?
Option 1 has the issue that to store objects of different types in a vector, you must go through indirection. The easiest way is to Box every object, but that means every access will not only involve dynamic dispatch for the function call, but will also have to follow an extra pointer to get to the data.
Option 2 with an enum is (in my opinion) more idiomatic - you have all your data together in contiguous memory. But beware that the size of an enum is bigger than (or equal to) that of its largest variant. So if your agents vary in size, it may be better to go with option 1.
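You can check the sizes directly (illustrative; the exact numbers depend on the compiler's layout choices):

use std::mem::size_of;

fn main() {
    // Person holds five f64s, FoodStore only two; the enum must be able
    // to fit the largest variant plus a discriminant.
    println!("Person: {} bytes", size_of::<Person>());
    println!("FoodStore: {} bytes", size_of::<FoodStore>());
    println!("EconomicAgent2: {} bytes", size_of::<EconomicAgent2>());
}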
In Chapter 3 of The Rust Programming Language, the following code is used as an example for a kind of type inference that Rust cannot manage:
fn main() {
    let condition = true;
    let number = if condition { 5 } else { "six" };
    println!("The value of number is: {}", number);
}
with the explanation that:
Rust needs to know at compile time what type the number variable is, definitively, so it can verify at compile time that its type is valid everywhere we use number. Rust wouldn’t be able to do that if the type of number was only determined at runtime; the compiler would be more complex and would make fewer guarantees about the code if it had to keep track of multiple hypothetical types for any variable.
I'm not certain I understand the rationale, because the example does seem like something where a simple compiler could infer the type.
What exactly makes this kind of type inference so difficult? In this case, the value of condition can clearly be inferred at compile time (it's true), and so the type of number can be too (it's i32?).
I can see how things could become a lot more complicated, if you were trying to infer types across multiple compilation units for instance, but is there something about this specific example that would add a lot of complexity to the compiler?
There are three main reasons I can think of:
1. Action at a distance effects
Let's suppose the language worked that way. Since we're extending type inference, we might as well make the language even smarter and have it infer return types as well. This allows me to write something like:
pub fn get_flux_capacitor() {
    let is_prod = true;
    if is_prod { FluxCapacitor::new() } else { MovieProp::new() }
}
And elsewhere in my project, I can get a FluxCapacitor by calling that function. However, one day, I change is_prod to false. Now, instead of getting an error that my function is returning the wrong type, I will get errors at every call site. A small change inside one function has led to errors in entirely unchanged files! That's pretty weird.
(If we don't want to add inferred return types, just imagine it's a very long function instead.)
2. Compiler internals exposed
What happens in the case where it's not so simple? Surely this should be the same as the above example:
pub fn get_flux_capacitor() {
    let is_prod = (1 + 1) == 2;
    ...
}
But how far does that extend? The compiler's constant propagation is mostly an implementation detail. You don't want the types in your program to depend on how smart this version of the compiler is.
3. What did you actually mean?
As a human looking at this code, it looks like something is missing. Why are you branching on true at all? Why not just write FluxCapacitor::new()? Perhaps there's logic missing to check and see if an env=DEV environment variable is missing. Perhaps a trait object should actually be used so that you can take advantage of runtime polymorphism.
In this kind of situation where you're asking the computer to do something that doesn't seem quite right, Rust often chooses to throw its hands up and ask you to fix the code.
You're right that in this very specific case (where condition=true statically), the compiler could be made to detect that the else branch is unreachable and that number must therefore be 5.
This is just a contrived example, though... in the more general case, the value of condition would only be known at runtime.
It's in that case, as others have said, that inference becomes hard to implement.
On that topic, there are two things I haven't seen mentioned yet.
The Rust language design tends to err on the side of doing things as explicitly as possible
Rust type inference is only local
On point #1, the explicit way for Rust to deal with the "this type can be one of multiple types" use case is enums.
You can define something like this:
#[derive(Debug)]
enum Whatsit {
    Num(i32),
    Text(&'static str),
}
and then do let number = if condition { Whatsit::Num(5) } else { Whatsit::Text("six") };
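Put together, a complete version of the book's example using the enum looks like this (Whatsit as defined above):

fn main() {
    let condition = true;
    let number = if condition {
        Whatsit::Num(5)
    } else {
        Whatsit::Text("six")
    };
    // Works because Whatsit derives Debug.
    println!("The value of number is: {:?}", number);
}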
On point #2, let's see how the enum (while wordier) is the preferred approach in the language. In the example from the book we just try printing the value of number.
In a more real-case scenario we would at one point use number for something other than printing.
This means passing it to another function or including it in another type. Or (to even enable the use of println!) implementing the Debug or Display traits on it. Local inference means that, if you can't name the type of number in Rust, you would not be able to do any of these things.
Suppose you want to create a function that does something with a number; with the enum you would write:
fn do_something(number: Whatsit)
but without it...
fn do_something(number: /* what type is this? */)
In a nutshell, you're right that in principle it IS doable for the compiler to synthesize a type for number. For instance, the compiler might create an anonymous enum like Whatsit above when compiling that code.
But you - the programmer - would not know the name of that type, would not be able to refer to it, wouldn't even know what you can do with it (can I multiply two "numbers"?) and this would greatly limit its usefulness.
A similar approach was followed, for instance, to add closures to the language. The compiler knows the specific type of a closure, but you, the programmer, do not. If you're interested I can try to find discussions on the difficulties that this approach introduced into the design of the language.
I have been wondering about the different ways of instantiating structs that I have come across in Rust so far. There is the most basic/simple way: setting all the fields manually when everything is public:
let a = Structure { field1: value1, field2: value2, ... }
When there is a need for privacy, a better interface and/or defaults, it's common to use 'constructors' such as new(), etc.:
let a = Structure::new(arg1, arg2, ...)
Now, so far it kind of makes sense to me. However, there seems to be a third common way of doing the same thing, which confuses me the most. Here is a concrete example:
let mut image_file = OpenOptions::new()
    .write(true)
    .truncate(true)
    .create(true)
    .open(file_path)
    .unwrap();
So my questions are:
What is the performance impact of these different solutions (if any)?
What are general benefits and disadvantages of each?
Are there more ways of doing the same?
Which is the best practice?
You have identified 3 ways to create a struct:
Direct: directly initializing its fields,
Constructor: calling a single function which initializes the struct,
Builder: assembling the struct elements piecemeal and then finally initializing the struct.
Are there more ways of doing the same?
Direct initialization has two variations: either initializing each field directly, or initializing a few fields and "defaulting" the others with S { f0: v0, ..others } where others is an instance of S.
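A small sketch of the second variation (the type and field names are invented for illustration):

struct Config {
    verbose: bool,
    retries: u32,
    timeout_secs: u64,
}

fn main() {
    let defaults = Config { verbose: false, retries: 3, timeout_secs: 30 };
    // Functional update syntax: set one field, take the rest from `defaults`.
    let noisy = Config { verbose: true, ..defaults };
    assert_eq!(noisy.retries, 3);
}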
The Constructor and Builder ways have an exponential number of variations, depending on how you group the parameters, and in some instances the line between the two will be blurry.
All ways, however, must at some point converge and use (1) to create an instance of S.
What are general benefits and disadvantages of each?
This is... irrelevant, to some extent.
Each of the 3 alternatives caters to a different set of needs:
Direct initialization requires accessible fields; since pub fields are rare it is therefore mostly used within the crate but not usable by clients.
Constructor and Builder allow establishing invariants and are therefore the primary client's interface.
The Constructor is simple but inflexible: no new parameter can be added without breaking backward compatibility (another Constructor can, of course); the Builder on the other hand is flexible, at the cost of verbosity.
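To make the contrast concrete, here is a sketch of both styles for the same hypothetical type (all names invented):

struct Connection {
    host: String,
    port: u16,
}

impl Connection {
    // Constructor: short, but adding a parameter later breaks every caller.
    fn new(host: String, port: u16) -> Connection {
        Connection { host, port }
    }
}

struct ConnectionBuilder {
    host: String,
    port: u16,
}

impl ConnectionBuilder {
    fn new(host: String) -> ConnectionBuilder {
        ConnectionBuilder { host, port: 80 }
    }

    // Builder: new settings can be added later without breaking existing callers.
    fn port(mut self, port: u16) -> ConnectionBuilder {
        self.port = port;
        self
    }

    fn build(self) -> Connection {
        Connection { host: self.host, port: self.port }
    }
}

A caller would then write Connection::new(host, port) or ConnectionBuilder::new(host).port(8080).build().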
What are the performance impact of these different solutions ( if any )?
Ideally, in an optimized binary, both Constructor and Builder should have the same cost. If it matters, profile.
Direct initialization will be faster than either of the others if they establish invariants, since it does not. Comparing the performance of non-equivalent functionality rarely matters, though.
Which is the best practice?
Avoid Direct Initialization.
Direct Initialization does NOT establish invariants; it's up to the surrounding code to establish them, which means that any time Direct Initialization is used, the invariant-checking code is duplicated, violating the DRY principle.
Direct Initialization also goes against encapsulation, preventing any further change of the underlying structure, down to the type of the fields used. This is generally undesirable.
There are exceptions, as always. The most prominent being that implementing the Constructor or Builder requires using Direct Initialization down the road.
Choosing between Constructor and Builder is more subjective. In general, I recommend a Constructor when the parameters are few, even if this means writing a few of them, such as Vec::{new, with_capacity}. When the number of Constructors would get out of hand, because one would be needed for each combination of parameters that makes sense, use a Builder instead.
Should we use u32/i32 or their smaller variants (u8/i8, u16/i16) when dealing with limited-range numbers like "days in month", which ranges from 1-30, or "score of a subject", which ranges from 0 to 100? Or why shouldn't we?
Is there any optimization or benefit to using the smaller variants (e.g. memory efficiency)?
Summary
Correctness should be prioritized over performance, and correctness-wise (for ranges like 1–100) all solutions (u8, u32, ...) are equally bad. The best solution would be to create a new type to benefit from strong typing.
The rest of my answer tries to justify this claim and discusses different ways of creating the new type.
More explanation
Let's take a look at the "score of subject" example: the only legal values are 0–100. I'd argue that correctness-wise, using u8 and u32 is equally bad: in both cases, your variable can hold values that are not legal in your semantic context; that's bad!
And arguing that u8 is better because there are fewer illegal values is like arguing that wrestling a bear is better than walking through New York, because you only have one possibility of dying (blood loss from bear attack) as opposed to the many possibilities of death (car accident, knife attack, drowning, ...) in New York.
So what we want is a type that guarantees to hold only legal values. We want to create a new type that does exactly this. However, there are multiple ways to proceed; each with different advantages and disadvantages.
(A) Make the inner value public
struct ScoreOfSubject(pub u8);
Advantage: at least APIs are easier to understand, because the parameter is already explained by the type. Which is easier to understand:
add_record("peter", 75, 47) or
add_record("peter", StudentId(75), ScoreOfSubject(47))?
I'd say the latter one ;-)
Disadvantage: we don't actually do any range checking and illegal values can still occur; bad!
(B) Make inner value private and supply a range checking constructor
struct ScoreOfSubject(u8);

impl ScoreOfSubject {
    pub fn new(value: u8) -> Self {
        assert!(value <= 100);
        ScoreOfSubject(value)
    }

    pub fn get(&self) -> u8 { self.0 }
}
Advantage: we enforce legal values with very little code, yeah :)
Disadvantage: working with the type can be annoying. Pretty much every operation requires the programmer to pack & unpack the value.
(C) Add a bunch of implementations (in addition to (B))
(the code would impl Add<_>, impl Display and so on)
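For illustration, a sketch of two such impls on the type from (B) (assuming they live in the same module, so the private field is accessible):

use std::fmt;
use std::ops::Add;

impl Add for ScoreOfSubject {
    type Output = ScoreOfSubject;

    fn add(self, other: ScoreOfSubject) -> ScoreOfSubject {
        // Reuse the range-checking constructor so the invariant still holds.
        ScoreOfSubject::new(self.0 + other.0)
    }
}

impl fmt::Display for ScoreOfSubject {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}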
Advantage: the programmer can use the type and do all useful operations on it directly -- with range checking! This is pretty optimal.
Please take a look at Matthieu M.'s comment:
[...] generally multiplying scores together, or dividing them, does not produce a score! Strong typing not only enforces valid values, it also enforces valid operations, so that you don't actually divide two scores together to get another score.
I think this is a very important point I failed to make clear before. Strong typing prevents the programmer from executing illegal operations on values (operations that don't make any sense). A good example is the crate cgmath which distinguishes between point and direction vectors, because both support different operations on them. You can find additional explanation here.
Disadvantage: a lot of code :(
Luckily the disadvantage can be reduced by the Rust macro/compiler plugin system. There are crates like newtype_derive or bounded_integer that do this kind of code generation for you (disclaimer: I never worked with them).
But now you say: "you can't be serious? Am I supposed to spend my time writing new types?".
Not necessarily, but if you are working on production code (== at least somewhat important), then my answer is: yes, you should.
A no-answer answer: I doubt you would see any difference in benchmarks, unless you do A LOT of arithmetic or process HUGE arrays of numbers.
You should probably just go with the type which makes more sense (no reason to use negatives or have an upper bound in the millions for a day of the month) and provides the methods you need (e.g. you can't call abs() directly on an unsigned integer).
There could be major benefits using smaller types but you would have to benchmark your application on your target platform to be sure.
The first and most easily realized benefit from the lower memory footprint is better caching. Not only is your data more likely to fit into the cache, but it is also less likely to discard other data in the cache, potentially improving a completely different part of your application. Whether or not this is triggered depends on what memory your application touches and in what order. Do the benchmarks!
Network data transfers have an obvious benefit from using smaller types.
Smaller data allows "larger" instructions. A 128-bit SIMD unit can handle 4 32-bit values OR 16 8-bit values, making certain operations 4 times faster. In benchmarks I've made, these instructions do execute 4 times faster indeed, BUT the whole application improved by less than 1%, and the code became more of a mess. Shaping your program to make better use of SIMD can be tricky.
As for the signed/unsigned discussion, unsigned has slightly better properties, which a compiler may or may not take advantage of.
In C++, you have the ability to pass integral values as template parameters:
std::array<int, 3> arr; //fixed size array of 3
I know that Rust has built-in support for this, but what if I wanted to create something like a linear algebra vector library?
struct Vec<T, size: usize> {
    data: [T; size],
}
type Vec3f = Vec<f32, 3>;
type Vec4f = Vec<f32, 4>;
This is currently what I do in D. I have heard that Rust now has Associated Constants.
I haven't used Rust in a long time, but this doesn't seem to address this problem at all, or have I missed something?
As far as I can see, associated constants are only available in traits and that would mean I would still have to create N vector types by hand.
No, associated constants don't help and aren't intended to. Associated anything are outputs while use cases such as the one in the question want inputs. One could in principle construct something out of type parameters and a trait with associated constants (at least, as soon as you can use associated constants of type parameters — sadly that doesn't work yet). But that has terrible ergonomics, not much better than existing hacks like typenum.
Integer type parameters are highly desired since, as you noticed, they enable numerous things that aren't really feasible in current Rust. People talk about this and plan for it but it's not there yet.
Integer type parameters are not supported as of now, however there's an RFC for that IIRC, and a long-standing discussion.
You could use the typenum crate in the meantime.
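Another era-appropriate workaround is to generate the concrete types with a macro instead of writing each one by hand (a sketch; the macro name is invented):

// Generates a fixed-size vector struct for a given name, element type and length.
macro_rules! vector_type {
    ($name:ident, $ty:ty, $size:expr) => {
        struct $name {
            data: [$ty; $size],
        }
    };
}

vector_type!(Vec3f, f32, 3);
vector_type!(Vec4f, f32, 4);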
There's some background on this topic in the Rust book section on static and dynamic dispatch, but the tl;dr is that calling a method on a trait reference and a few other situations (function pointers, etc.) result in dynamic instead of static dispatch.
What is actual runtime cost of this, after optimizations have been applied?
For example, imagine this set of structs & traits:
struct Buffer;
struct TmpBuffer;
struct TmpMutBuffer;

impl BufferType for Buffer { ... }
impl BufferType for TmpBuffer { ... }
impl BufferType for TmpMutBuffer { ... }

impl Buffer2D for BufferType { ... }
impl Buffer2DExt for Buffer2D { ... }
Notice that the traits here are implemented on traits themselves.
What is the calling cost of dynamic dispatch to invoke a method from Buffer2DExt on a struct reference?
The recent question What are Rust's exact auto-dereferencing rules? regards the dereferencing rules; are these rules applied at compile time, or runtime?
Disclaimer: the question is fairly open-ended, therefore this answer might be incomplete. Treat it with an even bigger grain of salt than you would normally.
Rust uses a simple "virtual table" to implement dynamic dispatch. This strategy is also used in C++ for which you can see a study here. The study is a bit dated though.
The cost of indirection
Virtual dispatch induces indirection, which has a cost for multiple reasons:
indirection is opaque: this inhibits inlining and constant propagation which are key enablers for many compiler optimizations
indirection has a runtime cost: if incorrectly predicted, you are looking at pipeline stalls and expensive memory fetches
Optimizing indirection
Compilers, however, muddy the waters by trying their best to optimize indirection away.
devirtualization: sometimes the compiler can resolve the virtual table look-up at compile time (usually, because it knows the concrete type of the object); if so, it can therefore use a regular function call rather than an indirect one, and optimize away the indirection
probabilistic devirtualization: last year Honza Hubička introduced a new optimization in gcc (read the 5-part series as it is very instructive). The gist of the strategy is to build the inheritance graph to make an educated guess at the potential type(s), and then use a pattern like if v.hasType(A) { v.A::call() } elif v.hasType(B) { v.B::call() } else { v.virtual-call() }; special-casing the most likely types means regular calls in this case, and therefore inlined/constant-propagated/full-goodies calls.
This latter strategy could be fairly interesting in Rust due to the coherence rules and privacy rules, as it should have more cases where the full "inheritance" graph is provably known.
The cost of monomorphization
In Rust, you can use compile-time polymorphism instead of run-time polymorphism; the compiler will emit one version of the function for each unique combination of compile-time parameters it takes (see the sketch after this list). This, itself, has a cost:
compilation time cost: more code to be produced, more code to be optimized
binary size cost: the produced binary will end up being larger, a typical size/speed trade-off
run-time cost: possibly, the larger code size might lead to cache misses at CPU level
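For illustration, a sketch of the two flavors (Draw is an invented trait):

trait Draw {
    fn draw(&self);
}

// Compile-time polymorphism: one monomorphized copy per concrete T,
// each of which can be inlined and optimized on its own.
fn render_static<T: Draw>(item: &T) {
    item.draw();
}

// Run-time polymorphism: a single copy, but the call is indirect (vtable).
fn render_dynamic(item: &dyn Draw) {
    item.draw();
}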
The compiler might be able to merge together specialized functions that end up having the same implementation (because of phantom types, for example); however, it is still more than likely that the produced binaries (executables and libraries) will end up larger.
As usual with performance, you have to measure in your case what is more beneficial.