Use cases for const HashMap/BTreeMap - rust

The docs say:
A constant item is an optionally named constant value which is not associated with a specific memory location in the program. Constants are essentially inlined wherever they are used, meaning that they are copied directly into the relevant context when used.
While that sounds perfectly reasonable for "simple" values like integers, booleans, etc., doesn't it also mean that for more complex/larger data structures, like HashMap/BTreeMaps, for example, consts can be quite inefficient? If so, why (in what situations) would one want to use consts for such structures, instead of, say, an immutable static variable?

If so, why (in what situations) would one want to use consts for such structures, instead of, say, an immutable static variable?
Constants are, technically, the only way of getting an "immutable static variable" in Rust without runtime initialization.
Other solutions, like lazy_static, add overhead for initializing the value and tracking whether it has been initialized, or else risk unsafety.
Constant values are embedded into the read-only data of the executable, like string literals or the machine code. Reading them is pretty much as efficient as reading normal memory; there isn't any overhead.
Another reason to use const is const generics.

Related

(Beginner) Why does the temporary variable change in example 1, but not in example 2? [duplicate]

I'm trying to get my head around mutable vs immutable objects. Using mutable objects gets a lot of bad press (e.g. returning an array of strings from a method) but I'm having trouble understanding what the negative impacts are of this. What are the best practices around using mutable objects? Should you avoid them whenever possible?
Well, there are a few aspects to this.
Mutable objects without reference-identity can cause bugs at odd times. For example, consider a Person bean with a value-based equals method:
Map<Person, String> map = ...
Person p = new Person();
map.put(p, "Hey, there!");
p.setName("Daniel");
map.get(p); // => null
The Person instance gets "lost" in the map when used as a key because its hashCode and equality were based upon mutable values. Those values changed outside the map and all of the hashing became obsolete. Theorists like to harp on this point, but in practice I haven't found it to be too much of an issue.
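The snippet above omits the Person class, so here is a minimal self-contained sketch of the same effect. The Person bean below is hypothetical (the original does not show it); it assumes equals and hashCode are both derived from the mutable name field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical bean: equality and hash are based on a mutable field.
class Person {
    private String name = "";

    void setName(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Person && Objects.equals(name, ((Person) o).name);
    }

    @Override
    public int hashCode() { return Objects.hashCode(name); }
}

class LostKeyDemo {
    // Returns what the map yields for the key after the key has been mutated.
    static String lookupAfterMutation() {
        Map<Person, String> map = new HashMap<>();
        Person p = new Person();
        map.put(p, "Hey, there!");  // entry is filed under the hash of the empty name
        p.setName("Daniel");         // the key's hash changes; the entry does not move
        return map.get(p);           // lookup uses the new hash and misses the entry
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation());
    }
}
```

The entry is still inside the map (its size is 1); it just cannot be found through the mutated key.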
Another aspect is the logical "reasonability" of your code. This is a hard term to define, encompassing everything from readability to flow. Generically, you should be able to look at a piece of code and easily understand what it does. But more important than that, you should be able to convince yourself that it does what it does correctly. When objects can change independently across different code "domains", it sometimes becomes difficult to keep track of what is where and why ("spooky action at a distance"). This is a more difficult concept to exemplify, but it's something that is often faced in larger, more complex architectures.
Finally, mutable objects are killer in concurrent situations. Whenever you access a mutable object from separate threads, you have to deal with locking. This reduces throughput and makes your code dramatically more difficult to maintain. A sufficiently complicated system blows this problem so far out of proportion that it becomes nearly impossible to maintain (even for concurrency experts).
Immutable objects (and more particularly, immutable collections) avoid all of these problems. Once you get your mind around how they work, your code will develop into something which is easier to read, easier to maintain and less likely to fail in odd and unpredictable ways. Immutable objects are even easier to test, due not only to their easy mockability, but also the code patterns they tend to enforce. In short, they're good practice all around!
With that said, I'm hardly a zealot in this matter. Some problems just don't model nicely when everything is immutable. But I do think that you should try to push as much of your code in that direction as possible, assuming of course that you're using a language which makes this a tenable opinion (C/C++ makes this very difficult, as does Java). In short: the advantages depend somewhat on your problem, but I would tend to prefer immutability.
Immutable Objects vs. Immutable Collections
One of the finer points in the debate over mutable vs. immutable objects is the possibility of extending the concept of immutability to collections. An immutable object is an object that often represents a single logical structure of data (for example an immutable string). When you have a reference to an immutable object, the contents of the object will not change.
An immutable collection is a collection that never changes.
When I perform an operation on a mutable collection, then I change the collection in place, and all entities that have references to the collection will see the change.
When I perform an operation on an immutable collection, a reference is returned to a new collection reflecting the change. All entities that have references to previous versions of the collection will not see the change.
Clever implementations do not necessarily need to copy (clone) the entire collection in order to provide that immutability. The simplest example is the stack implemented as a singly linked list and the push/pop operations. You can reuse all of the nodes from the previous collection in the new collection, adding only a single node for the push, and cloning no nodes for the pop. The push_tail operation on a singly linked list, on the other hand, is not so simple or efficient.
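The singly linked stack described above can be sketched roughly as follows. This is an illustration only; the class name PStack and its methods are made up for this example and do not come from any particular library.

```java
// Minimal persistent (immutable) stack; all names here are illustrative.
final class PStack<T> {
    private final T head;          // top element (unused for the empty stack)
    private final PStack<T> tail;  // rest of the stack, shared rather than copied
    private final int size;

    private PStack(T head, PStack<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    static <T> PStack<T> empty() { return new PStack<>(null, null, 0); }

    // push allocates exactly one new node and reuses every existing node.
    PStack<T> push(T value) { return new PStack<>(value, this, size + 1); }

    // pop allocates nothing: it returns the tail that already exists.
    PStack<T> pop() {
        if (size == 0) throw new IllegalStateException("empty stack");
        return tail;
    }

    T peek() {
        if (size == 0) throw new IllegalStateException("empty stack");
        return head;
    }

    int size() { return size; }
}
```

Anyone still holding the older stack keeps seeing the older contents, because no node is ever modified after construction.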
Immutable vs. Mutable variables/references
Some functional languages take the concept of immutability to object references themselves, allowing only a single reference assignment.
In Erlang this is true for all "variables". I can only assign objects to a reference once. If I were to operate on a collection, I would not be able to reassign the new collection to the old reference (variable name).
Scala also builds this into the language with all references being declared with var or val, vals only being single assignment and promoting a functional style, but vars allowing a more C-like or Java-like program structure.
The var/val declaration is required, while many traditional languages use optional modifiers such as final in Java and const in C.
Ease of Development vs. Performance
Almost always the reason to use an immutable object is to promote side effect free programming and simple reasoning about the code (especially in a highly concurrent/parallel environment). You don't have to worry about the underlying data being changed by another entity if the object is immutable.
The main drawback is performance. Here is a write-up on a simple test I did in Java comparing some immutable vs. mutable objects in a toy problem.
The performance issues are moot in many applications, but not all, which is why many large numerical packages, such as the NumPy array class in Python, allow for in-place updates of large arrays. This would be important for application areas that make use of large matrix and vector operations. These large data-parallel and computationally intensive problems achieve a great speed-up by operating in place.
Immutable objects are a very powerful concept. They take away a lot of the burden of trying to keep objects/variables consistent for all clients.
You can use them for low level, non-polymorphic objects - like a CPoint class - that are used mostly with value semantics.
Or you can use them for high level, polymorphic interfaces - like an IFunction representing a mathematical function - that is used exclusively with object semantics.
Greatest advantage: immutability + object semantics + smart pointers make object ownership a non-issue, all clients of the object have their own private copy by default. Implicitly this also means deterministic behavior in the presence of concurrency.
Disadvantage: when used with objects containing lots of data, memory consumption can become an issue. A solution to this could be to keep operations on an object symbolic and do a lazy evaluation. However, this can then lead to chains of symbolic calculations, that may negatively influence performance if the interface is not designed to accommodate symbolic operations. Something to definitely avoid in this case is returning huge chunks of memory from a method. In combination with chained symbolic operations, this could lead to massive memory consumption and performance degradation.
So immutable objects are definitely my primary way of thinking about object-oriented design, but they are not a dogma.
They solve a lot of problems for clients of objects, but also create many, especially for the implementers.
Check this blog post: http://www.yegor256.com/2014/06/09/objects-should-be-immutable.html. It explains why immutable objects are better than mutable. In short:
immutable objects are simpler to construct, test, and use
truly immutable objects are always thread-safe
they help to avoid temporal coupling
their usage is side-effect free (no defensive copies)
identity mutability problem is avoided
they always have failure atomicity
they are much easier to cache
You should specify what language you're talking about. For low-level languages like C or C++, I prefer to use mutable objects to conserve space and reduce memory churn. In higher-level languages, immutable objects make it easier to reason about the behavior of the code (especially multi-threaded code) because there's no "spooky action at a distance".
A mutable object is simply an object that can be modified after it's created/instantiated, vs an immutable object that cannot be modified (see the Wikipedia page on the subject). An example of this in a programming language is Python's lists and tuples. Lists can be modified (e.g., new items can be added after the list is created) whereas tuples cannot.
I don't really think there's a clearcut answer as to which one is better for all situations. They both have their places.
In short:
A mutable instance is passed by reference.
An immutable instance is passed by value.
An abstract example: let's suppose that there is a file named txtfile on my HDD. Now, when you ask me to give you the txtfile file, I can do it in one of the following two ways:
I can create a shortcut to txtfile and pass the shortcut to you, or
I can make a full copy of the txtfile file and pass the copied file to you.
In the first mode, the returned file represents a mutable file, because any change made through the shortcut will be reflected in the original file as well, and vice versa.
In the second mode, the returned file represents an immutable file, because any change to the copied file will not be reflected in the original one, and vice versa.
If a class type is mutable, a variable of that class type can have a number of different meanings. For example, suppose an object foo has a field int[] arr, and it holds a reference to a int[3] holding the numbers {5, 7, 9}. Even though the type of the field is known, there are at least four different things it can represent:
A potentially-shared reference, all of whose holders care only that it encapsulates the values 5, 7, and 9. If foo wants arr to encapsulate different values, it must replace it with a different array that contains the desired values. If one wants to make a copy of foo, one may give the copy either a reference to arr or a new array holding the values {5, 7, 9}, whichever is more convenient.
The only reference, anywhere in the universe, to an array which encapsulates a set of three storage locations which at the moment hold the values 5, 7, and 9; if foo wants it to encapsulate the values 5, 8, and 9, it may either change the second item in that array or create a new array holding the values 5, 8, and 9 and abandon the old one. Note that if one wanted to make a copy of foo, one must in the copy replace arr with a reference to a new array in order for foo.arr to remain the only reference to that array anywhere in the universe.
A reference to an array which is owned by some other object that has exposed it to foo for some reason (e.g. perhaps it wants foo to store some data there). In this scenario, arr doesn't encapsulate the contents of the array, but rather its identity. Because replacing arr with a reference to a new array would totally change its meaning, a copy of foo should hold a reference to the same array.
A reference to an array of which foo is the sole owner, but to which references are held by another object for some reason (e.g. foo wants the other object to store data there, the flip side of the previous case). In this scenario, arr encapsulates both the identity of the array and its contents. Replacing arr with a reference to a new array would totally change its meaning, but having a clone's arr refer to foo.arr would violate the assumption that foo is the sole owner. There is thus no way to copy foo.
In theory, int[] should be a nice simple well-defined type, but it has four very different meanings. By contrast, a reference to an immutable object (e.g. String) generally only has one meaning. Much of the "power" of immutable objects stems from that fact.
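As a rough illustration of the second meaning (the array is exclusively owned), the sketch below shows what copying then has to look like. The class OwnedArrayHolder is hypothetical and only stands in for the foo object discussed above.

```java
import java.util.Arrays;

// Hypothetical holder: arr is meant to be the only reference to its array.
final class OwnedArrayHolder {
    private final int[] arr;

    OwnedArrayHolder(int[] values) {
        this.arr = values.clone();  // private copy, so no outside reference exists
    }

    // Exclusive ownership makes in-place mutation safe.
    void set(int index, int value) { arr[index] = value; }

    // A copy needs its own array so each holder keeps the sole reference to its array.
    OwnedArrayHolder copy() { return new OwnedArrayHolder(arr); }

    // Hand out a clone rather than the owned array itself.
    int[] snapshot() { return arr.clone(); }

    public static void main(String[] args) {
        OwnedArrayHolder foo = new OwnedArrayHolder(new int[] {5, 7, 9});
        OwnedArrayHolder bar = foo.copy();
        foo.set(1, 8);
        System.out.println(Arrays.toString(foo.snapshot()));
        System.out.println(Arrays.toString(bar.snapshot()));
    }
}
```

With an immutable value in place of int[], none of the cloning would be needed, since sharing the reference could never be observed.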
Mutable collections are in general faster than their immutable counterparts when used for in-place operations.
However, mutability comes at a cost: you need to be much more careful sharing them between different parts of your program. It is easy to create bugs where a shared mutable collection is updated unexpectedly, forcing you to hunt down which line in a large codebase is performing the unwanted update.
A common approach is to use mutable collections locally within a function or private to a class where there is a performance bottleneck, but to use immutable collections elsewhere where speed is less of a concern. That gives you the high performance of mutable collections where it matters most, while not sacrificing the safety that immutable collections give you throughout the bulk of your application logic.
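One way to picture the "mutable locally, read-only elsewhere" approach is the sketch below. The class and method names (Squares, firstSquares) are invented for this example, and Java is used only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Squares {
    // Builds the result with a mutable list, then publishes a read-only list.
    static List<Integer> firstSquares(int n) {
        List<Integer> buffer = new ArrayList<>();  // mutable, confined to this method
        for (int i = 1; i <= n; i++) {
            buffer.add(i * i);                     // cheap in-place appends
        }
        // No other reference to buffer escapes, so callers can never see it change.
        return Collections.unmodifiableList(buffer);
    }

    public static void main(String[] args) {
        System.out.println(firstSquares(3));
    }
}
```

Callers that try to add to the returned list get an UnsupportedOperationException, so the mutation stays confined to the one method that needs the speed.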
If you return a reference to an array or string, then the outside world can modify the contents of that object, which effectively makes it a mutable (modifiable) object.
Immutable means can't be changed, and mutable means you can change.
Objects are different from primitives in Java. Primitives are built-in types (boolean, int, etc.) and objects (classes) are user-created types.
Primitives and objects can be mutable or immutable when defined as member variables within the implementation of a class.
A lot of people think primitive and object variables with a final modifier in front of them are immutable; however, this isn't exactly true. So final doesn't quite mean immutable for variables. See the example here:
http://www.siteconsortium.com/h/D0000F.php.
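A small sketch of the point about final, written independently of the linked page (the class FinalDemo is made up for this example): final fixes which object a variable refers to, not the state of that object.

```java
import java.util.ArrayList;
import java.util.List;

class FinalDemo {
    // final fixes the reference, not the contents of the list it points to.
    private final List<String> names = new ArrayList<>();

    void add(String name) {
        names.add(name);               // allowed: mutates the object behind the final field
        // names = new ArrayList<>();  // would not compile: a final field cannot be reassigned
    }

    int count() { return names.size(); }

    public static void main(String[] args) {
        FinalDemo d = new FinalDemo();
        d.add("a");
        d.add("b");
        System.out.println(d.count());
    }
}
```

For a final primitive field the value really cannot change, so the gap between final and immutable shows up with reference types.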
General Mutable vs Immutable
Unmodifiable - a wrapper around a modifiable object. It guarantees that the object cannot be changed directly through the wrapper (but it can still change via the backing object).
Immutable - an object whose state cannot be changed after creation. An object is immutable when all of its fields are immutable. It is the next step after an unmodifiable object.
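The difference between the two definitions can be sketched in Java as follows. The class name ViewVsSnapshot is invented for the example; the "immutable" list here is approximated by wrapping a private copy that nobody else holds a reference to.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ViewVsSnapshot {
    // Returns { size seen through the unmodifiable view, size of the private snapshot }.
    static int[] sizesAfterBackingChange() {
        List<String> backing = new ArrayList<>();
        backing.add("a");

        // Unmodifiable: a read-only wrapper over the live backing list.
        List<String> view = Collections.unmodifiableList(backing);
        // Effectively immutable: a wrapper over a private copy with no other reference.
        List<String> snapshot = Collections.unmodifiableList(new ArrayList<>(backing));

        backing.add("b");  // change made through the backing object

        return new int[] { view.size(), snapshot.size() };
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterBackingChange();
        System.out.println(sizes[0] + " " + sizes[1]);
    }
}
```

The view reports the new element even though it rejects direct modification, while the snapshot stays at its original size.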
Thread safe
The main advantage of an immutable object is that it is naturally suited to a concurrent environment. The biggest problem in concurrency is a shared resource that can be changed by any thread. But if an object is immutable, it is read-only, and reading is a thread-safe operation. Any modification of an original immutable object returns a copy.
source of truth, side-effects free
As a developer you can be completely sure that an immutable object's state cannot be changed from anywhere (on purpose or not). For example, a consumer can safely use the original immutable object rather than a copy of it.
compile optimisation
Improve performance
Disadvantage:
Copying an object is a heavier operation than changing a mutable object in place, which is why immutability has some performance cost.
To create an immutable object you should use:
1. Language level
Each language contains tools to help you with it. For example:
Java has final and primitives
Swift has let and struct.
The language defines the type of a variable. For example:
Java has primitive and reference types,
Swift has value and reference types.
For immutable objects, primitives and value types are more convenient because they are copied by default. With reference types it is more difficult (because an object's state can be changed from outside it) but possible. For example, you can use the clone pattern at the developer level to make a deep (instead of shallow) copy.
2. Developer level
As a developer you should not provide an interface for changing state

Create a map using type as key

I need a HashMap<K,V> where V is a trait (it will likely be Box or an Rc or something, that's not important), and I need to ensure that the map stores at most one of a given struct, and more importantly, that I can query the presence of (and retrieve/insert) items by their type. K can be anything that is unique to each type (a uint would be nice, but a String or even some large struct holding type information would be sufficient as long as it can be Eq and Hashable)
This is occurring in a library, so I cannot use an enum or such since new types can be added by external code.
I looked into std::any::TypeId but besides not working for non-'static types, it seems they aren't even unique (and allegedly collisions were achieved accidentally with a rather small number of types) so I'd prefer to avoid them if feasible since the number of types I'll have may be very large. (hence this is not a duplicate of this IMO)
I'd like something along the lines of a macro to ensure uniqueness but I can't figure out how to have some kind of global compile time counter. I could use a proper UUID, but it'd be nice to have guaranteed uniqueness since this is, in theory at least, statically determinable.
It is safe to assume that all relevant types are defined either in this lib or in a singular crate that directly depends on it, if that allows for a solution that might be otherwise impossible.
e.g. my thoughts are to generate ids for types in the lib, and also export a constant of the counter, which can be used by the consumer of the lib in the same macro (or a very similar one) but I don't see a way to have such a const value modified by const code in multiple places.
Is this possible or do I need some kind of build script that provides values before compile time?

Is an object a data structure?

I know things like arrays, linked lists, etc. are data structures... but what about objects?
Say I create an object, employee, and it stores and keeps track of the employee's name, salary, phone number, etc.
It's a question of semantics really, and I expect it could be argued either way.
When you create an object, you are telling a compiler or an interpreter to store a group of information together. Usually the interpreter/compiler will use some type of data structure to store that information (Python uses a hash table, for example).
I might call that data structure the object if I pointed it out in a hex dump, but that's just because saying 'the bytes that represent the object' is a bit inconvenient.
You could (and maybe someone has) write a compiler that stores many objects in one data structure. In that case there would be no one-to-one mapping between object and data structure. So for that reason, I'm going to say no, an object is not a data structure, but it is normally stored in one.
Interestingly, before the OOP paradigm came into vogue, languages like C used structs (structures). A struct in C is simply a contiguous block of virtual memory with named offsets (members), although that does get more complex with things like unions, etc.
But essentially a structure, as @James pointed out, is exactly that: some collection or grouping of related things together in a way a programmer (or mad scientist) feels is logical.
In the modern programming lexicon, with languages such as Java, C#, etc., your objects usually represent one real-world thing, such as a Customer, Order, <insert over-used object example here>, etc., while your data structures (usually collections in the libraries of the languages I mentioned) represent the containment of multiple "objects".
Strictly speaking, however (and just to confuse everyone), the data structures in languages like Java and C# are objects! (i.e. they are references that are passed around, can have methods called on them, etc.)
In a more classical (and perhaps more CS derived) sense, "data structures" are collections of behaviours (typically algorithms) that are used to manage the memory that stores data, while "objects" are that data.

How should I use storage class specifiers like ref, in, out, etc. in function arguments in D?

There are comparatively many storage class specifiers for functions arguments in D, which are:
none
in (which is equivalent to const scope)
out
ref
scope
lazy
const
immutable
shared
inout
What's the rationale behind them? Their names already put forth the obvious use. However, there are some open questions:
Should I use ref combined with in for struct type function arguments by default?
Does out imply ref implicitly?
When should I use none?
Does ref on classes and/or interfaces make sense? (Class types are references by default.)
How about ref on array slices?
Should I use const for built-in arithmetic types, whenever possible?
More generally put: When and why should I use which storage class specifier for function argument types in case of built-in types, arrays, structs, classes and interfaces?
(In order to isolate the scope of the question a little bit, please don't discuss shared, since it has its own isolated meaning.)
I wouldn't use either by default. ref parameters only take lvalues, and ref implies that you're going to be altering the argument that's being passed in. If you want to avoid copying, then use const ref or auto ref. But const ref still requires an lvalue, so unless you want to duplicate your functions, it's frequently more annoying than it's worth. And while auto ref will avoid copying lvalues (it basically makes it so that there's a version of the function which takes lvalues by ref and one which takes rvalues without ref), it only works with templates, limiting its usefulness. And using const can have far-reaching consequences due to the fact that D's const is transitive and the fact that it's undefined behavior to cast away const from a variable and modify it. So, while it's often useful, using it by default is likely to get you into trouble.
Using in gives you scope in addition to const, which I'd generally advise against. scope on function parameters is supposed to make it so that no reference to that data can escape the function, but the checks for it aren't properly implemented yet, so you can actually use it in a lot more situations than are supposed to be legal. There are some cases where scope is invaluable (e.g. with delegates, since it makes it so that the compiler doesn't have to allocate a closure for it), but for other types, it can be annoying (e.g. if you pass an array by scope, then you couldn't return a slice of that array from the function). And any structs with any arrays or reference types would be affected. And while you won't get many complaints about incorrectly using scope right now, if you've been using it all over the place, you're bound to get a lot of errors once it's fixed. Also, it's utterly pointless for value types, since they have no references to escape. So, using const and in on a value type (including structs which are value types) are effectively identical.
out is the same as ref except that it resets the parameter to its init value so that you always get the same value passed in regardless of what the previous state of the variable being passed in was.
Almost always, as far as function arguments go. You use const or scope or whatnot when you have a specific need for it, but I wouldn't advise using any of them by default.
Of course it does. ref is separate from the concept of class references. It's a reference to the variable being passed in. If I do
void func(ref MyClass obj)
{
    obj = new MyClass(7);
}
auto var = new MyClass(5);
func(var);
then var will refer to the newly constructed new MyClass(7) after the call to func rather than the new MyClass(5). You're passing the reference by ref. It's just like how taking the address of a reference (like var) gives you a pointer to a reference and not a pointer to a class object.
MyClass* p = &var; //points to var, _not_ to the object that var refers to.
Same deal as with classes. ref makes the parameter refer to the variable passed in. e.g.
void func(ref int[] arr)
{
    arr ~= 5;
}
auto var = [1, 2, 3];
func(var);
assert(var == [1, 2, 3, 5]);
If func didn't take its argument by ref, then var would have been sliced, and appending to arr would not have affected var. But since the parameter was ref, anything done to arr is done to var.
That's totally up to you. Making it const makes it so that you can't mutate it, which means that you're protected from accidentally mutating it if you don't intend to ever mutate it. It might also enable some optimizations, but if you never write to the variable, and it's a built-in arithmetic type, then the compiler knows that it's never altered and the optimizer should be able to do those optimizations anyway (though whether it does or not depends on the compiler's implementation).
immutable and const are effectively identical for the built-in arithmetic types in almost all cases, so personally, I'd just use immutable if I want to guarantee that such a variable doesn't change. In general, using immutable instead of const if you can gives you better optimizations and better guarantees, since it allows the variable to be implicitly shared across threads (if applicable) and it always guarantees that the variable can't be mutated (whereas for reference types, const just means only that that reference can't mutate the object, not that it can't be mutated).
Certainly, if you mark your variables const and immutable as much as possible, then it does help the compiler with optimizations at least some of the time, and it makes it easier to catch bugs where you mutated something when you didn't mean to. It also can make your code easier to understand, since you know that the variable is not going to be mutated. So, using them liberally can be valuable. But again, using const or immutable can be overly restrictive depending on the type (though that isn't a problem with the built-in integral types), so just automatically marking everything as const or immutable can cause problems.

What's the advantage of a String being Immutable?

I once read that one advantage of strings being immutable has something to do with improving memory performance.
Can anybody explain this to me? I can't find it on the Internet.
Immutability (for strings or other types) can have numerous advantages:
It makes it easier to reason about the code, since you can make assumptions about variables and arguments that you can't otherwise make.
It simplifies multithreaded programming since reading from a type that cannot change is always safe to do concurrently.
It allows for a reduction of memory usage by allowing identical values to be combined together and referenced from multiple locations. Both Java and C# perform string interning to reduce the memory cost of literal strings embedded in code.
It simplifies the design and implementation of certain algorithms (such as those employing backtracking or value-space partitioning) because previously computed state can be reused later.
Immutability is a foundational principle in many functional programming languages - it allows code to be viewed as a series of transformations from one representation to another, rather than a sequence of mutations.
Immutable strings also help avoid the temptation of using strings as buffers. Many defects in C/C++ programs relate to buffer overrun problems resulting from using naked character arrays to compose or modify string values. Treating strings as immutable types encourages using types better suited for buffer manipulation (see StringBuilder in .NET or Java).
Consider the alternative. Java has no const qualifier. If String objects were mutable, then any method to which you pass a reference to a string could have the side-effect of modifying the string. Immutable strings eliminate the need for defensive copies, and reduce the risk of program error.
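A short sketch of that side-effect concern, using StringBuilder as a stand-in for a hypothetical mutable string (the class SideEffectDemo and its shout methods are invented for this example):

```java
class SideEffectDemo {
    // With a mutable argument, the callee can change the caller's object.
    static void shout(StringBuilder sb) {
        sb.append("!");
    }

    // With String, the callee can only build a new value; the caller's string is untouched.
    static String shout(String s) {
        return s + "!";
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("hi");
        shout(sb);
        System.out.println(sb);          // the caller's builder was modified

        String s = "hi";
        String t = shout(s);
        System.out.println(s + " " + t); // s is unchanged, t is the new value
    }
}
```

A caller that wants to protect the builder has to pass a copy; with String no such copy is ever needed.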
Immutable strings are cheap to copy, because you don't need to copy all the data - just copy a reference or pointer to the data.
Immutable classes of any kind are easier to work with in multiple threads, the only synchronization needed is for destruction.
Perhaps my answer is outdated, but someone will probably find some new information here.
Why Java String is immutable and why it is good:
you can share a string between threads and be sure none of them will change the string and confuse another thread
you don’t need a lock. Several threads can work with an immutable string without conflicts
if you just received a string, you can be sure no one will change its value after that
you can have many string duplicates – they will all point to a single instance, to just one copy. This saves computer memory (RAM)
you can do substring without copying – by creating a pointer to an existing string’s element. This is why the Java substring operation implementation is so fast
immutable strings (objects) are much better suited for use as keys in hash tables
a) Imagine the string pool facility without immutable strings: it would not be possible at all, because in a string pool one string object/literal, e.g. "Test", is referenced by many reference variables, so if any one of them changed the value, the others would automatically be affected. Let's say
String A = "Test" and String B = "Test"
Now suppose B calls toUpperCase() and that changes the same object into "TEST"; then A would also be "TEST", which is not desirable.
b) Another reason why String is immutable in Java is to allow String to cache its hash code. Being immutable, a String in Java caches its hash code and does not recalculate it every time we call the hashCode method, which makes it very fast as a HashMap key.
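The pooling scenario from point a) can be checked with a small sketch. The class StringPoolDemo is made up for this example; it relies on the fact that identical string literals refer to one pooled object and that toUpperCase returns a new String.

```java
import java.util.Locale;

class StringPoolDemo {
    // Identical literals are pooled, so both variables refer to one object.
    static boolean literalsShareOneObject() {
        String a = "Test";
        String b = "Test";
        return a == b;
    }

    // Upper-casing through b produces a new String; the pooled "Test" seen by a is untouched.
    static String originalAndUpper() {
        String a = "Test";
        String b = "Test";
        String upper = b.toUpperCase(Locale.ROOT);
        return a + "/" + upper;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject());
        System.out.println(originalAndUpper());
    }
}
```

Because the shared object can never change, handing the same instance to every user of the literal is safe.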
Think of various strings sitting in a common pool. String variables then point to locations in the pool. If you copy a string variable, both the original and the copy share the same characters. The efficiency of sharing outweighs the inefficiency of string editing by extracting substrings and concatenating.
Fundamentally, if one object or method wishes to pass information to another, there are a few ways it can do it:
1. It may give a reference to a mutable object which contains the information, and which the recipient promises never to modify.
2. It may give a reference to an object which contains the data, but whose content it doesn't care about.
3. It may store the information into a mutable object the intended data recipient knows about (generally one supplied by that data recipient).
4. It may return a reference to an immutable object containing the information.
Of these methods, #4 is by far the easiest. In many cases, mutable objects are easier to work with than immutable ones, but there's no easy way to share with "untrusted" code the information that's in a mutable object without having to first copy the information to something else. By contrast, information held in an immutable object to which one holds a reference may easily be shared by simply sharing a copy of that reference.
