Are there situations where cyclic references are absolutely necessary? - garbage-collection

I'm designing a programming language and I intend for it to support multiple garbage collection strategies including ordinary reference counting. In fact, ordinary reference counting should be the default. The language supports both owning and unowning references.
I'm considering whether the language should completely not support cyclic ownership, that is, when there's a path of owning references from an object back to itself, or whether to allow it as an option in which case the compiler chooses a garbage collection strategy that can handle reference cycles. I'm more partial to the former because cyclic ownership doesn't seem to make much sense to me. It doesn't seem natural. In the real world, everything is contained in some other thing; you can't contain yourself. I mean, in what situation might an object need to own itself? For reference, closures which capture themselves or the object that contain them capture the object using an unowning reference.
To be clear, I am aware that allowing cyclic owning references would certainly make solving some problems easier. However, many problems can be redesigned in such a way that ownership cycles are broken. For e.g., in a doubly-linked list, the "head" and "next" references could be made owning, while the "tail" and "previous" references could be made unowning. My question is, are there problems that cannot be refactored to break cycles? If so, please give an example.
Assume that we have access to the following type of references:
Unique owning reference - This reference has unique ownership of an object. Once this reference goes out of scope, the object can be dropped. This reference cannot be cloned.
Shared owning reference - This reference has joint ownership of an object. The object must only be dropped when all references have gone out of scope.
Unowning reference - This reference does not have ownership of an object. It can only exist as long the object has an owning reference with lifetime greater or equal to this reference.
Weak reference to a shared object - This also doesn't own the object, but the reference is allowed to outlive all the owning references to this object. This reference can tell us whether the object is still alive or not.

Related

How to tell the lifetime relation between two memory objects?

I have two complex objects that have some form of lifetime relation between each other. My goal is to remove this relation to have both objects completely independent.
Is there a quick way to tell what causes lifetime relations between objects?
Very simple: one has a reference to the other. Lifetimes don't really come into it (significantly) unless a reference is involved. If they are, then their lifetimes are related. They may be indirectly related if they both refer to the same 3rd object somehow, but that is probably not what you mean.
This is no different than in C/C++ where if you have an object with a pointer to another, that their lifetimes are related. It's just that in Rust the compiler enforces that you do it right (unless you use unsafe which is called that for a reason).
Use the borrow checker. Rust compiler has a borrow checker that compares scopes to determine whether all borrows are valid. It will show the lifetimes of other variables.
More info here:
https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html

(Beginner) Why does the temporary variable change in example 1, but not in example 2? [duplicate]

I'm trying to get my head around mutable vs immutable objects. Using mutable objects gets a lot of bad press (e.g. returning an array of strings from a method) but I'm having trouble understanding what the negative impacts are of this. What are the best practices around using mutable objects? Should you avoid them whenever possible?
Well, there are a few aspects to this.
Mutable objects without reference-identity can cause bugs at odd times. For example, consider a Person bean with a value-based equals method:
Map<Person, String> map = ...
Person p = new Person();
map.put(p, "Hey, there!");
p.setName("Daniel");
map.get(p); // => null
The Person instance gets "lost" in the map when used as a key because its hashCode and equality were based upon mutable values. Those values changed outside the map and all of the hashing became obsolete. Theorists like to harp on this point, but in practice I haven't found it to be too much of an issue.
Another aspect is the logical "reasonability" of your code. This is a hard term to define, encompassing everything from readability to flow. Generically, you should be able to look at a piece of code and easily understand what it does. But more important than that, you should be able to convince yourself that it does what it does correctly. When objects can change independently across different code "domains", it sometimes becomes difficult to keep track of what is where and why ("spooky action at a distance"). This is a more difficult concept to exemplify, but it's something that is often faced in larger, more complex architectures.
Finally, mutable objects are killer in concurrent situations. Whenever you access a mutable object from separate threads, you have to deal with locking. This reduces throughput and makes your code dramatically more difficult to maintain. A sufficiently complicated system blows this problem so far out of proportion that it becomes nearly impossible to maintain (even for concurrency experts).
Immutable objects (and more particularly, immutable collections) avoid all of these problems. Once you get your mind around how they work, your code will develop into something which is easier to read, easier to maintain and less likely to fail in odd and unpredictable ways. Immutable objects are even easier to test, due not only to their easy mockability, but also the code patterns they tend to enforce. In short, they're good practice all around!
With that said, I'm hardly a zealot in this matter. Some problems just don't model nicely when everything is immutable. But I do think that you should try to push as much of your code in that direction as possible, assuming of course that you're using a language which makes this a tenable opinion (C/C++ makes this very difficult, as does Java). In short: the advantages depend somewhat on your problem, but I would tend to prefer immutability.
Immutable Objects vs. Immutable Collections
One of the finer points in the debate over mutable vs. immutable objects is the possibility of extending the concept of immutability to collections. An immutable object is an object that often represents a single logical structure of data (for example an immutable string). When you have a reference to an immutable object, the contents of the object will not change.
An immutable collection is a collection that never changes.
When I perform an operation on a mutable collection, then I change the collection in place, and all entities that have references to the collection will see the change.
When I perform an operation on an immutable collection, a reference is returned to a new collection reflecting the change. All entities that have references to previous versions of the collection will not see the change.
Clever implementations do not necessarily need to copy (clone) the entire collection in order to provide that immutability. The simplest example is the stack implemented as a singly linked list and the push/pop operations. You can reuse all of the nodes from the previous collection in the new collection, adding only a single node for the push, and cloning no nodes for the pop. The push_tail operation on a singly linked list, on the other hand, is not so simple or efficient.
Immutable vs. Mutable variables/references
Some functional languages take the concept of immutability to object references themselves, allowing only a single reference assignment.
In Erlang this is true for all "variables". I can only assign objects to a reference once. If I were to operate on a collection, I would not be able to reassign the new collection to the old reference (variable name).
Scala also builds this into the language with all references being declared with var or val, vals only being single assignment and promoting a functional style, but vars allowing a more C-like or Java-like program structure.
The var/val declaration is required, while many traditional languages use optional modifiers such as final in java and const in C.
Ease of Development vs. Performance
Almost always the reason to use an immutable object is to promote side effect free programming and simple reasoning about the code (especially in a highly concurrent/parallel environment). You don't have to worry about the underlying data being changed by another entity if the object is immutable.
The main drawback is performance. Here is a write-up on a simple test I did in Java comparing some immutable vs. mutable objects in a toy problem.
The performance issues are moot in many applications, but not all, which is why many large numerical packages, such as the Numpy Array class in Python, allow for In-Place updates of large arrays. This would be important for application areas that make use of large matrix and vector operations. This large data-parallel and computationally intensive problems achieve a great speed-up by operating in place.
Immutable objects are a very powerful concept. They take away a lot of the burden of trying to keep objects/variables consistent for all clients.
You can use them for low level, non-polymorphic objects - like a CPoint class - that are used mostly with value semantics.
Or you can use them for high level, polymorphic interfaces - like an IFunction representing a mathematical function - that is used exclusively with object semantics.
Greatest advantage: immutability + object semantics + smart pointers make object ownership a non-issue, all clients of the object have their own private copy by default. Implicitly this also means deterministic behavior in the presence of concurrency.
Disadvantage: when used with objects containing lots of data, memory consumption can become an issue. A solution to this could be to keep operations on an object symbolic and do a lazy evaluation. However, this can then lead to chains of symbolic calculations, that may negatively influence performance if the interface is not designed to accommodate symbolic operations. Something to definitely avoid in this case is returning huge chunks of memory from a method. In combination with chained symbolic operations, this could lead to massive memory consumption and performance degradation.
So immutable objects are definitely my primary way of thinking about object-oriented design, but they are not a dogma.
They solve a lot of problems for clients of objects, but also create many, especially for the implementers.
Check this blog post: http://www.yegor256.com/2014/06/09/objects-should-be-immutable.html. It explains why immutable objects are better than mutable. In short:
immutable objects are simpler to construct, test, and use
truly immutable objects are always thread-safe
they help to avoid temporal coupling
their usage is side-effect free (no defensive copies)
identity mutability problem is avoided
they always have failure atomicity
they are much easier to cache
You should specify what language you're talking about. For low-level languages like C or C++, I prefer to use mutable objects to conserve space and reduce memory churn. In higher-level languages, immutable objects make it easier to reason about the behavior of the code (especially multi-threaded code) because there's no "spooky action at a distance".
A mutable object is simply an object that can be modified after it's created/instantiated, vs an immutable object that cannot be modified (see the Wikipedia page on the subject). An example of this in a programming language is Pythons lists and tuples. Lists can be modified (e.g., new items can be added after it's created) whereas tuples cannot.
I don't really think there's a clearcut answer as to which one is better for all situations. They both have their places.
Shortly:
Mutable instance is passed by reference.
Immutable instance is passed by value.
Abstract example. Lets suppose that there exists a file named txtfile on my HDD. Now, when you are asking me to give you the txtfile file, I can do it in the following two modes:
I can create a shortcut to the txtfile and pass shortcut to you, or
I can do a full copy of the txtfile file and pass copied file to you.
In the first mode, the returned file represents a mutable file, because any change into the shortcut file will be reflected into the original one as well, and vice versa.
In the second mode, the returned file represents an immutable file, because any change into the copied file will not be reflected into the original one, and vice versa.
If a class type is mutable, a variable of that class type can have a number of different meanings. For example, suppose an object foo has a field int[] arr, and it holds a reference to a int[3] holding the numbers {5, 7, 9}. Even though the type of the field is known, there are at least four different things it can represent:
A potentially-shared reference, all of whose holders care only that it encapsulates the values 5, 7, and 9. If foo wants arr to encapsulate different values, it must replace it with a different array that contains the desired values. If one wants to make a copy of foo, one may give the copy either a reference to arr or a new array holding the values {1,2,3}, whichever is more convenient.
The only reference, anywhere in the universe, to an array which encapsulates the values 5, 7, and 9. set of three storage locations which at the moment hold the values 5, 7, and 9; if foo wants it to encapsulate the values 5, 8, and 9, it may either change the second item in that array or create a new array holding the values 5, 8, and 9 and abandon the old one. Note that if one wanted to make a copy of foo, one must in the copy replace arr with a reference to a new array in order for foo.arr to remain as the only reference to that array anywhere in the universe.
A reference to an array which is owned by some other object that has exposed it to foo for some reason (e.g. perhaps it wants foo to store some data there). In this scenario, arr doesn't encapsulate the contents of the array, but rather its identity. Because replacing arr with a reference to a new array would totally change its meaning, a copy of foo should hold a reference to the same array.
A reference to an array of which foo is the sole owner, but to which references are held by other object for some reason (e.g. it wants to have the other object to store data there--the flipside of the previous case). In this scenario, arr encapsulates both the identity of the array and its contents. Replacing arr with a reference to a new array would totally change its meaning, but having a clone's arr refer to foo.arr would violate the assumption that foo is the sole owner. There is thus no way to copy foo.
In theory, int[] should be a nice simple well-defined type, but it has four very different meanings. By contrast, a reference to an immutable object (e.g. String) generally only has one meaning. Much of the "power" of immutable objects stems from that fact.
Mutable collections are in general faster than their immutable counterparts when used for in-place
operations.
However, mutability comes at a cost: you need to be much more careful sharing them between
different parts of your program.
It is easy to create bugs where a shared mutable collection is updated
unexpectedly, forcing you to hunt down which line in a large codebase is performing the unwanted update.
A common approach is to use mutable collections locally within a function or private to a class where there
is a performance bottleneck, but to use immutable collections elsewhere where speed is less of a concern.
That gives you the high performance of mutable collections where it matters most, while not sacrificing
the safety that immutable collections give you throughout the bulk of your application logic.
If you return references of an array or string, then outside world can modify the content in that object, and hence make it as mutable (modifiable) object.
Immutable means can't be changed, and mutable means you can change.
Objects are different than primitives in Java. Primitives are built in types (boolean, int, etc) and objects (classes) are user created types.
Primitives and objects can be mutable or immutable when defined as member variables within the implementation of a class.
A lot of people people think primitives and object variables having a final modifier infront of them are immutable, however, this isn't exactly true. So final almost doesn't mean immutable for variables. See example here
http://www.siteconsortium.com/h/D0000F.php.
General Mutable vs Immutable
Unmodifiable - is a wrapper around modifiable. It guarantees that it can not be changed directly(but it is possibly using backing object)
Immutable - state of which can not be changed after creation. Object is immutable when all its fields are immutable. It is a next step of Unmodifiable object
Thread safe
The main advantage of Immutable object is that it is a naturally for concurrent environment. The biggest problem in concurrency is shared resource which can be changed any of thread. But if an object is immutable it is read-only which is thread safe operation. Any modification of an original immutable object return a copy
source of truth, side-effects free
As a developer you are completely sure that immutable object's state can not be changed from any place(on purpose or not). For example if a consumer uses immutable object he is able to use an original immutable object
compile optimisation
Improve performance
Disadvantage:
Copying of object is more heavy operation than changing a mutable object, that is why it has some performance footprint
To create an immutable object you should use:
1. Language level
Each language contains tools to help you with it. For example:
Java has final and primitives
Swift has let and struct[About].
Language defines a type of variable. For example:
Java has primitive and reference type,
Swift has value and reference type[About].
For immutable object more convenient is primitives and value type which make a copy by default. As for reference type it is more difficult(because you are able to change object's state out of it) but possible. For example you can use clone pattern on a developer level to make a deep(instead of shallow) copy.
2. Developer level
As a developer you should not provide an interface for changing state
[Swift] and [Java] immutable collection

Understand smart pointers in Rust

I am a newbie to Rust and writing to understand the "Smart pointers" in Rust. I have basic understanding of how smart pointers works in C++ and has been using it for memory management since a few years ago. But to my very much surprise, Rust also provides such utility explicitly.
Because from a tutorial here (https://pcwalton.github.io/2013/03/18/an-overview-of-memory-management-in-rust.html), it seems that every raw pointers have been automatically wrapped with a smart pointer, which seems very reasonable. Then why do we still need such Box<T>, Rc<T>, and Ref<T> stuff? According to this specification: https://doc.rust-lang.org/book/ch15-00-smart-pointers.html
Any comments will be apprecicated a lot. Thanks.
You can think about the difference between a T and a Box<T> as the difference between a statically allocated object and a dynamically allocated object (the latter being created via a new expression in C++ terms).
In Rust, both T and Box<T> represent a variable that has ownership over the referent object (i.e. when the variable goes out of scope, the object will be destroyed, whether it was stored by value or by reference). On the contrary, &T and &mut T represent borrowing of the object (i.e. these variables are not responsible for destroying the object, and they cannot outlive the owner of the object).
By default, you'd probably want to use T, but sometimes you might want (or have) to use Box<T>. For example, you would use a Box<T> if you want to own a T that's too large to be allocated in place. You would also use it when the object doesn't have a known size at all, which means that your only choice to store it or pass it around is through the "pointer" (the Box<T>).
In Rust, an object is generally either mutable or aliased, but not both. If you have given out immutable references to an object, you normally need to wait until those references are over before you can mutate that object again.
Additionally, Rust's immutability is transitive. If you receive an object immutably, it means that you have access to its contents (and the contents of those contents, and so on) also immutably.
Normally, all of these things are enforced at compile time. This means that you catch errors faster, but you are limited to being able to express only what the compiler can prove statically.
Like T and Box<T>, you may sometimes use RefCell<T>, which is another ownership type. But unlike T and Box<T>, the RefCell<T> enforces the borrow checking rules at runtime instead of compile time, meaning that sometimes you can do things with it that are safe but wouldn't pass the compiler's static borrow checker. The main example for this is getting a mutable reference to the interior of an object that was received immutably (which, under the statically enforced rules of Rust, would make the entire interior immutable).
The types Ref<T> and RefMut<T> are the runtime-checked equivalents of &T and &mut T respectively.
(EDIT: This whole thing is somewhat of a lie. &mut really means "unique borrow" and & means "non-unique borrow". Certain types, like mutexes, can be non-uniquely but still mutably borrowed, because otherwise they would be useless.)
Rust's ownership model tries to push you to write programs in which objects' lifetimes are known at compile time. This works well in certain scenarios, but makes other scenarios difficult or impossible to express.
Rc<T> and its atomic sibling Arc<T> are reference-counting wrappers of T. They offer you an alternative to the ownership model.
They are useful when you want to use and properly dispose an object, but it is not easy (or possible) to determine, at the moment you're writing the code, which specific variable should be the owner of that object (and therefore should take care of disposing it). Much like in C++, this means that there is no single owner of the object and that the object will be disposed by the last reference-counting wrapper that points to it.
The article you linked uses outdated syntax. Certain smart pointers used to have special names and associated syntax that has been removed since some time before Rust 1.0:
Box<T> replaced ~T ("owned pointers")
Rc<T> replaced #T ("managed pointers")
Because the Internet never forgets, you can still find pre-1.0 documentation and articles (such as the one you linked) that use the old syntax. Check the date of the article: if it's before May 2015, you're dealing with an early, unstable Rust.

When should I use a reference instead of transferring ownership?

From the Rust book's chapter on ownership, non-copyable values can be passed to functions by either transferring ownership or by using a mutable or immutable reference. When you transfer ownership of a value, it can't be used in the original function anymore: you must return it back if you want to. When you pass a reference, you borrow the value and can still use it.
I come from languages where values are immutable by default (Haskell, Idris and the like). As such, I'd probably never think about using references at all. Having the same value in two places looks dangerous (or, at least, awkward) to me. Since references are a feature, there must be a reason to use them.
Are there situations I should force myself to use references? What are those situations and why are they beneficial? Or are they just for convenience and defaulting to passing ownership is fine?
Mutable references in particular look very dangerous.
They are not dangerous, because the Rust compiler will not let you do anything dangerous. If you have a &mut reference to a value then you cannot simultaneously have any other references to it.
In general you should pass references around. This saves copying memory and should be the default thing you do, unless you have a good reason to do otherwise.
Some good reasons to transfer ownership instead:
When the value's type is small in size, such as bool, u32, etc. It's often better performance to move/copy these values to avoid a level of indirection. Usually these values implement Copy, and actually the compiler may make this optimisation for you automatically. Something it's free to do because of a strong type system and immutability by default!
When the value's current owner is going to go out of scope, you may want to move the value somewhere else to keep it alive.

How do Rust's ownership semantics relate to uniqueness typing as found in Clean and Mercury?

I noticed that in Rust moving is applied to lvalues, and it's statically enforced that moved-from objects are not used.
How do these semantics relate to uniqueness typing as found in Clean and Mercury? Are they the same concept? If not, how do they differ?
The concept of ownership in Rust is not the same as uniqueness in Mercury and Clean, although they are related in that they both aim to provide safety via static checking, and they are both defined in terms of the number of references within a scope. The key differences are:
Uniqueness is a more abstract concept. While it can be interpreted as saying that a reference to a memory location is unique, like Rust's lvalues, it can also apply to abstract values such as the state of every object in the universe, to give an extreme but typical example. There is no pointer corresponding to such a value - it cannot be opened up and inspected within a debugger or anything like that - but it can be used through an interface just like any other abstract type. The aim is to give a value-oriented semantics that remains consistent in the presence of statefulness.
In Mercury, at least (I can't speak for Clean), uniqueness is a more limited concept than ownership, in that there must be exactly one reference. You can't share several copies of a reference on the proviso that they will not be written to, as can be done in Rust. You also can't lend a reference for writing but get it back later after the borrower has finished with it.
Declaring something unique in Mercury does not guarantee that writing to references will occur, just that the compiler will check that it would be safe to do so; it is still valid for an implementation to copy the contents of a unique reference rather than update in place. The compiler will arrange for the update in place if it deems it appropriate at its given optimization level. Alternatively, authors of abstract types may perform similar (or sometimes drastically better) optimizations manually, safe in the knowledge that users will be forced to use the abstract type in a way that is consistent with them. Ownership in Rust, on the other hand, is more directly connected to the memory model and gives stronger guarantees about behaviour.

Resources