Why does the sweep phase of CMS not have to stop the world? - garbage-collection

I have a question about the CMS sweep phase. The sweep phase does not stop the world. Consider the following case: a pointer c is null, so it will not be marked as reachable. After the remark phase, c is modified to point to a new object; or something like c = b is done, where b is unreachable, but once c points to it, it becomes reachable. In these two cases, will the object c points to be collected? It was not marked during the remark phase, so if it is collected, I think that is wrong.

There’s a huge misconception in your question. Garbage collectors collect objects, i.e. the memory occupied by them, not pointers.
Pointers are traversed in the marking phase to determine which objects are reachable. When your pointer c is null, it doesn't point to any object and there is nothing to traverse. Whether there are unreachable objects can't be determined from c; it just doesn't contribute to this process. All that matters is which pointers do point to an object, as those objects are reachable. All objects not encountered during the traversal are unreachable.
Since being unreachable implies that no pointer to the object exists, no subsequent pointer assignment can make an unreachable object reachable. When you do c = b, there are only two possible scenarios: 1) b is null, hence c will be null too after the assignment, or 2) b points to an object which, of course, has been marked as reachable during the marking phase, so now there's one more pointer to that reachable object, which doesn't change its reachable nature.
The only change that may happen during a concurrent sweeping phase is that an object marked as reachable may become unreachable concurrently, e.g. if c was the only pointer to an object, a concurrent c = b makes that object unreachable. This, however, just implies that the object is still treated as reachable in this sweeping phase and needs to be collected in the next garbage collection cycle.
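To make that last case concrete, here is a minimal Java sketch (the class and field names are illustrative, not taken from any GC API or from the question):

class FloatingGarbage {
    static Object c; // a GC root (static field)
    static Object b; // another root, currently null

    public static void main(String[] args) {
        c = new Object(); // object O is reachable via c
        // ... imagine the concurrent mark/remark phases run here,
        // marking O as reachable ...
        c = b; // c becomes null; O is now unreachable, but it was
               // already marked, so the current sweep treats it as
               // live ("floating garbage") and it is reclaimed in
               // the next garbage collection cycle
    }
}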


Does Rust drop the memory after a move is used to reassign a value?
What happens to the string "aaa" in this example?
let mut s = String::from("aaa");
s = String::from("bbb");
I'm guessing that the "aaa" String is dropped; that makes sense, as it is not used anymore. However, I cannot find anything confirming this in the docs. (For example, the Book only provides an explanation of what happens when we assign a new value using a move.)
I'm trying to wrap my head around the rules Rust uses to ensure memory safety, but I cannot find an explicit rule of what happens in such a situation.
Yes, assignment to a variable will drop the value that is being replaced.
Not dropping the replaced value would be a disaster - since Drop is frequently used for deallocation and other cleanup, if the value wasn't dropped, you'd end up with all sorts of leaks.
Move semantics are implicit here. The data in s is initialized by moving from the String produced by String::from("bbb"). The original data stored in s is dropped by side-effect (the process of replacing it leaves it with no owner, so it's dropped as part of the operation).
Per the destructor documentation (emphasis added):
When an initialized variable or temporary goes out of scope, its destructor is run, or it is dropped. Assignment also runs the destructor of its left-hand operand, if it's initialized. If a variable has been partially initialized, only its initialized fields are dropped.
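A quick way to observe this drop-on-assignment behavior (a minimal sketch; the Noisy type is purely illustrative) is a type with a Drop implementation:

struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

fn main() {
    let mut s = Noisy("aaa");
    s = Noisy("bbb"); // prints "dropping aaa": the replaced value is dropped here
    println!("end of main");
}   // prints "dropping bbb" when s goes out of scope

The "aaa" value is dropped at the assignment, exactly as the quoted documentation states.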

Compare and swap with and without garbage collector

How does CAS work? How does it work with a garbage collector? Where is the problem, and how does it work without a garbage collector?
I was reading a presentation about CAS and its use for the "write rarely, read many" problem, and it said that using CAS is convenient when you can rely on a garbage collector, but that there is a problem (not specified) when you cannot.
Can you tell me something about this? If you could sum up the principle of CAS first, it would be appreciated.
OK, so CAS is an atomic instruction; that is, there is special hardware support for it.
Its main use is to avoid locks entirely when implementing your data structures and other operations. With locks, if a thread takes a page fault or a cache miss, or is descheduled by the OS, it takes the lock with it and all the remaining threads are blocked, which obviously yields serious performance issues.
CAS is the core of lock-free programming.
CAS is basically the following, performed as one indivisible step that also reports whether the swap happened:
CAS(CURRENT_VALUE, OLD_VALUE, NEW_VALUE) <=>
if CURRENT_VALUE == OLD_VALUE then CURRENT_VALUE = NEW_VALUE (and succeed), else fail
You have a variable (e.g. a class field), and between reading it and wanting to write to it you have no clue whether other threads have modified it.
CAS helps you on the write side: the compare-and-swap is done atomically (in hardware), and no lock is involved, so even if your thread goes to sleep the rest of the threads can keep operating on your data structure.
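In Java, for instance, this is exposed through java.util.concurrent.atomic; a minimal sketch of a lock-free increment built on a CAS retry loop (the class name is illustrative):

import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    private static final AtomicInteger counter = new AtomicInteger(0);

    // Lock-free increment: retry until our CAS wins the race.
    static int increment() {
        while (true) {
            int old = counter.get();  // read the current value
            int next = old + 1;       // compute the new value
            if (counter.compareAndSet(old, next)) {
                return next;          // CAS succeeded; no lock was taken
            }
            // CAS failed: another thread changed the value in between; retry.
        }
    }

    public static void main(String[] args) {
        System.out.println(increment()); // prints 1
    }
}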
The issue with CAS on non-GC systems is the ABA problem. An example:
You have a singly linked list: HEAD->A->X->Y->Z
Thread 1: let's read A: localA = A; localA_Value = A.Value (let's say 5)
Thread 2: let's delete A: HEAD->X->Y->Z
Thread 3: let's add a new node at the start (and malloc happens to reuse the memory right where the old A was): HEAD->A'->X->Y->Z (A'.Value = 10)
Thread 1 resumes and wants to swap A for B: CAS(HEAD, localA, B). This thread expects that, if the CAS succeeds, the node it read still holds the value 5. Wrong: the CAS succeeds because localA and A' are the same memory location, even though localA_Value != A'.Value, so an operation is performed that shouldn't have been.
The thing is that in GC-enabled systems this can never happen: localA still holds a reference to that memory location, so it cannot be reclaimed, and thus A' can never be allocated at that address.
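A common mitigation when you cannot rely on a GC is to pair the pointer with a version number, so that a recycled address still fails the CAS. On the JVM this idea is available as java.util.concurrent.atomic.AtomicStampedReference; a minimal sketch:

import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        // Pair the reference with an integer stamp (version number).
        AtomicStampedReference<String> top = new AtomicStampedReference<>("A", 0);

        int[] stampHolder = new int[1];
        String seen = top.get(stampHolder); // read value and stamp together
        int seenStamp = stampHolder[0];

        // Another thread swaps A out and back in, bumping the stamp each time:
        top.compareAndSet("A", "B", 0, 1);
        top.compareAndSet("B", "A", 1, 2);

        // A plain CAS on the value alone would succeed (it is "A" again),
        // but the stamp has moved from 0 to 2, so this CAS fails:
        boolean ok = top.compareAndSet(seen, "C", seenStamp, seenStamp + 1);
        System.out.println(ok); // false: the ABA change was detected via the stamp
    }
}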

Implementing reference counting in a stack-based approach in C

I am making an interpreter in C, and I'm having a problem with my reference counting.
Each value (which is the interpreter's representation... of a value) is allocated with refcount 0. Once it gets added to the stack, the refcount is incremented.
The only way to get a value off the stack is to pop it, but that leads to problems: my popping function returns the value that is popped, but if the refcount drops to 0 and I destroy the value, I can no longer return it.
I get that I should probably put the refcount check somewhere else, but that just seems ugly as there are a lot of places that use the popping function.
What can I do to work around this issue? Is implementing a real GC algorithm necessary in this case?
I use my own database system, which also uses a kind of refcount.
When an object is stored into a database, its refcount is incremented. When I get an object from a database, its refcount remains unchanged. It is decremented only when the object is deleted in some way (usually by the deletion of a database containing it, or by its replacement with another object in a database containing it). The object is really destroyed only when its refcount is equal to zero AND its deletion is requested.
Whenever you create an object (a value, in your case), set its refcount to 1. On pushing it to the stack, increment the refcount; on popping, decrement it and check: destroy the value if the refcount is zero. Your destroy-value function should already be doing this check, so you just need to call that function on pop.
As a general rule, increment the count when creating a reference and decrement when deleting a reference. But there's also a third type of transaction (or an optimized composition of the two) where there's just a transfer and you don't change the count at all.
This is the case if you pop the value from the stack and then proceed to use the value (in a local variable, maybe). First the object was on the stack, and now it's in a variable; but there's still only one reference to the object. The reference count doesn't change until you're done with it and ready to discard the reference.
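A minimal C sketch of this ownership-transfer style (all names here, value, value_new, value_release, push, pop, are hypothetical, not from the question's code):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical interpreter value with a reference count. */
typedef struct value {
    int refcount;
    int data;
} value;

value *value_new(int data) {      /* created with refcount 1: */
    value *v = malloc(sizeof *v); /* the creator owns one reference */
    v->refcount = 1;
    v->data = data;
    return v;
}

void value_release(value *v) {    /* drop one reference */
    if (--v->refcount == 0)
        free(v);
}

#define STACK_MAX 256
static value *stack[STACK_MAX];
static int sp = 0;

void push(value *v) {
    v->refcount++;                /* the stack takes its own reference */
    stack[sp++] = v;
}

/* Pop transfers ownership: the stack's reference becomes the caller's
 * reference, so the count is NOT decremented here. */
value *pop(void) {
    return stack[--sp];
}

int main(void) {
    value *v = value_new(42);
    push(v);
    value_release(v); /* the creator is done; the stack still owns it */
    value *w = pop(); /* ownership transferred from the stack to w */
    printf("%d\n", w->data);
    value_release(w); /* last reference gone: freed here */
    return 0;
}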

Garbage collection / linked list

Will the garbage collector (in theory) collect a structure like this?
package main

type node struct {
	next *node
	prev *node
}

func (a *node) append(b *node) {
	a.next = b
	b.prev = a
}

func main() {
	a := new(node)
	b := new(node)
	a.append(b)
	b = nil
	a = nil
}
This should be a linked list: a points to b, and b points back to a. When I remove the references in a and b (the last two lines), the two nodes are no longer accessible. But each node is still referenced by the other. Will the Go garbage collector remove these nodes nonetheless?
(Obviously not in the code above, but in a longer running program).
Is there any documentation on the garbage collector that handles these questions?
The set of garbage collector (GC) roots in your program is {a, b}. Setting all of them to nil makes all heap content eligible for collection, because now all of the existing nodes, even though they are referenced, are not reachable from any root.
The same principle also guarantees, for example, that structures with circular and/or self-references get collected once they become unreachable.
The concern you describe is actually a real problem with a simple but little-used garbage collection scheme known as "reference counting." Essentially, exactly as you imply, the garbage collector (GC) counts how many references exist to a given object, and when that number reaches 0, it is GC'd. And, indeed, circular references will prevent a reference counting system from GC-ing that structure.
Instead what many modern GCs do (including Go; see this post) is a process known as mark-and-sweep. Essentially, all top-level references (pointers that you have in the scope of some function) are marked as "reachable," and then all things referenced from those references are marked as reachable, and so on, until all reachable objects have been marked. Then, anything which hasn't been marked is known to be unreachable, and is GC'd. Circular references aren't a problem because, if they aren't referenced from the top-level, they won't get marked.
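A rough way to see this empirically (a sketch; exact numbers vary by Go version, and the keep slice exists only to force the nodes onto the heap and keep them reachable until we drop them):

package main

import (
	"fmt"
	"runtime"
)

type node struct {
	next *node
	prev *node
}

var keep []*node

func makeCycles(n int) {
	for i := 0; i < n; i++ {
		a := new(node)
		b := new(node)
		a.next = b
		b.prev = a
		keep = append(keep, a) // keep each a<->b pair reachable for now
	}
}

func main() {
	var before, after runtime.MemStats

	makeCycles(100000)
	runtime.GC()
	runtime.ReadMemStats(&before)

	keep = nil // drop the root: every a<->b cycle is now unreachable
	runtime.GC()
	runtime.ReadMemStats(&after)

	// The cycles are collected even though the nodes still point at each
	// other; the live-object count drops back down after the second GC.
	fmt.Println("live objects before:", before.HeapObjects, "after:", after.HeapObjects)
}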

Clojure mutable storage types

I'm attempting to learn Clojure from the API and documentation available on the site. I'm a bit unclear about mutable storage in Clojure and I want to make sure my understanding is correct. Please let me know if there are any ideas that I've gotten wrong.
Edit: I'm updating this as I receive comments on its correctness.
Disclaimer: All of this information is informal and potentially wrong. Do not use this post for gaining an understanding of how Clojure works.
Vars always contain a root binding and possibly a per-thread binding. They are comparable to regular variables in imperative languages and are not suited for sharing information between threads. (thanks Arthur Ulfeldt)
Refs are locations shared between threads that support atomic transactions that can change the state of any number of refs in a single transaction. Transactions are committed upon exiting sync expressions (dosync) and conflicts are resolved automatically with STM magic (rollbacks, queues, waits, etc.)
Agents are locations that enable information to be asynchronously shared between threads with minimal overhead, by dispatching independent action functions to change the agent's state. Dispatches return immediately and are therefore non-blocking, although an agent's value isn't updated until a dispatched function has completed.
Atoms are locations that can be synchronously shared between threads and support safe manipulation by different threads.
Here's my friendly summary based on when to use these structures:
Vars are like regular old variables in imperative languages. (avoid when possible)
Atoms are like Vars but with thread-sharing safety that allows for immediate reading and safe setting. (thanks Martin)
An Agent is like an Atom but rather than blocking it spawns a new thread to calculate its value, only blocks if in the middle of changing a value, and can let other threads know that it's finished assigning.
Refs are shared locations that lock themselves in transactions. Instead of making the programmer decide what happens during race conditions for every piece of locked code, we just start up a transaction and let Clojure handle all the lock conditions between the refs in that transaction.
Also, a related concept is the function future. To me, it seems like a future object can be described as a synchronous Agent where the value can't be accessed at all until the calculation is completed. It can also be described as a non-blocking Atom. Are these accurate conceptions of future?
It sounds like you are really getting Clojure! good job :)
Vars have a "root binding" visible in all threads, and each individual thread can change the value it sees without affecting the other threads. If my understanding is correct, a var cannot exist in just one thread without a root binding that is visible to all, and it can't be "rebound" until it has been defined with (def ...) the first time.
Refs are committed at the end of the (dosync ... ) transaction that encloses the changes but only when the transaction was able to finish in a consistent state.
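For example (a minimal sketch with two hypothetical refs):

(def account-a (ref 100))
(def account-b (ref 0))

;; both alterations commit together at the end of the dosync, or not at all
(dosync
  (alter account-a - 50)
  (alter account-b + 50))

@account-a ;=> 50
@account-b ;=> 50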
I think your conclusion about Atoms is wrong:
Atoms are like Vars but with thread-sharing safety that blocks until the value has changed
Atoms are changed with swap!, or at a lower level with compare-and-set!. This never blocks anything. swap! works like a transaction with just one ref:
1. the old value is taken from the atom and stored thread-locally
2. the function is applied to the old value to generate a new value
3. if this succeeds, compare-and-set is called with the old and new values; the new value is written only if the value of the atom has not been changed by any other thread (i.e. it still equals the old value), otherwise the operation restarts at (1) until it succeeds eventually
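The same loop can be written out by hand with compare-and-set! (a sketch of what swap! does conceptually, not its actual implementation):

(def counter (atom 0))

(defn my-swap! [a f]
  (loop []
    (let [old @a          ; 1. take the old value from the atom
          new (f old)]    ; 2. apply the function to generate a new value
      (if (compare-and-set! a old new) ; 3. write only if unchanged
        new
        (recur)))))       ; otherwise retry from step 1

(my-swap! counter inc) ;=> 1
(swap! counter inc)    ;=> 2, the built-in equivalent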
I've found two issues with your question.
You say:
If an agent is accessed while an action is occurring then the value isn't returned until the action has finished
http://clojure.org/agents says:
the state of an Agent is always immediately available for reading by any thread
I.e. you never have to wait to get the value of an agent (I assume the value changed by an action is proxied and changed atomically).
The code for the deref-method of an Agent looks like this (SVN revision 1382):
public Object deref() throws Exception {
    if (errors != null) {
        throw new Exception("Agent has errors", (Exception) RT.first(errors));
    }
    return state;
}
No blocking is involved.
Also, I don't understand what you mean (in your Ref section) by
Transactions are committed on calls to deref
Transactions are committed when all actions of the dosync block have been completed, no exceptions have been thrown and nothing has caused the transaction to be retried. I think deref has nothing to do with it, but maybe I misunderstand your point.
Martin is right when he says that an Atom's operation restarts at (1) until it succeeds eventually.
This is also called spin waiting.
While it is not really blocking on a lock, the thread that performs the operation is stalled until the operation succeeds, so it is a blocking operation and not an asynchronous one.
Also, regarding futures: Clojure 1.1 added abstractions for promises and futures.
A promise is a synchronization construct that can be used to deliver a value from one thread to another. Until the value has been delivered, any attempt to dereference the promise will block.
(def a-promise (promise))
(deliver a-promise :fred)
Futures represent asynchronous computations. They are a way to get code to run in another thread, and obtain the result.
(def f (future (some-sexp)))
(deref f) ; blocks the thread that derefs f until value is available
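Putting the two together (a small sketch): a future can deliver a promise from another thread while the main thread blocks on deref:

(def result (promise))

(future                 ; runs on another thread
  (Thread/sleep 100)
  (deliver result 42))

(println @result) ; blocks for ~100 ms, then prints 42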
Vars don't always have a root binding. It's legal to create a var without a binding using
(def x)
or
(declare x)
Attempting to evaluate x before it has a value will result in
Var user/x is unbound.
[Thrown class java.lang.IllegalStateException]
