I was recently talking to somebody who said he used to program in Fortran (from way back), but he could not tell me whether Fortran has a garbage collector. He told me he did not use malloc or free in Fortran, so my assumption is that it does have a garbage collector? Or does Fortran not have a garbage collector and just leak memory, which gets reclaimed by the operating system when the program ends? I do not know anything about Fortran, except that it was used way back. I also tried a quick Google search, but could not find anything quickly.
Modern Fortran has many ways of declaring variables. Items simply declared exist for the duration of the scope of the entity, so "real, dimension (N) :: array" declared in a procedure will automatically disappear when that procedure returns. Naturally, variables declared in the main program, in modules, or in common blocks (outmoded) persist for the duration of the program.
Variables can be dynamically allocated with "allocate" (to do so, they have to be declared with the allocatable attribute). Since Fortran 95, allocatable variables that are local to a procedure are automatically deallocated when the procedure returns, so they will not leak memory. (Some programmers consider it good practice to explicitly deallocate such variables anyway, even though it isn't strictly necessary.) (Of course, you can still waste memory by not explicitly deallocating a variable that you know you don't need anymore.)
It is possible to leak memory with pointers. You can allocate memory through a pointer, then assign the pointer to another target, losing the previous association. If you didn't deallocate that memory, you have a leak. The need for pointers is smaller in Fortran than in some other languages; many things can be done with allocatable variables, which are safer -- no memory leaks.
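This scope-based behaviour isn't unique to Fortran. A minimal sketch in Rust (the function name is made up for illustration) shows the same idea: a heap-backed local is freed deterministically when its scope ends, with no explicit deallocate and no garbage collector.

```rust
// Sketch: Rust's ownership mirrors the scope-based deallocation of
// Fortran allocatables. `scoped_allocation` is an illustrative name.
fn scoped_allocation() -> i64 {
    let sum;
    {
        // roughly analogous to allocating a local allocatable array
        let v: Vec<i64> = (1..=100).collect();
        sum = v.iter().sum::<i64>();
    } // v's heap buffer is freed here, deterministically, like an
      // allocatable going out of scope when a procedure returns
    sum
}

fn main() {
    println!("{}", scoped_allocation()); // sum of 1..=100
}
```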
Related questions: Fortran allocatable array lifetime and ALLOCATABLE arrays or POINTER arrays?
No, Fortran does not have a garbage collector. However, there is an add-on package for F90 to that effect. No, I have not used it.
I have read that Rust's compiler "inserts" memory management code during compile time, and this sounds kind of like "compile-time garbage collection".
What is the difference between these two ideas?
I've seen What does Rust have instead of a garbage collector? but that is about runtime garbage collection, not compile-time.
Compile-time garbage collection is commonly defined as follows:
A complementary form of automatic memory management is compile-time memory management (CTGC), where the decisions for memory management are taken at compile-time instead of at run-time. The compiler determines the life-time of the variables that are created during the execution of the program, and thus also the memory that will be associated with these variables. Whenever the compiler can guarantee that a variable, or more precisely, parts of the memory resources that this variable points to at run-time, will never ever be accessed beyond a certain program instruction, then the compiler can add instructions to deallocate these resources at that particular instruction without compromising the correctness of the resulting code.
(From Compile-Time Garbage Collection for the Declarative Language Mercury by Nancy Mazur)
Rust handles memory by using a concept of ownership and borrow checking. Ownership and move semantics describe which variable owns a value. Borrowing describes which references are allowed to access a value. These two concepts allow the compiler to "drop" the value when it is no longer accessible, causing the program to call the drop method from the Drop trait.
However, the compiler itself doesn't handle dynamically allocated memory at all. It only handles drop checking (figuring out when to call drop) and inserting the .drop() calls. The drop implementation is responsible for determining what happens at this point, whether that is deallocating some dynamic memory (which is what Box's drop does, for example), or doing anything else. The compiler therefore never really enforces garbage collection, and it doesn't enforce deallocating unused memory. So we can't claim that Rust implements compile-time garbage collection, even if what Rust has is very reminiscent of it.
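A minimal sketch of those compiler-inserted .drop() calls (the Noisy type and the thread-local log are invented here purely as instrumentation for the example):

```rust
use std::cell::RefCell;

thread_local! {
    // Records the order in which values are dropped (instrumentation only).
    static DROP_LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        DROP_LOG.with(|l| l.borrow_mut().push(self.0));
    }
}

fn drop_order() -> Vec<&'static str> {
    {
        let a = Noisy("a");
        {
            let _b = Noisy("b");
        } // compiler-inserted drop(_b) runs here, at the end of _b's scope
        let _c = a; // ownership moves from `a` to `_c`; `a` itself is
                    // no longer dropped separately
    } // compiler-inserted drop(_c) runs here, logging "a"
    DROP_LOG.with(|l| l.borrow().clone())
}

fn main() {
    assert_eq!(drop_order(), vec!["b", "a"]);
}
```

Note that the compiler only decides *where* the drops happen; what a drop does (freeing a Box's heap allocation, closing a file, or here just pushing to a log) is up to the Drop implementation, which is exactly the point made above.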
Suppose that an object on the heap goes out of scope. Why can't the program free the memory right after the scope ends? Or, if we have a pointer to an object that is replaced by the address of a new object, why can't the program deallocate the old one before assigning the new one? I'm guessing that it's faster not to free it immediately and instead have the freeing done asynchronously at a later point in time, but I'm not really sure.
Why is garbage collection necessary?
It is not strictly necessary. Given enough time and effort you can always translate a program that depends on garbage collection to one that doesn't.
In general, garbage collection involves a trade-off.
On the one hand, garbage collection allows you to write an application without worrying about the details of memory allocation and deallocation. (And the pain of debugging crashes and memory leaks caused by getting the deallocation logic wrong.)
The downside of garbage collection is that you need more memory. A typical garbage collector is not efficient if it doesn't have plenty of spare space¹.
By contrast, if you do manual memory management, you can code your application to free up heap objects as soon as they are no longer used. Furthermore, you don't get awkward "pauses" while the GC is doing its thing.
The downside of manual memory management is that you have to write the code that decides when to call free, and you have to get it correct. Furthermore, if you try to manage memory by reference counting:
you have the cost of incrementing and decrementing ref counts whenever pointers are assigned or variables go out of scope,
you have to deal with cycles in your data structures, and
it is worse when your application is multi-threaded and you have to deal with memory caches, synchronization, etc.
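The first bullet's overhead is easy to observe with Rust's Rc, a reference-counted smart pointer: every clone bumps the count and every drop decrements it. A small sketch (the function name is illustrative):

```rust
use std::rc::Rc;

// Shows the count increment/decrement that reference counting pays
// on every pointer assignment and every scope exit.
fn refcount_churn() -> (usize, usize, usize) {
    let a = Rc::new(vec![1, 2, 3]);
    let before = Rc::strong_count(&a); // 1: only `a` owns the data
    let b = Rc::clone(&a);             // pointer assignment => increment
    let during = Rc::strong_count(&a); // 2
    drop(b);                           // scope exit => decrement
    let after = Rc::strong_count(&a);  // back to 1
    (before, during, after)
    // when the last Rc is dropped the count hits 0 and the Vec is
    // freed immediately -- the "as soon as no longer used" property
}

fn main() {
    assert_eq!(refcount_churn(), (1, 2, 1));
}
```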
For what it is worth, if you use a decent garbage collector and tune it appropriately (e.g. give it enough memory, etc) then the CPU costs of GC and manual storage management are comparable when you apply them to a large application.
Reference:
"The measured cost of conservative garbage collection" by Benjamin Zorn
1 - This is because the main cost of a modern collector is in traversing and dealing with the non-garbage objects. If there is not a lot of garbage because you are being miserly with the heap space, the GC does a lot of work for little return. See https://stackoverflow.com/a/2414621/139985 for an analysis.
It's more complicated than that, but:
1) What if there is memory pressure before the scope is over? Scope is only a language notion, not related to reachability, so an object can be "freed" before it goes out of scope (Java GCs do that on a regular basis). Also, if you free objects as soon as each scope ends, you might be doing too little work too often.
2) As far as references go, you are not considering that references can form hierarchies, and when you change one, code has to traverse them; the moment a reference is reassigned might not be the right time to do that work.
In general, there is nothing wrong with the approach you describe; as a matter of fact, this is almost exactly how the Rust programming language works, from a high-level point of view.
Assume you have a function in D that is pure and nothrow, and that by its return type and argument types cannot pass out any newly allocated memory. Can I add the @nogc attribute to this function then? If not, is there a chance that this will be possible in the future?
My point here is the following: Since the function does not have any visible side effects, all memory that was allocated on the way can be freed deterministically at function exit. Hence, the GC is not really required, since the mark and sweep step can be avoided. Or can it not?
You can always try adding @nogc and compiling. A pure function may still allocate internal buffers, even if it doesn't return any of them, so the question of garbage collection is on a different axis than purity.
If it passes compilation with @nogc, it will not allocate (and thus not collect; the D GC will only ever collect when you ask it to allocate), regardless of purity.
https://dlang.org/spec/attribute.html#nogc
... means that that function does not allocate memory on the GC heap,
either directly such as with NewExpression or indirectly through
functions it may call, or through language features such as array
concatenation and dynamic closures.
It doesn't say anything about the GC state after function execution or about overall memory usage. The only thing that matters is that the function itself is guaranteed to never call any GC allocation functions, i.e. you could reliably build and run such a function with a custom runtime that doesn't have a GC implementation linked in at all.
Another important point is that @nogc can't be affected by optimizations, because the same valid code must keep compiling at different optimization levels and with different compilers. Any such optimization would need to become mandatory in the language specification before @nogc could take advantage of it.
With all that in mind, the function you describe could get a valid @nogc annotation only if both of the following conditions apply:
it doesn't actually make any GC calls at all, i.e. they are all completely optimized out
such optimization is mandatory and guaranteed by the language specification to always happen in such cases for all compliant compilers
I see it as extremely unlikely.
If I have a garbage collector that tracks every object allocated and deallocates each one as soon as it no longer has any usable references to it, can you still have a memory leak?
Considering that a memory leak is an allocation without any reference, isn't that impossible, or am I missing something?
Edit: So what I'm counting as a memory leak is allocations which you no longer have any reference to in the code. Large numbers of accumulating allocations which you still have references to aren't the leaks I'm considering here.
I'm also only talking about a normal, state-of-the-art GC. It's been a while, but I know that cases like cyclical references don't trip them up. I don't need a specific answer for any language; this is just coming from a conversation I was having with a friend. We were talking about ActionScript and Java, but I don't care for answers specific to those.
Edit2: From the sounds of it, there doesn't seem to be any way code can completely lose the ability to reference an allocation without a GC being able to pick it up, but I'm still waiting for more people to weigh in.
If your question is really this:
Considering a memory leak is allocations without any reference isn't
that impossible or am I missing something?
Then the answer is "yes, that's impossible" because a properly implemented garbage collector will reclaim all allocations that don't have active references.
However, you can definitely have a "memory leak" in (for example) Java. My definition of a "memory leak" is an allocation that still has an active reference (so that it won't be reclaimed by the garbage collector) but the programmer doesn't know that the object isn't reclaimable (ie: for the programmer, this object is dead and should be reclaimed). A simple example is something like this:
ObjectA -> ObjectB
In this example, ObjectA is an object in active use in the code. However, ObjectA contains a reference to ObjectB that is effectively dead (ie: ObjectB has been allocated and used and is now, from the programmer's perspective, dead) but the programmer forgot to set the reference in ObjectA to null. In this case, ObjectB has been "leaked".
Doesn't sound like a big problem, but there are situations where these leaks are cumulative. Let's imagine that ObjectA and ObjectB are actually instances of the same class. And this problem that the programmer forgot to set the reference to null happens every time such an instance is used. Eventually you end up with something like this:
ObjectA -> ObjectB -> ObjectC -> ObjectD -> ObjectE -> ObjectF -> ObjectG -> ObjectH -> etc...
Now ObjectB through ObjectH are all leaked. And problems like this will (eventually) cause your program to crash. Even with a properly implemented garbage collector.
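A sketch of this accumulating-leak pattern in Rust terms (Registry and Session are made-up names for the example): the container is still live, so everything it references stays reachable, even though the program will never use those entries again.

```rust
struct Session {
    #[allow(dead_code)]
    payload: Vec<u8>, // stands in for expensive per-object state
}

struct Registry {
    // The programmer meant to remove finished sessions but never does --
    // the analogue of forgetting to set the reference to null.
    sessions: Vec<Session>,
}

impl Registry {
    fn new() -> Self {
        Registry { sessions: Vec::new() }
    }

    fn handle_request(&mut self) {
        // Each request appends a session that is logically dead afterwards,
        // but the reference from `sessions` keeps it reachable forever.
        self.sessions.push(Session { payload: vec![0; 1024] });
    }
}

fn leaked_sessions() -> usize {
    let mut r = Registry::new();
    for _ in 0..1000 {
        r.handle_request();
    }
    // A tracing GC (or Rust's ownership) sees all of these as reachable,
    // so nothing is reclaimed: the leak is logical, not dangling.
    r.sessions.len()
}

fn main() {
    assert_eq!(leaked_sessions(), 1000);
}
```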
To decide whether a program has a memory leak, one must first define what a leak is. I would define a program as having a memory leak if there exists some state S and series of inputs I such that:
If the program is in state `S` and it receives inputs `I`, it will still be in state `S` (if it doesn't crash), but...
The amount of memory required to repeat the above sequence `N` times will increase without bound.
It is definitely possible for programs that run entirely within garbage-collected frameworks to have memory leaks as defined above. A common way in which that can occur is with event subscriptions.
Suppose a thread-safe collection exposes a CollectionModified event, and the IEnumerator<T> returned by its IEnumerable<T>.GetEnumerator() method subscribes to that event on creation and unsubscribes on Dispose; the event is used to allow enumeration to proceed sensibly even when the collection is modified (e.g. ensuring that objects that are in the collection continuously throughout the enumeration will be returned exactly once, while those that exist during only part of it will be returned no more than once). Now suppose a long-lived instance of that collection class is created, and some particular input will cause it to be enumerated. If the CollectionModified event holds a strong reference to every non-disposed IEnumerator<T>, then repeatedly enumerating the collection will create and subscribe an unbounded number of enumerator objects. Memory leak.
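A minimal Rust sketch of the same subscription leak (Collection and Enumerator are stand-ins for the C# types above, and the "event" is just a vector of strong references):

```rust
use std::rc::Rc;

// Stand-in for IEnumerator<T> in the scenario above.
struct Enumerator;

struct Collection {
    // The "CollectionModified event": strong references to every
    // subscriber that has not been unsubscribed.
    subscribers: Vec<Rc<Enumerator>>,
}

impl Collection {
    fn enumerate(&mut self) -> Rc<Enumerator> {
        let e = Rc::new(Enumerator);
        self.subscribers.push(Rc::clone(&e)); // subscribe on creation
        e // the caller never "Disposes", so we never unsubscribe
    }
}

fn leaked_subscribers() -> usize {
    let mut c = Collection { subscribers: Vec::new() };
    for _ in 0..100 {
        let _e = c.enumerate(); // the enumerator handle goes away here,
                                // but the event still holds a strong ref
    }
    c.subscribers.len() // grows without bound: the leak
}

fn main() {
    assert_eq!(leaked_subscribers(), 100);
}
```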
Memory leaks don't just depend on how efficient a garbage collection algorithm is: if your program holds on to object references that have a long lifetime, say in an instance variable or a static variable, without ever using them, your program will leak memory.
Reference counting has a known problem with cyclic references:
Object 1 refers to Object 2 and Object 2 refers to Object 1,
but nothing else refers to Object 1 or Object 2. A reference-counting algorithm will fail in this scenario.
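Rust's Rc, a reference-counted pointer, makes this failure easy to demonstrate (Node and cycle_leaks are invented for the example): two objects that refer to each other keep each other's count above zero even after every outside reference is gone.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    other: RefCell<Option<Rc<Node>>>,
}

fn cycle_leaks() -> bool {
    let a = Rc::new(Node { other: RefCell::new(None) });
    let b = Rc::new(Node { other: RefCell::new(None) });
    *a.other.borrow_mut() = Some(Rc::clone(&b)); // Object 1 -> Object 2
    *b.other.borrow_mut() = Some(Rc::clone(&a)); // Object 2 -> Object 1

    let watch: Weak<Node> = Rc::downgrade(&a); // observe without owning
    drop(a);
    drop(b);
    // Each node still holds a strong reference to the other, so neither
    // count ever reaches zero: pure reference counting leaks the pair.
    watch.upgrade().is_some() // true: the cycle is still allocated
}

fn main() {
    assert!(cycle_leaks());
}
```

(In real Rust code the back-reference would be a Weak to break the cycle; the sketch deliberately uses two strong references to reproduce the reference-counting failure.)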
Since you are working on the garbage collector itself, it's worth reading about the different implementation strategies.
You can have memory leaks with a GC in another way: if you use a conservative garbage collector that naively scans memory and, for everything that looks like a pointer, declines to free the memory it "points to", you may leave unreachable memory allocated.
Imagine that from a language with a GC you repeatedly call a function written in another language (e.g., Fortran 95). The Fortran function leaves something allocated in memory between calls, which the caller language might see as unreferenced rubbish.
Could GC from the caller language access the memory allocated in Fortran and consider it as rubbish and free it?
I guess that it won't happen. The memory allocated by the Fortran function should have its own memory management, separate from the memory managed by the GC; however, I would be happy if anyone could confirm that.
Why do I need it? (if anyone is interested)
As described above, I need to write a function in F95 which allocates its own memory, is called several times, and needs to keep the reference to the allocated memory between calls. The problem is that Fortran pointers are incompatible with the outside world, so I can't just pass something as 'void *' from Fortran. Therefore the Fortran function would store the pointer not as a pointer but would cast it (for example) to an integer array for the outside world. However, if the GC could somehow touch the memory allocated in Fortran, it might not understand that the reference is kept in the integer array and might want to free the memory allocated in Fortran, which would be bad.
No, unless the language explicitly integrates with the host language (using the garbage collector). In .NET... a C++ application can use C++/CLI to allocate .NET objects and return those, and those naturally are garbage collected. I do that in a number of projects.
But a pure C++ object... the garbage collector knows nothing about it and does not know how to handle it.
There probably isn't a single answer to this question that's guaranteed to be correct. As a rule, however, a garbage collector will be associated with some sort of heap allocator, and can/will only collect memory within the heap it controls. Since your Fortran function will (presumably) allocate its memory completely separately, it probably won't be affected by the garbage collector.
Without knowing exactly what garbage collector you're talking about, it's probably not possible to say with certainty though.