Memory Leak in Functional Programming - memory-leaks

Because in pure functional programming objects are all immutable, is it still possible to create memory leak?
By pure functional program, I meant there are no side effect. Of course, this is not realistic since every program has I/O. But, let's just ignore the I/O for now.

I think this is a matter of how we define a memory leak. A program that runs for a very long time (potentially endlessly, like a server) can have a memory usage that is bounded or that grows more and more the longer the program is running. In the latter case, one speaks usually of a memory leak.
It is easy to write a functional program that needs more and more memory the longer it runs. So in that sense, memory leaks are possible.

Related

Why is garbage collection necessary?

Suppose that an object on the heap goes out of scope. Why can't the program free the memory right after the scope ends? Or, if we have a pointer to an object that is replaced by the address to a new object, why can't the program deallocate the old one before assigning the new one? I'm guessing that it's faster not to free it immediately and instead have the freeing be done asynchronously at a later point in time, but I'm not really sure.
Why is garbage collection necessary?
It is not strictly necessary. Given enough time and effort you can always translate a program that depends on garbage collection to one that doesn't.
In general, garbage collection involves a trade-off.
On the one hand, garbage collection allows you to write an application without worrying about the details of memory allocation and deallocation. (And the pain of debugging crashes and memory leaks caused by getting the deallocation logic wrong.)
The downside of garbage collection is that you need more memory. A typical garbage collector is not efficient if it doesn't have plenty of spare space1.
By contrast, if you do manual memory management, you can code your application to free up heap objects as soon as they are no longer used. Furthermore, you don't get awkward "pauses" while the GC is doing its thing.
The downside of manual memory management is that you have to write the code that decides when to call free, and you have to get it correct. Furthermore, if you try to manage memory by reference counting:
you have the cost of incrementing and decrementing ref counts whenever pointers are assign or variables go out of scope,
you have to deal with cycles in your data structures, and
it is worse when your application is multi-threaded and you have to deal with memory caches, synchronization, etc.
For what it is worth, if you use a decent garbage collector and tune it appropriately (e.g. give it enough memory, etc) then the CPU costs of GC and manual storage management are comparable when you apply them to a large application.
Reference:
"The measured cost of conservative garbage collection" by Benjamin Zorn
1 - This is because the main cost of a modern collector is in traversing and dealing with the non-garbage objects. If there is not a lot of garbage because you are being miserly with the heap space, the GC does a lot of work for little return. See https://stackoverflow.com/a/2414621/139985 for an analysis.
It's more complicated, but
1) what if there is memory pressure before the scope is over? Scope is only a language notion, not related to reachability. So an object can be "freed" before it goes out of scope ( java GCs do that on regular basis). Also, if you free objects after each scope is done, you might be doing too little work too often
2) As far as the references go, you are not considering that the reference might have hierarchies and when you change one, there has to be code that traverses those. It might not be the right time to do it when that happens.
In general, there is nothing wrong with such a proposal that you describer, as a matter of fact this is almost exactly how Rust programming language works, from a high level point of view.

How can garbage collectors be faster than explicit memory deallocation?

I was reading this html generated, (may expire, Here is the original ps file.)
GC Myth 3: Garbage collectors are always slower than explicit memory deallocation.
GC Myth 4: Garbage collectors are always faster than explicit memory deallocation.
This was a big WTF for me. How would GC be faster then explicit memory deallocation? isnt it essentially calling a explicit memory deallocator when it frees the memory/make it for use again? so.... wtf.... what does it actually mean?
Very small objects & large sparse
heaps ==> GC is usually cheaper,
especially with threads
I still don't understand it. Its like saying C++ is faster then machine code (if you don't understand the wtf in this sentence please stop programming. Let the -1 begin). After a quick google one source suggested its faster when you have a lot of memory. What i am thinking is it means it doesn't bother will the free at all. Sure that can be fast and i have written a custom allocator that does that very thing, not free at all (void free(void*p){}) in ONE application that doesnt free any objects (it only frees at end when it terminates) and has the definition mostly in case of libs and something like stl. So... i am pretty sure this will be faster the GC as well. If i still want free-ing i guess i can use an allocator that uses a deque or its own implementation thats essentially
if (freeptr < someaddr) {
*freeptr=ptr;
++freeptr;
}
else
{
freestuff();
freeptr = freeptrroot;
}
which i am sure would be really fast. I sort of answered my question already. The case the GC collector is never called is the case it would be faster but... i am sure that is not what the document means as it mention two collectors in its test. i am sure the very same application would be slower if the GC collector is called even once no matter what GC used. If its known to never need free then an empty free body can be used like that one app i had.
Anyways, i post this question for further insight.
How would GC be faster then explicit memory deallocation?
GCs can pointer-bump allocate into a thread-local generation and then rely upon copying collection to handle the (relatively) uncommon case of evacuating the survivors. Traditional allocators like malloc often compete for global locks and search trees.
GCs can deallocate many dead blocks simultaneously by resetting the thread-local allocation buffer instead of calling free on each block in turn, i.e. O(1) instead of O(n).
By compacting old blocks so more of them fit into each cache line. The improved locality increases cache efficiency.
By taking advantage of extra static information such as immutable types.
By taking advantage of extra dynamic information such as the changing topology of the heap via the data recorded by the write barrier.
By making more efficient techniques tractable, e.g. by removing the headache of manual memory management from wait free algorithms.
By deferring deallocation to a more appropriate time or off-loading it to another core. (thanks to Andrew Hill for this idea!)
One approach to make GC faster then explicit deallocation is to deallocate implicitly :the heap is divided in partitions, and the VM switches between the partitions from time to time (when a partition gets too full for example). Live objects are copied to the new partition and all the dead objects are not deallocated - they are just left forgotten. So the deallocation itself ends up costing nothing. The additional benefit of this approach is that the heap defragmentation is a free bonus.Please note this is a very general description of the actual processes.
The trick is, that the underlying allocator for garbage collector can be much simpler than the explicit one and take some shortcuts that the explicit one can't.
If the collector is copying (java and .net and ocaml and haskell runtimes and many others actually use one), freeing is done in big blocks and allocating is just pointer increment and cost is payed per object surviving collection. So it's faster especially when there are many short-lived temporary objects, which is quite common in these languages.
Even for a non-copying collector (like the Boehm's one) the fact that objects are freed in batches saves a lot of work in combining the adjacent free chunks. So if the collection does not need to be run too often, it can easily be faster.
And, well, many standard library malloc/free implementations just suck. That's why there are projects like umem and libraries like glib have their own light-weight version.
A factor not yet mentioned is that when using manual memory allocation, even if object references are guaranteed not to form cycles, determining when the last entity to hold a reference has abandoned it can be expensive, typically requiring the use of reference counters, reference lists, or other means of tracking object usage. Such techniques aren't too bad on single-processor systems, where the cost of an atomic increment may be essentially the same as an ordinary one, but they scale very badly on multi-processor systems, where atomic-increment operations are comparatively expensive.

What's up with GTK+ windows having so many memory leaks according to Valgrind?

Whenever I load any GTK+-powered application in valgrind, it reports a lot of memory leaks. What's up with that? Is GTK+ buggy?
GTK+ and GLib don't free "allocate once" memory. They follow the paradigm that says it is unnecessary to free resources just before the process exits that will be freed by the system anyway (this mostly applies to memory). This is of course not quite debugging-friendly, but allows to speed up program termination a bit and simplify the code (this is C, even "trivial" tasks take lines to code).
So, "still reachable" memory is most likely just not freed intentionally, and not a leak. Or, of course, might be a bug. "Definitely lost" memory, however, is almost certainly a bug.
Also note that program's memory leaking bugs can look as if they are triggered by GTK+ itself. For instance, GTK+ might allocate an object to be dereferenced (and freed) later by the program, which fails to do so. Valgrind will show a stack trace deep in GTK+, despite the bug being in the program.

Why are memory leaks common?

Is it due to basic misunderstandings of how memory is dynamically allocated and deallocated on the programmer's part? Is it due to complacency?
No. It's due to the sheer amount of accounting it takes to keep track of every memory allocation. Who is responsible for allocating the memory? Who is responsible for freeing it? Ensuring that you use the same API to allocate and free the memory, etc... Ensuring you catch every possible program flow and clean up in every situation(for example, ensure you clean up after you catch an error or exception). The list goes on...
In a decent sized project, one can lose track of allocated resources.
Sometimes a function is written expecting an uninitialized data structure as input that it will then initialize. Someone passes in a data structure that already initialized, and thus the previously allocated memory is leaked.
Memory leaks are caused by basic misunderstandings the same sense every bug is. And I would be shocked to find out anyone writes bug free code the first time every time. Memory leaks just happen to be the kind of bug that rarely causes a crash or explicitly wrong behavior (other than using too much memory, of course), so unless memory leaks are explicitly tested for a developer will likely never know they are present. Given that changes in the codebase always add bugs, and memory leaks are virtually invisible, memory leaks expand as a program ages and expands in size.
Even in languages which have automatic memory management, memory can be leaked because of cyclical references, depending on the garbage collection algorithm used.
I think it is due to the pressures of working in job that requires dead-lines and upper management pushing the project to get it out the door. So you could imagine, with the testing, q&a, peer code reviews, in such pressurized environments, that memory leaks could slip through the net.
Since your question did not mention language, today, there's automatic memory management that takes care of the memory accounting/tracking to ensure no memory leaks occur, think Java/.NET, but a few can slip through the net. It would have been with the likes of C/C++ that uses the malloc/new functions, and invariably are harder to check, due to the sheer volume of memory being allocated.
Then again, tracking down those leaks can be hard to find which is throwing another curveball to this answer - is it that it works on the dev's machine that it doesn't show up, but when in production, the memory starts leaking like hell, is it the configuration, hardware, software configuration, or worse, the memory leak can appear at random situation that is unique to within the production environment, or is it the time/cost constraint that allowed the memory leaks to occur or is it that the memory profiling tools are cost prohibitive or lack of funding to help the dev team track down leaks...
All in all, each and everyone within the dev team, have their own responsibility to ensure the code works, and know the rules about memory management (for example, such as for every malloc there should be a free, for every new there should be a delete), but no blame should be accounted for the dev team themselves, neither is finger pointing at the management for 'piling on the pressure on the dev team' either.
At the end of day, it would be false economy to rely on just the dev team and place 'complacency' on their shoulders.
Hope this helps,
Best regards,
Tom.
Bugs.
Even without bugs, it can be impossible to know in advance which function should deallocate memory. It's easy enough if the code structure is essentially functional (the main function calls sub-functions, which process data then return a result), but it isn't trivial if several treads (or several different objects) share a piece of memory. Smart pointers can be used (in C++), but otherwise it's more or less impossible.
Leaks aren't the worst kind of bug. Their effect is generally just a cumulative degradation in performance (until you run out of memory), so they just aren't as high a priority.
Lack of structured scopes and clear ownership of allocated memory.

Why Is Garbage Collection So Important? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I don't understand garbage collection so good, then I want to know, why it's so important to a language and to the developer?
Many other answers have stated that garbage collection can help to prevent memory leaks but, surprisingly, nobody seems to have mentioned the most important benefit that GC facilitates memory safety. This means that most garbage collected environments completely abstract away the notion of memory locations (i.e. raw pointers) and, consequently, eliminate a major class of bugs.
For example, a common mistake with manual memory management is to accidentally free a value slightly too early and continue to use the memory at that location after it has been freed. This can be extremely difficult to debug because freed memory might not be reallocated and, consequently, seemingly valid operations can be performed on the freed memory that can only fail sporadically with corruption or memory access violations or segmentation faults later in the program's execution, often with no direct link back to the offending code. This class of bugs simply do not exist in most garbage collected environments.
Garbage Collection is a part of many modern languages that attempts to abstract the disposal and reallocation of memory with less direction intervention by the developer.
When you hear talk of "safe" objects, this usually refers to something whose memory can be automatically reallocated by the Garbage Collector after an object falls out of scope, or is explicitly disposed.
While you can write the same program without a garbage collector to help manage memory usage, abstracting this away lets the developer think about more high level things and deliver value to the end user more quickly and efficiently without having to necessarily concentrate as much on lower level portions of the program.
In essence the developer can say
Give me a new object
..and some time later when the object is no longer being used (falls out of scope) the developer does not have to remember to say
throw this object away
Developers are lazy (a good virtue) and sometimes forget things. When working with GC properly, it's okay to forget to take out the trash, the GC won't let it pile up and start to smell.
Garbage Collection is a form of automatic memory management. It is a special case of resource management, in which the limited resource being managed is memory.
Benefits for the programmer is that garbage collection frees the programmer from manually dealing with memory allocation and deallocation.
The bottom line is that garbage collection helps to prevent memory leaks. In .NET, for example, when nothing references an object, the resources used by the object are flagged to be garbage collected. In unmanaged languages, like C and C++, it was up to the developer to take care of cleaning up.
It's important to note, however, that garbage collection isn't perfect. Check out this article on a problem that occurred because the developers weren't aware of a large memory leak.
In many older and less strict languages deallocating memory was hard-coded into programs by the programmer; this of course will cause problems if not done correctly as the second you reference memory that hasn't been deallocated your program will break. To combat this garbage collection was created, to automatically deallocate memory that was no longer being used. The benefits of such a system is easy to see; programs become far more reliable, deallocating memory is effectively removed from the design process, debugging and testing times are far shorter and more.
Of course, you don't get something for nothing. What you lose is performance, and sometimes you'll notice irregular behaviour within your programs, although nowadays with more modern languages this rarely is the case. This is the reason many typical applications are written in Java, it's quick and simple to write without the trauma of chasing memory leaks and it does the job, it's perfect for the world of business and the performance costs are little with the speed of computers today. Obviously some industries need to manage their own memory within their programs (the Games industry) for performance reasons, which is why nearly all major games are written in C++. A lecturer once told me that if every software house was in the same area, with a bar in the middle you'd be able to tell the game developers apart from the rest because they'd be the ones drinking heavily long into the night.
Garbage collection is one of the features required to allow the automatic management of memory allocation. This is what allows you to allocate various objects, maybe introduce other variables referencing or containing these in a fashion or other, and yet never worry about disposing of the object (when it is effectively not in use anymore).
The garbage collection, specifically takes care of "cleaning up" the heap(s) where all these objects are found, by removing unused objects an repacking the others together.
You probably hear a lot about it, because this is a critical function, which happens asynchronously with the program and which, if not handled efficiently can produce some random performance lagging in the program, etc. etc. Nowadays, however the algorithms related to the memory management at-large and the GC (garbage collection) in particular are quite efficient.
Another reason why the GC is sometimes mentioned is in relation to the destructor of some particular object. Since the application has no (or little) control over when particular objects are Garbage-Collected (hence destroyed), it may be an issue if an object waits till its destructor to dispose of some resource and such. That is why many objects implement a Dispose() method, which allow much of that clean-up (of the object itself) to be performed explicitly, rather than be postponed till the destructor is eventually called from the GC logic.
Automatic garbage collection, like java, reuses memory leaks or memory that is no longer being used, making your program efficient. In c++, you have to control the memory yourself, and if you lose access to a memory, then that memory can no longer be used, wasting space.
This is what I know so far from one year of computer science and using/learning java and c++.
Because someone can write code like
consume(produce())
without caring about cleanup. Just like in our current society.

Resources