I am considering using d for my ongoing graphics engine. The one thing that turns me down is the GC.
I am still a young programmer and I probably have a lot of misconceptions about GC's and I hope you can clarify some concerns.
I am aiming for low latency and timing in general is crucial. From what I know is that GC's are pretty unpredictable, for example my application could render a frame every 16.6ms and when to GC's kicks in it could go up to any number like 30ms because it is not deterministic right?
I read that you can turn down the GC in D, but then you can't use the majority of D's standard library and the GC is not completely off. is this true?
Do you think it makes sense to use D in a timing crucial application?
Short answer: it requires lot of customization and can be really difficult if you are not an experienced D developer.
List of issues:
Memory management itself is not that big problem. In real-time applications you never ever want to allocate memory in a main loop. Having pre-allocated memory pools for all main data is pretty much de-facto standard way to do such applications. In that sense, D is not different - you still call C malloc directly to get some heap for your pools and this memory won't be managed by a GC, it won't even know about it.
However, certain language features and large parts of Phobos do use GC automagically. For example, you can't really concatenate slices without some form of automatically managed allocation. And Phobos simply has not had a strong policy about this for quite a long time.
Few language-triggered allocations won't be a problem on their own as most memory used is managed via pools anyway. However, there is a killer issue for real-time software in stock D : default D garbage collector is stop-the-world. Even if there is almost no garbage your whole program will hit a latency spike when collection cycle is ran, as all threads get blocked.
What can be done:
1) Use GC.disable(); to switch off collection cycles. It will solve stop-the-world issue but now your program will start to leak memory in some cases, as GC-based allocations still work.
2) Dump hidden GC allocations. There was a pull request for -vgc switch which I can't find right now, but in its absence you can compile your own druntime version that prints backtrace upon gc_malloc() call. You may want to run this as part of automatic test suite.
3) Avoid Phobos entirely and use something like https://bitbucket.org/timosi/minlibd as an alternative.
Doing all this should be enough to target soft real-time requirements typical for game dev, but as you can see it is not simple at all and requires stepping out of stock D distribution.
Future alternative:
Once Leandro Lucarella ports his concurrent garbage collector to D2 (which is planned, but not scheduled), situation will become much more simple. Small amount of GC-managed memory + concurrent implementation will allow to meet soft real-time requirements even without disabling GC. Even Phobos can be used after it is stripped from most annoying allocations. But I don't think it will happen any soon.
But what about hard real-time?
You better not even try. But that is yet another story to tell.
If you do not like GC - disable it.
Here is how:
import core.memory;
void main(string[] args) {
GC.disable;
// your code here
}
Naturally, then you will have to do the memory manage yourself. It is doable, and there are several articles about it. It has been discussed here too, I just do not remember the thread.
dlang.org also has useful information about this. This article, http://dlang.org/memory.html , touches the topic of real-time programming and you should read it.
Yet another good article: http://3d.benjamin-thaut.de/?p=20 .
Related
I'm thinking about learning D (basically "C++ done right and with garbage collection and message passing between threads") and talked to a colleague who's been long-time C++ programmer and basically he complained that the garbage collector as such has severe timing issues even in soft realtime type applications.
It's not that I need to write realtime app - far from it - but I'm curious how problematic GC would be in developing, say, database? (abstracting from additional memory usage overhead that GC seems to impose, statistically)
(now I know that GC can be turned off in D but that's like saying you can get rid of problems related to a car by getting rid of a car - true but that's not the solution I'd like to choose)
Is this true? How severe are such issues in practice? Is developing, say, a device driver in D and with use of GC is practical/sensible/good practice?
While D has a GC, it does not force you to use it for everything. D also has structs, which act like C++ classes&structs(minus the polymorphism).
In modern managed languages, the GC is not a problem as long as you have enough memory. This is also true for unmanaged languages like C++ - but in C++, running out of memory means you can't allocate any more memory, while in Java running out of memory means a delay while the GC kicks in.
So, if you are planning to allocate tons of objects then yes - the GC can be a problem. But you probably don't really need to allocate so many objects. In Java, you have to use objects to store things like strings and dates and coordinates - and that can really fill up your heap and invoke the GC(luckily, modern JVM use generational GC to optimize those types of objects). In D, you'll just use structs for these things, and only use classes for cases that actually require GC.
As a rule of thumb, you'll usually want to use structs wherever you can, but if you find yourself doing something special to take care of deallocating or to prevent copying&destructing(though it's really fast in D) - make that type a class without a second thought.
I personally don't really approve of the statement "as long as you have enough memory a GC is not a problem". I mean, that basically means, you goahead and waste your memory instead of properly taking care of it and when it's out you suddenly have to wait 1> second for the GC to collect everything.
For one thing, that only happens if it's a really bad GC. The GC in c# for example collects objects extremly fast and often. You won't get a problem, even if you allocate in an often used function and it won't wait till you run out of memory to do a collection.
I am not fully up to date on the current features of the D2 GC (we use D1) but the behavior at the time was that it would allocate a pool of memory and for each of your allocations it would give you some of it. When it has given out 90% and you need more it would start a collection and/or allocate more from the system. (or something like that). There is (for D1) also the concurrent GC which would start collections earlier, but having them run in the background, but it is linux-only as it uses the fork syscall.
So, I think the current D GC can cause small but noticable freezes if not used with care. But you can disable/enable it, e.g. when you do something real-time critical, disable it, when that critical part of the code is over, enable it again.
For a database, I don't think the D GC is ready yet. I would heavily re-use memory and not rely on the GC at all for that kind of application.
As I know, the tracing GC can't avoid thread blocking during complete GC.
I had used XNA+C#, and GC time were impossible to remove. So I switched to lower level language C, but I realized I need scripting language. I'm considering Lua, but I'm worrying about Lua's GC mechanism. Lua is using incremental tracing GC, and thread blocking should be too.
So how should I handle this in realtime game?
The power of Lua is that it gets out of your way. Want classes? That can be build with metatables. Want sandboxing? use lua_setfenv.
As for the garbage collector. Use it as is first. If later you find performance issues use lua_gc to fine tune its behavior.
Some examples:
Disable the garbage collector during those times when the slow down would be a problem.
Leave the garbage collector disabled and only step it when game logic says you've got some head room on your FPS count. You could pre-tune the step size, or discover the optimal step size at runtime.
Disable the collector and perform full collection at stopping points, ie a load screen or cut scene or at turn change in a hot seat game.
You might also consider an alternative scripting language. Squirrel tries very hard to be a second generation Lua. It tries to keep all of Lua's good features, while ditching any of its design mistakes. One of the big differences between the two is squirrel uses reference counting instead of garbage collection. It turns out that reference counting can be a bit slower than garbage collection but it is very deterministic(AKA realtime).
The correct way to handle this is:
Write a small prototype with just the core things that you want to test.
Profile it a lot, reproducing the different scenarios that could happen in your game (lots of memory available, little memory available, different numbers of threads, that kind of thing)
If you don't find a visible bottleneck, you can use Lua. Otherwise, you will have to look for alternative solutions (maybe Lisp or Javascript)
You can patch the Lua GC so as to time-limit each collection cycle. E.g: http://www.altdevblogaday.com/2011/07/23/predictable-garbage-collection-with-lua/
I believe that it is still possible to have long GC step times when collecting very large tables. Therefore you need to adopt a programming style that avoids large tables.
The following article discusses two strategies for using Lua for real-time robot control (1. don't generate garbage, or 2. using an O(1) allocator and tune when GC collection is run):
https://www.osadl.org/?id=1117
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Most of the modern languages have built in garbage collection (GC). e.g. Java, .NET languages, Ruby, etc. Indeed GC simplifies application development in many ways.
I am interested to know the limitations / disadvantages of writing applications in GCed languages. Assuming GC implemenation is optimal, I am just wondering we may be limited by GC to take some optimization decisions.
The main disadvantages to using a garbage collector, in my opinion, are:
Non-deterministic cleanup of resources. Sometimes, it is handy to say "I'm done with this, and I want it cleaned NOW". With a GC, this typically means forcing the GC to cleanup everything, or just wait until it's ready - both of which take away some control from you as a developer.
Potential performance issues which arise from non-deterministic operation of the GC. When the GC collects, it's common to see (small) hangs, etc. This can be particularly problematic for things such as real-time simulations or games.
Take it from a C programmer ... it is about cost/benefit and appropriate use
The garbage collection algorithms such as tri-color/mark-and-sweep there is often significant latency between a resource being 'lost' and the physical resource being freed. In some runtimes the GC will actually pause execution of the program to perform garbage collection.
Being a long time C programmer, I can tell you:
a) Manual free() garbage collection is hard -- This is because there is usually a greater error rate in human placement of free() calls than GC algorithms.
b) Manual free() garbage collection costs time -- Does the time spent debugging outweigh the millisecond pauses of a GC? It may be beneficial to use garbage collection if you are writing a game than say an embedded kernel.
But, when you can't afford the runtime disadvantage (right resources, real-time constraints) then performing manual resource allocation is probably better. It may take time but can be 100% efficient.
Try and imagine an OS kernel written in Java? or on the .NET runtime with GC ... Just look at how much memory the JVM accumulates when running simple programs. I am aware that projects exist like this ... they just make me feel a bit sick.
Just bear in mind, my linux box does much the same things today with 3GB RAM than it did when it had 512MB ram years ago. The only difference is I have mono/jvm/firefox etc running.
The business case for GC is clear, but it still makes me uncomfortable alot of the time.
Good books:
Dragon book (recent edition), Modern Compiler Implementation in C
For .NET, there are two disadvantages that I can see.
1) People assume that the GC knows best, but that's not always the case. If you make certain types of allocations, you can cause yourself to experience some really nasty program deaths without direct invokation of the GC.
2) Objects larger than 85k go onto the LOH, or Large Object Heap. That heap is currently NEVER compacted, so again, your program can experience out-of-memory exceptions when really the LOH is not compacted enough for you to make another allocation.
Both of these bugs are shown in code that I posted in this question:
How do I get .NET to garbage collect aggressively?
I am interested to know the limitations / disadvantages of writing applications in GCed languages. Assuming GC implemenation is optimal, I am just wondering we may be limited by GC to take some optimization decisions.
My belief is that automatic memory management imposes a glass ceiling on efficiency but I have no evidence to substantiate that. In particular, today's GC algorithms offer only high throughput or low latency but not both simultaneously. Production systems like .NET and the HotSpot JVM incur significant pauses precisely because they are optimized for throughput. Specialized GC algorithms like Staccato offer much lower latency but at the cost of much lower minimum mutator utilisation and, therefore, low throughput.
If you are confident(good) about your memory management skills, there is no advantage.
The concept was introduced to minimize the time of development and due to lack of experts in programming who thoroughly understood memory.
The biggest problem when it comes to performance (especially on or real-time systems) is, that your program may experience some unexpected delays when GC kicks in. However, modern GC try to avoid this and can be tuned for real time purposes.
Another obvious thing is that you cannot manage your memory by yourself (for instance, allocate on numa local memory), which you may need to do when you implement low-level software.
It is almost impossible to make a non-GC memory manager work in a multi-CPU environment without requiring a lock to be acquired and released every time memory is allocated or freed. Each lock acquisition or release will require a CPU to coordinate its actions with other CPUs, and such coordination tends to be rather expensive. A garbage-collection-based system can allow many memory allocations to occur without requiring any locks or other inter-CPU coordination. This is a major advantage. The disadvantage is that many steps in garbage collection require that the CPU's coordinate their actions, and getting good performance generally requires that such steps be consolidated to a significant degree (there's not much benefit to eliminating the requirement of CPU coordination on each memory allocation if the CPUs have to coordinate before each step of garbage collection). Such consolidation will often cause all tasks in the system to pause for varying lengths of time during collection; in general, the longer the pauses one is willing to accept, the less total time will be needed for collection.
If processors were to return to a descriptor-based handle/pointer system (similar to what the 80286 used, though nowadays one wouldn't use 16-bit segments anymore), it would be possible for garbage collection to be done concurrently with other operations (if a handle was being used when the GC wanted to move it, the task using the handle would have to be frozen while the data was copied from its old address to its new one, but that shouldn't take long). Not sure if that will ever happen, though (Incidentally, if I had my druthers, an object reference would be 32 bits, and a pointer would be an object reference plus a 32-bit offset; I think it will be awhile before there's a need for over 2 billion objects, or for any object over 4 gigs. Despite Moore's Law, if an application would have over 2 billion objects, its performance would likely be improved by using fewer, larger, objects. If an application would need an object over 4 gigs, its performance would likely be improved by using more, smaller, objects.)
Typically, garbage collection has certain disadvantages:
Garbage collection consumes computing resources in deciding what memory is to be freed, reconstructing facts that may have been known to the programmer. The penalty for the convenience of not annotating object lifetime manually in the source code is overhead, often leading to decreased or uneven performance. Interaction with memory hierarchy effects can make this overhead intolerable in circumstances that are hard to predict or to detect in routine testing.
The point when the garbage is actually collected can be unpredictable, resulting in stalls scattered throughout a session. Unpredictable stalls can be unacceptable in real-time environments such as device drivers, in transaction processing, or in interactive programs.
Memory may leak despite the presence of a garbage collector, if references to unused objects are not themselves manually disposed of. This is described as a logical memory leak.[3] For example, recursive algorithms normally delay release of stack objects until after the final call has completed. Caching and memoizing, common optimization techniques, commonly lead to such logical leaks. The belief that garbage collection eliminates all leaks leads many programmers not to guard against creating such leaks.
In virtual memory environments typical of modern desktop computers, it can be difficult for the garbage collector to notice when collection is needed, resulting in large amounts of accumulated garbage, a long, disruptive collection phase, and other programs' data swapped out.
Perhaps the most significant problem is that programs that rely on garbage collectors often exhibit poor locality (interacting badly with cache and virtual memory systems), occupy more address space than the program actually uses at any one time, and touch otherwise idle pages. These may combine in a phenomenon called thrashing, in which a program spends more time copying data between various grades of storage than performing useful work. They may make it impossible for a programmer to reason about the performance effects of design choices, making performance tuning difficult. They can lead garbage-collecting programs to interfere with other programs competing for resources
Is it due to basic misunderstandings of how memory is dynamically allocated and deallocated on the programmer's part? Is it due to complacency?
No. It's due to the sheer amount of accounting it takes to keep track of every memory allocation. Who is responsible for allocating the memory? Who is responsible for freeing it? Ensuring that you use the same API to allocate and free the memory, etc... Ensuring you catch every possible program flow and clean up in every situation(for example, ensure you clean up after you catch an error or exception). The list goes on...
In a decent sized project, one can lose track of allocated resources.
Sometimes a function is written expecting an uninitialized data structure as input that it will then initialize. Someone passes in a data structure that already initialized, and thus the previously allocated memory is leaked.
Memory leaks are caused by basic misunderstandings the same sense every bug is. And I would be shocked to find out anyone writes bug free code the first time every time. Memory leaks just happen to be the kind of bug that rarely causes a crash or explicitly wrong behavior (other than using too much memory, of course), so unless memory leaks are explicitly tested for a developer will likely never know they are present. Given that changes in the codebase always add bugs, and memory leaks are virtually invisible, memory leaks expand as a program ages and expands in size.
Even in languages which have automatic memory management, memory can be leaked because of cyclical references, depending on the garbage collection algorithm used.
I think it is due to the pressures of working in job that requires dead-lines and upper management pushing the project to get it out the door. So you could imagine, with the testing, q&a, peer code reviews, in such pressurized environments, that memory leaks could slip through the net.
Since your question did not mention language, today, there's automatic memory management that takes care of the memory accounting/tracking to ensure no memory leaks occur, think Java/.NET, but a few can slip through the net. It would have been with the likes of C/C++ that uses the malloc/new functions, and invariably are harder to check, due to the sheer volume of memory being allocated.
Then again, tracking down those leaks can be hard to find which is throwing another curveball to this answer - is it that it works on the dev's machine that it doesn't show up, but when in production, the memory starts leaking like hell, is it the configuration, hardware, software configuration, or worse, the memory leak can appear at random situation that is unique to within the production environment, or is it the time/cost constraint that allowed the memory leaks to occur or is it that the memory profiling tools are cost prohibitive or lack of funding to help the dev team track down leaks...
All in all, each and everyone within the dev team, have their own responsibility to ensure the code works, and know the rules about memory management (for example, such as for every malloc there should be a free, for every new there should be a delete), but no blame should be accounted for the dev team themselves, neither is finger pointing at the management for 'piling on the pressure on the dev team' either.
At the end of day, it would be false economy to rely on just the dev team and place 'complacency' on their shoulders.
Hope this helps,
Best regards,
Tom.
Bugs.
Even without bugs, it can be impossible to know in advance which function should deallocate memory. It's easy enough if the code structure is essentially functional (the main function calls sub-functions, which process data then return a result), but it isn't trivial if several treads (or several different objects) share a piece of memory. Smart pointers can be used (in C++), but otherwise it's more or less impossible.
Leaks aren't the worst kind of bug. Their effect is generally just a cumulative degradation in performance (until you run out of memory), so they just aren't as high a priority.
Lack of structured scopes and clear ownership of allocated memory.
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I don't understand garbage collection so good, then I want to know, why it's so important to a language and to the developer?
Many other answers have stated that garbage collection can help to prevent memory leaks but, surprisingly, nobody seems to have mentioned the most important benefit that GC facilitates memory safety. This means that most garbage collected environments completely abstract away the notion of memory locations (i.e. raw pointers) and, consequently, eliminate a major class of bugs.
For example, a common mistake with manual memory management is to accidentally free a value slightly too early and continue to use the memory at that location after it has been freed. This can be extremely difficult to debug because freed memory might not be reallocated and, consequently, seemingly valid operations can be performed on the freed memory that can only fail sporadically with corruption or memory access violations or segmentation faults later in the program's execution, often with no direct link back to the offending code. This class of bugs simply do not exist in most garbage collected environments.
Garbage Collection is a part of many modern languages that attempts to abstract the disposal and reallocation of memory with less direction intervention by the developer.
When you hear talk of "safe" objects, this usually refers to something whose memory can be automatically reallocated by the Garbage Collector after an object falls out of scope, or is explicitly disposed.
While you can write the same program without a garbage collector to help manage memory usage, abstracting this away lets the developer think about more high level things and deliver value to the end user more quickly and efficiently without having to necessarily concentrate as much on lower level portions of the program.
In essence the developer can say
Give me a new object
..and some time later when the object is no longer being used (falls out of scope) the developer does not have to remember to say
throw this object away
Developers are lazy (a good virtue) and sometimes forget things. When working with GC properly, it's okay to forget to take out the trash, the GC won't let it pile up and start to smell.
Garbage Collection is a form of automatic memory management. It is a special case of resource management, in which the limited resource being managed is memory.
Benefits for the programmer is that garbage collection frees the programmer from manually dealing with memory allocation and deallocation.
The bottom line is that garbage collection helps to prevent memory leaks. In .NET, for example, when nothing references an object, the resources used by the object are flagged to be garbage collected. In unmanaged languages, like C and C++, it was up to the developer to take care of cleaning up.
It's important to note, however, that garbage collection isn't perfect. Check out this article on a problem that occurred because the developers weren't aware of a large memory leak.
In many older and less strict languages deallocating memory was hard-coded into programs by the programmer; this of course will cause problems if not done correctly as the second you reference memory that hasn't been deallocated your program will break. To combat this garbage collection was created, to automatically deallocate memory that was no longer being used. The benefits of such a system is easy to see; programs become far more reliable, deallocating memory is effectively removed from the design process, debugging and testing times are far shorter and more.
Of course, you don't get something for nothing. What you lose is performance, and sometimes you'll notice irregular behaviour within your programs, although nowadays with more modern languages this rarely is the case. This is the reason many typical applications are written in Java, it's quick and simple to write without the trauma of chasing memory leaks and it does the job, it's perfect for the world of business and the performance costs are little with the speed of computers today. Obviously some industries need to manage their own memory within their programs (the Games industry) for performance reasons, which is why nearly all major games are written in C++. A lecturer once told me that if every software house was in the same area, with a bar in the middle you'd be able to tell the game developers apart from the rest because they'd be the ones drinking heavily long into the night.
Garbage collection is one of the features required to allow the automatic management of memory allocation. This is what allows you to allocate various objects, maybe introduce other variables referencing or containing these in a fashion or other, and yet never worry about disposing of the object (when it is effectively not in use anymore).
The garbage collection, specifically takes care of "cleaning up" the heap(s) where all these objects are found, by removing unused objects an repacking the others together.
You probably hear a lot about it, because this is a critical function, which happens asynchronously with the program and which, if not handled efficiently can produce some random performance lagging in the program, etc. etc. Nowadays, however the algorithms related to the memory management at-large and the GC (garbage collection) in particular are quite efficient.
Another reason why the GC is sometimes mentioned is in relation to the destructor of some particular object. Since the application has no (or little) control over when particular objects are Garbage-Collected (hence destroyed), it may be an issue if an object waits till its destructor to dispose of some resource and such. That is why many objects implement a Dispose() method, which allow much of that clean-up (of the object itself) to be performed explicitly, rather than be postponed till the destructor is eventually called from the GC logic.
Automatic garbage collection, like java, reuses memory leaks or memory that is no longer being used, making your program efficient. In c++, you have to control the memory yourself, and if you lose access to a memory, then that memory can no longer be used, wasting space.
This is what I know so far from one year of computer science and using/learning java and c++.
Because someone can write code like
consume(produce())
without caring about cleanup. Just like in our current society.