How to determine whether I need to use a garbage collector?

I am working on a program which will be used for drawing vector pictures. As such, it will have to store points, paths defined by these points, pictures defined by these paths etc. Inkscape (http://inkscape.org/) which does something similar seems to use the Boehm Garbage Collector (http://www.hpl.hp.com/personal/Hans_Boehm/gc/). Does that mean it would be advisable for me to do the same also? I mean, what criteria should I use to determine whether I need to use a GC in my program?
Thanks.

Whether garbage collection will be used or not depends on the programming language used to develop your application.
Garbage collection is a memory-management technique, and whether you get it by default depends on the programming language. Some languages have built-in garbage collection (such as Java and C#) and some do not (C and C++), although a collector can be added to the latter as a library, which is what Inkscape does with the Boehm collector.
Btw, Inkscape is written in C, C++, Python, Perl, and XSLT, of which Python and Perl use garbage collection for their memory management.
UPDATE: To learn more about C/C++ and Garbage Collection, I would recommend:
Why doesn't C++ have a garbage collector?
Garbage Collection in C++ -- why?

Related

is Haskell a managed language?

I'm a complete newbie in Haskell. One thing that always bugs me is the ambiguity over whether Haskell is a managed (term borrowed from MS) language like Java or a compiled-to-native-code language like C.
The GHC page says this "GHC compiles Haskell code either directly to native code or using LLVM as a back-end".
In the case of "compiled to native code", how can features like garbage collection be possible without something like a JVM?
/Update/
Thanks so much for your answer. Conceptually, can you please help point out which one of my following understandings of garbage collection in Haskell is correct:
GHC compiles Haskell code to native code. In the process of compiling, garbage collection routines will be added to the original program code?
OR
There is a program that runs alongside a Haskell program to perform garbage collection?

As far as I am aware the term "managed language" specifically means a language that targets .NET/the Common Language Runtime. So no, Haskell is not a managed language and neither is Java.
Regarding what Haskell is compiled to: As the documentation you quoted says, GHC compiles Haskell to native code. It can do so by either directly emitting native code or by first emitting LLVM code and then letting LLVM compile that to native code. Either way the end result of running GHC is a native executable.
Besides GHC there are also other implementations of Haskell - most notably Hugs, which is a pure interpreter that never produces an executable (native or otherwise).
how can features like garbage collection be possible without something like a JVM?
The same way that they're possible with the JVM: Every time memory is allocated, it is registered with the garbage collector. Then from time to time the garbage collector runs, following the steps of the given garbage collection algorithm. GHC-compiled code uses generational garbage collection.
In response to your edit:
GHC compiles Haskell code to native code. In the process of compiling, garbage collection routines will be added to the original program code?
Basically. Except that saying "garbage collection routines will be added to the original program code" might paint the wrong picture. The GC routines are just part of the library that every Haskell program is linked against. The compiled code simply contains calls to those routines at the appropriate places.
Basically all there is to it is to call the GC's alloc function every time you would otherwise call malloc.
Just look at any GC library for C and how it's used: all you need to do is #include the library's header, link against the library, replace each occurrence of malloc with the GC library's alloc function (and remove all calls to free), and bam, your code is garbage collected.
There is a program that runs alongside a Haskell program to perform garbage collection?
No.
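To make the "register every allocation, then collect from time to time" description above a bit more concrete, here is a toy sketch in Python. It is a plain mark-and-sweep collector, not the generational collector GHC actually uses, and every name in it (Cell, ToyHeap, alloc, collect, threshold) is made up for illustration. The point is only that the collector is ordinary code living in the same program, called from the allocation path.

```python
class Cell:
    """A toy heap object: a name plus references to other cells."""

    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references to other cells
        self.marked = False  # mark bit used during collection


class ToyHeap:
    """Hypothetical runtime: every allocation goes through alloc()."""

    def __init__(self, threshold):
        self.objects = []    # every live allocation is registered here
        self.roots = []      # stand-in for globals and stack slots
        self.threshold = threshold
        self.collections = 0

    def alloc(self, name):
        # The "call the GC's alloc instead of malloc" step: the collector
        # gets a chance to run whenever the heap has grown enough.
        if len(self.objects) >= self.threshold:
            self.collect()
        cell = Cell(name)
        self.objects.append(cell)
        return cell

    def collect(self):
        # Mark phase: everything reachable from the roots is live.
        stack = list(self.roots)
        while stack:
            cell = stack.pop()
            if not cell.marked:
                cell.marked = True
                stack.extend(cell.refs)
        # Sweep phase: drop unmarked cells, reset marks on survivors.
        self.objects = [c for c in self.objects if c.marked]
        for c in self.objects:
            c.marked = False
        self.collections += 1


def demo_mark_sweep():
    heap = ToyHeap(threshold=3)
    a = heap.alloc("a")
    heap.roots.append(a)     # a is reachable from a root
    b = heap.alloc("b")
    a.refs.append(b)         # b is reachable through a
    heap.alloc("garbage")    # never referenced by anything
    heap.alloc("c")          # heap is full, so this triggers a collection
    return sorted(c.name for c in heap.objects), heap.collections
```

After demo_mark_sweep() the "garbage" cell is gone, while "a" and "b" survive because they are reachable from the root. Note that no second program is involved anywhere: collect() is just a function the allocation routine calls.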

whether Haskell is a managed(term borrowed from MS) language like Java
GHC-compiled programs include a garbage collector. (As far as I know, all implementations of Haskell include garbage collection, but this is not part of the specification.)
or a compile-to-native code like C?
GHC-compiled programs are compiled to native code. Hugs interprets programs, and does not compile to native code. There are several other implementations which all, as far as I know, compile to native code, but I list these separately because I'm not as confident of this fact.
In the case of "compiled to native code", how can features like garbage collection be possible without something like a JVM?
GHC-compiled programs include a runtime system that provides some basic capabilities like M-to-N green threading, garbage collection, and an IO manager. In a sense, this is a bit like having "something like a JVM" in that it provides many of the same features, but it's very different in implementation: there is no common bytecode across all architectures (and hence no "virtual machine").
which one of my following understandings of garbage collection in Haskell is correct:
GHC compiles Haskell code to native code. In the process of compiling, garbage collection routines will be added to the original program code?
There is a program that runs alongside a Haskell program to perform garbage collection?
Case 1 is correct: the runtime system code is added to the program code during compilation.

"Managed language" is an overloaded term so here are one-word answers and then some details for the usual different meanings that come to (my) mind:
Managed as in a CLR target
No, Haskell does not compile to Microsoft CLI's IL.
Well, I read that there are some projects that can do that, but in my opinion, don't: the CLR isn't built for functional programming and will lack important optimizations, probably yielding research-language performance. If I really wanted to target the CLR, I'd use F#. It's not a purely functional language, but it's close.
N.B. This is the most accurate and actual meaning for the term "managed language". The next meanings are, well, wrong, but nevertheless & unfortunately common.
Managed as in automatically garbage-collected
Yes, and this is pretty much a must-have. I mean, beyond the specification: if we had to manage memory by hand, it would destroy the functional theme that lets us work at the high altitudes that are our beloved home.
It would also enforce impurity and a memory model.
Managed as in compiled to bytecode which is run by a VM
No (usually).
It depends on your backend:
Not only do we have different Haskell compilers today, but some compilers have different backends. There are even backends for JavaScript!
So if you do want to target a VM, you can use an existing backend or write one for it. But Haskell doesn't require it. So just as you can compile to a native bare-metal binary, you can compile to anything else.
In contrast to CLR languages like C# [1] and VB.NET, and in contrast to Java, etc., you don't have to target a JVM, the CLR, Mono, etc., as Haskell doesn't require a VM at all.
GHC is a good example. When you compile with GHC, it doesn't compile straight to binary: it compiles to an intermediate language called Core, then optimizes from Core to Core several times before it proceeds to another language called STG, and only then proceeds to code generation (it can stop there if you tell it to). These days you can also use it to compile to LLVM IR, which is then subject to LLVM's own optimizations. With the LLVM backend, GHC can produce faster code for some programs. For more information about it and about GHC backends, go here.
The diagram below illustrates the GHC compilation pipeline, and here you can find more information about the various stages.
See the fork at the bottom with three different targets? Those are the backends I was referring to.
[1] A future exception and a fun fact: Microsoft is currently working on native .NET, cunningly named Microsoft .NET Native.

What, for you, is the defining feature of a "managed language"? The phrase "GHC compiles Haskell code either directly to native code or using LLVM as a back-end" that you quote is quite clear about what GHC does, so I suspect the "ambiguity" that bugs you is rather in the term "managed language" than in GHC's docs.
In the case of "compiled to native code", how can features like garbage collection be possible without something like a JVM?
How exactly do you think "something like a JVM" implements features like garbage collection? The JVM isn't magic, it's just a program like everything else. At some level you need to have native code in order for the CPU to execute it, so clearly features like garbage collection are possible in native code.

For where you currently are, it's probably best to think of (GHC) Haskell as "managed," but that the platform GHC compiles to is not targeted by anything else. There is, of course, more to it than that, but that's a sufficient explanation in lieu of more Haskell experience.

Is there a language with the speed of C/C++ but without buffer overflows and has a garbage collector?

I am looking for a programming language that is fast like C and C++ and has a garbage collector and is not prone to buffer overflows. I am looking for something between Java/C# and C/C++. Is there such a language?

Checking for buffer overflows and collecting garbage has a cost: if you need these features, then you will not get the speed of C/C++. Tradeoff.
Java and C# are very, very close to C++ speed in most types of applications, so unless you need something very specific, I suggest you go with one of those 2 languages.
If you just want a garbage collector for C++, you can get one here.

You could take a look at D. It's a compiled language with most of the features from C++ in addition to garbage collection and some others.

Language "speed" is highly application dependent. The JVM is darn fast for certain kinds of code: HotSpot's JIT-compiled code can actually be faster than ahead-of-time compiled native code. On the other hand, a functional style and a good optimizer can let you get good performance with less code; Haskell apps are often as fast in practice as ones in C.
For a real cross of Java/C# and C++ the best place to look is the D language. It has garbage collection, and optional access to malloc and free and even inline assembly for C level performance. It has enough safety to be less prone to buffer overflows, but you can still have them. http://www.digitalmars.com/d/2.0/index.html
You can always garbage collect C/C++, but it will cost you. Java, Haskell, ML, even Python can use garbage collectors that know what values might be pointers, so are faster than using a collector for C, C++, or D.
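The difference between a collector that knows which values are pointers and one that has to guess can be sketched with a toy model. This is only an illustration in Python under made-up assumptions: the "stack" is a list of integers, heap_addresses is a hypothetical set of addresses where objects start, and the function names are invented. A collector bolted onto C typically has to take the conservative route, because the language gives it no type information at run time.

```python
def conservative_roots(stack_words, heap_addresses):
    """No type information: any word that looks like a heap address
    must be treated as a pointer, just in case."""
    return {w for w in stack_words if w in heap_addresses}


def precise_roots(typed_stack, heap_addresses):
    """The compiler told us which slots hold pointers, so plain
    integers are ignored even if they collide with an address."""
    return {w for kind, w in typed_stack
            if kind == "ptr" and w in heap_addresses}


def demo_roots():
    heap_addresses = {0x1000, 0x2000}   # two objects on the toy heap
    typed_stack = [
        ("ptr", 0x1000),   # a real pointer to the first object
        ("int", 0x2000),   # an integer that merely equals an address
        ("int", 42),       # an ordinary integer
    ]
    stack_words = [w for _, w in typed_stack]
    return (conservative_roots(stack_words, heap_addresses),
            precise_roots(typed_stack, heap_addresses))
```

In demo_roots() the conservative scan keeps both objects alive, while the precise scan keeps only the one that is really referenced. A conservative collector also cannot safely move objects, since it might "fix up" something that was never a pointer, which rules out some of the faster collection strategies.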

Heap Object representation for OO language

As part of my masters thesis I am writing a compiler for an object oriented language that was developed at my home university. Currently the compiler outputs assembler that runs on a virtual machine. The virtual machine handles all things like stack operations, object generation, heap management and garbage collection.
The target architecture for my compiler is a MIPS-like CPU.
I am searching for strategies to develop an object layout and ideas to implement and trigger garbage collection during runtime. I could of course analyze how GCC implements this with C++, but I'd prefer to be pointed to some good publications/resources.

Read up on Python's internal object management. They use reference counting and dispose of objects when the reference count goes to zero.
Here's an older (but still helpful) document: http://docs.python.org/release/2.5.2/ext/refcounts.html
Here's general stuff: http://en.wikipedia.org/wiki/Reference_counting
And some more: http://code.google.com/p/augustus/wiki/OptionalGarbageCollection
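As a rough sketch of what reference counting means for an object layout: each object carries a count in its header, and the compiler emits an increment or decrement wherever a reference is created or dropped. The model below is a toy written in Python, not CPython's actual implementation, and the names (Obj, incref, decref, set_field) are invented for the example.

```python
class Obj:
    """Toy heap object: a header (refcount) plus reference fields."""

    def __init__(self, name, freed_log):
        self.name = name
        self.refcount = 0          # header word maintained by the runtime
        self.fields = []           # references to other objects
        self._freed_log = freed_log


def incref(obj):
    obj.refcount += 1


def decref(obj):
    obj.refcount -= 1
    if obj.refcount == 0:
        # Release everything this object referenced, then "free" it.
        for child in obj.fields:
            decref(child)
        obj.fields.clear()
        obj._freed_log.append(obj.name)


def set_field(owner, child):
    """Store a reference to child inside owner."""
    incref(child)
    owner.fields.append(child)


def demo_refcount():
    freed = []
    a, b = Obj("a", freed), Obj("b", freed)
    incref(a)          # a root (say, a local variable) refers to a
    set_field(a, b)    # a holds the only reference to b
    decref(a)          # the root goes away: a is freed, which frees b

    # A cycle is never reclaimed by plain reference counting.
    x, y = Obj("x", freed), Obj("y", freed)
    incref(x)          # root reference to x
    set_field(x, y)
    set_field(y, x)
    decref(x)          # x is still referenced by y, so nothing is freed
    return freed, x.refcount, y.refcount
```

The second half of demo_refcount() shows the well-known weakness: x and y keep each other alive after the last outside reference is gone. That is why runtimes built on reference counting usually add a separate cycle detector or fall back to a tracing collector.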

Is there a better C? [closed]

I want a better C. Let me explain:
I do a lot of programming in C, which is required for applications that have real-time needs such as audio programming, robotics, device drivers, etc.
While I love C, one thing that gets on my nerves after having spent a lot of time with Haskell is the lack of a proper type system. That is, as soon as you want to write a more general-purpose function, say something that manipulates a generic pointer (like a generic linked list), you have to cast things to void* or whatever, and you lose all type information. It's an all-or-nothing system, which doesn't let you write generic functions without losing all the advantages of type checking.
C++ doesn't solve this. And I don't want to use C++ anyways. I find OO classes and templates to be a headache.
Haskell and its type classes do solve this. You can have semantically useful types, and use type constraints to write functions that operate on classes of types, that don't depend on void.
But in the domain I'm working in, I can't use Haskell, because it's not real-time capable, mostly due to garbage collection. GC is needed because it's very difficult to do functional programming, which is allocation-heavy, without automatic memory management. However, there is nothing specifically in the idea of type classes that goes against C's semantics. I want C, but with Haskell's dependable type system, to help me write well-typed systems. However, I really want C: I want to be in control of memory management, I want to know how the data structures are laid out, I want to use (well-typed) pointer arithmetic, I want mutability.
Is there any language like this? If so, why is it not more popular for low-level programming?
Aside: I know there are some small language experiments in this direction, but I'm interested in things that would be really usable in real-world projects. I'm interested in languages that are growing or already well developed, but not so much in "toy" languages.
I should add, I heard of Cyclone, which is interesting, but I couldn't get it to compile for me (Ubuntu) and I haven't heard of any projects actually using it. Any other suggestions in this vein are welcome.
Thanks!

Since nobody brought it up yet: I think the ATS language is a very good candidate for a better C! Especially since you enjoy Haskell and thus functional programming with strong types. Note that ATS seems to be specifically designed for systems programming and hard real-time applications as most of it can do without garbage collection.
If you check the shootout you will find that performance is basically on par with C. I think this is quite impressive, since modern C compilers have years and years of optimization work behind them while ATS is basically developed by one guy. While other languages providing similar safety features usually introduce overhead, ATS ensures things entirely at compile time and thus yields performance characteristics very similar to C's.
To quote the website:
What is ATS?
ATS is a statically typed programming language that unifies implementation with formal specification. It is equipped with a highly expressive type system rooted in the framework Applied Type System, which gives the language its name. In particular, both dependent types and linear types are available in ATS. The current implementation of ATS (ATS/Anairiats) is written in ATS itself. It can be as efficient as C/C++ (see The Computer Language Benchmarks Game for concrete evidence) and supports a variety of programming paradigms that include:
Functional programming. The core of ATS is a functional language based on eager (aka. call-by-value) evaluation, which can also accommodate lazy (aka. call-by-need) evaluation. The availability of linear types in ATS often makes functional programs written in it run not only with surprisingly high efficiency (when compared to C) but also with surprisingly small (memory) footprint (when compared to C as well).
Imperative programming. The novel and unique approach to imperative programming in ATS is firmly rooted in the paradigm of programming with theorem-proving. The type system of ATS allows many features considered dangerous in other languages (e.g., explicit pointer arithmetic and explicit memory allocation/deallocation) to be safely supported in ATS, making ATS a viable programming language for low-level systems programming.
Concurrent programming. ATS, equipped with a multicore-safe implementation of garbage collection, can support multithreaded programming through the use of pthreads. The availability of linear types for tracking and safely manipulating resources provides an effective means to constructing reliable programs that can take advantage of multicore architectures.
Modular programming. The module system of ATS is largely influenced by that of Modula-3, which is both simple and general as well as effective in supporting large scale programming.
In addition, ATS contains a subsystem ATS/LF that supports a form of (interactive) theorem-proving, where proofs are constructed as total functions. With this component, ATS advocates a programmer-centric approach to program verification that combines programming with theorem-proving in a syntactically intertwined manner. Furthermore, this component can serve as a logical framework for encoding deduction systems and their (meta-)properties.

What about the Nimrod or Vala languages?

Rust
Another (real) candidate for a better C is The Rust Programming Language.
Unlike some other suggestions, (Go, Nimrod, D, ...) Rust can directly compete with C and C++ because it has manual memory management and does not require garbage collection (see [1]).
What sets Rust apart is that it has safe manual memory management. (The link is to the blog of pcwalton, one of Rust's main contributors, and generally worth a read.) Among other things, this means it fixes the billion-dollar mistake of null pointers. Many of the other languages suggested here either require garbage collection (Go) or have garbage collection turned on by default and do not provide facilities for safe manual memory management beyond what C++ provides (Nimrod, D).
While Rust has an imperative heart, it does borrow a lot of nice things from functional languages, for example sum types aka tagged unions. It is also really concerned with being a safe and performance oriented language.
[1] Right now there are two main pointer types: owned pointers (like std::unique_ptr in C++ but with better support from the typechecker) and managed pointers. As the name suggests, the latter do require task-local garbage collection, but there are thoughts to remove them from the language and only provide them as a library.
EDITED to reflect @ReneSacs's comment: garbage collection is not required in D and Nimrod.

I don't know much about Haskell, but if you want a strong type system, take a look at Ada. It is heavily used in embedded systems for aerospace applications. The SIGAda motto is "In strong typing we trust." It won't be of much use, however, if you have to write Windows/Linux-type device drivers.
A few reasons it is not so popular:
verbose syntax -- designed to be read, not written
compilers were historically expensive
the relationship to DOD and design committees, which programmers seem to knock
I think the truth is that most programmers don't like strong type systems.

Nim (formerly Nimrod) has a powerful type system, with concepts and easy generics. It also features extensive compile-time mechanisms with templates and macros. It also has an easy C FFI and all the low-level features that you expect from a systems programming language, so you can write your own kernel, for example.
Currently it compiles to C, so you can use it everywhere GCC runs, for example. If you only want to use Nim as better C, you can do it via the --os:standalone compiler switch, that gives you a bare bones standard library, with no OS ties.
For example, to compile to an AVR micro-controller you can use:
nim c --cpu:avr --os:standalone --deadCodeElim:on --genScript x.nim
Nim has a soft real-time GC where you can specify when it runs and the max pause time in microseconds. If you really can't afford the GC, you can disable it completely (--gc:none compiler switch) and use only manual memory management like C, losing most of the standard library, but still retaining the much saner and powerful type system.
Also, tagged pointers are a planned feature that would ensure you don't mix kernel-level pointers with user-level pointers, for example.

D might offer what you want. It has a very rich type system, but you can still control memory layout if you need to. It has unrestricted pointers like C. It’s garbage collected, but you aren’t forced to use the garbage collector and you can write your own memory management code if you really want.
However, I’m not sure to what extent you can mix the type richness with the low-level approach you want to use.
Let us know if you find something that suits your needs.

I'm not sure what state Cyclone is in, but it provided more safety than standard C. D can also be considered a "better C" to some extent, but its status is not very clear given the split in its standard library.
My language of choice as a "better C" is OOC. It's still young, but it's quite interesting. It gives you the OO without C++'s killer complexity. It gives you easy access to C interfaces (you can "cover" C structs and use them normally when calling external libraries / control the memory layout this way). It uses GC by default, but you can turn it off if you really don't want it (but that means you cannot use the standard library collections anymore without leaking).
The other comment mentioned Ada which I forgot about, but that reminded me: there's Oberon, which is supposed to be a safe(-er) language, but that also contains garbage collection mechanisms.

You might also want to look at BitC. It’s a serious language and not a toy, but it isn’t ready yet and probably won’t be ready in time to be of any use to you.
Nonetheless, a specific design goal of BitC is to support low-level development in conjunction with a Haskell-style type system. It was originally designed to support development of the Coyotos microkernel. I think that Coyotos was killed off, but BitC is still apparently being developed.

C++ doesn't solve this. And I don't want to use C++ anyways. I find OO classes and templates to be a headache.
Get over this attitude. Just use C++. You can start with coding C in C++ and keep gradually moving to better style.

Replacement language for C++?

When working on hobby projects I really like to program in low-level languages (in the sense that C and C++ are low level). I don't want to work with managed languages with garbage collection and whatnot that takes all the fun away (yeah, we're all different ;-) ).
Normally I use C++ for these types of projects. C++ is rather complex and not so elegant, so I have been looking for a language to replace it. Can anybody give me suggestions?
Preferences (not requirements):
should be low-level (like C and C++)
compile to native code (kind of follows from the above but no harm in being explicit)
preferably target win32/win64
object oriented
statically typed
I have looked at Objective C but I don't like it.

D? (Wikipedia page)
The D language is statically typed and compiles directly to machine code. It's multiparadigm, supporting many programming styles: imperative, object oriented, and metaprogramming. It's a member of the C syntax family, and its appearance is very similar to that of C++. For a quick comparison of the features, see this comparison of D with C, C++, C# and Java.
I think that covers everything in your requirements except Windows support, which it has too.
Note that it has garbage collection, but your question seems to associate garbage collection with being managed - they're not the same thing. I believe garbage collection can be pretty tightly controlled in D.
I should note that I have absolutely no experience in the language whatsoever :)

Ada - http://en.wikipedia.org/wiki/Ada_programming_language
Oberon - http://en.wikipedia.org/wiki/Oberon_(programming_language)
Modula 3 - http://en.wikipedia.org/wiki/Modula-3

Delphi? Pascal syntax, but still quite powerful and just a little more high-level than C++.

Requesting no GC is rather strong and eliminates almost every modern language. OCaml, for example, fulfills all the other requirements.
There is also Ada, which meets all of your requirements, but it's a very strict language. The syntax is somewhat similar to Pascal's, I think, and the language has far fewer holes than C. It has built-in support for threads and 'modules' (better than C headers).

FreePascal
Delphi
Oberon
Any of the three would be a great replacement. They're easier to use than C++ too.

Ada is a really good language; however, it allows garbage collection (I noticed that mamboking mentioned it), though as far as I know most Ada implementations do not actually ship a collector. Not sure about Oberon and Modula-3.
Pascal/Delphi also uses garbage collection as far as I know (or at least smart pointers of some kind).

I suggest Limbo!
It's a language created by Rob Pike (co-author with Kernighan of several programming books). The language is either interpreted by the Dis virtual machine (a memory-to-memory machine) or compiled.
It has many built-in data types such as tuple, pipe, list, array, and channel (useful to EASILY communicate between threads). It's concurrent and modular.
It implements many modern features, and it's used to write applications for the Inferno OS.
Limbo review by Dennis Ritchie and
Limbo review by Kernighan

I would suggest Vala! Try it, it is amazing.
