Can scripting languages and interpreted languages force garbage collection?

In JavaScript, you can't force garbage collection; instead, you have to wait for the interpreter to collect garbage automatically.
Does this behaviour also exist in interpreted languages like Python and Java?

I don't know about Java, but in Python you can manually force a garbage collection cycle to happen with gc.collect().
From the docs:
gc.collect([generation])
With no arguments, run a full collection. The optional argument generation may be an integer specifying which generation to collect (from 0 to 2). A ValueError is raised if the generation number is invalid. The number of unreachable objects found is returned.
You can read this SO answer for a reference on how garbage collection works in Python.
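As a small illustration, assuming CPython: most objects are freed as soon as their reference count drops to zero, so gc.collect() mainly matters for reference cycles. Node and demo are hypothetical names; the weak reference is only there to observe whether the object is still alive.

```python
import gc
import weakref


class Node:
    """A tiny object that can take part in a reference cycle."""

    def __init__(self, name):
        self.name = name
        self.partner = None


def demo():
    gc.disable()  # stop the automatic collector so only our explicit call runs
    try:
        a, b = Node("a"), Node("b")
        a.partner, b.partner = b, a  # a <-> b form a reference cycle
        probe = weakref.ref(a)       # watch `a` without keeping it alive
        del a, b                     # refcounts stay above zero because of the cycle
        alive_before = probe() is not None
        found = gc.collect()         # force a full collection (generations 0-2)
        alive_after = probe() is not None
        return alive_before, alive_after, found
    finally:
        gc.enable()


before, after, found = demo()
print(before, after, found >= 2)  # True False True
```

Passing an out-of-range generation, such as gc.collect(3), raises ValueError as the docs describe.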

Related

Compiled Language with Dynamic Typing

I'm a bit confused when it comes to a compiled language (compilation to native code) with dynamic typing.
Dynamic typing means that the types in a program are only known (and checked) at runtime.
Now if a language is compiled, there's no interpreter running at runtime; it's just your CPU reading instructions off memory and executing them. In such a scenario, if any instruction violating the type semantics of the language happens to execute at runtime, there's no interpreter to intercept the execution of the program and throw any errors. How does the system work then?
What happens when an instruction violating the type semantics of a dynamically typed compiled language is executed at runtime?
PS: Some of the dynamically typed compiled languages I know of include Scheme, Lua and Common Lisp.
A compiler for a dynamically typed language simply generates instructions that check types where necessary. If a check fails, the generated code calls into the language's runtime library, which signals the error much as an interpreter would. In fact, even some statically typed languages sometimes need such checks, for example an object-oriented language with checked casts (like dynamic_cast in C++). Dynamically typed languages just need them more often.
For these checks to be possible, each value needs to be represented in a way that keeps track of its type. A common representation in dynamically typed languages is a pointer to a struct that contains both the type and the value. As an optimization, this is often avoided for sufficiently small integers, which are stored directly in the word as an "invalid" pointer: the integer is shifted one bit to the left and its least significant bit is set. Since real pointers are aligned, their lowest bit is always 0, so the two cases can be told apart with a single bit test.
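To make the tagging scheme concrete, here is a minimal sketch in Python that simulates machine words with Python integers. HEAP, box, make_int, and rt_add are hypothetical names; a real compiler would emit equivalent checks as machine instructions inline or as calls into its runtime.

```python
# Simulated machine words: low bit 1 => immediate small integer,
# low bit 0 => "pointer" (an index into HEAP, shifted left by one).
HEAP = []  # simulated heap of boxed values: (type_tag, payload)


def box(type_tag, payload):
    """Allocate a boxed value and return an even word that points to it."""
    HEAP.append((type_tag, payload))
    return (len(HEAP) - 1) << 1


def make_int(n):
    """Encode a small integer directly in the word: shift left, set low bit."""
    return (n << 1) | 1


def is_int(word):
    return (word & 1) == 1


def int_value(word):
    return word >> 1  # arithmetic shift also restores negative numbers


def type_of(word):
    return "int" if is_int(word) else HEAP[word >> 1][0]


def payload(word):
    return HEAP[word >> 1][1]


def rt_add(x, y):
    """What compiled code for `x + y` might call: check tags, then act or fail."""
    if is_int(x) and is_int(y):
        return make_int(int_value(x) + int_value(y))
    if type_of(x) == "string" and type_of(y) == "string":
        return box("string", payload(x) + payload(y))
    raise TypeError(f"cannot add {type_of(x)} and {type_of(y)}")


print(int_value(rt_add(make_int(40), make_int(2))))              # 42
print(payload(rt_add(box("string", "ab"), box("string", "c"))))  # abc
```

The integer fast path never touches HEAP, which is the point of the optimization: only boxed values pay for a memory load to read their type.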

Compiler: How to implement Reference Counting (in a simple VM)

I've written a very simple compiler that translates my source language to bytecode. This code is processed by the VM, which is a simple stack machine, so 3 + 3 gets translated into:
push 3
push 3
add
Right now I'm struggling with garbage collection (I want to use reference counting).
I know the basic concept: when a reference is assigned, the reference counter of that object is incremented, and when it leaves scope, it is decremented. What's not clear to me is how the GC can free objects that are passed to functions...
Here are some more concrete examples of what I mean:
string a = "im a string" // OK: assignment, refcount + 1 at declaration and - 1 when it leaves scope
print(new Object()) // how is a parameter handled? is the refcount incremented before calling the function?
string b = "a" + "b" + "c" // don't know how to handle this: 2 strings get pushed, then concatenated, then the last one gets pushed and concatenated again. Should the push operation increase the refcount too, and if so, where should it be decreased?
I would be glad if anyone could give me links to tutorials on implementing reference counting, or help me with this specific problem if they have run into it before (my problem is that I don't understand when to increment and decrement the references, or where the count is stored).
I think a couple of things can happen with literals. You can treat them like literal numbers: they are constants and exist forever. Or you can have an implicit variable that has a retain count of 1 before print and releases it afterwards.
In response to your edit:
You can use the implicit variable solution, or you can use the "autorelease" concept from Objective-C: the object is placed in an autorelease pool that will release it a short time later, and in the meantime the receiver of the object can retain it.
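A rough sketch of the autorelease idea in Python, assuming a hypothetical RcObject with manual retain/release counts (the method names echo Objective-C's retain/release/autorelease but this is not its API). The pool owns one pending release per object and performs it on drain(), so a receiver that wants to keep the object must retain it first.

```python
class RcObject:
    """Hypothetical reference-counted object for illustration."""

    def __init__(self, name, freed_log):
        self.name = name
        self.refcount = 1          # the creator holds the first reference
        self.freed_log = freed_log

    def retain(self):
        self.refcount += 1
        return self

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed_log.append(self.name)  # stand-in for freeing memory


class AutoreleasePool:
    """Defers one release per object until drain() (e.g. end of an event-loop tick)."""

    def __init__(self):
        self.pending = []

    def autorelease(self, obj):
        self.pending.append(obj)   # the pool takes over the creator's reference
        return obj

    def drain(self):
        for obj in self.pending:
            obj.release()
        self.pending.clear()


freed = []
pool = AutoreleasePool()
temp = pool.autorelease(RcObject("temp", freed))           # nobody keeps this one
kept = pool.autorelease(RcObject("kept", freed)).retain()  # the receiver retains it
pool.drain()
print(freed)  # ['temp']
```

The caller never has to decide when a temporary like print(new Object())'s argument dies; the next drain() handles it unless someone retained it.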
First, what types of objects does your language allow to be put on the heap? Strings? Do you have mutable or immutable strings?
Check out this post about strings in Java. In a Java-like language, strings get copied every time you concatenate them, because they are immutable. Also, a literal such as "this is a string" is itself a String object; in Java, literals are interned instances from the constant pool rather than fresh constructor calls.
If the argument to print() is a constructor call (new Object()), there is no reference to the object in the calling scope, so the object lives in the scope of the function, and its counter should be incremented on entering and decremented on leaving the scope of print(). If the constructor is called in the calling scope and the result is assigned to a variable, the object lives in the calling scope.
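One way to pin down when to increment and decrement in a stack VM is a single ownership rule, assumed here for illustration: every stack slot and every local variable owns exactly one reference. The Python sketch below (VM, Obj, and the instruction names are hypothetical) applies that rule to the asker's examples: push_new creates an owned reference, concat consumes its two operands and produces a new value, and call_print moves its arguments into the callee's scope, which releases them on return.

```python
class Obj:
    """A heap object with an explicit reference count (hypothetical VM)."""

    def __init__(self, value):
        self.value = value
        self.refcount = 0


class VM:
    """Toy stack machine where every stack slot and local owns one reference."""

    def __init__(self):
        self.stack = []
        self.locals = {}
        self.freed = []  # values whose memory would be released

    def incref(self, obj):
        obj.refcount += 1

    def decref(self, obj):
        obj.refcount -= 1
        if obj.refcount == 0:
            self.freed.append(obj.value)

    def push_new(self, value):    # a literal or `new Object()`
        obj = Obj(value)
        self.incref(obj)          # the stack slot owns this reference
        self.stack.append(obj)

    def pop(self):                # discard the top of the stack
        self.decref(self.stack.pop())

    def concat(self):             # pops two operands, pushes one new string
        right, left = self.stack.pop(), self.stack.pop()
        self.push_new(left.value + right.value)
        self.decref(left)         # the operands' stack references are dropped
        self.decref(right)

    def store(self, name):        # move the top of the stack into a local
        obj = self.stack.pop()    # the reference moves, so no count change
        old = self.locals.get(name)
        self.locals[name] = obj
        if old is not None:
            self.decref(old)

    def call_print(self, argc):
        # Ownership of the arguments moves into the callee's scope...
        args = [self.stack.pop() for _ in range(argc)][::-1]
        print(*(a.value for a in args))
        for a in args:            # ...and leaving that scope releases them
            self.decref(a)

    def leave_scope(self):
        for obj in self.locals.values():
            self.decref(obj)
        self.locals.clear()


vm = VM()
# string b = "a" + "b" + "c"
vm.push_new("a"); vm.push_new("b"); vm.concat()
vm.push_new("c"); vm.concat()
vm.store("b")
# print(new Object())
vm.push_new("object"); vm.call_print(1)
vm.leave_scope()
print(vm.freed)  # ['a', 'b', 'ab', 'c', 'object', 'abc']
```

Under this rule the intermediate "ab" is freed by the second concat, and nothing needs to be counted twice: pushes create references, and whatever consumes a stack slot is responsible for its release.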
While reading up on this, Wikipedia is a good start, but Andrew Appel's compiler book would be handy to have (there should be a 2nd edition out there, and C and ML versions of the book are available too). Lambda-the-Ultimate is where many programming language researchers discuss things, so it is definitely worth a look.

Weak Tables in Lua - What are the practical uses?

I understand what weak tables are.
But I'd like to know where weak tables can be used in practice.
The docs say
Weak tables are often used in situations where you wish to annotate values without altering them.
I don't understand that. What does that mean?
Posted as an answer from comments...
Since Lua doesn't know what you consider garbage, it won't collect anything it isn't sure is garbage. In some situations (debugging could be one) you want to associate a value with an object without that association causing Lua to consider the object "not trash". As I understand it, weak tables let you do what you'd normally do with variables/objects/etc., but anything referenced only weakly (that is, only from a weak table) is still considered garbage by Lua and is collected when the garbage collector runs.
Example: suppose you keep an associative array of extra data about some objects in a separate private table, keyed by those objects. Once you are done with an object, an ordinary table still holds it as a key, so it stays locked into existence in Lua. If the table has weak keys, however, the object can be collected as garbage as soon as you are done with it, freeing the resources it was using.
To explain that one cryptic sentence about annotating: to "annotate" a value means to attach extra information to it (a name, a number, or some other value) without altering the value itself, for example by storing that information in a separate table keyed by the value. An ordinary table would lock the value into existence, and Lua would no longer consider it garbage. A table with weak keys lets you annotate a value without locking it into existence, so Lua can still garbage collect it.
Translation:
Weak tables are often used in situations where you wish to give a name to a value without locking the value into existence (and keeping its memory in use).
Normally, storing a reference to an object prevents that object from being reclaimed, even after nothing else refers to it. Weak references do not prevent garbage collection.
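In Lua, a table gets weak keys when its metatable sets __mode = "k" (for example setmetatable({}, {__mode = "k"})). As an analogous sketch in Python, weakref.WeakKeyDictionary lets you annotate objects from the outside without keeping them alive; the Sprite class and debug_names table are hypothetical.

```python
import gc
import weakref


class Sprite:
    """Any ordinary (weak-referenceable) object we might want to annotate."""


# Annotations live outside the objects; keys are held weakly (like __mode = "k").
debug_names = weakref.WeakKeyDictionary()

hero = Sprite()
debug_names[hero] = "hero sprite"  # annotate without modifying `hero`
print(len(debug_names))            # 1

del hero      # drop the last strong reference
gc.collect()  # not needed in CPython (refcounting frees it), but harmless
print(len(debug_names))            # 0
```

With a plain dict, the entry itself would keep hero alive forever, which is exactly the "locking into existence" the answer describes.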

How to store references in a mark and sweep garbage collector?

I started writing my own scripting language last weekend, both for the learning experience and for my resume when I graduate from high school. So far things have gone great: I can parse variables with basic types (null, boolean, number, and string) and mathematical expressions with operator precedence, and I have a rudimentary mark-and-sweep garbage collector in place (after completing it I will implement a generational garbage collector; I know naive mark and sweep isn't very fast). I am unsure how to store the referenced objects for the garbage collector, though. Right now I have a class GCObject that stores a pointer to its memory and whether it is marked or not. Should I store a linked list of its referenced objects in the class? I have looked at garbage collectors from other languages, but I see no per-GCObject linked lists of references, so it is confusing me.
TL;DR: How do I store objects that are referenced by other objects in a mark-and-sweep garbage collector? Do I just store linked lists of objects in all my GCObjects?
Thanks guys.
You generally don't store the references to an object in anything but the locations at which those references naturally occur. During the mark operation, you don't need to know which references point to an object; rather, you need to know which references an object (or root) contains, so you can recursively mark those objects.
You also need, for the sweep phase, a way to iterate through all objects so you can finalise any unreferenced objects and return their storage to the allocation pool. How exactly you do this depends on your general-purpose allocator; you probably want to write a custom one.
(I'm assuming you don't want to do compaction - that's a whole lot more complicated).
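A minimal sketch of that answer in Python, with hypothetical GCObject and Heap classes: each object stores only the references it naturally holds (refs), marking follows those outgoing references from the roots, and a separate list of every allocation exists purely so the sweep phase can find unmarked objects. No per-object list of referrers is needed.

```python
class GCObject:
    """A heap object: its outgoing references are just its ordinary fields."""

    def __init__(self, name):
        self.name = name
        self.refs = []       # references this object holds (fields, array slots)
        self.marked = False


class Heap:
    """Hypothetical heap that remembers every allocation so sweep can visit it."""

    def __init__(self):
        self.objects = []    # all allocations, needed only for the sweep phase
        self.roots = []      # e.g. globals and stack slots

    def alloc(self, name):
        obj = GCObject(name)
        self.objects.append(obj)
        return obj

    def mark(self):
        worklist = list(self.roots)  # explicit stack avoids deep recursion
        while worklist:
            obj = worklist.pop()
            if obj.marked:
                continue
            obj.marked = True
            worklist.extend(obj.refs)  # follow what this object references

    def sweep(self):
        survivors, freed = [], []
        for obj in self.objects:
            if obj.marked:
                obj.marked = False     # reset for the next cycle
                survivors.append(obj)
            else:
                freed.append(obj.name)  # finalise and return storage here
        self.objects = survivors
        return freed

    def collect(self):
        self.mark()
        return self.sweep()


heap = Heap()
a, b, c, d, e = (heap.alloc(n) for n in "abcde")
a.refs.append(b)
b.refs.append(c)
d.refs.append(e)
e.refs.append(d)   # d <-> e is a cycle that no root can reach
heap.roots.append(a)
freed = heap.collect()
print(freed)       # ['d', 'e']
```

Note that the unreachable cycle d <-> e is collected without any special handling, which is one of the main advantages of tracing over reference counting.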

How can I leak memory in Clojure?

For a presentation at the Bay Area Clojure Meetup on Thursday I am compiling a list of ways to leak memory in Clojure.
So far I have:
hold onto the head of an infinite sequence
creating lots of generic classes by calling lambda in a loop (is this still a problem?)
holding a reference to unused data
...
What else?
By keeping a reference to a seq over a large collection, e.g.:
(drop 999990 (vec (range 1000000)))
returns a seq of ten elements that holds a reference to the whole vector!
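Python's lazy iterators show the same effect, offered as an analogy rather than Clojure code: a small lazy view over a big list keeps the whole list reachable. BigVector is a hypothetical list subclass, used only because plain lists cannot be weakly referenced.

```python
import gc
import itertools
import weakref


class BigVector(list):
    """A plain list subclass, used only because it can be weakly referenced."""


big = BigVector(range(1_000_000))
probe = weakref.ref(big)

# A lazy view of the last ten elements; its inner iterator still points at `big`.
tail = itertools.islice(iter(big), 999_990, None)
del big
gc.collect()
retained = probe() is not None
print(retained)             # True: ten elements keep a million-element list alive

# Copy out only what you need, then drop the lazy view.
small = list(tail)
del tail
gc.collect()
released = probe() is None
print(released, small[:3])  # True [999990, 999991, 999992]
```

The Clojure fix is analogous: realize the small result into its own collection (for example with vec or into) instead of keeping the seq that points into the original vector.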
Another obvious way is to use any Java library that leaks memory (e.g. Qt Jambi).
About lambdas, read here and here and here. I think this is fixed in the latest versions of Clojure.
There is the intern call as well.
Note that your examples are not leaking memory in the usual sense of the word. You can still access the objects (I'm not sure about the classes; I assume one can find them again via some API), i.e. they haven't been lost. With certain things, like the classes and interned strings, it is simply impossible to forget the data, so the effect is the same.
Clojure memory leaks are usually very similar to Java memory leaks. However, because collections are "persistent", adding something to a collection produces a new version, and if you don't realize that you have also retained a reference to the old version, memory is consumed to keep the old version hanging around.
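A small sketch of this pitfall in Python, assuming a hypothetical PVec class whose conj() returns a new version instead of mutating: if some other structure (here, a history list) keeps the old versions, none of them can be reclaimed.

```python
import gc
import weakref


class PVec:
    """Hypothetical immutable vector: conj() returns a new version, the old one is untouched."""

    def __init__(self, items=()):
        self.items = tuple(items)

    def conj(self, x):
        # Naive full copy; Clojure's persistent vectors share structure instead,
        # but an old version you still reference is kept alive either way.
        return PVec(self.items + (x,))


def build(n, keep_history):
    """Return whether the very first version is still alive after n updates."""
    history = []
    v = PVec()
    first = weakref.ref(v)
    for i in range(n):
        if keep_history:
            history.append(v)  # an easy-to-miss reference to the old version
        v = v.conj(i)
    gc.collect()
    return first() is not None, len(history)


print(build(1000, keep_history=False))  # (False, 0)
print(build(1000, keep_history=True))   # (True, 1000)
```

Nothing here is lost in the strict sense, matching the point above: the memory is still reachable, just no longer wanted.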
