Compiler: How to implement Reference Counting (in a simple VM) - garbage-collection

Ive written a very simple Compiler that translates my source language to bytecode, this code gets processed by the VM (as a simple stack machine, so 3 + 3 will get translated into
push 3
push 3
add
right now I struggle at the garbage collection (I want to use reference counting).
I know the basic concept of it, if a reference gets assigned, the reference counter of that object is incremented, and if it leaves scope, it gets decremented, but the thing thats not clear to me is how the GC can free objects that get passed to functions...
here some more concrete examples of what i mean
string a = "im a string" //ok, assignment, refcount + 1 at declare time and - 1 when it leaves scope
print(new Object()) //how is a parameter solved? is the reference incremented before calling the function?
string b = "a" + "b" + "c" //dont know how to solve this, because 2 strings get pushed, then concanated, then the last gets pushed and concanated again, but should the push operation increase the ref count too or what, and where to decrease them then?
I would be glad if anyone could give me links to tutorials for implementing reference counting or help me with this very specific problem if someone had this problem before (my problem is that i dont understand when to inc, dec the references or where the count is stored)

I think a couple of things can happen with literals. You can treat them like literal numbers, and they are constants and there forever, or you can have an implicit variable that has retrain count of 1 before print, and releases it after.
In response to your edit:
You can use the implicit variable solution, or you can use the "autorelease" concept from Objective-C. You have a an object that is placed in the autorelease pool that will be released in a small amount of time, in which the receiver of the object can retain it.

First, what types of objects does your language allow to be put on the heap? Strings? Do you have mutable or immutable strings?
Check out this post about Strings in Java. So in a Java like language strings get copied every time you concatenate them because they are immutable. Also "this is a string" is actually a call to the constructor of the string class.
If the argument to print() is a call to a constructor (new Object()), there is no reference to the object in the scope calling the function, thus the object lives in the scope of the function and the counters should be incremented and decremented accordingly to entering and leaving the scope of the print() function. If the constructor is called in the calling scope and assigned to a variable, it lives in the calling scope.
While reading about the stuff, Wikipedia is a good start, but Andrew Appel's compiler book would be handy to have (there should be a 2nd edition out there and there is a C and ML version of the book available too). Lambda-the-Ultimate is the place where many of the programming language researchers discuss things, so definitely a place worth looking at.

Related

Temporary materialization conversion - Confusion about terminology and concepts

Hi stackoverflow community,
I'm a few months into C++ and recently I've been trying to grasp the concepts revolving around the "new" value categories, move semantics, and especially temporary materialization.
First of all, it's not straightforward to me how to interpret the term "temporary materialization conversion". The conversion part is clear to me (prvalue -> xvalue). But how exactly is a "temporary" defined in this context? I used to think that temporaries were unnamed objects that only exist - from a language point of view - until the last step in the evaluation of the expression they were created in. But this conception doesn't seem to match what temporaries actually seem to be in the broader context of temporary materialization, the new value categories, etc.
The lack of clarity about the term "temporary" results in me not being able to tell if a "temporary materialization" is a temporary that gets materialized or a materialization that is temporary. I assume it's the former, but I'm not sure. Also: Is the term temporary only used for class types?
This directly brings me to the next point of confusion: What roles do prvalues and xvalues play regarding temporaries? Suppose I have an rvalue expression that needs to be evaluated in such a way that it has to be converted into an xvalue, e.g. by performing member access.
What will exactly happen? Is the the prvalue something that is actually existent (in memory or elsewhere) and is the prvalue already the temporary? Now, the "temporary materialization conversion" described as "A prvalue of any complete type T can be converted to an xvalue of the same type T. This conversion initializes a temporary object of type T from the prvalue by evaluating the prvalue with the temporary object as its result object, and produces an xvalue denoting the temporary object" at cppreference.com (https://en.cppreference.com/w/cpp/language/implicit_conversion) converts the prvalue into an xvalue. This extract makes me think that a prvalue is something that is not existent anywhere in memory or a register up until it gets "materialized" by such conversion. (Also, I'm not sure if a temporary object is the same as a temporary.) So, as far as I understand, this conversion is done by the evaluation of the prvalue expression which has "real" object as a result. This object is then REPRESENTED (= denoted?) by the xvalue expression. What happens in memory? Where has the rvalue been, where is the xvalue now?
My next question is more specific question about a certain part of temporary materialization. In the talk "Understanding value categories in C++" by Kris van Rens on YouTube (https://www.youtube.com/watch?v=liAnuOfc66o&t=3576s) at ~56:30 he shows this slide:
Based on what cppreference.com says about temporary materialization numbers 1 and 2 are clear cases (1: member access on a class pravlue, 2: binding a reference to a prvalue (as in the std::string +operator).
I'm not too sure about number 3, though. Cppreference says: "Note that temporary materialization does not occur when initializing an object from a prvalue of the same type (by direct-initialization or copy-initialization): such object is initialized directly from the initializer. This ensures "guaranteed copy elision"."
The +operator returns a prvalue. Now, this prvalue of type std::string is used to initialize an auto (which should resolve to std::string as well) variable. This sounds like the case that is discussed in the prior cppreference excerpt. So does temporary materialization really occur here? And what happens to the objects (1 and 2) that were "denoted" by the xvalue expressions in between? When do they get destroyed? And if the +operator is returning an prvalue, does it even "exist" somewhere? And how is the object auto x " initialized directly from the initializer" (the prvalue) if the prvalue is not even a real (materialized?) object?
In the talk "Nothing is better than copy or move - Roger Orr [ACCU 2018]" on YouTube (https://www.youtube.com/watch?v=-dc5vqt2tgA&t=2557s) at ~ 40:00 there is nearly the same example:
This slide even says that temporary materialization occurs when initializing a variable which clearly contradicts the exception from cppference from above. So what's true?
As you can see, I'm pretty confused about this whole topic. For me, it's especially hard to grasp these concepts as I cannot find any clear definitions of various term that are used in a uniform way online. I'd appreciate any help a lot!
Best regards,
Ruperrrt
TL;DR: What is a temporary in the temporary materialization conversion context? Does that mean that a temporary gets materialized or that it is materialization that is temporary? Also temporary = temporary object?
In the slides, is 3 (first slide) respectively 1 (second slide) really a point where temporary materialization occurs (conflicts with what cppreference says about initialization from pravlues of the same type)?
107 views, 6 months and no answer nor comments. Interesting. Here's my take on your question.
Temporary materialization should mean "temporary that gets materialized". I don't even know what "materialization that is temporary" would even mean to be honest.
The term temporary is not only used for class types.
Prvalues, loosely speaking, don't exist in memory unlike xvalues. The thing you should care about is the context. Let's say you have defined a structure
struct S { int m; };.
In the expression S x = S();, subexpression S() denotes a prvalue. The compiler with treat it just as if you have written S x{}; (note that I've put curly brackets on purpose because S x(); is actually a declaration of a function). On the other hand in expression like int i = S().m;, subexpression S() is a prvalue that will be converted to xvalue that is, S() will denote something that will exist in the memory.
Regarding your second question, the thing you need to know about is that with the C++-17, the circumstances in which temporaries are going to be created were brought down to minimum (cppreference describes it very well). However, the expression
auto x = std::string("Guaca") + std::string("mole").c_str();
will require two temporary objects to be created before assignment. Firstly, you are doing member access with c_str() method so a temporary std::string will be created. Secondly, the operator + will bind one a reference to the std::string("Guaca") (new temporary). and one to the result object of c_str(), but without creating additional temporary because of:. That's pretty much it. It's worth to note that the order of creation temporary objects isn't known - it's totally implementation-defined.
After that, we're calling the operator + which probably constructs another std::string which technically isn't a temporary object because that's a part of the implementation. That object might or might not be constructed into the memory location of x depending on the NRVO. In any case, whatever value does the prvalue expression std::string("Guaca") + std::string("mole").c_str() denote will be the same value (of the same object) denoted by the expression x because of cpp.ref:
Note that temporary materialization does not occur when initializing an object from a prvalue of the same type (by direct-initialization or copy-initialization): such object is initialized directly from the initializer. This ensures "guaranteed copy elision".
This quote isn't really precise and might confuse you so I also suggest reading copy elision (the first part about mandatory elision).
I'm not a C++ expert so take all of this with a grain of salt.

In Python, how to know if a function is getting a variable or an object?

How can you test whether your function is getting [1,2,4,3] or l?
That might be useful to decide whether you want to return, for example, an ordered list or replace it in place.
For example, if it gets [1,2,4,3] it should return [1,2,3,4]. If it gets l, it should link the ordered list to l and do not return anything.
You can't tell the difference in any reasonable way; you could do terrible things with the gc module to count references, but that's not a reasonable way to do things. There is no difference between an anonymous object and a named variable (aside from the reference count), because it will be named no matter what when received by the function; "variables" aren't really a thing, Python has "names" which reference objects, with the object utterly unconcerned with whether it has named or unnamed references.
Make a consistent API. If you need to have it operate both ways, either have it do both things (mutate in place and return the mutated copy for completeness), or make two distinct APIs (one of which can be written in terms of the other, by having the mutating version used to implement the return new version by making a local copy of the argument, passing it to the mutating version, then returning the mutated local copy).

How to minimize the garbage collection in Go?

Some times you could want to avoid/minimize the garbage collector, so I want to be sure about how to do it.
I think that the next one is correct:
Declare variables at the beginning of the function.
To use array instead of slice.
Any more?
To minimize garbage collection in Go, you must minimize heap allocations. To minimize heap allocations, you must understand when allocations happen.
The following things always cause allocations (at least in the gc compiler as of Go 1):
Using the new built-in function
Using the make built-in function (except in a few unlikely corner cases)
Composite literals when the value type is a slice, map, or a struct with the & operator
Putting a value larger than a machine word into an interface. (For example, strings, slices, and some structs are larger than a machine word.)
Converting between string, []byte, and []rune
As of Go 1.3, the compiler special cases this expression to not allocate: m[string(b)], where m is a map and b is a []byte
Converting a non-constant integer value to a string
defer statements
go statements
Function literals that capture local variables
The following things can cause allocations, depending on the details:
Taking the address of a variable. Note that addresses can be taken implicitly. For example a.b() might take the address of a if a isn't a pointer and the b method has a pointer receiver type.
Using the append built-in function
Calling a variadic function or method
Slicing an array
Adding an element to a map
The list is intended to be complete and I'm reasonably confident in it, but am happy to consider additions or corrections.
If you're uncertain of where your allocations are happening, you can always profile as others suggested or look at the assembly produced by the compiler.
Avoiding garbage is relatively straight forward. You need to understand where the allocations are being made and see if you can avoid the allocation.
First, declaring variables at the beginning of a function will NOT help. The compiler does not know the difference. However, human's will know the difference and it will annoy them.
Use of an array instead of a slice will work, but that is because arrays (unless dereferenced) are put on the stack. Arrays have other issues such as the fact that they are passed by value (copied) between functions. Anything on the stack is "not garbage" since it will be freed when the function returns. Any pointer or slice that may escape the function is put on the heap which the garbage collector must deal with at some point.
The best thing you can do is avoid allocation. When you are done with large bits of data which you don't need, reuse them. This is the method used in the profiling tutorial on the Go blog. I suggest reading it.
Another example besides the one in the profiling tutorial: Lets say you have an slice of type []int named xs. You continually append to the []int until you reach a condition and then you reset it so you can start over. If you do xs = nil, you are now declaring the underlying array of the slice as garbage to be collected. Append will then reallocate xs the next time you use it. If instead you do xs = xs[:0], you are still resetting it but keeping the old array.
For the most part, trying to avoid creating garbage is premature optimization. For most of your code it does not matter. But you may find every once in a while a function which is called a great many times that allocates a lot each time it is run. Or a loop where you reallocate instead of reusing. I would wait until you see the bottle neck before going overboard.

What's the advantage of a String being Immutable?

Once I studied about the advantage of a string being immutable because of something to improve performace in memory.
Can anybody explain this to me? I can't find it on the Internet.
Immutability (for strings or other types) can have numerous advantages:
It makes it easier to reason about the code, since you can make assumptions about variables and arguments that you can't otherwise make.
It simplifies multithreaded programming since reading from a type that cannot change is always safe to do concurrently.
It allows for a reduction of memory usage by allowing identical values to be combined together and referenced from multiple locations. Both Java and C# perform string interning to reduce the memory cost of literal strings embedded in code.
It simplifies the design and implementation of certain algorithms (such as those employing backtracking or value-space partitioning) because previously computed state can be reused later.
Immutability is a foundational principle in many functional programming languages - it allows code to be viewed as a series of transformations from one representation to another, rather than a sequence of mutations.
Immutable strings also help avoid the temptation of using strings as buffers. Many defects in C/C++ programs relate to buffer overrun problems resulting from using naked character arrays to compose or modify string values. Treating strings as a mutable types encourages using types better suited for buffer manipulation (see StringBuilder in .NET or Java).
Consider the alternative. Java has no const qualifier. If String objects were mutable, then any method to which you pass a reference to a string could have the side-effect of modifying the string. Immutable strings eliminate the need for defensive copies, and reduce the risk of program error.
Immutable strings are cheap to copy, because you don't need to copy all the data - just copy a reference or pointer to the data.
Immutable classes of any kind are easier to work with in multiple threads, the only synchronization needed is for destruction.
Perhaps, my answer is outdated, but probably someone will found here a new information.
Why Java String is immutable and why it is good:
you can share a string between threads and be sure no one of them will change the string and confuse another thread
you don’t need a lock. Several threads can work with immutable string without conflicts
if you just received a string, you can be sure no one will change its value after that
you can have many string duplicates – they will be pointed to a single instance, to just one copy. This saves computer memory (RAM)
you can do substring without copying, – by creating a pointer to an existing string’s element. This is why Java substring operation implementation is so fast
immutable strings (objects) are much better suited to use them as key in hash-tables
a) Imagine StringPool facility without making string immutable , its not possible at all because in case of string pool one string object/literal e.g. "Test" has referenced by many reference variables , so if any one of them change the value others will be automatically gets affected i.e. lets say
String A = "Test" and String B = "Test"
Now String B called "Test".toUpperCase() which change the same object into "TEST" , so A will also be "TEST" which is not desirable.
b) Another reason of Why String is immutable in Java is to allow String to cache its hashcode , being immutable String in Java caches its hash code and do not calculate every time we call hashcode method of String, which makes it very fast as hashmap key.
Think of various strings sitting on a common pool. String variables then point to locations in the pool. If u copy a string variable, both the original and the copy shares the same characters. These efficiency of sharing outweighs the inefficiency of string editing by extracting substrings and concatenating.
Fundamentally, if one object or method wishes to pass information to another, there are a few ways it can do it:
It may give a reference to a mutable object which contains the information, and which the recipient promises never to modify.
It may give a reference to an object which contains the data, but whose content it doesn't care about.
It may store the information into a mutable object the intended data recipient knows about (generally one supplied by that data recipient).
It may return a reference to an immutable object containing the information.
Of these methods, #4 is by far the easiest. In many cases, mutable objects are easier to work with than immutable ones, but there's no easy way to share with "untrusted" code the information that's in a mutable object without having to first copy the information to something else. By contrast, information held in an immutable object to which one holds a reference may easily be shared by simply sharing a copy of that reference.

What is a Pointer? [duplicate]

This question already has answers here:
Closed 14 years ago.
See: Understanding Pointers
In many C flavoured languages, and some older languages like Fortran, one can use Pointers.
As someone who has only really programmed in basic, javascript, and actionscript, can you explain to me what a Pointer is, and what it is most useful for?
Thanks!
This wikipedia article will give you detailed information on what a pointer is:
In computer science, a pointer is a programming language data type whose value refers directly to (or "points to") another value stored elsewhere in the computer memory using its address. Obtaining or requesting the value to which a pointer refers is called dereferencing the pointer. A pointer is a simple implementation of the general reference data type (although it is quite different from the facility referred to as a reference in C++). Pointers to data improve performance for repetitive operations such as traversing string and tree structures, and pointers to functions are used for binding methods in Object-oriented programming and run-time linking to dynamic link libraries (DLLs).
A pointer is a variable that contains the address of another variable. This allows you to reference another variable indirectly. For example, in C:
// x is an integer variable
int x = 5;
// xpointer is a variable that references (points to) integer variables
int *xpointer;
// We store the address (& operator) of x into xpointer.
xpointer = &x;
// We use the dereferencing operator (*) to say that we want to work with
// the variable that xpointer references
*xpointer = 7;
if (5 == x) {
// Not true
} else if (7 == x) {
// True since we used xpointer to modify x
}
Pointers are not as hard as they sound. As others have said already, they are variables that hold the address of some other variable. Suppose I wanted to give you directions to my house. I wouldn't give you a picture of my house, or a scale model of my house; I'd just give you the address. You could infer whatever you needed from that.
In the same way, a lot of languages make the distinction between passing by value and passing by reference. Essentially it means will i pass an entire object around every time I need to refer to it? Or, will I just give out it's address so that others can infer what they need?
Most modern languages hide this complexity by figuring out when pointers are useful and optimizing that for you. However, if you know what you're doing, manual pointer management can still be useful in some situations.
There have been several discussions in SO about this topic. You can find information about the topic with the links below. There are several other relevant SO discussions on the subject, but I think that these were the most relevant. Search for 'pointers [C++]' in the search window (or 'pointers [c]') and you will get more information as well.
In C++ I Cannot Grasp Pointers and Classes
What is the difference between modern ‘References’ and traditional ‘Pointers’?
As someone already mention, a pointer is a variable that contains the address of another variable.
It's mostly used when creating new objects (in run-time).

Resources