Dynamic Languages and Variable Allocation - dynamic-languages

How does a dynamic language decide how much memory to allocate for a variable?
eg. How does the compiler change variable= 5 to variable ="xxx" without too much memory overhead? When does it use the hardware stack and when does it use the memory heap?

The compiler allocates enough memory for each variable to hold a pointer plus whatever metadata the language runtime requires. But I think you mean to be asking how much memory is allocated for each object. In that case the answer is that it depends on the type of object. When a variable gets assigned to a different object, the pointer associated with that variable changes what it points to.

The answer, of course, varies by language - both the hosted dynamic language and the lower-level implementation language. That which applies to Perl does not necessarily apply to Python, nor does what applies in Tcl apply in Java or LISP or ... well, do they count as dynamic languages.
In Perl, there's a C-level structure that goes by the name SV (scalar variable) that contains different storage for different versions of the variable's value. These often heap-based; the storage for strings always ends up being heap based, though a pure numeric value that has never been converted to string might be in an SV that is strictly on the stack. In Perl, these things are reference counted (and mortalized, or immortalized, and all sorts of other interesting terms). More complicated types (AV, HV, RV, etc) are based on SV.

Related

Why is Box called like that in Rust?

Box<> is explained like this on the Rust Book:
... allow you to store data on the heap rather than the stack. What remains on the stack is the pointer to the heap data.
With a description like that, I would expect the described object to be called Heap<> or somethingHeapsomethingelse (DerefHeap, perhaps?). Instead, we use Box.
Why was the name Box chosen?
First, Heap is a very overloaded term, and importantly a heap is an abstract datastructure often used to implement things like priority queues. Having a type called Heap which is not a heap would be extremely confusing, a good reason to avoid that.
Second, "box" is related to the concept of "boxing" or "boxed" objects, in languages which strongly distinguish between value and reference types e.g. Java or Javascript: https://en.wikipedia.org/wiki/Object_type_(object-oriented_programming), in those a "boxed" type is the heap-allocated version of a value type e.g. int/Integer in java, or number/Number in Javascript.
Rust's Box performs an operation which is similar in spirit. Box also originally had a built-in "lifting" operator called box (it's still an internal operation and was originally planned to be stabilised for placement new), as such "box"/"boxing" makes sense linguistically in a way "heap"/"heaping" really does not (as "heaping" hints at a lot of things being put on a heap).

Stack allocation of `isbits` types in Julia

Summary of question and answers
Objects of a particular type, say
type Foo
a::A
b::B
end
can be stored in either of two ways:
Inlined (aka by value): in this case, the statement "variable foo::Foo is stored at location x" effectively means we have a variable foo.a::A at location x and a variable foo.b::B at location x + sizeof(A) (technically the addresses could be a bit more complicated, but that's irrelevant for our purposes).
Referenced (aka by reference): "foo::Foo is stored at location x" means the location x contains a pointer fooptr::Ptr{Foo} such that there is a variable foo.a::A at location fooptr and foo.b::B at location fooptr + sizeof(A).
Unlike other languages (I'm looking at you, C/C++), Julia decides by itself whether to store variables inlined or referenced, and it does so based on the properties of the type:
mutable types -> referenced,
immutable types -> referenced if at least one of its fields is referenced, inlined otherwise.
There are at least two reasons for this rule:
StefanKarpinski's answer: The garbage collector needs be able to find all pointers to heap-allocated objects on the stack. Currently, Julia ensures this by storing all such pointers on a separate "shadow stack", but if we allowed composite types containing pointers to be placed on the stack then such a neat separation would no longer be possible. Instead, the compiler would need to look for pointers among other variables which poses technical difficulties.
yuyichao's answer: Julia requires the inline/reference decision to be made on a per-type rather than per-object basis, which means a hypothetical type
immutable A
a::A
end
would have to be infinitely big if we insisted on inlining it. So we would either have to forbid such recursive immutable types, or we could at most allow non-recursive immutable types to be inlined.
Original question
My understanding of memory management in Julia is:
mutable types -> heap-allocated,
immutable types and tuples -> stack-allocated unless one of their fields is heap-allocated (i.e. mutable).
I don't quite understand the rationale for this behaviour, however. I've read somewhere that the problem with stack-allocating immutables with pointers to mutables is that then the garbage collector might consider the mutables unreachable and destroy them prematurely. On the other hand, if we place the immutable on the heap then there will still be a pointer to the mutables, so it might seem like we avoided the problem, but actually we just shifted it to making sure that now the immutable itself will not be destroyed.
Can anyone explain this to me who has only very superficial knowledge of how garbage collection works?
The problem with stack-allocation of objects which reference other objects is knowing that they need to be traced during garbage collection. The simplest way to do this is what Julia does: heap allocate the objects and "root" them using "shadow stack" which is pushed and popped in sync with the actual stack. This introduces a fair bit of overhead and forces these objects to be heap allocated.
A more sophisticated approach that avoids the overhead of a shadow stack and heap allocation is to stack allocated these objects and then scan the stack which doing garbage collection and follow references from objects in the stack to objects on the heap. However, this requires knowing which objects in the stack are pointers to objects on the heap – in general, non heap-allocated objects are not guaranteed to be kept intact or contiguous in registers or the stack. One approach to doing this is called "conservative stack scanning" which entails assuming during gc that any value on the stack which looks like it could be a pointer to an object on the heap actually is. That approach has been successfully used in applications like Safari's JavaScript engine, but it's not without it's challenges. We've contemplated using conservative stack scanning in Julia, and an initial effort to do so was started but the effort was never completed.
References:
https://github.com/JuliaLang/julia/issues/11714
https://github.com/JuliaLang/julia/pull/8134
There are multiple issues/concepts that are frequently mixed together whenever this is brought up.
mutable or non-pointerfree immutable doesn't necessarily mean heap allocation, we already have optimization passes to elide some of the optimizations and are working on improving them further.
The object layout ABI is an user visible behavior and not something an optimization pass can easily change (unless it can prove that the local optimization it wants to do does not escape). The current ABI is that only isbits immutable will be stored inline (and "stack allocated" when used as local variable). There's a fundamental limitation of lifting the requirement of pointerfree-ness for inlined object, i.e. the necessity to handle recursive types. It is impossible to make all types in a reference circle stored inline and the loop has to be broken somewhere if we want to make some of them inlined. I believe we do have a consistent and predictable model to do this though whether this is desireable is another issue.
This is somewhat related to performance but not always. Stored inline means more copy so it's hard to make sure there's no regression if we do the switch.
Edit: And I should also mention that pointer-free is a sufficient condition for cycle free and is easier to compute, which is partly why we are currently using it to break inlining cycles.
GC support. This is basically the easiest part. It's very easy to make GC recognize pointers on the stack. It just needs to be done if we decide to change the object layout ABI.
Edit: And I should add that "GC support" is needed because we currently only support a limited / simple stack layout for object reference (i.e. an array of pointers). It's this that needs to be improved.

Why does Rust need to return static sizes?

I thought one of the big features of Rust is being a systems language comparable to C but with a garbage collector. If this is the case, why do you need to return values of a static size (or use Box from what I gather)?
Why does Rust need to return static sizes?
Every value in every language needs to have a static size. That's how the compiler / interpreter / runtime / virtual machine / hardware knows how to access the bits that make up the value.
In many languages, every value is comparable to a Rust Box, so they all take up one or two pointer's worth of space. The statically-known size for those values allows a layer of indirection which can point to something with a runtime-determined size.
In Rust (and C, C++, probably other system languages), you can also directly store arbitrary values on the stack, unboxed. In these cases, you still need to know the size that the value will occupy.
This is a simplification, as some languages allow certain specific values to reside on the stack, while others "embed" certain value types inside of the fixed-size indirection. Tricks like these are usually for performance reasons.
but with a garbage collector
Rust does not have a garbage collector. It does have smart pointers that deallocate resources when the pointer goes out of scope.
Box is the obvious smart pointer, but there's also Rc and Arc.

Compiled Language with Dynamic Typing

I'm a bit confused when it comes to a compiled language (compilation to native code) with dynamic typing.
Dynamic typing says that the types in a program are only inferred at runtime.
Now if a language is compiled, there's no interpreter running at runtime; it's just your CPU reading instructions off memory and executing them. In such a scenario, if any instruction violating the type semantics of the language happens to execute at runtime, there's no interpreter to intercept the execution of the program and throw any errors. How does the system work then?
What happens when an instruction violating the type semantics of a dynamically typed compiled language is executed at runtime?
PS: Some of the dynamically typed compiled languages I know of include Scheme, Lua and Common Lisp.
A compiler for a dynamically typed language would simply generate instructions that check the type where necessary. In fact even for some statically typed languages this is sometimes necessary, say, in case of an object oriented language with checked casts (like dynamic_cast in C++). In dynamically types languages it is simply necessary more often.
For these checks to be possible each value needs to be represented in a way that keeps track of its type. A common way to represent values in dynamically typed languages is to represent them as pointers to a struct that contains the type as well as the value (as an optimization this is often avoided in case of (sufficiently small) integers by storing the integer directly as an invalid pointer (by shifting the integer one to the left and setting its least significant bit)).

statically/dynamically typed vs static/dynamic binding

everyone what is the difference between those 4 terms, can You give please examples?
Static and dynamic are jargon words that refer to the point in time at which some programming element is resolved. Static indicates that resolution takes place at the time a program is constructed. Dynamic indicates that resolution takes place at the time a program is run.
Static and Dynamic Typing
Typing refers to changes in program structure that are due to the differences between data values: integers, characters, floating point numbers, strings, objects and so on. These differences can have many effects, for example:
memory layout (e.g. 4 bytes for an int, 8 bytes for a double, more for an object)
instructions executed (e.g. primitive operations to add small integers, library calls to add large ones)
program flow (simple subroutine calling conventions versus hash-dispatch for multi-methods)
Static typing means that the executable form of a program generated at build time will vary depending upon the types of data values found in the program. Dynamic typing means that the generated code will always be the same, irrespective of type -- any differences in execution will be determined at run-time.
Note that few real systems are either purely one or the other, it is just a question of which is the preferred strategy.
Static and Dynamic Binding
Binding refers to the association of names in program text to the storage locations to which they refer. In static binding, this association is predetermined at build time. With dynamic binding, this association is not determined until run-time.
Truly static binding is almost extinct. Earlier assemblers and FORTRAN, for example, would completely precompute the exact memory location of all variables and subroutine locations. This situation did not last long, with the introduction of stack and heap allocation for variables and dynamically-loaded libraries for subroutines.
So one must take some liberty with the definitions. It is the spirit of the concept that counts here: statically bound programs precompute as much as possible about storage layout as is practical in a modern virtual memory, garbage collected, separately compiled application. Dynamically bound programs wait as late as possible.
An example might help. If I attempt to invoke a method MyClass.foo(), a static-binding system will verify at build time that there is a class called MyClass and that class has a method called foo. A dynamic-binding system will wait until run-time to see whether either exists.
Contrasts
The main strength of static strategies is that the program translator is much more aware of the programmer's intent. This makes it easier to:
catch many common errors early, during the build phase
build refactoring tools
incur a significant amount of the computational cost required to determine the executable form of the program only once, at build time
The main strength of dynamic strategies is that they are much easier to implement, meaning that:
a working dynamic environment can be created at a fraction of the cost of a static one
it is easier to add language features that might be very challenging to check statically
it is easier to handle situations that require self-modifying code
Typing - refers to variable tyes and if variables are allowed to change type during program execution
http://en.wikipedia.org/wiki/Type_system#Type_checking
Binding - this, as you can read below can refer to variable binding, or library binding
http://en.wikipedia.org/wiki/Binding_%28computer_science%29#Language_or_Name_binding

Resources