Related
I came across this term while exploring Rust.
I have seen several different explanations of it and still don't quite get the idea.
In The Embedded Rust Book, it says:
Type states are also an excellent example of Zero Cost Abstractions - the ability to move certain behaviors to compile time execution or analysis. These type states contain no actual data, and are instead used as markers. Since they contain no data, they have no actual representation in memory at runtime.
Does this mean the program is faster at runtime because the type states take up no memory at runtime?
I would appreciate it if anyone could explain this in an easy-to-understand way.
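To make the "markers with no runtime representation" idea concrete, here is a minimal sketch of the same pattern in Haskell using phantom type parameters (the Pin type, its register address, and setHigh are hypothetical names made up for illustration; Rust's typestate pattern works analogously):

{-# LANGUAGE DataKinds, KindSignatures #-}

-- The states Input and Output carry no data; with DataKinds they are
-- promoted to the type level and exist only at compile time.
data Mode = Input | Output

-- At runtime a Pin is just an Int (a register address); the Mode parameter
-- is a phantom marker with no representation in memory.
newtype Pin (m :: Mode) = Pin Int

intoOutput :: Pin 'Input -> Pin 'Output
intoOutput (Pin addr) = Pin addr   -- a no-op at runtime; only the type changes

setHigh :: Pin 'Output -> IO ()
setHigh (Pin addr) = putStrLn ("write 1 to register " ++ show addr)

-- Applying setHigh to a Pin 'Input is rejected at compile time, yet Pin has
-- exactly the memory layout of a plain Int: the check is free at runtime.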
Zero-cost abstraction means that adding higher-level programming concepts, like generics, collections and so on, does not come with a run-time cost, only a compile-time cost (the code will be slower to compile). Any operation on a zero-cost abstraction is as fast as if you had written out the matching functionality by hand using lower-level programming concepts like for loops, counters, ifs and raw pointers.
Put another way: with zero-cost abstraction tools, functions, templates, classes and the like come with "zero cost" for the performance of your code.
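A small Haskell illustration of the same idea, assuming GHC's usual erasure of newtype wrappers (Meters and addDistances are hypothetical names): the abstraction exists only at compile time.

-- Meters is a distinct type at compile time, preventing unit mix-ups,
-- but at runtime it is represented exactly like a plain Double.
newtype Meters = Meters Double

addDistances :: Meters -> Meters -> Meters
addDistances (Meters a) (Meters b) = Meters (a + b)
-- GHC erases the wrapper, so this compiles to the same code as adding two Doubles.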
Zero cost abstractions are ones that bear no runtime costs in execution speed or memory usage.
By contrast, virtual methods are a good example of a costly abstraction: in many OO languages the concrete type of the object a method is called on is determined at runtime, which requires maintaining a lookup table (runtime memory usage) and then actually performing the lookup (runtime overhead per method call, likely at least an extra pointer dereference) to determine which version of the method to call. Another good example is garbage collection: in return for not having to worry about the details of memory allocation, you pay with GC pauses.
Rust, though, mostly tries to have zero-cost abstractions: ones that let you have your cake and eat it too, ones that the compiler can safely and correctly convert to forms that bear no extra indirection or memory usage. In fact, the only thing I'm aware of (somebody more knowledgeable correct me if I'm wrong) that you really pay for at runtime in Rust is bounds checking.
The concept of zero cost abstractions originally came from the functional world. However, the terminology comes from C++. According to Bjarne Stroustrup,
In general, C++ implementations obey the zero-overhead principle: What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.
This quote, along with most answers, fails to deliver the idea in its entirety, because the context in which these things were said isn't explicitly stated.
If there were only one programming language in the world, be it Rust or C++, zero-cost abstractions would be indistinguishable from most other compiler optimizations. The implication here is that there are countless other languages that let you do the same things as Rust or C++, but with a nonzero and often runtime cost.
While this may sound like a theoretical question, suppose I decide to invest in building a mission-critical application written in Haskell. A year later I find that I absolutely need to improve the performance of some very narrow bottleneck, and this will require optimizing memory access close to raw machine capabilities.
Some assumptions:
It isn't a realtime system - occasional latency spikes are tolerable (from interrupts, thread scheduling irregularities, occasional GC etc.)
It isn't a numeric problem - data layout and cache-friendly access patterns are most important (avoiding pointer chasing, reducing conditional jumps etc.)
Code may be tied to a specific GHC release (but no forking)
The performance goal requires in-place modification of pre-allocated off-heap arrays, taking alignment into account (C strings, bit-packed fields etc.)
Data is statically bounded in arrays and allocations are rarely if ever needed
What mechanisms does GHC offer to reliably perform this kind of optimization? By reliably I mean that if a source change causes the code to no longer perform, it is correctable in source code without rewriting it in assembly.
Is it already possible using GHC-specific extensions and libraries?
Would custom FFI help avoid C calling convention overhead?
Could a special purpose compiler plugin do it through a restricted source DSL?
Could a source code generator from a "high-level" assembly (LLVM?) be a solution?
It sounds like you're looking for unboxed arrays. "Unboxed" in Haskell-land means "has no runtime heap representation". You can usually learn whether some part of your code is compiled to an unboxed loop (a loop that performs no allocation) by looking at the Core representation (a very Haskell-like intermediate language, the first stage in compilation). So e.g. you might see Int# in the Core output, which means an integer that has no heap representation (it's going to live in a register).
When optimizing haskell code we regularly look at core and expect to be able to manipulate or correct for performance regressions by changing the source code (e.g. adding a strictness annotation, or fiddling with a function such that it can be inlined). This isn't always fun, but will be fairly stable especially if you are pinning your compiler version.
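As a hedged illustration of the kind of source that typically ends up as such an allocation-free Int# loop (sumTo is a made-up name; whether it actually unboxes should be verified by dumping Core with -ddump-simpl):

{-# LANGUAGE BangPatterns #-}

-- A strict accumulator loop; GHC usually compiles this to a loop over
-- unboxed Int# values with no allocation in the hot path.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc !i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)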
Back to unboxed arrays: GHC exposes a lot of low-level primops in GHC.Prim, in particular it sounds like you want mutable unboxed arrays (MutableByteArray). The primitive package exposes these primops behind a slightly safer, friendlier API and is what you should use (and depend on if writing your own library).
There are many other libraries that implement unboxed arrays, such as vector, which are built on MutableByteArray; the point is that operations on that structure generate no garbage and likely compile down to pretty predictable machine instructions.
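For instance, here is a minimal sketch against the primitive package's Data.Primitive.ByteArray API (sumSquares is a hypothetical function; the 8-byte element size assumes a 64-bit GHC):

import Data.Primitive.ByteArray (newByteArray, readByteArray, writeByteArray)

-- Fill a pre-allocated, mutable, unboxed buffer in place, then sum it.
-- Neither loop allocates per element; the buffer is a single flat block.
sumSquares :: Int -> IO Int
sumSquares n = do
  buf <- newByteArray (n * 8)   -- 8 bytes per Int on a 64-bit GHC (an assumption)
  let fill i
        | i >= n    = pure ()
        | otherwise = writeByteArray buf i (i * i :: Int) >> fill (i + 1)
  fill 0
  let go acc i
        | i >= n    = pure acc
        | otherwise = do
            x <- readByteArray buf i
            go (acc + (x :: Int)) (i + 1)
  go 0 0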
You might also like to check out this technique if you're doing numeric work and want to use a particular instruction or implement some loop directly in assembly.
GHC also has a very powerful FFI, and you can look into how to write portions of your program in C and interoperate; Haskell supports pinned arrays, among other structures, for this purpose.
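A rough sketch of what that can look like, assuming a hypothetical C routine process_block that you compile and link yourself (the pinned buffer will not be moved by the GC, so the pointer stays valid for the duration of the call):

{-# LANGUAGE ForeignFunctionInterface #-}

import Data.Primitive.ByteArray (newPinnedByteArray, mutableByteArrayContents)
import Data.Word (Word8)
import Foreign.Ptr (Ptr)

-- "unsafe" skips FFI bookkeeping, so the call overhead is tiny, but the C
-- code must return quickly and must not call back into Haskell.
foreign import ccall unsafe "process_block"
  cProcessBlock :: Ptr Word8 -> Int -> IO ()

processPinned :: Int -> IO ()
processPinned n = do
  buf <- newPinnedByteArray n   -- pinned: the GC will not relocate it
  cProcessBlock (mutableByteArrayContents buf) n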
If you need more control than those give you then haskell is likely the wrong language. It's impossible to tell from your description if this is the case for your problem (Your requirements seem contradictory: you need to be able to write a carefully cache-tuned algorithm, but arbitrary GC pauses are okay?).
One last note: you can't rely on GHC's native code generator to perform any of the low-level strength reduction optimizations that e.g. GCC performs (GHC's NCG will probably never ever know about bit-twiddling hacks, autovectorization, etc. etc.). Instead you can try the LLVM backend, but whether you see a speedup in your program is by no means guaranteed.
According to Wikipedia, the translation from lambda calculus to combinatory logic is trivial. Concatenative programming languages can rely solely on a stack for memory allocation.
What's stopping GHC from translating Haskell into a concatenative programming language, such as combinatory logic, and then simply using stack allocation for everything?
Is it feasible to do this translation and thus eliminate garbage collection for languages such as Haskell and OCaml? Are there downsides to doing this?
Suppose I have a function that generates a linked list of some size. The size is a function parameter.
The question is: where do I have to allocate memory for the list?
I can't allocate it on the function's stack, since it's invalid once the function returns. And I can't allocate it on the caller's stack, since I don't know how much memory I need to allocate before the function call. So I need to allocate it on the heap.
I think RAII with manual heap management might be usable here, but I can't see how to eliminate heap allocation at all.
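As a tiny Haskell illustration of the point (buildList is a made-up example): the result's size depends on a run-time value and it outlives the call, so neither the callee's nor the caller's stack frame can hold it; in GHC the cons cells go on the heap.

-- The list escapes the call and its length is only known at run time,
-- so it cannot live in a stack frame; the cons cells are heap-allocated.
buildList :: Int -> [Int]
buildList 0 = []
buildList n = n : buildList (n - 1)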
Edit
I can't fit all my thoughts in a comment, so I'm writing them here.
There is no magic about stack-allocation-based languages. You still need to know when your data is still needed and remove it when it is not.
Imagine you have a separate stack, and your function has control to push and pop data on it. First, there is no automatic memory management anymore, i.e. the function terminates but the data is not deallocated automatically. Second, if your function allocates some memory that is needed only to support e.g. the list calculation, then all that stuff will be interleaved with the list that you want to return. There is no way to free the unused memory (other lists, trees and so on), since you have only push and pop operations. And if you have other operations, then what is the difference from a heap?
What about a few stacks, instead of one?
You need to allocate them somewhere, manage their growth and sometimes reclaim them. Those stacks are separate constructions that you need to manage by hand. No automatic memory management.
Stack-based languages are ok, but forget about the huge number of algorithms that were invented around the concepts of "get memory from somewhere" and "put the memory back", like hash maps, red-black trees and linked lists. Of course, we can allocate all of those structures on a stack, but we can't free parts of them when they are no longer needed.
What about the "trivial" translation of lambda calculus to a Turing machine?
Of course it is trivial, if your resources are infinite. The mathematical theory says nothing about the time and memory complexity of such translated constructions. It just establishes that the two models are equivalent: everything we can say with a Turing machine we can say with the lambda calculus, and vice versa. There is no guarantee that it will work within real-life limitations.
A concatenative programming language is every bit as capable of running out of memory as a functional programming language.
The fundamental challenge garbage collection addresses is freeing memory that is not, or is not known to be, used in a stack-like fashion. It is most especially useful when there is no clear place in the source code that can be pinpointed as the end of the object's lifetime.
If you simply translate a functional language into a concatenative one with only stack allocation, then you will end up overflowing the stack.
There have definitely been various efforts over the years to reduce the need for garbage collection. One interesting (but very complicated) attempt is the region inference system used in the ML Kit. Unfortunately, that's a bit much for most programmers, including myself, to understand. I believe others have worked on such systems since; I don't know the current state of the art.
The take-away is that some very heavy compiler machinery, along with careful programmer discipline and perhaps special annotations, can sometimes reduce or eliminate the need for garbage collection; no trivial transformation is going to do the trick.
If you were designing a programming language that features automatic memory management, would using reference counting allow for determinism guarantees that are not possible with a garbage collector?
Would there be a different answer to this question for functional vs. imperative languages?
Would using reference counting allow for determinism guarantees that are not possible with a garbage collector?
The word guarantee is a strong one. Here are the guarantees you can provide with reference counting:
Constant time overhead at an assignment to adjust reference counts.
Constant time to free an object whose reference count goes to zero. (The key is that you must not decrement that object's children right away; instead you must do it lazily when the object is used to satisfy a future allocation request.)
Constant time to allocate a new object when the relevant free list is not empty. This guarantee is conditional and isn't worth much.
Here are some things you can't guarantee with reference counting:
Constant time to allocate a new object. (In the worst case, the heap may be growing, and depending on the system the delay to organize new memory may be considerable. Or even worse, you may fill the heap and be unable to allocate.)
All unreachable objects are reclaimed and reused while maintaining constant time for other operations. (A standard reference counter can't collect cyclic garbage. There are a variety of ingenious workarounds, but generally they invalidate constant-time guarantees for simple operations.)
There are now some real-time garbage collectors that provide pretty interesting guarantees about pause times, and in the last 5 years there have been pretty interesting developments in both reference counting and garbage collection. From where I sit as an informed outsider, there's no obvious winner.
Some of the best recent work on reference counting is by David Bacon of IBM and by Erez Petrank of Technion. If you want to learn what a sophisticated, modern reference-counting system can do, look up their papers. Among other things, they are using multiple processors in amazing ways.
For information about memory management and real-time guarantees more generally, check out the International Symposium on Memory Management.
Would there be a different answer to this question for functional vs. imperative languages?
Because you asked about guarantees, no. But for memory management in general, the performance tradeoffs are quite different for an imperative language (lots of mutation but low allocation rates), an impure functional language (hardly any mutation but high allocation rates), and a pure, lazy functional language (lots of mutation - all those thunks being updated - and high allocation rates).
would using reference counting allow for determinism guarantees that are not possible with a garbage collector?
I don't see how. The process of lowering the reference count of an object is not time-bounded, as that object may be the single root of an arbitrarily large object graph.
The only way to approach the problem of GC for real-time systems is by using either a concurrent collector or an incremental one, regardless of whether the system uses reference counting. In my opinion, your distinction between reference counting and "collection" is not precise anyway; e.g. systems that use reference counting might still occasionally perform some memory sweep (for example, to handle cycles).
You might be interested in IBM's Metronome, and I also know Microsoft has done some research in the direction of good, real-time memory management.
If you look at the RTSJ spec (JSR-1), you'll see they did an end-run around the problem by providing for no-heap realtime threads. By having a separate category of thread that isn't allowed to touch any object that might require the thread to be stopped for garbage collection, JSR-1 side stepped the issue. There aren't many RTSJ implementations right now, but the area of realtime garbage collection is a hot topic in that community.
For real time programming, does reference counting have an advantage over garbage collection in terms of determinism?
Yes. The main advantage of reference counting is simplicity.
If you were designing a programming language that features automatic memory management, would using reference counting allow for determinism guarantees that are not possible with a garbage collector?
A GC like Baker's Treadmill should attain the same level of guarantees regarding determinism that reference counting offers.
Would there be a different answer to this question for functional vs. imperative languages?
Yes. Reference counting alone does not handle cycles. Some functional languages make it impossible to create cycles by design (e.g. Erlang and Mathematica) so they trivially permit reference counting alone as an exact approach to GC.
In real time programming garbage collection could be harmful, because you don't know when the garbage collector will collect... so yes, reference counting is definitely better in this context.
As a side note, usually in a real-time system only some parts need real-time processing, so you can avoid garbage collection just in the sensitive components. A real-world example is a C# program running on a Windows CE target.
From some involvement in various projects migrating significant chunks of code from C++ (with various smart pointer classes, including reference counted) to garbage collected Java/C#, I observe that the biggest pain-points all seem to be related to classes with non-empty destructors (particularly when used for RAII). This is a pretty big flag that deterministic cleanup is expected.
The issue is surely much the same for any language with objects; I don't think hybrid OO-functional languages like Scala or OCaml enjoy any particular advantage in this area. The situation might be different for more "pure" functional languages.
So I'm currently working on a new programming language. Inspired by ideas from concurrent programming and Haskell, one of the primary goals of the language is management of side effects. More or less, each module will be required to specify which side effects it allows. So, if I were making a game, the graphics module would have no ability to do IO. The input module would have no ability to draw to the screen. The AI module would be required to be totally pure. Scripts and plugins for the game would have access to a very restricted subset of IO for reading configuration files. Et cetera.
However, what constitutes a side effect isn't clear cut. I'm looking for any thoughts or suggestions on the subject that I might want to consider in my language. Here are my current thoughts.
Some side effects are blatant. Whether it's printing to the user's console or launching your missiles, any action that reads or writes a user-owned file or interacts with external hardware is a side effect.
Others are more subtle and these are the ones I'm really interested in. These would be things like getting a random number, getting the system time, sleeping a thread, implementing software transactional memory, or even something very fundamental such as allocating memory.
Unlike other languages built to control side effects (looking at you Haskell), I want to design my language to be pragmatic and practical. The restrictions on side effects should serve two purposes:
To aid in the separations of concerns. (No one module can do everything).
To sandbox each module in the application. (Any module could be used as a plugin)
With that in mind, how should I handle "pseudo"-side effects, like random numbers and sleeping, as I mention above? What else might I have missed? In what ways might I manage memory usage and time as resources?
The problem of how to describe and control effects is currently occupying some of the best scientific minds in programming languages, including people like Greg Morrisett of Harvard University. To my knowledge, the most ambitious pioneering work in this area was done by David Gifford and Pierre Jouvelot in the FX programming language started in 1987. The language definition is online, but you may get more insight into the ideas by reading their 1991 POPL paper.
This is a really interesting question, and it represents one of the stages I've gone through and, frankly, moved beyond.
I remember seminars in which Carl Hewitt, in talking about his Actors formalism, discussed this. He defined it in terms of a method giving a response that was solely a function of its arguments, or that could give different answers at different times.
I say I moved beyond this because it makes the language itself (or the computational model) the main subject, as opposed to the problem(s) it is supposed to solve. It is based on the idea that the language should have a formal underlying model so that its properties are easy to verify. That is fine, but still remains a distant goal, because there is still no language (to my knowledge) in which the correctness of something as simple as bubble sort is easy to prove, let alone more complex systems.
The above is a fine goal, but the direction I went was to look at information systems in terms of information theory. Specifically, assuming a system starts with a corpus of requirements (on paper or in somebody's head), those requirements can be transmitted to a program-writing machine (whether automatic or human) to generate source code for a working implementation. THEN, as changes occur to the requirements, the changes are processed through as delta changes to the implementation source code.
Then the question is: What properties of the source code (and the language it is encoded in) facilitate this process? Clearly it depends on the type of problem being solved, what kinds of information go in and out (and when), how long the information has to be retained, and what kind of processing needs to be done on it. From this one can determine the formal level of the language needed for that problem.
I realized the process of cranking through delta changes of requirements to source code is made easier as the format of the code comes to resemble the requirements more closely, and there is a nice quantitative way to measure this resemblance, not in terms of superficial resemblance, but in terms of editing actions. The well-known technology that best expresses this is domain specific languages (DSLs). So I came to realize that what I look for most in a general-purpose language is the ability to create special-purpose languages.
Depending on the application, such special-purpose languages may or may not need specific formal features like functional notation, side-effect control, parallelism, etc. In fact, there are many ways to make a special-purpose language, from parsing, interpreting and compiling, down to just macros in an existing language, down to simply defining classes, variables, and methods in an existing language. As soon as you declare a variable or subroutine you've created new vocabulary and thus a new language in which to solve your problem. In fact, in this broad sense, I don't think you can solve any programming problem without being, at some level, a language designer.
So best of luck, and I hope it opens up new vistas for you.
A side effect is having any effect on anything in the world other than returning a value, i.e. mutating something that could be visible in some way outside the function.
A pure function neither depends on nor affects any mutable state outside the scope of that invocation of the function, which means that the function's output depends only on constants and its inputs. This implies that if you call a function twice with the same arguments, you are guaranteed to get the same result both times, regardless of how the function is written.
If you have a function that modifies a variable that it has been passed, that modification is a side effect because it's visible output from the function other than the return value. A void function that is not a no-op must have side effects, because it has no other way of affecting the world.
The function could have a private variable only visible to that function that it reads and modifies, and calling it would still have the side effect of changing the way the function behaves in the future. Being pure means having exactly one channel for output of any kind: the return value.
It is possible to generate random numbers purely, but you have to pass around the random seed manually. Most random functions keep a private seed value that is updated each time they are called, so that you get a different random number each time. Here's a Haskell snippet using System.Random:
randomColor :: StdGen -> (Color, Int, StdGen)
randomColor gen1 = (color, intensity, gen3)   -- return the final generator, gen3
  where
    (color, gen2)     = random gen1
    (intensity, gen3) = randomR (1, 100) gen2
The random functions each return the randomized value and a new generator with a new seed (based on the previous one). To get a new value each time, the chain of new generators (gen1, gen2, gen3) has to be passed along. Implicit generators just use an internal variable to store the gen1.. values in the background.
Doing this manually is a pain, and in Haskell you can use a state monad to make it a lot easier. You'll want to implement something less pure or use a facility like monads, arrows or uniqueness values to abstract it away.
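For example, here is a sketch of the same function with the generator threaded implicitly by a State monad (assuming the mtl and random packages; randomPair is a made-up name, and two Ints stand in for the Color and intensity values):

import Control.Monad.State (State, evalState, state)
import System.Random (StdGen, mkStdGen, random, randomR)

-- The State monad passes the generator along for us instead of by hand.
randomPair :: State StdGen (Int, Int)
randomPair = do
  hue       <- state random
  intensity <- state (randomR (1, 100))
  pure (hue, intensity)

-- Still pure: running it twice from the same seed gives the same answer.
-- Example: evalState randomPair (mkStdGen 42)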
Getting the system time is impure because the time could be different each time you ask.
Sleeping is fuzzier because sleep doesn't affect the result of the function, and you could always delay execution with a busy loop, and that wouldn't affect purity. The thing is that sleeping is done for the sake of something else, which IS a side effect.
Memory allocation in pure languages has to happen implicitly, because explicitly allocating and freeing memory would be side effects if you could do any kind of pointer comparison: creating two new objects with the same parameters would produce distinguishable values, because they would have different identities (e.g. they would not be equal by Java's == operator).
I know I've rambled on a bit, but hopefully that explains what side effects are.
Give a serious look to Clojure and its use of software transactional memory, agents, and atoms to keep side effects under control.