Recursion in production-quality VC++ Code - visual-c++

When writing production-quality VC++ code, is the use of recursion acceptable? Why or why not?

Is there a way to determine at what point I would encounter a stack overflow?
Not really. A stack overflow happens when you exhaust the stack space - however...
The initial stack size can be changed programmatically, and the default varies by OS, compiler, and so on.
How much of it is already used up depends on what your app (and the libraries your app uses) has previously done - this is often impossible to predict.
How much of the stack each call requires depends on what you do in your function. If you only allocate, say, one integer on the stack, you may be able to recurse an enormous number of times; if you are allocating a 200k buffer on the stack, not so much.
The only times I've ever hit one were with runaway (infinite) recursion, or when using the aforementioned 200k buffer.
I find it far preferable for my app to just crash than for it to loop forever using 100% CPU and have to be forcefully killed (a right PITA on a remote server over a bad connection, as Windows lacks SSH).
A rough guideline: do you think your recursive function is likely to call itself more than, say, 10,000 times consecutively? Or are you doing something dumb like allocating 200k buffers on the stack? (See the sketch below.)
If yes, worry about it.
If no, carry on with more important things.
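As a rough illustration of the point above (a hypothetical C++ sketch; the functions are made up for illustration): with MSVC's default stack of about 1 MB, a frame holding one int can recurse very deeply, while a frame holding a 200k buffer overflows after only a handful of calls.

void small_frame(int depth)
{
    int x = depth;                      // a few bytes of stack per call
    if (depth > 0) small_frame(depth - 1);
}

void big_frame(int depth)
{
    volatile char buffer[200 * 1024];   // ~200k of stack per call
    buffer[0] = static_cast<char>(depth);
    if (depth > 0) big_frame(depth - 1);
}

(With optimizations on, the compiler may turn small_frame into a loop entirely, which only reinforces the point: the real limit depends on frame size, not call count.)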

Yes. But never in dead code. That would be silly.

Sure - e.g. if you want to traverse a tree structure, what else would you use?
Maybe you would like to have something like a maximum depth, to be sure you're not recursing forever (if this makes sense in your example).
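A minimal sketch of what that might look like (the Node type and the 10,000 cap are just illustrative assumptions):

#include <cstddef>

struct Node {
    int   value;
    Node *left  = nullptr;
    Node *right = nullptr;
};

// In-order traversal with a depth cap, so a corrupted (cyclic) structure
// cannot recurse forever and overflow the stack.
void visit(const Node *n, void (*f)(int),
           std::size_t depth = 0, std::size_t max_depth = 10000)
{
    if (n == nullptr || depth > max_depth) return;
    visit(n->left,  f, depth + 1, max_depth);
    f(n->value);
    visit(n->right, f, depth + 1, max_depth);
}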

Is there a way to determine at what point I would encounter a stack overflow?
It depends on how deep you go and how much stack each level of the recursion needs. I take it you understand what recursion does?

Recursion is almost essential for traversing file structures like folders/directories.
Traversing a tree-like structure is very easy if recursion is used.
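For instance, a directory walk with C++17's std::filesystem falls out naturally as a recursive function (just a sketch; error handling omitted):

#include <filesystem>
#include <iostream>

// Each level of directory nesting becomes one more stack frame.
void list_tree(const std::filesystem::path &dir)
{
    for (const auto &entry : std::filesystem::directory_iterator(dir)) {
        std::cout << entry.path() << '\n';
        if (entry.is_directory())
            list_tree(entry.path());
    }
}

(std::filesystem also provides recursive_directory_iterator, which does the same walk without explicit recursion.)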

Related

Is rbp/ebp(x86-64) register still used in conventional way?

I have been writing a small kernel lately based on the x86-64 architecture. When taking care of some user-space code, I realized I am virtually not using rbp. I then looked at some other things and found out that compilers are getting smarter these days and they really don't use rbp anymore. (I could be wrong here.)
I was wondering whether the conventional use of rbp/ebp is no longer required in many instances, or am I wrong here. If that kind of usage is not required, can it be used like a general-purpose register?
Thanks
It is only needed if you have variable-length arrays in your stack frames (the alternative - recording the array length - would require more memory and more computation). It is no longer needed for unwinding because there is now metadata for that.
It is still useful if you are hand-writing entire assembly functions, but who does that? Assembly should only be used as glue to jump into a C (or whatever) function.
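As a small illustration (a sketch, using the non-standard alloca - MSVC spells it _alloca): once the frame contains a runtime-sized allocation, rsp moves by an amount unknown at compile time, so compilers typically fall back to rbp as a stable base for the rest of the frame.

#include <alloca.h>   // <malloc.h> and _alloca on MSVC

int sum_first(int n)
{
    // Runtime-sized stack allocation: fixed rsp-relative offsets no longer
    // work for the other locals, so the compiler keeps rbp as the frame base.
    int *buf = static_cast<int *>(alloca(n * sizeof(int)));
    int total = 0;
    for (int i = 0; i < n; ++i) {
        buf[i] = i;
        total += buf[i];
    }
    return total;
}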

How can a garbage collector find out about object references done from the stack?

In languages with automatic garbage collection like Haskell or Go, how can the garbage collector find out which values stored on the stack are pointers to memory and which are just numbers? If the garbage collector just scans the stack and assumes all addresses to be references to objects, a lot of objects might get incorrectly marked as reachable.
Obviously, one could add a value to the top of each stack frame that described how many of the next values are pointers, but wouldn't that cost a lot of performance?
How is it done in reality?
Some collectors assume everything on the stack is a potential pointer (like Boehm GC). This turns out to be not as bad as one might expect, but is clearly suboptimal. More often in managed languages, some extra tagging information is left with the stack to help the collector figure out where the pointers are.
Remember that in most compiled languages, the layout of a stack frame is the same every time you enter a function, therefore it is not that hard to ensure that you tag your data in the right way.
The "bitmap" approach is one way of doing this. Each bit of the bitmap corresponds to one word on the stack. If the bit is a 1 then the location on the stack is a pointer, and if it is a 0 then the location is just a number from the point of view of the collector (or something along those lines). The exceptionally well written GHC runtime and calling conventions use a one word layout for most functions, such that a few bits communicate the size of the stack frame, with the rest serving as the bitmap. Larger stack frames need a multi word structure, but the idea is the same.
The point is that the overhead is low, since the layout information is computed at compile time, and then included in the stack every time a function is called.
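A conceptual sketch of that bitmap scan (the 6-bit size field and the mark callback are made-up details, not GHC's actual layout):

#include <cstddef>
#include <cstdint>

// One layout word per frame: the low bits give the frame size in words, the
// remaining bits form a bitmap with one bit per word (1 = pointer).
void scan_frame(const std::uintptr_t *frame, std::uint64_t layout,
                void (*mark)(void *object))
{
    std::size_t   size = layout & 0x3F;  // frame size in words (assumed 6-bit field)
    std::uint64_t bits = layout >> 6;    // per-word pointer bitmap
    for (std::size_t i = 0; i < size; ++i) {
        if ((bits >> i) & 1)             // this slot holds a pointer
            mark(reinterpret_cast<void *>(frame[i]));
    }
}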
An even simpler approach is "pointers first", where all the pointers are located at the beginning of the stack frame. You only need to include a length word before the pointers, or a special "end" word after them, to tell which words are pointers in this layout.
Interestingly, trying to get this management information onto the stack produces a host of problems related to interop with C. For example, it is suboptimal to compile high-level languages to C, since even though C is portable, it is hard to carry this kind of information through. Optimizing compilers designed for C-like languages (GCC, LLVM) may restructure the stack frame, producing problems, so the GHC LLVM backend uses its own "stack" rather than the LLVM stack, which costs it some optimizations. Similarly, the boundary between C code and "managed" code needs to be constructed carefully to keep from confusing the GC.
For this reason, when you create a new thread on the JVM you actually create two stacks (one for Java, one for C).
The Haskell stack uses a single word of memory in each stack frame describing (with a bitmap) which of the values in that stack frame are pointers and which are not. For details, see the "Layout of the stack" article and the "Bitmap layout" article from the GHC Commentary.
To be fair, a single word of memory really isn't much cost, all things considered. You can think of it as just adding a single variable to each method; that's not all that bad.
There exist GCs that assume that every bit pattern that could be the address of something the GC is managing is in fact a pointer (and so don't release that something). This can actually work pretty well, because pointers are usually bigger than small common integers, and usually have to be aligned. But yes, this can cause collection of some objects to be delayed. The Boehm collector for C works this way, because it's library-based and so doesn't get any specific help from the compiler.
There are also GCs that are more tightly coupled to the language they're used in, and actually know the structure of the objects in memory. I've never read up specifically on stack frame handling, but you could record information to help the GC if the compiler and GC are designed to work together. One trick would be putting all the pointer references together and using one word per stack frame to record how many there are, which is not such a huge overhead. If you can work out which function corresponds to each stack frame without adding a word saying so, then you could have a per-function "stack frame layout map" compiled in. Another option would be to use tagged words, where you set the low-order bit of words that are not pointers to 1 - something that (due to address alignment) is never needed for pointers - so you can tell them apart. That means you have to shift unboxed values in order to use them, though.
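A sketch of that low-bit tagging trick (a made-up scheme, not any particular runtime's):

#include <cstdint>

// Aligned heap pointers always have the low bit clear, so unboxed integers
// are stored shifted left one bit with the low bit set; the collector can
// tell the two apart just by looking at that bit.
inline std::uintptr_t box_int(std::intptr_t n)
{
    return (static_cast<std::uintptr_t>(n) << 1) | 1u;
}

inline std::intptr_t unbox_int(std::uintptr_t w)
{
    return static_cast<std::intptr_t>(w) >> 1;   // shift back before use
}

inline bool looks_like_pointer(std::uintptr_t w)
{
    return (w & 1u) == 0;
}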
It's important to realize that GHC maintains its own stack and does not use the C stack (other than for FFI calls). There's no portable way to access all of the contents of the C stack (for instance, on SPARC some of it is hidden away in register windows), so GHC maintains a stack over which it has full control. Once you maintain your own stack you can pick any scheme to distinguish pointers from non-pointers on the stack (like using a bitmap).

nodejs buffers vs typed arrays

What is more efficient - nodejs buffers or typed arrays? What should I use for better performance?
I think that only those who know the internals of V8 and Node.js could answer this question.
A Node.js Buffer should be more efficient than a typed array. The reason is simply that when a new Node.js Buffer is created it does not need to be initialized to all 0's, whereas the typed array specification states that typed arrays must have their values set to 0 on creation. Allocating the memory and then setting all of it to 0's takes more time.
In most applications picking either one won't matter. As always, the devil lies in the benchmarks :) However, I recommend that you pick one and stick with it. If you're often converting back and forth between the two, you'll take a performance hit.
Nice discussion here: https://github.com/joyent/node/issues/4884
There are a few things that I think are worth mentioning:
Buffer instances are Uint8Array instances but there are subtle incompatibilities with the TypedArray specification in ECMAScript 2015. For example, while ArrayBuffer#slice() creates a copy of the slice, the implementation of Buffer#slice() creates a view over the existing Buffer without copying, making Buffer#slice() far more efficient.
When using Buffer.allocUnsafe() and Buffer.allocUnsafeSlow() the memory isn't zeroed-out (as many have pointed out already). So make sure you completely overwrite the allocated memory or you can allow the old data to be leaked when the Buffer memory is read.
TypedArrays are not readable right away, you'll need a DataView for that. Which means you might need to rewrite your code if you were to migrate back to Buffer. Adapter pattern could help here.
You can use for-of on Buffer. You cannot on TypedArrays. Also you won't have the classic entries(), values(), keys() and length support.
Buffer is not supported in the frontend while TypedArray may well be. So if your code is shared between frontend or backend you might consider sticking to one.
More info in the docs here.
This is a tough one, but I think it will depend on what you are planning to do with them and how much data you are planning to work with.
Typed arrays themselves need Node buffers, but they are easier to play with and you can overcome the 1GB limit (kMaxLength = 0x3fffffff).
If you are doing common stuff such as iterating, setting, getting, slicing, etc., then typed arrays should be your best shot for performance, not memory (especially if you are dealing with float and 64-bit integer types).
In the end, probably only a good benchmark of what you want to do can shed real light on this question.

What would programming languages look like if every computable thing could be done in 1 second?

Inspired by this question
Suppose we had a magical Turing Machine with infinite memory, and unlimited CPU power.
Use your imagination as to how this might be possible, e.g. it uses some sort of hyperspace continuum to automatically parallelize anything as much as is desired, so that it could calculate the answer to any computable question - no matter what its time complexity is or how many actual "logical steps" it takes - in one second.
However, it can only answer computable questions in one second... so I'm not positing an "impossible" machine (at least I don't think so)... For example, this machine still wouldn't be able to solve the halting problem.
What would the programming language for such a machine look like? All programming languages I know about currently have to make some concessions to "algorithmic complexity"... with that constraint removed though, I would expect that all we would care about would be the "expressiveness" of the programming language. i.e. its ability to concisely express "computable questions"...
Anyway, in the interests of a hopefully interesting discussion, opening it up as community wiki...
SendMessage travelingSalesman "Just buy a ticket to the same city twice already. You'll spend much more money trying to solve this than you'll save by visiting Austin twice."
SendMessage travelingSalesman "Wait, they built what kind of computer? Nevermind."
This is not really logical. If a thing takes O(1) time, then doing it n times will take O(n) time, even on a quantum computer. It is impossible for "everything" to take O(1) time.
For example: Grover's algorithm, the one mentioned in the accepted answer to the question you linked to, takes O(n^1/2) time to find an element in a database of n items. And that's not O(1).
The amount of memory, the speed of the memory, or the speed of the processor doesn't define the time and space complexity of an algorithm; basic mathematics does. Asking what programming languages would look like if everything could be computed in O(1) is like asking what our calculators would look like if pi were 3 and all square roots were integers. It's really impossible, and if it weren't, it's not likely to be very useful.
Now, asking ourself what we would do with infinite process power and infinite memory could be a useful exercise. We'll still have to deal with complexity of algorithms but we'd probably work somehow differently. For that I recommend The Hundred-Year Language.
Note that even if the halting problem is not computable, "does this halt within N steps on all possible inputs of size smaller than M" is!
As such any programming language would become purely specification. All you need to do is accurately specify the pre and post conditions of a function and the compiler could implement the fastest possible code which implements your spec.
Also, this would trigger a singularity very quickly. Constructing an AI would be a lot easier if you could do near infinite computation -- and once you had one, of any efficiency, it could ask the computable question "How would I improve my program if I spent a billion years thinking about it?"...
It could possibly be a Haskell-ish language. Honestly, it's a dream to code in. You program the "laws" of your types, classes, and functions and then let them loose. It's incredibly fun and powerful, and you can write some very succinct and elegant code. It's like an art.
Maybe it would look more like pseudo-code than "real" code. After all, you don't have to worry about any implementation details any more, because whichever way you go, it'll be fast enough.
Scalability would not be an issue any longer. We'd have AIs way smarter than us.
We wouldn't need to program any longer and instead the AI would figure out our intentions before we realize them ourselves.
SQL is such a language - you ask for some piece of data and you get it. If you didn't have to worry about minute implementation details of the db this might even be fun to program in.
You underestimate O(1). It means that there exists a constant C > 0 such that the time to compute the problem is bounded by C.
What you ignore is that the actual value of C can be large, and it can be (and mostly is) different for different algorithms. You may have two algorithms (or computers - it doesn't matter), both O(1), but in one of them C may be a billion times bigger than in the other, and then that one will be much slower, perhaps very slow in absolute terms.
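For reference, the standard definition behind this: $f(n) = O(1) \iff \exists\, C > 0,\ n_0 \text{ such that } |f(n)| \le C \text{ for all } n \ge n_0$. Nothing in the definition constrains how large $C$ is.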
If everything can be done in one second, then most languages will eventually look like this - I call it DWIM (Do What I Mean) theory:
Just do what I said (without any bugs this time)
Because if we ever develop a machine that can compute everything in one second, then we will probably have mind control at that stage, and at the very least artificial intelligence.
I don't know what new languages would come up (I'm a physicist, not a computer scientist) but I'd still write my programs for it in Python.

How to implement closures without gc?

I'm designing a language. First, I want to decide what code to generate. The language will have lexical closures and prototype-based inheritance, similar to JavaScript. But I'm not a fan of GC and try to avoid it as much as possible. So the question: is there an elegant way to implement closures without resorting to allocating the stack frame on the heap and leaving it to the garbage collector?
My first thoughts:
Use reference counting and garbage collect the cycles (I don't really like this)
Use spaghetti stack (looks very inefficient)
Limit the forming of closures to certain contexts, in such a way that I can get away with a return-address stack and a locals stack.
I won't use a high-level language or follow any calling conventions, so I can smash the stack as much as I like.
(Edit: I know reference counting is a form of garbage collection but I am using gc in its more common meaning)
This would be a better question if you can explain what you're trying to avoid by not using GC. As I'm sure you're aware, most languages that provide lexical closures allocate them on the heap and allow them to retain references to variable bindings in the activation record that created them.
The only alternative to that approach that I'm aware of is what gcc uses for nested functions: create a trampoline for the function and allocate it on the stack. But as the gcc manual says:
If you try to call the nested function through its address after the containing function has exited, all hell will break loose. If you try to call it after a containing scope level has exited, and if it refers to some of the variables that are no longer in scope, you may be lucky, but it's not wise to take the risk. If, however, the nested function does not refer to anything that has gone out of scope, you should be safe.
Short version is, you have three main choices:
allocate closures on the stack, and don't allow their use after their containing function exits.
allocate closures on the heap, and use garbage collection of some kind.
do original research, maybe starting from the region stuff that ML, Cyclone, etc. have.
This thread might help, although some of the answers here reflect answers there already.
One poster makes a good point:
It seems that you want garbage collection for closures "in the absence of true garbage collection". Note that closures can be used to implement cons cells. So your question seems to be about garbage collection "in the absence of true garbage collection" -- there is rich related literature. Restricting the problem to closures does not really change it.
So the answer is: no, there is no elegant way to have closures and no real GC.
The best you can do is some hacking to restrict your closures to a particular type of closure. All this is needless if you have a proper GC.
So, my question reflects some of the other ones here - why do you not want to implement GC? A simple mark+sweep or stop+copy takes about 200-300 lines of (Scheme) code, and isn't really that bad in terms of programming effort (a bare-bones skeleton is sketched after the list below). As for the worry about making your programs slower:
You can implement a more complex GC which has better performance.
Just think of all the memory leaks programs in your language won't suffer from.
Coding with a GC available is a blessing. (Think C#, Java, Python, Perl, etc... vs. C++ or C).
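For a sense of scale, here is a bare-bones mark-and-sweep skeleton (a sketch only, with a made-up Object/Heap shape; the roots would come from a stack scan like the ones discussed above):

#include <cstddef>
#include <vector>

// Every object carries a mark bit and a list of outgoing references.
struct Object {
    bool marked = false;
    std::vector<Object *> refs;           // outgoing pointers
};

struct Heap {
    std::vector<Object *> objects;        // everything ever allocated

    void mark(Object *o) {
        if (o == nullptr || o->marked) return;
        o->marked = true;
        for (Object *r : o->refs) mark(r);   // recursive; real GCs use a worklist
    }

    void sweep() {
        std::vector<Object *> live;
        for (Object *o : objects) {
            if (o->marked) { o->marked = false; live.push_back(o); }
            else           { delete o; }     // unreachable: reclaim
        }
        objects.swap(live);
    }

    void collect(const std::vector<Object *> &roots) {
        for (Object *r : roots) mark(r);
        sweep();
    }
};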
I understand that I'm very late, but I stumbled upon this question by accident.
I believe that full support of closures indeed requires GC, but in some special cases stack allocation is safe. Determining these special cases requires some escape analysis. I suggest that you take a look at the BitC language papers, such as Closure Implementation in BitC. (Although I doubt whether the papers reflect the current plans.) The designers of BitC had the same problem you do. They decided to implement a special non-collecting mode for the compiler, which denies all closures that might escape. If turned on, it will restrict the language significantly. However, the feature is not implemented yet.
I'd advise you to use a collector - it's the most elegant way. You should also consider that a well-built garbage collector allocates memory faster than malloc does. The BitC folks really do value performance, and they still think that GC is fine even for most parts of their operating system, Coyotos. You can mitigate the downsides by simple means:
create only a minimal amount of garbage
let the programmer control the collector
optimize stack/heap use by escape analysis
use an incremental or concurrent collector
if somehow possible, divide the heap like Erlang does
Many fear garbage collectors because of their experiences with Java. Java has a fantastic collector, but applications written in Java have performance problems because of the sheer amount of garbage generated. In addition, a bloated runtime and fancy JIT compilation is not really a good idea for desktop applications because of the longer startup and response times.
The C++0x spec defines lambdas without garbage collection. In short, the spec allows undefined behavior in cases where the lambda closure contains references which are no longer valid. For example:

#include <functional>

std::function<int(int)> create_lambda(int a)
{
    return [&a](int x) { return x + a; };  // captures a by reference
}

create_lambda(5)(4);  // undefined result

The lambda in this example refers to a variable (a) which is allocated on the stack. However, that stack frame has been popped and is not necessarily available once the function returns. In this case it would probably work and return 9 as a result (assuming sane compiler semantics), but there is no way to guarantee it.
If you are avoiding garbage collection, then I'm assuming that you also allow explicit heap vs. stack allocation and (probably) pointers. If that is the case, then you can do like C++ and just assume that developers using your language will be smart enough to spot the problem cases with lambdas and copy to the heap explicitly (just like you would if you were returning a value synthesized within a function).
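A sketch of that explicit heap copy in today's C++ (std::function and std::shared_ptr here are just one convenient way to spell it): the captured state lives in a reference-counted heap object, so the closure stays valid after the creating frame is gone.

#include <functional>
#include <memory>

std::function<int(int)> create_lambda(int a)
{
    auto env = std::make_shared<int>(a);       // heap copy of the capture
    return [env](int x) { return x + *env; };  // the closure owns env
}

// create_lambda(5)(4) == 9, and remains valid after create_lambda returns.

For a single int, a plain by-value capture [a] would do; the shared_ptr only becomes interesting when several closures need to share the same environment.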
Use reference counting and garbage collect the cycles (I don't really like this)
It's possible to design your language so there are no cycles: if you can only make new objects and not mutate old ones, and if making an object can't make a cycle, then cycles never appear. Erlang works essentially this way, though in practice it does use GC.
If you have the machinery for a precise copying GC, you could allocate on the stack initially and copy to the heap and update pointers if you discover at exit that a pointer to this stack frame has escaped. That way you only pay if you actually do capture a closure that includes this stack frame. Whether this helps or hurts depends on how often you use closures and how much they capture.
You might also look into C++0x's approach (N1968), though as one might expect from C++ it consists of counting on the programmer to specify what gets copied and what gets referenced, and if you get it wrong you just get invalid accesses.
Or just don't do GC at all. There can be situations where it's better to just forget the memory leak and let the process clean up after it when it's done.
Depending on your qualms about GC, you might be afraid of the periodic GC sweeps. In this case you could do a selective GC when an item falls out of scope or the pointer changes. I'm not sure how expensive this would be though.
#Allen
What good is a closure if you can't use it after the containing function exits? From what I understand, that's the whole point of closures.
You could work with the assumption that all closures will be called eventually and exactly one time. Now, when the closure is called you can do the cleanup at the closure return.
How do you plan on dealing with returning objects? They have to be cleaned up at some point, which is the exact same problem with closures.
So the question: Is there an elegant way to implement closures without resorting to allocate the stack frame on the heap and leave it to garbage collector?
GC is the only solution for the general case.
Better late than never?
You might find this interesting: Differential Execution.
It's a little-known control structure, and its primary use is in programming user interfaces, including ones that can change dynamically while in use. It is a significant alternative to the Model-View-Controller paradigm.
I mention it because one might think that such code would rely heavily on closures and garbage-collection, but a side effect of the control structure is that it eliminates both of those, at least in the UI code.
Create multiple stacks?
I've read that recent versions of ML use GC only sparingly.
I guess if the process is very short, which means it cannot use much memory, then GC is unnecessary. The situation is analogous to worrying about stack overflow: don't nest too deeply and you cannot overflow; don't run too long and you cannot need the GC. Cleaning up becomes a matter of simply reclaiming the large region that you pre-allocated. Even a longer process can be divided into smaller processes that have their own heaps pre-allocated. This would work well with event handlers, for example. It does not work well if you are writing a compiler; in that case, a GC is surely not much of a handicap.
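A minimal bump/arena allocator sketch along those lines (illustrative only; alignment handling omitted): everything a short-lived task allocates comes out of one pre-allocated block, and cleanup is a single release of the whole region rather than a GC pass.

#include <cstddef>
#include <cstdlib>
#include <new>

class Arena {
    char       *base_;
    std::size_t size_;
    std::size_t used_ = 0;
public:
    explicit Arena(std::size_t size)
        : base_(static_cast<char *>(std::malloc(size))), size_(size)
    {
        if (base_ == nullptr) throw std::bad_alloc();
    }
    ~Arena() { std::free(base_); }            // reclaim everything at once

    void *allocate(std::size_t n)
    {
        if (used_ + n > size_) throw std::bad_alloc();
        void *p = base_ + used_;
        used_ += n;                           // bump the pointer; no per-object free
        return p;
    }

    Arena(const Arena &) = delete;
    Arena &operator=(const Arena &) = delete;
};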
