How to minimize the garbage collection in Go? - garbage-collection

Some times you could want to avoid/minimize the garbage collector, so I want to be sure about how to do it.
I think that the next one is correct:
Declare variables at the beginning of the function.
To use array instead of slice.
Any more?

To minimize garbage collection in Go, you must minimize heap allocations. To minimize heap allocations, you must understand when allocations happen.
The following things always cause allocations (at least in the gc compiler as of Go 1):
Using the new built-in function
Using the make built-in function (except in a few unlikely corner cases)
Composite literals when the value type is a slice, map, or a struct with the & operator
Putting a value larger than a machine word into an interface. (For example, strings, slices, and some structs are larger than a machine word.)
Converting between string, []byte, and []rune
As of Go 1.3, the compiler special cases this expression to not allocate: m[string(b)], where m is a map and b is a []byte
Converting a non-constant integer value to a string
defer statements
go statements
Function literals that capture local variables
The following things can cause allocations, depending on the details:
Taking the address of a variable. Note that addresses can be taken implicitly. For example a.b() might take the address of a if a isn't a pointer and the b method has a pointer receiver type.
Using the append built-in function
Calling a variadic function or method
Slicing an array
Adding an element to a map
The list is intended to be complete and I'm reasonably confident in it, but am happy to consider additions or corrections.
If you're uncertain of where your allocations are happening, you can always profile as others suggested or look at the assembly produced by the compiler.

Avoiding garbage is relatively straight forward. You need to understand where the allocations are being made and see if you can avoid the allocation.
First, declaring variables at the beginning of a function will NOT help. The compiler does not know the difference. However, human's will know the difference and it will annoy them.
Use of an array instead of a slice will work, but that is because arrays (unless dereferenced) are put on the stack. Arrays have other issues such as the fact that they are passed by value (copied) between functions. Anything on the stack is "not garbage" since it will be freed when the function returns. Any pointer or slice that may escape the function is put on the heap which the garbage collector must deal with at some point.
The best thing you can do is avoid allocation. When you are done with large bits of data which you don't need, reuse them. This is the method used in the profiling tutorial on the Go blog. I suggest reading it.
Another example besides the one in the profiling tutorial: Lets say you have an slice of type []int named xs. You continually append to the []int until you reach a condition and then you reset it so you can start over. If you do xs = nil, you are now declaring the underlying array of the slice as garbage to be collected. Append will then reallocate xs the next time you use it. If instead you do xs = xs[:0], you are still resetting it but keeping the old array.
For the most part, trying to avoid creating garbage is premature optimization. For most of your code it does not matter. But you may find every once in a while a function which is called a great many times that allocates a lot each time it is run. Or a loop where you reallocate instead of reusing. I would wait until you see the bottle neck before going overboard.

Related

Why are string literals &str instead of String in Rust?

I'm just asking why Rust decided to use &str for string literals instead of String. Isn't it possible for Rust to just automatically convert a string literal to a String and put it on the heap instead of putting it into the stack?
To understand the reasoning, consider that Rust wants to be a systems programming language. In general, this means that it needs to be (among other things) (a) as efficient as possible and (b) give the programmer full control over allocations and deallocations of heap memory. One use case for Rust is for embedded programming where memory is very limited.
Therefore, Rust does not want to allocate heap memory where this is not strictly necessary. String literals are known at compile time and can be written into the ro.data section of an executable/library, so they don't consume stack or heap space.
Now, given that Rust does not want to allocate the values on the heap, it is basically forced to treat string literals as &str: Strings own their values and can be moved and dropped, but how do you drop a value that is in ro.data? You can't really do that, so &str is the perfect fit.
Furthermore, treating string literals as &str (or, more accurately &'static str) has all the advantages and none of the disadvantages. They can be used in multiple places, can be shared without worrying about using heap memory and never have to be deleted. Also, they can be converted to owned Strings at will, so having them available as String is always possible, but you only pay the cost when you need to.
To create a String, you have to:
reserve a place on the heap (allocate), and
copy the desired content from a read-only location to the freshly allocated area.
If a string literal like "foo" did both, every string would effectively be allocated twice: once inside the executable as the read-only string, and the other time on the heap. You simply couldn't just refer to the original read-only data stored in the executable.
&str literals give you access to the most efficient string data: the one present in the executable image on startup, put there by the compiler along with the instructions that make up the program. The data it points to is not stored on the stack, what is stack-allocated is just the pointer/size pair, as is the case with any Rust slice.
Making "foo" desugar into what is now spelled "foo".to_owned() would make it slower and less space-efficient, and would likely require another syntax to get a non-allocating &str. After all, you don't want x == "foo" to allocate a string just to throw it away immediately. Languages like Python alleviate this by making their strings immutable, which allows them to cache strings mentioned in the source code. In Rust mutating String is often the whole point of creating it, so that strategy wouldn't work.

Is the only reason to use drain(..) to preserve the memory allocation of the original structure?

Rust has a feature to drain an entire sequence,
If you do need to drain the entire sequence, use the full range, .., as the argument. - Programming Rust
Why would you ever need to drain the entire sequence? I can see this documented, but I don't see any use cases for this,
let mut drain = vec.drain(..);
If draining does not take ownership but clears the original structure, what's the point of not taking ownership? I thought the point of a mutable reference was because the "book was borrowed" and that you could give it back. If the original structure is cleared why not "own" the book? Why would you want to only borrow something and destroy it? It makes sense to want to borrow a subset of a vector, and clearing that subset -- but I can't seem to wrap my head around wanting to borrow the entire thing clearing the original structure.
I think you are approaching this question from the wrong direction.
Having made a decision that you would like to have a drain method that takes a RangeBounds, you then need to consider the pros and cons of disallowing an unbounded RangeBounds.
Pros
If you disallowed an unbounded range, there would be less confusion about whether to use drain(..) vs into_iter(), although noting that these two are not exactly identical.
Cons
You would actually have to go out of your way to disallow an unbounded range.
Ideally, you would want the use of an unbounded range to cause a compilation error. I'm new to Rust so I am not certain of this, but as far as I know there is no way to express that drain should take a generic that implements trait RangeBounds as long as it is not a RangeFull.
If it could not be checked at compile time, what behavior would you want at runtime? A panic would seem to be the only option.
As observed in the comments and in the proposed duplicate, after completely draining it, the Vec will have a length of 0 but a capacity the same as it did before calling drain. By allowing an unbounded range with drain you are making it easier to avoid a repeated memory allocation in some use cases.
To me at least, the cons outweigh the pros.

Any way to manually indicate element of a MutableArray# safe to GC?

In my application I'm working with MutableArrays (via the primitive package) shared across threads. I know when individual elements are no longer used and I'd like some way (unsafeMarkGarbage or something) to indicate to the runtime that they can be collected. At least I'd like to experiment with that if such a function or equivalent technique exists.
EDIT, to add a bit more detail: I've got a conceptual "infinite tape" implemented as a linked list of short MutableArray segments, something like:
data Seg a = Seg (MutableArray a) (IORef (Maybe (Seg a)))
I access the tape using a concurrent counter and always know when an element of the tape will no longer be accessed. In certain cases when a thread is descheduled it's possible that entire array segments (both the array and its elements) which could have been GC'd will stick around as their references will persist.
An ideal solution would avoid an additional write (maybe that's silly), avoid another layer of indirection in the array, and allow entire MutableArrays to be collected when all their elements expire.
Weak references do seem to be the most promising sort of mechanism I've seen, but I can't yet see how they can help me here.
I would suggest you store undefined in the positions that you would like to garbage collect.

Proper way to release string for garbage collection after slicing

According to this Go Data Structures article, under the Strings section it states that taking a slice of a string will keep the original string in memory.
"(As an aside, there is a well-known gotcha in Java and other languages that when you slice a string to save a small piece, the reference to the original keeps the entire original string in memory even though only a small amount is still needed. Go has this gotcha too. The alternative, which we tried and rejected, is to make string slicing so expensive—an allocation and a copy—that most programs avoid it.)"
So if we have a very long string:
s := "Some very long string..."
And we take a small slice:
newS := s[5:9]
The original s will not be released until we also release newS. Considering this, what is the proper approach to take if we need to keep newS long term, but release s for garbage collection?
I thought maybe this:
newS := string([]byte(s[5:9]))
But I wasn't certain if that would actually work, or if there's a better way.
Yes, converting to a slice of bytes will create a copy of the string, so the original one is not referenced anymore, and can be GCed somewhere down the line.
As a "proof" of this (well, it proves that the slice of bytes doesn't share the same underlying data as the original string):
http://play.golang.org/p/pwGrlETibj
Edit: and proof that the slice of bytes only has the necessary length and capacity (in other words, it doesn't have a capacity equal to that of the original string):
http://play.golang.org/p/3pwZtCgtWv
Edit2: And you can clearly see what happens with the memory profiling. In reuseString(), the memory used is very stable. In copyString(), it grows fast, showing the copies of the string done by the []byte conversion.
http://play.golang.org/p/kDRjePCkXq
The proper way to ensure a string might eventually get eligible for garbage collection after slicing it and keeping the slice "live", is to create a copy of the slice and keeping "live" the copy instead. But now one is buying better memory performance at the cost of worsened time performance. Might be good somewhere, but might be evil elsewhere. Sometimes only proper measurements, not guessing, will tell where the real gain is.
I'm, for example, using StrPack, when I prefer a bit of evilness ;-)

Creating strings in D without allocating memory?

Is there any typesafe way to create a string in D, using information only available at runtime, without allocating memory?
A simple example of what I might want to do:
void renderText(string text) { ... }
void renderScore(int score)
{
char[16] text;
int n = sprintf(text.ptr, "Score: %d", score);
renderText(text[0..n]); // ERROR
}
Using this, you'd get an error because the slice of text is not immutable, and is therefore not a string (i.e. immutable(char)[])
I can only think of three ways around this:
Cast the slice to a string. It works, but is ugly.
Allocate a new string using the slice. This works, but I'd rather not have to allocate memory.
Change renderText to take a const(char)[]. This works here, but (a) it's ugly, and (b) many functions in Phobos require string, so if I want to use those in the same manner then this doesn't work.
None of these are particularly nice. Am I missing something? How does everyone else get around this problem?
You have static array of char. You want to pass it to a function that takes immutable(char)[]. The only way to do that without any allocation is to cast. Think about it. What you want is one type to act like it's another. That's what casting does. You could choose to use assumeUnique to do it, since that does exactly the cast that you're looking for, but whether that really gains you anything is debatable. Its main purpose is to document that what you're doing by the cast is to make the value being cast be treated as immutable and that there are no other references to it. Looking at your example, that's essentially true, since it's the last thing in the function, but whether you want to do that in general is up to you. Given that it's a static array which risks memory problems if you screw up and you pass it to a function that allows a reference to it to leak, I'm not sure that assumeUnique is the best choice. But again, it's up to you.
Regardless, if you're doing a cast (be it explicitly or with assumeUnique), you need to be certain that the function that you're passing it to is not going to leak references to the data that you're passing to it. If it does, then you're asking for trouble.
The other solution, of course, is to change the function so that it takes const(char)[], but that still runs the risk of leaking references to the data that you're passing in. So, you still need to be certain of what the function is actually going to do. If it's pure, doesn't return const(char)[] (or anything that could contain a const(char)[]), and there's no way that it could leak through any of the function's other arguments, then you're safe, but if any of those aren't true, then you're going to have to be careful. So, ultimately, I believe that all that using const(char)[] instead of casting to string really buys you is that you don't have to cast. That's still better, since it avoids the risk of screwing up the cast (and it's just better in general to avoid casting when you can), but you still have all of the same things to worry about with regards to escaping references.
Of course, that also requires that you be able to change the function to have the signature that you want. If you can't do that, then you're going to have to cast. I believe that at this point, most of Phobos' string-based functions have been changed so that they're templated on the string type. So, this should be less of a problem now with Phobos than it used to be. Some functions (in particular, those in std.file), still need to be templatized, but ultimately, functions in Phobos that require string specifically should be fairly rare and will have a good reason for requiring it.
Ultimately however, the problem is that you're trying to treat a static array as if it were a dynamic array, and while D definitely lets you do that, you're taking a definite risk in doing so, and you need to be certain that the functions that you're using don't leak any references to the local data that you're passing to them.
Check out assumeUnique from std.exception Jonathan's answer.
No, you cannot create a string without allocation. Did you mean access? To avoid allocation, you have to either use slice or pointer to access a previously created string. Not sure about cast though, it may or may not allocate new memory space for the new string.
One way to get around this would be to copy the mutable chars into a new immutable version then slice that:
void renderScore(int score)
{
char[16] text;
int n = sprintf(text.ptr, "Score: %d", score);
immutable(char)[16] itext = text;
renderText(itext[0..n]);
}
However:
DMD currently doesn't allow this due to a bug.
You're creating an unnecessary copy (better than a GC allocation, but still not great).

Resources