If we need FIFO or LIFO collections in Rust (with basically push, pop, and front/back), what should we use? Something like std::queue or std::stack from C++.
First of all, Rust does not offer (in the Standard library) any collection with guaranteed latency for adding elements: Rust collections may generally allocate memory when adding new elements, and allocating memory may take an unbounded amount of time in the worst case.
That being said, there are two contenders for each case:
a stack may be implemented on top of either Vec or LinkedList (Vec offers push and pop; LinkedList offers push_back and pop_back)
a queue may be implemented on top of either VecDeque or LinkedList (both feature push_back and pop_front)
The difference between Vec/VecDeque and LinkedList is that the latter is simplistic: each call to push_back makes a memory allocation. On the one hand, this is great because the cost of push_back is then independent of the number of elements already in the collection; on the other hand... well, a memory allocation may take a really long time.
Vec and VecDeque are a bit more complicated:
it has better throughput, thanks to being more cache-friendly
it has additional capacity, guaranteeing non-allocating push_back as long as there is excess capacity
it still maintains amortized O(1) push_back even when not reserving excess capacity ahead of time
In general, I would advise to use Vec for a stack and VecDeque for a queue.
Both VecDeque and LinkedList have push/pop_front/back.
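For illustration, a minimal sketch of both, using only the standard library:

use std::collections::VecDeque;

fn main() {
    // stack (LIFO) on top of Vec
    let mut stack = Vec::new();
    stack.push(1);
    stack.push(2);
    assert_eq!(stack.pop(), Some(2)); // last in, first out

    // queue (FIFO) on top of VecDeque
    let mut queue = VecDeque::new();
    queue.push_back(1);
    queue.push_back(2);
    assert_eq!(queue.pop_front(), Some(1)); // first in, first out
}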
Matthieu M. has it just about perfect. Vec is your stack (LIFO) and VecDeque is a double-ended queue that supports all 4 variants (FIFO, FILO, LIFO, and LILO) using:
.push_front(x) | .front() | .pop_front()
.push_back(x) | .back() | .pop_back()
If you're looking to maximize your efficiency, I recommend checking out "Understanding Rust's Vec and its capacity for fast and efficient programs". It goes into a lot more detail about how allocation and reallocation occur in Vec and VecDeque, but the biggest takeaway is that if you can predict the maximum number of elements you're going to need in the queue, you can use VecDeque::with_capacity(x) if you know it when you initialize the queue, or .reserve_exact(x) if at some point you know exactly how many more slots you're going to need.
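For example (the capacity numbers here are made up; with_capacity and reserve_exact are the standard-library methods):

use std::collections::VecDeque;

fn main() {
    // allocate room for 1024 elements up front...
    let mut queue: VecDeque<u32> = VecDeque::with_capacity(1024);

    // ...so these pushes never allocate
    for i in 0..1024 {
        queue.push_back(i);
    }

    // later, if you learn you need exactly 256 more slots:
    queue.reserve_exact(256);
}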
I strongly recommend checking out the Rust docs on std::collections; they have an excellent list of the most common collections used in Rust, along with suggestions on when to pick each.
One last thing, VecDeque isn't part of the default prelude in Rust so if you want to use it you need to include this:
use std::collections::VecDeque;
For some HPC work I need to port a wait-free trie (but you can think of it as a tree) from C to Rust. The trie is 99% read, 1% write, and is used both in parallel-concurrent scenarios and in concurrent-only scenarios (single thread with multiple coroutines). The size of the tree is usually between 100 KB and 8 MB.
Basically the tree is like:
pub struct WFTrie {
    nodes: Vec<Node>,
    leaves: Vec<Leaf>,
    updates: Vec<Update>,
    update_pos: AtomicUsize,
    update_cap: usize,
}
//....
let mut wftrie = WFTrie::new();
let wftrie_ptr = AtomicPtr::new(&mut wftrie);
//....
As you can see, the trie already uses something very similar to the arena approach that many suggest: the nodes live in a Vec and are referred to by index rather than by pointer.
When I want to update, I do a fetch_add on update_pos. If it's greater than update_cap I return an error (size exhausted); otherwise I'm sure my coroutine/thread has exclusive access to updates[update_pos % update_cap], where I can put my update.
Every X updates (if update_pos % 8 == 0 { /* update tree */ }), a coroutine will clone the tree, apply all pending updates, and then compare_and_swap the wftrie_ptr.
When I want to read I do an atomic load on wftrie_ptr and I can access the tree, taking care to also consider the pending updates.
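In Rust-ish pseudocode, the slot-claiming step would look roughly like this (the Ordering choice, function name, and error value are placeholders, not part of the original design):

use std::sync::atomic::Ordering;

fn try_claim_slot(wftrie: &WFTrie) -> Result<usize, &'static str> {
    // claim a slot index atomically
    let pos = wftrie.update_pos.fetch_add(1, Ordering::AcqRel);
    if pos >= wftrie.update_cap {
        return Err("size exhausted");
    }
    // exclusive access to updates[pos % update_cap] -- though writing
    // through &WFTrie needs interior mutability, as the answer below explains
    Ok(pos % wftrie.update_cap)
}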
My questions are:
If I have multiple coroutines having an immutable reference, how can I have one coroutine doing updates? What's the most idiomatic way to translate this to rust?
If a coroutine is still holding a ref to the old tree during an update what happens? Should I replace the AtomicPtr with Arc?
Does this design fit well with the Rust borrow checker or am I going to have to use unsafe?
In the case of concurrent only scenario, can I drop atomics without using unsafe?
I'm not particularly informed about HPC, but I hope I can give you some useful pointers about programming with concurrency in Rust.
… I'm sure my coroutine/thread has exclusive access to updates[update_pos], where I can put my update
This is necessary, but not sufficient: if you are going to write to data behind an & reference (whether from multiple threads or not), then you are implementing interior mutability, and you must always signal that to the compiler by using some cell type. The minimal at-your-own-risk way to do that is with an UnsafeCell, which provides no synchronization and is unsafe to operate on:
updates: Vec<UnsafeCell<Update>>,
but you can also use a safer type such as a lock (that you know will never experience any contention since you arranged that no other thread is using it). In Rust, all locks and other interior mutability primitives — including atomics — are built on top of UnsafeCell; it is how you tell the compiler “this memory is exempt from the usual rule of N readers xor 1 writer”.
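As a sketch of the safer option (using Mutex purely as a compiler-visible interior-mutability wrapper; the Update type is from your code, and by construction each lock is uncontended):

use std::sync::Mutex;

struct Update { /* ... update payload ... */ }

struct Updates {
    slots: Vec<Mutex<Update>>, // each slot writable through &self
}

impl Updates {
    fn write(&self, pos: usize, u: Update) {
        // uncontended by construction: fetch_add handed out this index
        *self.slots[pos].lock().unwrap() = u;
    }
}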
Every X updates … a coroutine will clone the tree …
You will have to arrange so that the clone isn't trying to read the data that is being modified. That is, don't clone the WFTrie, but only the nodes and leaves vectors since that's the data you actually need.
If you were to use UnsafeCell or a lock as I mentioned above, then you'll find that you can't just clone the updates vector, precisely because it wouldn't be sound to do so without explicit locking. If necessary, you can read it step-by-step in some way that agrees with your concurrency requirements.
If I have multiple coroutines having an immutable reference, how can I have one coroutine doing updates? What's the most idiomatic way to translate this to rust?
I'm not aware of a specifically Rust-y way to do this. It sounds like you could manage it with just an AtomicBool, but you probably know more than I do here.
If a coroutine is still holding a ref to the old tree during an update what happens? Should I replace the AtomicPtr with Arc?
Yes, this is a problem. Note that AtomicPtr is unsafe to read, because it does not guarantee the pointed-to thing is alive. You need a way to use something like Arc and atomically swap which specific Arc is in the wftrie_ptr location; the arc-swap library provides that.
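A rough sketch of what that might look like with arc-swap (WFTrie::new and build_new_tree stand in for your construction and update steps):

use std::sync::Arc;
use arc_swap::ArcSwap;

let current = ArcSwap::from_pointee(WFTrie::new());

// reader: a cheap atomic load; the returned guard keeps the
// old tree alive even if a writer swaps it out meanwhile
let tree = current.load();

// writer: clone, apply pending updates, then publish atomically
let new_tree = build_new_tree(&tree);
current.store(Arc::new(new_tree));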
Does this design fit well with the Rust borrow checker or am I going to have to use unsafe?
From a perspective of using the data structure, it seems fine — it will not be any more inconvenient to read or write than any other data structure held behind Arc.
For implementation, you will not be having very much “fight with the borrow checker” because you are not going to be writing functions with very many borrows — since your data structure owns its contents. But, insofar as you are writing your own concurrency primitives with unsafe, you will need to be careful to obey all the memory rules anyway. Remember that unsafe is not a license to disregard the rules; it's telling the compiler “I'm going to obey the rules but in a way you can't check”.
In the case of concurrent only scenario, can I drop atomics without using unsafe?
Yes; in a no-threads scenario, you can use Cell for all purposes that atomics would serve.
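For example, a minimal sketch of that substitution:

use std::cell::Cell;

// single-threaded stand-in for AtomicUsize
let update_pos = Cell::new(0usize);

// the equivalent of fetch_add, without atomics
let pos = update_pos.get();
update_pos.set(pos + 1);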
Rust has a feature to drain an entire sequence:
If you do need to drain the entire sequence, use the full range, .., as the argument. - Programming Rust
Why would you ever need to drain the entire sequence? I can see this documented, but I don't see any use cases for it:
let mut drain = vec.drain(..);
If draining does not take ownership but clears the original structure, what's the point of not taking ownership? I thought the point of a mutable reference was that the "book was borrowed" and you could give it back. If the original structure is cleared, why not "own" the book? Why would you want to only borrow something and destroy its contents? It makes sense to want to borrow a subset of a vector and clear that subset, but I can't wrap my head around wanting to borrow the entire thing and clearing the original structure.
I think you are approaching this question from the wrong direction.
Having made a decision that you would like to have a drain method that takes a RangeBounds, you then need to consider the pros and cons of disallowing an unbounded RangeBounds.
Pros
If you disallowed an unbounded range, there would be less confusion about whether to use drain(..) vs into_iter(), though note that these two are not exactly identical.
Cons
You would actually have to go out of your way to disallow an unbounded range.
Ideally, you would want the use of an unbounded range to cause a compilation error. I'm new to Rust so I am not certain of this, but as far as I know there is no way to express that drain should take a generic that implements trait RangeBounds as long as it is not a RangeFull.
If it could not be checked at compile time, what behavior would you want at runtime? A panic would seem to be the only option.
As observed in the comments and in the proposed duplicate, after completely draining it, the Vec will have a length of 0 but a capacity the same as it did before calling drain. By allowing an unbounded range with drain you are making it easier to avoid a repeated memory allocation in some use cases.
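A quick demonstration of that capacity-preserving behavior:

fn main() {
    let mut v: Vec<i32> = (0..1024).collect();
    let cap = v.capacity();

    v.drain(..); // removes every element but keeps the allocation

    assert_eq!(v.len(), 0);
    assert_eq!(v.capacity(), cap); // refilling won't reallocate yet
}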
To me at least, the cons outweigh the pros.
Summary of question and answers
Objects of a particular type, say
type Foo
    a::A
    b::B
end
can be stored in either of two ways:
Inlined (aka by value): in this case, the statement "variable foo::Foo is stored at location x" effectively means we have a variable foo.a::A at location x and a variable foo.b::B at location x + sizeof(A) (technically the addresses could be a bit more complicated, but that's irrelevant for our purposes).
Referenced (aka by reference): "foo::Foo is stored at location x" means the location x contains a pointer fooptr::Ptr{Foo} such that there is a variable foo.a::A at location fooptr and foo.b::B at location fooptr + sizeof(A).
Unlike other languages (I'm looking at you, C/C++), Julia decides by itself whether to store variables inlined or referenced, and it does so based on the properties of the type:
mutable types -> referenced,
immutable types -> referenced if at least one of their fields is referenced, inlined otherwise.
There are at least two reasons for this rule:
StefanKarpinski's answer: The garbage collector needs to be able to find all pointers to heap-allocated objects on the stack. Currently, Julia ensures this by storing all such pointers on a separate "shadow stack", but if we allowed composite types containing pointers to be placed on the stack then such a neat separation would no longer be possible. Instead, the compiler would need to look for pointers among other variables, which poses technical difficulties.
yuyichao's answer: Julia requires the inline/reference decision to be made on a per-type rather than per-object basis, which means a hypothetical type
immutable A
    a::A
end
would have to be infinitely big if we insisted on inlining it. So we would either have to forbid such recursive immutable types, or we could at most allow non-recursive immutable types to be inlined.
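For a concrete illustration (written in the same pre-1.0 syntax as the question; isbits reports whether a type is stored inline):

immutable Point    # immutable with no referenced fields
    x::Int
    y::Int
end

type MutPoint      # mutable
    x::Int
    y::Int
end

isbits(Point)      # true  -> stored inline
isbits(MutPoint)   # false -> stored by reference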
Original question
My understanding of memory management in Julia is:
mutable types -> heap-allocated,
immutable types and tuples -> stack-allocated unless one of their fields is heap-allocated (i.e. mutable).
I don't quite understand the rationale for this behaviour, however. I've read somewhere that the problem with stack-allocating immutables with pointers to mutables is that then the garbage collector might consider the mutables unreachable and destroy them prematurely. On the other hand, if we place the immutable on the heap then there will still be a pointer to the mutables, so it might seem like we avoided the problem, but actually we just shifted it to making sure that now the immutable itself will not be destroyed.
Can anyone explain this to me who has only very superficial knowledge of how garbage collection works?
The problem with stack allocation of objects which reference other objects is knowing that they need to be traced during garbage collection. The simplest way to do this is what Julia does: heap-allocate the objects and "root" them using a "shadow stack" which is pushed and popped in sync with the actual stack. This introduces a fair bit of overhead and forces these objects to be heap-allocated.
A more sophisticated approach that avoids the overhead of a shadow stack and heap allocation is to stack-allocate these objects and then scan the stack while doing garbage collection, following references from objects on the stack to objects on the heap. However, this requires knowing which objects on the stack are pointers to objects on the heap; in general, non-heap-allocated objects are not guaranteed to be kept intact or contiguous in registers or on the stack. One approach to doing this is called "conservative stack scanning", which entails assuming during GC that any value on the stack which looks like it could be a pointer to an object on the heap actually is one. That approach has been used successfully in applications like Safari's JavaScript engine, but it's not without its challenges. We've contemplated using conservative stack scanning in Julia, and an initial effort to do so was started but never completed.
References:
https://github.com/JuliaLang/julia/issues/11714
https://github.com/JuliaLang/julia/pull/8134
There are multiple issues/concepts that are frequently mixed together whenever this is brought up.
mutable or non-pointerfree immutable doesn't necessarily mean heap allocation; we already have optimization passes to elide some of the allocations and are working on improving them further.
The object layout ABI is a user-visible behavior and not something an optimization pass can easily change (unless it can prove that the object it wants to optimize locally does not escape). The current ABI is that only isbits immutables are stored inline (and "stack allocated" when used as local variables). There's a fundamental limitation to lifting the requirement of pointerfree-ness for inlined objects, i.e. the necessity to handle recursive types. It is impossible to have all types in a reference cycle stored inline; the cycle has to be broken somewhere if we want to inline some of them. I believe we do have a consistent and predictable model for doing this, though whether it is desirable is another issue.
This is somewhat related to performance, but not always. Storing inline means more copying, so it's hard to make sure there's no regression if we make the switch.
Edit: And I should also mention that pointer-free is a sufficient condition for being cycle-free and is easier to compute, which is partly why we currently use it to break inlining cycles.
GC support. This is basically the easiest part. It's very easy to make GC recognize pointers on the stack. It just needs to be done if we decide to change the object layout ABI.
Edit: And I should add that "GC support" is needed because we currently only support a limited / simple stack layout for object reference (i.e. an array of pointers). It's this that needs to be improved.
Sometimes you may want to avoid or minimize garbage collection, so I want to be sure about how to do it.
I think the following are correct:
Declare variables at the beginning of the function.
Use an array instead of a slice.
Any more?
To minimize garbage collection in Go, you must minimize heap allocations. To minimize heap allocations, you must understand when allocations happen.
The following things always cause allocations (at least in the gc compiler as of Go 1):
Using the new built-in function
Using the make built-in function (except in a few unlikely corner cases)
Composite literals when the value type is a slice, map, or a struct with the & operator
Putting a value larger than a machine word into an interface. (For example, strings, slices, and some structs are larger than a machine word.)
Converting between string, []byte, and []rune
As of Go 1.3, the compiler special cases this expression to not allocate: m[string(b)], where m is a map and b is a []byte
Converting a non-constant integer value to a string
defer statements
go statements
Function literals that capture local variables
The following things can cause allocations, depending on the details:
Taking the address of a variable. Note that addresses can be taken implicitly. For example a.b() might take the address of a if a isn't a pointer and the b method has a pointer receiver type.
Using the append built-in function
Calling a variadic function or method
Slicing an array
Adding an element to a map
The list is intended to be complete and I'm reasonably confident in it, but am happy to consider additions or corrections.
If you're uncertain of where your allocations are happening, you can always profile as others suggested or look at the assembly produced by the compiler.
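As a hedged sketch of one way to measure (testing.AllocsPerRun is in the standard library; the integer-to-string conversion is just one item from the list above):

package main

import (
	"fmt"
	"strconv"
	"testing"
)

func main() {
	n := 12345
	// average number of heap allocations performed by the closure
	allocs := testing.AllocsPerRun(100, func() {
		_ = strconv.Itoa(n) // converting a non-constant integer to a string
	})
	fmt.Println("allocations per run:", allocs)
}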
Avoiding garbage is relatively straightforward. You need to understand where the allocations are being made and see if you can avoid the allocation.
First, declaring variables at the beginning of a function will NOT help. The compiler does not know the difference. However, humans will know the difference, and it will annoy them.
Use of an array instead of a slice will work, but that is because arrays (unless dereferenced) are put on the stack. Arrays have other issues such as the fact that they are passed by value (copied) between functions. Anything on the stack is "not garbage" since it will be freed when the function returns. Any pointer or slice that may escape the function is put on the heap which the garbage collector must deal with at some point.
The best thing you can do is avoid allocation. When you are done with large bits of data which you don't need, reuse them. This is the method used in the profiling tutorial on the Go blog. I suggest reading it.
Another example besides the one in the profiling tutorial: let's say you have a slice of type []int named xs. You continually append to the []int until you reach a condition, and then you reset it so you can start over. If you do xs = nil, you are declaring the underlying array of the slice to be garbage to be collected, and append will reallocate xs the next time you use it. If instead you do xs = xs[:0], you are still resetting it but keeping the old array.
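In code, the difference looks like this:

xs = nil    // old backing array becomes garbage; the next
            // append must allocate a new one

xs = xs[:0] // length 0, same backing array; subsequent appends
            // reuse it until its capacity is exceeded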
For the most part, trying to avoid creating garbage is premature optimization. For most of your code it does not matter. But you may find every once in a while a function which is called a great many times and allocates a lot each time it runs, or a loop where you reallocate instead of reusing. I would wait until you see the bottleneck before going overboard.
When reasoning about runtime cost in a garbage collected language, what is the cost of a statement such as myList = null; in terms of 'n' (the number of elements in the list)? For sake of argument, consider the list to be a singly linked list of reference types with no finalisation required.
More generally, I'm looking for any information on how runtime cost can be analysed in a language with GC.
My own thought is that the cost is likely to be either O(1) or O(n) depending on the collector implementation. In a mark-and-sweep collector the unreachable objects simply won't be reached, so I could imagine there being no cost associated with clearing them. (In fact there is an ongoing cost to simply keeping objects alive, presumably amortised by using generations.) Conversely, in a simple reference-counting collector I could easily imagine it costing O(n) to do the cleanup...
It's not obvious to me how to reason about this when designing algorithms.
I suspect you can assume the amortized cost of that statement to be O(1). In practice, that is what I do, and I have never found a situation that made me suspect this assumption to be incorrect.