Let's say I have a procedure getTuple(): (int, int, int). How do I iterate over the returned tuple? It doesn't look like items is defined for tuple, so I can't do for i in getTuple().
I initially had this returning a sequence, which proved to be a bottleneck. Can I get this to work without affecting performance? I guess as a last resort I could unroll my loop, but is there any way around that?
OK, I figured this out. You can iterate over the tuple's fields:
let t = (2,4,6)
for x in t.fields:
echo(x)
I initially had this returning a sequence, which proved to be a bottleneck. Can I get this to work without affecting performance?
Use an array[N, int] instead, it has no indirection. Why was the seq not performant enough? You might want to allocate it to correct size with newSeq[int](size) initially.
Related
I am kind of new using data structures in haskell besides of Lists. My goal is to chose one container among Data.Vector, Data.Sequence, Data.List, etc ... My problem is the following:
I have to create a sequence (mathematically speaking). The sequence starts at 0. In each iteration two new elements are generated but only one should be appended based in whether the first element is already in the sequence. So in each iteration there is a call to elem function (see the pseudo-code below).
appendNewItem :: [Integer] -> [Integer]
appendNewItem acc = let firstElem = someFunc
secondElem = someOtherFunc
newElem = if firstElem `elem` acc
then secondElem
else firstElem
in acc `append` newElem
sequenceUptoN :: Int -> [Integer]
sequenceUptoN n = (iterate appendNewItem [0]) !! n
Where append and iterate functions vary depending on which colection you use (I am using lists in the type signature for simplicity).
The question is: Which data structure should I use?. Is Data.Sequence faster for this task because of the Finger Tree inner structure?
Thanks a lot!!
No, sequences are not faster for searching. A Vector is just a flat chunk of memory, which gives generally the best lookup performance. If you want to optimise searching, use Data.Vector.Unboxed. (The normal, “boxed” variant is also pretty good, but it actually contains only references to the elements in the flat memory-chunk, so it's not quite as fast for lookups.)
However, because of the flat memory layout, Vectors are not good for (pure-functional) appending: basically, whenever you add a new element, the whole array must be copied so as to not invalidate the old one (which somebody else might still be using). If you need to append, Seq is a pretty good choice, although it's not as fast as destructive appending: for maximum peformance, you'll want to pre-allocate an uninitialized Data.Vector.Unboxed.Mutable.MVector of the required size, populate it using the ST monad, and freeze the result. But this is much more fiddly than purely-functional alternatives, so unless you need to squeeze out every bit of performance, Data.Sequence is the way to go. If you only want to append, but not look up elements, then a plain old list in reverse order would also do the trick.
I suggest using Data.Sequence in conjunction with Data.Set. The Sequence to hold the sequence of values and the Set to track the collection.
Sequence, List, and Vector are all structures for working with values where the position in the structure has primary importance when it comes to indexing. In lists we can manipulate elements at the front efficiently, in sequences we can manipulate elements based on the log of the distance the closest end, and in vectors we can access any element in constant time. Vectors however, are not that useful if the length keeps changing, so that rules out their use here.
However, you also need to lookup a certain value within the list, which these structures don't help with. You have to search the whole of a list/sequence/vector to be certain that a new value isn't present. Data.Map and Data.Set are two of the structures for which you define an index value based on Ord, and let you lookup/insert in log(n). So, at the cost of memory usage you can lookup the presence of firstElem in your Set in log(n) time and then add newElem to the end of the sequence in constant time. Just make sure to keep these two structures in synch when adding or taking new elements.
How does insert work in a Rust Vec? How efficient is it to insert an element at the start of a VERY large vector which has billions of elements?
The documentation lists the complexities for all the standard collections and operations:
Throughout the documentation, we will follow a few conventions. For
all operations, the collection's size is denoted by n. If another
collection is involved in the operation, it contains m elements.
Operations which have an amortized cost are suffixed with a *.
Operations with an expected cost are suffixed with a ~.
get(i) insert(i) remove(i) append split_off(i)
Vec O(1) O(n-i)* O(n-i) O(m)* O(n-i)
The documentation for Vec::insert explains details, emphasis mine:
Inserts an element at position index within the vector, shifting all elements after it to the right.
How efficient is it to insert an element at the start of a VERY large vector which has billions of elements?
A VERY bad idea, as everything needs to be moved. Perhaps a VecDeque would be better (or finding a different algorithm).
Found this question and need to add a thing.
It all depends on your usage. If you're inserting once, it's maybe worth to accept that O(n). If you then do millions of get requests with O(1).
Other datatypes maybe have better insertion time but have O(log(n)) or even O(N) for getting items.
Next thing is iteration where cache friendlyness comes into play for such large arrays, where Vector is perfect.
May advice: if you're inserting once and then do lot of requests, stay with Vec.
If inserting and removing is your main task, like a queue, go for something else.
I often found myself in some situation where I need sorted arrays and then go for something like Btreemap, or BTreeSet. I removed them completely and used a Vec now, where after adding all values, I do a sort and a dedup.
Suppose I have the following piece of code:
a = reverse b
doSomething a
Will memory for the list a be actually allocated, or will doSomething simply reuse the list b? If the memory is going to be allocated, is there a way to avoid it? Doubling memory usage just because I need a reversed list doesn't sound particularly nice.
In the worst case, both a and b will exist in memory in their entirety. Note that even then, the contents of the two lists will only exist once, shared between both lists, so we're only talking about the "spine" of the lists existing twice.
In the best case, depending on how b is defined and what doSomething does, the compiler might do some hoopy magic to turn the whole thing into a tight constant-space loop that generates the contents of the list as it processes them, possibly involving no memory allocation at all. Maybe.
But even in the very worst case, you're duplicating the spine of the lists. You'll never duplicate the actual elements in the list.
(Each cons node is, what, 3 pointers? I think...)
I want to check if an element exists in a list (a very big one in 10,000,000 order) in a O(1) instead of O(n). Lists with elem x ys take O(n)
So i want to use another data type/constructor but it has to be in Prelude(not Array); any suggestions? And if i have to build me data type what it would be like?
Also to sort a big list of numbers in the same order (10,000,000)and indexing an element in the shortest time possible.
The only way to search for an item in a data set in O(1) time is if you already know where it is, but then you don't need to search for it. For unsorted data, search is O(n) time. For sorted data, search is O(log n) time.
You should use either Bloom filter or Hashtable. Neither of them is in Prelude; moreover, both rely on Array to be available.
The only left option is some kind of tree; I would suggest heap. It’s not hard to implement and it also gives you sorting for free.
UPDATE: oops! I have forgotten that heap doesn’t provide lookup. BST is your choice, then.
Some times you could want to avoid/minimize the garbage collector, so I want to be sure about how to do it.
I think that the next one is correct:
Declare variables at the beginning of the function.
To use array instead of slice.
Any more?
To minimize garbage collection in Go, you must minimize heap allocations. To minimize heap allocations, you must understand when allocations happen.
The following things always cause allocations (at least in the gc compiler as of Go 1):
Using the new built-in function
Using the make built-in function (except in a few unlikely corner cases)
Composite literals when the value type is a slice, map, or a struct with the & operator
Putting a value larger than a machine word into an interface. (For example, strings, slices, and some structs are larger than a machine word.)
Converting between string, []byte, and []rune
As of Go 1.3, the compiler special cases this expression to not allocate: m[string(b)], where m is a map and b is a []byte
Converting a non-constant integer value to a string
defer statements
go statements
Function literals that capture local variables
The following things can cause allocations, depending on the details:
Taking the address of a variable. Note that addresses can be taken implicitly. For example a.b() might take the address of a if a isn't a pointer and the b method has a pointer receiver type.
Using the append built-in function
Calling a variadic function or method
Slicing an array
Adding an element to a map
The list is intended to be complete and I'm reasonably confident in it, but am happy to consider additions or corrections.
If you're uncertain of where your allocations are happening, you can always profile as others suggested or look at the assembly produced by the compiler.
Avoiding garbage is relatively straight forward. You need to understand where the allocations are being made and see if you can avoid the allocation.
First, declaring variables at the beginning of a function will NOT help. The compiler does not know the difference. However, human's will know the difference and it will annoy them.
Use of an array instead of a slice will work, but that is because arrays (unless dereferenced) are put on the stack. Arrays have other issues such as the fact that they are passed by value (copied) between functions. Anything on the stack is "not garbage" since it will be freed when the function returns. Any pointer or slice that may escape the function is put on the heap which the garbage collector must deal with at some point.
The best thing you can do is avoid allocation. When you are done with large bits of data which you don't need, reuse them. This is the method used in the profiling tutorial on the Go blog. I suggest reading it.
Another example besides the one in the profiling tutorial: Lets say you have an slice of type []int named xs. You continually append to the []int until you reach a condition and then you reset it so you can start over. If you do xs = nil, you are now declaring the underlying array of the slice as garbage to be collected. Append will then reallocate xs the next time you use it. If instead you do xs = xs[:0], you are still resetting it but keeping the old array.
For the most part, trying to avoid creating garbage is premature optimization. For most of your code it does not matter. But you may find every once in a while a function which is called a great many times that allocates a lot each time it is run. Or a loop where you reallocate instead of reusing. I would wait until you see the bottle neck before going overboard.