What's the complexity of inserting into a vector in Rust?

How does insert work in a Rust Vec? How efficient is it to insert an element at the start of a VERY large vector which has billions of elements?

The documentation lists the complexities for all the standard collections and operations:
Throughout the documentation, we will follow a few conventions. For
all operations, the collection's size is denoted by n. If another
collection is involved in the operation, it contains m elements.
Operations which have an amortized cost are suffixed with a *.
Operations with an expected cost are suffixed with a ~.
         get(i)   insert(i)   remove(i)   append   split_off(i)
    Vec  O(1)     O(n-i)*     O(n-i)      O(m)*    O(n-i)
The documentation for Vec::insert explains the details, emphasis mine:
Inserts an element at position index within the vector, *shifting all elements after it to the right*.
How efficient is it to insert an element at the start of a VERY large vector which has billions of elements?
A VERY bad idea: every one of those billions of elements has to be shifted one slot to the right. Perhaps a VecDeque, which supports O(1) insertion at the front, would be better (or finding a different algorithm).
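A minimal sketch of the difference (the sizes and element type here are just for illustration):

    use std::collections::VecDeque;

    fn main() {
        // Vec: inserting at index 0 shifts every existing element right, O(n).
        let mut v: Vec<u64> = (0..1_000_000).collect();
        v.insert(0, 42); // moves all one million elements

        // VecDeque is a ring buffer, so pushing onto the front is amortized O(1).
        let mut d: VecDeque<u64> = (0..1_000_000).collect();
        d.push_front(42); // no shifting required
        assert_eq!((v[0], d[0]), (42, 42));
    }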

Found this question and want to add one thing.
It all depends on your usage. If you insert once, it may be worth accepting that O(n), especially if you then do millions of O(1) get requests.
Other data types may have better insertion times, but O(log n) or even O(n) access to items.
The next thing is iteration, where cache friendliness comes into play for such large collections, and there Vec is perfect.
My advice: if you insert once and then do lots of reads, stay with Vec.
If inserting and removing is your main workload, like a queue, go for something else.
I often found myself needing sorted data and reached for something like BTreeMap or BTreeSet. I have since removed them completely and use a Vec instead: after adding all the values, I do one sort and a dedup.
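A short sketch of that sort-and-dedup pattern (the integer data is just a stand-in):

    fn main() {
        let mut v = vec![30u32, 10, 20, 10, 30];

        // Build once: sort, then drop consecutive duplicates.
        v.sort_unstable();
        v.dedup();
        assert_eq!(v, [10, 20, 30]);

        // From here on, membership tests are O(log n) via binary search,
        // with the cache friendliness of a contiguous array.
        assert!(v.binary_search(&20).is_ok());
        assert!(v.binary_search(&25).is_err());
    }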

Related

Concurrently check if string is in slice?

Generally, to check whether a string is in a slice, I write a function with a for loop and an if statement, but that is really inefficient for large slices of strings or structs. Is it possible to make this check concurrent?
Concurrent search over sequential data is usually not a great idea, simply because we already have binary search, which scales really well even for billions of records. All you have to do to use it is build an index on top of the slice you are searching. For the most trivial index, save the keys into another slice along with the position of the data they point to, then sort that slice by the string keys, and the index is done.
Perform a binary search on the index you just created, and you have O(log n) lookups.
Another, much simpler option is to create a map[string]int and insert all the keys along with their indexes, then find the index inside the map, which is O(1) on average.
The important thing to note is that if you only have to perform one search on a given slice, none of this is worth it: building the index is heavier than a single linear search.
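A small sketch of both options in Go (the data and variable names are invented for illustration):

    package main

    import (
        "fmt"
        "sort"
    )

    func main() {
        data := []string{"pear", "apple", "cherry"}

        // Option 1: a sorted copy of the keys, built once, searched in O(log n).
        index := append([]string(nil), data...)
        sort.Strings(index)
        i := sort.SearchStrings(index, "cherry")
        fmt.Println(i < len(index) && index[i] == "cherry") // true

        // Option 2: a map from key to its position in the original slice,
        // O(1) lookups on average.
        byKey := make(map[string]int, len(data))
        for pos, s := range data {
            byKey[s] = pos
        }
        _, ok := byKey["apple"]
        fmt.Println(ok) // true
    }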

Haskell alternative for Doubly-linked-list coupled with Hash-table pattern

There's a useful pattern in imperative programming, namely, a doubly-linked-list coupled with a hash-table for constant time lookup in the linked list.
One application of this pattern is in LRU cache. The head of the doubly-linked-list will contain the least recently used entry in the cache and the last element in the doubly-linked-list will contain the most recently used entry. The keys in the hash-table are keys of the entries and the values are pointers to nodes in the linked-list corresponding to the key/entry. When an entry is queried in the cache, hash-table will be used to point to its node in the linked-list and then the node will be removed from its current location in the linked-list and be placed at the end of the linked-list making it the most-recently-used entry. For eviction, we simply remove entries from the head of the linked-list as they are the least recently used ones. Both lookup and eviction operations will take constant time.
I can think of implementing this in Haskell using two TreeMaps, and I know the time complexity would be O(log n). But I am a little uncomfortable, as the constant factor seems high. Specifically, to perform a lookup, I first need to check whether the entry exists and save its value, then delete it from the LRU map and re-insert it with a new key. This means each lookup results in three root-to-node traversals.
Is there a better way of doing this in Haskell?
As the comments indicate, mutable vectors are perfectly acceptable when required. However, I think there's an issue with the way you've stated the question: unless the idea is to duplicate the imperative code "as closely as possible" (without mutable structures), why bother having two tree maps? A single priority search queue (see the packages pqueue or PSQueue) would be an appropriate structure while maintaining purity. It efficiently supports both priorities (for eviction) and search (for looking up your cached key).
On a related note, some structures support e.g. Data.Map's alterF, which effectively provides you with a continuation allowing you to "do something else" depending on the Maybe value at a key, while "remembering" where you are, thus avoiding the full cost of re-traversing the structure to modify it at that key afterwards. See also the at lens.
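A minimal pure sketch of the idea, assuming the Data.OrdPSQ type from the psqueues package (a close relative of the PSQueue package mentioned above); the LRU wrapper and its names are invented for illustration:

    -- An LRU where the priority is an increasing "tick": the minimum
    -- priority is always the least recently used entry.
    import qualified Data.OrdPSQ as PSQ

    data LRU k v = LRU
      { capacity :: Int
      , tick     :: Int                 -- next timestamp to hand out
      , queue    :: PSQ.OrdPSQ k Int v  -- key -> (last-used tick, value)
      }

    emptyLRU :: Int -> LRU k v
    emptyLRU cap = LRU cap 0 PSQ.empty

    -- A hit bumps the entry's priority to "now", making it most recent.
    lookupLRU :: Ord k => k -> LRU k v -> Maybe (v, LRU k v)
    lookupLRU k (LRU cap t q) = case PSQ.lookup k q of
      Nothing     -> Nothing
      Just (_, v) -> Just (v, LRU cap (t + 1) (PSQ.insert k t v q))

    -- Insertion evicts the minimum-priority (least recently used)
    -- binding once the cache is full and the key is new.
    insertLRU :: Ord k => k -> v -> LRU k v -> LRU k v
    insertLRU k v (LRU cap t q) = LRU cap (t + 1) (PSQ.insert k t v q')
      where
        q' | not (PSQ.member k q)
           , PSQ.size q >= cap
           , Just (_, _, _, rest) <- PSQ.minView q = rest
           | otherwise                             = q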

finding element in very big list in less than O(n)

I want to check whether an element exists in a list (a very big one, on the order of 10,000,000 elements) in O(1) instead of O(n). elem x ys on a list takes O(n).
So I want to use another data type, but it has to be in the Prelude (not Array); any suggestions? And if I have to build my own data type, what should it look like?
I also need to sort a big list of numbers of the same size (10,000,000) and index into it in the shortest time possible.
The only way to search for an item in a data set in O(1) time is if you already know where it is, but then you don't need to search for it. For unsorted data, search is O(n) time. For sorted data, search is O(log n) time.
You should use either a Bloom filter or a hash table. Neither of them is in the Prelude; moreover, both rely on Array to be available.
The only remaining option is some kind of tree; I would suggest a heap. It's not hard to implement, and it also gives you sorting for free.
UPDATE: oops! I had forgotten that a heap doesn't provide lookup. A BST is your choice, then.
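A minimal sketch of such a BST using nothing outside the Prelude (the Tree type and function names are invented; it is unbalanced, so lookups degrade to O(n) on sorted input, and real code should prefer Data.Set):

    -- An unbalanced binary search tree built from Prelude parts only.
    data Tree a = Leaf | Node (Tree a) a (Tree a)

    insert :: Ord a => a -> Tree a -> Tree a
    insert x Leaf = Node Leaf x Leaf
    insert x t@(Node l y r)
      | x < y     = Node (insert x l) y r
      | x > y     = Node l y (insert x r)
      | otherwise = t  -- already present

    member :: Ord a => a -> Tree a -> Bool
    member _ Leaf = False
    member x (Node l y r)
      | x < y     = member x l
      | x > y     = member x r
      | otherwise = True

    fromList :: Ord a => [a] -> Tree a
    fromList = foldr insert Leaf

    -- An in-order traversal yields the elements in sorted order, which
    -- also answers the sorting half of the question.
    toSortedList :: Tree a -> [a]
    toSortedList Leaf         = []
    toSortedList (Node l x r) = toSortedList l ++ (x : toSortedList r)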

Looking for an efficient array-like structure that supports "replace-one-member" and "append"

As an exercise I wrote an implementation of the longest increasing subsequence algorithm, initially in Python but I would like to translate this to Haskell. In a nutshell, the algorithm involves a fold over a list of integers, where the result of each iteration is an array of integers that is the result of either changing one element of or appending one element to the previous result.
Of course in Python you can just change one element of the array. In Haskell, you could rebuild the array while replacing one element at each iteration - but that seems wasteful (copying most of the array at each iteration).
In summary what I'm looking for is an efficient Haskell data structure that is an ordered collection of 'n' objects and supports the operations: lookup i, replace i foo, and append foo (where i is in [0..n-1]). Suggestions?
Perhaps the standard Seq type from Data.Sequence. It's not quite O(1), but it's pretty good:
index (your lookup) and adjust (your replace) are O(log(min(index, length - index)))
(><) (your append) is O(log(min(length1, length2)))
It's based on a tree structure (specifically, a 2-3 finger tree), so it should have good sharing properties (meaning that it won't copy the entire sequence for incremental modifications, and will perform them faster too). Note that Seqs are strict, unlike lists.
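A quick sketch of those three operations (the values are arbitrary):

    import Data.Sequence ((|>), index, update, fromList)

    main :: IO ()
    main = do
      let s   = fromList [3, 1, 4 :: Int]
          x   = index s 1      -- your "lookup i": O(log(min(i, n - i)))
          s'  = update 1 10 s  -- your "replace i foo" (adjust applies a function instead)
          s'' = s' |> 99       -- your "append foo": O(1) at the right end
      print (x, s', s'')       -- (1,fromList [3,10,4],fromList [3,10,4,99])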
I would try to just use mutable arrays in this case, preferably in the ST monad.
The main advantages would be a more straightforward translation and code that is both simple and efficient.
The disadvantage, of course, is losing purity and composability. However, I don't think this is such a big deal, since there are not many cases where you would want to keep intermediate algorithm states around.
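A hedged sketch of that approach with Data.Array.ST (the demo function is invented; a real longest-increasing-subsequence implementation would also track a logical length to handle the append case):

    import Data.Array.ST (newArray, writeArray, runSTUArray)
    import Data.Array.Unboxed (UArray, elems)

    -- Build [0,1,2,...], then overwrite one slot; each write is O(1) and
    -- happens in place inside ST, frozen to a pure UArray at the end.
    demo :: Int -> UArray Int Int
    demo n = runSTUArray $ do
      arr <- newArray (0, n - 1) 0
      mapM_ (\i -> writeArray arr i i) [0 .. n - 1]
      writeArray arr 0 42          -- "replace i foo" is a single write
      pure arr

    main :: IO ()
    main = print (elems (demo 5)) -- [42,1,2,3,4]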

Haskell arrays vs lists

I'm playing with Haskell and Project Euler's 23rd problem. After solving it with lists I went here where I saw some array work. This solution was much faster than mine.
So here's the question. When should I use arrays in Haskell? Is their performance better than lists' and in which cases?
The most obvious difference is the same as in other languages: arrays have O(1) lookup and lists have O(n). Attaching something to the head of a list (:) takes O(1); appending (++) takes O(n).
Arrays have some other potential advantages as well. You can have unboxed arrays, which means the entries are stored contiguously in memory without a pointer to each one (if I understand the concept correctly). This has several benefits: it takes less memory and can improve cache performance.
Modifying immutable arrays is expensive: you have to copy the entire array, which takes O(n). If you're using mutable arrays, you can modify them in O(1) time, but you have to give up the advantages of a purely functional solution.
Finally, lists are just much easier to work with if performance doesn't matter. For small amounts of data, I would always use a list.
And if you're doing a lot of indexing as well as a lot of updating, you can
use Maps (or IntMaps): O(log n) indexing and update, good enough for most uses, and the code is easy to get right;
or, if Maps are too slow, use mutable (unboxed) arrays (STUArray from Data.Array.ST or STVector from the vector package): O(1) indexing and update, but the code is easier to get wrong and generally not as nice.
For specific uses, functions like accumArray give very good performance too (uses mutable arrays under the hood).
Arrays have O(1) indexing (this used to be part of the Haskell definition), whereas lists have O(n) indexing. On the other hand, any kind of modification of arrays is O(n) since it has to copy the array.
So if you're going to do a lot of indexing but little updating, use arrays.
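A brief sketch of that tradeoff, including accumArray (the digit histogram is just an example):

    import Data.Array (listArray, accumArray, (!))

    main :: IO ()
    main = do
      let xs  = [0, 10 .. 90] :: [Int]
          arr = listArray (0, 9) xs  -- O(n) to build, O(1) to index
      print (xs !! 5, arr ! 5)       -- (50,50): same value, O(n) vs O(1) access

      -- Histogram of digits via accumArray: one pass, no manual updates
      -- (it uses a mutable array under the hood).
      let digits = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5] :: [Int]
          hist   = accumArray (+) 0 (0, 9) [(d, 1) | d <- digits]
      print (hist ! 5)               -- 3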
